
How to restart training of a saved model? #3

Closed
ghost opened this issue Apr 17, 2015 · 12 comments

@ghost commented Apr 17, 2015
Hi,

I've been experimenting with the model_utils.lua file on some of my own concatenations of gModules. I was just wondering if you could give an example of how to use

model_utils.combine_all_parameters

and

model_utils.clone_many_times

to get the params and grad_params of a saved protos table, which can then be used with the appropriate lines of train.lua to restart training?

Just to give some context: instead of saving the full protos, what I've tried is saving just the following table,

table_to_save = { options = opt , saved_params=params, saved_grad_params=grad_params }

Then I used basically all of train.lua, with the following:

saved_data = torch.load(saved_filename)

opt = saved_data.options

params:copy( saved_data.saved_params )

grad_params:copy( saved_data.saved_grad_params )

That is, I recreate the system using the same options and clone it in the same way - the main change is simply transferring the saved params and grad_params before starting the optimization.

I was just wondering if this is the right way to do it?

Thanks for your help 👍

Best regards,

Aj

@bshillingford
Member

Hi, I recommend saving the cloned sequences of modules instead of the protos, just to make things easier. This would of course result in larger saved files, though, since the activations would be saved as well (but still the same number of weight matrices).

If you want to serialize just the protos: in train.lua, from line 53 onward, the protos will have all their params pointing to subtensors of one shared tensor. So you can serialize just the protos periodically in the training loop; I think I already do that there. To reload the protos, wrap lines 43-53 in an if statement that checks whether you want to load them from a file, then either recreate the protos as lines 43-53 already do, or deserialize them from a file (see the sketch below).

The clone_many_times call must be done after this (see line 54 for an explanation of why).

Cheers,

Brendan
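
A minimal sketch of that load-or-create wrapper, assuming the model_utils.lua from this repo; opt.init_from and create_protos are hypothetical stand-ins for a checkpoint-path option and for the proto-building code on lines 43-53 of train.lua:

```lua
local model_utils = require 'model_utils'   -- adjust the require path to your checkout

local protos
if opt.init_from and opt.init_from ~= '' then
  protos = torch.load(opt.init_from)        -- deserialize previously saved prototypes
else
  protos = create_protos(opt)               -- build them fresh, as lines 43-53 already do
end

-- flatten every prototype's weights/biases into one shared tensor for the optimizer
local params, grad_params = model_utils.combine_all_parameters(
    protos.encoder, protos.decoder)         -- substitute your own modules here

-- cloning must happen only after the parameters have been flattened
local clones = {}
for name, proto in pairs(protos) do
  clones[name] = model_utils.clone_many_times(proto, opt.seq_length)
end
```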

@ghost
Author

ghost commented Apr 17, 2015

Wow thanks for the quick reply :)

I don't really understand how you can get the params of the cloned sequence of modules and put them into memory as a shared 1D tensor that the optimizer can use.

I'm sort of confused because clones is a table of modules, not a single module? Is there some trick I'm missing?

@bshillingford
Member

Here's the sequence of operations:

  1. The prototypes are generated (protos).
  2. Their parameters are flattened by allocating a new tensor that holds all of their weights/biases, so that the optimizer can access them easily, and by recursively replacing the protos' parameters with new tensors pointing into this one tensor.
  3. Now we create the cloned sequence of modules using the prototypes... remember that no new params are allocated now, and each instance in the clone just has a reference to the same tensors.

I think step 3 is where you're a bit confused here. We create the sequences of clones from the protos after the parameters of the prototypes all point to the shared tensor for optim.

Does that help?
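
For concreteness, here is a tiny sketch of those three steps, using nn.Linear as a hypothetical stand-in for a prototype (your protos are gModules) and the model_utils.lua from this repo:

```lua
require 'nn'
local model_utils = require 'model_utils'   -- adjust the require path to your checkout

-- 1. build the prototype (stand-in for your gModule)
local proto = nn.Linear(10, 10)

-- 2. flatten its parameters; afterwards proto's weight and bias are views
--    into the single flat tensor `params`
local params, grad_params = model_utils.combine_all_parameters(proto)

-- 3. clone it T times; no new weights are allocated, each clone's
--    parameters reference the same underlying storage
local T = 5
local clones = model_utils.clone_many_times(proto, T)

-- sanity check (should hold with the standard model_utils implementation):
-- every clone's weight tensor shares storage with params
for t = 1, T do
  assert(clones[t].weight:storage() == params:storage())
end
```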


@ghost
Author

ghost commented Apr 17, 2015

Yes - I think so? 👍

So the variables params and grad_params are basically references to the shared tensor, and they get saved (for want of a better word) along with the clones when I serialize them? So when I reload the sequence of clones, the variables params and grad_params will be reloaded and will still point to the shared tensor (which holds the weights/biases).

That seems a bit magical, but easy to do? Basically I just reload the clones and start the optimization loop as before, with no need to use either of the functions from model_utils.lua?

@bshillingford
Member

Sorry for the confusion, in my first reply I listed two options:

  1. Serialize only protos, params, and grad_params to save space in the saved model, then recreate clones on load (I explained this one in the "lines 43-53" paragraph in the first reply: just serialize {params, grad_params, protos}, then recreate clones after you deserialize).
  2. Just save clones, protos, params, and grad_params all together in one table.

If you save clones as you just described, that's correct as long as you serialize params and grad_params at the same time (e.g. put them all in one table: {params, grad_params, clones, protos}), which I neglected to mention. There's no need to use model_utils.lua if you serialize everything together. IIRC protos isn't used past line 50 or so, so you probably don't need to serialize it; I'd serialize it anyway though.

(Torch's serialization system will see that their Tensors point to the same Storage objects as all the params inside the modules, and so they'll point to the same thing on deserialization too.)

Edit: to comment on your initial post (I missed part of it because I read it before you edited): your way of copying parameter values would work as well, but option 1 above is the way I usually do it.
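
To make option 1 concrete, here is a hedged sketch, with checkpoint_file as a hypothetical filename and opt.seq_length standing in for the number of clones:

```lua
-- during training, periodically save a small checkpoint (no clones)
local checkpoint = {
  opt = opt,
  params = params,
  grad_params = grad_params,
  protos = protos,
}
torch.save(checkpoint_file, checkpoint)

-- on restart, reload it and rebuild only the clones
local checkpoint = torch.load(checkpoint_file)
opt, protos = checkpoint.opt, checkpoint.protos
params, grad_params = checkpoint.params, checkpoint.grad_params
-- Torch's serializer preserves the Storage sharing, so params still views
-- the protos' weights; only the clones need to be recreated
local clones = {}
for name, proto in pairs(protos) do
  clones[name] = model_utils.clone_many_times(proto, opt.seq_length)
end
```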

@ghost
Author

ghost commented Apr 17, 2015

Wow, thanks a lot for the great explanation 👍 - I'm trying it now. It takes a long time for my model to train, so it's not easy to tell if it's working, but I think it is.

So basically, saving the 'things' in the following table and unpacking them back to their original names, i.e.

table_to_save = { params, grad_params, clones, protos, opt }

save it, reload it and unpack it

params, grad_params, clones, protos, opt = unpack( table_to_save )

is all you have to do? All the parameter/memory-sharing technicalities are magically reloaded. I didn't think it would be that easy!

On a different level, though, it's still kind of confusing/unsatisfying. This is a bit theoretical, so bear with me, but in terms of information theory and source coding, my system is a variational autoencoder. So if I add up the number of bits it takes for the LuaJIT compiler, the essential Torch modules I use, my VAE system's .lua files, and my trained model's saved parameters (which are just a few million 64-bit numbers), I should have the amount of information coded into my algorithmic/generative model, which is basically a probability distribution over my dataset (cluttered MNIST32). In total I guess that's 1 GB at the most.

If I clone my trained modules and then save them all, the full amount of data saved comes to about 5 GB each time. So it just seems that, in terms of information and data compression, it's more satisfactory to recreate the system fresh using model_utils, and then :copy the saved parameter values into the shared param tensor of the freshly rebuilt system.

What do you think?

I need to do some more experiments just for my own sanity to make sure both methods work 👍

@bshillingford
Member

That's the correct way to serialize, yes, and sharing/references are handled correctly. Without going into too much detail, there are a few different levels of a serialization system's complexity regarding pointers/references. In C/C++ notation, if &a.b == &c.b, then when serializing a and c together we'd expect &a.b == &c.b when deserializing too. In the case of parameter sharing, a and c are Tensors, and b is the shared underlying Storage. Remember there's only one Storage for the parameters in the entire network. More advanced serialization libraries can correctly (de)serialize pointer/reference cycles (Torch probably does as well, but I haven't checked, and this situation is probably rare for most Torch code anyway).

The amount of space is large because the activations and gradients in each clone in the network are being serialized too (i.e. module.output and module.gradOutput for each module). The values in these are obviously useless. To avoid this, serialize just params, grad_params, protos, and opt, and recreate clones using clone_many_times when you start; or just use your solution of serializing parameter values and copying them (but remember to do the copy after calling combine_all_parameters).
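
A minimal sketch of that copy-based route, assuming the table from the first post ({ options = opt, saved_params = ..., saved_grad_params = ... }) was written to a hypothetical saved_filename:

```lua
local saved = torch.load(saved_filename)
opt = saved.options

-- recreate the protos from scratch exactly as train.lua does, then flatten them
local params, grad_params = model_utils.combine_all_parameters(
    protos.encoder, protos.decoder)   -- substitute your own modules here

-- the copy must come AFTER combine_all_parameters, so the saved values land
-- in the flat tensor that the clones will later share
params:copy(saved.saved_params)
grad_params:copy(saved.saved_grad_params)

-- only now create the clones with model_utils.clone_many_times
```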


@ghost
Author

ghost commented Apr 17, 2015

Brilliant - thank you very much for the clear explanation 👍

@ghost
Author

ghost commented Apr 21, 2015

Hi Brendan,

Thanks for all the great help you've given me. Just to share a little trick I found:

i) Train a system for, say, n timesteps/clones of the master modules, and save (serialize) the parameter and grad-parameter tensors together with the first clone, using the method you explained above.

ii) Then rebuild your system fresh with an extra timestep/clone, n+1, by calling model_utils.clone_many_times on the first clone.

I think this little trick is working (at least for the variational autoencoder I'm working on 👍).
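
In code, the trick amounts to something like this rough sketch (the filename and n are hypothetical names):

```lua
-- i) after training with n clones, save the flat tensors and the first clone
torch.save('short_run.t7', {
  params = params,
  grad_params = grad_params,
  first_clone = clones[1],   -- or e.g. clones.rnn[1], depending on how the clones table is structured
})

-- ii) on restart, rebuild a longer sequence from that saved clone
local saved = torch.load('short_run.t7')
params, grad_params = saved.params, saved.grad_params
clones = model_utils.clone_many_times(saved.first_clone, n + 1)
```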

@ghost
Author

ghost commented Apr 23, 2015

Hi, just an update on my suggested trick of restarting training with a rebuilt system with an added clone: after doing more controlled experiments, it does not seem to be working.

Basically I've found that there's no substitute for fixing the number of clones/timesteps and being patient, waiting for the system to start breaking its symmetries.

My suggested trick of using the parameters of a shorter system as the initial parameters of a system with an extra clone/timestep seems to restrict the parameter space and results in a higher final loss. The standard method of simply training with the desired number of timesteps and being patient, or finding better ways to initialize the system, seems to result in a lower final loss.

Very sorry for the half-baked idea 👎

@ghost
Author

ghost commented May 1, 2015

Thanks a lot for all your great help Brendan 👍

I think anyone who reads through this issue will find a few options for how to save a network and restart training.

Best regards, Aj

@ghost ghost closed this as completed May 1, 2015
@mszlazak

mszlazak commented May 1, 2015

Nice if I could get the code to work.

96749c8#commitcomment-10954747
