Thanks a lot for this great work, which you published transparently.
I have a few questions.
Firstly, I noticed there are two time-related parameters:
time_batch_size
time_steps
But I am not sure how to change them when finetuning. Say I have a model trained to 100 steps, and I want to finetune it to 200 steps (and ultimately to 1000 time steps). In this case, what would the values of time_steps and time_batch_size be?
Similarly, what would the values be when training from an initialized model to 100 time steps?
Also, for the number of Adam iterations, is it 10,000 for each finetuning, or 10,000 across the entire sequence of finetunings?
Finally, I noticed that in your paper you initialize h and c to zero for the LSTM, but in the code they are initialized randomly. Is there a preferred initialization between these?
Best,
Rami
RamiCassia changed the title from "How do you sequentially pretrain?" to "A few questions" on Nov 16, 2022.
Thanks for your interest. Please see my replies to your questions point by point below.
For the two time-related parameters: time_steps denotes the total number of time steps you would like to train/extrapolate. time_batch_size is mainly for batching the data in the ConvLSTM along the time dimension, which addresses memory limits when training/extrapolating over long sequences. For example, when dealing with a long sequence of 1000 steps, you can set time_batch_size to 500, which gives you 2 batches covering [0, 500] and [500, 1000]. With this setting, the model can fit within the memory limits of many PCs or servers. In your case, time_steps should be 201 (the one extra step is needed to compute the time derivative by central difference) and time_batch_size should be 200 (if that fits on your server).
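To make the splitting concrete, here is a minimal sketch of how a long sequence can be chopped into consecutive time batches. This is a hypothetical helper written for illustration only; the function name and signature are my own, not the repo's API.

```python
def time_batches(time_steps, time_batch_size):
    """Split the interval [0, time_steps) into consecutive windows
    of at most time_batch_size steps each.

    Illustrative helper only -- the actual repo batches the sequence
    inside the ConvLSTM training loop.
    """
    bounds = list(range(0, time_steps, time_batch_size))
    if bounds[-1] != time_steps:
        bounds.append(time_steps)  # keep the final partial window
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]

# A 1000-step sequence with time_batch_size=500 gives two batches:
print(time_batches(1000, 500))  # [(0, 500), (500, 1000)]
```

With time_steps=201 and time_batch_size=200 this would give a full window [0, 200] plus the one extra step used for the central-difference time derivative.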
For an initialized model trained to 100 time steps, 101 for time_steps and 100 for time_batch_size should be good.
For the number of Adam iterations, this is per finetuning, but it does not have to be 10k; for finetuning, 5k should be enough. In the paper, I was trying to show the best results :). In addition, the proposed method is a kind of "unsupervised" strategy because it is constrained only by the physics loss. With more epochs the loss will keep decreasing, but the improvement becomes trivial after enough iterations.
Yes, we use random initialization in the code and mention zero initialization in the paper. They perform similarly; the random perturbation somewhat helps training.
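For reference, the two initialization choices can be sketched as follows. The function name, shape layout, and scale are illustrative assumptions of mine (the repo's actual code differs), written with numpy so the sketch is self-contained:

```python
import numpy as np

def init_hidden(shape, random=True, scale=0.01, rng=None):
    """Return (h, c) for a ConvLSTM cell.

    Zero initialization matches the paper; a small random perturbation
    matches the released code. Shape layout and scale are illustrative
    assumptions, not the repo's exact API.
    """
    if random:
        rng = rng if rng is not None else np.random.default_rng(0)
        h = rng.normal(scale=scale, size=shape)
        c = rng.normal(scale=scale, size=shape)
    else:
        h = np.zeros(shape)
        c = np.zeros(shape)
    return h, c

# Zero init (paper) vs. small random init (code):
h0, c0 = init_hidden((1, 32, 16, 16), random=False)
hr, cr = init_hidden((1, 32, 16, 16), random=True)
```

Either choice works here since the states are refined by the physics loss during training; the small random perturbation mainly helps break symmetry early on.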