A few questions #9

Open
RamiCassia opened this issue Nov 16, 2022 · 1 comment

@RamiCassia commented Nov 16, 2022

Hello,

Thanks a lot for this great work, which you have published transparently.

I have a few questions:

Firstly, I noticed there are two time-related parameters:

  • time_batch_size
  • time_steps

But I am not sure how to change them when finetuning. Say I have a model trained up to 100 steps and I want to finetune it to 200 steps (and ultimately up to 1000 timesteps). In this case, what should the values of time_steps and time_batch_size be?

Similarly, what would the values be when training from an initialized model to 100 timesteps?

Also, for the number of Adam iterations, is it 10000 for each finetuning, or 10000 across the entire sequence of finetunings?

Finally, I noticed that in your paper you initialize h and c to zero for the LSTM, but in the code they are initialized randomly. Is there a preferred initialization between the two?

Best,
Rami

@RamiCassia changed the title from "How do you sequentially pretrain?" to "A few questions" on Nov 16, 2022
@paulpuren (Collaborator) commented

Hi Rami,

Thanks for your interest. Please see my point-by-point replies to your questions below.

  • For these two time-related parameters, time_steps denotes the total number of time steps you would like to train/extrapolate, while time_batch_size is mainly for batching the data along the time dimension in the ConvLSTM, which keeps memory usage manageable when training/extrapolating long sequences. For example, for a long sequence of 1000 steps you can set time_batch_size to 500, which gives 2 batches, [0, 500] and [500, 1000]; with this setting the model can fit within the memory limits of many PCs or servers. In your case, time_steps should be 201 (the one extra step is needed to compute the time derivative with the central difference) and time_batch_size should be 200 (if that fits on your server). See the short sketch after this list.
  • For training from an initialized model to 100 timesteps, 101 for time_steps and 100 for time_batch_size should be fine.
  • The number of Adam iterations is for each finetuning, but it does not have to be 10k; for finetuning, 5k should be enough. In the paper I was trying to show the best results :). In addition, the proposed method is somewhat "unsupervised" because it is constrained only by the physics loss. With more epochs the loss will keep decreasing, but the improvement becomes negligible after enough iterations.
  • Yes, we use random initialization in the code and mention zero initialization in the paper. They perform similarly; the random perturbation seems to help training somewhat.
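
For illustration, here is a minimal PyTorch sketch of the time batching and the two hidden-state initializations described above. It is not the repository's actual code; the tensor shapes, the placeholder field `u`, and the 0.01 perturbation scale are assumptions made for this sketch.

```python
import torch

# Illustrative only -- not the repository's actual code.
time_steps = 1000        # total number of steps to train/extrapolate
time_batch_size = 500    # chunk length along time that fits in memory

# Placeholder spatiotemporal field: [time, channels, height, width],
# with one extra step kept for the central-difference time derivative.
u = torch.zeros(time_steps + 1, 2, 128, 128)

for start in range(0, time_steps, time_batch_size):
    end = start + time_batch_size     # chunks: [0, 500], then [500, 1000]
    chunk = u[start:end + 1]          # include the shared boundary step
    # ... run the ConvLSTM forward pass and evaluate the physics loss on `chunk` ...

# The two hidden/cell-state initializations discussed above (shapes assumed):
hidden_channels, height, width = 128, 16, 16
h0_zero = torch.zeros(1, hidden_channels, height, width)        # zero init (paper)
h0_rand = 0.01 * torch.randn(1, hidden_channels, height, width) # small random init (code)
```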

Hope this answers your questions.
