A few questions #9

Open
RamiCassia opened this issue Nov 16, 2022 · 1 comment

@RamiCassia commented Nov 16, 2022

Hello,

Thanks a lot for this great work, which you have published transparently.

I have a few questions:

Firstly, I noticed there are two time-related parameters:

  • time_batch_size
  • time_steps

But I am not sure how to change them when finetuning. Say I have a model trained up to 100 steps and I want to finetune it to 200 steps (and ultimately up to 1000 timesteps). In this case, what should the values of time_steps and time_batch_size be?

Similarly, what would the values be when training from an initialized model to 100 timesteps?

Also, for the number of Adam iterations, is it 10000 for each finetuning, or 10000 across the entire sequence of finetunings?

Finally, I noticed that in your paper you initialize h and c to zero for the LSTM, but in the code they are initialized randomly. Is there a preferred initialization between the two?

Best,
Rami

@RamiCassia changed the title from "How do you sequentially pretrain?" to "A few questions" on Nov 16, 2022
@paulpuren (Collaborator) commented

Hi Rami,

Thanks for your interest. Please see my point-by-point replies to your questions below.

  • For these two time-related parameters, time_steps denotes the total number of time steps you would like to train/extrapolate, while time_batch_size is mainly for batching the data along the time dimension in the ConvLSTM, which keeps memory usage manageable when training/extrapolating long sequences. For example, for a long sequence of 1000 steps you can set time_batch_size to 500, which gives 2 batches, [0, 500] and [500, 1000]; with this setting the model can fit within the memory limits of many PCs or servers. In your case, time_steps should be 201 (the one extra step is needed to compute the time derivative with the central difference) and time_batch_size should be 200 (if that fits on your server). See the short sketch after this list.
  • For training from an initialized model to 100 timesteps, 101 for time_steps and 100 for time_batch_size should be fine.
  • The number of Adam iterations is for each finetuning, but it does not have to be 10k; for finetuning, 5k should be enough. In the paper I was trying to show the best results :). In addition, the proposed method is somewhat "unsupervised" because it is constrained only by the physics loss. With more epochs the loss will keep decreasing, but the improvement becomes negligible after enough iterations.
  • Yes, we use random initialization in the code and mention zero initialization in the paper. They perform similarly; the random perturbation seems to help training somewhat.
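
For illustration, here is a minimal PyTorch sketch of the time batching and the two hidden-state initializations described above. It is not the repository's actual code; the tensor shapes, the placeholder field `u`, and the 0.01 perturbation scale are assumptions made for this sketch.

```python
import torch

# Illustrative only -- not the repository's actual code.
time_steps = 1000        # total number of steps to train/extrapolate
time_batch_size = 500    # chunk length along time that fits in memory

# Placeholder spatiotemporal field: [time, channels, height, width],
# with one extra step kept for the central-difference time derivative.
u = torch.zeros(time_steps + 1, 2, 128, 128)

for start in range(0, time_steps, time_batch_size):
    end = start + time_batch_size     # chunks: [0, 500], then [500, 1000]
    chunk = u[start:end + 1]          # include the shared boundary step
    # ... run the ConvLSTM forward pass and evaluate the physics loss on `chunk` ...

# The two hidden/cell-state initializations discussed above (shapes assumed):
hidden_channels, height, width = 128, 16, 16
h0_zero = torch.zeros(1, hidden_channels, height, width)        # zero init (paper)
h0_rand = 0.01 * torch.randn(1, hidden_channels, height, width) # small random init (code)
```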

Hope this answers your questions.
