Jobs fails when loading previous model #3

Open
pengguo-seismo opened this issue Jan 2, 2022 · 11 comments

Comments

@pengguo-seismo

Hi Paul,

I hope you are doing well.

I have a question about running the Python script. It requires loading a previously trained model, './model/checkpoint500.pt'. Can you please tell me how to obtain this model, or how to define the weights/biases for initializing the network?

Many thanks in advance.

@paulpuren
Collaborator

Hello,

Thank you for your interest in our research. Take the 2D Burgers equation as an example. Our goal is to solve the PDE for 1000 time steps. First, initialize all network parameters with the function initialize_weights, train the model for 100 time steps, and save the trained model as checkpoint100.pt. Second, load checkpoint100.pt as the initial network parameters, train for 200 time steps, and save the result as checkpoint200.pt. Repeating this procedure eventually reaches the milestone of 1000 time steps.

Hope that answers your question. Thank you!
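
A minimal sketch of the staged schedule described above, not taken from the repository. The helper names `checkpoint_path` and `build_stage_plan` are hypothetical, and the stage list is an assumption pieced together from the step counts mentioned in this thread; only the `checkpoint<N>.pt` naming comes from the discussion.

```python
from typing import List, Optional, Tuple


def checkpoint_path(steps: int, model_dir: str = "./model") -> str:
    """Build a checkpoint file name in the style used in this thread."""
    return f"{model_dir}/checkpoint{steps}.pt"


def build_stage_plan(stages: List[int]) -> List[Tuple[int, Optional[str], str]]:
    """Return (time_steps, checkpoint_to_load, checkpoint_to_save) per stage.

    The first stage loads nothing (None): its parameters come from fresh
    initialization. Every later stage starts from the previous stage's file.
    """
    plan = []
    previous = None
    for steps in stages:
        save_path = checkpoint_path(steps)
        plan.append((steps, previous, save_path))
        previous = save_path
    return plan


# Hypothetical schedule (assumption): 100 -> 200 -> 500 -> 1000 time steps.
for steps, load_from, save_to in build_stage_plan([100, 200, 500, 1000]):
    print(f"train {steps:4d} steps | load: {load_from} | save: {save_to}")
```

In this plan the 1000-step stage is the only one that loads `./model/checkpoint500.pt`, which matches the path reported in the original question.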

@norery

norery commented Jan 14, 2022

Thank you for your reply. I have the same problem. I noticed that the code has no provision for multiple training rounds. For example, when I train for 100 time steps, what should I change? How should I set the value of 'pre_model_save_path ='?
Thank you in advance!

@paulpuren
Collaborator

Thank you for your question. Yes, we only show the code for the 1000-time-step stage. When training for the first 100 steps, you apply the function initialize_weights directly; you do not need pre_model_save_path for that stage.

@LiShenshen123

Hello, how do I get the parameter pre_model_save_path? I am very confused and hope to get your help. Thank you very much.

@paulpuren
Collaborator

Thank you for your question. pre_model_save_path is the path to the pretrained model. Take 2D Burgers as an example, and suppose you pretrain the model in stages: 100 steps, then 200 steps, then 500 steps. For the 1st pretraining, you have no pre_model_save_path; you train the model directly, with the network parameters initialized by the function initialize_weights. For the 2nd pretraining, you initialize the network parameters from the model learned in the 1st pretraining (this is where pre_model_save_path comes in), and then train it further for 200 steps.
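
The two cases above (fresh initialization versus resuming from an earlier stage) can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the repository's code: the body of `initialize_weights` here is a stand-in, `prepare_model` is a hypothetical helper, the toy `nn.Linear` replaces the real network, and the checkpoint is assumed to hold a plain `state_dict` (the actual checkpoint format in the repo may differ).

```python
import os
import tempfile
from typing import Optional

import torch
import torch.nn as nn


def initialize_weights(module: nn.Module) -> None:
    """Stand-in initializer (assumption): the repo's version may differ."""
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.init.zeros_(module.bias)


def prepare_model(model: nn.Module, pre_model_save_path: Optional[str] = None) -> str:
    """Initialize from scratch, or resume from an earlier stage's checkpoint."""
    if pre_model_save_path is None:
        model.apply(initialize_weights)  # 1st pretraining: nothing to load
        return "initialized"
    state = torch.load(pre_model_save_path, map_location="cpu")
    model.load_state_dict(state)  # later stages: start from saved parameters
    return "loaded"


with tempfile.TemporaryDirectory() as model_dir:
    stage1 = nn.Linear(4, 2)  # toy stand-in for the real network
    print(prepare_model(stage1))
    # ... train for 100 time steps here ...
    path = os.path.join(model_dir, "checkpoint100.pt")
    torch.save(stage1.state_dict(), path)

    stage2 = nn.Linear(4, 2)
    print(prepare_model(stage2, path))
    same_weights = torch.equal(stage1.weight, stage2.weight)
```

Passing `None` for the first stage avoids any attempt to open a checkpoint file that does not exist yet.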

@LiShenshen123

For the first pretraining, how do I train without pre_model_save_path, directly using the network parameters initialized by the function initialize_weights?
It always reports an error: FileNotFoundError: [Errno 2] No such file or directory: './model/checkpoint500.pt'

@paulpuren
Collaborator

The checkpoint500.pt here is the model saved after training for 500 time steps. The code we show trains for 1000 time steps starting from the model pretrained for 500 time steps, which is why pre_model_save_path points to checkpoint500.pt. When you first train for 100 time steps, you can name the saved model checkpoint100.

@LiShenshen123

I'm still confused, because I still can't run it successfully. From what I read, your code also needs pretrained network weights for the first training. As for the network weight initialization you mentioned, I don't know how to implement it. I see that a pretrained model is loaded in the train function defined in your code. I'm lost. Could you send me debugged code showing how to get the pretrained model in the first step? I really hope to get your help. My mailbox is 2858724272@qq.com. Thank you very much!

@paulpuren paulpuren reopened this Mar 10, 2022
@paulpuren
Collaborator

Hi. First, we have tested the code and it works well; the code posted in the repo does not have bugs. You may modify it for your own purposes (e.g., for different pretraining schemes or different PDE systems).

Second, for the first training, you do not need pretrained network parameters (e.g., weights). They are initialized by the function initialize_weights.

Third, the pretrained model is loaded only when an earlier pretraining stage exists. Namely, you will only need it after the 1st pretraining.
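
One way to make the third point explicit in a script is a small guard that distinguishes "no pretrained model requested" from "pretrained model requested but missing". This is a sketch only; `resolve_pretrained` is a hypothetical helper that does not exist in the repository, and the convention of passing `None` for the first stage is an assumption.

```python
import os
import tempfile
from typing import Optional


def resolve_pretrained(pre_model_save_path: Optional[str]) -> Optional[str]:
    """Return the checkpoint path to load, or None to start from scratch.

    None means "first stage": the caller should use initialize_weights.
    A path that does not exist raises an error with an actionable message.
    """
    if pre_model_save_path is None:
        return None
    if not os.path.isfile(pre_model_save_path):
        raise FileNotFoundError(
            f"{pre_model_save_path} not found: run the earlier training stage "
            "first, or pass None to start from initialize_weights."
        )
    return pre_model_save_path


with tempfile.TemporaryDirectory() as model_dir:
    path = os.path.join(model_dir, "checkpoint100.pt")
    try:
        resolve_pretrained(path)  # no earlier stage has been saved yet
    except FileNotFoundError as err:
        print("expected failure:", err)
    open(path, "wb").close()  # empty placeholder standing in for a real checkpoint
    print("will load:", resolve_pretrained(path))
```

Raising instead of silently falling back to fresh initialization is a deliberate choice here: a mistyped path would otherwise restart training from scratch without any warning.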

@LiShenshen123

Thank you for your reply. This is my first training run, and the error says that a pretrained model is required. Is any special setup required for the first pretraining? Thank you.
[Attached screenshot: QQ图片20220310131203]

@richardliuss

Hi Dr. Ren,
Would you please give us a detailed tutorial that guides us through the first training, such as how to modify the code and what file structure is needed?
Please forgive my unfamiliarity with your code; my background is in computational fluid dynamics.
Thank you so much.
Richard
