Saved model starts with initial loss and accuracy values after loading #12263
Comments
@fchollet I haven't tried saving the model with model.save(), but I have seen people on other threads say that the issue was solved with model.save() and models.save_model(). If it is actually solved, ModelCheckpoint should also save the optimizer state so that training can be resumed, but for whatever reason it doesn't (or can't). I have checked the code of the ModelCheckpoint callback: it calls model.save(), which in turn calls models.save_model(). So in theory, if the issue is solved at the base, i.e. in models.save_model(), it should also be solved in the other functions. Sorry, but I don't have a powerful machine to check this in practice. If someone here does, I have shared my code on GitHub and the link is provided in the issue. Please try resuming training with it and find the cause of this problem. I am using a computer provided by a national institute, which the students here have to share for their projects, so I can't use it for such tasks. Thank you.
Recently, I tried to check whether the weights are saved correctly. To do that, I evaluated the model with my validation generator. The loss and accuracy were the same as at the beginning of training. From this I concluded that the issue is actually with saving the model's weights, though I might be wrong. BTW, I have also used multi_gpu_model() in my model code. Can it cause this issue? I can't try training the model on CPU, as it is too heavy for that and a single epoch would take a few days. Can anyone help with debugging? I see no response to such issues these days. Just list the current issues in the README.md of the Keras GitHub repo so that users are aware of them before trying Keras out and wasting months on it.
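One way to separate "the weights were not restored" from "something else changed" is to fingerprint the weights right before saving and right after loading. The following is only a minimal sketch: `weights_fingerprint` is a hypothetical helper, not part of Keras, and the `array` objects are stand-ins so that it runs without Keras installed. It relies only on each weight having a `tobytes()` method, which the NumPy arrays returned by `model.get_weights()` provide.

```python
import hashlib
from array import array


def weights_fingerprint(weights):
    """Return a SHA-256 hex digest over a list of weight arrays.

    Each item only needs a ``tobytes()`` method, so the list returned by
    Keras ``model.get_weights()`` (NumPy arrays) can be passed directly.
    """
    digest = hashlib.sha256()
    for w in weights:
        raw = w.tobytes()
        # Hash the length first so that array boundaries affect the digest.
        digest.update(len(raw).to_bytes(8, "little"))
        digest.update(raw)
    return digest.hexdigest()


# Stand-in data (assumption): real use would pass model.get_weights().
before_save = [array("f", [0.1, 0.2]), array("f", [0.3])]
after_load = [array("f", [0.1, 0.2]), array("f", [0.3])]
reinitialized = [array("f", [0.0, 0.0]), array("f", [0.0])]

print("restored intact:", weights_fingerprint(before_save) == weights_fingerprint(after_load))
print("matches fresh init:", weights_fingerprint(before_save) == weights_fingerprint(reinitialized))
```

If the fingerprints taken before saving and after loading agree but the loss still looks untrained, the weights themselves are probably not the culprit, and the data pipeline or custom objects would be the next things to check.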
@msymp @gabrieldemarmiesse If not, just list the issues along with a disclaimer on the main Keras page. I have seen many issues simply left without a solution. Only issues where the author directly points out the problem are considered here. Know that anyone who can directly point out the problem can also solve it; such people only open issues to request a feature or report a bug, while the others who really need help are ignored.
I thought the problem was with multi_gpu_model, but it isn't. I tried this code here to save and load the serial model whenever these functions are called on the parallel model, but that didn't work either.
Anything else that I can try? @gabrieldemarmiesse Is anyone actually investigating, or is that tag just for show? This is my third issue here on Keras, and I solved both of the previous two myself without anyone helping. This time I have to deal with the internals of Keras rather than my own model, so I have to insist on someone's help. A heavy model, a lack of resources, and now this. What should I say!
@trane293 I saw your comment here - #2378 (comment). I tried contacting you earlier to ask you to share your piece of code so that I can check it against mine. Can you help with this issue, please?
UPDATE: I tried removing the ModelMGPU wrapper as well as multi_gpu_model, so the model was trained on a single GPU, and it still didn't work! @jvishnuvardhan @gabrieldemarmiesse Any updates?
@ParikhKadam Hello, I have the same problem. I reload the weights but get a different loss, and the loss is bigger. I checked the weights. Can you tell me how to solve this problem?
@mthaha123 Yes... can you share your code?
@ParikhKadam here is my code: https://github.com/mthaha123/nice_glow
@mthaha123 I looked at your code. It seems that you have implemented custom objects (layers) in your model. When you do that, you also need to tell Keras which objects are custom and how to load them. Have a look at the custom layer I implemented here. Try to understand the get_config method and implement it in your custom layers. That will solve your problem... maybe! If you still have problems, ask me.
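The contract behind get_config can be illustrated without Keras. The sketch below is framework-free and purely illustrative: `ScaledLayer`, `serialize` and `deserialize` are hypothetical names, and the `custom_objects` dict only plays the role of the `custom_objects` argument that `keras.models.load_model` accepts. In real Keras code the class would subclass the Keras `Layer` class, and its `get_config` would normally extend the config returned by the parent class.

```python
import json


class ScaledLayer:
    """Framework-free stand-in for a custom layer (hypothetical name)."""

    def __init__(self, units, scale=1.0):
        self.units = units
        self.scale = scale

    def get_config(self):
        # Return every constructor argument needed to rebuild the layer.
        return {"units": self.units, "scale": self.scale}

    @classmethod
    def from_config(cls, config):
        # Rebuild the layer from the saved constructor arguments.
        return cls(**config)


def serialize(layer):
    """Store the class name together with its config."""
    return {"class_name": type(layer).__name__, "config": layer.get_config()}


def deserialize(blob, custom_objects):
    """Look the class up by name; unknown custom classes raise KeyError."""
    cls = custom_objects[blob["class_name"]]
    return cls.from_config(blob["config"])


# The loader can only rebuild classes it has been told about.
custom_objects = {"ScaledLayer": ScaledLayer}

saved = json.dumps(serialize(ScaledLayer(8, scale=0.5)))
restored = deserialize(json.loads(saved), custom_objects)
print(restored.units, restored.scale)
```

If `get_config` omits a constructor argument, the rebuilt layer silently falls back to the default value, which is one way a reloaded model can end up differing from the one that was trained.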
I think I found the reason that caused the problem. Thank you for the help! @ParikhKadam
@ParikhKadam Hello dear,
@alirezaghader Can you share your code so that I can have a look?
It's 2021 and the same problem still exists... Has anyone found a solution? Or maybe I should use PyTorch rather than Keras?
I am building a model for machine comprehension. It's a heavy model that has to be trained on lots of data, which takes a lot of time. I have used Keras callbacks to save the model after every epoch and also to save a history of loss and accuracy.
The problem is that when I load a trained model and try to continue its training using the initial_epoch argument, the loss and accuracy values are the same as for an untrained model. Here is the code: https://github.com/ParikhKadam/bidaf-keras
The code used to save and load model is in /models/bidaf.py
The script I am using to load the model is:
The training history is quite good which is:
Also, I have already taken care of loading custom objects such as layers, loss function and accuracy.
I am kind of frustrated by now, as it took me days to train this model up to epochs and now I can't resume training. I have gone through various threads in the Keras issues and found that many people are facing the same problem, but I can't find a solution.
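When resuming, the value passed as initial_epoch has to match the checkpoint that is actually loaded. The following is a minimal sketch of picking the newest checkpoint and its epoch from a directory. The `bidaf_NN.h5` naming scheme and the `latest_checkpoint` helper are assumptions for illustration, not taken from the repository; the sketch also assumes that the number in the filename is the 1-based count of completed epochs, as produced by an `{epoch:02d}` placeholder in a `ModelCheckpoint` filepath.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical naming scheme: bidaf_01.h5, bidaf_02.h5, ...
CHECKPOINT_PATTERN = re.compile(r"^bidaf_(\d+)\.h5$")


def latest_checkpoint(directory):
    """Return (path, epoch) of the newest checkpoint, or (None, 0) if none."""
    best_path, best_epoch = None, 0
    for path in Path(directory).iterdir():
        match = CHECKPOINT_PATTERN.match(path.name)
        if match and int(match.group(1)) > best_epoch:
            best_path, best_epoch = path, int(match.group(1))
    return best_path, best_epoch


# Demo on a throwaway directory with empty placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("bidaf_01.h5", "bidaf_02.h5", "bidaf_10.h5", "notes.txt"):
        (Path(tmp) / name).touch()
    demo_path, demo_epoch = latest_checkpoint(tmp)
    print(demo_path.name, demo_epoch)
```

Under these assumptions, the returned file would be loaded first and the returned epoch passed as `initial_epoch` to `fit`, with the `epochs` argument set to the total number of epochs rather than the number of additional ones.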
Someone in a thread said that "Keras will not save RNN states" (I am not using stateful RNNs), and someone else said "Keras reinitializes all the weights before saving which we can handle using a flag." I mean, if such problems exist in Keras, what is the use of functions like save()?
I have also tried saving only the weights after every epoch, then building the model from scratch and loading those weights into it, but that didn't work either. You can find the old code I used to save only the weights in the older branches of the GitHub repo listed above.
I have referred to this issue, but it was no help - #4875
That issue has been open for the past two years. I can't understand what all the developers are doing! Is there anyone here who can help? Should I switch to TensorFlow, or will I face the same issues there too?
Please help...