Accuracy could not match with the log when load_model #17
Comments
Hello. To clarify, are you loading the model at step 67? Is the model's performance 53 when you load the checkpoint, while the log reports 58 for that checkpoint?
Hi @dptam, the step is actually 75. As the log here shows, line 20 (epoch 19) reports 0.5812. BTW, step 79 is 0.5631. The same thing happens on the COPA dataset: line 221 reports 0.62.
I'm not sure what the issue is. If you don't mind, could you rerun and add
Hi @dptam, actually, when I ran finish.pt, its accuracy did not match the last accuracy in the log.
Is there something wrong with the code? @muqeeth @jmohta @HaokunLiu
@dptam I have added the global step as you suggested, but the accuracy still does not match.
What is in pl_test.py? Would you mind sharing what you have there?
Hello, thanks for rerunning the code. I'm still not sure why loading and rerunning the model doesn't match the performance in the log. Could you share the command used to train the model? Regarding the issue of
Hi @HaokunLiu @dptam, pl_test is just a copy of train, except for the loading method. I used both your save-model method and the PyTorch Lightning checkpoint method. The train command is as below. The test code is as below; pl_train and pl_test produce the same result. The log is here; it uses not finish.pt but 51, as @dptam suggested.
Hi, I looked into it a bit and couldn't figure out the cause, but I found one issue, at least for me (not sure if it will be the same for you). Sorry, I don't have more time to look into it currently, but maybe you can. When using This causes the
Hi @muqeeth @dptam @craffel, when I set eval_epoch_interval=1, the accuracy is recorded in my log, and I save my model and checkpoint. But when I reload the model, its accuracy does not match the logged accuracy.
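One generic way to narrow down a mismatch like this is to check whether the saved file actually covers every parameter the model expects, and to compare the logged and reloaded accuracy with an explicit tolerance. The sketch below is only an illustration in plain Python, not code from this repository: `diff_state_keys`, `accuracy_matches`, and the parameter names are hypothetical. With a real PyTorch model, `model.load_state_dict(state, strict=False)` returns an object with `missing_keys` and `unexpected_keys` that gives the same information.

```python
# Hypothetical helpers for debugging a reload mismatch; the names are
# illustrative and do not come from the repository discussed here.

def diff_state_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names as sorted lists.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not have
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


def accuracy_matches(logged, reloaded, tol=1e-4):
    """True if the reloaded accuracy is within tol of the logged one."""
    return abs(logged - reloaded) <= tol


if __name__ == "__main__":
    # Made-up parameter names, purely for demonstration.
    model_keys = ["encoder.weight", "encoder.bias", "adapter.scale"]
    checkpoint_keys = ["adapter.scale", "head.weight"]

    missing, unexpected = diff_state_keys(model_keys, checkpoint_keys)
    print("missing from checkpoint:", missing)
    print("unexpected in checkpoint:", unexpected)

    # 0.5812 and 0.5631 are the two accuracies quoted in this thread.
    print("match:", accuracy_matches(0.5812, 0.5631))
```

Missing keys are not necessarily a bug: if a checkpoint stores only part of the parameters, the rest have to come from the same pretrained weights that were used during training. If both lists are empty and the accuracy still differs, the cause is more likely in the evaluation path (for example, the model not being in eval mode, or different data ordering) than in the saved weights.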