Accuracy does not match the log when loading the model #17

Open · CaffreyR opened this issue Jul 22, 2022 · 10 comments

CaffreyR commented Jul 22, 2022

Hi @muqeeth @dptam @craffel, when I set eval_epoch_interval=1, I get accuracy values in my log, and I save the model and checkpoints. But when I reload a saved model, its accuracy does not match the one in the log.
[screenshots]
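
For context, this is the behaviour I expect from a save/load round trip (a toy sanity check, not the repo's model):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(8, 2)
x = torch.randn(4, 8)

# save and reload the weights exactly as during training
torch.save(model.state_dict(), "ckpt.pt")
reloaded = torch.nn.Linear(8, 2)
reloaded.load_state_dict(torch.load("ckpt.pt"))
reloaded.eval()

# identical weights must give identical outputs; a mismatch would point
# at evaluation-time state (dropout mode, precision, data order) instead
assert torch.equal(model(x), reloaded(x))
```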

dptam (Collaborator) commented Jul 22, 2022

Hello,

To clarify, are you loading the model at step 67? Is the performance of the model when you load the checkpoint 53? And is the performance of the checkpoint in the log 58?

CaffreyR (Author) commented Jul 22, 2022

Hi @dptam, the step is actually 75. As you can see from the log here, line 20 (epoch 19) reports 0.5812.
[screenshot]
And when I run this code, the resulting accuracy is 0.5848.
[screenshot]

BTW, step 79 gives 0.5631.

The same thing happens on the COPA dataset: line 221 reports 0.62.
[screenshot]
But when I run steps 883 and 887, the result is 0.54, and step 879 gives 0.55.
[screenshots]

dptam (Collaborator) commented Jul 22, 2022

I'm not sure what the issue is. If you don't mind, could you rerun and add self.global_step to the metrics dictionary here? That should output a global step in the log that matches the global step used to save the model, just to make sure the line number corresponds to the correct checkpoint.
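
Something like this (a rough sketch, assuming your validation hook assembles a metrics dict before logging; the hook and helper names may differ from the repo):

```python
# Sketch: tag every validation log line with the global step so each log
# entry can be matched to the checkpoint saved at that step.
# `compute_metrics` is a placeholder for the existing metric computation.
def validation_epoch_end(self, outputs):
    metrics = self.compute_metrics(outputs)
    metrics["global_step"] = self.global_step  # LightningModule attribute
    self.log_dict(metrics)
```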

CaffreyR (Author) commented

Hi @dptam, actually when I try to run finish.pt, its accuracy does not match the last accuracy in the log.

[screenshots]

CaffreyR (Author) commented

Is there something wrong with the code? @muqeeth @jmohta @HaokunLiu

CaffreyR (Author) commented

@dptam I have added the global step as you suggested, but it still does not match.
[screenshots]

HaokunLiu (Collaborator) commented Jul 22, 2022

What is in pl_test.py? Would you mind sharing what you have there?

dptam (Collaborator) commented Jul 23, 2022

Hello,

Thanks for rerunning the code. I'm still not sure why loading and rerunning the model doesn't match the log performance - could you share the command used to train the model?

Regarding the issue of finish.pt not matching the last accuracy in the log, see #11 for more details on why.

CaffreyR (Author) commented

Hi @HaokunLiu @dptam, pl_test.py is actually just a copy of pl_train.py, except for the loading method. I used both your save-model method and the PyTorch Lightning checkpoint method (see the sketch at the end of this comment).
[screenshot]
And I changed encoderdecoder.py a little bit:
[screenshot]

But here is the thing: the train command is as below.
[screenshot]

And the test command is as below; pl_train and pl_test actually give the same result.
[screenshot]

And here is the log, using not finish.pt but the step-51 checkpoint, as @dptam suggested:
[screenshot]
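
Roughly, the two loading paths look like this (a sketch; the paths are placeholders from my local setup, and the EncoderDecoder constructor arguments are elided):

```python
import torch

# 1) the repo's own save format: a plain state dict written with torch.save
model = EncoderDecoder(config)  # same config as training; extra args elided
model.load_state_dict(torch.load("exp_out/global_step51.pt"))

# 2) a PyTorch Lightning checkpoint
model = EncoderDecoder.load_from_checkpoint("exp_out/last.ckpt")
```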

dptam (Collaborator) commented Jul 26, 2022

Hi,

I tried to look into it a bit and couldn't figure out the cause, but I did find one issue, at least for me (not sure if it will be the same for you). Sorry I don't have more time to look into it right now, but maybe you can.

When using t5-small and printing the norm of self.model.lm_head.weight, it is 94070 in the training_step function but 94072 in the predict function. This is due to precision issues when moving from CPU to GPU, and one remedy was adding self.weight = torch.clone(self.model.lm_head.weight).double().cuda().float() at the end of the __init__ function of EncoderDecoder.py and adding self.model.lm_head.weight = torch.nn.Parameter(self.weight) at the beginning of the training_step function.

This makes the norm of self.model.lm_head.weight consistently 94070 in both training_step and predict, but the accuracy from the log and from loading a validation checkpoint still does not match. I'm not sure why, but one avenue for further analysis is to look at the model's other weights.
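
Concretely, the remedy looks like this (a sketch of the two additions inside EncoderDecoder; the surrounding code is elided):

```python
# at the end of __init__: round-trip the lm_head weight through double
# precision on the GPU and keep the copy
self.weight = torch.clone(self.model.lm_head.weight).double().cuda().float()

# at the beginning of training_step: restore the stashed copy so
# training_step and predict see identical values
self.model.lm_head.weight = torch.nn.Parameter(self.weight)
```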
