Train and validation WER both remain 1 while training VO model #6

Closed
smeetrs opened this issue Jul 14, 2020 · 5 comments

Comments

@smeetrs
Owner

smeetrs commented Jul 14, 2020

Hi,
I am trying to train the video-only model. When 'PRETRAIN_NUM_WORDS' is 1, the WER on both the training and validation sets stays at 1 the whole time, with no improvement at all.

`Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 182 || Tr.Loss: 3.239813 Val.Loss: 3.226672 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Epoch 183: reducing learning rate of group 0 to 1.0000e-06.
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 183 || Tr.Loss: 3.241490 Val.Loss: 3.221430 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 184 || Tr.Loss: 3.240253 Val.Loss: 3.238177 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 185 || Tr.Loss: 3.228107 Val.Loss: 3.234346 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 186 || Tr.Loss: 3.234290 Val.Loss: 3.216766 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 187 || Tr.Loss: 3.241915 Val.Loss: 3.232590 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 188 || Tr.Loss: 3.233189 Val.Loss: 3.228462 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 189 || Tr.Loss: 3.236741 Val.Loss: 3.223365 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 190 || Tr.Loss: 3.235876 Val.Loss: 3.216625 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 191 || Tr.Loss: 3.241944 Val.Loss: 3.242806 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 192 || Tr.Loss: 3.237240 Val.Loss: 3.243809 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000

Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 193 || Tr.Loss: 3.238747 Val.Loss: 3.219588 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
`
Is this situation normal?
Thanks for your suggestions.

Originally posted by @yuexianghubit in #4 (comment)

@smeetrs smeetrs changed the title from "Train and test WER both remain 1 while training VO model" to "Train and validation WER both remain 1 while training VO model" on Jul 14, 2020
@smeetrs
Owner Author

smeetrs commented Jul 14, 2020

I have faced this issue a few times during this project, including while training the VO model. Most of the time the solution was trivial but hard to catch. In this particular case, I tried many things but was not able to solve it. Eventually, I initialized the model with weights from the previous version of this project. I think you can try changing the seed value a few times (that also helped me once!) or using a weight initializer such as Xavier. For a given seed value, if the train WER doesn't decrease after around 10-20 steps, you can stop the training and try another value. If nothing works, please wait for the release of the pretrained weights (which I will possibly be able to do soon). Meanwhile, I will also try to find a solution to this problem.
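
The "try a seed, give up after 10-20 stuck steps, try the next one" routine can be sketched roughly as below. This is only an illustration and not code from this repository: `first_seed_that_learns`, `start_run`, and `toy_start_run` are hypothetical names, and the toy run is a stand-in for building the model with a given seed and running one training step.

```python
from typing import Callable, Iterable, Optional


def first_seed_that_learns(
    start_run: Callable[[int], Callable[[], float]],
    seeds: Iterable[int],
    patience: int = 20,
    stuck_wer: float = 1.0,
) -> Optional[int]:
    """Return the first seed whose train WER drops below `stuck_wer`
    within `patience` steps, or None if every seed stays stuck."""
    for seed in seeds:
        # Hypothetical hook: build a fresh model/optimizer for this seed and
        # return a callable that runs one training step and reports train WER.
        train_step = start_run(seed)
        for _ in range(patience):
            if train_step() < stuck_wer:
                return seed  # WER moved off 1.0, so keep training with this seed
    return None


def toy_start_run(seed: int) -> Callable[[], float]:
    """Toy stand-in (assumption, not real training): even seeds stay at
    WER 1.0 forever, odd seeds start improving at their fifth step."""
    step = 0

    def train_step() -> float:
        nonlocal step
        step += 1
        if seed % 2 == 0 or step < 5:
            return 1.0
        return 0.98

    return train_step


if __name__ == "__main__":
    print(first_seed_that_learns(toy_start_run, seeds=[0, 2, 3, 4]))  # prints 3
```

In a real run, `start_run` would be the place to set the seed and, if desired, apply a weight initializer before the first step.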

Thanks.

@smeetrs
Owner Author

smeetrs commented Jul 25, 2020

@yuexianghubit have you been able to solve this issue?

@yuexianghubit

Yes, I just changed the seed value, and it did help. The final performance I get is about 57%, close to your result.
Thanks.

@smeetrs
Owner Author

smeetrs commented Jul 26, 2020

Great!! 👍

I am closing this issue for now as changing the seed value seems to work.

@smeetrs
Owner Author

smeetrs commented Apr 12, 2022

A better solution that seems to work in such cases is lowering the learning rate.
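
As a rough intuition for why a lower learning rate can unstick training, here is a toy sketch. It is not this project's training code, and it is not a claim about the actual cause in this issue: `final_loss` and `first_lr_that_learns` are hypothetical helpers, and the loss is the simple function f(w) = w**2. With a step size that is too large, gradient descent just bounces between w = 1 and w = -1 and the loss never moves; a smaller step size lets it decrease.

```python
from typing import Iterable, Optional


def final_loss(lr: float, steps: int = 20, w0: float = 1.0) -> float:
    """Run plain gradient descent on the toy loss f(w) = w**2
    and return the loss after `steps` updates."""
    w = w0
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w -= lr * grad
    return w * w


def first_lr_that_learns(
    candidates: Iterable[float], initial_loss: float = 1.0
) -> Optional[float]:
    """Return the first learning rate, in the order given, whose final
    loss ends below the initial loss, or None if none of them does."""
    for lr in candidates:
        if final_loss(lr) < initial_loss:
            return lr
    return None


if __name__ == "__main__":
    print(final_loss(1.0))  # loss stays at its starting value of 1.0
    print(first_lr_that_learns([1.0, 0.1, 0.01]))  # prints 0.1
```

For the real model, the equivalent experiment would be restarting training with a smaller initial learning rate and checking whether the train WER moves off 1 within the first few steps.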
