Train and validation WER both remain 1 while training VO model #6

smeetrs · 2020-07-14T14:32:39Z

Hi,
I am trying to train the video-only model. When 'PRETRAIN_NUM_WORDS' is 1, the WER on both the training and validation sets stays at 1 the whole time, with no improvement at all.

```
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 182 || Tr.Loss: 3.239813 Val.Loss: 3.226672 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Epoch 183: reducing learning rate of group 0 to 1.0000e-06.
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 183 || Tr.Loss: 3.241490 Val.Loss: 3.221430 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 184 || Tr.Loss: 3.240253 Val.Loss: 3.238177 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 185 || Tr.Loss: 3.228107 Val.Loss: 3.234346 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 186 || Tr.Loss: 3.234290 Val.Loss: 3.216766 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 187 || Tr.Loss: 3.241915 Val.Loss: 3.232590 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 188 || Tr.Loss: 3.233189 Val.Loss: 3.228462 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 189 || Tr.Loss: 3.236741 Val.Loss: 3.223365 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 190 || Tr.Loss: 3.235876 Val.Loss: 3.216625 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 191 || Tr.Loss: 3.241944 Val.Loss: 3.242806 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 192 || Tr.Loss: 3.237240 Val.Loss: 3.243809 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
Train: 0%| | 0/512 [00:00<?, ?it/s]Step: 193 || Tr.Loss: 3.238747 Val.Loss: 3.219588 || Tr.CER: 1.000 Val.CER: 1.000 || Tr.WER: 1.000 Val.WER: 1.000
```
Is this situation normal?
Thanks for your suggestions.

Originally posted by @yuexianghubit in #4 (comment)

smeetrs · 2020-07-14T14:55:24Z

I have run into this issue a few times during this project, including while training the VO model. Most of the time the fix was trivial but hard to catch. In this particular case, I tried many things but wasn't able to solve it. Eventually, I initialized the model with weights from the previous version of this project. I think you can try changing the seed value a few times (that also helped me once!) or using a weight initializer such as Xavier. For a given value, if the train WER doesn't decrease after around 10-20 steps, you can stop the training and try another value. If nothing works, please wait for the release of the pretrained weights (which I should be able to do soon). Meanwhile, I will also try to find a solution to this problem.

Thanks.
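The seed-retry advice above can be sketched as a small loop. This is only an illustration under stated assumptions, not the project's training script: `run_steps` is a hypothetical callable that starts training with a given seed and yields the training WER after each step, `fake_run` is a made-up stand-in so the sketch runs on its own, and the patience of 20 steps follows the "around 10-20 steps" rule of thumb.

```python
from itertools import islice


def has_improved(wer_history, min_drop=0.01):
    """True if some step's WER is at least `min_drop` below the first step's WER."""
    if not wer_history:
        return False
    return min(wer_history) <= wer_history[0] - min_drop


def first_promising_seed(run_steps, seeds, patience=20):
    """Try each seed for at most `patience` steps; return the first one whose WER drops.

    `run_steps(seed)` is a hypothetical hook: it must yield the train WER per step.
    Returns None if no seed shows any improvement within `patience` steps.
    """
    for seed in seeds:
        history = list(islice(run_steps(seed), patience))
        if has_improved(history):
            return seed
    return None


def fake_run(seed):
    """Stand-in for real training (made up for the demo): only seed 3 ever learns."""
    wer = 1.0
    while True:
        if seed == 3:
            wer = max(0.0, wer - 0.02)
        yield wer


print(first_promising_seed(fake_run, seeds=[1, 2, 3, 4]))  # prints 3
```

In a real run you would keep training with the seed that starts improving rather than restart it; the sketch only shows when to give up on a seed.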

smeetrs · 2020-07-25T20:16:01Z

@yuexianghubit have you been able to solve this issue?

yuexianghubit · 2020-07-26T03:30:30Z

Yes, I just changed the seed value, and it helped. The final performance I get is about 57%, close to your result.
Thanks.

smeetrs · 2020-07-26T05:52:25Z

Great!! 👍

I am closing this issue for now as changing the seed value seems to work.

smeetrs · 2022-04-12T13:42:25Z

A better solution that seems to work in such cases is lowering the learning rate.
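As a toy illustration of why a lower learning rate can unstick a flat loss, here is plain gradient descent on f(w) = w², where a step size that is too large makes the weight bounce between two points and the loss never moves. This is a hedged sketch on a made-up one-parameter problem; it does not reproduce the VO model, and the function name and learning rates are chosen only for the demo.

```python
def descend(lr, steps=50, w=1.0):
    """Gradient descent on f(w) = w**2; returns the loss after each step."""
    losses = []
    for _ in range(steps):
        grad = 2.0 * w          # derivative of w**2
        w = w - lr * grad       # standard gradient step
        losses.append(w * w)
    return losses


high = descend(lr=1.0)  # w flips between -1 and 1, so the loss stays at 1.0
low = descend(lr=0.1)   # w shrinks by a factor of 0.8 per step, so the loss decays

print(high[-1], low[-1])
```

With the larger step the loss is identical at every step, which looks much like a stuck training log; with the smaller step it falls steadily toward zero.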

smeetrs changed the title ~~Train and test WER both remain 1 while training VO model~~ Train and validation WER both remain 1 while training VO model Jul 14, 2020

smeetrs mentioned this issue Jul 14, 2020

how much training time #4

Closed

smeetrs closed this as completed Jul 26, 2020

This was referenced Nov 3, 2020

The WER is always 1.000 #13

Closed

When I was training the ao model,the WER and CER were always 1.000 #14

Closed

FL990 mentioned this issue Mar 31, 2022

When pretrain VO model, loss reduce a few epochs, but immediately increase and remain. #43

Closed
