Got all NaN after training 3 epochs #6
context2vec uses chainer's built-in negative sampling cost function (not cross_entropy), and this function should be stable. The nan values could be related to the configuration of the optimizer, though. I'd suggest configuring the Adam optimizer with a slower learning rate, or trying a different optimizer (e.g. SGD).
What is your final loss? You didn't set a learning rate in your code, and it seems there is no direct way to set one in chainer.
The loss function is `chainer.links.NegativeSampling` (a popular word2vec loss function). You can look at the code to see exactly how I use it.
Yes, I read the source code. And the learning rate needs to be set like this:
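(A minimal sketch, assuming the standard `chainer.optimizers` API; `model` stands in for the network being trained.)

```python
from chainer import optimizers

# Adam's learning rate in Chainer is the `alpha` hyperparameter
# (default 0.001); pass a smaller value to slow training down.
optimizer = optimizers.Adam(alpha=0.0005)
optimizer.setup(model)  # `model` is the Chain being trained

# Plain SGD exposes the learning rate as `lr` instead:
# optimizer = optimizers.SGD(lr=0.01)
```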
You didn't set a learning rate in your code. Would it help to set a dropout ratio? My accum_loss/word is about 0.5 now. What was yours after 3 epochs?
I experimented a little with dropout early on and didn't see it helping much on the dev set, so I didn't use it, but I didn't explore this option thoroughly. In your case, where performance seems to decrease with epochs, it does seem to make sense to try dropout. If you're first trying to reproduce my results, the first thing is to make sure that your setting is identical. Please double-check that you followed the ukwac preprocessing described in the paper (lowercasing, all words with frequency less than 100 converted to UNK, etc.). Also, it would be good if you could share with me the exact command line that you're using to train your model, so I can double-check that you're using the same arguments as I did. PS: I already finished my PhD and don't currently have a setup for running context2vec, so I can't tell you my accum_loss/word. EDIT: Actually, lowercasing of the corpus is done by default in context2vec, so no need to worry about that. The trimming of words with a frequency lower than 100 is done using the `-t` argument of `train_context2vec.py`.
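(For concreteness, a minimal sketch of that preprocessing, lowercasing plus mapping words with corpus frequency below 100 to UNK; illustrative only, since context2vec's own `-t` flag already handles the trimming.)

```python
from collections import Counter

def preprocess(sentences, min_count=100, unk='<UNK>'):
    """Lowercase every token and replace rare words with an UNK symbol."""
    lowered = [[w.lower() for w in sent] for sent in sentences]
    freq = Counter(w for sent in lowered for w in sent)
    return [[w if freq[w] >= min_count else unk for w in sent]
            for sent in lowered]
```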
Thank you for your kindness.
I used `-b 1000`, but that shouldn't make a difference. I also used `max-sent-len=64` when running `corpus_by_sent_length.py`. Of all that, the addition of the POS tag seems to be the one factor that could be significant. I recommend that you try without it.
Hello! We are also having this problem (training on gigaword with ≈100k vocab size); we're going to try switching from Adam to SGD for the optimizer, but can you think of any other reason for the numerical instability? Looking at the code, it looks like the loss becomes NaN at some point mid-training. Training happily continues for a few more days afterwards, but the resulting vectors are (unsurprisingly) full of NaNs. It happens at different points in the training (i.e., not always after the same number of words), so it doesn't seem to be caused by a malformed chunk of input, or anything like that, at least not as far as we can tell.
@stevenbedrick I decreased the learning rate and got a better result. When you get NaN, the model is very likely overfitting, so you can stop training there; the loss has probably become a number too small to be represented. Hope my experience helps.
Yes, I'd suggest you try decreasing the learning rate. That can be done for both Adam and SGD.
OK, we're going to try turning Adam's alpha value down to 0.0005 and see if that gets us anywhere. Thanks for the tips!
I trained a [context2vec](https://github.com/orenmel/context2vec/blob/master/context2vec/train/train_context2vec.py) model.
I printed `context_v` in `explore_context2vec.py`, and got all NaN values.
When using TensorFlow, I can fix this by adding a small number inside the loss function:

```python
cross_entropy = self.target * tf.log(self.prediction + 1e-10)
```

How can I do this in chainer?
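(A possible Chainer-side analogue, a sketch rather than a confirmed fix: enlarge Adam's `eps` term and clip gradient norms, both via standard Chainer APIs; `model` again stands in for the context2vec network.)

```python
import chainer
from chainer import optimizers

# A slightly larger eps keeps Adam's update denominator away from
# zero (Chainer's default is 1e-8).
optimizer = optimizers.Adam(alpha=0.0005, eps=1e-6)
optimizer.setup(model)  # `model` stands in for the context2vec network

# Clip the gradient norm so one bad batch can't push the parameters
# into inf/NaN territory.
optimizer.add_hook(chainer.optimizer.GradientClipping(threshold=5.0))
```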