Regarding training process #50
Comments
Yes, no one trains on the full triples! They just continue to sample more negatives. It doesn't matter what the per-batch loss looks like. What does the averaged or smoothed loss look like? That should continue to decrease, at a slowing rate.
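For reference, the "averaged or smoothed loss" mentioned above can be tracked with a simple exponential moving average over per-batch losses. This is a minimal sketch (the function name and `alpha` default are illustrative, not part of the repo's code):

```python
def smoothed_losses(batch_losses, alpha=0.9):
    """Exponential moving average (EMA) of per-batch losses.

    alpha closer to 1.0 gives heavier smoothing; the EMA reveals the
    underlying trend even when individual batch losses oscillate.
    """
    ema = None
    out = []
    for loss in batch_losses:
        ema = loss if ema is None else alpha * ema + (1 - alpha) * loss
        out.append(ema)
    return out

# Raw per-batch losses oscillate, but their EMA decreases steadily.
noisy = [1.0, 0.4, 0.9, 0.3, 0.8, 0.2]
print(smoothed_losses(noisy))
```

If the EMA keeps drifting down while individual batches bounce around, training is progressing as expected.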
Hey, thanks for the reply.
2. Yes, the average loss is decreasing at a slowing rate.
For the paper, we simply train on the first N triples in the small triples file.
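Taking "the first N triples" from a TSV triples file can be done lazily without loading the whole file. A sketch, assuming a tab-separated (query, positive, negative) layout; the function name and path are hypothetical:

```python
from itertools import islice

def first_n_triples(path, n):
    """Yield the first n (query, positive, negative) rows from a TSV triples file."""
    with open(path, encoding="utf-8") as f:
        for line in islice(f, n):
            # Each line holds one triple, tab-separated.
            yield tuple(line.rstrip("\n").split("\t"))
```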
Ohh okay, that clears up the doubts.
Hello Omar,
Thanks for open-sourcing the code for this amazing work.
This is not really an issue, more of a doubt.
The paper mentions that for MS MARCO, the model was trained for 200k iterations with a batch size of 32 to approximately reproduce the results, so it was effectively trained on 6.4 million triples. This means it was not trained on the full triples.small.tsv (~39 million triples). Is my understanding correct?
I am trying to train on the MS MARCO triples. During training, the current per-batch loss decreases only for the initial few steps and oscillates in later iterations. Did you face the same issue while training? Should this be viewed as the model not getting trained, or is it expected since the model sees new examples every batch?
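As a sanity check on the arithmetic in the question (200k steps at batch size 32 versus the ~39 million triples in triples.small.tsv; both figures taken from the question itself):

```python
steps = 200_000
batch_size = 32
total_triples = 39_000_000          # approximate size of triples.small.tsv

triples_seen = steps * batch_size   # triples consumed during training
fraction = triples_seen / total_triples

# 6.4M triples seen, i.e. roughly 16% of the small triples file.
print(triples_seen, round(fraction, 3))
```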