Regarding training process #50
Comments
Yes, no one trains on the full triples! They just continue to sample more negatives. It doesn't matter what the per-batch loss looks like. What does the averaged or smoothed loss look like? That should continue to decrease, at a slowing rate.
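For reference, the "averaged or smoothed loss" mentioned above can be tracked with a simple exponential moving average over per-batch losses. This is a minimal sketch (the function name and `alpha` default are illustrative, not part of the repo's code):

```python
def smoothed_losses(batch_losses, alpha=0.9):
    """Exponential moving average (EMA) of per-batch losses.

    alpha closer to 1.0 gives heavier smoothing; the EMA reveals the
    underlying trend even when individual batch losses oscillate.
    """
    ema = None
    out = []
    for loss in batch_losses:
        ema = loss if ema is None else alpha * ema + (1 - alpha) * loss
        out.append(ema)
    return out

# Raw per-batch losses oscillate, but their EMA decreases steadily.
noisy = [1.0, 0.4, 0.9, 0.3, 0.8, 0.2]
print(smoothed_losses(noisy))
```

If the EMA keeps drifting down while individual batches bounce around, training is progressing as expected.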
Hey, thanks for the reply.
2. Yes, the average loss is decreasing at a slowing rate.
For the paper, we simply train on the first N triples in the small triples file.
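Taking "the first N triples" from a TSV triples file can be done lazily without loading the whole file. A sketch, assuming a tab-separated (query, positive, negative) layout; the function name and path are hypothetical:

```python
from itertools import islice

def first_n_triples(path, n):
    """Yield the first n (query, positive, negative) rows from a TSV triples file."""
    with open(path, encoding="utf-8") as f:
        for line in islice(f, n):
            # Each line holds one triple, tab-separated.
            yield tuple(line.rstrip("\n").split("\t"))
```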
Ohh okay, that clears up the doubts.
Hello Omar,
Thanks for open-sourcing the code for this amazing work.
This is not really an issue, more of a doubt.
The paper mentions that for MS MARCO, the model was trained for 200k iterations with a batch size of 32 to approximately reproduce the results, so it was effectively trained on 6.4 million triples. This means it was not trained on the full triples.small.tsv (~39 million triples). Is my understanding correct?
I am trying to train on the MS MARCO triples. During training, the current per-batch loss decreases only for the initial few steps and oscillates in later iterations. Did you face the same issue while training? Should this be viewed as the model not getting trained, or is it expected since the model sees new examples every batch?
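As a sanity check on the arithmetic in the question (200k steps at batch size 32 versus the ~39 million triples in triples.small.tsv; both figures taken from the question itself):

```python
steps = 200_000
batch_size = 32
total_triples = 39_000_000          # approximate size of triples.small.tsv

triples_seen = steps * batch_size   # triples consumed during training
fraction = triples_seen / total_triples

# 6.4M triples seen, i.e. roughly 16% of the small triples file.
print(triples_seen, round(fraction, 3))
```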