Thanks for the great code and paper! I have a question regarding the form of the h function. I have a huge dataset, so it's impossible to store all embeddings in memory; instead, I decided to increase the batch size and mine negatives from it.

So far so good, but from my understanding, because of the large dataset size the numerator almost equals the denominator and h approaches 1.

Do you think it's a good idea to replace h with an angular similarity between embeddings instead of the ratio proposed in the paper? Or could you kindly propose some other appropriate choice for h?
brotherofken changed the title from "h-function for infinite dataset" to "Form of the h function for infinite dataset" on Dec 27, 2019
Thanks for your interest. For eq. 19 in the paper, h will work automatically if you set N and M to the number of negatives paired with each positive and the dataset size, respectively. h approaches 1 at the beginning, but it is adjusted very quickly as training proceeds. This is how NCE works.
Angular similarity might also work, but it loses the spirit of the posterior probability in NCE.
I read the paper carefully and found that I had missed that the temperature in (19) has to be quite low (0.02-0.3 in your experiments) to compensate for the small value of the N/M ratio.
That is clear now. Thanks!
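To make the temperature point concrete, here is a minimal sketch of the kind of NCE posterior being discussed. The exact form of eq. (19) is not reproduced in this thread, so the function below assumes the generic NCE shape h = p / (p + N * p_n), with an unnormalized score p = exp(sim / tau) and a uniform noise probability p_n = 1/M; the function name and all numbers are illustrative only.

```python
import math

def h_posterior(sim, tau, n_negatives, dataset_size):
    """NCE-style posterior that a candidate is a true match rather than noise.

    Hypothetical reconstruction of the eq. (19) shape discussed above:
    h = p / (p + N * p_n), with unnormalized score p = exp(sim / tau)
    and uniform noise probability p_n = 1 / M.
    """
    p = math.exp(sim / tau)           # model score for the pair
    p_noise = 1.0 / dataset_size      # uniform noise probability (1/M)
    return p / (p + n_negatives * p_noise)

# With a large dataset M, the ratio N/M is tiny, so at tau = 1 even a
# negative pair (sim < 0) gets h close to 1 -- the behaviour raised
# in the question above.
print(h_posterior(-0.2, 1.0, 4096, 1_000_000))   # close to 1

# A low temperature shrinks exp(sim / tau) for negatives until it is
# comparable to N/M, pulling h back toward 0.
print(h_posterior(-0.2, 0.02, 4096, 1_000_000))  # close to 0
```

This is why the small N/M ratio alone makes h degenerate toward 1, and why a low temperature restores the contrast between positives and negatives.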