Form of the h function for infinite dataset #13

Closed
brotherofken opened this issue Dec 27, 2019 · 2 comments

Comments

@brotherofken

Thanks for the great code and paper!

I have a question regarding the form of the h function. My dataset is huge, so it's impossible to store all embeddings in memory; instead I decided to increase the batch size and mine negatives from within the batch.
So far so good, but as I understand it, because the dataset is so large, the numerator almost equals the denominator and h approaches 1.

Do you think that it's a good idea to replace h with an angular similarity between embeddings instead of the ratio proposed in the paper? Or maybe you could kindly propose some other appropriate choice for h?

@brotherofken changed the title from "h-function for infinite dataset" to "Form of the h function for infinite dataset" on Dec 27, 2019
@HobbitLong
Owner

Hi @brotherofken,

Thanks for your interest. For eq. 19 in the paper, h will work automatically if you set N to the number of negatives paired with each positive and M to the dataset size. h is close to 1 at the beginning, but it is adjusted very quickly as training proceeds. This is how NCE works.

Angular similarity might also work, but it loses the spirit of the posterior probability in NCE.
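
As a rough illustration of the behaviour described above, here is a minimal Python sketch. It assumes h has the usual NCE posterior form h = exp(s/τ) / (exp(s/τ) + N/M), where s is the similarity score between two embeddings and τ is the temperature, and it ignores any normalization constant. That form, the function name `nce_posterior`, and the numbers used are assumptions made for illustration; they are not copied from the paper or the repository.

```python
import math


def nce_posterior(similarity, temperature, num_negatives, dataset_size):
    """Assumed form: h = exp(s / tau) / (exp(s / tau) + N / M)."""
    noise_ratio = num_negatives / dataset_size  # N / M, tiny for a huge dataset
    # Algebraically, h equals sigmoid(s / tau - log(N / M)); the branches
    # below evaluate that sigmoid without overflowing exp().
    logit = similarity / temperature - math.log(noise_ratio)
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


# Hypothetical numbers: 4096 negatives per positive, one million samples.
N, M, tau = 4096, 1_000_000, 0.07
h_start = nce_posterior(0.0, tau, N, M)     # untrained pair, similarity near 0
h_trained = nce_posterior(-1.0, tau, N, M)  # negative pair pushed far apart
print(h_start, h_trained)
```

Under these assumptions, h starts close to 1 for any pair whose similarity is near zero, and it only drops for negatives once training drives their similarity down.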

@brotherofken
Author

brotherofken commented Jan 17, 2020

Thanks for the quick response!

I read the paper more carefully and realized I had missed that the temperature in (19) has to be quite low (0.02-0.3 in your experiments) to compensate for the small value of the N/M ratio.
That's clear now. Thanks!
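
The role of the temperature can be checked with a small Python sketch. It assumes the similarity s is a cosine similarity bounded to [-1, 1] and that h takes the form exp(s/τ) / (exp(s/τ) + N/M). Both are assumptions for illustration, as are the helper name `min_posterior` and the example N and M. The sketch computes the lowest value h can reach for a negative pair, which occurs at s = -1.

```python
import math


def min_posterior(temperature, noise_ratio):
    """Lowest h when similarity is bounded below by -1 (assumed form of h)."""
    e = math.exp(-1.0 / temperature)  # exp(s / tau) at s = -1
    return e / (e + noise_ratio)


noise_ratio = 4096 / 1_000_000  # hypothetical N / M
floors = {tau: min_posterior(tau, noise_ratio) for tau in (1.0, 0.3, 0.07, 0.02)}
for tau, h in floors.items():
    print(f"tau={tau}: smallest h for a negative = {h:.3g}")
```

Under these assumptions, at τ = 1 even the most dissimilar pair keeps h near 1, whereas a lower temperature lets h fall toward 0 for negatives despite the small N/M ratio.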
