This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Subsampling frequent outputs #3

Closed
stephanpeitz opened this issue Feb 13, 2019 · 4 comments

Comments

@stephanpeitz

Hi,

thanks for sharing your code!
I'm just wondering whether you have implemented the subsampling of frequent outputs (I can't find it in your code) and whether it was crucial for performance.

Cheers,
Stephan

@glample
Contributor

glample commented Feb 13, 2019

Hi,

Not sure about what you mean by frequent outputs, but the code that selects the words to mask is here: https://github.com/facebookresearch/XLM/blob/master/src/trainer.py#L295-L305

Setting sample_alpha == 0 does the same thing as the original BERT paper, i.e. it samples 15% of the words uniformly at random. A non-zero sample_alpha still samples 15% of the words in a batch, but each word gets its own probability of being masked out, so rare words are more likely to be masked than frequent ones.
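For readers who want to see the two modes side by side, here is a minimal NumPy sketch of the behaviour described above. It is not the code from the linked trainer.py. The function name `sample_pred_mask`, the `counts` array of per-word frequencies, and the weighting `count ** -sample_alpha` are all assumptions made for illustration. The comment above only states that rare words are masked with higher probability.

```python
import math

import numpy as np


def sample_pred_mask(token_ids, counts, word_pred=0.15, sample_alpha=0.0, rng=None):
    """Return a boolean mask (same shape as token_ids) of positions to predict.

    Hypothetical helper: names and the exact weighting are assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    token_ids = np.asarray(token_ids)

    if sample_alpha == 0:
        # Uniform case: every position is selected independently
        # with probability word_pred.
        return rng.random(token_ids.shape) <= word_pred

    # Assumed weighting: a word's score decays with its frequency,
    # so rare words receive a larger share of the probability mass.
    scores = np.maximum(np.asarray(counts), 1).astype(np.float64) ** -sample_alpha
    probs = scores[token_ids.ravel()]
    probs /= probs.sum()

    # Select a fixed number of positions (word_pred of the batch),
    # without replacement, according to the per-position probabilities.
    n_tgt = math.ceil(word_pred * token_ids.size)
    tgt = rng.choice(token_ids.size, size=n_tgt, replace=False, p=probs)

    mask = np.zeros(token_ids.size, dtype=bool)
    mask[tgt] = True
    return mask.reshape(token_ids.shape)
```

Under these assumptions, a larger sample_alpha shifts more of the 15% budget toward rare words, while a value close to 0 approaches uniform sampling.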

@stephanpeitz
Author

Yes, I was referring to:

"we also subsample the frequent outputs using an approach similar to Mikolov et al. (2013b)"

Thanks for pointing me to the function.

@stephanpeitz
Author

Sorry, one more question: which value of sample_alpha did you use in your experiments?

@stephanpeitz stephanpeitz reopened this Feb 14, 2019
@glample
Contributor

glample commented Feb 14, 2019

We used sample_alpha = 0.5
