Subsampling frequent outputs #3

stephanpeitz · 2019-02-13T00:51:32Z

Hi,

thanks for sharing your code!
I'm just wondering if you have implemented the subsampling of frequent outputs (can't find it in your code) and if it was crucial for the performance.

Cheers,
Stephan

glample · 2019-02-13T01:05:29Z

Hi,

Not sure what you mean by frequent outputs, but the code that selects the words to mask is here: https://github.com/facebookresearch/XLM/blob/master/src/trainer.py#L295-L305

sample_alpha == 0 does the same thing as the original BERT paper, i.e. it samples 15% of the words uniformly at random. A non-zero value of sample_alpha still samples 15% of the words in a batch, but each word has a different probability of being masked out (i.e. rare words are more likely to be masked out than frequent ones).
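For readers who want to see the two modes side by side, here is a minimal NumPy sketch of the behaviour described above. It is not the repository's code: the function name `select_mask_positions`, the `counts` array of per-word corpus counts, and the weighting `count ** -sample_alpha` are assumptions made for illustration, chosen so that rare words receive larger weights.

```python
import math

import numpy as np


def select_mask_positions(token_ids, counts, sample_alpha=0.5, word_pred=0.15, rng=None):
    """Return a boolean mask (same shape as token_ids) of positions to mask.

    Hypothetical helper, not the XLM API. `counts[i]` is assumed to hold the
    corpus frequency of word id i.
    """
    rng = np.random.default_rng() if rng is None else rng
    token_ids = np.asarray(token_ids)
    counts = np.asarray(counts)
    flat = token_ids.ravel()
    n = flat.shape[0]

    if sample_alpha == 0:
        # BERT-style: every position is masked independently with prob. word_pred.
        return (rng.random(n) <= word_pred).reshape(token_ids.shape)

    # Assumed weighting: rarer words get a larger weight (count ** -alpha).
    weights = np.maximum(counts[flat], 1).astype(float) ** -sample_alpha
    probs = weights / weights.sum()

    # Pick a fixed number of positions (about word_pred of the batch),
    # without replacement, according to the frequency-based weights.
    n_tgt = math.ceil(word_pred * n)
    chosen = rng.choice(n, size=n_tgt, replace=False, p=probs)

    mask = np.zeros(n, dtype=bool)
    mask[chosen] = True
    return mask.reshape(token_ids.shape)
```

Under these assumptions the non-zero branch always masks the same number of positions per batch, while the sample_alpha == 0 branch masks 15% only in expectation.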

stephanpeitz · 2019-02-13T04:46:21Z

Yes, I was referring to:

"we also subsample the frequent outputs using an approach similar to Mikolov et al. (2013b)"

Thanks for pointing me to the function.

stephanpeitz · 2019-02-14T00:55:24Z

Sorry, one more question: which value for sample_alpha have you used in your experiments?

glample · 2019-02-14T10:38:11Z

We used sample_alpha = 0.5
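To give a feel for what a value of 0.5 implies, here is a small worked example. It assumes (this is not confirmed in the thread) that a word w with corpus count c(w) receives a masking weight proportional to c(w)^{-alpha}; the symbols p, c, w_rare and w_freq are introduced only for this illustration.

```latex
\[
\frac{p(w_{\text{rare}})}{p(w_{\text{freq}})}
  = \left(\frac{c(w_{\text{freq}})}{c(w_{\text{rare}})}\right)^{\alpha},
\qquad
\alpha = 0.5,\quad
\frac{c(w_{\text{freq}})}{c(w_{\text{rare}})} = 100
\;\Longrightarrow\;
\frac{p(w_{\text{rare}})}{p(w_{\text{freq}})} = 10 .
\]
```

Under that assumption, a single occurrence of a word that is 100 times rarer would be about 10 times more likely to be selected for masking, whereas alpha = 0 would make the ratio 1 (uniform sampling).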

stephanpeitz closed this as completed Feb 13, 2019

stephanpeitz reopened this Feb 14, 2019

glample closed this as completed Feb 20, 2019

donglixp mentioned this issue Jun 15, 2019

subsample the frequent outputs using an approach similar to Mikolov et al. (2013b) #104

Closed

JianLiu91 mentioned this issue Oct 24, 2019

Multi-GPU training get stuck after one mini-batch #223

Open

JxuHenry mentioned this issue Oct 28, 2019

I train UNMT with multi-GPU got the following errors! #224

Closed
