This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Are the EN embeddings the same as the fastText EN embeddings normalized? #26

Closed

ronlut opened this issue Feb 18, 2018 · 2 comments

Comments

@ronlut

ronlut commented Feb 18, 2018

I was wondering whether the English multilingual word embeddings are just a normalized subset (200K) of the fastText English embeddings.
I can see that the vectors differ between the two; I'd be happy to know whether the reason is normalization or something else :)
Thanks!

@glample
Contributor

glample commented Feb 18, 2018

Yes, the English embeddings are the same as the normalized fastText embeddings. We later realized that the normalization can be somewhat detrimental, so we will probably remove it soon. If you comment out these two lines:
https://github.com/facebookresearch/MUSE/blob/master/src/trainer.py#L249-L250
then the source embeddings should not be modified.
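For intuition, the normalization being discussed amounts to rescaling each word vector before training. A minimal sketch, assuming plain L2 row normalization (this is an illustration, not MUSE's exact implementation, which lives in the lines linked above):

```python
import numpy as np

def l2_normalize_rows(emb):
    # Hypothetical sketch: divide each row (word vector) by its L2 norm,
    # so every embedding lies on the unit sphere. This is why the
    # normalized vectors differ from the raw fastText ones.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero rows
    return emb / norms

# Toy embedding matrix: 2 words, 2 dimensions.
emb = np.array([[3.0, 4.0],
                [0.0, 2.0]])
normalized = l2_normalize_rows(emb)
# Each row of `normalized` now has unit L2 norm, e.g. [3, 4] -> [0.6, 0.8].
```

Skipping this step (e.g. by commenting out the lines linked above) would leave the source embeddings identical to the original fastText vectors.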

@ronlut
Author

ronlut commented Feb 19, 2018

All clear, thank you :)

@ronlut ronlut closed this as completed Feb 19, 2018