This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

Are the EN embeddings the same as the fastText EN embeddings normalized? #26

Closed

ronlut opened this issue Feb 18, 2018 · 2 comments

Comments

@ronlut

ronlut commented Feb 18, 2018

I was wondering whether the English multilingual word embeddings are just a normalized subset (200K) of the fastText English embeddings.
I can see that the vectors differ between the two; I'd be happy to know whether the reason is normalization or something else :)
Thanks!

@glample
Contributor

glample commented Feb 18, 2018

Yes, the English embeddings are the same as the normalized fastText embeddings. We later realized that the normalization can be somewhat detrimental, so we will probably remove it soon. If you comment out these two lines:
https://github.com/facebookresearch/MUSE/blob/master/src/trainer.py#L249-L250
then the source embeddings should not be modified.
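For intuition, the normalization being discussed amounts to rescaling each word vector before training. A minimal sketch, assuming plain L2 row normalization (this is an illustration, not MUSE's exact implementation, which lives in the lines linked above):

```python
import numpy as np

def l2_normalize_rows(emb):
    # Hypothetical sketch: divide each row (word vector) by its L2 norm,
    # so every embedding lies on the unit sphere. This is why the
    # normalized vectors differ from the raw fastText ones.
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero rows
    return emb / norms

# Toy embedding matrix: 2 words, 2 dimensions.
emb = np.array([[3.0, 4.0],
                [0.0, 2.0]])
normalized = l2_normalize_rows(emb)
# Each row of `normalized` now has unit L2 norm, e.g. [3, 4] -> [0.6, 0.8].
```

Skipping this step (e.g. by commenting out the lines linked above) would leave the source embeddings identical to the original fastText vectors.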

@ronlut
Author

ronlut commented Feb 19, 2018

All clear, thank you :)

@ronlut ronlut closed this as completed Feb 19, 2018