This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

How can I get the words embeddings? #17

Closed
stygian2a opened this issue Feb 25, 2019 · 11 comments

Comments

@stygian2a

stygian2a commented Feb 25, 2019

Hello!
Thank you for sharing this code!

Is there an easy way to get the embedding of a particular word?
I mean the ones shown in Table 5 of the paper.
Thank you!

@glample
Contributor

glample commented Feb 25, 2019

Hi!

Yes, I would suggest looking at the code in the notebook:
https://github.com/facebookresearch/XLM/blob/master/generate-embeddings.ipynb

Then something like this should work:

# Look up the word's index in the dictionary, then take the matching row of the embedding matrix
word_id = dico.index('cat')
model.embeddings.weight[word_id]
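As a minimal sketch of the lookup pattern above, here is a self-contained toy version. `word2id` and `embedding_matrix` are hypothetical stand-ins for the notebook's `dico` and `model.embeddings.weight`, and the vectors are made up. Because the models use a BPE vocabulary, a word that is split into several units has no row of its own, so the sketch checks membership before indexing.

```python
# Toy stand-ins (assumptions for illustration only): in the notebook these
# would be the XLM dictionary and the model's embedding matrix.
word2id = {"<unk>": 0, "cat": 1, "chat": 2, "dog": 3}
embedding_matrix = [
    [0.0, 0.0, 0.0],  # <unk>
    [0.9, 0.1, 0.2],  # cat
    [0.8, 0.2, 0.1],  # chat
    [0.1, 0.9, 0.3],  # dog
]


def get_embedding(word):
    """Return the embedding row for `word`, or raise if it is not a vocabulary entry."""
    if word not in word2id:
        # Avoid silently returning the <unk> vector for a missing word.
        raise KeyError(f"{word!r} is not a single vocabulary entry")
    return embedding_matrix[word2id[word]]


print(get_embedding("cat"))
```

With the real model, the same check would tell you whether a word needs to be handled as several BPE units instead of one row.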

@stygian2a
Author

Thank you! How can I differentiate words from different languages (e.g. 'chat' in French means cat)?

@glample
Contributor

glample commented Feb 26, 2019

You can just replace "cat" with "chat" in the code above. There is only one shared vocabulary, which contains the words for all languages. The vocabulary doesn't keep track of which word is used in which language.
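To illustrate the shared vocabulary, the sketch below looks up an English and a French word in the same table and compares them with cosine similarity. The vocabulary and vectors are toy values invented for the example, not taken from any released model, and `cosine_similarity` is a helper written here rather than part of the XLM code.

```python
import math

# One shared table for all languages (toy values, assumptions for illustration).
word2id = {"cat": 0, "chat": 1, "dog": 2}
embedding_matrix = [
    [0.9, 0.1, 0.2],  # cat  (English)
    [0.8, 0.2, 0.1],  # chat (French)
    [0.1, 0.9, 0.3],  # dog  (English)
]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity(word_a, word_b):
    """Look both words up in the same vocabulary and compare their rows."""
    vec_a = embedding_matrix[word2id[word_a]]
    vec_b = embedding_matrix[word2id[word_b]]
    return cosine_similarity(vec_a, vec_b)


print(similarity("cat", "chat"), similarity("cat", "dog"))
```

No language tag is passed anywhere: the lookup depends only on the string, which matches the point that the vocabulary does not record which language a word belongs to.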

@stygian2a
Author

Got it, thx for everything!

@vvssttkk

vvssttkk commented Aug 21, 2019

I want to get models for the Russian language. Will mlm_xnli15_1024.pth do?

@glample
Contributor

glample commented Aug 21, 2019

Yes, it contains Russian. But these two models will give you better performance, and they also support Russian:

https://dl.fbaipublicfiles.com/XLM/mlm_17_1280.pth
https://dl.fbaipublicfiles.com/XLM/mlm_100_1280.pth
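One way to confirm that a downloaded checkpoint covers a language is to inspect the language mapping stored with it. The sketch below assumes the loaded checkpoint is a dictionary with a `params` entry containing a `lang2id` mapping, as the notebook appears to use. That layout is an assumption here, and `toy_checkpoint` is a hypothetical stand-in for the result of `torch.load(model_path)`.

```python
def supported_languages(checkpoint):
    """Return the sorted language codes stored in a loaded checkpoint dictionary.

    Assumes the layout checkpoint["params"]["lang2id"] (language code -> id).
    """
    params = checkpoint["params"]
    return sorted(params["lang2id"])


# Hypothetical stand-in for `torch.load(model_path)`; the real mapping is larger.
toy_checkpoint = {"params": {"lang2id": {"en": 0, "fr": 1, "ru": 2}}}

languages = supported_languages(toy_checkpoint)
print("Supported languages:", ", ".join(languages))
print("Russian supported:", "ru" in languages)
```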

@vvssttkk

vvssttkk commented Aug 21, 2019

Also, for BPE, should I use this tokenization for mlm_17_1280.pth?
[image]

@vvssttkk

So, when I run to_bpe(sentences) I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sentences.bpe'

Should I create the tmp folder by hand? And where do I get sentences.bpe?

Sorry, but your notebook has many errors caused by paths that point to other users' directories.
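The thread does not state the cause of this error. One plausible explanation, assuming `to_bpe` writes the sentences to a temporary file and shells out to the fastBPE binary to produce `sentences.bpe`, is that the binary or the BPE codes file was not found, so the output file was never written. The sketch below is a hedged rewrite under that assumption, not the notebook's code. `fastbpe_path` and `codes_path` are parameters you must point at your own files, and the command follows fastBPE's `applybpe output input codes` usage.

```python
import os
import subprocess
import tempfile


def to_bpe(sentences, fastbpe_path, codes_path):
    """Apply BPE to `sentences` by calling an external fastBPE binary."""
    # Fail early with a clear message instead of a missing output file later.
    for path in (fastbpe_path, codes_path):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"required file not found: {path}")

    with tempfile.TemporaryDirectory() as tmp_dir:
        raw_path = os.path.join(tmp_dir, "sentences")
        bpe_path = os.path.join(tmp_dir, "sentences.bpe")

        # Write one sentence per line for the binary to read.
        with open(raw_path, "w", encoding="utf-8") as f:
            for sentence in sentences:
                f.write(sentence + "\n")

        # check=True raises CalledProcessError if the binary exits with an error.
        subprocess.run(
            [fastbpe_path, "applybpe", bpe_path, raw_path, codes_path],
            check=True,
        )

        # Read the BPE-encoded sentences back, one per line.
        with open(bpe_path, encoding="utf-8") as f:
            return [line.rstrip("\n") for line in f]
```

Under this reading, `sentences.bpe` is an output produced by the binary, not a file to download, and the temporary directory is created automatically.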

@vvssttkk

Also, what does this mean?
/private/home/aconneau/projects/XLM/data/wiki/17/175k/
I don't think this path is available.

@vvssttkk

vvssttkk commented Aug 21, 2019

Also, I didn't find fastBPE in the tools folder.
Should I install it from here? Is that right?

@vvssttkk

vvssttkk commented Aug 23, 2019

I solved the errors and created a new PR describing the steps.
