How to use fastText for out of sample words? #26

Closed
shgidi opened this issue May 6, 2018 · 9 comments

Comments

@shgidi

shgidi commented May 6, 2018

When downloading fastText with this method, we get a folder containing a file in standard word2vec format, which can be loaded with
from gensim.models import KeyedVectors
model = KeyedVectors.load_word2vec_format(path, binary=False)
But not with
from gensim.models import FastText
model = FastText.load_fasttext_format(path, binary=False)

This makes it impossible to get vectors for out-of-vocabulary words.
How can this be done correctly?
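
For context, fastText can return a vector for a word it never saw because it also learns vectors for character n-grams and combines them. The snippet below is a simplified, stdlib-only sketch of how such n-grams are extracted. `char_ngrams` is a hypothetical helper, not a gensim or fastText function, and the real implementation additionally hashes n-grams into buckets. The 3 to 6 length range mirrors fastText's default `minn`/`maxn` settings.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, wrapped in '<' and '>' boundary markers."""
    wrapped = "<" + word + ">"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the wrapped word.
        for start in range(len(wrapped) - n + 1):
            grams.append(wrapped[start:start + n])
    return grams
```

A file that stores only whole-word vectors has nothing to look these n-grams up in, which is why out-of-vocabulary lookups need the n-gram data as well.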

@menshikh-iv
Contributor

@shgidi

Facebook distributes 2 types of files:

  • .vec contains ONLY word-vectors (no ngrams here), can be loaded with KeyedVectors.load_word2vec_format
  • .bin contains ngrams, can be loaded with FastText.load_fasttext_format
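
The distinction above can be sketched as a small loader that picks the gensim call by file extension. This is a minimal sketch, assuming the gensim 3.x API named in this thread; `loader_for` and `load_vectors` are hypothetical helper names, and the extension check is only a convention, not a format check.

```python
import os


def loader_for(path):
    """Pick a loader name from the file extension alone (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".vec":
        return "word2vec"   # plain text, word vectors only, no n-grams
    if ext == ".bin":
        return "fasttext"   # binary model that also stores n-gram vectors
    raise ValueError("unrecognised fastText file extension: %r" % ext)


def load_vectors(path):
    """Load a Facebook-distributed fastText file with the matching gensim call."""
    if loader_for(path) == "word2vec":
        from gensim.models import KeyedVectors
        return KeyedVectors.load_word2vec_format(path, binary=False)
    # gensim is imported lazily so loader_for() works without it installed.
    from gensim.models import FastText
    return FastText.load_fasttext_format(path)
```

With a `.bin` model loaded this way, `model.wv['some_unseen_word']` should return a vector assembled from the word's n-grams, whereas the object loaded from a `.vec` file raises `KeyError` for words outside its vocabulary. Newer gensim releases expose `load_facebook_vectors` and `load_facebook_model` in `gensim.models.fasttext` for the `.bin` case.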

Next time, please ask on the mailing list.

@piskvorky
Owner

piskvorky commented Jul 30, 2018

@menshikh-iv is this clear from our documentation?

I see people confused about these formats, how to load them and what can be done with them, all the time.

A clear, authoritative docs section would help us with support too (we could just point people to it with a hyperlink).

@menshikh-iv
Contributor

menshikh-iv commented Aug 1, 2018

@piskvorky I agree this situation happens sometimes; it is worth making a tutorial.

@piskvorky
Owner

piskvorky commented Aug 2, 2018

A tutorial would be ideal, but a simple paragraph in the docs would go a long way. Can you add it?

@scottlittle

This is not working for me with gensim 3.5, python 3.6, and a FB model:

from gensim.models import FastText
model_yelp = FastText.load_fasttext_format('yelp_review_full.bin')

I get:
NotImplementedError: Supervised fastText models are not supported

@menshikh-iv
Contributor

@scottlittle please read the exception again: we really don't support supervised fastText models.

@romass12

What is meant by supervised fastText models, and how do I train an unsupervised one?

@menshikh-iv
Contributor

@romass12

supervised fasttext models

Exactly what supervised learning means. The FB implementation supports a supervised mode; gensim supports only the unsupervised one.

how to train for unsupervised

Just read the Gensim FastText documentation.
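
As a rough illustration of unsupervised training, the sketch below feeds unlabeled token lists to gensim's `FastText` class. `tokenize` and `train_unsupervised` are hypothetical helper names, the tokenizer is deliberately naive, and `min_count=1` is an assumption chosen only so that a tiny toy corpus keeps all of its words.

```python
def tokenize(lines):
    """Lowercase and whitespace-split raw text lines, skipping blank ones."""
    return [line.lower().split() for line in lines if line.strip()]


def train_unsupervised(sentences):
    """Train fastText embeddings on token lists; no labels are involved."""
    # Imported lazily so tokenize() can be used without gensim installed.
    from gensim.models import FastText
    # Passing sentences to the constructor builds the vocabulary and trains.
    return FastText(sentences=sentences, min_count=1)
```

For example, `model = train_unsupervised(tokenize(reviews))` followed by `model.wv['servce']` would look up a misspelled word through its n-grams, where `reviews` stands for any iterable of raw text lines.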
