How can I install the spacy.en model except the method provided? #6

Closed
bityangke opened this issue Sep 27, 2016 · 7 comments

Comments

@bityangke

When I tried to install the spacy.en model with "sputnik --name spacy install en", the download was very slow and then failed, so I still have not been able to install the model.
Is there another way I can do the same job?

@iamaaditya
Owner

iamaaditya commented Sep 28, 2016

@bityangke You can do the same job using NLTK's word2vec; see this. As long as there is a function word_embeddings that takes a word and returns its embedding, this will work.

Maybe @honnibal can help you with the installation of spacy.en.

@honnibal

Maybe try again? Yesterday we moved hosts, so it's possible the DNS propagation interfered with your transfer.

Btw, I think that NLTK word2vec tutorial describes training the word vectors, not using them?

@iamaaditya
Owner

iamaaditya commented Sep 28, 2016

You are right, that tutorial describes training. To use pre-trained word vectors, all you need are the following two lines:

import gensim

model = gensim.models.Word2Vec.load_word2vec_format('./model/GoogleNews-vectors-negative300.bin', binary=True)
(There are multiple sources from which you can download these pre-trained word vectors.)

As @honnibal suggested, please try again. spaCy is a really good library.
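
For reference, the only requirement mentioned earlier in this thread is a function word_embeddings that takes a word and returns its embedding. Below is a minimal sketch of such a wrapper, not the project's actual code. The helper name make_word_embeddings, the toy dictionary, and the zero-vector fallback for unknown words are all assumptions made for illustration; a loaded gensim model, which supports lookup by indexing such as model[word], could stand in for the dictionary.

```python
from typing import Callable, List, Mapping, Sequence


def make_word_embeddings(
    vectors: Mapping[str, Sequence[float]], dim: int
) -> Callable[[str], List[float]]:
    """Wrap a word -> vector mapping in a lookup function.

    Hypothetical helper: unknown words fall back to a zero vector of
    length `dim`, which is an assumption, not something the thread specifies.
    """

    def word_embeddings(word: str) -> List[float]:
        if word in vectors:
            # Copy into a plain list so callers cannot mutate the store.
            return list(vectors[word])
        return [0.0] * dim

    return word_embeddings


# Toy stand-in for a real pre-trained vector store (illustrative values only).
toy_vectors = {"cat": [0.1, 0.2, 0.3], "dog": [0.4, 0.5, 0.6]}
word_embeddings = make_word_embeddings(toy_vectors, dim=3)

print(word_embeddings("cat"))      # vector stored for a known word
print(word_embeddings("unicorn"))  # zero-vector fallback for an unknown word
```

Note that in newer gensim releases the word2vec-format loader is exposed on gensim.models.KeyedVectors rather than on Word2Vec, so the two lines above may need adjusting for a current install.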

@bityangke
Author

Thanks very much, @iamaaditya and @honnibal. I have solved this problem.
But when I run:
word_embeddings = spacy.load('en', vectors='en_glove_cc_300_1m_vectors')
I encountered:
RuntimeError: Language not supported: en_glove_cc_300_1m_vectors.
I am now working on this problem.

@honnibal

Aaah, sadness! Sorry about this. This was a regression introduced in 0.101.0. It's fixed in 1.0 (out next week!)

The workaround is to add the following line when you import spaCy:

spacy.set_lang_class('en_glove_cc_300_1m_vectors', None)

@bityangke
Author

@honnibal Thanks very much!
It works!
At first, I tried to do it the same way spacy.en is loaded:
spacy.set_lang_class(en.en_glove_cc_300_1m_vectors.lang, en.en_glove_cc_300_1m_vectors)
Haha!

@bityangke
Author

@iamaaditya Thank you very much for your wonderful work!
I have tried different pictures and questions, and the results are perfect!
