Error while opening own trained vectors file #9
Comments
Is there any update on this issue? I am also getting the same error.
Sorry for the delay, we have focused on other things recently. Drop me a mail (hp@spacy.io) if you need to train custom sense2vec models urgently.
Thanks for getting back to me. It's not very urgent, I've just been
I'm using the pre-processing script (merge_text.py) and training the gensim
My understanding though, is that I can still get the same result, i.e.
Sorry if these are dumb or obvious questions, I'm still learning at this
Mike
On Wed, Mar 23, 2016 at 4:42 PM, Henning Peters notifications@github.com
Your assumptions are all correct --- Gensim saves the model into its own format, and you can just load that up and make
If none of these features are relevant to you, then using Gensim's Word2Vec class might be better for you.
What would be the recommended way to create a model that can be loaded by
@elyase : Correct, that's what you should do. I'm sure I had a script that did exactly that, but it seems to have gone missing when the repository was reorganised. Damn. At the moment it's very hard for us to get this library into a great state for users while we also push spaCy forward. It's still quite hard for other people to pitch in on spaCy, but this library is smaller and a bit more accessible. If you write a little conversion script, we'd appreciate the pull request.
Added a PR (#11) with the conversion script.
Assuming it's safe to close this? Reopen if necessary.
@newterminator I am currently trying to run train_word2vec, and I am running into errors. I have not found many threads on this. I am pointing in_dir and out_loc at a folder that contains the text file output by merge_text. However, I keep getting the error that I have too few arguments. I was wondering if you ever ran into this issue...
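For later readers hitting the same message: "too few arguments" is the wording Python 2's argparse uses when a required positional argument is missing from the command line. The sketch below is a stand-in parser, not the actual train_word2vec.py. The names in_dir and out_loc come from the comment above; the parser itself and the `parse` helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for a training script's command line:
    # two required positional arguments, named after the comment above.
    parser = argparse.ArgumentParser(prog="train_word2vec.py")
    parser.add_argument("in_dir", help="directory holding the merge_text.py output")
    parser.add_argument("out_loc", help="path where the trained model is written")
    return parser


def parse(argv):
    """Return (in_dir, out_loc), or None if argparse rejects the arguments."""
    try:
        args = build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message to stderr and exits when a
        # positional is missing; Python 2 words this "too few arguments".
        return None
    return args.in_dir, args.out_loc


if __name__ == "__main__":
    print(parse(["corpus/", "model.bin"]))  # both positionals supplied
    print(parse(["corpus/"]))               # out_loc missing
```

If both paths are given as two separate, space-separated arguments and the message still appears, it may be worth checking whether the script expects further positional arguments in the version being used.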
I was able to train data using train_word2vec.py after preprocessing the data using merge_text.py. Below is the outcome of train_word2vec.py:

Then I input the vectors.bin to the new version 0.2.0 of sense2vec and I got an IOError. The following is what I put to load the vectors:

The error:

Also, I wanted to ask how I can get the relevant freqs.json and strings.json for the trained vectors. For the strings.json, I have the batch outputs from merge_text.py, so they need to be mapped to the relevant information in freqs.json. If there is already a function that does it and I missed calling it, please let me know.

Python version: 2.7.11
Spacy version: 0.100.5
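On the freqs.json and strings.json question, one possible approach is to count the tokens in the merge_text.py batch outputs and write both tables from the same counts. The following is only a rough sketch under stated assumptions: it assumes the preprocessed text is whitespace-separated tokens such as `duck|NOUN`, and it assumes a JSON layout (a list of keys, and a list of [key, count] pairs) that this thread does not confirm. The helper names `count_keys` and `write_tables` are hypothetical, so check the layout against the loader in your sense2vec version before relying on it.

```python
import json
import os
import tempfile
from collections import Counter


def count_keys(lines):
    """Count whitespace-separated tokens such as 'duck|NOUN'."""
    freqs = Counter()
    for line in lines:
        freqs.update(line.split())
    return freqs


def write_tables(freqs, out_dir):
    # ASSUMED layout, not confirmed by the thread:
    #   strings.json -> list of keys, most frequent first
    #   freqs.json   -> list of [key, count] pairs in the same order
    ordered = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    with open(os.path.join(out_dir, "strings.json"), "w") as f:
        json.dump([key for key, _ in ordered], f)
    with open(os.path.join(out_dir, "freqs.json"), "w") as f:
        json.dump(ordered, f)
    return ordered


if __name__ == "__main__":
    # Two made-up lines standing in for preprocessed text.
    sample = ["the|DET duck|NOUN swims|VERB", "a|DET duck|NOUN quacks|VERB"]
    out_dir = tempfile.mkdtemp()
    for key, count in write_tables(count_keys(sample), out_dir):
        print(key, count)
```

Writing both files from one sorted list keeps the keys and their counts in the same order, which makes it easier to map one file onto the other.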