oov word prob is zero #536

Closed
rajhans opened this issue Oct 19, 2016 · 9 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@rajhans

rajhans commented Oct 19, 2016

Hi,

I just installed 1.0.1 on Mac OS X. I find that the model assigns a prob of 0.0 to OOV words:
import spacy
nlp = spacy.load('en')
x = nlp(u'this is an oovword')
[(t, t.is_oov, t.prob) for t in x]

[(this, False, -5.36181640625), (is, False, -4.457748889923096), (an, False, -6.014852046966553), (oovword, True, 0.0)]

More context: I ran into the same problem as the one reported in #535, so I went through this sequence of steps:
uninstall 0.100.0, install 1.0.1, download data, uninstall 1.0.1, install 1.0.1, download data
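
For readers interpreting the output above: spaCy documents `prob` as a smoothed log probability, so a value of 0.0 corresponds to exp(0) = 1 rather than a probability of zero, which makes it stand out next to the negative values for the known words. The following is a minimal sketch that only reuses the numbers reported above and does not call spaCy; `suspicious_oov` is a hypothetical helper introduced here for illustration.

```python
import math

# (text, is_oov, log_prob) triples copied from the output reported above.
reported = [
    ("this", False, -5.36181640625),
    ("is", False, -4.457748889923096),
    ("an", False, -6.014852046966553),
    ("oovword", True, 0.0),
]


def suspicious_oov(rows):
    """Return the text of OOV rows whose log probability is not negative.

    Hypothetical helper: a log probability >= 0 would mean probability >= 1,
    which cannot be right for a word the vocabulary has never seen.
    """
    return [text for text, is_oov, log_prob in rows if is_oov and log_prob >= 0.0]


# Convert each reported log probability back to a plain probability.
for text, is_oov, log_prob in reported:
    print(text, is_oov, math.exp(log_prob))

print(suspicious_oov(reported))
```

The same check could be run against a real `Doc` by building the triples with `[(t.text, t.is_oov, t.prob) for t in doc]`.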

@honnibal
Member

Can you do

ls `python -c "import spacy; print(spacy.get_data_path())"`

And tell me what you see?

@rajhans
Author

rajhans commented Oct 19, 2016

ls `python -c "import spacy; print(spacy.get_data_path())"`

Traceback (most recent call last):
  File "<string>", line 1, in <module>
AttributeError: 'module' object has no attribute 'get_data_path'

@honnibal
Member

Ah.

ls `python -c "import spacy; print(spacy.util.get_data_path())"`

@rajhans
Author

rajhans commented Oct 19, 2016

Here it is:

cache cookies.txt en-1.1.0 en_glove_cc_300_1m_vectors-1.0.0

@rajhans
Author

rajhans commented Oct 19, 2016

Also,
ls -R `python -c "import spacy; print(spacy.util.get_data_path())"`

gives the following in the en-1.1.0/vocab directory:

gazetteer.json lexemes.bin serializer.json tag_map.json
lemma_rules.json oov_prob strings.json vec.bin
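
The listing above includes an `oov_prob` file in the vocab directory. One way to check what value it holds is sketched below, assuming (this is not confirmed anywhere in the thread) that the file contains a single float written as plain text. `read_oov_prob` is a hypothetical helper, not part of spaCy's API.

```python
from pathlib import Path


def read_oov_prob(vocab_dir):
    """Read the value stored in <vocab_dir>/oov_prob.

    Assumption: the file holds a single float written as plain text.
    A ValueError here would suggest the file uses a different format.
    """
    text = (Path(vocab_dir) / "oov_prob").read_text(encoding="utf8")
    return float(text.strip())
```

For the layout shown above, the argument would be the `en-1.1.0/vocab` directory under the path printed by `spacy.util.get_data_path()`. If the stored value is negative while `t.prob` still reports 0.0, that would point at the loading code rather than the data files.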

@honnibal
Member

Sorry, I was in a hurry and didn't read your issue properly. This is obviously a bug. One sec.

honnibal added the "bug" label (Bugs and behaviour differing from documentation) Oct 19, 2016
@honnibal
Member

Published v1.0.3 on PyPI. Should be fixed. Thanks again!

@rajhans
Author

rajhans commented Oct 20, 2016

Fantastic! Thanks Matthew.

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 9, 2018