latest spaCy tagger and parser returning unexpected results #535

Closed
bdewilde opened this issue Oct 19, 2016 · 7 comments
Labels
bug Bugs and behaviour differing from documentation

Comments

@bdewilde

After downloading the latest version of spaCy and updating the models, I no longer get reasonable POS tagging or dependency parsing. Here's an example in Python 3.5 on macOS Sierra:

>>> import spacy
>>> en_nlp = spacy.load('en')
>>> en_doc = en_nlp('Hello, world. Here are two sentences.')
>>> [tok.text for tok in en_doc]
['Hello', ',', 'world', '.', 'Here', 'are', 'two', 'sentences', '.']
>>> [tok.pos_ for tok in en_doc]
['PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT', 'PUNCT']
>>> [tok.tag_ for tok in en_doc]
['""', '""', '""', '""', '""', '""', '""', '""', '""']
>>> [tok.dep_ for tok in en_doc]
['ROOT', 'ROOT', 'ROOT', 'ROOT', 'ROOT', 'ROOT', 'ROOT', 'ROOT', 'ROOT']

Is this no longer correct code for v1.0.1? If it is, do you have any ideas what's going wrong?

@honnibal
Member

Code's correct; I must've broken something --- thanks!

@honnibal added the bug label (Bugs and behaviour differing from documentation) on Oct 19, 2016
@honnibal
Member

Hm, the output is correct on my Linux server in both Python 2.7 and 3.5. Trying on OS X now, but this is very curious. I don't yet see what could be wrong.

@honnibal
Member

Ah.

One nice change in 1.0.1 is that the tokenizer is bundled by default. Are you sure you're executing a version of spaCy that has the data installed?

There's a bug here though... It's loaded the parser and tagger, even though their models are empty!
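One way to answer the "which spaCy am I executing?" question is to ask the interpreter where it would import the package from. The following is only a minimal sketch: `describe_install` is a hypothetical helper, not part of spaCy, and it assumes Python 3.8+ for `importlib.metadata` (newer than the 3.5 used in this thread).

```python
import importlib.util
from importlib import metadata


def describe_install(package_name):
    """Return (location, version) for the copy of a package this interpreter would import.

    Hypothetical helper for illustration; not part of spaCy.
    """
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        # The package is not importable from this interpreter at all.
        return None, None
    try:
        version = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Importable, but no installed distribution is registered under this name.
        version = None
    return spec.origin, version


if __name__ == "__main__":
    # Prints the file path and version of the spaCy this interpreter sees,
    # or (None, None) if spaCy is not installed here.
    print(describe_install("spacy"))
```

If the reported path points at a different environment than the one where the models were downloaded, that mismatch would be a plausible explanation for empty models, though this thread never confirms the actual cause.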

@bdewilde
Author

bdewilde commented Oct 19, 2016

How strange. :( I did a full un- then re-install of spacy, downloaded the models, and checked that it loads as specified in the docs; that process gave the results originally posted. Repeating this process, however, appears to have done the trick. (See below.) I have no idea what went wrong the first time, since I confirmed in my terminal history that nothing was done differently.

Technology! ¯_(ツ)_/¯

$ pip uninstall spacy
$ pip install spacy
...
Successfully installed spacy-1.0.1
$ python -m spacy.en.download
Downloading parsing model
Downloading...
Downloaded 532.28MB 100.00% 6.22MB/s eta 0s
archive.gz checksum/md5 OK
Model successfully installed.
Downloading GloVe vectors
Downloading...
Downloaded 708.08MB 100.00% 9.44MB/s eta 0s
archive.gz checksum/md5 OK
Model successfully installed.
$ python -c "import spacy; spacy.load('en'); print('OK')"
OK
$ python
Python 3.5.2 (default, Aug 15 2016, 12:43:16)
[GCC 4.2.1 Compatible Apple LLVM 7.3.0 (clang-703.0.31)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>> en_nlp = spacy.load('en')
>>> en_doc = en_nlp('Hello, world. Here are two sentences.')
>>> [tok.text for tok in en_doc]
['Hello', ',', 'world', '.', 'Here', 'are', 'two', 'sentences', '.']
>>> [tok.pos_ for tok in en_doc]
['INTJ', 'PUNCT', 'NOUN', 'PUNCT', 'ADV', 'VERB', 'NUM', 'NOUN', 'PUNCT']
>>> [tok.dep_ for tok in en_doc]
['ROOT', 'punct', 'npadvmod', 'punct', 'advmod', 'ROOT', 'nummod', 'nsubj', 'punct']

Think we can assign blame to the gremlins in my machine, and close the issue.

@honnibal
Member

That's creepy. Hope I didn't mess something weird up.

The loading stuff is all new code, so I expect there might be new bugs.

@rajhans

rajhans commented Oct 19, 2016

I am facing the exact same issue when using the latest model (macOS). I did a complete uninstall and re-install, and now I find that all pos_ are 'PUNCT'. Definitely weird.
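Since the failure in this thread is silent (the pipeline loads without error but tags everything identically), a cheap sanity check on the output can catch it early. This is a rough sketch, not a spaCy feature: `looks_degenerate` and its `min_tokens` threshold are hypothetical, and the label lists are copied from the outputs posted earlier in the thread.

```python
def looks_degenerate(labels, min_tokens=3):
    """Return True when every token received the same label.

    A uniform labelling is suspicious for ordinary multi-word text and may
    indicate that the tagger or parser was loaded with an empty model.
    """
    labels = list(labels)
    return len(labels) >= min_tokens and len(set(labels)) == 1


# Output from the original report: every token tagged PUNCT / ROOT.
broken_pos = ['PUNCT'] * 9
broken_dep = ['ROOT'] * 9
# Output after the reinstall, for comparison.
good_pos = ['INTJ', 'PUNCT', 'NOUN', 'PUNCT', 'ADV', 'VERB', 'NUM', 'NOUN', 'PUNCT']

print(looks_degenerate(broken_pos))  # True
print(looks_degenerate(broken_dep))  # True
print(looks_degenerate(good_pos))    # False
```

With a loaded pipeline, the same check could be applied as `looks_degenerate(tok.pos_ for tok in en_doc)` on a test sentence that is known to contain several different parts of speech.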

@lock

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock (bot) locked this thread as resolved and limited conversation to collaborators on May 9, 2018