Different POS tags for same sentence repeated in paragraph #954

Closed
garyspatterson opened this issue Apr 4, 2017 · 4 comments
Labels
lang / en English language data and models models Issues related to the statistical models

Comments

@garyspatterson

I am seeing odd behavior with fine-grained POS tags for a text that contains the same sentence twice: 'The cactus also bears fruit. The cactus also bears fruit.' In the first sentence, the 'cactus' token is tagged NN, whereas in the second sentence it is tagged NNS. If you take away the 'also' in the second sentence, the tag is correctly 'NN'. I had assumed that POS tagging was done at the sentence level, so I'm curious why this is happening. Thanks!

for t in sent:
    print(t, t.tag_, t.dep_)

The DT det
cactus NN nsubj
also RB advmod
bears VBZ ROOT
fruit NN dobj
. . punct
The DT det
cactus NNS nsubj
also RB advmod
bears VBZ ROOT
fruit NN dobj
. . punct

Your Environment

spaCy 1.6
spyder 3.0.2
The error also reproduces in displaCy.
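The mismatch in the output above can be checked mechanically. The following is a minimal sketch in plain Python that does not call spaCy at all: the rows are copied by hand from the reported output, and the helper names `split_sentences` and `tag_mismatches` are made up for illustration.

```python
# Rows copied from the output reported above: (token, fine-grained tag).
rows = [
    ("The", "DT"), ("cactus", "NN"), ("also", "RB"),
    ("bears", "VBZ"), ("fruit", "NN"), (".", "."),
    ("The", "DT"), ("cactus", "NNS"), ("also", "RB"),
    ("bears", "VBZ"), ("fruit", "NN"), (".", "."),
]


def split_sentences(rows, end_token="."):
    """Group rows into sentences, closing a sentence at each end_token."""
    sentences, current = [], []
    for token, tag in rows:
        current.append((token, tag))
        if token == end_token:
            sentences.append(current)
            current = []
    if current:  # keep a trailing sentence that has no final period
        sentences.append(current)
    return sentences


def tag_mismatches(first, second):
    """Return (token, tag_in_first, tag_in_second) where the tags differ."""
    return [
        (tok_a, tag_a, tag_b)
        for (tok_a, tag_a), (tok_b, tag_b) in zip(first, second)
        if tok_a == tok_b and tag_a != tag_b
    ]


first, second = split_sentences(rows)
mismatches = tag_mismatches(first, second)
print(mismatches)  # [('cactus', 'NN', 'NNS')]
```

A check like this only compares sentences position by position, so it is useful for repeated-sentence test cases such as this one, not for arbitrary text.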

@honnibal
Member

honnibal commented Apr 4, 2017

The POS tagger has always been document level -- it's the parser that decides the sentence boundaries (relying heavily on POS tag features).

This specific example is interesting though. I wouldn't have predicted this, especially across the previous model as well.
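To illustrate how document-level tagging can treat two identical sentences differently, here is a toy sketch in plain Python. It is not spaCy's feature set or model: the function `context_features`, the window size, and the `<PAD>` symbol are all assumptions chosen for illustration. It only shows that a tagger reading context from the whole document sees different inputs for the second 'cactus' than for the first, and it does not explain why this particular example is tagged NNS.

```python
def context_features(tokens, i, pad="<PAD>"):
    """Hypothetical window features for tokens[i], read from the whole document.

    Sentence boundaries are ignored on purpose: the window simply looks
    two tokens to the left and right, padding past the document edges.
    """
    def at(j):
        return tokens[j] if 0 <= j < len(tokens) else pad

    return {
        "word": at(i),
        "prev_word": at(i - 1),
        "prev2_word": at(i - 2),
        "next_word": at(i + 1),
        "next2_word": at(i + 2),
    }


# The paragraph from the report, tokenized by hand.
doc = "The cactus also bears fruit . The cactus also bears fruit .".split()

first = context_features(doc, 1)   # 'cactus' in the first sentence
second = context_features(doc, 7)  # 'cactus' in the second sentence

# Features whose values differ between the two occurrences.
differing = sorted(k for k in first if first[k] != second[k])
print(differing)  # ['prev2_word']
```

For the first 'cactus' the token two positions back is padding, while for the second it is the period that ends the previous sentence, so the two occurrences are not identical inputs even though the sentences are.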

@garyspatterson
Author

OK cool thanks for the reply.

@ines added the docs, models, and lang / en labels and removed the docs and models labels May 13, 2017
@ines
Member

ines commented May 13, 2017

Closing this and making #1057 the master issue – work in progress for spaCy v2.0!

@ines ines closed this as completed May 13, 2017
@lock

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked as resolved and limited conversation to collaborators May 8, 2018