Out-of-vocabulary items tagged as personal pronouns #753

Closed
adam-ra opened this issue Jan 18, 2017 · 4 comments
Labels
lang / en (English language data and models); models (Issues related to the statistical models)

Comments

adam-ra commented Jan 18, 2017

spaCy (1.6.0) tends to tag unknown words as pronouns.
For instance, “feels hot/feverish” is tokenised as two tokens, and ‘hot/feverish’ is tagged as PRP.

Personal pronouns are a closed class, and it is very unlikely that a new personal pronoun will enter the language, so it would be an improvement if the statistical model were constrained not to assign this tag so readily to words it has never seen. The same probably applies to other closed classes, for instance prepositions.
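
To make the suggestion concrete, here is a minimal sketch of one way such a constraint could work as a post-processing step. It is not spaCy's API or how its tagger is implemented: the `CLOSED_CLASS_LEXICON` table, the `pick_tag` helper, and the tag scores in the usage lines are all hypothetical, and the word lists are deliberately incomplete.

```python
# Hypothetical, incomplete lexicon: closed-class tag -> word forms allowed to carry it.
CLOSED_CLASS_LEXICON = {
    "PRP": {"i", "me", "you", "he", "him", "she", "her", "it",
            "we", "us", "they", "them"},
    "IN": {"in", "on", "at", "of", "for", "with", "by", "from",
           "as", "if", "than", "that"},
}


def pick_tag(word, tag_scores):
    """Return the best-scoring tag that the closed-class constraint permits.

    `tag_scores` maps candidate tags to scores (higher is better). A
    closed-class tag is only accepted when `word` is listed for it.
    """
    if not tag_scores:
        raise ValueError("tag_scores must not be empty")
    ranked = sorted(tag_scores, key=tag_scores.get, reverse=True)
    for tag in ranked:
        allowed = CLOSED_CLASS_LEXICON.get(tag)
        # Open-class tags have no word list and are always permitted.
        if allowed is None or word.lower() in allowed:
            return tag
    # Every candidate was a blocked closed-class tag: keep the model's top choice.
    return ranked[0]


# Made-up scores, only to illustrate the two cases reported in this thread.
print(pick_tag("hot/feverish", {"PRP": 0.5, "JJ": 0.4, "NN": 0.1}))
print(pick_tag("overexcited", {"IN": 0.6, "JJ": 0.4}))
print(pick_tag("She", {"PRP": 0.9, "NN": 0.1}))
```

With these made-up scores, the unseen words fall back to their next-best open-class tag, while a genuine pronoun keeps PRP. A real fix would more likely belong in the model's training or features than in a filter like this.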

honnibal added the bug (Bugs and behaviour differing from documentation) label Jan 18, 2017
Member

honnibal commented Jan 18, 2017

This really sounds like a bug. The statistical model's behaviour shouldn't be changing.

Author

adam-ra commented Apr 12, 2017

I can confirm that preposition tags also crop up, e.g., “I feel overexcited” -> ‘overexcited’ is tagged as “IN”.
spaCy 1.7.2, en_depent_web_md-1.2.1

ines added the lang / en (English language data and models), models (Issues related to the statistical models), and performance labels and removed the bug (Bugs and behaviour differing from documentation) label May 13, 2017
Member

ines commented May 13, 2017

Closing this and making #1057 the master issue – work in progress for spaCy v2.0!

ines closed this as completed May 13, 2017

lock bot commented May 8, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 8, 2018