Parsing capitalized acronyms #483

Closed
charlescearl opened this issue Aug 23, 2016 · 1 comment

I came across this issue earlier.

import spacy.en

NLP = spacy.en.English()
for sent in NLP(u"I like learning about CIA. Don't you?").sents:
    print(sent)

=> I like learning about CIA. Don't you?

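That is, "CIA." is treated as one token, so the text is not split into two sentences.

But with the lowercase acronym, the sentences are split: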

for sent in NLP(u"I like learning about cia. Don't you?").sents:
    print(sent)

=>
I like learning about cia.
Don't you?
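
One possible stopgap is to split the raw text before handing it to the parser. The sketch below is not part of spaCy: `naive_split` is a hypothetical helper that uses only the standard `re` module and breaks after `.`, `?` or `!` when the next non-space character is an uppercase letter. It treats both spellings above the same way, but it will also wrongly split after genuine abbreviations such as "Dr. Smith", so it is only a rough illustration.

```python
import re

# Hypothetical helper, not a spaCy API: a boundary is any run of whitespace
# that follows ., ? or ! and precedes an uppercase ASCII letter.
_BOUNDARY = re.compile(r"(?<=[.?!])\s+(?=[A-Z])")


def naive_split(text):
    """Return a list of sentence strings using a purely regex-based rule."""
    return [part for part in _BOUNDARY.split(text) if part]


for text in (u"I like learning about CIA. Don't you?",
             u"I like learning about cia. Don't you?"):
    print(naive_split(text))
```

Both inputs come back as two sentences, because the rule looks only at the punctuation and the following capital letter, never at the case of the word before the period.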

It seems that the workaround for this in entity parsing is to allow capitalized entities to carry attached punctuation when they fall at the end of a sentence.

for tok in NLP(u"I like learning about CIA. Don't you?"):
    print("{} {}".format(tok.text, tok.ent_type_ or "Not an entity"))

=>

I Not an entity
like Not an entity
learning Not an entity
about Not an entity
CIA. ORG
Do Not an entity
n't Not an entity
you Not an entity
? Not an entity
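
If downstream code needs the bare acronym, the trailing period could be stripped from entity tokens after the fact. The following is a minimal sketch that assumes the (text, entity type) pairs have already been pulled out of the parsed document as plain tuples. `clean_entities` is a hypothetical helper, not a spaCy function, and the sample data is copied from the output above.

```python
def clean_entities(pairs):
    """Strip one trailing period from tokens that carry an entity label.

    `pairs` is a list of (token_text, entity_type) tuples, where an empty
    entity_type means the token is not part of an entity.
    """
    cleaned = []
    for text, ent_type in pairs:
        # Only touch labelled tokens, and leave a lone "." untouched.
        if ent_type and len(text) > 1 and text.endswith("."):
            text = text[:-1]
        cleaned.append((text, ent_type))
    return cleaned


# Sample data copied from the output in this issue.
pairs = [("about", ""), ("CIA.", "ORG"), ("Do", ""), ("n't", "")]
print(clean_entities(pairs))
```

Note that this would also trim entity tokens where the period is legitimately part of the name (for example "Inc."), so it suits a quick cleanup rather than a general fix.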

lock bot commented May 9, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

lock bot locked as resolved and limited conversation to collaborators May 9, 2018