How can I get this to work? #49

Closed
SallyBean opened this issue Jul 31, 2022 · 4 comments

Comments

@SallyBean

I think I'm missing something here and can't seem to resolve it.

The code works with the example texts provided in much of the documentation (e.g. "She does not like Steve Jobs but likes Apple products."), and the term 'cannot' appears in the termset. How can I identify simple negations like the one below?

Here's my code:

pip install negspacy

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("negex", config={"ent_types":["PERSON","ORG"]})

doc = nlp("Men cannot play football.")
for e in doc.ents:
    print(e.text, e._.negex)

@jenojp
Owner

jenojp commented Jul 31, 2022

Hi - there could be two issues here:

  1. You're limiting negation to entities of type PERSON or ORG.
  2. Even if you weren't limiting types, "en_core_web_sm" finds no entities in the example sentence.

You can create an entity ruler. See the example below, where football is negated by cannot, which is a preceding term in the "en" termset. If you want to change termsets, see this example.

import spacy
from negspacy.negation import Negex
from negspacy.termsets import termset

ts = termset("en")

nlp = spacy.load("en_core_web_sm")

ruler = nlp.add_pipe("entity_ruler")
patterns = [{"label": "SPORT", "pattern": "football"},
            {"label": "SPORT", "pattern": [{"LOWER": "ice"}, {"LOWER": "hockey"}]}]
ruler.add_patterns(patterns)

nlp.add_pipe(
    "negex",
    config={
        "neg_termset":ts.get_patterns()
    }
)

doc = nlp("Men cannot play football.")

for e in doc.ents:
    print(e.text, e._.negex)
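
For intuition about what "negated by a preceding term" means, here is a rough, library-free sketch. It is not how negspacy is implemented: the term list is a tiny hypothetical stand-in for the "en" termset, the helper names (`tokenize`, `is_negated`) are made up for illustration, and the real component also has to deal with other term categories and with scope inside the sentence.

```python
import re

# Hypothetical, tiny stand-in for a termset; NOT negspacy's real lists.
PRECEDING_NEGATIONS = {"cannot", "not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"\w+", text.lower())


def is_negated(text, entity):
    """Return True/False if the entity is found, or None if it is absent.

    The entity counts as negated when any preceding-negation term
    appears somewhere before its first occurrence in the text.
    """
    tokens = tokenize(text)
    ent_tokens = tokenize(entity)
    if not ent_tokens:
        return None
    n = len(ent_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == ent_tokens:
            return any(tok in PRECEDING_NEGATIONS for tok in tokens[:start])
    return None  # entity never recognised, so there is nothing to negate


print(is_negated("Men cannot play football.", "football"))  # True
print(is_negated("Men play football.", "football"))         # False
print(is_negated("Men cannot play football.", "hockey"))    # None
```

The last call mirrors the situation in the original question: if the entity is never recognised in the first place, there is nothing for the negation step to flag.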

@SallyBean
Author

Thanks so much for your help! This makes sense. Apologies for the delay in getting back to you.

Although, if I replace 'football' with 'hockey' in the doc - nothing is returned - am I missing something else?

Huge apologies, I'm very new to this and learning.

@jenojp
Owner

jenojp commented Aug 26, 2022

The example code I pasted above looks specifically for 'ice hockey', not just 'hockey'. If you change the patterns to remove 'ice', as shown below, it will work.

            {"label": "SPORT", "pattern": [{"LOWER": "hockey"}]}]
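
To see why the two-token pattern misses a lone 'hockey', here is a simplified, library-free model of token-pattern matching. It only understands the "LOWER" key and is an illustration of the idea, not spaCy's actual matcher; `find_matches` and the hand-written token lists are assumptions for this sketch.

```python
def find_matches(tokens, pattern):
    """Return (start, end) spans where each pattern item matches consecutive tokens."""
    n = len(pattern)
    spans = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        # Every item in the pattern must match the token at the same offset.
        if all(tok.lower() == item["LOWER"] for tok, item in zip(window, pattern)):
            spans.append((start, start + n))
    return spans


tokens = ["Men", "cannot", "play", "hockey", "."]
two_token = [{"LOWER": "ice"}, {"LOWER": "hockey"}]
one_token = [{"LOWER": "hockey"}]

print(find_matches(tokens, two_token))  # [] because 'ice' never appears
print(find_matches(tokens, one_token))  # [(3, 4)]
```

A pattern with two items needs two consecutive matching tokens, so 'hockey' on its own can only be picked up by a one-token pattern.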

@jenojp jenojp closed this as completed Aug 30, 2022
@SallyBean
Author

Thanks @jenojp, this is really helpful!
