Rule matcher example doesn't match #3862
Comments
I can reproduce your issue with the latest spaCy commit, in a way: I get ValueError: [E103] Trying to set conflicting doc.ents: '(5, 6, 'ORG')' and '(5, 9, 'EVENT')'. A token can only be part of one entity, so make sure the entities you're setting don't overlap. That confirms your assumption that overlapping entities are not allowed, which matches the docs you linked.
I can get the example to work with a blank model:

import spacy
from spacy.matcher import Matcher
from spacy.tokens import Span

nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

def add_event_ent(matcher, doc, i, matches):
    # Get the current match and create a Span with the entity label, start and end.
    # Append the entity to the doc's entities. (Don't overwrite doc.ents!)
    match_id, start, end = matches[i]
    entity = Span(doc, start, end, label="EVENT")
    doc.ents += (entity,)
    print(entity.text)  # prints Google I/O

pattern = [{"ORTH": "Google"}, {"ORTH": "I"}, {"ORTH": "/"}, {"ORTH": "O"}]
matcher.add("GoogleIO", add_event_ent, pattern)
doc = nlp(u"This is a text about Google I/O")
matches = matcher(doc)

So the whole issue seems to be related to #3052 with inaccurate models/predictions.
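
For reference, the conflict reported by E103 can be illustrated without spaCy at all. The following is only a sketch in plain Python over (start, end, label) tuples, using the two spans from the error message. The helper names `overlaps` and `add_entity` are hypothetical, not part of spaCy, and letting the new entity replace the overlapping one is just one possible policy.

```python
def overlaps(a, b):
    """Return True if two (start, end, label) spans share at least one token.

    Spans are half-open: `end` is exclusive, as in the E103 message.
    """
    return a[0] < b[1] and b[0] < a[1]


def add_entity(ents, new_ent):
    """Return a new tuple of entities with `new_ent` added.

    Any existing entity that overlaps `new_ent` is dropped first, so no
    token ends up in two entities. (Hypothetical helper, not spaCy API.)
    """
    kept = tuple(ent for ent in ents if not overlaps(ent, new_ent))
    return tuple(sorted(kept + (new_ent,)))


# Spans from the error message: "Google" as ORG vs. "Google I/O" as EVENT.
predicted = ((5, 6, "ORG"),)
event = (5, 9, "EVENT")

print(overlaps(predicted[0], event))
print(add_entity(predicted, event))
```

With a blank model there is no predicted ORG span to begin with, which would explain why the callback runs cleanly there.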
Thanks, that's a good point! Previous models maybe didn't predict "Google" here, so the problem only started coming up in v2.1.x. I think we should rewrite the example to use Btw, on a related note: For v2.1.4, I added a
Hi,

I noticed that an example for the Matcher doesn't seem to produce what the documentation suggests it should. Specifically, the following code block is expected to invoke the add_event_ent callback once the Google I/O pattern is matched (I've added two inline print statements for clarity):

However, it seems like (at least, on the en_core_web_sm and en_core_web_md models) the input text is tokenized such that the trailing period is included with the O, which is outside of the rule's pattern.

Assuming it was a typo, I tried removing the period and spaCy raised a Cython error on Span initialization:

I think it may just be that the EntityRecognizer picks up "Google I/O." as a PRODUCT entity type, and the Matcher doesn't allow for overlapping entity types. In a fresh environment:

I tried looking around previous issues and #3287 seems to be the closest issue, which this commit aimed to resolve. Sorry if this has already been resolved!
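
To make the tokenization point concrete, here is a toy sketch in plain Python of exact token-sequence matching. It is a stand-in for matching on ORTH, not spaCy's Matcher, and `find_matches` is a hypothetical helper. The two token lists are assumptions based on the behavior described above (period attached to the last token vs. period removed), not actual tokenizer output.

```python
def find_matches(tokens, pattern):
    """Return (start, end) pairs where `pattern` equals a run of `tokens`."""
    n = len(pattern)
    return [
        (i, i + n)
        for i in range(len(tokens) - n + 1)
        if tokens[i:i + n] == pattern
    ]


pattern = ["Google", "I", "/", "O"]

# Assumed tokenizations, following the description above: with the trailing
# period attached to the last token, and with the period removed.
with_period = ["This", "is", "a", "text", "about", "Google", "I", "/", "O."]
without_period = ["This", "is", "a", "text", "about", "Google", "I", "/", "O"]

print(find_matches(with_period, pattern))
print(find_matches(without_period, pattern))
```

If the last token really is "O." rather than "O", an exact-text pattern ending in "O" has nothing to match, so the callback would never be invoked.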
Which page or section is this issue related to?
https://spacy.io/usage/rule-based-matching#on_match