No new annotation when keywords are repeated (Window strategy) #18

Closed
scossin opened this issue Mar 21, 2023 · 0 comments

from iamsystem import Matcher
matcher = Matcher.build(
    keywords=["cancer"]
)
text = "cancer cancer"
annots = matcher.annot_text(text=text)
for annot in annots:
    print(annot)
# cancer	0 6	cancer

It outputs a single annotation even though the word 'cancer' occurs twice in the text. This behavior was intentional and was explained in a comment in the code:

# Don't create multiple annotations for the same transition. For example 'cancer cancer' with keyword 'cancer': if an annotation was created for the first 'cancer' occurrence, don't create a new one for the second occurrence.

The rationale was to avoid creating a second annotation when a word of the keyword is repeated inside a large window:

from iamsystem import Matcher
matcher = Matcher.build(
    keywords=["cancer de prostate"],
    w=20
)
text = "cancer de prostate token token token token prostate"
annots = matcher.annot_text(text=text)
for annot in annots:
    print(annot)
# cancer de prostate	0 18	cancer de prostate

However, this is not appropriate for all use cases, and it is not the behavior a user expects. By default, every sequence of words that matches a keyword should therefore produce its own annotation.
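
To make the trade-off concrete, here is a minimal, self-contained sketch of a window matcher with the "same transition" check behind a flag. It is a toy model, not iamsystem's implementation: the function `toy_window_match` and the parameter `skip_repeated_transitions` are hypothetical names introduced for illustration, the text is assumed to be already split into tokens, and matching is exact.

```python
from typing import List, Set, Tuple

# A partial match is the tuple of text-token indices matched so far.
State = Tuple[int, ...]


def toy_window_match(
    tokens: List[str],
    keyword: List[str],
    w: int = 1,
    skip_repeated_transitions: bool = True,
) -> List[State]:
    """Toy model only: return one tuple of token indices per annotation."""
    states: List[State] = [()]  # the empty tuple is the start state
    seen: Set[Tuple[State, str]] = set()  # transitions already taken
    annots: List[State] = []
    for i, token in enumerate(tokens):
        new_states: List[State] = []
        for state in states:
            # Ignore partial matches whose last token is outside the window.
            if state and i - state[-1] > w:
                continue
            # The token must be the next keyword token expected by this state.
            if keyword[len(state)] != token:
                continue
            # Hypothetical flag mimicking the code comment quoted above.
            if skip_repeated_transitions and (state, token) in seen:
                continue
            seen.add((state, token))
            matched = state + (i,)
            if len(matched) == len(keyword):
                annots.append(matched)  # complete match: emit an annotation
            else:
                new_states.append(matched)
        states.extend(new_states)
    return annots
```

With the flag on, `toy_window_match("cancer cancer".split(), ["cancer"])` returns `[(0,)]`, which is the single annotation reported in this issue. With the flag off it returns `[(0,), (1,)]`, but the second example (`w=20`) then also yields `(0, 1, 7)` next to `(0, 1, 2)`, which is the duplicate the check was originally meant to prevent.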

scossin added a commit that referenced this issue Mar 21, 2023
scossin added a commit that referenced this issue Mar 22, 2023
…s-are-repeated-window-strategy

Fix issue #18 no new annotation when a keyword is repeated (window strategy)