A small spaCy pipeline component for matching within document sentences using regular expressions.
- Free software: MIT license
- Documentation: https://sentency.readthedocs.io
- spaCy component for sentence-by-sentence pattern matching
- Find matches with complex patterns using the power of regular expressions
- Easily convert simple keywords into valid regular expressions
- Specify matching patterns as well as patterns to ignore
- Annotate matches for NER (Named Entity Recognition) tasks
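The idea behind the features above can be sketched with the standard library alone. This is a rough illustration, not sentenCy's implementation: the function name `match_sentences`, the naive punctuation-based sentence splitter, and the case-insensitive matching are all assumptions made for the example. sentenCy itself works on the sentences of a spaCy document.

```python
import re


def match_sentences(text, sentence_regex, ignore_regex=None):
    """Hypothetical helper: return (sentence, matched_text) pairs for sentences
    that match sentence_regex and do not match ignore_regex."""
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    results = []
    for sentence in sentences:
        # Collapse line breaks so patterns can span wrapped lines.
        flat = " ".join(sentence.split())
        # Skip any sentence that hits the ignore pattern.
        if ignore_regex and re.search(ignore_regex, flat, flags=re.IGNORECASE):
            continue
        match = re.search(sentence_regex, flat, flags=re.IGNORECASE)
        if match:
            results.append((flat, match.group(0)))
    return results


report = (
    "Screening for abdominal aortic aneurysm. "
    "Impression: There is evidence of a fusiform "
    "abdominal aortic aneurysm measuring 3.4 cm."
)
hits = match_sentences(
    report,
    sentence_regex=r"abdominal\s+aortic\s+aneurysm",
    ignore_regex=r"screening.*aneurysm",
)
for sentence, matched in hits:
    print(matched, "<-", sentence)
```

Only the "Impression" sentence is reported here, because the first sentence matches the ignore pattern and is skipped before the main pattern is tried.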
pip install sentency
The following minimal example showcases the main features of sentenCy.
import spacy
from spacy import displacy
from sentency.regex import regexize_keywords
from sentency.sentency import Sentex
text = """
Screening for abdominal aortic aneurysm.
Impression: There is evidence of a fusiform
abdominal aortic aneurysm measuring 3.4 cm.
"""
aaa_keywords = "abdominal aortic aneurysm"
ignore_keywords = "screening aneurysm"
keyword_regex = regexize_keywords(aaa_keywords)
ignore_regex = regexize_keywords(ignore_keywords)
nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(
    "sentex",
    config={
        "sentence_regex": keyword_regex,
        "ignore_regex": ignore_regex,
        "annotate_ents": True,
        "label": "AAA",
    },
)
doc = nlp(text)
displacy.render(doc, style="ent", options={"ents": ["AAA"]})
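The example relies on `regexize_keywords` to turn plain keywords into patterns, but the patterns it produces are not shown here. As a rough picture of what such a conversion can look like, the sketch below uses a hypothetical stand-in, `keywords_to_regex`, which requires every keyword to appear somewhere in the sentence, in any order. Both the name and the lookahead strategy are assumptions for illustration and may differ from what sentenCy actually generates.

```python
import re


def keywords_to_regex(keywords):
    """Hypothetical stand-in: build a case-insensitive pattern that matches
    only when every whitespace-separated keyword occurs as a whole word."""
    words = keywords.split()
    # One lookahead per keyword; re.escape guards regex metacharacters.
    lookaheads = "".join(rf"(?=.*\b{re.escape(word)}\b)" for word in words)
    return rf"(?i){lookaheads}.*"


pattern = keywords_to_regex("screening aneurysm")
print(pattern)

# Both keywords present (order and case do not matter): match.
print(bool(re.search(pattern, "Screening for abdominal aortic aneurysm.")))
# "screening" is missing: no match.
print(bool(re.search(pattern, "There is a fusiform aneurysm.")))
```

Because `.` does not match newlines by default, a pattern like this assumes each sentence has been collapsed onto a single line before matching.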
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.