g-delong/sentency

sentency

A small spaCy pipeline component for matching within document sentences using regular expressions.

Features

  • spaCy component for sentence-by-sentence pattern matching
  • Find matches with complex patterns using the power of regular expressions
  • Easily convert simple keywords into valid regular expressions
  • Specify matching patterns as well as patterns to ignore
  • Annotate matches for NER (Named Entity Recognition) tasks
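To make the idea of sentence-by-sentence matching with match and ignore patterns concrete, here is a minimal sketch using only Python's standard `re` module. It is not sentenCy's implementation: `keywords_to_regex`, `split_sentences`, and `match_sentences` are hypothetical helpers written for illustration, the sentence splitter is deliberately naive, and the ignore keyword is simplified to the single word "screening".

```python
import re


def keywords_to_regex(keywords):
    """Hypothetical helper (NOT sentency's regexize_keywords).

    Escapes each whitespace-separated word and joins them so that any run
    of whitespace may separate the words; matching is case-insensitive.
    """
    words = [re.escape(word) for word in keywords.split()]
    return r"(?i)\b" + r"\s+".join(words) + r"\b"


def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    flat = " ".join(text.split())  # collapse line breaks and repeated spaces
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]


def match_sentences(text, match_regex, ignore_regex=None):
    """Return (sentence, matched_text) pairs for sentences that match
    match_regex and do not match ignore_regex."""
    results = []
    for sentence in split_sentences(text):
        if ignore_regex and re.search(ignore_regex, sentence):
            continue  # the whole sentence is skipped when the ignore pattern hits
        found = re.search(match_regex, sentence)
        if found:
            results.append((sentence, found.group(0)))
    return results


text = """
Screening for abdominal aortic aneurysm.
Impression: There is evidence of a fusiform
abdominal aortic aneurysm measuring 3.4 cm.
"""

hits = match_sentences(
    text,
    match_regex=keywords_to_regex("abdominal aortic aneurysm"),
    ignore_regex=keywords_to_regex("screening"),
)

for sentence, matched in hits:
    print(f"{matched!r} in: {sentence}")
```

In this sketch the first sentence is dropped because it contains the ignore keyword, so only the "Impression" sentence is reported. A real pipeline would rely on spaCy's sentence segmentation rather than a punctuation-based split.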

Installation

pip install sentency

Usage

The following minimal example showcases the features of sentenCy.

import spacy
from spacy import displacy

from sentency.regex import regexize_keywords
from sentency.sentency import Sentex

text = """
Screening for abdominal aortic aneurysm. 
Impression: There is evidence of a fusiform 
abdominal aortic aneurysm measuring 3.4 cm.
"""
aaa_keywords = "abdominal aortic aneurysm"
ignore_keywords = "screening aneurysm"

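# Convert the plain keyword strings into regular expressions.
keyword_regex = regexize_keywords(aaa_keywords)
ignore_regex = regexize_keywords(ignore_keywords)

nlp = spacy.load("en_core_web_sm")
# Add the sentex component: match sentences with keyword_regex, ignore
# sentences matching ignore_regex, and annotate matches as "AAA" entities.
nlp.add_pipe(
    "sentex",
    config={
        "sentence_regex": keyword_regex,
        "ignore_regex": ignore_regex,
        "annotate_ents": True,
        "label": "AAA",
    },
)

doc = nlp(text)

# Visualize only the "AAA" entities with displaCy.
displacy.render(doc, style="ent", options={"ents": ["AAA"]})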

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
