# Overview


In this notebook, we'll look at a few examples of how cycontext can be used to extract information from clinical text.

In [1]:
import spacy
from spacy.pipeline import EntityRuler

from medspacy.visualization import visualize_dep, visualize_ent

from cycontext import ConTextItem, ConTextComponent

# 1. Classifying documents as positive or negative for pneumonia
In this example, we'll use cycontext to assert whether mentions of pneumonia are experienced or not. We'll then infer whether an entire document is positive or negative for pneumonia.

In [2]:
nlp = spacy.load("en_core_web_sm", disable=["ner"])

In [3]:
texts = ['interval opacification within the left lower lobe consistent with consolidation.',
         "No radiographic evidence of pneumonia.",
         "Reason: evaluate for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia.",
         "Possible consolidation.",
        
]

Let's first define patterns to match all of the phrases related to pneumonia. We'll use the `EntityRuler` class to extract these as entities in the pipeline.

In [4]:
targets = [
    {'label': 'EVIDENCE_OF_PNEUMONIA',
      'pattern': [{'LOWER': {'REGEX': 'pneumonias?'}}]},
    {'label': 'EVIDENCE_OF_PNEUMONIA', 
     'pattern': [{'LOWER': {'REGEX': 'pna'}}]},
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': "consolidation",},
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': [{'LOWER': {'REGEX': 'infiltrat(e|es|ion)'}}]},
#     {'label': 'EVIDENCE_OF_PNEUMONIA',
#     'pattern': [{"POS": {"IN": ["ADJ", "NOUN"]}, "OP": "*"}, {'LOWER': {'REGEX': 'opacit(y|ies)'}}]},
          ]
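To see which tokens the `REGEX` patterns above would accept, they can be tried against lowercased tokens with Python's `re` module. This is only a sketch: it assumes spaCy applies `REGEX` with search semantics (a match anywhere in the token), and the sample tokens and the helper `matching_tokens` are made up for illustration.

```python
import re

# The REGEX values from the target patterns above.
patterns = ["pneumonias?", "pna", "infiltrat(e|es|ion)"]

# Hypothetical tokens, chosen for illustration only.
tokens = ["Pneumonia", "PNA", "infiltrates", "infiltration", "opacity", "hypnagogic"]

def matching_tokens(pattern, tokens):
    """Return tokens whose lowercase form contains a match for pattern."""
    regex = re.compile(pattern)
    # 'LOWER' compares against the lowercased token text.
    return [tok for tok in tokens if regex.search(tok.lower())]

for pattern in patterns:
    print(pattern, "->", matching_tokens(pattern, tokens))
```

Under that assumption, an unanchored pattern such as `pna` would also accept a longer token that merely contains those letters. Anchoring it as `^pna$` would restrict it to the abbreviation itself.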

In [5]:
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(targets)
nlp.add_pipe(ruler)

Now we'll define our modifiers using ConTextItem. We'll define items in two different categories: **"DEFINITE_NEGATED_EXISTENCE"** for when pneumonia is explicitly negated, and **"INDICATION"** for when pneumonia is being checked for in an exam.

In [6]:
# item_data = [
#     ConTextItem(literal='indication', category='INDICATION', pattern=None, rule='BIDIRECTIONAL'),
#     ConTextItem(literal='no evidence of', category='DEFINITE_NEGATED_EXISTENCE', 
#                 pattern=[{'LOWER': {'IN': ['no', 'without']}}, {'LOWER': {'IN': ['definite', 'other', 'definitive', 'secondary', 'indirect']}, 'OP': '?'}, {'LOWER': {'IN': ['radiographic', 'sonographic', 'ct']}, 'OP': '?'}, 
#                          {'LOWER': 'evidence'}, {'LOWER': {'IN': ['of', 'for']}}], rule='FORWARD'),
#     ConTextItem(literal='reason', category='INDICATION', pattern=None, rule='FORWARD'),
#     ConTextItem(literal='eval for', category='INDICATION', pattern=None, rule='FORWARD'),
# ]

Now we'll instantiate ConText, add it to the pipeline, and process the texts. For each entity in a Doc, we can then check whether it's negated or part of a family history.

In [7]:
context = ConTextComponent(nlp, rules="default")
# context.add(item_data)
nlp.add_pipe(context)

In [8]:
docs = list(nlp.pipe(texts))

In [9]:
visualize_ent(docs[0])

In [10]:
visualize_ent(docs[1])

In [11]:
visualize_ent(docs[2])

In [12]:
visualize_ent(docs[3])

In [13]:
for doc in docs:
    print(doc)
    print("ent", "is_negated", "is_uncertain", sep="\t")
    for ent in doc.ents:
        print(ent, ent._.is_negated, ent._.is_uncertain, sep="\t")
    print()

interval opacification within the left lower lobe consistent with consolidation.
ent	is_negated	is_uncertain
consolidation	False	False

No radiographic evidence of pneumonia.
ent	is_negated	is_uncertain
pneumonia	True	False

Reason: evaluate for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia.
ent	is_negated	is_uncertain
infiltrate	True	False
pneumonia	False	False

Possible consolidation.
ent	is_negated	is_uncertain
consolidation	False	True



We have now asserted whether mentions of pneumonia are positive or negative. However, we are often interested in classifying pneumonia at a **document level**, not a span level. For example, although the third text above contains two mentions of pneumonia and one of them is not experienced, the document as a whole is positive.

To do this, we'll apply some document inferencing logic. We'll then write a simple function which returns True if there is *at least one* mention of pneumonia which is definitively experienced by the patient. We'll also write a helper function to exclude mentions which are negated, uncertain, etc. 

We'll register a new extension, `Doc._.pneumonia_positive` and use this function as the getter.

This logic could also be put into a component as part of a processing pipeline.

In [14]:
from spacy.tokens import Doc

In [15]:
from cycontext.context_component import DEFAULT_ATTRS
DEFAULT_ATTRS

{'NEGATED_EXISTENCE': {'is_negated': True},
 'POSSIBLE_EXISTENCE': {'is_uncertain': True},
 'HISTORICAL': {'is_historical': True},
 'HYPOTHETICAL': {'is_hypothetical': True},
 'FAMILY': {'is_family': True}}
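The mapping above can be read as: when an entity is modified by a modifier of a given category, the listed attributes are set on that entity. The following plain-Python sketch illustrates that reading. `apply_attrs` and the dictionary it returns are hypothetical stand-ins, not cycontext's implementation.

```python
# Copy of the mapping shown above.
DEFAULT_ATTRS = {
    "NEGATED_EXISTENCE": {"is_negated": True},
    "POSSIBLE_EXISTENCE": {"is_uncertain": True},
    "HISTORICAL": {"is_historical": True},
    "HYPOTHETICAL": {"is_hypothetical": True},
    "FAMILY": {"is_family": True},
}

def apply_attrs(modifier_categories, attrs=DEFAULT_ATTRS):
    """Return the attribute values implied by a list of modifier categories."""
    # Every attribute named in the mapping starts out False.
    values = {name: False for mapping in attrs.values() for name in mapping}
    for category in modifier_categories:
        # Categories without an entry in the mapping change nothing.
        values.update(attrs.get(category, {}))
    return values

print(apply_attrs(["NEGATED_EXISTENCE"]))
print(apply_attrs(["FAMILY", "ANATOMICAL_SITE"]))
```

In this sketch, a category that is missing from the mapping leaves every attribute at False, so custom categories would need their own entries before they could influence checks like the ones below.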

In [16]:
def get_pneumonia_positive(doc):
    """Return True if a doc contains at least one mention of 
    pneumonia which is not negated, uncertain, historical, 
    hypothetical, or experienced by family.
    """
    for ent in doc.ents:
        if ent.label_ != "EVIDENCE_OF_PNEUMONIA":
            continue
        if include_ent(ent):
#             print(ent)
            return True
    return False

def include_ent(ent):
    if ent._.is_negated:
        return False
    if ent._.is_uncertain:
        return False
    if ent._.is_historical:
        return False
    if ent._.is_hypothetical:
        return False
    if ent._.is_family:
        return False
    return True

In [17]:
docs

[interval opacification within the left lower lobe consistent with consolidation.,
 No radiographic evidence of pneumonia.,
 Reason: evaluate for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia.,
 Possible consolidation.]

In [18]:
Doc.set_extension("pneumonia_positive", getter=get_pneumonia_positive, force=True)

In [19]:
pos_docs = [doc for doc in docs if doc._.pneumonia_positive is True]
neg_docs = [doc for doc in docs if doc._.pneumonia_positive is False]

In [20]:
pos_docs

[interval opacification within the left lower lobe consistent with consolidation.,
 Reason: evaluate for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia.]

In [21]:
neg_docs

[No radiographic evidence of pneumonia., Possible consolidation.]

# 2. Extracting anatomical sites of surgical site infections
So far, we've been using cycontext to **assert** whether clinical conditions are actually present by checking for negation, indication, and family history. However, the ConText algorithm can be used to find other relationships between concepts in text. 

In this example, we'll show how cycontext can be used to find the anatomical sites of surgical site infections, as done in [this paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977582/). After extracting the target entities, we'll identify anatomical sites as modifiers and connect them to our targets.

In [22]:
nlp = spacy.load("en_core_web_sm")
_ = nlp.remove_pipe("ner")

We'll start by extracting targets using the EntityRuler. The evidence of infection in these texts is **"abscess"**, **"hematomas"**, and **"collection of fluid"**.

In [23]:
texts = ["There is abscess in the abdomen.",
        "There is a collection of fluid in the jejunum.",
        "hematomas are seen around the right lower quadrant"]

In [24]:
targets = [
    {"label": "EVIDENCE_OF_SSI",
     "pattern": "abscess"
    },
    
    {"label": "EVIDENCE_OF_SSI",
     "pattern": "hematomas"
    },
    
    {"label": "EVIDENCE_OF_SSI",
     "pattern": "collection of fluid"
    },
]

In [25]:
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(targets)
nlp.add_pipe(ruler)

Now we'll instantiate context. This time, we'll set `add_attrs` to False because we aren't interested in looking for negation or temporality.

We'll now define ConTextItems which match the anatomical sites in the text. When matching, these anatomical sites will be treated the same way as the negation and indication modifiers above. If there are targets within the scope of the matched TagObjects, an edge will be created between them.

In [26]:
context = ConTextComponent(nlp, add_attrs=False)

In [27]:
item_data = [
    ConTextItem(literal='abdomen', category='ANATOMICAL_SITE', rule='BIDIRECTIONAL'),
    ConTextItem(literal='jejunum', category='ANATOMICAL_SITE', rule='BIDIRECTIONAL'),
    ConTextItem(literal='right lower quadrant', category='ANATOMICAL_SITE', rule='BIDIRECTIONAL')
]
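The `rule` argument in the items above controls the direction in which a modifier looks for targets. A toy model of that idea, using token indices within a single sentence, might look like the following. The function `in_scope` is hypothetical, and real scope handling (sentence boundaries, terminating modifiers) is more involved than this.

```python
def in_scope(modifier_span, target_span, rule):
    """Return True if the target lies in the direction the modifier covers.

    Spans are (start, end) token indices with an exclusive end, and both
    are assumed to be in the same sentence.
    """
    mod_start, mod_end = modifier_span
    tgt_start, tgt_end = target_span
    after = tgt_start >= mod_end    # target follows the modifier
    before = tgt_end <= mod_start   # target precedes the modifier
    if rule == "FORWARD":
        return after
    if rule == "BACKWARD":
        return before
    if rule == "BIDIRECTIONAL":
        return after or before
    raise ValueError("unknown rule: {0}".format(rule))

# "There is abscess in the abdomen ."
# index:  0    1    2     3   4     5    6
abscess, abdomen = (2, 3), (5, 6)
print(in_scope(abdomen, abscess, "BIDIRECTIONAL"))  # True
print(in_scope(abdomen, abscess, "FORWARD"))        # False
```

In these texts the finding comes before the anatomical site, so in this model a `FORWARD` modifier would not reach it, while `BIDIRECTIONAL` does.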

In [28]:
context.add(item_data)

nlp.add_pipe(context)

In [29]:
docs = list(nlp.pipe(texts))

In [30]:
visualize_ent(docs[0], colors={"ANATOMICAL_SITE": "#c3fca4",
                                  "EVIDENCE_OF_SSI": "orange"})

In [31]:
visualize_dep(docs[0])

Now that we have our edges, we'll go a step further and add a new attribute to the targets called `anatomical_site`. If a target has a modifier of **"ANATOMICAL_SITE"**, we'll set this new attribute to be the text of the matched span.

In [32]:
from spacy.tokens import Span

In [33]:
Span.set_extension("anatomical_site", default=None, force=True)

In [34]:
for doc in docs:
    for ent in doc.ents:
        for mod in ent._.modifiers:
            if mod.category == 'ANATOMICAL_SITE':
                ent._.anatomical_site = mod.span.text
        print("{0} --> {1}".format(ent, ent._.anatomical_site))

abscess --> abdomen
collection of fluid --> jejunum
hematomas --> right lower quadrant
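The loop above keeps only the last matching modifier for each entity. If an entity could have several anatomical sites, one option is to collect them in a list instead. The sketch below uses `SimpleNamespace` objects as stand-ins for modifiers, with only the `category` and `span.text` attributes used above. The helper names and the second site are invented for illustration.

```python
from types import SimpleNamespace

def anatomical_sites(modifiers):
    """Return the text of every ANATOMICAL_SITE modifier, in order."""
    return [mod.span.text for mod in modifiers
            if mod.category == "ANATOMICAL_SITE"]

def make_modifier(category, text):
    """Build a stand-in object shaped like the modifiers used above."""
    return SimpleNamespace(category=category, span=SimpleNamespace(text=text))

# Hypothetical modifiers attached to a single entity.
modifiers = [
    make_modifier("ANATOMICAL_SITE", "abdomen"),
    make_modifier("NEGATED_EXISTENCE", "no"),
    make_modifier("ANATOMICAL_SITE", "pelvis"),
]
print(anatomical_sites(modifiers))  # ['abdomen', 'pelvis']
```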


# 3. Family History of Breast Cancer
Another task for cycontext might be identifying patients who have a family history of breast cancer. In this case, we want to first extract mentions of "breast cancer" - these will be our target concepts. We then need to define **"FAMILY_HISTORY"** modifiers, as well as any other semantic modifiers such as negation.

In [35]:
nlp = spacy.load("en_core_web_sm", disable=["ner"])
# _ = nlp.remove_pipe("ner")

Here are our example texts. The first two are both positive for family history of breast cancer. The final two are negative: in the third, the patient, rather than a family member, has breast cancer; in the last, family history of breast cancer is explicitly negated.

In [36]:
texts = ["She has a family history of breast cancer.",
        "The pt's mother passed away of breast cancer several years ago.",
         "The patient was diagnosed with breast cancer in 2012.",
        "No family history of breast ca.",
]

We'll define one simple rule to match "breast cancer" and "breast ca" in our texts. We'll add this rule to an EntityRuler and add that to the pipeline.

In [37]:
targets = [
    {"label": "BREAST_CANCER",
     "pattern": [{"LOWER": "breast"}, 
                 {"LOWER": {"IN": ["ca", "cancer"]}}
                ]
    },
]

In [38]:
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(targets)
nlp.add_pipe(ruler)

We then load context with the default rules and process our documents. To infer at a document level, we'll check if any entity in the document has a label of **"BREAST_CANCER"** and `is_family` is **True** but `is_negated` is **False**.

In [39]:
context = ConTextComponent(nlp, rules="default")

In [40]:
nlp.add_pipe(context)

In [41]:
def get_family_history_breast_ca(doc):
    for ent in doc.ents:
        if ent.label_ != "BREAST_CANCER":
            continue
        # Check if it was family history and if it was not negated
        if ent._.is_family and not ent._.is_negated:
            return True
    return False
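Both document-level functions in this notebook follow the same shape: find an entity with the right label whose attributes meet some condition. A hedged sketch of a reusable helper is shown below. `make_doc_getter` and `fake_doc` are hypothetical names, and the stand-in objects only mimic the `ents`, `label_`, and `_` attributes used above.

```python
from types import SimpleNamespace

def make_doc_getter(label, required=(), excluded=()):
    """Build a getter that is True if any entity with `label` has all
    `required` attributes set and none of the `excluded` ones."""
    def getter(doc):
        for ent in doc.ents:
            if ent.label_ != label:
                continue
            has_required = all(getattr(ent._, name) for name in required)
            has_excluded = any(getattr(ent._, name) for name in excluded)
            if has_required and not has_excluded:
                return True
        return False
    return getter

def fake_doc(*ents):
    """Stand-in for a Doc: each ent is a (label, attribute dict) pair."""
    return SimpleNamespace(ents=[
        SimpleNamespace(label_=label, _=SimpleNamespace(**attrs))
        for label, attrs in ents
    ])

family_history = make_doc_getter(
    "BREAST_CANCER", required=["is_family"], excluded=["is_negated"])

positive = fake_doc(("BREAST_CANCER", {"is_family": True, "is_negated": False}))
negated = fake_doc(("BREAST_CANCER", {"is_family": True, "is_negated": True}))
print(family_history(positive), family_history(negated))  # True False
```

A getter built this way could be passed to `Doc.set_extension` in the same way as the hand-written functions.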

In [42]:
Doc.set_extension("family_history_breast_ca", getter=get_family_history_breast_ca, force=True)

In [43]:
docs = list(nlp.pipe(texts))

In [44]:
for doc in docs:
    print(doc)
    print(doc._.family_history_breast_ca)
    print()

She has a family history of breast cancer.
True

The pt's mother passed away of breast cancer several years ago.
True

The patient was diagnosed with breast cancer in 2012.
False

No family history of breast ca.
False



In [45]:
visualize_dep(docs[3])