# Overview


In this notebook, we'll look at a few examples of how cycontext can be used to extract information from clinical text.

In [1]:
import spacy
from spacy.pipeline import EntityRuler

from cycontext import ConTextItem, ConTextComponent

# 1. Classifying documents as positive or negative for pneumonia
In this example, we'll use cycontext to assert whether mentions of pneumonia are experienced or not. We'll then infer whether an entire document is positive or negative for pneumonia.

In [2]:
nlp = spacy.load("en_core_web_sm")
_ = nlp.remove_pipe("ner")

In [3]:
texts = ["interval opacification within the left lower lobe consistent with consolidation.",
         "INDICATION: Pneumonia.",
         "basilar lung opacities with residual opacity",
         "Reason: eval for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia.",
         "Worsening consolidation in the left lower lobe.",
         "Bilateral pulmonary opacities",
         "No radiographic evidence of pneumonia."
]

Let's first define patterns to match all of the phrases related to pneumonia. We'll use the `EntityRuler` class to extract these as entities in the pipeline.

In [4]:
targets = [
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': [{'LOWER': {'REGEX': 'pneumonias?'}}]},
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': [{'LOWER': {'REGEX': 'pna'}}]},
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': [{'POS': {'IN': ['ADJ', 'NOUN']}, 'OP': '*'},
                 {'LOWER': {'REGEX': 'consolidations?'}}]},
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': [{'LOWER': {'REGEX': 'infiltrat(e|es|ion)'}}]},
    {'label': 'EVIDENCE_OF_PNEUMONIA',
     'pattern': [{'POS': {'IN': ['ADJ', 'NOUN']}, 'OP': '*'},
                 {'LOWER': {'REGEX': 'opacit(y|ies)'}}]},
]

In [5]:
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(targets)
nlp.add_pipe(ruler)
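
To get a feel for which tokens the `REGEX` patterns above pick up, here is a minimal sketch that uses only Python's `re` module. It assumes that a `REGEX` pattern is applied to each token with search semantics (a match anywhere in the token), and it uses a naive letters-only tokenizer as a stand-in for spaCy's. The helper name `matching_tokens` is hypothetical, and the optional `ADJ`/`NOUN` prefix of the consolidation and opacity patterns is not modeled.

```python
import re

# The single-token regexes taken from the target patterns above.
TARGET_REGEXES = ["pneumonias?", "pna", "consolidations?",
                  "infiltrat(e|es|ion)", "opacit(y|ies)"]


def matching_tokens(text):
    """Return the lowercased tokens matched by any target regex.

    Assumption: each regex is applied per token with re.search, so an
    unanchored pattern can also match inside a longer token.
    """
    tokens = re.findall(r"[a-z]+", text.lower())  # naive stand-in tokenizer
    return [tok for tok in tokens
            if any(re.search(rx, tok) for rx in TARGET_REGEXES)]


print(matching_tokens("basilar lung opacities with residual opacity"))
print(matching_tokens("interval opacification within the left lower lobe "
                      "consistent with consolidation."))
print(matching_tokens("Reason: eval for CHF, infiltrate."))
```

Note that "opacification" is not picked up, because `opacit(y|ies)` requires a "t" after "opaci". Under the search-semantics assumption, anchoring a pattern (for example `^pna$`) would restrict it to whole-token matches.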

Now we'll define our modifiers using ConTextItem. We'll define items in two different categories: **"DEFINITE_NEGATED_EXISTENCE"** for when pneumonia is explicitly negated, and **"INDICATION"** for when pneumonia is being checked for in an exam.

In [6]:
item_data = [
    ConTextItem(literal='indication', category='INDICATION', pattern=None, rule='BIDIRECTIONAL'),
    ConTextItem(literal='no evidence of', category='DEFINITE_NEGATED_EXISTENCE',
                pattern=[{'LOWER': {'IN': ['no', 'without']}},
                         {'LOWER': {'IN': ['definite', 'other', 'definitive', 'secondary', 'indirect']}, 'OP': '?'},
                         {'LOWER': {'IN': ['radiographic', 'sonographic', 'ct']}, 'OP': '?'},
                         {'LOWER': 'evidence'},
                         {'LOWER': {'IN': ['of', 'for']}}],
                rule='FORWARD'),
    ConTextItem(literal='reason', category='INDICATION', pattern=None, rule='FORWARD'),
    ConTextItem(literal='eval for', category='INDICATION', pattern=None, rule='FORWARD'),
]
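
The token pattern for the negation item above is easier to read once you see which phrases it accepts. The sketch below approximates it with a character-level regular expression; this is only an illustration, since the real pattern is matched token by token rather than on raw text, and the name `NO_EVIDENCE_RE` is hypothetical.

```python
import re

# Rough character-level equivalent of the token pattern:
#   (no|without) [definite|other|...]? [radiographic|sonographic|ct]? evidence (of|for)
NO_EVIDENCE_RE = re.compile(
    r"\b(no|without)\s+"
    r"((definite|other|definitive|secondary|indirect)\s+)?"
    r"((radiographic|sonographic|ct)\s+)?"
    r"evidence\s+(of|for)\b",
    flags=re.IGNORECASE,
)

examples = [
    "No radiographic evidence of pneumonia.",
    "without definitive ct evidence for infiltrate",
    "no clear evidence of pneumonia",   # 'clear' is not an allowed optional token
    "evidence of pneumonia",            # missing the leading 'no' / 'without'
]

for phrase in examples:
    print(f"{phrase!r}: {bool(NO_EVIDENCE_RE.search(phrase))}")
```

The two optional slots correspond to the tokens marked with `'OP': '?'`, so both "no evidence of" and "without definite radiographic evidence for" are covered by a single item.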

Now we'll instantiate ConText, add it to the pipeline, and process the texts.

In [7]:
context = ConTextComponent(nlp, add_attrs=True)
context.add(item_data)
nlp.add_pipe(context)
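
The `rule` argument of each item controls the direction in which a modifier looks for targets. The following is a deliberately simplified sketch of that idea, not cycontext's implementation: it assumes that scope is limited to the modifier's own sentence and ignores anything else the real component may do, such as terminating a scope early. The `Modifier` and `Target` classes and the `in_scope` function are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Modifier:
    category: str
    rule: str       # "FORWARD", "BACKWARD", or "BIDIRECTIONAL"
    position: int   # token index of the modifier
    sentence: int   # index of the sentence that contains it


@dataclass
class Target:
    text: str
    position: int
    sentence: int


def in_scope(modifier, target):
    """Simplified check: same sentence, and on the side the rule allows."""
    if modifier.sentence != target.sentence:
        return False
    if modifier.rule == "FORWARD":
        return target.position > modifier.position
    if modifier.rule == "BACKWARD":
        return target.position < modifier.position
    return modifier.rule == "BIDIRECTIONAL" and target.position != modifier.position


# "Reason: eval for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia."
eval_for = Modifier("INDICATION", "FORWARD", position=2, sentence=0)
infiltrate = Target("infiltrate", position=6, sentence=0)
pneumonia = Target("pneumonia", position=13, sentence=1)

print(in_scope(eval_for, infiltrate))  # same sentence, after the modifier
print(in_scope(eval_for, pneumonia))   # different sentence
```

Under these assumptions, "eval for" modifies "infiltrate" but not the "pneumonia" in the following sentence, which is why the second mention can still count as experienced.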

In [8]:
docs = list(nlp.pipe(texts))

We have now asserted whether each mention of pneumonia is positive or negative. However, we are often interested in classifying pneumonia at a **document level**, not a span level. For example, although there are two mentions of pneumonia below and one is not experienced, the document overall is positive.

In [9]:
from cycontext import viz

In [10]:
viz.visualize_ent(docs[3])

To do this, we will apply some document-level inference logic. We'll write a simple function which returns True if there is *at least one* mention of pneumonia where `is_experienced` is True. We'll then register a new extension, `Doc._.pneumonia_positive`, and use this function as the getter.

This logic could also be put into a component as part of a processing pipeline.

In [11]:
from spacy.tokens import Doc

In [12]:
def get_pneumonia_positive(doc):
    for ent in doc.ents:
        if ent.label_ != "EVIDENCE_OF_PNEUMONIA":
            continue
        if ent._.is_experienced:
            return True
    return False

In [13]:
Doc.set_extension("pneumonia_positive", getter=get_pneumonia_positive, force=True)

In [14]:
pos_docs = [doc for doc in docs if doc._.pneumonia_positive is True]
neg_docs = [doc for doc in docs if doc._.pneumonia_positive is False]

In [15]:
pos_docs

[interval opacification within the left lower lobe consistent with consolidation.,
 basilar lung opacities with residual opacity,
 Reason: eval for CHF, infiltrate. IMPRESSION:  Left lower lobe pneumonia.,
 Worsening consolidation in the left lower lobe.,
 Bilateral pulmonary opacities]

In [16]:
neg_docs

[INDICATION: Pneumonia., No radiographic evidence of pneumonia.]
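
As noted above, the same document-level rule could live in a pipeline component instead of a getter. Here is a minimal sketch, assuming a component is simply a callable that takes a doc and returns it. The class name `PneumoniaClassifier` is hypothetical, and the stand-in objects built with `SimpleNamespace` only mimic the `doc.ents`, `ent.label_`, and `ent._.is_experienced` attributes used in this notebook. With real spaCy docs, the extension would need to be registered as writable (with a `default` rather than a `getter`) before the component assigns to it.

```python
from types import SimpleNamespace


class PneumoniaClassifier:
    """Hypothetical component: sets doc._.pneumonia_positive and returns the doc."""

    name = "pneumonia_classifier"

    def __init__(self, label="EVIDENCE_OF_PNEUMONIA"):
        self.label = label

    def __call__(self, doc):
        # Positive if at least one mention with the target label is experienced.
        doc._.pneumonia_positive = any(
            ent.label_ == self.label and ent._.is_experienced
            for ent in doc.ents
        )
        return doc


def fake_ent(label, is_experienced):
    """Stand-in for a spaCy span carrying the attributes used above."""
    return SimpleNamespace(label_=label,
                           _=SimpleNamespace(is_experienced=is_experienced))


def fake_doc(ents):
    """Stand-in for a spaCy doc with entities and an extension namespace."""
    return SimpleNamespace(ents=ents, _=SimpleNamespace())


classifier = PneumoniaClassifier()
mixed = classifier(fake_doc([fake_ent("EVIDENCE_OF_PNEUMONIA", False),
                             fake_ent("EVIDENCE_OF_PNEUMONIA", True)]))
negated = classifier(fake_doc([fake_ent("EVIDENCE_OF_PNEUMONIA", False)]))
print(mixed._.pneumonia_positive, negated._.pneumonia_positive)
```

Computing the value once inside a component stores it on the doc, whereas a getter recomputes it on every access; which is preferable depends on how often the attribute is read.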

# 2. Extracting anatomical sites of surgical site infections
So far, we've been using cycontext to **assert** whether clinical conditions are actually present by checking for modifiers such as negation and indication. However, the ConText algorithm can also be used to find other relationships between concepts in text. In this example, we'll show how cycontext can be used to find the anatomical sites of surgical site infections, as done in [this paper](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5977582/).

In [17]:
nlp = spacy.load("en_core_web_sm")
_ = nlp.remove_pipe("ner")

We'll start by extracting targets using the EntityRuler. The evidence of infection in these texts is **"abscess"**, **"hematomas"**, and **"collection of fluid"**.

In [18]:
texts = ["There is abscess in the abdomen.",
        "There is a collection of fluid in the jejunum.",
        "Hematomas are seen around the right lower quadrant"]

In [19]:
targets = [
    {"label": "EVIDENCE_OF_SSI",
     "pattern": [{"LOWER": "abscess"}]
    },
    
    {"label": "EVIDENCE_OF_SSI",
     "pattern": [{"LOWER": "hematomas"}]
    },
    
    {"label": "EVIDENCE_OF_SSI",
     "pattern": [{"LEMMA": "collection"}, {"LOWER": "of"}, {"LOWER": "fluid"}]
    },
]

In [20]:
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(targets)
nlp.add_pipe(ruler)

Now we'll instantiate the ConTextComponent. This time, we'll set `add_attrs` to False because we aren't interested in looking for negation or temporality.

We'll then define ConTextItems which match the anatomical sites in the text. During matching, these anatomical sites will be treated the same way as the negation and indication modifiers above: if there are targets within the scope of the matched TagObjects, an edge will be created between them.

In [21]:
context = ConTextComponent(nlp, add_attrs=False)

In [22]:
item_data = [
    ConTextItem(literal='abdomen', category='ANATOMICAL_SITE', rule='BIDIRECTIONAL'),
    ConTextItem(literal='jejunum', category='ANATOMICAL_SITE', rule='BIDIRECTIONAL'),
    ConTextItem(literal='right lower quadrant', category='ANATOMICAL_SITE', rule='BIDIRECTIONAL')
]

In [23]:
context.add(item_data)

nlp.add_pipe(context)

In [24]:
docs = list(nlp.pipe(texts))

In [25]:
viz.visualize_ent(docs[0], colors={"ANATOMICAL_SITE": "#c3fca4",
                                  "EVIDENCE_OF_SSI": "orange"})

In [26]:
viz.visualize_dep(docs[0])

Now that we have our edges, we'll go a step further and add a new attribute to the targets called `anatomical_site`. If a target has a modifier of **"ANATOMICAL_SITE"**, we'll set this new attribute to be the text of the matched span.

In [27]:
from spacy.tokens import Span

In [28]:
Span.set_extension("anatomical_site", default=None, force=True)

In [29]:
for doc in docs:
    for ent in doc.ents:
        for mod in ent._.modifiers:
            if mod.category == 'ANATOMICAL_SITE':
                ent._.anatomical_site = mod.span.text
        print("{0} --> {1}".format(ent, ent._.anatomical_site))

abscess --> abdomen
collection of fluid --> jejunum
Hematomas --> right lower quadrant


# 3. Family History of Breast Cancer
Another task for cycontext might be identifying patients who have a family history of breast cancer. In this case, we want to first extract mentions of "breast cancer" - these will be our target concepts. We then need to define **"FAMILY_HISTORY"** modifiers, as well as any other semantic modifiers such as negation.

In [30]:
nlp = spacy.load("en_core_web_sm")
_ = nlp.remove_pipe("ner")

Here are our example texts. The first two are both positive for family history of breast cancer. The final two are negative: in the third, the patient themselves has breast cancer; in the last one, family history of breast cancer is explicitly negated.

In [31]:
texts = ["She has a family history of breast cancer.",
        "The pt's mother passed away of breast cancer several years ago.",
         "The patient was diagnosed with breast cancer in 2012.",
        "No fh breast ca.",
]

We'll define one simple rule to match "breast cancer" and "breast ca" in our texts. We'll add this rule to an EntityRuler and add that to the pipeline.

In [32]:
targets = [
    {"label": "BREAST_CANCER",
     "pattern": [{"LOWER": "breast"}, 
                 {"LOWER": {"IN": ["ca", "cancer"]}}
                ]
    },
]

In [33]:
ruler = EntityRuler(nlp, overwrite_ents=True)
ruler.add_patterns(targets)
nlp.add_pipe(ruler)

Next, we'll define our ConTextItems. We need to identify the following family history modifiers:
- "family history of"
- "fh"
- "mother"

And the negation modifier "no".

In [34]:
item_data = [
    ConTextItem(literal='family history of', category='FAMILY_HISTORY', rule='FORWARD'),
    ConTextItem(literal='fh', category='FAMILY_HISTORY', rule='FORWARD'),
    ConTextItem(literal='mother', category='FAMILY_HISTORY', rule='FORWARD'),
    ConTextItem(literal='no', category='DEFINITE_NEGATED_EXISTENCE', rule='FORWARD')
]

We'll define new attributes to be set on our target spans. If an entity is modified by "DEFINITE_NEGATED_EXISTENCE", then `is_negated` becomes True. If it is modified by "FAMILY_HISTORY", then `is_family_history` becomes True.

In [35]:
attr_mapping = {"DEFINITE_NEGATED_EXISTENCE": ("is_negated", True),
               "FAMILY_HISTORY": ("is_family_history", True)}

In [36]:
Span.set_extension("is_negated", default=False, force=True)
Span.set_extension("is_family_history", default=False, force=True)
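
To make the role of the mapping concrete, here is a minimal sketch of how such a mapping could be applied to one target, given the categories of the modifiers attached to it. This is an illustration in plain Python, not cycontext's implementation; the function name `apply_attrs` and the `DEFAULTS` dictionary are hypothetical.

```python
ATTR_MAPPING = {"DEFINITE_NEGATED_EXISTENCE": ("is_negated", True),
                "FAMILY_HISTORY": ("is_family_history", True)}

# Mirrors the defaults registered with Span.set_extension above.
DEFAULTS = {"is_negated": False, "is_family_history": False}


def apply_attrs(modifier_categories, mapping=ATTR_MAPPING, defaults=DEFAULTS):
    """Return the attribute values for a target with the given modifier categories."""
    attrs = dict(defaults)
    for category in modifier_categories:
        if category in mapping:
            attr_name, attr_value = mapping[category]
            attrs[attr_name] = attr_value
    return attrs


# "No fh breast ca.": the target is modified by both "no" and "fh".
print(apply_attrs(["DEFINITE_NEGATED_EXISTENCE", "FAMILY_HISTORY"]))
# "The patient was diagnosed with breast cancer in 2012.": no modifiers.
print(apply_attrs([]))
```

A target with no matching modifiers keeps the default value for every attribute, which is why the defaults are set to False.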

Finally, we'll instantiate our ConTextComponent with this attribute mapping and add our item data. To infer at a document level, we'll check if any entity in the document has a label of **"BREAST_CANCER"** and `is_family_history` is **True** but `is_negated` is **False**.

In [37]:
context = ConTextComponent(nlp, add_attrs=attr_mapping)

In [38]:
context.add(item_data)

nlp.add_pipe(context)

In [39]:
def get_family_history_breast_ca(doc):
    for ent in doc.ents:
        if ent.label_ != "BREAST_CANCER":
            continue
        # Check if it was family history and if it was not negated
        if ent._.is_family_history and not ent._.is_negated:
            return True
    return False

In [40]:
Doc.set_extension("family_history_breast_ca", getter=get_family_history_breast_ca, force=True)

In [41]:
docs = list(nlp.pipe(texts))

In [42]:
for doc in docs:
    print(doc)
    print(doc._.family_history_breast_ca)
    print()

She has a family history of breast cancer.
True

The pt's mother passed away of breast cancer several years ago.
True

The patient was diagnosed with breast cancer in 2012.
False

No fh breast ca.
False



In [43]:
viz.visualize_dep(docs[3])