In [1]:
import sys
sys.path.insert(0, "../..")

In [2]:
import spacy
import medspacy

from medspacy.context import ConTextComponent, ConTextRule

# Clinical Text and Contextual Analysis

Clinical text often contains mentions of concepts which the patient did not actually experience. For example:

- "There is *no evidence of* **pneumonia**"
- "*Mother* with **breast cancer**"
- "Patient presents for *r/o* **COVID-19**"

In all of these instances, we need to use the contextual clues around the entity to assert attributes like negation, experiencer, and uncertainty.

One method for this is the [ConText algorithm](https://www.sciencedirect.com/science/article/pii/S1532046409000744). ConText links target entities, such as problems, with semantic modifiers like those shown above. The medspaCy implementation of ConText is found in `medspacy.context`.
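To make the idea concrete, here is a minimal pure-Python sketch of linking a target to a preceding modifier phrase. It is a toy illustration and not medspaCy's implementation: `TOY_RULES` and `toy_context` are hypothetical names, the matching is naive substring search (so "mother" would also match "grandmother"), and only modifiers that come before the target are considered.

```python
# Toy lexicon: modifier phrase -> category. The phrases come from the three
# examples above; this dictionary is an assumption of the sketch, not
# medspaCy's default rule set.
TOY_RULES = {
    "no evidence of": "NEGATED_EXISTENCE",
    "mother": "FAMILY",
    "r/o": "POSSIBLE_EXISTENCE",
}


def toy_context(sentence, target):
    """Return the categories of toy modifiers that appear before `target`."""
    lowered = sentence.lower()
    target_start = lowered.find(target.lower())
    if target_start == -1:
        return []
    categories = []
    for phrase, category in TOY_RULES.items():
        phrase_start = lowered.find(phrase)
        # Forward scope only: the modifier must end before the target begins.
        if phrase_start != -1 and phrase_start + len(phrase) <= target_start:
            categories.append(category)
    return categories


examples = [
    ("There is no evidence of pneumonia", "pneumonia"),
    ("Mother with breast cancer", "breast cancer"),
    ("Patient presents for r/o COVID-19", "COVID-19"),
]
for sentence, target in examples:
    print(target, "->", toy_context(sentence, target))
```

A real implementation works on tokens and sentence boundaries rather than raw character offsets, which is what `medspacy.context` provides.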

This notebook shows a quick example of how to use ConText in medspaCy. Subsequent notebooks provide additional details and explain more complex functionality.

# Using ConText
You can instantiate ConText in two ways:
- With the `medspacy.load()` function, which loads a full pipeline
- Directly instantiating `ConTextComponent` and adding it to an existing pipeline

### Option 1

In [3]:
nlp = medspacy.load(enable=["sentencizer", "context"])

In [4]:
nlp.pipe_names

['medspacy_context']

In [5]:
context = nlp.get_pipe("medspacy_context")

In [6]:
context

<medspacy.context.context_component.ConTextComponent at 0x7fa0f73a0c70>

### Option 2

In [7]:
nlp = spacy.load("en_core_web_sm", disable=["ner"])

In [8]:
context = ConTextComponent(nlp)

In [9]:
nlp.add_pipe("medspacy_context")

<medspacy.context.context_component.ConTextComponent at 0x7fa0fa9db100>

In [10]:
nlp.pipe_names

['tok2vec',
 'tagger',
 'parser',
 'attribute_ruler',
 'lemmatizer',
 'medspacy_context']

## Processing a doc with ConText
Let's start with a fresh pipeline that does not yet include ConText and show how this processing sequence works.

In [11]:
nlp = spacy.load("en_core_web_sm", disable=["ner"])

In [12]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer']

In [13]:
text = "There is no evidence of pneumonia."

In [14]:
doc = nlp(text)

Target concepts should be stored as document spans in the attribute `Doc.ents`. These will usually be extracted using a trained NER model or a rule-based component. For now, we'll manually define this span.

In [15]:
from spacy.tokens import Span

In [16]:
doc.ents = (Span(doc, 5, 6, label="EVIDENCE_OF_PNEUMONIA"),)

In [17]:
doc.ents

(pneumonia,)
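The offsets `5` and `6` passed to `Span` above are token indices, with the end index exclusive. The following sketch shows where they come from using a rough regex tokenizer. The helpers `toy_tokenize` and `find_token_span` are hypothetical and only approximate spaCy's tokenizer for this simple sentence.

```python
import re


def toy_tokenize(text):
    """Split text into word and punctuation tokens (a rough stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def find_token_span(tokens, phrase_tokens):
    """Return (start, end) token offsets of the first match, end exclusive, or None."""
    n = len(phrase_tokens)
    for start in range(len(tokens) - n + 1):
        if tokens[start:start + n] == phrase_tokens:
            return start, start + n
    return None


tokens = toy_tokenize("There is no evidence of pneumonia.")
span = find_token_span(tokens, ["pneumonia"])
print(tokens)  # ['There', 'is', 'no', 'evidence', 'of', 'pneumonia', '.']
print(span)    # (5, 6)
```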

# Instantiating ConText
We use ConText through the `ConTextComponent` class, which offers out-of-the-box default functionality as well as ways to customize and curate the algorithm. We'll start by using all of the default rules, which can be loaded by passing **"default"** to the `rules` argument:

In [18]:
help(ConTextComponent)

Help on class ConTextComponent in module medspacy.context.context_component:

class ConTextComponent(builtins.object)
 |  ConTextComponent(nlp, name='medspacy_context', add_attrs=True, phrase_matcher_attr='LOWER', rules='default', rule_list=None, allowed_types=None, excluded_types=None, use_context_window=False, max_scope=None, max_targets=None, terminations=None, prune=True, remove_overlapping_modifiers=False)
 |  
 |  The ConTextComponent for spaCy processing.
 |  
 |  Methods defined here:
 |  
 |  __call__(self, doc)
 |      Applies the ConText algorithm to a Doc.
 |      
 |      Args:
 |          doc: a spaCy Doc
 |      
 |      Returns:
 |          doc: a spaCy Doc
 |  
 |  __init__(self, nlp, name='medspacy_context', add_attrs=True, phrase_matcher_attr='LOWER', rules='default', rule_list=None, allowed_types=None, excluded_types=None, use_context_window=False, max_scope=None, max_targets=None, terminations=None, prune=True, remove_overlapping_modifiers=False)
 |      Create a n

In [19]:
context = ConTextComponent(nlp, rules="default")

# Applying ConText
Once we've loaded the rules, we call the `ConTextComponent` object directly on a Doc. Usually this will be done under the hood when you call `nlp(text)` or `nlp.pipe(texts)`.

In [20]:
context(doc)

There is no evidence of pneumonia.

This adds the following attributes:
- `Doc._.context_graph`: An object containing the targets, modifiers, and relationships between them
- `Span._.modifiers`: A tuple added to each span which will contain the modifiers which modify each target entity
- Additional ConText attributes (optional)

## ConTextGraph
This object contains the main findings of the ConText algorithm. It handles applying the modifiers to the sentences, defining their scopes, and identifying target concepts which they modify.

In [21]:
doc._.context_graph

<ConTextGraph> with 1 targets and 1 modifiers

### Modifiers
The `modifiers` attribute is a list of `ConTextModifier` objects, each of which is the result of a `ConTextRule` matching a span of text in `doc`. In this example, the modifier matches the text "no evidence of" and has a category of "NEGATED_EXISTENCE", as defined by the matching rule.

In [22]:
doc._.context_graph.modifiers

[<ConTextModifier> [no evidence of, NEGATED_EXISTENCE]]

The `scope` attribute contains the span of text which is modified by the ConTextModifier:

In [23]:
modifier = doc._.context_graph.modifiers[0]
modifier.scope

pneumonia.

### Targets
The `targets` attribute contains the list of entities in `doc.ents`:

In [24]:
doc._.context_graph.targets

(pneumonia,)

### Edges
Linking targets to modifiers is the primary role of the ConText algorithm. Once modifiers and targets have been identified, any targets within the scope of a modifier are said to be **modified by** that modifier. In this example, this gives us the contextual semantic information that this entity is negated.

In [25]:
for target, modifier in doc._.context_graph.edges:
    print("[{0}] is modified by [{1}]".format(target, modifier))

[pneumonia] is modified by [<ConTextModifier> [no evidence of, NEGATED_EXISTENCE]]
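The edge-building step described above can be sketched in plain Python. This is a simplified illustration under the assumption that a target is linked whenever its token range lies entirely inside a modifier's scope; `ToyModifier`, `ToyTarget`, and `toy_edges` are hypothetical names, and medspaCy's own logic may differ in its details.

```python
from collections import namedtuple

# Token offsets are end-exclusive, as in spaCy spans.
ToyModifier = namedtuple("ToyModifier", "text category scope_start scope_end")
ToyTarget = namedtuple("ToyTarget", "text start end")


def toy_edges(targets, modifiers):
    """Pair each target with every modifier whose scope fully contains it."""
    edges = []
    for target in targets:
        for modifier in modifiers:
            in_scope = (
                modifier.scope_start <= target.start
                and target.end <= modifier.scope_end
            )
            if in_scope:
                edges.append((target, modifier))
    return edges


# "There is no evidence of pneumonia." -> the scope "pneumonia." is tokens 5..7.
modifier = ToyModifier("no evidence of", "NEGATED_EXISTENCE", 5, 7)
target = ToyTarget("pneumonia", 5, 6)

for t, m in toy_edges([target], [modifier]):
    print("[{0}] is modified by [{1}, {2}]".format(t.text, m.text, m.category))
```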


## Span._.modifiers
These relationships are also stored as a tuple in the `target._.modifiers` attribute. This allows us to identify all modifiers for a target entity:

In [26]:
for ent in doc.ents:
    print("{0} is modified by [{1}]".format(ent, ent._.modifiers))

pneumonia is modified by [(<ConTextModifier> [no evidence of, NEGATED_EXISTENCE],)]


# Additional Span attributes
In addition to storing the results in the ConTextGraph, ConText also sets several additional span-level attributes which contain the contextual information for that target.

- `is_negated`: True if a target is modified by 'NEGATED_EXISTENCE', default False
- `is_uncertain`: True if a target is modified by 'POSSIBLE_EXISTENCE', default False
- `is_historical`: True if a target is modified by 'HISTORICAL', default False
- `is_hypothetical`: True if a target is modified by 'HYPOTHETICAL', default False
- `is_family`: True if a target is modified by 'FAMILY', default False

In [27]:
from spacy.tokens.span import Span

In [28]:
for ent in doc.ents:
    print(ent)
    print("is_negated: ", ent._.is_negated)
    print("is_uncertain: ", ent._.is_uncertain)
    print("is_historical: ", ent._.is_historical)
    print("is_hypothetical: ", ent._.is_hypothetical)
    print("is_family: ", ent._.is_family)
    

pneumonia
is_negated:  True
is_uncertain:  False
is_historical:  False
is_hypothetical:  False
is_family:  False
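The relationship between modifier categories and the boolean attributes listed above can be sketched as a simple lookup. The mapping is taken from that list; `CATEGORY_TO_ATTRIBUTE` and `toy_attributes` are hypothetical names for illustration and are not part of medspaCy.

```python
# Category -> span attribute, as described in the list above.
CATEGORY_TO_ATTRIBUTE = {
    "NEGATED_EXISTENCE": "is_negated",
    "POSSIBLE_EXISTENCE": "is_uncertain",
    "HISTORICAL": "is_historical",
    "HYPOTHETICAL": "is_hypothetical",
    "FAMILY": "is_family",
}


def toy_attributes(modifier_categories):
    """Start with every attribute False, then flip those implied by the modifiers."""
    attrs = {name: False for name in CATEGORY_TO_ATTRIBUTE.values()}
    for category in modifier_categories:
        name = CATEGORY_TO_ATTRIBUTE.get(category)
        if name is not None:  # categories without a mapped attribute are ignored
            attrs[name] = True
    return attrs


for name, value in toy_attributes(["NEGATED_EXISTENCE"]).items():
    print("{0}: {1}".format(name, value))
```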


These attributes can be left out by setting `add_attrs` to `False` when initializing the `ConTextComponent`.

# Visualization
When building or explaining a clinical NLP system, it can be especially helpful to view visual representations of the entities and modifiers. We can use [spaCy's displacy](https://spacy.io/usage/visualizers) to display this information.

[medspaCy](https://github.com/medspacy/medspacy) has a wrapper for displacy in the `visualization` module. The `visualize_ent` function displays targets and modifiers in an NER-style form, highlighting the clinical entities and modifiers in a Doc:

In [29]:
from medspacy.visualization import visualize_dep, visualize_ent

In [30]:
visualize_ent(doc)

The `visualize_dep` function uses a dependency-parse style graphic to show the relationships between targets and modifiers:

In [31]:
visualize_dep(doc)

# Defining Modifier Rules
In this sentence, **"pneumonia"** is negated. This negation is indicated by the contextual information. We can extract this by identifying the semantic modifier and relating it to the clinical entity.

In medspaCy, we define modifiers with the `ConTextRule` class. We'll explain the `ConTextRule` class in more detail in another notebook. For now, we'll define this simple rule:

In [32]:
context_rule = ConTextRule("no evidence of", "NEGATED_EXISTENCE", direction="FORWARD")

We then add this rule to the context object by passing it in a list.

In [33]:
context.add([context_rule])
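The `direction` argument controls which side of the modifier its scope extends to. The sketch below illustrates the idea with token offsets. It is a simplified assumption-laden model: `toy_scope` is a hypothetical helper, the `"BACKWARD"` and `"BIDIRECTIONAL"` values are included here as assumptions (only `"FORWARD"` is used in this notebook), and the scope is taken to run to the sentence boundary with no terminating modifiers.

```python
def toy_scope(num_tokens, mod_start, mod_end, direction="FORWARD"):
    """Return the (start, end) token ranges covered by a modifier's scope.

    Offsets are end-exclusive and refer to a single sentence of `num_tokens` tokens.
    """
    before = (0, mod_start)         # everything left of the modifier
    after = (mod_end, num_tokens)   # everything right of the modifier
    if direction == "FORWARD":
        return [after]
    if direction == "BACKWARD":
        return [before]
    if direction == "BIDIRECTIONAL":
        return [before, after]
    raise ValueError("Unsupported direction in this sketch: {0}".format(direction))


# "There is no evidence of pneumonia." has 7 tokens; "no evidence of" is tokens 2..5.
print(toy_scope(7, 2, 5, direction="FORWARD"))  # [(5, 7)]
```

With a forward direction, the scope covers tokens 5 and 6, which corresponds to the `pneumonia.` scope shown earlier.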

# Next Steps
In the next notebook, we'll see how to add ConText to a spaCy pipeline to process multiple documents with different targets and modifiers.