In [1]:
import sys

In [2]:
sys.path.insert(0, "../..")

# Overview

This notebook will continue showing how to define ConText modifiers with some more advanced options.

In [3]:
import medspacy

from medspacy.context import ConTextRule, ConTextComponent

from medspacy.visualization import visualize_dep, visualize_ent

from spacy.tokens import Span

In [4]:
nlp = medspacy.load()

## 1: **"TERMINATE"** rule
As described earlier, the scope of a modifier defaults to the entire sentence either before or after a ConTextModifier, as determined by the ConTextRule's `direction` attribute. However, the scope can be limited by **termination points**, which are other ConTextModifiers whose rule has the direction **"TERMINATE"**. For example, in "There is no evidence of pneumonia but there is CHF", the negation modifier should modify "pneumonia" but not "CHF". This can be achieved by defining a ConTextRule that terminates the scope at the word "but".

In [5]:
text = "There is no evidence of pneumonia but there is CHF"

In [6]:
context_rules = [ConTextRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD")]

In [7]:
context = ConTextComponent(nlp, rules=None)
context.add(context_rules)

In [8]:
doc = nlp(text)

In [9]:
doc.ents = (Span(doc, 5, 6, "CONDITION"), Span(doc, 9, 10, "CONDITION"))

In [10]:
doc.ents

(pneumonia, CHF)

In [11]:
context(doc)

There is no evidence of pneumonia but there is CHF

Here, you can see that both **"pneumonia"** and **"CHF"** are modified:

In [12]:
visualize_dep(doc)

In [13]:
modifier = doc._.context_graph.modifiers[0]
modifier

<ConTextModifier> [no evidence of, NEGATED_EXISTENCE]

In [14]:
# The scope includes both "pneumonia" and "CHF", so both would be negated
modifier.scope

pneumonia but there is CHF

Now let's add an additional ConTextRule with **"TERMINATE"**:

In [15]:
context_rules2 = [ConTextRule("but", "CONJ", "TERMINATE")]

In [16]:
context.add(context_rules2)

In [17]:
# doc = nlp(text) 
context(doc)

There is no evidence of pneumonia but there is CHF

In [18]:
doc._.context_graph.modifiers

[<ConTextModifier> [no evidence of, NEGATED_EXISTENCE],
 <ConTextModifier> [but, CONJ]]

In [19]:
modifier = doc._.context_graph.modifiers[0]

In [20]:
visualize_dep(doc)

In [21]:
# The scope now only encompasses "pneumonia"
modifier.scope

pneumonia
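
The effect of a termination point can be sketched in plain Python. This is a toy illustration of the idea only, not medspaCy's implementation: `forward_scope` is a hypothetical helper, and whitespace tokenization is an assumption made to keep the example self-contained.

```python
def forward_scope(tokens, scope_start, terminators=frozenset()):
    """Collect tokens from scope_start up to, but not including, the first terminator."""
    scope = []
    for token in tokens[scope_start:]:
        if token.lower() in terminators:
            break
        scope.append(token)
    return scope


tokens = "There is no evidence of pneumonia but there is CHF".split()
scope_start = 5  # index of the first token after the modifier "no evidence of"

# Without a termination rule, the scope runs to the end of the sentence
print(forward_scope(tokens, scope_start))
# ['pneumonia', 'but', 'there', 'is', 'CHF']

# With "but" acting as a terminator, the scope stops before it
print(forward_scope(tokens, scope_start, terminators={"but"}))
# ['pneumonia']
```

The second call mirrors what the **"TERMINATE"** rule achieves above: "CHF" falls outside the scope, so it would no longer be negated.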

## 2: Pseudo-modifiers
Sometimes, a substring of a phrase will incorrectly be matched as a modifier and cause a target to be negated. For example:

---
"There are no findings to explain the weakness"

---

**"No"** is often a negation modifier, but **"no findings to explain"** should not negate **"weakness"**:

In [22]:
text = "There are no findings to explain the weakness"
doc = nlp(text)

In [23]:
doc.ents = (Span(doc, 7, 8, "CONDITION"),)

In [24]:
context = ConTextComponent(nlp, rules="default")

In [25]:
context(doc)

There are no findings to explain the weakness

In [26]:
visualize_dep(doc)

If we add **"no findings to explain"** as a **"PSEUDO"** modifier, it will supersede the substring **"no"** and will not modify the target concept:

In [27]:
pseudo = ConTextRule("no findings to explain", "PSEUDO", direction="PSEUDO")

In [28]:
context.add([pseudo])

In [29]:
context(doc)

There are no findings to explain the weakness

In [30]:
visualize_dep(doc)

## 3: Pruned modifiers
If two ConTextRules result in ConTextModifiers where one span is a substring of the other, the modifiers will be pruned to keep **only** the larger span. We saw this above with **"no"** and **"no findings to explain"**.

As another example, **"no history of"** is a negation modifier, while **"history"** is a historical modifier. Both match in the text "no history of afib", but only "no history of" should ultimately modify "afib".

In [31]:
item_data = [
    ConTextRule("no history of", "DEFINITE_NEGATED_EXISTENCE", "FORWARD"),
    ConTextRule("history", "HISTORICAL", "FORWARD"),
]

In [32]:
text = "no history of"

In [33]:
context = ConTextComponent(nlp, rules=None)
context.add(item_data)

In [34]:
doc = nlp(text)
context(doc)

no history of

In [35]:
# The two overlapping matches are pruned, leaving only the larger modifier
doc._.context_graph.modifiers

[<ConTextModifier> [no history of, DEFINITE_NEGATED_EXISTENCE]]
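
The pruning step can be pictured with a small sketch. It is an assumption-laden toy model, not medspaCy's code: matches are represented as hypothetical `(start, end, category)` tuples of token offsets, and `prune_contained` is an illustrative helper name.

```python
def prune_contained(matches):
    """Drop any (start, end, category) match that lies inside a strictly larger match."""
    kept = []
    for start, end, category in matches:
        contained = any(
            o_start <= start and end <= o_end and (o_end - o_start) > (end - start)
            for o_start, o_end, _ in matches
        )
        if not contained:
            kept.append((start, end, category))
    return kept


# Token offsets in "no history of": the full phrase is 0..3, "history" is 1..2
matches = [
    (0, 3, "DEFINITE_NEGATED_EXISTENCE"),
    (1, 2, "HISTORICAL"),
]
pruned = prune_contained(matches)
print(pruned)
# [(0, 3, 'DEFINITE_NEGATED_EXISTENCE')]
```

The same idea explains the pseudo-modifier in the previous section: the longer **"no findings to explain"** match wins over the shorter **"no"** match.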

## 4: Manually limiting scope
By default, the scope of a modifier is the **entire sentence** in the direction of the rule up until a termination point (see above). However, sometimes this is too much. In long sentences, this can cause a modifier to extend far beyond its location in the sentence. Some modifiers are really meant to be attached to a single concept, but they are instead distributed to all targets.

To fix this, ConText allows optional attributes in `ConTextRule` to limit the scope: `max_scope` and `max_targets`. Both attributes are explained below.

### max_targets
Some modifiers should really only attach to a single target. For example, in the sentence below:

**"Pt presents with diabetes, pneumonia?"**

**"?"** indicates uncertainty, but *only* for **"pneumonia"**. **"Diabetes"** should not be affected. We can achieve this by creating a rule with a `max_targets` of **1**. This limits the number of targets to 1 in each direction the rule applies (for a bidirectional rule, 1 *on each side* of the modifier).

Let's first see what this looks like *without* defining `max_targets`:

In [36]:
text = "Pt presents with diabetes, pneumonia?"

In [37]:
doc = nlp(text)
# Entities need a label, otherwise doc.ents stays empty
doc.ents = (Span(doc, 3, 4, "CONDITION"), Span(doc, 5, 6, "CONDITION"))
doc.ents

(diabetes, pneumonia)

In [38]:
item = ConTextRule("?", category="UNCERTAIN",
                   direction="BACKWARD",
                   max_scope=None)

In [39]:
context = ConTextComponent(nlp, rules=None)
context.add([item])

In [40]:
context(doc)

Pt presents with diabetes, pneumonia?

In [41]:
# Both are modified
visualize_dep(doc)

Now, let's start over and set `max_targets` to **1**:

In [42]:
doc = nlp(text)
doc.ents = (Span(doc, 3, 4, "CONDITION"), Span(doc, 5, 6, "CONDITION"))

In [43]:
rule = ConTextRule("?", category="UNCERTAIN",
                   direction="BACKWARD",
                   max_targets=1)

In [44]:
context = ConTextComponent(nlp, rules=None)
context.add([rule])

In [45]:
context(doc)

Pt presents with diabetes, pneumonia?

In [46]:
# Only "pneumonia" is modified
visualize_dep(doc)
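
As a rough mental model of `max_targets` for a backward rule, the sketch below keeps only the targets nearest to the modifier. It is a simplification under stated assumptions, not the library's logic: `closest_targets` is a hypothetical helper and targets are represented by their token indices.

```python
def closest_targets(target_positions, modifier_position, max_targets=None):
    """Return targets located before the modifier, nearest first, capped at max_targets."""
    before = [p for p in target_positions if p < modifier_position]
    nearest_first = sorted(before, key=lambda p: modifier_position - p)
    if max_targets is None:
        return nearest_first
    return nearest_first[:max_targets]


tokens = ["Pt", "presents", "with", "diabetes", ",", "pneumonia", "?"]
targets = [3, 5]   # "diabetes" and "pneumonia"
modifier = 6       # "?"

unlimited = [tokens[i] for i in closest_targets(targets, modifier)]
limited = [tokens[i] for i in closest_targets(targets, modifier, max_targets=1)]
print(unlimited)  # ['pneumonia', 'diabetes']
print(limited)    # ['pneumonia']
```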

### max_scope
One limitation of using `max_targets` is that in a sentence like the example above, each concept has to be extracted as an entity in order for it to reduce the scope. If **"pneumonia"** was not extracted, then **"?"** would still extend as far back as **"diabetes"**.

We can address this by explicitly setting the scope to be no greater than a certain number of tokens using `max_scope`. For example, lab results may show up in a text document with many individual results:

---
Adenovirus DETECTED<br>
SARS NOT DETECTED<br>
...
Cov HKU1 NOT DETECTED<br>

---

Texts like this are often difficult to parse and they are often not ConText-friendly because many lines can be extracted as a single sentence. By default, a modifier like **"NOT DETECTED"** could extend far back to a concept such as **"Adenovirus"**, which we see returned positive. We may also not explicitly extract every virus tested in the lab, so `max_targets` won't work. 

With text formats like this, we can be fairly certain that **"Not Detected"** will only modify the single concept right before it. We can set `max_scope` to **1** so that **only** the concept immediately before the modifier will be modified.

In [47]:
text = """Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED"""

In [48]:
doc = nlp(text)
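# Entities need a label, otherwise doc.ents stays empty
doc.ents = (Span(doc, 0, 1, "CONDITION"), Span(doc, 2, 3, "CONDITION"), Span(doc, 5, 6, "CONDITION"))
doc.ents

(Adenovirus, Sars, Pneumonia)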

In [49]:
print([sent for sent in doc.sents])

[Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED]


In [50]:
#assert len(list(doc.sents)) == 1

In [51]:
rules = [
    ConTextRule("DETECTED", category="POSITIVE_EXISTENCE",
                direction="BACKWARD", max_scope=None),
    ConTextRule("NOT DETECTED", category="DEFINITE_NEGATED_EXISTENCE",
                direction="BACKWARD", max_scope=None),
]

In [52]:
context = ConTextComponent(nlp, rules=None)
context.add(rules)

In [53]:
context(doc)

Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED

In [54]:
visualize_dep(doc)

Let's now set `max_scope` to 1 and we'll find that only **"Pneumonia"** and **"Sars"** are modified by **"NOT DETECTED"**:

In [55]:
doc = nlp(text)
doc.ents = (Span(doc, 0, 1, "CONDITION"), Span(doc, 2, 3, "CONDITION"), Span(doc, 5, 6, "CONDITION"))
doc.ents

(Adenovirus, Sars, Pneumonia)

In [56]:
rules = [
    ConTextRule("DETECTED", category="POSITIVE_EXISTENCE",
                direction="BACKWARD", max_scope=1),
    ConTextRule("NOT DETECTED", category="DEFINITE_NEGATED_EXISTENCE",
                direction="BACKWARD", max_scope=1),
]

In [57]:
context = ConTextComponent(nlp, rules=None)
context.add(rules)

In [58]:
context(doc)

Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED

In [59]:
visualize_dep(doc)
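
To see why a token limit helps here, consider the following sketch of a backward scope with an optional cap. It is only an approximation of the behaviour described above: `backward_scope` is a hypothetical helper, and splitting on whitespace stands in for real tokenization.

```python
def backward_scope(tokens, modifier_start, max_scope=None):
    """Return the tokens before the modifier, limited to the last max_scope tokens."""
    start = 0 if max_scope is None else max(0, modifier_start - max_scope)
    return tokens[start:modifier_start]


tokens = "Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED".split()
last_not_detected = 6  # index where the final "NOT DETECTED" begins

# Unlimited: the scope reaches all the way back to "Adenovirus"
print(backward_scope(tokens, last_not_detected))
# ['Adenovirus', 'DETECTED', 'Sars', 'NOT', 'DETECTED', 'Pneumonia']

# Capped at one token: only the concept right before the modifier is in scope
print(backward_scope(tokens, last_not_detected, max_scope=1))
# ['Pneumonia']
```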

### Using `context_window`
The default scope for a ConText modifier is the sentence that contains it. This means that if the pipeline doesn't include a sentence-splitting component, ConText will throw an error:

In [62]:
nlp_no_sents = medspacy.load(enable=[])
doc = nlp_no_sents("There is no evidence of pneumonia. Scheduled visit in two weeks.")
context = ConTextComponent(nlp)
try:
    context(doc)
except ValueError as e:
    print(e)

[E030] Sentence boundaries unset. You can add the 'sentencizer' component to the pipeline with: `nlp.add_pipe('sentencizer')`. Alternatively, add the dependency parser or sentence recognizer, or set sentence boundaries by setting `doc[i].is_sent_start`.


However, sentence splitting is an expensive operation. To avoid that processing step, you can set context to only use the `max_scope` argument shown above. To do this, pass in the argument `use_context_window=True` and a value for `max_scope`:

In [63]:
nlp_no_sents = medspacy.load(enable=[])
context = ConTextComponent(nlp, use_context_window=True, max_scope=3)
doc = nlp_no_sents("There is no evidence of pneumonia. Scheduled visit in two weeks.")
context(doc)

There is no evidence of pneumonia. Scheduled visit in two weeks.

In [64]:
for modifier in doc._.context_graph.modifiers:
    print(modifier, modifier.scope)

<ConTextModifier> [no evidence of, NEGATED_EXISTENCE] pneumonia. Scheduled


While this can allow faster processing, it also risks setting a less accurate modifying scope because the sentence boundaries are unknown.

## 5: Filtering target types
You may want modifiers to only modify targets with certain semantic classes. You can specify which entity types may or may not be modified through the `allowed_types` and `excluded_types` arguments.

For example, in the sentence:

---
"She is not prescribed any beta blockers for her hypertension."

---

**"Beta blockers"** is negated by the phrase **"not prescribed"**, but **"hypertension"** should not be negated. By default, a modifier will modify all concepts in its scope, regardless of semantic type:

In [65]:
from spacy.tokens import Span

In [66]:
# Let's write a function to create this manual example
def create_medication_example():
    doc = nlp("She is not prescribed any beta blockers for her hypertension.")
    # Manually define entities
    medication_ent = Span(doc, 5, 7, "MEDICATION")
    condition_ent = Span(doc, 9, 10, "CONDITION")
    doc.ents = (medication_ent, condition_ent)
    return doc

In [67]:
doc = create_medication_example()
doc

She is not prescribed any beta blockers for her hypertension.

In [68]:
# Define our rules without any type restrictions
rules = [ConTextRule("not prescribed", "NEGATED_EXISTENCE", "FORWARD")]

In [69]:
context = ConTextComponent(nlp, rules="other", rule_list=rules)

In [70]:
context(doc)

She is not prescribed any beta blockers for her hypertension.

In [71]:
# Visualize the modifiers
visualize_dep(doc)

To change this, we can make sure that **"not prescribed"** only modifies **MEDICATION** entities by setting `allowed_types` to `{"MEDICATION"}`:

In [72]:
rules = [ConTextRule("not prescribed", "NEGATED_EXISTENCE", "FORWARD", allowed_types={"MEDICATION"})]

In [73]:
context = ConTextComponent(nlp, rules="other", rule_list=rules)

In [74]:
doc = create_medication_example()
context(doc)

She is not prescribed any beta blockers for her hypertension.

Now, only **"beta blockers"** will be negated:

In [75]:
visualize_dep(doc)

The same can be achieved by setting `excluded_types` to `{"CONDITION"}`.

In [76]:
rules = [ConTextRule("not prescribed", "NEGATED_EXISTENCE", "FORWARD", excluded_types={"CONDITION"})]
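
The two arguments can be read as a simple label filter. The sketch below is a toy version of that check, not medspaCy's implementation; `allows` is a hypothetical helper, and treating the two arguments as mutually exclusive is an assumption of this sketch.

```python
def allows(label, allowed_types=None, excluded_types=None):
    """Decide whether an entity label may be modified under the given type restrictions."""
    if allowed_types is not None and excluded_types is not None:
        # Assumption of this sketch: only one kind of restriction is given at a time
        raise ValueError("Set only one of allowed_types or excluded_types")
    if allowed_types is not None:
        return label in allowed_types
    if excluded_types is not None:
        return label not in excluded_types
    return True


entities = [("beta blockers", "MEDICATION"), ("hypertension", "CONDITION")]

by_allowed = [text for text, label in entities if allows(label, allowed_types={"MEDICATION"})]
by_excluded = [text for text, label in entities if allows(label, excluded_types={"CONDITION"})]
unrestricted = [text for text, label in entities if allows(label)]

print(by_allowed)    # ['beta blockers']
print(by_excluded)   # ['beta blockers']
print(unrestricted)  # ['beta blockers', 'hypertension']
```

With only two entity types in the example, both restrictions select the same target. With more types, `allowed_types` is the stricter choice because any new label is excluded unless it is listed.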

## 6: Callbacks
We can also define callback functions which can allow for flexible control of either which phrases are matched in the text or which concepts are modified. The two callback arguments are `on_match` and `on_modifies`.

### `on_match`
This functionality is taken directly from spaCy's [rule-based matching](https://spacy.io/usage/rule-based-matching). When a match is found in the text, the `on_match` argument will run a callback function which can perform additional processing on the match, such as removing it if a certain condition is met.

For example, let's say that we want the phrase **"Positive"** to be used as a positive modifier, but this is sometimes ambiguously used in the context of mental health (e.g., **"positive thinking"**).

Let's write a word-sense disambiguation function which will remove a match if a mental health phrase is in the sentence. See the [spaCy documentation](https://spacy.io/usage/rule-based-matching) for more details:

In [77]:
def wsd_positive(matcher, doc, i, matches):
    (_, start, end) = matches[i]
    span = doc[start:end]
    
    # Check if words related to mental health are in the sentence
    sent = span.sent
    for mh_phrase in ["resilience", "mental health", "therapy", "therapist", "feedback"]:
        if mh_phrase in sent.text.lower():
            print("Removing", span)
            matches.pop(i)
            return
    
    # If not, keep the match
    print("Keeping", span)

In [78]:
texts = [
    "Therapist encouraging him to be positive during COVID-19 pandemic.",
    "Positive for COVID-19.",
]

In [79]:
docs = list(nlp.pipe(texts))

In [80]:
docs[0].ents = (Span(docs[0], 7, 8, "CONDITION"),)
docs[1].ents = (Span(docs[1], 2, 3, "CONDITION"),)

In [81]:
context = ConTextComponent(nlp, rules=None)

In [82]:
rule = ConTextRule("positive", "POSITIVE_EXISTENCE", on_match=wsd_positive)

In [83]:
context.add([rule])

In [84]:
for doc in docs:
    context(doc)

Removing positive
Keeping Positive


In [85]:
visualize_dep(docs[0])

In [86]:
visualize_dep(docs[1])

### `on_modifies`
A modifier will usually modify a target concept so long as it is within the scope. `on_modifies` runs a one-time check between the modifier and a potential target concept to decide whether or not it will modify that concept.

The function passed in for this argument should take these 3 arguments:
- `target`: The entity Span
- `modifier`: The modifying Span
- `span_between`: The Span in between the target and modifier 

And it must return either `True`, which will cause the modifier to apply to the target, or `False`.

For example, in the sentence below, "No evidence of" may incorrectly modify both "pneumonia" and "COVID". The word "post" suggests that COVID is not negated, so we can use a callback that checks for it to prevent our modifier from applying to "COVID".

In [87]:
text = "No evidence of pneumonia post COVID."

In [88]:
doc = nlp(text)
doc.ents = (Span(doc, 3, 4, "CONDITION"), Span(doc, 5, 6, "CONDITION"), )

In [89]:
context = ConTextComponent(nlp, rules=None)

In [90]:
def post_in_span_between(target, modifier, span_between):
    print("Evaluating whether {0} will modify {1}".format(modifier, target))
    if "post" in span_between.text.lower():
        print("Will not modify")
        print()
        return False
    print("Will modify")
    print()
    return True

In [91]:
rule = ConTextRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD", on_modifies=post_in_span_between)

In [92]:
context.add([rule])

In [93]:
context(doc)

Evaluating whether No evidence of will modify pneumonia
Will modify

Evaluating whether No evidence of will modify COVID
Will not modify



No evidence of pneumonia post COVID.

In [94]:
visualize_dep(doc)
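
Because an `on_modifies` callback is an ordinary function of three arguments, it can be tried out on its own before being attached to a rule. The sketch below uses a hypothetical callback, `no_semicolon_between`, and `SimpleNamespace` objects as stand-ins for spaCy Spans; it relies only on the `.text` attribute.

```python
from types import SimpleNamespace


def no_semicolon_between(target, modifier, span_between):
    """Hypothetical on_modifies callback: refuse to modify a target across a semicolon."""
    return ";" not in span_between.text


# Stand-ins for Span objects; only the .text attribute is used by the callback
modifier = SimpleNamespace(text="no evidence of")
near_target = SimpleNamespace(text="pneumonia")
far_target = SimpleNamespace(text="CHF")

adjacent = no_semicolon_between(near_target, modifier, SimpleNamespace(text=""))
across = no_semicolon_between(far_target, modifier, SimpleNamespace(text="pneumonia ; known"))
print(adjacent)  # True
print(across)    # False
```

Keeping the callback free of side effects, unlike the printing version above, makes it easier to check in isolation.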

# Setting additional Span attributes
As seen in an earlier notebook, ConText registers several attributes for target Spans, such as `is_negated` and `is_historical`. These attributes default to False and are set to True if a target is modified by certain modifiers. This logic is defined in the variable `DEFAULT_ATTRS`, a dictionary which maps modifier category names to the attribute name/value pairs which should be set if a target is modified by that modifier type.

In [95]:
from medspacy.context import DEFAULT_ATTRS

In [96]:
DEFAULT_ATTRS

{'NEGATED_EXISTENCE': {'is_negated': True},
 'POSSIBLE_EXISTENCE': {'is_uncertain': True},
 'HISTORICAL': {'is_historical': True},
 'HYPOTHETICAL': {'is_hypothetical': True},
 'FAMILY': {'is_family': True}}

## Defining custom attributes
Rather than using the logic shown above, you can set your own attributes by creating a dictionary with the same structure as DEFAULT_ATTRS and passing that in as the `add_attrs` parameter. If setting your own extensions, you must first call `Span.set_extension` on each of the extensions. 

If more complex logic is required, custom attributes can also be set manually outside of the ConTextComponent, for example as a post-processing step.

Below, we'll create our own attribute mapping and have it override the default ConText attributes. We'll define `is_experienced` and `is_family_history`. Because both a negated concept and a family history concept are not actually experienced by a patient, we'll specify both to set `is_experienced` to False. We'll also set the family history modifier to add a new attribute called `is_family_history`.

In [97]:
from spacy.tokens import Span

In [98]:
# Define modifiers and Span attributes
custom_attrs = {
    'NEGATED_EXISTENCE': {'is_experienced': False},
    'FAMILY_HISTORY': {'is_family_history': True,
                      'is_experienced': False},
}

In [99]:
# Register extensions - is_experienced should be True by default, `is_family_history` False
Span.set_extension("is_experienced", default=True)
Span.set_extension("is_family_history", default=False)

In [100]:
context = ConTextComponent(nlp, rules=None, add_attrs=custom_attrs)
context.context_attributes_mapping

{'NEGATED_EXISTENCE': {'is_experienced': False},
 'FAMILY_HISTORY': {'is_family_history': True, 'is_experienced': False}}

In [101]:
rules = [
    ConTextRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD"),
    ConTextRule("family history", "FAMILY_HISTORY", "FORWARD"),
]

context.add(rules)

In [102]:
doc = nlp("There is no evidence of pneumonia. Family history of diabetes.")

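# Entities need a label, otherwise doc.ents stays empty
doc.ents = (Span(doc, 5, 6, "CONDITION"), Span(doc, len(doc) - 2, len(doc) - 1, "CONDITION"))

doc.ents

(pneumonia, diabetes)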

In [103]:
context(doc)

There is no evidence of pneumonia. Family history of diabetes.

The new attributes are now available in `ent._`:

In [104]:
for ent in doc.ents:
    print(ent)
    print("is_experienced: ", ent._.is_experienced)
    print("is_family_history: ", ent._.is_family_history)
    print()
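
One way to think about the attribute mapping is as a lookup applied once per modifier category. The sketch below models that idea with plain dictionaries. It is an illustration under assumptions rather than the component's actual code: `apply_attrs` is a hypothetical helper, and the defaults mirror the extensions registered above.

```python
custom_attrs = {
    "NEGATED_EXISTENCE": {"is_experienced": False},
    "FAMILY_HISTORY": {"is_family_history": True, "is_experienced": False},
}

# Defaults mirror the Span extensions registered earlier in this section
defaults = {"is_experienced": True, "is_family_history": False}


def apply_attrs(defaults, modifier_categories, mapping):
    """Start from the defaults and apply the attribute updates for each modifier category."""
    attrs = dict(defaults)
    for category in modifier_categories:
        attrs.update(mapping.get(category, {}))
    return attrs


negated = apply_attrs(defaults, ["NEGATED_EXISTENCE"], custom_attrs)
family = apply_attrs(defaults, ["FAMILY_HISTORY"], custom_attrs)
unmodified = apply_attrs(defaults, [], custom_attrs)

print(negated)     # {'is_experienced': False, 'is_family_history': False}
print(family)      # {'is_experienced': False, 'is_family_history': True}
print(unmodified)  # {'is_experienced': True, 'is_family_history': False}
```

Categories missing from the mapping leave the attributes untouched in this model, which is why an unmodified target keeps its defaults.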