In [1]:
import sys
sys.path.insert(0, "../..")

# Overview

medspaCy comes with a default knowledge base of ConText rules, which is loaded automatically. While these rules cover a large number of use cases, users will often want to customize or extend the modifiers included in a knowledge base. Users can define their own modifiers and control their behavior through the `ConTextRule` class.

In this notebook, we'll dive deeper into the `ConTextRule` and `ConTextModifier` classes and show how to use them to add and customize rules.

In [2]:
import medspacy

from medspacy.context import ConTextRule, ConTextComponent

from medspacy.visualization import visualize_dep, visualize_ent

In [3]:
nlp = medspacy.load()

# Modifiers
## ConTextRule
The knowledge base of ConText is defined by `ConTextRule` objects. A ConTextRule is instantiated with a number of parameters which allow for defining and controlling modifier behavior, but we'll focus on these four primary arguments:

- **literal** (str): The actual string of a concept. If pattern is None,
    this string will be lower-cased and matched against the lower-cased text.
- **category** (str): The semantic class of the item.
- **pattern** (list or None): A spaCy pattern to match using token attributes.
    See https://spacy.io/usage/rule-based-matching.
- **direction** (str): The directionality or action of a modifier.
    One of ("FORWARD", "BACKWARD", "BIDIRECTIONAL", "TERMINATE", or "PSEUDO").

## ConTextModifier

When a ConTextRule is matched to a string of text, it generates a `ConTextModifier` which is stored in `doc._.context_graph.modifiers`. If it modifies any targets, these relationships can be found as tuples in `doc._.context_graph.edges`. The ConTextModifier also contains a reference to the original ConTextRule.

In addition to the attributes of the original `ConTextRule` such as **literal** and **category**, a ConTextModifier contains the following attributes:
- **span**: The spaCy Span of the matched text
- **scope**: The spaCy Span of the Doc which is within the ConTextModifier's scope. Any targets in this scope will be modified by the ConTextModifier
- **start**: Start index
- **end**: End index (non-inclusive)

## 1: Default Rules
When you instantiate `ConTextComponent`, a default list of `ConTextRule`s is loaded and stored in the `context.rules` attribute.

In [4]:
context = ConTextComponent(nlp, rules="default")

In [5]:
context.rules[:5]

[ConTextRule(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD'),
 ConTextRule(literal='adequate to rule out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': {'IN': ['him', 'her', 'them', 'patient', 'pt']}, 'OP': '?'}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD'),
 ConTextRule(literal='adequate to rule the patient out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': 'the'}, {'LOWER': {'IN': ['patient', 'pt']}}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD'),
 ConTextRule(literal='any other', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD'),
 ConTextRule(literal='apart from', category='NEGATED_EXISTENCE', pattern=[{'LOWER': 'apart'}, {'LOWER': {'IN': ['for', 'from']}}], direction='TE

In [6]:
print(context.rules[0])

ConTextRule(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')


In [7]:
print(type(context.rules[0]))

<class 'medspacy.context.context_rule.ConTextRule'>


In [8]:
len(context.rules)

97

We can also see the unique categories in the knowledge base by checking `context.categories`:

In [9]:
context.categories

{'FAMILY',
 'HISTORICAL',
 'HYPOTHETICAL',
 'NEGATED_EXISTENCE',
 'POSSIBLE_EXISTENCE'}

## 2: Basic Usage
Here, we'll load a blank context component and define our own rules. We'll use an example we've seen earlier, where we need to negate **"pneumonia"**:

In [10]:
doc = nlp("There is no evidence of pneumonia.")


In [11]:
from spacy.tokens import Span

In [12]:
doc.ents = (Span(doc, 5, 6, "CONDITION"),)

First, we instantiate context and pass in `rules=None` so that we have an empty knowledge base:

In [13]:
context = ConTextComponent(nlp, rules=None)

In [14]:
context.rules

[]

Next, we'll define a ConTextRule with the following arguments:
- `literal=`**"no evidence of"**: This is the string of text which ConText will look for in the text (case insensitive)
- `category=`**"NEGATED_EXISTENCE"**: The semantic class assigned to our modifier
- `direction=`**"FORWARD"**: This defines the *directionality* of the rule. A later section covers this argument in more detail

We'll leave the other arguments blank. Next, we instantiate our ConTextRule as `rule` and put it in a list called `rules`.

In [15]:
rule = ConTextRule(literal="no evidence of", category="NEGATED_EXISTENCE", direction="FORWARD")
rules = [rule]

We then add the modifiers to ConText with the `context.add()` method:

In [16]:
context.add(rules)

In [17]:
context.rules

[ConTextRule(literal='no evidence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')]

Now we can call context on our doc. This will typically happen under the hood as part of the nlp pipeline, but you can call it manually on a doc as well:

In [18]:
context(doc)

There is no evidence of pneumonia.

We can see if any modifiers were created by context by looking at the `doc._.context_graph` attribute, which stores all of the information generated on a doc by context. `modifiers` stores the `ConTextModifier` objects created by context, and `edges` stores the relationships between the modifiers and targets. Here, the custom rule that we created matches one modifier, and because **"pneumonia"** was already added to `doc.ents`, there is one edge linking that target to the modifier.

In [19]:
print(doc._.context_graph)
print(doc._.context_graph.modifiers)
print(doc._.context_graph.edges)
print(doc.ents)

<ConTextGraph> with 1 targets and 1 modifiers
[<ConTextModifier> [no evidence of, NEGATED_EXISTENCE]]
[(pneumonia, <ConTextModifier> [no evidence of, NEGATED_EXISTENCE])]
(pneumonia,)
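
Since each edge is a `(target, modifier)` tuple, a common next step is to group the modifier categories by target. The sketch below uses plain strings as stand-ins for the spaCy `Span` and `ConTextModifier` objects, so the example data is hypothetical; with a real doc you would loop over `doc._.context_graph.edges` and read `modifier.category` instead.

```python
from collections import defaultdict

# Hypothetical stand-ins for (target Span, ConTextModifier) pairs:
# each tuple is (target text, modifier category).
edges = [
    ("pneumonia", "NEGATED_EXISTENCE"),
    ("pneumonia", "HISTORICAL"),
    ("CHF", "POSSIBLE_EXISTENCE"),
]

# Collect every modifier category attached to each target.
categories_by_target = defaultdict(list)
for target, category in edges:
    categories_by_target[target].append(category)

for target, categories in categories_by_target.items():
    print(target, categories)
```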


In [20]:
visualize_dep(doc)

Each element of `context_graph.modifiers` is a `ConTextModifier`. Let's look at the modifier in this doc and see some of the attributes which are available:

In [21]:
modifier = doc._.context_graph.modifiers[0]

In [22]:
type(modifier)

medspacy.context.context_modifier.ConTextModifier

`modifier.span` is the spaCy Span of the Doc which was matched, and has a `start` and `end` index:

In [23]:
print(modifier.span)
print(modifier.start, modifier.end)

no evidence of
2 5


`modifier.scope` shows what part of the sentence could be modified by the modifier. Any targets in this span of text will be modified:

In [24]:
print(modifier.scope)

pneumonia.


We can also see the original `ConTextRule` object and attributes:

In [25]:
print(modifier.category, ",", modifier.direction)

NEGATED_EXISTENCE , FORWARD


In [26]:
# The reference to the original ConTextRule
print(modifier.rule)
assert modifier.rule is rules[0]

ConTextRule(literal='no evidence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')


## 3: Pattern-matching
In this example, we'll use a matching pattern to define more flexible matching criteria, so that a single ConTextRule can match multiple texts. If only `literal` is supplied, the exact phrase is matched in lower case. spaCy offers powerful rule-based matching which operates on each token in a Doc. Matching patterns can use the text, regular expression patterns, linguistic attributes such as part of speech, and operators such as **"?"** (0 or 1) or **"*"** (0 or more) to match sequences of text.

For more detailed information, see spaCy's documentation on rule-based matching: https://spacy.io/usage/rule-based-matching.

The ConTextRule below has the same literal, category, and direction as our previous example, but it also includes a pattern which allows the tokens "evidence" and "of" to be optional. This will then match both "no evidence of" and "no" and assign both spans of text to be negation modifiers.

In [27]:
rules = [
    ConTextRule(
        literal="no evidence of",
        category="NEGATED_EXISTENCE",
        direction="FORWARD",
        pattern=[
            {"LOWER": "no"},
            {"LOWER": "evidence", "OP": "?"},
            {"LOWER": "of", "OP": "?"},
        ],
    )
]

In [28]:
context = ConTextComponent(nlp)
context.add(rules)

In [29]:
texts = ["THERE IS NO EVIDENCE OF PNEUMONIA.",
        "There is no CHF."]

In [30]:
docs = list(nlp.pipe(texts))


In [31]:
# Add entities
docs[0].ents = (Span(docs[0], 5, 6, "CONDITION"),)
docs[1].ents = (Span(docs[1], 3, 4, "CONDITION"),)

In [32]:
for doc in docs:
    context(doc)

In [33]:
print(docs[0]._.context_graph.modifiers)
visualize_dep(docs[0])

[<ConTextModifier> [NO EVIDENCE OF, NEGATED_EXISTENCE]]


In [34]:
print(docs[1]._.context_graph.modifiers)
visualize_dep(docs[1])

[<ConTextModifier> [no, NEGATED_EXISTENCE]]
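
To make the effect of the `?` operator concrete, here is a minimal pure-Python sketch of how a pattern with optional tokens can match spans of different lengths. It is not spaCy's matcher: it only understands the `LOWER` attribute and the `?` operator, it uses whitespace-split tokens, and the helper names `match_at` and `find_first` are hypothetical.

```python
def match_at(tokens, pattern, i=0, j=0):
    """Return the end index of the longest match of pattern[j:] at tokens[i], or None."""
    if j == len(pattern):
        return i
    spec = pattern[j]
    # Greedy: try to consume a token first.
    if i < len(tokens) and tokens[i].lower() == spec["LOWER"]:
        end = match_at(tokens, pattern, i + 1, j + 1)
        if end is not None:
            return end
    # An optional ("?") element may also be skipped entirely.
    if spec.get("OP") == "?":
        return match_at(tokens, pattern, i, j + 1)
    return None


def find_first(tokens, pattern):
    """Return (start, end) of the first non-empty match, or None."""
    for start in range(len(tokens)):
        end = match_at(tokens, pattern, start)
        if end is not None and end > start:
            return start, end
    return None


pattern = [
    {"LOWER": "no"},
    {"LOWER": "evidence", "OP": "?"},
    {"LOWER": "of", "OP": "?"},
]

long_tokens = "THERE IS NO EVIDENCE OF PNEUMONIA .".split()
short_tokens = "There is no CHF .".split()

long_span = find_first(long_tokens, pattern)
short_span = find_first(short_tokens, pattern)
print(long_tokens[long_span[0]:long_span[1]])
print(short_tokens[short_span[0]:short_span[1]])
```

Because comparison happens on lower-cased tokens, the upper-case sentence matches the same pattern as the lower-case one.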


You can also use a regular expression as the `pattern` argument, although this isn't recommended: spaCy doesn't natively support regular expression matching over the text of a Doc, so the match may result in unexpected spans:

In [35]:
rules = [
    ConTextRule("no known history of", "HISTORICAL", pattern=r"no known (hx|history)"),
]

In [36]:
context.add(rules)



In [37]:
doc = nlp("There is no known hx of pneumonia.")
# "pneumonia" is token 6 in this sentence; build the Span from this doc, not docs[0]
doc.ents = (Span(doc, 6, 7, "CONDITION"),)


In [38]:
context(doc)

There is no known hx of pneumonia.

In [39]:
visualize_dep(doc)
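
One way to see why regular expressions can produce unexpected spans is that a regex matches characters, while spans are made of whole tokens. The sketch below uses only the standard library: whitespace splitting stands in for a real tokenizer, and the pattern is a hypothetical one chosen so that the match stops in the middle of a word. How a given library resolves such a match into a token span is not shown here.

```python
import re

text = "There is no known history of pneumonia."

# Whitespace tokenization is a stand-in for a real tokenizer (assumption).
token_spans = [m.span() for m in re.finditer(r"\S+", text)]
token_starts = {start for start, _ in token_spans}
token_ends = {end for _, end in token_spans}

# Hypothetical pattern that can stop partway through the word "history".
match = re.search(r"no known (hx|hist)", text, flags=re.IGNORECASE)

# A character match maps cleanly onto tokens only if both ends sit on boundaries.
starts_on_boundary = match.start() in token_starts
ends_on_boundary = match.end() in token_ends
print(repr(match.group()), starts_on_boundary, ends_on_boundary)
```

Here the match begins at a token boundary but ends inside a token, so any token span built from it has to either drop or extend past the matched characters.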

## 4: `direction` argument
The `direction` attribute defines which direction modifiers should operate in the sentence. You can imagine an arrow starting at the modifier in a phrase and moving *towards* the target. We'll show some examples below.

This argument can take one of 5 values. In this notebook, we'll explain the primary 3:
- **"FORWARD"**: The arrow will move **forward** in the sentence, and all targets in the sentence *after* the ConTextModifier will be modified.
- **"BACKWARD"**: The arrow will move **backward** in the sentence, and all targets *before* the ConTextModifier will be modified.
- **"BIDIRECTIONAL"**: It will look **both ahead and behind** (this is the default).

The additional values, **"TERMINATE"** and **"PSEUDO"**, will be explained in the next notebook.

## Modifier Scope

The scope of a modifier is bounded to be within the same sentence, so no modifier will affect targets in other sentences. This can be problematic in poorly split documents, but it prevents all targets in a document from being incorrectly modified by a single modifier. A scope is also bounded by any termination points, which are covered in the next notebook.
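
The relationship between direction and scope can be sketched in a few lines of plain Python. This is a simplified illustration, not medspaCy's implementation: the function name `compute_scope` is hypothetical, offsets are token indices with a non-inclusive end, and termination points are ignored.

```python
def compute_scope(mod_start, mod_end, sent_start, sent_end, direction):
    """Return (start, end) token offsets of a modifier's scope within its sentence."""
    direction = direction.upper()
    if direction == "FORWARD":
        # Everything after the modifier, up to the end of the sentence.
        return (mod_end, sent_end)
    if direction == "BACKWARD":
        # Everything from the start of the sentence up to the modifier.
        return (sent_start, mod_start)
    if direction == "BIDIRECTIONAL":
        # Simplification: the whole sentence, on both sides of the modifier.
        return (sent_start, sent_end)
    raise ValueError(f"Unsupported direction in this sketch: {direction}")


# "no evidence of" covers tokens 2-5 of this whitespace-split sentence.
tokens = "There is no evidence of pneumonia .".split()
forward_scope = compute_scope(2, 5, 0, len(tokens), "FORWARD")
print(tokens[forward_scope[0]:forward_scope[1]])

# "is ruled out" covers tokens 1-4, so only "PE" falls in the backward scope.
tokens_2 = "PE is ruled out".split()
backward_scope = compute_scope(1, 4, 0, len(tokens_2), "BACKWARD")
print(tokens_2[backward_scope[0]:backward_scope[1]])
```

Clamping the scope to the sentence offsets is what keeps a modifier from reaching targets in neighboring sentences.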

In [40]:
item_data = [
    ConTextRule("no evidence of", "NEGATED_EXISTENCE", "FORWARD"),
    ConTextRule("is ruled out", "NEGATED_EXISTENCE", "BACKWARD"),
    ConTextRule("unlikely", "POSSIBLE_EXISTENCE", "BIDIRECTIONAL"),
]

In [41]:
texts = ["No evidence of pneumonia.",
        "PE is ruled out",
        "unlikely to be malignant", 
        "malignancy unlikely",]

In [42]:
docs = list(nlp.pipe(texts))


In [43]:
# Add entities
docs[0].ents = (Span(docs[0], 3, 4, "CONDITION"),)
docs[1].ents = (Span(docs[1], 0, 1, "CONDITION"),)
docs[2].ents = (Span(docs[2], 3, 4, "CONDITION"),)
docs[3].ents = (Span(docs[3], 0, 1, "CONDITION"),)

In [44]:
context = ConTextComponent(nlp, rules=None)
context.add(item_data)

In [45]:
for doc in docs:
    context(doc)
    modifier = doc._.context_graph.modifiers[0]
    print(modifier, modifier.direction)
    visualize_dep(doc)
    print()

<ConTextModifier> [No evidence of, NEGATED_EXISTENCE] FORWARD


<ConTextModifier> [is ruled out, NEGATED_EXISTENCE] BACKWARD


<ConTextModifier> [unlikely, POSSIBLE_EXISTENCE] BIDIRECTIONAL


<ConTextModifier> [unlikely, POSSIBLE_EXISTENCE] BIDIRECTIONAL



# Reading and Writing a Knowledge Base
ConTextRules can be saved as JSON and read in, which allows a knowledge base to be reused and scaled. When you install medspaCy with pip or `python setup.py install`, it includes a JSON file of default modifier rules.

The filepath on your local machine can be accessed in the constant `DEFAULT_RULES_FILEPATH`. Let's look at the first 500 characters of this file:

In [46]:
from medspacy.context import DEFAULT_RULES_FILEPATH

In [47]:
with open(DEFAULT_RULES_FILEPATH) as f:
    print(f.read()[:500])

{
  "context_rules": [
    {
      "category": "NEGATED_EXISTENCE",
      "literal": "absence of",
      "pattern": null,
      "direction": "FORWARD"
    },
    {
      "category": "NEGATED_EXISTENCE",
      "literal": "adequate to rule out",
      "pattern": [
        {
          "LOWER": {
            "IN": ["adequate", "sufficient"]
          }
        },
        {
          "LOWER": "to"
        },
        {
          "LOWER": "rule"
        },
        {
          "LOWER": {
            "IN


A JSON file of rules can be loaded with the `ConTextRule.from_json` method:

In [48]:
item_data = ConTextRule.from_json(DEFAULT_RULES_FILEPATH)

In [49]:
for item in item_data[:5]:
    print(item)

ConTextRule(literal='absence of', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')
ConTextRule(literal='adequate to rule out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': {'IN': ['him', 'her', 'them', 'patient', 'pt']}, 'OP': '?'}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD')
ConTextRule(literal='adequate to rule the patient out', category='NEGATED_EXISTENCE', pattern=[{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': 'the'}, {'LOWER': {'IN': ['patient', 'pt']}}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}], direction='FORWARD')
ConTextRule(literal='any other', category='NEGATED_EXISTENCE', pattern=None, direction='FORWARD')
ConTextRule(literal='apart from', category='NEGATED_EXISTENCE', pattern=[{'LOWER': 'apart'}, {'LOWER': {'IN': ['for', 'from']}}], direction='TERMINATE')

The rules can also be saved as JSON by using the `ConTextRule.to_json` method:

In [50]:
ConTextRule.to_json(item_data[:2], "2_modifiers.json")

In [51]:
import json
with open("2_modifiers.json") as f:
    print(json.load(f))

{'context_rules': [{'direction': 'FORWARD', 'literal': 'absence of', 'category': 'NEGATED_EXISTENCE'}, {'direction': 'FORWARD', 'literal': 'adequate to rule out', 'category': 'NEGATED_EXISTENCE', 'pattern': [{'LOWER': {'IN': ['adequate', 'sufficient']}}, {'LOWER': 'to'}, {'LOWER': 'rule'}, {'LOWER': {'IN': ['him', 'her', 'them', 'patient', 'pt']}, 'OP': '?'}, {'LOWER': 'out'}, {'LOWER': {'IN': ['against', 'for']}, 'OP': '?'}]}]}
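
If you prefer to author a knowledge base by hand, the file layout shown above can be produced with the standard `json` module. The sketch below writes two rules under the `"context_rules"` key and reads them back; the keys mirror the default rules file excerpt, while the file name `my_rules.json` and the temporary directory are hypothetical choices. A file in this layout should then be loadable with `ConTextRule.from_json` as described above.

```python
import json
import os
import tempfile

# Rules written in the same layout as the default rules file.
rules = {
    "context_rules": [
        {
            "category": "NEGATED_EXISTENCE",
            "literal": "no evidence of",
            "pattern": None,
            "direction": "FORWARD",
        },
        {
            "category": "NEGATED_EXISTENCE",
            "literal": "is ruled out",
            "pattern": None,
            "direction": "BACKWARD",
        },
    ]
}

# Hypothetical output location for this example.
path = os.path.join(tempfile.mkdtemp(), "my_rules.json")
with open(path, "w") as f:
    json.dump(rules, f, indent=2)

# Read the file back to confirm it round-trips unchanged.
with open(path) as f:
    loaded = json.load(f)

print([rule["literal"] for rule in loaded["context_rules"]])
```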


# Next Steps
The next notebook will show more complex examples of controlling modifier behavior.