In [None]:
import sys
sys.path.insert(0, "../..")

# Overview

medspaCy ships with a default knowledge base that is loaded automatically. While these rules cover a large number of use cases, users will often want to customize or extend the modifiers in a knowledge base. Users can define their own modifiers and control their behavior through the `ConTextItem` class.

In this notebook, we'll dive deeper into the `ConTextItem` and `TagObject` classes and show how to use them to add and customize rules.

In [None]:
import medspacy

from medspacy.context import ConTextItem, ConTextComponent

from medspacy.visualization import visualize_dep, visualize_ent

In [None]:
nlp = medspacy.load(enable=["sentencizer"])

# Modifiers
## ConTextItem
The ConText knowledge base is defined by ConTextItem objects. A ConTextItem is instantiated with a number of parameters that define and control modifier behavior, but we'll focus on these four primary arguments:

- **literal** (str): The actual string of a concept. If pattern is None,
    this string will be lower-cased and matched against the lower-cased text.
- **category** (str): The semantic class of the item.
- **pattern** (list or None): A spaCy pattern to match using token attributes.
    See https://spacy.io/usage/rule-based-matching.
- **rule** (str): The directionality or action of a modifier.
    One of ("forward", "backward", "bidirectional", or "terminate").
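
To make the `literal` behavior concrete, here is a minimal pure-Python sketch of case-insensitive phrase matching over tokens. It is an illustration only, not medspaCy's implementation: the helper `find_literal` and the pre-split token list are assumptions made for this example.

```python
def find_literal(tokens, literal):
    """Return (start, end) token spans where `literal` occurs, ignoring case.

    `end` is non-inclusive, mirroring the convention used by spaCy spans.
    NOTE: hypothetical helper for illustration, not part of medspaCy.
    """
    target = literal.lower().split()
    lowered = [tok.lower() for tok in tokens]
    spans = []
    for start in range(len(lowered) - len(target) + 1):
        end = start + len(target)
        if lowered[start:end] == target:
            spans.append((start, end))
    return spans


tokens = ["There", "is", "NO", "Evidence", "of", "pneumonia", "."]
print(find_literal(tokens, "no evidence of"))  # [(2, 5)]
```

The returned pair corresponds to the `start` and `end` indices described for `TagObject` below.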

## TagObject

When a ConTextItem is matched to a string of text, it generates a `TagObject` which is stored in `doc._.context_graph.modifiers`. If it modifies any targets, these relationships can be found as tuples in `doc._.context_graph.edges`. The TagObject also contains a reference to the original ConTextItem.

In addition to the attributes of the original ConTextItem such as **literal** and **category**, a TagObject contains the following attributes:
- **span**: The spaCy Span of the matched text
- **scope**: The spaCy Span of the Doc which is within the TagObject's scope. Any targets in this scope will be modified by the TagObject
- **start**: Start index
- **end**: End index (non-inclusive)

## 1. Default Rules
When you instantiate `ConTextComponent`, a default list of `ConTextItem`s is loaded and included in the `context.item_data` attribute.

In [None]:
context = ConTextComponent(nlp, rules="default")

In [None]:
context.item_data[:5]

In [None]:
print(context.item_data[0])

In [None]:
print(type(context.item_data[0]))

In [None]:
len(context.item_data)

We can also see the unique categories in the knowledge base by checking `context.categories`:

In [None]:
context.categories

## 2: Basic Usage
Here, we'll load a blank context component and define our own item data. We'll use an example we've seen earlier, where we need to negate **"pneumonia"**:

In [None]:
doc = nlp("There is no evidence of pneumonia.")

In [None]:
from spacy.tokens import Span

In [None]:
doc.ents = (Span(doc, 5, 6, "CONDITION"),)

First, we instantiate context and pass in `rules=None` so that we have an empty knowledge base:

In [None]:
context = ConTextComponent(nlp, rules=None)

In [None]:
context.item_data

Next, we'll define a ConTextItem with following arguments:
- `literal=`**"no evidence of"**: This is the string of text which ConText will look for in the text (case insensitive)
- `category=`**"NEGATED_EXISTENCE"**: The semantic class assigned to our modifier
- `rule=`**"forward"**: This defines the *directionality* of the rule. A later section covers the other options

We'll leave the other arguments blank. Next, we instantiate our ConTextItem as `item` and put it in a list called `item_data`.

In [None]:
item = ConTextItem(literal="no evidence of", category="NEGATED_EXISTENCE", rule="FORWARD")
item_data = [item]

We then add the modifiers to ConText with the `context.add()` method:

In [None]:
context.add(item_data)

In [None]:
context.item_data

Now we can call context on our doc. This will typically happen under the hood as part of the nlp pipeline, but you can call it manually on a doc as well:

In [None]:
context(doc)

We can see if any modifiers were created by context by looking at the `doc._.context_graph` attribute, which stores all of the information generated on a doc by context. `modifiers` stores the `TagObject`s created by context, and `edges` stores the relationships between the modifiers and targets. Here, we match a modifier with the custom `item_data` that we created. Because we already set "pneumonia" as a target in `doc.ents`, `edges` should contain the relationship between that target and the modifier. If `doc.ents` were empty, the modifier would still be matched but there would be no edges.

In [None]:
print(doc._.context_graph)
print(doc._.context_graph.modifiers)
print(doc._.context_graph.edges)
print(doc.ents)

In [None]:
visualize_dep(doc)

Each element of `context_graph.modifiers` is a `TagObject`. Let's look at the tag object in this doc and see some of the attributes which are available:

In [None]:
tag_object = doc._.context_graph.modifiers[0]

`tag_object.span` is the spaCy Span of the Doc which was matched, and has a `start` and `end` index:

In [None]:
print(tag_object.span)
print(tag_object.start, tag_object.end)

`tag_object.scope` shows what part of the sentence could be modified by the modifier. Any targets in this span of text will be modified:

In [None]:
print(tag_object.scope)

We can also see the original `ConTextItem` object and attributes:

In [None]:
print(tag_object.category, ",", tag_object.rule)

In [None]:
# The reference to the original ConTextItem
print(tag_object.context_item)
assert tag_object.context_item is item_data[0]

## 3: Pattern-matching
In this example, we'll use a matching pattern to define more flexible matching criteria, so that a single ConTextItem can match multiple texts. If only `literal` is supplied, the exact phrase is matched in lower case. spaCy offers powerful rule-based matching which operates on each token in a Doc. Matching patterns can use the text, regular expression patterns, linguistic attributes such as part of speech, and operators such as **"?"** (0 or 1) or **"*"** (0 or more) to match sequences of text.

For more detailed information, see spaCy's documentation on rule-based matching: https://spacy.io/usage/rule-based-matching.

The ConTextItem below has the same literal, category, and rule as our previous example, but it also includes a pattern which allows the tokens "evidence" and "of" to be optional. This will match both "no evidence of" and "no", and assign both spans of text to be negation modifiers.

In [None]:
item_data = [ConTextItem(literal="no evidence of", 
                         category="NEGATED_EXISTENCE", 
                         rule="forward", 
                         pattern=[{"LOWER": "no"}, 
                                  {"LOWER": "evidence", "OP": "?"},
                                  {"LOWER": "of", "OP": "?"},
                                 ]
                        )]
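
The effect of the optional tokens in the pattern above can be illustrated with a small pure-Python matcher. This is only a sketch under simplifying assumptions: it handles just the `"LOWER"` attribute and the `"?"` operator, works on whitespace-split tokens, and the helper `longest_match` is hypothetical. spaCy's real matcher supports many more attributes and operators.

```python
def longest_match(tokens, start, pattern):
    """Return the end index (non-inclusive) of the longest match of
    `pattern` beginning at token `start`, or None if nothing matches.

    NOTE: toy matcher for illustration, not spaCy's implementation.
    """
    def walk(tok_i, pat_i):
        if pat_i == len(pattern):
            return tok_i
        spec = pattern[pat_i]
        best = None
        # Option 1: consume the current token if it satisfies this spec.
        if tok_i < len(tokens) and tokens[tok_i].lower() == spec["LOWER"]:
            best = walk(tok_i + 1, pat_i + 1)
        # Option 2: an optional spec ("OP": "?") may be skipped entirely.
        if spec.get("OP") == "?":
            skipped = walk(tok_i, pat_i + 1)
            if skipped is not None and (best is None or skipped > best):
                best = skipped
        return best

    return walk(start, 0)


pattern = [
    {"LOWER": "no"},
    {"LOWER": "evidence", "OP": "?"},
    {"LOWER": "of", "OP": "?"},
]

for text in ("THERE IS NO EVIDENCE OF PNEUMONIA .", "There is no CHF ."):
    tokens = text.split()
    end = longest_match(tokens, 2, pattern)
    print(tokens[2:end])
# ['NO', 'EVIDENCE', 'OF']
# ['no']
```

Both sentences are matched by the same pattern: the first consumes all three tokens, while the second skips the two optional ones.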

In [None]:
context = ConTextComponent(nlp)
context.add(item_data)

In [None]:
texts = ["THERE IS NO EVIDENCE OF PNEUMONIA.",
        "There is no CHF."]

In [None]:
docs = list(nlp.pipe(texts))

In [None]:
# Add entities
docs[0].ents = (Span(docs[0], 5, 6, "CONDITION"),)
docs[1].ents = (Span(docs[1], 3, 4, "CONDITION"),)

In [None]:
for doc in docs:
    context(doc)

In [None]:
print(docs[0]._.context_graph.modifiers)
visualize_dep(docs[0])

In [None]:
print(docs[1]._.context_graph.modifiers)
visualize_dep(docs[1])

You can also use regular expressions as the `pattern` argument, although this isn't recommended since spaCy doesn't natively support regular expression matching, and it may result in unexpected spans:

In [None]:
item_data = [
    ConTextItem("no known history of", "HISTORICAL", pattern=r"no known (hx|history)"),
]
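
One way to see why regular expressions can produce unexpected spans: a regex works on characters, while spaCy spans are made of whole tokens, so a hit may begin or end inside a token. The sketch below checks this with the standard `re` module. It assumes whitespace tokenization, and `regex_hits` is a hypothetical helper; it does not show how medspaCy itself resolves such matches.

```python
import re


def regex_hits(text, pattern):
    """Yield (matched_text, aligned) for each regex hit in `text`.

    `aligned` is True when the hit starts and ends on token boundaries,
    where tokens are simply whitespace-separated chunks (an assumption).
    """
    token_starts = {m.start() for m in re.finditer(r"\S+", text)}
    token_ends = {m.end() for m in re.finditer(r"\S+", text)}
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        aligned = m.start() in token_starts and m.end() in token_ends
        yield m.group(), aligned


text = "There is no known hx of pneumonia"
print(list(regex_hits(text, r"no known (hx|history)")))
# [('no known hx', True)]
print(list(regex_hits(text, r"no")))
# [('no', True), ('no', False)]  -> the second hit is inside "known"
```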

In [None]:
context.add(item_data)

In [None]:
doc = nlp("There is no known hx of pneumonia.")
# "pneumonia" is token 6 of this doc
doc.ents = (Span(doc, 6, 7, "CONDITION"),)

In [None]:
context(doc)

In [None]:
visualize_dep(doc)

## 4: `rule` argument
The `rule` argument defines the direction in which a modifier operates. You can imagine an arrow starting at the modifier in a phrase and moving *towards* the target.

This argument can take one of five values. In this notebook, we'll explain the three primary ones:
- **"FORWARD"**: The arrow will move **forward** in the sentence, so all targets in the sentence *after* the TagObject will be modified.
- **"BACKWARD"**: The arrow will move **backward** in the sentence, so all targets *before* the TagObject will be modified.
- **"BIDIRECTIONAL"**: The modifier will look **both ahead and behind** (this is the default).

The additional values, **"TERMINATE"** and **"PSEUDO"**, will be explained in the next notebook.

## Modifier Scope

The scope of a modifier is bounded to be within the same sentence, so no modifier will affect targets in other sentences. This can be problematic in poorly split documents, but it prevents all targets in a document from being incorrectly modified by a ConText item. A scope is also defined by any termination points, which will be shown in the next example.
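
The interaction between direction and sentence boundaries can be sketched in a few lines of plain Python. This is a simplified model, not medspaCy's code: `compute_scope` and `in_scope` are hypothetical helpers, indices are token offsets with non-inclusive ends, and termination points are ignored.

```python
def compute_scope(sent_start, sent_end, mod_start, mod_end, rule):
    """Return the (start, end) token ranges a modifier can reach.

    The scope never leaves the sentence [sent_start, sent_end).
    NOTE: hypothetical helper; termination points are not modeled.
    """
    rule = rule.upper()
    before = (sent_start, mod_start)
    after = (mod_end, sent_end)
    if rule == "FORWARD":
        return [after]
    if rule == "BACKWARD":
        return [before]
    if rule == "BIDIRECTIONAL":
        return [before, after]
    raise ValueError(f"unsupported rule in this sketch: {rule}")


def in_scope(target, scope):
    """True if the (start, end) target lies fully inside any scope range."""
    return any(lo <= target[0] and target[1] <= hi for lo, hi in scope)


# "PE is ruled out . No evidence of pneumonia ."
# PE(0) is(1) ruled(2) out(3) .(4) No(5) evidence(6) of(7) pneumonia(8) .(9)
# The second sentence covers tokens 5..10; "No evidence of" is tokens 5..8.
scope = compute_scope(5, 10, 5, 8, "forward")
print(scope)                    # [(8, 10)]
print(in_scope((8, 9), scope))  # True: "pneumonia" follows the modifier
print(in_scope((0, 1), scope))  # False: "PE" is in a different sentence
```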

In [None]:
item_data = [
    ConTextItem("no evidence of", "NEGATED_EXISTENCE", "FORWARD"),
    ConTextItem("is ruled out", "NEGATED_EXISTENCE", "BACKWARD"),
    ConTextItem("unlikely", "POSSIBLE_EXISTENCE", "BIDIRECTIONAL"),
]

In [None]:
texts = ["No evidence of pneumonia.",
        "PE is ruled out",
        "unlikely to be malignant", 
        "malignancy unlikely",]

In [None]:
docs = list(nlp.pipe(texts))

In [None]:
# Add entities
docs[0].ents = (Span(docs[0], 3, 4, "CONDITION"),)
docs[1].ents = (Span(docs[1], 0, 1, "CONDITION"),)
docs[2].ents = (Span(docs[2], 3, 4, "CONDITION"),)
docs[3].ents = (Span(docs[3], 0, 1, "CONDITION"),)

In [None]:
context = ConTextComponent(nlp, rules=None)
context.add(item_data)

In [None]:
for doc in docs:
    context(doc)
    modifier = doc._.context_graph.modifiers[0]
    print(modifier, modifier.rule)
    visualize_dep(doc)
    print()

# Reading and Writing a Knowledge Base
ConTextItems can be saved as JSON and read back in, which allows a knowledge base to be reused and scaled. When you install `medspacy` with pip or `python setup.py install`, it includes a JSON file of default modifier rules.

The filepath on your local machine can be accessed in the constant `DEFAULT_RULES_FILEPATH`. Let's look at the first 500 characters of this file:

In [None]:
from medspacy.context import DEFAULT_RULES_FILEPATH

In [None]:
with open(DEFAULT_RULES_FILEPATH) as f:
    print(f.read()[:500])

A JSON file of item data can be loaded with the `ConTextItem.from_json` method:

In [None]:
item_data = ConTextItem.from_json(DEFAULT_RULES_FILEPATH)

In [None]:
for item in item_data[:5]:
    print(item)

The items can also be saved as JSON by using the `ConTextItem.to_json` method:

In [None]:
ConTextItem.to_json(item_data[:2], "2_modifiers.json")

In [None]:
import json
with open("2_modifiers.json") as f:
    print(json.load(f))
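
If you want to experiment with the save-and-reload idea without medspaCy, the same round trip can be sketched with the standard `json` module. The file layout used here (a top-level `"rules"` key holding plain dictionaries) and the file name are assumptions for illustration only; inspect the default rules file printed above for the schema medspaCy actually uses.

```python
import json
import os
import tempfile

# Hypothetical rule records: plain dicts, not ConTextItem objects.
rules = [
    {"literal": "no evidence of", "category": "NEGATED_EXISTENCE", "rule": "FORWARD"},
    {"literal": "is ruled out", "category": "NEGATED_EXISTENCE", "rule": "BACKWARD"},
]

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "my_modifiers.json")
    # Write the rules under an assumed top-level key.
    with open(path, "w") as f:
        json.dump({"rules": rules}, f, indent=2)
    # Read them back and confirm nothing was lost.
    with open(path) as f:
        loaded = json.load(f)["rules"]

print(loaded == rules)  # True
```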

# Next Steps
The next notebook will show more complex examples of controlling modifier behavior.