# Overview


In this notebook, we'll dive deeper into the `ConTextItem` and `TagObject` classes, which represent the modifiers in a document.

In [1]:
import spacy

from cycontext import ConTextItem, ConTextComponent

from cycontext.viz import visualize_dep, visualize_ent

In [2]:
nlp = spacy.load("en_core_web_sm")

# Modifiers
## ConTextItem
The knowledge base of cycontext is defined by ConTextItem objects. A ConTextItem is instantiated with the following parameters:

- **literal**: An exact phrase which represents a concept. If no pattern is provided, this phrase will be lower-cased and matched against the lower-cased clinical document
- **category**: The semantic class of a modifier, such as **"DEFINITE_NEGATED_EXISTENCE"** or **"HISTORICAL"**
- **rule**: The directionality of a modifier. Either **"FORWARD"**, **"BACKWARD"**, **"BIDIRECTIONAL"**, or **"TERMINATE"**
- **pattern** (opt): A list defining a spaCy matching rule
- **metadata** (opt): An optional dict of additional data to store with the ConTextItem, such as SNOMED codes or custom attributes.

## TagObject

When a ConTextItem is matched to a string of text, it generates a `TagObject` which is stored in `doc._.context_graph.modifiers`. If it modifies any targets, these relationships can be found as tuples in `doc._.context_graph.edges`. The TagObject also contains a reference to the original ConTextItem.

In addition to the attributes of the original ConTextItem such as **literal** and **category**, a TagObject contains the following attributes:
- **span**: The spaCy Span of the matched text
- **scope**: The spaCy Span of the Doc which is within the TagObject's scope. Any targets in this scope will be modified by the TagObject
- **start**: Start index
- **end**: End index (non-inclusive)
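
To make `start`, `end`, and `scope` concrete, here is a minimal plain-Python sketch that does not use cycontext or spaCy. It assumes pre-split tokens, and `find_literal` is a hypothetical helper written only for illustration; the library itself delegates matching to spaCy.

```python
from typing import List, Optional, Tuple


def find_literal(tokens: List[str], literal: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the first case-insensitive match.

    `end` is non-inclusive, mirroring the TagObject attributes listed above.
    Hypothetical helper for illustration only.
    """
    phrase = literal.lower().split()
    lowered = [t.lower() for t in tokens]
    for start in range(len(lowered) - len(phrase) + 1):
        if lowered[start:start + len(phrase)] == phrase:
            return start, start + len(phrase)
    return None


tokens = ["There", "is", "no", "evidence", "of", "pneumonia", "."]
match = find_literal(tokens, "no evidence of")
start, end = match
span = tokens[start:end]
# In this sketch, a FORWARD modifier's scope is everything after the match.
scope = tokens[end:]
print(start, end)  # 2 5
print(span)        # ['no', 'evidence', 'of']
print(scope)       # ['pneumonia', '.']
```

The indices line up with the values printed for `tag_object.start` and `tag_object.end` in Example 1 below.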

# Examples

## Example 1: Basic Usage
This example demonstrates the objects and attributes described above.

In [3]:
item_data = [ConTextItem(literal="no evidence of", category="DEFINITE_NEGATED_EXISTENCE", rule="forward")]

In [4]:
context = ConTextComponent(nlp)

context.add(item_data)

In [5]:
doc = nlp("There is no evidence of pneumonia.")

context(doc)

There is no evidence of pneumonia.

In [6]:
doc._.context_graph.modifiers

[<TagObject> [no evidence of, DEFINITE_NEGATED_EXISTENCE]]

In [7]:
tag_object = doc._.context_graph.modifiers[0]

In [8]:
print(tag_object.span)
print(tag_object.start, tag_object.end)

no evidence of
2 5


In [9]:
print(tag_object.scope)

pneumonia.


In [10]:
print(tag_object.category, ",", tag_object.rule)


DEFINITE_NEGATED_EXISTENCE , FORWARD


In [11]:
# The reference to the original ConTextItem
print(tag_object.context_item)
assert tag_object.context_item is item_data[0]

ConTextItem(literal='no evidence of', category='DEFINITE_NEGATED_EXISTENCE', pattern=None, rule='FORWARD')


## Example 2: Pattern-matching
In this example, we'll use a matching pattern to define more flexible matching criteria so that a single ConTextItem can match multiple texts. If only `literal` is supplied, the exact phrase is matched in lower case. spaCy offers powerful rule-based matching which operates on each token in a Doc. Matching patterns can use the text, regular expression patterns, linguistic attributes such as part of speech, and operators such as **"?"** (0 or 1) or **"*"** (0 or more) to match sequences of text.

For more detailed information, see spaCy's documentation on rule-based matching: https://spacy.io/usage/rule-based-matching.

The ConTextItem below has the same literal, category, and rule as our previous example, but it also includes a pattern which allows the tokens "evidence" and "of" to be optional. It will therefore match both "no evidence of" and "no", and assign both spans of text as negation modifiers.

In [12]:
item_data = [ConTextItem(literal="no evidence of", 
                         category="DEFINITE_NEGATED_EXISTENCE", 
                         rule="forward", 
                         pattern=[{"LOWER": "no"}, 
                                  {"LOWER": "evidence", "OP": "?"},
                                  {"LOWER": "of", "OP": "?"},
                                 ]
                        )]

In [13]:
context = ConTextComponent(nlp)
context.add(item_data)

In [14]:
texts = ["THERE IS NO EVIDENCE OF PNEUMONIA.",
        "There is no CHF."]

In [15]:
docs = list(nlp.pipe(texts))
for doc in docs:
    context(doc)

In [16]:
for doc in docs:
    print(doc._.context_graph.modifiers)

[<TagObject> [NO EVIDENCE OF, DEFINITE_NEGATED_EXISTENCE]]
[<TagObject> [no, DEFINITE_NEGATED_EXISTENCE]]


Under the hood, these matches are generated using two of spaCy's rule-based matching classes: 
- **[PhraseMatcher](https://spacy.io/api/phrasematcher)** for literals
- **[Matcher](https://spacy.io/api/matcher)** for patterns

In [17]:
context.matcher

<spacy.matcher.matcher.Matcher at 0x11312e248>

In [18]:
context.phrase_matcher

<spacy.matcher.phrasematcher.PhraseMatcher at 0x1153f4518>
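
As a rough illustration of how the optional-token pattern from this example behaves, the sketch below implements a tiny greedy matcher in plain Python. It is not spaCy's algorithm: it assumes pre-split tokens, supports only the `"LOWER"` key and the `"?"` operator, and returns a single longest match per start position. `match_at` and `find_first` are hypothetical names.

```python
from typing import Dict, List, Optional


def match_at(tokens: List[str], pattern: List[Dict[str, str]], start: int) -> Optional[int]:
    """Greedily match `pattern` at `start`; return the non-inclusive end or None."""
    pos = start
    for spec in pattern:
        if pos < len(tokens) and tokens[pos].lower() == spec["LOWER"]:
            pos += 1  # token consumed
        elif spec.get("OP") != "?":
            return None  # a required token is missing
    return pos


def find_first(tokens: List[str], pattern: List[Dict[str, str]]) -> Optional[List[str]]:
    """Return the tokens of the first non-empty match, scanning left to right."""
    for start in range(len(tokens)):
        end = match_at(tokens, pattern, start)
        if end is not None and end > start:
            return tokens[start:end]
    return None


pattern = [
    {"LOWER": "no"},
    {"LOWER": "evidence", "OP": "?"},
    {"LOWER": "of", "OP": "?"},
]
first = find_first("THERE IS NO EVIDENCE OF PNEUMONIA .".split(), pattern)
second = find_first("There is no CHF .".split(), pattern)
print(first)   # ['NO', 'EVIDENCE', 'OF']
print(second)  # ['no']
```

Because the comparison is done on lower-cased tokens, the upper-case sentence matches just as it does in the notebook output above.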

## Example 3: Direction
The `rule` attribute defines the direction in which a modifier operates. If **"forward"**, the modifier will modify all targets *after* the TagObject in the sentence. If **"backward"**, it will modify all targets *before* it. If **"bidirectional"**, it will look both ahead and behind.

The scope of a modifier is bounded to be within the same sentence, so no modifier will affect targets in other sentences. This can be problematic in poorly split documents, but it prevents all targets in a document from being incorrectly modified by a ConText item. A scope is also defined by any termination points, which will be shown in the next example.

In [19]:
item_data = [ConTextItem("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "FORWARD"),
             ConTextItem("is ruled out", "DEFINITE_NEGATED_EXISTENCE", "BACKWARD"),
             ConTextItem("unlikely", "DEFINITE_NEGATED_EXISTENCE", "BIDIRECTIONAL"),
            ]

In [20]:
texts = ["No evidence of pneumonia.",
        "PE is ruled out",
        "unlikely to be malignant", 
        "malignancy unlikely"]

In [21]:
docs = nlp.pipe(texts)

In [22]:
context = ConTextComponent(nlp)
context.add(item_data)

In [23]:
for doc in docs:
    context(doc)
    modifier = doc._.context_graph.modifiers[0]
    print(doc)
    print(modifier)
    print(modifier.rule)
    print()

No evidence of pneumonia.
<TagObject> [No evidence of, DEFINITE_NEGATED_EXISTENCE]
FORWARD

PE is ruled out
<TagObject> [is ruled out, DEFINITE_NEGATED_EXISTENCE]
BACKWARD

unlikely to be malignant
<TagObject> [unlikely, DEFINITE_NEGATED_EXISTENCE]
BIDIRECTIONAL

malignancy unlikely
<TagObject> [unlikely, DEFINITE_NEGATED_EXISTENCE]
BIDIRECTIONAL
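
The relationship between `rule` and scope can be sketched in a few lines of plain Python. This is a simplified model under stated assumptions: a sentence is a list of pre-split tokens, `initial_scope` is a hypothetical function, and a bidirectional scope is represented here as the tokens on both sides of the modifier, which may differ from how the library stores it.

```python
from typing import List


def initial_scope(tokens: List[str], start: int, end: int, rule: str) -> List[str]:
    """Tokens that a modifier at tokens[start:end] may reach within one sentence."""
    rule = rule.upper()
    if rule == "FORWARD":
        return tokens[end:]
    if rule == "BACKWARD":
        return tokens[:start]
    if rule == "BIDIRECTIONAL":
        return tokens[:start] + tokens[end:]
    if rule == "TERMINATE":
        return []  # in this sketch, terminating modifiers reach no targets
    raise ValueError(f"unsupported rule: {rule}")


forward = initial_scope("No evidence of pneumonia .".split(), 0, 3, "forward")
backward = initial_scope("PE is ruled out".split(), 1, 4, "BACKWARD")
both = initial_scope("malignancy unlikely".split(), 1, 2, "BIDIRECTIONAL")
print(forward)   # ['pneumonia', '.']
print(backward)  # ['PE']
print(both)      # ['malignancy']
```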



## Example 4: Termination points
As noted above, the scope of a modifier is initially set to the entire sentence either before or after a TagObject, as defined by the ConTextItem's `rule` attribute. However, the scope can be limited by **termination points**, which are other TagObjects with the rule **"TERMINATE"**. For example, in "There is no evidence of pneumonia but there is CHF", the negation modifier should modify "pneumonia" but not "CHF". This can be achieved by defining a ConTextItem to terminate at the word "but".

In [24]:
text = "There is no evidence of pneumonia but there is CHF"

In [25]:
item_data1 = [ConTextItem("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "FORWARD")]

In [26]:
context = ConTextComponent(nlp, prune=False)
context.add(item_data1)

In [27]:
doc = nlp(text)
context(doc)

There is no evidence of pneumonia but there is CHF

In [28]:
tag_object = doc._.context_graph.modifiers[0]
tag_object

<TagObject> [no evidence of, DEFINITE_NEGATED_EXISTENCE]

In [29]:
# The scope includes both "pneumonia" and "CHF", so both would be negated
tag_object.scope

pneumonia but there is CHF

In [30]:
# Now add an additional ConTextItem with "TERMINATE"
item_data2 = [ConTextItem("but", "CONJ", "TERMINATE")]

In [31]:
context.add(item_data2)

In [32]:
doc = nlp(text)
context(doc)

There is no evidence of pneumonia but there is CHF

In [33]:
tag_object = doc._.context_graph.modifiers[0]
tag_object

<TagObject> [no evidence of, DEFINITE_NEGATED_EXISTENCE]

In [34]:
# The scope now only encompasses "pneumonia"
tag_object.scope

pneumonia
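
The effect of a termination point can be mimicked without the library. The sketch below assumes pre-split tokens and a hypothetical `forward_scope` function that cuts a forward scope at the first terminating token; it only approximates the behavior shown in this example.

```python
from typing import List, Set


def forward_scope(tokens: List[str], modifier_end: int, terminators: Set[str]) -> List[str]:
    """Scope of a FORWARD modifier, cut at the first terminating token after it."""
    stop = len(tokens)
    for i in range(modifier_end, len(tokens)):
        if tokens[i].lower() in terminators:
            stop = i
            break
    return tokens[modifier_end:stop]


tokens = "There is no evidence of pneumonia but there is CHF".split()
# "no evidence of" occupies tokens[2:5], so the scope starts at index 5.
without_terminator = forward_scope(tokens, 5, set())
with_terminator = forward_scope(tokens, 5, {"but"})
print(without_terminator)  # ['pneumonia', 'but', 'there', 'is', 'CHF']
print(with_terminator)     # ['pneumonia']
```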

## Example 5: Pruned modifiers
If two ConTextItems result in TagObjects where one is the substring of another, the modifiers will be pruned to keep **only** the larger span. For example, **"no history of"** is a negation modifier, while **"history of"** is a historical modifier. Both match the text "no history of afib", but only "no history of" should ultimately modify "afib".

By default, `prune` is set to `True`, but it can be set to `False` when initializing the context component, as shown below.

In [35]:
item_data = [ConTextItem("no history of", "DEFINITE_NEGATED_EXISTENCE", "FORWARD"),
            ConTextItem("history", "HISTORICAL", "FORWARD"),
            ]

In [36]:
text = "no history of"

In [37]:
context = ConTextComponent(nlp, prune=False)
context.add(item_data)

In [38]:
doc = nlp(text)
context(doc)

no history of

In [39]:
doc._.context_graph.modifiers

[<TagObject> [no history of, DEFINITE_NEGATED_EXISTENCE],
 <TagObject> [history, HISTORICAL]]

In [40]:
# Now set prune to True
context = ConTextComponent(nlp, prune=True)
context.add(item_data)

In [41]:
doc = nlp(text)
context(doc)

no history of

In [42]:
# Only one modifier is left
doc._.context_graph.modifiers

[<TagObject> [no history of, DEFINITE_NEGATED_EXISTENCE]]
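
One way to picture pruning is as a filter over token spans. The sketch below is an assumption-laden simplification: matches are `(start, end, category)` tuples, `prune` is a hypothetical function, and a match is dropped only when it is fully contained in a strictly longer one. The library's own pruning rules may differ in detail.

```python
from typing import List, Tuple

Match = Tuple[int, int, str]  # (start, end, category); end is non-inclusive


def prune(matches: List[Match]) -> List[Match]:
    """Drop any match that lies inside a strictly longer match."""
    kept = []
    for m in matches:
        contained = any(
            o is not m
            and o[0] <= m[0]
            and m[1] <= o[1]
            and (o[1] - o[0]) > (m[1] - m[0])
            for o in matches
        )
        if not contained:
            kept.append(m)
    return kept


# Token indices for the text "no history of"
matches = [
    (0, 3, "DEFINITE_NEGATED_EXISTENCE"),  # "no history of"
    (1, 2, "HISTORICAL"),                  # "history"
]
pruned = prune(matches)
print(pruned)  # [(0, 3, 'DEFINITE_NEGATED_EXISTENCE')]
```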

## Example 6: Manually limiting scope
By default, the scope of a modifier is the **entire sentence** in the direction of the rule up until a termination point (see above). However, sometimes this is too much. In long sentences, this can cause a modifier to extend far beyond its location in the sentence. Some modifiers are really meant to be attached to a single concept, but they are instead distributed to all targets.

To fix this, cycontext allows optional attributes in `ConTextItem` to limit the scope: `max_scope` and `max_targets`. Both attributes are explained below.

### max_targets
Some modifiers should really only attach to a single target. For example, in the sentence below:

**"Pt presents with diabetes, pneumonia vs COPD"**

**"vs"** indicates uncertainty, but *only* between **"pneumonia"** and **"COPD"**. **"Diabetes"** should not be affected. We can achieve this by creating a bidirectional rule with a `max_targets` of **1**. This will limit the number of targets to 1 *on each side* of the tag object.

Let's first see what this looks like *without* defining `max_targets`:

In [43]:
text = "Pt presents with diabetes, pneumonia vs COPD"

In [44]:
doc = nlp(text)
doc.ents = (doc[3:4], doc[5:6], doc[7:8])
doc.ents

(diabetes, pneumonia, COPD)

In [45]:
item = ConTextItem("vs", category="UNCERTAIN",
                   rule="BIDIRECTIONAL",
                   max_targets=None)

In [46]:
context = ConTextComponent(nlp)
context.add([item])

In [47]:
context(doc)

Pt presents with diabetes, pneumonia vs COPD

In [48]:
visualize_dep(doc)

Now, let's start over and set `max_targets` to **1**:

In [49]:
doc = nlp(text)
doc.ents = (doc[3:4], doc[5:6], doc[7:8])

In [50]:
item = ConTextItem("vs", category="UNCERTAIN",
                   rule="BIDIRECTIONAL",
                   max_targets=1)

In [51]:
context = ConTextComponent(nlp)
context.add([item])

In [52]:
context(doc)

Pt presents with diabetes, pneumonia vs COPD

In [53]:
visualize_dep(doc)
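
Since the visualizations are not reproduced here, the following plain-Python sketch shows the intended effect of `max_targets` on this sentence. It assumes targets are identified by their token index and uses a hypothetical `limit_targets` function; it is not the library's implementation.

```python
from typing import List, Optional


def limit_targets(
    target_positions: List[int],
    modifier_start: int,
    modifier_end: int,
    max_targets: Optional[int],
) -> List[int]:
    """Keep the closest `max_targets` targets on each side of a bidirectional modifier."""
    before = sorted(p for p in target_positions if p < modifier_start)
    after = sorted(p for p in target_positions if p >= modifier_end)
    if max_targets is None:
        return before + after
    # The nearest targets are the last ones before and the first ones after.
    keep_before = before[max(0, len(before) - max_targets):]
    keep_after = after[:max_targets]
    return keep_before + keep_after


# Tokens: Pt(0) presents(1) with(2) diabetes(3) ,(4) pneumonia(5) vs(6) COPD(7)
targets = [3, 5, 7]
unlimited = limit_targets(targets, 6, 7, None)
limited = limit_targets(targets, 6, 7, 1)
print(unlimited)  # [3, 5, 7]
print(limited)    # [5, 7]
```

With `max_targets=1`, only "pneumonia" and "COPD" remain attached to "vs", while "diabetes" is left unmodified.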

### max_scope
One limitation of using `max_targets` is that in a sentence like the example above, each concept has to be extracted as an entity in order for it to reduce the scope. If **"pneumonia"** was not extracted, then **"vs"** would still extend as far back as **"diabetes"**.

We can address this by explicitly setting the scope to be no greater than a certain number of tokens using `max_scope`. For example, lab results may show up in a text document with many individual results:

---
Adenovirus DETECTED<br>
SARS NOT DETECTED<br>
...<br>
Cov HKU1 NOT DETECTED<br>

---

Texts like this are often difficult to parse and are often not ConText-friendly because many lines can be extracted as a single sentence. By default, a modifier like **"NOT DETECTED"** could extend far back to a concept such as **"Adenovirus"**, which returned positive. We may also not explicitly extract every virus tested in the lab, so `max_targets` won't work.

With text formats like this, we can be fairly certain that **"NOT DETECTED"** will only modify the single concept right before it. We can set `max_scope` to **1** so that **only** a single concept will be modified.

In [54]:
text = """Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED"""

In [55]:
doc = nlp(text)
doc.ents = (doc[0:1], doc[2:3], doc[5:6])
doc.ents

(Adenovirus, Sars, Pneumonia)

In [56]:
assert len(list(doc.sents)) == 1

In [57]:
item_data = [ConTextItem("DETECTED", category="POSITIVE_EXISTENCE",
                         rule="BACKWARD",
                         max_scope=None),
             ConTextItem("NOT DETECTED", category="DEFINITE_NEGATED_EXISTENCE",
                         rule="BACKWARD",
                         max_scope=None),
            ]

In [58]:
context = ConTextComponent(nlp)
context.add(item_data)

In [59]:
context(doc)

Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED

In [60]:
visualize_dep(doc)

Let's now set `max_scope` to 1 and we'll find that only **"Pneumonia"** and **"Sars"** are modified by **"NOT DETECTED"**:

In [61]:
doc = nlp(text)
doc.ents = (doc[0:1], doc[2:3], doc[5:6])
doc.ents

(Adenovirus, Sars, Pneumonia)

In [62]:
item_data = [ConTextItem("DETECTED", category="POSITIVE_EXISTENCE",
                         rule="BACKWARD",
                         max_scope=1),
             ConTextItem("NOT DETECTED", category="DEFINITE_NEGATED_EXISTENCE",
                         rule="BACKWARD",
                         max_scope=1),
            ]

In [63]:
context = ConTextComponent(nlp)
context.add(item_data)

In [64]:
context(doc)

Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED

In [65]:
visualize_dep(doc)
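
The token-window idea behind `max_scope` can be sketched as follows. The code assumes pre-split tokens and a hypothetical `backward_scope` function; it models only the cap on scope length, not the rest of the library's logic.

```python
from typing import List, Optional


def backward_scope(tokens: List[str], modifier_start: int, max_scope: Optional[int] = None) -> List[str]:
    """Tokens before a BACKWARD modifier, optionally capped at `max_scope` tokens."""
    begin = 0 if max_scope is None else max(0, modifier_start - max_scope)
    return tokens[begin:modifier_start]


tokens = "Adenovirus DETECTED Sars NOT DETECTED Pneumonia NOT DETECTED".split()
# The first "NOT DETECTED" starts at index 3, the second at index 6.
uncapped = backward_scope(tokens, 3)
capped_first = backward_scope(tokens, 3, max_scope=1)
capped_second = backward_scope(tokens, 6, max_scope=1)
print(uncapped)       # ['Adenovirus', 'DETECTED', 'Sars']
print(capped_first)   # ['Sars']
print(capped_second)  # ['Pneumonia']
```

Without the cap, the first "NOT DETECTED" reaches back to "Adenovirus"; with a one-token window it reaches only the concept directly before it.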

# Setting additional Span attributes
As seen in an earlier notebook, cycontext registers two new attributes for target Spans: `is_experienced` and `is_current`. These values are set to default values of True and changed if a target is modified by certain modifiers. This logic is set in the variable `DEFAULT_ATTRS`. This is a dictionary which maps modifier category names to the attribute name/value pair which should be set if a target is modified by that modifier type.

In [66]:
from cycontext.context_component import DEFAULT_ATTRS

In [67]:
DEFAULT_ATTRS

{'DEFINITE_NEGATED_EXISTENCE': ('is_experienced', False),
 'FAMILY_HISTORY': ('is_experienced', False),
 'INDICATION': ('is_experienced', False),
 'HISTORICAL': ('is_current', False)}

This logic can be customized by creating a dictionary with the same structure as `DEFAULT_ATTRS` and passing it in as the `add_attrs` parameter. If setting your own extensions, you must first call `Span.set_extension` on each of the extensions.

If more complex logic is required, custom attributes can also be set manually outside of the ConTextComponent, for example as a post-processing step.

In [68]:
from spacy.tokens import Span

In [69]:
# Define modifiers and Span attributes
custom_attrs = {'DEFINITE_NEGATED_EXISTENCE': ('is_negated', True),
 'FAMILY_HISTORY': ('is_family_history', True),
}

In [70]:
# Register extensions
Span.set_extension("is_negated", default=False)
Span.set_extension("is_family_history", default=False)

In [71]:
context = ConTextComponent(nlp, add_attrs=custom_attrs)
context.context_attributes_mapping

{'DEFINITE_NEGATED_EXISTENCE': ('is_negated', True),
 'FAMILY_HISTORY': ('is_family_history', True)}

In [72]:
item_data = [ConTextItem("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "FORWARD"),
            ConTextItem("family history", "FAMILY_HISTORY", "FORWARD"),
            ]

context.add(item_data)

In [73]:
doc = nlp("There is no evidence of pneumonia. Family history of diabetes.")

doc.ents = doc[5:6], doc[-2:-1]

doc.ents

(pneumonia, diabetes)

In [74]:
context(doc)

There is no evidence of pneumonia. Family history of diabetes.

The new attributes are now available in `ent._`:

In [75]:
for ent in doc.ents:
    print(ent)
    print("is_negated: ", ent._.is_negated)
    print("is_family_history: ", ent._.is_family_history)
    print()

pneumonia
is_negated:  True
is_family_history:  False

diabetes
is_negated:  False
is_family_history:  True
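
The mapping from modifier categories to attribute values is simple enough to model with plain dictionaries. In the sketch below, `apply_attrs` is a hypothetical function and targets are represented as dicts rather than spaCy Spans; it illustrates the structure of the mapping, not the component's internals.

```python
from typing import Dict, List, Tuple


def apply_attrs(
    target_attrs: Dict[str, bool],
    modifier_categories: List[str],
    mapping: Dict[str, Tuple[str, bool]],
) -> Dict[str, bool]:
    """Return a copy of `target_attrs` updated for each category found in `mapping`."""
    updated = dict(target_attrs)
    for category in modifier_categories:
        if category in mapping:
            name, value = mapping[category]
            updated[name] = value
    return updated


custom_attrs = {
    "DEFINITE_NEGATED_EXISTENCE": ("is_negated", True),
    "FAMILY_HISTORY": ("is_family_history", True),
}
defaults = {"is_negated": False, "is_family_history": False}

pneumonia = apply_attrs(defaults, ["DEFINITE_NEGATED_EXISTENCE"], custom_attrs)
diabetes = apply_attrs(defaults, ["FAMILY_HISTORY"], custom_attrs)
print(pneumonia)  # {'is_negated': True, 'is_family_history': False}
print(diabetes)   # {'is_negated': False, 'is_family_history': True}
```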



# Reading and Writing a Knowledge Base
ConTextItems can be saved as JSON and read back in, which allows a knowledge base to be reused and scaled. Reading is done with the `ConTextItem.from_json` method:

In [76]:
json_filepath = "../kb/pneumonia_modifiers.json"

In [77]:
item_data = ConTextItem.from_json(json_filepath)

In [78]:
for item in item_data[:10]:
    print(item)

ConTextItem(literal='are ruled out', category='DEFINITE_NEGATED_EXISTENCE', pattern=None, rule='BACKWARD')
ConTextItem(literal='be ruled out', category='INDICATION', pattern=None, rule='BACKWARD')
ConTextItem(literal='being ruled out', category='INDICATION', pattern=None, rule='BACKWARD')
ConTextItem(literal='can be ruled out', category='DEFINITE_NEGATED_EXISTENCE', pattern=None, rule='BACKWARD')
ConTextItem(literal='cannot be excluded', category='AMBIVALENT_EXISTENCE', pattern=None, rule='BACKWARD')
ConTextItem(literal='cannot totally be excluded', category='PROBABLE_NEGATED_EXISTENCE', pattern=None, rule='BACKWARD')
ConTextItem(literal='could be ruled out', category='DEFINITE_NEGATED_EXISTENCE', pattern=None, rule='BACKWARD')
ConTextItem(literal='examination', category='INDICATION', pattern=[{'LOWER': {'REGEX': '(examination|exam|study)'}}], rule='BACKWARD')
ConTextItem(literal='free', category='DEFINITE_NEGATED_EXISTENCE', pattern=None, rule='BACKWARD')
ConTextItem(literal='has been

In [79]:
item.to_dict()

{'rule': 'BACKWARD',
 'literal': 'has been ruled out',
 'pattern': None,
 'allowed_types': None,
 'filtered_types': None,
 'category': 'DEFINITE_NEGATED_EXISTENCE',
 'metadata': None}

The items can also be saved as JSON by using the `ConTextItem.to_json` method:

In [80]:
ConTextItem.to_json(item_data[:2], "2_modifiers.json")

In [81]:
import json
with open("2_modifiers.json") as f:
    print(json.load(f))

{'item_data': [{'rule': 'BACKWARD', 'literal': 'are ruled out', 'pattern': None, 'allowed_types': None, 'filtered_types': None, 'category': 'DEFINITE_NEGATED_EXISTENCE', 'metadata': None}, {'rule': 'BACKWARD', 'literal': 'be ruled out', 'pattern': None, 'allowed_types': None, 'filtered_types': None, 'category': 'INDICATION', 'metadata': None}]}
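
For readers who want to inspect or generate such files without cycontext, the sketch below round-trips the same structure using only the standard library. It assumes the layout shown in the output above (a top-level `"item_data"` key holding a list of dicts) and writes to a temporary directory rather than the notebook's working directory.

```python
import json
import os
import tempfile

# Two entries copied from the output above.
items = [
    {
        "rule": "BACKWARD",
        "literal": "are ruled out",
        "pattern": None,
        "allowed_types": None,
        "filtered_types": None,
        "category": "DEFINITE_NEGATED_EXISTENCE",
        "metadata": None,
    },
    {
        "rule": "BACKWARD",
        "literal": "be ruled out",
        "pattern": None,
        "allowed_types": None,
        "filtered_types": None,
        "category": "INDICATION",
        "metadata": None,
    },
]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "2_modifiers.json")
    # Write the knowledge base under the "item_data" key.
    with open(path, "w") as f:
        json.dump({"item_data": items}, f, indent=2)
    # Read it back; JSON null becomes Python None.
    with open(path) as f:
        loaded = json.load(f)["item_data"]

literals = [item["literal"] for item in loaded]
print(literals)  # ['are ruled out', 'be ruled out']
```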
