# How does pyConText work?

Before explaining how pyConText processes text, let's introduce a few important concepts.

## The information model

An information model (IM) is an abstract representation of concepts (a formal definition can be found in [Terminology for Policy-Based Management](https://tools.ietf.org/html/rfc3198)). In pyConText, we set up a simple information model to represent the concepts we are looking for. It has two components: targets and modifiers.

* A **target** is the component of the IM that describes the core information of the concept. For instance, *"breast cancer"* in "brother- breast CA."

* A **modifier** is the component that describes a certain property of a target. For instance, *"brother"* in "brother- breast CA."


**Question**: Why don't we represent this concept simply as *"brother breast CA"*, without separating the target from the modifier?


## Three types of predefined modifiers in pyConText

* **Negation**: whether a target is negated, e.g., "no *masses*".
* **Historical**: whether the concept is historical (e.g., "a remote history of *diverticulitis* in the 70s"), present (e.g., "found by EMS at scene *unresponsive*"), or hypothetical (e.g., "if the *pain* exacerbated").  
    Note: a physician's "present" differs from the everyday sense; it covers the current episode, including events described in the past tense, as in the EMS example.
* **Nonpatient**: whether the concept refers to the patient or to someone else, e.g., "Sister with *breast cancer*".





## A typical pyConText rule
An example pyConText rule file can be found at [pneumonia_modifiers.yml](https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/pneumonia_modifiers.yml).

A typical pyConText rule has four elements. For instance:

![a screenshot of modifier rule file in yml format](img/rule.png)
    


The four elements are:

1) The lexicon (e.g., "no evidence of").
2) The type (e.g., "DEFINITE_NEGATED_EXISTENCE").
3) The regular expression (optional) used to capture the lexicon in the text. If no regular expression is provided, one is generated from the literal lexicon.
4) The direction in which the modifier operates within the sentence. Current valid values are: "forward", the modifier can modify objects that follow it in the sentence; "backward", the modifier can modify objects that precede it; and "bidirectional", the modifier can modify objects on either side of it.
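To make the four elements concrete, here is a minimal Python sketch of a modifier rule. `ModifierRule` and its `pattern()` method are hypothetical names, not part of pyConTextNLP, and the `denies` regex is a made-up example. The fallback pattern (the escaped lexicon wrapped in word boundaries, matched case-insensitively) is an assumption; the regular expression pyConTextNLP actually generates may differ.

```python
import re
from dataclasses import dataclass

VALID_DIRECTIONS = {"forward", "backward", "bidirectional"}


@dataclass
class ModifierRule:
    """Hypothetical container for the four elements of a pyConText modifier rule."""
    lex: str                    # 1) lexicon, e.g. "no evidence of"
    category: str               # 2) type, e.g. "DEFINITE_NEGATED_EXISTENCE"
    regex: str = ""             # 3) optional regular expression
    direction: str = "forward"  # 4) "forward", "backward" or "bidirectional"

    def __post_init__(self) -> None:
        # Reject anything other than the three documented direction values.
        if self.direction not in VALID_DIRECTIONS:
            raise ValueError(f"invalid direction: {self.direction!r}")

    def pattern(self) -> "re.Pattern":
        # Assumption: without a regex, match the lexicon literally as whole words.
        source = self.regex or r"\b" + re.escape(self.lex) + r"\b"
        return re.compile(source, re.IGNORECASE)


rule = ModifierRule("no evidence of", "DEFINITE_NEGATED_EXISTENCE")
print(rule.pattern().search("No evidence of pneumonia on X-ray.").group(0))

# Hypothetical rule with an explicit regex covering several word forms.
denies = ModifierRule("denies", "DEFINITE_NEGATED_EXISTENCE", regex=r"\bden(y|ies|ied)\b")
print(denies.pattern().search("Patient denied fever.").group(0))
```

An explicit regular expression is useful when one lexicon entry should cover several surface forms, as with "deny", "denies" and "denied" above.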

## How pyConText works: a simple explanation

pyConText first *locates* a target term, and then *looks around* it for any context clue that matches a lexicon entry in the modifier rules. If it finds one whose direction reaches the target, pyConText marks the clue with that rule's type and records that it modifies the target.
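The locate-and-look-around idea can be sketched in a few lines of plain Python. This is a toy illustration, not pyConText's implementation: `find_span`, `in_scope` and `annotate` are hypothetical helpers, matching is a simple case-insensitive substring search, and the scope is the whole sentence, whereas pyConText's real scope handling is more involved.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int]


def find_span(sentence: str, phrase: str) -> Optional[Span]:
    """Return the (start, end) offsets of phrase in sentence, ignoring case."""
    start = sentence.lower().find(phrase.lower())
    return None if start < 0 else (start, start + len(phrase))


def in_scope(modifier: Span, target: Span, direction: str) -> bool:
    """Decide whether a modifier at `modifier` can reach a target at `target`."""
    target_follows = target[0] >= modifier[1]
    target_precedes = target[1] <= modifier[0]
    if direction == "forward":
        return target_follows
    if direction == "backward":
        return target_precedes
    if direction == "bidirectional":
        return target_follows or target_precedes
    raise ValueError(f"unknown direction: {direction!r}")


def annotate(sentence: str, target: str,
             modifiers: List[Tuple[str, str, str]]) -> List[str]:
    """Locate the target, then look around it for modifier clues.

    `modifiers` holds (lexicon, type, direction) triples; the types of the
    clues whose direction reaches the target are returned.
    """
    target_span = find_span(sentence, target)
    if target_span is None:
        return []
    types = []
    for lex, mod_type, direction in modifiers:
        mod_span = find_span(sentence, lex)
        if mod_span is not None and in_scope(mod_span, target_span, direction):
            types.append(mod_type)
    return types


rules = [("no evidence of", "DEFINITE_NEGATED_EXISTENCE", "forward")]
print(annotate("No evidence of pneumonia on X-ray", "pneumonia", rules))
```

The direction check is what keeps a forward-looking clue such as "no evidence of" from negating a target that appears before it.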

### Negation example

Let's use the rule above as an example:

<img src="img/context.png" alt="visualize negation" style="width:290px;"/>

# Play with pyConText

In [1]:
import pyConTextNLP.pyConText as pyConText
# This version of itemData has been modified to also accept relative local paths,
# so you can later point it at your own customized yml files.
import itemData
# visual.py is not part of the original pyConTextNLP package; if you reuse this code
# elsewhere, also copy visual.py and the tmp directory.
from visual import Vis, view_pycontext_output
modifiers=itemData.get_items('https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/pneumonia_modifiers.yml')
targets = itemData.get_items("https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/pneumonia_targets.yml")



## Define pyConText markup functions
The function *markup_sentence* is taken directly from the example in [pyConTextNLP](https://github.com/chapmanbe/pyConTextNLP/blob/master/notebooks/MultiSentenceDocuments.ipynb).

In [32]:
def markup_sentence(s, modifiers, targets, prune_inactive=True):
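    """Mark up one sentence with pyConText and return the ConTextMarkup object.

    If prune_inactive is True, modifiers that do not modify any target are dropped.
    """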
    markup = pyConText.ConTextMarkup()
    markup.setRawText(s)
    markup.cleanText()
    markup.markItems(modifiers, mode="modifier")
    markup.markItems(targets, mode="target")
    markup.pruneMarks()
    markup.dropMarks('Exclusion')
    # apply modifiers to any targets within the modifiers scope
    markup.applyModifiers()
    markup.pruneSelfModifyingRelationships()
    if prune_inactive:
        markup.dropInactiveModifiers()
    return markup

The function *markup_doc* is a simplified document markup function: it splits sentences on periods, which you will replace with NLTK's sentence splitter later.

In [17]:
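def markup_doc(doc_text: str):
    """Split the document on periods (a rough sentence splitter) and mark up each sentence."""
    context = pyConText.ConTextDocument()
    for s in doc_text.split('.'):
        if not s.strip():
            continue  # skip the empty fragment left after the final period
        m = markup_sentence(s, modifiers=modifiers, targets=targets)
        context.addMarkup(m)
    return context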

Now process an example document.

In [18]:
doc_text='No visible infiltrate. No evidence of pneumonia on X-ray.'

In [19]:
context=markup_doc(doc_text)

Let's visualize what has been marked up in the example above.

In [20]:
vis=Vis('https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/pneumonia_modifiers.yml')

In [21]:
view_pycontext_output(context,vis)

# Customize your pyConText for temperature identification

Now customize your version of pyConText to identify temperature mentions in clinical notes. Think about the information model first: which way of dividing the concept into targets and modifiers makes them easier to set up?

## Create your targets.yml
Create your targets.yml in the ['KB' directory](KB), following this example:

https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/pneumonia_targets.yml

Make sure there isn't a leftover '---' at the end of your file.
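If you prefer to generate your rule files from Python, the sketch below writes rules in a layout modeled on the pneumonia example files. It assumes those files use the keys `Comments`, `Direction`, `Lex`, `Regex` and `Type`, with `---` between rules; compare with the linked file before relying on it. `rules_to_yaml` and `quote` are hypothetical helpers, and the example rule's `Type` and `Regex` values are placeholders. Because `---` is placed only between rules, no stray separator is left at the end of the file.

```python
from typing import Dict, List

# Assumed key order, modeled on the pneumonia example files.
FIELDS = ("Comments", "Direction", "Lex", "Regex", "Type")


def quote(value: str) -> str:
    """Single-quote a YAML scalar: backslashes stay literal, single quotes are doubled."""
    return "'" + value.replace("'", "''") + "'"


def rules_to_yaml(rules: List[Dict[str, str]]) -> str:
    """Serialize rules as YAML documents separated (not terminated) by '---'."""
    documents = []
    for rule in rules:
        lines = [f"{field}: {quote(rule.get(field, ''))}" for field in FIELDS]
        documents.append("\n".join(lines))
    return "\n---\n".join(documents) + "\n"


# Hypothetical example rule; the Type and Regex values are placeholders.
my_rules = [
    {"Lex": "temperature", "Type": "TEMPERATURE", "Regex": r"\btemp(erature)?\b"},
]
print(rules_to_yaml(my_rules))
```

Single quotes are used because YAML keeps backslashes inside single-quoted scalars literally, so regular expressions such as `\b` survive unchanged. To save the result, write the string to `KB/targets.yml` with `open('KB/targets.yml', 'w')` and load it with `itemData.get_items('KB/targets.yml')` as in the next cell.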

In [22]:
# Load your file to check that it is correctly formatted
my_targets=itemData.get_items('KB/targets.yml')

## Create your modifiers.yml

Create your modifiers.yml in the ['KB' directory](KB), following this example:

https://raw.githubusercontent.com/chapmanbe/pyConTextNLP/master/KB/pneumonia_modifiers.yml



In [23]:
# Load your file to check that it is correctly formatted
my_modifiers=itemData.get_items('KB/modifiers.yml')

## Revise the *markup_doc* function
You will want to use NLTK's sentence splitter instead of splitting on periods, because a period-based split also breaks decimal numbers such as 102.5 apart.
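To see why, compare splitting on every period with a splitter that only breaks after sentence-ending punctuation followed by whitespace. `naive_split` and `regex_split` are hypothetical helpers for illustration only. The regex version is a crude stand-in that would still mis-split abbreviations such as "Dr. Smith", which NLTK's Punkt-based `sent_tokenize` is designed to handle.

```python
import re
from typing import List


def naive_split(text: str) -> List[str]:
    """Split on every period, as the period-based markup_doc does (empty pieces dropped)."""
    return [s for s in text.split(".") if s.strip()]


def regex_split(text: str) -> List[str]:
    """Split only where '.', '!' or '?' is followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


doc = "He had a temp up to 102.5 rectally. No visible infiltrate."
print(naive_split(doc))  # ['He had a temp up to 102', '5 rectally', ' No visible infiltrate']
print(regex_split(doc))  # ['He had a temp up to 102.5 rectally.', 'No visible infiltrate.']
```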

In [24]:
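from nltk.tokenize import sent_tokenize
# sent_tokenize needs NLTK's Punkt models; if they are missing, download them once
# with nltk.download('punkt') (recent NLTK versions use 'punkt_tab').
def markup_doc(doc_text: str):
    """Split the document into sentences with NLTK and mark up each one."""
    context = pyConText.ConTextDocument()
    for s in sent_tokenize(doc_text):
        m = markup_sentence(s, modifiers=my_modifiers, targets=my_targets)
        context.addMarkup(m)
    return context
# Build the visualizer from your own modifier file
vis=Vis('KB/modifiers.yml')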

## Test your rules
Now test your new rules on the following test cases and check whether they behave as expected.
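Before encoding a pattern in your yml files, it can help to prototype it with Python's `re` module against these test cases. The sketch below is one possible design, not a reference answer: `find_temperatures`, `CUE_BEFORE` and `UNIT_AFTER` are hypothetical, and they fold the context cues (a preceding "temp" or "T:", or a following unit) into the regular expressions themselves. With pyConText you could instead keep the number as the target and move such cues into modifiers, which is the information-model choice raised above.

```python
import re
from typing import Dict, List

# Hypothetical patterns: a 2-3 digit number, optionally with decimals.
NUMBER = r"\b(\d{2,3}(?:\.\d+)?)"
# A number preceded by "temp", "temperature" or "T", allowing up to three words in between.
CUE_BEFORE = re.compile(r"\b(?:[Tt]emp(?:erature)?|T)\b[:\s]*(?:\w+\s+){0,3}?" + NUMBER)
# A number followed by a temperature unit.
UNIT_AFTER = re.compile(NUMBER + r"\s*(?:Celsius|Fahrenheit|C|F)\b")


def find_temperatures(text: str) -> List[str]:
    """Return candidate temperature values in order of appearance."""
    hits: Dict[int, str] = {}
    for pattern in (CUE_BEFORE, UNIT_AFTER):
        for m in pattern.finditer(text):
            hits[m.start(1)] = m.group(1)  # both patterns hitting one number collapse here
    return [hits[pos] for pos in sorted(hits)]


test_cases = [
    "He had a temp up to 102.5 rectally.",
    "Hct 36-39 and has been stable here. ",
    "Vitals: 38.5 C BP 118/72 HR 103 R 40.",
    "Currently drop to 37.5 F.",
    "vital signs \nwere 52, 218/109, T:38 Celsius, O2 Sat 100% on 100% FiO2",
]
for text in test_cases:
    print(find_temperatures(text))
```

A bare-number pattern on its own would also pick up the hematocrit, blood pressure and heart-rate values in test cases 2, 3 and 5, which is why the context cues matter.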

### Test case 1

In [25]:
doc_text='He had a temp up to 102.5 rectally.'
context=markup_doc(doc_text)
view_pycontext_output(context,vis,1)
# 102.5

### Test case 2

In [26]:
doc_text='Hct 36-39 and has been stable here. '
context=markup_doc(doc_text)
view_pycontext_output(context,vis,2)
# no temperature mentions

### Test case 3

In [27]:
doc_text='Vitals: 38.5 C BP 118/72 HR 103 R 40.'
context=markup_doc(doc_text)
view_pycontext_output(context,vis,3)
# 38.5

### Test case 4

In [28]:
doc_text='Currently drop to 37.5 F.'
context=markup_doc(doc_text)
view_pycontext_output(context,vis,4)
# 37.5

### Test case 5

In [34]:
doc_text='''vital signs 
were 52, 218/109, T:38 Celsius, O2 Sat 100% on 100% FiO2'''
context=markup_doc(doc_text)
view_pycontext_output(context,vis,5)
# 38