# How pyConText works

Before we explain its processing mechanism, let's learn a few important concepts.



## 1. Information model

An information model is an abstraction and representation of concepts (a formal definition can be found at [Terminology for Policy-Based Management](https://tools.ietf.org/html/rfc3198)). In pyConText, we set up a simple information model with two components, targets and modifiers, to represent the concepts we are looking for.

Consider the following sentence:
*"Family history: mother breast CA."*

* A **target** is the component of this information model that describes the core information of the concept, for instance *"breast cancer"* (written as "breast CA" in the sentence).

* A **modifier** is the component that describes a certain property of a target, for instance *"mother"*.
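
To make the two components concrete, here is a minimal sketch of the information model as Python data classes. The class and field names (`Target`, `Modifier`, `concept`, `category`) are hypothetical illustrations, not pyConText's own classes; `FAMILY` is the modifier type used later in this notebook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Modifier:
    text: str      # the clue as written in the sentence, e.g. "mother"
    category: str  # the property it expresses, e.g. "FAMILY"


@dataclass
class Target:
    text: str     # the mention as written, e.g. "breast CA"
    concept: str  # the concept it stands for, e.g. "breast cancer"
    modifiers: List[Modifier] = field(default_factory=list)


# "Family history: mother breast CA."
target = Target(text="breast CA", concept="breast cancer")
target.modifiers.append(Modifier(text="mother", category="FAMILY"))

print(target.concept, [m.category for m in target.modifiers])
# breast cancer ['FAMILY']
```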


**Question**: Why don't we represent the concept of *"family history of breast cancer"* simply as *"mother breast CA"*, without separating the target and the modifier?


## 2. Three types of modifiers in pyConText

pyConText is a Python implementation of the ConText algorithm.

ConText determines the values for three contextual properties of a clinical condition: Negation, Temporality, and Experiencer. 
- The contextual property **Negation** specifies the status of the clinical existence of a condition. The default value of this property is ***affirmed***. If a clinical condition occurs within the scope of a trigger term for negation, ConText will change the default value to ***negated***. For example, in the sentence “The patient denies any nausea,” the value of Negation for the condition “nausea” will be negated.
- The contextual property **Temporality** places a condition along a simple time line. The default value of Temporality is ***recent***. Given appropriate trigger terms, ConText can change the value of this property to either ***historical*** or ***hypothetical***. The value hypothetical covers all conditions that temporally are neither recent nor historical. A typical example of a hypothetical condition would be “fever” in a sentence such as “Patient should return if she develops fever.”
- Finally, the contextual property **Experiencer** describes whether the patient or someone else experiences the condition. The default value is ***patient***, which, in the presence of a trigger term, can be changed to ***other***. For example, in the sentence “The patient's father has a history of CHF”, the value of Experiencer for the condition CHF is other. 
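
The "start from the default, change it when a trigger is in scope" behavior described above can be sketched as a small function. This is an illustration, not pyConText's code: the mapping from modifier types to property values is an assumption, and `HYPOTHETICAL` is a hypothetical type name (`DEFINITE_NEGATED_EXISTENCE`, `HISTORICAL`, and `FAMILY` appear elsewhere in this notebook).

```python
# Default values of the three ConText properties.
DEFAULTS = {"Negation": "affirmed", "Temporality": "recent", "Experiencer": "patient"}

# Assumed mapping: which property each modifier type changes, and to what value.
CATEGORY_EFFECTS = {
    "DEFINITE_NEGATED_EXISTENCE": ("Negation", "negated"),
    "HISTORICAL": ("Temporality", "historical"),
    "HYPOTHETICAL": ("Temporality", "hypothetical"),  # hypothetical type name
    "FAMILY": ("Experiencer", "other"),
}


def contextual_properties(modifier_categories):
    """Start from the defaults and override one property per trigger found."""
    props = dict(DEFAULTS)
    for category in modifier_categories:
        if category in CATEGORY_EFFECTS:
            prop, value = CATEGORY_EFFECTS[category]
            props[prop] = value
    return props


print(contextual_properties([]))
# {'Negation': 'affirmed', 'Temporality': 'recent', 'Experiencer': 'patient'}
print(contextual_properties(["FAMILY", "HISTORICAL"]))
# {'Negation': 'affirmed', 'Temporality': 'historical', 'Experiencer': 'other'}
```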

## 3. A typical pyConText rule
The pyConText rule file can be found at [KB/fam_bc_modifiers.yml](KB/fam_bc_modifiers.yml)  

A typical pyConText rule has four elements. For instance:
![a screenshot of modifier rule file in yml format](img/snapshot2.png)
    
The four elements are:

1) The lexicon (e.g. "can be ruled out")  
2) The type (e.g. "DEFINITE_NEGATED_EXISTENCE")  
3) The regular expression (optional) used to capture the lexicon in the text. If no regular expression is provided, one is generated from the lexicon so that it matches the lexicon literally.  
4) The direction, which specifies where in the sentence the modifier operates. Currently valid values are: "forward", the item can modify objects following it in the sentence; "backward", the item can modify objects preceding it in the sentence; or "bidirectional", the item can modify objects both preceding and following it in the sentence.
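
The four elements map naturally onto a small record. The sketch below is hypothetical: `ContextRule` and its fields are not pyConText's API, the `backward` direction for "can be ruled out" is assumed for illustration, and building the fallback pattern as the escaped lexicon between word boundaries (`\b`) is an assumption about how a literal match could be generated.

```python
import re
from dataclasses import dataclass
from typing import Optional

VALID_DIRECTIONS = {"forward", "backward", "bidirectional"}


@dataclass
class ContextRule:
    lex: str                     # 1) lexicon, e.g. "can be ruled out"
    rule_type: str               # 2) type, e.g. "DEFINITE_NEGATED_EXISTENCE"
    regex: Optional[str] = None  # 3) optional regular expression
    direction: str = "forward"   # 4) direction the modifier operates in

    def __post_init__(self):
        if self.direction not in VALID_DIRECTIONS:
            raise ValueError(f"invalid direction: {self.direction!r}")

    def pattern(self):
        # Assumption: with no regex, match the lexicon literally as whole words.
        source = self.regex or r"\b" + re.escape(self.lex) + r"\b"
        return re.compile(source, re.IGNORECASE)


rule = ContextRule("can be ruled out", "DEFINITE_NEGATED_EXISTENCE", direction="backward")
print(bool(rule.pattern().search("Breast cancer can be ruled out.")))  # True
```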

 

## 4. How pyConText works: a simple explanation

pyConText first *locates* a target term and then *looks around* it for any context clue that matches a lexicon in the pyConText modifier rules. If it finds one, pyConText marks the clue with that rule's type and links it to the target.
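
The locate-then-look-around idea can be sketched in a few lines of Python. This toy version is an illustration under stated assumptions, not pyConText's implementation: the rules in `MODIFIER_RULES` and the function `mark_context` are hypothetical, and it ignores the scope limits (such as termination terms) that the real algorithm applies.

```python
import re

# Hypothetical modifier rules: (regular expression, type, direction).
MODIFIER_RULES = [
    (r"\b(no|not|denies)\b", "DEFINITE_NEGATED_EXISTENCE", "forward"),
    (r"\bcan be ruled out\b", "DEFINITE_NEGATED_EXISTENCE", "backward"),
    (r"\b(mother|sister)\b", "FAMILY", "bidirectional"),
]


def mark_context(sentence, target_regex=r"\bbreast cancer\b"):
    """Locate the target, then collect the clues whose direction reaches it."""
    target = re.search(target_regex, sentence, re.IGNORECASE)
    if target is None:
        return []
    found = []
    for pattern, rule_type, direction in MODIFIER_RULES:
        for clue in re.finditer(pattern, sentence, re.IGNORECASE):
            precedes = clue.end() <= target.start()  # clue is before the target
            follows = clue.start() >= target.end()   # clue is after the target
            if ((direction == "forward" and precedes)
                    or (direction == "backward" and follows)
                    or (direction == "bidirectional" and (precedes or follows))):
                found.append((clue.group(0), rule_type))
    return found


print(mark_context("mother does not have breast cancer"))
# [('not', 'DEFINITE_NEGATED_EXISTENCE'), ('mother', 'FAMILY')]
print(mark_context("Breast cancer can be ruled out."))
# [('can be ruled out', 'DEFINITE_NEGATED_EXISTENCE')]
```

A forward rule placed after the target is ignored here, which is the point of the direction element: "not" in "breast cancer, not mother" would not negate the target.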

### 4.1 Negation example

Let's use the rule above as an example:

![an example visualization of pyConText](img/snapshot7.png)

As you can see, "can be ruled out" is identified and linked to the target "breast cancer." The label "dne" is made of the first letter of each word in "DEFINITE_NEGATED_EXISTENCE."


### 4.2 Historical example

Here is an example rule to identify historical context:

![an example visualization of pyConText](img/snapshot9.png)

This rule uses a simple regular expression <span style="color:darkred">'\b\d+ years ago'</span> to capture clues of the form 'x years ago', where 'x' is any whole number written in digits. For example, '20 years ago' is identified as shown below:



![an example visualization of pyConText](img/snapshot8.png)

The label "his" is the first three letters of "HISTORICAL."
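
The regular expression from the historical rule can be tried on its own with Python's standard `re` module. The example sentences below are made up for illustration.

```python
import re

# The historical clue from the rule above: one or more digits, then " years ago".
HISTORICAL_RE = re.compile(r"\b\d+ years ago")

for text in ["Mother had breast cancer 20 years ago.",
             "Mother had breast cancer years ago.",
             "Mother had breast cancer 1 year ago."]:
    match = HISTORICAL_RE.search(text)
    print(match.group(0) if match else None)
# 20 years ago
# None
# None
```

The third sentence shows that this pattern misses the singular "1 year ago"; writing `years?` instead of `years` would cover it.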

### 4.3 Nonpatient example

By default, any concept mentioned in clinical text refers to the patient unless a non-patient context clue is found. Because this task targets family history, we need context rules that identify family-related context. For example:
![an example visualization of pyConText](img/snapshot10.png)

When executing pyConText, the word "sister" is picked up as the "FAMILY" context for the target term "breast cancer":

![an example visualization of pyConText](img/snapshot3.png)


### 4.4 Read more:

The actual mechanism is much more complicated than this simple explanation. More detailed information can be found in this paper:

> Harkema H, Dowling JN, Thornblade T, Chapman WW. [ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports](http://www.ncbi.nlm.nih.gov/pubmed/19435614). J Biomed Inform 2009;42(5):839-851.

## 5. pyConText Playground
<font color='darkblue'><p>Feel free to make up some examples and try them yourself to see what pyConText produces. Here is a playground for you. The cell below sets everything up; you can ignore what DocumentClassifier does for now. You are welcome to explore the code on your own.

```python
# ignore everything inside here, we will explain later
from DocumentClassifier import DocumentClassifier
from visual import Vis, view_pycontext_output

pos_doc_type = 'FAM_BREAST_CA_DOC'
TARGETS_FILE_PATH = 'KB/fam_bc_targets.yml'
MODIFIERS_FILE_PATH = 'KB/fam_bc_modifiers.yml'
FEATURE_INFERENCER_FILE_PATH = 'KB/fam_bc_featurer_inferences.csv'
DOC_INFERENCER_FILE_PATH = 'KB/fam_bc_doc_inferences.csv'

vis = Vis(context_file=MODIFIERS_FILE_PATH, file_name="context_graph.html")
classifier = DocumentClassifier(TARGETS_FILE_PATH, MODIFIERS_FILE_PATH,
                                FEATURE_INFERENCER_FILE_PATH, DOC_INFERENCER_FILE_PATH,
                                {pos_doc_type})
# clear saved predictions, just in case files/regular expressions have been updated
classifier.reset_saved_predictions()
```

<font color='darkblue'><p> Try different input strings and see what happens.

```python
# This is your input string; just make sure the target term 'breast cancer' is included.
# (Named input_text rather than str, so the built-in str type is not shadowed.)
input_text = '''mother does not have breast cancer'''
classifier.predict(input_text)

view_pycontext_output(classifier.get_last_context_doc(), vis)
```

## 6. Export context annotations to CSV

<font color='darkblue'><p>Let's export the context annotations to CSV format. Because the original context markups are stored as graphs, which are not easy to parse, you may want to reuse the "convertMarkups2DF" function in "visual.py", which makes this task much easier.

```python
from pyConTextNLP.utils import get_document_markups
from visual import convertMarkups2DF
import csv
```

<font color='darkblue'><p>Let's read the latest markups from the classifier's cache and convert them into pandas dataframes.

```python
markups = get_document_markups(classifier.get_last_context_doc())
annotations, relations, doc_txt = convertMarkups2DF(markups)
```


<font color='darkblue'><p>See what's inside the "annotations" dataframe:

```python
annotations
```

<font color='darkblue'><p>And what's inside the "relations" dataframe:

```python
relations
```

<font color='darkblue'><p>Now we combine these two dataframes into a dictionary, in which modifiers are attached to their corresponding target concepts.

```python
concepts = dict()
# read all the target concepts: markup_id -> [type, start, end, text]
for _, row in annotations.iterrows():
    if row['vis_category'] == 'Target':
        concepts[row['markup_id']] = [row['type'], int(row['start']), int(row['end']), row['txt']]

# attach each related modifier's type to the target concept it modifies
for _, row in relations.iterrows():
    if row['arg2_cate'] == 'Target' and row['arg2_id'] in concepts:
        concepts[row['arg2_id']].append(row['type'])
```

```python
concepts
```

<font color='darkblue'><p>Write the dictionary values into a csv file:

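```python
import csv
import os

os.makedirs('tmp', exist_ok=True)  # make sure the output folder exists
with open('tmp/output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    # one row per target: type, start, end, text, then any modifier types
    writer.writerows(concepts.values())
```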

A new file has now been created at tmp/output.csv. Open the tmp folder and review it.

### Have questions? Please ask!