# Implementation

A system design outlines the main processing steps of a system. The same design can be implemented in multiple ways.

### Consider the following use case
We are tasked with finding possible symptoms of depression and labeling each one with its negation context.

To simplify future implementations, as suggested by [Leidner 2003](Leidner_2003.pdf), we can use a framework. A framework (as opposed to a toolkit such as NLTK) does not necessarily provide specific algorithms; instead, it simplifies working with different algorithms by specifying data formats.

This workbook illustrates a simple framework called [pipeUtils](pipeUtils_doc.html) that defines how data is represented during processing.

The framework has two main classes: Annotation and Document.


In [1]:
from pipeUtils import Annotation
from pipeUtils import Document

Document is a container for all data related to a specific text passage. In its simplest form, the document text can either be specified as a string:
    
    from pipeUtils import Document
    doc = Document(text='This is the document text')
    
or loaded from a text file:

In [2]:
doc = Document()
doc.load_document_from_file('data/test.txt')
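
To make the idea of a framework-defined data format concrete, here is a minimal sketch of what such containers could look like. The names `SketchAnnotation` and `SketchDocument` are hypothetical and are not the pipeUtils implementation; the fields simply mirror the attributes used in this workbook.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchAnnotation:
    """Hypothetical stand-in for an annotation: a typed character span."""
    start_index: int
    end_index: int
    type: str
    ann_id: str
    spanned_text: str = ""
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document: text plus its annotations."""
    text: str = ""
    annotations: List[SketchAnnotation] = field(default_factory=list)

    def load_document_from_file(self, path: str) -> None:
        # Read the whole file as the document text.
        with open(path, encoding="utf-8") as handle:
            self.text = handle.read()


sketch_doc = SketchDocument(text="This is the document text")
sketch_doc.annotations.append(
    SketchAnnotation(start_index=12, end_index=20, type="Example", ann_id="T1",
                     spanned_text=sketch_doc.text[12:20])
)
print(sketch_doc.annotations[0].spanned_text)  # prints "document"
```

Because every processing step reads and writes the same two structures, an algorithm can be swapped out without changing the code around it.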

Since we are working with the brat annotation tool, the framework can read brat annotations into the document. Each annotation becomes an object of the Annotation class.

In [3]:
doc.load_annotations_from_brat('data/test.ann')
print(doc.toString())

test.txt
-------
General Medical Clinic
05/28/2010 13:00


CC
Follow up depression.

Subjective

Depression
The pt indicates Citalopram is helping control her depression symptoms but she continues to feel depressed most days.  Her sleep and fatigue have improved significantly with use of Citalopram.  She denies suicidal ideation.  Her PHQ-9 score is 18 today.

Hypertension
No Light-headedness.  The pt reports compliance with use of lisinopril and metoprolol.  She has been on these two medications for several years and has never used any other antihypertensive medications in her life.  She has not been checking her BP at home.

Osteoarthritis
Knee pain is well controlled currently.  No knee pain today.

Coronary Artery Disease
No angina.  No dyspnea.


Allergies
NKDA

PMH
Depression
Hypertension
Iron deficiency anemia
Osteoarthritis
Coronary Artery Disease
Hyperlipidemia

PSurgHx
None 

FamHx
None significant

SocHx
Lifetime non-user of tobacco.
Drinks alcohol rarely.
Has 5 adult chil
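
For readers unfamiliar with brat, its standoff `.ann` files store one annotation per line; a text-bound annotation has a tab-separated ID, a type with character offsets, and the covered text. The sketch below shows one way such lines could be parsed. It is not how `load_annotations_from_brat` is implemented; the function name `parse_brat_text_bound` and the sample line are made up for illustration and are not taken from `data/test.ann`.

```python
from typing import List, Tuple


def parse_brat_text_bound(ann_text: str) -> List[Tuple[str, str, int, int, str]]:
    """Return (id, type, start, end, covered_text) for each continuous T line."""
    spans = []
    for line in ann_text.splitlines():
        if not line.startswith("T"):
            continue  # skip attributes, relations, notes, and blank lines
        ann_id, type_and_offsets, covered_text = line.split("\t", 2)
        if ";" in type_and_offsets:
            continue  # discontinuous spans are ignored in this sketch
        ann_type, start, end = type_and_offsets.split(" ")
        spans.append((ann_id, ann_type, int(start), int(end), covered_text))
    return spans


# Illustrative input only: one text-bound annotation and one attribute line.
sample_ann = "T1\tDepressionSymptoms 10 20\tdepression\nA1\tNegation T1\n"
brat_spans = parse_brat_text_bound(sample_ann)
print(brat_spans)
```

Each parsed tuple carries the same information an Annotation object needs: an ID, a type, and the start and end offsets into the document text.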

Now the document is loaded and the reference standard annotations are added.
Let's add the processing logic.

First, define the target and negation keywords.

In [None]:
import re

target_regexes = []
regexes = ['pain',
           'depres\\w+',            # "depres" followed by one or more word characters; the backslash is doubled inside a regular string
           'suicidal\\s*ideation'   # \\s* matches zero or more whitespace characters, including newlines
           ]
for reg in regexes:
    target_regexes.append(re.compile(reg, re.IGNORECASE))

neg_regex = '(\\bno\\b|denies)'   # \b matches a word boundary; matches "no" as a separate word, OR "denies"
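
To see what these patterns actually match, the following sketch runs them against a short made-up sentence (the variable `regex_sample` is illustrative and not part of the test document). It uses raw strings (`r'...'`), which are equivalent to the doubled backslashes above.

```python
import re

# Made-up sample text; note the newline between "suicidal" and "ideation".
regex_sample = "She denies suicidal\nideation but feels depressed. No knee pain today."

patterns = [r"pain", r"depres\w+", r"suicidal\s*ideation"]

# Collect matches pattern by pattern, in the same order as the list above.
found = [m.group(0)
         for pattern in patterns
         for m in re.finditer(pattern, regex_sample, re.IGNORECASE)]
print(found)      # ['pain', 'depressed', 'suicidal\nideation']

# With a single capturing group, findall returns the group contents.
negations = re.findall(r"(\bno\b|denies)", regex_sample, re.IGNORECASE)
print(negations)  # ['denies', 'No']

# The word boundaries keep "no" from matching inside longer words.
inside_word = re.search(r"\bno\b", "unknown", re.IGNORECASE)
print(inside_word)  # None
```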

Second, search the document for the target keywords.
If a target keyword is found, look at the text immediately before it (up to 30 characters back) and search for a negation keyword.

In [None]:
ann_index = 0
for reg in target_regexes:
    for match in reg.finditer(doc.text):
        ann_id = 'NLP_' + str(ann_index)
        ann_index += 1
        new_annotation = Annotation(start_index=match.start(),
                                    end_index=match.end(),
                                    type='DepressionSymptoms',
                                    ann_id=ann_id
                                    )
        new_annotation.spanned_text = doc.text[new_annotation.start_index:new_annotation.end_index]
        
        # Check for negation in the text immediately before the target, up to 30 characters back,
        # clamping the window start so that it never falls before the beginning of the text
        pre_text_start = max(0, new_annotation.start_index - 30)
        
        # ending index of the pre_text is the beginning of the found target    
        pre_text_end = new_annotation.start_index    
        
        # substring the document text to identify the pre_text string
        pre_text = doc.text[pre_text_start: pre_text_end]
        
        # We do not need to know the exact location of the negation keyword, so re.search is acceptable
        if re.search(neg_regex, pre_text, re.IGNORECASE):
            new_annotation.attributes["Negation"] = 'Negated'
        doc.annotations.append(new_annotation)
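
The same look-back logic can be tried without pipeUtils. The sketch below is a simplified, standalone variant under stated assumptions: it merges the target patterns into one alternation, so results come back in text order rather than pattern by pattern, and it returns plain tuples instead of Annotation objects. The function name `find_targets` and the sample sentence are hypothetical.

```python
import re
from typing import List, Tuple

TARGET = re.compile(r"pain|depres\w+|suicidal\s*ideation", re.IGNORECASE)
NEGATION = re.compile(r"\bno\b|denies", re.IGNORECASE)


def find_targets(text: str, window: int = 30) -> List[Tuple[int, int, str, bool]]:
    """Return (start, end, matched_text, negated) for every target keyword."""
    results = []
    for match in TARGET.finditer(text):
        # Look back at most `window` characters, never before the text start.
        pre_text = text[max(0, match.start() - window):match.start()]
        negated = NEGATION.search(pre_text) is not None
        results.append((match.start(), match.end(), match.group(0), negated))
    return results


window_sample = "She denies suicidal ideation. Knee pain is well controlled."
hits = find_targets(window_sample)
print(hits)
# [(11, 28, 'suicidal ideation', True), (35, 39, 'pain', False)]

# A wider window reaches back across the sentence boundary to "denies".
wide_hits = find_targets(window_sample, window=40)
print(wide_hits[1])  # (35, 39, 'pain', True)
```

The second call illustrates a limitation of a fixed character window: it ignores sentence boundaries, so the window size directly affects which targets are marked as negated.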


Now let's see which annotations the document contains after processing:

In [None]:
print(doc.toString())

### End of regex implementation