## Exercise 1:
In class, we walked through a rule-based NLP pipeline by executing its components one by one. In practice, a pipeline is usually packaged as a single class that links all of the modules.

In this exercise, you will write a pipeline class called **MyPipe**. It is initialized with a set of rules, and its **process** method takes a document's text and returns the following values:
- the document-level classification;
- the context document, an object of type `pyConTextGraph.ConTextDocument`;
- the annotations, as a pandas DataFrame;
- the relations, as a pandas DataFrame.
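To make the class structure concrete before touching the real libraries, here is a toy sketch of the same pattern in plain Python. `ToyPipe` and all of its components are hypothetical stand-ins, not the course modules: a regex sentence splitter replaces RuSH, keyword matching with a crude same-sentence negation check replaces pyConTextNLP, and a single "any affirmed mention" rule replaces the DocumentClassifier modules. It returns plain lists rather than a ConTextDocument and DataFrames.

```python
import re


class ToyPipe:
    """Toy rule-based pipeline. Every component is a simplified, hypothetical
    stand-in for RuSH, pyConTextNLP, and the DocumentClassifier modules."""

    def __init__(self, targets, negation_cues, positive_class, default_class):
        # the "rules" are stored once and reused for every document
        self.targets = [t.lower() for t in targets]
        self.negation_cues = [c.lower() for c in negation_cues]
        self.positive_class = positive_class
        self.default_class = default_class

    def segment(self, text):
        # naive sentence splitter: break after '.', '!' or '?' followed by whitespace
        return [s for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

    def process(self, doc_text):
        annotations, relations = [], []
        for sentence in self.segment(doc_text):
            lowered = sentence.lower()
            for target in self.targets:
                for match in re.finditer(r'\b%s\b' % re.escape(target), lowered):
                    ann_id = 'T%d' % len(annotations)
                    # 'start' is an offset within the sentence, not the document
                    annotations.append({'id': ann_id, 'txt': target, 'start': match.start()})
                    # forward negation: a cue anywhere earlier in the same sentence
                    # (no scope termination, unlike pyConTextNLP)
                    for cue in self.negation_cues:
                        if re.search(r'\b%s\b' % re.escape(cue), lowered[:match.start()]):
                            relations.append({'type': 'negated_by', 'arg1': ann_id, 'cue': cue})
                            break
        negated = {rel['arg1'] for rel in relations}
        affirmed = [ann for ann in annotations if ann['id'] not in negated]
        # document level: positive if at least one target mention is not negated
        doc_class = self.positive_class if affirmed else self.default_class
        return doc_class, annotations, relations


# same class, two different rule sets
fever_pipe = ToyPipe(['fever'], ['no', 'denies'], 'FEVER_DOC', 'NO_FEVER')
cough_pipe = ToyPipe(['cough'], ['no', 'denies'], 'COUGH_DOC', 'NO_COUGH')

note = 'Patient denies fever. Cough for 3 days.'
print(fever_pipe.process(note)[0])                         # NO_FEVER
print(cough_pipe.process(note)[0])                         # COUGH_DOC
print(fever_pipe.process('Spiked a fever overnight.')[0])  # FEVER_DOC
```

The design point is that the rules go to `__init__` once, and `process` can then be called on any number of documents. Swapping in a different rule set, as `cough_pipe` does, yields a new pipeline without changing the class.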

In [1]:
```python
# state your imports here
from PyRuSH.RuSH import RuSH
from pyConTextNLP import pyConTextGraph
from pyConTextNLP.utils import get_document_markups

# local helper modules
from DocumentClassifier import FeatureInferencer
from DocumentClassifier import DocumentInferencer
from nlp_pneumonia_utils import markup_sentence
from itemData import get_item_data
from visual import convertMarkups2DF
```

In [12]:
```python
# begin to define MyPipe class
class MyPipe:
    def __init__(self, sentence_rules, target_rules, context_rules,
                 feature_inference_rule, document_inference_rule):
        # initiate the components once, so they can be reused for every document
        self.sentence_segmenter = RuSH(sentence_rules)
        self.feature_inferencer = FeatureInferencer(feature_inference_rule)
        self.document_inferencer = DocumentInferencer(document_inference_rule)
        self.targets = get_item_data(target_rules)
        self.modifiers = get_item_data(context_rules)

    def process(self, doc_text):
        # 1. split the document into sentence spans
        sentences = self.sentence_segmenter.segToSentenceSpans(doc_text)
        context_doc = pyConTextGraph.ConTextDocument()

        # 2. mark up targets and modifiers sentence by sentence
        for sentence in sentences:
            sentence_text = doc_text[sentence.begin:sentence.end].lower()
            m = markup_sentence(sentence_text, modifiers=self.modifiers, targets=self.targets)
            context_doc.addMarkup(m)

        # 3. convert the markups into annotation and relation DataFrames
        markups = get_document_markups(context_doc)
        annotations, relations, doc_txt = convertMarkups2DF(markups)

        # 4. infer mention-level features, then the document-level class
        inferenced_types = self.feature_inferencer.process(annotations, relations)
        doc_class = self.document_inferencer.process(inferenced_types)

        return doc_class, context_doc, annotations, relations
```

Once your pipeline class is defined, you can reuse it with different sets of rules.

## Exercise 2:
Now select documents from the MIMIC database (at most 5 documents that contain your target concept), write a script that processes all of them, and output a dictionary that uses document names as keys and document-level classifications as values.

In [13]:
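```python
import getpass

import pandas as pd
import pymysql

# connect to the MIMIC-II database; the password is prompted for, not stored in the notebook
conn = pymysql.connect(host="mysql", port=3306, user="jovyan",
                       password=getpass.getpass("Enter MySQL password for jovyan: "),
                       database="mimic2")
```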

In [14]:
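```python
# randomly sample 5 notes that mention the target concept;
# the spaces around 'fever' in the LIKE pattern skip forms such as 'fever,' or 'fever.'
doc_text = pd.read_sql("SELECT subject_id, text FROM noteevents "
                       "WHERE text LIKE '% fever %' ORDER BY RAND() LIMIT 5", conn)
doc_text
```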

| | subject_id | text |
|---|---|---|
| 0 | 7440 | `\n\n\n DATE: [**3456-3-5**] 4:06 PM\n ...` |
| 1 | 27231 | `\n\n\n DATE: [**2887-8-22**] 1:42 AM\n ...` |
| 2 | 1604 | `\n\n\n DATE: [**2883-6-15**] 12:54 PM\n ...` |
| 3 | 9969 | `\n\n\n DATE: [**2926-7-17**] 11:38 PM\n ...` |
| 4 | 21857 | `\ntrach, fever\nsicu npn (in ccu)\no- t max 10...` |


In [15]:
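```python
# look up columns by name; positional access such as row[0] is deprecated for labeled Series
for index, row in doc_text.iterrows():
    print(index, row['subject_id'], row['text'])
```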

```
0 7440 


     DATE: [**3456-3-5**] 4:06 PM
     CT CHEST W/CONTRAST; CT ABDOMEN W/CONTRAST                      Clip # [**Clip Number (Radiology) 14506**]
     CT PELVIS W/CONTRAST; CT 150CC NONIONIC CONTRAST
     Reason: Please eval for empyema, abscess, loculated infections. Note
     Admitting Diagnosis: PNEUMONIA;TELEMETRY
     ______________________________________________________________________________
     UNDERLYING MEDICAL CONDITION:
      74 year old man with ESRD on HD, dementia with presumed aspiration pneumonia 
      with RML and RLL collapse and mental status changes, now with new and 
      persistently elevated WBC.
     REASON FOR THIS EXAMINATION:
      Please eval for empyema, abscess, loculated infections. Note that pt has ESRD 
      but is on HD and per renal fellow he may receive contrast.
     No contraindications for IV contrast
     ______________________________________________________________________________
                                     FINAL REPO
```

In [16]:
```python
# configure your rules
# because there are too many sentence segmentation rules, let's read them from an external file
sentence_rules = 'KB/rush_rules.tsv'
# you can point target_rules to a file path instead, if there are many rules;
# a raw string keeps the regex backslashes (\d, \.) intact without invalid-escape warnings
target_rules = r'''
Comments: ''
Direction: ''
Lex: fever
Regex: ''
Type: FEVER
---
Comments: ''
Direction: ''
Lex: high temperature
Regex: '1\d\d\.\d F'
Type: FEVER'''
# context rules are often lengthy, so you can point context_rules to an external rule file instead
context_rules = '''Comments: ''
Direction: forward
Lex: 'no'
Regex: ''
Type: DEFINITE_NEGATED_EXISTENCE
---
Comments: ''
Direction: forward
Lex: 'denies'
Regex: ''
Type: DEFINITE_NEGATED_EXISTENCE
'''
# define the feature inference rule
feature_inference_rule = '''
#Conclusion type, Evidence type, Modifier values associated with the evidence
NEGATED_CONCEPT,FEVER,DEFINITE_NEGATED_EXISTENCE
'''
# define the document inference rule
document_inference_rule = '''
#Conclusion Type at document level, Evidence type at mention level
FEVER_DOC,FEVER

#Default document type
NO_FEVER
'''
```
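The two inference rule strings are easier to read with their evaluation written out. The sketch below is one plausible reading of the format, based only on the comment lines inside the rules. `parse_rules`, `infer_features`, and `infer_document` are hypothetical helpers, not the `DocumentClassifier` API, and the real module may differ, for example in how it normalizes case or orders several document-level rules.

```python
# Hypothetical re-implementation of the rule formats above; not the DocumentClassifier API.
FEATURE_RULES = '''
#Conclusion type, Evidence type, Modifier values associated with the evidence
NEGATED_CONCEPT,FEVER,DEFINITE_NEGATED_EXISTENCE
'''
DOCUMENT_RULES = '''
#Conclusion Type at document level, Evidence type at mention level
FEVER_DOC,FEVER

#Default document type
NO_FEVER
'''


def parse_rules(rule_text):
    """Return each non-blank, non-comment line as a list of comma-separated fields."""
    rows = []
    for line in rule_text.splitlines():
        line = line.strip()
        if line and not line.startswith('#'):
            rows.append([field.strip() for field in line.split(',')])
    return rows


def infer_features(mentions, feature_rules):
    """mentions: list of (type, set of modifier types).
    A mention whose type equals a rule's evidence type and that carries all of
    the rule's modifiers is relabeled with the rule's conclusion type."""
    rules = parse_rules(feature_rules)
    inferred = []
    for mention_type, modifiers in mentions:
        new_type = mention_type
        for conclusion, evidence, *required in rules:
            if mention_type == evidence and set(required) <= modifiers:
                new_type = conclusion
                break  # assumption: the first matching rule wins
        inferred.append(new_type)
    return inferred


def infer_document(mention_types, document_rules):
    """Return the conclusion of the first two-field rule whose evidence type is
    present; otherwise return the single-field default type."""
    rows = parse_rules(document_rules)
    default = next(row[0] for row in rows if len(row) == 1)
    for row in rows:
        if len(row) == 2 and row[1] in mention_types:
            return row[0]
    return default


# one negated and one affirmed fever mention
mentions = [('FEVER', {'DEFINITE_NEGATED_EXISTENCE'}), ('FEVER', set())]
types = infer_features(mentions, FEATURE_RULES)
print(types)                                                # ['NEGATED_CONCEPT', 'FEVER']
print(infer_document(types, DOCUMENT_RULES))                # FEVER_DOC
print(infer_document(['NEGATED_CONCEPT'], DOCUMENT_RULES))  # NO_FEVER
```

Under this reading, a note is labeled FEVER_DOC as soon as one fever mention survives negation, so a single negated mention does not flip a note to NO_FEVER when another mention is affirmed.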

In [17]:
```python
# initiate an instance of MyPipe
myPipe = MyPipe(sentence_rules, target_rules, context_rules, feature_inference_rule, document_inference_rule)
```

In [35]:
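```python
# write your code here: process each document through MyPipe
results = dict()      # row index -> full MyPipe.process output (doc_class, context_doc, annotations, relations)
doc_classes = dict()  # document name -> document-level classification, as the exercise asks
for index, row in doc_text.iterrows():
    output = myPipe.process(row['text'])
    results[index] = output
    # subject_id alone can repeat across notes, so the row index is appended to keep keys unique
    doc_classes['{}_{}'.format(row['subject_id'], index)] = output[0]
```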

In [37]:
```python
print(results[1])
```

```
('fever_doc', __________________________________________
,   markup_id vis_category  start   end    txt   type
0        T0       Target    8.0  13.0  fever  fever
1        T1       Target   83.0  88.0  fever  fever, Empty DataFrame
Columns: [relation_id, type, arg1_cate, arg1_id, arg2_cate, arg2_id]
Index: [])
```


## Exercise 3:

Now you have the results, but how can you be sure they are correct? Let's dig a little deeper by visualizing them.

Hint: **view_pycontext_output** can take in a list of ConTextDocuments and visualize them.


In [28]:
```python
from visual import view_pycontext_output
from visual import Vis
```

In [42]:
```python
# get the context documents that you created in Exercise 2:
# element 1 of each MyPipe.process output is a pyConTextGraph.ConTextDocument
context_docs = [output[1] for output in results.values()]
```


In [41]:
```python
# visualize all documents
view_pycontext_output(context_docs)
```

Change the file name to contain your UNID.

In [25]:
```python
view_pycontext_output(context_docs, Vis(file_name="u0420717.html"))
```


***Please take a screenshot of the visualization file and submit it as your homework assignment.***