# Extract the temperature values

In the previous notebook, we learned how pyConText works and how to customize the rules.

In this notebook, we will learn:
* how to inspect packages developed by other people.
* how to write your own code to extract the results that pyConText produces.

## Inspect pyConText output
The pyConText GitHub project does not document its API in much detail. This is common for less popular packages, especially research projects. To use the package, we need to inspect the code more closely, starting from its output.

In [1]:
import pyConTextNLP.pyConText as pyConText
# itemData has been rewritten to accept a relative local path, so you can point it at your customized yml files later
import itemData

In [2]:
my_targets=itemData.get_items('KB/targets.yml')
my_modifiers=itemData.get_items('KB/modifiers.yml')

Copy the functions *markup_sentence* and *markup_doc* (your revised versions) from the previous notebook into the cells below.

In [3]:
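def markup_sentence(s, modifiers, targets, prune_inactive=True):
    """
    Mark up one sentence: tag modifiers and targets, apply the modifiers
    to the targets within their scope, and return the ConTextMarkup object.
    """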
    markup = pyConText.ConTextMarkup()
    markup.setRawText(s)
    markup.cleanText()
    markup.markItems(modifiers, mode="modifier")
    markup.markItems(targets, mode="target")
    # pruneMarks() is left disabled: it may cause unexpected edge-drop errors
    # markup.pruneMarks()
    markup.dropMarks('Exclusion')
    # apply modifiers to any targets within the modifiers scope
    markup.applyModifiers()
    markup.pruneSelfModifyingRelationships()
    if prune_inactive:
        markup.dropInactiveModifiers()
    return markup

In [4]:
from nltk.tokenize import sent_tokenize
def markup_doc(doc_text:str):
    """Split the text into sentences and collect their markups in one ConTextDocument."""
    context = pyConText.ConTextDocument()
    for s in sent_tokenize(doc_text):
        m = markup_sentence(s, modifiers=my_modifiers, targets=my_targets)
        context.addMarkup(m)
    return context


Now process the same example below.

In [5]:
doc_text='''vital signs 
were 52, 218/109, T:38 Celsius, O2 Sat 100% on 100% FiO2'''

In [6]:
context=markup_doc(doc_text)

To find out the type of the output variable, we use:

In [7]:
type(context)

pyConTextNLP.pyConText.ConTextDocument

This is a Python class. We can now go back to the source code to see which methods and variables the class provides.

https://github.com/chapmanbe/pyConTextNLP/blob/master/pyConTextNLP/pyConText.py

Of course, you will find a lot of methods and variables. To find the one you need, you can:
1. make your best guess about which one is likely to generate the data you want.
2. find out whether this class is used somewhere else, and learn how it is used.

Hint for option 2: you can start from either [visual.py](visual.py) or https://github.com/chapmanbe/pyConTextNLP/blob/master/pyConTextNLP/display/html.py 
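
One way to support option 1 is to list a class's public methods together with their signatures before opening the source. The sketch below uses only the standard `inspect` module. `list_public_methods` is a hypothetical helper name (it is not part of pyConText), and it is demonstrated on a standard-library class so that it runs without pyConTextNLP installed. Passing `pyConText.ConTextDocument` instead should work the same way.

```python
import inspect
from fractions import Fraction


def list_public_methods(cls) -> dict:
    """Map each public method name of `cls` to its signature as a string."""
    methods = {}
    for name, member in inspect.getmembers(cls, predicate=inspect.isroutine):
        if name.startswith('_'):
            continue  # skip private and dunder methods
        try:
            methods[name] = str(inspect.signature(member))
        except (ValueError, TypeError):
            # some built-in routines do not expose a signature
            methods[name] = '(...)'
    return methods


# Demonstrated on a standard-library class; swap in pyConText.ConTextDocument
for name, sig in list_public_methods(Fraction).items():
    print(name, sig)
```

Method names that start with `get` are usually good first candidates for reading results out of an object.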

## Task 1: Read identified targets and modifiers

Let's define a function *get_output* that takes a ConTextDocument object as input and outputs the targets and modifiers.

Make sure you preserve the linkage between the targets and corresponding modifiers. You can use any **data structure** (list, dictionary, etc.) you prefer to store the information.

However, keep the next step in mind: you will need to check each target to see whether it is modified by a specific modifier. A suitable data structure can speed up that check in Task 2.
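
To make the trade-off concrete, here is a minimal sketch of one possible structure: a dictionary keyed by modifier category whose values are sets of targets. Plain `(modifier_category, target_phrase)` tuples stand in for the graph edges. They are made-up illustrative data, not real pyConText objects, and `group_targets_by_modifier` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def group_targets_by_modifier(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group target phrases under the category of the modifier that points at them."""
    grouped = defaultdict(set)
    for modifier_category, target_phrase in edges:
        grouped[modifier_category].add(target_phrase)
    return dict(grouped)


# Illustrative stand-in edges (modifier category -> target phrase)
toy_edges = [('degree', '38'), ('degree', '38'), ('terminate', 'T:')]
grouped = group_targets_by_modifier(toy_edges)

# Finding all targets with a given modifier is a single dictionary lookup
print(grouped.get('degree', set()))
```

With this layout the linkage is preserved by the key, duplicates collapse automatically because the values are sets, and a missing modifier category simply yields an empty set.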

In [25]:
## your code goes here
def get_output(context:pyConText.ConTextDocument)->dict:
    output=dict()
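    dg=context.getDocumentGraph()
    # each edge runs from a start node (the modifier) to the end node it modifies
    for s_node, e_node in dg.edges():
        # key the output by the first category of the start node
        category=s_node.getCategory()[0]
        if category not in output:
            output[category]=set()
        output[category].add(e_node)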
    return output

In [26]:
# Test your code:
get_output(context)

{'degree': {<id> 37120452511672786194331496608650436368 </id> <phrase> 38 </phrase> <category> ['number'] </category> },
 'terminate': {<id> 37119049380914658572912714945289985808 </id> <phrase> T: </phrase> <category> ['degree'] </category> ,
  <id> 37119184861072557964929999905445060368 </id> <phrase> T </phrase> <category> ['degree'] </category> ,
  <id> 37118644525004210682147611935703768848 </id> <phrase> Celsius </phrase> <category> ['degree'] </category> }}

## Task 2: Extract the numeric values of identified temperatures

Now that you have extracted the targets and modifiers, find the temperature values. Complete a function that takes in a document and outputs a list of [float](https://docs.python.org/3/library/functions.html#float) values. In your future project, you can build features on these numeric values.
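
Matched phrases are strings, so they need to be converted before they can serve as numeric features. As a hedged sketch, the hypothetical helper `phrases_to_floats` below converts a collection of phrases and skips any that `float()` cannot parse. Whether silently skipping is the right policy depends on your rules, and the input strings are made up for illustration.

```python
from typing import Iterable, List


def phrases_to_floats(phrases: Iterable[str]) -> List[float]:
    """Convert phrase strings to floats, skipping any that are not numeric."""
    values = []
    for phrase in phrases:
        try:
            values.append(float(phrase.strip()))
        except ValueError:
            continue  # e.g. a stray token such as 'T:' that matched by mistake
    return values


print(phrases_to_floats(['38', ' 36.6 ', 'T:']))  # [38.0, 36.6]
```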

In [27]:
from typing import List
def extract_temperature(doc_text:str)->List[float]:
#  your code goes here    
    context=markup_doc(doc_text)
    output=get_output(context)
    # use .get so that documents without any 'degree' match return an empty list
    values=[float(v.getPhrase()) for v in output.get("degree", ())]
    return values

In [28]:
### Test your code
extract_temperature(doc_text)

[38.0]

## In fact, you can write much simpler code that handles the two tasks above in one line
This is optional.

In [29]:
from typing import List
def extract_temperature2(doc_text:str)->List[float]:
#  your code goes here   
    return [float(v.getPhrase()) for v in set(k[1] for k in markup_doc(doc_text).getDocumentGraph().edges() if 'degree'==k[0].getCategory()[0])]

In [30]:
## test your code
extract_temperature2(doc_text)

[38.0]

## Now run your function over the corpus

Run your *extract_temperature* or *extract_temperature2* over the corpus (unzip the files first). Return a dictionary with document names as keys and lists of extracted temperature values as values.

In [31]:
# your code goes here
from pathlib import Path
{f.name: extract_temperature2(f.read_text()) for f in Path('data').glob('*.txt')}

{'case_10_PULSE_OX.txt': [35.6, 36.4],
 'case_11_SP02.txt': [37.6, 37.7],
 'case_13_Saturation.txt': [35.8],
 'case_14_Sat.txt': [96.8, 97.1],
 'case_14_Saturation.txt': [],
 'case_15_Sat.txt': [99.7, 37.6, 37.6],
 'case_16_Sat.txt': [36.4, 36.4],
 'case_16_Saturation.txt': [],
 'case_17_PULSE_OX.txt': [37.8, 37.8, 95.7],
 'case_17_Saturation.txt': [],
 'case_18_SP02.txt': [38.1, 38.1],
 'case_18_Sat.txt': [],
 'case_18_Saturation.txt': [],
 'case_19_PULSE_OX.txt': [36.6, 36.6],
 'case_19_Sat.txt': [35.7, 36.4, 97.5],
 'case_19_Saturation.txt': [],
 'case_1_SP02.txt': [37.1, 36.8],
 'case_1_Sat.txt': [],
 'case_20_SP02.txt': [37.5, 37.3],
 'case_20_Sat.txt': [],
 'case_21_PULSE_OX.txt': [35.9, 35.9],
 'case_22_SP02.txt': [98.0, 97.3, 38.1, 38.1],
 'case_23_SP02.txt': [38.9, 99.5, 102.0, 38.9],
 'case_23_Saturation.txt': [98.8],
 'case_24_SP02.txt': [98.0, 37.0, 37.0],
 'case_24_Sat.txt': [98.4],
 'case_25_PULSE_OX.txt': [35.9, 35.9, 96.6, 35.9, 102.0, 35.9],
 'case_25_SP02.txt': [101.7

['data/case_60_Saturation.txt'](data/case_42_PULSE_OX.txt)

In [15]:
from visual import Vis, view_pycontext_output

In [16]:
vis=Vis('KB/modifiers.yml')

In [40]:
my_modifiers=itemData.get_items('KB/modifiers.yml')
view_pycontext_output(markup_doc(Path('data/case_10_PULSE_OX.txt').read_text()),vis)