# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below.  For example, you can read CSV files using Pandas or the csv module.  Your score won't depend on which modules you choose to use unless explicitly noted below, but your programming style will still matter.

### 30.1 List of Allergies

In the /data directory on the Jupyter server, there is a file called `allergies.json` that contains a list of patient allergies.  It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`

Write a function named `allergy_count(json_file)` that takes the name of the JSON file as its only parameter and returns the number of entries in that file as an integer.  Your function should open the file, read the JSON into a Python object, and return how many items there are in the `entry` list.
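Before writing the function, it can help to check the top-level structure programmatically rather than by eye.  The snippet below is a minimal sketch that uses an invented, much smaller JSON string in place of `allergies.json`; the helper name `describe_top_level` and the sample values are assumptions made for illustration only.

```python
import json

# Invented miniature of the file's shape; the real allergies.json has many more fields.
SAMPLE = """
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}}
  ]
}
"""


def describe_top_level(obj):
    """Return {key: type name} for each top-level item of a parsed JSON object."""
    return {key: type(value).__name__ for key, value in obj.items()}


data = json.loads(SAMPLE)
summary = describe_top_level(data)
print(summary)              # {'resourceType': 'str', 'entry': 'list'}
print(len(data["entry"]))   # 2
```

A JSON array always becomes a Python `list` and a JSON object becomes a `dict`, so the type names tell you which items can be counted with `len()`.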

In [1]:
import json
from pathlib import Path
HOME = str(Path.home())

ALLERGIES_FILE="/data/allergies.json"

In [2]:
### BEGIN SOLUTION
def allergy_count(json_file):
    """(file name) -> integer
    
    Input parameter is a JSON file name.  The function opens the file, reads it into a dictionary, 
    and counts how many entries are under dictionary key "entry".
    
    """
    # Use a context manager so the file is closed after reading
    with open(json_file) as f:
        data = json.load(f)
    
    return len(data['entry'])
### END SOLUTION

In [3]:
allergy_count(ALLERGIES_FILE)

4

In [4]:
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4

### 30.2 Number of Patients

If you dig a little bit deeper into this list of allergies, you'll see that each result has a patient associated with it.  Create a function called `patient_count(json_file)` that will count how many unique patients we have in this JSON structure.  

In [5]:
### BEGIN SOLUTION

def patient_count(json_file):
    """(file name) -> integer
    
    Input parameter is a JSON file name.  The function opens the file, reads it into a dictionary, 
    and counts how many unique patient names appear under dictionary key "entry".
    
    """
    # Use a context manager so the file is closed after reading
    with open(json_file) as f:
        data = json.load(f)
    
    # A set keeps each patient name only once
    names = set()
    
    for item in data['entry']:
        names.add(item['resource']['patient']['display'])
    
    return len(names)

### END SOLUTION

In [6]:
patient_count(ALLERGIES_FILE)

2

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them are for the same patient.  Write a function called `allergy_per_patient(json_file)` that counts up how many allergies each patient has.


In [7]:
### BEGIN SOLUTION

def allergy_per_patient(json_file):
    """(file name)->dictionary
    
    Input parameter is a JSON file name.  The function opens the file, reads it into a dictionary, 
    and returns a dictionary whose keys are patient names and whose values are the number of
    allergies each patient has.
    
    """
    with open(json_file) as f:
        data = json.load(f)
    
    allergy = {}
    
    for item in data['entry']:
        patient = item['resource']['patient']['display']
        # Start at 0 for a new patient, then add one for this entry
        allergy[patient] = allergy.get(patient, 0) + 1
    
    return allergy

### END SOLUTION

In [8]:
allergy_per_patient(ALLERGIES_FILE)

{'Jason Argonaut': 3, 'Paul Boal': 1}
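As an alternative to maintaining the dictionary by hand, the standard library's `collections.Counter` does the same tallying.  The sketch below runs on an invented stand-in for `data['entry']` (the patient names are placeholders, and `count_by_patient` is a hypothetical helper, not part of the assignment).

```python
from collections import Counter

# Invented stand-in for data['entry']; only the fields used here are included.
entries = [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}},
    {"resource": {"patient": {"display": "Patient A"}}},
]


def count_by_patient(entries):
    """Count how many entries each patient display name has."""
    tally = Counter(e["resource"]["patient"]["display"] for e in entries)
    return dict(tally)  # convert back to a plain dict for display


counts = count_by_patient(entries)
print(counts)  # {'Patient A': 2, 'Patient B': 1}
```

Either approach is fine for this exercise; `Counter` mostly saves the "is the key already there?" check.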

### 30.4 Patient Allergies and Reaction

You'll see in the file that each of the items in the `entry` list has several other attributes including a patient name, substance text representation, and a reaction manifestation.  Create a function named `allergy_list(json_file)` that will create an output list that has patient name, allergy, and reaction for each `entry`.  The actual result you should get will be:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]
```

You'll notice that the reaction and the manifestation of that reaction are lists.  You only need to capture the first reaction and the first manifestation of that reaction.  That is, if there is a list of things, just output the first one.
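The "take the first item of each list" rule can be sketched on its own.  The helper below, `first_manifestation`, is hypothetical, and its handling of missing or empty lists is an assumption that goes beyond what the exercise requires (the provided file has these lists populated).

```python
def first_manifestation(resource):
    """Return the text of the first manifestation of the first reaction, or None."""
    reactions = resource.get("reaction") or []
    if not reactions:
        return None
    manifestations = reactions[0].get("manifestation") or []
    if not manifestations:
        return None
    return manifestations[0].get("text")


# Invented sample with two reactions; only the first one is used.
sample = {
    "reaction": [
        {"manifestation": [{"text": "Hives"}, {"text": "Rash"}]},
        {"manifestation": [{"text": "Itching"}]},
    ]
}

print(first_manifestation(sample))  # Hives
print(first_manifestation({}))      # None
```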

In [9]:
import json

### BEGIN SOLUTION

def allergy_list(json_file):
    """(file name)->list
    
    Input parameter is a JSON file name.  The function opens the file, reads it into a dictionary, 
    and returns a list of lists.  Each inner list contains patient name, allergy substance, and reaction.
    
    """
    with open(json_file) as f:
        data = json.load(f)
    
    # Named rows so the local list does not shadow the function name
    rows = []
    
    for item in data['entry']:
        resource = item['resource']
        # Only the first manifestation of the first reaction is captured
        rows.append([resource['patient']['display'], resource['substance']['text'],
                     resource['reaction'][0]['manifestation'][0]['text']])
    
    return rows
### END SOLUTION

In [10]:
allergy_list(ALLERGIES_FILE)

[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]

In [11]:
output=[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]

assert allergy_list(ALLERGIES_FILE) == output


### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file,patient,substance)` that takes three parameters and returns the reaction that will happen if the patient takes the specified substance.  Solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.

In [12]:
import json

### BEGIN SOLUTION

def allergy_reaction(json_file,patient,substance):
    """(file name, str, str)-> str
    
    Input parameters are a JSON file name, a patient name, and an allergy substance.  The function
    calls allergy_list(json_file) to build a list of [patient name, allergy substance, reaction]
    entries.  If an entry matches both the patient and the substance, the function returns that
    entry's reaction; otherwise it returns None.
    
    """
    for name, allergen, reaction in allergy_list(json_file):
        if name == patient and allergen == substance:
            return reaction

    return None
    
### END SOLUTION

In [13]:
allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')

'Hives'

In [14]:
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') is None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'

---
---

# Stretch (Extra) Problem

Work on either of the stretch problems below can earn you up to 25 free points toward the midterm assignment.  That is, if you complete one of these extra problems successfully, you can skip 1 of the problems that will appear on the midterm exam coming up next week.

The midterm will be distributed 10/14 and due 10/24.



---
---

### STRETCH for October 2022 - For those looking for an additional challenge

As I've mentioned in class, CMS is now enforcing a rule around price transparency.  Every facility that takes Medicare payments is required to publish a "machine readable" file with its pricing information for a number of common procedures across all of the payers they work with.  There are two examples of such files in the `/data/` directory: `whiteriver.json` and `saline.xml`.

If you want to compare contracted prices across these two hospitals, you'll need to read in the information from both of those files into some kind of data structure, then merge the data together from those two files.  See what you can do.

See if you can create an output file that has the following fields:
* HOSPITAL
* PROCEDURE_CODE
* PAYER
* AMOUNT

If you choose to work on this, you may get stuck at some point and you won't know if you're _doing it right_. Make some assumptions. Document your questions in this notebook.



```
Procedure Code |  Description  |  Gross Charges  |  Aetna  |  QualChoice
```
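One possible way to get to the four output fields listed above is to flatten a nested dictionary into one row per hospital, procedure code, and payer, then write the rows with `csv.DictWriter`.  This is only a sketch: the nested shape `{hospital: {code: {payer: amount}}}`, the helper names, the sample values, and the choice to skip payers with no published amount are all assumptions.

```python
import csv
import io

FIELDS = ["HOSPITAL", "PROCEDURE_CODE", "PAYER", "AMOUNT"]


def flatten_prices(prices_by_hospital, payers=("Aetna", "QualChoice")):
    """Turn {hospital: {code: {payer: amount, ...}}} into a list of row dicts."""
    rows = []
    for hospital, procedures in prices_by_hospital.items():
        for code, details in procedures.items():
            for payer in payers:
                amount = details.get(payer)
                if amount is None:
                    continue  # assumption: leave out payers with no amount
                rows.append({
                    "HOSPITAL": hospital,
                    "PROCEDURE_CODE": code,
                    "PAYER": payer,
                    "AMOUNT": float(amount),  # amounts arrive as strings
                })
    return rows


def write_rows(rows, handle):
    """Write the rows as CSV to any open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


# Invented sample data; hospital names, code, and amounts are placeholders.
sample = {
    "Hospital A": {"00001": {"description": "EXAMPLE", "Aetna": "100.50", "QualChoice": "80.25"}},
    "Hospital B": {"00001": {"description": "EXAMPLE", "Aetna": "120.00", "QualChoice": None}},
}

rows = flatten_prices(sample)
buffer = io.StringIO()  # stands in for open("prices.csv", "w", newline="")
write_rows(rows, buffer)
print(buffer.getvalue())
```

A "long" layout like this (one row per payer) makes it easy to filter or pivot later, for example with Pandas.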

In [15]:
def json_dict(file_name):
    """(file_name) -> dictionary
    The input is a JSON file name.  The function returns a nested dictionary 
    including procedure code, description, grosscharge (i.e. OutpatientGrossCharge), 
    the amount that Aetna gets charged, and the amount that QualChoice gets charged.
    
    For example, this dictionary will look like:
    {'11400': {'description': 'EXC TR-EXT B9+MARG 0.5 CM<', 'grosscharge': '1106.00',
    'Aetna': '973.2800', 'QualChoice': '619.3600'}, ...}
    
    """
    # Use a context manager so the file is closed after reading
    with open(file_name) as f:   # e.g. "/data/whiteriver.json"
        data = json.load(f)

    whiteriver_dic = {}

    for each in data["root"]["StandardCharges"]:
        # One inner dictionary per procedure code
        whiteriver_dic[each["ProcedureCode"]] = {
            "description": each["Description"],
            "grosscharge": each["OutpatientGrossCharge"],
            "Aetna": each["AETNA_Outpatient"],
            "QualChoice": each["QUALCHOICE_Outpatient"],
        }

    return whiteriver_dic

In [16]:
whiteriver_dict=json_dict("/data/whiteriver.json")  
len(whiteriver_dict)

100

In [17]:
def xml_dict(file_name):
    """(file_name) -> dictionary
    The input is an XML file name, and the function returns a nested dictionary containing
    procedure code, grosscharge, description, the amount that Aetna gets charged (None if no
    Aetna contract is listed), and the amount that QualChoice gets charged (None if no
    QualChoice contract is listed).

    For example, the dictionary could look like {'11400': {'grosscharge': '4354.395833',
    'description': 'EXC TR-EXT B9+MARG 0.5 CM<', 'Aetna': '461.083333',
    'QualChoice': None}, ...}

    """
    import xml.etree.ElementTree as ET
    mytree = ET.parse(file_name)  # e.g. '/data/saline.xml'
    myroot = mytree.getroot()

    # myroot.tag       -> standard charge
    # myroot[0].tag    -> facility
    # myroot[0].attrib -> {'Name': 'SALINE MEMORIAL HOSPITAL'}
    # Result shape: {procedureCode: {grosscharge: $$$, description: ..., Aetna: $$$, QualChoice: $$$}}

    saline_dic = {}

    for x in myroot[0]:
        # x.tag is createddate, patient, patient, patient
        # x.attrib is {type:inpatient}, {etc.}

        # Iterate over elements directly; Element.getchildren() was removed in Python 3.9
        for child in x:
            # child is a Charge element, e.g. {'Type': 'DRG'} or {'Type': 'HCPCS'}
            for grandchild in child:
                # Start both payers at None so that a later, non-matching contract
                # cannot overwrite a charge that was already found
                entry = {
                    "grosscharge": grandchild.find("GrossCharge").text,
                    "description": grandchild.find("Description").text,
                    "Aetna": None,
                    "QualChoice": None,
                }

                for contracts in grandchild.findall("Contracts"):
                    for contract in contracts.findall("Contract"):
                        payer = contract.attrib.get("Payer", "")
                        if "AETNA" in payer:
                            entry["Aetna"] = contract.attrib.get("Charge")
                        if "QUALCHOICE" in payer:
                            entry["QualChoice"] = contract.attrib.get("Charge")

                saline_dic[grandchild.attrib.get('Code')] = entry
    return saline_dic


In [18]:
saline_dict=xml_dict('/data/saline.xml')
len(saline_dict)

1507
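The traversal pattern used for the XML file can be tried on a small, self-contained string first.  The sketch below uses invented XML: the `Description`, `GrossCharge`, `Contracts`, and `Contract` tags and the `Code`, `Payer`, and `Charge` attributes mirror the names used in `xml_dict`, while the `Item` tag, the root tag, and all values are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample; the "Item" tag name and every value are placeholders.
SAMPLE_XML = """
<StandardCharges>
  <Facility Name="EXAMPLE HOSPITAL">
    <Patient Type="Outpatient">
      <Charge Type="HCPCS">
        <Item Code="00001">
          <Description>EXAMPLE PROCEDURE</Description>
          <GrossCharge>100.00</GrossCharge>
          <Contracts>
            <Contract Payer="AETNA PPO" Charge="60.00"/>
            <Contract Payer="OTHER PAYER" Charge="70.00"/>
          </Contracts>
        </Item>
      </Charge>
    </Patient>
  </Facility>
</StandardCharges>
"""


def payer_charges(item):
    """Map each Contract's Payer attribute to its Charge attribute for one item."""
    return {c.get("Payer"): c.get("Charge") for c in item.iter("Contract")}


root = ET.fromstring(SAMPLE_XML.strip())

result = {}
# iter() walks all descendants, so no nested loops over children are needed here
for item in root.iter("Item"):
    result[item.get("Code")] = {
        "description": item.findtext("Description"),
        "grosscharge": item.findtext("GrossCharge"),
        "contracts": payer_charges(item),
    }

print(result)
```

Collecting every payer into its own dictionary first, and picking out specific payers afterward, avoids having one contract's value overwritten while looping over the others.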

In [19]:
# We want to find procedure codes that exist in both whiteriver_dict and saline_dict and
# return a final_file (a nested dictionary) that contains only the matched procedure codes 
# and the rest of the cost information associated with these codes from the two dictionaries.  

# for example, the final_file will look like:
# {'Whiteriver': {'11400': {'description': 'EXC TR-EXT B9+MARG 0.5 CM<',
#    'grosscharge': '1106.00',
#    'Aetna': '973.2800',
#    'QualChoice': '619.3600'},... 'Saline': {'11400': {'grosscharge': '4354.395833',
#    'description': 'EXC TR-EXT B9+MARG 0.5 CM<', 'Aetna': None, 'QualChoice': None},...}

matched_code=[]  # contain all matched procedure codes
whiteriver_dictmatch={} # extract info of those matched procedure codes from whiteriver_dict
saline_dictmatch={} # extract info of those matched procedure codes from saline_dict
final_file={} # create a final_file (nested dictionary) from the above two dictionaries

for name in set(whiteriver_dict).intersection(set(saline_dict)):
    matched_code.append(name)
    
    saline_dictmatch[name]=saline_dict[name]
    whiteriver_dictmatch[name]=whiteriver_dict[name]
    
        
final_file["Whiteriver"]=whiteriver_dictmatch
final_file["Saline"]=saline_dictmatch   

In [20]:
len(matched_code), len(whiteriver_dictmatch), len(saline_dictmatch)

(40, 40, 40)

In [21]:
final_file

{'Whiteriver': {'12001': {'description': 'RPR S/N/AX/GEN/TRNK 2.5CM/<',
   'grosscharge': '428.00',
   'Aetna': '376.6400',
   'QualChoice': '239.6800'},
  '11403': {'description': 'EXC TR-EXT B9+MARG 2.1-3CM',
   'grosscharge': '15553.61',
   'Aetna': '13687.1768',
   'QualChoice': '8710.0216'},
  '11623': {'description': 'EXC S/N/H/F/G MAL+MRG 2.1-3',
   'grosscharge': '3091.55',
   'Aetna': '2720.5640',
   'QualChoice': '1731.2680'},
  '11406': {'description': 'EXC TR-EXT B9+MARG >4.0 CM',
   'grosscharge': '7326.42',
   'Aetna': '6447.2496',
   'QualChoice': '4102.7952'},
  '12032': {'description': 'INTMD RPR S/A/T/EXT 2.6-7.5',
   'grosscharge': '12062.49',
   'Aetna': '10614.9912',
   'QualChoice': '6754.9944'},
  '10060': {'description': 'DRAINAGE OF SKIN ABSCESS',
   'grosscharge': '11506.95',
   'Aetna': '10126.1160',
   'QualChoice': '6443.8920'},
  '12034': {'description': 'INTMD RPR S/TR/EXT 7.6-12.5',
   'grosscharge': '10235.79',
   'Aetna': '9007.4952',
   'QualChoice': 

---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

In order to submit your work, you'll need to use the `git` command line program to **add** your homework file (this file) to your local repository, **commit** your changes to your local repository, and then **push** those changes up to github.com.  From there, I'll be able to **pull** the changes down and do my grading.  I'll provide some feedback, **commit** and **push** my comments back to you.  Next week, I'll show you how to **pull** down my comments.

First run through everything one last time and submit your work:
1. Use the `Kernel` -> `Restart Kernel and Run All Cells` menu option to run everything from top to bottom and stop here.
2. Then open a new command line by clicking the `+` icon above the file list and choosing `Terminal`
3. At the command line in the new Terminal, follow these steps:
  1. Change directories to your project folder and this week's subfolder (`cd <folder name>`)
  2. Make sure your project folders are up to date with github.com (`git pull`)
  3. Add the homework files for this week (`git add <file name>`)
  4. Commit your changes (`git commit -a -m "message"`)
  5. Push your changes (`git push`)
  
If anything fails along the way with this submission part of the process, let me know.  I'll help you troubleshoot.