# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below.  For example, you can read CSV files using Pandas or the csv module.  Your score won't depend on which modules you choose unless explicitly noted below, but your programming style will still matter.

### 30.1 List of Allergies

In the /data directory on the Jupyter server, there is a file called `allergies.json` that contains a list of patient allergies.  It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`

Write a function named `allergy_count(json_file)` that takes one parameter, the name of the JSON file, and returns the number of entries in that file as an integer.  Your function should open the file, read the JSON into a Python object, and return how many items there are in the `entry` list.
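
As a minimal sketch of the idea, the snippet below parses a small, made-up JSON string (not the real `allergies.json`) and counts the items in its `entry` list.  The sample text and the variable names are assumptions for illustration only.

```python
import json

# Hypothetical miniature document shaped like the file described above:
# a dictionary whose "entry" item is a list.
sample_text = '{"title": "sample", "entry": [{"id": 1}, {"id": 2}, {"id": 3}]}'

bundle = json.loads(sample_text)     # parse JSON text into Python objects
entries = bundle.get("entry", [])    # fall back to an empty list if the key is missing
entry_count = len(entries)
print(entry_count)
```

Reading from a file works the same way, using `json.load(f)` on an open file object instead of `json.loads` on a string.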

In [None]:
import json
from pathlib import Path
HOME = str(Path.home())
ALLERGIES_FILE="/data/allergies.json"

In [None]:
### BEGIN SOLUTION
### END SOLUTION

In [None]:
### BEGIN SOLUTION
def allergy_count(json_file):
    with open(json_file) as f:
        allergies = json.load(f)
        
    return len(allergies.get('entry', []))
### END SOLUTION

In [None]:
allergy_count(ALLERGIES_FILE)

In [None]:
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4

### 30.2 Number of Patients

If you dig a little bit deeper into this list of allergies, you'll see that each result has a patient associated with it.  Create a function called `patient_count(json_file)` that will count how many unique patients we have in this JSON structure.
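
One way to count unique values is to collect them in a `set`, which keeps only one copy of each item.  The sketch below uses made-up entries; the nesting (`resource`, then `patient`, then `display`) is an assumption about how each entry names its patient, so check it against the real file.

```python
# Hypothetical entries, not the real file contents.
sample_entries = [
    {"resource": {"patient": {"display": "Patient A"}}},
    {"resource": {"patient": {"display": "Patient B"}}},
    {"resource": {"patient": {"display": "Patient A"}}},
]

# A set drops duplicates, so its length is the number of unique names.
unique_names = {e["resource"]["patient"]["display"] for e in sample_entries}
unique_count = len(unique_names)
print(unique_count)
```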

In [None]:
### BEGIN SOLUTION
### END SOLUTION

In [None]:
### BEGIN SOLUTION
def patient_count(json_file):
    patients = set()
    with open(json_file) as f:
        allergies = json.load(f)
        
    for entry in allergies.get('entry'):
        resource = entry.get('resource')
        patient = resource.get('patient')
        name = patient.get('display')
        patients.add(name)
        
    return len(patients)
### END SOLUTION

In [None]:
patient_count(ALLERGIES_FILE)

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them are for the same patient.  Write a function called `allergy_per_patient(json_file)` that counts up how many allergies each patient has.
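
Tallying how often each name appears is a standard counting pattern.  As a hedged illustration, the sketch below uses `collections.Counter` on a made-up list of names; a plain dictionary with `setdefault` or `get` would work just as well.

```python
from collections import Counter

# Hypothetical patient names, one per allergy entry.
sample_names = ["Patient A", "Patient B", "Patient A", "Patient A"]

per_patient = Counter(sample_names)      # counts occurrences of each name
per_patient_dict = dict(per_patient)     # convert to a plain dictionary
print(per_patient_dict)
```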


In [None]:
### BEGIN SOLUTION
### END SOLUTION

In [None]:
### BEGIN SOLUTION
def allergy_per_patient(json_file):
    patients = {}
    with open(json_file) as f:
        allergies = json.load(f)
        
    for entry in allergies.get('entry'):
        resource = entry.get('resource')
        patient = resource.get('patient')
        name = patient.get('display')
        patients[name] = patients.setdefault(name,0) + 1
        
    return patients
### END SOLUTION

In [None]:
allergy_per_patient(ALLERGIES_FILE)

### 30.4 Patient Allergies and Reaction

You'll see in the file that each of the items in the `entry` list has several other attributes, including a patient name, a substance text representation, and a reaction manifestation.  Create a function named `allergy_list(json_file)` that will create an output list that has patient name, allergy, and reaction for each `entry`.  The actual result you should get will be:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]
```

You'll notice that the reaction and the manifestation of that reaction are lists.  You only need to capture the first reaction and the first manifestation of that reaction.  That is, if there is a list of things, just output the first one.
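
To illustrate taking only the first item of each nested list, here is a small sketch using a made-up entry.  The key names mirror the attributes described above, but the exact layout of the real file is an assumption here.

```python
# Hypothetical entry with two reactions; the first reaction has two manifestations.
sample_entry = {
    "resource": {
        "patient": {"display": "Patient A"},
        "substance": {"text": "EXAMPLE SUBSTANCE"},
        "reaction": [
            {"manifestation": [{"text": "Rash"}, {"text": "Swelling"}]},
            {"manifestation": [{"text": "Nausea"}]},
        ],
    }
}

resource = sample_entry["resource"]
first_reaction = resource["reaction"][0]                   # first reaction only
first_manifestation = first_reaction["manifestation"][0]   # first manifestation only
row = [
    resource["patient"]["display"],
    resource["substance"]["text"],
    first_manifestation["text"],
]
print(row)
```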

In [5]:
import json
from pathlib import Path
HOME = str(Path.home())
ALLERGIES_FILE="/data/allergies.json"

In [6]:
import json

def allergy_list(json_file):
    """(json_file) -> list
    Return a list of [patient, allergy, reaction] lists, one per entry in the JSON file.
    Only the first reaction and its first manifestation are captured.
    """
    output = []

    with open(json_file) as f:
        allergies = json.load(f)

    for entry in allergies.get('entry'):
        resource = entry.get('resource')
        patient = resource.get('patient').get('display')
        substance = resource.get('substance').get('text')
        # reaction and manifestation are both lists; keep only the first item of each
        first_reaction = resource.get('reaction')[0]
        first_manifestation = first_reaction.get('manifestation')[0]
        reaction = first_manifestation.get('text')
        output.append([patient, substance, reaction])
    return output

In [7]:
allergy_list(ALLERGIES_FILE)

[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]

In [None]:
output=[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]

assert allergy_list(ALLERGIES_FILE) == output


### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file,patient,substance)` that takes three parameters and returns the reaction that will happen if the patient takes the specified substance.  Solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.
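
The lookup itself can be sketched on its own, separate from reading the file.  The helper name `find_reaction` below is hypothetical, and the rows are a shortened copy of the expected `allergy_list` output shown earlier.

```python
# Rows in the same [patient, allergy, reaction] shape that allergy_list returns.
sample_rows = [
    ['Jason Argonaut', 'PENICILLIN G', 'Hives'],
    ['Paul Boal', 'PENICILLIN G', 'Bruising'],
]

def find_reaction(rows, patient, substance):
    """Return the reaction for an exact patient/substance match, else None."""
    for row_patient, row_substance, row_reaction in rows:
        if row_patient == patient and row_substance == substance:
            return row_reaction
    return None

print(find_reaction(sample_rows, 'Paul Boal', 'PENICILLIN G'))   # Bruising
print(find_reaction(sample_rows, 'Paul Boal', 'PENICILLIN'))     # None
```

Matching on the exact substance text is a deliberate choice here: 'PENICILLIN' does not match 'PENICILLIN G'.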

In [None]:
import json
from pathlib import Path
HOME = str(Path.home())
ALLERGIES_FILE="/data/allergies.json"

In [23]:
import json

def allergy_reaction(json_file, patient, substance):
    """(json_file, str, str) -> str or None
    Use allergy_list to look up the reaction a patient would have if exposed to a specific substance.
    Return None if the patient/substance pair is not in the allergy list.
    """
    for entry_patient, entry_substance, entry_reaction in allergy_list(json_file):
        # require an exact match on both the patient name and the substance
        if entry_patient == patient and entry_substance == substance:
            return entry_reaction
    return None

In [None]:
import json


def allergy_reaction(json_file, patient, substance):
    """(json_file, str, str) -> str or None
    Use allergy_list to look up the reaction a patient would have if exposed to a specific substance.
    """
    # Store the result under a different name than the function.  Writing
    # allergy_list = allergy_list(json_file) makes allergy_list a local variable, which raises
    # UnboundLocalError: local variable 'allergy_list' referenced before assignment
    rows = allergy_list(json_file)
    # Map (patient, substance) -> reaction; .get returns None when the pair is missing
    reactions = {(p, s): r for p, s, r in rows}
    return reactions.get((patient, substance))

In [22]:
for patient, substance, reaction in allergy_list(ALLERGIES_FILE):
    print(patient)


def grades_sum(scores):
    sum1 = 0
    for i in scores:
        sum1 = sum1 + i
    print(sum1)
    return sum1

grades_sum([100, 100, 90, 40, 80, 100, 85, 70, 90, 65, 90, 85, 50.5])

def grades_average(grades):
    average = grades_sum(grades) / float(len(grades))
    print(average)
    return average

grades_average([100, 100, 90, 40, 80, 100, 85, 70, 90, 65, 90, 85, 50.5])

In [24]:
allergy_reaction(ALLERGIES_FILE)

TypeError: allergy_reaction() missing 2 required positional arguments: 'patient' and 'substance'

In [None]:
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') is None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'

---
---

# Stretch (Extra) Problems

Work on either of the stretch problems below can earn you up to 25 free points toward the midterm assignment.  That is, if you complete one of these extra problems successfully, you can skip 1 of the problems that will appear on the midterm exam coming up next week.

The midterm will be distributed this Saturday 3/13.

This assignment is due on Sunday 3/14.  If you are trying for one of these extra problems, Slack me, and I'll provide you feedback on how you did on these before end of day Monday 3/15.  That way you can choose what to complete on the midterm.


---
---

### STRETCH for March 2021 - For those looking for an additional challenge

As I've mentioned in class, CMS is now enforcing a rule around price transparency.  Every facility that takes Medicare payments is required to publish a "machine readable" file with its pricing information for a number of common procedures across all of the payers they work with.  There are two examples of such files in the `/data/` directory: `whiteriver.json` and `saline.xml`.

If you want to compare contracted prices across these two hospitals, you'll need to read in the information from both of those files into some kind of data structure, then merge the data together from those two files.  See what you can do.

See if you can create an output file that has the following fields:
* HOSPITAL
* PROCEDURE_CODE
* PAYER
* AMOUNT

If you choose to work on this, you may get stuck at some point and you won't know if you're _doing it right_. Make some assumptions. Document your questions in this notebook.
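
As a starting point only, the sketch below shows one way to pull records from a JSON source and an XML source into a common list of rows and write them out with the four fields above.  Both input layouts here are invented for illustration; the real `whiteriver.json` and `saline.xml` will be structured differently, so the key names, tag names, and attribute names are all assumptions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Invented sample inputs; these do NOT reflect the real files' structure.
json_text = '[{"code": "A100", "payer": "Payer X", "price": 120.0}]'
xml_text = '<prices><item code="A100" payer="Payer X" amount="135.50"/></prices>'

FIELDS = ["HOSPITAL", "PROCEDURE_CODE", "PAYER", "AMOUNT"]
rows = []

# Normalize the JSON records into the common row shape.
for rec in json.loads(json_text):
    rows.append({"HOSPITAL": "whiteriver", "PROCEDURE_CODE": rec["code"],
                 "PAYER": rec["payer"], "AMOUNT": float(rec["price"])})

# Normalize the XML records into the same shape.
for item in ET.fromstring(xml_text).iter("item"):
    rows.append({"HOSPITAL": "saline", "PROCEDURE_CODE": item.get("code"),
                 "PAYER": item.get("payer"), "AMOUNT": float(item.get("amount"))})

# Write the merged rows as CSV (to an in-memory buffer here).
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Converting both sources to the same row shape first keeps the merge step trivial; writing to a real file only requires replacing the buffer with `open(..., "w", newline="")`.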



---
---

### STRETCH from March 2020 - For those looking for an additional challenge

The Coronavirus is creating quite the stir right now.  There are some sources suggesting that trends show it is going to be significantly more serious than SARS was back in the 2002 timeframe.  Here's one visualization trying to demonstrate that: https://www.reddit.com/r/China_Flu/comments/ev2b4v/i_updated_some_charts_comparing_this_outbreak/

Someone on Kaggle has generously already compiled a dataset based on information from Johns Hopkins about the Coronavirus outbreak.  https://www.kaggle.com/brendaso/2019-coronavirus-dataset-01212020-01262020  Create a Kaggle account, if you don't already have one.  Download this data set and then upload it to your Jupyter Home folder.  (The "up arrow" button is for uploading a file.)

Use Python's built-in `csv` module to read the data from this file and generate the following information: **what are the total confirmed cases in all of Mainland China as of the latest information in the data set?**  Some important things to note:
* Each entry for a given city has the **cumulative** number of cases.  So that column is not additive (it cannot be summed).  You'll have to find a way to filter your data for the last day for each city, then total those up.
* If you choose to parse the date column, you will want to look up how to do that using Python's `datetime` module, especially the `strptime` function.  https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior  Hint: you can parse a date string in the format 2/17/2020 using the code below.  The link explains what directives like `%m` and `%Y` mean.

```
from datetime import datetime
d = datetime.strptime('2/17/2020', '%m/%d/%Y')
```

If you want to take this another step, **create a list of tuples that contain (observation date, total confirmed) totalled over all locations represented in the data**
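
The "latest row per location, then total" idea can be sketched with the `csv` module on a tiny made-up sample.  The column names (`Province`, `Country`, `Date`, `Confirmed`) and all of the numbers are assumptions for illustration; the real data set's headers may differ.  The per-date totals assume one row per location per date.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# Made-up sample with cumulative counts; not real data.
sample_csv = """Province,Country,Date,Confirmed
Province A,Mainland China,1/24/2020,10
Province A,Mainland China,1/25/2020,25
Province B,Mainland China,1/24/2020,3
Province B,Mainland China,1/25/2020,7
Elsewhere,Other Country,1/25/2020,4
"""

latest = {}                  # province -> (date, cumulative confirmed)
by_date = defaultdict(int)   # date -> confirmed summed over all locations

for row in csv.DictReader(io.StringIO(sample_csv)):
    day = datetime.strptime(row["Date"], "%m/%d/%Y")
    confirmed = int(row["Confirmed"])
    by_date[day] += confirmed
    if row["Country"] != "Mainland China":
        continue
    # Keep only the most recent row seen for each province.
    if row["Province"] not in latest or day > latest[row["Province"]][0]:
        latest[row["Province"]] = (day, confirmed)

total_latest = sum(confirmed for _, confirmed in latest.values())
totals_by_date = sorted(by_date.items())   # list of (date, total) tuples
print(total_latest)
print(totals_by_date)
```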

In [None]:
import pandas as pd

# Load the data set (the file is expected in the same folder as this notebook)
CoV = pd.read_csv('2019nC0vSUMMARY.csv')
CoVChina = CoV[CoV["Country"] == "Mainland China"]

# Confirmed counts are cumulative, so take the largest value reported for each
# province (assumes the counts never decrease) instead of summing every row
Max_CoVCh = CoVChina.groupby(by="Province/State")["Confirmed"].max()
Total_Confirmed = Max_CoVCh.sum()
print(Total_Confirmed)
print(Max_CoVCh)


---

## Submitting Your Work

In order to submit your work, you'll need to use the `git` command line program to **add** your homework file (this file) to your local repository, **commit** your changes to your local repository, and then **push** those changes up to github.com.  From there, I'll be able to **pull** the changes down and do my grading.  I'll provide some feedback, **commit** and **push** my comments back to you.  Next week, I'll show you how to **pull** down my comments.

To run through everything one last time and submit your work:
1. Use the `Kernel` -> `Restart Kernel and Run All Cells` menu option to run everything from top to bottom and stop here.
2. Save this notebook with Ctrl-S (or Cmd-S)
3. Skip down to the last command cell (the one starting with `%%bash`) and run that cell.

If anything fails along the way with this submission part of the process, let me know.  I'll help you troubleshoot.

In [None]:
assert False, "DO NOT REMOVE THIS LINE"

---

In [26]:
%%bash
git pull
git add week06_assignment_2.ipynb
git commit -a -m "Submitting the week 6 programming assignment"
git push

Already up to date.
[main 73560a8] Submitting the week 6 programming assignment
 1 file changed, 24 insertions(+), 24 deletions(-)


To github.com:skuca/hds5210-2021.git
   db5af3d..73560a8  main -> main



---

If the message above says something like _Submitting the week 6 programming assignment_ or _Everything is up to date_, then your work was submitted correctly.