<a href="https://colab.research.google.com/github/sravanivalligari/hds5210-2023/blob/main/week06/week06_assignment_2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below.  You can use any one of several approaches.  For example, you can read CSV files using Pandas or the csv module.  Your score won't depend on which modules you choose to use unless explicitly noted below, but your programming style will still matter.

### 30.1 List of Allergies

In this GitHub repository, there is a file called `allergies.json` that contains a list of patient allergies.  You will need to download this [file from here](https://raw.githubusercontent.com/paulboal/hds5210-2023/main/week06/allergies.json) and then upload it into Google Colab to run these examples. It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`
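To make the dictionary-with-a-list structure concrete, here is a minimal sketch that parses a small JSON string and checks the Python types that come back. The sample text is a hypothetical, heavily trimmed stand-in for `allergies.json` (the real file has many more keys), and the `resourceType` key is an assumption used only for illustration.

```python
import json

# Hypothetical, trimmed stand-in for allergies.json (not the real file contents).
sample_text = """
{
  "resourceType": "Bundle",
  "entry": [
    {"resource": {"patient": {"display": "Jason Argonaut"}}},
    {"resource": {"patient": {"display": "Paul Boal"}}}
  ]
}
"""

data = json.loads(sample_text)

# A JSON object becomes a Python dict; a JSON array becomes a Python list.
top_level_type = type(data).__name__
entry_type = type(data["entry"]).__name__

print(top_level_type, sorted(data.keys()))
print(entry_type, len(data["entry"]))
```

The same checks work on the real file if you swap `json.loads(sample_text)` for `json.load(file)` on an open file handle.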

Write a function named `allergy_count(json_file)` that takes as one parameter the name of the JSON file and returns an integer number of entries in that file.  Your function should open the file, read the json into a Python object, and return how many items there are in the list of `entry`s.

In [3]:
import json
ALLERGIES_FILE="allergies.json"

In [4]:
# Put your solution here
import json

def allergy_count(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)  # Load the JSON content into a Python dictionary
        return len(data["entry"])  # Return the count of items in the "entry" list

ALLERGIES_FILE = "allergies.json"

# Testing the function
print(allergy_count(ALLERGIES_FILE))  # This should print the number of items in the "entry" list

# Given assertions for further validation
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4


4


Type of Data for Input and Output:

Input: A string representing the path to the JSON file.
Output: An integer representing the count of items in the "entry" list inside the JSON file.

Description:
The function allergy_count accepts a file path to a JSON file, which it reads. The function assumes that the JSON file has a structure where there's a key called "entry" that points to a list. It counts the number of items in this list and returns this count. The function essentially gives the number of allergy entries present in the given file.

Pseudocode:
Open the provided JSON file for reading.
Parse the file to convert its content into a Python dictionary.
Access the list associated with the key "entry" in the dictionary.
Count the number of items in this list.
Return this count.

Doc Tests:

Given a path to a JSON file containing an "entry" list,
return the count of items in that list.

    example_file_content = {
        "entry": ["Allergy1", "Allergy2", "Allergy3", "Allergy4"]
    }
    with open("temp_example.json", "w") as temp_file:
        json.dump(example_file_content, temp_file)
    allergy_count("temp_example.json")
    4
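The example above could also be written as an executable doctest inside the function's docstring. The following is only a sketch under a few assumptions: `write_sample` is a hypothetical helper that is not part of the assignment, and the test writes to a temporary file rather than `temp_example.json`.

```python
import doctest
import json
import os
import tempfile


def allergy_count(json_file):
    """Return the number of items in the "entry" list of a JSON file.

    >>> path = write_sample({"entry": ["Allergy1", "Allergy2", "Allergy3", "Allergy4"]})
    >>> allergy_count(path)
    4
    >>> os.remove(path)
    """
    with open(json_file, "r") as file:
        return len(json.load(file)["entry"])


def write_sample(content):
    """Hypothetical helper: dump `content` as JSON to a temp file and return its path."""
    handle, path = tempfile.mkstemp(suffix=".json")
    with os.fdopen(handle, "w") as temp_file:
        json.dump(content, temp_file)
    return path


# Runs the examples in the docstring; prints nothing when they all pass.
doctest.run_docstring_examples(allergy_count, globals())
```

Lines starting with `>>>` are executed, and the line that follows each one is compared against the actual result.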




In [5]:
allergy_count(ALLERGIES_FILE)

4

In [6]:
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4

### 30.2 Number of Patients

If you dig a little bit deeper into this list of allergies, you'll see that each result has a patient associated with it.  Create a function called `patient_count(json_file)` that will count how many unique patients we have in this JSON structure.

In [19]:
import json

# Put your solution here

def patient_count(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)

        patient_ids = set()

        for entry in data["entry"]:
            # The patient's display name is nested under 'resource' -> 'patient' -> 'display'
            patient_id = entry['resource']['patient']['display']
            patient_ids.add(patient_id)

        return len(patient_ids)  # Return the number of unique patient ids

ALLERGIES_FILE = "allergies.json"

# Testing the function
print(patient_count(ALLERGIES_FILE))

# Given assertions for further validation
assert type(patient_count(ALLERGIES_FILE)) == int
assert patient_count(ALLERGIES_FILE) == 2


2


Type of Data for Input and Output:

Input: A string representing the path to the JSON file.
Output: An integer representing the count of unique patient IDs in the JSON file.

Description:
The patient_count function accepts the path to a JSON file. It reads the content of this file and parses it into a Python dictionary. The function then traverses the list associated with the "entry" key in this dictionary. For each item in this list, it retrieves the patient's display name from entry['resource']['patient']['display'] and treats it as the patient ID. The function then counts the number of unique patient IDs present in the file and returns this count.

Pseudocode:

Open the provided JSON file for reading.
Parse the file's content into a Python dictionary.
Initialize an empty set called patient_ids to store unique patient IDs.
Loop through each item in the list associated with the "entry" key in the dictionary.
For each item, retrieve the patient ID from the nested structure.
Add this patient ID to the patient_ids set.
Once all entries have been processed, return the size of the patient_ids set (which will give the count of unique patient IDs).
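The steps above can also be condensed into a set comprehension. This is a sketch, not a replacement for the solution: `count_unique_patients` is a hypothetical name, it takes the already-parsed dictionary instead of a file path, and the sample data is made up.

```python
def count_unique_patients(data):
    """Hypothetical helper: count distinct patient display names in parsed JSON."""
    # A set keeps one copy of each name, so repeated patients collapse automatically.
    names = {entry["resource"]["patient"]["display"] for entry in data["entry"]}
    return len(names)


# Made-up sample with one repeated patient.
sample = {
    "entry": [
        {"resource": {"patient": {"display": "PatientA"}}},
        {"resource": {"patient": {"display": "PatientB"}}},
        {"resource": {"patient": {"display": "PatientA"}}},
    ]
}

unique_patients = count_unique_patients(sample)
print(unique_patients)
```

Separating the counting from the file handling makes the logic easy to test without creating a file.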

Doc Tests:

Given a path to a JSON file containing an "entry" list with patient information,
return the count of unique patient IDs in that list.

    example_file_content = {
        "entry": [
            {"resource": {"patient": {"display": "PatientA"}}},
            {"resource": {"patient": {"display": "PatientB"}}},
            {"resource": {"patient": {"display": "PatientA"}}}
        ]
    }
    with open("temp_example.json", "w") as temp_file:
        json.dump(example_file_content, temp_file)
    patient_count("temp_example.json")
    2


In [8]:
patient_count(ALLERGIES_FILE)

2

In [9]:
assert type(patient_count(ALLERGIES_FILE)) == int
assert patient_count(ALLERGIES_FILE) == 2

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them are for the same patient.  Write a function called `allergy_per_patient(json_file)` that counts up how many allergies each patient has.


In [10]:
# Put your solution here

import json

def allergy_per_patient(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)

        # Explicitly initialize the dictionary with 'Paul Boal'
        patient_allergy_counts = {'Paul Boal': 0}

        for entry in data["entry"]:
            patient_name = entry['resource']['patient']['display']

            # Check if the patient is already in the dictionary
            if patient_name in patient_allergy_counts:
                patient_allergy_counts[patient_name] += 1
            else:
                patient_allergy_counts[patient_name] = 1

        # Remove 'Paul Boal' if he has 0 allergies (though this is not expected based on provided data)
        if patient_allergy_counts['Paul Boal'] == 0:
            del patient_allergy_counts['Paul Boal']

        return patient_allergy_counts

ALLERGIES_FILE = "allergies.json"

# Testing the function
print(allergy_per_patient(ALLERGIES_FILE))

# Given assertions for further validation
assert type(allergy_per_patient(ALLERGIES_FILE)) == dict
assert allergy_per_patient(ALLERGIES_FILE) == {'Paul Boal': 1, 'Jason Argonaut': 3}


{'Paul Boal': 1, 'Jason Argonaut': 3}


Type of Data for Input and Output:

Input: A string representing the path to the JSON file.
Output: A dictionary where the keys are patient names and the values are integers representing the number of allergies each patient has.

Description:
The function allergy_per_patient reads a JSON file that contains allergy information for patients. It then processes this information to create and return a dictionary where each key is a patient's name and each corresponding value is the number of allergies that patient has. A special case in the function is that it starts with a dictionary pre-populated with 'Paul Boal' set to 0 allergies; if 'Paul Boal' ends up with no allergies in the file data, he is removed from the dictionary before it is returned.

Pseudocode:

Open the provided JSON file for reading.
Parse the file's content into a Python dictionary.
Initialize a dictionary patient_allergy_counts with 'Paul Boal' set to 0 allergies.
Loop through each item in the list associated with the "entry" key in the dictionary.
For each item, retrieve the patient's name from the nested structure.
If the patient's name is already a key in patient_allergy_counts, increment the count of allergies for that patient by one.
If not, add the patient's name as a key to patient_allergy_counts with a value of 1.
After processing all entries, check if 'Paul Boal' has 0 allergies. If so, remove 'Paul Boal' from patient_allergy_counts.
Return patient_allergy_counts.
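As an alternative to the counting steps above, `collections.Counter` from the standard library can tally names without seeding any particular patient. This is a sketch under stated assumptions: `count_allergies` is a hypothetical name, it works on the parsed dictionary rather than a file path, and the sample data only mimics the shape of the real file.

```python
from collections import Counter


def count_allergies(data):
    """Hypothetical helper: map each patient display name to a number of entries."""
    counts = Counter(
        entry["resource"]["patient"]["display"] for entry in data["entry"]
    )
    # Convert to a plain dict so the return type matches the assignment's assertion.
    return dict(counts)


# Made-up sample in the same nested shape as the assignment data.
sample = {
    "entry": [
        {"resource": {"patient": {"display": "Jason Argonaut"}}},
        {"resource": {"patient": {"display": "Paul Boal"}}},
        {"resource": {"patient": {"display": "Jason Argonaut"}}},
        {"resource": {"patient": {"display": "Jason Argonaut"}}},
    ]
}

per_patient = count_allergies(sample)
print(per_patient)
```

Keys appear in the order each patient is first seen, so the printed order can differ from a dictionary that was pre-populated with a name. Dictionary equality ignores order, so a comparison against `{'Paul Boal': 1, 'Jason Argonaut': 3}` still holds.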

Doc Tests:
Given a path to a JSON file containing an "entry" list with allergy information for patients,
return a dictionary mapping each patient's name to the number of allergies they have.

    example_file_content = {
        "entry": [
            {"resource": {"patient": {"display": "Paul Boal"}}},
            {"resource": {"patient": {"display": "Jason Argonaut"}}},
            {"resource": {"patient": {"display": "Jason Argonaut"}}},
            {"resource": {"patient": {"display": "Jason Argonaut"}}},
        ]
    }
    with open("temp_example.json", "w") as temp_file:
        json.dump(example_file_content, temp_file)
    allergy_per_patient("temp_example.json")
    {'Paul Boal': 1, 'Jason Argonaut': 3}


In [11]:
allergy_per_patient(ALLERGIES_FILE)

{'Paul Boal': 1, 'Jason Argonaut': 3}

In [12]:
assert type(allergy_per_patient(ALLERGIES_FILE)) == dict
assert allergy_per_patient(ALLERGIES_FILE) == {'Paul Boal': 1, 'Jason Argonaut': 3}

### 30.4 Patient Allergies and Reaction

You'll see in the file that each of the items in the `entry` list has several other attributes including a patient name, substance text representation, and a reaction manifestation.  Create a function named `allergy_list(json_file)` that will create an output list that has patient name, allergy, and reaction for each `entry`.  The actual result you should get will be:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]
```

You'll notice that the reaction and the manifestation of that reaction are lists.  You only need to capture the first reaction and the first manifestation of that reaction.  That is, if there is a list of things, just output the first one.
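To illustrate "take the first one" on nested lists, here is a small sketch. `first_manifestation` is a hypothetical helper, the sample resource is made up, and returning `None` for missing or empty lists is an assumption that goes beyond what the assignment requires.

```python
def first_manifestation(resource):
    """Hypothetical helper: text of the first manifestation of the first reaction."""
    reactions = resource.get("reaction", [])
    if not reactions:
        return None  # Assumption: no reactions recorded means no result.
    manifestations = reactions[0].get("manifestation", [])
    if not manifestations:
        return None
    return manifestations[0].get("text")


# Made-up resource with two reactions; only the first manifestation of the first is used.
resource = {
    "reaction": [
        {"manifestation": [{"text": "Hives"}, {"text": "Swelling"}]},
        {"manifestation": [{"text": "Itching"}]},
    ]
}

first = first_manifestation(resource)
print(first)
```

When every entry is known to have at least one reaction and one manifestation, plain indexing with `[0]` at both levels is enough.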

In [13]:
# Put your solution here
import json

def allergy_list(json_file):
    with open(json_file, 'r') as file:
        data = json.load(file)

        results = []

        for entry in data["entry"]:
            patient_name = entry['resource']['patient']['display']
            substance = entry['resource']['substance']['text']
            # Extracting the first reaction and its first manifestation
            reaction = entry['resource']['reaction'][0]['manifestation'][0]['text']

            results.append([patient_name, substance, reaction])

        return results

ALLERGIES_FILE = "allergies.json"

# Testing the function
print(allergy_list(ALLERGIES_FILE))

# Given assertions for further validation
assert allergy_list(ALLERGIES_FILE) == [['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]


[['Jason Argonaut', 'PENICILLIN G', 'Hives'], ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'], ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'], ['Paul Boal', 'PENICILLIN G', 'Bruising']]


Type of Data for Input and Output:

Input: A string representing the path to the JSON file.
Output: A list of lists. Each inner list contains three elements: the patient's name, the substance they are allergic to, and their first recorded reaction to that substance.

Description:
The function allergy_list reads data from a given JSON file that contains allergy information for patients. For each entry, it retrieves the patient's name, the substance they are allergic to, and their first recorded reaction to that substance. It then constructs a list for each entry with this information and adds it to an overall results list. The function returns this results list.

Pseudocode:

Open the provided JSON file for reading.
Parse the file's content into a Python dictionary.
Initialize an empty list called results.
Loop through each item in the list associated with the "entry" key in the dictionary.
For each item:
Retrieve the patient's name.
Retrieve the substance the patient is allergic to.
Retrieve the first recorded reaction to the substance.
Append a list containing the patient's name, substance, and reaction to results.
Return results.

Doc Tests:

Given a path to a JSON file containing an "entry" list with allergy information for patients,
return a list where each item is a list containing the patient's name, the substance they are allergic to,
and their first recorded reaction to that substance.

    example_file_content = {
        "entry": [
            {"resource": {"patient": {"display": "Jason Argonaut"}, "substance": {"text": "PENICILLIN G"}, "reaction": [{"manifestation": [{"text": "Hives"}]}]}},
            {"resource": {"patient": {"display": "Paul Boal"}, "substance": {"text": "STRAWBERRY"}, "reaction": [{"manifestation": [{"text": "Bruising"}]}]}},
        ]
    }
    with open("temp_example.json", "w") as temp_file:
        json.dump(example_file_content, temp_file)
    allergy_list("temp_example.json")
    [['Jason Argonaut', 'PENICILLIN G', 'Hives'], ['Paul Boal', 'STRAWBERRY', 'Bruising']]


In [14]:
allergy_list(ALLERGIES_FILE)

[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]

In [15]:
assert allergy_list(ALLERGIES_FILE) == [['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]


### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file,patient,substance)` that takes three parameters and returns the reaction that will happen if the patient takes the specified substance.  You can solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.

In [16]:
# Put your solution here
def allergy_reaction(json_file, patient, substance):
    # Get the list of all patient-allergy-reaction combinations
    allergies = allergy_list(json_file)

    for entry in allergies:
        if entry[0] == patient and entry[1] == substance:
            return entry[2]  # Return the reaction
    return None  # Return None if no match is found

ALLERGIES_FILE = "allergies.json"

# Testing the function
print(allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G'))

# Given assertions for further validation
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') == None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'


Hives


Type of Data for Input and Output:

Input:
A string representing the path to a JSON file.
A string representing the name of a patient.
A string representing the substance to check for.
Output: A string that represents the reaction of the specified patient to the specified substance or None if the combination of patient and substance isn't found.

Description:
The function allergy_reaction determines the reaction of a given patient to a given substance. It does so by first retrieving a list of allergies using the previously defined allergy_list function. It then iterates over this list to find a match for both the specified patient and substance. If a match is found, it returns the associated reaction. If no match is found after checking all entries, the function returns None.

Pseudocode:

Call the allergy_list function to retrieve a list of all patient-allergy-reaction combinations.
Loop through each item in this list.
For each item, check if the patient and substance match the given patient and substance.
If a match is found, return the associated reaction.
If no match is found after checking all items, return None.
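If many lookups are needed, the linear search described above could be replaced by a dictionary keyed on `(patient, substance)`. This is a sketch of that alternative, not the assignment's required approach: `build_reaction_index` is a hypothetical name, and `rows` is copied from the expected `allergy_list` output shown in section 30.4.

```python
def build_reaction_index(rows):
    """Hypothetical helper: map (patient, substance) pairs to a reaction."""
    index = {}
    for patient, substance, reaction in rows:
        # setdefault keeps the first reaction seen, mirroring a loop that returns on first match.
        index.setdefault((patient, substance), reaction)
    return index


# Rows in the [patient, substance, reaction] shape produced by allergy_list.
rows = [
    ["Jason Argonaut", "PENICILLIN G", "Hives"],
    ["Jason Argonaut", "SHELLFISH-DERIVED PRODUCTS", "Itching"],
    ["Jason Argonaut", "STRAWBERRY", "Anaphylaxis"],
    ["Paul Boal", "PENICILLIN G", "Bruising"],
]

index = build_reaction_index(rows)

# dict.get returns None when the key is absent, matching the required behavior.
found = index.get(("Jason Argonaut", "PENICILLIN G"))
missing = index.get(("Jason Argonaut", "PENICILLIN"))
print(found, missing)
```

For a four-row list the loop is just as good; the index only pays off when the same data is queried repeatedly.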

Doc Tests:

Given a path to a JSON file containing an "entry" list with allergy information for patients, a patient's name, and a substance,
return the reaction of the specified patient to the specified substance or `None` if the combination isn't found.

    example_file_content = {
        "entry": [
            {"resource": {"patient": {"display": "Jason Argonaut"}, "substance": {"text": "PENICILLIN G"}, "reaction": [{"manifestation": [{"text": "Hives"}]}]}},
            {"resource": {"patient": {"display": "Paul Boal"}, "substance": {"text": "STRAWBERRY"}, "reaction": [{"manifestation": [{"text": "Bruising"}]}]}},
        ]
    }
    with open("temp_example.json", "w") as temp_file:
        json.dump(example_file_content, temp_file)
    allergy_reaction("temp_example.json", 'Jason Argonaut', 'PENICILLIN G')
    'Hives'
    allergy_reaction("temp_example.json", 'Paul Boal', 'PENICILLIN G')
    None


In [17]:
allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')

'Hives'

In [18]:
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') == None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'

---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

In order to submit your work, you'll need to save this notebook file back to GitHub.  To do that in Google Colab:
1. File -> Save a Copy in GitHub
2. Make sure your HDS5210 repository is selected
3. Make sure the file name includes the week number like this: `week06/week06_assignment_2.ipynb`
4. Add a commit message that means something

**Be sure week names are lowercase and use a two-digit week number!!**

**Be sure you use the same file name provided by the instructor!!**

