
# Week 6 Exercises

_McKinney 6.1_

There are multiple ways to solve the problems below. For example, you can read CSV files using Pandas or the csv module. Your score won't depend on which modules you choose unless explicitly noted below, but your programming style still matters.

### 30.1 List of Allergies

In this GitHub repository, there is a file called `allergies.json` that contains a list of patient allergies.  You will need to download this [file from here](https://raw.githubusercontent.com/paulboal/hds5210-2023/main/week06/allergies.json) and then upload it into Google Colab to run these examples. It is taken from sample data provided by the EHR vendor, Epic, here: https://open.epic.com/Clinical/Allergy

Take some time to look at the structure of the file.  You can open it directly in Jupyter by clicking the _Home_ icon, then the _from_instructor_ folder, and then the _data_ folder.

Within the file, you'll see that it is a dictionary with many items in it.  One of those items is called `entry` and that item is a list of things.  You can tell that because the item name is immediately followed by an opening square bracket, signifying the start of a list.  It's line 11 of the file: `  "entry": [`
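To make that shape concrete, here is a minimal sketch that parses a tiny excerpt with `json.loads`. The excerpt is hypothetical, not the real file: it keeps only the keys the later exercises read (`entry`, `resource`, `patient.display`, `substance.text`, and `reaction[0].manifestation[0].text`), taken from the code in this notebook.

```python
import json

# Hypothetical excerpt that mimics the nesting described above;
# the real allergies.json has many more keys per entry.
SAMPLE_JSON = """
{
  "entry": [
    {"resource": {"patient": {"display": "Jason Argonaut"},
                  "substance": {"text": "PENICILLIN G"},
                  "reaction": [{"manifestation": [{"text": "Hives"}]}]}},
    {"resource": {"patient": {"display": "Paul Boal"},
                  "substance": {"text": "PENICILLIN G"},
                  "reaction": [{"manifestation": [{"text": "Bruising"}]}]}}
  ]
}
"""

data = json.loads(SAMPLE_JSON)
print(type(data).__name__)           # dict: the top level is a JSON object
print(type(data["entry"]).__name__)  # list: "entry" is followed by "["
print(len(data["entry"]))            # 2
print(data["entry"][0]["resource"]["patient"]["display"])  # Jason Argonaut
```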

Write a function named `allergy_count(json_file)` that takes one parameter, the name of the JSON file, and returns the number of entries in that file as an integer. Your function should open the file, read the JSON into a Python object, and return how many items are in the `entry` list.
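One way to meet this spec is sketched below, assuming that a file without an `entry` key should count as zero entries so the function always returns an `int`. The function name `count_entries` and the temporary files are hypothetical, used only so the example runs without `allergies.json`.

```python
import json
import os
import tempfile


def count_entries(json_file):
    """Return the number of items in the top-level 'entry' list (0 if absent)."""
    with open(json_file, "r", encoding="utf-8") as file:
        data = json.load(file)
    # .get() with a default keeps the return type an int even if 'entry' is missing
    return len(data.get("entry", []))


# Write two throwaway files: one with three entries, one with no 'entry' key
with tempfile.TemporaryDirectory() as tmp:
    with_entries = os.path.join(tmp, "with_entries.json")
    without_entries = os.path.join(tmp, "without_entries.json")
    with open(with_entries, "w", encoding="utf-8") as f:
        json.dump({"entry": [{}, {}, {}]}, f)
    with open(without_entries, "w", encoding="utf-8") as f:
        json.dump({"total": 0}, f)

    n_with = count_entries(with_entries)
    n_without = count_entries(without_entries)

print(n_with, n_without)  # 3 0
```

Returning `0` rather than a message string means callers (and the `type(...) == int` check) never have to special-case the empty file.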

```python
import json

ALLERGIES_FILE = "allergies.json"
```

```python
import doctest
import json


def allergy_count(json_file):
    """
    (str) -> int

    Return the number of items in the top-level `entry` list of the given
    JSON file, or 0 if the file has no `entry` key.

    >>> allergy_count(ALLERGIES_FILE)
    4
    """
    with open(json_file, 'r') as file:
        data = json.load(file)
    # len() of the 'entry' list; a missing key counts as zero entries
    return len(data.get('entry', []))


# Run the examples in the docstring
doctest.run_docstring_examples(allergy_count, globals(), verbose=True)
```


Finding tests in NoName
Trying:
    allergy_count(ALLERGIES_FILE)
Expecting:
    4
ok


```python
allergy_count(ALLERGIES_FILE)
```

4

```python
assert type(allergy_count(ALLERGIES_FILE)) == int
assert allergy_count(ALLERGIES_FILE) == 4
```

### 30.2 Number of Patients

If you dig a little deeper into this list of allergies, you'll see that each entry has a patient associated with it. Create a function called `patient_count(json_file)` that counts how many unique patients appear in this JSON structure.
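Counting *unique* patients is a natural fit for a `set`, which stores each value at most once. A minimal sketch on hypothetical entries shaped like the ones in `allergies.json` (not the real data):

```python
# Hypothetical entries mirroring the allergies.json nesting (not the real data)
entries = [
    {"resource": {"patient": {"display": "Jason Argonaut"}}},
    {"resource": {"patient": {"display": "Paul Boal"}}},
    {"resource": {"patient": {"display": "Jason Argonaut"}}},
    {"resource": {}},  # an entry with no patient is skipped
]

# A set comprehension keeps each display name once,
# so its size is the number of unique patients
unique_patients = {
    e["resource"]["patient"]["display"]
    for e in entries
    if "patient" in e.get("resource", {})
}
print(sorted(unique_patients))  # ['Jason Argonaut', 'Paul Boal']
print(len(unique_patients))     # 2
```

Four entries collapse to two patients because the repeated name is stored only once.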

```python
import doctest
import json


def patient_count(json_file):
    """
    (str) -> int

    Return the number of unique patients in the given JSON file.

    Each entry's patient display name is added to a set, so a patient
    with several allergies is counted once. Returns 0 if the file has
    no `entry` key.

    >>> patient_count(ALLERGIES_FILE)
    2
    """
    with open(json_file, 'r') as file:
        data = json.load(file)

    # A set keeps each patient name only once
    unique_patients = set()
    for entry in data.get('entry', []):
        resource = entry.get('resource', {})
        if 'patient' in resource:
            unique_patients.add(resource['patient']['display'])

    return len(unique_patients)


# Run the examples in the docstring
doctest.run_docstring_examples(patient_count, globals(), verbose=True)
```


Finding tests in NoName
Trying:
    patient_count(ALLERGIES_FILE)
Expecting:
    2
ok


```python
patient_count(ALLERGIES_FILE)
```

2

```python
assert type(patient_count(ALLERGIES_FILE)) == int
assert patient_count(ALLERGIES_FILE) == 2
```

### 30.3 How Many Allergies per Patient

Although each entry is a separate allergy, several of them belong to the same patient. Write a function called `allergy_per_patient(json_file)` that counts how many allergies each patient has and returns a dictionary mapping each patient's name to that count.
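A `defaultdict(int)` works for this tally; `collections.Counter` is a standard-library alternative built specifically for counting. A minimal sketch, using hypothetical entries modeled on the expected output shown in 30.4 (not read from the real file):

```python
from collections import Counter

# Hypothetical entries in the same shape as allergies.json (not the real data)
entries = [
    {"resource": {"patient": {"display": "Jason Argonaut"}, "substance": {"text": "PENICILLIN G"}}},
    {"resource": {"patient": {"display": "Paul Boal"}, "substance": {"text": "PENICILLIN G"}}},
    {"resource": {"patient": {"display": "Jason Argonaut"}, "substance": {"text": "SHELLFISH-DERIVED PRODUCTS"}}},
    {"resource": {"patient": {"display": "Jason Argonaut"}, "substance": {"text": "STRAWBERRY"}}},
]

# Counter increments a tally for each patient name it sees
counts = Counter(e["resource"]["patient"]["display"] for e in entries)
print(dict(counts))           # {'Jason Argonaut': 3, 'Paul Boal': 1}
print(counts.most_common(1))  # [('Jason Argonaut', 3)]
```

Converting with `dict(...)` gives a plain dictionary, which is what a `type(...) == dict` check expects.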


```python
import doctest
import json
from collections import defaultdict


def allergy_per_patient(json_file):
    """
    (str) -> dict

    Return a dictionary mapping each patient's name to the number of
    allergies (entries with a substance) recorded for that patient.

    >>> allergy_per_patient(ALLERGIES_FILE)
    {'Jason Argonaut': 3, 'Paul Boal': 1}
    """
    with open(json_file, 'r') as file:
        data = json.load(file)

    # defaultdict(int) starts every new patient at 0
    allergies_per_patient = defaultdict(int)
    for entry in data.get('entry', []):
        resource = entry.get('resource', {})
        if 'patient' in resource and 'substance' in resource:
            allergies_per_patient[resource['patient']['display']] += 1

    return dict(allergies_per_patient)


# Run the examples in the docstring
doctest.run_docstring_examples(allergy_per_patient, globals(), verbose=True)
```


Finding tests in NoName
Trying:
    allergy_per_patient(ALLERGIES_FILE)
Expecting:
    {'Jason Argonaut': 3, 'Paul Boal': 1}
ok


```python
allergy_per_patient(ALLERGIES_FILE)
```

{'Jason Argonaut': 3, 'Paul Boal': 1}

```python
assert type(allergy_per_patient(ALLERGIES_FILE)) == dict
assert allergy_per_patient(ALLERGIES_FILE) == {'Paul Boal': 1, 'Jason Argonaut': 3}
```

### 30.4 Patient Allergies and Reaction

You'll see in the file that each item in the `entry` list has several other attributes, including a patient name, a text representation of the substance, and a reaction manifestation. Create a function named `allergy_list(json_file)` that builds an output list containing the patient name, allergy, and reaction for each `entry`. The actual result you should get will be:

```python
[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis']]
```

You'll notice that both the reaction and the manifestation of that reaction are lists. You only need to capture the first reaction and its first manifestation; that is, if there is a list of things, just output the first one.
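Because `reaction` and `manifestation` may be missing or empty for some entries, a small helper that falls back to a default instead of raising keeps the "first item" extraction safe. The helper names `first_or` and `first_reaction_text`, the `''` fallback, and the sample resources are assumptions for illustration.

```python
def first_or(items, default):
    """Return the first element of a list, or `default` if the list is empty or missing."""
    return items[0] if items else default


def first_reaction_text(resource):
    """Text of the first manifestation of the first reaction, or '' if absent."""
    reaction = first_or(resource.get("reaction", []), {})
    manifestation = first_or(reaction.get("manifestation", []), {})
    return manifestation.get("text", "")


# Hypothetical resources modeled on one allergies.json entry
with_reaction = {"reaction": [{"manifestation": [{"text": "Hives"}, {"text": "Rash"}]},
                              {"manifestation": [{"text": "Nausea"}]}]}
without_reaction = {"substance": {"text": "STRAWBERRY"}}
empty_manifestation = {"reaction": [{"manifestation": []}]}

print(first_reaction_text(with_reaction))              # Hives
print(repr(first_reaction_text(without_reaction)))     # ''
print(repr(first_reaction_text(empty_manifestation)))  # ''
```

Only the first reaction and its first manifestation are read; later items such as `"Rash"` and `"Nausea"` are ignored, matching the instruction above.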

```python
import doctest
import json


def allergy_list(json_file):
    """
    (str) -> list of lists

    Return a [patient name, substance, reaction] list for each entry in the
    given JSON file. Only the first manifestation of the first reaction is
    kept; if an entry has no reaction, the reaction is ''.

    >>> allergy_list(ALLERGIES_FILE)
    [['Jason Argonaut', 'PENICILLIN G', 'Hives'], ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'], ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'], ['Paul Boal', 'PENICILLIN G', 'Bruising']]
    """
    with open(json_file, 'r') as file:
        data = json.load(file)

    output_list = []
    for entry in data.get('entry', []):
        resource = entry.get('resource', {})
        if 'patient' in resource and 'substance' in resource:
            patient_name = resource['patient']['display']
            allergy = resource['substance']['text']

            # First reaction -> first manifestation -> text; the [{}] defaults
            # keep the chain from failing when a list is missing
            reaction = (resource.get('reaction', [{}])[0]
                        .get('manifestation', [{}])[0]
                        .get('text', ''))

            output_list.append([patient_name, allergy, reaction])

    return output_list


# Run the examples in the docstring
doctest.run_docstring_examples(allergy_list, globals(), verbose=True)
```

Finding tests in NoName
Trying:
    allergy_list(ALLERGIES_FILE)
Expecting:
    [['Jason Argonaut', 'PENICILLIN G', 'Hives'], ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'], ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'], ['Paul Boal', 'PENICILLIN G', 'Bruising']]
ok


```python
allergy_list(ALLERGIES_FILE)
```

[['Jason Argonaut', 'PENICILLIN G', 'Hives'],
 ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
 ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
 ['Paul Boal', 'PENICILLIN G', 'Bruising']]

```python
assert allergy_list(ALLERGIES_FILE) == [['Jason Argonaut', 'PENICILLIN G', 'Hives'],
                                        ['Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS', 'Itching'],
                                        ['Jason Argonaut', 'STRAWBERRY', 'Anaphylaxis'],
                                        ['Paul Boal', 'PENICILLIN G', 'Bruising']]
```


### 30.5 Allergy Reaction

Write a function called `allergy_reaction(json_file, patient, substance)` that takes three parameters and returns the reaction that will happen if the patient takes the specified substance. You can solve this, in part, by calling your `allergy_list` function inside your new `allergy_reaction` function.

If the substance is not found in the allergy list, the function should return None.
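A compact way to "return the first match, or `None`" is `next()` with a default over a generator. This is a minimal sketch assuming rows in the `[patient, substance, reaction]` shape that `allergy_list` returns; the function name `find_reaction` and the sample rows are hypothetical.

```python
# Hypothetical rows in the [patient, substance, reaction] shape of allergy_list()
rows = [
    ["Jason Argonaut", "PENICILLIN G", "Hives"],
    ["Paul Boal", "PENICILLIN G", "Bruising"],
]


def find_reaction(rows, patient, substance):
    """Return the reaction for (patient, substance), or None if there is no match."""
    # next() stops at the first matching row and falls back to None otherwise
    return next(
        (reaction for name, allergy, reaction in rows
         if name == patient and allergy == substance),
        None,
    )


print(find_reaction(rows, "Paul Boal", "PENICILLIN G"))     # Bruising
print(find_reaction(rows, "Jason Argonaut", "PENICILLIN"))  # None
```

Matching is exact string equality, so `'PENICILLIN'` does not match `'PENICILLIN G'` and the result is `None`.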

```python
import doctest


def allergy_reaction(json_file, patient, substance):
    """
    (str, str, str) -> str or None

    Return the reaction the given patient has to the given substance,
    using the rows produced by allergy_list(). Patient and substance must
    match exactly; if no entry matches, return None.

    >>> allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')
    'Hives'
    >>> allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS')
    'Itching'
    >>> allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY')
    'Anaphylaxis'
    >>> allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN')

    >>> allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G')
    'Bruising'
    """
    # Reuse allergy_list() to get [patient, substance, reaction] rows
    for patient_name, allergy, reaction in allergy_list(json_file):
        if patient_name == patient and allergy == substance:
            return reaction

    # No matching patient/substance pair
    return None


# Run the examples in the docstring
doctest.run_docstring_examples(allergy_reaction, globals(), verbose=True)
```


Finding tests in NoName
Trying:
    allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')
Expecting:
    'Hives'
ok
Trying:
    allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') 
Expecting:
    'Itching'
ok
Trying:
    allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') 
Expecting:
    'Anaphylaxis'
ok
Trying:
    allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN')  
Expecting nothing
ok
Trying:
    allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G')  
Expecting:
    'Bruising'
ok


```python
allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G')
```

'Hives'

```python
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN G') == 'Hives'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'SHELLFISH-DERIVED PRODUCTS') == 'Itching'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'STRAWBERRY') == 'Anaphylaxis'
assert allergy_reaction(ALLERGIES_FILE, 'Jason Argonaut', 'PENICILLIN') is None
assert allergy_reaction(ALLERGIES_FILE, 'Paul Boal', 'PENICILLIN G') == 'Bruising'
```

---

## Check your work above

If you didn't get them all correct, take a few minutes to think through those that aren't correct.


## Submitting Your Work

In order to submit your work, you'll need to save this notebook file back to GitHub.  To do that in Google Colab:
1. File -> Save a Copy in GitHub
2. Make sure your HDS5210 repository is selected
3. Make sure the file name includes the week number like this: `week06/week06_assignment_2.ipynb`
4. Add a commit message that means something

**Be sure week names are lowercase and use a two-digit week number!!**

**Be sure you use the same file name provided by the instructor!!**

