# HDS5210-2020 Midterm

In the midterm, you're going to focus on using the programming skills that you've developed so far to build a calculator for the Apache II scoring system for ICU Mortality.  
* https://www.mdcalc.com/apache-ii-score#evidence
* https://reference.medscape.com/calculator/apache-ii-scoring-system

For the midterm, we'll be building a calculator for the Apache II score and then running that against a patient file that's available to you out on the internet.  This will be broken down into three main steps:
1. Create your JSON file to encapsulate all of the calculation rules for Apache II
2. Create functions to calculate the Apache II score using your JSON configuration
3. Create a function to loop over the patients in a file on the internet and calculate Apache II scores for all of them



---

## Part 1: Creating a JSON Rules File

Look at the rules for the Apache II scoring system on the pages above.  The first step in the midterm is to use those rules and create a JSON configuration file as described in the 2019 midterm video.  I've provided a starter file named `apache.json` to get you started.

Inside that file, you'll find placeholders for all of the measures that go into the Apache II scoring model:
* Organ Failure History
* Age
* Temperature
* [pH](https://en.wikipedia.org/wiki/PH)
* Heart rate
* Respiratory rate
* [Sodium](https://www.mayoclinic.org/diseases-conditions/hyponatremia/symptoms-causes/syc-20373711)
* [Potassium](https://www.emedicinehealth.com/hyperkalemia/article_em.htm)
* [Creatinine](https://www.medicalnewstoday.com/articles/322380)
* [Hematocrit](https://labtestsonline.org/tests/hematocrit)
* White Blood Count
* [FiO2](https://www.ausmed.com/cpd/articles/oxygen-flow-rate-and-fio2)
* [PaO2](https://www.verywellhealth.com/partial-pressure-of-oyxgen-pa02-914920)
* [A-a gradient](https://www.ncbi.nlm.nih.gov/books/NBK545153/)


You may need to create a sort of nested set of rules in some cases.  For instance, the rule for Creatinine says to use certain ranges and points in the case of Acute Renal Failure and a different set of points for Chronic Renal Failure.

Similarly, the rule for FiO2 says to use PaO2 to calculate scores if the FiO2 is <50, and to use the A-a gradient if the FiO2 is 50 or higher.

When you've created your `apache.json` file, make sure it's in the same directory as this notebook.
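
As a rough illustration of the nested layout described above, here's a minimal sketch that writes a *partial* rules file from Python. It assumes each range is half-open (`min` inclusive, `max` exclusive), uses 999 as a stand-in for "no upper bound" (standard JSON has no infinity), and reuses the key names that the Part 2 code looks up ("Acute Renal Failure", "FiO2 < 50%", "FiO2 > 50%"). The point values follow the MDCalc tables, but only a few measures are shown, and the output filename `apache_fragment.json` is hypothetical.

```python
import json

# Hypothetical fragment of apache.json; every band is half-open, [min, max).
# 999 stands in for "no upper bound".
rules = {
    # Simple string-to-points lookup
    "Organ Failure History": {"Emergency": 5, "Elective": 2, "None": 0},
    "Age": [
        {"min": 0, "max": 45, "points": 0},
        {"min": 45, "max": 55, "points": 2},
        {"min": 55, "max": 65, "points": 3},
        {"min": 65, "max": 75, "points": 5},
        {"min": 75, "max": 999, "points": 6},
    ],
    # Nested by renal status: acute renal failure doubles the usual points
    "Creatinine": {
        "Acute Renal Failure": [
            {"min": 0, "max": 0.6, "points": 4},
            {"min": 0.6, "max": 1.5, "points": 0},
            {"min": 1.5, "max": 2.0, "points": 4},
            {"min": 2.0, "max": 3.5, "points": 6},
            {"min": 3.5, "max": 999, "points": 8},
        ],
        "Chronic Renal Failure": [
            {"min": 0, "max": 0.6, "points": 2},
            {"min": 0.6, "max": 1.5, "points": 0},
            {"min": 1.5, "max": 2.0, "points": 2},
            {"min": 2.0, "max": 3.5, "points": 3},
            {"min": 3.5, "max": 999, "points": 4},
        ],
    },
    # Nested by FiO2: PaO2 bands below 50, A-a gradient bands at 50 or higher
    "Oxygenation": {
        "FiO2 < 50%": [
            {"min": 0, "max": 55, "points": 4},
            {"min": 55, "max": 61, "points": 3},
            {"min": 61, "max": 71, "points": 1},
            {"min": 71, "max": 999, "points": 0},
        ],
        # Key name kept as used in Part 2; it covers FiO2 >= 50
        "FiO2 > 50%": [
            {"min": 0, "max": 200, "points": 0},
            {"min": 200, "max": 350, "points": 2},
            {"min": 350, "max": 500, "points": 3},
            {"min": 500, "max": 999, "points": 4},
        ],
    },
}

with open("apache_fragment.json", "w") as f:
    json.dump(rules, f, indent=2)
```

Storing bands as a list of `{min, max, points}` objects keeps every measure in the same shape, which is what lets one generic lookup function handle most of them.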

### Testing your JSON

The assert statements below should all run without errors.  If you want to change the names of any of the keys in the JSON I provided, you may, but you'll also need to update this test code so that it doesn't fail.  Remember, your notebook should be able to run end-to-end before you submit it.

```python
import json

with open('apache.json') as f:
    rules = json.load(f)

assert 'Organ Failure History' in rules
assert 'Age' in rules
assert 'Temperature' in rules
assert 'pH' in rules
assert 'Heart Rate' in rules
assert 'Respiratory Rate' in rules
assert 'Sodium' in rules
assert 'Potassium' in rules
assert 'Creatinine' in rules
assert 'Hematocrit' in rules
assert 'White Blood Count' in rules
assert 'Oxygenation' in rules
```
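
The asserts above only check that the top-level keys exist. A slightly stronger sanity check, sketched below under the assumption that bands are half-open `[min, max)` ranges, confirms that each list of bands is sorted with no gaps or overlaps. The helper name `check_bands` is hypothetical, and the sample bands are inlined so the snippet runs on its own; in the notebook you might pass it `rules['Age']`, `rules['Creatinine']['Acute Renal Failure']`, and so on.

```python
def check_bands(bands):
    """Return True if half-open [min, max) bands are non-empty and contiguous."""
    # Every band must have min strictly below max
    for band in bands:
        if not band["min"] < band["max"]:
            return False
    # Each band should start exactly where the previous one ends
    for prev, nxt in zip(bands, bands[1:]):
        if prev["max"] != nxt["min"]:
            return False
    return True


age_bands = [
    {"min": 0, "max": 45, "points": 0},
    {"min": 45, "max": 55, "points": 2},
    {"min": 55, "max": 65, "points": 3},
]
gap_bands = [
    {"min": 0, "max": 45, "points": 0},
    {"min": 46, "max": 55, "points": 2},  # values from 45 up to 46 fall through
]

print(check_bands(age_bands))  # True
print(check_bands(gap_bands))  # False
```

A gap like the one in `gap_bands` would make the scoring functions silently return their default of 0 for values inside it, which is easy to miss in a spot check.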

---

## Part 2: Functions to evaluate rules

Write a series of functions, enough to satisfy all of the main criteria that we're using to calculate the Apache II score.  That list is the same as the assert statements above.

* Each of your functions should be well documented.
* Each function should have "config_file" as one of its parameters.
* Each function should return a numerical score value.
* Similar to what we discussed in the review, if you can generalize some rules, do so.  You should **NOT** end up with one function for each input variable.  If you did that, you'd have a lot of repetitive code.
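
One way to follow the last bullet is to separate loading the rules from looking up points. The sketch below shows one possible design, not the required one: `load_rules` (a hypothetical name) caches the parsed JSON with `functools.lru_cache`, so repeated calls with the same `config_file` don't reopen the file, and `points_for` (also hypothetical) works on any list of `[min, max)` bands, whether it comes from a top-level measure such as `rules['Age']` or a nested one such as `rules['Creatinine']['Acute Renal Failure']`.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=None)
def load_rules(config_file):
    """Parse config_file once; later calls with the same path reuse the result.

    The cached dict is shared, so callers should treat it as read-only.
    """
    with open(config_file) as f:
        return json.load(f)


def points_for(bands, value):
    """Return the points of the first [min, max) band containing value, else 0."""
    value = float(value)  # values read from a CSV arrive as strings
    for band in bands:
        if band["min"] <= value < band["max"]:
            return band["points"]
    return 0


# Usage with an inline band list, so the example runs without apache.json
age_bands = [
    {"min": 0, "max": 45, "points": 0},
    {"min": 45, "max": 55, "points": 2},
    {"min": 55, "max": 999, "points": 3},
]
print(points_for(age_bands, "50"))  # 2
```

One trade-off of caching: if you edit `apache.json` while the notebook is running, call `load_rules.cache_clear()` so the new rules are read.
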

The Glasgow Coma Scale is a 1-to-1 score translation: simply add the Glasgow Coma Scale value, so you don't need to write a function for it. [Glasgow Coma Scale](https://www.cdc.gov/masstrauma/resources/gcs.pdf) (MDCalc's actual rule is 15 minus the GCS value; see the 3/29 correction in Part 4.)
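
For reference, MDCalc (the rule the 3/29 correction in Part 4 refers to) scores the Glasgow Coma Scale as 15 minus the observed GCS, so a fully alert patient (GCS 15) adds 0 points and the minimum GCS of 3 adds 12. A minimal sketch of that arithmetic, with `gcs_points` as a hypothetical helper name:

```python
def gcs_points(glasgow_coma_scale):
    """Return the Apache II GCS contribution, 15 - GCS (GCS ranges from 3 to 15)."""
    gcs = int(float(glasgow_coma_scale))  # accepts "13", 13, or 13.0
    if not 3 <= gcs <= 15:
        raise ValueError(f"GCS must be between 3 and 15, got {glasgow_coma_scale!r}")
    return 15 - gcs


print(gcs_points(15))    # 0
print(gcs_points("13"))  # 2
print(gcs_points(3))     # 12
```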



```python
import json

def get_points(input_name, input_value, config_file):
    """ (str, float/int/str, str) -> int
    Return the Apache II points for a simple variable. A simple variable has a single set
    of ranges; examples are Age and Temperature. A complex variable has multiple conditions;
    for example, the oxygenation score depends on the FiO2 level and then on either PaO2
    or the A-a gradient.

    The function looks up the ranges stored under input_name and returns the points for
    the range that contains input_value (min inclusive, max exclusive), or 0 if none does.

    input_name is a string, input_value is an int or float (a numeric string is also
    accepted), and config_file is the path to the JSON rules file.

    >>> get_points("Age", 50, 'apache.json')
    2
    >>> get_points("Temperature", 32, 'apache.json')
    2
    >>> get_points("pH", 7.33, 'apache.json')
    0
    """
    with open(config_file) as f:
        config = json.load(f)
    rules = config[input_name]  # a KeyError here means input_name isn't in the JSON
    value = float(input_value)
    score = 0

    for rule in rules:
        if rule.get('min') <= value < rule.get('max'):
            score = rule.get('points')

    return score
```

```python
def creatinine_points(renal_failure, creatinine, config_file):
    """ (str, float/int/str, str) -> int
    Return the Apache II points for creatinine.

    renal_failure is a string and should be one of the keys under "Creatinine" in the
    config: "Acute Renal Failure", "Chronic Renal Failure", or "No Renal Failure".
    creatinine is an int or float (a numeric string is also accepted), and config_file
    is the path to the JSON rules file.

    The renal_failure status selects which set of ranges to use; the creatinine level is
    then matched against those ranges (min inclusive, max exclusive).

    >>> creatinine_points("Acute Renal Failure", 2, 'apache.json')
    6
    >>> creatinine_points("Chronic Renal Failure", 2, 'apache.json')
    3
    >>> creatinine_points("No Renal Failure", 1, 'apache.json')
    0
    """
    with open(config_file) as f:
        config = json.load(f)
    # Pick the range list for this renal status; an unknown status scores 0
    creatinine_rules = config["Creatinine"].get(renal_failure, [])
    value = float(creatinine)
    creatinine_score = 0

    for creatinine_rule in creatinine_rules:
        if creatinine_rule.get('min') <= value < creatinine_rule.get('max'):
            creatinine_score = creatinine_rule.get('points')

    return creatinine_score
```

```python
def oxygenation_points(FiO2, PaO2, A_a_gradient, config_file):
    """ (float/int/str, float/int/str, float/int/str, str) -> int
    Return the Apache II points for oxygenation.

    FiO2 (as a percentage), PaO2, and A_a_gradient are ints or floats (numeric strings
    are also accepted). config_file is the path to the JSON rules file.

    If FiO2 < 50, the score is based on PaO2. If FiO2 >= 50, the score is based on
    the A_a_gradient.

    >>> oxygenation_points(30, 55, 200, 'apache.json')
    3
    >>> oxygenation_points(30, 61, 500, 'apache.json')
    1
    >>> oxygenation_points(70, 55, 200, 'apache.json')
    2
    >>> oxygenation_points(70, 55, 500, 'apache.json')
    4
    """
    with open(config_file) as f:
        oxy_rules = json.load(f)["Oxygenation"]

    # FiO2 decides both which range list to use and which measurement to test
    if float(FiO2) < 50:
        FiO2_rules = oxy_rules.get("FiO2 < 50%")
        value = float(PaO2)
    else:
        FiO2_rules = oxy_rules.get("FiO2 > 50%")  # this key covers FiO2 >= 50
        value = float(A_a_gradient)

    FiO2_score = 0  # default when no range matches
    for FiO2_rule in FiO2_rules:
        if FiO2_rule.get('min') <= value < FiO2_rule.get('max'):
            FiO2_score = FiO2_rule.get('points')

    return FiO2_score
```

```python
def organ_failure_history(organ_failure, config_file):
    """ (str, str) -> int
    Return the Apache II points for organ failure history.

    organ_failure is a string and should be one of the keys under "Organ Failure History"
    in the config (e.g. "Emergency", "Elective", or "None"); an unrecognized value
    scores 0. config_file is the path to the JSON rules file.

    >>> organ_failure_history("Emergency", 'apache.json')
    5
    >>> organ_failure_history("Elective", 'apache.json')
    2
    >>> organ_failure_history("None", 'apache.json')
    0
    """
    with open(config_file) as f:
        organ_rules = json.load(f)["Organ Failure History"]
    return organ_rules.get(organ_failure, 0)
```

### Testing your Functions

Write enough test cases to verify that your functions work for evaluating all of the scoring inputs.  Have at least 3 test cases for each input.

These tests can be written the same way as the assertions we've used in previous assignments.  For example, if you had a function named `temperature_score`, you could write a test case like:

```
assert( temperature_score(37) == 0 )
```
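
Because the rules use ranges, the most informative test values sit right at the edges of a band, where mixing up `<` and `<=` would show up. One compact way to write many such cases is a table of `(value, expected)` pairs checked in a loop. The sketch below runs against an inline copy of the Age bands (assuming half-open `[min, max)` ranges) and a hypothetical `age_points` helper so it is self-contained; in the notebook you would call `get_points("Age", value, 'apache.json')` instead.

```python
# Hypothetical inline copy of the Age bands as (min, max, points), half-open [min, max)
AGE_BANDS = [(0, 45, 0), (45, 55, 2), (55, 65, 3), (65, 75, 5), (75, 999, 6)]


def age_points(age):
    """Return Apache II age points using AGE_BANDS (0 if no band matches)."""
    for low, high, points in AGE_BANDS:
        if low <= float(age) < high:
            return points
    return 0


# Edge cases: the last value in one band and the first value in the next
cases = [(44, 0), (45, 2), (54, 2), (55, 3), (64, 3), (65, 5), (74, 5), (75, 6)]
for age, expected in cases:
    actual = age_points(age)
    assert actual == expected, f"age {age}: expected {expected}, got {actual}"
print(f"{len(cases)} boundary cases passed")  # 8 boundary cases passed
```

The message in each `assert` reports which input failed, which is more useful than a bare `AssertionError` when a table has dozens of rows.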

```python
assert(organ_failure_history("Emergency", 'apache.json') == 5)
assert(organ_failure_history("Elective", 'apache.json') == 2)
assert(organ_failure_history("None", 'apache.json') == 0)
assert(get_points("Age", 50, 'apache.json') == 2)
assert(get_points("Age", 65, 'apache.json') == 5)
assert(get_points("Age", 75, 'apache.json') == 6)
assert(get_points("Temperature", 32, 'apache.json') == 2)
assert(get_points("Temperature", 38.5, 'apache.json') == 1)
assert(get_points("Temperature", 41, 'apache.json') == 4)
assert(get_points("pH", 7.15, 'apache.json') == 3)
assert(get_points("pH", 7.33, 'apache.json') == 0)
assert(get_points("pH", 7.5, 'apache.json') == 1)
assert(get_points("Heart Rate", 40, 'apache.json') == 3)
assert(get_points("Heart Rate", 70, 'apache.json') == 0)
assert(get_points("Heart Rate", 140, 'apache.json') == 3)
assert(get_points("Respiratory Rate", 10, 'apache.json') == 1)
assert(get_points("Respiratory Rate", 35, 'apache.json') == 3)
assert(get_points("Respiratory Rate", 50, 'apache.json') == 4)
assert(get_points("Sodium", 120, 'apache.json') == 2)
assert(get_points("Sodium", 150, 'apache.json') == 1)
assert(get_points("Sodium", 160, 'apache.json') == 3)
assert(get_points("Potassium", 3, 'apache.json') == 1)
assert(get_points("Potassium", 3.5, 'apache.json') == 0)
assert(get_points("Potassium", 6, 'apache.json') == 3)
assert(creatinine_points("Acute Renal Failure", 2, 'apache.json') == 6)
assert(creatinine_points("Chronic Renal Failure", 2, 'apache.json') == 3)
assert(creatinine_points("No Renal Failure", 1, 'apache.json') == 0)
assert(get_points("Hematocrit", 20, 'apache.json') == 2)
assert(get_points("Hematocrit", 30, 'apache.json') == 0)
assert(get_points("Hematocrit", 50, 'apache.json') == 2)
assert(get_points("White Blood Count", 3, 'apache.json') == 0)
assert(get_points("White Blood Count", 20, 'apache.json') == 2)
assert(get_points("White Blood Count", 40, 'apache.json') == 4)
assert(oxygenation_points(30, 55, 200, 'apache.json') == 3)
assert(oxygenation_points(30, 61, 500, 'apache.json') == 1)
assert(oxygenation_points(70, 55, 200, 'apache.json') == 2)
assert(oxygenation_points(70, 61, 500, 'apache.json') == 4)
```

---

## Part 3: Put it all together

Create a new function called `apache_score()` that takes all of the necessary inputs and returns the final Apache II score.  Use any variable names that you want.  For clarity and organization, my recommendation is to create them in the same order as they're documented on the website.

1. Organ Failure History
2. Age
3. Temperature
4. pH 
5. Heart rate
6. Respiratory rate
7. Sodium
8. Potassium
9. Creatinine
10. Acute renal failure
11. Hematocrit
12. White Blood Count
13. Glasgow Coma Scale
14. FiO2
15. PaO2
16. A-a gradient


```python
def apache_score(organ_failure, age, temperature, pH, heart_rate, respiratory_rate, sodium, potassium, creatinine, renal_failure, hematocrit, white_blood_count, glasgow_coma_scale, fio2, pao2, a_a_gradient, config_file='apache.json'):
    """ (str, float/int/str, float/int/str, float/int/str, float/int/str, float/int/str, float/int/str, float/int/str, float/int/str, str, float/int/str, float/int/str, float/int/str, float/int/str, float/int/str, float/int/str, str) -> float
    Return the total Apache II score by calculating each individual Apache II component and summing them.

    organ_failure and renal_failure are strings. age, temperature, pH, heart_rate, respiratory_rate,
    sodium, potassium, creatinine, hematocrit, white_blood_count, glasgow_coma_scale, fio2, pao2 and
    a_a_gradient are ints or floats (numeric strings are also accepted). config_file is the path to
    the JSON rules file. The Glasgow Coma Scale contributes 15 minus its value.

    >>> apache_score("Emergency", 6, 27, 7.3, 174, 10, 170, 7.8, 1.6, "Chronic Renal Failure", 3, 28, 13, 61, 59, 458)
    35.0
    >>> apache_score("Elective", 3, 4, 7.2, 74, 53, 106, 3.2, 1.2, "Acute Renal Failure", 60, 22, 11, 66, 59, 482)
    31.0
    """
    score = 0
    score += organ_failure_history(organ_failure, config_file)
    score += get_points("Age", age, config_file)
    score += get_points("Temperature", temperature, config_file)
    score += get_points("pH", pH, config_file)
    score += get_points("Heart Rate", heart_rate, config_file)
    score += get_points("Respiratory Rate", respiratory_rate, config_file)
    score += get_points("Sodium", sodium, config_file)
    score += get_points("Potassium", potassium, config_file)
    score += creatinine_points(renal_failure, creatinine, config_file)
    score += get_points("Hematocrit", hematocrit, config_file)
    score += get_points("White Blood Count", white_blood_count, config_file)
    score += 15 - float(glasgow_coma_scale)  # GCS points are 15 minus the observed GCS
    score += oxygenation_points(fio2, pao2, a_a_gradient, config_file)

    return float(score)
```

### Testing your Function

Write a few test cases to make sure that your code functions correctly.  In the last step, you'll run LOTS of test cases through it, but you should write some of your own before moving on.

```python
assert(apache_score("Emergency", 6, 27, 7.3, 174, 10, 170, 7.8, 1.6, "Chronic Renal Failure", 3, 28, 13, 61, 59, 458) == 35)
assert(apache_score("Elective", 3, 4, 7.2, 74, 53, 106, 3.2, 1.2, "Acute Renal Failure", 60, 22, 11, 66, 59, 482) == 31)
```

---

## Part 4: Accessing and processing the patient file

Fill out the simple function below to retrieve the patient data as a CSV file from any given URL and return a list of all of the Apache II scores based on the data you find for those patients.
* The patient file will be a CSV
* It will have column headers that match the labels shown above
* The columns will not necessarily appear in the order shown above
* You should output only the Apache II scores, not any other information
* Your output should be a list in the same order as the input rows

```python
import csv
import io

import requests


def apache_scores_from_url(url):
    """ (str) -> list of float
    Download the patient CSV at url and return the Apache II score for each row,
    in the same order as the rows in the file. Columns are looked up by header
    name, so their order in the file doesn't matter.
    """
    rsp = requests.get(url)
    rsp.raise_for_status()  # stop early on a 4xx/5xx response

    scores = []
    for row in csv.DictReader(io.StringIO(rsp.text)):
        scores.append(apache_score(
            row['Organ Failure History'], row['Age'], row['Temperature'], row['pH'],
            row['Heart Rate'], row['Respiratory Rate'], row['Sodium'], row['Potassium'],
            row['Creatinine'], row['Acute Renal Failure'], row['Hematocrit'],
            row['White Blood Count'], row['Glasgow Coma Scale'], row['FiO2'],
            row['PaO2'], row['A-a Gradient'],
        ))
    return scores


url = 'https://hds5210-2020.s3.amazonaws.com/TestPatients.csv'
apache_scores = apache_scores_from_url(url)
print(apache_scores)
```

[35.0, 31.0, 47.0, 34.0, 44.0, 35.0, 31.0, 49.0, 40.0, 48.0, 42.0, 43.0, 32.0, 41.0, 42.0, 49.0, 37.0, 37.0, 38.0, 43.0, 41.0, 31.0, 38.0, 30.0, 41.0, 41.0, 34.0, 46.0, 40.0, 47.0, 36.0, 43.0, 41.0, 46.0, 44.0, 40.0, 39.0, 37.0, 41.0, 30.0, 46.0, 30.0, 41.0, 44.0, 35.0, 36.0, 40.0, 40.0, 30.0, 50.0, 52.0, 43.0, 46.0, 34.0, 33.0, 42.0, 41.0, 31.0, 46.0, 46.0, 34.0, 36.0, 33.0, 38.0, 26.0, 29.0, 46.0, 25.0, 40.0, 38.0, 39.0, 34.0, 39.0, 53.0, 44.0, 49.0, 37.0, 36.0, 36.0, 51.0, 33.0, 36.0, 43.0, 41.0, 24.0, 50.0, 29.0, 40.0, 36.0, 50.0, 29.0, 37.0, 34.0, 45.0, 34.0, 40.0, 37.0, 37.0, 47.0, 31.0, 33.0, 50.0, 28.0, 37.0, 33.0, 44.0, 40.0, 38.0, 40.0, 31.0, 44.0, 37.0, 44.0, 47.0, 25.0, 34.0, 32.0, 41.0, 38.0, 34.0, 38.0, 39.0, 41.0, 34.0, 29.0, 44.0, 22.0, 40.0, 34.0, 43.0, 40.0, 31.0, 35.0, 33.0, 27.0, 44.0, 52.0, 40.0, 53.0, 40.0, 40.0, 48.0, 31.0, 42.0, 49.0, 43.0, 35.0, 31.0, 36.0, 42.0, 34.0, 31.0, 30.0, 28.0, 38.0, 47.0, 32.0, 47.0, 31.0, 41.0, 40.0, 33.0, 35.0, 42.0, 49.0, 55.0, 51.

### Testing your Function

The URL for the test data is: https://hds5210-2020.s3.amazonaws.com/TestPatients.csv


You can verify your results by comparing them against this data: https://hds5210-2020.s3.amazonaws.com/Scores.csv

CORRECTION ADDED 3/29 - If you calculated the Glasgow Coma Scale points as per the actual instructions in MDCalc, then please use this set of corrected scores to compare your results with: https://hds5210-2020.s3.amazonaws.com/Scores_corrected.csv
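
With many patients, a long list of `True` values is hard to scan. A more compact check, sketched below with made-up numbers, pairs the computed and expected scores, collects the row numbers that disagree, and first confirms both lists have the same length (since `zip` silently stops at the shorter one). The helper name `mismatches` is hypothetical; in the notebook, `computed` would be `apache_scores` and `expected` the `TOTAL` column from the scores file.

```python
def mismatches(computed, expected):
    """Return (row_number, computed, expected) for every row that disagrees.

    Row numbers start at 1 for the first data row after the CSV header.
    """
    if len(computed) != len(expected):
        raise ValueError(f"{len(computed)} computed scores vs {len(expected)} expected")
    return [
        (i, c, e)
        for i, (c, e) in enumerate(zip(computed, expected), start=1)
        if float(c) != float(e)
    ]


# Made-up example: the third row disagrees
computed = [35.0, 31.0, 46.0]
expected = ["35", "31", "47"]  # values read from a CSV arrive as strings
bad = mismatches(computed, expected)
print(bad if bad else "all scores match")  # [(3, 46.0, '47')]
```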


```python
import csv
import io

import requests

url = 'https://hds5210-2020.s3.amazonaws.com/Scores_corrected.csv'
rsp = requests.get(url)
rsp.raise_for_status()

# Compare each expected TOTAL with the computed score in the same row position
reader = csv.DictReader(io.StringIO(rsp.text))
verify = [float(row['TOTAL']) == computed for row, computed in zip(reader, apache_scores)]
print(verify)
```

[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, Tru