# HDS5210-2020 Midterm

In the midterm, you're going to focus on using the programming skills that you've developed so far to build a calculator for the Apache II scoring system for ICU Mortality.  
* https://www.mdcalc.com/apache-ii-score#evidence
* https://reference.medscape.com/calculator/apache-ii-scoring-system

For the midterm, we'll be building a calculator for the Apache II score and then running that against a patient file that's available to you out on the internet.  This will be broken down into three main steps:
1. Create your JSON file to encapsulate all of the calculation rules for Apache II
2. Create functions to calculate the Apache II score using your JSON configuration
3. Create a function to loop over the patients in a file on the internet and calculate Apache II scores for all of them



---

## Part 1: Creating a JSON Rules File

Look at the rules for the Apache II scoring system on the pages above.  The first step in the midterm is to use those rules and create a JSON configuration file as described in the 2019 midterm video.  I've provided a starter file named `apache.json` to get you started.

Inside that file, you'll find placeholders for all of the measures that go into the Apache II scoring model:
* Organ Failure History
* Age
* Temperature
* [pH](https://en.wikipedia.org/wiki/PH)
* Heart rate
* Respiratory rate
* [Sodium](https://www.mayoclinic.org/diseases-conditions/hyponatremia/symptoms-causes/syc-20373711)
* [Potassium](https://www.emedicinehealth.com/hyperkalemia/article_em.htm)
* [Creatinine](https://www.medicalnewstoday.com/articles/322380)
* [Hematocrit](https://labtestsonline.org/tests/hematocrit)
* White Blood Count
* [FiO2](https://www.ausmed.com/cpd/articles/oxygen-flow-rate-and-fio2)
* [PaO2](https://www.verywellhealth.com/partial-pressure-of-oyxgen-pa02-914920)
* [A-a gradient](https://www.ncbi.nlm.nih.gov/books/NBK545153/)


You may need to create a sort of nested set of rules in some cases.  For instance, the rule for Creatinine says to use certain ranges and points in the case of Acute Renal Failure and a different set of points for Chronic Renal Failure.

Similarly, the rule for FiO2 says to use PaO2 to calculate the score if the FiO2 is <50, and to use the A-a gradient if the FiO2 is >=50.
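As a rough illustration of that kind of nesting, here is a minimal sketch of one possible layout, written as a Python dictionary and round-tripped through JSON. The `min`/`max`/`points` key names, the outer bounds `0` and `200`, the helper name `lookup`, and the renal failure sub-keys are assumptions for illustration, not a required schema. Only one creatinine range is shown per branch, and all point values should be checked against the MDCalc page before you rely on them.

```python
import json

# Hypothetical layout: each measure maps to a list of half-open ranges
# [min, max) together with the points awarded inside that range.
rules_sketch = {
    "Age": [
        {"min": 0,  "max": 45,  "points": 0},
        {"min": 45, "max": 55,  "points": 2},
        {"min": 55, "max": 65,  "points": 3},
        {"min": 65, "max": 75,  "points": 5},
        {"min": 75, "max": 200, "points": 6},
    ],
    # Nested case: one extra level keyed by renal failure status.
    # Only a single range per branch is shown; the rest are omitted.
    "Creatinine": {
        "Acute Renal Failure":   [{"min": 2.0, "max": 3.5, "points": 6}],
        "Chronic Renal Failure": [{"min": 2.0, "max": 3.5, "points": 3}],
    },
}


def lookup(ranges, value):
    """Return the points of the first range with min <= value < max."""
    for r in ranges:
        if r["min"] <= value < r["max"]:
            return r["points"]
    raise ValueError("no range matches {}".format(value))


# Round-trip through JSON text to mimic saving and loading a config file.
loaded = json.loads(json.dumps(rules_sketch, indent=2))

print(lookup(loaded["Age"], 64))                                 # 3
print(lookup(loaded["Creatinine"]["Acute Renal Failure"], 2.0))  # 6
```

The nested branch costs one extra dictionary lookup before the range search, which is the same idea whether the second level is keyed by renal failure status or by which oxygenation measure applies.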

When you've created your `apache.json` file, make sure it's in the same directory as this notebook.

### Testing your JSON

The `assert` statements below should all run without errors.  If you want to change the names of any of the keys in the JSON I provided you, you may, but you'll also need to update this test code so that it doesn't fail.  Remember, your notebook should be able to run end-to-end before you submit it.

In [1]:
import json

with open('apache.json') as f:
    rules = json.load(f)

assert('Organ Failure History' in rules.keys())
assert('Age' in rules.keys())
assert('Temperature' in rules.keys())
assert('pH' in rules.keys())
assert('Heart Rate' in rules.keys())
assert('Respiratory Rate' in rules.keys())
assert('Sodium' in rules.keys())
assert('Potassium' in rules.keys())
assert('Creatinine' in rules.keys())
assert('Hematocrit' in rules.keys())
assert('White Blood Count' in rules.keys())
assert('FiO2' in rules.keys())

---

## Part 2: Functions to evaluate rules

Write a series of functions, enough to satisfy all of the main criteria that we're using to calculate the Apache II score.  That list is the same as the assert statements above.

* Each of your functions should be well documented.
* Each function should have "config_file" as one of its parameters.
* Each function should return a numerical score value.
* Similar to what we discussed in the review, if you can generalize some rules, do so.  You should **NOT** end up with one function for each input variable.  If you did that, you'd have a lot of repetitive code.
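One way to read the guidelines above is a single range-based scorer that takes the measure name as an argument. The sketch below is only an illustration under assumptions: the names `load_rules` and `range_score`, the `min`/`max`/`points` keys, and the throwaway demo file are all hypothetical. It also caches the parsed JSON with `functools.lru_cache`, so passing `config_file` to every call does not mean re-reading the file every time.

```python
import json
import os
import tempfile
from functools import lru_cache


@lru_cache(maxsize=None)
def load_rules(config_file):
    """Read and parse the JSON rules file once per distinct path."""
    with open(config_file) as f:
        return json.load(f)


def range_score(measure, value, config_file):
    """Return the points for `value` under the ranges listed for `measure`."""
    for r in load_rules(config_file)[measure]:
        if r["min"] <= value < r["max"]:
            return r["points"]
    raise ValueError("no range for {}={}".format(measure, value))


# Demo with a throwaway config file holding two age ranges only.
demo_rules = {"Age": [{"min": 0, "max": 45, "points": 0},
                      {"min": 45, "max": 55, "points": 2}]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(demo_rules, tmp)
    demo_path = tmp.name

older = range_score("Age", 50, demo_path)    # first call reads the file
younger = range_score("Age", 30, demo_path)  # second call hits the cache
os.remove(demo_path)

print(older, younger)  # 2 0
```

A cached loader trades freshness for speed: if the rules file changes while the notebook is running, `load_rules.cache_clear()` would be needed before the new rules are picked up.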

The Glasgow Coma Scale is simply a 1-to-1 score translation.  Simply add the Glasgow Coma Scale value.  So, you don't need to write a function for this. [Glasgow Coma Scale](https://www.cdc.gov/masstrauma/resources/gcs.pdf)

**CORRECTION ADDED 2/29** - The Glasgow Coma Scale points should be calculated as `15 - Glasgow Coma Scale` rather than what I just stated above.  My preference would be that you do the calculation correctly, as per MDCalc, and then use the **corrected** scores files to compare against as noted in Part 4.

In [2]:
def score_with_range(criteria,num,config_file,sub_layer=None):
    '''(string,float,string,string) -> int or string
    This function receives the criteria (e.g. 'Heart Rate'), the criteria value, the JSON file,
    and an optional sublayer. The sublayer should be a dictionary key under the criteria.
    It loads the JSON file and gets the list of value ranges for the criteria, or for the sublayer if one is given.
    In the for loop, if the criteria value satisfies min <= value < max, it returns the points for that range.
    In most cases the points are a number, but in some cases, like 'FiO2', they name a dictionary sublayer.
    See the fio2_score() function for a use case.
    It raises a ValueError if no range matches the value.
    '''
    
    with open(config_file) as f:
        rules = json.load(f)
    rule_list=rules.get(criteria)
    if sub_layer is not None:
        rule_list=rule_list.get(sub_layer)
    for a in rule_list:
        if a.get('min') <= num < a.get('max'):
            return a.get('points')
    # Without this, an unmatched value would fail on an unassigned variable.
    raise ValueError('No range in {} matches the value {}'.format(criteria,num))



def organ_failure_history_score(history,config_file):
    '''(string,string) -> int
    This function opens the JSON file and gets the score for the stated history from the 'Organ Failure History' dictionary.
    '''
    
    with open(config_file) as f:
        rules = json.load(f)
    score = rules.get('Organ Failure History').get(history)
    return score



def age_score(age,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with an 'Age' criteria, age value, and json file.
    It returns the score of the age input.
    '''
    
    score = score_with_range('Age',age,config_file)
    return score



def temperature_score(temp,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'Temperature' criteria, temp value, and json file.
    It returns the score of the temp input.
    '''
    
    score = score_with_range('Temperature',temp,config_file)
    return score



def ph_score(ph,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'pH' criteria, ph value, and json file.
    It returns the score of the ph input.
    '''
    
    score = score_with_range('pH',ph,config_file) 
    return score



def heart_rate_score(hr,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'Heart Rate' criteria, hr value, and json file.
    It returns the score of the hr input.
    '''
    
    score = score_with_range('Heart Rate',hr,config_file) 
    return score



def respiratory_rate_score(rate,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'Respiratory Rate' criteria, rate value, and json file.
    It returns the score of the rate input.
    '''
    
    score = score_with_range('Respiratory Rate',rate,config_file) 
    return score



def sodium_score(na,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'Sodium' criteria, na value, and json file.
    It returns the score of the na input.
    '''
    
    score = score_with_range('Sodium',na,config_file) 
    return score



def potassium_score(k,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'Potassium' criteria, k value, and json file.
    It returns the score of the k input.
    '''
    
    score = score_with_range('Potassium',k,config_file) 
    return score



def creatinine_score(creatinine,ARF,config_file):
    '''(float,string,string) -> int
    This function calls the score_with_range() function with a 'Creatinine' criteria, 
    creatinine value, json file, and renal failure condition ('Acute Renal Failure' or 'Chronic Renal Failure').
    It returns the score of the creatinine input.
    '''
    
    score = score_with_range('Creatinine',creatinine,config_file,ARF)
    return score



def hematocrit_score(h,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'Hematocrit' criteria, h value, and json file.
    It returns the score of the h input.
    '''
    
    score = score_with_range('Hematocrit',h,config_file) 
    return score



def white_blood_count_score(wbc,config_file):
    '''(float,string) -> int
    This function calls the score_with_range() function with a 'White Blood Count' criteria, wbc value, and json file.
    It returns the score of the wbc input.
    '''
    
    score = score_with_range('White Blood Count',wbc,config_file) 
    return score



def glasgow_score(gscale):
    '''(float) -> float
    This function receives the Glasgow Coma Scale value and computes the Glasgow points with the equation: 15-gscale
    It returns the Glasgow points for the given scale value.
    '''
    
    score = 15-gscale
    return score


def fio2_score(fio2,PaO2,Aa_gradient,config_file):
    '''(float,float,float,string) -> int
    This function calls the score_with_range() function with an 'FiO2' criteria, fio2 value, json file, and 'Value' key.
    That call decides which criteria (PaO2 or A-a gradient) will be used to compute the oxygenation score.
    This function then calls score_with_range() again with the right criteria value and sublayer to get the actual score.
    It returns a single score for the given fio2, PaO2, and Aa_gradient inputs.
    '''
    
    # Map the sublayer name returned from the JSON to the matching input value,
    # rather than looking the variable up by name through locals().
    values = {'PaO2': PaO2, 'Aa_gradient': Aa_gradient}
    sublayer = score_with_range('FiO2',fio2,config_file,'Value')
    score = score_with_range('FiO2',values[sublayer],config_file,sublayer.replace('_',' '))
    return score

### Testing your Functions

Write enough test cases to verify that your functions work for evaluating all of the scoring inputs.  Have at least 3 test cases for each input.

These tests can be written the same way as the assertions we've used in previous assignments.  For example, if you have a function named `temperature_score`, then you could write a test case like:

```
assert( temperature_score(37, config_file) == 0 )
```
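Because the scoring rules are ranges, the values sitting right on a boundary are the ones most likely to be scored wrongly. Here is a minimal sketch of a table-driven boundary check. The function `age_points` is a hypothetical stand-in with the age bands hardcoded so that the example runs on its own; in the notebook you would call your JSON-driven function instead.

```python
def age_points(age):
    """Stand-in scorer with the Apache II age bands hardcoded for the demo."""
    if age < 45:
        return 0
    if age < 55:
        return 2
    if age < 65:
        return 3
    if age < 75:
        return 5
    return 6


# Each pair is (input, expected points); every band edge is tested from both sides.
cases = [(44, 0), (45, 2), (54, 2), (55, 3), (64, 3), (65, 5), (74, 5), (75, 6)]

# Collect every failing case so one run reports all of them at once.
failures = [(age, expected, age_points(age))
            for age, expected in cases
            if age_points(age) != expected]

assert not failures, failures
```

Listing the cases as data keeps the test short, and adding another boundary is a one-line change.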

In [3]:
# Put the name of your configuration file below
config_file='apache.json'

In [4]:
assert( organ_failure_history_score("Nonoperative",config_file) == 5)
assert( organ_failure_history_score("Elective",config_file) == 2)
assert( organ_failure_history_score("None",config_file) == 0)

assert( age_score(12,config_file) == 0 )
assert( age_score(65,config_file) == 5 )
assert( age_score(64,config_file) == 3 )

assert( temperature_score(37,config_file) == 0 )
assert( temperature_score(39,config_file) == 3 )
assert( temperature_score(29,config_file) == 4 )

assert( ph_score(12,config_file) == 4 )
assert( ph_score(7.25,config_file) == 2 )
assert( ph_score(3,config_file) == 4 )

assert( heart_rate_score(180,config_file) == 4 )
assert( heart_rate_score(68,config_file) == 2 )
assert( heart_rate_score(45,config_file) == 3 )

assert( respiratory_rate_score(51,config_file) == 4 )
assert( respiratory_rate_score(3,config_file) == 4 )
assert( respiratory_rate_score(12,config_file) == 0 )

assert( sodium_score(160,config_file) == 3 )
assert( sodium_score(144,config_file) == 0 )
assert( sodium_score(111,config_file) == 3 )

assert( potassium_score(8,config_file) == 4 )
assert( potassium_score(6.5,config_file) == 3 )
assert( potassium_score(1,config_file) == 4 )

assert( creatinine_score(2,'Acute Renal Failure',config_file) == 6 )
assert( creatinine_score(0.6,'Chronic Renal Failure',config_file) == 0 )
assert( creatinine_score(9,'Chronic Renal Failure',config_file) == 4 )

assert( hematocrit_score(54,config_file) == 2 )
assert( hematocrit_score(41,config_file) == 0 )
assert( hematocrit_score(29,config_file) == 2 )

assert( white_blood_count_score(4,config_file) == 0 )
assert( white_blood_count_score(25,config_file) == 2 )
assert( white_blood_count_score(41,config_file) == 4 )

assert( glasgow_score(4) == 11 )
assert( glasgow_score(5) == 10 )
assert( glasgow_score(1) == 14 )

assert ( fio2_score(50,55,600,config_file) == 4)
assert ( fio2_score(40,61,100,config_file) == 1)
assert ( fio2_score(61,54,120,config_file) == 0)

---

## Part 3: Put it all together

Create a new function called `apache_score()` that takes all of the necessary inputs and returns the final Apache II score.  Use any variable names that you want.  For clarity and organization, my recommendation is to list the inputs in the same order as they're documented on the website.

1. Organ Failure History
2. Age
3. Temperature
4. pH 
5. Heart rate
6. Respiratory rate
7. Sodium
8. Potassium
9. Creatinine
10. Acute renal failure
11. Hematocrit
12. White Blood Count
13. Glasgow Coma Scale
14. FiO2
15. PaO2
16. A-a gradient


In [5]:
def apache_score(history,age,temp,ph,hr,rate,na,k,creatinine,ARF,h,wbc,gscore,fio2,PaO2,Aa_gradient,config_file):
    '''(string,float,float,float,float,float,float,float,float,string,float,float,float,float,float,float,string) -> float
    This function receives all of the values required to compute an Apache II score.
    It calls the specific function for each criteria to get that criteria's score and sums all of them together.
    It returns the total Apache II score.
    '''
    
    total=0
    total+=organ_failure_history_score(history,config_file)
    total+=age_score(age,config_file)
    total+=temperature_score(temp,config_file)
    total+=ph_score(ph,config_file)
    total+=heart_rate_score(hr,config_file)
    total+=respiratory_rate_score(rate,config_file)
    total+=sodium_score(na,config_file)
    total+=potassium_score(k,config_file)
    total+=creatinine_score(creatinine,ARF,config_file)
    total+=hematocrit_score(h,config_file)
    total+=white_blood_count_score(wbc,config_file)
    total+=glasgow_score(gscore)
    total+=fio2_score(fio2,PaO2,Aa_gradient,config_file)
    return total

### Testing your Function

Write a few test cases to make sure that your code functions correctly.  In the last step, you'll have LOTS of test cases run through, but you should do some of your own before moving on.

In [6]:
assert(apache_score("Nonoperative",12,37,12,180,51,160,8,2,'Acute Renal Failure',54,4,4,50,55,600,'apache.json') == 47)

---

## Part 4: Accessing and processing the patient file

Fill out the simple function below to retrieve the patient data as a CSV file from any given URL and return a list of all of the Apache II scores based on the data you find for those patients.
* The patient file will be a CSV
* It will have column headers that match the labels shown above
* The columns will not necessarily appear in the order shown above
* You should output only the Apache II scores, not any other information
* Your output should be a list in the same order as the input rows
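Since the columns may arrive in any order, it helps to read each value by its header name rather than by position. The sketch below shows the idea with the standard library's `csv.DictReader` on a made-up in-memory sample (the two rows are invented for illustration and are not taken from the patient file). It swaps in the `csv` module only to stay dependency-free; `pandas.read_csv` likewise labels columns by their headers.

```python
import csv
import io

# Made-up two-row sample; the column order is shuffled on purpose.
sample_csv = io.StringIO(
    "Heart Rate,Age,Glasgow Coma Scale\n"
    "180,12,4\n"
    "68,65,15\n"
)

gcs_points = []
for row in csv.DictReader(sample_csv):
    # Each row is a dict keyed by header, so column position does not matter.
    # DictReader yields strings, so convert before doing arithmetic.
    gcs_points.append(15 - int(row["Glasgow Coma Scale"]))

print(gcs_points)  # [11, 0]
```

Appending inside the loop keeps the output list in the same order as the input rows, which is what the last bullet asks for.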

In [7]:
import pandas as pd

def score_from_file(patient_data_url,config_file):
    '''(string,string) -> list
    This function receives the URL of the patient data and the JSON file.
    It uses pandas to read the CSV file at the patient data URL.
    For every patient, it gets each variable's value and passes them to the apache_score() function.
    Finally, it returns a list of the Apache II scores in the same order as the input patients.
    '''
    
    data = pd.read_csv(patient_data_url)
    total=[]
    for i in range(len(data)):
        history=data['Organ Failure History'][i]
        age=data['Age'][i]
        temp=data['Temperature'][i]
        ph=data['pH'][i]
        hr=data['Heart Rate'][i]
        rate=data['Respiratory Rate'][i]
        na=data['Sodium'][i]
        k=data['Potassium'][i]
        creatinine=data['Creatinine'][i]
        ARF=data['Acute Renal Failure'][i]
        h=data['Hematocrit'][i]
        wbc=data['White Blood Count'][i]
        gscore=data['Glasgow Coma Scale'][i]
        fio2=data['FiO2'][i]
        PaO2=data['PaO2'][i]
        Aa_gradient=data['A-a Gradient'][i]
        total.append(apache_score(history,age,temp,ph,hr,rate,na,k,creatinine,ARF,h,wbc,gscore,fio2,PaO2,Aa_gradient,config_file))
    return total

### Testing your Function

The URL for the test data is: https://hds5210-2020.s3.amazonaws.com/TestPatients.csv


You can verify your results by comparing them against this data: https://hds5210-2020.s3.amazonaws.com/Scores.csv

**CORRECTION ADDED 3/29** - If you calculated the Glasgow Coma Scale points as per the actual instructions in MDCalc, then please use this set of corrected scores to compare your results with: https://hds5210-2020.s3.amazonaws.com/Scores_corrected.csv
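When comparing against a scores file, it can be easier to debug if every mismatch is collected first instead of stopping at the first failed assertion. A minimal sketch, assuming both inputs are plain lists of numbers in the same row order; the helper name `find_mismatches` and the sample values are made up for illustration.

```python
def find_mismatches(calculated, expected):
    """Return (row, calculated, expected) for every row where the two differ."""
    if len(calculated) != len(expected):
        raise ValueError("row counts differ: {} vs {}".format(
            len(calculated), len(expected)))
    return [(i, c, e)
            for i, (c, e) in enumerate(zip(calculated, expected))
            if c != e]


# Made-up values: integer results compared against float reference scores.
calculated = [10, 20, 30]
expected = [10.0, 21.0, 30.0]

mismatches = find_mismatches(calculated, expected)
print(mismatches)  # [(1, 20, 21.0)]
```

An empty list means every row agrees, so a single `assert not mismatches, mismatches` at the end still makes the notebook fail loudly while showing all of the differing rows.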


In [8]:
test_url='https://hds5210-2020.s3.amazonaws.com/TestPatients.csv'
grade_url='https://hds5210-2020.s3.amazonaws.com/Scores_corrected.csv'
grader = pd.read_csv(grade_url)
config_file='apache.json'
result=score_from_file(test_url,config_file)
for i in range(len(grader)):
    assert(result[i] == grader['TOTAL'][i])
    print("Got a score of {}, should have been {}".format(result[i], grader['TOTAL'][i]))

Got a score of 35, should have been 35.0
Got a score of 31, should have been 31.0
Got a score of 47, should have been 47.0
Got a score of 34, should have been 34.0
Got a score of 44, should have been 44.0
Got a score of 35, should have been 35.0
Got a score of 31, should have been 31.0
Got a score of 49, should have been 49.0
Got a score of 40, should have been 40.0
Got a score of 48, should have been 48.0
Got a score of 42, should have been 42.0
Got a score of 43, should have been 43.0
Got a score of 32, should have been 32.0
Got a score of 41, should have been 41.0
Got a score of 42, should have been 42.0
Got a score of 49, should have been 49.0
Got a score of 37, should have been 37.0
Got a score of 37, should have been 37.0
Got a score of 38, should have been 38.0
Got a score of 43, should have been 43.0
Got a score of 41, should have been 41.0
Got a score of 31, should have been 31.0
Got a score of 38, should have been 38.0
Got a score of 30, should have been 30.0
Got a score of 4