# HDS5210-2021 Midterm

In the midterm, you're going to use the programming skills that you've developed so far to build calculators for three different risk scores and apply them to a data file. The three calculations you're going to write functions for are:
* CHA2DS2-VASc Score for Atrial Fibrillation Stroke Risk - [link](https://www.mdcalc.com/cha2ds2-vasc-score-atrial-fibrillation-stroke-risk)
* HEART Score for Major Cardiac Events - [link](https://www.mdcalc.com/heart-score-major-cardiac-events)
* Framingham Risk Score for Hard Coronary Heart Disease - [link](https://www.mdcalc.com/framingham-risk-score-hard-coronary-heart-disease)

In each of the next three parts, you'll program a function to calculate one of these scores.  In the last part of the midterm, you'll use those functions to calculate risk scores for a list of patients from a CSV file and select the limited group of patients who match a fourth set of risk assessment criteria.


---

## Part 1: CHA2DS2-VASc

This score for stroke risk in atrial fibrillation uses 7 inputs:
* Age (Number)
* Sex (Male / Female)
* CHF History (True / False)
* Hypertension History (True / False)
* Stroke History (True / False)
* Vascular Disease History (True / False)
* Diabetes History (True / False)
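The point weights behind these inputs are: 1 point each for CHF, hypertension, diabetes, vascular disease, age 65 to 74 and female sex; 2 points each for stroke history and age 75 or older. One way to make those weights explicit is a lookup table instead of an if/else chain. This is a minimal sketch, not the graded function; the names `WEIGHTS` and `cha2ds2_vasc_sketch` are hypothetical.

```python
# Hypothetical table-driven sketch of the CHA2DS2-VASc points (not the graded function).
WEIGHTS = {
    "chf": 1,           # congestive heart failure history
    "hypertension": 1,  # hypertension history
    "stroke": 2,        # stroke history counts double
    "vascular": 1,      # vascular disease history
    "diabetes": 1,      # diabetes history
}


def cha2ds2_vasc_sketch(age, sex, chf, hypertension, stroke, vascular, diabetes):
    """Return the CHA2DS2-VASc score (0 to 9) using a weight lookup."""
    history = {
        "chf": chf,
        "hypertension": hypertension,
        "stroke": stroke,
        "vascular": vascular,
        "diabetes": diabetes,
    }
    # Add the weight of every history condition that is present
    score = sum(WEIGHTS[name] for name, present in history.items() if present)

    # Age band: under 65 -> 0, 65 to 74 -> 1, 75 and over -> 2
    if age >= 75:
        score += 2
    elif age >= 65:
        score += 1

    # Sex category: female adds 1 point
    if sex == "Female":
        score += 1
    return score


print(cha2ds2_vasc_sketch(76, "Female", True, True, True, True, True))  # 9
```

Keeping the weights in one dictionary makes the double-weighted stroke term visible at a glance, rather than depending on its position in a list.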

Fill out the function below with logic to calculate the numeric risk score for the given input.

Be sure to provide meaningful documentation and at least two test cases in your documentation.  Also make sure your code satisfies the test cases provided in the assert statements.
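The notebook checks docstring examples with `doctest.run_docstring_examples`, which prints its results but does not return them. If a pass/fail count is handier, the standard library's `doctest.DocTestFinder` and `doctest.DocTestRunner` can collect one. A minimal sketch; `count_points` and `run_doctests` are hypothetical stand-ins, not part of the assignment.

```python
import doctest


def count_points(*flags):
    """
    Return how many of the given history flags are True.

    >>> count_points(True, False, True)
    2
    >>> count_points()
    0
    """
    return sum(1 for flag in flags if flag)


def run_doctests(func):
    """Run the docstring examples of `func` and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    # Pass the function in explicitly so the examples can find it by name
    for test in finder.find(func, globs={func.__name__: func}):
        runner.run(test)
    result = runner.summarize(verbose=False)
    return result.failed, result.attempted


print(run_doctests(count_points))  # (0, 2)
```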

In [1]:
def cha2ds2_vasc(age, sex, chf, hypertension, stroke, vascular, diabetes):
    """
    (int,str,bool,bool,bool,bool,bool) -> int
    Return the CHA2DS2-VASc score, which estimates stroke risk in patients with atrial fibrillation.
    The score ranges from 0 (low) to 9 (high).
    
    >>> cha2ds2_vasc(30,'Female',False,False,False,False,False)
    1
    
    >>> cha2ds2_vasc(65,'Male',False,True,True,False,True)
    5
    """
    
    AFS_score = 0
    history = [chf, hypertension, stroke, vascular, diabetes]
    
    if age < 65:
        AFS_score += 0
    elif (age >= 65) and (age <= 74):
        AFS_score += 1
    else:
        AFS_score += 2
    
    if sex == 'Female':
        AFS_score += 1 
    else:
        AFS_score += 0
        
    # Each history condition adds 1 point, except stroke (index 2 in history), which adds 2
    for index,condition in enumerate(history):
        if condition == False:
            AFS_score += 0
        else:
            if index == 2:
                AFS_score += 2
            else:
                AFS_score += 1
        
    return(AFS_score)

In [None]:
import doctest
doctest.run_docstring_examples(cha2ds2_vasc, globals(), verbose=True)

Testing your code with assertions...

In [None]:
assert cha2ds2_vasc(82,'Male',False,True,True,True,True) == 7
assert cha2ds2_vasc(22,'Male',False,False,True,False,False) == 2
assert cha2ds2_vasc(32,'Female',True,True,True,True,True) == 7
assert cha2ds2_vasc(21,'Female',True,True,True,False,False) == 5
assert cha2ds2_vasc(52,'Female',True,True,False,False,False) == 3
assert cha2ds2_vasc(88,'Male',True,True,True,False,False) == 6
assert cha2ds2_vasc(22,'Male',False,False,True,False,False) == 2
assert cha2ds2_vasc(71,'Female',False,False,False,True,True) == 4
assert cha2ds2_vasc(89,'Female',True,False,False,True,True) == 6
assert cha2ds2_vasc(54,'Male',True,False,False,False,True) == 2
assert cha2ds2_vasc(89,'Female',False,False,True,True,False) == 6
assert cha2ds2_vasc(36,'Male',False,True,False,True,True) == 3
assert cha2ds2_vasc(57,'Female',True,False,False,True,True) == 4
assert cha2ds2_vasc(22,'Female',False,True,False,True,False) == 3
assert cha2ds2_vasc(40,'Female',True,True,True,False,False) == 5
assert cha2ds2_vasc(54,'Female',False,False,False,True,True) == 3
assert cha2ds2_vasc(39,'Male',True,False,False,False,False) == 1
assert cha2ds2_vasc(61,'Female',False,False,False,True,False) == 2
assert cha2ds2_vasc(57,'Female',True,False,True,False,False) == 4
assert cha2ds2_vasc(76,'Female',True,True,True,True,True) == 9
assert cha2ds2_vasc(83,'Male',False,False,False,False,False) == 2
assert cha2ds2_vasc(86,'Female',False,True,False,False,False) == 4
assert cha2ds2_vasc(61,'Female',True,False,False,False,True) == 3
assert cha2ds2_vasc(46,'Male',True,True,True,True,False) == 5
assert cha2ds2_vasc(25,'Male',True,True,False,True,True) == 4
assert cha2ds2_vasc(62,'Male',False,True,True,True,True) == 5
assert cha2ds2_vasc(59,'Male',False,True,True,False,False) == 3
assert cha2ds2_vasc(60,'Female',False,True,True,False,True) == 5
assert cha2ds2_vasc(53,'Male',False,True,True,False,False) == 3

---

## Part 2: HEART Score

The HEART score is a predictor for major cardiac events.  It requires 5 high-level inputs:
* History (Slightly / Moderately / Highly suspicious)
* EKG (Normal / Non-specific repolarization disturbance / Significant ST deviation)
* Age (Number)
* Risk Factors (Number of risk factors)
* Initial Troponin (Number of times the normal limit)
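Age, risk-factor count and troponin are numeric values that fall into 0-, 1- or 2-point bands. One way to express those bands, shown here as a sketch rather than the required solution, is the standard-library `bisect` module. The cut points match the `heart` function below (age 45 and 65; 1 and 3 risk factors; troponin at 1x and 3x the normal limit); the names `heart_sketch`, `HISTORY_POINTS` and `EKG_POINTS` are hypothetical.

```python
from bisect import bisect_left, bisect_right

# First word of each categorical answer -> points (hypothetical lookup tables)
HISTORY_POINTS = {"Slightly": 0, "Moderately": 1, "Highly": 2}
EKG_POINTS = {"Normal": 0, "Non-specific": 1, "Significant": 2}


def heart_sketch(history, ekg, age, risks, troponin):
    """Return a HEART score (0 to 10) using bisect for the numeric bands."""
    score = HISTORY_POINTS[history.split()[0]] + EKG_POINTS[ekg.split()[0]]
    # bisect_right counts cut points <= value: age 45 -> 1 point, age 65 -> 2 points
    score += bisect_right([45, 65], age)
    # risk factors: 0 -> 0 points, 1 or 2 -> 1 point, 3 or more -> 2 points
    score += bisect_right([1, 3], risks)
    # bisect_left counts cut points < value: troponin 1.0 -> 0, 3.0 -> 1, 3.1 -> 2
    score += bisect_left([1.0, 3.0], troponin)
    return score


print(heart_sketch("Moderately suspicious", "Normal", 82, 4, 3.8))  # 7
```

The choice between `bisect_right` and `bisect_left` encodes whether a cut point belongs to the higher band (age exactly 45) or the lower band (troponin exactly 1x the limit).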

Fill out the function below with logic to calculate the numeric risk score for the given input.

Be sure to provide meaningful documentation and at least two test cases in your documentation. Also make sure your code satisfies the test cases provided in the assert statements.

In [2]:
def heart(history, ekg, age, risks, troponin):
    """
    (str,str,int,int,float) -> int
    Returns a HEART score, which predicts a patient's risk for major cardiac events from 0 (low) to 10 (high).
    
    >>> heart('Slightly suspicious','Normal',30, 0, 0)
    0
    >>> heart('Highly suspicious','Significant ST deviation',71, 7, 5)
    10
    """
    
    heart_score = 0
    
    if history.split()[0] == 'Slightly':
        heart_score += 0 
    elif history.split()[0] == 'Moderately':
        heart_score += 1 
    else:
        heart_score += 2
    
    if ekg.split()[0] == 'Normal':
        heart_score += 0
    elif ekg.split()[0] == 'Non-specific':
        heart_score += 1
    else: 
        heart_score += 2
    
    if age < 45:
        heart_score += 0  
    elif (age >= 45) and (age <= 64):
        heart_score += 1
    else:
        heart_score += 2
        
    if risks == 0:
        heart_score += 0
    elif (risks == 1) or (risks == 2):
        heart_score += 1
    else:
        heart_score += 2
        
    if troponin <= 1.0:
        heart_score += 0
    elif (troponin > 1.0) and (troponin <= 3.0):
        heart_score += 1
    else:
        heart_score += 2
        
    return(heart_score)

In [None]:
import doctest
doctest.run_docstring_examples(heart, globals(), verbose=True)

In [None]:
assert heart('Moderately suspicious','Normal',82,4,3.8) == 7
assert heart('Slightly suspicious','Non-specific repolarization',22,2,2.3) == 3
assert heart('Slightly suspicious','Non-specific repolarization',32,4,1.3) == 4
assert heart('Highly suspicious','Non-specific repolarization',21,1,1.1) == 5
assert heart('Slightly suspicious','Normal',52,5,1.2) == 4
assert heart('Moderately suspicious','Significant ST deviation',88,5,0.5) == 7
assert heart('Slightly suspicious','Non-specific repolarization',22,5,3.0) == 4
assert heart('Slightly suspicious','Significant ST deviation',71,4,3.9) == 8
assert heart('Moderately suspicious','Non-specific repolarization',89,5,0.3) == 6
assert heart('Highly suspicious','Normal',54,4,3.9) == 7
assert heart('Moderately suspicious','Normal',89,3,0.3) == 5
assert heart('Slightly suspicious','Non-specific repolarization',36,1,0.4) == 2
assert heart('Moderately suspicious','Normal',57,4,1.3) == 5
assert heart('Slightly suspicious','Normal',22,5,0.2) == 2
assert heart('Slightly suspicious','Normal',40,4,3.9) == 4
assert heart('Highly suspicious','Normal',54,3,3.1) == 7
assert heart('Highly suspicious','Significant ST deviation',39,4,0.9) == 6
assert heart('Moderately suspicious','Normal',61,2,1.9) == 4
assert heart('Slightly suspicious','Normal',57,1,1.7) == 3
assert heart('Moderately suspicious','Significant ST deviation',76,2,1.7) == 7
assert heart('Slightly suspicious','Normal',83,1,1.0) == 3
assert heart('Highly suspicious','Normal',86,1,2.3) == 6
assert heart('Highly suspicious','Non-specific repolarization',61,2,3.5) == 7
assert heart('Slightly suspicious','Normal',46,2,1.0) == 2
assert heart('Slightly suspicious','Significant ST deviation',25,4,3.1) == 6
assert heart('Moderately suspicious','Non-specific repolarization',62,1,2.4) == 5
assert heart('Highly suspicious','Non-specific repolarization',59,2,3.6) == 7
assert heart('Moderately suspicious','Significant ST deviation',60,1,2.1) == 6
assert heart('Slightly suspicious','Normal',53,4,0.1) == 3

## Part 3: Framingham Risk Score for Hard Coronary Heart Disease

The Framingham Risk Score for Hard Coronary Heart Disease is intended for non-diabetic patients aged 30-79 only, so if the patient's age is < 30 or > 79, your function should return `-1` rather than a specific risk score.

The Framingham Risk Score takes 7 inputs:
* Age (Number)
* Sex (Male / Female)
* Smoker (True / False)
* Total cholesterol (Number)
* HDL cholesterol (Number)
* Systolic BP (Number)
* Blood pressure being treated with medicines (No / Yes)

You'll note that, rather than being a basic parametric equation, this is a regression function defined by coefficients.  It also requires you to take the natural logarithm (`ln`) of many of the parameters.  To help you out, here's an example of how to interpret the formula provided in the website's **Evidence** tab.

Take special note of the footnotes in the logic:
* *Yes=1, No=0 (for Treated for blood pressure and Smoker)
* ** Men: if age >70, use ln(70) x Smoker. Women: if age >78, use ln(78) x Smoker.
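The second footnote means the age-by-smoking interaction uses a capped age. A minimal sketch of just that term follows; the helper name `ln_age_smoker` is hypothetical, while the caps (70 for men, 78 for women) come from the footnote above. The interaction's coefficient is applied separately.

```python
import math


def ln_age_smoker(age, sex, smoker):
    """Return ln(age) x Smoker, with age capped at 70 (men) or 78 (women)."""
    cap = 70 if sex == "Male" else 78
    # Smoker is coded Yes=1, No=0, so non-smokers contribute 0 to this term
    return math.log(min(age, cap)) * int(smoker)


print(ln_age_smoker(75, "Male", True) == math.log(70))  # True
```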


---

These segments of the equation and the coefficient table...

> $ L_{\text{Men}} = \beta_{\text{Age}} \times \ln(\text{Age}) + \beta_{\text{Cholesterol}} \times \ln(\text{Cholesterol}) + \cdots $
>
> | Variable | Men | Women |
> | :---------------------------- | :--: | :--: |
> | $ \ln(Age) $ | 52.00961 | 31.764001 |
> | $ \ln(Cholesterol) $ | 20.014077 | 22.465206 |

Could be written in Python as follows:

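```python
import math

# Example inputs (hypothetical values, for illustration only)
age = 50
sex = "Male"
cholesterol = 200

L = 0
if sex == "Male":
    # math.log(x) with a single argument is the natural logarithm, ln(x)
    L += 52.00961 * math.log(age)
    L += 20.014077 * math.log(cholesterol)
else:
    L += 31.764001 * math.log(age)
    L += 22.465206 * math.log(cholesterol)
```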

---

**ROUND YOUR FINAL RESULT TO 4 DECIMAL PLACES**
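The last step turns the linear predictor L into a probability with P = 1 - S^exp(L), where S is the baseline constant used in the `framingham` function below (0.9402 for men, 0.98767 for women), and rounds only at the end. A minimal sketch; `risk_from_l` is a hypothetical helper name.

```python
import math


def risk_from_l(L, sex):
    """Convert a Framingham linear predictor L into a probability rounded to 4 places."""
    baseline = 0.9402 if sex == "Male" else 0.98767  # constants from the framingham function
    return round(1 - baseline ** math.exp(L), 4)


print(risk_from_l(0.0, "Male"))    # 0.0598
print(risk_from_l(0.0, "Female"))  # 0.0123
```

Note that `round` works on the stored binary value, so a value that looks like an exact half can round down; the Python documentation's example is `round(2.675, 2)`, which gives `2.67`.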

In [3]:
import math

def framingham(age, sex, smoker, cholesterol, hdl, systolic, bp_treated):
    """
    (int,str,bool,int,int,int,bool) -> float
    Returns the Framingham risk score: the estimated 10-year risk of hard coronary heart disease for
    non-diabetic patients aged 30-79, as a fraction rounded to 4 decimal places (-1 if age is out of range).
    
    >>> framingham(30, 'Female', False, 150, 40, 120, False)
    0.0002
    >>> framingham(67, 'Female', False, 160, 60, 120, False)
    0.0173
    """ 

    if (age < 30) or (age > 79):
        P = -1
    
    else: 
        
        # Coefficients as (male, female) tuples
        Beta = { 'age':(52.00961,31.764001),'cholesterol':(20.014077,22.465206),
                 'hdl':(-0.905964,-1.187731),'systolic':(1.305784,2.552905),
                 'bp_treated':(0.241549,0.420251),'smoker':(12.096316,13.07543),
                 'age_cholesterol':(-4.605038,-5.060998),'age_smoker':(-2.84367,-2.996945),
                 'age_age':(-2.93323,0)
               }

        #Adjustments for males, age_smoker
        if sex == 'Male':
                s = 0
                constant = 0.9402
                y_intercept = -172.300168

                #Adjustment for male age > 70
                if age > 70:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(70)* int(smoker))
                else:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(age)*int(smoker))     

        #Adjustments for females, age_smoker
        else:
                s = 1
                constant = 0.98767
                y_intercept = -146.5933061

                #Adjustment for female age > 78 (applies at age 79, which is still in range)
                if age > 78:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(78)*int(smoker))
                else:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(age)*int(smoker))   

        # Linear predictor L: sum of coefficient x term, plus the intercept
        L = ((Beta['age'][s])*math.log(age) + (Beta['cholesterol'][s])*math.log(cholesterol) + (Beta['hdl'][s])*math.log(hdl) +
             (Beta['systolic'][s])*math.log(systolic) + (Beta['bp_treated'][s])*int(bp_treated) + (Beta['smoker'][s])*int(smoker) +
             (Beta['age_cholesterol'][s])*(math.log(age)*math.log(cholesterol)) + age_smoker +
             (Beta['age_age'][s])*(math.log(age)*math.log(age)) + y_intercept)

        # Convert L to a probability using the sex-specific baseline constant
        P = 1 - constant**(math.exp(L))
            
    return(round(P,4))       

In [None]:
import doctest
doctest.run_docstring_examples(framingham, globals(), verbose=True)

In [None]:
assert framingham(82,'Male',False,214,64,92,True) == -1
assert framingham(22,'Male',False,146,33,102,False) == -1
assert framingham(32,'Female',False,195,31,115,True) == 0.0015
assert framingham(21,'Female',False,152,42,82,True) == -1
assert framingham(52,'Female',False,214,58,85,True) == 0.005
assert framingham(88,'Male',True,173,67,104,False) == -1
assert framingham(22,'Male',False,163,62,112,False) == -1
assert framingham(71,'Female',False,188,30,99,False) == 0.0391
assert framingham(89,'Female',True,172,55,88,False) == -1
assert framingham(54,'Male',False,156,52,117,True) == 0.0437
assert framingham(89,'Female',False,147,58,127,True) == -1
assert framingham(36,'Male',True,169,33,128,True) == 0.0465
assert framingham(57,'Female',True,204,40,86,False) == 0.0189
assert framingham(22,'Female',False,177,59,81,False) == -1
assert framingham(40,'Female',False,165,43,111,True) == 0.0016
assert framingham(54,'Female',True,200,50,86,False) == 0.0126
assert framingham(39,'Male',False,189,49,130,True) == 0.0126
assert framingham(61,'Female',True,176,68,106,False) == 0.0153
assert framingham(57,'Female',False,181,47,124,True) == 0.0183
assert framingham(76,'Female',True,162,56,94,False) == 0.0239
assert framingham(83,'Male',False,215,52,98,True) == -1
assert framingham(86,'Female',True,169,55,100,True) == -1
assert framingham(61,'Female',False,151,65,86,True) == 0.0053
assert framingham(46,'Male',False,174,64,114,False) == 0.0142
assert framingham(25,'Male',False,193,31,84,False) == -1
assert framingham(62,'Male',False,167,31,115,False) == 0.1098
assert framingham(59,'Male',True,174,66,88,True) == 0.0709
assert framingham(60,'Female',True,156,63,124,True) == 0.0293
assert framingham(53,'Male',False,141,51,109,False) == 0.0244

---

## Part 4. Putting it all together

Now that we have our three scores, we need to put them together into an overall composite risk score for a whole group of patients.  Those patients are in a CSV file on the server called `/data/midterm_patients.csv`.  You can open this file in Jupyter by browsing to your Home icon -> from_instructor -> data.  There are several things you'll need to do: read this file, calculate individual risk scores, and compute an overall risk score.

First, you'll notice that some of the column names and values in the input file are different from the parameter names and values expected by your functions above.  For example, "M" from the file needs to be turned into "Male", and "Yes" in the file needs to be turned into `True`.  You will need to do conversions for all of the fields listed below:

| Field in CSV | Parameter Name Above | Source Values | Values Needed Above |
| :----------- | :------------------- | :-: | :-: |
| bp medicine  | bp_treated           | Yes / No | True / False |
| sex          | sex                  | M / F | Male / Female |
| smoker       | smoker               | Yes / No | True / False |
| risk factors | risks                | # | # |
| chf       | chf history               | Yes / No | True / False |
| hypertension       | hypertension history               | Yes / No | True / False |
| stroke    | stroke history             | Yes / No | True / False |
| vascular      | vascular disease history           | Yes / No | True / False |
| diabetes     | diabetes history              | Yes / No | True / False |
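The same conversions can be sketched row by row with the standard-library `csv` module instead of pandas. This is an illustration under assumptions: the two sample rows are made up, only a subset of columns is shown, the header names follow the renames used in `test_patients` below, and `convert_row` is a hypothetical helper.

```python
import csv
import io

# Hypothetical two-row sample using a subset of the CSV headers
SAMPLE = """patient,sex,smoker,bp medicine,chf history,risk factors,age
P1,M,Yes,No,No,2,54
P2,F,No,Yes,Yes,0,71
"""

# CSV header -> function parameter name
RENAMES = {"bp medicine": "bp_treated", "chf history": "chf", "risk factors": "risks"}
YES_NO = {"Yes": True, "No": False}
SEXES = {"M": "Male", "F": "Female"}


def convert_row(row):
    """Rename columns and convert Yes/No, M/F and integer strings."""
    out = {}
    for field, value in row.items():
        name = RENAMES.get(field, field)
        if value in YES_NO:
            value = YES_NO[value]
        elif name == "sex":
            value = SEXES[value]
        elif name in ("age", "risks"):
            # Other numeric columns (e.g. troponin) would need float() as well
            value = int(value)
        out[name] = value
    return out


rows = [convert_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
print(rows[0])
```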


After calculating these three risk scores, use the rules below to determine who is at highest risk.  To be classified as "High Risk" a patient must meet all three criteria below:
1. CHA2DS2_VASc >= 2
2. HEART >= 4
3. Framingham >= 3%

Your function should return a list in which each item is `[patient, CHA2DS2_VASc, HEART, Framingham, High Risk]`.
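In plain Python, the rule and one output item can be sketched as follows. This is an illustration, not the required implementation: `is_high_risk` and `make_row` are hypothetical names, and the Framingham value is assumed to be the rounded fraction returned by `framingham` (so 3% is 0.03, and the out-of-range value -1 never qualifies).

```python
def is_high_risk(cha2ds2_vasc_score, heart_score, framingham_risk):
    """True only when all three criteria hold (Framingham as a fraction, 3% = 0.03)."""
    return cha2ds2_vasc_score >= 2 and heart_score >= 4 and framingham_risk >= 0.03


def make_row(patient, cha2ds2_vasc_score, heart_score, framingham_risk):
    """Build one output item: [patient, CHA2DS2_VASc, HEART, Framingham, High Risk]."""
    return [patient, cha2ds2_vasc_score, heart_score, framingham_risk,
            is_high_risk(cha2ds2_vasc_score, heart_score, framingham_risk)]


print(make_row("E19724", 4, 8, 0.0391))  # ['E19724', 4, 8, 0.0391, True]
```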

In [146]:
import pandas as pd
import numpy as np

def test_patients(filename): 
    """ 
    (csv file) -> List
    Returns a list that contains patients id, common risk scores, and their calculated high risk 
    (formula based on predictive health scores). 
    """
    patients = pd.read_csv(filename)
    
    # Rename CSV columns to the parameter names above and convert M/F and Yes/No values
    patients.rename(columns={'chf history':'chf',
                         'hypertension history':'hypertension',
                         'stroke history':'stroke',
                         'vascular disease history': 'vascular',
                         'diabetes history':'diabetes',
                         'risk factors':'risks',
                         'total cholesterol':'cholesterol',
                         'hdl cholesterol':'hdl',
                         'systolic bp':'systolic',
                         'bp medicine':'bp_treated'},inplace=True)
    patients['sex']=patients['sex'].replace({'M':'Male','F':'Female'})
    patients.replace({'Yes':True,'No':False},inplace=True)
    
    #add scores as columns in patients 
    patients['CHA2DS2_VASc'] = np.vectorize(cha2ds2_vasc)(patients.age, patients.sex, patients.chf, patients.hypertension, patients.stroke, patients.vascular, patients.diabetes)
    patients['HEART'] = np.vectorize(heart)(patients.history, patients.ekg, patients.age, patients.risks, patients.troponin)
    patients['Framingham']=np.vectorize(framingham,otypes=[float])(patients.age, patients.sex, patients.smoker, patients.cholesterol, patients.hdl, patients.systolic, patients.bp_treated)

    # Determine the High Risk column: True only when all three criteria are met.
    # The combined condition is already a boolean Series, so no 0/1 conversion is needed.
    patients['High Risk'] = ((patients.CHA2DS2_VASc >= 2)
                             & (patients.HEART >= 4)
                             & (patients.Framingham * 100 >= 3.0))
    
    #create list for each patient 
    answers = patients[['patient','CHA2DS2_VASc','HEART','Framingham','High Risk']].values.tolist()
    
    return(answers)



Question for Paul: Is there a way to return values as True/False instead of 0/1 in np.select?
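One possible answer, sketched with NumPy only: `np.select` chooses its output dtype from the `choicelist` together with `default`, and `default` is the integer `0` unless you pass something else, which is why the result came back as 0/1. Passing `default=False` keeps the result boolean. Since "High Risk" simply means "all three conditions hold", the combined boolean mask is already the answer and `np.select` is not strictly needed. The arrays reuse the values from the test example cell.

```python
import numpy as np

# Values from the test example: x, z and the rounded y column
x = np.array([1, 2, 3])
z = np.array([10, 11, 12])
y = np.array([0.0106, 0.0211, 0.0317])

high = (x >= 2) & (z >= 11) & (y * 100 >= 2.0)  # already a boolean array

# np.select with a boolean default keeps the result dtype boolean
flags = np.select([high], [True], default=False)

print(flags.dtype)     # bool
print(flags.tolist())  # [False, True, True]
print(high.tolist())   # [False, True, True]
```

In pandas the same idea means assigning the combined condition directly to the new column, which makes a follow-up 0/1-to-True/False replace unnecessary.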

In [142]:
#test example

# import pandas as pd
# import numpy as np

# def test_fun(x):
#     y=x*0.010567
#     return(round(y,4))

# df = pd.DataFrame({'Id':['P1','P2','P3'],'x':[1,2,3],'z':[10,11,12]})
# df['y'] = np.vectorize(test_fun,otypes=[float])(df.x)     #vectorize func


# tconditions = [
#     (df.x>=2) & (df.z>=11) & (df.y*100>=2.0),
#     (df.x<2) & (df.z<11) & (df.y*100<2.0)
# ]

# tvalues = [True,False]
# df['test1']=np.select(tconditions,tvalues)
# df['test1'].replace({1:True,0:False},inplace=True)

# df.head()

# df[["x","y","test1"]].values.tolist()

#Different options to get list from dataframe

# ans=[]
# for index,row in df.iterrows():
#     ans.append([row['x'],row['y'],row['test1']]) 
#ans #gives specific cols of interest
#df.values.tolist() #gives all cols in the correct format
#map(list,df.values) ##gives all cols in the correct format
#https://datatofish.com/convert-pandas-dataframe-to-list/

|   | Id | x | z | y | test1 |
| :- | :-: | :-: | :-: | :-: | :-: |
| 0 | P1 | 1 | 10 | 0.0106 | False |
| 1 | P2 | 2 | 11 | 0.0211 | True |
| 2 | P3 | 3 | 12 | 0.0317 | True |


In [144]:
#new answers version
answers = [['E40794', 7.0, 7.0, -1.0, False],
 ['E57853', 2.0, 3.0, -1.0, False],
 ['E63841', 7.0, 4.0, 0.0015, False],
 ['E87700', 5.0, 5.0, -1.0, False],
 ['E49662', 3.0, 4.0, 0.005, False],
 ['E19241', 6.0, 7.0, -1.0, False],
 ['E94033', 2.0, 4.0, -1.0, False],
 ['E19724', 4.0, 8.0, 0.0391, True],
 ['E77077', 6.0, 6.0, -1.0, False],
 ['E75736', 2.0, 7.0, 0.0437, True],
 ['E20246', 6.0, 5.0, -1.0, False],
 ['E58235', 3.0, 2.0, 0.0465, False],
 ['E29619', 4.0, 5.0, 0.0189, False],
 ['E18023', 3.0, 2.0, -1.0, False],
 ['E56386', 5.0, 4.0, 0.0016, False],
 ['E87379', 3.0, 7.0, 0.0126, False],
 ['E44264', 1.0, 6.0, 0.0126, False],
 ['E85955', 2.0, 4.0, 0.0153, False],
 ['E17497', 4.0, 3.0, 0.0183, False],
 ['E11391', 9.0, 7.0, 0.0239, False],
 ['E41611', 2.0, 3.0, -1.0, False],
 ['E66188', 4.0, 6.0, -1.0, False],
 ['E74052', 3.0, 7.0, 0.0053, False],
 ['E40182', 5.0, 2.0, 0.0142, False],
 ['E21161', 4.0, 6.0, -1.0, False],
 ['E59494', 5.0, 5.0, 0.1098, True],
 ['E61747', 3.0, 7.0, 0.0709, True],
 ['E42697', 5.0, 6.0, 0.0293, False],
 ['E61043', 3.0, 3.0, 0.0244, False]]

In [147]:
assert test_patients('/data/test-patients.csv') == answers

---

## Submitting Your Work

In order to submit your work, you'll need to use the `git` command line program to **add** your homework file (this file) to your local repository, **commit** your changes to your local repository, and then **push** those changes up to github.com.  From there, I'll be able to **pull** the changes down and do my grading.  I'll provide some feedback, **commit** and **push** my comments back to you.  Next week, I'll show you how to **pull** down my comments.

To run through everything one last time and submit your work:
1. Use the `Kernel` -> `Restart Kernel and Run All Cells` menu option to run everything from top to bottom and stop here.
2. Save this notebook with Ctrl-S (or Cmd-S).
3. Skip down to the last command cell (the one starting with `%%bash`) and run that cell.

If anything fails during this submission step, let me know and I'll help you troubleshoot.

In [None]:
assert False, "DO NOT REMOVE THIS LINE"

---

In [148]:
%%bash
git pull
git add midterm-2021.ipynb
git commit -a -m "Finally submitting the midterm!"
git push

Already up to date.
[main 1a4e7bd] Finally submitting the midterm!
 2 files changed, 932 insertions(+), 34 deletions(-)
 create mode 100644 week07-midterm/midterm-2021.ipynb


To github.com:mmunozru/hds5210-2021.git
   20281f5..1a4e7bd  main -> main



---

If the message above says something like _Finally submitting the midterm!_ or _Everything is up to date_, then your work was submitted correctly.