# HDS5210-2021 Midterm

In the midterm, you're going to focus on using the programming skills that you've developed so far to build a calculator for three different risk scores and apply it to a data file. The three calculations you're going to write functions for are: 
* CHA2DS2-VASc Score for Atrial Fibrillation Stroke Risk - [link](https://www.mdcalc.com/cha2ds2-vasc-score-atrial-fibrillation-stroke-risk)
* HEART Score for Major Cardiac Events - [link](https://www.mdcalc.com/heart-score-major-cardiac-events)
* Framingham Risk Score for Hard Coronary Heart Disease - [link](https://www.mdcalc.com/framingham-risk-score-hard-coronary-heart-disease)

In each of the next three parts, you'll be programming a function to calculate each score.  In the last part of the midterm, you'll take those functions and use them to calculate risk scores for a list of patients from a CSV file and select a limited group of patients that match a fourth set of risk assessment criteria.


---

## Part 1: CHA2DS2-VASc

This scoring mechanism for Atrial Fibrillation Stroke uses 7 inputs:
* Age (Number)
* Sex (Male / Female)
* CHF History (True / False)
* Hypertension History (True / False)
* Stroke History (True / False)
* Vascular Disease History (True / False)
* Diabetes History (True / False)

Fill out the function below with logic to calculate the numeric risk score for the given input.

Be sure to provide meaningful documentation and at least two test cases in your documentation.  Also make sure your code satisfies the test cases provided in the assert statements.
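As a minimal sketch of what "test cases in your documentation" can look like, the block below puts doctest examples in a docstring and runs them with the standard `doctest` module. The helper `age_points` is hypothetical and not part of the assignment; its age bands (0 points under 65, 1 point for 65-74, 2 points for 75 and over) follow the solution further down in this notebook.

```python
import doctest


def age_points(age):
    """(int) -> int
    Hypothetical helper, not part of the assignment: return the age points
    used by CHA2DS2-VASc (0 under 65, 1 for 65-74, 2 for 75 and over).

    >>> age_points(64)
    0

    >>> age_points(65)
    1

    >>> age_points(75)
    2
    """
    if age < 65:
        return 0
    if age < 75:
        return 1
    return 2


# Run the examples in the docstring. Nothing is printed when they all pass;
# a mismatch is reported with the expected and the actual value.
doctest.run_docstring_examples(
    age_points, {"age_points": age_points}, name="age_points"
)
```

Each `>>>` line is executed and its result is compared with the line that follows it, so the docstring doubles as documentation and as a small test suite.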

In [1]:
def cha2ds2_vasc(age, sex, chf, hypertension, stroke, vascular, diabetes):
    """(int, str, bool, bool, bool, bool, bool) -> int
    This function uses the logic from https://www.mdcalc.com/cha2ds2-vasc-score-atrial-fibrillation-stroke-risk
    to compute the CHA2DS2-VASc Score for Atrial Fibrillation Stroke Risk.
    
    >>> cha2ds2_vasc(82,'Male',False,True,True,True,True)
    7
    
    >>> cha2ds2_vasc(22,'Female',False,True,False,True,False)
    3
    
    >>> cha2ds2_vasc(83,'Male',False,False,False,False,False)
    2
    
    """
    ### YOUR SOLUTION HERE
    risk_score = 0
    if age < 65:
        risk_score += 0
    elif age > 64 and age < 75:
        risk_score += 1
    else:
        risk_score += 2
    
    if sex == 'Female':
        risk_score += 1
    else:
        risk_score += 0
        
    if chf:
        risk_score += 1
    
    if hypertension:
        risk_score += 1
        
    if stroke:
        risk_score += 2
    
    if vascular:
        risk_score += 1
        
    if diabetes:
        risk_score += 1
        
    return risk_score

Testing your code with assertions....

In [2]:
assert cha2ds2_vasc(82,'Male',False,True,True,True,True) == 7
assert cha2ds2_vasc(22,'Male',False,False,True,False,False) == 2
assert cha2ds2_vasc(32,'Female',True,True,True,True,True) == 7
assert cha2ds2_vasc(21,'Female',True,True,True,False,False) == 5
assert cha2ds2_vasc(52,'Female',True,True,False,False,False) == 3
assert cha2ds2_vasc(88,'Male',True,True,True,False,False) == 6
assert cha2ds2_vasc(22,'Male',False,False,True,False,False) == 2
assert cha2ds2_vasc(71,'Female',False,False,False,True,True) == 4
assert cha2ds2_vasc(89,'Female',True,False,False,True,True) == 6
assert cha2ds2_vasc(54,'Male',True,False,False,False,True) == 2
assert cha2ds2_vasc(89,'Female',False,False,True,True,False) == 6
assert cha2ds2_vasc(36,'Male',False,True,False,True,True) == 3
assert cha2ds2_vasc(57,'Female',True,False,False,True,True) == 4
assert cha2ds2_vasc(22,'Female',False,True,False,True,False) == 3
assert cha2ds2_vasc(40,'Female',True,True,True,False,False) == 5
assert cha2ds2_vasc(54,'Female',False,False,False,True,True) == 3
assert cha2ds2_vasc(39,'Male',True,False,False,False,False) == 1
assert cha2ds2_vasc(61,'Female',False,False,False,True,False) == 2
assert cha2ds2_vasc(57,'Female',True,False,True,False,False) == 4
assert cha2ds2_vasc(76,'Female',True,True,True,True,True) == 9
assert cha2ds2_vasc(83,'Male',False,False,False,False,False) == 2
assert cha2ds2_vasc(86,'Female',False,True,False,False,False) == 4
assert cha2ds2_vasc(61,'Female',True,False,False,False,True) == 3
assert cha2ds2_vasc(46,'Male',True,True,True,True,False) == 5
assert cha2ds2_vasc(25,'Male',True,True,False,True,True) == 4
assert cha2ds2_vasc(62,'Male',False,True,True,True,True) == 5
assert cha2ds2_vasc(59,'Male',False,True,True,False,False) == 3
assert cha2ds2_vasc(60,'Female',False,True,True,False,True) == 5
assert cha2ds2_vasc(53,'Male',False,True,True,False,False) == 3

---

## Part 2: HEART Score

The HEART score is a predictor for major cardiac events.  It requires 5 high-level inputs:
* History (Slightly / Moderately / Highly suspicious)
* EKG (Normal / Non-specific repolarization disturbance / Significant ST deviation)
* Age (Number)
* Risk Factors (Number of risk factors)
* Initial Troponin (Number of times the normal limit)

Fill out the function below with logic to calculate the numeric risk score for the given input.

Be sure to provide meaningful documentation and at least two test cases in your documentation. Also make sure your code satisfies the test cases provided in the assert statements.
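The three numeric HEART inputs (age, risk factors, troponin) are each scored 0, 1, or 2 depending on which band the value falls in. One possible way to express that, shown here only as a sketch, is to count how many cut-offs a value has reached using the standard `bisect` module. The helper names and the cut-off lists are assumptions for illustration; the cut-offs mirror the `if` / `elif` solution in this notebook.

```python
from bisect import bisect_left, bisect_right

# Cut-offs taken from the if/elif solution in this notebook.
AGE_CUTOFFS = [45, 65]       # 1 point from 45, 2 points from 65
RISK_CUTOFFS = [1, 3]        # 1 point for 1-2 factors, 2 points for 3 or more
TROPONIN_CUTOFFS = [1, 3]    # 1 point above 1x, 2 points above 3x normal


def points_at_or_above(value, cutoffs):
    """Count the cut-offs that value has reached (value >= cutoff)."""
    return bisect_right(cutoffs, value)


def points_strictly_above(value, cutoffs):
    """Count the cut-offs that value has exceeded (value > cutoff)."""
    return bisect_left(cutoffs, value)


age_score = points_at_or_above(60, AGE_CUTOFFS)
risk_score = points_at_or_above(1, RISK_CUTOFFS)
troponin_score = points_strictly_above(2.1, TROPONIN_CUTOFFS)
print(age_score, risk_score, troponin_score)
```

Age and risk factors use inclusive lower bounds, while troponin uses strict ones (a value of exactly 3.0 times the normal limit scores 1, not 2), which is why the sketch has two helpers.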

In [3]:
def heart(history, ekg, age, risks, troponin):
    """(str, str, int, int, float) -> int
    This function uses the logic from https://www.mdcalc.com/heart-score-major-cardiac-events
    to compute HEART score for major cardiac events.
    
    >>> heart('Moderately suspicious','Normal',82,4,3.8)
    7
    
    >>> heart('Slightly suspicious','Non-specific repolarization',36,1,0.4)
    2
    
    >>> heart('Moderately suspicious','Significant ST deviation',60,1,2.1)
    6
    
    """
    ### YOUR SOLUTION HERE
    risk_score = 0
    
    if history == 'Highly suspicious':
        risk_score += 2
    elif history == 'Moderately suspicious':
        risk_score += 1
    else:
        risk_score += 0
    
    if ekg == 'Significant ST deviation':
        risk_score += 2
    elif ekg == 'Non-specific repolarization':
        risk_score += 1
    else:
        risk_score += 0
    
    if age >= 65:
        risk_score += 2
    elif age < 65 and age >= 45:
        risk_score += 1
    else:
        risk_score += 0
        
    if risks >= 3:
        risk_score += 2
    elif risks <= 2 and risks >=1:
        risk_score += 1
    else:
        risk_score += 0
        
    if troponin > 3:
        risk_score += 2
    elif troponin <= 3 and troponin > 1:
        risk_score += 1
    else:
        risk_score += 0
        
    return risk_score
                 

In [4]:
assert heart('Moderately suspicious','Normal',82,4,3.8) == 7
assert heart('Slightly suspicious','Non-specific repolarization',22,2,2.3) == 3
assert heart('Slightly suspicious','Non-specific repolarization',32,4,1.3) == 4
assert heart('Highly suspicious','Non-specific repolarization',21,1,1.1) == 5
assert heart('Slightly suspicious','Normal',52,5,1.2) == 4
assert heart('Moderately suspicious','Significant ST deviation',88,5,0.5) == 7
assert heart('Slightly suspicious','Non-specific repolarization',22,5,3.0) == 4
assert heart('Slightly suspicious','Significant ST deviation',71,4,3.9) == 8
assert heart('Moderately suspicious','Non-specific repolarization',89,5,0.3) == 6
assert heart('Highly suspicious','Normal',54,4,3.9) == 7
assert heart('Moderately suspicious','Normal',89,3,0.3) == 5
assert heart('Slightly suspicious','Non-specific repolarization',36,1,0.4) == 2
assert heart('Moderately suspicious','Normal',57,4,1.3) == 5
assert heart('Slightly suspicious','Normal',22,5,0.2) == 2
assert heart('Slightly suspicious','Normal',40,4,3.9) == 4
assert heart('Highly suspicious','Normal',54,3,3.1) == 7
assert heart('Highly suspicious','Significant ST deviation',39,4,0.9) == 6
assert heart('Moderately suspicious','Normal',61,2,1.9) == 4
assert heart('Slightly suspicious','Normal',57,1,1.7) == 3
assert heart('Moderately suspicious','Significant ST deviation',76,2,1.7) == 7
assert heart('Slightly suspicious','Normal',83,1,1.0) == 3
assert heart('Highly suspicious','Normal',86,1,2.3) == 6
assert heart('Highly suspicious','Non-specific repolarization',61,2,3.5) == 7
assert heart('Slightly suspicious','Normal',46,2,1.0) == 2
assert heart('Slightly suspicious','Significant ST deviation',25,4,3.1) == 6
assert heart('Moderately suspicious','Non-specific repolarization',62,1,2.4) == 5
assert heart('Highly suspicious','Non-specific repolarization',59,2,3.6) == 7
assert heart('Moderately suspicious','Significant ST deviation',60,1,2.1) == 6
assert heart('Slightly suspicious','Normal',53,4,0.1) == 3

## Part 3: Framingham Risk Score for Hard Coronary Heart Disease

The Framingham Risk Score for Hard Coronary Heart Disease is intended for non-diabetic patients aged 30-79 only.  So, if the patient's age is < 30 or > 79, your function should return `-1` rather than a specific risk score.

The Framingham Risk Score takes 7 inputs:
* Age (Number)
* Sex (Male / Female)
* Smoker (True / False)
* Total cholesterol (Number)
* HDL cholesterol (Number)
* Systolic BP (Number)
* Blood pressure being treated with medicines (No / Yes)

You'll note that rather than being a basic parametric equation, this is a regression function defined by coefficients.  It also requires you to take the natural logarithm (`ln`) of many of the parameters; in Python, that is `math.log`.  To help you out, here's an example of how to interpret the formula provided in the website's **Evidence** tab.

Take special note of the footnotes in the logic:
* *Yes=1, No=0 (for Treated for blood pressure and Smoker)
* ** Men: if age >70, use ln(70) x Smoker. Women: if age >78, use ln(78) x Smoker.
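The two footnotes can be sketched as a small helper that builds the `ln(Age) x Smoker` term before it is multiplied by its coefficient. The function name `smoker_interaction` is hypothetical; the caps of 70 for men and 78 for women come from the second footnote, and the 1/0 flag comes from the first.

```python
import math


def smoker_interaction(age, sex, smoker):
    """Return the ln(Age) x Smoker term (before applying its coefficient).

    Hypothetical helper for illustration only.
    """
    # Footnote **: cap the age used in this term at 70 (men) or 78 (women).
    age_cap = 70 if sex == "Male" else 78
    # Footnote *: Yes=1, No=0.
    smoker_flag = 1 if smoker else 0
    return math.log(min(age, age_cap)) * smoker_flag


print(smoker_interaction(75, "Male", True))     # uses ln(70), not ln(75)
print(smoker_interaction(75, "Female", True))   # under the cap, uses ln(75)
print(smoker_interaction(75, "Male", False))    # non-smoker, term is zero
```

Only this one term uses the capped age; every other `ln(Age)` term in the equation still uses the patient's actual age.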


---

These segments of the equation and the coefficient table...

> $ L_{Men} = \beta \times \ln(Age) + \beta \times \ln(cholesterol) ... $
>
> | Variable &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; | Men | Women
> | :---------------------------- | :--: | :--: |
> | $ \ln(Age) $ | 52.00961 | 31.764001
> | $ \ln(Cholesterol) $ | 20.014077 | 22.465206

Could be written in Python as follows:

```python
import math

L = 0
if sex == "Male":
    # math.log is the natural logarithm; the math module has no ln function.
    L += 52.00961 * math.log(age)
    L += 20.014077 * math.log(cholesterol)
else:
    L += 31.764001 * math.log(age)
    L += 22.465206 * math.log(cholesterol)
```

---

**ROUND YOUR FINAL RESULT TO 4 DECIMAL PLACES**
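Once the linear predictor `L` is summed up (including the constant term that is subtracted at the end), it still has to be turned into a probability and rounded. A minimal sketch of that last step follows. The function name is hypothetical, and the baseline survival values `0.9402` (men) and `0.98767` (women) are taken from the solution in this notebook rather than verified independently here.

```python
import math

# Baseline survival values as used in the solution in this notebook.
BASELINE_SURVIVAL = {"Male": 0.9402, "Female": 0.98767}


def risk_from_linear_predictor(L, sex):
    """Convert the summed linear predictor L into a rounded risk.

    Hypothetical helper: risk = 1 - S0 ** exp(L), rounded to 4 decimals.
    """
    s0 = BASELINE_SURVIVAL[sex]
    return round(1 - s0 ** math.exp(L), 4)


# With L = 0 the exponent is exp(0) = 1, so the risk is simply 1 - S0.
print(risk_from_linear_predictor(0, "Male"))
print(risk_from_linear_predictor(0, "Female"))
```

Rounding only at this final step, and not on the intermediate logarithms, keeps the result comparable with the expected values in the assert statements.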

| Variable | Coefficient (β), Men | Coefficient (β), Women |
| :------- | :--: | :--: |
| ln(Age) | 52.00961 | 31.764001 |
| ln(Total cholesterol) | 20.014077 | 22.465206 |
| ln(HDL cholesterol) | -0.905964 | -1.187731 |
| ln(Systolic BP) | 1.305784 | 2.552905 |
| Treated for blood pressure\* | 0.241549 | 0.420251 |
| Smoker\* | 12.096316 | 13.07543 |
| ln(Age) x ln(Total cholesterol) | -4.605038 | -5.060998 |
| ln(Age) x Smoker\*\* | -2.84367 | -2.996945 |
| ln(Age) x ln(Age) | -2.93323 | - |

In [None]:
import math

def framingham(age, sex, smoker, cholesterol, hdl, systolic, bp_treated):
    """(int, str, bool, int, int, int, bool) -> float
    This function uses the logic from https://www.mdcalc.com/framingham-risk-score-hard-coronary-heart-disease
    to compute Framingham Risk Score for hard coronary heart disease for non diabetic patients age 30-79 only.
    
    >>> framingham(82,'Male',False,214,64,92,True)
    -1
    
    >>> framingham(57,'Female',True,204,40,86,False) 
    0.0189
    
    >>> framingham(62,'Male',False,167,31,115,False)
    0.1098
    
    """
    ### YOUR SOLUTION HERE
    # The score is only defined for ages 30-79.
    if age < 30 or age > 79:
        return -1

    # Footnote *: Yes=1, No=0.
    smoker_point = 1 if smoker else 0
    bp_treated_point = 1 if bp_treated else 0

    ln_age = math.log(age)
    ln_chol = math.log(cholesterol)
    ln_hdl = math.log(hdl)
    ln_sbp = math.log(systolic)

    if sex == 'Male':
        # Footnote **: men over 70 use ln(70) in the smoker interaction term.
        ln_age_smoker = math.log(min(age, 70))
        L = (52.00961*ln_age + 20.014077*ln_chol - 0.905964*ln_hdl
             + 1.305784*ln_sbp + 0.241549*bp_treated_point
             + 12.096316*smoker_point - 4.605038*ln_age*ln_chol
             - 2.84367*ln_age_smoker*smoker_point
             - 2.93323*ln_age*ln_age - 172.300168)
        return round(1 - 0.9402**math.exp(L), 4)
    else:
        # Footnote **: women over 78 use ln(78) in the smoker interaction term.
        ln_age_smoker = math.log(min(age, 78))
        L = (31.764001*ln_age + 22.465206*ln_chol - 1.187731*ln_hdl
             + 2.552905*ln_sbp + 0.420251*bp_treated_point
             + 13.07543*smoker_point - 5.060998*ln_age*ln_chol
             - 2.996945*ln_age_smoker*smoker_point - 146.5933061)
        return round(1 - 0.98767**math.exp(L), 4)
    

In [None]:
assert framingham(82,'Male',False,214,64,92,True) == -1
assert framingham(22,'Male',False,146,33,102,False) == -1
assert framingham(32,'Female',False,195,31,115,True) == 0.0015
assert framingham(21,'Female',False,152,42,82,True) == -1
assert framingham(52,'Female',False,214,58,85,True) == 0.005
assert framingham(88,'Male',True,173,67,104,False) == -1
assert framingham(22,'Male',False,163,62,112,False) == -1
assert framingham(71,'Female',False,188,30,99,False) == 0.0391
assert framingham(89,'Female',True,172,55,88,False) == -1
assert framingham(54,'Male',False,156,52,117,True) == 0.0437
assert framingham(89,'Female',False,147,58,127,True) == -1
assert framingham(36,'Male',True,169,33,128,True) == 0.0465
assert framingham(57,'Female',True,204,40,86,False) == 0.0189
assert framingham(22,'Female',False,177,59,81,False) == -1
assert framingham(40,'Female',False,165,43,111,True) == 0.0016
assert framingham(54,'Female',True,200,50,86,False) == 0.0126
assert framingham(39,'Male',False,189,49,130,True) == 0.0126
assert framingham(61,'Female',True,176,68,106,False) == 0.0153
assert framingham(57,'Female',False,181,47,124,True) == 0.0183
assert framingham(76,'Female',True,162,56,94,False) == 0.0239
assert framingham(83,'Male',False,215,52,98,True) == -1
assert framingham(86,'Female',True,169,55,100,True) == -1
assert framingham(61,'Female',False,151,65,86,True) == 0.0053
assert framingham(46,'Male',False,174,64,114,False) == 0.0142
assert framingham(25,'Male',False,193,31,84,False) == -1
assert framingham(62,'Male',False,167,31,115,False) == 0.1098
assert framingham(59,'Male',True,174,66,88,True) == 0.0709
assert framingham(60,'Female',True,156,63,124,True) == 0.0293
assert framingham(53,'Male',False,141,51,109,False) == 0.0244

---

## Part 4. Putting it all together

Now that we have our three scores, we need to put them together into an overall composite risk score for a whole group of patients.  Those patients are in a CSV file on the server called `/data/midterm_patients.csv`.  You can open this file in Jupyter by browsing to your Home icon -> from_instructor -> data.  There are several things that you're going to need to do to read this file, calculate individual risk scores, and compute an overall risk score.

First, you'll notice that some of the column names and values in the input file differ from what your functions above expect.  For example, "M" in the file needs to be turned into "Male", and "Yes" in the file needs to be turned into `True`.  You will need to do conversions for all of the fields listed below:

| Field in CSV | Parameter Name Above | Source Values | Values Needed Above |
| :----------- | :------------------- | :-: | :-: |
| bp medicine  | bp_treated           | Yes / No | True / False |
| sex          | sex                  | M / F | Male / Female |
| smoker       | smoker               | Yes / No | True / False |
| risk factors | risks                | # | # |
| chf       | chf history               | Yes / No | True / False |
| hypertension       | hypertension history               | Yes / No | True / False |
| stroke    | stroke history             | Yes / No | True / False |
| vascular      | vascular disease history           | Yes / No | True / False |
| diabetes     | diabetes history              | Yes / No | True / False |
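The conversion table above could be sketched as a pair of lookup dictionaries plus a mapping from CSV field to function parameter. This is only an illustration: the names `YES_NO`, `SEX`, `FIELD_CONVERTERS`, and `convert_fields` are hypothetical, and it assumes each row is available as a dictionary keyed by the field names in the first column of the table, which may not match the actual file header exactly.

```python
# Value conversions from the table above.
YES_NO = {"Yes": True, "No": False}
SEX = {"M": "Male", "F": "Female"}

# CSV field name -> (function parameter name, converter).
# The field names are assumed to match the "Field in CSV" column.
FIELD_CONVERTERS = {
    "bp medicine": ("bp_treated", YES_NO.__getitem__),
    "sex": ("sex", SEX.__getitem__),
    "smoker": ("smoker", YES_NO.__getitem__),
    "risk factors": ("risks", int),
    "chf": ("chf", YES_NO.__getitem__),
    "hypertension": ("hypertension", YES_NO.__getitem__),
    "stroke": ("stroke", YES_NO.__getitem__),
    "vascular": ("vascular", YES_NO.__getitem__),
    "diabetes": ("diabetes", YES_NO.__getitem__),
}


def convert_fields(raw):
    """Return a dict of converted values keyed by function parameter name."""
    converted = {}
    for field, (param, convert) in FIELD_CONVERTERS.items():
        converted[param] = convert(raw[field])
    return converted


sample = {
    "bp medicine": "No", "sex": "F", "smoker": "Yes", "risk factors": "2",
    "chf": "No", "hypertension": "Yes", "stroke": "No", "vascular": "No",
    "diabetes": "Yes",
}
print(convert_fields(sample))
```

Using dictionary lookups means an unexpected value such as "Y" raises a `KeyError` instead of passing through silently, which can make bad input easier to spot.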


After calculating these three risk scores, use the rules below to determine who is at highest risk.  To be classified as "High Risk" a patient must meet all three criteria below:
1. CHA2DS2_VASc >= 2
2. HEART >= 4
3. Framingham >= 3%
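The three criteria above can be sketched as a single boolean expression. The function name `is_high_risk` is hypothetical, and the sketch assumes the Framingham value is a fraction (so 3% is written as `0.03`), which is how the assert statements in Part 3 express it.

```python
def is_high_risk(cha2ds2_vasc_score, heart_score, framingham_risk):
    """Return True only when all three criteria are met.

    Hypothetical helper; framingham_risk is a fraction, so 3% is 0.03.
    """
    return (
        cha2ds2_vasc_score >= 2
        and heart_score >= 4
        and framingham_risk >= 0.03
    )


print(is_high_risk(4, 8, 0.0391))   # all three criteria met
print(is_high_risk(7, 4, 0.0015))   # Framingham below 3%
print(is_high_risk(7, 7, -1))       # Framingham not applicable (-1)
```

Because an out-of-range age yields `-1` from the Framingham function, such patients can never satisfy the third criterion and are never flagged as high risk.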

Your output for this function needs to be a list where each item in the list contains `[patient, CHA2DS2_VASc, HEART, Framingham, High Risk]`

In [None]:
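def tf(x):
    """(str) -> bool
    Convert 'Yes' / 'No' from the CSV file to True / False.
    Any other value is returned unchanged.

    >>> tf('Yes')
    True

    >>> tf('No')
    False
    """
    if x == 'Yes':
        return True
    if x == 'No':
        return False
    return x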

In [None]:
import csv

def test_patients(filename):
    """(str) -> list
    Read the patient CSV file at filename, compute the three risk scores for
    each row, and return a list of
    [patient, CHA2DS2_VASc, HEART, Framingham, High Risk] items.
    """
    ### YOUR SOLUTION HERE
    patients = []

    with open(filename) as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            patient = row[0]
            age = int(row[1])
            # The file stores sex as M / F; the functions expect Male / Female.
            sex = 'Male' if row[2] == 'M' else 'Female'
            chf = tf(row[3])
            hypertension = tf(row[4])
            stroke = tf(row[5])
            vascular = tf(row[6])
            diabetes = tf(row[7])
            history = row[8]
            ekg = row[9]
            risks = int(row[10])
            troponin = float(row[11])
            smoker = tf(row[12])
            cholesterol = int(row[13])
            hdl = int(row[14])
            systolic = int(row[15])
            bp_treated = tf(row[16])
            CHA2DS2_VASc = cha2ds2_vasc(age, sex, chf, hypertension, stroke, vascular, diabetes)
            HEART = heart(history, ekg, age, risks, troponin)
            Framingham = framingham(age, sex, smoker, cholesterol, hdl, systolic, bp_treated)
            # High risk requires all three criteria to be met.
            High_risk = CHA2DS2_VASc >= 2 and HEART >= 4 and Framingham >= 0.03
            patients.append([patient, CHA2DS2_VASc, HEART, Framingham, High_risk])

    return patients

In [None]:
answers = [['E40794', 7.0, 7.0, -1.0, False],
 ['E57853', 2.0, 3.0, -1.0, False],
 ['E63841', 7.0, 4.0, 0.0015, False],
 ['E87700', 5.0, 5.0, -1.0, False],
 ['E49662', 3.0, 4.0, 0.005, False],
 ['E19241', 6.0, 7.0, -1.0, False],
 ['E94033', 2.0, 4.0, -1.0, False],
 ['E19724', 4.0, 8.0, 0.0391, True],
 ['E77077', 6.0, 6.0, -1.0, False],
 ['E75736', 2.0, 7.0, 0.0437, True],
 ['E20246', 6.0, 5.0, -1.0, False],
 ['E58235', 3.0, 2.0, 0.0465, False],
 ['E29619', 4.0, 5.0, 0.0189, False],
 ['E18023', 3.0, 2.0, -1.0, False],
 ['E56386', 5.0, 4.0, 0.0016, False],
 ['E87379', 3.0, 7.0, 0.0126, False],
 ['E44264', 1.0, 6.0, 0.0126, False],
 ['E85955', 2.0, 4.0, 0.0153, False],
 ['E17497', 4.0, 3.0, 0.0183, False],
 ['E11391', 9.0, 7.0, 0.0239, False],
 ['E41611', 2.0, 3.0, -1.0, False],
 ['E66188', 4.0, 6.0, -1.0, False],
 ['E74052', 3.0, 7.0, 0.0053, False],
 ['E40182', 5.0, 2.0, 0.0142, False],
 ['E21161', 4.0, 6.0, -1.0, False],
 ['E59494', 5.0, 5.0, 0.1098, True],
 ['E61747', 3.0, 7.0, 0.0709, True],
 ['E42697', 5.0, 6.0, 0.0293, False],
 ['E61043', 3.0, 3.0, 0.0244, False]]

In [None]:
assert test_patients('/data/test-patients.csv') == answers

---

## Submitting Your Work

In order to submit your work, you'll need to use the `git` command line program to **add** your homework file (this file) to your local repository, **commit** your changes to your local repository, and then **push** those changes up to github.com.  From there, I'll be able to **pull** the changes down and do my grading.  I'll provide some feedback, **commit** and **push** my comments back to you.  Next week, I'll show you how to **pull** down my comments.

To run through everything one last time and submit your work:
1. Use the `Kernel` -> `Restart Kernel and Run All Cells` menu option to run everything from top to bottom and stop here.
2. Save this notebook with Ctrl-S (or Cmd-S).
3. Skip down to the last command cell (the one starting with `%%bash`) and run that cell.

If anything fails along the way with this submission part of the process, let me know.  I'll help you troubleshoot.

In [6]:
assert False, "DO NOT REMOVE THIS LINE"

AssertionError: DO NOT REMOVE THIS LINE

---

In [7]:
%%bash
git pull
git add midterm-2021.ipynb
git commit -a -m "Finally submitting the midterm!"
git push

Already up to date.
[main 1e502fe] Finally submitting the midterm!
 2 files changed, 784 insertions(+), 2 deletions(-)
 create mode 100644 midterm/midterm-2021.ipynb


To github.com:qilu2021/hds5210-2021.git
   f956821..1e502fe  main -> main



---

If the message above says something like _Finally submitting the midterm!_ or _Everything is up to date_, then your work was submitted correctly.