---
#### Goal of Script: Calculate high risk composite score 

Various scores have been developed to assess the health status of a patient. The CHA2DS2-VASc score estimates the risk of stroke for a patient with atrial fibrillation (a heart arrhythmia) ([link](https://www.mdcalc.com/cha2ds2-vasc-score-atrial-fibrillation-stroke-risk)). The HEART score estimates the risk of a major cardiac event within 6 weeks ([link](https://www.mdcalc.com/heart-score-major-cardiac-events)) and the Framingham Risk Score estimates the risk of a heart attack within 10 years ([link](https://www.mdcalc.com/framingham-risk-score-hard-coronary-heart-disease)).

I created three separate functions to calculate the CHA2DS2-VASc score, the HEART score, and the Framingham Risk Score respectively. A fourth function (`patient_scores`) calculates all three scores for each patient in a given dataset (`patients.csv`). The dataset is arbitrary, and its columns and values are converted inside `patient_scores` to match what the three scoring functions expect.

Once all three scores are calculated, the `patient_scores` function also flags the patient as high risk if they meet all of the following somewhat arbitrary criteria: CHA2DS2_VASc >= 2, HEART >= 4, and Framingham >= 3%.

---
## Part 1: CHA2DS2-VASc

##### Step 1: Define assumptions

The CHA2DS2-VASc score uses 7 inputs ([link](https://www.mdcalc.com/cha2ds2-vasc-score-atrial-fibrillation-stroke-risk)): 
* Age (Number)
* Sex (Male / Female)
* CHF History (True / False)
* Hypertension History (True / False)
* Stroke History (True / False)
* Vascular Disease History (True / False)
* Diabetes History (True / False)

##### Step 2: Create the CHA2DS2-VASc score function  

The `cha2ds2_vasc` function has 7 inputs and returns a single value between 0 and 9.<br>
A value of 0 indicates a low risk of stroke and 9 indicates high risk. 

In [1]:
def cha2ds2_vasc(age, sex, chf, hypertension, stroke, vascular, diabetes):
    """
    (int,str,bool,bool,bool,bool,bool) -> int
    Return the CHA2DS2-VASc score, which predicts stroke risk for a patient with existing
    atrial fibrillation. The score ranges from low (0) to high (9).
    
    >>> cha2ds2_vasc(30,'Female',False,False,False,False,False)
    1
    
    >>> cha2ds2_vasc(65,'Male',False,True,True,False,True)
    5
    """
    
    AFS_score = 0
    history = [chf, hypertension, stroke, vascular, diabetes]
    
    if age < 65:
        AFS_score += 0
    elif (age >= 65) and (age <= 74):
        AFS_score += 1
    else:
        AFS_score += 2
    
    if sex == 'Female':
        AFS_score += 1 
    else:
        AFS_score += 0
        
    for index,condition in enumerate(history):
        if condition == False:
            AFS_score += 0
        else:
            if index == 2:
                AFS_score += 2
            else:
                AFS_score += 1
        
    return(AFS_score)

##### Step 3: Validate cha2ds2_vasc function  

In [3]:
import doctest
doctest.run_docstring_examples(cha2ds2_vasc, globals(), verbose=True)

Finding tests in NoName
Trying:
    cha2ds2_vasc(30,'Female',False,False,False,False,False)
Expecting:
    1
ok
Trying:
    cha2ds2_vasc(65,'Male',False,True,True,False,True)
Expecting:
    5
ok


In [4]:
assert cha2ds2_vasc(82,'Male',False,True,True,True,True) == 7
assert cha2ds2_vasc(22,'Male',False,False,True,False,False) == 2
assert cha2ds2_vasc(32,'Female',True,True,True,True,True) == 7
assert cha2ds2_vasc(21,'Female',True,True,True,False,False) == 5
assert cha2ds2_vasc(52,'Female',True,True,False,False,False) == 3
assert cha2ds2_vasc(88,'Male',True,True,True,False,False) == 6
assert cha2ds2_vasc(22,'Male',False,False,True,False,False) == 2
assert cha2ds2_vasc(71,'Female',False,False,False,True,True) == 4
assert cha2ds2_vasc(89,'Female',True,False,False,True,True) == 6
assert cha2ds2_vasc(54,'Male',True,False,False,False,True) == 2
assert cha2ds2_vasc(89,'Female',False,False,True,True,False) == 6
assert cha2ds2_vasc(36,'Male',False,True,False,True,True) == 3
assert cha2ds2_vasc(57,'Female',True,False,False,True,True) == 4
assert cha2ds2_vasc(22,'Female',False,True,False,True,False) == 3
assert cha2ds2_vasc(40,'Female',True,True,True,False,False) == 5
assert cha2ds2_vasc(54,'Female',False,False,False,True,True) == 3
assert cha2ds2_vasc(39,'Male',True,False,False,False,False) == 1
assert cha2ds2_vasc(61,'Female',False,False,False,True,False) == 2
assert cha2ds2_vasc(57,'Female',True,False,True,False,False) == 4
assert cha2ds2_vasc(76,'Female',True,True,True,True,True) == 9
assert cha2ds2_vasc(83,'Male',False,False,False,False,False) == 2
assert cha2ds2_vasc(86,'Female',False,True,False,False,False) == 4
assert cha2ds2_vasc(61,'Female',True,False,False,False,True) == 3
assert cha2ds2_vasc(46,'Male',True,True,True,True,False) == 5
assert cha2ds2_vasc(25,'Male',True,True,False,True,True) == 4
assert cha2ds2_vasc(62,'Male',False,True,True,True,True) == 5
assert cha2ds2_vasc(59,'Male',False,True,True,False,False) == 3
assert cha2ds2_vasc(60,'Female',False,True,True,False,True) == 5
assert cha2ds2_vasc(53,'Male',False,True,True,False,False) == 3
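
To see where an asserted value comes from, it can help to list the points per component. The sketch below uses a hypothetical helper, `cha2ds2_vasc_breakdown`, which is not part of the notebook. It mirrors the point scheme of `cha2ds2_vasc` above, assuming integer ages, and returns a dictionary instead of a single total.

```python
def cha2ds2_vasc_breakdown(age, sex, chf, hypertension, stroke, vascular, diabetes):
    """Hypothetical helper: per-component CHA2DS2-VASc points as a dict."""
    # Age bands (assumes integer ages): under 65 -> 0, 65 to 74 -> 1, 75 and over -> 2
    if age >= 75:
        age_points = 2
    elif age >= 65:
        age_points = 1
    else:
        age_points = 0

    return {
        'age': age_points,
        'sex': 1 if sex == 'Female' else 0,
        'chf': int(bool(chf)),
        'hypertension': int(bool(hypertension)),
        'stroke': 2 * int(bool(stroke)),  # stroke history is worth 2 points
        'vascular': int(bool(vascular)),
        'diabetes': int(bool(diabetes)),
    }


# Same inputs as the assertion above that expects the maximum score of 9
breakdown = cha2ds2_vasc_breakdown(76, 'Female', True, True, True, True, True)
total = sum(breakdown.values())
print(breakdown, total)
```

Summing the dictionary values reproduces the total, so a mismatch in an assertion can be traced to a single component.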

---

## Part 2: HEART Score

##### Step 1: Define assumptions

The HEART score uses 5 high-level inputs ([link](https://www.mdcalc.com/heart-score-major-cardiac-events)):

* History (Slightly / Moderately / Highly suspicious)
* EKG (Normal / Non-specific repolarization disturbance / Significant ST deviation)
* Age (Number)
* Risk Factors (Number of risk factors)
* Initial Troponin (Number of times the normal limit)

##### Step 2: Create the HEART score function  

The `heart` function has 5 inputs and returns a single value between 0 and 10. <br>
A value of 0 indicates a low risk of a major cardiac event and 10 indicates high risk. 

In [5]:
def heart(history, ekg, age, risks, troponin):
    """
    (str,str,int,int,float) -> int
    Returns a HEART score, which predicts a patient's risk for major cardiac events from 0 (low) to 10 (high).
    
    >>> heart('Slightly suspicious','Normal',30, 0, 0)
    0
    >>> heart('Highly suspicious','Significant ST deviation',71, 7, 5)
    10
    """
    
    heart_score = 0
    
    if history.split()[0] == 'Slightly':
        heart_score += 0 
    elif history.split()[0] == 'Moderately':
        heart_score += 1 
    else:
        heart_score += 2
    
    if ekg.split()[0] == 'Normal':
        heart_score += 0
    elif ekg.split()[0] == 'Non-specific':
        heart_score += 1
    else: 
        heart_score += 2
    
    if age < 45:
        heart_score += 0  
    elif (age >= 45) and (age <= 64):
        heart_score += 1
    else:
        heart_score += 2
        
    if risks == 0:
        heart_score += 0
    elif (risks == 1) or (risks == 2):
        heart_score += 1
    else:
        heart_score += 2
        
    if troponin <= 1.0:
        heart_score += 0
    elif (troponin > 1.0) and (troponin <= 3.0):
        heart_score += 1
    else:
        heart_score += 2
        
    return(heart_score)
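
Because `heart` falls through to the `else` branch for any string it does not recognize, a misspelled category such as `'Normall'` would score 2 points without any warning. One possible way to fail loudly instead is a dictionary lookup, sketched below. The names `HISTORY_POINTS`, `EKG_POINTS`, and `category_points` are hypothetical and not part of the notebook.

```python
# Points keyed by the first word of each category, as in heart() above
HISTORY_POINTS = {'Slightly': 0, 'Moderately': 1, 'Highly': 2}
EKG_POINTS = {'Normal': 0, 'Non-specific': 1, 'Significant': 2}


def category_points(value, table):
    """Return the points for a category string, or raise on unknown input."""
    words = value.split()
    key = words[0] if words else ''
    if key not in table:
        raise ValueError(f"Unrecognized category: {value!r}")
    return table[key]


print(category_points('Highly suspicious', HISTORY_POINTS))
print(category_points('Non-specific repolarization', EKG_POINTS))

# A typo now raises instead of silently scoring 2
try:
    category_points('Normall', EKG_POINTS)
except ValueError as err:
    print(err)
```

Whether raising is preferable to a permissive default depends on how clean the input data is; for a hand-entered dataset, an explicit error is usually easier to debug.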

##### Step 3: Validate heart function  

In [6]:
import doctest
doctest.run_docstring_examples(heart, globals(), verbose=True)

Finding tests in NoName
Trying:
    heart('Slightly suspicious','Normal',30, 0, 0)
Expecting:
    0
ok
Trying:
    heart('Highly suspicious','Significant ST deviation',71, 7, 5)
Expecting:
    10
ok


In [7]:
assert heart('Moderately suspicious','Normal',82,4,3.8) == 7
assert heart('Slightly suspicious','Non-specific repolarization',22,2,2.3) == 3
assert heart('Slightly suspicious','Non-specific repolarization',32,4,1.3) == 4
assert heart('Highly suspicious','Non-specific repolarization',21,1,1.1) == 5
assert heart('Slightly suspicious','Normal',52,5,1.2) == 4
assert heart('Moderately suspicious','Significant ST deviation',88,5,0.5) == 7
assert heart('Slightly suspicious','Non-specific repolarization',22,5,3.0) == 4
assert heart('Slightly suspicious','Significant ST deviation',71,4,3.9) == 8
assert heart('Moderately suspicious','Non-specific repolarization',89,5,0.3) == 6
assert heart('Highly suspicious','Normal',54,4,3.9) == 7
assert heart('Moderately suspicious','Normal',89,3,0.3) == 5
assert heart('Slightly suspicious','Non-specific repolarization',36,1,0.4) == 2
assert heart('Moderately suspicious','Normal',57,4,1.3) == 5
assert heart('Slightly suspicious','Normal',22,5,0.2) == 2
assert heart('Slightly suspicious','Normal',40,4,3.9) == 4
assert heart('Highly suspicious','Normal',54,3,3.1) == 7
assert heart('Highly suspicious','Significant ST deviation',39,4,0.9) == 6
assert heart('Moderately suspicious','Normal',61,2,1.9) == 4
assert heart('Slightly suspicious','Normal',57,1,1.7) == 3
assert heart('Moderately suspicious','Significant ST deviation',76,2,1.7) == 7
assert heart('Slightly suspicious','Normal',83,1,1.0) == 3
assert heart('Highly suspicious','Normal',86,1,2.3) == 6
assert heart('Highly suspicious','Non-specific repolarization',61,2,3.5) == 7
assert heart('Slightly suspicious','Normal',46,2,1.0) == 2
assert heart('Slightly suspicious','Significant ST deviation',25,4,3.1) == 6
assert heart('Moderately suspicious','Non-specific repolarization',62,1,2.4) == 5
assert heart('Highly suspicious','Non-specific repolarization',59,2,3.6) == 7
assert heart('Moderately suspicious','Significant ST deviation',60,1,2.1) == 6
assert heart('Slightly suspicious','Normal',53,4,0.1) == 3

---
## Part 3: Framingham Risk Score

##### Step 1: Define assumptions

The Framingham Risk Score uses 7 inputs ([link](https://www.mdcalc.com/framingham-risk-score-hard-coronary-heart-disease)):
* Age (Number)
* Sex (Male / Female)
* Smoker (True / False)
* Total cholesterol (Number)
* HDL cholesterol (Number)
* Systolic BP (Number)
* Blood pressure being treated with medicines (No / Yes)

Special notes in score logic:
* Smoker and treated blood pressure are coded as Yes = 1, No = 0.
* Men: if age > 70, use ln(70) x Smoker for the age-smoker term.
* Women: if age > 78, use ln(78) x Smoker for the age-smoker term.

Note: The Framingham Risk Score is for non-diabetic patients aged 30-79. If the patient's age is < 30 or > 79, the function returns `-1` rather than a risk score.
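
The age rules above can be sketched as two small helpers. The names `SMOKER_AGE_CAP`, `in_supported_range`, and `age_for_smoker_term` are hypothetical, and the sketch assumes `sex` is exactly `'Male'` or `'Female'`. Given the 30-79 range, the cap for women only changes the result at age 79.

```python
import math

# Age caps for the age-smoker term, per the notes above
SMOKER_AGE_CAP = {'Male': 70, 'Female': 78}


def in_supported_range(age):
    """True if the score is defined for this age (30 to 79 inclusive)."""
    return 30 <= age <= 79


def age_for_smoker_term(age, sex):
    """Age used inside ln(age) x Smoker after applying the sex-specific cap."""
    return min(age, SMOKER_AGE_CAP[sex])


for age, sex in [(50, 'Male'), (75, 'Male'), (79, 'Female')]:
    capped = age_for_smoker_term(age, sex)
    print(age, sex, capped, round(math.log(capped), 4))
```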


##### Step 2: Create the Framingham score function  

The `framingham` function has 7 inputs and returns the risk as a fraction (for example, 0.0173 means 1.73%). <br>
A risk below 10% indicates the patient is at low risk of myocardial infarction and cardiac death. A risk between 10% and 20% is intermediate, and a risk above 20% is considered high. 
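
The risk bands described above could be expressed as a small lookup, sketched here with a hypothetical helper `framingham_category`. It assumes the input is a fraction, as returned by `framingham`, and treats exactly 10% and exactly 20% as intermediate, which is an assumption since the text does not say which band the boundaries belong to.

```python
def framingham_category(risk):
    """Hypothetical helper: map a Framingham value (a fraction) to a risk band."""
    if risk < 0:
        return 'not applicable'   # -1 is returned for ages outside 30-79
    if risk < 0.10:
        return 'low'
    if risk <= 0.20:
        return 'intermediate'     # assumption: both boundaries count as intermediate
    return 'high'


# 0.0173, 0.1098 and -1 appear as results in this notebook; 0.25 is made up
for value in (0.0173, 0.1098, 0.25, -1):
    print(value, framingham_category(value))
```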

In [8]:
import math

def framingham(age, sex, smoker, cholesterol, hdl, systolic, bp_treated):
    """
    (int,str,bool,int,int,int,bool) -> float
    Returns a Framingham risk score, which predicts the risk of hard coronary heart disease for non-diabetic patients.
    Returns -1 if age is outside the supported range (30-79).
    
    >>> framingham(30, 'Female', False, 150, 40, 120, False)
    0.0002
    >>> framingham(67, 'Female', False, 160, 60, 120, False)
    0.0173
    """ 

    if (age < 30) or (age > 79):
        P = -1
    
    else: 
        
        #Coefficients: (male, female)
        Beta = { 'age':(52.00961,31.764001),'cholesterol':(20.014077,22.465206),
                 'hdl':(-0.905964,-1.187731),'systolic':(1.305784,2.552905),
                 'bp_treated':(0.241549,0.420251),'smoker':(12.096316,13.07543),
                 'age_cholesterol':(-4.605038,-5.060998),'age_smoker':(-2.84367,-2.996945),
                 'age_age':(-2.93323,0)
               }

        #Adjustments for males, age_smoker
        if sex == 'Male':
                s = 0
                constant = 0.9402
                y_intercept = -172.300168

                #Adjustment for male age > 70
                if age > 70:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(70)* int(smoker))
                else:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(age)*int(smoker))     

        #Adjustments for females, age_smoker
        else:
                s = 1
                constant = 0.98767
                y_intercept = -146.5933061

                #Adjustment for female age > 78 (only reachable at age 79; older ages return P = -1)
                if age > 78:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(78)*int(smoker))
                else:
                    age_smoker = (Beta['age_smoker'][s])*(math.log(age)*int(smoker))   

        #Calculate formula 
        L = [ (Beta['age'][s])*math.log(age) + (Beta['cholesterol'][s])*math.log(cholesterol) + (Beta['hdl'][s])*math.log(hdl) + 
              (Beta['systolic'][s])*math.log(systolic) + (Beta['bp_treated'][s])*int(bp_treated) + (Beta['smoker'][s])*int(smoker) + 
              (Beta['age_cholesterol'][s])*(math.log(age)*math.log(cholesterol)) + age_smoker + 
              (Beta['age_age'][s])*(math.log(age)*math.log(age)) + y_intercept
            ]
        
        P = 1 - constant**(math.exp(L[0]))
            
    return(round(P,4))       
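
For reference, the calculation in `framingham` can be written as the formula below. This is a transcription of the code above rather than an independent statement of the published model, and the symbols are notation chosen here: $T$ and $S$ are 1 if blood pressure is treated or the patient smokes and 0 otherwise, $c$ is `y_intercept`, and $S_0$ is `constant`.

```latex
\begin{aligned}
L &= \beta_{\text{age}} \ln(\text{age})
   + \beta_{\text{chol}} \ln(\text{chol})
   + \beta_{\text{hdl}} \ln(\text{hdl})
   + \beta_{\text{sys}} \ln(\text{sys})
   + \beta_{\text{trt}}\, T
   + \beta_{\text{smk}}\, S \\
  &\quad + \beta_{\text{age,chol}} \ln(\text{age}) \ln(\text{chol})
   + \beta_{\text{age,smk}} \ln(a^{*})\, S
   + \beta_{\text{age,age}} \bigl(\ln(\text{age})\bigr)^{2}
   + c \\
P &= 1 - S_0^{\,e^{L}}
\end{aligned}
```

Here $a^{*} = \min(\text{age}, 70)$ for men and $\min(\text{age}, 78)$ for women. In the code, $c$ is -172.300168 for men and -146.5933061 for women, $S_0$ is 0.9402 for men and 0.98767 for women, and $\beta_{\text{age,age}}$ is 0 for women.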

##### Step 3: Validate Framingham function  

In [9]:
import doctest
doctest.run_docstring_examples(framingham, globals(), verbose=True)

Finding tests in NoName
Trying:
    framingham(30, 'Female', False, 150, 40, 120, False)
Expecting:
    0.0002
ok
Trying:
    framingham(67, 'Female', False, 160, 60, 120, False)
Expecting:
    0.0173
ok


In [10]:
assert framingham(82,'Male',False,214,64,92,True) == -1
assert framingham(22,'Male',False,146,33,102,False) == -1
assert framingham(32,'Female',False,195,31,115,True) == 0.0015
assert framingham(21,'Female',False,152,42,82,True) == -1
assert framingham(52,'Female',False,214,58,85,True) == 0.005
assert framingham(88,'Male',True,173,67,104,False) == -1
assert framingham(22,'Male',False,163,62,112,False) == -1
assert framingham(71,'Female',False,188,30,99,False) == 0.0391
assert framingham(89,'Female',True,172,55,88,False) == -1
assert framingham(54,'Male',False,156,52,117,True) == 0.0437
assert framingham(89,'Female',False,147,58,127,True) == -1
assert framingham(36,'Male',True,169,33,128,True) == 0.0465
assert framingham(57,'Female',True,204,40,86,False) == 0.0189
assert framingham(22,'Female',False,177,59,81,False) == -1
assert framingham(40,'Female',False,165,43,111,True) == 0.0016
assert framingham(54,'Female',True,200,50,86,False) == 0.0126
assert framingham(39,'Male',False,189,49,130,True) == 0.0126
assert framingham(61,'Female',True,176,68,106,False) == 0.0153
assert framingham(57,'Female',False,181,47,124,True) == 0.0183
assert framingham(76,'Female',True,162,56,94,False) == 0.0239
assert framingham(83,'Male',False,215,52,98,True) == -1
assert framingham(86,'Female',True,169,55,100,True) == -1
assert framingham(61,'Female',False,151,65,86,True) == 0.0053
assert framingham(46,'Male',False,174,64,114,False) == 0.0142
assert framingham(25,'Male',False,193,31,84,False) == -1
assert framingham(62,'Male',False,167,31,115,False) == 0.1098
assert framingham(59,'Male',True,174,66,88,True) == 0.0709
assert framingham(60,'Female',True,156,63,124,True) == 0.0293
assert framingham(53,'Male',False,141,51,109,False) == 0.0244

---
## Part 4: High Risk Score

The patients dataset (`patients.csv`) has 29 observations. Its column names and values differ from what the functions expect. For example, "M" in `patients.csv` needs to become "Male" and "Yes" needs to become `True`. The conversions are listed below:

| Field in CSV | Parameter Name Above | Source Values | Values Needed Above |
| :----------- | :------------------- | :-: | :-: |
| bp medicine  | bp_treated           | Yes / No | True / False |
| sex          | sex                  | M / F | Male / Female |
| smoker       | smoker               | Yes / No | True / False |
| risk factors | risks                | # | # |
| total cholesterol | cholesterol     | # | # |
| hdl cholesterol | hdl               | # | # |
| systolic bp  | systolic             | # | # |
| chf history  | chf                  | Yes / No | True / False |
| hypertension history | hypertension | Yes / No | True / False |
| stroke history | stroke             | Yes / No | True / False |
| vascular disease history | vascular | Yes / No | True / False |
| diabetes history | diabetes         | Yes / No | True / False |
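
The conversions described above can be sketched on a couple of made-up rows using only the standard library. The sample values are invented for illustration and are not taken from `patients.csv`; `RENAME`, `SEX_MAP`, `YES_NO_MAP`, and `convert_row` are hypothetical names. The notebook itself performs these steps with pandas inside `patient_scores`.

```python
import csv
import io

# Made-up sample in the CSV's naming convention (not real patient data)
SAMPLE = """patient,sex,smoker,bp medicine,chf history,risk factors
X00001,F,Yes,No,Yes,2
X00002,M,No,Yes,No,0
"""

# CSV column name -> function parameter name (subset, for illustration)
RENAME = {'bp medicine': 'bp_treated', 'chf history': 'chf', 'risk factors': 'risks'}
SEX_MAP = {'M': 'Male', 'F': 'Female'}
YES_NO_MAP = {'Yes': True, 'No': False}


def convert_row(row):
    """Rename columns and translate M/F and Yes/No values for one CSV row."""
    converted = {}
    for column, value in row.items():
        name = RENAME.get(column, column)
        if column == 'sex':
            value = SEX_MAP.get(value, value)
        else:
            value = YES_NO_MAP.get(value, value)
        converted[name] = value
    return converted


rows = [convert_row(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
for r in rows:
    r['risks'] = int(r['risks'])  # csv yields strings; numeric fields need casting
    print(r)
```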


##### Step 1: Define assumptions

A patient is classified as "High Risk" if they meet all three criteria below:

1. CHA2DS2_VASc >= 2
2. HEART >= 4
3. Framingham >= 3%
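
As a minimal sketch, the three criteria can be combined in a single predicate. `is_high_risk` is a hypothetical name, and the sketch assumes the Framingham value is a fraction (0.03 means 3%), as returned by `framingham`.

```python
def is_high_risk(cha2ds2_vasc_score, heart_score, framingham_risk):
    """True only if all three thresholds are met.

    framingham_risk is a fraction (0.03 means 3%), so the -1 sentinel used
    for out-of-range ages never meets the threshold.
    """
    return (cha2ds2_vasc_score >= 2
            and heart_score >= 4
            and framingham_risk * 100 >= 3.0)


# Score triples taken from the expected results later in the notebook
print(is_high_risk(4, 8, 0.0391))   # all three criteria met
print(is_high_risk(7, 7, -1))       # age out of range for Framingham
print(is_high_risk(3, 2, 0.0465))   # HEART below 4
print(is_high_risk(5, 6, 0.0293))   # Framingham below 3%
```

Note that a patient whose age falls outside 30-79 can never be flagged under this rule, regardless of the other two scores.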

##### Step 2: Create the patient_scores function

The `patient_scores` function has 1 input (filename). The output is a list of lists, one per patient, each of the form `[patient, CHA2DS2_VASc, HEART, Framingham, High Risk]`: the patient ID, the patient's CHA2DS2_VASc, HEART, and Framingham scores, and a True/False flag indicating whether the patient is high risk. 

In [11]:
import pandas as pd
import numpy as np

def patient_scores(filename): 
    """ 
    (csv file) -> List
    Returns a list that contains patients id, common risk scores, and their calculated high risk 
    (formula based on predictive health scores). 
    """
    patients = pd.read_csv(filename)
    
    #manipulate patient data 
    patients.rename(columns={'chf history':'chf',
                         'hypertension history':'hypertension',
                         'stroke history':'stroke',
                         'vascular disease history': 'vascular',
                         'diabetes history':'diabetes',
                         'risk factors':'risks',
                         'total cholesterol':'cholesterol',
                         'hdl cholesterol':'hdl',
                         'systolic bp':'systolic',
                         'bp medicine':'bp_treated'},inplace=True)
    patients['sex']=patients['sex'].replace({'M':'Male','F':'Female'})
    patients.replace({'Yes':True,'No':False},inplace=True)
    
    #add scores as columns in patients 
    patients['CHA2DS2_VASc'] = np.vectorize(cha2ds2_vasc)(patients.age, patients.sex, patients.chf, patients.hypertension, patients.stroke, patients.vascular, patients.diabetes)
    patients['HEART'] = np.vectorize(heart)(patients.history, patients.ekg, patients.age, patients.risks, patients.troponin)
    patients['Framingham']=np.vectorize(framingham,otypes=[float])(patients.age, patients.sex, patients.smoker, patients.cholesterol, patients.hdl, patients.systolic, patients.bp_treated)

    #determine/add High Risk column in patients: True only if all three criteria are met
    #(a Framingham value of -1, returned for unsupported ages, never qualifies)
    patients['High Risk'] = (
        (patients.CHA2DS2_VASc >= 2) & (patients.HEART >= 4) & (patients.Framingham*100 >= 3.0)
    )
    
    
    #create list for each patient 
    answers = patients[['patient','CHA2DS2_VASc','HEART','Framingham','High Risk']].values.tolist()
    
    return(answers)


In [12]:
answers = [['E40794', 7.0, 7.0, -1.0, False],
 ['E57853', 2.0, 3.0, -1.0, False],
 ['E63841', 7.0, 4.0, 0.0015, False],
 ['E87700', 5.0, 5.0, -1.0, False],
 ['E49662', 3.0, 4.0, 0.005, False],
 ['E19241', 6.0, 7.0, -1.0, False],
 ['E94033', 2.0, 4.0, -1.0, False],
 ['E19724', 4.0, 8.0, 0.0391, True],
 ['E77077', 6.0, 6.0, -1.0, False],
 ['E75736', 2.0, 7.0, 0.0437, True],
 ['E20246', 6.0, 5.0, -1.0, False],
 ['E58235', 3.0, 2.0, 0.0465, False],
 ['E29619', 4.0, 5.0, 0.0189, False],
 ['E18023', 3.0, 2.0, -1.0, False],
 ['E56386', 5.0, 4.0, 0.0016, False],
 ['E87379', 3.0, 7.0, 0.0126, False],
 ['E44264', 1.0, 6.0, 0.0126, False],
 ['E85955', 2.0, 4.0, 0.0153, False],
 ['E17497', 4.0, 3.0, 0.0183, False],
 ['E11391', 9.0, 7.0, 0.0239, False],
 ['E41611', 2.0, 3.0, -1.0, False],
 ['E66188', 4.0, 6.0, -1.0, False],
 ['E74052', 3.0, 7.0, 0.0053, False],
 ['E40182', 5.0, 2.0, 0.0142, False],
 ['E21161', 4.0, 6.0, -1.0, False],
 ['E59494', 5.0, 5.0, 0.1098, True],
 ['E61747', 3.0, 7.0, 0.0709, True],
 ['E42697', 5.0, 6.0, 0.0293, False],
 ['E61043', 3.0, 3.0, 0.0244, False]]

In [13]:
assert patient_scores('C:/Users/mmuno/Desktop/GitHub/hds5210-2021 (Python)/patients.csv') == answers
