# Model Explanations: Predicting CHD (Coronary Heart Disease)

### Global Interpretation:
- Understand the overall structure of how a model makes its decisions.
- Explain the complete behavior of the model.
- Help assess whether the model is suitable for deployment (vetting the model).
- Example task: predicting the risk of disease in patients.
- Example question: how do various health parameters affect the risk of disease across patients?

### Local Interpretation:
- Understand how the model made its decision for a single instance.
- Explain individual predictions.
- Understand model behavior in the local neighborhood of that instance.
- Example question: why does the model say that a specific person has a high risk of disease?

### Dataset

#### What is coronary heart disease?


[Coronary heart disease (CHD)](https://en.wikipedia.org/wiki/Coronary_artery_disease) occurs when the coronary arteries (the arteries that supply the heart muscle with oxygen-rich blood) become narrowed by a gradual build-up of plaque, a fatty material made up of cholesterol and other substances, within their walls. Narrowed arteries can cause symptoms such as chest pain (angina), shortness of breath, and fatigue.


#### Dataset Description

Data is available at: http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/
Header information is available at: http://statweb.stanford.edu/~tibs/ElemStatLearn/datasets/SAheart.info.txt

A retrospective sample of **males in a heart-disease high-risk region of the Western Cape, South Africa**. There are roughly two controls per case of CHD. Many of the CHD positive men have undergone blood pressure reduction treatment and other programs to reduce their risk factors after their CHD event. In some cases the measurements were made after these treatments. These data are taken from a larger dataset, described in Rousseauw et al, 1983, South African Medical Journal. 

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sn
import warnings
warnings.filterwarnings('ignore')

In [None]:
#chd_df = pd.read_csv( "https://drive.google.com/uc?export=download&id=1B6nWtO65LqgGV4AfDHwDUbBK_HC4g0uz", index_col=[0] )
chd_df = pd.read_csv( "https://drive.google.com/uc?export=download&id=1yRyZMfBQ8anG10GDFsLf15GRYPQqN12b",
                      index_col=[0])

In [None]:
chd_df.info()

In [None]:
chd_df.head(10)

### Encode the Categorical Variable

In [None]:
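# One-hot encode famhist; drop_first keeps a single 0/1 column (famhist_Present).
# dtype=int avoids boolean dummy columns, which newer pandas versions produce by default.
chd_encoded_df = pd.get_dummies( chd_df,
                                 columns = ['famhist'],
                                 dtype = int,
                                 drop_first = True )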

In [None]:
X_features = list( chd_encoded_df.columns )

In [None]:
X_features.remove( "chd" )

### Split the dataset

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X_train, X_test, \
y_train, y_test = train_test_split( chd_encoded_df[X_features],
                                    chd_encoded_df.chd,
                                    test_size = 0.3,
                                    random_state = 42 )

In [None]:
X_train.shape

In [None]:
X_test.shape

### Build a RandomForest Model

In [None]:
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier(n_estimators=100, 
                                max_depth=6,
                                max_features=0.3,
                                max_samples=0.5,
                                class_weight={1:0.65, 0:0.35},
                                random_state=100)

In [None]:
rf_clf.fit(X_train, y_train)

In [None]:
y_pred_prob = rf_clf.predict_proba( X_test )[:,1]
y_pred = rf_clf.predict( X_test )

In [None]:
from sklearn.metrics import classification_report, roc_auc_score 

In [None]:
roc_auc_score( y_test, y_pred_prob )
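
ROC AUC can be read as the probability that a randomly chosen CHD case receives a higher predicted score than a randomly chosen control. The sketch below computes it that way on a handful of made-up labels and scores. The values and the helper name `pairwise_auc` are illustrative and not part of the notebook; on real data the result should agree with `roc_auc_score`.

```python
def pairwise_auc(y_true, y_score):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up labels and scores, for illustration only
toy_labels = [0, 0, 1, 0, 1, 1]
toy_scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]
toy_auc = pairwise_auc(toy_labels, toy_scores)
print(toy_auc)
```

In this toy example 8 of the 9 case/control pairs are ordered correctly; the only miss is the case scored 0.35 against the control scored 0.40.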

## Partial Dependence Plots (PDPs)
- Show the effect of a feature on the predicted outcome of an ML model after marginalizing (averaging) over the other features.
- Show whether the relationship between the target and a feature is linear, monotonic or more complex.
- Assume that the feature of interest (whose partial dependence is being computed) is not highly correlated with the other features.
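
The marginalization in the first bullet can be made concrete with a brute-force computation: for each grid value of the feature of interest, overwrite that feature in every row, predict, and average. The sketch below does this with NumPy on synthetic data and a hand-written scoring function `toy_risk`. Both are assumptions for illustration; the notebook itself relies on `PartialDependenceDisplay`.

```python
import numpy as np

def toy_risk(X):
    """Hypothetical stand-in for a model's predicted probability.
    Column 0 plays the role of 'age', column 1 of 'tobacco'."""
    logit = 0.05 * (X[:, 0] - 45.0) + 0.10 * X[:, 1]
    return 1.0 / (1.0 + np.exp(-logit))

def partial_dependence_1d(predict, X, feature_idx, grid):
    """Average prediction when column `feature_idx` is forced to each grid value."""
    averages = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature_idx] = value            # overwrite the feature in every row
        averages.append(predict(X_mod).mean())   # average over the other features
    return np.array(averages)

rng = np.random.default_rng(0)
X_toy = np.column_stack([rng.uniform(15, 65, size=200),   # synthetic 'age'
                         rng.uniform(0, 30, size=200)])   # synthetic 'tobacco'
age_grid = np.linspace(15, 65, 6)
pd_age = partial_dependence_1d(toy_risk, X_toy, feature_idx=0, grid=age_grid)
print(np.round(pd_age, 3))
```

Because `toy_risk` increases with column 0 by construction, the resulting curve rises monotonically. With a fitted forest the curve takes whatever shape the model has learned. Forcing a feature to a grid value in every row is also why strong correlation with other features is a problem: it can create feature combinations that never occur in the data.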

In [None]:
from sklearn.inspection import PartialDependenceDisplay

### Effect of Age on CHD

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Partial Dependence Plot")
PartialDependenceDisplay.from_estimator(rf_clf,
                                        X_test,
                                        features = ['age'],
                                        feature_names = X_features,
                                        ax = ax);

### Effect of Tobacco on CHD

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Partial Dependence Plot")
PartialDependenceDisplay.from_estimator(rf_clf,
                                        X_test,
                                        features = ['tobacco'],
                                        feature_names = X_features,
                                        ax = ax)

### Effect of SBP on CHD

In [None]:
fig, ax = plt.subplots(figsize=(12, 6))
ax.set_title("Partial Dependence Plot")
PartialDependenceDisplay.from_estimator(rf_clf,
                                        X_test,
                                        features = ['sbp'],
                                        feature_names = X_features,
                                        ax = ax)

## LIME - Local Interpretable Model-agnostic Explanations

- LIME explains an individual prediction and does not depend on the type of model being explained.
- It answers: why was this prediction made, and which variables drove it?
- LIME perturbs a single data sample by tweaking its feature values and observes the resulting impact on the model output.
- It then builds a surrogate model from the perturbed samples (sample generation) and the model's predictions on them.
- Any interpretable model can be used as the surrogate model, for example:
    - Linear Regression
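
The perturb-and-fit idea described above can be sketched in a few lines of NumPy: sample perturbations around one instance, weight them by proximity to it, and fit a weighted linear surrogate. This is a simplified illustration, not the `lime` package's implementation. The function `black_box`, the Gaussian perturbation scale, and the kernel width are all assumptions.

```python
import numpy as np

def black_box(X):
    """Hypothetical nonlinear model standing in for a classifier's P(class = 1)."""
    return 1.0 / (1.0 + np.exp(-(1.5 * X[:, 0] - 0.5 * X[:, 1] ** 2)))

def local_surrogate(predict, x, n_samples=2000, scale=0.3, kernel_width=0.75, seed=0):
    """Fit a proximity-weighted linear model around x; return (intercept, coefs)."""
    rng = np.random.default_rng(seed)
    # 1. Sample generation: perturb the instance with Gaussian noise
    Z = x + rng.normal(0.0, scale, size=(n_samples, x.size))
    # 2. Query the black-box model on the perturbed samples
    y = predict(Z)
    # 3. Weight samples by closeness to x (exponential kernel on Euclidean distance)
    dist = np.linalg.norm(Z - x, axis=1)
    w = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(w), then solve ordinary least squares
    A = np.column_stack([np.ones(n_samples), Z])
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return beta[0], beta[1:]

x0 = np.array([0.2, 1.0])
intercept, coefs = local_surrogate(black_box, x0)
print(coefs)
```

The sign and size of each surrogate coefficient play the role of the per-feature contributions that LIME reports: around `x0`, the first feature pushes the prediction up and the second pushes it down.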


### Install LIME

<code>
pip install lime
</code>

In [None]:
!pip install lime

In [None]:
X_features

In [None]:
import lime
import lime.lime_tabular

In [None]:
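# LIME expects categorical features as column indices, and categorical_names as a
# dict mapping each index to the display names of its values (0 -> 'No', 1 -> 'Yes').
famhist_idx = X_features.index('famhist_Present')

explainer = (lime
             .lime_tabular
             .LimeTabularExplainer(training_data = X_train.to_numpy(dtype=float),
                                   training_labels = y_train,
                                   feature_names = X_features,
                                   class_names = ['NO CHD', 'CHD'],
                                   categorical_features = [famhist_idx],
                                   categorical_names = {famhist_idx: ['No', 'Yes']},
                                   kernel_width=3,
                                   verbose = True ))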

### Explaining a case of No CHD

In [None]:
X_test.iloc[0]

In [None]:
# Double brackets keep a one-row DataFrame, so feature names are preserved
rf_clf.predict_proba(X_test.iloc[[0]])

In [None]:
exp = explainer.explain_instance( X_test.iloc[0].to_numpy(), 
                                  rf_clf.predict_proba )

In [None]:
exp.show_in_notebook(show_table=True, show_all=False)

### Explaining a case of CHD

In [None]:
# Double brackets keep a one-row DataFrame, so feature names are preserved
rf_clf.predict_proba(X_test.iloc[[10]])

In [None]:
exp = explainer.explain_instance( X_test.iloc[10].to_numpy(), 
                                  rf_clf.predict_proba )

In [None]:
exp.show_in_notebook(show_table=True, show_all=False)