<h1> Machine Learning for Insights</h1>

<h2>What is the value of these insights?</h2>
- Model debugging
- Feature engineering
- Directing future data collection
- Informing human decision-making; for example, advertisers use data insights to help shoppers make purchasing decisions
- Building trust by verifying basic facts about the underlying data

<h2>Permutation Importance</h2>

Some of the questions that can be answered by analyzing feature importance are:
- Which features have the biggest impact on model predictions?
- How did each feature in the dataset affect the accuracy of a particular prediction?
- How does each feature affect the model's predictions overall?

**Permutation importance is calculated after a model has been fitted:**
- We ask: if we randomly shuffle a single column of the validation data, leaving the target and all other columns in place, how does that affect the accuracy of predictions on the shuffled data?
- Randomly re-ordering a single column should cause less accurate predictions, since the resulting data no longer corresponds to anything observed in the real world.
- Model accuracy suffers most if we shuffle a column that the model relied on heavily for its predictions, which indicates that the column is of high importance.

**Process**

1. Get a trained model.

2. Shuffle the values in a single column and make predictions using the resulting dataset. Calculate the loss function using these predictions and the true target values. The performance deterioration measures the importance of the variable that was shuffled.

3. Undo the shuffling and repeat the process for the other columns.
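
The three steps above can be sketched with plain NumPy. This is a minimal illustration, not the eli5 implementation: `permutation_importance_sketch`, `toy_predict`, and the synthetic data are hypothetical names and values introduced here, and each column is shuffled only once.

```python
import numpy as np

def permutation_importance_sketch(predict, X, y, metric, rng):
    """Return the drop in `metric` caused by shuffling each column of X once."""
    baseline = metric(y, predict(X))
    drops = []
    for col in range(X.shape[1]):
        X_shuffled = X.copy()                     # work on a copy, so "undoing" the shuffle is free
        X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
        drops.append(baseline - metric(y, predict(X_shuffled)))
    return np.array(drops)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

# Hypothetical stand-in for a trained model: it only looks at column 0.
def toy_predict(X):
    return X[:, 0] > 0

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] > 0                                   # the target depends on column 0 only

importances = permutation_importance_sketch(toy_predict, X, y, accuracy, rng)
print(importances)
```

Because the stand-in model ignores columns 1 and 2, shuffling them leaves accuracy unchanged and their importance is exactly zero, while shuffling column 0 costs roughly half of the accuracy.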
    

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

df = pd.read_csv('../input/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
y = (df['Man of the Match'] == 'Yes') # convert from string "Yes/No" to binary
feature_names = [i for i in df.columns if df[i].dtype in [np.int64]]
X = df[feature_names]
train_X, valid_X, train_y, valid_y = train_test_split(X, y, random_state=1)
my_model = RandomForestClassifier(random_state=0).fit(train_X, train_y)


In [None]:
# Get feature importance
import eli5
from eli5.sklearn import PermutationImportance

permutation = PermutationImportance(my_model, random_state=1).fit(valid_X, valid_y)
eli5.show_weights(permutation, feature_names = feature_names)

**Interpreting Permutation Importances**
- Features with a high `Weight` are the most important.
- The first number in each row shows how much model performance decreased after a random shuffling (in this case, using accuracy as the performance metric).
- The number after the `±` measures how performance varied from one reshuffling to the next.
- Occasionally a permutation importance is negative. This means the predictions on the shuffled data happened to be more accurate than those on the real data. It occurs when the feature did not matter and random chance made the shuffled predictions slightly better.
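
To make the `Weight ± spread` format concrete, the sketch below reshuffles one column several times and reports the mean and standard deviation of the accuracy drop. Everything here is an assumption for illustration: the data are synthetic, `toy_predict` stands in for a fitted model, and the exact multiple of the standard deviation that a given library prints after the `±` is not assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical validation set: the label depends on column 0 only.
X = rng.normal(size=(200, 2))
y = X[:, 0] > 0

# Hypothetical "model" that relies on column 0 and picks up a little noise from column 1.
def toy_predict(X):
    return X[:, 0] + 0.1 * X[:, 1] > 0

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

def importance_over_reshuffles(col, n_iter=5):
    """Mean and standard deviation of the accuracy drop over several reshuffles."""
    baseline = accuracy(y, toy_predict(X))
    drops = []
    for _ in range(n_iter):
        X_shuffled = X.copy()
        X_shuffled[:, col] = rng.permutation(X_shuffled[:, col])
        drops.append(baseline - accuracy(y, toy_predict(X_shuffled)))
    return float(np.mean(drops)), float(np.std(drops))

for col in range(X.shape[1]):
    mean_drop, spread = importance_over_reshuffles(col)
    print(f"column {col}: {mean_drop:.4f} ± {spread:.4f}")
```

For the noise column the mean drop hovers around zero and can come out slightly negative on some runs, which is the chance effect described in the last bullet.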


**Calculate Permutation Importance for Taxi Fare Prediction**

In [None]:
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
# Load data
df_taxi = pd.read_csv('../input/new-york-city-taxi-fare-prediction/train.csv', nrows=50000)

In [None]:
# Remove rows with extreme outlier coordinates or non-positive fares
df_taxi = df_taxi.query('pickup_latitude > 40.7 and pickup_latitude < 40.8 and ' +
                      'dropoff_latitude > 40.7 and dropoff_latitude < 40.8 and ' +
                      'pickup_longitude > -74 and pickup_longitude < -73.9 and ' +
                      'dropoff_longitude > -74 and dropoff_longitude < -73.9 and ' +
                      'fare_amount > 0'
                       )
y = df_taxi.fare_amount

In [None]:
# Model 
base_features = ['pickup_longitude', 'pickup_latitude', 'dropoff_longitude', 
                 'dropoff_latitude', 'passenger_count']
X = df_taxi[base_features]

train_taxi_X, valid_taxi_X, train_taxi_y, valid_taxi_y = train_test_split(X, y, random_state=1)
first_model = RandomForestRegressor(n_estimators=30, random_state=1).fit(train_taxi_X, train_taxi_y)

In [None]:
train_taxi_X.describe()

In [None]:
train_taxi_y.describe()

For the first model, which variables seem potentially useful for predicting taxi fares?

In [None]:
# Permutation importance
perm = PermutationImportance(first_model, random_state=1).fit(valid_taxi_X, valid_taxi_y)
eli5.show_weights(perm, feature_names=valid_taxi_X.columns.tolist())

Observations:
- The dropoff and pickup latitude and longitude are all important features.
- On average, the latitude features matter more than the longitude features.

A good next step is to disentangle the effect of being in certain parts of the city from the effect of total distance traveled.  

In [None]:
# Create new features
df_taxi['abs_lon_change'] = abs(df_taxi.dropoff_longitude - df_taxi.pickup_longitude)
df_taxi['abs_lat_change'] = abs(df_taxi.dropoff_latitude - df_taxi.pickup_latitude)

# Add the new features to the base features
features_2  = ['pickup_longitude',
               'pickup_latitude',
               'dropoff_longitude',
               'dropoff_latitude',
               'abs_lat_change',
               'abs_lon_change']

X = df_taxi[features_2]
new_train_taxi_X, new_valid_taxi_X, new_train_taxi_y, new_valid_taxi_y = train_test_split(X, y, random_state=1)
second_model = RandomForestRegressor(n_estimators=30, random_state=1).fit(new_train_taxi_X, new_train_taxi_y)

In [None]:
# Create a PermutationImportance object on second_model and fit it with the new valid data
perm_2 = PermutationImportance(second_model, random_state=1).fit(new_valid_taxi_X, new_valid_taxi_y)
# Show the weights for the permutation importance 
eli5.show_weights(perm_2, feature_names=features_2)

Distance traveled seems far more important than any location effect. Possible reasons why the latitude features are more important than the longitude features:
- Latitudinal distances in the dataset tend to be larger.
- It is more expensive to travel a fixed latitudinal distance.
    

**Conclusion**: Permutation importance is useful for debugging, for understanding a model, and for communicating a high-level overview of it.

<h2>Partial Dependence Plots</h2>

- A partial dependence plot (PDP) shows how a feature affects predictions.
- It is useful for answering questions such as:
    - How would similarly sized houses be priced in different areas?
    - Is a predicted difference due to one feature or to another?
- Like permutation importance, it is calculated after a model has been fit, but here we **repeatedly alter the value of one variable** to make a series of predictions.
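
The idea of repeatedly altering one variable can be written in a few lines of NumPy. This is a sketch of the averaging step only, not the pdpbox implementation; `partial_dependence_1d` and `toy_predict` are hypothetical names, and the saturating model is made up so that the curve has an obvious shape.

```python
import numpy as np

def partial_dependence_1d(predict, X, col, grid):
    """Average prediction when column `col` is forced to each value in `grid`."""
    averages = []
    for value in grid:
        X_altered = X.copy()
        X_altered[:, col] = value        # every row gets the same value for this feature
        averages.append(predict(X_altered).mean())
    return np.array(averages)

# Hypothetical stand-in for a fitted model: the effect of column 0 saturates at 1,
# and column 1 has a linear effect.
def toy_predict(X):
    return np.minimum(X[:, 0], 1.0) + 0.5 * X[:, 1]

rng = np.random.default_rng(0)
X = rng.uniform(0, 3, size=(500, 2))
grid = np.array([0.0, 1.0, 2.0, 3.0])

pd_values = partial_dependence_1d(toy_predict, X, col=0, grid=grid)
print(pd_values - pd_values[0])          # change relative to the leftmost grid value
```

Here the curve rises by about 1 between grid values 0 and 1 and is flat afterwards, because the made-up model stops responding once column 0 exceeds 1.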

For the FIFA 2018 dataset, explore the features using a decision tree.

In [None]:
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
import graphviz

# Create tree_based model
tree_model = DecisionTreeClassifier(random_state=0, max_depth=5, min_samples_split=5).fit(train_X, train_y)
tree_graph = tree.export_graphviz(tree_model, out_file=None, feature_names=feature_names)
# Visualize tree structure
graphviz.Source(tree_graph)

Create Partial Dependence Plot

In [None]:
from matplotlib import pyplot as plt
from pdpbox import pdp, get_dataset, info_plots

# Create the dataset for plotting
pdp_goals = pdp.pdp_isolate(model=tree_model, dataset=valid_X, model_features=feature_names, feature='Goal Scored')

# Plot it
pdp.pdp_plot(pdp_goals, 'Goal Scored')
plt.show()

Observations and how to read the PDP plot:
- The y-axis is interpreted as the **change in the prediction** relative to what would be predicted at the baseline (leftmost) value.
- The blue shaded area indicates the level of confidence.

From the graph we see that scoring a goal substantially increases the chance of winning `Man of the Match`. However, extra goals beyond that appear to have little impact on predictions.

In [None]:
# Another example plot
feature_to_plot = 'Distance Covered (Kms)'
pdp_dist = pdp.pdp_isolate(model=tree_model, dataset=valid_X, model_features=feature_names, feature=feature_to_plot)
pdp.pdp_plot(pdp_dist, feature_to_plot)
plt.show()

The graph is simple because the underlying model (a single decision tree) is simple. Here is the same plot with a more advanced model:

In [None]:
# Build Random Forest model
rf_model = RandomForestClassifier(random_state=0).fit(train_X, train_y)

pdp_dist_2 = pdp.pdp_isolate(model=rf_model, dataset=valid_X,
                            model_features=feature_names, feature=feature_to_plot)
pdp.pdp_plot(pdp_dist_2, feature_to_plot)
plt.show()

The Random Forest model suggests that a team is more likely to win `Man of the Match` if its players run a total of about 100 km over the course of the game. Running much more than that leads to lower predictions.

In general, the smooth shape of this curve seems more plausible than the steep function from the Decision Tree model. 

<h2>2D Partial Dependence Plots</h2>

- A 2D partial dependence plot is used to understand interactions between features.
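
The 2D version applies the same averaging to two features at once. The sketch below is an illustration under stated assumptions, not the pdpbox implementation: `partial_dependence_2d` and `toy_predict` are hypothetical, and the model is deliberately built with a pure interaction so that the resulting surface cannot be written as the sum of two 1D curves.

```python
import numpy as np

def partial_dependence_2d(predict, X, col_a, col_b, grid_a, grid_b):
    """Average prediction for every (value_a, value_b) pair on the two grids."""
    surface = np.empty((len(grid_a), len(grid_b)))
    for i, a in enumerate(grid_a):
        for j, b in enumerate(grid_b):
            X_altered = X.copy()
            X_altered[:, col_a] = a
            X_altered[:, col_b] = b
            surface[i, j] = predict(X_altered).mean()
    return surface

# Hypothetical model: columns 0 and 1 only matter through their product.
def toy_predict(X):
    return X[:, 0] * X[:, 1] + X[:, 2]

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
grid = np.array([-1.0, 0.0, 1.0])

surface = partial_dependence_2d(toy_predict, X, 0, 1, grid, grid)
print(np.round(surface - surface[1, 1], 3))   # change relative to the grid centre
```

Relative to the centre of the grid, the surface equals the product of the two grid values: it is flat along each axis through the centre and changes only when both features move, which is the signature of an interaction.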


In [None]:
# Use pdp_interact and pdp_interact_plot instead of pdp_isolate and pdp_plot, respectively
features_to_plot = ['Goal Scored', 'Distance Covered (Kms)']
inter_1 = pdp.pdp_interact(model=tree_model, dataset=valid_X, 
                           model_features=feature_names, features=features_to_plot)
pdp.pdp_interact_plot(pdp_interact_out=inter_1, feature_names=features_to_plot, plot_type='contour')
plt.show()

This contour plot shows predictions for any combination of goals scored and distance covered. For example, the highest predictions occur when a team scores at least 1 goal and runs a total distance close to 100 km.


In [None]:
# Partial dependence plot for pickup_longitude
feat_name = 'pickup_longitude'
pdp_dist = pdp.pdp_isolate(model=first_model, dataset=valid_taxi_X, model_features=base_features, feature=feat_name)
pdp.pdp_plot(pdp_dist, feat_name)
plt.show()

In [None]:
from plotly import tools
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go

In [None]:
# Create partial dependence plots for all base features of the NYC taxi-fare model
def plot_pdp():
    for feat_name in base_features:
        pdp_dist = pdp.pdp_isolate(model=first_model, dataset=valid_taxi_X,
                                   model_features=base_features, feature=feat_name, n_jobs=3)
        pdp.pdp_plot(pdp_dist, feat_name)

plot_pdp()


Observations:
- Since the model has no direct distance measure, coordinate features such as `pickup_longitude` capture the effect of distance.
- Being picked up near the center of the longitude values lowers predicted fares on average, because it means shorter trips (on average).

In [None]:
# 2D partial plot for NYC taxi fare
fnames = ['pickup_longitude', 'dropoff_longitude']
longitude_pdp = pdp.pdp_interact(model = first_model, dataset=valid_taxi_X,
                                model_features = base_features, features = fnames)
pdp.pdp_interact_plot(pdp_interact_out=longitude_pdp, feature_names=fnames, plot_type='contour')
plt.show()

Observations from the 2D PDP:
- The contours run along a diagonal line, because pairs of similar pickup and dropoff longitudes indicate shorter trips.
- Fares increase as we move further from the central diagonal.
- Fares also increase as we move toward the upper right of the graph, even when staying near the 45-degree line.

Add direct distance measures


In [None]:
# PDP for pickup_longitude without absolute difference features
feat_name = 'pickup_longitude'
pdp_dist_original = pdp.pdp_isolate(model=first_model, dataset=valid_taxi_X, 
                                    model_features=base_features, feature=feat_name)
pdp.pdp_plot(pdp_dist_original, feat_name)
plt.show()


In [None]:
feat_name = 'pickup_longitude'
pdp_dist = pdp.pdp_isolate(model=second_model, dataset=new_valid_taxi_X, model_features=features_2, feature=feat_name)
pdp.pdp_plot(pdp_dist, feat_name)
plt.show()

Observations:
- Adding the absolute distance features shrank the vertical range of the partial dependence plot of `pickup_longitude`.
- Accounting for the absolute distance traveled reduced the impact of `pickup_longitude` by about 1.5 (max).

Modify the initialization of `y` so that our PDP plot has a positive slope in the range [-1,1], and a negative slope everywhere else
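
Working through the two indicator terms in the expression used in the next cell, case by case, gives a piecewise-linear target, which makes the intended slopes explicit. This assumes the boolean masks `(X1 < -1)` and `(X1 > 1)` act as 0 or 1 in arithmetic, as they do in NumPy:

$$
y = X_1 - 2X_1\,[X_1 < -1] - 2X_1\,[X_1 > 1] - X_2 =
\begin{cases}
-X_1 - X_2, & X_1 < -1 \\
\phantom{-}X_1 - X_2, & -1 \le X_1 \le 1 \\
-X_1 - X_2, & X_1 > 1
\end{cases}
$$

So the slope with respect to $X_1$ is $+1$ on $[-1, 1]$ and $-1$ everywhere else.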

In [None]:
from numpy.random import rand
n_sample = 20000

#  Creates two features, `X1` and `X2`, having random values in the range [-2, 2].
X1 = 4 * rand(n_sample) - 2
X2 = 4 * rand(n_sample) - 2

# Creates a target variable `y` that rises with X1 on [-1, 1], falls with X1 elsewhere, and falls with X2
y = -2 * X1 * (X1<-1) + X1 -2 * X1 * (X1 > 1) - X2
# Trains a `RandomForestRegressor` model to predict `y` given `X1` and `X2`
my_df = pd.DataFrame({'X1':X1, 'X2':X2, 'y':y})
predictors_df = my_df.drop(['y'], axis=1)
my_model = RandomForestRegressor(n_estimators=30, random_state=1).fit(predictors_df, my_df.y)
# Creates a PDP plot for `X1` and a scatter plot of `X1` vs. `y`
pdp_dist = pdp.pdp_isolate(model=my_model, dataset=my_df, model_features=['X1', 'X2'], feature='X1')
# Visualize results
pdp.pdp_plot(pdp_dist, 'X1')
plt.show()

Create a dataset with 2 features and a target, such that the pdp of the first feature is flat, but its permutation importance is high.  We will use a RandomForest for the model.
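
The construction used in the next cell, `y = X1 * X2`, can be checked without training anything. The sketch below assumes an idealized model that reproduces the target exactly (`ideal_predict` is a hypothetical stand-in for the Random Forest) and measures importance as the increase in mean squared error, which is a choice made here and not necessarily the score eli5 uses. Forcing `X1` to a value `v` and averaging over the data gives `v * mean(X2)`, which is close to zero because `X2` is symmetric around zero, so the PDP is flat even though shuffling `X1` ruins the predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
X = rng.uniform(-2, 2, size=(n, 2))

# Assumption: a model that has learned the target y = X1 * X2 perfectly.
def ideal_predict(X):
    return X[:, 0] * X[:, 1]

y = ideal_predict(X)

# Partial dependence of X1: force X1 to each grid value and average the predictions.
grid = np.linspace(-2, 2, 9)
pdp_x1 = []
for value in grid:
    X_altered = X.copy()
    X_altered[:, 0] = value
    pdp_x1.append(ideal_predict(X_altered).mean())   # equals value * mean(X2)

# Permutation importance of X1: increase in mean squared error after shuffling X1.
X_shuffled = X.copy()
X_shuffled[:, 0] = rng.permutation(X_shuffled[:, 0])
mse_increase = float(np.mean((y - ideal_predict(X_shuffled)) ** 2))  # baseline MSE is 0

print(np.round(pdp_x1, 3))
print(round(mse_increase, 3))
```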

In [None]:
# Create array holding predictive feature
X1 = 4 * rand(n_sample) - 2
X2 = 4 * rand(n_sample) - 2

# Create y
y =  X1 * X2
# create dataframe because pdp_isolate expects a dataFrame as an argument
my_df = pd.DataFrame({'X1': X1, 'X2': X2, 'y': y})
predictors_df = my_df.drop(['y'], axis=1)

my_model = RandomForestRegressor(n_estimators=30, random_state=1).fit(predictors_df, my_df.y)

pdp_dist = pdp.pdp_isolate(model=my_model, dataset=my_df, model_features=['X1', 'X2'], feature='X1')
pdp.pdp_plot(pdp_dist, 'X1')
plt.show()


In [None]:
perm = PermutationImportance(my_model).fit(predictors_df, my_df.y)
# show the weights for the permutation importance you just calculated
eli5.show_weights(perm, feature_names = ['X1', 'X2'])

<h2>SHAP Values</h2>
- SHAP (SHapley Additive exPlanations) values break down a prediction to show the impact of each feature.
- Example 1: a model says a bank shouldn't loan someone money, and the bank is legally required to explain the basis for each loan rejection.
- Example 2: a healthcare provider wants to identify what factors are driving each patient's risk of some disease, so they can directly address those risk factors with targeted health interventions.


To explain a prediction of whether a team would have a player win the Man of the Match award, we could ask:
- How much was the prediction driven by the fact that the team scored 3 goals? We can restate this as:
- How much was the prediction driven by the fact that the team scored 3 goals, **instead of some baseline number of goals**?

Once we answer this question for the number of goals, we can repeat the process for all other features. SHAP values do this in a way that guarantees a nice property: the contributions add up to the difference between the prediction and the baseline.

`sum(SHAP values for all features) = pred_for_team - pred_for_baseline_values`
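
For a model with only a few features, the additivity property can be verified by brute force. The sketch below computes exact Shapley values by enumerating every subset of features. This is far slower than what the shap library does, and it uses a single baseline row instead of an average over background data; both are simplifications assumed here. `exact_shapley`, `toy_predict`, and all the numbers are hypothetical.

```python
from itertools import combinations
from math import factorial

def exact_shapley(predict, row, baseline):
    """Exact Shapley values of `predict` at `row`; absent features take `baseline` values."""
    n = len(row)

    def value(subset):
        # Features in `subset` take the row's values; all others take the baseline's.
        mixed = [row[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi += weight * (value(set(subset) | {i}) - value(set(subset)))
        phis.append(phi)
    return phis

# Hypothetical model of three made-up match statistics.
def toy_predict(x):
    goals, possession, fouls = x
    return 0.2 * goals + 0.004 * possession - 0.01 * fouls + 0.002 * goals * possession

team = [3, 60, 10]        # the team being explained (hypothetical numbers)
baseline = [1, 50, 14]    # a single baseline row (hypothetical numbers)

phis = exact_shapley(toy_predict, team, baseline)
print(phis)
print(sum(phis), toy_predict(team) - toy_predict(baseline))
```

The two numbers on the last printed line agree up to floating-point rounding, which is the additivity property stated above.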

In [None]:
df = pd.read_csv('../input/fifa-2018-match-statistics/FIFA 2018 Statistics.csv')
y = (df['Man of the Match'] == 'Yes') # convert from string "Yes/No" to binary
feature_names = [i for i in df.columns if df[i].dtype in [np.int64]]
X = df[feature_names]
X_fifa = X
train_X, valid_X, train_y, valid_y = train_test_split(X, y, random_state=1)
my_model_fifa = RandomForestClassifier(random_state=0).fit(train_X, train_y)

In [None]:
# Package used to calculate Shap values
import shap
row_to_show = 5
data_for_prediction = valid_X.iloc[row_to_show]    # use 1 row of data 
data_for_predicition_array = data_for_prediction.values.reshape(1, -1)

# Create object that can calculate shap values
explainer = shap.TreeExplainer(my_model_fifa)

# Calculate Shap values
shap_values = explainer.shap_values(data_for_prediction)


`shap_values` is a list with two arrays:
- The first array holds the SHAP values for the negative outcome.
- The second array holds the SHAP values for the positive outcome.

In [None]:
my_model_fifa.predict_proba(data_for_predicition_array)

The team is 70% likely to have a player win the award. 

In [None]:
shap.initjs()
shap.force_plot(explainer.expected_value[1], shap_values[1], data_for_prediction)

In [None]:
# Example using KernelExplainer 
k_explainer = shap.KernelExplainer(my_model_fifa.predict_proba, train_X)
k_shap_values = k_explainer.shap_values(data_for_prediction)
shap.force_plot(k_explainer.expected_value[1], k_shap_values[1], data_for_prediction)

## The Scenario
A hospital has struggled with "readmissions," where they release a patient before the patient has recovered enough, and the patient returns with health complications. 

The hospital wants your help identifying patients at highest risk of being readmitted. Doctors (rather than your model) will make the final decision about when to release each patient; but they hope your model will highlight issues the doctors should consider when releasing a patient.


<h2>Step 1</h2>
- A simple model has been built, but the doctor, who will be the model's user, doesn't know how to evaluate a model.
- The doctor would like further evidence that what the model is doing is in line with their medical intuition.
- To address this, we need to create a condensed overview of the results, supported by graphics.

In [None]:
hosp_re_data = pd.read_csv('../input/hospital-readmissions/train.csv')
y = hosp_re_data.readmitted
base_features_hosp = [c for c in hosp_re_data.columns if c != 'readmitted']
# Split data into training and validation set
X = hosp_re_data[base_features_hosp]
train_X_hosp, valid_X_hosp, train_y_hosp, valid_y_hosp = train_test_split(X, y, random_state=1)
# Create model 
model_hosp = RandomForestClassifier(n_estimators=30, random_state=1).fit(train_X_hosp, train_y_hosp)

In [None]:
# Prepare the condensed exhibits for the doctor
# Use permutation importance as a succinct model summary
perm = PermutationImportance(model_hosp, random_state=1).fit(valid_X_hosp, valid_y_hosp)
eli5.show_weights(perm, feature_names = valid_X_hosp.columns.tolist())

<h2>Step 2 </h2>
- From the permutation importance, it appears that `number_inpatient` is an important feature. The doctor would like to know more about it.
- Create a graphic exhibit using a PDP to show how `number_inpatient` affects the model's predictions.

In [None]:
# Using PDP for number_inpatient feature
feature_name = 'number_inpatient'
# Create the data for plotting
my_pdp = pdp.pdp_isolate(model=model_hosp, dataset=valid_X_hosp, model_features=valid_X_hosp.columns, feature=feature_name)
# plot
pdp.pdp_plot(my_pdp, feature_name)
plt.show()

<h2>Step 3</h2>
- The doctor thinks it's a good sign that increasing the number of inpatient procedures leads to increased predictions.
- From the plot alone, one cannot tell whether the change is big or small. Create the same plot for `time_in_hospital` to see how it compares.

In [None]:
feature_name = 'time_in_hospital'
# Create the data for plotting
my_pdp = pdp.pdp_isolate(model=model_hosp, dataset=valid_X_hosp, model_features=valid_X_hosp.columns, feature=feature_name)
# plot
pdp.pdp_plot(my_pdp, feature_name)
plt.show()

<h2>Step 4</h2>
- It appears that `time_in_hospital` doesn't matter at all. The difference between the lowest value on the partial dependence plot and the highest value is about 5%
- If that is what your model concluded, the doctors will believe it. But it seems so low. Could  the data be wrong, or is your model doing something more complex than they expect?  
- They'd like you to show them the raw readmission rate for each value of `time_in_hospital` to see how it compares to the partial dependence plot.

    - Make that plot. 
    - Are the results similar or different?

In [None]:
# Get the average readmission rate for each time_in_hospital
# Do concat to keep validation data separate, rather than using all original data
all_train_hosp = pd.concat([train_X_hosp, train_y_hosp], axis=1)
all_train_hosp.groupby(['time_in_hospital']).mean().readmitted.plot()
plt.show()

<h2>Step 5</h2>
- Now the doctor is convinced that the data is right and that the model overview looks reasonable. To turn this into a finished product that the doctor can use, let's create a function `patient_risk_factors` that does the following:
    - Takes a single row of patient data
    - Creates a visualization showing which features of that patient increased their risk of readmission, which features decreased it, and how much those features mattered
    

In [None]:
# Use SHAP 
# Create sample data to test the function
sample_data_for_prediction = valid_X_hosp.iloc[0].astype(float)

# Create function
def patient_risk_factors(model, patient_data):
    # Create object that can calculate shap values
    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(patient_data)
    shap.initjs()
    return shap.force_plot(explainer.expected_value[1], shap_values[1], patient_data)

In [None]:
patient_risk_factors(model_hosp, sample_data_for_prediction)

<h2> Aggregating SHAP values</h2>
- Aggregating many SHAP values can give more detailed alternatives to permutation importance and partial dependence plots.
- Unlike permutation importance, SHAP summary plots give us a bird's-eye view of feature importance and of what is driving it.

**Summary plot using SHAP**
A summary plot is made up of many dots, each with the following characteristics:
- Vertical location shows what feature it is depicting
- Color shows whether that feature was high or low for that row of the dataset
- Horizontal location shows whether the effect of that value caused a higher or lower prediction


In [None]:
# Summary SHAP plot for FIFA data

# Create object that can calculate shap values
explainer = shap.TreeExplainer(my_model_fifa)
# Calculate the shap values for all validation data for plotting
shap_values = explainer.shap_values(valid_X)
# Make plot
shap.summary_plot(shap_values[1], valid_X)    # shap_values[1] is for prediction of 'True'

**Observations:**
- The model ignores the `Red` and `Yellow & Red` features.
- Usually yellow cards don't affect the prediction, but there is an extreme case where a high value caused a much lower prediction.
- High values of `Goal Scored` caused higher predictions, and low values caused lower predictions.

<h2>SHAP Dependence Contribution Plots</h2>
- SHAP dependence contribution plots provide insight similar to PDPs, but they add a lot more detail.
- They show the distribution of effects.

In [None]:
# Create object that can calculate shap values
explainer = shap.TreeExplainer(my_model_fifa)

# Calculate the shap values for all validation data for plotting
shap_values = explainer.shap_values(X_fifa)

# Make plot
shap.dependence_plot('Ball Possession %', shap_values[1], X_fifa, interaction_index='Goal Scored')

**Observations:**
- Each dot represents a row of data.
- The horizontal location is the actual value from the dataset.
- The vertical location shows what having that value did to the prediction. The upward slope indicates that a higher `Ball Possession %` increases the model's prediction for winning the `Man of the Match` award.
- The spread suggests that other features must be interacting with `Ball Possession %`.

**In general, possessing the ball increases a team's chance of having their player win the award.**


<h2>Exercise using the hospital-readmissions data</h2>


In [None]:
base_features = ['number_inpatient', 'num_medications', 'number_diagnoses', 'num_lab_procedures', 
                 'num_procedures', 'time_in_hospital', 'number_outpatient', 'number_emergency', 
                 'gender_Female', 'payer_code_?', 'medical_specialty_?', 'diag_1_428', 'diag_1_414', 
                 'diabetesMed_Yes', 'A1Cresult_None']

X_hosp = hosp_re_data[base_features].astype(float)
y_hosp = hosp_re_data.readmitted
train_X_hosp_2, valid_X_hosp_2, train_y_hosp_2, valid_y_hosp_2 = train_test_split(X_hosp, y_hosp, random_state=1)
# sample data for speed
small_valid_X_hosp_2 = valid_X_hosp_2[:150]
model_hosp_2 = RandomForestClassifier(n_estimators=30, random_state=1).fit(train_X_hosp_2, train_y_hosp_2)


In [None]:
hosp_re_data.describe()

**Q1: What is the distribution of effects for each feature?**

In [None]:
explainer = shap.TreeExplainer(model_hosp_2)
shap_values = explainer.shap_values(small_valid_X_hosp_2)
shap.summary_plot(shap_values[1], small_valid_X_hosp_2)

**Q2: Which of the features `diag_1_428` and `payer_code_?` has the wider range of effects, and is that range a good guide to permutation importance?**

The width of the effects range is not a reasonable approximation to permutation importance.
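
One reason the range can mislead is that it is set entirely by the most extreme rows. The sketch below uses made-up SHAP values (not taken from the hospital data) to contrast the range with the mean absolute SHAP value, a common whole-dataset summary that is used here only as a rough stand-in for overall importance.

```python
import numpy as np

# Hypothetical SHAP values for 1,000 rows and two features.
rng = np.random.default_rng(0)
rare_but_large = np.zeros(1000)
rare_but_large[:10] = 0.5                          # a big effect, but only for 10 rows
common_but_small = rng.uniform(-0.1, 0.1, 1000)    # a modest effect for every row

summary = {}
for name, values in [("rare_but_large", rare_but_large),
                     ("common_but_small", common_but_small)]:
    effect_range = float(values.max() - values.min())
    mean_abs = float(np.abs(values).mean())
    summary[name] = (effect_range, mean_abs)
    print(f"{name}: range={effect_range:.3f}, mean |SHAP|={mean_abs:.3f}")
```

The feature with the wider range has the smaller average effect, because only a handful of rows are affected at all.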

## Question 
Consider the following SHAP contribution dependence plot. 

The x-axis shows `feature_of_interest` and the points are colored based on `other_feature`.

![Imgur](https://i.imgur.com/zFdHneM.png)

Is there an interaction between `feature_of_interest` and `other_feature`?  
If so, does `feature_of_interest` have a more positive impact on predictions when `other_feature` is high or when `other_feature` is low?


**Answer:** Yes. Increasing `feature_of_interest` has a more positive impact on predictions when `other_feature` is high.

Both **num_medications** and **num_lab_procedures** share that jumbling of pink and blue dots.

Aside from `num_medications` having effects of greater magnitude (both more positive and more negative), it's hard to see a meaningful difference between how these two features affect readmission risk.  Create the SHAP dependence contribution plots for each variable, and describe what you think is different between how these two variables affect predictions.

As a reminder, here is the code you previously saw to create this type of plot.

    shap.dependence_plot(feature_of_interest, shap_values[1], val_X)
    
And recall that the validation data was called `small_val_X` in the course; in this notebook the equivalent is `small_valid_X_hosp_2`.

In [None]:
shap.dependence_plot('num_lab_procedures', shap_values[1], small_valid_X_hosp_2)
shap.dependence_plot('num_medications', shap_values[1], small_valid_X_hosp_2)

**Observations:**
- **num_lab_procedures**: The model seems to think this is a relevant feature. One potential next step would be to explore more by coloring it with different other features to search for an interaction.
- **num_medications** clearly slopes up until a value of about 20, and then it turns back down.

Note: This notebook was based on Kaggle's **Machine Learning for Insights Challenge** by Dan. I organized it in one notebook for future reference. 