## <font color='darkblue'>Preface</font>
([article source](https://towardsdatascience.com/explain-machine-learning-models-partial-dependence-ce6b9923034f)) <font size='3ptx'><b>Making black box models a thing of the past</b></font>

<b>With all the complexity that comes with developing machine learning models, it comes as no surprise that some of them just don’t translate well into plain English</b>. The inputs go in, the answers come out, and no one knows exactly how the model arrived at its conclusion. This can create a disconnect or a lack of transparency between members of the same team. As machine learning has become more prevalent in recent years, this lack of explainability in complex models has become an even bigger problem. <b>In this article, I’ll discuss a few ways to make your models more explainable to the average person, whether that is your non-technical manager or just a curious friend</b>.

### <font color='darkgreen'>Why is explainability important?</font>
<font size='3ptx'><b>The responsibility that falls on machine learning models has only increased over time. </b></font>

They are responsible for everything from filtering spam in your email to deciding if you qualify for that new job or loan you’ve been looking for. <b>When these models can’t be explained in plain English, a lack of trust ensues and people become reluctant to use your model for any important decisions</b>.

It would be a shame if the model you worked so hard to create ended up being discarded because no one could understand what it was doing. <b>If you can explain a model and show the insights that come from it, people</b> (<font color='brown'>especially those with no background in data science</font>) <b>will be a lot more likely to trust and use the models that you create</b>.

### <font color='darkgreen'>Interpreting Coefficients</font>
<b><font size='3ptx'>On one end of the spectrum, we have simple models like linear regression.</font></b>

Models like this are quite simple to explain, because each coefficient represents how much a feature affects our target. For example:
![linear model](images/1.PNG)

<br/>

The image above shows the plot for a model represented by the equation $y=2x$. This just means that for an increase of 1 in feature x, the target variable will increase by 2. You can have multiple features like this, each with its own coefficient representing its effect on the target.
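To make the coefficient interpretation concrete, here is a minimal sketch in plain Python. It assumes a handful of made-up points that lie exactly on $y=2x$; the helper names `fit_line` and `predict` are illustrative and not part of the original article.

```python
# Toy data generated from y = 2x (the relationship in the plot).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(xs, ys)


def predict(x):
    """Prediction of the fitted linear model for a single value of x."""
    return slope * x + intercept


# Raising x by 1 changes the prediction by exactly the coefficient.
effect_of_one_unit = predict(3.0) - predict(2.0)
print(slope, intercept, effect_of_one_unit)
```

The fitted slope is the coefficient, and the difference between two predictions one unit apart equals that coefficient no matter where on the line you measure it. That constant, easy-to-read effect is what makes linear models simple to explain.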

On the other end, we have “black box” models like neural networks where all we can see are the inputs and outputs but the meanings and steps taken to get from input to output are effectively blocked by a sea of incomprehensible numbers.

## <font color='darkblue'>Partial Dependence</font>
<font size='3ptx'><b>Partial dependence shows how a particular feature affects a prediction. </b></font>

<b>By holding all other features constant, we want to find out how the feature in question influences our outcome</b>. This is similar to interpreting coefficients, as explained in the previous section, but partial dependence lets us generalize the interpretation to models that are more sophisticated and complex than simple linear regression.
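The idea can be sketched without any library. The example below is an illustration under stated assumptions, not the pdpbox implementation: `toy_model` is a hypothetical stand-in for a trained model, the four rows of `(age, weight)` data are made up, and `partial_dependence` is an illustrative helper. For each candidate value of the feature, it overwrites that feature in every row, leaves the other features at their observed values, and averages the predictions.

```python
def toy_model(row):
    """Hypothetical stand-in for a trained model: returns a risk score."""
    age, weight = row
    score = 0.0
    if age > 50:
        score += 0.5
    if weight > 80:
        score += 0.25
    return score


# Made-up rows of (age, weight), used only for illustration.
data = [(40, 70), (45, 90), (55, 75), (60, 95)]


def partial_dependence(model, rows, feature_index, grid):
    """Average prediction when one feature is forced to each value in grid."""
    curve = []
    for value in grid:
        predictions = []
        for row in rows:
            modified = list(row)              # copy so the data stays untouched
            modified[feature_index] = value   # override only the chosen feature
            predictions.append(model(modified))
        curve.append(sum(predictions) / len(predictions))
    return curve


age_grid = [40, 50, 60]
pd_age = partial_dependence(toy_model, data, feature_index=0, grid=age_grid)
print(list(zip(age_grid, pd_age)))
```

Plotting the averaged predictions against the grid values gives the partial dependence curve for that feature. Because the model is only ever called for predictions, the same procedure works for a decision tree, a neural network, or any other model.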

As an example, we’ll be using a decision tree on this [**Cardiovascular Disease dataset on Kaggle**](https://www.kaggle.com/sulianova/cardiovascular-disease-dataset). The library we’ll be using to plot partial dependence is [**pdpbox**](https://github.com/SauceCat/PDPbox). Let’s train the model and see how this all works.

```python
# Import libraries
import matplotlib.pyplot as plt
import pdpbox.pdp as pdp
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn import tree
```

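If this cell fails with `ModuleNotFoundError: No module named 'pdpbox'`, the library is not installed in the current environment. Installing it (for example, with `pip install pdpbox`) and re-running the cell should resolve the error.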