# Explaining Models with SHAP

Nena Esaw

### Load Your Saved Joblib File

* In your notebook, load the contents of your "best-models.joblib" file into a variable called "loaded_joblib."

* Save each object from the loaded_joblib dictionary as a separate variable in your notebook (e.g. "X_train = loaded_joblib['X_train']").

In [None]:
import joblib
loaded = joblib.load('random_forest_l01.joblib')
loaded.keys()

In [None]:
X_train_df = loaded['X_train']
y_train = loaded['y_train']
X_test_df = loaded['X_test']
y_test = loaded['y_test']
preprocessor = loaded['preprocessor']
loaded_model = loaded['RandomForest']
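
The loading step above assumes the joblib file holds a dictionary with these six keys. As a minimal sketch of that round trip, the example below saves and reloads a dictionary with the same keys. The values are placeholders and `demo-models.joblib` is a hypothetical file name, not the file used in this notebook.

```python
import tempfile
from pathlib import Path

import joblib

# Placeholder objects stand in for the real DataFrames, preprocessor, and model.
export = {
    "X_train": [[1, 2], [3, 4]],
    "y_train": [0, 1],
    "X_test": [[5, 6]],
    "y_test": [1],
    "preprocessor": None,
    "RandomForest": None,
}

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo-models.joblib"  # hypothetical file name
    joblib.dump(export, demo_path)        # save the whole dictionary in one file
    loaded_demo = joblib.load(demo_path)  # reload it as a plain dict

# Check for missing keys before unpacking into separate variables.
expected_keys = {"X_train", "y_train", "X_test", "y_test", "preprocessor", "RandomForest"}
missing = expected_keys - set(loaded_demo)
print("Missing keys:", sorted(missing))
```

Checking the keys first gives a clearer error than a `KeyError` halfway through the unpacking cell.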



### Explain your tree-based model with shap

* Create an X_shap and y_shap variable from your training data (use shap.sample as needed).

* Create a model explainer.

* Calculate the shap values for your model.

* Create a summary plot - with plot_type='bar':

    * In a Markdown cell below, display your saved feature importance image (that you used in your README) and compare the most important features according to SHAP vs. your original feature importances.
    
        * Are they the same features in both? If not, what's different?
        
* Save your bar summary plot figure as a .png file inside your repository (you will need this for the final piece of this assignment - Update Your README).

* Create a second summary plot - with plot_type='dot':

    * In a Markdown cell, interpret the top 3 most important features and how they influence your model's predictions.
    
    * Save your figure as a .png file inside your repository (you will need this for the final piece of this assignment - Update Your README).

In [None]:
# Import shap and initialize javascript:
import shap
shap.initjs()


#### Create X_shap and y_shap variable from training data 

In [None]:
X_shap = shap.sample(X_train_df, random_state=321)
X_shap.head()


In [None]:
## get the corresponding y-values
y_shap = y_train.loc[X_shap.index]
y_shap 


### Model Explainer

In [None]:
# Create a SHAP explainer for the loaded random forest model
explainer = shap.Explainer(loaded_model)
explainer

#### Calculate the shap values for your model

In [None]:
## Getting the shap values
shap_values = explainer(X_shap,y_shap)
type(shap_values)


In [None]:
X_shap.shape

In [None]:
shap_values.shape


#### Create a summary plot - with plot_type='bar'

In [None]:
shap.summary_plot(shap_values, features=X_shap, plot_type='bar')


In [None]:
# Placeholder: display your saved feature importance image here

Are they the same features in both? If not, what's different?
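
To answer this question numerically, it can help to rank the features both ways. The bar summary plot orders features by their mean absolute SHAP value, so the same ranking can be computed directly and placed next to the model's built-in importances. The sketch below uses small made-up arrays and hypothetical feature names. In the notebook, the inputs would presumably be `shap_values.values` and `loaded_model.feature_importances_`.

```python
import numpy as np
import pandas as pd

feature_names = ["feat_a", "feat_b", "feat_c"]  # hypothetical feature names

# Made-up SHAP matrix: one row per sample, one column per feature.
shap_matrix = np.array([
    [0.5, -0.1, 0.0],
    [-0.3, 0.2, 0.1],
    [0.4, -0.3, -0.1],
])
# Made-up built-in (impurity-based) importances for the same features.
builtin_importance = pd.Series([0.2, 0.7, 0.1], index=feature_names)

# Mean absolute SHAP value per feature: the quantity the bar plot displays.
mean_abs_shap = pd.Series(np.abs(shap_matrix).mean(axis=0), index=feature_names)

comparison = pd.DataFrame({
    "mean_abs_shap": mean_abs_shap,
    "builtin_importance": builtin_importance,
})
comparison["shap_rank"] = comparison["mean_abs_shap"].rank(ascending=False).astype(int)
comparison["builtin_rank"] = comparison["builtin_importance"].rank(ascending=False).astype(int)

print(comparison.sort_values("shap_rank"))
```

Features whose two ranks differ are the ones worth discussing in the Markdown answer.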

#### Create a second summary plot - with plot_type='dot'

In [None]:
shap.summary_plot(shap_values, features=X_shap, plot_type='dot')


Interpret the top 3 most important features and how they influence your model's predictions

# Local Explanations

Continue working in your model explanation notebook from the previous core assignment. Add a new "Local Explanations" header at the bottom and continue your work:

* Select at least 2 example rows based on the insights gained from your previous core assignments this week.

    * Explain why you selected the examples that you did.

    * If you're having trouble thinking of which type of examples to select, try selecting a store that had low sales (one of the lowest values for your target) and one with high sales (the highest values for your target).

* For each example, produce:

    * A LIME tabular explanation

        * Interpret what features most heavily influenced the predictions, according to LIME.

        * Save your figure as a .png file inside your repository (you will need this for the final piece of this assignment - Update Your README). Note: You will need to take a screenshot to save the LIME explanation.

    * An individual force plot

        * Interpret what features most heavily influenced the predictions, according to SHAP.

        * Save your figure as a .png file inside your repository (you will need this for the final piece of this assignment - Update Your README). Note: You will need to take a screenshot to save the individual force plot.