# Boston Housing Price Prediction with scikit-learn (run model explainer locally)


Copyright (c) Microsoft Corporation. All rights reserved.

Licensed under the MIT License.

Explain a model with the Azure Machine Learning (AML) explain-model package:

1. Train a GradientBoosting regression model using scikit-learn.
2. Run `explain_model` with the full dataset in local mode, which doesn't contact any Azure services.
3. Run `explain_model` with a summarized dataset in local mode, which doesn't contact any Azure services.
4. Visualize the global and local explanations with the visualization dashboard.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingRegressor
from azureml.explain.model.tabular_explainer import TabularExplainer
```

# 1. Run model explainer locally with full data

## Load the Boston house price data

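```python
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2.
boston_data = datasets.load_boston()
```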

```python
# Split data into train and test
from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(
    boston_data.data, boston_data.target, test_size=0.2, random_state=0)
```
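`train_test_split` with `test_size=0.2` shuffles the rows and holds out 20% of them. The Boston data has 506 rows and scikit-learn rounds a fractional test size up, so the split above yields 102 test rows and 404 training rows. The stdlib sketch below illustrates the idea; `shuffle_split` is a hypothetical helper of our own, and it will not select the same rows that `random_state=0` selects in scikit-learn.

```python
import math
import random


def shuffle_split(n_samples, test_size=0.2, seed=0):
    """Hypothetical helper: return (train_idx, test_idx) for a shuffled hold-out split.

    The test count is rounded up, as scikit-learn does for a float test_size,
    but the permutation differs from train_test_split(random_state=0).
    """
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a given seed
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = shuffle_split(506)  # the Boston data has 506 rows
print(len(train_idx), len(test_idx))      # 404 102
```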

## Train the GradientBoosting regression model to explain

```python
reg = GradientBoostingRegressor(n_estimators=100, max_depth=4,
                                learning_rate=0.1, loss='huber',
                                random_state=1)
model = reg.fit(x_train, y_train)
```
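The `loss='huber'` setting trains on the Huber loss, which is quadratic for small residuals and linear for large ones, so outlying house prices pull the fit less than they would under squared error. The sketch below computes the loss for one residual with a fixed threshold `delta`; `huber_loss` is our own illustrative helper. Note that scikit-learn does not take `delta` directly: it derives the threshold from the `alpha` quantile (default 0.9) of the absolute residuals, so the fixed `delta` here is a simplification.

```python
def huber_loss(residual, delta):
    """Illustrative Huber loss for a single residual (y_true - y_pred)."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r            # quadratic near zero, like squared error
    return delta * (r - 0.5 * delta)  # linear in the tails, like absolute error


# With delta = 1.0 a small residual is squared, a large one grows only linearly.
print(huber_loss(0.5, 1.0))   # 0.125
print(huber_loss(3.0, 1.0))   # 2.5
print(huber_loss(-3.0, 1.0))  # 2.5
```

Both branches equal `0.5 * delta**2` at `|r| == delta`, so the loss is continuous where it switches from quadratic to linear.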

## Explain predictions on your local machine

```python
# x_train serves as the explainer's initialization (background) examples
tabular_explainer = TabularExplainer(model, x_train, features=boston_data.feature_names)
```
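The importance values the explainer reports are SHAP values: Shapley values that split the gap between a prediction and a baseline among the features. As we understand it, `TabularExplainer` picks an efficient SHAP explainer suited to the model type and uses `x_train` as background data. The brute-force sketch below instead enumerates every feature subset and fills "absent" features from a single baseline row, which is only feasible for a handful of features. `exact_shapley` and `toy_model` are hypothetical, for intuition only.

```python
from itertools import combinations
from math import factorial


def exact_shapley(f, x, baseline):
    """Exact Shapley values of f at x, taking 'absent' features from baseline.

    Enumerates every feature subset, so the cost grows as 2**n model calls
    and it is only practical for a handful of features.
    """
    n = len(x)

    def value(present):
        # Features in `present` take their value from x, the rest from baseline.
        z = [x[i] if i in present else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # size of the coalition that excludes feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                # Weighted marginal contribution of feature i to coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model: a linear term plus an interaction.
    return 2.0 * z[0] + z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)
print([round(p, 6) for p in phi])                              # [2.0, 3.0, 3.0]
print(round(sum(phi), 6), toy_model(x) - toy_model(baseline))  # 8.0 8.0
```

The linear term's credit goes entirely to feature 0, the interaction `z1 * z2` is split evenly between features 1 and 2, and the values add up to `f(x) - f(baseline)`.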

## Explain overall model predictions (global explanation)

```python
# Pass the test set as the evaluation examples; it must be a representative sample of the original data.
# x_train can be passed instead: more examples take longer to explain but may give more accurate explanations.
global_explanation = tabular_explainer.explain_global(x_test)
```

```python
# Global feature importance values, sorted in descending order
print('ranked global importance values: {}'.format(global_explanation.get_ranked_global_values()))
# Corresponding feature names
print('ranked global importance names: {}'.format(global_explanation.get_ranked_global_names()))
# Feature indices (in the original column order), sorted from most to least important
print('global importance rank: {}'.format(global_explanation.global_importance_rank))
```

```
ranked global importance values: [4.000625030753864, 2.2454269211199662, 0.725392776085317, 0.5820686629847459, 0.49708589474799253, 0.4894864830755951, 0.42491567766510796, 0.2833330909696509, 0.1984782050667422, 0.10556731969756282, 0.07271560150286586, 0.007669043211705519, 0.005843200868282472]
ranked global importance names: ['LSTAT', 'RM', 'PTRATIO', 'DIS', 'NOX', 'AGE', 'CRIM', 'B', 'TAX', 'INDUS', 'RAD', 'CHAS', 'ZN']
global importance rank: [12, 5, 10, 7, 4, 6, 0, 11, 9, 2, 8, 3, 1]
```
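`global_importance_rank` is not a per-feature rank: it lists feature indices, in the original column order, sorted from most to least important. Indexing the Boston feature names (in the dataset's standard column order) with it reproduces the ranked names printed above. `position` is an illustrative inverse mapping from each feature to its 0-based place in the ranking.

```python
# Boston feature names in the dataset's original column order
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

# global_importance_rank as printed above
global_importance_rank = [12, 5, 10, 7, 4, 6, 0, 11, 9, 2, 8, 3, 1]

# Indexing the names with the rank list gives the ranked names
ranked_names = [feature_names[i] for i in global_importance_rank]
print(ranked_names)

# Inverse mapping: each feature's 0-based place in the ranking
position = {feature_names[i]: pos for pos, i in enumerate(global_importance_rank)}
print(position['RM'], position['ZN'])  # 1 12
```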


```python
dict(zip(global_explanation.get_ranked_global_names(), global_explanation.get_ranked_global_values()))
```

```
{'LSTAT': 4.000625030753864,
 'RM': 2.2454269211199662,
 'PTRATIO': 0.725392776085317,
 'DIS': 0.5820686629847459,
 'NOX': 0.49708589474799253,
 'AGE': 0.4894864830755951,
 'CRIM': 0.42491567766510796,
 'B': 0.2833330909696509,
 'TAX': 0.1984782050667422,
 'INDUS': 0.10556731969756282,
 'RAD': 0.07271560150286586,
 'CHAS': 0.007669043211705519,
 'ZN': 0.005843200868282472}
```

## Explain overall model predictions as a collection of local (instance-level) explanations

```python
# SHAP values for every feature of every evaluation example (here, the test data passed to explain_global)
print('local importance values: {}'.format(global_explanation.local_importance_values))
```

```
local importance values: [[0.18540432609924065, 0.0025033409901205634, 0.01531443648227016, -0.0042534014715787894, 0.29177343110177906, -2.6766490756405856, 0.8691050649979252, -0.03998891355866946, -0.018220405099587056, -0.32306094983711225, 0.3680701074367765, 0.11782940664408609, 2.8610394446879344], [0.14685259350604338, 0.002001254159749751, 0.16184182370028902, -0.0042534014715787894, 0.33973732932505324, -2.5271584418816153, 0.014464724445422832, 1.5466929703211865, 0.17781694408111168, -0.11465066383628707, -0.7198599351713774, 0.11419352306954823, 3.1669641304050997], [0.27706254080305526, 0.01544877397281254, 0.1162401796301337, 0.06000291374004979, 0.33524477074810255, -0.9544234151217024, -0.09820381169302618, 0.6608538709715304, 0.03477079076853831, 0.06062171492227679, 1.3368684523974086, 0.06655430964136003, -1.68928765019839], [-1.1677645394421923, 0.0019294516649112994, 0.06694828396304085, -0.0042534014715787894, -1.2958987706820522, -1.1812380372800844, 0.086972026
```
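Each row of `local_importance_values` holds one evaluation example's SHAP values in the original feature order. A common way to summarize such rows into a global importance, and what we assume the package does here, is to average the absolute values per feature, since signed contributions from different houses would otherwise cancel out. The sketch below applies that to a small hypothetical matrix (not data from this notebook); `mean_abs_importance` is our own helper.

```python
def mean_abs_importance(local_values):
    """Average |SHAP value| per feature over all explained rows."""
    n_rows = len(local_values)
    n_features = len(local_values[0])
    return [sum(abs(row[j]) for row in local_values) / n_rows
            for j in range(n_features)]


# Hypothetical local importance values: 3 rows x 2 features.
local_values = [[ 1.0, -0.5],
                [-3.0,  0.5],
                [ 2.0,  0.0]]
global_values = mean_abs_importance(local_values)
print(global_values)  # [2.0, 0.3333333333333333]

# Feature indices sorted from most to least important
rank = sorted(range(len(global_values)), key=lambda j: global_values[j], reverse=True)
print(rank)  # [0, 1]
```

Without the absolute value, feature 1's contributions would sum to zero and it would look irrelevant even though it moves individual predictions.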




## Explain local data points (individual instances)

```python
local_explanation = tabular_explainer.explain_local(x_test[0, :])
```

```python
# Local importance names and values for this instance, sorted from the largest
# (most positive) to the smallest (most negative) value
sorted_local_importance_names = local_explanation.get_ranked_local_names()
sorted_local_importance_values = local_explanation.get_ranked_local_values()

print('sorted local importance names: {}'.format(sorted_local_importance_names))
print('sorted local importance values: {}'.format(sorted_local_importance_values))
```

```
sorted local importance names: ['LSTAT', 'AGE', 'PTRATIO', 'NOX', 'CRIM', 'B', 'INDUS', 'ZN', 'CHAS', 'RAD', 'DIS', 'TAX', 'RM']
sorted local importance values: [2.8610394446879344, 0.8691050649979252, 0.3680701074367765, 0.29177343110177906, 0.18540432609924065, 0.11782940664408609, 0.01531443648227016, 0.0025033409901205634, -0.0042534014715787894, -0.018220405099587056, -0.03998891355866946, -0.32306094983711225, -2.6766490756405856]
```
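The first row of `local_importance_values` printed earlier holds the same values as this local explanation of `x_test[0]`, in the original feature order. Sorting that row by signed value, largest first, reproduces the ranked names above. Assuming SHAP's additivity property holds for this explainer, the values also sum to the difference between the model's prediction for this house and the explainer's expected (base) value, which this notebook does not print.

```python
feature_names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE',
                 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']

# Local importance values for x_test[0] in the original feature order
# (the first row of local_importance_values printed earlier).
row = [0.18540432609924065, 0.0025033409901205634, 0.01531443648227016,
       -0.0042534014715787894, 0.29177343110177906, -2.6766490756405856,
       0.8691050649979252, -0.03998891355866946, -0.018220405099587056,
       -0.32306094983711225, 0.3680701074367765, 0.11782940664408609,
       2.8610394446879344]

# Sort by signed value, largest first: positive values push this prediction
# up, negative values pull it down.
order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
ranked = [feature_names[j] for j in order]
print(ranked)

# Net shift of the prediction away from the explainer's base value
print(round(sum(row), 3))  # 1.649
```

Here LSTAT raises the predicted price by about 2.86 while RM lowers it by about 2.68, leaving a net shift of about +1.65 relative to the base value.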


## Load visualization dashboard

```python
# Note: the extensions must be installed and enabled before the Jupyter kernel starts
!jupyter nbextension install --py --sys-prefix azureml.contrib.explain.model.visualize
!jupyter nbextension enable --py --sys-prefix azureml.contrib.explain.model.visualize
# Or, in JupyterLab, uncomment the two lines below instead
# !jupyter labextension install @jupyter-widgets/jupyterlab-manager
# !jupyter labextension install microsoft-mli-widget
```

```
Installing C:\Anaconda\envs\AzureML\lib\site-packages\azureml\contrib\explain\model\visualize\static -> microsoft-mli-widget
Up to date: C:\Anaconda\envs\AzureML\share\jupyter\nbextensions\microsoft-mli-widget\extension.js
Up to date: C:\Anaconda\envs\AzureML\share\jupyter\nbextensions\microsoft-mli-widget\extension.js.map
Up to date: C:\Anaconda\envs\AzureML\share\jupyter\nbextensions\microsoft-mli-widget\index.js
Up to date: C:\Anaconda\envs\AzureML\share\jupyter\nbextensions\microsoft-mli-widget\index.js.map
- Validating: ok

    To initialize this nbextension in the browser every time the notebook (or other app) loads:

          jupyter nbextension enable azureml.contrib.explain.model.visualize --py --sys-prefix

Enabling notebook extension microsoft-mli-widget/extension...
      - Validating: ok
```


```python
from azureml.contrib.explain.model.visualize import ExplanationDashboard
```

```python
ExplanationDashboard(global_explanation, model, x_test)
```

```
ExplanationWidget(value={'localExplanations': [[0.18540432609924065, 0.0025033409901205634, 0.0153144364822701…

<azureml.contrib.explain.model.visualize.ExplanationDashboard.ExplanationDashboard at 0x20856cf3908>
```