# Understanding the automated ML generated model using model explainability 
In this notebook, you will retrieve the best model from the automated machine learning experiment you performed previously. Then you will use the model interpretability features of the Azure Machine Learning Python SDK to identify which features had the most impact on the prediction.

**Please be sure you have completed Exercise 1 before continuing**

Begin by running the following cell to ensure your environment has the required modules installed and updated.

In [None]:
!pip install --upgrade pip
!pip install --upgrade azureml-sdk[notebooks,explain,automl,contrib]
!pip install scikit-learn==0.20.3
!pip install -U scikit-image
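
After the install cell finishes, it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is an assumption made for this illustration and is not part of the Azure Machine Learning SDK.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Packages installed by the cell above; adjust the list to match your environment.
for name in ["azureml-sdk", "scikit-learn", "scikit-image"]:
    version = installed_version(name)
    print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a package is reported as missing, re-run the install cell and restart the notebook kernel before continuing.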

Next, run the following cell to import all the modules used in this notebook.

In [None]:
import os
import numpy as np
import pandas as pd
import logging
import pickle

import azureml
from azureml.core import Workspace
from azureml.core.model import Model
from azureml.core.run import Run
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
from azureml.train.automl.run import AutoMLRun
from azureml.train.automl.automlexplainer import explain_model

# Verify AML SDK Installed
# view version history at https://pypi.org/project/azureml-sdk/#history 
print("SDK Version:", azureml.core.VERSION)

import sklearn

sklearn_version = sklearn.__version__
print('The scikit-learn version is {}.'.format(sklearn_version))

### Configure access to your Azure Machine Learning Workspace
To begin, you will need to provide the following information about your Azure Subscription.

**If you are using your own Azure subscription, please provide names for subscription_id, resource_group, workspace_name and workspace_region to use.** Note that the workspace needs to be of type [Machine Learning Workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/setup-create-workspace).

**If an environment is provided to you, be sure to replace XXXXX in the values below with your unique identifier.**

In the following cell, be sure to set the values for `subscription_id`, `resource_group`, `workspace_name` and `workspace_region` as directed by the comments (*these values can be acquired from the Azure Portal*).

To get these values, do the following:
1. Navigate to the Azure Portal and log in with the credentials provided.
2. From the left-hand menu, under Favorites, select `Resource Groups`.
3. In the list, select the resource group with a name similar to `XXXXX`.
4. From the Overview tab, capture the desired values.

Execute the following cell by selecting the `>|Run` button in the command bar above.

In [None]:
#Provide the Subscription ID of your existing Azure subscription
subscription_id = "" # <- needs to be the subscription with the resource group

#Provide values for the existing Resource Group 
resource_group = "tech-immersion-onnx-xxxxx" # <- replace XXXXX with your unique identifier

#Provide the Workspace Name and Azure Region of the Azure Machine Learning Workspace
workspace_name = "gpu-tech-immersion-aml-xxxxx" # <- replace XXXXX with your unique identifier (should be lowercase)
workspace_region = "eastus" # <- region of your resource group

#Provide the name of the Experiment you used with Automated Machine Learning
experiment_name = 'automl-regression'

# the train data is available here
train_data_url = ('https://quickstartsws9073123377.blob.core.windows.net/'
                  'azureml-blobstore-0d1c4218-a5f9-418b-bf55-902b65277b85/'
                  'training-formatted.csv')

# this is the URL to the CSV file containing a small set of test data
test_data_url = ('https://quickstartsws9073123377.blob.core.windows.net/'
                  'azureml-blobstore-0d1c4218-a5f9-418b-bf55-902b65277b85/'
                  'fleet-formatted.csv')
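
Before connecting, it is easy to overlook a value that is still empty or still contains the `xxxxx` placeholder. The sketch below shows one way to catch that early. The function `find_unset_settings` is a hypothetical helper written for this illustration, and the dictionary simply mirrors the template values from the cell above.

```python
def find_unset_settings(settings):
    """Return the names of settings that are empty or still contain the xxxxx placeholder."""
    problems = []
    for name, value in settings.items():
        if not value.strip() or "xxxxx" in value.lower():
            problems.append(name)
    return problems


# Template values copied from the configuration cell; in the notebook you would
# pass the variables you just set instead of these literals.
example_settings = {
    "subscription_id": "",
    "resource_group": "tech-immersion-onnx-xxxxx",
    "workspace_name": "gpu-tech-immersion-aml-xxxxx",
    "workspace_region": "eastus",
}
print(find_unset_settings(example_settings))
```

An empty list means every value has been filled in; any names that are printed still need attention.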


## Connect to the Azure Machine Learning Workspace

Run the following cell to connect to the Azure Machine Learning **Workspace**.

**Important Note**: You will be prompted to log in by the text that is output below the cell. Navigate to the URL displayed and enter the code that is provided. Once you have entered the code, return to this notebook and wait for the output to read `Workspace Provisioning complete.`

In [None]:
# With exist_ok=True, if the workspace already exists we get a reference to the existing workspace
ws = Workspace.create(
    name = workspace_name,
    subscription_id = subscription_id,
    resource_group = resource_group, 
    location = workspace_region,
    exist_ok = True)

print("Workspace Provisioning complete.")

# Get the best model trained with automated machine learning

Retrieve the Run from the Experiment and then get the underlying AutoMLRun to get at the best model and child run objects:

In [None]:
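from azureml.train.automl.run import AutoMLRun

# Look up the experiment by name and take the first run returned for it
existing_experiment = Experiment(ws, experiment_name)
run = list(Run.list(existing_experiment))[0]

# Wrap the run as an AutoMLRun to access the best model and child runs
automl_run = AutoMLRun(existing_experiment, run.id)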
automl_run

Retrieve the best run and best model from the automated machine learning run by executing the following cell:

In [None]:
import azureml.automl
best_run, best_model = automl_run.get_output()

## Load the train and test data

Model interpretability works by passing training and test data through the trained model and evaluating how much each feature value influenced the resulting predictions.
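
To build intuition for this idea, here is a minimal sketch of permutation importance, one common model-agnostic technique: shuffle one feature at a time and measure how much the prediction error grows. This is an illustration only and is not necessarily the method the SDK uses internally. The toy model, data, and function names are all assumptions made for this example.

```python
import random


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return the increase in error after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(targets, [predict(r) for r in rows])
    importances = []
    for j in range(n_features):
        # Shuffle only column j, leaving every other column untouched
        shuffled_col = [r[j] for r in rows]
        rng.shuffle(shuffled_col)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, shuffled_col)]
        error = mean_squared_error(targets, [predict(r) for r in permuted])
        importances.append(error - baseline)
    return importances


# Toy stand-in for a trained model: it depends on feature 0 and ignores feature 1
def toy_model(row):
    return 3.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [toy_model(r) for r in rows]
importances = permutation_importance(toy_model, rows, targets, n_features=2)
print(importances)
```

Shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling the feature the model relies on increases the error.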

Load the training and test data by running the following cell.

In [None]:
# load the original training data
train_data = pd.read_csv(train_data_url)
# column 0 holds the target; columns 1 through 73 hold the features
X_train = train_data.iloc[:, 1:74]
y_train = train_data.iloc[:, 0].values.flatten()

# load some test vehicle data that the model has not seen
X_test = pd.read_csv(test_data_url)
X_test = X_test.drop(columns=["Car_ID", "Battery_Age"])
X_test.rename(columns={'Twelve_hourly_temperature_forecast_for_next_31_days_reversed': 'Twelve_hourly_temperature_history_for_last_31_days_before_death_last_recording_first'}, inplace=True)
X_test
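
Because the test data is reshaped by dropping and renaming columns, a quick comparison of column names against the training data can catch mismatches before they surface as errors later. The sketch below uses a hypothetical helper, `compare_columns`. Apart from `Battery_Rated_Cycles` and `Car_ID`, the column names in the example are made up for illustration.

```python
def compare_columns(train_columns, test_columns):
    """Return (missing_from_test, unexpected_in_test), preserving input order."""
    train_set, test_set = set(train_columns), set(test_columns)
    missing = [c for c in train_columns if c not in test_set]
    unexpected = [c for c in test_columns if c not in train_set]
    return missing, unexpected


# Illustrative column names; in the notebook you would pass
# list(X_train.columns) and list(X_test.columns) instead.
train_cols = ["Province", "Battery_Rated_Cycles", "Trip_Length_Mean"]
test_cols = ["Province", "Battery_Rated_Cycles", "Car_ID"]
missing, unexpected = compare_columns(train_cols, test_cols)
print("Missing from test:", missing)
print("Unexpected in test:", unexpected)
```

Two empty lists indicate that the test data has exactly the columns the model was trained on.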

# Model Explainability

For automated machine learning models, you can use the `explain_model` method to examine the features that were most impactful to the model.

Run the following cell to perform the evaluation.

In [None]:
from azureml.train.automl.automlexplainer import explain_model

_, _, sorted_global_importance_values, sorted_global_importance_names, _ , _ = explain_model(
    best_model, 
    X_train, 
    X_test, 
    best_run=best_run, 
    y_train=y_train)

#Overall feature importance
feature_importance = dict(zip(sorted_global_importance_names, sorted_global_importance_values))
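
Once the importances are in a dictionary, picking out the strongest features takes one sort. The following sketch assumes a name-to-value dictionary shaped like `feature_importance`. The helper `top_features` and the example values are hypothetical and are not results from the experiment.

```python
def top_features(importance_by_name, n=5):
    """Return the n (name, importance) pairs with the largest absolute importance."""
    ranked = sorted(importance_by_name.items(), key=lambda item: abs(item[1]), reverse=True)
    return ranked[:n]


# Made-up values for illustration only
example_importance = {"feature_a": 0.12, "feature_b": 0.47, "feature_c": 0.03}
for name, value in top_features(example_importance, n=2):
    print(f"{name}: {value:.2f}")
```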

Run the following cell to render the feature importance using a Pandas DataFrame. Which feature had the greatest importance globally on the model?

In [None]:
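# Name the columns explicitly; forcing dtype=float would fail on the string feature names in recent pandas versions
features_df = pd.DataFrame(
    list(zip(sorted_global_importance_names, sorted_global_importance_values)),
    columns=['feature', 'importance'])
pd.options.display.float_format = '{:.20g}'.format
features_df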

Did the results surprise you? The `Battery_Rated_Cycles_CharGramCountVec_200` feature has the greatest impact on the `Survival_In_Days` prediction. This feature was not in the original data; it is an engineered feature that automated machine learning derived from the `Battery_Rated_Cycles` feature.
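
Since engineered features carry the name of the column they were derived from as a prefix, one way to relate importances back to the original data is to sum them per source column. The sketch below assumes that prefix naming pattern holds for every engineered feature, which may not be true in general. The helper `group_by_source_column`, the `Province` column, the second and third feature names, and all importance values are made up for illustration.

```python
from collections import defaultdict


def group_by_source_column(importance_by_feature, source_columns):
    """Sum importances per source column, matching on the longest name prefix."""
    # Check longer column names first so the most specific prefix wins
    ordered = sorted(source_columns, key=len, reverse=True)
    totals = defaultdict(float)
    for feature, value in importance_by_feature.items():
        source = next((c for c in ordered if feature.startswith(c)), "(unmatched)")
        totals[source] += value
    return dict(totals)


# Hypothetical importances; only the first feature name appears in the results above
example = {
    "Battery_Rated_Cycles_CharGramCountVec_200": 0.5,
    "Battery_Rated_Cycles_CharGramCountVec_300": 0.25,
    "Province_CharGramCountVec_ontario": 0.125,
}
grouped = group_by_source_column(example, ["Battery_Rated_Cycles", "Province"])
print(grouped)
```

In the notebook, you could pass `feature_importance` and `list(X_train.columns)` to get a per-column view of the same results.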