# Interpretability With TensorFlow 2.0 On Azure Machine Learning Service

## Overview of Tutorial
This notebook is Part 4 (Explaining Your Model Using Interpretability) of a four-part workshop that demonstrates an end-to-end workflow for using TensorFlow 2.0 on Azure Machine Learning Service. The workshop consists of the following parts:

- Part 1: [Preparing Data and Model Training](https://github.com/microsoft/bert-stack-overflow/blob/master/1-Training/AzureServiceClassifier_Training.ipynb)
- Part 2: [Inferencing and Deploying a Model](https://github.com/microsoft/bert-stack-overflow/blob/master/2-Inferencing/AzureServiceClassifier_Inferencing.ipynb)
- Part 3: [Setting Up a Pipeline Using MLOps](https://github.com/microsoft/bert-stack-overflow/tree/master/3-ML-Ops)
- Part 4: [Explaining Your Model Using Interpretability](https://github.com/microsoft/bert-stack-overflow/blob/master/4-Interpretibility/IBMEmployeeAttritionClassifier_Interpretability.ipynb)

**In this specific tutorial, we will cover the following topics:**

- TODO
- TODO

## What is Azure Machine Learning Service?
Azure Machine Learning service is a cloud service that you can use to develop and deploy machine learning models. Using Azure Machine Learning service, you can track your models as you build, train, deploy, and manage them, all at the broad scale that the cloud provides.
![](./images/aml-overview.png)


## What Is Machine Learning Interpretability?
Interpretability is the ability to explain why your model made the predictions it did. The Azure Machine Learning service offers various interpretability features to help accomplish this task. These features include:

- Feature importance values for both raw and engineered features.
- Interpretability on real-world datasets at scale, during training and inference.
- Interactive visualizations to aid you in the discovery of patterns in data and explanations at training time.

Interpreting your model accurately allows you to:

- Use the insights to debug your model.
- Validate that the model's behavior matches its objectives.
- Check for bias in the model.
- Build trust with your customers and stakeholders.
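
To make "feature importance" concrete, the following is a minimal sketch of permutation feature importance, a common model-agnostic technique: shuffle one feature column at a time and measure how much the model's accuracy drops. It is a generic illustration under stated assumptions, not the Azure Machine Learning implementation; the names `accuracy`, `permutation_importance`, and `toy_model`, as well as the toy data, are hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the correct label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(rows)


def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Average drop in accuracy when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for feature in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            # Shuffle a copy of one column, leaving all other columns intact
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            shuffled = [
                row[:feature] + [value] + row[feature + 1:]
                for row, value in zip(rows, column)
            ]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical toy data: column 0 determines the label, column 1 is noise
data_rng = random.Random(42)
rows = [[i % 2, data_rng.random()] for i in range(20)]
labels = [row[0] for row in rows]


def toy_model(row):
    """Hypothetical model that only looks at the first feature."""
    return row[0]


importances = permutation_importance(toy_model, rows, labels)
print("Importance per feature:", importances)
```

Because `toy_model` ignores the second column, shuffling that column cannot change any prediction, so its importance is exactly zero, while the first column's importance reflects how much accuracy is lost when it is scrambled.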

![](./images/interpretability-architecture.png)

## Install Azure Machine Learning Python SDK

If you are running this on a Notebook VM, the Azure Machine Learning Python SDK is installed by default. If you are running this locally, you can follow these [instructions](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/install?view=azure-ml-py) to install it using pip.

This tutorial series requires version 1.0.69 or higher. We can import the Python SDK to ensure it has been properly installed:

In [None]:
import azureml.core

print("Azure Machine Learning Python SDK version:", azureml.core.VERSION)
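
If you would rather have the notebook check the minimum version than read it off the printed output, a small comparison helper is one option. This is only a sketch: `parse_version` and `meets_minimum` are hypothetical helpers written for this example, and they compare only the leading numeric components of the version string.

```python
import re

MINIMUM_SDK_VERSION = "1.0.69"  # minimum version stated for this tutorial series


def parse_version(version):
    """Convert a dotted version string such as '1.0.69' into a tuple of ints."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum=MINIMUM_SDK_VERSION):
    """Return True if the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("1.0.69"))
print(meets_minimum("1.0.65"))
```

In the notebook, you would pass `azureml.core.VERSION` as the `installed` argument.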

## Connect To Workspace

Just like in the previous tutorials, we will need to connect to a [workspace](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.workspace(class)?view=azure-ml-py).

The following code creates a workspace if you don't already have one. You must have an Azure subscription to create a workspace:

```python
from azureml.core import Workspace
ws = Workspace.create(name='myworkspace',
                      subscription_id='<azure-subscription-id>',
                      resource_group='myresourcegroup',
                      create_resource_group=True,
                      location='eastus2')
```

**If you are running this on a Notebook VM, you can import the existing workspace.**

In [None]:
from azureml.core import Workspace

workspace = Workspace.from_config()
print('Workspace name: ' + workspace.name, 
      'Azure region: ' + workspace.location, 
      'Subscription id: ' + workspace.subscription_id, 
      'Resource group: ' + workspace.resource_group, sep = '\n')

> **Note:** The above command reads a `config.json` file that exists by default on the Notebook VM. If you are running this locally or want to use a different workspace, you must add a config file to your project directory. The config file should have the following schema:

```
    {
        "subscription_id": "<SUBSCRIPTION-ID>",
        "resource_group": "<RESOURCE-GROUP>",
        "workspace_name": "<WORKSPACE-NAME>"
    }
```
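
As a convenience, the config file can also be generated and sanity-checked from Python. The sketch below uses only the standard library; `write_workspace_config` and `missing_config_keys` are hypothetical helpers for this example, and the placeholder values must be replaced with your own. It writes into a temporary directory purely for demonstration.

```python
import json
import tempfile
from pathlib import Path

# Keys taken from the schema shown above
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def write_workspace_config(directory, subscription_id, resource_group, workspace_name):
    """Write a config.json with the three fields from the schema above."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    path = Path(directory) / "config.json"
    path.write_text(json.dumps(config, indent=4))
    return path


def missing_config_keys(path):
    """Return the required keys that are absent or empty in the config file."""
    config = json.loads(Path(path).read_text())
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Demonstration in a temporary directory; use your project directory in practice
with tempfile.TemporaryDirectory() as project_dir:
    config_path = write_workspace_config(
        project_dir, "<SUBSCRIPTION-ID>", "<RESOURCE-GROUP>", "<WORKSPACE-NAME>"
    )
    print(config_path.read_text())
    print("Missing keys:", missing_config_keys(config_path))
```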

## Interpretability In Training
We will start by showing how to interpret our model during training. For this tutorial, we will use TensorFlow 2.0 to train a basic feed-forward neural network on the IBM Employee Attrition Dataset.
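
For intuition about what such a network computes, here is a plain-Python sketch of a forward pass through one hidden layer of 16 ReLU units followed by a single sigmoid output, the same shape as the network defined in the training script in this section. The weights are random and untrained, the four input features are made up, and helper names such as `dense`, `init_layer`, and `predict` are hypothetical; TensorFlow performs the equivalent computation with optimized tensor operations.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """Apply one fully connected layer: activation(w . x + b) for each unit."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs


def init_layer(n_inputs, n_units, rng):
    """Create small random weights and zero biases for one layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)] for _ in range(n_units)]
    biases = [0.0] * n_units
    return weights, biases


def predict(features, hidden, output):
    """Forward pass: hidden ReLU layer, then a single sigmoid output unit."""
    hidden_activations = dense(features, hidden[0], hidden[1], relu)
    return dense(hidden_activations, output[0], output[1], sigmoid)[0]


rng = random.Random(0)
n_features = 4  # made-up feature count for illustration
hidden = init_layer(n_features, 16, rng)
output = init_layer(16, 1, rng)

probability = predict([0.2, 1.0, 0.0, 3.5], hidden, output)
print(f"Predicted probability (untrained weights): {probability:.3f}")
```

The sigmoid output is always between 0 and 1, which is why it can be read as the predicted probability of attrition once the weights have been trained.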

**Write this script into a project directory**

In [4]:
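import os

project_folder = 'ibm-attrition-classifier'
# %%writefile does not create missing directories, so create the folder first
os.makedirs(project_folder, exist_ok=True)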

In [8]:
%%writefile $project_folder/train.py
import logging
import pandas as pd
import tensorflow as tf
from absl import flags
from sklearn.model_selection import train_test_split

# Ignore warnings in logs
logging.getLogger("transformers.tokenization_utils").setLevel(logging.ERROR)

def preprocess_data(data):
    # Work on a copy of the DataFrame passed in, instead of re-reading the CSV
    data = data.copy()

    # replace binary labels with 1's and 0's
    binary_data = {
        'Gender': ['Male', 'Female'],
        'Over18': ['N', 'Y'],
        'OverTime': ['No', 'Yes'],
        'Attrition': ['No', 'Yes']
    }
    for k, v in binary_data.items():
        # Assign the result back; in-place replace on a selected column is unreliable
        data[k] = data[k].replace(v, [0, 1])

    # Make column labeling consistent, so that 1 indicates True
    data.rename(columns={'Gender': 'IsFemale'}, inplace = True)

    # one-hot encode categorical data
    one_hot_cols = ['BusinessTravel', 'Department', 'EducationField', 'JobRole', 'MaritalStatus']
    for col_name in one_hot_cols:
        data = pd.concat([data, pd.get_dummies(data[col_name], drop_first=True)], axis=1)
        data.drop([col_name], axis=1, inplace=True)
        
    # Split data
    train, test = train_test_split(data, test_size=0.1)
    train_y = train.pop('Attrition')
    test_y = test.pop('Attrition')
    
    return train, test, train_y, test_y

# Load data
raw_data = pd.read_csv("data/emp_attrition.csv")
train_x, test_x, train_y, test_y = preprocess_data(raw_data)

# Train model
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(units=16, activation='relu', input_shape=(len(train_x.columns),)))
model.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Train neural network
model.fit(train_x, train_y, epochs=3, verbose=1, batch_size=128, validation_data=(test_x, test_y))

# Save model
model.save('ibm-attrition-classifier/model.h5')

Train on 1323 samples, validate on 147 samples
Epoch 1/3
Epoch 2/3
Epoch 3/3


**Run training script**

In [6]:
!python $project_folder/train.py

Traceback (most recent call last):
  File "ibm-attrition-classifier/train.py", line 2, in <module>
    import pandas as pd
ImportError: No module named pandas
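
The traceback above shows that the interpreter launched by `!python` could not find `pandas`. One way to diagnose this before running the script is to check which of the script's imports are resolvable. The sketch below uses only the standard library; `find_missing_modules` is a hypothetical helper, and the module list is taken from the imports in `train.py`.

```python
import importlib.util
import sys

# Top-level modules imported by train.py
REQUIRED_MODULES = ["pandas", "tensorflow", "sklearn", "absl"]


def find_missing_modules(module_names):
    """Return the module names that cannot be found by the current interpreter."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
print("Interpreter:", sys.executable)
print("Missing modules:", missing or "none")
```

Note that this checks the interpreter running the notebook kernel, which is not necessarily the one that `!python` resolves to on your PATH. If the two differ, running the script with `!{sys.executable} $project_folder/train.py` is one way to make sure the kernel's interpreter is used.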


**Load model and perform interpretability**

In [13]:
# TODO:  LOAD MODEL AND EXPLAIN IT
import tensorflow as tf

model = tf.keras.models.load_model('ibm-attrition-classifier/model.h5')

# from azureml.explain.model.tabular_explainer import TabularExplainer
# # "features" and "classes" fields are optional
# # train_x / train_y refer to the preprocessed training split produced in train.py
# explainer = TabularExplainer(model,
#                              train_x)

# # you can use the training data or the test data here
# global_explanation = explainer.explain_global(train_x)

# # if you used the PFIExplainer in the previous step, use the next line of code instead
# # global_explanation = explainer.explain_global(train_x, true_labels=train_y)

# # sorted feature importance values and feature names
# sorted_global_importance_values = global_explanation.get_ranked_global_values()
# sorted_global_importance_names = global_explanation.get_ranked_global_names()
# dict(zip(sorted_global_importance_names, sorted_global_importance_values))

# # alternatively, you can print out a dictionary that holds the top K feature names and values
# global_explanation.get_feature_importance_dict()

Instructions for updating:
Call initializer instance with the dtype argument instead of passing it to the constructor


#### Train and Explain Locally
We will start by training our model locally in the Jupyter Notebook.

#### Train and Explain Remotely
Now we will train our model on the compute target created back in the [first tutorial]().

## Interpretability In Inferencing

## Raw Feature Transformations

## Visualizations