In [1]:
# Copyright 2023 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Explaining text classification with Vertex Explainable AI


## Overview



----------------

Machine learning models are complex systems, and developers require advanced tools to understand and explain (some of) their behavior. Vertex AI, Google Cloud's platform for end-to-end ML, has a specialized offering for model interpretability called Explainable AI. 

[Vertex Explainable AI](https://cloud.google.com/vertex-ai/docs/explainable-ai/overview) provides two different explanation types to better understand model decision making:
- *Feature-based explanations* indicate how much each feature in the model contributed to the predictions for a chosen input instance. 
- *Example-based explanations* provide a list of examples (typically from the training set) that are most similar to the chosen input instance. 

Vertex Explainable AI can be used with a custom-trained model, an AutoML model, or a BigQueryML model. 

**Note:** *a given model can be configured for either feature-based or example-based explanations, not both.*


<hr style="border: 0.3px solid grey; width:40%;"></hr>


In this lab, you use Vertex Explainable AI to get feature-based explanations for a custom-trained model for a sentiment analysis task built on the [IMDB dataset](https://keras.io/api/datasets/imdb/). Given your custom-trained model, you need to:
1. Upload the model to [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) with the desired explanation specs.

2. Deploy the model as a [Vertex AI Endpoint](https://cloud.google.com/vertex-ai/docs/general/deployment).

3. Send prediction requests to the model endpoint, and analyze your explanation results!

**Note:** *this lab extends the publicly-available notebook ["Explaining text classification with Vertex Explainable AI"](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/explainable_ai/xai_text_classification_feature_attributions.ipynb) hosted in the Vertex AI Samples git repo.* 

### Objective

In this lab, you learn how to interpret a model's predictions by:

- Using Vertex AI Model Registry to upload a custom-trained model and configure explanations.
- Using Vertex Explainable AI to get feature-based explanations with the *sampled Shapley method*.
- Visualizing explanations with matplotlib, and gathering some model insights.

## Set-up


----------------------------------

1. Import the libraries you'll be using in this exercise, which include [google-cloud-aiplatform](https://pypi.org/project/google-cloud-aiplatform/) and [TensorFlow Keras](https://www.tensorflow.org/guide/keras).

In [2]:
import os
import warnings
warnings.filterwarnings("ignore")
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import numpy as np
import tensorflow as tf
from google.cloud import aiplatform, storage
from keras.datasets import imdb
from keras.layers import LSTM, Dense, Embedding
from keras.models import Sequential
from keras.utils import pad_sequences

import json

import matplotlib.pyplot as plt
from IPython.display import display, HTML

2. Initialize *aiplatform* with a baseline configuration for your environment.

In [3]:
# These variables define your google project information
# The BUCKET_URI specifies the name of the bucket you'll be using, unique to your project
project_id_list = !gcloud config get-value project 2> /dev/null
PROJECT_ID = project_id_list[0]
REGION = "us-central1"
BUCKET_URI = f"gs://explainable-ai-{PROJECT_ID}"

In [4]:
project_id_list

['qwiklabs-gcp-01-e97fc0ced76b']

In [5]:
# Only if your bucket doesn't already exist: run the following command to create your Cloud Storage bucket.
! gsutil ls $BUCKET_URI > /dev/null || gsutil mb -l $REGION $BUCKET_URI

BucketNotFoundException: 404 gs://explainable-ai-qwiklabs-gcp-01-e97fc0ced76b bucket does not exist.
Creating gs://explainable-ai-qwiklabs-gcp-01-e97fc0ced76b/...


In [6]:
# Initialize the Vertex AI SDK for Python for your project.
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

## Load the IMDB dataset

--------------------

The [IMDB Reviews dataset](https://keras.io/api/datasets/imdb/) is a set of 25,000 movie reviews that have been labeled as either positive or negative. The text of the reviews has been tokenized according to frequency of appearance in the corpus. Word frequency ranks start at 1 and are offset by 3 to make room for the reserved tokens 0 (padding), 1 (start of sequence), and 2 (out of vocabulary), so token "4" represents the most frequently seen word. Our model is going to take these numeric representations of the text and try to learn contextual meaning from them based on how they are used in both positive and negative reviews.

To keep the model simple for this example, you are only going to take the 10,000 most commonly seen words across all reviews. Importantly, the tokens represent exact words. The singular "word" and plural "words" would be two different tokens (678 and 712, respectively). You will also take the final 80 words of the review, or pad the beginning of the review to 80 words if it is shorter.

In [7]:
max_features = 10000
maxlen = 80

(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)

x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz


Let's display a review to see the data representation.

In [8]:
x_train[0]

array([  15,  256,    4,    2,    7, 3766,    5,  723,   36,   71,   43,
        530,  476,   26,  400,  317,   46,    7,    4,    2, 1029,   13,
        104,   88,    4,  381,   15,  297,   98,   32, 2071,   56,   26,
        141,    6,  194, 7486,   18,    4,  226,   22,   21,  134,  476,
         26,  480,    5,  144,   30, 5535,   18,   51,   36,   28,  224,
         92,   25,  104,    4,  226,   65,   16,   38, 1334,   88,   12,
         16,  283,    5,   16, 4472,  113,  103,   32,   15,   16, 5345,
         19,  178,   32], dtype=int32)

These are the 80 integer tokens representing the 80 final words of the review. The token "0" is the padding token. If you had seen that at the beginning, it would have meant the review was shorter than 80 words. The token "2" represents a word that is outside the 10,000-most-common-word vocabulary you established. 
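To make the token conventions concrete, here is a minimal pure-Python sketch of what the vocabulary cap and padding do to a single review. The helper name `cap_and_pad` is hypothetical; it imitates, rather than calls, `imdb.load_data(num_words=...)` and the default left-padding, left-truncating behavior of `pad_sequences`.

```python
def cap_and_pad(tokens, num_words=10000, maxlen=80, oov_token=2, pad_token=0):
    """Hypothetical helper imitating the vocabulary cap and padding steps."""
    # Tokens at or above the vocabulary cap become the out-of-vocabulary token.
    capped = [t if t < num_words else oov_token for t in tokens]
    # Keep only the final `maxlen` tokens of a long review.
    kept = capped[-maxlen:]
    # Left-pad a short review with the padding token.
    return [pad_token] * (maxlen - len(kept)) + kept


# A short review is padded at the beginning; token 12000 is out of vocabulary.
print(cap_and_pad([1, 14, 22, 12000], maxlen=6))   # [0, 0, 1, 14, 22, 2]
# A long review keeps only its final tokens.
print(cap_and_pad(list(range(1, 11)), maxlen=4))   # [7, 8, 9, 10]
```

A small `maxlen` is used here only to keep the printed lists short; the lab itself uses 80.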

## Build a TensorFlow model locally

-----------------------

In this exercise, you train a simple model that classifies movie reviews as positive or negative using the text of the review. The model is kept simple so that it trains quickly; in real life, you would consider prompt design or tuning of a pre-trained LLM.

Here, you build a simple recurrent neural network (RNN) model in TensorFlow to accomplish this.

#### Train the model

You train an LSTM model in TensorFlow on the IMDB sentiment classification task.

**Note**: *model training takes about 5 minutes.*

In [9]:
# Set up deterministic results
tf.keras.utils.set_random_seed(42)
tf.config.experimental.enable_op_determinism()
weight_init = tf.keras.initializers.GlorotNormal(seed=42)
bias_init = tf.keras.initializers.Zeros()

In [10]:
model = Sequential()
model.add(Embedding(max_features, 128, name="embeddings"))
model.add(LSTM(128, dropout=0.2, kernel_initializer=weight_init, bias_initializer=bias_init))
model.add(Dense(1, activation="sigmoid", kernel_initializer=weight_init, bias_initializer=bias_init))

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

model.fit(x_train, y_train, batch_size=32, epochs=2, validation_data=(x_test, y_test))

loss, accuracy = model.evaluate(x_test, y_test, batch_size=32)

print(f"Loss: {loss}  Accuracy: {accuracy}")

Epoch 1/2
Epoch 2/2
Loss: 0.3694600760936737  Accuracy: 0.8402400016784668
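For intuition about the two numbers reported above, the following sketch computes binary cross-entropy and accuracy by hand on a few made-up predictions. The labels and probabilities are hypothetical, and the functions are a plain-Python approximation of the metrics, not the Keras implementation.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy over paired labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the label."""
    hits = sum(1 for y, p in zip(y_true, y_prob) if (p > threshold) == bool(y))
    return hits / len(y_true)


labels = [1, 0, 1, 0]          # hypothetical ground-truth sentiment labels
probs = [0.9, 0.2, 0.4, 0.6]   # hypothetical sigmoid outputs

print(f"Loss: {binary_crossentropy(labels, probs):.4f}")
print(f"Accuracy: {accuracy(labels, probs)}")
```

Confident and correct predictions push the loss toward 0, while a confident wrong prediction is penalized heavily even though it counts as only one miss in the accuracy.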


#### Export the model

Next, you export the model to your Cloud Storage bucket.

In [11]:
MODEL_DIR = f"{BUCKET_URI}/model"

tf.saved_model.save(model, MODEL_DIR)



INFO:tensorflow:Assets written to: gs://explainable-ai-qwiklabs-gcp-01-e97fc0ced76b/model/assets




## Upload the custom-trained model for deployment


-------------

In Vertex AI, models are uploaded to [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) as a `Vertex AI Model` resource.

You can configure and specify the feature-based explanations within the Model itself. Then, simply upload the model to Model Registry using the *aiplatform* library.

### Configure explanation settings

To configure explanations, you need to define an explanation spec that includes:
- [ExplanationParameters()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ExplanationParameters) to configure how the Model's predictions are explained.
- [ExplanationMetadata()](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.ExplanationMetadata) to describe the Model's input and output for explanations.

Learn more at [configuring feature-based explanations](https://cloud.google.com/vertex-ai/docs/explainable-ai/configuring-explanations-feature-based).

#### Configure explanation metadata

You get the signatures of your model's input and output layers by reloading the model into memory, and querying it for the signatures corresponding to each layer. The input layer name of the serving function will be used later when you configure explanation settings.

In [12]:
# Get the model signature
MODEL_DIR = f"{BUCKET_URI}/model"
saved_model = tf.saved_model.load(MODEL_DIR)
serving_input = list(
    saved_model.signatures["serving_default"].structured_input_signature[1].keys()
)[0]
serving_output = list(
    saved_model.signatures["serving_default"].structured_outputs.keys()
)[0]

print("Serving function input:", serving_input)
print("Serving function output:", serving_output)

Serving function input: embeddings_input
Serving function output: dense


In [13]:
# Define the metadata for explanations
INPUT_METADATA = {
    "my_input": aiplatform.explain.ExplanationMetadata.InputMetadata(
        {
            "input_tensor_name": serving_input,
            "encoding": aiplatform.explain.ExplanationMetadata.InputMetadata.Encoding(
                1  # 1 corresponds to Encoding.IDENTITY
            ),
        }
    ),
}

OUTPUT_METADATA = {
    "my_output": aiplatform.explain.ExplanationMetadata.OutputMetadata(
        {"output_tensor_name": serving_output}
    )
}

metadata = aiplatform.explain.ExplanationMetadata(
    inputs=INPUT_METADATA, outputs=OUTPUT_METADATA
)

#### Configure explanation parameters

Explanation parameters are mutually exclusive, which means you can select only one of the following feature-based explanation methods for a Model:
- `sampled_shapley_attribution`
- `integrated_gradients_attribution`
- `xrai_attribution`
    
In this lab, you use [sampled_shapley_attribution](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.SampledShapleyAttribution) with 10 feature permutations used when approximating the Shapley values. 

In [14]:
FEATURE_PARAMETERS = {"sampled_shapley_attribution": {"path_count": 10}}
feature_parameters = aiplatform.explain.ExplanationParameters(FEATURE_PARAMETERS)
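To illustrate what `path_count` controls, here is a minimal sketch of the general sampled Shapley idea: average each feature's marginal contribution over a number of random feature orderings. This is an illustration under stated assumptions, not Vertex AI's implementation; `toy_model`, the all-zero baseline, and the function name `sampled_shapley` are hypothetical.

```python
import random


def sampled_shapley(f, instance, baseline, path_count=10, seed=0):
    """Approximate Shapley values by averaging over random feature orderings."""
    rng = random.Random(seed)
    n = len(instance)
    totals = [0.0] * n
    for _ in range(path_count):
        order = list(range(n))
        rng.shuffle(order)
        current = list(baseline)      # start each path from the baseline
        previous_output = f(current)
        for i in order:
            current[i] = instance[i]  # switch feature i to its real value
            new_output = f(current)
            totals[i] += new_output - previous_output
            previous_output = new_output
    return [t / path_count for t in totals]


def toy_model(x):
    # Hypothetical scoring function with an interaction between features 0 and 2.
    return 2.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
attributions = sampled_shapley(toy_model, instance, baseline, path_count=10)

print(attributions)
# Along every path the contributions telescope, so they sum to
# f(instance) - f(baseline) regardless of the orderings sampled.
print(sum(attributions), toy_model(instance) - toy_model(baseline))
```

More paths reduce the sampling noise on features that interact with others (features 0 and 2 here), at the cost of more model evaluations per explanation.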

### Upload the model to `Model Registry`

The explanation spec and the path to the trained model are used to upload the Model to `Model Registry`. Vertex AI provides [Docker container images](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers) with pre-installed packages that you run as pre-built containers for serving predictions and explanations from trained model artifacts.

**Note:** Uploading the model to Model Registry can take a few minutes, usually 3-5.

In [None]:
uploaded_feature_model = aiplatform.Model.upload(
    display_name="tf_feature_expl",
    artifact_uri=MODEL_DIR,
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest",
    explanation_parameters=feature_parameters,
    explanation_metadata=metadata,
)

Creating Model




Create Model backing LRO: projects/208312552470/locations/us-central1/models/6456893029230837760/operations/2185939101541203968




## Deploy the model for online prediction

Now that the Model has been created in Vertex AI Model Registry, you can deploy it as an endpoint.

You can specify any [compute resources](https://cloud.google.com/vertex-ai/docs/predictions/configure-compute) for your endpoint; here, you specify a small instance since it is a small model.  

**Note:** Deploying the model to an endpoint can take a few minutes, usually 5-10.

In [None]:
feature_endpoint = uploaded_feature_model.deploy(
    deployed_model_display_name="tf_feature_deploy",
    machine_type="e2-standard-2" ,
    accelerator_type=None,
    accelerator_count=0,
)

## Send an online prediction request with explanations

----------------

With the model endpoint created, you can now send an `explain` request. The request takes encoded input text and returns the predictions with explanations.

Because you request feature-based explanations on a text classification model, you get the predicted class along with an explanation that indicates how much each word or token contributes to the classification.

In [None]:
# Utility function to send an online prediction request for a test example at the specified `review` index,
# and extract the prediction with its explanations

def send_explanation_request(endpoint, review=0):
    example = x_test[review].tolist()
    example_label = y_test[review]
    result = endpoint.explain([example]) # Ask Vertex AI to explain
    
    prediction = result.predictions[0][0] # Model's prediction
    explanations = result.explanations[0] # Explanations
    
    return example, example_label, prediction, explanations

In [None]:
# Utility function to decode a sequence of integer tokens back into the original text

index = imdb.get_word_index()
reverse_index = {value: key for (key, value) in index.items()}

def decode_sentence(x, index=reverse_index):
    # the `-3` offset is due to the special tokens used by keras
    # see https://stackoverflow.com/questions/42821330/restore-original-text-from-keras-s-imdb-dataset
    return " ".join([index.get(i - 3, "UNK") for i in x])
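The effect of the `-3` offset is easiest to see on a tiny example. The sketch below uses a hypothetical three-word index in place of `imdb.get_word_index()`; the function name `decode_tokens` is also hypothetical and only mirrors the logic of `decode_sentence`.

```python
# Hypothetical three-word index standing in for imdb.get_word_index();
# the real index maps each word to its frequency rank, starting at 1.
toy_index = {"the": 1, "and": 2, "movie": 3}
toy_reverse_index = {rank: word for word, rank in toy_index.items()}


def decode_tokens(tokens, reverse_index, offset=3, unknown="UNK"):
    """Map integer tokens back to words, undoing the reserved-token offset."""
    return " ".join(reverse_index.get(t - offset, unknown) for t in tokens)


# Tokens 0, 1 and 2 are reserved, so they decode to "UNK";
# token 4 is rank 1 ("the"), token 5 is rank 2, token 6 is rank 3.
decoded = decode_tokens([0, 1, 4, 6, 2, 5], toy_reverse_index)
print(decoded)  # UNK UNK the movie UNK and
```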

In [None]:
example, example_label, prediction, explanations = send_explanation_request(feature_endpoint, review=2)

### Extract relevant info from the feature-based explanations

The feature-based explanations contain the following fields:
- `baseline_output_value`: output value for a baseline instance.
- `instance_output_value`: output value for the instance provided.
- `feature_attributions`: a description of the contributions of each feature; in this example, the first value applies to the first word in the sentence, and so on.
- `output_index`: index of the output.
- `approximation_error`: if above 0.05, consider adjusting the explanation spec. 
- `output_name`: it corresponds to the serving_output, i.e. the name of the last layer of the model.

You focus on the `feature_attributions`, and extract them as an array below.

In [None]:
attributions = explanations.attributions[0].feature_attributions["my_input"]
attributions
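One quick sanity check on these fields: with Shapley-style methods, the feature attributions should approximately sum to the difference between `instance_output_value` and `baseline_output_value`. The sketch below shows the arithmetic on made-up numbers; the values and the helper name `attribution_residual` are hypothetical, and a real response may show a small nonzero residual.

```python
def attribution_residual(baseline_output, instance_output, feature_attributions):
    """Difference between the output change and the summed attributions."""
    return (instance_output - baseline_output) - sum(feature_attributions)


# Hypothetical values standing in for the fields of one explanation.
baseline_output_value = 0.50
instance_output_value = 0.90
feature_attributions = [0.25, -0.05, 0.20]

residual = attribution_residual(
    baseline_output_value, instance_output_value, feature_attributions
)
print(abs(residual) < 1e-9)  # True
```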

## Visualize and analyze the explanations


------------

For exploration and analysis, visualizations are often the most useful tool. You can visualize the attributions using matplotlib or other plotting tools.

### Visualize the explanations

#### Method 1: Matplotlib colormap

You can visualize the attributions for the text instance by mapping the values of the attributions onto a matplotlib colormap.

In the colormap, words are highlighted according to their attribution values. Words with positive attribution are highlighted in shades of green, and words with negative attribution in shades of pink. Stronger shading corresponds to larger absolute attribution values. A positive attribution can be interpreted as an increase in the model's output (here, the predicted probability of a positive review), and a negative attribution as a decrease.

In [None]:
def highlight(string, color="white"):
    """
    Return HTML markup highlighting text with the desired color.
    """
    return f'<mark style="background-color:{color}">{string} </mark>'


def colorize(attrs, cmap="PiYG"):
    """
    Compute hex colors based on the attributions for a single instance.
    Uses a diverging colorscale by default and normalizes and scales
    the colormap so that colors are consistent with the attributions.
    """
    import matplotlib as mpl

    cmap_bound = np.abs(attrs).max()
    norm = mpl.colors.Normalize(vmin=-cmap_bound, vmax=cmap_bound)
    # plt.get_cmap works across matplotlib versions; mpl.cm.get_cmap was removed in 3.9
    cmap = plt.get_cmap(cmap)

    # Compute the hex color for each attribution value
    colors = [mpl.colors.rgb2hex(cmap(norm(x))) for x in attrs]
    return colors


def display_highlights(example, example_label, prediction, attributions):
    words = decode_sentence(example).split()
    colors = colorize(attributions)

    print(f"Prediction:[{prediction}] Actual:[{example_label}]\n")

    display(HTML("".join(list(map(highlight, words, colors)))))
    return words
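The key step in `colorize` is the symmetric normalization: the color scale is bounded by the largest absolute attribution, so an attribution of zero always lands on the neutral midpoint of the diverging colormap. A minimal sketch of that mapping without matplotlib follows; `symmetric_normalize` is a hypothetical helper, and its handling of all-zero input is an assumption made for illustration.

```python
def symmetric_normalize(attrs):
    """Map attributions to [0, 1] so that 0.0 always lands on 0.5."""
    bound = max(abs(a) for a in attrs)
    if bound == 0:
        # Assumption: with no signal at all, place every word on the midpoint.
        return [0.5 for _ in attrs]
    return [(a + bound) / (2 * bound) for a in attrs]


# The most negative value maps to 0.0, zero to 0.5; the positive value
# stays below 1.0 because its magnitude is smaller than the bound.
print(symmetric_normalize([-2.0, 0.0, 1.0]))  # [0.0, 0.5, 0.75]
```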

In [None]:
words = display_highlights(example, example_label, prediction, attributions)

#### Method 2: Matplotlib barh
A more classic visualization of feature attributions is the horizontal bar plot.

Let's use a different test instance, and start by displaying the text highlighted again to see what review we're working with.

In [None]:
# Use review 10 this time.
example, example_label, prediction, explanations = send_explanation_request(feature_endpoint, review=10)
attributions = explanations.attributions[0].feature_attributions["my_input"]
words = display_highlights(example, example_label, prediction, attributions)

Next, display a Feature Attribution plot to view the magnitudes of sentiment contributions from the words in this review.

In [None]:
# Pair each word with its attribution value. A list of pairs keeps repeated
# words, which a dictionary keyed by word would silently overwrite.
word_attr_pairs = sorted(zip(words, attributions), key=lambda pair: pair[1])

# Split the sorted pairs back into parallel lists for plotting
sorted_feature_names = [word for word, _ in word_attr_pairs]
sorted_attr_values = [attr for _, attr in word_attr_pairs]
num_features = len(sorted_feature_names)

if num_features > 0:
    x_pos = list(range(num_features))
    plt.figure(figsize=(8, 11))
    plt.barh(x_pos, sorted_attr_values)
    plt.yticks(x_pos, sorted_feature_names)
    plt.title('Feature attributions')
    plt.ylabel('Feature names')
    plt.xlabel('Attribution value')
    plt.show()
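When a plot has many bars, it can help to list only the most influential words. The sketch below ranks word and attribution pairs by absolute attribution; the helper name `top_attributions` and the toy inputs are hypothetical, and in the lab you would pass `words` and `attributions` instead.

```python
def top_attributions(words, attributions, k=5):
    """Return the k (word, attribution) pairs with the largest magnitude."""
    pairs = list(zip(words, attributions))  # keeps repeated words as separate entries
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)[:k]


toy_words = ["a", "good", "film", "a", "bad", "ending"]
toy_attrs = [0.01, 0.30, 0.05, -0.02, -0.25, 0.00]

for word, attr in top_attributions(toy_words, toy_attrs, k=3):
    print(f"{word:>8}  {attr:+.2f}")
```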

### Analyze the plots: what do the feature-based explanations say?

**⚠️ BEWARE: it's a simple model with one LSTM layer (128 units) trained for 2 epochs 🤖**

*We leave the analysis of the first review, visualized with Method 1, as an exercise.*

Let's focus on the second review analyzed in the feature attribution plot with Method 2.
<br>Words that typically express approval tend to receive positive attributions, and words that typically express disapproval tend to receive negative ones. The words *"good"* and *"guaranteed"* have a strong positive influence, while the words *"chuckles"* and *"rid"* have a strong negative influence. The model learned these associations in bulk from the review words and labels in the training set. Vertex Explainable AI lets you see specifically what the model learned, which can help you find potentially problematic attributions.

For example, in this review, the word *"black"* appears in the phrase "black comedy", which is neither positive nor negative, just a type of comedy. The model, however, doesn't see *"black"* as a modifier to *"comedy"*; it treats it as its own unique word. Across the rest of the dataset, *"black"* likely appeared in more negative reviews than positive reviews, which is why it got a negative attribution. This review is labeled and predicted as positive (typically with predictions in the 90% range, depending on your model's random initialization), and that positive context likely moderated how negative *"black"* would otherwise have been. While this could be entirely coincidental, it could also point to biases in the data that you want to address. Further data exploration is definitely the next step!
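As a starting point for that exploration, you could count how often a given token appears in reviews of each label. The sketch below runs on a tiny made-up dataset; the helper name `label_counts_for_token` and the toy reviews are hypothetical. With the real data you would pass `x_train`, `y_train`, and the token id of the word of interest (its rank in `imdb.get_word_index()` plus 3), keeping in mind that the padded reviews only contain their final 80 tokens.

```python
from collections import Counter


def label_counts_for_token(reviews, labels, token):
    """Count the reviews containing `token`, grouped by label."""
    counts = Counter()
    for review, label in zip(reviews, labels):
        if token in review:
            counts[label] += 1
    return counts


# Hypothetical tokenized reviews and labels (0 = negative, 1 = positive).
toy_reviews = [[4, 9, 7], [9, 5], [4, 6], [9, 8, 4]]
toy_labels = [0, 0, 1, 1]

counts = label_counts_for_token(toy_reviews, toy_labels, token=9)
print(f"negative: {counts[0]}  positive: {counts[1]}")  # negative: 2  positive: 1
```

A strong imbalance between the two counts suggests the model may have learned a label association for the word itself rather than for the phrase it appears in.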

Thankfully, you have much more powerful language processing tools now with Generative AI! You can use not just single words, but the entire context to understand meaning. You could use [PaLM Text Embeddings](https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/textembedding-gecko) to provide a vector representation of the entire review, rather than each word independently. You could then use a simpler dense (rather than recurrent) neural network to learn positive and negative scores from all of the context and nuance in the natural language!