### Overview


This notebook demonstrates how to use the `Vertex AI SDK` to train and deploy a custom model, a `Scikit-learn pipeline with a LightGBM` classifier, for serving online predictions with explanations. Although Vertex AI provides quite a few examples for deploying models built with *TensorFlow/scikit-learn/XGBoost*, there are very few working examples that explain how to deploy **custom container** models, and almost none that show how to get **explainable predictions** for such models. This notebook is meant to bridge that gap.

### Dataset

The dataset used for this tutorial is the [`Titanic dataset`](https://www.kaggle.com/competitions/titanic/data). Given certain categorical and numerical predictors, the model predicts the survival status of the passengers on the Titanic.

### Objective

This notebook explains the process of training and deploying a custom classifier model to serve explainable predictions using the Vertex AI SDK.<br>
As an alternative, you can also use the gcloud command-line tool or the Vertex AI Cloud Console.

The notebook covers these broad steps:

- Training a `Scikit-learn Pipeline + LightGBM` classifier model using a Vertex AI custom training job

- Setting up a custom Docker container for serving online predictions

- Configuring the model to provide explainable predictions

- Uploading and deploying the trained model to Vertex AI

- Generating explainable online predictions from the deployed model

### Setting up the development environment

If you are using either Vertex AI Workbench Notebook or Google Colab, you can skip this step, since the environment already satisfies the requirements to run this notebook. <br> If not, make sure that your environment meets the following requirements:

- The Google Cloud SDK
- Python 3
- virtualenv
- Jupyter notebook (running in a virtual environment with Python 3)

#### Instructions on how to meet these requirements:

1. [Google Cloud SDK](https://cloud.google.com/sdk/docs/)

2. [Python 3](https://cloud.google.com/python/setup#installing_python)

3. [Virtualenv](https://cloud.google.com/python/setup#installing_and_using_virtualenv)

4. Activate your virtual environment and run `pip3 install jupyter` in a terminal shell to install Jupyter.

5. Run `jupyter notebook` on the command line in a terminal shell to launch Jupyter.

6. Open this notebook in the Jupyter Notebook Dashboard.

### Installing packages

Install the Python packages required to execute this notebook.

In [None]:
import os
import sys

# The Vertex AI Workbench Notebook product has specific requirements
IS_WORKBENCH_NOTEBOOK = os.getenv("DL_ANACONDA_HOME") and not os.getenv("VIRTUAL_ENV")
IS_USER_MANAGED_WORKBENCH_NOTEBOOK = os.path.exists(
    "/opt/deeplearning/metadata/env_version"
)

# Vertex AI Notebook requires dependencies to be installed with '--user'
USER_FLAG = ""
if IS_WORKBENCH_NOTEBOOK:
    USER_FLAG = "--user"

! pip3 install --upgrade google-cloud-aiplatform {USER_FLAG} -q
! pip3 install --upgrade google-cloud-storage {USER_FLAG} -q

### Set up your Google Cloud project

*The following steps are required regardless of your notebook environment.*

1. [Select/create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager).

2. [Enable billing for your project.](https://cloud.google.com/billing/docs/how-to/modify-project)

3. [Enable the following APIs: Vertex AI APIs, Compute Engine APIs, and Cloud Storage.](https://console.cloud.google.com/flows/enableapi?apiid=ml.googleapis.com,compute_component,storage-component.googleapis.com)

4. [The Google Cloud SDK](https://cloud.google.com/sdk) is already installed in Google Cloud Notebook.

### Project variables
#### For this notebook, you will need:
- [Project ID](https://cloud.google.com/vertex-ai/docs/pipelines/configure-project)
<br> The ID of your **billable** Google Cloud project.
- [Cloud Storage Bucket](https://cloud.google.com/storage/docs/creating-buckets)
<br> Containers for storing objects on Google Cloud.
- [Region](https://cloud.google.com/vertex-ai/docs/general/locations)
<br> The region where you want to deploy your model and store the model artifacts.

In [None]:
PROJECT_ID = "my_project_id" # Has to be the ID of your billable GCP project
MODEL_NAME = "my_model" # Any memorable string
VERSION = "v1" # Any memorable string/number
BUCKET_NAME = "my_bucket" # Has to be created first
REGION = "europe-west1" # Choose as per your location

### Additional variables
- [Image URI](https://cloud.google.com/container-registry/docs/pushing-and-pulling)
<br> The Artifact Registry or Container Registry URI of your container images

In [None]:
SERVING_MACHINE_TYPE = "n1-standard-2" # Update based on your requirements
SERVING_GPU, SERVING_NGPU = (None, None) # example: (aip.gapic.AcceleratorType.NVIDIA_TESLA_K80.name, 2)
ARTIFACT_LOCATION_GCS = f"gs://{BUCKET_NAME}"
TRAIN_IMAGE_URI = f"eu.gcr.io/{PROJECT_ID}/{MODEL_NAME}:{VERSION}" # Differs based on region
PRED_IMAGE_URI = f"eu.gcr.io/{PROJECT_ID}/{MODEL_NAME}-pred:{VERSION}" # Differs based on region
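
The comment "Differs based on region" can be made concrete with a small helper. The sketch below is only an illustration: `gcr_host_for_region` and `image_uri` are hypothetical names, and picking the Container Registry host from the region prefix is an assumption made for data locality, not a Vertex AI requirement.

```python
def gcr_host_for_region(region: str) -> str:
    """Pick a Container Registry host from a region name (hypothetical helper)."""
    prefix = region.split("-", 1)[0]
    # Multi-regional Container Registry hosts; the mapping rule is an assumption.
    hosts = {"europe": "eu.gcr.io", "asia": "asia.gcr.io", "us": "us.gcr.io"}
    # Regions without an obvious match fall back to the global host.
    return hosts.get(prefix, "gcr.io")


def image_uri(project_id: str, name: str, tag: str, region: str) -> str:
    """Build an image URI of the same shape as TRAIN_IMAGE_URI / PRED_IMAGE_URI."""
    return f"{gcr_host_for_region(region)}/{project_id}/{name}:{tag}"


print(image_uri("my_project_id", "my_model", "v1", "europe-west1"))
# eu.gcr.io/my_project_id/my_model:v1
```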

### Authenticate your Google Cloud account

If you are using

A) `Vertex AI Workbench notebooks`
<br> The environment is already authenticated. Skip this step.

B) `Colab`
<br> Authenticate your account via OAuth using the cell below.

C) `Other Environments`
<br> Follow [authentication for Google Cloud Account.](https://cloud.google.com/docs/authentication/getting-started)

In [None]:
# If you are running this notebook in Colab, run this cell and follow the
# instructions to authenticate your GCP account.

IS_COLAB = "google.colab" in sys.modules
if not os.path.exists("/opt/deeplearning/metadata/env_version") and not os.getenv(
    "DL_ANACONDA_HOME"
):
    if IS_COLAB:
        from google.colab import auth as google_auth

        google_auth.authenticate_user()

    # If you are running this notebook locally, replace the string below with the
    # path to your service account key and run this cell to authenticate your GCP account.
    elif not os.getenv("IS_TESTING"):
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "<path_to_credentials.json_file>"

### Initialize Vertex AI SDK for Python

Initialize the Vertex AI SDK for Python for your project and corresponding bucket.

In [None]:
import google.cloud.aiplatform as aip
aip.init(project=PROJECT_ID, staging_bucket=ARTIFACT_LOCATION_GCS)

### Create model directory structure and docker files
Depending on whether you use FastAPI/Flask/another web framework to set up the server for predictions, the directory structure can be slightly different.<br>
We will use `FastAPI` in this notebook.<br>

### Directory structure for FastAPI server

📁 model<br>
└──📁 training<br>
└────📋 Dockerfile<br>
└────📋 requirements.txt<br>
└────📁 trainer<br>
└──────📋 train.py<br>

└──📁 inference<br>
└────📋 Dockerfile<br>
└────📋 requirements.txt<br>
└────📁 app<br>
└──────📋 main.py<br>
└──────📋 server.py<br>
└──────📋 prestart.sh<br>

### Download the code for the notebook

In [None]:
! git clone https://github.com/pankajrsingla/vertex_ai.git

### 1. Building and training the model
We train a `Scikit-learn pipeline + LightGBM classifier` model on the Titanic dataset.<br>
The trained model predicts the survival status of the passengers. <br>

*For details about hyperparameter tuning and explainability, take a look at:*<br>
*https://github.com/ml6team/quick-tips/blob/main/structured_data/2021_02_26_scikit_learn_pipelines/scikit_learn_pipelines_and_lightgbm_titanic_dataset.ipynb*

The training code performs these steps:<br>
- Data loading and inspection
- Introduction of the preprocessing pipeline
- Adding a LightGBM classifier to the pipeline
- Training the model
- Saving the model artifact file (pickle) to the Cloud Storage bucket

#### 1.1 The training code
Update the bucket name in the training file.

In [None]:
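os.chdir("vertex_ai/skl-lgbm/training")

# Append the bucket name and the upload step to the training script.
# Note: running this cell more than once appends the snippet again.
with open("trainer/train.py", "a") as f:
    # The leading newline guards against a script that does not end with one.
    f.write(f'\nbucket_name = "{BUCKET_NAME}"')
    f.write("""
model_name = "skl_lgbm.pkl"
storage_client = storage.Client()
bucket = storage_client.bucket(bucket_name)
blob = bucket.blob(model_name)
with blob.open("wb") as f:
    pickle.dump(model, f)

print("Done training.")""")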
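
Because the cell above appends to `trainer/train.py`, running it twice duplicates the upload code. One possible guard is sketched below; `append_once` is a hypothetical helper, not part of the cloned repository, and it assumes a marker string (for example `bucket_name =`) is enough to detect an earlier append.

```python
from pathlib import Path


def append_once(path: str, snippet: str, marker: str) -> bool:
    """Append `snippet` to `path` unless `marker` already occurs in the file.

    Returns True if the file was modified, False if it was left untouched.
    """
    file = Path(path)
    existing = file.read_text() if file.exists() else ""
    if marker in existing:
        return False
    # Start the snippet on its own line if the file lacks a trailing newline.
    separator = "" if (existing.endswith("\n") or not existing) else "\n"
    file.write_text(existing + separator + snippet)
    return True
```

With this helper, the append step could be written as `append_once("trainer/train.py", snippet, "bucket_name =")`, where `snippet` holds the same text the cell writes.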

In [None]:
! cat trainer/train.py

#### 1.2 Module dependencies for training container

In [None]:
! cat requirements.txt

#### 1.3 Dockerfile for training container
For a custom training image, we need to create a Dockerfile with the required details.<br>
We specify the Python base image, copy the training code, install the dependencies, and set the entrypoint for the container.

In [None]:
! cat Dockerfile

#### 1.4 Build the training image

In [None]:
! docker build ./ -t $TRAIN_IMAGE_URI

#### 1.5 (Optional) Run the training image
This step can be skipped if you want the training to be done on Vertex AI servers, for which a custom training job has been defined in the next section.<br>
Running the training image initiates the training locally and uploads the model artifact file to the GCP bucket once training is finished.

In [None]:
# ! docker run $TRAIN_IMAGE_URI

#### 1.6 Push the training image to GCP Artifact Registry

In [None]:
! docker push $TRAIN_IMAGE_URI

#### 1.7 Custom job for training the model on Vertex AI
##### 1.7.1 Create a custom training job
A custom training job is created with the following parameters:

- `project`: The project ID.
- `display_name`: The human-readable name for the custom training job.
- `container_image_uri`: The training container image.
- `location`: The region in which the job runs.
- `api_endpoint`: The regional API endpoint, of the form `{location}-aiplatform.googleapis.com`.

In [None]:
def create_custom_job_sample(
    project: str,
    display_name: str,
    container_image_uri: str,
    location: str = "europe-west1",
    api_endpoint: str = "europe-west1-aiplatform.googleapis.com",
):
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aip.gapic.JobServiceClient(client_options=client_options)
    custom_job = {
        "display_name": display_name,
        "job_spec": {
            "worker_pool_specs": [
                {
                    "machine_spec": {
                        "machine_type": "n1-standard-4",
                        "accelerator_type": aip.gapic.AcceleratorType.ACCELERATOR_TYPE_UNSPECIFIED,
                        "accelerator_count": None,
                    },
                    "replica_count": 1,
                    "container_spec": {
                        "image_uri": container_image_uri,
                        "command": [],
                        "args": [],
                    },
                }
            ]
        },
    }
    parent = f"projects/{project}/locations/{location}"
    response = client.create_custom_job(parent=parent, custom_job=custom_job)
    print("response:", response)

##### 1.7.2 Run the custom training job
Depending upon the model and the size of the dataset, this can take a while.

In [None]:
create_custom_job_sample(project=PROJECT_ID, 
                         display_name="my_custom_job", 
                         container_image_uri=TRAIN_IMAGE_URI,
                         location=REGION,
                         api_endpoint=f"{REGION}-aiplatform.googleapis.com"
                        )

### 2. FastAPI App server for serving predictions
We need:<br>
- `server.py`: Creates the model server.<br>
- `main.py`: Creates the HTTP server.

#### 2.1 Create the model server
This loads the model artifact file from GCP and generates the prediction.

In [None]:
os.chdir("../inference")
! cat app/server.py
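
The real `server.py` lives in the cloned repository. As a rough sketch of what a model server of this kind does, the example below wraps any object with a `predict` method and turns a list of instance dictionaries into rows. `ModelServer`, `load_model`, and `ConstantModel` are hypothetical names, and loading from a local path assumes the pickle has already been downloaded from the bucket.

```python
import pickle
from typing import Any, Dict, List, Sequence

FEATURES = ("Pclass", "Sex", "Age", "Fare", "Embarked")


def load_model(path: str) -> Any:
    """Load a pickled model from a local file (only unpickle files you trust)."""
    with open(path, "rb") as f:
        return pickle.load(f)


class ModelServer:
    """Hypothetical wrapper that maps request instances to model predictions."""

    def __init__(self, model: Any, features: Sequence[str] = FEATURES) -> None:
        self.model = model
        self.features = list(features)

    def predict(self, instances: List[Dict[str, Any]]) -> List[Any]:
        missing = sorted(
            {name for inst in instances for name in self.features if name not in inst}
        )
        if missing:
            raise ValueError(f"missing features: {missing}")
        # Rows follow the feature order; a pipeline that selects columns by
        # name would need a pandas DataFrame here instead (assumption).
        rows = [[inst[name] for name in self.features] for inst in instances]
        return list(self.model.predict(rows))


class ConstantModel:
    """Stand-in model for the demo: predicts 0 for every row."""

    def predict(self, rows: List[List[Any]]) -> List[int]:
        return [0 for _ in rows]


server = ModelServer(ConstantModel())
print(server.predict([{"Pclass": 3, "Sex": "female", "Age": 14.1, "Fare": 11, "Embarked": "C"}]))
# [0]
```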

#### 2.2 Create the (FastAPI) HTTP server
We will need an HTTP server in the deployment container to handle the `predict` and `health` requests. This server is akin to an additional layer on top of the model server.<br>
We build the HTTP server using FastAPI.<br>


In [None]:
! cat app/main.py
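
Independent of the web framework, the two routes follow a simple contract: the health route answers with HTTP 200, and the predict route accepts a JSON body with an `instances` list and returns a JSON body with a `predictions` list. The framework-free sketch below illustrates that contract only; `handle_health` and `handle_predict` are hypothetical names and not the code in `main.py`.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

PredictFn = Callable[[List[Any]], List[Any]]


def handle_health() -> Tuple[int, Dict[str, Any]]:
    """Health route: any 200 response signals that the server is ready."""
    return 200, {}


def handle_predict(body: bytes, predict_fn: PredictFn) -> Tuple[int, Dict[str, Any]]:
    """Predict route: parse {"instances": [...]}, return {"predictions": [...]}."""
    try:
        instances = json.loads(body)["instances"]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "body must be a JSON object with an 'instances' list"}
    if not isinstance(instances, list):
        return 400, {"error": "'instances' must be a list"}
    return 200, {"predictions": predict_fn(instances)}


# Demo with a stand-in prediction function that returns 1 for every instance.
status, response = handle_predict(b'{"instances": [{"Age": 14.1}]}', lambda xs: [1 for _ in xs])
print(status, response)
# 200 {'predictions': [1]}
```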

#### 2.3 Add the pre-start script

FastAPI will execute this script before starting up the server. The `PORT` environment variable is set to equal `AIP_HTTP_PORT` in order to run FastAPI on the same port expected by Vertex AI.

In [None]:
! cat app/prestart.sh
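
The same `AIP_*` variables can also be read from Python, for example to log the serving configuration at startup. This is a minimal sketch: `serving_config` is a hypothetical helper, and the fallback values are assumptions chosen to match the routes and port used elsewhere in this notebook.

```python
import os
from typing import Any, Dict, Mapping


def serving_config(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Collect the serving settings that Vertex AI passes in through AIP_* variables."""
    return {
        "port": int(env.get("AIP_HTTP_PORT", "8080")),
        "health_route": env.get("AIP_HEALTH_ROUTE", "/health"),
        "predict_route": env.get("AIP_PREDICT_ROUTE", "/predict"),
        "storage_uri": env.get("AIP_STORAGE_URI", ""),
    }


print(serving_config({"AIP_HTTP_PORT": "8080", "AIP_STORAGE_URI": "gs://my_bucket"}))
```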

### 3. Prediction Container

#### 3.1 Module dependencies for prediction container

In [None]:
! cat requirements.txt

#### 3.2 Dockerfile for prediction container

In [None]:
! cat Dockerfile

#### 3.3 Build the prediction image

In [None]:
! docker build ./ -t $PRED_IMAGE_URI

#### 3.4 (Optional) Test the prediction server locally
Run the container locally in detached mode and provide the environment variables that the container requires. These variables will be provided to the container by Vertex prediction once deployed. Test the /health and /predict routes, then stop the running image.<br>
*This step can save a lot of time, as it allows you to correct any errors in the docker/model configuration without having to wait for the model to be uploaded and deployed.*

In [None]:
! docker run -d -p "80:8080" --name="local-skl-lgbm" -e "AIP_HTTP_PORT=8080" -e "AIP_HEALTH_ROUTE=/health" -e "AIP_PREDICT_ROUTE=/predict" -e "AIP_STORAGE_URI=$ARTIFACT_LOCATION_GCS" --rm $PRED_IMAGE_URI

##### 3.4.1 Test the health route

In [None]:
! curl localhost/health

##### 3.4.2 Test the predict route
Create a JSON file with some test inputs and check the model predictions for these inputs.

In [None]:
%%writefile instances.json
{
    "instances": [{
        "Pclass": 3, 
        "Sex": "female", 
        "Age": 14.1, 
        "Fare": 11, 
        "Embarked":"C"
    },{
        "Pclass": 1, 
        "Sex": "male", 
        "Age": 11.1, 
        "Fare": 23, 
        "Embarked":"S"}]
}

In [None]:
! curl -X POST \
  -d @instances.json \
  -H "Content-Type: application/json; charset=utf-8" \
  localhost/predict
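
The `curl` call can also be reproduced from Python with the standard library, which is convenient for scripted checks. The sketch below assumes the container is listening on `localhost` as started above; `build_predict_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_predict_request(url: str, instances: List[Dict[str, Any]]) -> urllib.request.Request:
    """Build a POST request whose body has the {"instances": [...]} shape."""
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 10.0) -> Any:
    """Send the request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


request = build_predict_request(
    "http://localhost/predict",
    [{"Pclass": 3, "Sex": "female", "Age": 14.1, "Fare": 11, "Embarked": "C"}],
)
# With the container running, the predictions can be fetched with:
# print(send(request))
```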

##### 3.4.3 Stop the local execution of the prediction image

In [None]:
! docker stop local-skl-lgbm

#### 3.5 Push the prediction image to GCP Artifact Registry

In [None]:
! docker push $PRED_IMAGE_URI

### 4. Set the explanation parameters and metadata
These are required only for explainable predictions. You can avoid configuring these if you only want plain predictions from the model.

#### 4.1 Explanation metadata
The explanation metadata consists of:<br>
- `outputs`: A scalar value in the output to attribute - what to explain.
- `inputs`: The features for attribution - how they contributed to the output.

In [None]:
explanation_metadata = aip.explain.ExplanationMetadata(
    inputs={
        "Pclass": {},
        "Sex": {},
        "Age": {},
        "Fare": {},
        "Embarked": {}
    },
    outputs={
        "Survived": {}
    }
)
print(explanation_metadata)

#### 4.2 Explanation parameters
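Vertex AI offers the sampled Shapley, Integrated Gradients, and XRAI algorithms for explainability. Integrated Gradients and XRAI are designed for differentiable models, so sampled Shapley is the natural fit for a tree-based model such as LightGBM.<br>
*For details on the three methods, refer to:*<br>
*https://cloud.google.com/ai-platform/prediction/docs/ai-explanations/overview*

We will use the sampled `Shapley` algorithm in this example.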

In [None]:
XAI = "shapley"  # [ shapley, ig, xrai ]
# Takes different permutations of the features, and assigns attribution for the outcome to each feature.
# Path count can be set to a lower/higher value.

if XAI == "shapley":
    PARAMETERS = {"sampled_shapley_attribution": {"path_count": 10}}
elif XAI == "ig":
    PARAMETERS = {"integrated_gradients_attribution": {"step_count": 50}}
elif XAI == "xrai":
    PARAMETERS = {"xrai_attribution": {"step_count": 50}}

explanation_parameters = aip.explain.ExplanationParameters(PARAMETERS)
print(explanation_parameters)
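
To build intuition for what `path_count` controls, here is a toy version of permutation-based Shapley sampling. It is a textbook-style sketch on a made-up function, not the implementation Vertex AI uses; `sampled_shapley`, `toy_model`, and the all-zero baseline are assumptions for illustration.

```python
import random
from typing import Callable, Dict


def sampled_shapley(
    f: Callable[[Dict[str, float]], float],
    instance: Dict[str, float],
    baseline: Dict[str, float],
    path_count: int = 10,
    seed: int = 0,
) -> Dict[str, float]:
    """Average each feature's marginal contribution over random feature orderings."""
    rng = random.Random(seed)
    names = list(instance)
    totals = {name: 0.0 for name in names}
    for _ in range(path_count):
        order = names[:]
        rng.shuffle(order)
        current = dict(baseline)  # start every path at the baseline input
        previous = f(current)
        for name in order:
            current[name] = instance[name]  # switch one feature to its real value
            value = f(current)
            totals[name] += value - previous
            previous = value
    return {name: total / path_count for name, total in totals.items()}


def toy_model(x: Dict[str, float]) -> float:
    """Made-up model: one additive term and one interaction term."""
    return 2.0 * x["a"] + x["b"] * x["c"]


instance = {"a": 1.0, "b": 2.0, "c": 3.0}
baseline = {"a": 0.0, "b": 0.0, "c": 0.0}
attributions = sampled_shapley(toy_model, instance, baseline, path_count=10)
print(attributions)
```

Along every path the contributions telescope, so the attributions always sum to `toy_model(instance) - toy_model(baseline)`. A higher `path_count` only changes how the interaction term is split between `b` and `c`, at the cost of more model evaluations.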

### 5. Upload the model

Upload your model to a `Model` resource using the `Model.upload()` method, with the following parameters:

- `display_name`: The human readable name for the `Model` resource
- `artifact_uri`: The Cloud Storage location of the trained model artifacts
- `serving_container_image_uri`: The serving container image
- `explanation_parameters`: Parameters to configure explaining for `Model`'s predictions
- `explanation_metadata`: Metadata describing the `Model`'s input and output for explanation
- `serving_container_predict_route`: The route for sending prediction requests to the server
- `serving_container_health_route`: The route for sending health check requests to the server

In [None]:
aip.init(project=PROJECT_ID, location=REGION)
model = aip.Model.upload(
    display_name=MODEL_NAME,
    artifact_uri=ARTIFACT_LOCATION_GCS,
    serving_container_image_uri=PRED_IMAGE_URI,
    explanation_parameters=explanation_parameters,
    explanation_metadata=explanation_metadata,
    serving_container_predict_route="/predict",
    serving_container_health_route="/health"
)

### 6. Deploy the model

Deploy your model for online prediction. To deploy the model, you invoke the `deploy` method, with the following main parameters:

- `deployed_model_display_name`: A human-readable name for the deployed model.
- `explanation_metadata`: The metadata object for explanations.
- `explanation_parameters`: The algorithm for explanation and the corresponding parameters.

In [None]:
endpoint = model.deploy(
    deployed_model_display_name=MODEL_NAME,
    machine_type=SERVING_MACHINE_TYPE,  # Serving machine type defined earlier
    explanation_metadata=explanation_metadata,
    explanation_parameters=explanation_parameters
)

### 7. Make predictions

Once the `Model` resource is deployed to an `Endpoint` resource, one can do online predictions/explanations by sending `predict` / `explain` requests to the `Endpoint` resource.

#### Request

Each instance is a dictionary that maps feature names to values:

    {"Pclass": ..., "Sex": ..., "Age": ..., "Fare": ..., "Embarked": ...}

Since the `predict()` / `explain()` methods can take multiple items (instances), a single test item must still be sent as a list of one item.

#### Response

The `predict()` / `explain()` call returns a `Prediction` object whose main fields are:

- `predictions`: The prediction per instance.
- `deployed_model_id`: The Vertex AI identifier for the deployed `Model` resource which did the predictions.

#### 7.1 Predictions without explanations

In [None]:
instances = [{"Pclass": 3, "Sex": "female", "Age": 14.1, "Fare": 11, "Embarked":"C"}, 
             {"Pclass": 3, "Sex": "male", "Age": 11.1, "Fare": 23, "Embarked":"C"}]
predictions_plain = endpoint.predict(instances=instances)
print("Plain predictions:", predictions_plain)

#### 7.2 Predictions with explanations
The input and response format for explainable predictions is the same as for predictions, except that instead of a `predict` call, we make an `explain` call to the `Endpoint` resource.

The response from the `explain()` call has the same structure as the `predict()` response, with an additional `explanations` entry:
- `explanations` (optional): The feature attributions

In [None]:
explanations = endpoint.explain(instances)
print("Explainable predictions:", explanations)

#### 7.3 Examining the explanation attributes

In [None]:
from tabulate import tabulate

INSTANCE = {"Pclass": 3, "Sex": "female", "Age": 14.1, "Fare": 11, "Embarked": "C"}
feature_names = ["Pclass", "Sex", "Age", "Fare", "Embarked"]

explanation = endpoint.explain(instances=[INSTANCE])
# Attributions for the first (and only) instance, keyed by feature name
attributions = explanation.explanations[0].attributions[0].feature_attributions

rows = [[name, INSTANCE[name], attributions[name]] for name in feature_names]
print(tabulate(rows, headers=["Feature name", "Feature value", "Attribution value"]))
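
Two quick checks often help when reading attributions: ranking features by absolute attribution, and comparing the sum of attributions with the difference between the instance output and the baseline output (each attribution object also carries `instance_output_value` and `baseline_output_value`). The sketch below works on plain dictionaries; `rank_attributions` and `attribution_gap` are hypothetical helpers, and the numbers are made up for illustration rather than taken from a real response.

```python
from typing import Dict, List, Tuple


def rank_attributions(attributions: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort features from most to least influential by absolute attribution."""
    return sorted(attributions.items(), key=lambda item: abs(item[1]), reverse=True)


def attribution_gap(
    attributions: Dict[str, float], instance_output: float, baseline_output: float
) -> float:
    """Return how far the attributions are from explaining the full output change."""
    return (instance_output - baseline_output) - sum(attributions.values())


# Illustrative values only, not output from the deployed model.
example = {"Pclass": -0.05, "Sex": 0.30, "Age": 0.10, "Fare": 0.0, "Embarked": 0.02}
for name, value in rank_attributions(example):
    print(f"{name:10s} {value:+.2f}")
print("gap:", round(attribution_gap(example, instance_output=0.75, baseline_output=0.38), 6))
```

A gap close to zero suggests that the sampled attributions account for the whole change in the output; a large gap can be a hint to raise `path_count`.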

### 8. Cleaning up
When you are done getting predictions/explanations, undeploy the model from the `Endpoint` resource. This deprovisions all compute resources and ends billing for the deployed model. In the cell below, `endpoint.delete(force=True)` undeploys the model and then deletes the endpoint. You can also delete the uploaded model, the artifacts in the Cloud Storage bucket, and the training and prediction images.

In [None]:
try:
    # Delete endpoint
    endpoint.delete(force=True)
    # Delete the model resource
    model.delete()
except Exception as e:
    print(e)

# Delete model artifacts from bucket
! gsutil -m rm -rf gs://$BUCKET_NAME/skl_lgbm.pkl

# Delete training and prediction images:
# ! docker images
# ! docker rmi <training_image_id> <prediction_image_id> -f