# Hyperparameter Tuning / Optimization in Vertex AI
In this notebook, we will use Vertex AI Vizier to perform hyperparameter tuning/optimization in Google Cloud.

## Prerequisites
**Note:** This notebook and repository are supporting artifacts for the "Google Machine Learning and Generative AI for Solutions Architects" book. The book describes the concepts associated with this notebook, and for some of the activities, the book contains instructions that should be performed before running the steps in the notebooks. Each top-level folder in this repo is associated with a chapter in the book. Please ensure that you have read the relevant chapter sections before performing the activities in this notebook.

**There are also important generic prerequisite steps outlined [here](https://github.com/PacktPublishing/Google-Machine-Learning-for-Solutions-Architects/blob/main/Prerequisite-steps/Prerequisites.ipynb).**


## Install required packages

In [None]:
! pip3 install --upgrade xgboost google-cloud-aiplatform --user -q --no-warn-script-location

## Restart the kernel

The code in the next cell will restart the kernel, which is sometimes required after installing/upgrading packages.

**When prompted, click OK to restart the kernel.**

The sleep command simply prevents further cells from executing before the kernel restarts.

In [None]:
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)


In [None]:
import time
time.sleep(10)

# (Wait for kernel to restart before proceeding...)

## Set Google Cloud resource variables

The following code will set variables specific to your Google Cloud resources that will be used in this notebook, such as the Project ID, Region, and GCS Bucket.

**Note: This notebook is intended to execute in a Vertex AI Workbench Notebook, in which case the API calls issued in this notebook are authenticated according to the permissions (e.g., service account) assigned to the Vertex AI Workbench Notebook.**

We will use the `gcloud` command to get the Project ID details from the local Google Cloud project, and assign the results to the PROJECT_ID variable. If, for any reason, PROJECT_ID is not set, you can set it manually or change it, if preferred.

We also use a default bucket name for most of the examples and activities in this book, which has the format: `{PROJECT_ID}-aiml-sa-bucket`. You can change the bucket name if preferred.

Also, we're defaulting to the **us-central1** region, but you can optionally replace this with your [preferred region](https://cloud.google.com/about/locations).
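
If you change the bucket name, it can help to check it before trying to create the bucket. The following is a minimal sketch, not part of the book's code: `default_bucket_name` and `looks_like_valid_bucket_name` are hypothetical helpers, and the pattern covers only a simplified subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit).

```python
import re

# Simplified subset of the Cloud Storage bucket naming rules (an assumption
# for illustration; the full rules include further restrictions).
_BUCKET_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def default_bucket_name(project_id):
    """Build the default bucket name used in this book from a project ID."""
    return f"{project_id}-aiml-sa-bucket"


def looks_like_valid_bucket_name(name):
    """Return True if the name passes the simplified pattern above."""
    return _BUCKET_NAME_PATTERN.fullmatch(name) is not None


example_name = default_bucket_name("my-project-123")  # hypothetical project ID
print(example_name, looks_like_valid_bucket_name(example_name))
```

A project ID that contains characters such as a colon would produce a name that fails this check, in which case you would choose a different bucket name manually.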

In [None]:
PROJECT_ID_DETAILS = !gcloud config get-value project
PROJECT_ID = PROJECT_ID_DETAILS[0]  # The project ID is item 0 in the list returned by the gcloud command
BUCKET=f"{PROJECT_ID}-aiml-sa-bucket" # Optional: replace with your preferred bucket name, which must be a unique name.
REGION="us-central1" # Optional: replace with your preferred region (See: https://cloud.google.com/about/locations) 
print(f"Project ID: {PROJECT_ID}")
print(f"Bucket Name: {BUCKET}")

## Create bucket

The following code will create the bucket if it doesn't already exist.

If you get an error saying that it already exists, that's fine, you can ignore it and continue with the rest of the steps, unless you want to use a different bucket.

In [None]:
!gsutil mb -l {REGION} gs://{BUCKET}

# Begin implementation

Now that we have performed the prerequisite steps for this activity, it's time to implement the activity.

# Overview

Using Vertex AI Vizier for hyperparameter tuning involves several steps. First, we'll need to create a training application, which will consist of a Python script that trains our model with given hyperparameters and then saves the trained model. This script must also report the performance of the model on the validation set, so Vertex AI Vizier can determine the best hyperparameters.

Next, we need to create a configuration file for the hyperparameter tuning job, which specifies the hyperparameters to tune and their possible values, as well as the metric to optimize.

Finally, we'll use the Vertex AI Vizier client library to submit the hyperparameter tuning job, which will run our training application with different sets of hyperparameters, and find the best ones.
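
To make the workflow above concrete, here is a minimal, self-contained sketch of a tuning loop in plain Python. It is not how Vertex AI Vizier searches; it substitutes simple random search, and `SEARCH_SPACE`, `toy_objective`, and `random_search` are hypothetical stand-ins for the tuning configuration, the training application, and the tuning service respectively.

```python
import random

# Hypothetical search space: name -> (min, max, type)
SEARCH_SPACE = {
    "eta": (0.01, 0.3, float),
    "max_depth": (3, 10, int),
    "gamma": (0.0, 1.0, float),
}


def toy_objective(params):
    """Stand-in for 'train a model and report a metric'.

    The score peaks at eta=0.1, max_depth=6, gamma=0.2 (arbitrary choices).
    """
    score = 1.0
    score -= (params["eta"] - 0.1) ** 2
    score -= ((params["max_depth"] - 6) / 10) ** 2
    score -= (params["gamma"] - 0.2) ** 2
    return score


def sample_params(rng):
    """Draw one set of hyperparameters uniformly from the search space."""
    params = {}
    for name, (low, high, kind) in SEARCH_SPACE.items():
        if kind is int:
            params[name] = rng.randint(low, high)
        else:
            params[name] = rng.uniform(low, high)
    return params


def random_search(max_trial_count, seed=0):
    """Run the trials and return (best_score, best_params)."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_trial_count):
        params = sample_params(rng)
        trials.append((toy_objective(params), params))
    return max(trials, key=lambda trial: trial[0])


toy_best_score, toy_best_params = random_search(max_trial_count=20)
print(f"Best score: {toy_best_score:.4f}")
print(f"Best params: {toy_best_params}")
```

The structure is the same as in the rest of this notebook: a function that scores one set of hyperparameters, a description of the allowed ranges, and a loop that tries several candidates and keeps the best one.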

# Preparation

Set additional variables related to our environment.


In [None]:
BUCKET_URI=f"gs://{BUCKET}"
APP_NAME="fraud-detect"
APPLICATION_DIR = "vizier"
TRAINER_DIR = f"{APPLICATION_DIR}/trainer"

## Get the source data for this use case

1. Download the "Credit Card Fraud Detection" dataset from [this link](https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/download?datasetVersionNumber=3). 
2. The file downloads as a zip archive, so you will need to extract the creditcard.csv from within that zip archive.
3. In the top-left corner of your JupyterLab screen (the screen on which you are currently reading these instructions), click the upload symbol (an arrow pointing upwards).
4. Upload the creditcard.csv file.

## Clean the data and transfer it to GCS

The following code prints the columns and the unique values of the target variable (`Class`), and then drops any rows in which the target variable is missing (NaN).

The gsutil command then transfers the data to GCS to be referenced in our training script later.

In [None]:
import pandas as pd

file_path = 'creditcard.csv'  

# Read the CSV file into a DataFrame
df = pd.read_csv(file_path)

print(df.columns)

# Check for unique values in 'Class'
print(df['Class'].unique())

# Optionally, replace or drop rows with unexpected values
df.dropna(subset=['Class'], inplace=True)  # Drop rows with NaN in 'Class'

# Save the updated DataFrame back to the CSV file, overwriting the original
df.to_csv(file_path, index=False)  # index=False to avoid writing row numbers

In [None]:
!gsutil cp creditcard.csv $BUCKET_URI/creditcard.csv

## Containerize the training application code

Before we can run a hyperparameter tuning job, we need to create a source code file (training script) and a Dockerfile. The source code trains a model using XGBoost, and the Dockerfile will include all the commands needed to run the container image.

The Dockerfile will install all of the libraries required by our training script, and set up the entry point for the training code.

First, let's create a couple of directories that we'll use, and import and initialize the Google Cloud AI Platform client library.

In [None]:
!mkdir -p $APPLICATION_DIR
!mkdir -p $TRAINER_DIR

In [None]:
import google.cloud.aiplatform as aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

In [None]:
# Initialize the AI Platform client
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_URI)

### Create the training application (train.py)

The code in the next cell will create our training script.

**Important notes for our training code:**

*Notes related to XGBoost:*

* DMatrix is a data structure used by XGBoost that is optimized for both memory efficiency and training speed. We will convert our training, validation, and test datasets into DMatrix format before training the model.

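* The `params` dictionary contains the parameters for the XGBoost model. `eta` is the learning rate, `max_depth` is the maximum depth of the trees, `gamma` is the minimum loss reduction required to make a further split on a leaf node, `objective` is the loss function to be minimized, `eval_metric` is the metric reported on the validation set during training, and `nthread` is the number of parallel threads used for training.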

* num_round is the number of rounds of training, equivalent to the number of trees in the model.

* The train function trains the model, and the predict function generates predictions. The predictions are probabilities of the positive class (fraudulent transactions), so they are between 0 and 1. We can convert these to class labels (0 or 1) by rounding them to the nearest integer (in reality, we could choose a different threshold depending on the business requirements).
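
The thresholding idea in the last note can be sketched in a few lines. This is an illustration only: `to_labels` is a hypothetical helper that is not used in our training script (which scores the raw probabilities with ROC-AUC), and the probabilities are made-up values.

```python
def to_labels(probabilities, threshold=0.5):
    """Map positive-class probabilities to 0/1 class labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


example_probs = [0.02, 0.40, 0.55, 0.97]  # made-up model outputs

print(to_labels(example_probs))       # [0, 0, 1, 1]
print(to_labels(example_probs, 0.3))  # [0, 1, 1, 1]
```

Lowering the threshold flags more transactions as fraudulent, which trades more false alarms for fewer missed frauds.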

*Notes related to training and tuning with Vertex AI Vizier:*

* We use the [cloudml-hypertune](https://github.com/GoogleCloudPlatform/cloudml-hypertune) Python package to pass metrics to Vertex AI. To learn more about this process, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/training/code-requirements#hp-tuning-metric).

* For hyperparameter tuning, Vertex AI runs our training code multiple times, with different command-line arguments each time. Our training code must parse these command-line arguments and use them as hyperparameters for training. To learn more about this process, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/training/code-requirements#command-line-arguments).
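
As a rough sketch of the command-line mechanism described above, the snippet below parses a simulated argument list with `argparse`, using the same argument names as our training script. The list `simulated_trial_args` is an assumption made for illustration; it only mimics what a single trial might receive.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse hyperparameters from a list of command-line arguments."""
    parser = argparse.ArgumentParser(description="XGBoost Hyperparameter Tuning")
    parser.add_argument("--max_depth", type=int, default=3)
    parser.add_argument("--eta", type=float, default=0.3)
    parser.add_argument("--gamma", type=float, default=0.0)
    return parser.parse_args(argv)


# Simulated arguments for one trial (hypothetical values)
simulated_trial_args = ["--eta=0.12", "--max_depth=7", "--gamma=0.4"]
trial_args = parse_hyperparameters(simulated_trial_args)

print(trial_args.max_depth, trial_args.eta, trial_args.gamma)
```

Any argument that is not supplied falls back to its default, which is why the script can also be run locally with no arguments at all.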

**IMPORTANT:** Replace **YOUR_BUCKET_NAME** with your bucket name. 
This is because *writefile* will write the contents of this cell directly to file; it will not parse variables from earlier in this notebook.
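
If you would rather not edit the placeholder by hand, one possible alternative is to write the file from Python and substitute the bucket name yourself. The sketch below shows the idea on a one-line template written to a temporary directory; `render_script` and the example bucket name are hypothetical, and this is not the approach used in the rest of this notebook.

```python
import tempfile
from pathlib import Path

# One-line stand-in for the training script (the real script is much longer)
TEMPLATE = "data_location='gs://YOUR_BUCKET_NAME/creditcard.csv'\n"


def render_script(template, bucket_name):
    """Replace the YOUR_BUCKET_NAME placeholder with an actual bucket name."""
    if "YOUR_BUCKET_NAME" not in template:
        raise ValueError("Placeholder YOUR_BUCKET_NAME not found in template")
    return template.replace("YOUR_BUCKET_NAME", bucket_name)


with tempfile.TemporaryDirectory() as tmp_dir:
    script_path = Path(tmp_dir) / "train.py"
    # "example-project-aiml-sa-bucket" is a made-up bucket name
    script_path.write_text(render_script(TEMPLATE, "example-project-aiml-sa-bucket"))
    rendered = script_path.read_text()

print(rendered)
```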

In [None]:
%%writefile {TRAINER_DIR}/train.py

import argparse
import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from google.cloud import storage
from hypertune import HyperTune

data_location='gs://YOUR_BUCKET_NAME/creditcard.csv'

def train_model(data, max_depth, eta, gamma):
    data.dropna(subset=[data.columns[-1]], inplace=True)
    X = data.iloc[:,:-1]
    y = data.iloc[:,-1]
    
    # Split the data into training (70%) and held-out (30%) sets
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Split the non-training data into validation and test sets
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)
        
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    dtest = xgb.DMatrix(X_test, label=y_test)

    params = {
        'max_depth': max_depth,
        'eta': eta,
        'gamma': gamma,
        'objective': 'binary:logistic',
        'nthread': 4,
        'eval_metric': 'auc'
    }
    
    evallist = [(dval, 'eval')]

    num_round = 10
    # Pass evals by keyword; recent XGBoost versions no longer accept it positionally
    model = xgb.train(params, dtrain, num_round, evals=evallist)
    
    preds = model.predict(dtest)
    auc = roc_auc_score(y_test, preds)

    hpt = HyperTune()
    hpt.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag='auc',
        metric_value=auc,
        global_step=1000)

    return model

def get_args():
    parser = argparse.ArgumentParser(description='XGBoost Hyperparameter Tuning')
    parser.add_argument('--max_depth', type=int, default=3)
    parser.add_argument('--eta', type=float, default=0.3)
    parser.add_argument('--gamma', type=float, default=0)
    args = parser.parse_args()
    return args

def main():
    args = get_args()
    data = pd.read_csv(data_location)
    model = train_model(data, args.max_depth, args.eta, args.gamma)

if __name__ == "__main__":
    main()


### Create our requirements.txt file
The requirements.txt file is a convenient way to specify all of the packages that we want to install in our custom container image. This file will be referenced in the Dockerfile for our image.

In this case, we will install:
* [XGBoost](https://xgboost.readthedocs.io/en/stable/)
* [cloudml-hypertune 0.1.0.dev6](https://pypi.org/project/cloudml-hypertune/)
* [The Vertex AI Python SDK](https://cloud.google.com/python/docs/reference/aiplatform/latest)
* [Python Client for Google Cloud Storage](https://cloud.google.com/python/docs/reference/storage/latest)

In [None]:
%%writefile {APPLICATION_DIR}/requirements.txt
xgboost
cloudml-hypertune==0.1.0.dev6
google-cloud-aiplatform
google-cloud-storage

### Create the Dockerfile for our custom training container

The [Dockerfile](https://docs.docker.com/engine/reference/builder/) specifies how to build our custom container image.

This Dockerfile specifies that we want to:
1. Use a Vertex AI [prebuilt container for custom training](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers) as a base image.
2. Install the required dependencies specified in our requirements.txt file.
3. Copy our custom training script to the container image.
4. Run our custom training script when the container starts up.

In [None]:
%%writefile {APPLICATION_DIR}/Dockerfile

# Use a Vertex AI prebuilt container for custom training as a base image.
FROM us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-12.py310:latest

# Specify the working directory to use in our container
WORKDIR /

# Copy our requirements.txt file to our container image
COPY requirements.txt /requirements.txt

# Install the packages specified in requirements.txt
RUN pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt

# Copy the training code to our container image
COPY trainer /trainer

# Sets up the entry point to invoke our training code.
ENTRYPOINT ["python", "-m", "trainer.train"]


### Build the container and put it in Google Artifact Registry
Next, we'll create a Docker repository in Google Artifact Registry, build our container, and push it to the newly-created repository.

In [None]:
REPO_NAME=f'{APP_NAME}-app'

!gcloud artifacts repositories create $REPO_NAME --repository-format=docker \
--location=$REGION --description="Docker repository"

! gcloud auth configure-docker $REGION-docker.pkg.dev --quiet

In [None]:
IMAGE_URI = (
    f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPO_NAME}/{APP_NAME}:latest"
)

In [None]:
cd $APPLICATION_DIR

In [None]:
! docker build ./ -t $IMAGE_URI --quiet

In [None]:
! docker push $IMAGE_URI

In [None]:
cd ..

## Configure a hyperparameter tuning job
Now that our training application code is containerized, it's time to specify and run the hyperparameter tuning job.

To create the hyperparameter tuning job, we need to first define the worker_pool_specs, which specifies the machine type and Docker image to use. The following spec includes one n1-standard-4 machine. (For more details, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/reference/rest/v1/CustomJobSpec#WorkerPoolSpec).)

In [None]:
# The spec for the worker pools, including machine type and Docker image

worker_pool_specs = [
    {
        "machine_spec": {
            "machine_type": "n1-standard-4",
        },
        "replica_count": 1,
        "container_spec": {
            "image_uri": IMAGE_URI
        },
    }
]

## Define our custom job spec and hyperparameter tuning spec.
Next, we define our custom job spec (referencing the worker pool specs we just created), and our hyperparameter tuning spec, which includes details such as the hyperparameters and the metrics we want to optimize. (For more details, see the Google Cloud documentation [here](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#aiplatform_create_hyperparameter_tuning_job_python_package_sample-python).)
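
Two settings in the tuning spec are easier to reason about with a little arithmetic: the scale applied to a parameter's range, and the relationship between the total and parallel trial counts. The sketch below is only an illustration under stated assumptions. The mapping functions show the general idea of a linear versus a logarithmic scale over a range and are not the service's implementation, and the "rounds" figure assumes all trials take the same time, so treat it as a rough lower bound.

```python
import math


def from_unit_linear(u, low, high):
    """Map u in [0, 1] linearly onto [low, high]."""
    return low + u * (high - low)


def from_unit_log(u, low, high):
    """Map u in [0, 1] onto [low, high] on a logarithmic scale."""
    return math.exp(math.log(low) + u * (math.log(high) - math.log(low)))


# Midpoint of an eta range of 0.01 to 0.3 under each scale
midpoint_linear = from_unit_linear(0.5, 0.01, 0.3)
midpoint_log = from_unit_log(0.5, 0.01, 0.3)
print(f"linear midpoint: {midpoint_linear:.4f}, log midpoint: {midpoint_log:.4f}")

# With 20 trials and at most 5 running at once, there are at least 4 rounds
max_trial_count = 20
parallel_trial_count = 5
minimum_rounds = math.ceil(max_trial_count / parallel_trial_count)
print(f"minimum sequential rounds: {minimum_rounds}")
```

A logarithmic scale spends more of the search on small values, which is often useful for learning rates. A higher parallel trial count shortens the wall-clock time, but the search then has fewer completed results to learn from when it picks each new set of hyperparameters.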

**IMPORTANT:** If you named your service account anything other than **ai-ml-sa** when you created it at the beginning of Chapter 8 then you will need to replace it in this code cell. (If you followed the recommended naming then you do not need to make a change here.)

In [None]:
# Define custom job
custom_job = aiplatform.CustomJob(
    display_name="xgboost_train",
    worker_pool_specs=worker_pool_specs
)

# Specify service account
service_account_email = f"ai-ml-sa@{PROJECT_ID}.iam.gserviceaccount.com"

# Set the custom service account in the job config
custom_job.service_account_email = service_account_email

In [None]:
# Define the hyperparameter tuning spec
hpt_job = aiplatform.HyperparameterTuningJob(
    display_name="xgboost_hpt",
    custom_job=custom_job,
    metric_spec={
        "auc": "maximize",
    },
    parameter_spec={
        # The SDK documents 'linear', 'log', and 'reverse_log' as the accepted scale values
        "eta": aiplatform.hyperparameter_tuning.DoubleParameterSpec(min=0.01, max=0.3, scale='linear'),
        "max_depth": aiplatform.hyperparameter_tuning.IntegerParameterSpec(min=3, max=10, scale='linear'),
        "gamma": aiplatform.hyperparameter_tuning.DoubleParameterSpec(min=0, max=1, scale='linear'),
    },
    max_trial_count=20,
    parallel_trial_count=5,
)

# Run the hyperparameter tuning job

The following cell will run our job. Because the tuning job includes many trials, it may run for a long time (perhaps an hour or two). The output of this cell will display a link that you can use to view the status of the tuning job in the Google Cloud console, and it will also repeatedly display the current status of the tuning job every few seconds here in this notebook. Wait until the current status says "JOB_STATE_SUCCEEDED HyperparameterTuningJob run completed", and then we will inspect the optimized hyperparameters.

In [None]:
# Run the hyperparameter tuning job as the custom service account defined above
hpt_job.run(service_account=service_account_email)

# Extract the best hyperparameters

In the next cell, we will get a list of all of the trials from our tuning job, then find the best-performing trial, and extract its hyperparameters.
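
The selection logic can be tried out without running a tuning job. The sketch below uses mock trial objects built with `SimpleNamespace`; `make_trial` and `metric_value` are hypothetical helpers, the AUC and parameter values are made up, and the mock objects only mimic the attributes that this notebook reads (`final_measurement.metrics`, `metric_id`, `value`, `parameters`, `parameter_id`). Looking the metric up by its ID rather than by position is a design choice that keeps the code correct if more than one metric is reported.

```python
from types import SimpleNamespace


def make_trial(trial_id, auc, **params):
    """Build a mock trial with the attributes used in this notebook."""
    return SimpleNamespace(
        id=trial_id,
        final_measurement=SimpleNamespace(
            metrics=[SimpleNamespace(metric_id="auc", value=auc)]
        ),
        parameters=[
            SimpleNamespace(parameter_id=name, value=value)
            for name, value in params.items()
        ],
    )


def metric_value(trial, metric_id):
    """Return the final value of the named metric for a trial."""
    for metric in trial.final_measurement.metrics:
        if metric.metric_id == metric_id:
            return metric.value
    raise KeyError(f"Metric {metric_id!r} not found in trial {trial.id}")


# Made-up results for three trials
mock_trials = [
    make_trial("1", 0.91, eta=0.05, max_depth=4.0, gamma=0.1),
    make_trial("2", 0.96, eta=0.12, max_depth=7.0, gamma=0.4),
    make_trial("3", 0.94, eta=0.25, max_depth=9.0, gamma=0.8),
]

mock_best_trial = max(mock_trials, key=lambda trial: metric_value(trial, "auc"))
print(mock_best_trial.id, metric_value(mock_best_trial, "auc"))
```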

In [None]:
# Get the list of trials sorted by the objective metric (auc) in descending order
trials = sorted(hpt_job.trials, key=lambda trial: trial.final_measurement.metrics[0].value, reverse=True)

# The first trial in the sorted list is the best trial
best_trial = trials[0]
best_auc = trials[0].final_measurement.metrics[0].value

# Extract hyperparameters of the best trial
best_hyperparameters = best_trial.parameters

print(f"Best AUC: {best_auc}")
print(f"Best hyperparameters: {best_hyperparameters}")

# Train a model with the best hyperparameters

Now, let's train a model with the best hyperparameters that were produced by our tuning job.

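Note that **num_boost_round** is not a parameter of the model, but rather a parameter of the training function, so the code handles it separately and converts it to an integer type. Our tuning job tuned only `eta`, `max_depth`, and `gamma`, so this special case only takes effect if you add **num_boost_round** to the parameter spec.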
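
One possible way to keep model parameters and training-function arguments apart is sketched below. It is an illustration rather than the code used in the next cell: `split_parameters`, `INTEGER_PARAMS`, and `TRAIN_FUNCTION_PARAMS` are hypothetical names, and the mock parameter objects carry made-up values, with integer-valued hyperparameters arriving as floats.

```python
from types import SimpleNamespace

# Hyperparameters that must be integers, and those that belong to the
# training function rather than to the model's parameter dictionary
INTEGER_PARAMS = {"max_depth", "num_boost_round"}
TRAIN_FUNCTION_PARAMS = {"num_boost_round"}


def split_parameters(parameters):
    """Return (model_params, train_kwargs) from a list of trial parameters."""
    model_params, train_kwargs = {}, {}
    for param in parameters:
        name = param.parameter_id
        value = int(param.value) if name in INTEGER_PARAMS else param.value
        if name in TRAIN_FUNCTION_PARAMS:
            train_kwargs[name] = value
        else:
            model_params[name] = value
    return model_params, train_kwargs


# Mock parameters with made-up values
mock_parameters = [
    SimpleNamespace(parameter_id="eta", value=0.12),
    SimpleNamespace(parameter_id="max_depth", value=7.0),
    SimpleNamespace(parameter_id="gamma", value=0.4),
    SimpleNamespace(parameter_id="num_boost_round", value=50.0),
]

model_params, train_kwargs = split_parameters(mock_parameters)
print(model_params)
print(train_kwargs)
```

With this split, the first dictionary could be passed as the parameter dictionary and the second unpacked as keyword arguments to the training function.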

In [None]:
best_params = {}
for param in best_hyperparameters:
    param_id = param.parameter_id
    if param_id == 'num_boost_round':
        best_params[param_id] = int(param.value)
    else:
        best_params[param_id] = param.value

## Install XGBoost

Let's install XGBoost so we can train a model directly here in our notebook. (Remember that our previous training jobs happened in a Docker container that we had created.)

In [None]:
!pip install xgboost

## Train the model

We will use a modified version of our earlier training code. In this case, we will directly provide the "best_params" to the training job.

The output of this cell will show us the ROC-AUC score achieved against the **validation** dataset for each training round (specified by num_round).

Finally, we will evaluate our model against the **test** dataset, and print the resulting ROC-AUC score for that.
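
For intuition about what the ROC-AUC score measures, the sketch below computes it by hand on a tiny made-up example: it is the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The `roc_auc` function here is a hypothetical helper for illustration only; it compares every pair, so it is far too slow for our real dataset, where we use `roc_auc_score` from scikit-learn instead.

```python
def roc_auc(labels, scores):
    """Compute ROC-AUC by comparing every positive/negative pair."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one example of each class")

    wins = 0.0
    for pos_score in positives:
        for neg_score in negatives:
            if pos_score > neg_score:
                wins += 1.0
            elif pos_score == neg_score:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


example_labels = [0, 0, 1, 1]
example_scores = [0.1, 0.4, 0.35, 0.8]  # made-up predicted probabilities

print(roc_auc(example_labels, example_scores))  # 0.75
```

A score of 1.0 means every fraudulent transaction was ranked above every legitimate one, while 0.5 is what random scores would achieve on average.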

In [None]:
import pandas as pd
import xgboost as xgb
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

data_location=f'{BUCKET_URI}/creditcard.csv'

def train_model(data, hyperparameters):
    X = data.iloc[:,:-1]
    y = data.iloc[:,-1]
    
    # Split the data into training (70%) and held-out (30%) sets
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

    # Split the non-training data into validation and test sets
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42, stratify=y_temp)
        
    dtrain = xgb.DMatrix(X_train, label=y_train)
    dval = xgb.DMatrix(X_val, label=y_val)
    dtest = xgb.DMatrix(X_test, label=y_test)

    # Convert max_depth to int (xgboost expects it as an int)
    hyperparameters['max_depth'] = int(hyperparameters['max_depth'])

    hyperparameters.update({
        'objective': 'binary:logistic',
        'nthread': 4,
        'eval_metric': 'auc'
    })
    
    evallist = [(dval, 'eval')]

    num_round = 10
    model = xgb.train(hyperparameters, dtrain, num_round, evals=evallist)
    
    preds = model.predict(dtest)
    auc = roc_auc_score(y_test, preds)

    print(f'ROC-AUC Score on Test Set: {auc:.4f}')

    return model

def main():
    data = pd.read_csv(data_location)
    model = train_model(data, best_params)

if __name__ == "__main__":
    main()

**Note:** when I ran this, I got an ROC-AUC score of 0.9188, which is pretty good!

# That's it! Well Done!

# Clean up

When you no longer need the resources created by this notebook, you can delete them as follows.

**Note: if you do not delete the resources, you will continue to pay for them.**

In [None]:
clean_up = False  # Set to True if you want to delete the resources

## Delete Vizier resources

In [None]:
if clean_up:
    # Delete HPT job
    try:
        hpt_job.delete()
        print("Deleted HPT job successfully!")
    except Exception as e:
        print(f"Error deleting HPT job: {e}")
else:
    print("clean_up parameter is set to False.")

## Delete artifact repository

In [None]:
if clean_up:
    # Delete the Artifact repository
    ! gcloud artifacts repositories delete $REPO_NAME --location=$REGION --quiet
else:
    print("clean_up parameter is set to False")

## Delete GCS Bucket
The bucket can be reused throughout multiple activities in the book. Sometimes, activities in certain chapters make use of artifacts from previous chapters that are stored in the GCS bucket.

I highly recommend **not deleting the bucket** unless you will be performing no further activities in the book. For this reason, there's a separate `delete_bucket` variable to specify if you want to delete the bucket.

If you want to delete the bucket, set the `delete_bucket` parameter to `True`.

In [None]:
delete_bucket = False

In [None]:
if delete_bucket:
    # Delete the bucket
    ! gcloud storage rm --recursive gs://$BUCKET
else:
    print("delete_bucket parameter is set to False")