### Overview

This is a quick exercise that trains a model on Vertex AI with a custom container and serves predictions with a prebuilt prediction container.

In [6]:
PROJECT_ID = 'jchavezar-demo'
REGION = 'us-central1'
IMAGE_URI=f"{REGION}-docker.pkg.dev/{PROJECT_ID}/trainings/mpg:v1"
PREBUILT_PREDICTION_IMAGE_URI=f"us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-6:latest"
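The `IMAGE_URI` above follows the Artifact Registry naming pattern `REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. A minimal sketch of that composition, where the helper name `artifact_registry_image_uri` is hypothetical and only the values come from the cell above:

```python
def artifact_registry_image_uri(region, project_id, repository, image, tag):
    """Compose a Docker image URI in the Artifact Registry format."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


# Same values as the configuration cell above.
IMAGE_URI_EXAMPLE = artifact_registry_image_uri(
    "us-central1", "jchavezar-demo", "trainings", "mpg", "v1"
)
print(IMAGE_URI_EXAMPLE)
```

Keeping the pieces separate makes it easier to bump the tag (`v1`, `v2`, ...) without retyping the whole URI.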

### Create work directory tree

The tree should look like this:

    .
    ├── ..
    ├── custom_2                    # Main folder for the training container
    │   ├── Dockerfile              # Declarative file to build the container
    │   └── trainer                 # Workdir or folder for code
    │       └── train.py
    └── ...

In [7]:
!rm -fr custom_2
!mkdir custom_2
!mkdir custom_2/trainer
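The same tree can be created without shell commands. This is a sketch using only the standard library; the function name `make_work_tree` is hypothetical, and the demo runs in a temporary directory so it does not touch the real `custom_2` folder:

```python
import shutil
import tempfile
from pathlib import Path


def make_work_tree(root):
    """Recreate <root>/custom_2/trainer from scratch and return the trainer path."""
    base = Path(root) / "custom_2"
    shutil.rmtree(base, ignore_errors=True)  # same effect as `rm -fr custom_2`
    trainer = base / "trainer"
    trainer.mkdir(parents=True)              # creates custom_2 and custom_2/trainer
    return trainer


demo_root = tempfile.mkdtemp()
trainer_dir = make_work_tree(demo_root)
print(trainer_dir.is_dir())
```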

### Create Dockerfile and Training Python File

In [8]:
%%writefile custom_2/Dockerfile

FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6
WORKDIR /

# Copies the trainer code to the docker image.
COPY trainer /trainer

# Sets up the entry point to invoke the trainer.
ENTRYPOINT ["python", "-m", "trainer.train"]

Writing custom_2/Dockerfile
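The entry point runs `trainer.train` with whatever arguments the job passes. The training script in the next cell reads its output location from `--model-dir` and falls back to the `AIP_MODEL_DIR` environment variable, which Vertex AI is expected to set for custom training jobs. A minimal sketch of that fallback, where the bucket path is a made-up placeholder:

```python
import argparse
import os


def parse_args(argv=None):
    """Parse --model-dir, defaulting to the AIP_MODEL_DIR environment variable."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-dir", dest="model_dir",
                        default=os.getenv("AIP_MODEL_DIR"), type=str,
                        help="Model dir.")
    return parser.parse_args(argv)


# Hypothetical value, only for illustration.
os.environ["AIP_MODEL_DIR"] = "gs://example-bucket/model/"

from_env = parse_args([])                               # no flag: env var wins
from_flag = parse_args(["--model-dir", "/tmp/model"])   # explicit flag overrides
print(from_env.model_dir, from_flag.model_dir)
```

Passing `--model-dir` explicitly is handy for local test runs, where `AIP_MODEL_DIR` is normally unset.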


In [9]:
%%writefile custom_2/trainer/train.py

import numpy as np
import pandas as pd
import pathlib
import os
import tensorflow as tf
import argparse
from tensorflow import keras
from tensorflow.keras import layers

print(tf.__version__)

"""## The Auto MPG dataset

The dataset is available from the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/).

### Get the data
First download the dataset.
"""

parser = argparse.ArgumentParser()
parser.add_argument('--model-dir', dest='model_dir', default=os.getenv("AIP_MODEL_DIR"), type=str, help='Model dir.')
args = parser.parse_args()


dataset_path = keras.utils.get_file("auto-mpg.data", "http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data")
dataset_path

"""Import it using pandas"""

column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

dataset.tail()

# Output location for the saved model, taken from --model-dir (defaults to the AIP_MODEL_DIR environment variable)
BUCKET = args.model_dir

"""### Clean the data

The dataset contains a few unknown values.
"""

dataset.isna().sum()

"""To keep this initial tutorial simple drop those rows."""

dataset = dataset.dropna()

"""The `"Origin"` column is really categorical, not numeric. So convert that to a one-hot:"""

dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})

dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
dataset.tail()

"""### Split the data into train and test

Now split the dataset into a training set and a test set.

We will use the test set in the final evaluation of our model.
"""

train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)

"""### Inspect the data

Have a quick look at the joint distribution of a few pairs of columns from the training set.

Also look at the overall statistics:
"""

train_stats = train_dataset.describe()
train_stats.pop("MPG")
train_stats = train_stats.transpose()
train_stats

"""### Split features from labels

Separate the target value, or "label", from the features. This label is the value that you will train the model to predict.
"""

train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

"""### Normalize the data

Look again at the `train_stats` block above and note how different the ranges of each feature are.

It is good practice to normalize features that use different scales and ranges. Although the model *might* converge without feature normalization, it makes training more difficult, and it makes the resulting model dependent on the choice of units used in the input.

Note: Although we intentionally generate these statistics from only the training dataset, these statistics will also be used to normalize the test dataset. We need to do that to project the test dataset into the same distribution that the model has been trained on.
"""

def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

"""This normalized data is what we will use to train the model.

Caution: The statistics used to normalize the inputs here (mean and standard deviation) need to be applied to any other data that is fed to the model, along with the one-hot encoding that we did earlier.  That includes the test set as well as live data when the model is used in production.

## The model

### Build the model

Let's build our model. Here, we'll use a `Sequential` model with two densely connected hidden layers, and an output layer that returns a single, continuous value. The model building steps are wrapped in a function, `build_model`, since we'll create a second model, later on.
"""

def build_model():
  model = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=[len(train_dataset.keys())]),
    layers.Dense(64, activation='relu'),
    layers.Dense(1)
  ])

  optimizer = tf.keras.optimizers.RMSprop(0.001)

  model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
  return model

model = build_model()

"""### Inspect the model

Use the `.summary` method to print a simple description of the model
"""

model.summary()

"""Now try out the model. Take a batch of `10` examples from the training data and call `model.predict` on it.

It seems to be working, and it produces a result of the expected shape and type.

### Train the model

Train the model for 1000 epochs, and record the training and validation accuracy in the `history` object.

Visualize the model's training progress using the stats stored in the `history` object.

This graph shows little improvement, or even degradation in the validation error after about 100 epochs. Let's update the `model.fit` call to automatically stop training when the validation score doesn't improve. We'll use an *EarlyStopping callback* that tests a training condition for  every epoch. If a set amount of epochs elapses without showing improvement, then automatically stop the training.

You can learn more about this callback [here](https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping).
"""

model = build_model()

EPOCHS = 1000

# The patience parameter is the number of epochs to wait for an improvement before stopping
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=10)

early_history = model.fit(normed_train_data, train_labels, 
                    epochs=EPOCHS, validation_split = 0.2, 
                    callbacks=[early_stop])

print("job_done")
print(f"saving on {args.model_dir}")
# Export model and save to GCS
model.save(BUCKET + '/mpg/model')

Writing custom_2/trainer/train.py
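The training script stops early when `val_loss` has not improved for `patience` epochs. The sketch below mimics that rule in plain Python to show when training would stop. It is a simplification and not the Keras implementation: it ignores options such as `min_delta` and `restore_best_weights`, and the loss values are invented.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Invented validation losses: the best value is reached at epoch 6.
losses = [5.0, 4.0, 3.5, 3.6, 3.7, 3.4, 3.5, 3.6, 3.7]
stopped_at = early_stop_epoch(losses, patience=3)
print(stopped_at)
```

With `patience=3` the run ends three epochs after the last improvement, so a larger `patience` tolerates longer plateaus at the cost of extra epochs.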


### Build the docker container

In [10]:
!docker build -t $IMAGE_URI custom_2/.

Sending build context to Docker daemon  9.216kB
Step 1/4 : FROM gcr.io/deeplearning-platform-release/tf2-cpu.2-6
 ---> a2a8733f4caa
Step 2/4 : WORKDIR /
 ---> Using cache
 ---> ea351d920d2a
Step 3/4 : COPY trainer /trainer
 ---> Using cache
 ---> ac87c401b7c6
Step 4/4 : ENTRYPOINT ["python", "-m", "trainer.train"]
 ---> Using cache
 ---> baaca71f526c
Successfully built baaca71f526c
Successfully tagged us-central1-docker.pkg.dev/jchavezar-demo/trainings/mpg:v1


### Push the container to Artifact Registry

In [11]:
!docker push $IMAGE_URI

The push refers to repository [us-central1-docker.pkg.dev/jchavezar-demo/trainings/mpg]

v1: digest: sha256:78f5a9829f1cfb94a62c747fe49240b397cb77038bc3385a2513f5d5cc62fcf2 size: 4918


### Run a Custom Container Training Job with the Vertex AI SDK and Upload the Model to Model Registry

In [12]:
from google.cloud import aiplatform

job = aiplatform.CustomContainerTrainingJob(
    display_name = 'mpg-sdk-1',
    container_uri = IMAGE_URI,
    model_serving_container_image_uri = PREBUILT_PREDICTION_IMAGE_URI,
    staging_bucket = 'gs://vtx-metadata'
)

model = job.run(
    model_display_name = 'mpg-sdk-1',
    is_default_version = False,
    model_version_aliases = ['test1'],
    base_output_dir = 'gs://vtx-models/mpg'
)

Training Output directory:
gs://vtx-models/mpg 
View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/6657772978991792128?project=569083142710
View backing custom job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/691101607051395072?project=569083142710
CustomContainerTrainingJob projects/569083142710/locations/us-central1/trainingPipelines/6657772978991792128 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomContainerTrainingJob projects

In [13]:
print(model)

<google.cloud.aiplatform.models.Model object at 0x7f6620fcc0d0> 
resource name: projects/569083142710/locations/us-central1/models/7554555278448918528


### Create the Endpoint where the model will live

In [16]:
endpoint = aiplatform.Endpoint.create(
    display_name="mpg-sdk-endpoint",
    labels={
        "endpoint": "miles_per_gallon",
        "type": "custom_container",
    },
)

Creating Endpoint
Create Endpoint backing LRO: projects/569083142710/locations/us-central1/endpoints/5419158561773060096/operations/9172020789388509184
Endpoint created. Resource name: projects/569083142710/locations/us-central1/endpoints/5419158561773060096
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/569083142710/locations/us-central1/endpoints/5419158561773060096')


### Deploy the model to the Endpoint

In [22]:
# Look up the registered model once, then select the version aliased 'test1'.
_model = aiplatform.Model.list(filter='display_name="mpg-sdk-1"')[0]
model = aiplatform.Model(
    model_name=_model.resource_name,
    version='test1')

endpoint.deploy(
    model=model,
    min_replica_count=1,
    machine_type="n1-standard-4"
)

Deploying Model projects/569083142710/locations/us-central1/models/7554555278448918528 to Endpoint : projects/569083142710/locations/us-central1/endpoints/5419158561773060096
Deploy Endpoint model backing LRO: projects/569083142710/locations/us-central1/endpoints/5419158561773060096/operations/8176725271739629568
Endpoint model deployed. Resource name: projects/569083142710/locations/us-central1/endpoints/5419158561773060096


In [51]:
model = aiplatform.Model(
    model_name=aiplatform.Model.list(filter='display_name="mpg-sdk-1"')[0].resource_name,
    version='test1')

In [31]:
aiplatform.Endpoint.list(filter='display_name="mpg-sdk-endpoint"')[0].resource_name.split('/')[5]

'5419158561773060096'
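Indexing with `split('/')[5]` works because the resource name alternates collection and ID segments. A slightly more defensive sketch turns the name into a dictionary, so the ID is looked up by collection rather than by position. The helper name `parse_resource_name` is hypothetical and not part of the SDK.

```python
def parse_resource_name(resource_name):
    """Split 'projects/<p>/locations/<l>/endpoints/<e>' into a dict."""
    parts = resource_name.split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"Unexpected resource name: {resource_name!r}")
    # Even positions are collection names, odd positions are their IDs.
    return dict(zip(parts[0::2], parts[1::2]))


name = "projects/569083142710/locations/us-central1/endpoints/5419158561773060096"
parsed = parse_resource_name(name)
print(parsed["endpoints"])
```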

### Look up the Endpoint and Predict

In [33]:
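# The endpoint ID is the last segment of the endpoint resource name.
ENDPOINT_ID=aiplatform.Endpoint.list(filter='display_name="mpg-sdk-endpoint"')[0].resource_name.split('/')[5]
# Resource names use the project number, so PROJECT_ID is overridden with it here.
PROJECT_ID="569083142710"
INPUT_DATA_FILE="INPUT-JSON"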

endpoint = aiplatform.Endpoint(
    endpoint_name=f"projects/{PROJECT_ID}/locations/us-central1/endpoints/{ENDPOINT_ID}"
)

In [34]:
# One instance with 9 feature values, already normalized with the training statistics.
test_mpg = [1.4838871833555929,
 1.8659883497083019,
 2.234620276849616,
 1.0187816540094903,
 -2.530890710602246,
 -1.6046416850441676,
 -0.4651483719733302,
 -0.4952254087173721,
 0.7746763768735953]

response = endpoint.predict([test_mpg])

print('API response: ', response)

print('Predicted MPG: ', response.predictions[0][0])


API response:  Prediction(predictions=[[16.1221714]], deployed_model_id='6209447937399848960', model_version_id='', model_resource_name='projects/569083142710/locations/us-central1/models/7554555278448918528', explanations=None)
Predicted MPG:  16.1221714
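The instance sent above is already normalized. A raw row would first need the same transformation used in `train.py`: subtract the training mean and divide by the training standard deviation, feature by feature. A minimal sketch with the standard library only; the two-feature training rows are invented for illustration, and the real model expects the nine features in the training column order.

```python
import statistics


def fit_stats(rows):
    """Per-feature mean and sample standard deviation of the training rows."""
    columns = list(zip(*rows))
    means = [statistics.mean(col) for col in columns]
    stds = [statistics.stdev(col) for col in columns]  # sample std, like pandas describe()
    return means, stds


def normalize(row, means, stds):
    """Apply (x - mean) / std to each feature of a single row."""
    return [(x - m) / s for x, m, s in zip(row, means, stds)]


# Invented training data: [cylinders, weight].
train_rows = [[4.0, 2000.0], [6.0, 3000.0], [8.0, 4000.0]]
means, stds = fit_stats(train_rows)

instance = normalize([8.0, 2000.0], means, stds)
print(instance)
```

In practice the means and standard deviations would be saved at training time and reloaded wherever prediction requests are built.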


### Undeploy All the Models

In [40]:
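# Caution: this undeploys the models from every endpoint listed for the current project and region.
for ep in aiplatform.Endpoint.list():
    ep.undeploy_all()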

Undeploying Endpoint model: projects/569083142710/locations/us-central1/endpoints/5759180333639532544
Undeploy Endpoint model backing LRO: projects/569083142710/locations/us-central1/endpoints/5759180333639532544/operations/2006793782242050048
Endpoint model undeployed. Resource name: projects/569083142710/locations/us-central1/endpoints/5759180333639532544
