# 08f - Vertex AI > Training > Training Pipelines - With Custom Container

# <IN ACTIVE DEVELOPMENT - NOT COMPLETE>

Dev Notes:
- Python Kernel
- Orchestrates Vertex AI Services Sequentially

Workflow:
- code to script
- script to container: training and serving in one
- training pipeline job
- model to Vertex AI Model Registry
- deploy model to endpoint for online predictions
- batch prediction job with Vertex AI

Next Steps:
- Pipeline to conduct steps
- use [reticulate](https://rstudio.github.io/reticulate/) for an R centric workflow
    
Resource:
- https://cloud.google.com/blog/products/ai-machine-learning/train-and-deploy-ml-models-with-r-and-plumber-on-vertex-ai
- https://cloud.google.com/vertex-ai/docs/workbench/user-managed/use-r-bigquery

---
## Setup

### Package Installs (if needed)

This notebook uses the Python clients for:
- Google Service Usage
    - to enable APIs (Artifact Registry and Cloud Build)
- Artifact Registry
    - to create repositories for Python packages and Docker containers
- Cloud Build
    - to build custom Docker containers

The cells below check whether the required Python libraries are installed.  If a library is missing, the cell prints a message and runs the matching `pip install` command.  Complete these installs before continuing with this notebook.

In [2]:
try:
    import google.cloud.service_usage_v1
except ImportError:
    print('You need to pip install google-cloud-service-usage')
    !pip install google-cloud-service-usage -q

In [3]:
try:
    import google.cloud.artifactregistry_v1
except ImportError:
    print('You need to pip install google-cloud-artifact-registry')
    !pip install google-cloud-artifact-registry -q

In [4]:
try:
    import google.cloud.devtools.cloudbuild
except ImportError:
    print('You need to pip install google-cloud-build')
    !pip install google-cloud-build
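
The three cells above repeat one pattern. As a rough sketch, the same check can be driven from a single mapping of module name to pip package name. `missing_packages` is a hypothetical helper, not part of this notebook; it only reports what is missing and leaves the install itself to the cells above.

```python
import importlib.util

# module name -> pip package name, copied from the cells above
REQUIRED = {
    'google.cloud.service_usage_v1': 'google-cloud-service-usage',
    'google.cloud.artifactregistry_v1': 'google-cloud-artifact-registry',
    'google.cloud.devtools.cloudbuild': 'google-cloud-build',
}

def missing_packages(required):
    """Return the pip names of modules that cannot be located (hypothetical helper)."""
    missing = []
    for module_name, pip_name in required.items():
        try:
            found = importlib.util.find_spec(module_name) is not None
        except (ImportError, ValueError):
            # a missing parent package (for example `google`) raises instead of returning None
            found = False
        if not found:
            missing.append(pip_name)
    return missing

for pip_name in missing_packages(REQUIRED):
    print(f'You need to pip install {pip_name}')
```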

### Environment

inputs:

In [26]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [27]:
REGION = 'us-central1'
EXPERIMENT = '08f'
SERIES = '08'

# source data
BQ_PROJECT = PROJECT_ID
BQ_DATASET = 'fraud'
BQ_TABLE = 'fraud_prepped'

# Resources
TRAIN_COMPUTE = 'n1-standard-4'
DEPLOY_COMPUTE = 'n1-standard-4'

# Model Training
VAR_TARGET = 'Class'
VAR_OMIT = 'transaction_id' # add more variables to the string with space delimiters

packages:

In [49]:
from google.cloud import aiplatform
from datetime import datetime
import os, shutil, glob
import pkg_resources
from IPython.display import Markdown as md
from google.cloud import service_usage_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import artifactregistry_v1
from google.cloud import storage
from google.cloud import bigquery
from google.protobuf import json_format
from google.protobuf.struct_pb2 import Value
import json
import numpy as np
import pandas as pd

clients:

In [29]:
aiplatform.init(project=PROJECT_ID, location=REGION)
bq = bigquery.Client()
gcs = storage.Client()
su_client = service_usage_v1.ServiceUsageClient()
ar_client = artifactregistry_v1.ArtifactRegistryClient()
cb_client = cloudbuild_v1.CloudBuildClient()

parameters:

In [30]:
TIMESTAMP = datetime.now().strftime("%Y%m%d%H%M%S")
BUCKET = PROJECT_ID
URI = f"gs://{BUCKET}/{SERIES}/{EXPERIMENT}"
DIR = f"temp/{EXPERIMENT}"

In [31]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

List the service account's current roles:

In [32]:
!gcloud projects get-iam-policy $PROJECT_ID --filter="bindings.members:$SERVICE_ACCOUNT" --format='table(bindings.role)' --flatten="bindings[].members"

ROLE
roles/bigquery.admin
roles/owner
roles/run.admin
roles/storage.objectAdmin


>Note: If the resulting list is missing [roles/storage.objectAdmin](https://cloud.google.com/storage/docs/access-control/iam-roles) then [revisit the setup notebook](../00%20-%20Setup/00%20-%20Environment%20Setup.ipynb#permissions) and add this permission to the service account with the provided instructions.

environment:

In [33]:
!rm -rf {DIR}
!mkdir -p {DIR}

Experiment Tracking:

In [34]:
FRAMEWORK = 'r'
TASK = 'classification'
MODEL_TYPE = 'logistic_regression'
EXPERIMENT_NAME = f'experiment-{SERIES}-{EXPERIMENT}-{FRAMEWORK}-{TASK}-{MODEL_TYPE}'
RUN_NAME = f'run-{TIMESTAMP}'

### Enable APIs

Using Cloud Build and Artifact Registry requires enabling these APIs for the Google Cloud Project.

Options for enabling these APIs (this notebook uses option 2):
 1. Use the APIs & Services page in the console: https://console.cloud.google.com/apis
     - `+ Enable APIs and Services`
     - Search for Cloud Build and Enable
     - Search for Artifact Registry and Enable
 2. Use [Google Service Usage](https://cloud.google.com/service-usage/docs) API from Python
     - [Python Client For Service Usage](https://github.com/googleapis/python-service-usage)
     - [Python Client Library Documentation](https://cloud.google.com/python/docs/reference/serviceusage/latest)
     
The following code cells use the Service Usage Client to:
- get the state of the service
- if 'DISABLED':
    - Try enabling the service and return the state after trying
- if 'ENABLED' print the state for confirmation

#### Artifact Registry

In [35]:
artifactregistry = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
    )
).state.name


if artifactregistry == 'DISABLED':
    print(f'Artifact Registry is currently {artifactregistry} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/artifactregistry.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Artifact Registry is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Artifact Registry already enabled for project: {PROJECT_ID}')

Artifact Registry already enabled for project: statmike-mlops-349915


#### Cloud Build

In [36]:
cloudbuild = su_client.get_service(
    request = service_usage_v1.GetServiceRequest(
        name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
    )
).state.name


if cloudbuild == 'DISABLED':
    print(f'Cloud Build is currently {cloudbuild} for project: {PROJECT_ID}')
    print(f'Trying to Enable...')
    operation = su_client.enable_service(
        request = service_usage_v1.EnableServiceRequest(
            name = f'projects/{PROJECT_ID}/services/cloudbuild.googleapis.com'
        )
    )
    response = operation.result()
    if response.service.state.name == 'ENABLED':
        print(f'Cloud Build is now enabled for project: {PROJECT_ID}')
    else:
        print(response)
else:
    print(f'Cloud Build already enabled for project: {PROJECT_ID}')

Cloud Build already enabled for project: statmike-mlops-349915
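
The Artifact Registry and Cloud Build cells above differ only in the service name. A minimal sketch of the shared logic follows, assuming `client` behaves like `service_usage_v1.ServiceUsageClient` and that requests are passed as plain dicts. `ensure_service_enabled` is a hypothetical name introduced here for illustration.

```python
def ensure_service_enabled(client, project_id, service):
    """Enable `service` (for example 'cloudbuild.googleapis.com') when it is disabled.

    `client` is assumed to expose get_service / enable_service like
    service_usage_v1.ServiceUsageClient. Returns the final state name.
    """
    name = f'projects/{project_id}/services/{service}'
    state = client.get_service(request={'name': name}).state.name
    if state == 'DISABLED':
        print(f'{service} is currently {state} for project: {project_id}')
        print('Trying to Enable...')
        operation = client.enable_service(request={'name': name})
        # result() blocks until the long-running operation finishes
        state = operation.result().service.state.name
    print(f'{service} is {state} for project: {project_id}')
    return state

# Possible usage in this notebook (not executed here):
# for service in ['artifactregistry.googleapis.com', 'cloudbuild.googleapis.com']:
#     ensure_service_enabled(su_client, PROJECT_ID, service)
```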


---
## Training & Serving

### R Script for Training

This notebook trains the same R model as [08 - Vertex AI Custom Model - R - in Notebook](./08%20-%20Vertex%20AI%20Custom%20Model%20-%20R%20-%20in%20Notebook.ipynb). The training code was first modified and saved as an R script, as shown in [08 - Vertex AI Custom Model - R - Notebook to Script](08%20-%20Vertex%20AI%20Custom%20Model%20-%20R%20-%20Notebook%20to%20Script.ipynb), which stores the script in [`./code/train.R`](./code/train.R).

**Review the script:**

In [37]:
SCRIPT_PATH = './code/train.R'

with open(SCRIPT_PATH, 'r') as file:
    data = file.read()
md(f"```R\n\n{data}\n```")

```R


# library import
library(bigrquery)
library(dplyr)

# inputs
args <- commandArgs(trailingOnly = TRUE)
project_id <- args[1]
region <- args[2]
experiment <- args[3]
series <- args[4]
bq_project <- args[5]
bq_dataset <- args[6]
bq_table <- args[7]
var_target <- args[8]
var_omit <- args[9]

# data source
get_data <- function(s){
    query = sprintf('SELECT * EXCEPT(%s, splits) FROM `%s.%s.%s` WHERE splits = "%s"', var_omit, bq_project, bq_dataset, bq_table, s)
    table <- bq_project_query(bq_project, query)
    ds <- bq_table_download(table)
    return(ds)
}
train <- get_data("TRAIN")
test <- get_data("TEST")

# logistic regression model
model <- glm(
    Class ~ .,
    data = train,
    family = binomial)

# predictions for evaluation
preds <- predict(model, test, type = "response")

# evaluate
actual <- test[, var_target]
names(actual) <- 'actual'
pred <- tibble(round(preds))
names(pred) <- 'pred'
results <- cbind(actual, pred)
cm <- table(results)

# save model to file
saveRDS(model, "model.rds")

```

Make a copy of the script in the notebook's temp folder and append code that saves the model to the GCS model directory:

In [41]:
shutil.copyfile(SCRIPT_PATH, f'./{DIR}/train.R')

'./temp/08f/train.R'

In [43]:
%%writefile -a './temp/08f/train.R'

# use Vertex AI Training Pre-Defined Environment Variables to Write to GCS
Sys.getenv()
system2('gsutil', c('cp', 'model.rds', Sys.getenv('AIP_MODEL_DIR')))

Appending to ./temp/08f/train.R
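
The appended lines rely on `AIP_MODEL_DIR`, which Vertex AI sets inside the training container when the job is given a base output directory (this notebook passes `base_output_dir` when it runs the training job). As a hedged sketch of the documented layout, the model, checkpoint, and TensorBoard log directories are subfolders of that base. `expected_training_dirs` is a hypothetical helper and the bucket name is a placeholder.

```python
def expected_training_dirs(base_output_dir):
    """Map a base output directory to the AIP_* paths Vertex AI is documented to derive from it."""
    base = base_output_dir.rstrip('/')
    return {
        'AIP_MODEL_DIR': f'{base}/model/',
        'AIP_CHECKPOINT_DIR': f'{base}/checkpoints/',
        'AIP_TENSORBOARD_LOG_DIR': f'{base}/logs/',
    }

# placeholder bucket; in this notebook the base is f"{URI}/models/{TIMESTAMP}"
dirs = expected_training_dirs('gs://my-bucket/08/08f/models/20221016190551')
print(dirs['AIP_MODEL_DIR'])
```

Under that layout, `model.rds` would land in the `model/` subfolder of the training output directory.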


### R Script for Serving

To serve the model, another script that uses [plumber](https://www.rplumber.io/) is created:

**Create the script:**

In [188]:
%%writefile './temp/08f/serve.R'

# library import
library(plumber)

# use Vertex AI Prediction pre-defined environment variables to read the model artifacts from GCS
Sys.getenv()
system2('gsutil', c('cp', '-r', Sys.getenv('AIP_STORAGE_URI'), '.'))
system("du -a .")

# import model
model <- readRDS('artifacts/model.rds')

# prediction route function
predict_route <- function(req, res){
    print("Processing Prediction Request...")
    df <- as.data.frame(req$body$instances)
    preds <- predict(model, df, type = "response")
    return(list(predictions = preds))
}

# serving
print("Start Serving Process...")
pr() %>%
    pr_get(Sys.getenv("AIP_HEALTH_ROUTE"), function() "OK") %>%
    pr_post(Sys.getenv("AIP_PREDICT_ROUTE"), predict_route) %>%
    pr_run(host = "0.0.0.0", port=as.integer(Sys.getenv("AIP_HTTP_PORT", 8080)))

Overwriting ./temp/08f/serve.R


### Creating a Custom Container with Cloud Build

Cloud Build creates and manages the build on GCP.  The API creates a build by providing:
- location of the source
- instructions
- location to store the built artifacts

The instruction part of Cloud Build has options:
- Dockerfile
- Build Config file (YAML or JSON)
- Cloud Native Buildpacks

This notebook drives Cloud Build from the Python client and does not reference any local files.  The first step is therefore to create a Dockerfile for the workflow and store it in GCS.  The next step runs Cloud Build, with the build config specified through the client rather than in a config file.  The build steps start by getting the code (git clone, or copy from GCS) along with the Dockerfile.

There are many workflows for creating containers with ML training code.  Many of the most common ones are explored in the tips notebook [Python Custom Containers](../Tips/Python%20Custom%20Containers.ipynb).  The method used here is the simplest: copy the training code directly into the container.  The other methods include packaging the training code as a Python distribution and using `pip install` from GCS, GitHub, or even Artifact Registry as a private repository.

#### Create the Dockerfile
A basic Dockerfile that takes the base image, copies the code in, and adds additional installs:

**Choose a BASE Image**

Here the base image is a pre-built deep-learning container from Google Cloud with the latest updates for `R` version 4.1.  Other sources for containers can be found here:
- [Deep Learning Containers](https://cloud.google.com/deep-learning-containers/docs/choosing-container)
    - list these with `gcloud container images list --repository="gcr.io/deeplearning-platform-release"`
- [Vertex AI Pre-Built Containers for Custom Training](https://cloud.google.com/vertex-ai/docs/training/pre-built-containers)
    - list these with `gcloud container images list --repository="us-docker.pkg.dev/vertex-ai/training"`
- [Vertex AI Pre-Built Containers for Prediction](https://cloud.google.com/vertex-ai/docs/predictions/pre-built-containers)
    - list these with `gcloud container images list --repository="us-docker.pkg.dev/vertex-ai/prediction"`

In [189]:
%%writefile './temp/08f/Dockerfile'
FROM gcr.io/deeplearning-platform-release/r-cpu.4-1:latest

WORKDIR /root

# copy code into /code folder
COPY ./*.R ./code/

RUN apt-get update && apt-get install -y gfortran
RUN R -e 'install.packages(c("plumber"))'

EXPOSE 8080

Overwriting ./temp/08f/Dockerfile


#### Store Resources in Cloud Storage

Copy from local folder (`DIR`) to GCS at the path `SERIES/EXPERIMENT/models/TIMESTAMP`:

In [190]:
!ls {DIR}

Dockerfile  serve.R  train.R


In [191]:
bucket = gcs.lookup_bucket(PROJECT_ID)
SOURCEPATH = f'{SERIES}/{EXPERIMENT}/models/{TIMESTAMP}'

In [192]:
for file in [f for f in os.listdir(DIR) if not f.startswith('.')]:
    print(file)
    blob = bucket.blob(f'{SOURCEPATH}/code/{file}')
    blob.upload_from_filename(f'{DIR}/{file}')

train.R
serve.R
Dockerfile


In [193]:
list(bucket.list_blobs(prefix = f'{SOURCEPATH}/code'))

[<Blob: statmike-mlops-349915, 08/08f/models/20221016190551/code/Dockerfile, 1665967041644546>,
 <Blob: statmike-mlops-349915, 08/08f/models/20221016190551/code/serve.R, 1665967041576709>,
 <Blob: statmike-mlops-349915, 08/08f/models/20221016190551/code/train.R, 1665967041497161>]

In [194]:
print(f"View the bucket directly here:\nhttps://console.cloud.google.com/storage/browser/{PROJECT_ID}/{SOURCEPATH}/code;tab=objects&project={PROJECT_ID}")

View the bucket directly here:
https://console.cloud.google.com/storage/browser/statmike-mlops-349915/08/08f/models/20221016190551/code;tab=objects&project=statmike-mlops-349915


#### Setup Artifact Registry

Artifact Registry organizes artifacts with repositories.  Each repository contains packages and is designated to hold a particular format of package: Docker images, Python packages, and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

##### List Repositories

This list may be empty if no repositories have been created for this project.

In [195]:
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


#### Create Docker Image Repository

Create an Artifact Registry repository to hold the Docker images created by this notebook.  First, check whether it was already created by a previous run and retrieve it if so.  Otherwise, create it.

In [196]:
docker_repo = None
for repo in ar_client.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if repo.name.endswith(PROJECT_ID):
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar_client.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{PROJECT_ID}',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {PROJECT_ID} project that holds docker images.',
                name = f'{PROJECT_ID}',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES, 'experiment': EXPERIMENT}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915


In [197]:
print(docker_repo.format_.name, docker_repo.name)

DOCKER projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915


In [198]:
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"
REPOSITORY

'us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915'
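
Artifact Registry Docker image names follow the pattern `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE[:TAG]`. The notebook appends `EXPERIMENT` as the image name with no tag, so the image resolves to `latest`. The sketch below shows the same composition with an optional tag. `image_uri` is a hypothetical helper, and tagging with the run timestamp is an assumption about what might be useful, not something this notebook does.

```python
def image_uri(region, project_id, repository, image, tag=None):
    """Compose an Artifact Registry Docker image URI, optionally with a tag."""
    uri = f'{region}-docker.pkg.dev/{project_id}/{repository}/{image}'
    return f'{uri}:{tag}' if tag else uri

# same shape as f"{REPOSITORY}/{EXPERIMENT}" in this notebook
print(image_uri('us-central1', 'my-project', 'my-project', '08f'))
# hypothetical variant: pin each build to its run timestamp
print(image_uri('us-central1', 'my-project', 'my-project', '08f', tag='20221016190551'))
```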

In [199]:
print(f'View the repository directly in the console here:\nhttps://console.cloud.google.com/artifacts/docker/{PROJECT_ID}/{REGION}/{PROJECT_ID}?project={PROJECT_ID}')

View the repository directly in the console here:
https://console.cloud.google.com/artifacts/docker/statmike-mlops-349915/us-central1/statmike-mlops-349915?project=statmike-mlops-349915


#### Build Custom Container
Use the Cloud Build client to construct and run the build instructions.  Here the files collected in GCS are copied to the build instance, then the Docker build is run in the folder with the `Dockerfile`.  The resulting image is pushed to Artifact Registry (set up above).

In [200]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{PROJECT_ID}/{SOURCEPATH}/code/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{EXPERIMENT}', '/workspace']
    }    
)
# docker push
build.images = [f"{REPOSITORY}/{EXPERIMENT}"]

In [201]:
build

steps {
  name: "gcr.io/cloud-builders/gsutil"
  args: "cp"
  args: "-r"
  args: "gs://statmike-mlops-349915/08/08f/models/20221016190551/code/*"
  args: "/workspace"
}
steps {
  name: "gcr.io/cloud-builders/docker"
  args: "build"
  args: "-t"
  args: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/08f"
  args: "/workspace"
}
images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/08f"
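
The build shown above has the same structure as a Cloud Build config file: a list of steps plus the images to push. As a sketch, the config can be held as a plain dictionary so it can be inspected or serialized before submitting. `make_build_config` is a hypothetical helper and the argument values below are placeholders.

```python
import json

def make_build_config(bucket, source_path, image):
    """Return the two-step build (copy code from GCS, docker build) as a plain dict."""
    return {
        'steps': [
            {
                # retrieve the source
                'name': 'gcr.io/cloud-builders/gsutil',
                'args': ['cp', '-r', f'gs://{bucket}/{source_path}/code/*', '/workspace'],
            },
            {
                # docker build in the folder that holds the Dockerfile
                'name': 'gcr.io/cloud-builders/docker',
                'args': ['build', '-t', image, '/workspace'],
            },
        ],
        # images listed here are pushed after the steps succeed
        'images': [image],
    }

config = make_build_config(
    'my-bucket', '08/08f/models/20221016190551',
    'us-central1-docker.pkg.dev/my-project/my-project/08f')
print(json.dumps(config, indent=2))
```

With the client library installed, `cloudbuild_v1.Build(**config)` should produce an equivalent message, though that call is an assumption not exercised in this sketch.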

In [202]:
operation = cb_client.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [203]:
response = operation.result()
response.status, response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/statmike-mlops-349915/08f")

In [204]:
print(f"Review the Custom Container with Artifact Registry in the Google Cloud Console:\nhttps://console.cloud.google.com/artifacts/docker/{PROJECT_ID}/{REGION}/{docker_repo.name.split('/')[-1]}?project={PROJECT_ID}")

Review the Custom Container with Artifact Registry in the Google Cloud Console:
https://console.cloud.google.com/artifacts/docker/statmike-mlops-349915/us-central1/statmike-mlops-349915?project=statmike-mlops-349915


### Setup Training Job

In [205]:
CMDARGS = [
    "--project_id=" + PROJECT_ID,
    "--region=" + REGION,
    "--experiment=" + EXPERIMENT,
    "--series=" + SERIES,
    "--bq_project=" + BQ_PROJECT,
    "--bq_dataset=" + BQ_DATASET,
    "--bq_table=" + BQ_TABLE,
    "--var_target=" + VAR_TARGET,
    "--var_omit=" + VAR_OMIT,
]

R code using `commandArgs()` does not use named parameters, so reduce `CMDARGS` to positional values for the `R` script:

In [159]:
CMDARGS = [c.split('=')[-1] for c in CMDARGS]
CMDARGS

['statmike-mlops-349915',
 'us-central1',
 '08f',
 '08',
 'statmike-mlops-349915',
 'fraud',
 'fraud_prepped',
 'Class',
 'transaction_id']
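
Because `train.R` reads `args[1]` through `args[9]` by position, the order of the values matters, and `split('=')[-1]` would truncate any value that itself contains `=`. A small sketch that guards against both follows. `to_positional` and `R_ARG_ORDER` are hypothetical names introduced for illustration.

```python
# order in which train.R reads args[1] ... args[9]
R_ARG_ORDER = [
    'project_id', 'region', 'experiment', 'series',
    'bq_project', 'bq_dataset', 'bq_table', 'var_target', 'var_omit',
]

def to_positional(named_args, order=R_ARG_ORDER):
    """Turn ['--name=value', ...] into values sorted by the order the R script expects."""
    values = {}
    for arg in named_args:
        # partition splits on the first '=' only, so values may contain '='
        name, sep, value = arg.partition('=')
        if not sep:
            raise ValueError(f'expected --name=value, got {arg!r}')
        values[name.lstrip('-')] = value
    missing = [name for name in order if name not in values]
    if missing:
        raise KeyError(f'missing arguments: {missing}')
    return [values[name] for name in order]

# the input order does not matter; the output follows `order`
print(to_positional(['--region=us-central1', '--project_id=my-project'],
                    order=['project_id', 'region']))
```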

In [162]:
trainingJob = aiplatform.CustomContainerTrainingJob(
    display_name = f'{SERIES}_{EXPERIMENT}_{TIMESTAMP}',
    container_uri = f"{REPOSITORY}/{EXPERIMENT}",
    command = ["Rscript", "./code/train.R"],
    model_serving_container_image_uri = f"{REPOSITORY}/{EXPERIMENT}",
    model_serving_container_command = ["Rscript", "./code/serve.R"],
    staging_bucket = f"{URI}/models/{TIMESTAMP}",
    labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'}
)

### Run Training Job AND Upload The Model
The training job will automatically upload the model to the Vertex AI Model Registry and return the link to the model.

In [163]:
modelmatch = aiplatform.Model.list(filter = f'display_name={SERIES}_{EXPERIMENT} AND labels.series={SERIES} AND labels.experiment={EXPERIMENT}')

upload_model = True
parent_model = None
if modelmatch:
    print("Model Already in Registry:")
    if RUN_NAME in modelmatch[0].version_aliases:
        print("This version already loaded, no action taken.")
        upload_model = False
        model = aiplatform.Model(model_name = modelmatch[0].resource_name)
    else:
        print('Loading model as new default version.')
        # pass the resource name of the existing model as the parent
        parent_model = modelmatch[0].resource_name
else:
    print('This is a new model, adding to model registry as version 1')

# only train and upload when this run is not already in the registry
if upload_model:
    model = trainingJob.run(
        model_display_name = f'{SERIES}_{EXPERIMENT}',
        model_labels = {'series' : f'{SERIES}', 'experiment' : f'{EXPERIMENT}', 'experiment_name' : f'{EXPERIMENT_NAME}', 'run_name' : f'{RUN_NAME}'},
        model_id = f'model_{SERIES}_{EXPERIMENT}',
        parent_model = parent_model,
        is_default_version = True,
        model_version_aliases = [RUN_NAME],
        model_version_description = RUN_NAME,
        base_output_dir = f"{URI}/models/{TIMESTAMP}",
        service_account = SERVICE_ACCOUNT,
        args = CMDARGS,
        replica_count = 1,
        machine_type = TRAIN_COMPUTE,
        accelerator_count = 0
    )

This is a new model, adding to model registry as version 1
Training Output directory:
gs://statmike-mlops-349915/08/08f/models/20221016190551 
View Training:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/4805088739363651584?project=1026793852137
CustomContainerTrainingJob projects/1026793852137/locations/us-central1/trainingPipelines/4805088739363651584 current state:
PipelineState.PIPELINE_STATE_RUNNING
View backing custom job:
https://console.cloud.google.com/ai/platform/locations/us-central1/training/6197985253802377216?project=1026793852137
CustomContainerTrainingJob projects/1026793852137/locations/us-central1/trainingPipelines/4805088739363651584 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomContainerTrainingJob projects/1026793852137/locations/us-central1/trainingPipelines/4805088739363651584 current state:
PipelineState.PIPELINE_STATE_RUNNING
CustomContainerTrainingJob projects/1026793852137/locations/us-central1/trainingPipelines/48050

Get the backing Custom Job for the Training Pipeline:

In [164]:
clientPL = aiplatform.gapic.PipelineServiceClient(client_options = {'api_endpoint': f'{REGION}-aiplatform.googleapis.com'})

In [165]:
from google.protobuf.json_format import MessageToDict

backingCustomJob = MessageToDict(clientPL.get_training_pipeline(name = trainingJob.resource_name)._pb)['trainingTaskMetadata']['backingCustomJob']

In [166]:
customJob = aiplatform.CustomJob.get(backingCustomJob)
customJob.resource_name, customJob.display_name

('projects/1026793852137/locations/us-central1/customJobs/6197985253802377216',
 '08_08f_20221016190551-custom-job')

Create hyperlinks to the training pipeline, the custom job, and the model:

In [167]:
job_link = f"https://console.cloud.google.com/vertex-ai/locations/{REGION}/training/{customJob.resource_name.split('/')[-1]}/cpu?cloudshell=false&project={PROJECT_ID}"

print(f'Review the Training Pipeline here:\nhttps://console.cloud.google.com/vertex-ai/training/training-pipelines?project={PROJECT_ID}')
print(f'Review the Custom Job here:\n{job_link}')
print(f'Review the model in the Vertex AI Model Registry:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/models/{model.name}?project={PROJECT_ID}')

Review the Training Pipeline here:
https://console.cloud.google.com/vertex-ai/training/training-pipelines?project=statmike-mlops-349915
Review the Custom Job here:
https://console.cloud.google.com/vertex-ai/locations/us-central1/training/6197985253802377216/cpu?cloudshell=false&project=statmike-mlops-349915
Review the model in the Vertex AI Model Registry:
https://console.cloud.google.com/vertex-ai/locations/us-central1/models/model_08_08f?project=statmike-mlops-349915


---
## Serving

### Create/Retrieve The Endpoint For This Series

In [211]:
endpoints = aiplatform.Endpoint.list(filter = f"labels.series={SERIES}")
if endpoints:
    endpoint = endpoints[0]
    print(f"Endpoint Exists: {endpoints[0].resource_name}")
else:
    endpoint = aiplatform.Endpoint.create(
        display_name = f"{SERIES}",
        labels = {'series' : f"{SERIES}"}    
    )
    print(f"Endpoint Created: {endpoint.resource_name}")
    
print(f'Review the Endpoint in the Console:\nhttps://console.cloud.google.com/vertex-ai/locations/{REGION}/endpoints/{endpoint.name}?project={PROJECT_ID}')

Endpoint Exists: projects/1026793852137/locations/us-central1/endpoints/6678420432971890688
Review the Endpoint in the Console:
https://console.cloud.google.com/vertex-ai/locations/us-central1/endpoints/6678420432971890688?project=statmike-mlops-349915


In [212]:
endpoint.display_name

'08'

In [213]:
endpoint.traffic_split

{'6117352843457331200': 100}

In [214]:
deployed_models = endpoint.list_models()
#deployed_models

### Deploy Model To Endpoint

In [215]:
print(f'Deploying model with 100% of traffic...')
endpoint.deploy(
    model = model,
    deployed_model_display_name = model.display_name,
    traffic_percentage = 100,
    machine_type = DEPLOY_COMPUTE,
    min_replica_count = 1,
    max_replica_count = 1
)

Deploying model with 100% of traffic...
Deploying Model projects/1026793852137/locations/us-central1/models/model_08_08f to Endpoint : projects/1026793852137/locations/us-central1/endpoints/6678420432971890688
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/6678420432971890688/operations/6347399606887776256
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/6678420432971890688


### Remove Deployed Models without Traffic

In [216]:
for deployed_model in endpoint.list_models():
    if deployed_model.id in endpoint.traffic_split:
        print(f"Model {deployed_model.display_name} with version {deployed_model.model_version_id} has traffic = {endpoint.traffic_split[deployed_model.id]}")
    else:
        endpoint.undeploy(deployed_model_id = deployed_model.id)
        print(f"Undeploying {deployed_model.display_name} with version {deployed_model.model_version_id} because it has no traffic.")

Model 08_08f with version 1 has traffic = 100
Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/6678420432971890688
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/6678420432971890688/operations/5478204878805270528
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/6678420432971890688
Undeploying 08_08f with version 1 because it has no traffic.
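
The selection rule in the loop above can be checked on plain data before calling `undeploy`. A minimal sketch follows, with a hypothetical helper `models_without_traffic`. It also treats a 0% entry the same as a missing entry, which is an assumption; the notebook only checks membership in `traffic_split`.

```python
def models_without_traffic(deployed_model_ids, traffic_split):
    """Return the deployed model IDs that receive no traffic."""
    return [m for m in deployed_model_ids if traffic_split.get(m, 0) == 0]

# IDs taken from the traffic_split outputs in this notebook
deployed = ['6117352843457331200', '1908738991679602688']
split = {'1908738991679602688': 100}
print(models_without_traffic(deployed, split))
```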


In [217]:
endpoint.traffic_split

{'1908738991679602688': 100}

In [218]:
#endpoint.list_models()

---
## Prediction



### Prepare a record for prediction: instance and parameters lists

In [219]:
pred = bq.query(query = f"SELECT * FROM {BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE} WHERE splits='TEST' LIMIT 10").to_dataframe()

In [220]:
pred.head(4)

Unnamed: 0,Time,V1,V2,V3,V4,V5,V6,V7,V8,V9,...,V23,V24,V25,V26,V27,V28,Amount,Class,transaction_id,splits
0,35337,1.092844,-0.01323,1.359829,2.731537,-0.707357,0.873837,-0.79613,0.437707,0.39677,...,-0.167647,0.027557,0.592115,0.219695,0.03697,0.010984,0.0,0,a1b10547-d270-48c0-b902-7a0f735dadc7,TEST
1,60481,1.238973,0.035226,0.063003,0.641406,-0.260893,-0.580097,0.049938,-0.034733,0.405932,...,-0.057718,0.104983,0.537987,0.589563,-0.046207,-0.006212,0.0,0,814c62c8-ade4-47d5-bf83-313b0aafdee5,TEST
2,139587,1.870539,0.211079,0.224457,3.889486,-0.380177,0.249799,-0.577133,0.179189,-0.120462,...,0.180776,-0.060226,-0.228979,0.080827,0.009868,-0.036997,0.0,0,d08a1bfa-85c5-4f1b-9537-1c5a93e6afd0,TEST
3,162908,-3.368339,-1.980442,0.153645,-0.159795,3.847169,-3.516873,-1.209398,-0.292122,0.760543,...,-1.171627,0.214333,-0.159652,-0.060883,1.294977,0.120503,0.0,0,802f3307-8e5a-4475-b795-5d5d8d7d0120,TEST


In [221]:
newob = pred[pred.columns[~pred.columns.isin(VAR_OMIT.split()+[VAR_TARGET, 'splits'])]].to_dict(orient='records')[0]
#newob

In [222]:
newob

{'Time': 35337,
 'V1': 1.0928441854981998,
 'V2': -0.0132303486713432,
 'V3': 1.35982868199426,
 'V4': 2.7315370965921004,
 'V5': -0.707357349219652,
 'V6': 0.8738370029866129,
 'V7': -0.7961301510622031,
 'V8': 0.437706509544851,
 'V9': 0.39676985012996396,
 'V10': 0.587438102569443,
 'V11': -0.14979756231827498,
 'V12': 0.29514781622888103,
 'V13': -1.30382621882143,
 'V14': -0.31782283120234495,
 'V15': -2.03673231037199,
 'V16': 0.376090905274179,
 'V17': -0.30040350116459497,
 'V18': 0.433799615590844,
 'V19': -0.145082264348681,
 'V20': -0.240427548108996,
 'V21': 0.0376030733329398,
 'V22': 0.38002620963091405,
 'V23': -0.16764742731151097,
 'V24': 0.0275573495476881,
 'V25': 0.59211469704354,
 'V26': 0.219695164116351,
 'V27': 0.0369695108704894,
 'V28': 0.010984441006191,
 'Amount': 0.0}
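
The column filter in the cell above drops the omitted variables, the target, and the `splits` column before building the instance. The same rule on a plain dictionary, as a sketch; `to_instance` is a hypothetical helper and the record values are made up for illustration.

```python
def to_instance(record, var_omit, var_target, extra_drop=('splits',)):
    """Keep only the feature columns of one record.

    `var_omit` is a space-delimited string, matching VAR_OMIT in this notebook.
    """
    drop = set(var_omit.split()) | {var_target} | set(extra_drop)
    return {key: value for key, value in record.items() if key not in drop}

record = {'Time': 35337, 'V1': 1.09, 'Amount': 0.0,
          'Class': 0, 'transaction_id': 'abc', 'splits': 'TEST'}
print(to_instance(record, 'transaction_id', 'Class'))
```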

In [241]:
instances = [json_format.ParseDict(newob, Value())]

In [243]:
instances = [newob]

In [244]:
instances

[{'Time': 35337,
  'V1': 1.0928441854981998,
  'V2': -0.0132303486713432,
  'V3': 1.35982868199426,
  'V4': 2.7315370965921004,
  'V5': -0.707357349219652,
  'V6': 0.8738370029866129,
  'V7': -0.7961301510622031,
  'V8': 0.437706509544851,
  'V9': 0.39676985012996396,
  'V10': 0.587438102569443,
  'V11': -0.14979756231827498,
  'V12': 0.29514781622888103,
  'V13': -1.30382621882143,
  'V14': -0.31782283120234495,
  'V15': -2.03673231037199,
  'V16': 0.376090905274179,
  'V17': -0.30040350116459497,
  'V18': 0.433799615590844,
  'V19': -0.145082264348681,
  'V20': -0.240427548108996,
  'V21': 0.0376030733329398,
  'V22': 0.38002620963091405,
  'V23': -0.16764742731151097,
  'V24': 0.0275573495476881,
  'V25': 0.59211469704354,
  'V26': 0.219695164116351,
  'V27': 0.0369695108704894,
  'V28': 0.010984441006191,
  'Amount': 0.0}]

### Get Predictions: Python Client

In [245]:
prediction = endpoint.predict(instances = json.dumps(instances))
prediction

InternalServerError: 500 {"error":"500 - Internal server error"}

In [185]:
prediction.predictions[0]

NameError: name 'prediction' is not defined

In [186]:
np.argmax(prediction.predictions[0])

NameError: name 'prediction' is not defined

### Get Predictions: REST

In [246]:
with open(f'{DIR}/request.json','w') as file:
    file.write(json.dumps({"instances": [newob]}))
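
The request body for the predict route is a JSON object with an `instances` list and, optionally, `parameters`. The sketch below builds that body defensively: `allow_nan=False` rejects `NaN` and `Infinity`, which are not valid JSON, and the `default` hook converts objects exposing `.item()` (as NumPy scalars do) to built-in types. `build_predict_request` is a hypothetical helper, not part of the Vertex AI SDK.

```python
import json

def _to_builtin(obj):
    # NumPy scalars expose .item(); anything else is rejected
    if hasattr(obj, 'item'):
        return obj.item()
    raise TypeError(f'{type(obj).__name__} is not JSON serializable')

def build_predict_request(instances, parameters=None):
    """Serialize a predict request body, failing early on values JSON cannot represent."""
    body = {'instances': list(instances)}
    if parameters is not None:
        body['parameters'] = parameters
    return json.dumps(body, allow_nan=False, default=_to_builtin)

print(build_predict_request([{'Time': 35337, 'Amount': 0.0}]))
```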

Dev note: the request body can be read from the file or passed inline:
```
-d @{DIR}/request.json \
-d {json.dumps(...)}
```

In [247]:
!curl -X POST \
-H "Authorization: Bearer "$(gcloud auth application-default print-access-token) \
-H "Content-Type: application/json; charset=utf-8" \
-d @{DIR}/request.json \
https://{REGION}-aiplatform.googleapis.com/v1/{endpoint.resource_name}:predict

<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//w

### Get Predictions: gcloud (CLI)

In [210]:
!gcloud beta ai endpoints predict {endpoint.name.rsplit('/',1)[-1]} --region={REGION} --json-request={DIR}/request.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
ERROR: (gcloud.beta.ai.endpoints.predict) HttpError accessing <https://us-central1-prediction-aiplatform.googleapis.com/v1beta1/projects/statmike-mlops-349915/locations/us-central1/endpoints/6678420432971890688:predict?alt=json>: response: <{'content-type': 'text/html; charset=UTF-8', 'referrer-policy': 'no-referrer', 'content-length': '1613', 'date': 'Mon, 17 Oct 2022 00:47:21 GMT', 'status': 502}>, content <<!DOCTYPE html>
<html lang=en>
  <meta charset=utf-8>
  <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">
  <title>Error 502 (Server Error)!!1</title>
  <style>
    *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 

---
## Remove Resources
See the notebook "99 - Cleanup".