<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/Framework%20Workflows/CatBoost/CatBoost%20Custom%20Prediction%20With%20FastAPI.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FFramework%2520Workflows%2FCatBoost%2FCatBoost%2520Custom%2520Prediction%2520With%2520FastAPI.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/Framework%20Workflows/CatBoost/CatBoost%20Custom%20Prediction%20With%20FastAPI.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/Framework%20Workflows/CatBoost/CatBoost%20Custom%20Prediction%20With%20FastAPI.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# CatBoost: Custom Prediction With FastAPI

Build a [custom container for prediction](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements) with [CatBoost](https://catboost.ai/) using [FastAPI](https://fastapi.tiangolo.com/) and store it in [Artifact Registry](https://cloud.google.com/artifact-registry/docs/overview) so that predictions can be served:
- Locally for testing (with Docker): on [Vertex AI Workbench Instances](https://cloud.google.com/vertex-ai/docs/workbench/instances/introduction) for example
- On [Cloud Run](https://cloud.google.com/run/docs/overview/what-is-cloud-run)
- On [Vertex AI Prediction Endpoints](https://cloud.google.com/vertex-ai/docs/general/deployment)

**Prerequisites:**
- [CatBoost In Notebook](./CatBoost%20In%20Notebook.ipynb)
    - Train the model used here and store it in GCS

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [4]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [5]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs

The list `packages` contains tuples of package import names and install names, with an optional minimum version.  If the import name is not found then the install name is used to install the package quietly for the current user.

In [292]:
# tuples of (import name, install name, min_version)
packages = [
    ('numpy', 'numpy'),
    ('catboost', 'catboost'),
    ('docker', 'docker'),
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.artifactregistry_v1', 'google-cloud-artifact-registry'),
    ('google.cloud.devtools', 'google-cloud-build'),
    ('google.cloud.run_v2', 'google-cloud-run'),   
]

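import importlib.util
import importlib.metadata

def version_tuple(version):
    # compare dotted versions numerically so that '1.10.0' > '1.9.0'
    return tuple(int(part) for part in version.split('.') if part.isdigit())

install = False
for package in packages:
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError:
        # raised when a parent package (like `google`) is not installed
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by install (distribution) name
        if version_tuple(importlib.metadata.version(package[1])) < version_tuple(package[2]):
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user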

### API Enablement

In [286]:
!gcloud services enable artifactregistry.googleapis.com
!gcloud services enable cloudbuild.googleapis.com
!gcloud services enable run.googleapis.com

### Restart Kernel (If Installs Occurred)

After the kernel restarts, continue with the cell that follows this one.  The earlier cells do not need to be run again.

In [287]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

inputs:

In [288]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [289]:
REGION = 'us-central1'
SERIES = 'frameworks-catboost'
EXPERIMENT = 'custom-container'

GCS_BUCKET = PROJECT_ID

packages:

In [534]:
import json, os
import time
import requests

import catboost 
import numpy as np
import docker

import google.auth
from google.cloud import storage
from google.cloud import artifactregistry_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import run_v2
from google.cloud import aiplatform

clients:

In [295]:
# gcs storage client
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

# cloud build client
cb = cloudbuild_v1.CloudBuildClient()

# artifact registry client
ar = artifactregistry_v1.ArtifactRegistryClient()

# cloud run client
cr = run_v2.ServicesClient()

# vertex ai client
aiplatform.init(project = PROJECT_ID, location = REGION)

Parameters:

In [31]:
DIR = f"files/{EXPERIMENT}"

Environment:

In [34]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## CatBoost Model

Retrieve the model trained in the prerequisite workflow along with its test records, then test the model directly in this environment.

### Check For Files

In [18]:
files = list(bucket.list_blobs(prefix = f'{SERIES}/notebook'))
if len(files) > 0:
    print('Found the files created by the prerequisite workflow:')
    for file in files:
        print(f'- gs://{bucket.name}/{file.name}')
else:
    print('Files not found - Please run the prerequisite notebook (listed at top of this workflow)')

Found the files created by the prerequisite workflow:
- gs://statmike-mlops-349915/frameworks-catboost/notebook/examples.json
- gs://statmike-mlops-349915/frameworks-catboost/notebook/model.cbm


### Load Model

In [46]:
model_blob = bucket.blob(f'{SERIES}/notebook/model.cbm')
model_bytes = model_blob.download_as_bytes()
model = catboost.CatBoostClassifier().load_model(blob = model_bytes)

### Load Inference Examples

In [47]:
examples_blob = bucket.blob(f'{SERIES}/notebook/examples.json')
examples_np = np.array(
    json.loads(examples_blob.download_as_text())
)

### Test Model With Examples

In [48]:
model.predict(examples_np)

array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1])

---
## Build A Custom Prediction Container

It is really not all that hard with Python!

For this example [FastAPI](https://fastapi.tiangolo.com/) is used.

This process uses Docker to build a custom container and then runs the container on Cloud Run.

This could be done locally with Docker and pushed to Artifact Registry before deployment to Cloud Run.  The process below assumes that Docker is not available locally and uses Cloud Build to both build the container and push it to Artifact Registry.
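The application files created later in this notebook accept a JSON body like `{"instances": [[...], [...]]}` and return `{"predictions": [...]}`.  As a minimal sketch of that contract, the helper below checks a request body before it would be handed to a model.  The function name `parse_instances` and its error messages are hypothetical and are not part of the notebook's application code.

```python
def parse_instances(body):
    """Return the list of instances from a parsed JSON request body.

    Raises ValueError when the body does not look like
    {"instances": [[feature values], [feature values], ...]}.
    """
    if not isinstance(body, dict) or "instances" not in body:
        raise ValueError('request body must be a JSON object with an "instances" key')
    instances = body["instances"]
    if not isinstance(instances, list) or len(instances) == 0:
        raise ValueError('"instances" must be a non-empty list')
    for position, row in enumerate(instances):
        if not isinstance(row, list):
            raise ValueError(f"instance {position} must be a list of feature values")
    return instances


# example: a well-formed body passes through unchanged
print(parse_instances({"instances": [[5.1, 3.5], [6.2, 2.9]]}))
```

Inside a FastAPI route, a `ValueError` like this could be turned into an HTTP 400 response rather than letting the model call fail with a less helpful error.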

---
### Setup Artifact Registry

[Artifact Registry](https://cloud.google.com/artifact-registry/docs) organizes artifacts in repositories.  Each repository contains packages and is designated to hold a particular package format: Docker images, Python packages, and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

#### List Repositories

This list may be empty if no repositories have been created in this project and region.

In [26]:
for repo in ar.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts
projects/statmike-mlops-349915/locations/us-central1/repositories/mlops
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


#### Create/Retrieve Docker Image Repository

Create an Artifact Registry Repository to hold Docker Images created by this notebook.  First, check to see if it is already created by a previous run and retrieve it if it has.  Otherwise, create one named for this project.

In [27]:
docker_repo = None
for repo in ar.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{SERIES}' == repo.name.split('/')[-1]:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{SERIES}',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {SERIES} series that holds docker images.',
                name = f'{SERIES}',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Creating Repository ...
Completed creating repo: projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks-catboost


In [28]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks-catboost',
 'DOCKER')

In [29]:
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"

In [30]:
REPOSITORY

'us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks-catboost'
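The `REPOSITORY` value above follows the Artifact Registry Docker host pattern `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY`, and an image URI appends the image name and an optional tag.  As a small sketch, the helper below derives an image URI from a repository resource name.  The name `image_uri_from_repo` is hypothetical and only illustrates the string layout.

```python
def image_uri_from_repo(repo_resource_name, image, tag=None):
    """Build a Docker image URI from an Artifact Registry repository resource name.

    The resource name is expected to look like:
    projects/PROJECT/locations/LOCATION/repositories/REPOSITORY
    """
    parts = repo_resource_name.split("/")
    if len(parts) != 6 or parts[0::2] != ["projects", "locations", "repositories"]:
        raise ValueError(f"unexpected repository resource name: {repo_resource_name}")
    project, location, repository = parts[1], parts[3], parts[5]
    uri = f"{location}-docker.pkg.dev/{project}/{repository}/{image}"
    # without a tag, Docker tooling treats the reference as ':latest'
    return f"{uri}:{tag}" if tag else uri


print(image_uri_from_repo(
    "projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks-catboost",
    "custom-container",
))
```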

---
### Create Application Files

```
|__ Dockerfile
|__ requirements.txt
|__ app
    |__ __init__.py
    |__ main.py
    |__ main2.py
    |__ prestart.sh
```

In [37]:
if not os.path.exists(DIR + '/source/app'):
    os.makedirs(DIR + '/source/app')

In [59]:
%%writefile {DIR}/source/Dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9

COPY ./app /app
COPY ./requirements.txt requirements.txt

RUN pip install --no-cache-dir --upgrade pip \
  && pip install --no-cache-dir -r requirements.txt

Writing files/custom-container/source/Dockerfile


In [60]:
%%writefile {DIR}/source/requirements.txt
google-cloud-storage
catboost
numpy

Writing files/custom-container/source/requirements.txt


In [61]:
%%writefile {DIR}/source/app/__init__.py
# init file

Overwriting files/custom-container/source/app/__init__.py


In [179]:
%%writefile {DIR}/source/app/main.py
# this version:
# - inputs to be json like {'instances': [[list],[list], ...]}
# - outputs in json like {'predictions': [[list],[list], ...]}
# trying to adhere to Vertex AI Endpoints requirements:
# - https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions

# packages
import os
from fastapi import FastAPI, Request
import catboost
import numpy as np
from google.cloud import storage

# clients
app = FastAPI()
gcs = storage.Client()

# download the model file from GCS
paths = os.environ['AIP_STORAGE_URI'].split('/') + ['model.cbm']
bucket = gcs.bucket(paths[2])
blob = bucket.blob('/'.join(paths[3:]))
blob.download_to_filename('model.cbm')

# Load the catboost model
_model = catboost.CatBoostClassifier().load_model('model.cbm')

# get model classification levels
classes = [str(c) for c in list(_model.classes_)]

# Define function for health route
@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    return {}

# Define function for prediction route
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    # await the request
    body = await request.json()
    
    # parse the request
    instances = body["instances"]
    
    # get predicted probabilities
    predictions = _model.predict_proba(instances).tolist()    

    # this returns just the predicted probabilities:
    return {"predictions": predictions}

Overwriting files/custom-container/source/app/main.py
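In `main.py` the model location is derived by splitting `AIP_STORAGE_URI` on `/`, which works for a URI like `gs://bucket/path/to/folder` but would produce a doubled slash in the blob path if the URI ended with `/`.  One possible, slightly more defensive way to do the same split is sketched below.  The helper name `split_gcs_uri` is hypothetical and is not used by the application files in this notebook.

```python
def split_gcs_uri(uri, filename="model.cbm"):
    """Split a gs:// folder URI into (bucket_name, blob_path) for a file in that folder."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a gs:// URI, got: {uri}")
    bucket_name, _, folder = uri[len(prefix):].partition("/")
    if not bucket_name:
        raise ValueError(f"missing bucket name in: {uri}")
    # drop empty segments so a trailing slash does not yield 'folder//model.cbm'
    segments = [segment for segment in folder.split("/") if segment] + [filename]
    return bucket_name, "/".join(segments)


print(split_gcs_uri("gs://bucket/path/to/folder"))
```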


In [254]:
%%writefile {DIR}/source/app/main2.py
# this version:
# - inputs to be json like {'instances': [[list],[list], ...]}
# - outputs in json like {'predictions': [{'classes': list, 'scores': list, 'predicted_class': str}, ...]}
# trying to adhere to Vertex AI Endpoints requirements:
# - https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions

# packages
import os
from fastapi import FastAPI, Request
import catboost
import numpy as np
from google.cloud import storage

# clients
app = FastAPI()
gcs = storage.Client()

# download the model file from GCS
paths = os.environ['AIP_STORAGE_URI'].split('/') + ['model.cbm']
bucket = gcs.bucket(paths[2])
blob = bucket.blob('/'.join(paths[3:]))
blob.download_to_filename('model.cbm')

# Load the catboost model
_model = catboost.CatBoostClassifier().load_model('model.cbm')

# get model classification levels
classes = [str(c) for c in list(_model.classes_)]

# Define function for health route
@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    return {}

# Define function for prediction route
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    # await the request
    body = await request.json()
    
    # parse the request
    instances = body["instances"]
    
    # get predicted probabilities
    predictions = _model.predict_proba(instances).tolist()
    
    # format predictions:
    preds = [dict(classes = classes, scores = p, predicted_class = classes[np.argmax(p)]) for p in predictions]
    
    # following outputs detail prediction info for classification:
    return {"predictions": preds}

Writing files/custom-container/source/app/main2.py
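The formatting step in `main2.py` pairs each row of probabilities with the class labels and picks the label with the highest score.  The sketch below reproduces that step in plain Python (using `max` over indices in place of `np.argmax`) so the output shape can be checked without a model or a running container.  The function name `format_predictions` is hypothetical.

```python
def format_predictions(probabilities, classes):
    """Turn rows of class probabilities into dictionaries like those returned by main2.py."""
    formatted = []
    for scores in probabilities:
        if len(scores) != len(classes):
            raise ValueError("each row needs exactly one score per class")
        # index of the largest score; ties resolve to the first such index
        best = max(range(len(scores)), key=scores.__getitem__)
        formatted.append({
            "classes": list(classes),
            "scores": list(scores),
            "predicted_class": classes[best],
        })
    return formatted


print(format_predictions([[0.25, 0.75], [0.9, 0.1]], ["0", "1"]))
```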


In [255]:
%%writefile {DIR}/source/app/prestart.sh
#!/bin/bash
export PORT=$AIP_HTTP_PORT

Overwriting files/custom-container/source/app/prestart.sh


In [256]:
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/Dockerfile').upload_from_filename(f'{DIR}/source/Dockerfile')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/requirements.txt').upload_from_filename(f'{DIR}/source/requirements.txt')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/__init__.py').upload_from_filename(f'{DIR}/source/app/__init__.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/main.py').upload_from_filename(f'{DIR}/source/app/main.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/main2.py').upload_from_filename(f'{DIR}/source/app/main2.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/prestart.sh').upload_from_filename(f'{DIR}/source/app/prestart.sh')

In [257]:
list(bucket.list_blobs(prefix = f'{SERIES}/{EXPERIMENT}/source'))

[<Blob: statmike-mlops-349915, frameworks-catboost/custom-container/source/Dockerfile, 1727876238020014>,
 <Blob: statmike-mlops-349915, frameworks-catboost/custom-container/source/app/__init__.py, 1727876238252393>,
 <Blob: statmike-mlops-349915, frameworks-catboost/custom-container/source/app/main.py, 1727876238328405>,
 <Blob: statmike-mlops-349915, frameworks-catboost/custom-container/source/app/main2.py, 1727876238397140>,
 <Blob: statmike-mlops-349915, frameworks-catboost/custom-container/source/app/prestart.sh, 1727876238484887>,
 <Blob: statmike-mlops-349915, frameworks-catboost/custom-container/source/requirements.txt, 1727876238174388>]

---
### Build Application Container

Use the Cloud Build client to construct and run the build instructions. Here the files collected in GCS are copied to the build instance, then the Docker build is run in the folder with the `Dockerfile`. The resulting image is pushed to Artifact Registry (setup above).

In [258]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/source/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{EXPERIMENT}', '/workspace']
    }    
)
# docker push
build.images = [f"{REPOSITORY}/{EXPERIMENT}"]

In [259]:
operation = cb.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [260]:
build_response = operation.result()
build_response.status, build_response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks-catboost/custom-container")

In [261]:
build_response.artifacts.images[0]

'us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks-catboost/custom-container'

---
## Test Locally

If Docker is installed and running locally then use it to test the image.

In [262]:
try:
    local_test = True
    docker_client = docker.from_env()
    if docker_client.ping():
        print(f"Docker is installed and running. Version: {docker_client.version()['Version']}")
except Exception as e:
    local_test = False
    print('Docker is either not installed or not running - please fix before proceeding.\nhttps://docs.docker.com/engine/install/')

Docker is installed and running. Version: 20.10.17


### Pull and Run Container

Run the container image with:
- ports: inside 8080 mapped to outside 80
- set environment variables for:
    - `AIP_HTTP_PORT` is `8080`
    - `AIP_HEALTH_ROUTE` is `/health`
    - `AIP_PREDICT_ROUTE` is `/predict`
    - `AIP_STORAGE_URI` is the `gs://bucket/path/to/folder`
    - `MODULE_NAME` is `main`
        - this actually defaults to main so is not required
        - an alternative script with different prediction output is created in `main2.py` above
        - use this environment variable to start the container using the alternative script in module `main2`
        - see the [FastAPI Docker Image Advanced Usage](https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker?tab=readme-ov-file#advanced-usage) details
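For comparison with the Docker SDK call used below, the same settings can be written as a `docker run` command line.  The sketch builds the argument list and prints it rather than executing it.  Note that the CLI `--publish` flag takes `outside:inside`, the reverse of the SDK's `ports` dictionary.  The helper name `docker_run_command` and the image placeholder are assumptions for illustration.

```python
import shlex


def docker_run_command(image, name, host_port, container_port, env):
    """Build the argument list for a detached `docker run` with port and env settings."""
    args = [
        "docker", "run", "--detach",
        "--name", name,
        "--publish", f"{host_port}:{container_port}",  # outside:inside
    ]
    for key, value in env.items():
        args += ["--env", f"{key}={value}"]
    args.append(image)
    return args


command = docker_run_command(
    image="REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE",  # placeholder image URI
    name="local-run",
    host_port=80,
    container_port=8080,
    env={
        "AIP_HTTP_PORT": "8080",
        "AIP_HEALTH_ROUTE": "/health",
        "AIP_PREDICT_ROUTE": "/predict",
        "AIP_STORAGE_URI": "gs://bucket/path/to/folder",
        "MODULE_NAME": "main",
    },
)
print(shlex.join(command))
```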

In [271]:
if local_test:
    # make sure any prior runs are stopped:
    try:
        container = docker_client.containers.get('local-run')
        container.stop()
        container.remove()
    except docker.errors.NotFound:
        pass
    
    # get image:
    image_uri = build_response.artifacts.images[0]
    try:
        local_image = docker_client.images.get(image_uri)
        remote_image = docker_client.images.pull(image_uri)
        if local_image.id != remote_image.id:
            print('New image found, updating ...')
            local_image = remote_image
        else:
            print('Using existing image ...')
    except docker.errors.ImageNotFound:
        print('Pulling image ...')
        local_image = docker_client.images.pull(image_uri)
        
    # run container:
    print('Starting container ...')
    container = docker_client.containers.run(
        image = image_uri,
        detach = True,
        ports = {'8080/tcp':80}, # Map inside:outside (where docker run -p is outside:inside)
        name = 'local-run',
        environment = {
            'AIP_HTTP_PORT': '8080',
            'AIP_HEALTH_ROUTE': '/health',
            'AIP_PREDICT_ROUTE': '/predict',
            'AIP_STORAGE_URI': f'gs://{bucket.name}/{SERIES}/notebook',
            'MODULE_NAME': 'main2' # try main2 for alternative output
        }
    )
    print('Container ready.\n\tUse `container.logs()` to view startup logs.')

Using existing image ...
Starting container ...
Container ready.
	Use `container.logs()` to view startup logs.


In [272]:
#container.logs()

### Health Check

A healthy container responds with status code `200`:

In [273]:
if local_test:
    response = requests.get(f"http://localhost:80/health")
    print(response.status_code)

200
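Because the container is started detached, the application may still be downloading and loading the model when the health check runs, in which case the request can fail or be refused.  A minimal polling sketch, using only the standard library, is shown below.  The helper names `http_status` and `wait_until_healthy`, and the default attempt count and delay, are assumptions rather than part of the notebook.

```python
import time
import urllib.error
import urllib.request


def http_status(url, timeout=2.0):
    """Return the HTTP status code of a GET request, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # the server answered, but with an error status
        return error.code
    except (urllib.error.URLError, OSError):
        # connection refused, reset, or timed out
        return None


def wait_until_healthy(url, attempts=30, delay=1.0, probe=http_status, sleep=time.sleep):
    """Poll the url until it returns 200; return False if it never does."""
    for _ in range(attempts):
        if probe(url) == 200:
            return True
        sleep(delay)
    return False
```

With the local container running, `wait_until_healthy("http://localhost:80/health")` could be called before sending prediction requests.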


### Inference Test

In [274]:
def predict(instances):
    # send instances to the local container and return the parsed JSON response
    url = "http://localhost:80/predict"
    headers = {'Content-Type': 'application/json'}
    data = json.dumps({'instances': instances})
    response = requests.post(url, headers = headers, data = data)
    return json.loads(response.text)

In [275]:
predict(examples_np[0:1].tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.01301489502678177, 0.9869851049732182],
   'predicted_class': '1'}]}

In [276]:
predict(examples_np[1:2].tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.4290084852039555, 0.5709915147960445],
   'predicted_class': '1'}]}

In [277]:
predict(examples_np[0:2].tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.01301489502678177, 0.9869851049732182],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.4290084852039555, 0.5709915147960445],
   'predicted_class': '1'}]}

In [278]:
predict(examples_np.tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.01301489502678177, 0.9869851049732182],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.4290084852039555, 0.5709915147960445],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.009615226297050805, 0.9903847737029492],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.0013530555055006888, 0.9986469444944993],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.9998299274177335, 0.00017007258226652255],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'scores': [0.9993910308803445, 0.0006089691196554726],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'scores': [0.007042458810377461, 0.9929575411896225],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.011614043033476573, 0.9883859569665234],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.014325467849923612, 0.9856745321500764],
   'predicted

### Stop Container

In [279]:
container.name

'local-run'

In [280]:
container = docker_client.containers.get(container.name)

In [281]:
container.status

'running'

In [282]:
container.stop()
container.remove()

---
## Vertex AI Prediction Endpoint

Register the model in the [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) and [Deploy it to an endpoint](https://cloud.google.com/vertex-ai/docs/general/deployment).

### Model Registry

Check for existing version of the model:

In [586]:
parent_model = ''
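# use a distinct loop variable so the CatBoost `model` loaded earlier is not overwritten
for registered_model in aiplatform.Model.list(filter=f'display_name="{SERIES}"'):
    parent_model = registered_model.resource_name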
    break
parent_model

''

Upload the model to the registry with different versions for:
- plain responses - using main.py which is default
- formatted responses - using main2.py which is set with environment variable

In [587]:
vertex_model = aiplatform.Model.upload(
    display_name = SERIES,
    model_id = SERIES,
    parent_model = parent_model,
    serving_container_image_uri = build_response.artifacts.images[0],
    artifact_uri = f'gs://{bucket.name}/{SERIES}/notebook',
    is_default_version = True,
    version_aliases = ['plain'],
    labels = {'series': SERIES, 'experiment': EXPERIMENT}
)

Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/frameworks-catboost/operations/7283139355663663104
Model created. Resource name: projects/1026793852137/locations/us-central1/models/frameworks-catboost@1
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/frameworks-catboost@1')


In [588]:
vertex_model = aiplatform.Model.upload(
    parent_model = vertex_model.resource_name,
    serving_container_image_uri = build_response.artifacts.images[0],
    serving_container_environment_variables = {'MODULE_NAME': 'main2'},
    artifact_uri = f'gs://{bucket.name}/{SERIES}/notebook',
    is_default_version = False,
    version_aliases = ['formatted'],
    labels = {'series': SERIES, 'experiment': EXPERIMENT}
)

Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/frameworks-catboost/operations/244013138083577856
Model created. Resource name: projects/1026793852137/locations/us-central1/models/frameworks-catboost@2
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/frameworks-catboost@2')


### Create Endpoint

Check for existing endpoint:

In [589]:
vertex_endpoint = None
for endpoint in aiplatform.Endpoint.list(filter=f'display_name="{SERIES}"'):
    vertex_endpoint = endpoint
    break
vertex_endpoint

<google.cloud.aiplatform.models.Endpoint object at 0x7f44d342ee00> 
resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800

Create endpoint if missing:

In [590]:
if not vertex_endpoint:
    vertex_endpoint = aiplatform.Endpoint.create(
        display_name = SERIES,
        labels = {'series': SERIES}   
    )
vertex_endpoint

<google.cloud.aiplatform.models.Endpoint object at 0x7f44d342ee00> 
resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800

### Deploy Model: Default version with `main.py` - Plain

Get the latest model version with alias `plain`:

In [615]:
vertex_model = aiplatform.Model(model_name = SERIES, version = 'plain')
vertex_model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/frameworks-catboost@1'

In [616]:
vertex_model.deploy(
    endpoint = vertex_endpoint,
    traffic_percentage = 100,
    machine_type = 'n1-standard-4',
    min_replica_count = 1,
    max_replica_count = 2,
)

Deploying model to Endpoint : projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800/operations/2399548509733781504
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800


<google.cloud.aiplatform.models.Endpoint object at 0x7f44d342ee00> 
resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800

### Test Predictions: Plain

In [618]:
vertex_endpoint.predict(instances = examples_np[0:1].tolist())

Prediction(predictions=[[0.01301489502678177, 0.9869851049732182]], deployed_model_id='597035913393995776', metadata=None, model_version_id='1', model_resource_name='projects/1026793852137/locations/us-central1/models/frameworks-catboost', explanations=None)

In [621]:
vertex_endpoint.predict(instances = examples_np[1:2].tolist()).predictions

[[0.4290084852039555, 0.5709915147960445]]

In [622]:
vertex_endpoint.predict(instances = examples_np[0:2].tolist()).predictions

[[0.01301489502678177, 0.9869851049732182],
 [0.4290084852039555, 0.5709915147960445]]

In [623]:
vertex_endpoint.predict(instances = examples_np.tolist()).predictions

[[0.01301489502678177, 0.9869851049732182],
 [0.4290084852039555, 0.5709915147960445],
 [0.009615226297050805, 0.9903847737029492],
 [0.001353055505500689, 0.9986469444944993],
 [0.9998299274177335, 0.0001700725822665225],
 [0.9993910308803445, 0.0006089691196554726],
 [0.007042458810377461, 0.9929575411896225],
 [0.01161404303347657, 0.9883859569665234],
 [0.01432546784992361, 0.9856745321500764],
 [0.005764301100520064, 0.9942356988994799]]
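As a quick consistency check, the probability rows returned by the endpoint can be reduced to class labels and compared with the output of `model.predict(examples_np)` from earlier in this notebook.  The sketch below copies the ten rows printed above and assumes column index 0 corresponds to class `0` and index 1 to class `1`.  The helper name `predicted_labels` is hypothetical.

```python
def predicted_labels(probabilities):
    """Return the index of the largest probability in each row."""
    return [max(range(len(row)), key=row.__getitem__) for row in probabilities]


# rows copied from the endpoint response shown above
endpoint_predictions = [
    [0.01301489502678177, 0.9869851049732182],
    [0.4290084852039555, 0.5709915147960445],
    [0.009615226297050805, 0.9903847737029492],
    [0.001353055505500689, 0.9986469444944993],
    [0.9998299274177335, 0.0001700725822665225],
    [0.9993910308803445, 0.0006089691196554726],
    [0.007042458810377461, 0.9929575411896225],
    [0.01161404303347657, 0.9883859569665234],
    [0.01432546784992361, 0.9856745321500764],
    [0.005764301100520064, 0.9942356988994799],
]

labels = predicted_labels(endpoint_predictions)
print(labels)  # [1, 1, 1, 1, 0, 0, 1, 1, 1, 1], the same labels as model.predict(examples_np)
```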

### Deploy Model: Version with `main2.py` - Formatted

In [624]:
vertex_model = aiplatform.Model(model_name = SERIES, version = 'formatted')
vertex_model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/frameworks-catboost@3'

In [625]:
vertex_model.deploy(
    endpoint = vertex_endpoint,
    traffic_percentage = 0,
    machine_type = 'n1-standard-4',
    min_replica_count = 1,
    max_replica_count = 2,
)

Deploying model to Endpoint : projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800/operations/1159369762346631168
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800


<google.cloud.aiplatform.models.Endpoint object at 0x7f44d342ee00> 
resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800

### Shift Traffic To Model Version: Formatted

In [629]:
new_traffic_split = {}
for deployed_model in vertex_endpoint.list_models():
    if deployed_model.model_version_id == vertex_model.version_id:
        new_traffic_split[deployed_model.id] = 100
    else:
        new_traffic_split[deployed_model.id] = 0
new_traffic_split

{'597035913393995776': 0, '5545647478944038912': 100}

In [632]:
vertex_endpoint.update(traffic_split = new_traffic_split)

Updating Endpoint endpoint: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Endpoint endpoint updated. Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800


<google.cloud.aiplatform.models.Endpoint object at 0x7f44d342ee00> 
resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800

In [633]:
vertex_endpoint.traffic_split

{'5545647478944038912': 100, '597035913393995776': 0}
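The traffic split passed to `vertex_endpoint.update` maps each deployed model ID to a percentage.  The loop above gives 100 to every deployed model whose version matches, so the split only sums to 100 when exactly one deployed model matches.  The sketch below builds the same dictionary from plain data and adds that check.  The function name `build_traffic_split` is hypothetical, and the IDs are the ones shown in the outputs above.

```python
def build_traffic_split(deployed_versions, target_version):
    """Send all traffic to the single deployed model that serves target_version.

    deployed_versions maps deployed model ID -> model version ID.
    """
    matches = [
        deployed_id
        for deployed_id, version in deployed_versions.items()
        if version == target_version
    ]
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one deployed model for version {target_version}, found {len(matches)}"
        )
    return {
        deployed_id: (100 if deployed_id == matches[0] else 0)
        for deployed_id in deployed_versions
    }


deployed = {"597035913393995776": "1", "5545647478944038912": "3"}
print(build_traffic_split(deployed, "3"))
```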

### Test Predictions: Formatted

In [634]:
vertex_endpoint.predict(instances = examples_np[0:1].tolist())

Prediction(predictions=[{'classes': ['0', '1'], 'predicted_class': '1', 'scores': [0.01301489502678177, 0.9869851049732182]}], deployed_model_id='5545647478944038912', metadata=None, model_version_id='3', model_resource_name='projects/1026793852137/locations/us-central1/models/frameworks-catboost', explanations=None)

In [635]:
vertex_endpoint.predict(instances = examples_np[1:2].tolist()).predictions

[{'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.4290084852039555, 0.5709915147960445]}]

In [636]:
vertex_endpoint.predict(instances = examples_np[0:2].tolist()).predictions

[{'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.01301489502678177, 0.9869851049732182]},
 {'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.4290084852039555, 0.5709915147960445]}]

In [637]:
vertex_endpoint.predict(instances = examples_np.tolist()).predictions

[{'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.01301489502678177, 0.9869851049732182]},
 {'predicted_class': '1',
  'classes': ['0', '1'],
  'scores': [0.4290084852039555, 0.5709915147960445]},
 {'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.009615226297050805, 0.9903847737029492]},
 {'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.001353055505500689, 0.9986469444944993]},
 {'predicted_class': '0',
  'classes': ['0', '1'],
  'scores': [0.9998299274177335, 0.0001700725822665225]},
 {'classes': ['0', '1'],
  'predicted_class': '0',
  'scores': [0.9993910308803445, 0.0006089691196554726]},
 {'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.007042458810377461, 0.9929575411896225]},
 {'classes': ['0', '1'],
  'predicted_class': '1',
  'scores': [0.01161404303347657, 0.9883859569665234]},
 {'predicted_class': '1',
  'classes': ['0', '1'],
  'scores': [0.01432546784992361, 0.9856745321500764]},
 {'classes': ['0', '1'],
  'pre

### Shift Traffic To Model: Plain

In [638]:
new_traffic_split = {}
for deployed_model in vertex_endpoint.list_models():
    if deployed_model.model_version_id != vertex_model.version_id:
        new_traffic_split[deployed_model.id] = 100
    else:
        new_traffic_split[deployed_model.id] = 0
vertex_endpoint.update(traffic_split = new_traffic_split)

Updating Endpoint endpoint: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Endpoint endpoint updated. Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800


<google.cloud.aiplatform.models.Endpoint object at 0x7f44d342ee00> 
resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800

In [639]:
vertex_endpoint.predict(instances = examples_np[0:2].tolist()).predictions

[[0.01301489502678177, 0.9869851049732182],
 [0.4290084852039555, 0.5709915147960445]]
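
The loop above gives 100 to every deployed model whose version differs from `vertex_model.version_id`, which sums to 100 only when exactly two models are deployed. The following is a minimal sketch of a stricter helper that routes all traffic to a single, explicitly named version and refuses ambiguous input. The function name `build_traffic_split` and the example IDs are hypothetical and not part of the Vertex AI SDK.

```python
def build_traffic_split(deployed_versions, target_version_id):
    """Return a traffic split that sends 100% to one deployed model.

    deployed_versions: dict mapping deployed model id -> model version id
    target_version_id: the version id that should receive all traffic
    """
    targets = [
        deployed_id
        for deployed_id, version_id in deployed_versions.items()
        if version_id == target_version_id
    ]
    # Exactly one deployed model must match, otherwise the split is ambiguous.
    if len(targets) != 1:
        raise ValueError(
            f"expected exactly one deployed model for version "
            f"{target_version_id!r}, found {len(targets)}"
        )
    return {
        deployed_id: (100 if deployed_id == targets[0] else 0)
        for deployed_id in deployed_versions
    }


# Hypothetical deployed model ids and version ids, for illustration only.
example_split = build_traffic_split(
    {"deployed-a": "2", "deployed-b": "3"}, target_version_id="2"
)
print(example_split)
```

With the endpoint from this notebook, the input mapping could be built as `{m.id: m.model_version_id for m in vertex_endpoint.list_models()}` and the result passed to `vertex_endpoint.update(traffic_split = ...)`.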

### Undeploy Models Without Traffic

In [646]:
for deployed_model in vertex_endpoint.list_models():
    if vertex_endpoint.traffic_split[deployed_model.id] == 0:
        vertex_endpoint.undeploy(deployed_model_id = deployed_model.id)
vertex_endpoint.traffic_split

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800/operations/6645880008390737920
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800


{'597035913393995776': 100}

### Undeploy All Models

In [647]:
for deployed_model in vertex_endpoint.list_models():
    vertex_endpoint.undeploy(deployed_model_id = deployed_model.id)
vertex_endpoint.list_models()

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800/operations/4585483178868736000
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800


[]

### Delete Endpoint

In [648]:
vertex_endpoint.delete(force = True)

Deleting Endpoint : projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Endpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Deleting Endpoint resource: projects/1026793852137/locations/us-central1/endpoints/6108254384737484800
Delete Endpoint backing LRO: projects/1026793852137/locations/us-central1/operations/7868607307221827584
Endpoint resource projects/1026793852137/locations/us-central1/endpoints/6108254384737484800 deleted.


---
## Cloud Run

Deploy the model to [Cloud Run](https://cloud.google.com/run/docs/overview/what-is-cloud-run) using the same container that was built and tested above, pulled from Artifact Registry.

Some highlights for Cloud Run:
- Rapid scaling to handle requests
- Scale to zero (default) or other minimum if set
- Can handle larger input (request) and output (response) sizes
    - See [request limits](https://cloud.google.com/run/quotas#request_limits)
- Configurable [memory limits](https://cloud.google.com/run/docs/configuring/services/memory-limits), [CPU limits](https://cloud.google.com/run/docs/configuring/services/cpu), [concurrency](https://cloud.google.com/run/docs/configuring/concurrency), [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling), and [request timeout](https://cloud.google.com/run/docs/configuring/request-timeout)
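
The settings listed above can be collected in one place before creating the service. The following is a minimal sketch that builds a plain dictionary mirroring the `run_v2.Service` message fields as I understand them (`template.scaling`, `template.timeout`, `template.max_instance_request_concurrency`, and container `resources.limits`). The helper name `build_service_config`, its default values, and the image path are hypothetical; verify the field names against the installed `google-cloud-run` version before relying on them.

```python
def build_service_config(
    image,
    min_instances=0,        # 0 lets the service scale to zero
    max_instances=10,
    concurrency=80,         # concurrent requests per instance
    timeout_seconds=300,
    cpu="1",
    memory="512Mi",
):
    """Build a dict shaped like a run_v2.Service with common tuning knobs."""
    if min_instances < 0 or max_instances < 1:
        raise ValueError("instance counts must be non-negative, with max >= 1")
    if min_instances > max_instances:
        raise ValueError("min_instances cannot exceed max_instances")
    return {
        "template": {
            "scaling": {
                "min_instance_count": min_instances,
                "max_instance_count": max_instances,
            },
            "max_instance_request_concurrency": concurrency,
            "timeout": {"seconds": timeout_seconds},
            "containers": [
                {
                    "image": image,
                    "ports": [{"container_port": 8080}],
                    "resources": {"limits": {"cpu": cpu, "memory": memory}},
                }
            ],
        }
    }


# Placeholder image path, for illustration only.
config = build_service_config("REGION-docker.pkg.dev/PROJECT/REPO/IMAGE", cpu="8", memory="32Gi")
print(config["template"]["scaling"])
```

Because the client calls in this notebook already pass requests as dictionaries, a dictionary like this could be supplied as the `service` value in the `create_service` request, assuming the client accepts nested dictionaries for message fields.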

### Deploy The Endpoint

In [548]:
parent = f"projects/{PROJECT_ID}/locations/{REGION}"
service = run_v2.Service()
#service.name = f"{parent}/services/{SERIES}"
service.template.containers = [
    run_v2.Container(
        image = build_response.artifacts.images[0],
        ports = [run_v2.ContainerPort(container_port = 8080)],
        env = [
            run_v2.EnvVar(name = 'AIP_HTTP_PORT', value = '8080'),
            run_v2.EnvVar(name = 'AIP_HEALTH_ROUTE', value = '/health'),
            run_v2.EnvVar(name = 'AIP_PREDICT_ROUTE', value = '/predict'),
            run_v2.EnvVar(name = 'AIP_STORAGE_URI', value = f'gs://{bucket.name}/{SERIES}/notebook'),
            run_v2.EnvVar(name = 'MODULE_NAME', value = 'main2')
        ],
        resources = run_v2.ResourceRequirements(
            limits = {"cpu": '8', "memory": '32Gi'}
        )
    )
]
service.ingress = run_v2.IngressTraffic.INGRESS_TRAFFIC_INTERNAL_ONLY

In [549]:
try:
    # create the service:
    run_response = cr.create_service(request = {"parent": parent, "service": service, "service_id": SERIES})
    # wait on the operation to complete:
    run_response.result()
    # print the name of the service
    print(f"Started Service: {run_response.metadata.name}")
except Exception as e:
    print(f"Error creating service: {e}")

Started Service: projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost


In [550]:
run_response.metadata.uri

'https://frameworks-catboost-urlxi72dpa-uc.a.run.app'

### Permissions

The endpoint requires authentication. Check out the [Authentication Overview](https://cloud.google.com/run/docs/authenticating/overview). The example below follows the [Authenticating service-to-service](https://cloud.google.com/run/docs/authenticating/service-to-service) method: the service account that runs this notebook and created the service is also granted the `roles/run.invoker` role so that it can invoke the service. Note that the binding added below also lists `allUsers`, which permits unauthenticated invocations; remove it to restrict invocation to the service account.

In [551]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

In [552]:
run_response.metadata.name

'projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost'

In [558]:
policy = cr.get_iam_policy(request = {'resource': run_response.metadata.name})
policy.bindings.add(
    role = 'roles/run.invoker',
    members = [f"serviceAccount:{SERVICE_ACCOUNT}", 'allUsers'] #'allUsers'
)
policy_response = cr.set_iam_policy(request = {"resource": run_response.metadata.name, "policy": policy})
print(f"IAM policy updated: {policy_response.bindings}")

IAM policy updated: [role: "roles/run.invoker"
members: "allUsers"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
]


In [559]:
policy.bindings

[role: "roles/run.invoker"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
, role: "roles/run.invoker"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
members: "allUsers"
]
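
The output above lists `roles/run.invoker` twice because `policy.bindings.add(...)` appends a new binding every time the cell runs. The following is a minimal sketch of an idempotent merge, written against plain dictionaries as a stand-in for the policy bindings. The helper name `add_members` is hypothetical, and adapting it to the real binding objects (attribute access on `role` and `members`) is left as an assumption.

```python
def add_members(bindings, role, members):
    """Add members to the binding for `role`, creating it only if missing.

    bindings: list of dicts shaped like {"role": str, "members": list[str]}
    Returns the same list, modified in place.
    """
    for binding in bindings:
        if binding["role"] == role:
            # Reuse the existing binding and skip members already present.
            for member in members:
                if member not in binding["members"]:
                    binding["members"].append(member)
            return bindings
    # No binding for this role yet, so create one.
    bindings.append({"role": role, "members": list(members)})
    return bindings


# Placeholder service account, for illustration only.
bindings = []
member = "serviceAccount:example@developer.gserviceaccount.com"
add_members(bindings, "roles/run.invoker", [member])
add_members(bindings, "roles/run.invoker", [member])  # rerun: no duplicate
print(bindings)
```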

**WAIT: The update of the IAM Policy might take a few moments to take effect.  Rerun the following health check section until you get a `200` response code.**

### Health Check

A `200` status code indicates the service is up and reachable:

In [560]:
def health(uri):
    url = f"{uri}/health"
    credentials, _ = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    headers = {'Authorization': f'Bearer {credentials.token}'}
    response = requests.get(url, headers = headers)    
    return response.status_code

def check_health(uri, timeout_seconds = 200, retry_seconds = 10):
    start_time = time.time()
    while True:
        status_code = health(uri)
        if status_code == 200:
            break
        elapsed_time = time.time() - start_time
        if elapsed_time > timeout_seconds:
            break
        time.sleep(retry_seconds)
    return status_code

In [561]:
check_health(run_response.metadata.uri)

200

In [562]:
health(run_response.metadata.uri)

200
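
The service-to-service guide linked in the Permissions section describes sending a Google-signed ID token whose audience is the URL of the receiving service. The `health` function above sends `credentials.token` instead, and this notebook does not show whether that would be accepted without the `allUsers` binding. The following is a sketch of the ID token variant, assuming `google.oauth2.id_token.fetch_id_token` is available in the installed `google-auth` package and that the environment can mint ID tokens (for example through a metadata server). The helper names `fetch_identity_token` and `build_auth_headers` are hypothetical.

```python
def fetch_identity_token(audience):
    """Fetch an ID token for `audience` (assumes google-auth is installed)."""
    # Imported lazily so the header helper below can be used with a stub.
    import google.auth.transport.requests
    from google.oauth2 import id_token

    request = google.auth.transport.requests.Request()
    return id_token.fetch_id_token(request, audience)


def build_auth_headers(service_uri, token_fetcher=fetch_identity_token):
    """Return request headers carrying a bearer token for the service."""
    token = token_fetcher(service_uri)
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Demonstration with a stub fetcher so no credentials are needed here.
headers = build_auth_headers(
    "https://example.invalid", token_fetcher=lambda audience: "stub-token"
)
print(headers["Authorization"])
```

In the notebook, `build_auth_headers(run_response.metadata.uri)` would produce headers that could be passed to `requests.get` or `requests.post` in place of the ones built from `credentials.token`.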

### Inference Test

In [563]:
def predict(instances):
    credentials, _ = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    url = f"{run_response.metadata.uri}/predict"
    headers = {
        'Authorization': f'Bearer {credentials.token}',
        'Content-Type': 'application/json'
    }
    data = json.dumps({'instances': instances})
    response = requests.post(url, headers = headers, data = data)    
    return json.loads(response.text)

In [564]:
predict(examples_np[0:1].tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.01301489502678177, 0.9869851049732182],
   'predicted_class': '1'}]}

In [565]:
predict(examples_np[1:2].tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.4290084852039555, 0.5709915147960445],
   'predicted_class': '1'}]}

In [566]:
predict(examples_np[0:2].tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.01301489502678177, 0.9869851049732182],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.4290084852039555, 0.5709915147960445],
   'predicted_class': '1'}]}
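
Each prediction in the response above carries the class labels, a score per class, and the predicted class. The following is a small sketch that reduces a response to `(predicted_class, score)` pairs, which can be easier to scan for larger batches. The helper name `summarize_predictions` is hypothetical, and it assumes every prediction has the `classes`, `scores`, and `predicted_class` keys shown above.

```python
def summarize_predictions(response):
    """Return (predicted_class, score_of_predicted_class) for each prediction."""
    summary = []
    for prediction in response["predictions"]:
        # Locate the score that corresponds to the predicted class label.
        index = prediction["classes"].index(prediction["predicted_class"])
        summary.append((prediction["predicted_class"], prediction["scores"][index]))
    return summary


# The two-instance response shown above, copied as literal data.
sample_response = {
    "predictions": [
        {"classes": ["0", "1"],
         "scores": [0.01301489502678177, 0.9869851049732182],
         "predicted_class": "1"},
        {"classes": ["0", "1"],
         "scores": [0.4290084852039555, 0.5709915147960445],
         "predicted_class": "1"},
    ]
}
print(summarize_predictions(sample_response))
```

In the notebook this could be called as `summarize_predictions(predict(examples_np[0:2].tolist()))`.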

In [567]:
predict(examples_np.tolist())

{'predictions': [{'classes': ['0', '1'],
   'scores': [0.01301489502678177, 0.9869851049732182],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.4290084852039555, 0.5709915147960445],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.009615226297050805, 0.9903847737029492],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.0013530555055006888, 0.9986469444944993],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.9998299274177335, 0.00017007258226652255],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'scores': [0.9993910308803445, 0.0006089691196554726],
   'predicted_class': '0'},
  {'classes': ['0', '1'],
   'scores': [0.007042458810377461, 0.9929575411896225],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.011614043033476573, 0.9883859569665234],
   'predicted_class': '1'},
  {'classes': ['0', '1'],
   'scores': [0.014325467849923612, 0.9856745321500764],
   'predicted

### Remove The Service

Because no minimum instance count was set, Cloud Run scales this service to zero when it is idle. This notebook nonetheless deletes the service in the following code.

In [568]:
remove_response = cr.delete_service(request = {"name": run_response.metadata.name})

In [569]:
remove_response.result()

name: "projects/statmike-mlops-349915/locations/us-central1/services/frameworks-catboost"
uid: "6db08ce9-91fe-4300-9f34-4fa6c606412f"
generation: 2
create_time {
  seconds: 1727895279
  nanos: 18547000
}
update_time {
  seconds: 1727897716
  nanos: 248255000
}
delete_time {
  seconds: 1727897715
  nanos: 410411000
}
expire_time {
  seconds: 1730489715
  nanos: 410411000
}
creator: "1026793852137-compute@developer.gserviceaccount.com"
last_modifier: "1026793852137-compute@developer.gserviceaccount.com"
ingress: INGRESS_TRAFFIC_INTERNAL_ONLY
launch_stage: GA
template {
  scaling {
    max_instance_count: 100
  }
  timeout {
    seconds: 300
  }
  service_account: "1026793852137-compute@developer.gserviceaccount.com"
  containers {
    image: "us-central1-docker.pkg.dev/statmike-mlops-349915/frameworks-catboost/custom-container"
    env {
      name: "AIP_HTTP_PORT"
      value: "8080"
    }
    env {
      name: "AIP_HEALTH_ROUTE"
      value: "/health"
    }
    env {
      name: "AIP_P