<!--- header table --->
<table align="left">
  <td style="text-align: center">
    <a href="https://colab.research.google.com/github/statmike/vertex-ai-mlops/blob/main/MLOps/Serving/Understanding%20Prediction%20IO%20With%20FastAPI.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/colab-logo-32px.png" alt="Google Colaboratory logo">
      <br>Run in<br>Colab
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/colab/import/https%3A%2F%2Fraw.githubusercontent.com%2Fstatmike%2Fvertex-ai-mlops%2Fmain%2FMLOps%2FServing%2FUnderstanding%2520Prediction%2520IO%2520With%2520FastAPI.ipynb">
      <img width="32px" src="https://lh3.googleusercontent.com/JmcxdQi-qOpctIvWKgPtrzZdJJK-J3sWE1RsfjZNwshCFgE_9fULcNpuXYTilIR2hjwN" alt="Google Cloud Colab Enterprise logo">
      <br>Run in<br>Colab Enterprise
    </a>
  </td>      
  <td style="text-align: center">
    <a href="https://github.com/statmike/vertex-ai-mlops/blob/main/MLOps/Serving/Understanding%20Prediction%20IO%20With%20FastAPI.ipynb">
      <img src="https://cloud.google.com/ml-engine/images/github-logo-32px.png" alt="GitHub logo">
      <br>View on<br>GitHub
    </a>
  </td>
  <td style="text-align: center">
    <a href="https://console.cloud.google.com/vertex-ai/workbench/deploy-notebook?download_url=https://raw.githubusercontent.com/statmike/vertex-ai-mlops/main/MLOps/Serving/Understanding%20Prediction%20IO%20With%20FastAPI.ipynb">
      <img src="https://lh3.googleusercontent.com/UiNooY4LUgW_oTvpsNhPpQzsstV5W8F7rYgxgGBD85cWJoLmrOzhVs_ksK_vgx40SHs7jCqkTkCk=e14-rj-sc0xffffff-h130-w32" alt="Vertex AI logo">
      <br>Open in<br>Vertex AI Workbench
    </a>
  </td>
</table>

# Understanding Prediction IO With FastAPI

Use a simple FastAPI implementation to explore the processing of different input instance formats on:
- Local Testing of Container
- Vertex AI Endpoints
- Cloud Run Endpoints
- Vertex AI Batch Prediction

Inputs to try:
- [Vertex AI **Online Prediction** Instance Formats](https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions#formatting-prediction-input)
    - [Custom container requirements for predictions](https://cloud.google.com/vertex-ai/docs/predictions/custom-container-requirements#prediction)
- [Vertex AI **Batch Prediction** Instance Formats](https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions#input_data_requirements)

Notes On Prediction Inputs:
- This method builds a custom container with a FastAPI application that echoes the input instances back as the outputs.  This demonstrates how the serving options work: local, Vertex AI Batch and Online Endpoints, and Cloud Run
- Inputs are formatted as JSON, a requirement for custom containers and the default for Vertex AI Batch Prediction
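
As a rough illustration of the two JSON shapes involved, the sketch below builds an online request body (a single object with an `instances` list) and a JSON Lines payload with one instance per line, which is the layout assumed here for batch prediction input files. The helper names `online_request_body` and `batch_jsonl` are made up for this example and are not part of any SDK.

```python
import json

def online_request_body(instances):
    """Wrap a list of instances in the {"instances": [...]} object used for online requests."""
    return json.dumps({"instances": instances})

def batch_jsonl(instances):
    """Write one JSON-encoded instance per line (JSON Lines), as assumed for batch input files."""
    return "\n".join(json.dumps(instance) for instance in instances)

instances = [1, "a", {"key": "value"}, [[1, 2], [3, 4]]]
print(online_request_body(instances))
print(batch_jsonl(instances))
```

The same list of instances feeds both helpers; only the framing around the instances differs.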

---
## Colab Setup

To run this notebook in Colab run the cells in this section.  Otherwise, skip this section.

This cell will authenticate to GCP (follow prompts in the popup).

In [1]:
PROJECT_ID = 'statmike-mlops-349915' # replace with project ID

In [2]:
try:
    import google.colab
    from google.colab import auth
    auth.authenticate_user()
    !gcloud config set project {PROJECT_ID}
except Exception:
    pass

---
## Installs

The list `packages` contains tuples of package import names and install names.  If the import name is not found then the install name is used to install quietly for the current user.

In [293]:
# tuples of (import name, install name, min_version)
packages = [
    ('docker', 'docker'),
    ('google.cloud.aiplatform', 'google-cloud-aiplatform'),
    ('google.cloud.storage', 'google-cloud-storage'),
    ('google.cloud.bigquery', 'google-cloud-bigquery'),
    ('google.cloud.artifactregistry_v1', 'google-cloud-artifact-registry'),
    ('google.cloud.devtools', 'google-cloud-build'),
    ('google.cloud.run_v2', 'google-cloud-run'), 
    ('tensorflow', 'tensorflow')
]

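import importlib.util
import importlib.metadata

install = False
for package in packages:
    try:
        found = importlib.util.find_spec(package[0]) is not None
    except ModuleNotFoundError:
        # a missing parent package (for example google.cloud) raises instead of returning None
        found = False
    if not found:
        print(f'installing package {package[1]}')
        install = True
        !pip install {package[1]} -U -q --user
    elif len(package) == 3:
        # installed versions are looked up by distribution (install) name
        # note: this is a plain string comparison of version numbers
        if importlib.metadata.version(package[1]) < package[2]:
            print(f'updating package {package[1]}')
            install = True
            !pip install {package[1]} -U -q --user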
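
The optional `min_version` check compares version strings directly, which can misorder multi-digit components. A minimal sketch of a numeric comparison follows; `version_tuple` is a hypothetical helper for illustration that only handles plain dotted numbers (for pre-release tags, a dedicated parser such as `packaging.version` would be the safer choice).

```python
import re

def version_tuple(version):
    """Convert a dotted version such as '1.10.2' into a tuple of ints: (1, 10, 2)."""
    parts = []
    for piece in version.split('.'):
        match = re.match(r'\d+', piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)

# string comparison orders '1.10.0' before '1.9.0'; numeric tuples do not
print('1.10.0' < '1.9.0')
print(version_tuple('1.10.0') < version_tuple('1.9.0'))
```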

### API Enablement

In [4]:
!gcloud services enable artifactregistry.googleapis.com
!gcloud services enable cloudbuild.googleapis.com
!gcloud services enable run.googleapis.com

### Restart Kernel (If Installs Occurred)

After a kernel restart the code submission can start with the next cell after this one.

In [5]:
if install:
    import IPython
    app = IPython.Application.instance()
    app.kernel.do_shutdown(True)
    IPython.display.display(IPython.display.Markdown("""<div class=\"alert alert-block alert-warning\">
        <b>⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. The previous cells do not need to be run again⚠️</b>
        </div>"""))

---
## Setup

inputs:

In [6]:
project = !gcloud config get-value project
PROJECT_ID = project[0]
PROJECT_ID

'statmike-mlops-349915'

In [218]:
REGION = 'us-central1'
SERIES = 'mlops-serving'
EXPERIMENT = 'understand-io'

# GCS Bucket to Use In this workflow
GCS_BUCKET = PROJECT_ID

# make this the BigQuery Project / Dataset / Table prefix to store results
BQ_PROJECT = PROJECT_ID
BQ_DATASET = SERIES.replace('-', '_')
BQ_TABLE = EXPERIMENT
BQ_REGION = REGION[0:2]

packages:

In [304]:
import json, os, base64, csv, io, tempfile, time
import requests

import docker
import tensorflow as tf

import google.auth
import google.auth.transport.requests
from google.cloud import storage
from google.cloud import bigquery
from google.cloud import artifactregistry_v1
from google.cloud.devtools import cloudbuild_v1
from google.cloud import run_v2
from google.cloud import aiplatform

clients:

In [217]:
# gcs storage client
gcs = storage.Client(project = PROJECT_ID)
bucket = gcs.bucket(GCS_BUCKET)

# bigquery client
bq = bigquery.Client(project = PROJECT_ID)

# cloud build client
cb = cloudbuild_v1.CloudBuildClient()

# artifact registry client
ar = artifactregistry_v1.ArtifactRegistryClient()

# cloud run client
cr = run_v2.ServicesClient()

# vertex ai client
aiplatform.init(project = PROJECT_ID, location = REGION)

Parameters:

In [10]:
DIR = f"files/{EXPERIMENT}"

Environment:

In [11]:
if not os.path.exists(DIR):
    os.makedirs(DIR)

---
## Build A Custom Prediction Container

It is really not all that hard with Python!

For this example [FastAPI](https://fastapi.tiangolo.com/) is used.

This process builds a custom Docker container and then runs it locally, on Cloud Run, and on Vertex AI Endpoints.

The image could be built locally with Docker and pushed to Artifact Registry before deployment.  The process below assumes that Docker is not available locally and uses Cloud Build to both build and push the resulting container to Artifact Registry.

---
### Setup Artifact Registry

[Artifact Registry](https://cloud.google.com/artifact-registry/docs) organizes artifacts with repositories.  Each repository contains packages and is designated to hold a particular format of package: Docker images, Python packages, and [others](https://cloud.google.com/artifact-registry/docs/supported-formats#package).

#### List Repositories

This may be empty if no repositories have been created for this project.

In [12]:
for repo in ar.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    print(repo.name)

projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks
projects/statmike-mlops-349915/locations/us-central1/repositories/frameworks-catboost
projects/statmike-mlops-349915/locations/us-central1/repositories/gcf-artifacts
projects/statmike-mlops-349915/locations/us-central1/repositories/mlops
projects/statmike-mlops-349915/locations/us-central1/repositories/mlops-serving
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-docker
projects/statmike-mlops-349915/locations/us-central1/repositories/statmike-mlops-349915-python


#### Create/Retrieve Docker Image Repository

Create an Artifact Registry Repository to hold Docker Images created by this notebook.  First, check to see if it is already created by a previous run and retrieve it if it has.  Otherwise, create one named for this project.

In [13]:
docker_repo = None
for repo in ar.list_repositories(parent = f'projects/{PROJECT_ID}/locations/{REGION}'):
    if f'{SERIES}' == repo.name.split('/')[-1]:
        docker_repo = repo
        print(f'Retrieved existing repo: {docker_repo.name}')

if not docker_repo:
    operation = ar.create_repository(
        request = artifactregistry_v1.CreateRepositoryRequest(
            parent = f'projects/{PROJECT_ID}/locations/{REGION}',
            repository_id = f'{SERIES}',
            repository = artifactregistry_v1.Repository(
                description = f'A repository for the {SERIES} series that holds docker images.',
                name = f'{SERIES}',
                format_ = artifactregistry_v1.Repository.Format.DOCKER,
                labels = {'series': SERIES}
            )
        )
    )
    print('Creating Repository ...')
    docker_repo = operation.result()
    print(f'Completed creating repo: {docker_repo.name}')

Retrieved existing repo: projects/statmike-mlops-349915/locations/us-central1/repositories/mlops-serving


In [14]:
docker_repo.name, docker_repo.format_.name

('projects/statmike-mlops-349915/locations/us-central1/repositories/mlops-serving',
 'DOCKER')

In [15]:
REPOSITORY = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{docker_repo.name.split('/')[-1]}"

In [16]:
REPOSITORY

'us-central1-docker.pkg.dev/statmike-mlops-349915/mlops-serving'
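
Two naming patterns appear in the outputs above: the repository resource name used by the Artifact Registry client and the Docker host path used when tagging images. The sketch below composes both from their parts. The function names, the placeholder values, and the `latest` tag default are assumptions for illustration only.

```python
def repository_resource_name(project_id, region, repository_id):
    """Resource name in the form returned by list_repositories above."""
    return f"projects/{project_id}/locations/{region}/repositories/{repository_id}"

def docker_image_uri(project_id, region, repository_id, image, tag="latest"):
    """Docker path for an image stored in a regional Artifact Registry repository."""
    return f"{region}-docker.pkg.dev/{project_id}/{repository_id}/{image}:{tag}"

# placeholder values, not the project used in this notebook
print(repository_resource_name("my-project", "us-central1", "my-repo"))
print(docker_image_uri("my-project", "us-central1", "my-repo", "my-image"))
```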

---
### Create Application Files

```
|__ Dockerfile
|__ requirements.txt
|__ app
    |__ __init__.py
    |__ main.py
    |__ prestart.sh
```

In [17]:
if not os.path.exists(DIR + '/source/app'):
    os.makedirs(DIR + '/source/app')

In [18]:
%%writefile {DIR}/source/Dockerfile
FROM tiangolo/uvicorn-gunicorn-fastapi:python3.9

COPY ./app /app
COPY ./requirements.txt requirements.txt

RUN pip install --no-cache-dir --upgrade pip \
  && pip install --no-cache-dir -r requirements.txt

Overwriting files/understand-io/source/Dockerfile


In [19]:
%%writefile {DIR}/source/requirements.txt
numpy

Overwriting files/understand-io/source/requirements.txt


In [20]:
%%writefile {DIR}/source/app/__init__.py
# init file

Overwriting files/understand-io/source/app/__init__.py


In [21]:
%%writefile {DIR}/source/app/main.py
# a simple passthrough of instance to predictions

# packages
import os
from fastapi import FastAPI, Request
import numpy as np

# clients
app = FastAPI()

# Define function for health route
@app.get(os.environ['AIP_HEALTH_ROUTE'], status_code=200)
def health():
    return {}

# Define function for prediction route
@app.post(os.environ['AIP_PREDICT_ROUTE'])
async def predict(request: Request):
    # await the request
    body = await request.json()
    
    # parse the request
    instances = body["instances"]
    
    # return the received inputs as the "predictions" - a simple pass through
    predictions = instances

    # return the echoed instances under the "predictions" key:
    return {"predictions": predictions}

Overwriting files/understand-io/source/app/main.py
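
The handler in `main.py` indexes `body["instances"]` directly, so a request without that key would raise a `KeyError` and surface as a server error. The sketch below restates the pass-through logic as a plain function with explicit checks. The name `passthrough` and the error payload shape are hypothetical and chosen only for illustration; in a FastAPI route, raising an `HTTPException` with a 400 status would be a more typical response.

```python
def passthrough(body):
    """Return the instances as predictions, or an error payload when the input is malformed."""
    if not isinstance(body, dict) or "instances" not in body:
        return {"error": "request body must be a JSON object with an 'instances' key"}
    instances = body["instances"]
    if not isinstance(instances, list):
        return {"error": "'instances' must be a list"}
    # same behavior as main.py: echo the instances back unchanged
    return {"predictions": instances}

print(passthrough({"instances": [1, "a", {"key": "value"}]}))
print(passthrough({"inputs": [1]}))
```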


In [22]:
%%writefile {DIR}/source/app/prestart.sh
#!/bin/bash
export PORT=$AIP_HTTP_PORT

Overwriting files/understand-io/source/app/prestart.sh


In [23]:
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/Dockerfile').upload_from_filename(f'{DIR}/source/Dockerfile')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/requirements.txt').upload_from_filename(f'{DIR}/source/requirements.txt')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/__init__.py').upload_from_filename(f'{DIR}/source/app/__init__.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/main.py').upload_from_filename(f'{DIR}/source/app/main.py')
bucket.blob(f'{SERIES}/{EXPERIMENT}/source/app/prestart.sh').upload_from_filename(f'{DIR}/source/app/prestart.sh')

In [24]:
list(bucket.list_blobs(prefix = f'{SERIES}/{EXPERIMENT}/source'))

[<Blob: statmike-mlops-349915, mlops-serving/understand-io/source/Dockerfile, 1741088729360795>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/source/app/__init__.py, 1741088729633827>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/source/app/main.py, 1741088729701223>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/source/app/prestart.sh, 1741088729770839>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/source/requirements.txt, 1741088729539287>]

---
### Build Application Container

Use the Cloud Build client to construct and run the build instructions. Here the files collected in GCS are copied to the build instance, then the Docker build is run in the folder with the `Dockerfile`. The resulting image is pushed to Artifact Registry (set up above).

In [25]:
# setup the build config with empty list of steps - these will be added sequentially
build = cloudbuild_v1.Build(
    steps = []
)
# retrieve the source
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/gsutil',
        'args': ['cp', '-r', f'gs://{GCS_BUCKET}/{SERIES}/{EXPERIMENT}/source/*', '/workspace']
    }
)
# docker build
build.steps.append(
    {
        'name': 'gcr.io/cloud-builders/docker',
        'args': ['build', '-t', f'{REPOSITORY}/{EXPERIMENT}', '/workspace']
    }    
)
# docker push
build.images = [f"{REPOSITORY}/{EXPERIMENT}"]

In [26]:
operation = cb.create_build(
    project_id = PROJECT_ID,
    build = build
)

In [27]:
build_response = operation.result()
build_response.status, build_response.artifacts

(<Status.SUCCESS: 3>,
 images: "us-central1-docker.pkg.dev/statmike-mlops-349915/mlops-serving/understand-io")

In [28]:
build_response.artifacts.images[0]

'us-central1-docker.pkg.dev/statmike-mlops-349915/mlops-serving/understand-io'

---
## Example Instances

In [29]:
b64_string = base64.b64encode(b'something to encode as binary here').decode('utf-8')
b64_string

'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='

In [77]:
example_instances = [
    1,
    2,
    3,
    4,
    1.24578001,
    'a',
    'A',
    {'key':'value'},
    [1],
    [1, 2, 3, 4],
    [[1, 2], [3, 4]],
    [[[1, 2], [3, 4]]],
    [{'key':'value'}],
    [{'key':'value'}, {'key':'value2'}],
    ['gs://bucket/path/to/image/image1.jpg'],
    b64_string,
    [b64_string],
    {'b64': b64_string}
]

In [78]:
example_instances

[1,
 2,
 3,
 4,
 1.24578001,
 'a',
 'A',
 {'key': 'value'},
 [1],
 [1, 2, 3, 4],
 [[1, 2], [3, 4]],
 [[[1, 2], [3, 4]]],
 [{'key': 'value'}],
 [{'key': 'value'}, {'key': 'value2'}],
 ['gs://bucket/path/to/image/image1.jpg'],
 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==',
 ['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='],
 {'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}]
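
The last three examples carry the same base64 text in different wrappers. The container in this notebook echoes them untouched, but a real model server would usually need the raw bytes. A minimal sketch, assuming the `{'b64': ...}` wrapper is the marker for binary data and using a hypothetical helper named `decode_instance`:

```python
import base64

def decode_instance(instance):
    """Return raw bytes for a {'b64': ...} instance; return any other instance unchanged."""
    if isinstance(instance, dict) and set(instance) == {"b64"}:
        return base64.b64decode(instance["b64"])
    return instance

b64_string = base64.b64encode(b'something to encode as binary here').decode('utf-8')
print(decode_instance({'b64': b64_string}))  # decoded to bytes
print(decode_instance(b64_string))           # a bare string is left as-is
```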

---
## Test Locally

If Docker is installed and running locally then use it to test the image.

In [83]:
try:
    local_test = True
    docker_client = docker.from_env()
    if docker_client.ping():
        print(f"Docker is installed and running. Version: {docker_client.version()['Version']}")
except Exception as e:
    local_test = False
    print('Docker is either not installed or not running - please fix before proceeding.\nhttps://docs.docker.com/engine/install/')

Docker is installed and running. Version: 20.10.17


### Pull and Run Container

Run the container image with:
- ports: inside 8080 mapped to outside 80
- set environment variables for:
    - `AIP_HTTP_PORT` is `8080`
    - `AIP_HEALTH_ROUTE` is `/health`
    - `AIP_PREDICT_ROUTE` is `/predict`
    - `AIP_STORAGE_URI` is the `gs://bucket/path/to/folder`
    - `MODULE_NAME` is 'main'
        - this actually defaults to main so is not required
        - this notebook only creates `main.py`; if an alternative script (for example `main2.py`) were added to the `app` folder, it could be selected here
        - use this environment variable to start the container with a different module
        - see the [FastAPI Docker Image Advanced Usage](https://github.com/tiangolo/uvicorn-gunicorn-fastapi-docker?tab=readme-ov-file#advanced-usage) details
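
Because `main.py` reads `os.environ['AIP_HEALTH_ROUTE']` and `os.environ['AIP_PREDICT_ROUTE']` directly, the app fails at import time when those variables are not set. The sketch below shows one way to read the same settings with fallbacks. The function name `serving_config` and the default values (copied from the list above) are assumptions for illustration.

```python
import os

def serving_config(env=os.environ):
    """Read the serving settings, falling back to the values used in this notebook."""
    return {
        "port": int(env.get("AIP_HTTP_PORT", "8080")),
        "health_route": env.get("AIP_HEALTH_ROUTE", "/health"),
        "predict_route": env.get("AIP_PREDICT_ROUTE", "/predict"),
        "storage_uri": env.get("AIP_STORAGE_URI"),  # None when no artifact location is provided
    }

print(serving_config({}))
print(serving_config({"AIP_HTTP_PORT": "9090", "AIP_PREDICT_ROUTE": "/v1/predict"}))
```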

In [84]:
if local_test:
    # make sure any prior runs are stopped:
    try:
        container = docker_client.containers.get('local-run')
        container.stop()
        container.remove()
    except docker.errors.NotFound:
        pass
    
    # get image:
    image_uri = build_response.artifacts.images[0]
    try:
        local_image = docker_client.images.get(image_uri)
        remote_image = docker_client.images.pull(image_uri)
        if local_image.id != remote_image.id:
            print('New image found, updating ...')
            local_image = remote_image
        else:
            print('Using existing image ...')
    except docker.errors.ImageNotFound:
        print('Pulling image ...')
        local_image = docker_client.images.pull(image_uri)
        
    # run container:
    print('Starting container ...')
    container = docker_client.containers.run(
        image = image_uri,
        detach = True,
        ports = {'8080/tcp':80}, # Map inside:outside (where docker run -p is outside:inside)
        name = 'local-run',
        environment = {
            'AIP_HTTP_PORT': '8080',
            'AIP_HEALTH_ROUTE': '/health',
            'AIP_PREDICT_ROUTE': '/predict',
            'AIP_STORAGE_URI': f'gs://{bucket.name}/{SERIES}/{EXPERIMENT}',
            'MODULE_NAME': 'main' # module to load; main is the image default
        }
    )
    print('Container ready.\n\tUse `container.logs()` to view startup logs.')

Using existing image ...
Starting container ...
Container ready.
	Use `container.logs()` to view startup logs.


In [85]:
time.sleep(5) # wait a few seconds!

In [86]:
#container.logs()
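
A fixed sleep can be too short on a slow machine or longer than needed on a fast one. As an alternative sketch, the hypothetical helper `wait_until` below polls a caller-supplied check until it passes or a timeout expires; the parameter names and defaults are assumptions.

```python
import time

def wait_until(check, timeout_seconds=30, retry_seconds=1):
    """Call check() repeatedly until it returns True or the timeout is reached."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        try:
            if check():
                return True
        except Exception:
            pass  # treat errors (for example a refused connection) as "not ready yet"
        if time.monotonic() >= deadline:
            return False
        time.sleep(retry_seconds)

# example with a stand-in check that succeeds on the third call
calls = {"count": 0}
def ready():
    calls["count"] += 1
    return calls["count"] >= 3

print(wait_until(ready, timeout_seconds=5, retry_seconds=0.01))
```

With the local container, the check could be a lambda that requests `http://localhost:80/health` and compares the status code to `200`.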

### Health Check

Want to see `200`:

In [87]:
if local_test:
    response = requests.get(f"http://localhost:80/health")
    print(response.status_code)

200


### Inference Test

In [88]:
def predict(instances):
    url = f"http://localhost:80/predict"
    headers = {'Content-Type': 'application/json'}
    data = json.dumps({'instances': instances})
    response = requests.post(url, headers = headers, data = data)    
    return json.loads(response.text)

#### Try Each Example Instance:

In [89]:
for example in example_instances:
    response = predict([example])  # call the endpoint once per example
    print('The example instance is: ', example)
    print('The prediction response is: ', response)
    print(f"The prediction for the instance is: {response['predictions'][0]}\n")

The example instance is:  1
The prediction response is:  {'predictions': [1]}
The prediction for the instance is: 1

The example instance is:  2
The prediction response is:  {'predictions': [2]}
The prediction for the instance is: 2

The example instance is:  3
The prediction response is:  {'predictions': [3]}
The prediction for the instance is: 3

The example instance is:  4
The prediction response is:  {'predictions': [4]}
The prediction for the instance is: 4

The example instance is:  1.24578001
The prediction response is:  {'predictions': [1.24578001]}
The prediction for the instance is: 1.24578001

The example instance is:  a
The prediction response is:  {'predictions': ['a']}
The prediction for the instance is: a

The example instance is:  A
The prediction response is:  {'predictions': ['A']}
The prediction for the instance is: A

The example instance is:  {'key': 'value'}
The prediction response is:  {'predictions': [{'key': 'value'}]}
The prediction for the instance is: {'key'

#### Try All Example Instances:

In [90]:
predict(example_instances)

{'predictions': [1,
  2,
  3,
  4,
  1.24578001,
  'a',
  'A',
  {'key': 'value'},
  [1],
  [1, 2, 3, 4],
  [[1, 2], [3, 4]],
  [[[1, 2], [3, 4]]],
  [{'key': 'value'}],
  [{'key': 'value'}, {'key': 'value2'}],
  ['gs://bucket/path/to/image/image1.jpg'],
  'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==',
  ['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='],
  {'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}]}

### Stop Container

In [91]:
container.name

'local-run'

In [92]:
container = docker_client.containers.get(container.name)

In [93]:
container.status

'running'

In [94]:
container.stop()
container.remove()

---
## Vertex AI Prediction Endpoint

Register the model in the [Vertex AI Model Registry](https://cloud.google.com/vertex-ai/docs/model-registry/introduction) and [Deploy it to an endpoint](https://cloud.google.com/vertex-ai/docs/general/deployment).

### Model Registry

Check for existing version of the model:

In [121]:
parent_model = ''
for model in aiplatform.Model.list(filter=f'display_name="{SERIES}"'):
    parent_model = model.resource_name
    break
parent_model

'projects/1026793852137/locations/us-central1/models/mlops-serving'

### Upload Model To Registry As New Version

In [122]:
vertex_model = aiplatform.Model.upload(
    display_name = SERIES,
    model_id = SERIES,
    parent_model = parent_model,
    serving_container_image_uri = build_response.artifacts.images[0],
    serving_container_environment_variables = {'MODULE_NAME': 'main'},
    serving_container_predict_route = "/predict",
    serving_container_health_route = "/health",
    artifact_uri = f'gs://{bucket.name}/{SERIES}/{EXPERIMENT}',
    is_default_version = True,
    version_aliases = [f'{EXPERIMENT}'],
    version_description = EXPERIMENT,
    labels = {'series': SERIES, 'experiment': EXPERIMENT}
)

Creating Model
Create Model backing LRO: projects/1026793852137/locations/us-central1/models/mlops-serving/operations/3455690382388494336
Model created. Resource name: projects/1026793852137/locations/us-central1/models/mlops-serving@4
To use this Model in another session:
model = aiplatform.Model('projects/1026793852137/locations/us-central1/models/mlops-serving@4')


### Create Endpoint

Check for existing endpoint:

In [123]:
vertex_endpoint = None
for endpoint in aiplatform.Endpoint.list(filter=f'display_name="{SERIES}"'):
    vertex_endpoint = endpoint
    break
vertex_endpoint

Create endpoint if missing:

In [124]:
if not vertex_endpoint:
    vertex_endpoint = aiplatform.Endpoint.create(
        display_name = SERIES,
        labels = {'series': SERIES}   
    )
vertex_endpoint

Creating Endpoint
Create Endpoint backing LRO: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568/operations/6047511967940214784
Endpoint created. Resource name: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/1026793852137/locations/us-central1/endpoints/7652043476925677568')


<google.cloud.aiplatform.models.Endpoint object at 0x7f50d114a800> 
resource name: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568

### Deploy Model: Default version with `main.py`

Get the latest model version with alias that is the same as the variable `EXPERIMENT`:

In [125]:
vertex_model = aiplatform.Model(model_name = SERIES, version = f'{EXPERIMENT}')
vertex_model.versioned_resource_name

'projects/1026793852137/locations/us-central1/models/mlops-serving@4'

In [126]:
vertex_model.deploy(
    endpoint = vertex_endpoint,
    traffic_percentage = 100,
    machine_type = 'n1-standard-4',
    min_replica_count = 1,
    max_replica_count = 2,
)

Deploying model to Endpoint : projects/1026793852137/locations/us-central1/endpoints/7652043476925677568
Deploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568/operations/636437015654563840
Endpoint model deployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568


<google.cloud.aiplatform.models.Endpoint object at 0x7f50d114a800> 
resource name: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568

### Inference Test

#### Try Each Example Instance:

In [129]:
for example in example_instances:
    response = vertex_endpoint.predict([example])  # one request per example
    print('The example instance is: ', example)
    print('The prediction response is: ', response)
    print(f"The prediction for the instance is: {response.predictions[0]}\n")

The example instance is:  1
The prediction response is:  Prediction(predictions=[1.0], deployed_model_id='181301770838867968', metadata=None, model_version_id='4', model_resource_name='projects/1026793852137/locations/us-central1/models/mlops-serving', explanations=None)
The prediction for the instance is: 1.0

The example instance is:  2
The prediction response is:  Prediction(predictions=[2.0], deployed_model_id='181301770838867968', metadata=None, model_version_id='4', model_resource_name='projects/1026793852137/locations/us-central1/models/mlops-serving', explanations=None)
The prediction for the instance is: 2.0

The example instance is:  3
The prediction response is:  Prediction(predictions=[3.0], deployed_model_id='181301770838867968', metadata=None, model_version_id='4', model_resource_name='projects/1026793852137/locations/us-central1/models/mlops-serving', explanations=None)
The prediction for the instance is: 3.0

The example instance is:  4
The prediction response is:  Pred

#### Try All Example Instances:

In [130]:
vertex_endpoint.predict(example_instances).predictions

[1.0,
 2.0,
 3.0,
 4.0,
 1.24578001,
 'a',
 'A',
 {'key': 'value'},
 [1.0],
 [1.0, 2.0, 3.0, 4.0],
 [[1.0, 2.0], [3.0, 4.0]],
 [[[1.0, 2.0], [3.0, 4.0]]],
 [{'key': 'value'}],
 [{'key': 'value'}, {'key': 'value2'}],
 ['gs://bucket/path/to/image/image1.jpg'],
 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==',
 ['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='],
 {'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}]
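
Compared with the local test, the integers come back as floats here (`1` becomes `1.0`), while strings and dictionaries are unchanged. A likely explanation, stated as an assumption rather than something verified in this notebook, is that the Vertex AI SDK carries instances and predictions as protobuf `Value` messages, whose only numeric type is a double. The hypothetical helper `numbers_as_double` imitates that effect in plain Python so the two outputs can be compared.

```python
def numbers_as_double(value):
    """Recursively convert ints to floats, leaving strings, bools, and None untouched."""
    if isinstance(value, bool):
        return value  # bool is a subclass of int, so handle it first
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, list):
        return [numbers_as_double(item) for item in value]
    if isinstance(value, dict):
        return {key: numbers_as_double(item) for key, item in value.items()}
    return value

local_output = [1, 1.24578001, 'a', {'key': 'value'}, [[1, 2], [3, 4]]]
print(numbers_as_double(local_output))
```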

### Undeploy All Models

In [131]:
for deployed_model in vertex_endpoint.list_models():
    vertex_endpoint.undeploy(deployed_model_id = deployed_model.id)
vertex_endpoint.list_models()

Undeploying Endpoint model: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568
Undeploy Endpoint model backing LRO: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568/operations/3000263870070652928
Endpoint model undeployed. Resource name: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568


[]

### Delete Endpoint

In [132]:
vertex_endpoint.delete(force = True)

Deleting Endpoint : projects/1026793852137/locations/us-central1/endpoints/7652043476925677568
Endpoint deleted. . Resource name: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568
Deleting Endpoint resource: projects/1026793852137/locations/us-central1/endpoints/7652043476925677568
Delete Endpoint backing LRO: projects/1026793852137/locations/us-central1/operations/474307429069225984
Endpoint resource projects/1026793852137/locations/us-central1/endpoints/7652043476925677568 deleted.


---
## Cloud Run

Deploy the model to [Cloud Run](https://cloud.google.com/run/docs/overview/what-is-cloud-run) using the same container image that was built and tested above, pulled from Artifact Registry.

Some highlights for Cloud Run:
- Rapid scaling to handle requests
- Scale to zero (default) or other minimum if set
- Can handle larger input (request) and output (response) sizes
    - See [requests limits](https://cloud.google.com/run/quotas#request_limits)
- Configure [memory limits](https://cloud.google.com/run/docs/configuring/services/memory-limits) and [cpu limits](https://cloud.google.com/run/docs/configuring/services/cpu) and [concurrency](https://cloud.google.com/run/docs/configuring/concurrency) and [autoscaling](https://cloud.google.com/run/docs/about-instance-autoscaling) and [request timeout](https://cloud.google.com/run/docs/configuring/request-timeout)

### Deploy The Endpoint

In [98]:
parent = f"projects/{PROJECT_ID}/locations/{REGION}"
service = run_v2.Service()
#service.name = f"{parent}/services/{SERIES}"
service.template.containers = [
    run_v2.Container(
        image = build_response.artifacts.images[0],
        ports = [run_v2.ContainerPort(container_port = 8080)],
        env = [
            run_v2.EnvVar(name = 'AIP_HTTP_PORT', value = '8080'),
            run_v2.EnvVar(name = 'AIP_HEALTH_ROUTE', value = '/health'),
            run_v2.EnvVar(name = 'AIP_PREDICT_ROUTE', value = '/predict'),
            run_v2.EnvVar(name = 'AIP_STORAGE_URI', value = f'gs://{bucket.name}/{SERIES}/{EXPERIMENT}'),
            run_v2.EnvVar(name = 'MODULE_NAME', value = 'main')
        ],
        resources = run_v2.ResourceRequirements(
            limits = {"cpu": '8', "memory": '32Gi'}
        )
    )
]
service.ingress = run_v2.IngressTraffic.INGRESS_TRAFFIC_INTERNAL_ONLY

In [99]:
try:
    # create the service:
    run_response = cr.create_service(request = {"parent": parent, "service": service, "service_id": SERIES})
    # wait on the operation to complete:
    run_response.result()
    # print the name of the service
    print(f"Started Service: {run_response.metadata.name}")
except Exception as e:
    print(f"Error creating service: {e}")

Started Service: projects/statmike-mlops-349915/locations/us-central1/services/mlops-serving


In [100]:
run_response.metadata.uri

'https://mlops-serving-urlxi72dpa-uc.a.run.app'

### Permissions

By default the endpoint requires authentication.  Check out the [Authentication Overview](https://cloud.google.com/run/docs/authenticating/overview).  The approach used below follows the [Authenticating service-to-service](https://cloud.google.com/run/docs/authenticating/service-to-service) method: the service account that runs this notebook and created the endpoint is also granted the role to invoke it.  Note that the binding below also lists `allUsers`, which allows unauthenticated invocations; remove that member to keep the endpoint restricted to authenticated callers.

In [101]:
SERVICE_ACCOUNT = !gcloud config list --format='value(core.account)' 
SERVICE_ACCOUNT = SERVICE_ACCOUNT[0]
SERVICE_ACCOUNT

'1026793852137-compute@developer.gserviceaccount.com'

In [102]:
run_response.metadata.name

'projects/statmike-mlops-349915/locations/us-central1/services/mlops-serving'

In [103]:
policy = cr.get_iam_policy(request = {'resource': run_response.metadata.name})
policy.bindings.add(
    role = 'roles/run.invoker',
    members = [f"serviceAccount:{SERVICE_ACCOUNT}", 'allUsers'] #'allUsers'
)
policy_response = cr.set_iam_policy(request = {"resource": run_response.metadata.name, "policy": policy})
print(f"IAM policy updated: {policy_response.bindings}")

IAM policy updated: [role: "roles/run.invoker"
members: "allUsers"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
]


In [104]:
policy.bindings

[role: "roles/run.invoker"
members: "serviceAccount:1026793852137-compute@developer.gserviceaccount.com"
members: "allUsers"
]

**WAIT: The update of the IAM Policy might take a few moments to take effect.  Rerun the following health check section until you get a `200` response code.**

### Health Check

Want to see `200`:

In [105]:
def health(uri):
    url = f"{uri}/health"
    credentials, _ = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    headers = {'Authorization': f'Bearer {credentials.token}'}
    response = requests.get(url, headers = headers)    
    return response.status_code

def check_health(uri, timeout_seconds = 200, retry_seconds = 10):
    start_time = time.time()
    while True:
        status_code = health(uri)
        if status_code == 200:
            break
        elapsed_time = time.time() - start_time
        if elapsed_time > timeout_seconds:
            break
        time.sleep(retry_seconds)
    return status_code

In [106]:
check_health(run_response.metadata.uri)

200

In [107]:
health(run_response.metadata.uri)

200

### Inference Test

In [108]:
def predict(instances):
    # Refresh the default credentials to get a bearer token
    credentials, _ = google.auth.default()
    auth_req = google.auth.transport.requests.Request()
    credentials.refresh(auth_req)
    url = f"{run_response.metadata.uri}/predict"
    headers = {
        'Authorization': f'Bearer {credentials.token}',
        'Content-Type': 'application/json'  # HTTP header names use hyphens
    }
    # Wrap the instances in the request body expected by the /predict route
    data = json.dumps({'instances': instances})
    response = requests.post(url, headers = headers, data = data)
    return json.loads(response.text)

#### Try Each Example Instance:

In [109]:
for example in example_instances:
    response = predict([example])  # call the service once per instance
    print('The example instance is: ', example)
    print('The prediction response is: ', response)
    print(f"The prediction for the instance is: {response['predictions'][0]}\n")

The example instance is:  1
The prediction response is:  {'predictions': [1]}
The prediction for the instance is: 1

The example instance is:  2
The prediction response is:  {'predictions': [2]}
The prediction for the instance is: 2

The example instance is:  3
The prediction response is:  {'predictions': [3]}
The prediction for the instance is: 3

The example instance is:  4
The prediction response is:  {'predictions': [4]}
The prediction for the instance is: 4

The example instance is:  1.24578001
The prediction response is:  {'predictions': [1.24578001]}
The prediction for the instance is: 1.24578001

The example instance is:  a
The prediction response is:  {'predictions': ['a']}
The prediction for the instance is: a

The example instance is:  A
The prediction response is:  {'predictions': ['A']}
The prediction for the instance is: A

The example instance is:  {'key': 'value'}
The prediction response is:  {'predictions': [{'key': 'value'}]}
The prediction for the instance is: {'key'

#### Try All Example Instances:

In [110]:
predict(example_instances)

{'predictions': [1,
  2,
  3,
  4,
  1.24578001,
  'a',
  'A',
  {'key': 'value'},
  [1],
  [1, 2, 3, 4],
  [[1, 2], [3, 4]],
  [[[1, 2], [3, 4]]],
  [{'key': 'value'}],
  [{'key': 'value'}, {'key': 'value2'}],
  ['gs://bucket/path/to/image/image1.jpg'],
  'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==',
  ['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='],
  {'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}]}

### Remove The Service

Cloud Run scales this service to zero when it is idle because a minimum number of instances has not been set.  This notebook still deletes the service with the following code.

In [111]:
remove_response = cr.delete_service(request = {"name": run_response.metadata.name})

In [113]:
#remove_response.result()

---
## Vertex AI Batch Prediction

This section shows how to use the same application built above with the Vertex AI Batch Prediction service.

The approaches above use the hosted FastAPI application (local, Cloud Run, Vertex AI Endpoints) to serve on-demand prediction requests.  For batch workflows, the data for inference is saved in an input format and then processed by the Vertex AI Batch Prediction service into an output destination.  Input instances are all converted to JSON before being sent to the prediction container, just like in the online serving examples above.
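
As a rough illustration of that conversion, the sketch below serializes a few instances as JSON Lines, reads them back one line at a time, and wraps them in the same `{"instances": [...]}` body used by the `predict` function above.  The `request_bodies` helper and its `batch_size` value are hypothetical; the batching actually performed by the service is not shown in this notebook.

```python
import json

# A few of the example instance shapes used throughout this notebook.
sample_instances = [1, 1.24578001, 'a', {'key': 'value'}, [[1, 2], [3, 4]]]

# JSON Lines: one JSON value per line.
jsonl_text = "\n".join(json.dumps(instance) for instance in sample_instances)

def request_bodies(jsonl_text, batch_size = 2):
    """Yield JSON request bodies with up to batch_size instances each (hypothetical helper)."""
    instances = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]
    for start in range(0, len(instances), batch_size):
        yield json.dumps({'instances': instances[start:start + batch_size]})

bodies = list(request_bodies(jsonl_text))
for body in bodies:
    print(body)
```

Each printed line is a complete request body, so the prediction container sees the same structure whether the request comes from an online client or from a batch job.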

Each possible file format for the service is covered here to help understand the data processing pipeline parts of the service.

**References:**

- SDK Link For Creating Batch Prediction Job From Model Reference: [`google.cloud.aiplatform.Model.batch_predict()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.Model#google_cloud_aiplatform_Model_batch_predict)
- SDK Link For Creating Batch Predictions Jobs Directly: [`google.cloud.aiplatform.BatchPredictionJob.create()`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.BatchPredictionJob#google_cloud_aiplatform_BatchPredictionJob_create)
- The result (return value) is an object of type: [`google.cloud.aiplatform.BatchPredictionJob`](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.BatchPredictionJob)

In [178]:
vertex_model.supported_input_storage_formats

['jsonl', 'bigquery', 'csv', 'tf-record', 'tf-record-gzip', 'file-list']

### JSON Lines File - Stored In GCS

https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions#input_data_requirements

#### Save To GCS AS JSONL

In [133]:
blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/batch/jsonl/example_instances.jsonl")
blob.upload_from_string(
    "\n".join(json.dumps(instance) for instance in example_instances),
    content_type="application/jsonl",
)

#### Start Batch Prediction Job On Vertex AI

In [134]:
batch_prediction_job_jsonl = vertex_model.batch_predict(
    job_display_name = f"{SERIES}-{EXPERIMENT}-JSONL",
    gcs_source = [f"gs://{bucket.name}/{blob.name}"],
    gcs_destination_prefix = f"gs://{bucket.name}/{SERIES}/{EXPERIMENT}/batch/jsonl",
    instances_format = 'jsonl',
    machine_type = 'n1-standard-2',
    accelerator_count = 0,
    accelerator_type = None,
    starting_replica_count = 1,
    max_replica_count = 10,
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/2498244367114829824
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/1026793852137/locations/us-central1/batchPredictionJobs/2498244367114829824')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/2498244367114829824?project=1026793852137
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/2498244367114829824 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/2498244367114829824 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/2498244367114829824 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredicti

In [138]:
batch_prediction_job_jsonl.state

<JobState.JOB_STATE_SUCCEEDED: 4>

In [142]:
batch_prediction_job_jsonl.state._member_names_

['JOB_STATE_UNSPECIFIED',
 'JOB_STATE_QUEUED',
 'JOB_STATE_PENDING',
 'JOB_STATE_RUNNING',
 'JOB_STATE_SUCCEEDED',
 'JOB_STATE_FAILED',
 'JOB_STATE_CANCELLING',
 'JOB_STATE_CANCELLED',
 'JOB_STATE_PAUSED',
 'JOB_STATE_EXPIRED',
 'JOB_STATE_UPDATING',
 'JOB_STATE_PARTIALLY_SUCCEEDED']

In [143]:
batch_prediction_job_jsonl.done()

True

In [146]:
batch_prediction_job_jsonl.end_time - batch_prediction_job_jsonl.start_time

datetime.timedelta(seconds=920, microseconds=655599)

In [145]:
str(batch_prediction_job_jsonl.end_time - batch_prediction_job_jsonl.start_time)

'0:15:20.655599'

#### Retrieve Results From GCS

There could be multiple files containing the prediction results depending on the number of serving nodes.  **Note** that there is also a series of error files that contain per-instance errors, if there are any.
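
The file naming can be handled with a small amount of string processing.  The following sketch uses only the standard library to split a `gs://` URI into bucket and prefix and to pick the result shards out of a list of object names.  The helper names `split_gcs_uri` and `result_shards` are hypothetical, and the bucket and object names are made-up stand-ins rather than real job output.

```python
from urllib.parse import urlparse

def split_gcs_uri(uri):
    """Split 'gs://bucket/path/to/dir' into ('bucket', 'path/to/dir')."""
    parsed = urlparse(uri)
    if parsed.scheme != 'gs':
        raise ValueError(f"Not a GCS URI: {uri}")
    return parsed.netloc, parsed.path.lstrip('/')

def result_shards(object_names):
    """Keep only the prediction result shards, sorted by name (shard order)."""
    return sorted(
        name for name in object_names
        if name.rsplit('/', 1)[-1].startswith('prediction.results-')
    )

# Hypothetical output directory and object names, for illustration only.
bucket_name, prefix = split_gcs_uri('gs://example-bucket/batch/jsonl/prediction-job')
names = [
    f"{prefix}/prediction.errors_stats-00000-of-00001",
    f"{prefix}/prediction.results-00001-of-00002",
    f"{prefix}/prediction.results-00000-of-00002",
]
print(bucket_name, prefix)
print(result_shards(names))
```

The prefix returned by `split_gcs_uri` could be passed to `bucket.list_blobs(prefix = ...)` as an alternative to splitting the output directory string on the bucket name.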

In [148]:
batch_prediction_job_jsonl.output_info

gcs_output_directory: "gs://statmike-mlops-349915/mlops-serving/understand-io/batch/jsonl/prediction-mlops-serving-2025_03_04T07_05_13_159Z"

In [160]:
blobs = list(bucket.list_blobs(prefix = batch_prediction_job_jsonl.output_info.gcs_output_directory.split(bucket.name)[-1][1:]))
blobs

[<Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/jsonl/prediction-mlops-serving-2025_03_04T07_05_13_159Z/prediction.errors_stats-00000-of-00001, 1741101495088915>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/jsonl/prediction-mlops-serving-2025_03_04T07_05_13_159Z/prediction.results-00000-of-00001, 1741101494924910>]

In [165]:
jsonl_predictions = []
for blob in blobs:
    if blob.name.split('/')[-1].startswith('prediction.results-'):
        print('Reading from: ', blob.name)
        blob_content = blob.download_as_string().decode('utf-8')
        for line in blob_content.splitlines():
            jsonl_predictions.append(json.loads(line))

Reading from:  mlops-serving/understand-io/batch/jsonl/prediction-mlops-serving-2025_03_04T07_05_13_159Z/prediction.results-00000-of-00001


In [166]:
jsonl_predictions

[{'instance': 1, 'prediction': 1},
 {'instance': 2, 'prediction': 2},
 {'instance': 3, 'prediction': 3},
 {'instance': 4, 'prediction': 4},
 {'instance': 1.24578001, 'prediction': 1.24578001},
 {'instance': 'a', 'prediction': 'a'},
 {'instance': 'A', 'prediction': 'A'},
 {'instance': {'key': 'value'}, 'prediction': {'key': 'value'}},
 {'instance': [1], 'prediction': [1]},
 {'instance': [1, 2, 3, 4], 'prediction': [1, 2, 3, 4]},
 {'instance': [[1, 2], [3, 4]], 'prediction': [[1, 2], [3, 4]]},
 {'instance': [[[1, 2], [3, 4]]], 'prediction': [[[1, 2], [3, 4]]]},
 {'instance': [{'key': 'value'}], 'prediction': [{'key': 'value'}]},
 {'instance': [{'key': 'value'}, {'key': 'value2'}],
  'prediction': [{'key': 'value'}, {'key': 'value2'}]},
 {'instance': ['gs://bucket/path/to/image/image1.jpg'],
  'prediction': ['gs://bucket/path/to/image/image1.jpg']},
 {'instance': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==',
  'prediction': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='},
 {'ins

### CSV File - Stored In GCS

https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions#input_data_requirements

#### Save To GCS AS CSV

In [193]:
header = ['input_1']
prepped_instances = [header]
for instance in example_instances:
    if isinstance(instance, (int, float, str)):
        # numbers and strings are written as-is
        prepped_instances.append([instance])
    else:
        # lists and dictionaries are stored as their string representation
        prepped_instances.append([str(instance)])
    
prepped_instances

[['input_1'],
 [1],
 [2],
 [3],
 [4],
 [1.24578001],
 ['a'],
 ['A'],
 ["{'key': 'value'}"],
 ['[1]'],
 ['[1, 2, 3, 4]'],
 ['[[1, 2], [3, 4]]'],
 ['[[[1, 2], [3, 4]]]'],
 ["[{'key': 'value'}]"],
 ["[{'key': 'value'}, {'key': 'value2'}]"],
 ["['gs://bucket/path/to/image/image1.jpg']"],
 ['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='],
 ["['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==']"],
 ["{'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}"]]

In [194]:
output = io.StringIO()
writer = csv.writer(output) #, quoting = csv.QUOTE_ALL) # Quote all fields
writer.writerows(prepped_instances)
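
To make the file contents concrete, this small sketch writes a handful of rows with the same default `csv.writer` settings and prints the resulting text.  The rows are a subset picked for illustration.  With the default minimal quoting, only fields that contain a comma, a double quote, or a line break are wrapped in double quotes.

```python
import csv
import io

# A subset of the prepared rows above, chosen for illustration.
sample_rows = [['input_1'], [1], [1.24578001], ['a'], ["{'key': 'value'}"], ['[1, 2, 3, 4]']]

buffer = io.StringIO()
csv.writer(buffer).writerows(sample_rows)  # default dialect: minimal quoting, '\r\n' line endings
csv_lines = buffer.getvalue().splitlines()

for line in csv_lines:
    print(line)
```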

In [195]:
blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/batch/csv/example_instances.csv")
blob.upload_from_string(
    output.getvalue(),
    content_type="text/csv",
)

#### Start Batch Prediction Job On Vertex AI

In [196]:
batch_prediction_job_csv = vertex_model.batch_predict(
    job_display_name = f"{SERIES}-{EXPERIMENT}-CSV",
    gcs_source = [f"gs://{bucket.name}/{blob.name}"],
    gcs_destination_prefix = f"gs://{bucket.name}/{SERIES}/{EXPERIMENT}/batch/csv",
    instances_format = 'csv',
    machine_type = 'n1-standard-2',
    accelerator_count = 0,
    accelerator_type = None,
    starting_replica_count = 1,
    max_replica_count = 10,
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/5260006469855608832
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/1026793852137/locations/us-central1/batchPredictionJobs/5260006469855608832')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/5260006469855608832?project=1026793852137
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/5260006469855608832 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/5260006469855608832 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/5260006469855608832 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredicti

In [206]:
batch_prediction_job_csv.state

<JobState.JOB_STATE_SUCCEEDED: 4>

In [207]:
batch_prediction_job_csv.state._member_names_

['JOB_STATE_UNSPECIFIED',
 'JOB_STATE_QUEUED',
 'JOB_STATE_PENDING',
 'JOB_STATE_RUNNING',
 'JOB_STATE_SUCCEEDED',
 'JOB_STATE_FAILED',
 'JOB_STATE_CANCELLING',
 'JOB_STATE_CANCELLED',
 'JOB_STATE_PAUSED',
 'JOB_STATE_EXPIRED',
 'JOB_STATE_UPDATING',
 'JOB_STATE_PARTIALLY_SUCCEEDED']

In [208]:
batch_prediction_job_csv.done()

True

In [209]:
batch_prediction_job_csv.end_time - batch_prediction_job_csv.start_time

datetime.timedelta(seconds=964, microseconds=761931)

In [210]:
str(batch_prediction_job_csv.end_time - batch_prediction_job_csv.start_time)

'0:16:04.761931'

#### Retrieve Results From GCS

There could be multiple files containing the prediction results depending on the number of serving nodes.  **Note** that there is also a series of error files that contain per-instance errors, if there are any.

In [211]:
batch_prediction_job_csv.output_info

gcs_output_directory: "gs://statmike-mlops-349915/mlops-serving/understand-io/batch/csv/prediction-mlops-serving-2025_03_04T09_12_56_163Z"

In [212]:
blobs = list(bucket.list_blobs(prefix = batch_prediction_job_csv.output_info.gcs_output_directory.split(bucket.name)[-1][1:]))
blobs

[<Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/csv/prediction-mlops-serving-2025_03_04T09_12_56_163Z/prediction.errors_stats-00000-of-00001, 1741109219072711>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/csv/prediction-mlops-serving-2025_03_04T09_12_56_163Z/prediction.results-00000-of-00002, 1741109218086521>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/csv/prediction-mlops-serving-2025_03_04T09_12_56_163Z/prediction.results-00001-of-00002, 1741109218077960>]

In [213]:
csv_predictions = []
for blob in blobs:
    if blob.name.split('/')[-1].startswith('prediction.results-'):
        print('Reading from: ', blob.name)
        blob_content = blob.download_as_string().decode('utf-8')
        for line in blob_content.splitlines():
            csv_predictions.append(json.loads(line))

Reading from:  mlops-serving/understand-io/batch/csv/prediction-mlops-serving-2025_03_04T09_12_56_163Z/prediction.results-00000-of-00002
Reading from:  mlops-serving/understand-io/batch/csv/prediction-mlops-serving-2025_03_04T09_12_56_163Z/prediction.results-00001-of-00002


In [214]:
csv_predictions

[{'instance': [2.0], 'prediction': [2.0]},
 {'instance': [3.0], 'prediction': [3.0]},
 {'instance': [1.0], 'prediction': [1.0]},
 {'instance': [4.0], 'prediction': [4.0]},
 {'instance': [1.24578001], 'prediction': [1.24578001]}]

### BigQuery Table

BigQuery tables can be the input and output for Vertex AI Batch Prediction jobs.  Since BigQuery columns are typed by the table schema, the example instances are converted to strings first, which changes the input types seen by the prediction service: for example, 1 becomes "1".
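
The effect of the string conversion can be seen without BigQuery.  In this sketch, `str()` (the conversion used in this notebook) is compared with `json.dumps()`, which is shown only as an alternative and is not what the cells below do.  The point is that `str()` of a dictionary produces Python's repr with single quotes, which `json.loads` cannot parse back, so a prediction container would have to treat such a value as plain text.

```python
import json

sample_instances = [1, 'a', {'key': 'value'}, [1, 2, 3, 4]]

def round_trips(instance):
    """True when json.loads(json.dumps(x)) returns the original value."""
    return json.loads(json.dumps(instance)) == instance

for instance in sample_instances:
    as_str = str(instance)          # conversion used in this notebook
    as_json = json.dumps(instance)  # alternative that can be reversed with json.loads
    try:
        recovered = json.loads(as_str)
    except json.JSONDecodeError:
        recovered = 'not valid JSON'
    print(f"str: {as_str!r} | json.loads(str): {recovered!r} | json.dumps: {as_json!r}")
```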

#### Save To BigQuery Table

In [254]:
dataset = bigquery.Dataset(f"{BQ_PROJECT}.{BQ_DATASET}")
dataset.location = BQ_REGION
bq_dataset = bq.create_dataset(dataset, exists_ok = True)

In [255]:
bq_table = bq_dataset.table(BQ_TABLE)

In [256]:
job_config = bigquery.LoadJobConfig(
    source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    write_disposition = bigquery.WriteDisposition.WRITE_TRUNCATE,
    autodetect = True
)

In [257]:
load_job = bq.load_table_from_json(
    json_rows = [{"input_1" : str(instance)} for instance in example_instances],
    destination = bq_table,
    job_config = job_config
)
load_job.result()

LoadJob<project=statmike-mlops-349915, location=US, id=451f615b-6ce5-4dbb-ab95-68457ece2f72>

In [258]:
bq.query(f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}`").to_dataframe()

Unnamed: 0,input_1
0,1
1,2
2,3
3,4
4,1.24578001
5,a
6,A
7,{'key': 'value'}
8,[1]
9,"[1, 2, 3, 4]"


#### Start Batch Prediction Job On Vertex AI

In [259]:
batch_prediction_job_bq = vertex_model.batch_predict(
    job_display_name = f"{SERIES}-{EXPERIMENT}-BQ",
    bigquery_source = f"bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}",
    bigquery_destination_prefix = f"bq://{BQ_PROJECT}.{BQ_DATASET}",
    instances_format = 'bigquery',
    machine_type = 'n1-standard-2',
    accelerator_count = 0,
    accelerator_type = None,
    starting_replica_count = 1,
    max_replica_count = 10,
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/2740084148667416576
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/1026793852137/locations/us-central1/batchPredictionJobs/2740084148667416576')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/2740084148667416576?project=1026793852137
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/2740084148667416576 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/2740084148667416576 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/2740084148667416576 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredicti

In [262]:
batch_prediction_job_bq.state

<JobState.JOB_STATE_SUCCEEDED: 4>

In [263]:
batch_prediction_job_bq.state._member_names_

['JOB_STATE_UNSPECIFIED',
 'JOB_STATE_QUEUED',
 'JOB_STATE_PENDING',
 'JOB_STATE_RUNNING',
 'JOB_STATE_SUCCEEDED',
 'JOB_STATE_FAILED',
 'JOB_STATE_CANCELLING',
 'JOB_STATE_CANCELLED',
 'JOB_STATE_PAUSED',
 'JOB_STATE_EXPIRED',
 'JOB_STATE_UPDATING',
 'JOB_STATE_PARTIALLY_SUCCEEDED']

In [264]:
batch_prediction_job_bq.done()

True

In [265]:
batch_prediction_job_bq.end_time - batch_prediction_job_bq.start_time

datetime.timedelta(seconds=882, microseconds=226634)

In [266]:
str(batch_prediction_job_bq.end_time - batch_prediction_job_bq.start_time)

'0:14:42.226634'

#### Retrieve Results From BigQuery

Results are stored in a BigQuery table in the destination dataset provided to the batch job:

In [267]:
batch_prediction_job_bq.output_info

bigquery_output_dataset: "bq://statmike-mlops-349915.mlops_serving"
bigquery_output_table: "predictions_2025_03_05T05_32_55_714Z_576"

In [268]:
bq.query(f"SELECT * FROM `{BQ_PROJECT}.{BQ_DATASET}.{batch_prediction_job_bq.output_info.bigquery_output_table}`").to_dataframe()

Unnamed: 0,input_1,prediction
0,1,"[""1""]"
1,2,"[""2""]"
2,3,"[""3""]"
3,4,"[""4""]"
4,1.24578001,"[""1.24578001""]"
5,a,"[""a""]"
6,A,"[""A""]"
7,{'key': 'value'},"[""{'key': 'value'}""]"
8,[1],"[""[1]""]"
9,"[1, 2, 3, 4]","[""[1, 2, 3, 4]""]"


### File List - Stored In GCS

Instances can be stored in GCS as individual files.  Any file type works, but this is particularly helpful for documents and media files like images and video.  The batch prediction input for this method is a text file, also stored in GCS, that lists the GCS location of each instance file.  Each instance file is read as binary, base64 encoded, and sent to the prediction container as a dictionary with the single key `b64`.  The prediction container then needs to handle decoding this input.
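
The encoding step described above can be reproduced locally.  This sketch assumes an instance file that holds the text `[1]` (as one of the example instances does) and shows the resulting `{'b64': ...}` payload along with the decoding a prediction container would need to perform.  The function names are hypothetical.

```python
import base64

def to_b64_instance(file_bytes):
    """Wrap raw file bytes as a file-list style instance: {'b64': <base64 text>}."""
    return {'b64': base64.b64encode(file_bytes).decode('ascii')}

def from_b64_instance(instance):
    """Recover the original file bytes, as a prediction container would."""
    return base64.b64decode(instance['b64'])

instance = to_b64_instance(b'[1]')
print(instance)
print(from_b64_instance(instance).decode('utf-8'))
```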

#### Save Instances To Files In GCS

In [276]:
filelist = []
for i, instance in enumerate(example_instances):
    
    blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/batch/filelist/files/instance_{i}.txt")
    blob.upload_from_string(
        str(instance),
        content_type = "text/plain",
    )
    filelist.append(f"gs://{bucket.name}/{blob.name}")

blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/batch/filelist/filelist.txt")
blob.upload_from_string(
    '\n'.join(filelist),
    content_type = 'text/plain'
)

#### Start Batch Prediction Job On Vertex AI

In [277]:
batch_prediction_job_filelist = vertex_model.batch_predict(
    job_display_name = f"{SERIES}-{EXPERIMENT}-FILELIST",
    gcs_source = [f"gs://{bucket.name}/{blob.name}"],
    gcs_destination_prefix = f"gs://{bucket.name}/{SERIES}/{EXPERIMENT}/batch/filelist",
    instances_format = 'file-list',
    machine_type = 'n1-standard-2',
    accelerator_count = 0,
    accelerator_type = None,
    starting_replica_count = 1,
    max_replica_count = 10,
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/3414990774075392000
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/1026793852137/locations/us-central1/batchPredictionJobs/3414990774075392000')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/3414990774075392000?project=1026793852137
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/3414990774075392000 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/3414990774075392000 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/3414990774075392000 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredicti

In [278]:
batch_prediction_job_filelist.state

<JobState.JOB_STATE_SUCCEEDED: 4>

In [279]:
batch_prediction_job_filelist.state._member_names_

['JOB_STATE_UNSPECIFIED',
 'JOB_STATE_QUEUED',
 'JOB_STATE_PENDING',
 'JOB_STATE_RUNNING',
 'JOB_STATE_SUCCEEDED',
 'JOB_STATE_FAILED',
 'JOB_STATE_CANCELLING',
 'JOB_STATE_CANCELLED',
 'JOB_STATE_PAUSED',
 'JOB_STATE_EXPIRED',
 'JOB_STATE_UPDATING',
 'JOB_STATE_PARTIALLY_SUCCEEDED']

In [280]:
batch_prediction_job_filelist.done()

True

In [281]:
batch_prediction_job_filelist.end_time - batch_prediction_job_filelist.start_time

datetime.timedelta(seconds=931, microseconds=898112)

In [282]:
str(batch_prediction_job_filelist.end_time - batch_prediction_job_filelist.start_time)

'0:15:31.898112'

#### Retrieve Results From GCS

There could be multiple files containing the prediction results depending on the number of serving nodes.  **Note** that there is also a series of error files that contain per-instance errors, if there are any.

In [283]:
batch_prediction_job_filelist.output_info

gcs_output_directory: "gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z"

In [284]:
blobs = list(bucket.list_blobs(prefix = batch_prediction_job_filelist.output_info.gcs_output_directory.split(bucket.name)[-1][1:]))
blobs

[<Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.errors_stats-00000-of-00001, 1741187180928168>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00000-of-00004, 1741187180562990>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00001-of-00004, 1741187180572141>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00002-of-00004, 1741187180692006>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00003-of-00004, 1741187180573352>]

In [285]:
filelist_predictions = []
for blob in blobs:
    if blob.name.split('/')[-1].startswith('prediction.results-'):
        print('Reading from: ', blob.name)
        blob_content = blob.download_as_string().decode('utf-8')
        for line in blob_content.splitlines():
            filelist_predictions.append(json.loads(line))

Reading from:  mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00000-of-00004
Reading from:  mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00001-of-00004
Reading from:  mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00002-of-00004
Reading from:  mlops-serving/understand-io/batch/filelist/prediction-mlops-serving-2025_03_05T06_53_04_021Z/prediction.results-00003-of-00004


In [287]:
filelist_predictions[0]

{'instance': 'gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/files/instance_13.txt',
 'prediction': {'b64': 'W3sna2V5JzogJ3ZhbHVlJ30sIHsna2V5JzogJ3ZhbHVlMid9XQ=='}}

#### Decode Results

In [290]:
for pred in filelist_predictions:
    pred['prediction']['decoded'] = base64.b64decode(pred['prediction']['b64']).decode('utf-8')

In [291]:
filelist_predictions[0]

{'instance': 'gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/files/instance_13.txt',
 'prediction': {'b64': 'W3sna2V5JzogJ3ZhbHVlJ30sIHsna2V5JzogJ3ZhbHVlMid9XQ==',
  'decoded': "[{'key': 'value'}, {'key': 'value2'}]"}}

In [292]:
filelist_predictions

[{'instance': 'gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/files/instance_13.txt',
  'prediction': {'b64': 'W3sna2V5JzogJ3ZhbHVlJ30sIHsna2V5JzogJ3ZhbHVlMid9XQ==',
   'decoded': "[{'key': 'value'}, {'key': 'value2'}]"}},
 {'instance': 'gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/files/instance_8.txt',
  'prediction': {'b64': 'WzFd', 'decoded': '[1]'}},
 {'instance': 'gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/files/instance_15.txt',
  'prediction': {'b64': 'YzI5dFpYUm9hVzVuSUhSdklHVnVZMjlrWlNCaGN5QmlhVzVoY25rZ2FHVnlaUT09',
   'decoded': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}},
 {'instance': 'gs://statmike-mlops-349915/mlops-serving/understand-io/batch/filelist/files/instance_16.txt',
  'prediction': {'b64': 'WydjMjl0WlhSb2FXNW5JSFJ2SUdWdVkyOWtaU0JoY3lCaWFXNWhjbmtnYUdWeVpRPT0nXQ==',
   'decoded': "['c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ==']"}},
 {'instance': 'gs://statmike-mlops-349915/ml

### TFRecord Files - Stored In GCS

A TFRecord is a file that TensorFlow uses to store binary-formatted data for efficient reading and transport.  A single file can contain multiple records.  Read more details about TFRecords [here](https://www.tensorflow.org/tutorials/load_data/tfrecord) and [here](https://www.tensorflow.org/guide/data#consuming_tfrecord_data).

You can use uncompressed or compressed versions of TFRecord files by specifying the instance format as either `tf-record` or `tf-record-gzip`.

The Vertex AI Batch Prediction service can read different record files from different serving replicas, so split the records into multiple smaller files and pass them to the job with a wildcard like `gcs_source = ['gs://bucket-name/path/*.tfrecord']`.

#### Save Instances To TFRecord Files In GCS

In [307]:
for i, instance in enumerate(example_instances):
    blob = bucket.blob(f"{SERIES}/{EXPERIMENT}/batch/tfrecord/records/instance_{i}.tfrecord")
    with tempfile.NamedTemporaryFile(delete = True, suffix = '.tfrecord') as temp_file:
        with tf.io.TFRecordWriter(temp_file.name) as writer: # options = 'GZIP'
            feature = tf.train.Feature(bytes_list = tf.train.BytesList(value = [str(instance).encode('utf-8')]))
            example = tf.train.Example(features = tf.train.Features(feature = {"value": feature}))
            writer.write(example.SerializeToString())
        blob.upload_from_filename(temp_file.name)

#### Start Batch Prediction Job On Vertex AI

In [308]:
batch_prediction_job_tfrecord = vertex_model.batch_predict(
    job_display_name = f"{SERIES}-{EXPERIMENT}-TFRECORD",
    gcs_source = [f"gs://{bucket.name}/{SERIES}/{EXPERIMENT}/batch/tfrecord/records/*.tfrecord"],
    gcs_destination_prefix = f"gs://{bucket.name}/{SERIES}/{EXPERIMENT}/batch/tfrecord",
    instances_format = 'tf-record',
    machine_type = 'n1-standard-2',
    accelerator_count = 0,
    accelerator_type = None,
    starting_replica_count = 1,
    max_replica_count = 10,
)

Creating BatchPredictionJob
BatchPredictionJob created. Resource name: projects/1026793852137/locations/us-central1/batchPredictionJobs/5510156360290926592
To use this BatchPredictionJob in another session:
bpj = aiplatform.BatchPredictionJob('projects/1026793852137/locations/us-central1/batchPredictionJobs/5510156360290926592')
View Batch Prediction Job:
https://console.cloud.google.com/ai/platform/locations/us-central1/batch-predictions/5510156360290926592?project=1026793852137
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/5510156360290926592 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/5510156360290926592 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredictionJobs/5510156360290926592 current state:
JobState.JOB_STATE_RUNNING
BatchPredictionJob projects/1026793852137/locations/us-central1/batchPredicti

In [309]:
batch_prediction_job_tfrecord.state

<JobState.JOB_STATE_SUCCEEDED: 4>

In [310]:
batch_prediction_job_tfrecord.state._member_names_

['JOB_STATE_UNSPECIFIED',
 'JOB_STATE_QUEUED',
 'JOB_STATE_PENDING',
 'JOB_STATE_RUNNING',
 'JOB_STATE_SUCCEEDED',
 'JOB_STATE_FAILED',
 'JOB_STATE_CANCELLING',
 'JOB_STATE_CANCELLED',
 'JOB_STATE_PAUSED',
 'JOB_STATE_EXPIRED',
 'JOB_STATE_UPDATING',
 'JOB_STATE_PARTIALLY_SUCCEEDED']

In [311]:
batch_prediction_job_tfrecord.done()

True

In [312]:
batch_prediction_job_tfrecord.end_time - batch_prediction_job_tfrecord.start_time

datetime.timedelta(seconds=919, microseconds=210520)

In [313]:
str(batch_prediction_job_tfrecord.end_time - batch_prediction_job_tfrecord.start_time)

'0:15:19.210520'

#### Retrieve Results From GCS

There could be multiple files containing the prediction results depending on the number of serving nodes.  **Note** that there is also a series of error files that contain per-instance errors, if there are any.

In [314]:
batch_prediction_job_tfrecord.output_info

gcs_output_directory: "gs://statmike-mlops-349915/mlops-serving/understand-io/batch/tfrecord/prediction-mlops-serving-2025_03_05T09_59_44_728Z"

In [315]:
blobs = list(bucket.list_blobs(prefix = batch_prediction_job_tfrecord.output_info.gcs_output_directory.split(bucket.name)[-1][1:]))
blobs

[<Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/tfrecord/prediction-mlops-serving-2025_03_05T09_59_44_728Z/prediction.errors_stats-00000-of-00001, 1741198360307693>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/tfrecord/prediction-mlops-serving-2025_03_05T09_59_44_728Z/prediction.results-00000-of-00002, 1741198359587390>,
 <Blob: statmike-mlops-349915, mlops-serving/understand-io/batch/tfrecord/prediction-mlops-serving-2025_03_05T09_59_44_728Z/prediction.results-00001-of-00002, 1741198359589120>]

In [316]:
tfrecord_predictions = []
for blob in blobs:
    if blob.name.split('/')[-1].startswith('prediction.results-'):
        print('Reading from: ', blob.name)
        blob_content = blob.download_as_string().decode('utf-8')
        for line in blob_content.splitlines():
            tfrecord_predictions.append(json.loads(line))

Reading from:  mlops-serving/understand-io/batch/tfrecord/prediction-mlops-serving-2025_03_05T09_59_44_728Z/prediction.results-00000-of-00002
Reading from:  mlops-serving/understand-io/batch/tfrecord/prediction-mlops-serving-2025_03_05T09_59_44_728Z/prediction.results-00001-of-00002


In [317]:
tfrecord_predictions[0]

{'prediction': {'b64': 'CkoKSAoFdmFsdWUSPwo9Cjt7J2I2NCc6ICdjMjl0WlhSb2FXNW5JSFJ2SUdWdVkyOWtaU0JoY3lCaWFXNWhjbmtnYUdWeVpRPT0nfQ=='}}

#### Decode Results

In [318]:
for pred in tfrecord_predictions:
    pred['prediction']['decoded'] = base64.b64decode(pred['prediction']['b64']).decode('utf-8')

In [319]:
tfrecord_predictions[0]

{'prediction': {'b64': 'CkoKSAoFdmFsdWUSPwo9Cjt7J2I2NCc6ICdjMjl0WlhSb2FXNW5JSFJ2SUdWdVkyOWtaU0JoY3lCaWFXNWhjbmtnYUdWeVpRPT0nfQ==',
  'decoded': "\nJ\nH\n\x05value\x12?\n=\n;{'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}"}}

In [320]:
tfrecord_predictions

[{'prediction': {'b64': 'CkoKSAoFdmFsdWUSPwo9Cjt7J2I2NCc6ICdjMjl0WlhSb2FXNW5JSFJ2SUdWdVkyOWtaU0JoY3lCaWFXNWhjbmtnYUdWeVpRPT0nfQ==',
   'decoded': "\nJ\nH\n\x05value\x12?\n=\n;{'b64': 'c29tZXRoaW5nIHRvIGVuY29kZSBhcyBiaW5hcnkgaGVyZQ=='}"}},
 {'prediction': {'b64': 'ChAKDgoFdmFsdWUSBQoDCgEz',
   'decoded': '\n\x10\n\x0e\n\x05value\x12\x05\n\x03\n\x013'}},
 {'prediction': {'b64': 'ChAKDgoFdmFsdWUSBQoDCgE0',
   'decoded': '\n\x10\n\x0e\n\x05value\x12\x05\n\x03\n\x014'}},
 {'prediction': {'b64': 'ChkKFwoFdmFsdWUSDgoMCgoxLjI0NTc4MDAx',
   'decoded': '\n\x19\n\x17\n\x05value\x12\x0e\n\x0c\n\n1.24578001'}},
 {'prediction': {'b64': 'ChAKDgoFdmFsdWUSBQoDCgFh',
   'decoded': '\n\x10\n\x0e\n\x05value\x12\x05\n\x03\n\x01a'}},
 {'prediction': {'b64': 'ChAKDgoFdmFsdWUSBQoDCgFB',
   'decoded': '\n\x10\n\x0e\n\x05value\x12\x05\n\x03\n\x01A'}},
 {'prediction': {'b64': 'Ch8KHQoFdmFsdWUSFAoSChB7J2tleSc6ICd2YWx1ZSd9',
   'decoded': "\n\x1f\n\x1d\n\x05value\x12\x14\n\x12\n\x10{'key': 'value'}"}},
 {'predicti
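
The `decoded` strings above still contain control characters such as `\n\x10` because the bytes look like a serialized `tf.train.Example` protobuf rather than plain text. This is an inference from the `value` key and the length-prefixed layout, not something stated by the job output. With TensorFlow installed, `tf.train.Example.FromString(...)` would be the usual way to parse such bytes. The sketch below is a dependency-free illustration that only handles length-delimited fields and `bytes_list` features; all helper names are hypothetical.

```python
import base64


def read_varint(buf, pos):
    """Read a protobuf varint starting at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def length_delimited_fields(buf):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


def decode_bytes_example(b64_text):
    """Return {feature_name: [bytes, ...]} assuming a tf.train.Example layout
    that holds only bytes_list features (an assumption of this sketch)."""
    features = {}
    example = base64.b64decode(b64_text)
    for _, features_msg in length_delimited_fields(example):    # Example.features
        for _, entry in length_delimited_fields(features_msg):  # one map entry per feature
            key, values = None, []
            for number, payload in length_delimited_fields(entry):
                if number == 1:                                  # map key: feature name
                    key = payload.decode("utf-8")
                elif number == 2:                                # map value: Feature
                    for kind, value_list in length_delimited_fields(payload):
                        if kind == 1:                            # Feature.bytes_list
                            values = [v for _, v in length_delimited_fields(value_list)]
            features[key] = values
    return features


print(decode_bytes_example("ChAKDgoFdmFsdWUSBQoDCgEz"))  # {'value': [b'3']}
```

Applied to each `pred['prediction']['b64']`, this returns the payload bytes under the `value` key without the surrounding framing bytes.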

## Filtering And Transformation During Batch Jobs

As a batch prediction job reads records for inference, it can optionally filter them (subset the fields/columns) and transform them (change the order of the fields/columns).  Read more in the documentation [here](https://cloud.google.com/vertex-ai/docs/predictions/get-batch-predictions#filter_and_transform_input_data).

This requires specifying an [instanceConfig](https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform_v1.types.BatchPredictionJob.InstanceConfig) for the BatchPredictionJob, like:
```
"instanceConfig": {
    "instance_type": "",
    "excluded_fields": [],
    "included_fields": [],
    "key_field": ""
}
```
**Notes**
- `instance_type` is the format of the instance that the Model accepts; for example, `'array'` builds an array of values from the input record
- `key_field` is a field that is left out of the instance and returned in the output in place of the full instance (repeating the instance is the default behavior)
- `excluded_fields` are fields that are not passed to the instance; they are still included in the output unless a `key_field` is specified. It cannot be combined with `included_fields`
- `included_fields` are the fields to include in the instance; their order also determines the field order or array order of the instance
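
To make the notes above concrete, here is a small local simulation of how these settings could shape one instance. It is a sketch of the behavior described in the notes, not Vertex AI's implementation: `build_instance` and the sample record are hypothetical, and when `included_fields` is empty the sketch simply keeps the record's own field order, which is a simplification.

```python
def build_instance(record, instance_type="object", included_fields=None,
                   excluded_fields=None, key_field=None):
    """Return (instance, key) for one input record (illustrative only)."""
    if included_fields and excluded_fields:
        raise ValueError("included_fields and excluded_fields cannot both be set")
    if included_fields:
        # included_fields selects the fields and fixes their order
        names = list(included_fields)
    else:
        dropped = set(excluded_fields or [])
        names = [name for name in record if name not in dropped]
    if key_field is not None:
        # the key field is never sent to the model
        names = [name for name in names if name != key_field]
    if instance_type == "array":
        instance = [record[name] for name in names]
    else:
        instance = {name: record[name] for name in names}
    key = record[key_field] if key_field is not None else None
    return instance, key


record = {"transaction_id": "t1", "Amount": 10.0, "Time": 5, "Class": 0}

# Array instance with an explicit field order
print(build_instance(record, "array", included_fields=["Time", "Amount"]))
# ([5, 10.0], None)

# Object instance that drops the label and returns the id as the key
print(build_instance(record, excluded_fields=["Class"], key_field="transaction_id"))
# ({'Amount': 10.0, 'Time': 5}, 't1')
```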

To use this, submit the job with the gapic (`aiplatform_v1`) layer of the SDK as follows:

Create a jobs client:
```
from google.cloud import aiplatform

client_options = {"api_endpoint": f"{REGION}-aiplatform.googleapis.com"}
jobs_client = aiplatform.gapic.JobServiceClient(client_options = client_options)
```

Create a batch prediction job object:
```
batch_prediction_job = aiplatform.gapic.BatchPredictionJob(
    display_name = f'{SERIES}_{EXPERIMENT}',
    model = vertex_model.versioned_resource_name,
    input_config = dict(
        instances_format = 'bigquery',
        bigquery_source = dict(input_uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}.{BQ_TABLE}')
    ),
    output_config = dict(
        predictions_format = 'bigquery',
        bigquery_destination = dict(output_uri = f'bq://{BQ_PROJECT}.{BQ_DATASET}')
    ),
    dedicated_resources = dict(
        machine_spec = dict(machine_type = 'n1-standard-2'),
        starting_replica_count = 2,
        max_replica_count = 10
    ),
    instance_config = dict(
        instance_type = 'array',
        included_fields = list(train_x.columns),
        #excluded_fields = ['Class', 'splits', 'transaction_id']
    )
)
```

Submit the job with the jobs client:
```
BatchJob = jobs_client.create_batch_prediction_job(
    parent = f'projects/{PROJECT_ID}/locations/{REGION}',
    batch_prediction_job = batch_prediction_job
)
```

Get the job and check the state:
```
BatchJob = jobs_client.get_batch_prediction_job(
    name = BatchJob.name
)
BatchJob.state, BatchJob.state.value, BatchJob.state.name
```
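
Checking the state once only gives a snapshot of the job. A minimal polling sketch follows, assuming the state names of the `JobState` enum (`JOB_STATE_SUCCEEDED`, `JOB_STATE_FAILED`, and so on). `wait_for_terminal_state` is a hypothetical helper that takes any callable returning the current state name, so it can be tried without a live job.

```python
import time

# State names assumed to be terminal for a batch prediction job
TERMINAL_STATES = {
    "JOB_STATE_SUCCEEDED",
    "JOB_STATE_FAILED",
    "JOB_STATE_CANCELLED",
    "JOB_STATE_EXPIRED",
    "JOB_STATE_PARTIALLY_SUCCEEDED",
}


def wait_for_terminal_state(get_state_name, poll_seconds=60.0,
                            timeout_seconds=3600.0, sleep=time.sleep):
    """Call get_state_name() until it returns a terminal state name."""
    waited = 0.0
    while True:
        state = get_state_name()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still in {state} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds


# Local demonstration with a fake sequence of states instead of a real job
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
print(wait_for_terminal_state(lambda: next(fake_states), poll_seconds=0.0))
# JOB_STATE_SUCCEEDED
```

With the client above, the callable could be `lambda: jobs_client.get_batch_prediction_job(name = BatchJob.name).state.name`.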

Get the job's output info:
```
BatchJob.output_info
```