## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment.**

1. [Select or create a Google Cloud project](https://console.cloud.google.com/cloud-resource-manager). When you first create an account, you get a $300 free credit towards your compute/storage costs.

2. [Make sure that billing is enabled for your project](https://cloud.google.com/billing/docs/how-to/modify-project).

3. **Enable the required Google Cloud APIs**, starting with the [Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

This tutorial uses a **custom prediction container**, **Artifact Registry**, **Cloud Build**, and **Vertex AI**, so you must enable all of the following APIs:

| API | Purpose |
|-----|---------|
| `aiplatform.googleapis.com` | Vertex AI (model registry, endpoints, predictions) |
| `artifactregistry.googleapis.com` | Stores your custom Docker images |
| `cloudbuild.googleapis.com` | Builds the container image from your Dockerfile |
| `storage.googleapis.com` | Allows reading/writing to Cloud Storage buckets |
| `compute.googleapis.com` | Required for Endpoint deployment VMs |


4. If you are running this notebook locally, install the [Cloud SDK](https://cloud.google.com/sdk). On macOS, see [How to Install Cloud SDK on macOS](https://medium.com/@philipsatimor2/how-to-install-and-configure-google-cloud-sdk-on-macos-using-homebrew-cc41f36dc592).

# Practical MLOps with Vertex AI: Deploying ML Models

## Overview

This example demonstrates how to build, package, and deploy a custom machine learning model using **Vertex AI**. Unlike AutoML or prebuilt containers, this workflow shows how to take a **PyTorch model you trained yourself**, wrap it in a **FastAPI prediction server**, containerize it with **Docker**, and deploy it as a **fully managed, scalable endpoint** on Google Cloud.

You will walk through the complete MLOps lifecycle:

- Training a DistilBERT text-classification model on a small subset of AG News  
- Saving the model artifacts needed for inference (`model.pt`)  
- Creating custom inference code (`inference.py`)  
- Building a prediction API with FastAPI (`server.py`)  
- Creating a lightweight Docker image  
- Using **Cloud Build** to build and push the image  
- Registering and deploying the model with **Vertex AI**  
- Sending live prediction requests to the deployed endpoint  

This example is ideal for:

- ML engineers who want full control over their serving stack  
- Data scientists who need more flexibility than AutoML provides  
- Developers learning modern MLOps practices on Google Cloud  
- Anyone deploying **PyTorch models** using custom logic  

Before you begin, you should have:

- Basic familiarity with Python and PyTorch  
- A working understanding of virtual environments or notebooks  
- A Google Cloud project with required APIs enabled  
- Very basic knowledge of Docker (enough to understand a Dockerfile)  

Learn more about [Vertex AI custom model deployment](https://cloud.google.com/vertex-ai/docs/predictions/custom-models).  


---

#### 2. Project Setup and Environment


In [12]:
!pip3 install --upgrade --quiet google-cloud-aiplatform

In [13]:
!gcloud config list

INFORMATION: Project 'vast-collective-478617-j1' has no 'environment' tag set. Use either 'Production', 'Development', 'Test', or 'Staging'. Add an 'environment' tag using `gcloud resource-manager tags bindings create`.
[compute]
region = us-central1
[core]
account = 60487384516-compute@developer.gserviceaccount.com
disable_usage_reporting = True
project = vast-collective-478617-j1
universe_domain = googleapis.com
[dataproc]
region = us-central1

Your active configuration is: [default]


In [None]:
#Environment setup and installs

In [14]:
PROJECT_ID = "vast-collective-478617-j1"       # Your GCP project ID
REGION = "us-central1"                         # Vertex AI region (must match resources)
BUCKET_NAME = "vertex-mlops-philip-devfest"    # Existing GCS bucket
MODEL_DIR = "bert_agnews_model"                # Local folder to store model artifacts
ENDPOINT_DISPLAY_NAME = "bert-agnews-endpoint"

print(PROJECT_ID, REGION, BUCKET_NAME)


vast-collective-478617-j1 us-central1 vertex-mlops-philip-devfest


In [15]:
IMAGE_REPO = "vertex-mlops"              # Artifact Registry repo name (will be created if needed)
IMAGE_NAME = "bert-agnews-workshop"      # Image name
IMAGE_TAG = "v1"                         # Image tag

In [16]:
IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{IMAGE_REPO}/{IMAGE_NAME}:{IMAGE_TAG}"

In [17]:
print("Project:", PROJECT_ID)
print("Region:", REGION)
print("Bucket:", BUCKET_NAME)
print("Image URI:", IMAGE_URI)

Project: vast-collective-478617-j1
Region: us-central1
Bucket: vertex-mlops-philip-devfest
Image URI: us-central1-docker.pkg.dev/vast-collective-478617-j1/vertex-mlops/bert-agnews-workshop:v1
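
The image URI follows the Artifact Registry pattern `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`. As a rough sketch, a small helper can assemble it and catch empty parts early; `build_image_uri` is a hypothetical name, not part of any Google SDK.

```python
def build_image_uri(region, project_id, repo, image, tag):
    """Assemble an Artifact Registry Docker image URI from its parts."""
    parts = {"region": region, "project_id": project_id,
             "repo": repo, "image": image, "tag": tag}
    for name, value in parts.items():
        if not value:
            raise ValueError(f"{name} must not be empty")
    return f"{region}-docker.pkg.dev/{project_id}/{repo}/{image}:{tag}"

uri = build_image_uri("us-central1", "vast-collective-478617-j1",
                      "vertex-mlops", "bert-agnews-workshop", "v1")
print(uri)
```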


In [18]:
!gcloud config set project {PROJECT_ID}   # Set the active project for gcloud and SDK tools


INFORMATION: Project 'vast-collective-478617-j1' has no 'environment' tag set. Use either 'Production', 'Development', 'Test', or 'Staging'. Add an 'environment' tag using `gcloud resource-manager tags bindings create`.
Updated property [core/project].


### 2.3. Install Python dependencies (for training and local testing)

We install training and serving libraries into our notebook environment.

- `transformers`, `datasets`, `torch` for model + data  
- `scikit-learn` for evaluation  
- `google-cloud-aiplatform` to interact with Vertex AI  
- `fastapi`, `uvicorn` for our custom prediction container


In [19]:
!pip install -q transformers datasets torch scikit-learn pandas numpy google-cloud-aiplatform fastapi uvicorn


In [12]:
# Restart the kernel if needed after the install.

#### 2.4. Initialize the Vertex AI Python SDK

This lets us use the high-level Vertex AI client in code instead of clicking in the UI.


In [20]:
from google.cloud import aiplatform

aiplatform.init(
    project=PROJECT_ID,
    location=REGION,
    staging_bucket=f"gs://{BUCKET_NAME}"
)



#### 3. Load and Sample the AG News Dataset

We use the Hugging Face `datasets` library to load AG News and then take a **tiny subset** to keep training fast.

- 200 examples for training  
- 50 examples for validation  
- 100 examples for testing  




We use the [AG News](https://huggingface.co/datasets/ag_news) dataset, a benchmark dataset for text classification with four categories:

| Label | Category |
| ----- | -------- |
| 0     | World    |
| 1     | Sports   |
| 2     | Business |
| 3     | Sci/Tech |

The dataset is tokenized with the DistilBERT tokenizer (loaded through `AutoTokenizer`) and then mapped into a format suitable for PyTorch.

In [17]:
from datasets import load_dataset

dataset = load_dataset("ag_news")
train_full = dataset["train"]
test_full = dataset["test"]


In [20]:
print(len(train_full), len(test_full))

120000 7600


In [21]:
N_TRAIN = 200
N_VAL = 50
N_TEST = 100

small_train = train_full.shuffle(seed=42).select(range(N_TRAIN + N_VAL))
small_test = test_full.shuffle(seed=42).select(range(N_TEST))

train_ds = small_train.select(range(N_TRAIN))
val_ds = small_train.select(range(N_TRAIN, N_TRAIN + N_VAL))

len(train_ds), len(val_ds), len(small_test)


(200, 50, 100)
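
The shuffle-then-slice pattern above guarantees that the training and validation subsets never overlap, because both are cut from one shuffled sample. The plain-Python sketch below mirrors that logic on integer indices; it uses `random.Random` rather than the `datasets` shuffling, so the selected indices will differ from the notebook's.

```python
import random

N_TRAIN, N_VAL = 200, 50
TOTAL = 120_000  # size of the AG News training split

# Shuffle all indices once with a fixed seed, then keep the first N_TRAIN + N_VAL
rng = random.Random(42)
indices = list(range(TOTAL))
rng.shuffle(indices)
sample = indices[: N_TRAIN + N_VAL]

# Slice the shuffled sample into disjoint train and validation parts
train_idx = sample[:N_TRAIN]
val_idx = sample[N_TRAIN:]

print(len(train_idx), len(val_idx), set(train_idx).isdisjoint(val_idx))
```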

#### 4. Tokenize Text with DistilBERT

We convert raw text into token IDs required by the DistilBERT model.

Steps:

1. Load the DistilBERT tokenizer.  
2. Define a function to tokenize and pad/truncate.  
3. Apply it to train/val/test splits.  
4. Configure them to output PyTorch tensors.
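
Step 2 relies on `padding="max_length"` and `truncation=True`, which force every example to exactly `MAX_LEN` token IDs. The toy function below imitates that behavior on plain lists so the effect is visible; `pad_or_truncate`, the sample IDs, and the pad ID of `0` are illustrative assumptions, and the real tokenizer also handles special tokens, which this sketch ignores.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    ids = list(token_ids)[:max_len]            # truncate long inputs
    mask = [1] * len(ids)                      # 1 marks real tokens
    padding = max_len - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 marks padding

ids, mask = pad_or_truncate([101, 7592, 2088, 102], max_len=6)
print(ids)   # [101, 7592, 2088, 102, 0, 0]
print(mask)  # [1, 1, 1, 1, 0, 0]
```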


In [22]:
from transformers import AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"
MAX_LEN = 128

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

def tokenize_batch(batch):
    return tokenizer(
        batch["text"],
        padding="max_length",
        truncation=True,
        max_length=MAX_LEN,
    )

train_enc = train_ds.map(tokenize_batch, batched=True)
val_enc = val_ds.map(tokenize_batch, batched=True)
test_enc = small_test.map(tokenize_batch, batched=True)

cols = ["input_ids", "attention_mask", "label"]
for ds in [train_enc, val_enc, test_enc]:
    ds.set_format(type="torch", columns=cols)



#### 5. Create PyTorch DataLoaders

We wrap the tokenized datasets in PyTorch `DataLoader` objects, which:

- Batch the data  
- Shuffle the training set  
- Keep the validation and test order deterministic  


In [23]:
from torch.utils.data import DataLoader

BATCH_SIZE = 16   # small batch size for CPU-friendly training

train_loader = DataLoader(train_enc, batch_size=BATCH_SIZE, shuffle=True)
val_loader = DataLoader(val_enc, batch_size=BATCH_SIZE)
test_loader = DataLoader(test_enc, batch_size=BATCH_SIZE)
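
With 200 training examples and a batch size of 16, the training loader yields 13 batches per epoch, the last one holding only 8 examples. The sketch below reproduces that arithmetic with plain lists; `iter_batches` is a hypothetical stand-in for what `DataLoader` does when `drop_last` is left at its default of `False`.

```python
import math

def iter_batches(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

examples = list(range(200))          # stand-in for the 200 training rows
batches = list(iter_batches(examples, 16))

print(len(batches), math.ceil(200 / 16))  # 13 13
print(len(batches[-1]))                   # 8
```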

#### 6. Define the DistilBERT Classifier in PyTorch

We build a simple classifier:

- DistilBERT encoder  
- Dropout regularization  
- Linear layer for 4 AG News labels  


In [25]:
import torch
from torch import nn
from transformers import AutoModel

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device


device(type='cpu')

In [26]:
class BertClassifier(nn.Module):
    def __init__(self, base_model_name, num_labels=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_model_name)
        self.dropout = nn.Dropout(0.2)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.last_hidden_state[:, 0]
        x = self.dropout(pooled)
        return self.classifier(x)
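
The `forward` method keeps only the hidden vector at position 0 of each sequence (the `[CLS]` token) and feeds it to the linear layer. The sketch below shows that indexing with nested lists and counts the parameters of the classification head, assuming DistilBERT's hidden size of 768; the tiny 2x3x4 tensor is made up for illustration.

```python
# Toy "last_hidden_state" with shape (batch=2, seq_len=3, hidden=4)
last_hidden_state = [
    [[0.1, 0.2, 0.3, 0.4], [1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]],
    [[0.5, 0.6, 0.7, 0.8], [3.0, 3.0, 3.0, 3.0], [4.0, 4.0, 4.0, 4.0]],
]

# Equivalent of outputs.last_hidden_state[:, 0]: first token of every sequence
pooled = [sequence[0] for sequence in last_hidden_state]
print(pooled)  # [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]

# nn.Linear(hidden_size, num_labels) stores a weight matrix plus a bias vector
hidden_size, num_labels = 768, 4
head_params = hidden_size * num_labels + num_labels
print(head_params)  # 3076
```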


In [27]:
model = BertClassifier(MODEL_NAME).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)



#### 7. Train the Model (Fast)

We train for **three epochs** on our tiny subset.

The goal is not maximum accuracy, but to demonstrate the workflow.




In [28]:
from tqdm.auto import tqdm

def train_one_epoch(model, loader):
    model.train()
    total_correct = 0
    total_examples = 0

    for batch in tqdm(loader):
        optimizer.zero_grad()

        # Move the batch tensors to the training device
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = batch["label"].to(device)

        # Forward pass, then compute the loss between predicted logits and true labels
        outputs = model(input_ids, attention_mask)
        loss = criterion(outputs, labels)
        # Backpropagate gradients
        loss.backward()
        # Update weights
        optimizer.step()

        preds = outputs.argmax(dim=1)
        total_correct += (preds == labels).sum().item()
        total_examples += labels.size(0)

    return total_correct / total_examples
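
`nn.CrossEntropyLoss` applies a softmax to the logits and takes the negative log-probability of the true class, averaged over the batch. The sketch below computes the same quantity for a single example in plain Python; the logits are invented for illustration.

```python
import math

def cross_entropy(logits, label):
    """Negative log-softmax probability of the true class for one example."""
    peak = max(logits)                               # subtract max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[label]

# Uniform logits over 4 classes: the loss equals ln(4)
print(round(cross_entropy([0.0, 0.0, 0.0, 0.0], label=2), 4))  # 1.3863

# A confident, correct prediction gives a loss close to zero
print(cross_entropy([0.0, 0.0, 10.0, 0.0], label=2) < 0.001)   # True
```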


In [30]:
EPOCHS = 3  # Fast for workshop

for epoch in range(EPOCHS):
    acc = train_one_epoch(model, train_loader)
    print(f"Epoch {epoch+1} accuracy: {acc:.3f}")


Epoch 1 accuracy: 0.805
Epoch 2 accuracy: 0.915
Epoch 3 accuracy: 0.955



#### 8. Evaluate the Model

We evaluate on the test set and compute:

- Accuracy  
- Confusion matrix  


In [31]:
from sklearn.metrics import accuracy_score, confusion_matrix

model.eval()
all_labels = []
all_preds = []

with torch.no_grad():
    for batch in test_loader:
        outputs = model(
            batch["input_ids"].to(device),
            batch["attention_mask"].to(device)
        )
        preds = outputs.argmax(dim=1).cpu().numpy()

        all_preds.extend(preds)
        all_labels.extend(batch["label"].numpy())

print("Test Accuracy:", accuracy_score(all_labels, all_preds))
confusion_matrix(all_labels, all_preds)


Test Accuracy: 0.84


array([[16,  2,  2,  2],
       [ 0, 27,  0,  0],
       [ 0,  0, 28,  4],
       [ 1,  0,  5, 13]])
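
The confusion matrix has true labels as rows and predicted labels as columns, so the diagonal holds the correct predictions. As a quick check, the sketch below recomputes the accuracy and the per-class recall from the matrix printed above; the label names come from the AG News table.

```python
matrix = [
    [16, 2, 2, 2],
    [0, 27, 0, 0],
    [0, 0, 28, 4],
    [1, 0, 5, 13],
]
names = ["World", "Sports", "Business", "Sci/Tech"]

correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
print("accuracy:", correct / total)  # accuracy: 0.84

# Recall per class = correct predictions / all examples with that true label
recall = {name: matrix[i][i] / sum(matrix[i]) for i, name in enumerate(names)}
for name, value in recall.items():
    print(name, round(value, 3))
```

Sci/Tech has the lowest recall here, with 5 of its 19 test examples predicted as Business.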


### 9. Save the Trained Model and Prepare for Inference

In this section, we save the fine-tuned model weights and prepare the files that will be
used later inside the custom prediction container.



In [32]:
import os
os.makedirs(MODEL_DIR, exist_ok=True)

MODEL_PATH = f"{MODEL_DIR}/model.pt"
torch.save(model.state_dict(), MODEL_PATH)
MODEL_PATH


'bert_agnews_model/model.pt'

### 10. Create the Inference Script (`inference.py`)

This script defines how the custom container loads the model and runs predictions.

In [34]:
%%writefile {MODEL_DIR}/inference.py
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "distilbert-base-uncased"
MAX_LEN = 128

class BertClassifier(nn.Module):
    def __init__(self, base_model_name, num_labels=4):
        super().__init__()
        self.bert = AutoModel.from_pretrained(base_model_name)
        self.dropout = nn.Dropout(0.2)
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.last_hidden_state[:, 0]
        x = self.dropout(pooled)
        return self.classifier(x)

class Predictor:
    def __init__(self, model_path):
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
        self.model = BertClassifier(MODEL_NAME)
        self.model.load_state_dict(torch.load(model_path, map_location=self.device))
        self.model.to(self.device)
        self.model.eval()

    def predict(self, instances):
        texts = [inst["text"] for inst in instances]
        enc = self.tokenizer(
            texts,
            truncation=True,
            padding=True,
            max_length=MAX_LEN,
            return_tensors="pt",
        )

        input_ids = enc["input_ids"].to(self.device)
        attention_mask = enc["attention_mask"].to(self.device)

        with torch.no_grad():
            outputs = self.model(input_ids, attention_mask)
            probs = torch.softmax(outputs, dim=1)
            preds = probs.argmax(dim=1).cpu().tolist()
            confs = probs.max(dim=1).values.cpu().tolist()

        return [{"label": int(p), "confidence": float(c)} for p, c in zip(preds, confs)]


Writing bert_agnews_model/inference.py
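
`Predictor.predict` returns only the integer label and its softmax confidence. If you want human-readable output, a small post-processing step can map the label to its AG News category. The sketch below does this in plain Python on made-up logits; `LABEL_NAMES` and `postprocess` are hypothetical additions that are not part of `inference.py`.

```python
import math

LABEL_NAMES = ["World", "Sports", "Business", "Sci/Tech"]  # AG News label order

def postprocess(logits):
    """Turn one row of logits into a label, category name, and confidence."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]      # stable softmax numerator
    total = sum(exps)
    probs = [e / total for e in exps]
    label = probs.index(max(probs))                  # argmax
    return {"label": label, "name": LABEL_NAMES[label], "confidence": probs[label]}

result = postprocess([0.2, 3.1, -0.5, 0.4])
print(result["label"], result["name"])  # 1 Sports
```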


In [None]:
#Fast API Server(server.py)

### 11. Create the FastAPI Server (`server.py`)

Vertex AI will not call `Predictor` directly. Instead, it sends HTTP requests
to a web server that runs inside our custom container.

We use **FastAPI** to create a small REST API that:

- Loads the `Predictor` once at startup  
- Exposes a `/predict` endpoint that accepts JSON input  
- Returns predictions as JSON  

This file will live in the same folder as `model.pt` and `inference.py`.


In [3]:
%%writefile bert_agnews_model/server.py
import os
from typing import List

import torch
from fastapi import FastAPI
from pydantic import BaseModel

from inference import Predictor

MODEL_PATH = os.getenv("MODEL_PATH", "model.pt")

app = FastAPI()
predictor = Predictor(MODEL_PATH)

class Instance(BaseModel):
    text: str

class PredictRequest(BaseModel):
    instances: List[Instance]


# --- Health and metadata endpoints --- #

@app.get("/")
def root():
    # Simple root handler so readiness probes get HTTP 200
    return {"status": "ok"}

@app.get("/health")
def health():
    # Explicit health endpoint (nice for debugging)
    return {"status": "healthy"}

@app.get("/v1/endpoints/{endpoint_id}/deployedModels/{deployed_model_id}")
def model_info(endpoint_id: str, deployed_model_id: str):
    # When Vertex calls this, just say "yes, I'm here"
    return {
        "endpoint_id": endpoint_id,
        "deployed_model_id": deployed_model_id,
        "status": "ready",
    }


# --- Prediction endpoint --- #
  

@app.post("/predict")
def predict(request: PredictRequest):
    """
    Main prediction endpoint.

    Accepts a JSON body of the form:
    {
      "instances": [
        {"text": "some text"},
        {"text": "another text"}
      ]
    }

    Returns:
    {
      "predictions": [
        {"label": ..., "confidence": ...},
        ...
      ]
    }
    """
    # Convert Pydantic objects to simple dicts
    instances = [{"text": inst.text} for inst in request.instances]
    preds = predictor.predict(instances)
    return {"predictions": preds}


Overwriting bert_agnews_model/server.py
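
The server expects a JSON body with an `instances` list and answers with a `predictions` list of the same length. The sketch below builds such a body and checks a response's shape using only the standard library; `build_request` and `check_response` are illustrative helpers, and the sample response is hand-written rather than produced by the model.

```python
import json

def build_request(texts):
    """Serialize texts into the {"instances": [{"text": ...}]} body."""
    return json.dumps({"instances": [{"text": t} for t in texts]})

def check_response(body, expected_count):
    """Parse a response body and verify it matches the documented shape."""
    predictions = json.loads(body)["predictions"]
    if len(predictions) != expected_count:
        raise ValueError("expected one prediction per instance")
    for pred in predictions:
        if not set(pred) >= {"label", "confidence"}:
            raise ValueError(f"missing keys in {pred}")
    return predictions

request_body = build_request(["some text", "another text"])
print(request_body)

# Hand-written response with the same shape the /predict route returns
sample_response = (
    '{"predictions": [{"label": 3, "confidence": 0.9}, '
    '{"label": 1, "confidence": 0.8}]}'
)
print(check_response(sample_response, expected_count=2))
```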


### 12. Create `requirements.txt` for the Container

The container needs to know which Python packages to install at build time.
We list them in a `requirements.txt` file located in `bert_agnews_model/`.

This is separate from the notebook environment: it is used when building
the Docker image that will run on Vertex AI.


In [21]:
%%writefile {MODEL_DIR}/requirements.txt
torch
transformers
fastapi
uvicorn


Overwriting bert_agnews_model/requirements.txt


### 13. Create a Dockerfile for the Custom Prediction Container

The Dockerfile defines the environment that will run on Vertex AI.  
It does the following:

1. Starts from a small Python base image.  
2. Sets `/app` as the working directory inside the container.  
3. Copies everything from the local `bert_agnews_model/` folder into `/app/`.  
4. Installs Python dependencies from `requirements.txt`.  
5. Sets the `MODEL_PATH` environment variable that `Predictor` will read.  
6. Launches the FastAPI app with Uvicorn on port `8080` (the default port for Vertex AI custom containers).  


In [22]:
%%writefile Dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY bert_agnews_model/ /app/

RUN pip install --no-cache-dir -r requirements.txt

ENV MODEL_PATH=/app/model.pt

CMD ["uvicorn", "server:app", "--host", "0.0.0.0", "--port", "8080"]


Overwriting Dockerfile


### 14. Build and Push the Image with Cloud Build

We use **Cloud Build** and **Artifact Registry** to build and store the container image.

Steps:

1. Create an Artifact Registry repository (one time per project).  
2. Configure Docker to authenticate with Artifact Registry.  
3. Run `gcloud builds submit` to build the image from the Dockerfile and push it.  
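
Step 1 only needs to run once, and re-running it produces an `ALREADY_EXISTS` error. One way to make it idempotent, sketched below, is to call `gcloud artifacts repositories describe` first and create the repository only when that lookup fails. The `ensure_repository` helper and its injectable `run` parameter are assumptions for illustration; by default it shells out through `subprocess.run`, which requires an authenticated `gcloud` on your PATH.

```python
import subprocess

def ensure_repository(repo, region, run=subprocess.run):
    """Create a Docker repository in Artifact Registry unless it already exists."""
    describe = ["gcloud", "artifacts", "repositories", "describe", repo,
                f"--location={region}"]
    # A zero exit code means the repository was found
    if run(describe, capture_output=True).returncode == 0:
        return "exists"
    create = ["gcloud", "artifacts", "repositories", "create", repo,
              "--repository-format=docker", f"--location={region}"]
    run(create, check=True)
    return "created"
```

Note that this sketch treats any `describe` failure as "not found"; a permissions error would also fall through to the `create` call, which would then fail loudly because of `check=True`.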



In [23]:
!gcloud artifacts repositories create {IMAGE_REPO} \
    --repository-format=DOCKER \
    --location={REGION} \
    --description="Vertex MLOps workshop images" || true


ERROR: (gcloud.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists


In [None]:
# 14.2 Configure Docker to use gcloud credentials for Artifact Registry

In [24]:
!gcloud auth configure-docker {REGION}-docker.pkg.dev -q



{
  "credHelpers": {
    "gcr.io": "gcloud",
    "us.gcr.io": "gcloud",
    "eu.gcr.io": "gcloud",
    "asia.gcr.io": "gcloud",
    "staging-k8s.gcr.io": "gcloud",
    "marketplace.gcr.io": "gcloud",
    "us-central1-docker.pkg.dev": "gcloud"
  }
}
Adding credentials for: us-central1-docker.pkg.dev
gcloud credential helpers already registered correctly.


In [None]:
# Make sure the Cloud Build API is enabled before running the build.

In [33]:
# 14.3 Build the Docker image from the current directory and push it

In [25]:
!gcloud builds submit . --tag {IMAGE_URI}


Creating temporary archive of 9 file(s) totalling 253.3 MiB before compression.
Uploading tarball of [.] to [gs://vast-collective-478617-j1_cloudbuild/source/1764249242.489282-ccbfc58cb54245e09c5644b4fabac7bd.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/vast-collective-478617-j1/locations/global/builds/fba4fb48-a8c3-4996-acac-5e086c8a7344].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds/fba4fb48-a8c3-4996-acac-5e086c8a7344?project=60487384516 ].
Waiting for build to complete. Polling interval: 1 second(s).
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "fba4fb48-a8c3-4996-acac-5e086c8a7344"

FETCHSOURCE
Fetching storage object: gs://vast-collective-478617-j1_cloudbuild/source/1764249242.489282-ccbfc58cb54245e09c5644b4fabac7bd.tgz#1764249276565916
Copying gs://vast-collective-478617-j1_cloudbuild/source/1764249242.489282-ccbfc58cb54245e09c5644b4fabac7bd.tgz#1764249276565916...
\ [1 files][233.5 

### 15. (Optional) Upload Model Artifacts to Google Cloud Storage

Even though the container already contains all model files, it is common MLOps
practice to also store the raw artifacts in a GCS bucket.

This gives you:

- A backup of `model.pt` and your code.  
- A central store for versioned models.  


In [26]:
!gsutil -m cp -r {MODEL_DIR} gs://{BUCKET_NAME}/models/{MODEL_DIR}


Copying file://bert_agnews_model/inference.py [Content-Type=text/x-python]...
Copying file://bert_agnews_model/model.pt [Content-Type=application/vnd.snesdev-page-table]...
Copying file://bert_agnews_model/server.py [Content-Type=text/x-python]...      
Copying file://bert_agnews_model/requirements.txt [Content-Type=text/plain]...  
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of

### 16. Register the Model in Vertex AI

Now we tell Vertex AI about our model by creating a **Model** resource.

This resource points to:

- The artifact directory in GCS (for bookkeeping).  
- The custom container image in Artifact Registry.  
- The HTTP route used for prediction (`/predict`).  



In [27]:
vertex_model = aiplatform.Model.upload(
    display_name="bert-agnews-workshop",
    artifact_uri=f"gs://{BUCKET_NAME}/models/{MODEL_DIR}",
    serving_container_image_uri=IMAGE_URI,
    serving_container_predict_route="/predict",
)

vertex_model.resource_name


Creating Model
Create Model backing LRO: projects/60487384516/locations/us-central1/models/1820517477201739776/operations/7482138470526222336
Model created. Resource name: projects/60487384516/locations/us-central1/models/1820517477201739776@1
To use this Model in another session:
model = aiplatform.Model('projects/60487384516/locations/us-central1/models/1820517477201739776@1')


'projects/60487384516/locations/us-central1/models/1820517477201739776'
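
The resource name encodes the project number, region, and model ID, and the upload log also shows an `@1` version suffix. When you need those parts separately, a small parser such as the sketch below can help; `parse_model_name` is a hypothetical helper, not an SDK function.

```python
import re

MODEL_NAME_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/models/(?P<model_id>[^/@]+)(?:@(?P<version>[^/]+))?$"
)

def parse_model_name(resource_name):
    """Split a Vertex AI model resource name into its components."""
    match = MODEL_NAME_RE.match(resource_name)
    if match is None:
        raise ValueError(f"not a model resource name: {resource_name!r}")
    return match.groupdict()

parts = parse_model_name(
    "projects/60487384516/locations/us-central1/models/1820517477201739776@1"
)
print(parts["model_id"], parts["version"])  # 1820517477201739776 1
```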


#### 17. Create an Endpoint and Deploy the Model

A **Model** is just a registered asset.  
To actually serve traffic, we need a **Vertex AI Endpoint** and then deploy the model to it.

Steps:

1. Create a new endpoint (think of it as a network address for predictions).  
2. Deploy our custom container model to that endpoint.  


In [28]:
endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)
endpoint.resource_name


Creating Endpoint
Create Endpoint backing LRO: projects/60487384516/locations/us-central1/endpoints/1847415929663651840/operations/4498503717393268736
Endpoint created. Resource name: projects/60487384516/locations/us-central1/endpoints/1847415929663651840
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/60487384516/locations/us-central1/endpoints/1847415929663651840')


'projects/60487384516/locations/us-central1/endpoints/1847415929663651840'

In [29]:
deploy_op = vertex_model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=100,
    sync=True,
)
deploy_op


Deploying model to Endpoint : projects/60487384516/locations/us-central1/endpoints/1847415929663651840
Deploy Endpoint model backing LRO: projects/60487384516/locations/us-central1/endpoints/1847415929663651840/operations/2838364304753819648
Endpoint model deployed. Resource name: projects/60487384516/locations/us-central1/endpoints/1847415929663651840


<google.cloud.aiplatform.models.Endpoint object at 0x7f7ded132170> 
resource name: projects/60487384516/locations/us-central1/endpoints/1847415929663651840

#### 18. Send Live Prediction Requests

Once deployment completes successfully, the endpoint is ready to accept prediction requests.

We use the Python SDK to send a list of instances. Each instance is a dictionary with a `text` field,
matching the shape expected by our `Predictor.predict()` method.



In [30]:
sample_instances = [
    {"text": "Apple releases a new AI powered smartphone with advanced features."},
    {"text": "The local team won the national championship in a thrilling final."},
]

endpoint.predict(sample_instances)


Prediction(predictions=[{'label': 3.0, 'confidence': 0.8962261080741882}, {'label': 1.0, 'confidence': 0.7797536253929138}], deployed_model_id='9199005759671631872', metadata=None, model_version_id='1', model_resource_name='projects/60487384516/locations/us-central1/models/1820517477201739776', explanations=None)
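
The SDK returns the labels as floats (`3.0`, `1.0`). A short sketch for turning the `predictions` list into readable categories follows; it operates on a copy of the values printed above, and `LABEL_NAMES` and `decode_predictions` are illustrative names rather than part of the Vertex AI SDK.

```python
LABEL_NAMES = ["World", "Sports", "Business", "Sci/Tech"]  # AG News label order

def decode_predictions(predictions):
    """Attach a category name to each {"label", "confidence"} prediction."""
    decoded = []
    for pred in predictions:
        label = int(pred["label"])               # 3.0 -> 3
        decoded.append((LABEL_NAMES[label], round(pred["confidence"], 3)))
    return decoded

# Values copied from the Prediction object shown above
predictions = [
    {"label": 3.0, "confidence": 0.8962261080741882},
    {"label": 1.0, "confidence": 0.7797536253929138},
]
print(decode_predictions(predictions))
# [('Sci/Tech', 0.896), ('Sports', 0.78)]
```

In the notebook, you would pass `endpoint.predict(sample_instances).predictions` instead of the copied list.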

If everything is wired correctly, you should see a response similar to the following (the exact confidence values will vary):

```json
{
  "predictions": [
    {"label": 3, "confidence": 0.87},
    {"label": 1, "confidence": 0.91}
  ]
}
```