## Before you begin

### Set up your Google Cloud project

**The following steps are required, regardless of your notebook environment (Vertex AI Workbench, Colab, or local).**

1. **Select or create a Google Cloud project**  
   https://console.cloud.google.com/cloud-resource-manager  
   ↳ New Google Cloud accounts include $300 in free credit.

2. **Make sure that billing is enabled for your project**  
   https://cloud.google.com/billing/docs/how-to/modify-project  

3. **Enable the required Google Cloud APIs**

This tutorial uses a **custom prediction container**, **Artifact Registry**, **Cloud Build**, and **Vertex AI**, so you must enable all of the following APIs:

| API | Purpose |
|-----|---------|
| `aiplatform.googleapis.com` | Vertex AI (model registry, endpoints, predictions) |
| `artifactregistry.googleapis.com` | Stores your custom Docker images |
| `cloudbuild.googleapis.com` | Builds the container image from your Dockerfile |
| `storage.googleapis.com` | Allows reading/writing to Cloud Storage buckets |
| `compute.googleapis.com` | Required for Endpoint deployment VMs |
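
The five services in the table can be enabled with a single `gcloud services enable` call. As a minimal sketch, the command can be assembled in Python so the list lives in one place. The names `REQUIRED_APIS` and `build_enable_command` are illustrative and not part of the tutorial; the snippet only prints the command rather than running it.

```python
import shlex

# Service names copied from the table above.
REQUIRED_APIS = [
    "aiplatform.googleapis.com",
    "artifactregistry.googleapis.com",
    "cloudbuild.googleapis.com",
    "storage.googleapis.com",
    "compute.googleapis.com",
]


def build_enable_command(apis, project_id=None):
    """Return the argv list for `gcloud services enable`."""
    cmd = ["gcloud", "services", "enable", *apis]
    if project_id:
        # --project is a global gcloud flag; without it the active project is used.
        cmd.append(f"--project={project_id}")
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(shlex.join(build_enable_command(REQUIRED_APIS)))
```

Passing the resulting list to `subprocess.run(..., check=True)` would execute it, assuming `gcloud` is installed and authenticated.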


---

# Deploy IMDB Sentiment Model to Vertex AI Endpoint

Hands-on workshop notebook for deploying a BERT-based IMDB sentiment model as a real-time endpoint on Vertex AI, using a custom FastAPI container.

---

## 1. Environment and prerequisites

**Why this section?**  
Vertex AI calls your model through a container. That container and your notebook need the right Python libraries and SDKs.
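
Before installing anything, it can help to see which of these libraries are already present in the notebook environment. The sketch below uses only the standard library; `REQUIRED_PACKAGES` and `installed_versions` are illustrative names, and the package list mirrors the pip command in section 1.1.

```python
from importlib import metadata

# Distribution names taken from the pip command in section 1.1.
REQUIRED_PACKAGES = [
    "google-cloud-aiplatform",
    "fastapi",
    "uvicorn",
    "transformers",
    "torch",
    "pydantic",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```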

### 1.1 Install Python dependencies

```python
# Core libraries:
# - google-cloud-aiplatform: Vertex AI SDK for Python (usually preinstalled on Workbench)
# - fastapi, uvicorn: web server inside the container
# - transformers, torch: Hugging Face BERT model and backend
# - pydantic: request validation for FastAPI

!pip install -q "google-cloud-aiplatform>=1.49.0" fastapi uvicorn transformers torch pydantic
```

### 1.2 Configure project variables

**Why this section?**  
We centralise configuration so you only edit project and region names in one place.


In [2]:
!gcloud config list

INFORMATION: Project 'vast-collective-478617-j1' has no 'environment' tag set. Use either 'Production', 'Development', 'Test', or 'Staging'. Add an 'environment' tag using `gcloud resource-manager tags bindings create`.
[compute]
region = us-central1
[core]
account = 60487384516-compute@developer.gserviceaccount.com
disable_usage_reporting = True
project = vast-collective-478617-j1
universe_domain = googleapis.com
[dataproc]
region = us-central1

Your active configuration is: [default]


In [3]:
import os
from google.cloud import aiplatform

# ----- REQUIRED: EDIT THESE VALUES -----
PROJECT_ID = "vast-collective-478617-j1"
REGION = "us-central1"  # or your chosen region
ARTIFACT_REPO = "vertex-mlops-repo"  # Artifact Registry repo name (will be created if it does not exist)
IMAGE_NAME = "imdb-sentiment-bert"
MODEL_DISPLAY_NAME = "imdb-bert-sentiment"
ENDPOINT_DISPLAY_NAME = "imdb-bert-sentiment-endpoint"
LOCATION = REGION  # same as REGION
# --------------------------------------


In [4]:
# Set the project for gcloud in this Workbench VM
!gcloud config set project {PROJECT_ID}


INFORMATION: Project 'vast-collective-478617-j1' has no 'environment' tag set. Use either 'Production', 'Development', 'Test', or 'Staging'. Add an 'environment' tag using `gcloud resource-manager tags bindings create`.
Updated property [core/project].


In [5]:
# Enable required APIs (run once per project)
!gcloud services enable \
    aiplatform.googleapis.com \
    artifactregistry.googleapis.com \
    cloudbuild.googleapis.com \
    storage.googleapis.com \
    compute.googleapis.com


Operation "operations/acat.p2-60487384516-93a62d7e-63cd-4303-9801-5ae2b31e724c" finished successfully.


In [None]:
# 1. Create an Artifact Registry repository (for container images)

# If the repo already exists, the create command fails with an "already exists" error; the `|| echo` fallback keeps the cell from failing, so you can ignore that message.

In [6]:
!gcloud artifacts repositories create {ARTIFACT_REPO} \
    --repository-format=docker \
    --location={REGION} \
    --description="Repo for Vertex AI custom containers" || echo "Repo may already exist"


Create request issued for: [vertex-mlops-repo]
Waiting for operation [projects/vast-collective-478617-j1/locations/us-central1
/operations/b60e3de3-a109-4f98-b1ba-415eed959f4e] to complete...done.          
Created repository [vertex-mlops-repo].
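
An alternative to ignoring the "already exists" error is to check for the repository first and create it only when it is missing. The sketch below wraps the `gcloud artifacts repositories describe` and `create` commands; the helper names and the injectable `run` parameter are assumptions made for illustration (the parameter lets the logic be exercised without calling `gcloud`). Note that `describe` can also fail for other reasons, such as missing permissions, in which case the subsequent `create` raises.

```python
import subprocess


def repo_exists(repo, region, project_id, run=subprocess.run):
    """Return True if `gcloud artifacts repositories describe` succeeds."""
    result = run(
        [
            "gcloud", "artifacts", "repositories", "describe", repo,
            f"--location={region}",
            f"--project={project_id}",
        ],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def ensure_repo(repo, region, project_id, run=subprocess.run):
    """Create the Docker repository only if it is missing.

    Returns "exists" or "created"; with the default runner, a failed
    create raises subprocess.CalledProcessError.
    """
    if repo_exists(repo, region, project_id, run=run):
        return "exists"
    run(
        [
            "gcloud", "artifacts", "repositories", "create", repo,
            "--repository-format=docker",
            f"--location={region}",
            f"--project={project_id}",
        ],
        check=True,
    )
    return "created"
```

In the notebook this could be called as `ensure_repo(ARTIFACT_REPO, REGION, PROJECT_ID)`.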


In [7]:
IMAGE_URI = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{ARTIFACT_REPO}/{IMAGE_NAME}:v1"
IMAGE_URI


'us-central1-docker.pkg.dev/vast-collective-478617-j1/vertex-mlops-repo/imdb-sentiment-bert:v1'
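
The image URI follows the Artifact Registry pattern `REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG`. As a small sketch, the structure can be made explicit with a named tuple that both builds and parses such URIs; `ImageRef` and `parse_image_uri` are illustrative names, not part of any Google library.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    region: str
    project_id: str
    repo: str
    image: str
    tag: str

    @property
    def uri(self) -> str:
        """Artifact Registry Docker URI: REGION-docker.pkg.dev/PROJECT/REPO/IMAGE:TAG."""
        return (
            f"{self.region}-docker.pkg.dev/"
            f"{self.project_id}/{self.repo}/{self.image}:{self.tag}"
        )


def parse_image_uri(uri: str) -> ImageRef:
    """Split a URI of the form above back into its parts."""
    host, project_id, repo, image_and_tag = uri.split("/", 3)
    suffix = "-docker.pkg.dev"
    if not host.endswith(suffix):
        raise ValueError(f"Not an Artifact Registry Docker host: {host!r}")
    image, sep, tag = image_and_tag.rpartition(":")
    if not sep:
        # No explicit tag; Docker treats this as "latest".
        image, tag = image_and_tag, "latest"
    return ImageRef(host[: -len(suffix)], project_id, repo, image, tag)


ref = ImageRef(
    "us-central1",
    "vast-collective-478617-j1",
    "vertex-mlops-repo",
    "imdb-sentiment-bert",
    "v1",
)
print(ref.uri)
```

Keeping the tag as a separate field makes it easier to bump `v1` to a new version without editing the rest of the string.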

In [None]:
# 2. Build the serving app (FastAPI + Hugging Face model)

# We will create a small app that:
# - Loads the philipobiorah/bert-imdb-model weights and the bert-base-uncased tokenizer.
# - Exposes /ping for health checks.
# - Exposes /predict to accept text and return sentiment + confidence.

In [10]:
!rm -rf imdb_serving_app
!mkdir -p imdb_serving_app
%cd imdb_serving_app

/home/jupyter/v3_2/imdb_serving_app




In [11]:
%%writefile main.py
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List
import torch
from transformers import BertTokenizer, BertForSequenceClassification

app = FastAPI()

# Load tokenizer and model at startup
MODEL_NAME = "philipobiorah/bert-imdb-model"
BASE_TOKENIZER_NAME = "bert-base-uncased"

tokenizer = BertTokenizer.from_pretrained(BASE_TOKENIZER_NAME)
model = BertForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()  # set to evaluation mode

class Instance(BaseModel):
    text: str

class PredictRequest(BaseModel):
    instances: List[Instance]

@app.get("/ping")
def health_check():
    return {"status": "ok"}

@app.post("/predict")
def predict(request: PredictRequest):
    texts = [inst.text for inst in request.instances]

    # Tokenize batch
    inputs = tokenizer(
        texts,
        return_tensors="pt",
        truncation=True,
        padding=True,
        max_length=512
    )

    with torch.no_grad():
        logits = model(**inputs).logits

    probs = torch.nn.functional.softmax(logits, dim=1)
    predictions = []

    for i, text in enumerate(texts):
        prob = probs[i]
        sentiment_idx = prob.argmax().item()
        confidence = prob[sentiment_idx].item() * 100.0
        sentiment_label = "Positive" if sentiment_idx == 1 else "Negative"

        predictions.append({
            "text": text,
            "sentiment": sentiment_label,
            "confidence": round(confidence, 2)
        })

    # Vertex AI expects {"predictions": [...]}
    return {"predictions": predictions}


Writing main.py
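
The post-processing inside `/predict` (softmax over the logits, argmax for the label, probability scaled to a percentage) can be reproduced without PyTorch to see exactly what the endpoint returns. This is a pure-Python sketch; the logits are made up for illustration, and the label order (index 1 is "Positive") is taken from `main.py`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def to_prediction(text, logits, labels=("Negative", "Positive")):
    """Mirror the per-item post-processing in /predict for one row of logits."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return {
        "text": text,
        "sentiment": labels[idx],
        "confidence": round(probs[idx] * 100.0, 2),
    }


# Made-up logits for illustration; real values come from the BERT model.
print(to_prediction("Great film", [-2.0, 3.0]))
```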


In [12]:
%%writefile requirements.txt
fastapi
uvicorn[standard]
torch
transformers
pydantic


Writing requirements.txt


In [13]:
%%writefile Dockerfile
FROM python:3.10-slim

# Prevents Python from writing pyc files and buffering stdout/stderr
ENV PYTHONDONTWRITEBYTECODE=1
ENV PYTHONUNBUFFERED=1

# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    && rm -rf /var/lib/apt/lists/*

# Workdir
WORKDIR /app

# Copy requirements and install
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy app code
COPY main.py .

# Expose port
EXPOSE 8080

# Start FastAPI app
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8080"]


Writing Dockerfile
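
Before paying for a cloud build, it can be worth exercising the two routes locally, for example after starting the image with `docker run -p 8080:8080 IMAGE`. The sketch below is one way to do that with the standard library only. `smoke_test` is the client; `StubHandler` and `run_against_stub` are hypothetical stand-ins that mimic the container's routes with canned answers so the client can be tried without Docker or the model.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Bypass any configured HTTP proxy; the target is a local server.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def smoke_test(base_url, texts):
    """Call /ping, then /predict, and return the list under "predictions"."""
    with _OPENER.open(f"{base_url}/ping", timeout=10) as resp:
        if resp.status != 200:
            raise RuntimeError(f"Health check failed: HTTP {resp.status}")

    body = json.dumps({"instances": [{"text": t} for t in texts]}).encode()
    req = urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with _OPENER.open(req, timeout=60) as resp:
        return json.load(resp)["predictions"]


class StubHandler(BaseHTTPRequestHandler):
    """Stand-in for the real container: same routes, canned answers."""

    def _send_json(self, payload):
        data = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        self._send_json({"status": "ok"})

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        instances = json.loads(self.rfile.read(length))["instances"]
        self._send_json(
            {"predictions": [{"text": i["text"], "sentiment": "stub"} for i in instances]}
        )

    def log_message(self, *args):
        pass  # keep notebook output quiet


def run_against_stub(texts):
    """Start the stub on a free port, run the smoke test, then shut down."""
    server = HTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return smoke_test(f"http://127.0.0.1:{server.server_port}", texts)
    finally:
        server.shutdown()
        server.server_close()
```

Against a locally running container, the call would be `smoke_test("http://localhost:8080", ["Some review text"])`.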


In [None]:
# 3. Build and push the container image using Cloud Build

# We build the image in the current directory (which contains Dockerfile, main.py, requirements.txt) and push it to Artifact Registry.

In [14]:
!gcloud builds submit --tag {IMAGE_URI} .


Creating temporary archive of 3 file(s) totalling 2.1 KiB before compression.
Uploading tarball of [.] to [gs://vast-collective-478617-j1_cloudbuild/source/1764238217.206755-305469b3485b4337b2e6485866cc65ce.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/vast-collective-478617-j1/locations/global/builds/627a9a13-a235-45a8-a346-4ca77253d26e].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds/627a9a13-a235-45a8-a346-4ca77253d26e?project=60487384516 ].
Waiting for build to complete. Polling interval: 1 second(s).
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "627a9a13-a235-45a8-a346-4ca77253d26e"

FETCHSOURCE
Fetching storage object: gs://vast-collective-478617-j1_cloudbuild/source/1764238217.206755-305469b3485b4337b2e6485866cc65ce.tgz#1764238217459660
Copying gs://vast-collective-478617-j1_cloudbuild/source/1764238217.206755-305469b3485b4337b2e6485866cc65ce.tgz#1764238217459660...
/ [1 files][  1.3 Ki

In [None]:
# 4. Upload the model to Vertex AI

# Now we register the container image as a Vertex AI Model.

In [15]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    serving_container_image_uri=IMAGE_URI,
    serving_container_predict_route="/predict",
    serving_container_health_route="/ping",
)

model.resource_name


Creating Model
Create Model backing LRO: projects/60487384516/locations/us-central1/models/3642786476426526720/operations/2142734084185522176
Model created. Resource name: projects/60487384516/locations/us-central1/models/3642786476426526720@1
To use this Model in another session:
model = aiplatform.Model('projects/60487384516/locations/us-central1/models/3642786476426526720@1')


'projects/60487384516/locations/us-central1/models/3642786476426526720'
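
The resource names in the log use the project number (`60487384516`) rather than the project ID, and the `@1` suffix is the model version. As an illustrative sketch (the regular expression and `parse_resource_name` are assumptions covering only the model and endpoint names seen in this notebook), the fields can be pulled apart like this:

```python
import re

_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/(?P<collection>models|endpoints)/(?P<resource_id>[^/@]+)"
    r"(?:@(?P<version>[^/]+))?$"
)


def parse_resource_name(name):
    """Split a Vertex AI model or endpoint resource name into its fields."""
    match = _RESOURCE_RE.match(name)
    if match is None:
        raise ValueError(f"Unrecognised resource name: {name!r}")
    return match.groupdict()


parsed = parse_resource_name(
    "projects/60487384516/locations/us-central1/models/3642786476426526720@1"
)
print(parsed["resource_id"], parsed["version"])
```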

In [None]:
# 5. Create an Endpoint and deploy the model

# You can either create a new endpoint or reuse an existing one. Here we create a fresh endpoint and deploy this model to it.

In [16]:
endpoint = aiplatform.Endpoint.create(display_name=ENDPOINT_DISPLAY_NAME)
endpoint.resource_name


Creating Endpoint
Create Endpoint backing LRO: projects/60487384516/locations/us-central1/endpoints/8963103340909035520/operations/5757435735103766528
Endpoint created. Resource name: projects/60487384516/locations/us-central1/endpoints/8963103340909035520
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/60487384516/locations/us-central1/endpoints/8963103340909035520')


'projects/60487384516/locations/us-central1/endpoints/8963103340909035520'
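
Re-running the cell above creates a second endpoint with the same display name. To reuse an existing endpoint instead, a get-or-create helper along the following lines could be used. This is a sketch: the helper name and its injected callables are hypothetical, and the commented wiring assumes `aiplatform.Endpoint.list` with a `display_name` filter.

```python
def get_or_create_endpoint(display_name, list_endpoints, create_endpoint):
    """Reuse the first endpoint with this display name, or create one.

    `list_endpoints(display_name)` returns a list of matching endpoints and
    `create_endpoint(display_name)` returns a new one; both are injected so the
    logic can be tried without touching a real project.
    Returns (endpoint, created_flag).
    """
    matches = list_endpoints(display_name)
    if matches:
        return matches[0], False
    return create_endpoint(display_name), True


# Assumed wiring to the Vertex AI SDK (not executed here):
#
#   endpoint, created = get_or_create_endpoint(
#       ENDPOINT_DISPLAY_NAME,
#       lambda name: aiplatform.Endpoint.list(filter=f'display_name="{name}"'),
#       lambda name: aiplatform.Endpoint.create(display_name=name),
#   )

# Dry run with in-memory stand-ins.
existing = {"imdb-bert-sentiment-endpoint": ["endpoint-A"]}
endpoint, created = get_or_create_endpoint(
    "imdb-bert-sentiment-endpoint",
    lambda name: existing.get(name, []),
    lambda name: f"new:{name}",
)
print(endpoint, created)
```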

In [17]:
# With sync=True, deploy() blocks until the deployment operation completes.
model.deploy(
    endpoint=endpoint,
    machine_type="n1-standard-2",  # or e2-standard-2, e2-standard-4, etc.
    min_replica_count=1,
    max_replica_count=1,
    traffic_percentage=100,
    sync=True,
)

endpoint.resource_name


Deploying model to Endpoint : projects/60487384516/locations/us-central1/endpoints/8963103340909035520
Deploy Endpoint model backing LRO: projects/60487384516/locations/us-central1/endpoints/8963103340909035520/operations/1479860514031927296
Endpoint model deployed. Resource name: projects/60487384516/locations/us-central1/endpoints/8963103340909035520


'projects/60487384516/locations/us-central1/endpoints/8963103340909035520'

In [None]:
# 6. Test online predictions from the notebook

# Now call the deployed endpoint using the Vertex AI Python SDK.

In [18]:
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

endpoint = aiplatform.Endpoint(endpoint_name=endpoint.resource_name)

test_instances = [
    {"text": "This movie was absolutely fantastic!"},
    {"text": "I really disliked this movie, it was terrible."},
]

response = endpoint.predict(instances=test_instances)
response


Prediction(predictions=[{'confidence': 99.76, 'sentiment': 'Positive', 'text': 'This movie was absolutely fantastic!'}, {'confidence': 99.75, 'sentiment': 'Negative', 'text': 'I really disliked this movie, it was terrible.'}], deployed_model_id='770519057047748608', metadata=None, model_version_id='1', model_resource_name='projects/60487384516/locations/us-central1/models/3642786476426526720', explanations=None)

In [19]:
print(response.predictions)


[{'confidence': 99.76, 'sentiment': 'Positive', 'text': 'This movie was absolutely fantastic!'}, {'confidence': 99.75, 'sentiment': 'Negative', 'text': 'I really disliked this movie, it was terrible.'}]
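
Under the hood, `endpoint.predict` sends an HTTPS POST to the endpoint's `:predict` method. For clients that cannot use the Python SDK, the request can be assembled by hand. The sketch below only builds the URL and JSON body; `build_predict_request` is an illustrative name, and a real call would also need an OAuth bearer token in the `Authorization` header (for example from `gcloud auth print-access-token`), which is not shown.

```python
import json


def build_predict_request(project, location, endpoint_id, instances):
    """Return (url, body) for the Vertex AI REST `:predict` method."""
    url = (
        f"https://{location}-aiplatform.googleapis.com/v1/"
        f"projects/{project}/locations/{location}/endpoints/{endpoint_id}:predict"
    )
    body = json.dumps({"instances": instances})
    return url, body


# IDs copied from the endpoint resource name printed earlier in this notebook.
url, body = build_predict_request(
    "60487384516",
    "us-central1",
    "8963103340909035520",
    [{"text": "This movie was absolutely fantastic!"}],
)
print(url)
print(body)
```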


In [20]:
for pred in response.predictions:
    print(
        f"Text: {pred['text']}\n"
        f"Sentiment: {pred['sentiment']}, "
        f"Confidence: {pred['confidence']}%\n"
    )


Text: This movie was absolutely fantastic!
Sentiment: Positive, Confidence: 99.76%

Text: I really disliked this movie, it was terrible.
Sentiment: Negative, Confidence: 99.75%
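
Because each prediction carries a confidence percentage, a caller can route uncertain results to manual review. The sketch below partitions predictions around a threshold; the 90.0 cut-off and the third sample row are arbitrary, hypothetical values chosen for illustration, not outputs of the deployed model.

```python
def split_by_confidence(predictions, threshold=90.0):
    """Partition predictions into (confident, uncertain) by confidence percentage."""
    confident = [p for p in predictions if p["confidence"] >= threshold]
    uncertain = [p for p in predictions if p["confidence"] < threshold]
    return confident, uncertain


# The two predictions returned above, plus one hypothetical low-confidence row.
sample = [
    {"text": "This movie was absolutely fantastic!",
     "sentiment": "Positive", "confidence": 99.76},
    {"text": "I really disliked this movie, it was terrible.",
     "sentiment": "Negative", "confidence": 99.75},
    {"text": "It was fine, I guess.",
     "sentiment": "Positive", "confidence": 61.2},  # hypothetical
]
confident, uncertain = split_by_confidence(sample)
print(len(confident), len(uncertain))
```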



In [None]:
# 7. (Optional) Clean up resources

# When you are done, uncomment and run the cells below to delete the endpoint and model so they stop incurring charges.

In [None]:
# # Undeploy and delete endpoint
# endpoint.undeploy_all()
# endpoint.delete()

In [None]:
# # Delete model
# model.delete()
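
The order of the clean-up calls matters: the endpoint has to be emptied before it is deleted, and a model that is still deployed cannot be deleted. A small helper can keep the three calls together in the right sequence. This is a sketch; `cleanup` and `_Recorder` are illustrative names, and the dry run uses stand-in objects that only record method calls rather than contacting Vertex AI.

```python
def cleanup(endpoint, model):
    """Undeploy everything from the endpoint, then delete endpoint and model."""
    endpoint.undeploy_all()  # remove all deployed models first
    endpoint.delete()        # the now-empty endpoint can be deleted
    model.delete()           # the model is no longer deployed anywhere


class _Recorder:
    """Stand-in that records method calls instead of calling Vertex AI."""

    def __init__(self, name, log):
        self._name = name
        self._log = log

    def __getattr__(self, method):
        # Only reached for attributes not set in __init__, i.e. the method names.
        return lambda: self._log.append(f"{self._name}.{method}")


log = []
cleanup(_Recorder("endpoint", log), _Recorder("model", log))
print(log)
```

With the real objects from this notebook, the call would be `cleanup(endpoint, model)`.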