# Office cloud task for Google Cloud ML Setup and Deployment

This notebook sets up the Google Cloud Platform (GCP) services needed to deploy a machine learning model, then trains a churn model, packages it in a custom FastAPI container, and deploys it. It configures:

- **Google Cloud Storage (GCS)** - For storing model artifacts and data
- **Vertex AI** (the `google.cloud.aiplatform` SDK, referred to below as AI Platform) - For model upload and deployment
- **Project configuration** - The GCP project and region

The setup process creates a storage bucket (or reuses an existing one) and initializes the AI Platform SDK with the project, region, and staging bucket.

In [None]:
# Google Cloud Platform libraries
from google.cloud import aiplatform  # Vertex AI for model training and deployment
from google.cloud import storage     # Google Cloud Storage for data and model artifacts

import pandas as pd                  # For data manipulation

In [None]:
# Configure GCP project settings
# This cell sets up the basic configuration for your GCP project
# These settings will be used throughout the notebook for all GCP operations

# Get the current GCP project ID from gcloud CLI
# This ensures we're working with the correct project
project = !gcloud config get-value project
PROJECT_ID = project[0]

# Set the region for AI Platform services
LOCATION = 'us-central1'

# Define the storage bucket name for storing model artifacts
# This bucket will store our trained models and preprocessing objects
BUCKET = 'cloud-office-ml-bucket'
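
The `!gcloud` shell magic in the cell above only works inside IPython or Jupyter. Outside a notebook, the same lookup could be sketched with `subprocess`. The helper names `parse_project_id` and `get_project_id` are hypothetical, and the sketch assumes the `gcloud` CLI is installed and on the `PATH`.

```python
import subprocess


def parse_project_id(stdout: str) -> str:
    """Return the first non-empty line of the gcloud output."""
    for line in stdout.splitlines():
        if line.strip():
            return line.strip()
    raise ValueError("gcloud returned no project ID")


def get_project_id() -> str:
    """Ask the gcloud CLI for the active project (requires gcloud on PATH)."""
    result = subprocess.run(
        ["gcloud", "config", "get-value", "project"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_project_id(result.stdout)
```

Calling `get_project_id()` plays the role of `project[0]` in the cell above; the parsing step is kept separate so it can be checked without the CLI.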

In [None]:
# Initialize the Google Cloud Storage client
# This client is used for all bucket operations in this notebook

gcs = storage.Client(project = PROJECT_ID)  # Google Cloud Storage client

In [None]:
# Create or verify Google Cloud Storage bucket
# This bucket will store model artifacts, training data, and other ML assets

bucket = gcs.lookup_bucket(BUCKET)  # returns None if the bucket does not exist
if bucket is None:
    # Create new bucket if it doesn't exist
    bucket = gcs.create_bucket(BUCKET, project=PROJECT_ID, location=LOCATION)
    print(f'Created bucket: {bucket.name}')
else:
    # Use existing bucket if it already exists
    print(f'Bucket already exists: {bucket.name}')

Bucket already exists: cloud-office-ml-bucket


In [None]:
# Create bucket URI for AI Platform configuration
# This URI format is required by Google Cloud AI Platform services

BUCKET_URI = f"gs://{bucket.name}"
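
A `gs://` URI combines the bucket name and an optional object path. As a minimal sketch of that format, the hypothetical helper `split_gcs_uri` below takes such a URI apart using only the standard library; the object path in the example is assumed for illustration.

```python
from urllib.parse import urlparse


def split_gcs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # The leading slash belongs to the URI syntax, not to the object name
    return parsed.netloc, parsed.path.lstrip("/")


print(split_gcs_uri("gs://cloud-office-ml-bucket/coml-artifact-dir/model.joblib"))
# ('cloud-office-ml-bucket', 'coml-artifact-dir/model.joblib')
```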

In [None]:
# Initialize Google Cloud AI Platform
# This sets up the AI Platform with your project settings and staging bucket
# The staging bucket is where training artifacts and model files will be stored

aiplatform.init(project=PROJECT_ID, location=LOCATION, staging_bucket=BUCKET_URI)

In [None]:
MODEL_ARTIFACT_DIR = "coml-artifact-dir"
REPOSITORY = "coml-repository-name"
IMAGE = "coml-image-name"
MODEL_DISPLAY_NAME = "coml-model-display-name"

# Set the defaults if no names were specified
if MODEL_ARTIFACT_DIR == "[your-artifact-directory]":
    MODEL_ARTIFACT_DIR = "custom-container-prediction-model"

if REPOSITORY == "[your-repository-name]":
    REPOSITORY = "custom-container-prediction"

if IMAGE == "[your-image-name]":
    IMAGE = "sklearn-fastapi-server"

if MODEL_DISPLAY_NAME == "[your-model-display-name]":
    MODEL_DISPLAY_NAME = "sklearn-custom-container"
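
Later cells combine these names with the region and project ID into an Artifact Registry image URI of the form `{LOCATION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}`. A minimal sketch of that composition follows; the function name and the explicit `tag` parameter are assumptions added for illustration (the notebook omits the tag).

```python
def artifact_registry_image_uri(
    location: str,
    project_id: str,
    repository: str,
    image: str,
    tag: str = "latest",
) -> str:
    """Build the Docker image URI used for Artifact Registry."""
    return f"{location}-docker.pkg.dev/{project_id}/{repository}/{image}:{tag}"


print(
    artifact_registry_image_uri(
        "us-central1",
        "office-cloud-project",
        "cloud-office-ml-docker-repo",
        "coml-image-name",
    )
)
# us-central1-docker.pkg.dev/office-cloud-project/cloud-office-ml-docker-repo/coml-image-name:latest
```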

In [None]:
%mkdir app

In [None]:
%%writefile app/preprocess.py

import pandas as pd

class Preprocessor():
    def __init__(self):
        self.numerical = ['tenure', 'monthlycharges', 'totalcharges']
        self.categorical = [
            'gender',
            'seniorcitizen',
            'partner',
            'dependents',
            'phoneservice',
            'multiplelines',
            'internetservice',
            'onlinesecurity',
            'onlinebackup',
            'deviceprotection',
            'techsupport',
            'streamingtv',
            'streamingmovies',
            'contract',
            'paperlessbilling',
            'paymentmethod',
        ]

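    def preprocess(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Preprocess the raw dataframe.
        Args:
            df: Raw dataframe to preprocess.

        Returns:
            Preprocessed dataframe.
        """

        # Work on a copy so the caller's frame is left untouched
        df = df.copy()
        df.columns = [col.lower().replace(' ', '_') for col in df.columns]

        # 'churn' is present in training data but not in prediction requests
        has_target = 'churn' in df.columns
        columns = self.categorical + self.numerical + (['churn'] if has_target else [])
        # .copy() avoids SettingWithCopyWarning on the assignments below
        df = df[columns].copy()
        if has_target:
            df['churn'] = (df['churn'] == 'Yes').astype(int)

        for col in df.columns:
            if df[col].dtype == 'object':
                df[col] = df[col].str.lower().str.replace(' ', '_')

        df['totalcharges'] = pd.to_numeric(df['totalcharges'], errors='coerce')
        df[self.numerical] = df[self.numerical].fillna(0)

        return df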

Writing app/preprocess.py
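
The string normalization inside `preprocess` lowercases each value and replaces spaces with underscores. The sketch below applies the same rule to a few sample values in plain Python. The function name `normalize_label` is hypothetical, and the raw values are assumed to match those in the churn dataset.

```python
def normalize_label(value: str) -> str:
    """Lowercase a value and replace spaces with underscores."""
    return value.lower().replace(" ", "_")


raw_values = [
    "Fiber optic",
    "Bank transfer (automatic)",
    "Month-to-month",
    "No internet service",
]

for raw in raw_values:
    print(f"{raw!r} -> {normalize_label(raw)!r}")
# 'Fiber optic' -> 'fiber_optic'
# 'Bank transfer (automatic)' -> 'bank_transfer_(automatic)'
# 'Month-to-month' -> 'month-to-month'
# 'No internet service' -> 'no_internet_service'
```

Hyphens and parentheses are left alone, which is why values such as `month-to-month` and `bank_transfer_(automatic)` keep them.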


In [None]:
import pickle

import joblib
from app.preprocess import Preprocessor
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

prp = Preprocessor()

# BUCKET was defined in the project configuration cell
df_raw = pd.read_csv(f'gs://{BUCKET}/WA_Fn-UseC_-Telco-Customer-Churn.csv')
df_processed = prp.preprocess(df_raw)

y_train = df_processed['churn']
X_train = df_processed.drop('churn', axis=1)

train_dict = X_train.to_dict(orient='records')

# Bundle the vectorizer with the classifier so that model.joblib can score
# dict records at serving time without a separate DictVectorizer artifact
model = make_pipeline(DictVectorizer(), LogisticRegression(solver='liblinear'))
model.fit(train_dict, y_train)

joblib.dump(model, "model.joblib")
with open("preprocessor.pkl", "wb") as f:
    pickle.dump(prp, f)
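
`DictVectorizer` turns each record (a dict) into a numeric row: string-valued features become one-hot columns named `feature=value`, and numeric features pass through unchanged. The following is a simplified pure-Python illustration of that idea, not scikit-learn's implementation; the function names are hypothetical, and it returns dense lists where `DictVectorizer` returns a sparse matrix by default.

```python
def feature_name(key, value):
    """String values get one column per category; numbers keep their key."""
    return f"{key}={value}" if isinstance(value, str) else key


def fit_feature_names(records):
    """Collect the sorted set of column names seen in the records."""
    return sorted({feature_name(k, v) for record in records for k, v in record.items()})


def transform(records, feature_names):
    """Encode each record as a dense row aligned with feature_names."""
    index = {name: i for i, name in enumerate(feature_names)}
    rows = []
    for record in records:
        row = [0.0] * len(feature_names)
        for key, value in record.items():
            name = feature_name(key, value)
            if name in index:  # categories not seen during fitting are ignored
                row[index[name]] = 1.0 if isinstance(value, str) else float(value)
        rows.append(row)
    return rows


records = [
    {"contract": "month-to-month", "tenure": 1},
    {"contract": "two_year", "tenure": 24},
]
names = fit_feature_names(records)
print(names)                      # ['contract=month-to-month', 'contract=two_year', 'tenure']
print(transform(records, names))  # [[1.0, 0.0, 1.0], [0.0, 1.0, 24.0]]
```

Because the column layout is learned during fitting, the fitted vectorizer has to be available wherever predictions are made.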


Upload model artifacts - model itself and preprocessor

In [12]:
!gsutil cp model.joblib preprocessor.pkl {BUCKET_URI}/{MODEL_ARTIFACT_DIR}/

Copying file://model.joblib [Content-Type=application/octet-stream]...
Copying file://preprocessor.pkl [Content-Type=application/octet-stream]...      
- [2 files][  1.6 KiB/  1.6 KiB]                                                
Operation completed over 2 objects/1.6 KiB.                                      
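
The same upload could also be sketched with the `google-cloud-storage` client instead of `gsutil`. The helper names below are hypothetical. The sketch assumes default credentials are available, and it imports the SDK lazily so the pure path helper can be used without it.

```python
import os


def artifact_blob_names(artifact_dir: str, filenames: list[str]) -> list[str]:
    """Map local files to object names under the artifact directory."""
    prefix = artifact_dir.strip("/")
    return [f"{prefix}/{os.path.basename(name)}" for name in filenames]


def upload_artifacts(bucket_name: str, artifact_dir: str, filenames: list[str]) -> None:
    """Upload local files to gs://<bucket_name>/<artifact_dir>/."""
    # Imported here so artifact_blob_names works without the GCP SDK installed
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for filename, blob_name in zip(filenames, artifact_blob_names(artifact_dir, filenames)):
        bucket.blob(blob_name).upload_from_filename(filename)


print(artifact_blob_names("coml-artifact-dir", ["model.joblib", "preprocessor.pkl"]))
# ['coml-artifact-dir/model.joblib', 'coml-artifact-dir/preprocessor.pkl']
```

With this in place, `upload_artifacts(BUCKET, MODEL_ARTIFACT_DIR, ["model.joblib", "preprocessor.pkl"])` would target the same destination as the `gsutil cp` command above.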


Build a FastAPI server

In [None]:
%%writefile app/main.py

import os
import pickle
from typing import Literal

import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel, Field

from google.cloud import storage


AIP_HEALTH_ROUTE = '/health'
AIP_PREDICT_ROUTE = '/predict'
PROBABILITY_THRESHOLD = 0.5

# Notebook variables are not visible inside this file. Vertex AI sets
# AIP_STORAGE_URI to a copy of the artifact_uri given to Model.upload;
# set it by hand (gs://<bucket>/<artifact-dir>) when running locally.
ARTIFACT_URI = os.environ["AIP_STORAGE_URI"].rstrip("/")

class Customer(BaseModel):
    gender: Literal["male", "female"]
    seniorcitizen: Literal[0, 1]
    partner: Literal["yes", "no"]
    dependents: Literal["yes", "no"]
    phoneservice: Literal["yes", "no"]
    multiplelines: Literal["no", "yes", "no_phone_service"]
    internetservice: Literal["dsl", "fiber_optic", "no"]
    onlinesecurity: Literal["no", "yes", "no_internet_service"]
    onlinebackup: Literal["no", "yes", "no_internet_service"]
    deviceprotection: Literal["no", "yes", "no_internet_service"]
    techsupport: Literal["no", "yes", "no_internet_service"]
    streamingtv: Literal["no", "yes", "no_internet_service"]
    streamingmovies: Literal["no", "yes", "no_internet_service"]
    contract: Literal["month-to-month", "one_year", "two_year"]
    paperlessbilling: Literal["yes", "no"]
    paymentmethod: Literal[
        "electronic_check",
        "mailed_check",
        "bank_transfer_(automatic)",
        "credit_card_(automatic)",
    ]
    tenure: int = Field(..., ge=0)
    monthlycharges: float = Field(..., ge=0.0)
    totalcharges: float = Field(..., ge=0.0)

class PredictRequest(BaseModel):
    customer_data: list[Customer]

class PredictResponse(BaseModel):
    churn_probability: float
    churn: bool

app = FastAPI()
gcs_client = storage.Client()

with open("preprocessor.pkl", 'wb') as preprocessor_f, open("model.joblib", 'wb') as model_f:
    gcs_client.download_blob_to_file(f"{ARTIFACT_URI}/preprocessor.pkl", preprocessor_f)
    gcs_client.download_blob_to_file(f"{ARTIFACT_URI}/model.joblib", model_f)

with open("preprocessor.pkl", "rb") as f:
    _preprocessor = pickle.load(f)

_model = joblib.load("model.joblib")


@app.get(AIP_HEALTH_ROUTE, status_code=200)
def health():
    return {}


@app.post(AIP_PREDICT_ROUTE)
def predict(request: PredictRequest) -> PredictResponse:
    inputs = pd.DataFrame([dict(customer) for customer in request.customer_data])
    preprocessed_inputs = _preprocessor.preprocess(inputs)

    # The classifier was trained on vectorized dict records, so model.joblib
    # must include the fitted DictVectorizer (for example as a pipeline step)
    records = preprocessed_inputs.to_dict(orient='records')

    # predict_proba returns one [P(no churn), P(churn)] row per customer;
    # only the first customer is scored, matching the single-result response
    churn_probability = float(_model.predict_proba(records)[0, 1])

    return PredictResponse(
        churn_probability=churn_probability,
        churn=churn_probability >= PROBABILITY_THRESHOLD
    )


Overwriting app/main.py
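
For a binary classifier, scikit-learn's `predict_proba` returns one row per input with two columns, ordered as `[P(class 0), P(class 1)]`, so the churn probability sits in the second column. The sketch below shows how such rows map to the response fields; it uses plain lists with made-up probabilities rather than a trained model, and the function name `to_predictions` is hypothetical.

```python
PROBABILITY_THRESHOLD = 0.5


def to_predictions(probabilities, threshold=PROBABILITY_THRESHOLD):
    """Convert [P(no churn), P(churn)] rows into response dictionaries."""
    results = []
    for _no_churn_p, churn_p in probabilities:
        results.append(
            {
                "churn_probability": float(churn_p),
                "churn": bool(churn_p >= threshold),
            }
        )
    return results


# Illustrative probabilities, not real model output
print(to_predictions([[0.37, 0.63], [0.9, 0.1]]))
# [{'churn_probability': 0.63, 'churn': True}, {'churn_probability': 0.1, 'churn': False}]
```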


In [33]:
%%writefile test_request.py

import requests

url = 'http://localhost:8086/predict'

customer = {
    'gender': 'female',
    'seniorcitizen': 0,
    'partner': 'yes',
    'dependents': 'no',
    'phoneservice': 'no',
    'multiplelines': 'no_phone_service',
    'internetservice': 'dsl',
    'onlinesecurity': 'no',
    'onlinebackup': 'yes',
    'deviceprotection': 'no',
    'techsupport': 'no',
    'streamingtv': 'no',
    'streamingmovies': 'no',
    'contract': 'month-to-month',
    'paperlessbilling': 'yes',
    'paymentmethod': 'electronic_check',
    'tenure': 1,
    'monthlycharges': 29.85,
    'totalcharges': 29.85
}

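# The /predict handler reads the records from the "customer_data" key
response = requests.post(url, json={'customer_data': [customer]})
response.raise_for_status()

predictions = response.json()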

if predictions['churn']:
    print('customer is likely to churn, send promo')
else:
    print('customer is not likely to churn')

Overwriting test_request.py
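
The `/predict` handler in `app/main.py` reads its input from the `customer_data` key of the JSON body and returns `churn_probability` and `churn`. Below is a minimal sketch of helpers that build a body in that shape and interpret a reply. The helper names are hypothetical, and the customer record is shortened for illustration (a real request needs every field).

```python
import json


def build_payload(*customers: dict) -> dict:
    """Wrap one or more customer records in the expected request body."""
    return {"customer_data": list(customers)}


def describe(prediction: dict) -> str:
    """Turn a prediction response into the message printed by the test script."""
    if prediction["churn"]:
        return "customer is likely to churn, send promo"
    return "customer is not likely to churn"


payload = build_payload({"tenure": 1, "contract": "month-to-month"})
print(json.dumps(payload))
# {"customer_data": [{"tenure": 1, "contract": "month-to-month"}]}
print(describe({"churn_probability": 0.63, "churn": True}))
# customer is likely to churn, send promo
```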


In [76]:
%%writefile Dockerfile

FROM python:3.12-slim

# Install uv
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/

# Set working directory
WORKDIR /app

# Copy dependency files
COPY pyproject.toml uv.lock ./

# Install dependencies
RUN uv sync --frozen

# Install fastapi
RUN uv add fastapi

# Install uvicorn separately if needed
RUN uv add uvicorn[standard]

# Copy application code
COPY app/ ./app/

# Expose port (if needed)
EXPOSE 8086

# Run the FastAPI application
CMD ["uv", "run", "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8086"]

Overwriting Dockerfile
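
The Dockerfile hardcodes port 8086, and `app/main.py` hardcodes the `/health` and `/predict` routes. When a container runs on Vertex AI, the platform also passes its serving settings through the `AIP_HTTP_PORT`, `AIP_HEALTH_ROUTE`, and `AIP_PREDICT_ROUTE` environment variables. As a hedged sketch, a server could read those variables and fall back to the notebook's values for local runs; the function name `serving_config` is hypothetical.

```python
import os


def serving_config(environ=os.environ) -> dict:
    """Read Vertex AI serving settings, falling back to the notebook's values."""
    return {
        "port": int(environ.get("AIP_HTTP_PORT", "8086")),
        "health_route": environ.get("AIP_HEALTH_ROUTE", "/health"),
        "predict_route": environ.get("AIP_PREDICT_ROUTE", "/predict"),
    }


# With no AIP_* variables set, the local defaults apply
print(serving_config({}))
# {'port': 8086, 'health_route': '/health', 'predict_route': '/predict'}
```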


In [78]:
!docker build -t gcr.io/cloud-office-ml-project/churn-prediction-fastapi:latest .

[1A[1B[0G[?25l
[?25h[1A[0G[?25l[+] Building 0.0s (0/1)                                          docker:default
[?25h[1A[0G[?25l[+] Building 0.2s (1/3)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 582B                                       0.0s
[0m => [internal] load metadata for ghcr.io/astral-sh/uv:latest               0.2s
 => [internal] load metadata for docker.io/library/python:3.12-slim        0.2s
[?25h[1A[1A[1A[1A[1A[0G[?25l[+] Building 0.2s (2/3)                                          docker:default
[34m => [internal] load build definition from Dockerfile                       0.0s
[0m[34m => => transferring dockerfile: 582B                                       0.0s
[0m => [internal] load metadata for ghcr.io/astral-sh/uv:latest               0.2s
[34m => [internal] load metadata for docker.io/library/python:3.12-s

In [80]:
REPOSITORY = "cloud-office-ml-docker-repo"
! gcloud artifacts repositories create {REPOSITORY} --repository-format=docker --location={LOCATION} --description="Docker repository"
! gcloud artifacts repositories list

ERROR: (gcloud.artifacts.repositories.create) ALREADY_EXISTS: the repository already exists
Listing items under project office-cloud-project, across all locations.

                                                                            ARTIFACT_REGISTRY
REPOSITORY                   FORMAT  MODE                 DESCRIPTION        LOCATION     LABELS  ENCRYPTION          CREATE_TIME          UPDATE_TIME          SIZE (MB)
cloud-office-ml-docker-repo  DOCKER  STANDARD_REPOSITORY  Docker repository  us-central1          Google-managed key  2025-10-23T11:30:07  2025-10-23T11:30:07  0


In [81]:
!gcloud builds submit --region={LOCATION} --tag={LOCATION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}

Creating temporary archive of 12 file(s) totalling 479.1 KiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/ev/.config/gcloud/logs/2025.10.23/11.52.52.541188.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://office-cloud-project_cloudbuild/source/1761209572.666546-005d921acf8d4527b70d43834dfbfb47.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/office-cloud-project/locations/us-central1/builds/f47623eb-2516-42d7-867a-88f23b2a4e9a].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds;region=us-central1/f47623eb-2516-42d7-867a-88f23b2a4e9a?project=485739110011 ].
Waiting for build to complete. Polling interval: 1 second(s).
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "f47623eb-2516-42d7-867a-88f23b2a4e9a"

FETCHSOURCE
Fetching storage ob

In [82]:
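model = aiplatform.Model.upload(
    display_name=MODEL_DISPLAY_NAME,
    artifact_uri=f"{BUCKET_URI}/{MODEL_ARTIFACT_DIR}",
    serving_container_image_uri=f"{LOCATION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE}",
    # Match the routes and port served by app/main.py and the Dockerfile;
    # otherwise Vertex AI uses its own defaults (port 8080)
    serving_container_predict_route="/predict",
    serving_container_health_route="/health",
    serving_container_ports=[8086],
)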

Creating Model
Create Model backing LRO: projects/485739110011/locations/us-central1/models/4839323408283992064/operations/4892136473639256064
Model created. Resource name: projects/485739110011/locations/us-central1/models/4839323408283992064@1
To use this Model in another session:
model = aiplatform.Model('projects/485739110011/locations/us-central1/models/4839323408283992064@1')
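
The resource name printed above follows the pattern `projects/<project>/locations/<location>/models/<model_id>@<version>`. A small sketch for splitting such a name into its parts, for example to log the model ID and version separately, is shown below; the function name is hypothetical and the pattern is inferred from the output above.

```python
import re

_MODEL_NAME = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/models/(?P<model_id>[^/@]+)"
    r"(?:@(?P<version>[^/]+))?"
)


def parse_model_resource_name(name: str) -> dict:
    """Split a model resource name into project, location, model ID, and version."""
    match = _MODEL_NAME.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected model resource name: {name!r}")
    return match.groupdict()


print(parse_model_resource_name(
    "projects/485739110011/locations/us-central1/models/4839323408283992064@1"
))
# {'project': '485739110011', 'location': 'us-central1', 'model_id': '4839323408283992064', 'version': '1'}
```

The version suffix is optional in this sketch; without it, `version` is `None`.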


In [None]:
endpoint = model.deploy(
    machine_type="n1-standard-4"
)

Creating Endpoint
Create Endpoint backing LRO: projects/485739110011/locations/us-central1/endpoints/434112479413403648/operations/4941359410191532032
Endpoint created. Resource name: projects/485739110011/locations/us-central1/endpoints/434112479413403648
To use this Endpoint in another session:
endpoint = aiplatform.Endpoint('projects/485739110011/locations/us-central1/endpoints/434112479413403648')
Deploying model to Endpoint : projects/485739110011/locations/us-central1/endpoints/434112479413403648
Deploy Endpoint model backing LRO: projects/485739110011/locations/us-central1/endpoints/434112479413403648/operations/8074175900981133312


KeyboardInterrupt: 

In [28]:
# local cleanup
! rm -rf app/
! rm -rf test_request.py
! rm -rf Dockerfile
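
Outside a notebook, the same cleanup could be sketched in Python with `shutil` and `pathlib`. The function name `clean_up` is hypothetical, and the demonstration runs in a temporary directory so that no real files are touched.

```python
import shutil
import tempfile
from pathlib import Path


def clean_up(paths, root="."):
    """Remove the given files or directories under root; return what was removed."""
    removed = []
    for relative in paths:
        target = Path(root) / relative
        if target.is_dir():
            shutil.rmtree(target)
        elif target.exists():
            target.unlink()
        else:
            continue  # nothing to remove, like `rm -rf` on a missing path
        removed.append(relative)
    return removed


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as workdir:
    (Path(workdir) / "app").mkdir()
    (Path(workdir) / "Dockerfile").write_text("FROM python:3.12-slim\n")
    demo_removed = clean_up(["app", "test_request.py", "Dockerfile"], root=workdir)
    print(demo_removed)  # ['app', 'Dockerfile']
```

The training cell also writes `model.joblib` and `preprocessor.pkl` locally; they could be added to the list if they are no longer needed.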
     