<a href="https://colab.research.google.com/github/obinnachike/Finetuning-Foundational-Model-on-Vertex-AI/blob/main/vertex_llm_finetuning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Model Tuning with a Vertex AI Foundation Model

## Objective

This lab shows how to tune a foundation model on new, unseen data using the following Google Cloud products:

- Vertex AI Pipelines
- Vertex AI Evaluation Services
- Vertex AI Model Registry
- Vertex AI Endpoints

## Use Case

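We will use generative AI to generate a suitable TITLE for a text BODY.

The use case is framed around the BBC FULLTEXT DATA (BigQuery public dataset `bigquery-public-data.bbc_news.fulltext`) and `text-bison@002`. The goal is to tune a model called "bbc-news-summary-tuned" and compare its responses with those of the base model. The code cells in this notebook run the same title-from-body task on a different dataset and model:

- Dataset: StackOverflow questions (`bigquery-public-data.stackoverflow.posts_questions`).
- Model: supervised tuning of `gemini-2.0-flash-001`.

Each BODY/TITLE pair is written to a JSONL file with one example per line. Each example uses the Gemini supervised-tuning format: a `contents` list holding a `user` turn (the body) followed by a `model` turn (the title).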
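The sketch below shows how a single body/title pair maps onto one JSONL line in the `contents` format described above.

Two things in it are assumptions rather than part of the original lab:

- The `INSTRUCTION` prefix is hypothetical. It copies the prompt used later at prediction time, so that training inputs and inference inputs look alike.
- The sample body and title are illustrative text, not rows from the dataset.

```python
import json

# Hypothetical instruction prefix (an assumption, not used by the lab's cells);
# it mirrors the prompt sent to the tuned model at prediction time.
INSTRUCTION = "Summarize this text to generate a title: \n "

def to_tuning_record(body: str, title: str, instruction: str = "") -> dict:
    """Build one supervised-tuning example: user turn = body, model turn = title."""
    return {
        "contents": [
            {"role": "user", "parts": [{"text": instruction + body.strip()}]},
            {"role": "model", "parts": [{"text": title.strip()}]},
        ]
    }

# Illustrative pair (made-up text) serialized as a single JSONL line
record = to_tuning_record(
    "Seats on planes keep shrinking as more people fly.",
    "Are smaller plane seats a safety risk?",
    INSTRUCTION,
)
line = json.dumps(record, ensure_ascii=False)
print(line)
```

Passing `instruction=""` reproduces the layout written by the notebook's own JSONL cell, where the user turn is the raw body.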

In [1]:
!pip install google-cloud-bigquery pandas




In [2]:
!pip install -U google-cloud-aiplatform




In [3]:
from google.colab import auth
auth.authenticate_user()
import IPython
from google.cloud import bigquery


In [4]:
project_id = "phonic-hydra-474723-k1"
client = bigquery.Client(project=project_id)


In [6]:
# LOAD DATA

import pandas as pd


# Load the data (limit rows for testing)
query = """
SELECT title, body
FROM `bigquery-public-data.stackoverflow.posts_questions`
WHERE title IS NOT NULL AND body IS NOT NULL
LIMIT 100
"""
df = client.query(query).to_dataframe()
df.head()


|   | title | body |
|---|---|---|
| 0 | Unable to start ypserv in Ubuntu | `<p>I was trying to configure NIS master Server...` |
| 1 | Error Running Stable Diffusion from the comman... | `<p>I installed Stable Diffusion v1.4 by follow...` |
| 2 | Input contains NaN, infinity or a value too la... | `<p>my dataframe does not contain NAN or infite...` |
| 3 | here is my sample code written to attach scree... | `<p>@Test\npublic void extentReports() throws I...` |
| 4 | How can delete compilation in AWS Amplify | `<p>I have a lot of compilations in AWS Amplify...` |
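The `body` column holds HTML (note the `<p>` tags above), so the model would also learn to read markup. Stripping the tags before writing the JSONL file is an optional cleaning step that the original lab does not perform.

Below is a minimal sketch using only the standard library's `html.parser`. The helper name `html_to_text` is hypothetical.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment and drops the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

def html_to_text(fragment: str) -> str:
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    # Join chunks with spaces so adjacent blocks (<p>a</p><p>b</p>) stay separated,
    # then collapse any runs of whitespace into single spaces.
    return " ".join(" ".join(parser.chunks).split())

print(html_to_text("<p>my dataframe does not contain <code>NaN</code> &amp; more</p>"))
```

Applied to the query result, this would be `df["body"] = df["body"].map(html_to_text)`. Inline code is kept as plain text, which may or may not suit the title task.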


In [31]:
import json

# Write one supervised-tuning example per line: the question body is the
# user turn and the question title is the expected model response.
with open("stackoverflow_gemini_correct.jsonl", "w", encoding="utf-8") as f:
    for idx, row in df.iterrows():
        input_text = str(row.get("body", "")).strip()
        output_text = str(row.get("title", "")).strip()

        # Skip rows where either side of the example is empty
        if not input_text or not output_text:
            print(f"Skipping row {idx}: empty input or output")
            continue

        example = {
            "contents": [
                {"role": "user", "parts": [{"text": input_text}]},
                {"role": "model", "parts": [{"text": output_text}]},
            ]
        }

        f.write(json.dumps(example, ensure_ascii=False) + "\n")
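Before uploading, it can help to read the file back and confirm that every line has the expected shape. The checks below are assumptions made for this sketch:

- each line parses as JSON;
- the turns start with `user` and end with `model`;
- no turn is empty.

They are not the full validation that Vertex AI applies. The function name `validate_tuning_jsonl` is hypothetical.

```python
import json

def validate_tuning_jsonl(path: str) -> int:
    """Check each line is a user -> model example with non-empty text; return the count."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            example = json.loads(line)
            contents = example.get("contents")
            if not isinstance(contents, list) or len(contents) < 2:
                raise ValueError(f"line {lineno}: expected at least two turns in 'contents'")
            if contents[0].get("role") != "user" or contents[-1].get("role") != "model":
                raise ValueError(f"line {lineno}: turns must start with 'user' and end with 'model'")
            for turn in contents:
                texts = [part.get("text", "") for part in turn.get("parts", [])]
                if not any(text.strip() for text in texts):
                    raise ValueError(f"line {lineno}: empty {turn.get('role')!r} turn")
            count += 1
    return count
```

In the notebook this would be called as `validate_tuning_jsonl("stackoverflow_gemini_correct.jsonl")`. It should return the number of examples written, or raise on the first malformed line.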



In [27]:

#Import the necessary libraries

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import warnings
warnings.filterwarnings('ignore')

import sys

import json
import vertexai
import pandas as pd


In [32]:
from google.cloud import aiplatform

PROJECT_ID = "phonic-hydra-474723-k1"
REGION = "us-central1"

aiplatform.init(project=PROJECT_ID, location=REGION)


In [33]:
from google.cloud import storage

# Define your variables
BUCKET_NAME = "my-ver-bucket"  # e.g. 'data-16-05-2024'
DESTINATION_BLOB_NAME = "TRAININGS1.jsonl"
SOURCE_FILE_NAME = "stackoverflow_gemini_correct.jsonl"

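# Initialize the GCS client and upload. Use a distinct name so the
# BigQuery `client` created earlier is not overwritten.
storage_client = storage.Client(project=PROJECT_ID)
bucket = storage_client.bucket(BUCKET_NAME)
blob = bucket.blob(DESTINATION_BLOB_NAME)
blob.upload_from_filename(SOURCE_FILE_NAME)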

print(f"Uploaded to gs://{BUCKET_NAME}/{DESTINATION_BLOB_NAME}")




Uploaded to gs://my-ver-bucket/TRAININGS1.jsonl
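The lab uploads a single training file. Holding out a validation split lets the tuning job report evaluation metrics on examples it did not train on. The optional `validation_dataset` argument of `sft.train` accepts a second JSONL file for this.

Below is a minimal sketch of a reproducible split over the JSONL lines. The function name, the 10% fraction and the seed are all assumptions.

```python
import random

def split_examples(lines, validation_fraction=0.1, seed=42):
    """Shuffle JSONL lines reproducibly and split them into (train, validation)."""
    examples = [line for line in lines if line.strip()]  # drop blank lines
    rng = random.Random(seed)
    rng.shuffle(examples)
    # Hold out at least one example when there are two or more
    n_val = max(1, int(len(examples) * validation_fraction)) if len(examples) > 1 else 0
    return examples[n_val:], examples[:n_val]
```

Usage would look like the following. The output file names are hypothetical.

1. Read the lines of `stackoverflow_gemini_correct.jsonl` and call `split_examples` on them.
2. Write each part with `writelines`. The lines keep their `"\n"` endings.
3. Upload both files with the same `blob.upload_from_filename` call as above.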


In [34]:
import vertexai
from vertexai.tuning import sft

# Start the supervised tuning job on the uploaded JSONL file
sft_tuning_job = sft.train(
    source_model="gemini-2.0-flash-001",  # base Gemini model to tune
    train_dataset=f"gs://{BUCKET_NAME}/{DESTINATION_BLOB_NAME}",
)




INFO:vertexai.tuning._tuning:Creating SupervisedTuningJob
INFO:vertexai.tuning._tuning:SupervisedTuningJob created. Resource name: projects/88233143849/locations/us-central1/tuningJobs/7145409907783630848
INFO:vertexai.tuning._tuning:To use this SupervisedTuningJob in another session:
INFO:vertexai.tuning._tuning:tuning_job = sft.SupervisedTuningJob('projects/88233143849/locations/us-central1/tuningJobs/7145409907783630848')
INFO:vertexai.tuning._tuning:View Tuning Job:
https://console.cloud.google.com/vertex-ai/generative/language/locations/us-central1/tuning/tuningJob/7145409907783630848?project=88233143849
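`sft.train` also accepts optional keyword arguments that the cell above leaves at their defaults. These include `validation_dataset`, `tuned_model_display_name`, `epochs`, `learning_rate_multiplier` and `adapter_size`. The display name is how the "bbc-news-summary-tuned" name from the use case would be attached to the job.

The sketch below only assembles those keyword arguments, so it runs without cloud access. The helper name and the default values are assumptions. Check the `vertexai.tuning.sft` reference for the SDK version you have installed.

```python
def build_tuning_kwargs(bucket, train_blob, validation_blob=None,
                        display_name="bbc-news-summary-tuned", epochs=None,
                        source_model="gemini-2.0-flash-001"):
    """Assemble keyword arguments for vertexai.tuning.sft.train (hypothetical helper)."""
    kwargs = {
        "source_model": source_model,
        "train_dataset": f"gs://{bucket}/{train_blob}",
        "tuned_model_display_name": display_name,
    }
    # Only pass optional settings that were actually chosen
    if validation_blob:
        kwargs["validation_dataset"] = f"gs://{bucket}/{validation_blob}"
    if epochs is not None:
        kwargs["epochs"] = epochs
    return kwargs

# In the notebook (requires Vertex AI access):
# sft_tuning_job = sft.train(**build_tuning_kwargs(
#     "my-ver-bucket", "TRAININGS1.jsonl", "VALIDATION1.jsonl", epochs=3))
```

`VALIDATION1.jsonl` in the commented usage is a hypothetical validation file, not one created by the lab.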


In [43]:
import time
# Polling for job completion
while not sft_tuning_job.has_ended:
    time.sleep(60)
    sft_tuning_job.refresh()

print(sft_tuning_job.tuned_model_name)
print(sft_tuning_job.tuned_model_endpoint_name)
print(sft_tuning_job.experiment)



projects/88233143849/locations/us-central1/models/6484720569018220544@1
projects/88233143849/locations/us-central1/endpoints/881292654522925056
<google.cloud.aiplatform.metadata.experiment_resources.Experiment object at 0x7a9405ff3ce0>
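The polling loop above waits indefinitely if a job stalls.

Below is a sketch of the same loop with an upper bound. It works with any object exposing the `has_ended` property and `refresh()` method used above. The function name and the 4-hour default timeout are assumptions. Passing `sleep` as a parameter lets the loop be exercised without real waiting.

```python
import time

def wait_for_job(job, poll_seconds=60, timeout_seconds=4 * 3600, sleep=time.sleep):
    """Poll job.refresh() until job.has_ended is true, or raise TimeoutError."""
    waited = 0
    while not job.has_ended:
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still running after {waited} s")
        sleep(poll_seconds)
        waited += poll_seconds
        job.refresh()
    return job

# In the notebook: wait_for_job(sft_tuning_job)
```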


In [44]:
print("Job state:", sft_tuning_job.state)



Job state: 4
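The state prints as a bare number. Those numbers are the values of the Vertex AI `JobState` enum, and `4` means `JOB_STATE_SUCCEEDED`.

The lookup table below copies the enum values from `google.cloud.aiplatform_v1.types.JobState`. With the SDK installed, `JobState(4).name` should give the same answer directly. The helper name `describe_state` is hypothetical.

```python
# Values of google.cloud.aiplatform_v1.types.JobState
JOB_STATES = {
    0: "JOB_STATE_UNSPECIFIED",
    1: "JOB_STATE_QUEUED",
    2: "JOB_STATE_PENDING",
    3: "JOB_STATE_RUNNING",
    4: "JOB_STATE_SUCCEEDED",
    5: "JOB_STATE_FAILED",
    6: "JOB_STATE_CANCELLING",
    7: "JOB_STATE_CANCELLED",
    8: "JOB_STATE_PAUSED",
    9: "JOB_STATE_EXPIRED",
    10: "JOB_STATE_UPDATING",
    11: "JOB_STATE_PARTIALLY_SUCCEEDED",
}

def describe_state(state) -> str:
    """Translate a numeric job state (int or int-like enum) into its name."""
    return JOB_STATES.get(int(state), f"UNKNOWN({int(state)})")

print(describe_state(4))  # JOB_STATE_SUCCEEDED
```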


In [37]:
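# Empty output means the job recorded no error
# (_gca_resource is a private attribute and may change between SDK versions)
print(sft_tuning_job._gca_resource.error)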





In [45]:
#Viewing a list of tuning jobs


responses = sft.SupervisedTuningJob.list()

for response in responses:
    print(response)

<vertexai.tuning._supervised_tuning.SupervisedTuningJob object at 0x7a9404d327b0> 
resource name: projects/88233143849/locations/us-central1/tuningJobs/7145409907783630848
<vertexai.tuning._supervised_tuning.SupervisedTuningJob object at 0x7a9404fceb10> 
resource name: projects/88233143849/locations/us-central1/tuningJobs/175808009451077632
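The resource names printed here and elsewhere in the notebook follow the pattern `projects/<project>/locations/<location>/<collection>/<id>`. Model names may carry an extra `@<version>` suffix, as in `.../models/6484720569018220544@1`. The project segment is the project number (`88233143849`), not the project ID.

Below is a sketch that splits such a name into its parts. The regex and the function name are assumptions based on the names shown in this notebook.

```python
import re

# projects/<project>/locations/<location>/<collection>/<id>[@<version>]
_RESOURCE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)/"
    r"(?P<collection>[^/]+)/(?P<resource_id>[^/@]+)(?:@(?P<version>[^/]+))?$"
)

def parse_resource_name(name: str) -> dict:
    """Split a Vertex AI resource name into project, location, collection, id and version."""
    match = _RESOURCE_RE.match(name)
    if match is None:
        raise ValueError(f"not a Vertex AI resource name: {name!r}")
    return match.groupdict()
```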


In [51]:
print("Tuned model name:", sft_tuning_job.tuned_model_name)


Tuned model name: projects/88233143849/locations/us-central1/models/6484720569018220544@1


In [54]:

print(sft_tuning_job.resource_name)

projects/88233143849/locations/us-central1/tuningJobs/7145409907783630848


Predict with the new fine-tuned model

In [57]:
from vertexai.generative_models import GenerativeModel
from vertexai.tuning import sft

content = "Summarize this text to generate a title: \n Ever noticed how plane seats appear to be getting smaller and smaller? With increasing numbers of people taking to the skies, some experts are questioning if having such packed out planes is putting passengers at risk."

# Re-attach to the completed tuning job by its resource name (works in a fresh session)
sft_tuning_job = sft.SupervisedTuningJob("projects/88233143849/locations/us-central1/tuningJobs/7145409907783630848")
# The tuned model is served from its own endpoint, so address it by endpoint name
tuned_model = GenerativeModel(sft_tuning_job.tuned_model_endpoint_name)
response = tuned_model.generate_content(content)

print(response.text)





Passenger safety on planes
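The use case calls for comparing the tuned model with the base model, but the notebook only queries the tuned one.

The sketch below runs the same prompt through several generators and collects their answers side by side. It takes plain callables, which keeps it testable offline. The function name and the dictionary layout are assumptions.

Note that the training examples contained the raw body without the "Summarize this text to generate a title:" prefix. The comparison may therefore be fairer if prompts follow the training format.

```python
from typing import Callable, Dict

def compare_titles(prompt: str, generators: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Send the same prompt to each named generator and collect the stripped outputs."""
    return {name: generate(prompt).strip() for name, generate in generators.items()}

# In the notebook (requires Vertex AI access):
# base_model = GenerativeModel("gemini-2.0-flash-001")
# results = compare_titles(content, {
#     "base": lambda p: base_model.generate_content(p).text,
#     "tuned": lambda p: tuned_model.generate_content(p).text,
# })
# for name, title in results.items():
#     print(f"{name}: {title}")
```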


DELETING A TUNED MODEL

In [None]:
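from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=REGION)

# List the models in Model Registry to see which ones are available
for m in aiplatform.Model.list():
    print(m.display_name, m.resource_name)

# The tuned model's resource name carries a version suffix (e.g. "...@1");
# drop it to address the model itself. Deletion may be rejected while the
# model is still deployed to an endpoint, so undeploy it first if needed.
MODEL_ID = sft_tuning_job.tuned_model_name.split("@")[0]
model = aiplatform.Model(MODEL_ID)
model.delete()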