# Tune and deploy a foundation model


**Learning Objectives**

1. Learn how to generate a JSONL file for PaLM tuning
1. Learn how to launch a tuning job on Vertex AI Pipelines
1. Learn how to query your tuned LLM and evaluate it

Creating an LLM requires massive amounts of data, significant computing resources, and specialized skills. In this notebook, you'll learn how tuning allows you to customize a PaLM foundation model on Vertex AI Generative AI Studio for more specific tasks or knowledge domains.

While prompt design is excellent for quick experimentation, you can achieve higher quality by tuning the model if training data is available. Tuning lets you customize the model's responses based on examples of the task you want the model to perform.

For more details on tuning, have a look at the [official documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models).

**Quota**: Tuning the `text-bison@001` model uses the `tpu-v3-8` training resources and the accompanying quotas from your Google Cloud project. Each project has a default quota of eight v3-8 cores, which allows for one to two concurrent tuning jobs. If you want to run more concurrent jobs, you need to request additional quota via the [Quotas page](https://console.cloud.google.com/iam-admin/quotas).

**Costs:** This tutorial uses a billable component of Google Cloud: `Vertex AI Generative AI Studio`.
Learn about [Vertex AI pricing](https://cloud.google.com/vertex-ai/pricing),
and use the [Pricing Calculator](https://cloud.google.com/products/calculator/)
to generate a cost estimate based on your projected usage.

## Setup

In [None]:
import IPython

# The version of google-cloud-aiplatform needs to be >= 1.33.0
!pip install --upgrade --user \
    google-cloud-aiplatform \
    sequence-evaluate sentence-transformers \
    rouge

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

In [45]:
import time

import pandas as pd
from google.cloud import aiplatform, bigquery
from seq_eval import SeqEval
from sklearn.model_selection import train_test_split
from vertexai.preview.language_models import TextGenerationModel

In [29]:
REGION = "us-central1"
PROJECT_ID = !(gcloud config get-value project)
PROJECT_ID = PROJECT_ID[0]
BUCKET_NAME = PROJECT_ID
BUCKET_URI = f"gs://{BUCKET_NAME}"

In [4]:
!gsutil ls $BUCKET_URI || gsutil mb -l $REGION -p $PROJECT_ID $BUCKET_URI

gs://dherin-dev/cord19_embeddings.json
gs://dherin-dev/salads.csv
gs://dherin-dev/tune_data_stack_overflow_python_qa.jsonl
gs://dherin-dev/115851500182/
gs://dherin-dev/7737964263322419200-616112577574862848/
gs://dherin-dev/babyweight/
gs://dherin-dev/babyweight_220707_021136/
gs://dherin-dev/babyweight_220707_021151/
gs://dherin-dev/babyweight_220707_021154/
gs://dherin-dev/car_damage_lab_images/
gs://dherin-dev/classification-bert-20230411003650/
gs://dherin-dev/contextual_bandit_checkpoints/
gs://dherin-dev/covertype/
gs://dherin-dev/models/
gs://dherin-dev/movies/
gs://dherin-dev/staging/
gs://dherin-dev/taxifare-20230710171207/
gs://dherin-dev/taxifare-20230710191151/
gs://dherin-dev/taxifare/


## Tune your Model

Now it's time to create a tuning job. You can tune a foundation model by creating a pipeline job with Generative AI Studio, cURL, or the Python SDK. This notebook uses the Python SDK. The training data is a question-and-answer dataset in JSONL format.

### Training Data
💾 Your model tuning dataset must be in JSONL format, where each line contains a single training example with an `input_text` and an `output_text` field. Make sure the input text includes any instructions the model needs.
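To make the JSONL requirement concrete, here is a minimal sketch of how two training examples could be serialized, one JSON object per line. The field names `input_text` and `output_text` match the columns used later in this notebook; the instruction prefix and the two example records are hypothetical and only serve as an illustration.

```python
import json

# Hypothetical instruction prefix; adapt the wording to your own task.
INSTRUCTION = "Answer the following Python question:\n\n"

# Illustrative records only, not taken from the StackOverflow dataset.
examples = [
    {
        "input_text": INSTRUCTION + "How do I reverse a list?",
        "output_text": "Use my_list[::-1] or my_list.reverse().",
    },
    {
        "input_text": INSTRUCTION + "How do I read a CSV file with pandas?",
        "output_text": "Use pd.read_csv('file.csv').",
    },
]


def to_jsonl(records):
    """Serialize each record as one JSON object per line."""
    # json.dumps escapes embedded newlines, so every record stays on one line.
    return "\n".join(json.dumps(record) for record in records)


jsonl_text = to_jsonl(examples)
print(jsonl_text)
```

Because `json.dumps` escapes newlines inside strings, multi-line questions and answers still occupy exactly one line each in the resulting file.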

You will use the StackOverflow data from BigQuery Public Datasets, limited to questions with the `python` tag whose accepted answers were created on or after 2020-01-01.

First create a helper function to let you easily query BigQuery and return the results as a Pandas DataFrame.

In [4]:
def run_bq_query(sql):
    bq_client = bigquery.Client()

    # Try dry run before executing query to catch any errors
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    bq_client.query(sql, job_config=job_config)

    # If dry run succeeds without errors, proceed to run query
    job_config = bigquery.QueryJobConfig()
    client_result = bq_client.query(sql, job_config=job_config)

    job_id = client_result.job_id

    # Wait for the query job to finish, then return the results as a DataFrame
    df = client_result.result().to_arrow().to_pandas()
    print(f"Finished job_id: {job_id}")

    return df

Next define the query.

In [56]:
query = """
SELECT CONCAT(q.title, q.body) as input_text, a.body AS output_text
FROM
    `bigquery-public-data.stackoverflow.posts_questions` q
JOIN
    `bigquery-public-data.stackoverflow.posts_answers` a
ON
    q.accepted_answer_id = a.id
WHERE
    q.accepted_answer_id IS NOT NULL AND
    REGEXP_CONTAINS(q.tags, "python") AND
    a.creation_date >= "2020-01-01"
LIMIT
    1000
"""

In [57]:
df = run_bq_query(query)
df.head()

Finished job_id: 439a8a5f-91d6-477d-8a6d-4d13d2555b36


| | input_text | output_text |
|---|---|---|
| 0 | `append dataframe in nested loop<p>I have the f...` | `<p>I am not entirely sure if I understand your...` |
| 1 | `Python pandas find element of one column in li...` | `<p>You can do <code>apply</code>:</p>\n<pre><c...` |
| 2 | `How to add a minimum value constraint in Pyomo...` | `<p>figured it out. The two methods I described...` |
| 3 | `Producing Buffer Radius Polygons - Possible Pr...` | `<p>This is apparently an issue with <code>geov...` |
| 4 | `SMOTE for balancing data<p>I am trying to trai...` | `<p>You haven't given enough of your code or da...` |


There should be 1000 questions and answers.

In [58]:
print(len(df))

1000


Let's split the data into training and evaluation sets. For extractive Q&A tasks, we advise using 100 or more training examples. In this case you will use 800.

In [59]:
# split is set to 80/20
train, evaluation = train_test_split(df, test_size=0.2)
print(len(train))

800


For tuning, the training data first needs to be converted into a JSONL format.

In [60]:
training_data_filename = "tune_data_stack_overflow_python_qa.jsonl"
train.to_json(training_data_filename, orient="records", lines=True)
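Before uploading the file, it can be worth checking that every line parses as JSON and carries both expected fields. The helper below is a minimal sketch using only the standard library; `validate_jsonl` is a hypothetical name introduced here and is not part of the Vertex AI SDK.

```python
import json

REQUIRED_KEYS = {"input_text", "output_text"}


def validate_jsonl(path, required_keys=REQUIRED_KEYS):
    """Return the number of valid records; raise ValueError on the first bad line."""
    count = 0
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as err:
                raise ValueError(f"line {line_number}: invalid JSON ({err})") from err
            if not isinstance(record, dict):
                raise ValueError(f"line {line_number}: expected a JSON object")
            missing = required_keys - set(record)
            if missing:
                raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
            count += 1
    return count
```

In this notebook you could call `validate_jsonl(training_data_filename)` and compare the returned count with `len(train)`.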

In [65]:
!head -n 1 $training_data_filename

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
{"input_text":"Assignment operator overloading in python Abstract Syntax Trees<p>I want to overload assignment operator in python on the fly using <a href=\"https:\/\/docs.python.org\/3\/library\/ast.html\" rel=\"nofollow noreferrer\">Abstract Syntax Trees<\/a><\/p>\n<pre><code>import ast\nimport astunparse\n\nclass OverloadAssignments(ast.NodeTransformer):\n    def visit_Assign(self, node):\n        if isinstance(node, ast.Assign) and node.targets:\n            funcs = node.targets[0]\n            slot_name_candidate = astunparse.unparse(funcs).strip()\n            if isinstance(funcs, ast.Name) and &quot;_slot&quot; in slot_name_candidate:\n                slot_name = ast.Constant(value=slot_name_candidate

You can then export the local file to GCS, so that it can be used by Vertex AI for the tuning job.

In [66]:
!gsutil cp $training_data_filename $BUCKET_URI

Copying file://tune_data_stack_overflow_python_qa.jsonl [Content-Type=application/octet-stream]...
/ [1 files][  2.3 MiB/  2.3 MiB]                                                
Operation completed over 1 objects/2.3 MiB.                                      


You can check that the file was successfully transferred to your Google Cloud Storage bucket:

In [67]:
TRAINING_DATA_URI = f"{BUCKET_URI}/{training_data_filename}"
!gsutil ls -al $TRAINING_DATA_URI

   2410384  2023-09-20T00:36:52Z  gs://dherin-dev/tune_data_stack_overflow_python_qa.jsonl#1695170212151253  metageneration=1
TOTAL: 1 objects, 2410384 bytes (2.3 MiB)


### Model Tuning
Now it's time to tune a model. You will use the Vertex AI SDK to submit your tuning job.

#### Recommended Tuning Configurations
✅ Here are some recommended configurations for tuning a foundation model based on the task, in this example Q&A. You can find more in the [documentation](https://cloud.google.com/vertex-ai/docs/generative-ai/models/tune-models).

Extractive QA:
- Make sure that your training dataset contains 100 or more examples
- Training steps: 100 to 500. You can try more than one value (e.g. 100, 200, 500) to find the best performance on a particular dataset
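To get a feel for what a given number of training steps means for your dataset, you can estimate how many passes over the data it corresponds to. The sketch below is simple arithmetic and assumes that one training step processes one batch. The batch size used by the tuning pipeline is not stated in this notebook, so the value of 8 is purely illustrative; check the documentation for the actual value.

```python
def approx_epochs(train_steps: int, batch_size: int, num_examples: int) -> float:
    """Approximate number of passes over the training set."""
    return train_steps * batch_size / num_examples


# 800 training examples and 500 steps as used in this notebook;
# batch_size=8 is an assumed, illustrative value.
print(approx_epochs(train_steps=500, batch_size=8, num_examples=800))
```

With a small dataset and many steps, the model sees each example several times, which is one reason to try more than one value for the number of steps.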

In [68]:
aiplatform.init(project=PROJECT_ID, location=REGION)

model = TextGenerationModel.from_pretrained("text-bison@001")

Next, it's time to start your tuning job.

**Disclaimer:** tuning and deploying a model takes time.

In [None]:
TRAIN_STEPS = 500
MODEL_NAME = f"asl-palm-text-tuned-model-{time.time()}"
print("Model name:", MODEL_NAME)

model.tune_model(
    training_data=TRAINING_DATA_URI,
    model_display_name=MODEL_NAME,
    train_steps=TRAIN_STEPS,
    # Tuning can only happen in the "europe-west4" location
    tuning_job_location="europe-west4",
    # Model can only be deployed in the "us-central1" location
    tuned_model_location="us-central1",
)

Model name: asl-palm-text-tuned-model-1695170250.4494693
Creating PipelineJob
PipelineJob created. Resource name: projects/115851500182/locations/europe-west4/pipelineJobs/tune-large-model-20230920003730
To use this PipelineJob in another session:
pipeline_job = aiplatform.PipelineJob.get('projects/115851500182/locations/europe-west4/pipelineJobs/tune-large-model-20230920003730')
View Pipeline Job:
https://console.cloud.google.com/vertex-ai/locations/europe-west4/pipelines/runs/tune-large-model-20230920003730?project=115851500182
PipelineJob projects/115851500182/locations/europe-west4/pipelineJobs/tune-large-model-20230920003730 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/115851500182/locations/europe-west4/pipelineJobs/tune-large-model-20230920003730 current state:
PipelineState.PIPELINE_STATE_RUNNING
PipelineJob projects/115851500182/locations/europe-west4/pipelineJobs/tune-large-model-20230920003730 current state:
PipelineState.PIPELINE_STATE_RUNNING
Pi

## Retrieve the tuned model from Vertex AI Model Registry

When your tuning job is finished, your model will be available in Vertex AI Model Registry. The following Python SDK sample shows you how to list tuned models.

In [50]:
model = TextGenerationModel.from_pretrained("text-bison@001")
model.list_tuned_model_names()

['projects/115851500182/locations/us-central1/models/7558911543518167040']

You can also use the Google Cloud Console UI to view all of your models in [Vertex AI Model Registry](https://console.cloud.google.com/vertex-ai/models?). Below you can see an example of a tuned foundation model available in Vertex AI Model Registry.

Now it's time to get predictions. First you need to get a tuned model from Vertex AI Model Registry. The cell below takes the first entry in the list of tuned model names.

In [19]:
deployed_model = TextGenerationModel.get_tuned_model(
    model.list_tuned_model_names()[0]
)

Now you can start sending prompts to the API. Feel free to update the following prompt.

In [26]:
PROMPT = """
How can I store my TensorFlow checkpoint on Google Cloud Storage?

Python example:

"""

print(deployed_model.predict(PROMPT))

```python
import tensorflow as tf

# Create a GCS bucket
bucket = tf.gfile.GFile('gs://my-bucket/', 'w')

# Create a checkpoint directory
checkpoint_dir = 'gs://my-bucket/checkpoints/'

# Create a checkpoint file
checkpoint_file = os.path.join(checkpoint_dir, 'checkpoint')

# Create a saver
saver = tf.train.Saver()

# Save the checkpoint
saver.save(sess, checkpoint_file)

# Restore the
```
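The response above stops mid-comment. One possible cause is the limit on the number of output tokens. The sketch below wraps the call in a small helper that passes generation parameters explicitly. It assumes the model's `predict` method accepts `max_output_tokens` and `temperature` keyword arguments, as `TextGenerationModel.predict` does in recent SDK versions. The helper name `predict_with_params` and the default values are hypothetical.

```python
def predict_with_params(model, prompt, max_output_tokens=512, temperature=0.2):
    """Call model.predict with explicit generation parameters and return the text."""
    response = model.predict(
        prompt,
        max_output_tokens=max_output_tokens,  # upper bound on the response length
        temperature=temperature,  # lower values give more deterministic output
    )
    return response.text
```

In this notebook you could call it as `print(predict_with_params(deployed_model, PROMPT))`.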


Next you will generate the evaluation metrics. `evaluator.evaluate` returns several evaluation metrics. Some of the important ones are:
- [BLEU](https://en.wikipedia.org/wiki/BLEU): The BLEU evaluation metric measures the similarity between a machine-generated text and a human-written reference text, based on the n-gram precision of the generated text.
- [ROUGE](https://en.wikipedia.org/wiki/ROUGE_(metric)): The ROUGE evaluation metric measures the n-gram overlap between a machine-generated text and a human-written reference text. It is reported here as precision, recall, and F1.
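To build intuition for these two metrics, here is a simplified sketch that computes clipped unigram precision (the core of BLEU-1, without the brevity penalty) and unigram recall (the core of ROUGE-1 recall) on whitespace-separated tokens. It is not the implementation used by the evaluation library, which may tokenize and normalize text differently. The two example sentences are made up.

```python
from collections import Counter


def unigram_overlap(candidate: str, reference: str) -> int:
    """Count shared tokens, clipping each token to its count in the other text."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count of each token.
    return sum((cand_counts & ref_counts).values())


def unigram_precision(candidate: str, reference: str) -> float:
    """Share of candidate tokens that also appear in the reference."""
    total = len(candidate.split())
    return unigram_overlap(candidate, reference) / total if total else 0.0


def unigram_recall(candidate: str, reference: str) -> float:
    """Share of reference tokens that also appear in the candidate."""
    total = len(reference.split())
    return unigram_overlap(candidate, reference) / total if total else 0.0


candidate = "use the pandas apply method"
reference = "you can use the apply method on the dataframe"
print(unigram_precision(candidate, reference))  # 4 of 5 candidate tokens match
print(unigram_recall(candidate, reference))  # 4 of 9 reference tokens match
```

A short candidate can score high on precision while missing most of the reference, which is why it helps to look at precision and recall together.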

## Evaluation
It's essential to evaluate your model to understand its performance. Evaluation can be done in an automated way using evaluation metrics like F1 or ROUGE. You can also leverage human evaluation methods. Human evaluation methods involve asking humans to rate the quality of the LLM's answers. This can be done through crowdsourcing or by having experts evaluate the responses. Some standard human evaluation metrics include fluency, coherence, relevance, and informativeness. Often you want to choose a mix of evaluation metrics to get a good understanding of your model's performance. Below you will find an example of how you can do the evaluation.

In this example you will be using [sequence-evaluate](https://pypi.org/project/sequence-evaluate/) to evaluate the tuned model.

Earlier in the notebook, you created a training dataset and an evaluation dataset. Now it's time to take some of the evaluation data. You will use the questions to get responses from your tuned model, and the original answers as references:

- **Candidates**: Answers generated by the tuned model.
- **References**: Original answers that the candidates are compared against.

In [52]:
# you can change the number of rows you want to use
EVAL_ROWS = 60

evaluation = evaluation.head(EVAL_ROWS)
evaluation_question = evaluation.input_text
evaluation_answer = evaluation.output_text

In [53]:
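def evaluate_model(model, eval_input, eval_output):
    """Generate an answer for each question and score it against the references."""
    # Candidates: answers generated by the model for each evaluation question
    candidates = []
    for question in eval_input:
        response = model.predict(question)
        candidates.append(response.text)

    # References: the original accepted answers
    references = eval_output.tolist()

    evaluator = SeqEval()
    return evaluator.evaluate(candidates, references, verbose=False)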

Now you can evaluate the tuned model:

In [54]:
evaluate_model(deployed_model, evaluation_question, evaluation_answer)

{'bleu_1': 0.04047520756830512,
 'bleu_2': 0.015100714783626129,
 'bleu_3': 0.008332257944719989,
 'bleu_4': 0.004868503649911386,
 'rouge_1_precision': 0.2226664727278332,
 'rouge_1_recall': 0.08248341451392938,
 'rouge_1_f1': 0.11105924745988842,
 'rouge_2_precision': 0.02592229901067698,
 'rouge_2_recall': 0.01139208073925231,
 'rouge_2_f1': 0.01428915384614036,
 'rouge_l_precision': 0.20558055140278145,
 'rouge_l_recall': 0.07492196202502902,
 'rouge_l_f1': 0.10178188164640203,
 'inter_dist1': 0.02047382269530938,
 'inter_dist2': 0.1481372832263503,
 'intra_dist1': 0.11618918174622357,
 'intra_dist2': 0.4187750753268838,
 'semantic_textual_similarity': 0.4033529758453369}

You can also compare it to the untuned model:

In [55]:
evaluate_model(model, evaluation_question, evaluation_answer)

{'bleu_1': 0.1003128560268671,
 'bleu_2': 0.05564918804413014,
 'bleu_3': 0.04236881534645315,
 'bleu_4': 0.034599527052774505,
 'rouge_1_precision': 0.267516380123697,
 'rouge_1_recall': 0.1558227088368697,
 'rouge_1_f1': 0.17846678760532284,
 'rouge_2_precision': 0.07045565520237056,
 'rouge_2_recall': 0.04442757694288905,
 'rouge_2_f1': 0.04805766823189456,
 'rouge_l_precision': 0.2548800164873334,
 'rouge_l_recall': 0.14979437731505252,
 'rouge_l_f1': 0.17098167425295888,
 'inter_dist1': 0.016247833586984978,
 'inter_dist2': 0.12961354726527693,
 'intra_dist1': 0.09521129488653601,
 'intra_dist2': 0.3913011373479328,
 'semantic_textual_similarity': 0.5161213874816895}

If the scores for the tuned model are lower than those of the original foundation model, as the BLEU, ROUGE, and semantic similarity scores are in the run above, you may need to increase the size of the tuning set and possibly adjust the number of steps you are using for tuning.
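Comparing two long dictionaries by eye is error-prone, so a small helper that prints the per-metric difference can help. The sketch below uses a subset of the values from the two evaluation runs above. `metric_deltas` is a hypothetical helper name, and for the three metrics shown a higher value is better.

```python
# Subset of the scores copied from the two evaluation runs above.
tuned_scores = {
    "bleu_1": 0.04047520756830512,
    "rouge_l_f1": 0.10178188164640203,
    "semantic_textual_similarity": 0.4033529758453369,
}
base_scores = {
    "bleu_1": 0.1003128560268671,
    "rouge_l_f1": 0.17098167425295888,
    "semantic_textual_similarity": 0.5161213874816895,
}


def metric_deltas(tuned, base):
    """Return tuned minus base for every metric present in both dictionaries."""
    return {name: tuned[name] - base[name] for name in tuned if name in base}


deltas = metric_deltas(tuned_scores, base_scores)
for name, delta in sorted(deltas.items()):
    # A negative delta means the tuned model scored lower on that metric.
    print(f"{name:30s} {delta:+.4f}")
```

In the notebook you could pass the two dictionaries returned by `evaluate_model` directly instead of copying values by hand.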

## Acknowledgement 

This notebook is adapted from a [tutorial](https://github.com/GoogleCloudPlatform/generative-ai/blob/main/language/tuning/getting_started_tuning.ipynb)
written by Polong Lin.

Copyright 2023 Google LLC

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

     https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.