## Wikilingua Dataset

The dataset consists of article-summary pairs from WikiHow in multiple languages. Refer to the [GitHub repository](https://github.com/esdurmus/Wikilingua) for more details.

This notebook uses the English subset of the dataset.

### Dataset Citation

```
@inproceedings{ladhak-wiki-2020,
    title={WikiLingua: A New Benchmark Dataset for Multilingual Abstractive Summarization},
    author={Faisal Ladhak, Esin Durmus, Claire Cardie and Kathleen McKeown},
    booktitle={Findings of EMNLP, 2020},
    year={2020}
}
```

### Use Cases
* Explainability
* Output in a particular format, such as JSON or a template (structured documents)


## Getting Started

### Install Vertex AI SDK and other required packages

In [3]:
%pip install --upgrade --user --quiet google-cloud-aiplatform rouge_score plotly jsonlines

Note: you may need to restart the kernel to use updated packages.


## Step1: Import Libraries

In [1]:
import time

# For extracting vertex experiment details.
from google.cloud import aiplatform
from google.cloud.aiplatform.metadata import context
from google.cloud.aiplatform.metadata import utils as metadata_utils
import jsonlines

# For data handling.
import pandas as pd

# For visualization.
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# For evaluation metric computation.
from rouge_score import rouge_scorer
from tqdm import tqdm

# For fine tuning Gemini model.
import vertexai
from vertexai.generative_models import (
    GenerationConfig,
    GenerativeModel,
    HarmBlockThreshold,
    HarmCategory,
)
from vertexai.preview.tuning import sft



## Step2: Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

Learn more about [setting up a project and a development environment](https://cloud.google.com/vertex-ai/docs/start/cloud-environment).


In [2]:
PROJECT_ID = "cryptic-skyline-411516"
LOCATION = "us-central1"

vertexai.init(project=PROJECT_ID, location=LOCATION)

## Step3: Create Dataset in correct format

The dataset used to tune a foundation model needs to include examples that align with the task that you want the model to perform. Structure your training dataset in a text-to-text format. Each record, or row, in the dataset contains the input text (also referred to as the prompt) which is paired with its expected output from the model. Supervised tuning uses the dataset to teach the model to mimic a behavior, or task, you need by giving it hundreds of examples that illustrate that behavior.

The right dataset size depends on the task; follow the recommendation in the `Overview` section. In general, more high-quality examples lead to better results.

### Dataset format

Training data should be structured as a JSONL file located at a Google Cloud Storage (GCS) URI. Each line (or row) of the JSONL file must adhere to a specific schema: it contains a `contents` array whose objects define a `role` (either "user" for user input or "model" for model output) and `parts`, which hold the text for that turn. For example, a valid data row would look like this:


```
{
   "contents":[
      {
         "role":"user",  # This indicates the input content
         "parts":[
            {
               "text":"How are you?"
            }
         ]
      },
      {
         "role":"model", # This indicates the target content
         "parts":[ # text only
            {
               "text":"I am good, thank you!"
            }
         ]
      }
      #  ... repeat "user", "model" for multi turns.
   ]
}
```


Refer to the public [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini-supervised-tuning-prepare#about-datasets) for more details.
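
Before uploading, it can help to check each JSONL line against the schema described above. The following is a minimal sketch that uses only the standard library; `validate_record` and `VALID_ROLES` are hypothetical names introduced here for illustration, not part of the Vertex AI SDK, and the checks cover only the fields mentioned in this section.

```python
import json

# Roles named in the schema description above.
VALID_ROLES = {"user", "model"}


def validate_record(line: str) -> list[str]:
    """Return a list of problems found in one JSONL line (empty if none)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as err:
        return [f"not valid JSON: {err}"]

    contents = record.get("contents") if isinstance(record, dict) else None
    if not isinstance(contents, list) or not contents:
        return ["missing or empty 'contents' array"]

    problems = []
    for i, turn in enumerate(contents):
        if not isinstance(turn, dict):
            problems.append(f"contents[{i}] is not an object")
            continue
        if turn.get("role") not in VALID_ROLES:
            problems.append(f"contents[{i}] has unexpected role {turn.get('role')!r}")
        parts = turn.get("parts")
        if not isinstance(parts, list) or not parts:
            problems.append(f"contents[{i}] has no 'parts'")
        elif not all(
            isinstance(p, dict) and isinstance(p.get("text"), str) for p in parts
        ):
            problems.append(f"contents[{i}] has a part without a 'text' string")
    return problems


good = json.dumps(
    {
        "contents": [
            {"role": "user", "parts": [{"text": "How are you?"}]},
            {"role": "model", "parts": [{"text": "I am good, thank you!"}]},
        ]
    }
)
bad = json.dumps({"contents": [{"role": "assistant", "parts": []}]})

print(validate_record(good))  # no problems expected
print(validate_record(bad))   # one problem for the role, one for the empty parts
```

Note that the `#` comments in the example row above are for explanation only; a real JSONL line must be plain JSON, so a line containing them would fail the first check.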

To run a tuning job, you need to upload one or more datasets to a Cloud Storage bucket. You can either create a new Cloud Storage bucket or use an existing one to store dataset files. The region of the bucket doesn't matter, but we recommend that you use a bucket that's in the same Google Cloud project where you plan to tune your model.

### Step3 [a]: Create a Cloud Storage bucket

Create a storage bucket to store intermediate artifacts such as datasets.


In [6]:
# Provide a bucket name
BUCKET_NAME = "training_data"  # @param {type:"string"}
BUCKET_URI = f"gs://{BUCKET_NAME}_{PROJECT_ID}"

Only if your bucket doesn't already exist: Run the following cell to create your Cloud Storage bucket.


In [8]:
! gsutil mb -l {LOCATION} -p {PROJECT_ID} {BUCKET_URI}

Creating gs://training_data_cryptic-skyline-411516/...
ServiceException: 409 A Cloud Storage bucket named 'training_data_cryptic-skyline-411516' already exists. Try another name. Bucket names must be globally unique across all Google Cloud projects, including those outside of your organization.


### Step3 [b]: Upload tuning data to Cloud Storage

- The data used in this notebook is available in a public Google Cloud Storage (GCS) bucket.
- It is in the Gemini 1.0 fine-tuning dataset format.

In [9]:
!gsutil ls gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua

gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/
gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_test_samples.csv
gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_train_samples.jsonl
gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_val_samples.jsonl


In [10]:
!gsutil cp gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/* .

Copying gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_test_samples.csv...
/ [0 files][    0.0 B/235.8 KiB]                                                
/ [1 files][235.8 KiB/235.8 KiB]                                                
Copying gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_train_samples.jsonl...
/ [1 files][235.8 KiB/  1.4 MiB]                                                
-
- [2 files][  1.4 MiB/  1.4 MiB]                                                
Copying gs://github-repo/generative-ai/gemini/tuning/summarization/wikilingua/sft_val_samples.jsonl...
- [2 files][  1.4 MiB/  1.6 MiB]                                                
\
\ [3 files][  1.6 MiB/  1.6 MiB]                                                

Operation completed over 3 objects/1.6 MiB.                                      


#### Convert Gemini 1.0 tuning dataset to Gemini 1.5 tuning dataset format

In [12]:
def save_jsonlines(file, instances):
    """
    Saves a list of json instances to a jsonlines file.
    """
    with jsonlines.open(file, mode="w") as writer:
        writer.write_all(instances)

In [14]:
def create_tuning_samples(file_path):
    """
    Creates tuning samples from a file.
    """
    with jsonlines.open(file_path) as reader:
        instances = []
        for obj in reader:
            instance = []
            for content in obj["messages"]:
                instance.append(
                    {"role": content["role"], "parts": [{"text": content["content"]}]}
                )
            instances.append({"contents": instance})
    return instances
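
To make the mapping concrete, here is a minimal in-memory sketch of the same conversion applied to a single record, without any file I/O. The helper name `to_gemini_contents` and the sample text are made up for illustration; the field names (`messages`, `role`, `content`, `contents`, `parts`, `text`) are the ones used by `create_tuning_samples` above.

```python
def to_gemini_contents(record: dict) -> dict:
    """Map one {"messages": [...]} record to the {"contents": [...]} layout."""
    return {
        "contents": [
            # Roles are copied unchanged; only the text moves into `parts`.
            {"role": message["role"], "parts": [{"text": message["content"]}]}
            for message in record["messages"]
        ]
    }


# Hypothetical record in the source layout (text invented for this example).
old_record = {
    "messages": [
        {"role": "user", "content": "Summarize: ..."},
        {"role": "model", "content": "A short summary."},
    ]
}

new_record = to_gemini_contents(old_record)
print(new_record)
```

Because roles are passed through as-is, this sketch assumes the source file already uses the `user` and `model` role names.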

In [15]:
train_file = "sft_train_samples.jsonl"
train_instances = create_tuning_samples(train_file)
len(train_instances)

500

In [16]:
# save the training instances to jsonl file
save_jsonlines(train_file, train_instances)

In [17]:
val_file = "sft_val_samples.jsonl"
val_instances = create_tuning_samples(val_file)
len(val_instances)

100

In [18]:
# save the validation instances to jsonl file
save_jsonlines(val_file, val_instances)

In [19]:
# Copy the tuning and evaluation data to your bucket.
!gsutil cp {train_file} {BUCKET_URI}/train/
!gsutil cp {val_file} {BUCKET_URI}/val/

Copying file://sft_train_samples.jsonl [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/  1.2 MiB]                                                
-
- [0 files][960.0 KiB/  1.2 MiB]                                                
- [1 files][  1.2 MiB/  1.2 MiB]                                                
\

Operation completed over 1 objects/1.2 MiB.                                      
Copying file://sft_val_samples.jsonl [Content-Type=application/octet-stream]...
/ [0 files][    0.0 B/231.9 KiB]                                                
/ [1 files][231.9 KiB/231.9 KiB]                                                
-

Operation completed over 1 objects/231.9 KiB.                                    


### Step3 [c]: Test dataset

- It contains the document text (`input_text`) and the corresponding reference summary (`output_text`), which is compared with the model-generated summary.

In [20]:
# Load the test dataset using pandas as it's in the csv format.
testing_data_path = "sft_test_samples.csv"
test_data = pd.read_csv(testing_data_path)
test_data.head()

Unnamed: 0,input_text,output_text
0,Hold your arm out flat in front of you with yo...,Squeeze a line of lotion onto the tops of both...
1,"As you continue playing, surviving becomes pai...",Make a Crock Pot for better food. Create an Al...
2,Go to https://www.4kdownload.com/products/prod...,Download the 4K Video Downloader setup file. I...
3,You should know that vaginoplasty can treat a ...,Consider the health of your bladder. Find a so...
4,If you want to gather data on the frequency of...,Gather data to be graphed. Choose your range b...


In [21]:
test_data.loc[0, "input_text"]

'Hold your arm out flat in front of you with your elbow bent. The top of your forearm should form a level surface. Apply a line of lotion from the back of your hand up your arm almost to the crease of your elbow. Squeeze lotion onto both forearms.  Do not rub the lotion into your arms, rather let it sit on your arm in the line you squeezed. You can use as much or as little lotion as you feel is necessary to cover your back completely. Bend your elbows and reach both of your arms behind you, placing the lotion covered forearms against your back. Depending on how flexible you are, this may hurt a little. It might be easier to place one arm behind your back at a time. If you have shoulder pain or are not very flexible, this method may not work well for you. Rub your forearms and the backs of your hands up and down your back like windshield wipers covering as much of your back as you can. You can use your left arm first to cover your left side and then place your right arm behind and use i

In [28]:
# Article summary stats
stats = test_data["output_text"].apply(len).describe()
stats

count    100.000000
mean     186.230000
std       92.788655
min       28.000000
25%      127.250000
50%      171.000000
75%      227.000000
max      577.000000
Name: output_text, dtype: float64

In [29]:
import math

print(f"Total `{stats['count']}` test records")
print(f"Average length is `{stats['mean']}` and max is `{stats['max']}` characters")
print("\nConsidering 1 token = 4 chars")

# Round the estimated token count up to the next integer.
tokens = math.ceil(stats["max"] / 4)
print(
    f"\nSet max_token_length = stats['max']/4 = {stats['max']/4} ~ {tokens} tokens"
)
print(f"\nLet's keep output tokens up to `{tokens}`")

Total `100.0` test records
Average length is `186.23` and max is `577.0` characters

Considering 1 token = 4 chars

Set max_token_length = stats['max']/4 = 144.25 ~ 145 tokens

Let's keep output tokens up to `145`


In [30]:
# Maximum number of tokens that can be generated in the response by the LLM.
# Experiment with this number to get optimal output.
max_output_tokens = tokens

## Step4: Load model

The following Gemini text models support supervised tuning:

* `gemini-1.5-pro-002`
* `gemini-1.5-flash-002` (used in this notebook)

In [31]:
base_model = "gemini-1.5-flash-002"
generation_model = GenerativeModel(base_model)

## Step5: Test the Gemini model

### Generation config

- Each call that you send to a model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values.
- <strong>Experiment</strong> with different parameter values to find the best values for the task.

Refer to the following [link](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/adjust-parameter-values) to understand the different parameters.

A **prompt** is a natural language request submitted to a language model to receive a response.

Some best practices include:
  - Clearly communicate what content or information is most important.
  - Structure the prompt:
    - Define the role, if you are using one. For example: "You are an experienced UX designer at a top tech company."
    - Include context and input data.
    - Provide the instructions to the model.
    - Add example(s) if you are using them.

Refer to the following [link](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/prompts/prompt-design-strategies) for prompt design strategies.
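
The prompt structure described above can be sketched as a small helper that joins the optional role, any examples, the input document, and the instruction. This is only an illustration: `build_prompt`, the `Article:`/`Summary:` labels, the section order, and the sample role text are all assumptions made here, not a format required by Gemini.

```python
from typing import Optional


def build_prompt(
    document: str,
    instruction: str,
    role: str = "",
    examples: Optional[list[tuple[str, str]]] = None,
) -> str:
    """Assemble a prompt from a role, examples, input data, and an instruction."""
    sections = []
    if role:
        sections.append(role)
    # Optional few-shot examples as (article, summary) pairs.
    for article, summary in examples or []:
        sections.append(f"Article: {article}\nSummary: {summary}")
    sections.append(f"Article: {document}")
    # The instruction goes last, mirroring how the Wikilingua task prompt
    # follows the article text.
    sections.append(instruction)
    return "\n\n".join(sections)


prompt = build_prompt(
    document="Hold your arm out flat in front of you with your elbow bent.",
    instruction="Provide a summary of the article in two or three sentences:",
    role="You are an editor who writes concise how-to summaries.",
)
print(prompt)
```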

Each article in the Wikilingua data already ends with the following task prompt, so the cell below passes the document text as-is: `Provide a summary of the article in two or three sentences:`

In [33]:
test_doc = test_data.loc[0, "input_text"]

prompt = f"""
{test_doc}
"""

generation_config = GenerationConfig(
    temperature=0.1,
    max_output_tokens=max_output_tokens,
)

response = generation_model.generate_content(
    contents=prompt, generation_config=generation_config
).text
print(response)

This describes a method for applying lotion to one's back using one's forearms as applicators.  Lotion is applied to the forearms, then the forearms are used to rub the lotion onto the back in a windshield wiper motion.  This technique may not be suitable for everyone, particularly those with limited flexibility or shoulder pain.



In [34]:
# Ground truth
test_data.loc[0, "output_text"]

'Squeeze a line of lotion onto the tops of both forearms and the backs of your hands. Place your arms behind your back. Move your arms in a windshield wiper motion.'

## Step6: Evaluation before model tuning

- Evaluate the Gemini model on the test dataset before tuning it on the training dataset.

In [35]:
# Convert the pandas dataframe to records (list of dictionaries).
corpus = test_data.to_dict(orient="records")
# Check number of records.
len(corpus)

100

### Evaluation metric

The type of metrics used for evaluation depends on the task that you are evaluating. The following table shows the supported tasks and the metrics used to evaluate each task:

| Task               | Metric(s)                        |
|--------------------|----------------------------------|
| Classification     | Micro-F1, Macro-F1, Per class F1 |
| Summarization      | ROUGE-L                          |
| Question Answering | Exact Match                      |
| Text Generation    | BLEU, ROUGE-L                    |


<br/>

Refer to this [documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/models/evaluate-models) for metric based evaluation.

- **Recall-Oriented Understudy for Gisting Evaluation (ROUGE)**: A metric used to evaluate the quality of automatic summaries of text. It works by comparing a generated summary to a set of reference summaries created by humans.

Given a candidate (generated) summary and a reference summary, ROUGE reports:

- `rouge-1`, which measures unigram overlap
- `rouge-2`, which measures bigram overlap
- `rouge-l`, which measures the longest common subsequence
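
To illustrate what these three variants count, here is a rough sketch in plain Python. It splits on whitespace and skips the tokenization and stemming that the `rouge_score` package applies, so its numbers will not match the library exactly; `ngram_overlap` and `lcs_length` are helper names made up for this example.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[tuple]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap(reference: list[str], candidate: list[str], n: int) -> int:
    """Number of n-grams shared by both lists (each match counted once)."""
    shared = Counter(ngrams(reference, n)) & Counter(ngrams(candidate, n))
    return sum(shared.values())


def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    # prev[j] is the LCS length of the tokens of `a` seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


reference = "place your arms behind your back".split()
candidate = "place both arms behind the back".split()

print(ngram_overlap(reference, candidate, 1))  # 4 shared unigrams
print(ngram_overlap(reference, candidate, 2))  # 1 shared bigram: ("arms", "behind")
print(lcs_length(reference, candidate))        # 4: place, arms, behind, back
```

The subsequence for `rouge-l` does not need to be contiguous, which is why it can be longer than the longest shared n-gram.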

#### *Recall vs. Precision*

**Recall** measures how much of the reference summary is captured in the generated summary. ROUGE is recall-oriented by design.

**Precision** measures how much of the generated summary also appears in the reference summary.
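
As a worked example of the two definitions, the sketch below computes unigram (ROUGE-1 style) precision, recall, and F-measure by hand. It is a simplification that lowercases and splits on whitespace, without the stemming used by the `rouge_score` scorer in this notebook; the function name `rouge1_sketch` and the two sentences are chosen here for illustration.

```python
from collections import Counter


def rouge1_sketch(reference: str, candidate: str) -> dict:
    """Unigram precision, recall, and F-measure between two texts."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each shared word counts at most as often as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())

    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    if precision + recall == 0:
        fmeasure = 0.0
    else:
        fmeasure = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


reference = "move your arms in a windshield wiper motion"
candidate = "the forearms are used to rub lotion in a windshield wiper motion"

scores = rouge1_sketch(reference, candidate)
print(scores)
```

Here 5 words are shared, the candidate has 12 words, and the reference has 8, so precision is 5/12, recall is 5/8, and the F-measure is 0.5. A longer generated summary tends to raise recall while lowering precision.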

<strong>Alternate Evaluation method</strong>: Check out the [AutoSxS](https://cloud.google.com/vertex-ai/generative-ai/docs/models/side-by-side-eval) evaluation for automatic evaluation of the task.


In [36]:
# Create rouge_scorer object for evaluation
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

In [86]:
def run_evaluation(model: GenerativeModel, corpus: list[dict]) -> pd.DataFrame:
    """Runs evaluation for the given model and data.

    Args:
      model: The generation model.
      corpus: The test data.

    Returns:
      A pandas DataFrame containing the evaluation results.
    """
    records = []
    for item in tqdm(corpus):
        document = item.get("input_text")
        summary = item.get("output_text")

        # Catch any exceptions that occur during model evaluation.
        try:
            response = model.generate_content(
                document,
                generation_config=generation_config,
                safety_settings={
                    HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_NONE,
                    HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_NONE,
                    HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_NONE,
                    HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_NONE,
                },
            )
            print(response)

            # Check if response is generated by the model, if response is empty then continue to next item.
            if not (
                response
                and response.candidates
                and response.candidates[0].content.parts
            ):
                print(
                    f"\nModel has blocked the response for the document.\n Response: {response}\n Document: {document}"
                )
                continue

            # Calculates the ROUGE score for a given reference and generated summary.
            scores = scorer.score(target=summary, prediction=response.text)

            # Append the results to the records list
            records.append(
                {
                    "document": document,
                    "summary": summary,
                    "generated_summary": response.text,
                    "scores": scores,
                    "rouge1_precision": scores.get("rouge1").precision,
                    "rouge1_recall": scores.get("rouge1").recall,
                    "rouge1_fmeasure": scores.get("rouge1").fmeasure,
                    "rouge2_precision": scores.get("rouge2").precision,
                    "rouge2_recall": scores.get("rouge2").recall,
                    "rouge2_fmeasure": scores.get("rouge2").fmeasure,
                    "rougeL_precision": scores.get("rougeL").precision,
                    "rougeL_recall": scores.get("rougeL").recall,
                    "rougeL_fmeasure": scores.get("rougeL").fmeasure,
                }
            )
        except AttributeError as attr_err:
            print("Attribute Error:", attr_err)
            continue
        except Exception as err:
            print("Error:", err)
            continue
    return pd.DataFrame(records)

In [46]:
# Batch of test data.
corpus_batch = corpus[:100]

<div class="alert alert-block alert-warning">
<b>⚠️ It will take ~5 mins for the evaluation run on the provided batch. ⚠️</b>
</div>

In [87]:
# Debug with a single item.
evaluation_df = run_evaluation(generation_model, [corpus_batch[0]])

  0%|          | 0/1 [00:00<?, ?it/s]

100%|██████████| 1/1 [00:01<00:00,  1.07s/it]

candidates {
  content {
    role: "model"
    parts {
      text: "This describes a method for applying lotion to one\'s back using one\'s forearms as makeshift applicators.  Lotion is applied to the forearms, then the forearms are used to rub the lotion onto the back in a windshield wiper motion.  This technique may not be suitable for everyone, particularly those with limited flexibility or shoulder pain.\n"
    }
  }
  avg_logprobs: -0.028314865297741361
  finish_reason: STOP
  safety_ratings {
    category: HARM_CATEGORY_HATE_SPEECH
    probability: NEGLIGIBLE
    probability_score: 0.284576565
    severity: HARM_SEVERITY_LOW
    severity_score: 0.288568705
  }
  safety_ratings {
    category: HARM_CATEGORY_DANGEROUS_CONTENT
    probability: NEGLIGIBLE
    probability_score: 0.201813355
    severity: HARM_SEVERITY_MEDIUM
    severity_score: 0.444565088
  }
  safety_ratings {
    category: HARM_CATEGORY_HARASSMENT
    probability: NEGLIGIBLE
    probability_score: 0.321673483
    s




In [48]:
# Run evaluation using loaded model and test data corpus
evaluation_df = run_evaluation(generation_model, corpus_batch)

  0%|          | 0/100 [00:00<?, ?it/s]

 41%|████      | 41/100 [00:33<00:29,  2.01it/s]

Error: 429 Online prediction request quota exceeded for gemini-1.5-flash. Please try again later with backoff.
Error: 429 Online prediction request quota exceeded for gemini-1.5-flash. Please try again later with backoff.


[Output truncated: the same 429 quota error repeats as the progress bar advances from 43% to 99%.]

100%|██████████| 100/100 [00:43<00:00,  2.31it/s]


Error: 429 Online prediction request quota exceeded for gemini-1.5-flash. Please try again later with backoff.
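
The 429 errors above ask the caller to retry with backoff. One possible approach is to wrap each model call in a retry helper with exponential backoff and jitter, sketched below using only the standard library. `call_with_backoff` and its parameters are hypothetical names, and detecting the error by looking for "429" in the message is a simplifying assumption; with the Vertex AI SDK you would more likely catch the specific quota exception class.

```python
import random
import time


def call_with_backoff(
    fn,
    max_attempts=5,
    base_delay=1.0,
    max_delay=60.0,
    sleep=time.sleep,
    is_retryable=lambda err: "429" in str(err),  # assumption: match on message
):
    """Call fn(), retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as err:
            # Give up on the last attempt or on errors that are not retryable.
            if attempt == max_attempts - 1 or not is_retryable(err):
                raise
            delay = min(max_delay, base_delay * 2**attempt)
            # Random jitter keeps parallel callers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


# Demo with a fake call that fails twice before succeeding.
attempts = {"count": 0}


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("429 Online prediction request quota exceeded")
    return "ok"


delays = []  # record the waits instead of actually sleeping in the demo
result = call_with_backoff(flaky_call, sleep=delays.append)
print(result, len(delays))
```

In `run_evaluation`, the `model.generate_content(...)` call could be passed in as a lambda so that rate-limited items are retried rather than skipped.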


In [53]:
evaluation_df.head()

Unnamed: 0,document,summary,generated_summary,scores,rouge1_precision,rouge1_recall,rouge1_fmeasure,rouge2_precision,rouge2_recall,rouge2_fmeasure,rougeL_precision,rougeL_recall,rougeL_fmeasure
0,Hold your arm out flat in front of you with yo...,Squeeze a line of lotion onto the tops of both...,This describes a method for applying lotion to...,"{'rouge1': (0.22807017543859648, 0.41935483870...",0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25


In [54]:
evaluation_df_stats = evaluation_df.dropna().describe()

In [55]:
# Statistics of the evaluation dataframe.
evaluation_df_stats

Unnamed: 0,rouge1_precision,rouge1_recall,rouge1_fmeasure,rouge2_precision,rouge2_recall,rouge2_fmeasure,rougeL_precision,rougeL_recall,rougeL_fmeasure
count,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
mean,0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25
std,,,,,,,,,
min,0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25
25%,0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25
50%,0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25
75%,0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25
max,0.22807,0.419355,0.295455,0.125,0.233333,0.162791,0.192982,0.354839,0.25


In [56]:
print("Mean rougeL_precision is", evaluation_df_stats.rougeL_precision["mean"])

Mean rougeL_precision is 0.19298245614035087


## Step7: Fine-tune the Model

 - `source_model`: Specifies the base Gemini model version you want to fine-tune.
 - `train_dataset`: Path to your training data in JSONL format.

  *Optional parameters*
 - `validation_dataset`: If provided, this data is used to evaluate the model during tuning.
 - `tuned_model_display_name`: Display name for the tuned model.
 - `epochs`: The number of training epochs to run.
 - `learning_rate_multiplier`: A value to scale the learning rate during training.
 - `adapter_size`: The adapter size to use for tuning. Gemini 1.5 Pro supports adapter sizes 1 and 4; the default is 4.

**Note: The default hyperparameter settings are the result of rigorous testing and are recommended for initial use. Customize them only to address specific performance requirements.**

In [57]:
tuned_model_display_name = "test_model_Wikilingua"  # @param {type:"string"}

# Tune a model using `train` method.
sft_tuning_job = sft.train(
    source_model=base_model,
    train_dataset=f"{BUCKET_URI}/train/sft_train_samples.jsonl",
    # Optional:
    validation_dataset=f"{BUCKET_URI}/val/sft_val_samples.jsonl",
    tuned_model_display_name=tuned_model_display_name,
)

Creating SupervisedTuningJob

SupervisedTuningJob created. Resource name: projects/1082216977346/locations/us-central1/tuningJobs/3682657757346922496

To use this SupervisedTuningJob in another session:

tuning_job = sft.SupervisedTuningJob('projects/1082216977346/locations/us-central1/tuningJobs/3682657757346922496')

View Tuning Job:
https://console.cloud.google.com/vertex-ai/generative/language/locations/us-central1/tuning/tuningJob/3682657757346922496?project=1082216977346


In [58]:
# Get the tuning job info.
sft_tuning_job.to_dict()

{'name': 'projects/1082216977346/locations/us-central1/tuningJobs/3682657757346922496',
 'tunedModelDisplayName': 'test_model_Wikilingua',
 'baseModel': 'gemini-1.5-flash-002',
 'supervisedTuningSpec': {'trainingDatasetUri': 'gs://training_data_cryptic-skyline-411516/train/sft_train_samples.jsonl',
  'validationDatasetUri': 'gs://training_data_cryptic-skyline-411516/val/sft_val_samples.jsonl',
  'hyperParameters': {}},
 'state': 'JOB_STATE_PENDING',
 'createTime': '2025-01-10T22:27:26.308902Z',
 'updateTime': '2025-01-10T22:27:26.308902Z'}

In [59]:
# Get the resource name of the tuning job
sft_tuning_job_name = sft_tuning_job.resource_name
sft_tuning_job_name

'projects/1082216977346/locations/us-central1/tuningJobs/3682657757346922496'

**Note: Tuning time depends on several factors, such as training data size, number of epochs, learning rate multiplier, etc.**

<div class="alert alert-block alert-warning">
<b>⚠️ It will take ~30 mins for the model tuning job to complete on the provided dataset and set configurations/hyperparameters. ⚠️</b>
</div>

In [60]:
%%time
# Wait for job completion
while not sft_tuning_job.refresh().has_ended:
    time.sleep(60)

CPU times: total: 2.75 s
Wall time: 15min 38s
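The loop above waits indefinitely if the job never reaches a final state. A bounded variant is sketched below. `wait_until` is a hypothetical helper, and the three-hour default timeout is an arbitrary assumption; the job is simulated with a plain iterator so the snippet runs without any cloud resources.

```python
import time
from typing import Callable


def wait_until(
    is_done: Callable[[], bool],
    poll_seconds: float = 60.0,
    timeout_seconds: float = 3 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_done() until it returns True or the timeout elapses.

    Returns True if the condition was met, False on timeout.
    """
    waited = 0.0
    while not is_done():
        if waited >= timeout_seconds:
            return False
        sleep(poll_seconds)
        waited += poll_seconds
    return True


# Stand-in for a tuning job that reports completion on the third status check.
checks = iter([False, False, True])
finished = wait_until(lambda: next(checks), poll_seconds=60, sleep=lambda s: None)
print(finished)
```

In the notebook, the equivalent call would be `wait_until(lambda: sft_tuning_job.refresh().has_ended)`.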


In [61]:
# tuned model name
tuned_model_name = sft_tuning_job.tuned_model_name
tuned_model_name

'projects/1082216977346/locations/us-central1/models/4945081033713254400@1'

In [62]:
# tuned model endpoint name
tuned_model_endpoint_name = sft_tuning_job.tuned_model_endpoint_name
tuned_model_endpoint_name

'projects/1082216977346/locations/us-central1/endpoints/5693087590182289408'

### Step7 [a]: Tuning and evaluation metrics

#### Model tuning metrics

- `/train_total_loss`: Loss for the tuning dataset at a training step.
- `/train_fraction_of_correct_next_step_preds`: The token accuracy at a training step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the tuning dataset.
- `/train_num_predictions`: Number of predicted tokens at a training step.

#### Model evaluation metrics

- `/eval_total_loss`: Loss for the evaluation dataset at an evaluation step.
- `/eval_fraction_of_correct_next_step_preds`: The token accuracy at an evaluation step. A single prediction consists of a sequence of tokens. This metric measures the accuracy of the predicted tokens when compared to the ground truth in the evaluation dataset.
- `/eval_num_predictions`: Number of predicted tokens at an evaluation step.

The metrics visualizations are available after the model tuning job completes. If you don't specify a validation dataset when you create the tuning job, only the visualizations for the tuning metrics are available.
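To make the accuracy metrics concrete, here is a toy calculation of the fraction of correct next-token predictions. The token lists are invented, and the function only illustrates the definition given above; it is not how the tuning service computes the metric internally.

```python
from typing import Sequence


def fraction_of_correct_preds(predicted: Sequence[str], target: Sequence[str]) -> float:
    """Share of positions where the predicted token equals the ground-truth token."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if not target:
        return 0.0
    correct = sum(p == t for p, t in zip(predicted, target))
    return correct / len(target)


# Invented tokens for illustration only.
target_tokens = ["Squeeze", "a", "line", "of", "lotion"]
predicted_tokens = ["Squeeze", "the", "line", "of", "cream"]

accuracy = fraction_of_correct_preds(predicted_tokens, target_tokens)
print(f"num_predictions={len(target_tokens)}, accuracy={accuracy:.2f}")
```

Here three of the five predicted tokens match the ground truth, so the fraction is 0.6.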


In [64]:
# Get resource name from tuning job.
experiment_name = sft_tuning_job.experiment.resource_name
experiment_name

'projects/1082216977346/locations/us-central1/metadataStores/default/contexts/tuning-experiment-20250110143110555620'

In [65]:
# Locate Vertex AI Experiment and Vertex AI Experiment Run
experiment = aiplatform.Experiment(experiment_name=experiment_name)
filter_str = metadata_utils._make_filter_string(
    schema_title="system.ExperimentRun",
    parent_contexts=[experiment.resource_name],
)
experiment_run = context.Context.list(filter_str)[0]

In [66]:
# Read data from Tensorboard
tensorboard_run_name = f"{experiment.get_backing_tensorboard_resource().resource_name}/experiments/{experiment.name}/runs/{experiment_run.name.replace(experiment.name, '')[1:]}"
tensorboard_run = aiplatform.TensorboardRun(tensorboard_run_name)
metrics = tensorboard_run.read_time_series_data()

In [69]:
def get_metrics(metric: str = "/train_total_loss"):
    """
    Get metrics from Tensorboard.

    Args:
      metric: metric name, e.g. /train_total_loss or /eval_total_loss.
    Returns:
      steps: list of steps.
      steps_loss: list of loss values.
    """
    loss_values = metrics[metric].values
    steps_loss = []
    steps = []
    for loss in loss_values:
        steps_loss.append(loss.scalar.value)
        steps.append(loss.step)
    return steps, steps_loss

In [70]:
# Get Train and Eval Loss
train_loss = get_metrics(metric="/train_total_loss")
eval_loss = get_metrics(metric="/eval_total_loss")

### Step7 [b]: Plot the metrics

In [71]:
# Plot the train and eval loss metrics using Plotly python library

fig = make_subplots(
    rows=1, cols=2, shared_xaxes=True, subplot_titles=("Train Loss", "Eval Loss")
)

# Add traces
fig.add_trace(
    go.Scatter(x=train_loss[0], y=train_loss[1], name="Train Loss", mode="lines"),
    row=1,
    col=1,
)
fig.add_trace(
    go.Scatter(x=eval_loss[0], y=eval_loss[1], name="Eval Loss", mode="lines"),
    row=1,
    col=2,
)

# Add figure title
fig.update_layout(title="Train and Eval Loss", xaxis_title="Steps", yaxis_title="Loss")

# Set x-axis title
fig.update_xaxes(title_text="Steps")

# Set y-axes titles
fig.update_yaxes(title_text="Loss")

# Show plot
fig.show()

## Step8: Load the Tuned Model

 - Load the fine-tuned model using the `GenerativeModel` class with the tuning job's model endpoint name.

 - Test the tuned model with the following prompt.

In [72]:
prompt

'\nHold your arm out flat in front of you with your elbow bent. The top of your forearm should form a level surface. Apply a line of lotion from the back of your hand up your arm almost to the crease of your elbow. Squeeze lotion onto both forearms.  Do not rub the lotion into your arms, rather let it sit on your arm in the line you squeezed. You can use as much or as little lotion as you feel is necessary to cover your back completely. Bend your elbows and reach both of your arms behind you, placing the lotion covered forearms against your back. Depending on how flexible you are, this may hurt a little. It might be easier to place one arm behind your back at a time. If you have shoulder pain or are not very flexible, this method may not work well for you. Rub your forearms and the backs of your hands up and down your back like windshield wipers covering as much of your back as you can. You can use your left arm first to cover your left side and then place your right arm behind and use

In [73]:
# Load the tuned model only if the tuning job produced an endpoint.
if tuned_model_endpoint_name:
    tuned_genai_model = GenerativeModel(tuned_model_endpoint_name)
    # Test with the loaded model.
    print("***Testing***")
    print(
        tuned_genai_model.generate_content(
            contents=prompt, generation_config=generation_config
        )
    )
else:
    print("State:", sft_tuning_job.state)
    print("Error:", sft_tuning_job.error)

***Testing***
candidates {
  content {
    role: "model"
    parts {
      text: "Hold your arm out in front of you with your elbow bent. Squeeze a line of lotion onto your forearm. Reach your arm behind your back. Rub your forearm up and down your back. Repeat with your other arm.\n\n"
    }
  }
  avg_logprobs: -0.33572750091552733
  finish_reason: STOP
}
usage_metadata {
  prompt_token_count: 261
  candidates_token_count: 45
  total_token_count: 306
}



```
candidates {
  content {
    role: "model"
    parts {
      text: "Squeeze a line of lotion onto the top of each forearm. Place your forearms behind your back. Rub your forearms up and down your back.\n\n"
    }
  }
  finish_reason: STOP
  avg_logprobs: -0.39081838726997375
}
usage_metadata {
  prompt_token_count: 261
  candidates_token_count: 32
  total_token_count: 293
}

```

- The summaries generated before and after tuning differ clearly: the tuned summary is more in line with the ground truth format. (**Note**: Pre and post outputs might vary based on the set parameters.)

  - *Pre*: `This article describes a method for applying lotion to your own back using your forearms. The technique involves squeezing lotion in a line along your forearms, bending your elbows, and rubbing your arms against your back in a windshield wiper motion. This method may not be suitable for individuals with shoulder pain or limited flexibility.`
  - *Post*: `Squeeze a line of lotion onto the top of each forearm. Place your forearms behind your back. Rub your forearms up and down your back`
  - *Ground Truth*: `Squeeze a line of lotion onto the tops of both forearms and the backs of your hands. Place your arms behind your back. Move your arms in a windshield wiper motion.`
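The comparison can also be put into numbers. The sketch below is a simplified ROUGE-L based on the longest common subsequence, written in plain Python so it runs without extra packages. It uses lowercase whitespace tokens and no stemming, so its values are an approximation and will not match the `rouge_score` library used elsewhere in this notebook. The function names are hypothetical.

```python
from typing import List, Tuple


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def simple_rouge_l(reference: str, candidate: str) -> Tuple[float, float, float]:
    """Return (precision, recall, f-measure) using lowercase whitespace tokens."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0, 0.0, 0.0
    lcs = lcs_length(ref, cand)
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    f_measure = 0.0 if lcs == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


ground_truth = (
    "Squeeze a line of lotion onto the tops of both forearms and the backs of "
    "your hands. Place your arms behind your back. Move your arms in a "
    "windshield wiper motion."
)
pre = (
    "This article describes a method for applying lotion to your own back using "
    "your forearms. The technique involves squeezing lotion in a line along your "
    "forearms, bending your elbows, and rubbing your arms against your back in a "
    "windshield wiper motion. This method may not be suitable for individuals "
    "with shoulder pain or limited flexibility."
)
post = (
    "Squeeze a line of lotion onto the top of each forearm. Place your forearms "
    "behind your back. Rub your forearms up and down your back"
)

for label, text in (("pre", pre), ("post", post)):
    p, r, f = simple_rouge_l(ground_truth, text)
    print(f"{label}: precision={p:.3f} recall={r:.3f} fmeasure={f:.3f}")
```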

## Step9: Evaluation post model tuning

<div class="alert alert-block alert-warning">
<b>⚠️ It will take ~5 mins for the evaluation on the provided batch. ⚠️</b>
</div>

In [76]:
# debug run evaluation
evaluation_df_post_tuning = run_evaluation(tuned_genai_model, [corpus_batch[0]])

{'input_text': 'Hold your arm out flat in front of you with your elbow bent. The top of your forearm should form a level surface. Apply a line of lotion from the back of your hand up your arm almost to the crease of your elbow. Squeeze lotion onto both forearms.  Do not rub the lotion into your arms, rather let it sit on your arm in the line you squeezed. You can use as much or as little lotion as you feel is necessary to cover your back completely. Bend your elbows and reach both of your arms behind you, placing the lotion covered forearms against your back. Depending on how flexible you are, this may hurt a little. It might be easier to place one arm behind your back at a time. If you have shoulder pain or are not very flexible, this method may not work well for you. Rub your forearms and the backs of your hands up and down your back like windshield wipers covering as much of your back as you can. You can use your left arm first to cover your left side and then place your right arm b

In [74]:
# run evaluation
evaluation_df_post_tuning = run_evaluation(tuned_genai_model, corpus_batch)

 47%|████▋     | 47/100 [01:08<00:43,  1.21it/s]

Error: 429 Online prediction request quota exceeded for gemini-1.5-flash. Please try again later with backoff.
Error: 429 Online prediction request quota exceeded for gemini-1.5-flash. Please try again later with backoff.



100%|██████████| 100/100 [01:38<00:00,  1.02it/s]
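The 429 responses above ask the caller to retry with backoff. One way to do that is sketched below, under stated assumptions: `call_with_backoff` and `FakeQuotaError` are hypothetical names, the retry limits are arbitrary, and detecting the error by looking for "429" in the message is a simplification. The failing request is simulated so the snippet runs without calling the model.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying retryable errors with exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            last_attempt = attempt == max_attempts - 1
            if last_attempt or not is_retryable(exc):
                raise
            # Double the delay on each attempt, capped at max_delay, plus jitter.
            delay = min(max_delay, base_delay * 2**attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


class FakeQuotaError(Exception):
    """Stand-in for the quota error shown in the output above."""


calls = {"count": 0}


def flaky_request() -> str:
    # Simulated request: fails twice with a quota error, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeQuotaError("429 Online prediction request quota exceeded")
    return "summary text"


delays: List[float] = []
result = call_with_backoff(
    flaky_request,
    is_retryable=lambda exc: "429" in str(exc),
    sleep=delays.append,  # record delays instead of sleeping, for the demo
)
print(result, calls["count"], len(delays))
```

In the notebook, `fn` would wrap the `generate_content` call made for each sample, so that rate-limited samples are retried instead of being dropped from the evaluation.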


In [77]:
evaluation_df_post_tuning.head()

Unnamed: 0,document,summary,generated_summary,scores,rouge1_precision,rouge1_recall,rouge1_fmeasure,rouge2_precision,rouge2_recall,rouge2_fmeasure,rougeL_precision,rougeL_recall,rougeL_fmeasure
0,Hold your arm out flat in front of you with yo...,Squeeze a line of lotion onto the tops of both...,Hold your arm out in front of you with your el...,"{'rouge1': (0.5135135135135135, 0.612903225806...",0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765


In [78]:
evaluation_df_post_tuning_stats = evaluation_df_post_tuning.dropna().describe()

In [79]:
# Statistics of the evaluation dataframe post model tuning.
evaluation_df_post_tuning_stats

Unnamed: 0,rouge1_precision,rouge1_recall,rouge1_fmeasure,rouge2_precision,rouge2_recall,rouge2_fmeasure,rougeL_precision,rougeL_recall,rougeL_fmeasure
count,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
mean,0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765
std,,,,,,,,,
min,0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765
25%,0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765
50%,0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765
75%,0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765
max,0.513514,0.612903,0.558824,0.25,0.3,0.272727,0.378378,0.451613,0.411765


In [81]:
print(
    "Mean rougeL_precision is", evaluation_df_post_tuning_stats.rougeL_precision["mean"]
)

Mean rougeL_precision is 0.3783783783783784


#### Improvement

In [84]:
improvement = round(
    (
        (
            evaluation_df_post_tuning_stats.rougeL_precision["mean"]
            - evaluation_df_stats.rougeL_precision["mean"]
        )
        / evaluation_df_stats.rougeL_precision["mean"]
    )
    * 100,
    2,
)
print(
    f"Model tuning has improved the rougeL_precision by {improvement}% (result might differ based on each tuning iteration)"
)

0.3783783783783784
0.19298245614035087
Model tuning has improved the rougeL_precision by 96.07% (result might differ based on each tuning iteration)
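The percentage above is the relative change of the mean `rougeL_precision` with respect to the pre-tuning baseline. As a quick check of the arithmetic, the sketch below recomputes it from the two means printed earlier; `relative_improvement` is a hypothetical helper name.

```python
def relative_improvement(before: float, after: float) -> float:
    """Percentage change from before to after, relative to before."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100


baseline = 0.19298245614035087  # mean rougeL_precision before tuning (from above)
tuned = 0.3783783783783784  # mean rougeL_precision after tuning (from above)
print(round(relative_improvement(baseline, tuned), 2))  # 96.07
```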


## Conclusion

Performance could be further improved by:
- Adding more training samples. In general, improve the quality and/or quantity of your training data to get a more diverse and comprehensive dataset for your task.
- Tuning the hyperparameters, such as epochs and learning rate multiplier.
  - To find the optimal number of epochs for your dataset, we recommend experimenting with different values. While increasing epochs can lead to better performance, it's important to be mindful of overfitting, especially with smaller datasets. If you see signs of overfitting, reducing the number of epochs can help mitigate the issue.
- Trying different prompt structures/formats and opting for the one with better performance.
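One rough way to look for signs of overfitting is to check whether the evaluation loss has stopped improving while training continues. The sketch below assumes step and loss lists shaped like those returned by `get_metrics` in Step7; the helper names, the `patience` value, and the sample numbers are all invented for illustration.

```python
from typing import List, Tuple


def best_eval_step(steps: List[int], eval_losses: List[float]) -> Tuple[int, float]:
    """Return the (step, loss) pair with the lowest evaluation loss."""
    if not steps or len(steps) != len(eval_losses):
        raise ValueError("steps and eval_losses must be non-empty and equal in length")
    idx = min(range(len(eval_losses)), key=eval_losses.__getitem__)
    return steps[idx], eval_losses[idx]


def looks_overfit(eval_losses: List[float], patience: int = 2) -> bool:
    """True if the last `patience` evaluations did not beat the earlier best loss."""
    if patience < 1 or len(eval_losses) <= patience:
        return False
    best_before = min(eval_losses[:-patience])
    return min(eval_losses[-patience:]) >= best_before


# Invented values for illustration only.
steps = [10, 20, 30, 40, 50]
eval_losses = [1.20, 0.95, 0.88, 0.91, 0.97]

print(best_eval_step(steps, eval_losses))
print(looks_overfit(eval_losses))
```

With real data, the inputs would come from `get_metrics(metric="/eval_total_loss")`. If the lowest evaluation loss occurs well before the final step, fewer epochs may be enough.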

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.


Otherwise, you can delete the individual resources you created in this tutorial.

Refer to these [instructions](https://cloud.google.com/vertex-ai/docs/tutorials/image-classification-custom/cleanup#delete_resources) to delete the resources from the console.

In [None]:
# Delete Experiment.
delete_experiments = True
if delete_experiments:
    experiments_list = aiplatform.Experiment.list()
    for experiment in experiments_list:
        if experiment.resource_name == experiment_name:
            print(experiment.resource_name)
            experiment.delete()
            break

print("***" * 10)

# Delete Endpoint.
delete_endpoint = True
# If force is set to True, all deployed models on this
# Endpoint will be first undeployed.
if delete_endpoint:
    for endpoint in aiplatform.Endpoint.list():
        if endpoint.resource_name == tuned_model_endpoint_name:
            print(endpoint.resource_name)
            endpoint.delete(force=True)
            break

print("***" * 10)

# Delete Cloud Storage Bucket.
delete_bucket = True
if delete_bucket:
    ! gsutil -m rm -r $BUCKET_URI