<a href="https://colab.research.google.com/github/frank-morales2020/MLxDL/blob/main/FT_GEMINI_NASA_VERTEXAI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
!pip install colab-env -q
!pip install google-genai google-cloud-aiplatform -q
!pip install rouge-score -q

## Dataset Format (JSON)

In [None]:
{
  "contents": [
    {
      "role": "user",
      "parts": [
        {
          "text": "Engine sensor readings over time: [1.0, 41.9993, 0.8409, 100.0, 445.0, 548.68, 1343.85, 1111.03, 3.91, 5.69, 137.26, 2211.96, 8296.96, ..., 8054.65, 9.2728, 0.02, 331.0, 2223.0, 100.0, 14.78, 8.8922]"
        }
      ]
    },
    {
      "role": "model",
      "parts": [
        {
          "text": "Remaining Useful Life: 0"
        }
      ]
    }
  ]
}
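
The record layout above can be produced programmatically. The following is a minimal sketch, assuming the sensor readings are already available as a flat list of floats and the label as an integer; the helper names `make_record` and `to_jsonl` are hypothetical and not part of the original notebook.

```python
import json


def make_record(sensor_readings, rul):
    """Build one example in the user/model 'contents' layout shown above."""
    prompt = f"Engine sensor readings over time: {list(sensor_readings)}"
    answer = f"Remaining Useful Life: {int(rul)}"
    return {
        "contents": [
            {"role": "user", "parts": [{"text": prompt}]},
            {"role": "model", "parts": [{"text": answer}]},
        ]
    }


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Illustrative values only; real rows come from the C-MAPSS files.
example = make_record([1.0, 41.9993, 0.8409], 0)
jsonl_text = to_jsonl([example])
```

Each line of the resulting text is one training example, which is the shape the `.jsonl` files referenced in the tuning cell are expected to have.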

## Fine Tuning

In [None]:
from vertexai.preview.tuning import sft
import vertexai
import os
from google.colab import auth
import colab_env
import time

# Project details (replace with your values if not using env vars)
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
REGION = os.environ.get("GOOGLE_CLOUD_REGION")
BUCKET_NAME = os.environ.get("GOOGLE_CLOUD_BUCKET_NAME")
STAGING_BUCKET = f"gs://{BUCKET_NAME}/staging"

# Authentication and Initialization
auth.authenticate_user()
vertexai.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)

# Define your tuning parameters
BASE_MODEL = "gemini-2.0-flash-001"  # Using Gemini 2.0 Flash

TRAIN_DATASET_URI = f"gs://{BUCKET_NAME}/cmapss_FD004_train_text.jsonl"  # Path to your training data in JSONL format
VALIDATION_DATASET_URI = f"gs://{BUCKET_NAME}/cmapss_FD004_test_text.jsonl"  # Path to your validation data in JSONL format
TUNED_MODEL_DISPLAY_NAME = "cmapss-text-tuned-gemini-2.0-flash-001"
EPOCHS = 10  # Adjust as needed
LEARNING_RATE_MULTIPLIER = 1.0  # Adjust as needed



# Start the fine-tuning job
try:
    sft_tuning_job = sft.train(
        source_model=BASE_MODEL,
        train_dataset=TRAIN_DATASET_URI,
        validation_dataset=VALIDATION_DATASET_URI,
        tuned_model_display_name=TUNED_MODEL_DISPLAY_NAME,
        epochs=EPOCHS,
        learning_rate_multiplier=LEARNING_RATE_MULTIPLIER,
    )


    print(f"Tuning job started: {sft_tuning_job.resource_name}")

    # Poll until the job reaches a terminal state. `state` is an enum (not a
    # string) and is cached locally, so refresh() is needed to pull the
    # latest status from the service.
    while not sft_tuning_job.has_ended:
        print(f"Job status: {sft_tuning_job.state.name}, waiting...")
        time.sleep(60)  # Wait for 60 seconds before checking again
        sft_tuning_job.refresh()

    print(f"Tuning job completed with status: {sft_tuning_job.state.name}. Resource name: {sft_tuning_job.resource_name}")



except Exception as e:
    print(f"An error occurred: {e}")
    print("Please double-check the base model name and your Vertex AI setup.")
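
The polling step can be factored into a small helper with an upper bound on the number of checks, so a stuck job does not block the notebook indefinitely. This is a sketch only: it assumes the job object exposes a `has_ended` property and a `refresh()` method, and the names `wait_for_job`, `max_polls`, and `FakeJob` are hypothetical.

```python
import time


def wait_for_job(job, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Poll `job` until it ends; return how many times it was refreshed.

    Assumes `job` has a `has_ended` property and a `refresh()` method.
    Raises TimeoutError if `max_polls` refreshes pass without the job ending.
    """
    polls = 0
    while not job.has_ended:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError(f"Job still running after {polls} polls")
        sleep(poll_seconds)
        job.refresh()
        polls += 1
    return polls


class FakeJob:
    """Stand-in for a tuning job, used only to exercise the helper offline."""

    def __init__(self, polls_until_done):
        self.remaining = polls_until_done

    @property
    def has_ended(self):
        return self.remaining <= 0

    def refresh(self):
        self.remaining -= 1


# Demo with no real waiting: the fake job finishes after three refreshes.
polls_used = wait_for_job(FakeJob(3), poll_seconds=0, sleep=lambda s: None)
```

Passing `sleep` as a parameter keeps the helper testable without waiting in real time.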

In [None]:
from google.cloud import aiplatform

# Initialize the Vertex AI SDK
aiplatform.init(project=PROJECT_ID, location=REGION)

# List all custom models
models = aiplatform.Model.list()

print("Inspecting the metadata of all models:")
for model in models:
    print(f"\nDisplay Name: {model.display_name}")
    print(f"Resource Name: {model.resource_name}")
    model_dict = model.to_dict()
    print(f"Model Dictionary: {model_dict}")

In [146]:
from google.cloud import aiplatform

# Initialize the Vertex AI SDK
aiplatform.init(project=PROJECT_ID, location=REGION)

# List all custom models
models = aiplatform.Model.list()



# Filter for models that have the 'google-vertex-llm-tuning-job-id' label
fine_tuned_models = [
    model for model in models
    if model.labels is not None and 'google-vertex-llm-tuning-job-id' in model.labels and 'gemini-2.0' in model.display_name
]


# Keep only the second match (index 1); adjust the slice to select a different tuned model
fine_tuned_models = fine_tuned_models[1:2]

# Print the display names and resource names of the fine-tuned models
if fine_tuned_models:
    print("\nSuccessfully found fine-tuned models:")
    for model in fine_tuned_models:
        print(f"- Display Name: {model.display_name}")
        resource_name = model.resource_name
        model_resource_name_id = resource_name.split('/')[-1]
        print(f"- Resource Name Model: {model_resource_name_id}")
        print(f"- Resource Name Tuning Job: {model.labels['google-vertex-llm-tuning-job-id']}")
        # .get() avoids a KeyError if the label is missing
        print(f"- Tune Type: {model.labels.get('tune-type', 'unknown')}")
        print(f"- Create Time: {model.create_time}")

        # Report the endpoint the model is deployed to, if any
        if model.gca_resource.deployed_models:
            endpoints = [deployed_model.endpoint for deployed_model in model.gca_resource.deployed_models]
            endpoint_id = endpoints[0].split('/')[-1]  # Last path segment is the endpoint ID
            print(f"- Model Endpoint: {endpoint_id}")
        else:
            print("- Model Endpoint: Not Deployed")

        print('\n')
else:
    print("\nNo fine-tuned models found based on the tuning job ID label.")


Successfully found fine-tuned models:
- Display Name: cmapss-text-tuned-gemini-2.0-flash-001
- Resource Name Model: 1440268972921454592
- Resource Name Tuning Job: 5437787329584431104
- Tune Type: sft
- Create Time: 2025-04-05 09:49:50.951936+00:00
- Model Endpoint: 7903506184244559872
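
The model and endpoint IDs printed above are the last segment of a slash-separated resource name. As a rough sketch, the full name can be split into its parts as below; `parse_resource_name` is a hypothetical helper, and the project number and region in the example are placeholders rather than values from this notebook.

```python
def parse_resource_name(resource_name):
    """Split 'projects/P/locations/L/models/M' style names into a dict.

    A trailing '@version' on the final ID (for example 'models/123@1')
    is returned separately under the 'version' key.
    """
    parts = resource_name.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"Expected key/value pairs, got: {resource_name!r}")
    fields = dict(zip(parts[0::2], parts[1::2]))
    last_key = parts[-2]
    resource_id, _, version = fields[last_key].partition("@")
    fields[last_key] = resource_id
    fields["version"] = version or None
    return fields


# Placeholder project number and region; the IDs are the ones printed above.
endpoint = parse_resource_name(
    "projects/123/locations/us-central1/endpoints/7903506184244559872"
)
model = parse_resource_name(
    "projects/123/locations/us-central1/models/1440268972921454592@1"
)
```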




## Evaluation

In [None]:
!pip install rouge-score -q
!pip install google-genai -q
!pip install colab-env -q

In [151]:
from google import genai
from google.genai import types
import json
import os
import colab_env
from google.cloud import aiplatform
from google.colab import auth
from rouge_score import rouge_scorer
from tqdm import tqdm

# Authentication and Initialization
auth.authenticate_user()
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT")
PROJECT_NUMBER = os.environ.get("GOOGLE_CLOUD_PROJECT_NUMBER")
REGION = os.environ.get("GOOGLE_CLOUD_REGION")
BUCKET_NAME = os.environ.get("GOOGLE_CLOUD_BUCKET_NAME")
STAGING_BUCKET = f"gs://{BUCKET_NAME}/staging"

EVAL_DATASET_URI = f"gs://{BUCKET_NAME}/cmapss_FD004_test_text.jsonl"  # Update with your dataset URI
tuned_model_resource_name = f'projects/{PROJECT_NUMBER}/locations/{REGION}/models/1440268972921454592@1'
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=STAGING_BUCKET)

def generate_and_evaluate():
    client = genai.Client(
        vertexai=True,
        project=PROJECT_ID,
        location=REGION,
    )

    # endpoint_id and model_resource_name_id are globals set by the model-listing cell above
    model_endpoint = f"projects/{PROJECT_NUMBER}/locations/{REGION}/endpoints/{endpoint_id}"


    print('\n')
    report=f"Evaluation of the model in Vertex AI: {model_resource_name_id} in the endpoint {endpoint_id}, with the dataset: {EVAL_DATASET_URI}"
    print(report)
    print('\n\n')


    validation_dataset_uri = EVAL_DATASET_URI

    # Copy the validation dataset to the local path that is read below
    local_dataset_path = '/content/cmapss_FD004_test_text.jsonl'
    !gsutil cp {validation_dataset_uri} {local_dataset_path}
    print('\n')

    scorer = rouge_scorer.RougeScorer(['rouge1', 'rougeL'], use_stemmer=True)
    all_scores = []

    # Count lines up front so tqdm can show progress; close the file afterwards
    with open(local_dataset_path, 'r') as f:
        num_lines = sum(1 for _ in f)


    # Read and process the dataset file
    with open(local_dataset_path, 'r') as f:
        for line in tqdm(f, total=num_lines, desc="Processing dataset"):  # Set total and description
            data = json.loads(line)

            # Extract prompt and ground truth from JSON structure
            try:
                prompt = data['contents'][0]['parts'][0]['text']
                ground_truth_text = data['contents'][1]['parts'][0]['text']
            except (IndexError, KeyError):
                print("Skipping invalid data point:", line)
                continue  # Skip to the next line

            if prompt and ground_truth_text:
                contents = [prompt]

                # Generate content for the current prompt
                generated_text = ""
                try:
                    for chunk in client.models.generate_content_stream(
                        model=model_endpoint,
                        contents=contents,
                        config=types.GenerateContentConfig(
                            temperature=1,
                            top_p=0.95,
                            max_output_tokens=8192,
                            response_modalities=["TEXT"],
                            safety_settings=[types.SafetySetting(category=c, threshold="OFF") for c in [
                                "HARM_CATEGORY_HATE_SPEECH",
                                "HARM_CATEGORY_DANGEROUS_CONTENT",
                                "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                                "HARM_CATEGORY_HARASSMENT",
                            ]],
                        ),
                    ):
                        # chunk.text can be None for chunks without text parts
                        generated_text += chunk.text or ""
                except Exception as e:
                    print(f"Error during text generation for prompt '{prompt[:50]}...': {e}")
                    continue  # Skip to the next line

                # Calculate ROUGE scores
                scores = scorer.score(ground_truth_text, generated_text)
                all_scores.append(scores)


    # Calculate and print average ROUGE scores
    if all_scores:
        avg_rouge1 = sum(s['rouge1'].fmeasure for s in all_scores) / len(all_scores)
        avg_rougeL = sum(s['rougeL'].fmeasure for s in all_scores) / len(all_scores)
        print('\n\n')
        print(f"Average ROUGE-1: {avg_rouge1}")
        print(f"Average ROUGE-L: {avg_rougeL}")
        print('\n')
    else:
        print("No ROUGE scores were calculated. Check the dataset and text generation process.")

generate_and_evaluate()



Evaluation of the model in Vertex AI: 1440268972921454592 in the endpoint 7903506184244559872, with the dataset: gs://poc-my-new-staging-bucket-2025-1/cmapss_FD004_test_text.jsonl



Copying gs://poc-my-new-staging-bucket-2025-1/cmapss_FD004_test_text.jsonl...
/ [1 files][  1.4 MiB/  1.4 MiB]                                                
Operation completed over 1 objects/1.4 MiB.                                      




Processing dataset: 100%|██████████| 252/252 [01:29<00:00,  2.81it/s]




Average ROUGE-1: 0.75
Average ROUGE-L: 0.75







Interpretation:

* ROUGE-1: Measures the overlap of unigrams (individual words) between the generated text and the ground truth. A score of 0.75 means that, on average, about three quarters of the tokens match.

* ROUGE-L: Measures the longest common subsequence (LCS) between the generated text and the ground truth, so it also rewards matching word order. A score of 0.75 indicates the same level of agreement once token order is taken into account.
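
To make the two metrics concrete, here is a simplified pure-Python version of both F-measures. It is a sketch, not the `rouge_score` implementation: it lowercases and splits on non-alphanumeric characters but skips stemming, and the function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def f_measure(overlap, n_ref, n_pred):
    """Harmonic mean of precision (overlap/n_pred) and recall (overlap/n_ref)."""
    if overlap == 0:
        return 0.0
    precision, recall = overlap / n_pred, overlap / n_ref
    return 2 * precision * recall / (precision + recall)


def rouge1_f(reference, prediction):
    """Unigram-overlap F-measure."""
    ref, pred = tokenize(reference), tokenize(prediction)
    overlap = sum((Counter(ref) & Counter(pred)).values())
    return f_measure(overlap, len(ref), len(pred))


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rougeL_f(reference, prediction):
    """LCS-based F-measure."""
    ref, pred = tokenize(reference), tokenize(prediction)
    return f_measure(lcs_length(ref, pred), len(ref), len(pred))


reference = "Remaining Useful Life: 0"
same = rouge1_f(reference, "Remaining Useful Life: 0")          # all 4 tokens match
wrong_number = rouge1_f(reference, "Remaining Useful Life: 7")  # 3 of 4 tokens match
```

With references of the form used in this dataset, a prediction that repeats the three template words but gives a different number matches three of four tokens on both metrics.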

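1. These scores need to be read against the target format. Every reference is a four-token string, "Remaining Useful Life: N", and three of those tokens are a fixed template. A four-token prediction that reproduces the template but gets the number wrong scores 0.75 on both metrics, so an average of 0.75 is consistent with the model having learned the output format without necessarily predicting the RUL value correctly.

2. The interpretation of ROUGE scores depends on the task. Generic rules of thumb (for example, treating scores above 0.5 as reasonably good and scores above 0.7 as very good) do not transfer well to short, templated outputs like these. For a regression-style target, a numeric metric computed on the parsed RUL value, such as exact match, MAE, or RMSE, is a more direct measure of accuracy.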
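
Because the target is a number wrapped in a fixed phrase, the predictions can also be scored numerically. The sketch below assumes both the reference and the generated text follow the "Remaining Useful Life: N" pattern; `parse_rul` and `rul_metrics` are hypothetical helpers, and the example pairs are illustrative rather than results from this run.

```python
import math
import re

RUL_PATTERN = re.compile(r"Remaining Useful Life:\s*(-?\d+(?:\.\d+)?)", re.IGNORECASE)


def parse_rul(text):
    """Return the RUL value in `text` as a float, or None if absent."""
    match = RUL_PATTERN.search(text)
    return float(match.group(1)) if match else None


def rul_metrics(pairs):
    """Compute exact match, MAE, and RMSE over (reference, prediction) pairs.

    Pairs where either side cannot be parsed are counted as 'unparsed'
    and excluded from the error statistics.
    """
    errors, unparsed = [], 0
    for reference, prediction in pairs:
        y_true, y_pred = parse_rul(reference), parse_rul(prediction)
        if y_true is None or y_pred is None:
            unparsed += 1
            continue
        errors.append(y_pred - y_true)
    n = len(errors)
    if n == 0:
        return {"n": 0, "unparsed": unparsed, "exact_match": None, "mae": None, "rmse": None}
    return {
        "n": n,
        "unparsed": unparsed,
        "exact_match": sum(e == 0 for e in errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
    }


# Illustrative pairs only, not outputs of the tuned model.
demo = rul_metrics([
    ("Remaining Useful Life: 10", "Remaining Useful Life: 10"),
    ("Remaining Useful Life: 20", "Remaining Useful Life: 26"),
    ("Remaining Useful Life: 5", "no numeric answer"),
])
```

In the evaluation loop, the `(ground_truth_text, generated_text)` pairs could be collected alongside the ROUGE scores and passed to `rul_metrics` once the loop finishes.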