This notebook walks through the steps needed to evaluate model responses against several metrics using the Vertex Gen AI Evaluation Service API.

### Install required packages

In [1]:
!pip install langchain-google-vertexai --quiet


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.0[0m[39;49m -> [0m[32;49m25.1.1[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


### Import required packages

In [2]:
import datetime
import vertexai
from vertexai.evaluation import (
    EvalTask,
    MetricPromptTemplateExamples,
)
import pandas as pd
from vertexai.preview.evaluation import notebook_utils

In [3]:
PROJECT_ID='qwiklabs-gcp-03-7a8bdf6e2e2c'
vertexai.init(project=PROJECT_ID, location='us-central1')

### Agent Response Dataset

In [4]:
# Dataset built from a few example runs of the agent: each entry holds
# the question, the agent's response, and the context the agent used
dataset = [
    {
        "question": "what was the name of ADS smartphone app?", 
        "response": "The ADS smartphone app is called “SnowLine”.",
        "context": "Yes. The ADS “SnowLine” app offers real-time plow tracking, road conditions, and the ability to submit service requests directly from your phone."
    },
    {
        "question": "What was the name of the CFO of ADS?", 
        "response": "The name of the CFO of ADS is Janet Kirk.",
        "context": "The current CFO is Janet Kirk, appointed in 2022. She oversees all financial operations, including cost management and budget forecasting."
    },
    {
        "question": "What year was the ADS established?", 
        "response": "The Alaska Department of Snow was established in 1959.",
        "context": "The Alaska Department of Snow (ADS) was established in 1959, coinciding with Alaska’s admission as a U.S. state."
    },
    {
        "question": "List the vehicles ADS operate", 
        "response": "Yes, in mountainous areas, ADS collaborates with the Alaska Department of Transportation and local authorities for controlled avalanche mitigation.",
        "context": "ADS operates a fleet of snowplows, graders, and specialized “snow blowers” designed for extreme weather. Some remote regions also use tracked vehicles."
    },
    {
        "question": "Give me the toll free number of ADS", 
        "response": "You can reach ADS statewide at 1-800-SNOW-ADS (1-800-766-9237).",
        "context": "Yes. You can reach ADS statewide at 1-800-SNOW-ADS (1-800-766-9237) for general information and to be redirected to your local office."
    }
]

In [5]:
# Build the evaluation dataset in the format the Evaluation Service API expects:
# one "prompt" column (question plus context) and one "response" column
eval_dataset = pd.DataFrame({
        "prompt": [
            "Answer the question: " + data['question'] + " Context: " + data['context']
            for data in dataset
        ],
        "response": [data['response'] for data in dataset],
})
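
To make the prompt format concrete, here is a minimal sketch of what a single row of the `prompt` column looks like. The `build_prompt` helper is hypothetical (the notebook builds the string inline in a list comprehension); the record is the third entry of `dataset`.

```python
# Hypothetical helper that mirrors the inline string concatenation used
# to build the "prompt" column of eval_dataset.
def build_prompt(record: dict) -> str:
    return "Answer the question: " + record["question"] + " Context: " + record["context"]


# Third record of the dataset above (question and context only).
sample = {
    "question": "What year was the ADS established?",
    "context": "The Alaska Department of Snow (ADS) was established in 1959, "
               "coinciding with Alaska’s admission as a U.S. state.",
}

prompt = build_prompt(sample)
print(prompt)
```

Because the context is embedded in the prompt itself, the metrics that judge a response against the prompt can check it against that context.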

### Run Evaluation

In [6]:
# Run the evaluation task and view the results
run_ts = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
eval_task = EvalTask(
    dataset=eval_dataset,
    metrics=[
        "question_answering_quality",
        MetricPromptTemplateExamples.Pointwise.GROUNDEDNESS,
        MetricPromptTemplateExamples.Pointwise.VERBOSITY,
        MetricPromptTemplateExamples.Pointwise.INSTRUCTION_FOLLOWING,
        MetricPromptTemplateExamples.Pointwise.SAFETY
    ],
    experiment=f"alaska-dept-of-snow-faqs-{run_ts}"
)

result = eval_task.evaluate()

Associating projects/569779670169/locations/us-central1/metadataStores/default/contexts/alaska-dept-of-snow-faqs-20250617-221748-af9aa0db-ad36-4a65-b30c-860c0c864acd to Experiment: alaska-dept-of-snow-faqs-20250617-221748


Computing metrics with a total of 25 Vertex Gen AI Evaluation Service API requests.


100%|██████████| 25/25 [00:03<00:00,  7.16it/s]

All 25 metric requests are successfully computed.
Evaluation Took:3.525132750000921 seconds





In [7]:
notebook_utils.display_eval_result(eval_result=result)

### Summary Metrics

| | row_count | question_answering_quality/mean | question_answering_quality/std | groundedness/mean | groundedness/std | verbosity/mean | verbosity/std | instruction_following/mean | instruction_following/std | safety/mean | safety/std |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5.0 | 4.2 | 1.788854 | 0.8 | 0.447214 | -0.4 | 0.894427 | 4.2 | 1.788854 | 1.0 | 0.0 |
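
The summary values can be reproduced from the per-row scores reported under Row-based Metrics. The sketch below uses only the standard library and assumes the reported `std` is the sample standard deviation (n - 1 in the denominator), which is what the numbers in this run are consistent with. The score lists are copied by hand from this run's output.

```python
import statistics

# Per-row scores copied from the Row-based Metrics output of this run.
scores = {
    "question_answering_quality": [5.0, 5.0, 5.0, 1.0, 5.0],
    "groundedness": [1.0, 1.0, 1.0, 0.0, 1.0],
    "verbosity": [0.0, 0.0, 0.0, -2.0, 0.0],
    "instruction_following": [5.0, 5.0, 5.0, 1.0, 5.0],
    "safety": [1.0, 1.0, 1.0, 1.0, 1.0],
}

# Mean and sample standard deviation (assumption: ddof = 1) per metric.
summary = {
    name: {"mean": statistics.mean(values), "std": statistics.stdev(values)}
    for name, values in scores.items()
}

for name, stats in summary.items():
    print(f"{name}: mean={stats['mean']:.1f}, std={stats['std']:.6f}")
```

A single failing row (row 3) accounts for all of the spread: every metric except safety drops on that row only.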


### Row-based Metrics

| | prompt | response | question_answering_quality/explanation | question_answering_quality/score | groundedness/explanation | groundedness/score | verbosity/explanation | verbosity/score | instruction_following/explanation | instruction_following/score | safety/explanation | safety/score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Answer the question: what was the name of ADS ... | The ADS smartphone app is called “SnowLine”. | The response follows instructions, is grounded... | 5.0 | The response is fully grounded in the provided... | 1.0 | The response is perfectly concise and provides... | 0.0 | The response accurately answers the question b... | 5.0 | The response is safe as it does not contain an... | 1.0 |
| 1 | Answer the question: What was the name of the ... | The name of the CFO of ADS is Janet Kirk. | The response correctly identifies the CFO of A... | 5.0 | The response is fully grounded as the response... | 1.0 | The response is perfectly concise, providing a... | 0.0 | The response accurately answers the question b... | 5.0 | The response is safe as it does not contain an... | 1.0 |
| 2 | Answer the question: What year was the ADS est... | The Alaska Department of Snow was established ... | The response follows instructions, is grounded... | 5.0 | The response is fully grounded in the provided... | 1.0 | The response is perfectly concise, providing a... | 0.0 | The response accurately answers the question b... | 5.0 | The response is safe because it does not conta... | 1.0 |
| 3 | Answer the question: List the vehicles ADS ope... | Yes, in mountainous areas, ADS collaborates wi... | The response doesn't answer the question about... | 1.0 | The response mentions ADS collaborating with t... | 0.0 | The response fails to answer the question and ... | -2.0 | The response does not answer the question abou... | 1.0 | The response does not contain any hate speech,... | 1.0 |
| 4 | Answer the question: Give me the toll free num... | You can reach ADS statewide at 1-800-SNOW-ADS ... | The response follows instructions, is grounded... | 5.0 | The response is completely grounded in the pro... | 1.0 | The response is perfectly concise, providing a... | 0.0 | The response correctly identifies and provides... | 5.0 | The response is safe as it does not contain an... | 1.0 |
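
With only five rows the failing example is easy to spot by eye, but on a larger dataset it helps to flag weak rows programmatically. The following is a minimal sketch using plain Python and scores copied from the table above. The thresholds are hypothetical, not values defined by the Evaluation Service. On the live `result` object, the same score columns would presumably be read from the per-row DataFrame the SDK returns (assumed here to be `result.metrics_table`).

```python
# Scores copied from the Row-based Metrics output of this run.
rows = [
    {"row": 0, "question_answering_quality/score": 5.0, "groundedness/score": 1.0},
    {"row": 1, "question_answering_quality/score": 5.0, "groundedness/score": 1.0},
    {"row": 2, "question_answering_quality/score": 5.0, "groundedness/score": 1.0},
    {"row": 3, "question_answering_quality/score": 1.0, "groundedness/score": 0.0},
    {"row": 4, "question_answering_quality/score": 5.0, "groundedness/score": 1.0},
]

# Hypothetical thresholds chosen for illustration only.
QA_QUALITY_MIN = 3.0
GROUNDEDNESS_MIN = 1.0

# Flag a row if it falls below either threshold.
flagged = [
    r["row"]
    for r in rows
    if r["question_answering_quality/score"] < QA_QUALITY_MIN
    or r["groundedness/score"] < GROUNDEDNESS_MIN
]

print(flagged)
```

In this run the only flagged row is the "List the vehicles ADS operate" question, where the response talks about avalanche mitigation instead of the fleet described in the context.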
