# Automated Prompt Evaluation for a RAG Use Case in Production


This notebook should be run with Runtime 22.2 and Python 3.10 or later. If you are viewing it in Watson Studio and do not see Python 3.10.x in the upper-right corner of your screen, update the runtime now.

The notebook creates a retrieval-augmented generation (RAG) prompt template asset in a given project, then configures OpenScale to monitor that prompt template asset and evaluate generative AI quality metrics and model health metrics. The test data used in this notebook is loaded into memory.

If you want to run this notebook for task types other than `retrieval_augmented_generation`, consult [this](https://github.com/IBM/watson-openscale-samples/blob/main/IBM%20Cloud/WML/notebooks/watsonx/README.md) document for guidance on evaluating prompt templates for the available task types.

Note: Search for `EDIT THIS` to find the inputs you need to fill in.

## Prerequisite

* Service credentials for IBM Watson OpenScale
* A CSV file containing the test data to be evaluated
* The ID of the project in which you want to create the prompt template asset
* The ID of a deployment space to promote the prompt template asset to (if you use an existing space)
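Most of these inputs end up as plain strings set near the top of the notebook, so it can help to fail fast when one is still blank. A minimal sketch, assuming the inputs are gathered into a dict; `find_missing_inputs` is a hypothetical helper, not part of any IBM SDK.

```python
def find_missing_inputs(inputs: dict) -> list:
    """Return the names of required inputs that are empty or whitespace-only."""
    return [name for name, value in inputs.items() if not str(value or "").strip()]


# Hypothetical values; in the notebook these would be CLOUD_API_KEY, PROJECT_ID and space_id.
missing = find_missing_inputs({
    "CLOUD_API_KEY": " ",
    "PROJECT_ID": "",
    "space_id": "example-space-id",
})
print(missing)  # ['CLOUD_API_KEY', 'PROJECT_ID']
```

Raising an error when `missing` is non-empty stops the run before any service call fails with a less obvious message.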

### Contents

- [Setup](#settingup)
- [Create Prompt template](#prompt)
- [Prompt Setup](#ptatsetup)
- [Risk evaluations for prompt template asset subscription](#evaluate)
- [Display the Model Risk metrics](#mrmmetric)
- [Display the Generative AI Quality metrics](#genaimetrics)
- [Plot rougel and rougelsum metrics against records](#plotproject)
- [See factsheets information](#factsheetsspace)

## Setup <a name="settingup"></a>

In [None]:
!pip install --upgrade datasets==2.10.0 --no-cache | tail -n 1
!pip install --upgrade evaluate --no-cache | tail -n 1
!pip install --upgrade ibm-aigov-facts-client | tail -n 1
!pip install --upgrade ibm-watson-openscale | tail -n 1
!pip install --upgrade ibm-watsonx-ai | tail -n 1
!pip install --upgrade matplotlib | tail -n 1
!pip install --upgrade pydantic==2.7.4 --no-cache | tail -n 1
!pip install --upgrade sacrebleu --no-cache | tail -n 1
!pip install --upgrade sacremoses --no-cache | tail -n 1
!pip install --upgrade textstat --no-cache | tail -n 1
!pip install --upgrade transformers --no-cache | tail -n 1

Note: you may need to restart the kernel to use updated packages.

In [None]:
!pip install --upgrade pydantic==2.7.4 --no-cache | tail -n 1

### Provision services and configure credentials

If you have not already, provision an instance of IBM Watson OpenScale using the [OpenScale link in the Cloud catalog](https://cloud.ibm.com/catalog/services/watson-openscale).

Your Cloud API key can be generated by going to the [**Users** section of the Cloud console](https://cloud.ibm.com/iam#/users). From that page, click your name, scroll down to the **API Keys** section, and click **Create an IBM Cloud API key**. Give your key a name and click **Create**, then copy the created key and paste it below.

**NOTE:** You can also create the OpenScale `API_KEY` using the IBM Cloud CLI.

How to install the IBM Cloud CLI (formerly the Bluemix CLI): [instructions](https://console.bluemix.net/docs/cli/reference/ibmcloud/download_cli.html#install_use)

How to create an API key using the CLI:
```
ibmcloud login --sso
ibmcloud iam api-key-create my_key
```
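Rather than pasting the key into a notebook cell, you could keep it in an environment variable. A minimal sketch; the environment variable name `CLOUD_API_KEY` is an assumption chosen to match the Python variable used below, and `load_api_key` is a hypothetical helper.

```python
import os


def load_api_key(env_var: str = "CLOUD_API_KEY") -> str:
    """Read the IBM Cloud API key from an environment variable instead of hard-coding it."""
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your IBM Cloud API key.")
    return api_key
```

For example, `CLOUD_API_KEY = load_api_key()` could replace the hard-coded string in the credentials cell, which keeps the key out of saved notebooks.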

In [2]:
IAM_URL = "https://iam.cloud.ibm.com"
DATAPLATFORM_URL = "https://api.dataplatform.cloud.ibm.com"
#DATAPLATFORM_URL = "https://api.eu-de.dataplatform.cloud.ibm.com"
#DATAPLATFORM_URL = "https://api.au-syd.dataplatform.cloud.ibm.com"
SERVICE_URL = "https://aiopenscale.cloud.ibm.com"
#SERVICE_URL = "https://au-syd.aiopenscale.cloud.ibm.com"
CLOUD_API_KEY = ""  # EDIT THIS: your IBM Cloud API key
WML_CREDENTIALS = {
                "url": "https://us-south.ml.cloud.ibm.com",
                "apikey": CLOUD_API_KEY,
                "auth_url": IAM_URL,
                "wml_location" : "cloud"
}


## Set the project ID

To set up a development-type subscription, the prompt template asset (PTA) must be in a project. Provide the ID of the project in which the PTA should be created.

In [3]:
PROJECT_ID = ""  # EDIT THIS: your project ID

## Read the space ID

You can promote the prompt template asset to an existing space or to a new space. Choose an option with the variables below.

In [4]:
use_existing_space = True  # EDIT THIS: set to False to create a new space
space_id = ""  # EDIT THIS: ID of the deployment space

In [5]:
import json
from ibm_watsonx_ai import APIClient

wml_client = APIClient(WML_CREDENTIALS)
wml_client.version

'1.3.39'

## Function to create the access token

This function generates an IAM access token using the provided credentials. The API calls for creating and scoring prompt template assets utilize the token generated by this function.
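The token is generated once and reused by the later REST calls. IAM access tokens expire (the token response reports the lifetime in seconds in `expires_in`), so a long-running session may need a fresh one. A minimal caching sketch, assuming `fetch` returns the parsed JSON of the IAM token response; `CachedToken` is a hypothetical helper, and the fake fetcher and clock exist only so the example runs offline.

```python
import time
from typing import Callable, Optional


class CachedToken:
    """Cache an IAM access token and fetch a new one shortly before it expires.

    `fetch` returns a dict with "access_token" and "expires_in" (seconds),
    such as the parsed JSON of the IAM /identity/token response.
    """

    def __init__(self, fetch: Callable[[], dict], margin_seconds: int = 300,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Fetch when there is no token yet or it expires within the safety margin.
        if self._token is None or self._clock() >= self._expires_at - self._margin:
            data = self._fetch()
            self._token = data["access_token"]
            self._expires_at = self._clock() + float(data["expires_in"])
        return self._token


# Offline demo with a fake IAM response and a controllable clock.
calls = []
now = [1000.0]

def fake_fetch() -> dict:
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}

cache = CachedToken(fake_fetch, clock=lambda: now[0])
first = cache.get()   # fetches token-1, valid until t=4600
now[0] += 1000
second = cache.get()  # t=2000: still token-1
now[0] += 2400
third = cache.get()   # t=4400 is inside the 300 s margin, so token-2 is fetched
```

In the notebook, `fetch` could wrap the `requests.post(...)` call from `generate_access_token` and return `response.json()`; the `Authorization` header would then be built from `cache.get()` right before each request.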

In [6]:
import requests

def generate_access_token():
    """Exchange CLOUD_API_KEY for an IAM access token."""
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    data = {
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": CLOUD_API_KEY,
        "response_type": "cloud_iam"
    }
    response = requests.post(IAM_URL + "/identity/token", data=data, headers=headers)
    response.raise_for_status()  # fail with the HTTP error instead of a KeyError on a bad API key
    return response.json()["access_token"]

iam_access_token = generate_access_token()
# Avoid printing the bearer token itself: it grants access to your account until it expires.
print("IAM access token generated.")

# Demo Dataset <a name="alternative"></a>


This demo dataset is an alternative for testing on low-resource Cloud Pak for Data (CPD) clusters.
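The later cells read specific columns from this file: `context1` to `context3` and `question` as template variables, `answer` as the label column, and `generated_text` as the model output. A minimal sketch that checks the header before going further; `missing_columns` is a hypothetical helper.

```python
import csv
import io

# Columns read by later cells in this notebook.
REQUIRED_COLUMNS = {"question", "answer", "generated_text", "context1", "context2", "context3"}


def missing_columns(csv_text: str, required=REQUIRED_COLUMNS) -> set:
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return set(required) - set(header)


sample = "question,answer,generated_text,context1,context2\nQ?,A,G,c1,c2\n"
print(missing_columns(sample))  # {'context3'}
```

For the downloaded file, `missing_columns(open("RAG_data.csv", encoding="utf-8").read())` should return an empty set if it has the columns shown in the table below.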


In [7]:
!wget https://ibm.box.com/shared/static/3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv


--2025-10-01 15:22:07--  https://ibm.box.com/shared/static/3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv
Resolving ibm.box.com (ibm.box.com)... 2620:117:bff0:12d::, 74.112.186.157
Connecting to ibm.box.com (ibm.box.com)|2620:117:bff0:12d::|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /public/static/3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv [following]
--2025-10-01 15:22:08--  https://ibm.box.com/public/static/3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv
Reusing existing connection to [ibm.box.com]:443.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://ibm.ent.box.com/public/static/3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv [following]
--2025-10-01 15:22:08--  https://ibm.ent.box.com/public/static/3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv
Resolving ibm.ent.box.com (ibm.ent.box.com)... 2620:117:bff0:12d::, 74.112.186.157
Connecting to ibm.ent.box.com (ibm.ent.box.com)|2620:117:bff0:12d::|:443... connected.
HTTP request sent, awaiting response...

In [8]:
!mv 3ysiqmcqzemlbp68pc7dg7homj5jjztt.csv RAG_data.csv

In [9]:
import pandas as pd

test_data_path = "RAG_data.csv"
llm_data = pd.read_csv(test_data_path)
llm_data=llm_data.head(10)


In [10]:
llm_data.to_csv(test_data_path, index=False)  # don't write the DataFrame index as an extra column

In [11]:
llm_data

Unnamed: 0,question,generated_text,answer,context1,context2,context3
0,"After cancelling policy , how long we have to ...","If you cancel your policy, you can still make ...","Sorry, I don't have information on that.","\n<t1></t1>\nAt Bingle, we don't have the opti...",\n<t2></t2>\nWeâ€™re sorry to hear your car ha...,"\n<t3></t3>\nCompulsory Third Party Insurance,..."
1,Am I covered for asbestos in my vehicle?,"We will not cover asbestos, asbestos fibres or...","We will not cover asbestos, asbestos fibres or...",\n<t1></t1>\nâ€¢ We will not cover incidents o...,"\n<t2></t2>\nYes, if youâ€™re a salesperson, r...",\n<t3></t3>\nBingle offers third party propert...
2,Am I covered for bushfires,Our Comprehensive Car Insurance covers acciden...,Our Comprehensive Car Insurance covers acciden...,\n<t1></t1>\nMother Nature can be harsh at tim...,\n<t2></t2>\nWe will cover liability which is ...,\n<t3></t3>\nâ€¢ We will not cover incidents o...
3,Am I covered for every car I drive?,Your policy only covers you for the specific c...,Your policy only covers you for the specific c...,\n<t1></t1>\nWeâ€™re happy to cover licenced d...,\n<t2></t2>\nâ€¢ If your car cannot be driven ...,"\n<t3></t3>\nTo add hire car to your policy, y..."
4,Am I covered for fire?,Our Comprehensive Car Insurance covers acciden...,Our Comprehensive Car Insurance covers acciden...,\n<t1></t1>\nMother Nature can be harsh at tim...,\n<t2></t2>\nBingle's Third Party Property Dam...,\n<t3></t3>\nBingle's Comprehensive Car Insura...
5,Am I covered for mould in my car in comprehens...,"We do not cover mildew, mould, rust, corrosion...","We do not cover mildew, mould, rust, corrosion...",\n<t1></t1>\nTypes of loss or damage to your c...,\n<t2></t2>\nBingle's Comprehensive Car Insura...,\n<t3></t3>\nâ€¢ We will not cover incidents o...
6,Am I covered for problems with the fuel tank i...,We do not cover loss or damage to your car (in...,We do not cover loss or damage to your car (in...,\n<t1></t1>\nTypes of loss or damage to your c...,\n<t2></t2>\nYour car is described on your ins...,\n<t3></t3>\nâ€¢ We will not cover incidents o...
7,Am I covered for windscreen replacement,"With our Comprehensive car Insurance, your car...","With our Comprehensive car Insurance, your car...",\n<t1></t1>\nWe will be upfront with you. Beca...,\n<t2></t2>\nHow exciting! You will probably b...,\n<t3></t3>\nWith our Comprehensive car Insura...
8,Am I covered to get a hire car,"For an extra premium, our Comprehensive Policy...","For an extra premium, our Comprehensive Policy...",\n<t1></t1>\nWe are a busy bunch these days. T...,\n<t2></t2>\nFor an extra premium our Comprehe...,"\n<t3></t3>\nTo add hire car to your policy, y..."
9,Am I insured if I drive a manual on an auto li...,If you hold an automatic licence and drive a m...,If you hold an automatic licence and drive a m...,\n<t1></t1>\nHere are the reasons you can't fi...,\n<t2></t2>\nWeâ€™re happy to cover licenced d...,\n<t3></t3>\nâ€¢ We will neither provide legal...


# Create Prompt template <a name="prompt"></a>

Create a prompt template for a retrieval augmented generation task
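The `prompt_variables` passed to `PromptTemplate` below should name exactly the placeholders in the template. A minimal sketch, assuming the template uses Python `str.format`-style placeholders (the prompt escapes its literal JSON braces as `{{ }}`, which suggests this); `template_placeholders` and `check_prompt_variables` are hypothetical helpers built on the standard library's `string.Formatter.parse`.

```python
from string import Formatter


def template_placeholders(template: str) -> set:
    """Return the {name} placeholders in a str.format-style template.

    Escaped braces such as {{"answer": ...}} are literal text and are skipped.
    """
    return {field for _, field, _, _ in Formatter().parse(template) if field}


def check_prompt_variables(template: str, prompt_variables: dict) -> None:
    """Raise if the template and the variable names do not match one-to-one."""
    placeholders = template_placeholders(template)
    names = set(prompt_variables)
    missing = placeholders - names
    unused = names - placeholders
    if missing or unused:
        raise ValueError(f"missing variables: {sorted(missing)}, unused variables: {sorted(unused)}")


template = 'Context: {context1}\n{context2}\n{context3}\nQ: {question}\nANSWER: {{"answer": "..."}}'
print(sorted(template_placeholders(template)))  # ['context1', 'context2', 'context3', 'question']
check_prompt_variables(template, {"context1": "", "context2": "", "context3": "", "question": ""})
```

Running `check_prompt_variables(prompt_input, prompt_variables)` before `create_detached_prompt` would catch a renamed placeholder early.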

In [12]:
from ibm_aigov_facts_client import AIGovFactsClient

facts_client = AIGovFactsClient(
    api_key=CLOUD_API_KEY,
    container_id=PROJECT_ID,
    container_type="project",
    disable_tracing=True,
    # region="europe",  # uncomment to use a non-default region
)




In [13]:
prompt_input="""
[INST] <<SYS>>You are an assistant named Buddy who helps customers of an Australian online-only insurer named Bingle. Being online only, you should not suggest contacting Bingle. You should only answer queries related to car insurance. Your answer must be general in nature. You should only use provided information from the document to generate your answer. If the answer to the question is not in the provided document reply with,  "I am sorry but unfortunately I do not have information to help you.".  Use Australian spelling and Australian insurance terminology. If you do not respond in the persona of Buddy, users who are on the Bingle website will be confused. You should maintain a friendly customer service tone.<</SYS>> 
    Here is the document you should use to answer the user:
    {context1}\n{context2}\n{context3}
    Here are some important rules for the interaction:
    - Always stay in character, as Buddy from Bingle and answer user questions in first person.
    - If you are unsure how to respond, say “I am sorry but unfortunately I do not have information to help you.”.
    - If someone asks something irrelevant, say, “Sorry, I am Buddy and I can help with Car Insurance. Do you have an insurance related question today I can help you with?”.
    - Never mention or suggest calling, emailing, writing or contacting customer services or Bingle.
    - If the answer is not in the document, answer with: “I am sorry but unfortunately I do not have information to help you.”.
    - If the document provides instructions, make sure to list them out so that the user understands the process to follow; this will be extremely helpful.
    - If no insurance cover type (comprehensive or third party cover) is mentioned in the user question then always provide an answer for both types of cover.
    - If a specific type of insurance cover (comprehensive or third party cover) is mentioned in the user question then respond with an answer only for that cover type.
    - Remember, users have already looked at the Bingle website, so do not suggest that they check our website, as this will be condescending and rude; instead, suggest they review the Product Disclosure Statement (PDS).
    
    Here is an example of how to respond in a standard interaction:
    Users question: Hi, how were you created and what do you do? [/INST]
    Step 1: Check if the question is related to Bingle car insurance. Yes, the question is related to car insurance, specifically the insurance Buddy that is me.
    Step 2: Check if the answer can be found in the provided document. The context does mention information about Buddy and how I am an AI assistant to help them.
    Step 3: Provide the answer in structured json. 
    ANSWER: {{"answer": "Hello! My name is Buddy, and I was created by Bingle to help you with information about Bingles insurance services. What can I help you with today?."}} 
    [INST]
    Here is another example of how to respond in a standard interaction:
    Users question: Hi can I get housing insurance? [/INST]
    Step 1: Check if the question is related to Bingle car insurance. No, the question is not related to car insurance.
    Step 2: Check if the answer can be found in the provided document. The document does not mention information about housing insurance.
    Step 3: Provide the answer in structured json. 
    ANSWER: {{"answer": "I'm sorry, but unfortunately I don't have information to help you with that question."}}
    [INST] 
    Here is another example of how to respond in a standard interaction:
    Users question: Hi does Bingle offer a rental car while my car is being repaired? [/INST]
    Step 1: Check if the question is related to Bingle car insurance. Yes, the question is related to car insurance, specifically the getting a rental car while their car is being repaired.
    Step 2: Check if the answer can be found in the provided document. The document does mention information about rental cars and how they are provided in the comprehensive policy with the Keep Mobile option.
    Step 3: Provide the answer in structured json. 
    ANSWER: {{"answer": "Yes I can help with that, for an extra premium our Comprehensive Policy offers a Keep Mobile option which includes unlimited, car hire and Copycat cover. Our Third Party Policy does not include the Keep Mobile option. I hope this information helps for further details please review the Product Disclosure Statement (PDS)"}}
    [INST] 
    Please think step by step when the user asks you a question and decide if the content is actually in the document provided. Work through these steps, then provide the answer in structured json format.
    
    User Question: {question} [/INST]
    Step 1: Check if the question is related to Bingle car insurance.

"""

In [14]:
from ibm_aigov_facts_client import DetachedPromptTemplate, PromptTemplate

detached_information = DetachedPromptTemplate(
    prompt_id="detached_prompt",
    model_id="meta-llama/llama-3-70b-instruct",
    model_provider="Facebook",
    model_name="llama-3-70b-instruct",
    model_url="https://us-south.ml.cloud.ibm.com/ml/v1/deployments/insurance_test_deployment/text/generation?version=2021-05-01",
    prompt_url="prompt_url",
    prompt_additional_info={"IBM Cloud Region": "us-east1"}
)

task_id = "retrieval_augmented_generation"
name = "RAG Prompt"
description = "RAG Prompt"
model_id = "meta-llama/llama-3-70b-instruct"

# define parameters for PromptTemplate
prompt_variables = {"context1": "","context2": "","context3": "","question": ""}
input = prompt_input
input_prefix= ""
output_prefix= ""

prompt_template = PromptTemplate(
    input=input,
    prompt_variables=prompt_variables,
    input_prefix=input_prefix,
    output_prefix=output_prefix,
)

pta_details = facts_client.assets.create_detached_prompt(
    model_id=model_id,
    task_id=task_id,
    name=name,
    description=description,
    prompt_details=prompt_template,
    detached_information=detached_information)
project_pta_id = pta_details.to_dict()["asset_id"]

2025/10/01 15:23:25 INFO : ------------------------------ Detached Prompt Creation Started ------------------------------
2025/10/01 15:23:27 INFO : The detached prompt with ID a6d84a24-d49c-4e3d-afba-660e1bc461d4 was created successfully in container_id 2ee97ecb-f652-41a9-b360-dd2a7d594a8b.


# See factsheets information <a name="factsheetsspace"></a>

In [15]:
factsheets_url = f"{DATAPLATFORM_URL.replace('api.', '')}/wx/prompt-details/{project_pta_id}/factsheet?context=wx&project_id={PROJECT_ID}"

print(f"User can navigate to the published facts in project {factsheets_url}")

User can navigate to the published facts in project https://dataplatform.cloud.ibm.com/wx/prompt-details/a6d84a24-d49c-4e3d-afba-660e1bc461d4/factsheet?context=wx&project_id=2ee97ecb-f652-41a9-b360-dd2a7d594a8b


# Evaluate Prompt template from space <a name="evaluatespace"></a>

Now we can promote the created prompt template asset to a space and perform similar actions there.

# Promote PTA to space <a name="promottospace"></a> 

The cell below promotes the prompt template asset from the project to the space.
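The promotion is a single REST call. Building the URL, query parameters, and body in a small function makes the request easy to inspect and lets it reject an empty `space_id` (the default in the space ID cell) before any network call. A sketch that mirrors the next cell; `build_promote_request` is a hypothetical helper.

```python
def build_promote_request(dataplatform_url: str, asset_id: str, project_id: str, space_id: str):
    """Return (url, params, body) for promoting an asset from a project to a space.

    Mirrors the next cell: POST <DATAPLATFORM_URL>/v2/assets/<asset_id>/promote, with the
    source project as a query parameter and the target space in the JSON body.
    """
    if not space_id:
        raise ValueError("space_id is empty; set it in the space ID cell first.")
    url = f"{dataplatform_url.rstrip('/')}/v2/assets/{asset_id}/promote"
    return url, {"project_id": project_id}, {"space_id": space_id}


url, params, body = build_promote_request(
    "https://api.dataplatform.cloud.ibm.com", "my-asset-id", "my-project-id", "my-space-id"
)
print(url)  # https://api.dataplatform.cloud.ibm.com/v2/assets/my-asset-id/promote
```

The result would then be sent with `requests.post(url, params=params, json=body, headers=headers)`, ideally followed by `response.raise_for_status()`.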

In [16]:

headers={}
headers["Content-Type"] = "application/json"
headers["Accept"] = "*/*"
headers["Authorization"] = "Bearer {}".format(iam_access_token)
verify = True

url = "{}/v2/assets/{}/promote".format(DATAPLATFORM_URL ,project_pta_id)

params = {
    "project_id":PROJECT_ID
}

payload = {
    "space_id": space_id
}
response = requests.post(url, json=payload, headers=headers, params = params, verify = verify)
json_data = response.json()
space_pta_id = json_data["metadata"]["asset_id"]
space_pta_id

'96371823-fe8f-401e-9705-38a85aec9fee'

# Create deployment for prompt template asset in space <a name="ptadeployment"></a>

To create a subscription from a space, you must first create a deployment for the prompt template asset in that space.

In [17]:
DEPLOYMENTS_URL = WML_CREDENTIALS["url"] + "/ml/v4/deployments"

payload = {
    "prompt_template": {
      "id": space_pta_id
    },
    "detached": {
    },
    "base_model_id": "meta-llama/llama-3-70b-instruct",
    "description": "rag qa deployment",
    "name": "RAG Prompt Evaluation",
    "space_id": space_id
}

version = "2023-07-07" # The version date for the API of the form YYYY-MM-DD. Example : 2023-07-07
params = {
    "version":version,
    "space_id":space_id
}

response = requests.post(DEPLOYMENTS_URL, json=payload, headers=headers, params = params, verify = verify)
json_data = response.json()


if "metadata" in json_data:
    deployment_id = json_data["metadata"]["id"]
    print(deployment_id)
else:
    print(json_data)

be715d31-1319-4ba1-8a7b-471ae8b3c81a


In [18]:
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator, CloudPakForDataAuthenticator

from ibm_watson_openscale import *
from ibm_watson_openscale.supporting_classes.enums import *
from ibm_watson_openscale.supporting_classes import *

service_instance_id = "2c08665d-dacd-4fe2-9035-6963dc3fb1a4" # Update this to refer to a particular service instance
authenticator = IAMAuthenticator(
    apikey=CLOUD_API_KEY,
    url=IAM_URL
)
wos_client = APIClient(
    authenticator=authenticator,
    service_url=SERVICE_URL,
    service_instance_id=service_instance_id
)
data_mart_id = wos_client.service_instance_id
print(wos_client.version)

3.1.0


# Setup the prompt template asset in space for evaluation with supported monitor dimensions <a name="ptaspace"></a>

Prompt template assets in a space are supported only with the `pre_production` and `production` operational space IDs. The cells below first register an LLM-as-a-judge evaluator, then create a `production` subscription (set by `operational_space_id`) for the prompt template asset promoted to the space. The `problem_type` value should match the task type specified in the prompt template asset.
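The `monitors` dictionary in the prompt setup cell is nested several levels deep, and only two operational space IDs are accepted here. A minimal sketch that validates the space ID and builds a dictionary with the same shape; `check_operational_space` and `build_genai_quality_monitor` are hypothetical helpers, and the evaluator ID is a placeholder.

```python
ALLOWED_OPERATIONAL_SPACES = {"pre_production", "production"}


def check_operational_space(operational_space_id: str) -> None:
    """Reject operational space IDs that space-based prompt template assets do not support."""
    if operational_space_id not in ALLOWED_OPERATIONAL_SPACES:
        raise ValueError(
            f"{operational_space_id!r} is not supported for prompt template assets in a space; "
            f"use one of {sorted(ALLOWED_OPERATIONAL_SPACES)}."
        )


def build_genai_quality_monitor(evaluator_id: str, metrics: dict, min_sample_size: int = 1) -> dict:
    """Build a dict shaped like the `monitors` value passed as `supporting_monitors` below.

    `metrics` maps metric names to configuration dicts, as in the notebook's `metrics_configuration`.
    """
    return {
        "generative_ai_quality": {
            "parameters": {
                "generative_ai_evaluator": {"enabled": True, "evaluator_id": evaluator_id},
                "min_sample_size": min_sample_size,
                "metrics_configuration": dict(metrics),
            }
        }
    }


check_operational_space("production")
monitors = build_genai_quality_monitor(
    evaluator_id="my-evaluator-id",  # placeholder; the notebook uses the ID of the evaluator it registers
    metrics={
        "faithfulness": {},
        "answer_relevance": {},
        "retrieval_quality": {"context_relevance": {}},
    },
)
```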

In [19]:
gen_ai_evaluator = wos_client.integrated_systems.add(
            name="llm as a judge",
            description="llm as judge evaluator",
            type="generative_ai_evaluator",
            parameters={
                "evaluator_type": "watsonx.ai",
                "model_id": "meta-llama/llama-3-3-70b-instruct"
            },
            credentials=WML_CREDENTIALS
        )

# Get evaluator integrated system ID
result = gen_ai_evaluator.result._to_dict()
evaluator_id = result["metadata"]["id"]
print(f"Evaluator created with ID: {evaluator_id}")
        

Evaluator created with ID: 01999f31-554e-7921-8d44-74c713ee765c


In [20]:
label_column = "answer"
context_fields = ["context1", "context2", "context3"]
question_field = "question"
operational_space_id = "production"
problem_type= "retrieval_augmented_generation"
input_data_type= "unstructured_text"

monitors ={"generative_ai_quality": {
            "parameters": {
                "generative_ai_evaluator": {
                    "enabled": True,
                    "evaluator_id": evaluator_id,
                },
                "min_sample_size": 1,
                "metrics_configuration": {
                    "faithfulness":{},
                    "unsuccessful_requests":{},
                    "answer_relevance": {},
                    "retrieval_quality": {
                        "context_relevance": {}
                    },
                    "answer_similarity": {}
                }
            }
        }
    }
   


response = wos_client.wos.execute_prompt_setup(prompt_template_asset_id = space_pta_id, 
                                                                   space_id = space_id,
                                                                   deployment_id = deployment_id,
                                                                   label_column = label_column,
                                                                   context_fields=context_fields,     
                                                                   question_field = question_field,   
                                                                   operational_space_id = operational_space_id, 
                                                                   problem_type = problem_type,
                                                                   input_data_type = input_data_type, 
                                                                   supporting_monitors = monitors, 
                                                                   background_mode = False)

result = response.result
result._to_dict()




 Waiting for end of adding prompt setup 96371823-fe8f-401e-9705-38a85aec9fee 




running...
finished

---------------------------------------------------------------
 Successfully finished setting up prompt template subscription 
---------------------------------------------------------------




{'prompt_template_asset_id': '96371823-fe8f-401e-9705-38a85aec9fee',
 'space_id': '888ea269-8065-4573-b755-30e6ab163afc',
 'deployment_id': 'be715d31-1319-4ba1-8a7b-471ae8b3c81a',
 'service_provider_id': '01999daf-32a1-7972-a6c9-e8e0fa9a6d2a',
 'subscription_id': '01999f31-6e5b-708c-bb49-94184b8b8053',
 'mrm_monitor_instance_id': '01999f31-bc12-7b3a-ab9b-9bcec520e7b9',
 'start_time': '2025-10-01T09:53:59.696675Z',
 'end_time': '2025-10-01T09:54:27.574098Z',
 'status': {'state': 'FINISHED'}}

Use the cell below to read the prompt setup task and check its status.

In [21]:
response = wos_client.monitor_instances.mrm.get_prompt_setup(prompt_template_asset_id = space_pta_id,
                                                             deployment_id = deployment_id,
                                                             space_id = space_id)

result = response.result
result_json = result._to_dict()
result_json

This method will be deprecated in the next release and be replaced by wos_client.wos.get_prompt_setup() method


{'prompt_template_asset_id': '96371823-fe8f-401e-9705-38a85aec9fee',
 'space_id': '888ea269-8065-4573-b755-30e6ab163afc',
 'deployment_id': 'be715d31-1319-4ba1-8a7b-471ae8b3c81a',
 'service_provider_id': '01999daf-32a1-7972-a6c9-e8e0fa9a6d2a',
 'subscription_id': '01999f31-6e5b-708c-bb49-94184b8b8053',
 'mrm_monitor_instance_id': '01999f31-bc12-7b3a-ab9b-9bcec520e7b9',
 'start_time': '2025-10-01T09:53:59.696675Z',
 'end_time': '2025-10-01T09:54:27.574098Z',
 'status': {'state': 'FINISHED'}}

### Read the subscription ID from the prompt setup

Once the prompt setup status is `FINISHED`, read the subscription ID from it.
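Reading `subscription_id` before the setup has finished could pick up an incomplete result. A minimal sketch, assuming the dict shape shown in the prompt setup output above; `subscription_id_from_setup` is a hypothetical helper.

```python
def subscription_id_from_setup(setup: dict) -> str:
    """Return the subscription ID from a prompt setup result, but only once the setup has finished."""
    state = setup.get("status", {}).get("state")
    if state != "FINISHED":
        raise RuntimeError(f"Prompt setup has not finished yet (state: {state!r}).")
    return setup["subscription_id"]


# A trimmed copy of the prompt setup result shown above.
setup = {
    "subscription_id": "01999f31-6e5b-708c-bb49-94184b8b8053",
    "status": {"state": "FINISHED"},
}
print(subscription_id_from_setup(setup))  # 01999f31-6e5b-708c-bb49-94184b8b8053
```

In the notebook this would be called as `subscription_id_from_setup(result_json)`.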

In [22]:
prod_subscription_id = result_json["subscription_id"]
prod_subscription_id

'01999f31-6e5b-708c-bb49-94184b8b8053'

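## The following segment is required only if you choose a production space <a name="Prod"></a>

Now that the WML service has been bound and the subscription has been created, the prompt template asset needs payload data to evaluate. The downloaded CSV is used to construct the payload, as well as the feedback, for the deployment. The cell below builds payload records from the CSV rather than scoring the deployment, using its precomputed `generated_text` column as the model output.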
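The loop in the next cell can be factored into a function that is easy to test on a tiny CSV. A minimal sketch producing the same record shape (`request.parameters.template_variables` plus `response.results`); `build_payload_records` is a hypothetical helper.

```python
import csv
import io


def build_payload_records(csv_file, feature_fields, prediction_field="generated_text"):
    """Turn CSV rows into payload-logging records shaped like those built in the next cell."""
    records = []
    for row in csv.DictReader(csv_file):
        # Template variables come from the feature columns; the output from the prediction column.
        request = {"parameters": {"template_variables": {f: str(row[f]) for f in feature_fields}}}
        response = {"results": [{prediction_field: row[prediction_field]}]}
        records.append({"request": request, "response": response})
    return records


sample = io.StringIO("question,context1,generated_text\nAm I covered for fire?,ctx,Yes\n")
records = build_payload_records(sample, ["context1", "question"])
print(records[0]["request"]["parameters"]["template_variables"])
# {'context1': 'ctx', 'question': 'Am I covered for fire?'}
```

With the notebook's variables this would be `build_payload_records(open(test_data_path), feature_fields, prediction)`.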

In [23]:
import csv

feature_fields = context_fields + [question_field]
prediction = "generated_text"

pl_data = []
prediction_list = []

with open(test_data_path, 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        request = {
            "parameters": {
                "template_variables": {
                }
            }
        }
        for each in feature_fields:
            request["parameters"]["template_variables"][each] = str(row[each])

        predicted_val = row[prediction]
        prediction_list.append(predicted_val)
        response = {
            "results": [
                {
                    prediction: predicted_val
                }
            ]
        }
        record = {"request": request, "response": response}
        pl_data.append(record)
pl_data

[{'request': {'parameters': {'template_variables': {'context1': "\n<t1></t1>\nAt Bingle, we don't have the option to change your policy from Comprehensive to Third Party Property Damage, or vice versa. If you want to change the level of cover, youâ€™ll need to start up a NEW Comprehensive Insurance (https://online.bingle.com.au/motor/pub/binglequote?productType=comprehensive) or Third Party Property Damage (https://online.bingle.com.au/motor/pub/binglequote?productType=thirdParty) policy online, and cancel (https://online.bingle.com.au/onesuncorp) the existing policy. Cancelling within the cooling off period. If you change your mind within 21 days of when your policy starts, and you haven't made a claim, you can cancel your policy and receive a full refund. Just make sure you select 'Cooling off' as the reason for the cancellation, and select the start date of your policy as the cancellation date. If a refund is due to you, this will automatically be issued back to the original card yo

In [24]:
import time
from ibm_watson_openscale.supporting_classes.enums import *

time.sleep(5)
# Look up the payload logging data set of the production subscription.
data_sets = wos_client.data_sets.list(type=DataSetTypes.PAYLOAD_LOGGING,
                                      target_target_id=prod_subscription_id,
                                      target_target_type=TargetTypes.SUBSCRIPTION).result.data_sets
# Guard against an empty list instead of failing with an IndexError.
payload_data_set_id = data_sets[0].metadata.id if data_sets else None
if payload_data_set_id is None:
    print("Payload data set not found. Please check subscription status.")
else:
    print("Payload data set id: ", payload_data_set_id)

Payload data set id:  01999f31-9a02-7edb-a190-d73cc46f0d20


In [25]:
import time

time.sleep(5)
pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
print("Number of records in the payload logging table: {}".format(pl_records_count))
# Store the records built above only if they are not already in the payload logging table.
if pl_records_count < len(pl_data):
    print("Payload logging did not happen, performing explicit payload logging.")
    wos_client.data_sets.store_records(data_set_id=payload_data_set_id, request_body=pl_data, background_mode=False)
    time.sleep(5)
    pl_records_count = wos_client.data_sets.get_records_count(payload_data_set_id)
    print("Number of records in the payload logging table: {}".format(pl_records_count))

Number of records in the payload logging table: 0
Payload logging did not happen, performing explicit payload logging.



 Waiting for end of storing records with request id: c17682d5-ee7d-4cf1-9620-d893c77bdc40 




pending
active

---------------------------------------
 Successfully finished storing records 
---------------------------------------


Number of records in the payload logging table: 10


Run the two cells below to trigger a manual evaluation if you have a development or validation deployment space. In a production deployment space, the scheduler triggers the evaluation automatically every hour, once the minimum sample size condition is met.
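The automatic trigger combines a time condition with a sample-size condition. A minimal sketch of that rule as described above, not OpenScale's scheduler code; it assumes the minimum sample size counts payload records logged since the last run, which is an interpretation, and `evaluation_due` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def evaluation_due(last_run: datetime, now: datetime, records_since_last_run: int,
                   min_sample_size: int = 1, interval: timedelta = timedelta(hours=1)) -> bool:
    """Simplified model: due once `interval` has elapsed and enough new records were logged."""
    return now - last_run >= interval and records_since_last_run >= min_sample_size


last = datetime(2025, 10, 1, 9, 0)
print(evaluation_due(last, last + timedelta(minutes=30), 10))  # False: less than an hour has passed
print(evaluation_due(last, last + timedelta(hours=1), 10))     # True
print(evaluation_due(last, last + timedelta(hours=1), 0))      # False: sample size not met
```

With `min_sample_size` set to 1 in this notebook's monitor configuration, the ten logged records would satisfy the sample-size part of this model.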

In [26]:
monitor_definition_id = "mrm"
target_target_id = prod_subscription_id
result = wos_client.monitor_instances.list(data_mart_id=data_mart_id,
                                           monitor_definition_id=monitor_definition_id,
                                           target_target_id=target_target_id,
                                           space_id=space_id).result
result_json = result._to_dict()
mrm_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
mrm_monitor_id

'01999f31-bc12-7b3a-ab9b-9bcec520e7b9'

In [27]:
# For the pre_production flow
body = {}
# content_type = "multipart/form-data"
response = wos_client.monitor_instances.mrm.evaluate_risk(
    monitor_instance_id=mrm_monitor_id,
    body=body,
    # test_data_set_name=test_data_set_name,
    # test_data_path=test_data_path,
    # content_type=content_type,
    includes_model_output=True,
    space_id=space_id,
    background_mode=False,
)




 Waiting for risk evaluation of MRM monitor 01999f31-bc12-7b3a-ab9b-9bcec520e7b9 




running......
finished

---------------------------------------
 Successfully finished evaluating risk 
---------------------------------------




In [29]:
monitor_definition_id = "generative_ai_quality"
result = wos_client.monitor_instances.list(data_mart_id = data_mart_id,
                                           monitor_definition_id = monitor_definition_id,
                                           target_target_id = target_target_id,
                                           space_id = space_id).result
result_json = result._to_dict()
genaiquality_monitor_id = result_json["monitor_instances"][0]["metadata"]["id"]
genaiquality_monitor_id

'01999f31-ae92-7ace-a555-7c7aaf8efc03'

In [30]:
wos_client.monitor_instances.show_metrics(monitor_instance_id=genaiquality_monitor_id, space_id=space_id)

| ts | id | measurement_id | value | lower_limit | upper_limit | tags | monitor_definition_id | monitor_instance_id | run_id | target_type | target_id |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2025-10-01 09:56:34.274398+00:00 | faithfulness | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.84 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | average_precision | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.82 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | records_processed | 01999f33-c562-76ee-a46c-4acccf412af0 | 10.0 | | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | hit_rate | 01999f33-c562-76ee-a46c-4acccf412af0 | 1.0 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | answer_relevance | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.68 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | reciprocal_rank | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.8667 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | ndcg | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.9205 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | retrieval_precision | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.7 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | context_relevance | 01999f33-c562-76ee-a46c-4acccf412af0 | 1.0 | 0.7 | | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |
| 2025-10-01 09:56:34.274398+00:00 | unsuccessful_requests | 01999f33-c562-76ee-a46c-4acccf412af0 | 0.0 | | 0.1 | ['computed_on:payload_logging', 'field_type:subscription', 'aggregation_type:mean'] | generative_ai_quality | 01999f31-ae92-7ace-a555-7c7aaf8efc03 | 7dd38940-aa6e-493c-9f6b-60d542481e28 | subscription | 01999f31-6e5b-708c-bb49-94184b8b8053 |


Note: First 10 records were displayed.
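
Most metrics above report a `lower_limit`, and `unsuccessful_requests` reports an `upper_limit`. The sketch below flags values outside those limits, using numbers copied from the table above. It assumes a value exactly on a limit is not a violation; confirm how OpenScale reports violations in the Evaluate tab before relying on this. `breached_metrics` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (metric id, value, lower_limit, upper_limit) copied from the show_metrics output;
# None means no limit was reported for that side.
METRICS: List[Tuple[str, float, Optional[float], Optional[float]]] = [
    ("faithfulness", 0.84, 0.7, None),
    ("average_precision", 0.82, 0.7, None),
    ("records_processed", 10.0, None, None),
    ("hit_rate", 1.0, 0.7, None),
    ("answer_relevance", 0.68, 0.7, None),
    ("reciprocal_rank", 0.8667, 0.7, None),
    ("ndcg", 0.9205, 0.7, None),
    ("retrieval_precision", 0.7, 0.7, None),
    ("context_relevance", 1.0, 0.7, None),
    ("unsuccessful_requests", 0.0, None, 0.1),
]


def breached_metrics(metrics: List[Tuple[str, float, Optional[float], Optional[float]]]) -> List[str]:
    """Return ids of metrics strictly below lower_limit or strictly above upper_limit."""
    breached = []
    for metric_id, value, lower, upper in metrics:
        if lower is not None and value < lower:
            breached.append(metric_id)
        elif upper is not None and value > upper:
            breached.append(metric_id)
    return breached


print(breached_metrics(METRICS))  # ['answer_relevance']
```

Under this assumption only `answer_relevance` (0.68 against a lower limit of 0.7) is flagged; `retrieval_precision` sits exactly on its limit and is not.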


In [28]:
factsheets_url = "https://dataplatform.cloud.ibm.com/ml-runtime/deployments/{}/details?space_id={}&context=wx&flush=true".format(deployment_id, space_id)

print("User can navigate to the published facts in space {}".format(factsheets_url))

User can navigate to the published facts in space https://dataplatform.cloud.ibm.com/ml-runtime/deployments/be715d31-1319-4ba1-8a7b-471ae8b3c81a/details?space_id=888ea269-8065-4573-b755-30e6ab163afc&context=wx&flush=true
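
The factsheets link above is assembled with `str.format`. An alternative sketch builds the same URL with `urllib.parse.urlencode`, so query values are percent-encoded if they ever need it. The default `host` matches the URL used above; whether other regions use a different host is an assumption to verify. `factsheets_deployment_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def factsheets_deployment_url(deployment_id: str, space_id: str,
                              host: str = "https://dataplatform.cloud.ibm.com") -> str:
    # Same path and query parameters as the format() call above;
    # urlencode keeps the insertion order space_id, context, flush.
    query = urlencode({"space_id": space_id, "context": "wx", "flush": "true"})
    return f"{host}/ml-runtime/deployments/{deployment_id}/details?{query}"
```

Calling `factsheets_deployment_url(deployment_id, space_id)` with the IDs from this run reproduces the printed link.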


## Congratulations!

You have finished the hands-on lab for IBM Watson OpenScale. You can now navigate to the prompt template asset in your project or space and click the Evaluate tab to visualise the results in the UI.