# Prompt Evaluator for RAG Task Type

This notebook uses `PromptEvaluator` to run an end-to-end lifecycle evaluation of a **RAG** Prompt Template Asset. It shows how to:
1. Create a Prompt Template Asset in a given project.
2. Configure watsonx.governance to monitor the created prompt template asset.
3. Evaluate the metrics for the configured monitors.

## Prerequisites
- Service credentials.
- Development stage project id.
- Production stage space id.
- Service instance id (only when using watsonx.governance).

In [None]:
%pip install -U ibm-watsonx-gov[visualization]
import warnings
warnings.filterwarnings("ignore")

### Configure your credentials

#### Using watsonx.governance as Service
These are the needed values when using watsonx.governance as service:
- `region`: The region of the watsonx.governance service instance. This field is optional and defaults to us-south (Dallas). Supported values are us-south, eu-de, au-syd, ca-tor, jp-tok, and eu-gb.
- `api_key`: The API key required for authentication. Instructions for generating API keys can be found [here](https://cloud.ibm.com/docs/account?topic=account-userapikey&interface=ui).
- `service_instance_id`: The instance ID of the watsonx.governance service. To retrieve it, follow the steps in the next section.

##### Retrieving `service_instance_id` value
You can view the service instance ID that is associated with your watsonx.governance service instance by navigating to your IBM Cloud resource list.
1. [Log in to IBM Cloud](https://cloud.ibm.com/).
2. Go to Menu > Resource list, and then click Services to browse a list of your cloud services.
3. Click the table row that describes your watsonx.governance service instance.
4. Go to Overview > Instance, and copy the GUID value.

#### Using watsonx.governance software
These are the needed values when using watsonx.governance software:
- `url`: The URL of the instance. This is required when using watsonx.governance software.
- `api_key`: The user's API key. Instructions for generating API keys can be found [here](https://www.ibm.com/docs/en/watsonx/w-and-w/2.1.0?topic=tutorials-generating-api-keys).
- `username`: The username needed for authentication.
- `version`: The version of watsonx.governance software being used.
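
One way to keep secrets out of the notebook is to read these values from environment variables. The following is a minimal sketch, not part of the library: the helper `credential_kwargs_from_env` and the `WXG_*` variable names are hypothetical, and the dictionary keys simply reuse the attribute names listed above.

```python
import os

# Hypothetical environment variable names; pick whatever suits your setup.
ENV_TO_ATTRIBUTE = {
    "WXG_API_KEY": "api_key",
    "WXG_SERVICE_INSTANCE_ID": "service_instance_id",
    "WXG_REGION": "region",
    "WXG_URL": "url",
    "WXG_USERNAME": "username",
    "WXG_VERSION": "version",
}


def credential_kwargs_from_env(environ=None):
    """Collect the credential attributes that are set (and non-empty) in the environment."""
    environ = os.environ if environ is None else environ
    kwargs = {
        attribute: environ[name]
        for name, attribute in ENV_TO_ATTRIBUTE.items()
        if environ.get(name)
    }
    # The API key is listed as a needed value for both deployment types.
    if "api_key" not in kwargs:
        raise ValueError("WXG_API_KEY must be set")
    return kwargs
```

Assuming the constructor accepts these attribute names as keyword arguments, the result could be passed on as `Credentials(**credential_kwargs_from_env())`.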


In [None]:
from ibm_watsonx_gov.entities.credentials import Credentials
PROJECT_ID = ""
SPACE_ID = ""
INPUT_FILE_PATH = "https://raw.githubusercontent.com/IBM/ibm-watsonx-gov/refs/heads/samples/notebooks/data/rag/rag_for_detached_promt_template.csv"

# Usecase details, only needed if tracking the prompt templates in usecase
USECASE_ID = ""
CATALOG_ID = ""

credentials = Credentials(
    api_key="<EDIT_THIS>",
    service_instance_id="<EDIT_THIS>",  # Optional. Not needed when using watsonx.ai

    # Uncomment the following attributes when using watsonx.governance software
    # url="<EDIT_THIS>",
    # username="<EDIT_THIS>",
    # version="<EDIT_THIS>",
    # disable_ssl="<EDIT_THIS>",
)

## Detached Prompt configuration
The code cell that follows the reference tables below defines the configuration of the detached prompt template.

Note: To run only the prompt template setup without running the evaluations, set `input_file_path` to `None`.

## `config`

| **Attribute** | **Required** | **Description** |
|---------------|--------------|------------------|
| `prompt_setup` | Yes | Dictionary defining the setup for prompt evaluation. |
| `development_project_id` | Yes | Project ID for the development stage. Enables development evaluation. |
| `production_space_id` | No | Space ID for the production stage. Enables production evaluation. |
| `prompt_template` | No | Dictionary defining a prompt template. Used if `prompt_template_id` is not provided. |
| `detached_prompt_template` | No | Dictionary defining a detached prompt template. Used if `prompt_template_id` is not provided. |
| `prompt_template_id` | No | ID of an existing prompt template. Used if `prompt_template` is not provided. |
| `development_monitors` | No | List of monitor configurations for development stage. |
| `production_monitors` | No | List of monitor configurations for production stage. |
| `space_deployment` | No | Dictionary describing metadata for production deployment. |
| `ai_usecase_id` | No (required if tracking the usecase) | ID of the usecase used to track the prompt template. |
| `catalog_id` | No (required if tracking the usecase) | The catalog in which the usecase is stored. By default this is the `inventory_id`. |
| `approach_version` | No (required if tracking the usecase) | The usecase approach version. This must be in semantic version format, for example `0.0.1`. |
| `approach_id` | No | The usecase approach id. |
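
The table above implies a few structural rules that can be checked before calling the evaluator. The following is a minimal sketch, not part of the library: `check_config` is a hypothetical helper, and the rule that at least one of `development_project_id` or `production_space_id` must be present is an assumption based on the single-stage configurations shown later in this notebook.

```python
# Keys that can supply the prompt, per the table above.
PROMPT_SOURCES = ("prompt_template", "detached_prompt_template", "prompt_template_id")
# Keys that select the evaluation stage(s).
STAGE_KEYS = ("development_project_id", "production_space_id")


def check_config(config):
    """Return a list of human-readable problems found in a configuration dictionary."""
    problems = []
    if "prompt_setup" not in config:
        problems.append("missing 'prompt_setup'")
    if not any(config.get(key) for key in PROMPT_SOURCES):
        problems.append("one of " + ", ".join(PROMPT_SOURCES) + " is required")
    if not any(config.get(key) for key in STAGE_KEYS):
        problems.append("no development project or production space given")
    return problems


example = {
    "prompt_setup": {"problem_type": "rag"},
    "development_project_id": "my-project-id",  # placeholder value
    "prompt_template_id": "my-template-id",     # placeholder value
}
print(check_config(example))  # an empty list means no problems were found
```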

### `prompt_template`

| **Attribute** | **Required** | **Description** | **Example** |
|---------------|--------------|------------------|-------------|
| `name` | Yes | Name of the prompt template. | `"MyPrompt"` |
| `model_id` | Yes | ID of the model used in the prompt. | `"ibm/granite-3-2-8b-instruct"` |
| `input_text` | Yes | Template text with placeholders. | `"Generate a summary for {text}"` |
| `input_variables` | Yes | List of variables used in the prompt. | `["text"]` |
| `task_ids` | Yes | List of task identifiers. | `["summarization"]` |
| `description` | No | Description of the prompt template. | `"Summarization prompt"` |

### `detached_prompt_template`

| **Attribute** | **Required** | **Description** | **Example** |
|---------------|--------------|------------------|-------------|
| `name` | Yes | Name of the detached prompt template. | `"ExternalPrompt"` |
| `model_id` | Yes | Model ID for the detached prompt. | `"external-model-001"` |
| `input_text` | Yes | Prompt text. | `"Answer the question: {question}"` |
| `input_variables` | Yes | Variables used in the prompt. | `["question"]` |
| `task_ids` | Yes | Task identifiers. | `["qa"]` |
| `detached_model_url` | Yes | URL of the external model. | `"https://model.api"` |
| `detached_prompt_id` | No | ID of the detached prompt. | `"dp-001"` |
| `detached_model_id` | No | ID of the detached model. | `"dm-001"` |
| `detached_model_provider` | No | Provider of the detached model. | `"OpenAI"` |
| `detached_model_name` | No | Name of the detached model. | `"gpt-4"` |
| `detached_prompt_url` | No | URL of the prompt. | `"https://prompt.api"` |
| `detached_prompt_additional_information` | No | Additional metadata. | `{"notes": "test"}` |


### `prompt_setup`

| **Attribute** | **Required** | **Description** | **Example** |
|---------------|--------------|------------------|-------------|
| `problem_type` | Yes | Type of task (e.g., `rag`, `qa`, `summarization`). | `"rag"` |
| `input_data_type` | No | Type of input data. | `"unstructured_text"` |
| `prediction_field` | No | Field name for model output. | `"generated_text"` |
| `label_column` | No | Field name for ground truth labels. | `"answer"` |
| `question_field` | No (for RAG) | Field name for the question. | `"question"` |
| `context_fields` | No (for RAG) | List of context fields. | `["context1", "context2"]` |
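
Because `question_field` and `context_fields` name columns of the input data, it can help to confirm that the input CSV actually contains them before starting an evaluation. The sketch below uses only the standard library; `missing_columns` is a hypothetical helper and the sample CSV text is made up for illustration.

```python
import csv
import io


def missing_columns(csv_text, required_fields):
    """Return the required field names that are absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    return [field for field in required_fields if field not in header]


# Made-up sample with a header row and one record.
sample = "question,context1,context2,answer\nWhat is covered?,doc a,doc b,Everything\n"
print(missing_columns(sample, ["question", "context1", "context2", "context3"]))
# → ['context3']
```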


### `development_monitors` and `production_monitors`

| **Attribute** | **Required** | **Description** | **Example** |
|---------------|--------------|------------------|-------------|
| `monitor_name` | Yes | Name of the monitor. | `"quality_monitor"` |
| `thresholds` | No | Thresholds for the monitor. | `{"accuracy": 0.9}` |
| `parameters` | No | Parameters for the monitor. | `{"param1": "value"}` |

### `space_deployment`

| **Attribute** | **Required** | **Description** | **Example** |
|---------------|--------------|------------------|-------------|
| `serving_name` | No | Name of the deployment. | `"deployment_123456"` |
| `base_model_id` | Yes (if not inferred) | Model ID used in deployment. | `"model-xyz"` |
| `description` | No | Description of the deployment. | `"Production deployment"` |
| `name` | No | Name of the space. | `"Production Space"` |
| `version_date` | No | Version date of the deployment. | `"2024-12-18"` |


In [None]:
from ibm_watsonx_gov.entities.enums import TaskType

input_file_path = INPUT_FILE_PATH
context_fields = ["context1", "context2", "context3"]
question_field = "question"
prompt_input = """
[INST] <>You are an assistant named Buddy who helps customers of an Australian online-only insurer named Bingle. Being online only, you should not suggest contacting Bingle. You should only answer queries related to car insurance. Your answer must be general in nature. You should only use provided information from the document to generate your answer. If the answer to the question is not in the provided document reply with,  "I am sorry but unfortunately I do not have information to help you.".  Use Australian spelling and Australian insurance terminology. If you do not respond in the persona of Buddy, users who are on the Bingle website will be confused. You should maintain a friendly customer service tone.<>
    Here is the document you should use to answer the user:
    {context1}\n{context2}\n{context3}
    Here are some important rules for the interaction:
    - Always stay in character, as Buddy from Bingle and answer user questions in first person.
    - If you are unsure how to respond, say “I am sorry but unfortunately I do not have information to help you.”.
    - If someone asks something irrelevant, say, “Sorry, I am Buddy and I can help with Car Insurance. Do you have an insurance related question today I can help you with?”.
    - Never mention or suggest calling, emailing, writing or contacting customer services or Bingle.
    - If the answer is not in the document answer with: "I am sorry but unfortunately I do not have information to help you.”.
    - If the document provides instruction ensure to list them out so that the user understands the process to take, this will be extremely helpful.
    - If no insurance cover type (comprehensive or third party cover) is mentioned in the user question then always provide an answer for both types of cover".
    - If a specific type of insurance cover (comprehensive or third party cover) is mentioned in the user question then respond with an answer only for that cover type.
    - Remember, user have already looked at the Bingle website so do not suggest them to check our website as this will be condescending and rude, instead suggest they review the Product Disclosure Statement (PDS).

    Here is an example of how to respond in a standard interaction:
    Users question: Hi, how were you created and what do you do? [/INST]
    Step 1: Check if the question is related to Bingle car insurance. Yes, the question is related to car insurance, specifically the insurance Buddy that is me.
    Step 2: Check if the answer can be found in the provided document. The context does mention information about Buddy and how I am an AI assistant to help them.
    Step 3: Provide the answer in structured json.
    ANSWER: {{"answer": "Hello! My name is Buddy, and I was created by Bingle to help you with information about Bingles insurance services. What can I help you with today?."}}
    [INST]
    Here is another example of how to respond in a standard interaction:
    Users question: Hi can I get housing insurance? [/INST]
    Step 1: Check if the question is related to Bingle car insurance. No, the question is not related to car insurance.
    Step 2: Check if the answer can be found in the provided document. The document does not mention information about housing insurance.
    Step 3: Provide the answer in structured json.
    ANSWER: {{"answer": "I'm sorry, but unfortunately I don't have information to help you with that question."}}
    [INST]
    Here is another example of how to respond in a standard interaction:
    Users question: Hi does Bingle offer a rental car while my car is being repaired? [/INST]
    Step 1: Check if the question is related to Bingle car insurance. Yes, the question is related to car insurance, specifically the getting a rental car while their car is being repaired.
    Step 2: Check if the answer can be found in the provided document. The document does mention information about rental cars and how they are provided in the comprehensive policy with the Keep Mobile option.
    Step 3: Provide the answer in structured json.
    ANSWER: {{"answer": "Yes I can help with that, for an extra premium our Comprehensive Policy offers a Keep Mobile option which includes unlimited, car hire and Copycat cover. Our Third Party Policy does not include the Keep Mobile option. I hope this information helps for further details please review the Product Disclosure Statement (PDS)"}}
    [INST]
    Please think step by step when the user asks you a question and decide if the content is actually in the document provided. Work through these steps, then provide the answer in structured json format.

    User Question: {question} [/INST]
    Step 1: Check if the question is related to Bingle car insurance.

"""

config = {
    "prompt_setup": {
        "problem_type": TaskType.RAG.value,
        "context_fields": context_fields,
    },
    "development_project_id": PROJECT_ID,
    "production_space_id": SPACE_ID,
    "detached_prompt_template": {
        "input_text": prompt_input,
        "input_variables": [question_field] + context_fields,
        "task_ids": [TaskType.RAG.value],
    },

    # # Uncomment the following block to use prompt templates instead of detached prompt template
    # "prompt_template": {
    #     "input_text": prompt_input,
    #     "input_variables": [question_field] + context_fields,
    #     "task_ids": [TaskType.RAG.value],
    # },

    # # Uncomment the following block to track the prompt template in a usecase
    # "ai_usecase_id": USECASE_ID,
    # "catalog_id": CATALOG_ID,
}
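
The prompt text escapes literal JSON braces as `{{` and `}}`, which suggests `str.format`-style placeholders. Under that assumption, the declared `input_variables` can be compared against the placeholders that actually occur in the text. `placeholder_names` is a hypothetical helper built on the standard library's `string.Formatter`; the short template here is a stand-in for `prompt_input`.

```python
from string import Formatter


def placeholder_names(template):
    """Return the set of named placeholders in a str.format-style template."""
    return {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skips literal text and escaped braces
    }


template = 'Context: {context1}\nQuestion: {question}\nANSWER: {{"answer": "..."}}'
declared = ["question", "context1", "context2"]
found = placeholder_names(template)

print(sorted(set(declared) - found))  # declared but never used → ['context2']
print(sorted(found - set(declared)))  # used but not declared → []
```

Running the same comparison with `prompt_input` and `[question_field] + context_fields` should yield two empty lists if the template and its variables agree.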

## Execute The Detached Prompt Template set up and evaluate the risks

In this section, the prompt evaluator object is initialized, the detached prompt template is set up from the configuration above, and the risk evaluation is run.

In [4]:
from ibm_watsonx_gov.entities.enums import EvaluationStage
from ibm_watsonx_gov.prompt_evaluator import PromptEvaluator

evaluator = PromptEvaluator(
    credentials=credentials
)

### End to End Prompt Template Evaluation

In this step, the prompt template is set up and evaluated first in the development project and then in the production space. To set up and evaluate only in the development project, or to promote an existing prompt template to a production space and evaluate it, skip this step and refer to the following sections.

In [5]:
evaluator.e2e_prompt_evaluation(
    config=config,
    input_file_path=input_file_path,
)


Starting setup process for development


Creating Prompt Template Asset...


Prompt template created successfully. Prompt template id: 28685303-48f4-4652-919d-a11f554975fb


Setting up prompt for evaluation stage 'development'...


Started prompt set up for 'development':

{
  "prompt_template_asset_id": "28685303-48f4-4652-919d-a11f554975fb",
  "start_time": "2025-06-26T13:54:34.243383Z",
  "status": {
    "state": "RUNNING"
  },
  "project_id": "05ba49c7-1927-4b6d-9c09-2766e7aa554a"
}


Prompt set up for the stage EvaluationStage.DEVELOPMENT finished successfully:

{
  "prompt_template_asset_id": "28685303-48f4-4652-919d-a11f554975fb",
  "start_time": "2025-06-26T13:54:34.243383Z",
  "status": {
    "state": "FINISHED"
  },
  "project_id": "05ba49c7-1927-4b6d-9c09-2766e7aa554a",
  "end_time": "2025-06-26T13:54:56.106703Z",
  "service_provider_id": "0197a77d-f575-791d-a34c-0d0fac3c4bcc",
  "deployment_id": "0ced02d2-65f4-4388-b48e-2c8f98111452",
  "subscription_id": "0197ac84-dd0e-7b

| | monitor_name | data_mart_id | status | monitor_instance_id | measurement_id |
|---|---|---|---|---|---|
| 0 | generative_ai_quality | 00000000-0000-0000-0000-000000000000 | active | 0197ac84-ff77-7905-b077-b88b1bf3ba57 | |
| 1 | model_health | 00000000-0000-0000-0000-000000000000 | active | 0197ac85-04ba-70f8-afb0-59f8a6f0ab8f | |
| 2 | mrm | 00000000-0000-0000-0000-000000000000 | active | 0197ac85-09fd-715c-8122-1f2b0aca03a7 | |



development prompt set up finished successfully


Starting Prompt template usecase tracking process


Checking if workspace 05ba49c7-1927-4b6d-9c09-2766e7aa554a is associated with usecase c651dc8a-3692-4977-b20d-63690c04e306.


Prompt template id 28685303-48f4-4652-919d-a11f554975fb is tracked with usecase id c651dc8a-3692-4977-b20d-63690c04e306 successfully.


Starting evaluation for development stage


Evaluating risk of MRM monitor id 0197ac85-09fd-715c-8122-1f2b0aca03a7


Successfully finished the risk evaluation.
Measurement id for risk evaluation for PTA subscription: 0197ac85-a52a-7395-b9e1-07fb11373ed1

Monitors list for subscription_id 0197ac84-dd0e-7b8c-a100-fc8bf7829d3d:


| | monitor_name | data_mart_id | status | monitor_instance_id | measurement_id |
|---|---|---|---|---|---|
| 0 | mrm | 00000000-0000-0000-0000-000000000000 | active | 0197ac85-09fd-715c-8122-1f2b0aca03a7 | 0197ac85-a52a-7395-b9e1-07fb11373ed1 |
| 1 | generative_ai_quality | 00000000-0000-0000-0000-000000000000 | active | 0197ac84-ff77-7905-b077-b88b1bf3ba57 | 0197ac86-483b-7fe2-99e8-cdfe3e699cb0 |
| 2 | model_health | 00000000-0000-0000-0000-000000000000 | active | 0197ac85-04ba-70f8-afb0-59f8a6f0ab8f | 0197ac86-1cd0-7ad7-8950-ad2120379d2c |



User can navigate to the published facts in project https://cpd-cpd-instance.apps.wxgnfs416gpu90-1.cp.fyre.ibm.com/wx/prompt-details/28685303-48f4-4652-919d-a11f554975fb/factsheet?context=wx&project_id=05ba49c7-1927-4b6d-9c09-2766e7aa554a


Finished evaluation for development stage


Starting setup process for production


Loading Prompt Template Asset...


Prompt template loaded successfully. Prompt template id: 28685303-48f4-4652-919d-a11f554975fb


Promoting prompt from project id: 05ba49c7-1927-4b6d-9c09-2766e7aa554a to space id b59e0805-e301-4e01-8f8c-e0191ee1ca92


Template promoted to space successfully. Prompt template id: 6d23fc2d-34dd-4031-a949-df2975103ff7


Creating space deployment for space id b59e0805-e301-4e01-8f8c-e0191ee1ca92 and prompt template id 6d23fc2d-34dd-4031-a949-df2975103ff7


Deployment created successfully. Space deployment id: 914750f9-4ca3-4194-a5f0-ef0938c6ed68


Setting up prompt for evaluation stage 'production'...


Started prompt set up for 'produc

| | monitor_name | data_mart_id | status | monitor_instance_id | measurement_id |
|---|---|---|---|---|---|
| 0 | generative_ai_quality | 00000000-0000-0000-0000-000000000000 | active | 0197ac86-c964-7708-b1f0-839d2ffa99aa | |
| 1 | model_health | 00000000-0000-0000-0000-000000000000 | active | 0197ac86-cf86-747b-81e6-0cb0e3987389 | |
| 2 | mrm | 00000000-0000-0000-0000-000000000000 | active | 0197ac86-d5f6-77e2-b945-d86512318124 | |



production prompt set up finished successfully


Starting Prompt template usecase tracking process


Checking if workspace b59e0805-e301-4e01-8f8c-e0191ee1ca92 is associated with usecase c651dc8a-3692-4977-b20d-63690c04e306.



[2025-06-26 16:57:03,639]-[ibm_watsonx_gov.prompt_evaluator.impl.pta_lifecycle_evaluator]-[ ERROR ]-[Line 167] ~~> HTTP Error: 400 Client Error: Bad Request for url: https://cpd-cpd-instance.apps.wxgnfs416gpu90-1.cp.fyre.ibm.com/v1/aigov/model_inventory/models/6d23fc2d-34dd-4031-a949-df2975103ff7/model_entry?space_id=b59e0805-e301-4e01-8f8c-e0191ee1ca92. Response body: {"errors":[{"code":"Bad Request","message":"Model is already being tracked with AI Use Case: usecase_test, hence cannot track this model."}],"trace":"bux0cywv9p8r9lvtzb7jmr6kq"}



Starting evaluation for production stage


payload logging data set id: 0197ac86-bf73-7626-b39f-0c7943821798

Adding payload logging data to data set id: 0197ac86-bf73-7626-b39f-0c7943821798

feedback data set id: 0197ac86-c6cd-72a4-80a7-0f9598ba9f96

Adding feedback data to data set id: 0197ac86-c6cd-72a4-80a7-0f9598ba9f96

Evaluating risk of MRM monitor id 0197ac86-d5f6-77e2-b945-d86512318124


Successfully finished the risk evaluation.
Measurement id for risk evaluation for PTA subscription: 0197ac87-7d87-7d57-a7a6-d514fe0573d8

Monitors list for subscription_id 0197ac86-b23c-7029-b631-fe55ab33704e:


| | monitor_name | data_mart_id | status | monitor_instance_id | measurement_id |
|---|---|---|---|---|---|
| 0 | mrm | 00000000-0000-0000-0000-000000000000 | active | 0197ac86-d5f6-77e2-b945-d86512318124 | 0197ac87-7d87-7d57-a7a6-d514fe0573d8 |
| 1 | generative_ai_quality | 00000000-0000-0000-0000-000000000000 | active | 0197ac86-c964-7708-b1f0-839d2ffa99aa | 0197ac87-f003-7a7c-925e-6172d186d11e |
| 2 | model_health | 00000000-0000-0000-0000-000000000000 | active | 0197ac86-cf86-747b-81e6-0cb0e3987389 | 0197ac87-90b9-7388-96b1-db14434f5666 |



User can navigate to the published facts in space https://cpd-cpd-instance.apps.wxgnfs416gpu90-1.cp.fyre.ibm.com/ml-runtime/deployments/914750f9-4ca3-4194-a5f0-ef0938c6ed68/details?space_id=b59e0805-e301-4e01-8f8c-e0191ee1ca92&context=wx&flush=true


Finished evaluation for production stage



### Set up and Evaluate The Prompt Template In Development Environment Only

The production space details can be omitted from the configuration object to perform a complete end-to-end setup and evaluation of the prompt template in the development environment only.

Uncomment the next cell to set up the prompt template and run evaluation in the development environment only.

In [6]:
# development_only_config = {
#     "prompt_setup": {
#         "problem_type": TaskType.RAG.value,
#         "context_fields": context_fields,
#     },
#     "development_project_id": PROJECT_ID,
#     "detached_prompt_template": {
#         "input_text": prompt_input,
#         "input_variables": [question_field] + context_fields,
#         "task_ids": [TaskType.RAG.value],
#     }
# }

# evaluator.e2e_prompt_evaluation(
#     config=development_only_config,
#     input_file_path=input_file_path,
# )

### Set up and Evaluate The Prompt Template In Production Environment Only

The development project details can be excluded from the configuration object to perform a complete end-to-end setup and evaluation of the prompt template in production environment only.

Uncomment the next cell to set up the prompt template and run evaluation in the production environment only.

In [None]:
# production_only_config = {
#     "prompt_setup": {
#         "problem_type": TaskType.RAG.value,
#         "context_fields": context_fields,
#     },
#     "production_space_id": SPACE_ID,
#     "detached_prompt_template": {
#         "input_text": prompt_input,
#         "input_variables": [question_field] + context_fields,
#         "task_ids": [TaskType.RAG.value],
#     },
# }

# evaluator.e2e_prompt_evaluation(
#     config=production_only_config,
#     input_file_path=input_file_path,
# )
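
Both single-stage configurations above differ from the full configuration only in which workspace key they keep, so they can also be derived from it. This is a minimal sketch under that assumption; `single_stage_config` is a hypothetical helper and the dictionary values are placeholders.

```python
# Workspace key to drop for each stage that should be kept.
KEY_TO_DROP = {
    "development": "production_space_id",
    "production": "development_project_id",
}


def single_stage_config(config, stage):
    """Return a shallow copy of config without the other stage's workspace key."""
    if stage not in KEY_TO_DROP:
        raise ValueError(f"unknown stage: {stage!r}")
    return {key: value for key, value in config.items() if key != KEY_TO_DROP[stage]}


full = {
    "prompt_setup": {"problem_type": "rag"},
    "development_project_id": "my-project-id",  # placeholder value
    "production_space_id": "my-space-id",       # placeholder value
}
print(sorted(single_stage_config(full, "development")))
# → ['development_project_id', 'prompt_setup']
```

The copy is shallow, so nested dictionaries such as `prompt_setup` are shared with the original configuration.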

### Promote Existing Prompt Template From a Project To a Space and Evaluate It

An existing prompt template can be promoted to a production space and evaluated by adding the `prompt_template_id` and the `development_project_id` to the configuration object.

Uncomment the following cell to promote an existing prompt template to a production space.

In [7]:
# This will use the prompt template created from the previous step. To use another prompt template, update the value of PROMPT_TEMPLATE_ID
# PROMPT_TEMPLATE_ID = evaluator.get_prompt_template_id()

# production_config = {
#     "prompt_setup": config["prompt_setup"],
#     "production_space_id": SPACE_ID,
#     "prompt_template_id": PROMPT_TEMPLATE_ID,
#     "development_project_id": PROJECT_ID,
# }

# evaluator.e2e_prompt_evaluation(
#     config=production_config,
#     input_file_path=input_file_path,
# )

## Display the metric results

### Display Generative AI Quality Metrics For Development Project

In [8]:
from ibm_watsonx_gov.entities.container import BaseMonitor

metrics = evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)


Metrics for generative_ai_quality



| | id | value | monitor_definition_id | ts |
|---|---|---|---|---|
| 0 | hap_input_score | 0.0 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 1 | rouge2 | 0.9 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 2 | records_processed | 10.0 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 3 | rougelsum | 0.9048 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 4 | hap_score | 0.0 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 5 | pii | 0.0 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 6 | bleu | 0.9206 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 7 | rougel | 0.9048 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 8 | exact_match | 0.9 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |
| 9 | unsuccessful_requests | 0.0 | generative_ai_quality | 2025-06-26T13:56:10.939993Z |


### Display Generative AI Quality Dataset Records For Development Project

In [9]:
records = evaluator.get_dataset_records(
    dataset_type="gen_ai_quality_metrics",
    environment=EvaluationStage.DEVELOPMENT,
    show_table=True,
)


Getting monitor data set records for data set type 'gen_ai_quality_metrics' from subscription id 0197ac84-dd0e-7b8c-a100-fc8bf7829d3d


Records from data set id 0197ac85-076d-74b8-88e8-840311c3c77d



| | hap_input_score | rouge2 | pii_entities | scoring_id | computed_on | scoring_timestamp | rougelsum | hap_score | pii_input_entities | pii | ... | run_id | pii_input_positions | pii_positions | rougel | exact_match | unsuccessful_requests | hap_score_entities | pii_input | hap_input_score_positions | rouge1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.001233 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-9 | feedback | 2025-06-26T13:55:17.142Z | 1.0 | 0.000451 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 1 | 0.000939 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-7 | feedback | 2025-06-26T13:55:17.141Z | 1.0 | 0.000606 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 2 | 0.001911 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-8 | feedback | 2025-06-26T13:55:17.141Z | 1.0 | 0.00038 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 3 | 0.002946 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-6 | feedback | 2025-06-26T13:55:17.138Z | 1.0 | 0.000707 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 4 | 0.001322 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-4 | feedback | 2025-06-26T13:55:17.137Z | 1.0 | 0.000481 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 5 | 0.002946 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-5 | feedback | 2025-06-26T13:55:17.137Z | 1.0 | 0.001931 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 6 | 0.001113 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-3 | feedback | 2025-06-26T13:55:17.136Z | 1.0 | 0.000299 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 7 | 0.002409 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-1 | feedback | 2025-06-26T13:55:17.135Z | 1.0 | 0.001202 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 8 | 0.001291 | 1.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-2 | feedback | 2025-06-26T13:55:17.135Z | 1.0 | 0.000668 | [Credential.Username] | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 1.0 | 1.0 | 0.0 | | 0.8 | | 1.0 |
| 9 | 0.001234 | 0.0 | [] | MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-0 | feedback | 2025-06-26T13:55:17.134Z | 0.0476 | 0.000467 | [Credential.Username, NationalNumber.Passport.... | 0.0 | ... | 691b183e-0d4f-4e30-85d3-1659144cf44b | | | 0.0476 | 0.0 | 0.0 | | 0.8 | | 0.0476 |
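
The record-level rows make it possible to find which inputs pulled the aggregate `exact_match` below 1.0. The sketch below assumes the records are available as a list of dictionaries (for a pandas DataFrame, `to_dict("records")` produces that shape; whether `get_dataset_records` returns a DataFrame is not confirmed here). `failed_exact_match` is a hypothetical helper, and the two rows are copied from the table above and reduced to three columns.

```python
def failed_exact_match(records):
    """Return the scoring ids of records whose exact_match score is zero."""
    return [row["scoring_id"] for row in records if row["exact_match"] == 0.0]


# Two rows copied from the output above, reduced to the columns used here.
rows = [
    {"scoring_id": "MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-9", "exact_match": 1.0, "rougel": 1.0},
    {"scoring_id": "MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-0", "exact_match": 0.0, "rougel": 0.0476},
]
print(failed_exact_match(rows))
# → ['MRM_7403169a-68f9-4978-aa15-5be699cfc6dc-0']
```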


### Display Generative AI Quality Metrics For Production Space

In [10]:
metrics = evaluator.get_monitor_metrics(
    monitor=BaseMonitor(monitor_name="generative_ai_quality"),
    environment=EvaluationStage.PRODUCTION,
    show_table=True,
)


Metrics for generative_ai_quality



| | id | value | monitor_definition_id | ts |
|---|---|---|---|---|
| 0 | hap_input_score | 0.0 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 1 | rouge2 | 0.9 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 2 | records_processed | 10.0 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 3 | rougelsum | 0.9048 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 4 | hap_score | 0.0 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 5 | pii | 0.0 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 6 | bleu | 0.9206 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 7 | rougel | 0.9048 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 8 | exact_match | 0.9 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
| 9 | unsuccessful_requests | 0.0 | generative_ai_quality | 2025-06-26T13:57:59.427294Z |
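
With metrics available for both stages, a quick comparison can show whether promotion changed any score. The following sketch is not a library feature: `metric_deltas` is a hypothetical helper, and the values are copied by hand from the development and production `generative_ai_quality` outputs in this notebook.

```python
def metric_deltas(baseline, candidate):
    """Return candidate minus baseline for every metric present in both mappings."""
    shared = sorted(baseline.keys() & candidate.keys())
    return {name: round(candidate[name] - baseline[name], 4) for name in shared}


# Values copied from the two metrics outputs in this notebook.
development = {"rouge2": 0.9, "rougelsum": 0.9048, "bleu": 0.9206, "rougel": 0.9048, "exact_match": 0.9}
production = {"rouge2": 0.9, "rougelsum": 0.9048, "bleu": 0.9206, "rougel": 0.9048, "exact_match": 0.9}

print(metric_deltas(development, production))
# → {'bleu': 0.0, 'exact_match': 0.0, 'rouge2': 0.0, 'rougel': 0.0, 'rougelsum': 0.0}
```

Identical scores are expected here because both stages were evaluated with the same prompt template and input file.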
