## Evaluating with the PromptFlow SDK

https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/flow-evaluate-sdk

### Prerequisites

To evaluate with AI-assisted metrics, you need:

- A test dataset in `.jsonl` format. See the next section for dataset requirements.
- A deployment of one of these models: GPT-3.5, GPT-4, or Davinci. For grounded responses with RAG, you also need an embedding model deployment.

GPT-4 models are recommended for the best evaluation capabilities.
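
A `.jsonl` file holds one JSON object per line. The sketch below parses such text and rejects lines that are not JSON objects. The helper name `parse_jsonl` is hypothetical, and the two sample rows are copied from the dataset examined later in this notebook.

```python
import json


def parse_jsonl(text):
    """Parse JSON Lines text into a list of dicts, skipping blank lines."""
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate empty lines, e.g. a trailing newline
        row = json.loads(line)
        if not isinstance(row, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        rows.append(row)
    return rows


# Two rows in the same shape as the evaluation dataset used below.
sample = "\n".join(
    [
        json.dumps(
            {
                "question": "Which tent is the most waterproof?",
                "truth": "The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m",
            }
        ),
        json.dumps(
            {"question": "How much does TrailWalker Hiking Shoes cost? ", "truth": "$110"}
        ),
    ]
)

rows = parse_jsonl(sample)
print(len(rows), "rows parsed")
```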


## Add steps to create an Azure OpenAI resource and deploy a model

### Install SDK
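
The cells below import from `promptflow` and `dotenv`. As an assumption based on those imports, the evaluators ship in the `promptflow-evals` package and `load_dotenv` comes from `python-dotenv`; check the linked documentation for the current package names. A small sketch that reports which top-level modules are not importable yet (`missing_modules` is a hypothetical helper):

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found on the import path."""
    return [name for name in names if find_spec(name) is None]


# Top-level modules imported by the cells in this notebook.
required = ["promptflow", "dotenv"]
missing = missing_modules(required)
print("missing:", missing if missing else "none")
```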

### Prepare config files

#### Create a .env file containing secrets
```
SUBSCRIPTION_ID=
RESOURCE_GROUP_NAME=
PROJECT_NAME=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_API_KEY=
AZURE_OPENAI_API_VERSION=
AZURE_OPENAI_EVALUATION_DEPLOYMENT=
AISTUDIO_AOAI_CONNECTION_NAME=
```

In [4]:
from dotenv import load_dotenv
load_dotenv('../.env', override=True)

True
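
`os.environ.get` returns `None` for a variable that is not set, so a missing or misnamed entry in `.env` only shows up later as an authentication or configuration error. A minimal sketch that checks the names read by the cells in this notebook up front (`missing_vars` and `REQUIRED_VARS` are hypothetical names):

```python
import os

# Variable names read via os.environ.get in the cells of this notebook.
REQUIRED_VARS = [
    "SUBSCRIPTION_ID",
    "RESOURCE_GROUP_NAME",
    "PROJECT_NAME",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_EVALUATION_DEPLOYMENT",
]


def missing_vars(names, env=None):
    """Return the names that are unset or empty in env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


print("missing variables:", missing_vars(REQUIRED_VARS))
```

An empty list means every variable was loaded. Any name that is printed needs to be added to `.env` or renamed there.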

### Initialise Azure OpenAI connection

#### Make sure the user is an Azure OpenAI contributor
https://learn.microsoft.com/en-us/azure/ai-studio/concepts/rbac-ai-studio#scenario-use-an-existing-azure-openai-resource

In [13]:
import os
from promptflow.core import AzureOpenAIModelConfiguration
from promptflow.entities import AzureOpenAIConnection
# Initialize Azure OpenAI Connection with your environment variables
model_config = AzureOpenAIModelConfiguration(
    azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    azure_deployment=os.environ.get("AZURE_OPENAI_EVALUATION_DEPLOYMENT"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
)

# #Initialize Azure OpenAI Connection with your environment variables
# model_config = AzureOpenAIModelConfiguration(
#     azure_endpoint=os.environ.get("AZURE_OPENAI_ENDPOINT"),
#     azure_deployment=os.environ.get("AZURE_OPENAI_EVALUATION_DEPLOYMENT"),
#     api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
# )

# Initialize Azure OpenAI Connection using AI Studio connection
# from promptflow.rag.config import ConnectionConfig

# model_connect_config = ConnectionConfig(
#     subscription_id = os.environ.get("SUBSCRIPTION_ID"),
#     resource_group_name = os.environ.get("RESOURCE_GROUP_NAME"),
#     workspace_name = os.environ.get("PROJECT_NAME"),
#     connection_name = "mssecureai4034688619"
# )
# model_connect = AzureOpenAIConnection(
#     api_base = os.environ.get("AZURE_OPENAI_ENDPOINT"),
#     auth_mode = "meid_token"
# )
model_connect = AzureOpenAIConnection(
    name=os.environ.get("AISTUDIO_AOAI_CONNECTION_NAME"),
    api_base=os.environ.get("AZURE_OPENAI_ENDPOINT"),
    api_type="azure",
    api_key=os.environ.get("AZURE_OPENAI_API_KEY"),
    api_version=os.environ.get("AZURE_OPENAI_API_VERSION"),
)

# model_config = AzureOpenAIModelConfiguration(
#     azure_deployment="gpt-4o",
#     connection=model_connect
# )

In [2]:
# # try using connection config
# # from promptflow.rag.config import ConnectionConfig

# model_connect_config = ConnectionConfig(
#     subscription_id = os.environ.get("SUBSCRIPTION_ID"),
#     resource_group_name = os.environ.get("RESOURCE_GROUP_NAME"),
#     workspace_name = os.environ.get("PROJECT_NAME"),
#     connection_name = "mssecureai4034688619"
# )

# model_connect = AzureOpenAIModelConfiguration.from_connection(model_connect_config)

In [6]:
from promptflow.evals.evaluators import RelevanceEvaluator

# Initializing the Relevance Evaluator
relevance_eval = RelevanceEvaluator(model_config)
# Running Relevance Evaluator on single input row
relevance_score = relevance_eval(
    answer="The Alpine Explorer Tent is the most waterproof.",
    context="From our product list,"
    " the alpine explorer tent is the most waterproof."
    " The Adventure Dining Table has higher weight.",
    question="Which tent is the most waterproof?",
)
print(relevance_score)

{'gpt_relevance': 5.0}


#### Test Risk and Safety Evaluators
A GPT deployment is not required for these evaluators; they use the Azure AI Studio safety evaluations back-end service instead.

Note: Risk and safety metrics are only available in the following regions: East US 2, France Central, UK South, Sweden Central.

***Groundedness measurement leveraging Azure AI Content Safety Groundedness Detection is only supported in the following regions: East US 2 and Sweden Central.***

See [region-availability](https://learn.microsoft.com/en-us/azure/ai-services/content-safety/overview#region-availability) for details.
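
The region constraints above can be captured in a small lookup, which is handy for failing fast before calling the safety evaluators. This is only a sketch: the helper names are hypothetical, the region identifiers are the display names above with spaces removed and lowercased, and the supported lists may change, so the linked page remains the source of truth.

```python
# Regions listed above, normalised to lowercase without spaces.
SAFETY_REGIONS = {"eastus2", "francecentral", "uksouth", "swedencentral"}
GROUNDEDNESS_REGIONS = {"eastus2", "swedencentral"}


def normalise_region(region):
    """Turn 'East US 2' or 'eastus2' into the same key: 'eastus2'."""
    return region.replace(" ", "").lower()


def supports_safety_metrics(region):
    """True if risk and safety metrics are listed as available in the region."""
    return normalise_region(region) in SAFETY_REGIONS


def supports_groundedness_detection(region):
    """True if Groundedness Detection is listed as supported in the region."""
    return normalise_region(region) in GROUNDEDNESS_REGIONS


print(supports_safety_metrics("UK South"))
print(supports_groundedness_detection("UK South"))
```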

In [5]:
# from azure.identity import DefaultAzureCredential, get_bearer_token_provider
# token_provider = get_bearer_token_provider(
#     DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
# )

In [7]:
# define the Azure AI Studio connection
azure_ai_project = {
    "subscription_id": os.environ.get("SUBSCRIPTION_ID"),
    "resource_group_name": os.environ.get("RESOURCE_GROUP_NAME"),
    "project_name": os.environ.get("PROJECT_NAME"),
    #"credential": token_provider,
}


#### Test it out

In [8]:
from promptflow.evals.evaluators import ViolenceEvaluator

# Initializing the Violence Evaluator with project information
violence_eval = ViolenceEvaluator(azure_ai_project)
# Running Violence Evaluator on single input row
violence_score = violence_eval(question="What is the capital of France?", answer="Paris.")
print(violence_score)

[2024-08-02 15:56:14 +1000][flowinvoker][INFO] - Getting connections from pf client with provider from args: local...
[2024-08-02 15:56:14 +1000][flowinvoker][INFO] - Promptflow get connections successfully. keys: dict_keys([])
[2024-08-02 15:56:14 +1000][flowinvoker][INFO] - Promptflow executor starts initializing...
[2024-08-02 15:56:14 +1000][flowinvoker][INFO] - Promptflow executor initiated successfully.
[2024-08-02 15:56:14 +1000][flowinvoker][INFO] - Validating flow input with data {'metric_name': 'violence', 'question': 'What is the capital of France?', 'answer': 'Paris.', 'project_scope': {'subscription_id': '3c8972d9-f541-46b2-b70b-d81baba3595d', 'resource_group_name': 'secure-ai-rg', 'project_name': 'krbock-0635'}, 'credential': None}
[2024-08-02 15:56:14 +1000][flowinvoker][INFO] - Execute flow with data {'metric_name': 'violence', 'question': 'What is the capital of France?', 'answer': 'Paris.', 'project_scope': {'subscription_id': '3c8972d9-f541-46b2-b70b-d81baba3595d', '

2024-08-02 15:56:14 +1000   32528 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:56:14 +1000   32528 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:56:14 +1000   32528 execution.flow     INFO     Executing node validate_inputs. node run id: fbefce61-e223-4a1a-ab12-85d32dbc9a42_validate_inputs_71b169ec-ea5a-4422-8bb3-a441165f794f
2024-08-02 15:56:14 +1000   32528 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:56:14 +1000   32528 execution.flow     INFO     The node 'evaluate_with_rai_service' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:56:14 +1000   32528 execution.flow     INFO     Executing node evaluate_with_rai_service. node run id: fbefce61-e223-4a1a-ab12-85d32dbc9a42_evaluate_with_rai_service_a7d67dee-872d-4c5f-868b-7ad5038070bf
2024-08-02 15:56:25 +1000   32528 execution.flow     INFO     Node ev

### Examine local dataset

In [8]:
import json
def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

mydata = load_jsonl('../data/evaluation_dataset.jsonl')
mydata[:10]

[{'question': 'Which tent is the most waterproof?',
  'truth': 'The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m'},
 {'question': 'Which camping table holds the most weight?',
  'truth': 'The Adventure Dining Table has a higher weight capacity than all of the other camping tables mentioned'},
 {'question': 'How much does TrailWalker Hiking Shoes cost? ',
  'truth': '$110'},
 {'question': 'What is the proper care for trailwalker hiking shoes? ',
  'truth': 'After each use, remove any dirt or debris by brushing or wiping the shoes with a damp cloth.'},
 {'question': 'What brand is for TrailMaster tent? ',
  'truth': 'OutdoorLiving'},
 {'question': 'How do I carry the TrailMaster tent around? ',
  'truth': ' Carry bag included for convenient storage and transportation'},
 {'question': 'What is the floor area for Floor Area? ',
  'truth': '80 square feet'},
 {'question': 'What is the material for TrailBlaze Hiking Pants',
  'truth': 'Made of high-quality nylon fa

In [10]:
import os
# create directory for output
output_dir = '../data/evaluate'
os.makedirs(output_dir, exist_ok=True)

In [9]:
# callable function that invokes Azure OpenAI.  For use as target in evaluator.
from genai.llm import llm_tool
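
`genai.llm` is a local module that is not shown in this notebook. For readers without it, the sketch below is a hypothetical stand-in. It assumes that a target is a callable which receives dataset columns as keyword arguments and returns a dict, and that the `answer` key is what the `${target.answer}` mapping used later refers to.

```python
def stub_target(question: str) -> dict:
    """Hypothetical stand-in for llm_tool: returns a canned answer without calling a model."""
    return {"answer": f"(stub) You asked: {question.strip()}"}


print(stub_target(question="Which tent is the most waterproof?"))
```

A stub like this makes it possible to exercise the evaluation plumbing without spending tokens; swap in the real callable once the wiring works.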

Each built-in evaluator requires a different set of inputs:

| Evaluator | question | answer | context | ground_truth |
| --- | --- | --- | --- | --- |
| GroundednessEvaluator | N/A | Required: String | Required: String | N/A |
| RelevanceEvaluator | Required: String | Required: String | Required: String | N/A |
| CoherenceEvaluator | Required: String | Required: String | N/A | N/A |
| FluencyEvaluator | Required: String | Required: String | N/A | N/A |
| SimilarityEvaluator | Required: String | Required: String | N/A | Required: String |
| F1ScoreEvaluator | N/A | Required: String | N/A | Required: String |
| ViolenceEvaluator | Required: String | Required: String | N/A | N/A |
| SexualEvaluator | Required: String | Required: String | N/A | N/A |
| SelfHarmEvaluator | Required: String | Required: String | N/A | N/A |
| HateUnfairnessEvaluator | Required: String | Required: String | N/A | N/A |
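
The table above can be turned into a pre-flight check that reports which required string inputs a row is missing for a given evaluator. This is a sketch, not part of the SDK: `REQUIRED_INPUTS` and `missing_inputs` are hypothetical names, and the field sets are transcribed from the table.

```python
# Required string inputs per evaluator, transcribed from the table above.
REQUIRED_INPUTS = {
    "GroundednessEvaluator": {"answer", "context"},
    "RelevanceEvaluator": {"question", "answer", "context"},
    "CoherenceEvaluator": {"question", "answer"},
    "FluencyEvaluator": {"question", "answer"},
    "SimilarityEvaluator": {"question", "answer", "ground_truth"},
    "F1ScoreEvaluator": {"answer", "ground_truth"},
    "ViolenceEvaluator": {"question", "answer"},
    "SexualEvaluator": {"question", "answer"},
    "SelfHarmEvaluator": {"question", "answer"},
    "HateUnfairnessEvaluator": {"question", "answer"},
}


def missing_inputs(evaluator_name, row):
    """Return the sorted required fields that are absent or not strings in row."""
    present = {key for key, value in row.items() if isinstance(value, str)}
    return sorted(REQUIRED_INPUTS[evaluator_name] - present)


row = {"question": "Which tent is the most waterproof?", "answer": "The Alpine Explorer Tent."}
for name in ("CoherenceEvaluator", "SimilarityEvaluator", "RelevanceEvaluator"):
    print(name, "missing:", missing_inputs(name, row))
```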

### Run a QA evaluation against AI Studio to ensure the connection is working

In [10]:
from promptflow.evals.evaluators import (
    CoherenceEvaluator,
    RelevanceEvaluator,
    GroundednessEvaluator,
    FluencyEvaluator,
    SimilarityEvaluator,
    F1ScoreEvaluator,
)

coherence_eval = CoherenceEvaluator(model_config=model_config)
relevance_eval = RelevanceEvaluator(model_config=model_config)
groundedness_eval = GroundednessEvaluator(model_config=model_config)
fluency_eval = FluencyEvaluator(model_config=model_config)
similarity_eval = SimilarityEvaluator(model_config=model_config)
f1score_eval = F1ScoreEvaluator()
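
Unlike the other evaluators, `F1ScoreEvaluator` is constructed without a model configuration, since an F1 score can be computed from token overlap between the answer and the ground truth. The sketch below shows that idea under simplifying assumptions (lowercasing and whitespace tokenisation); `token_f1` is a hypothetical helper and the SDK's own normalisation may differ, so its scores need not match exactly.

```python
from collections import Counter


def token_f1(answer, ground_truth):
    """Token-level F1 between two strings using lowercase whitespace tokens."""
    pred_tokens = answer.lower().split()
    gold_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


print(token_f1("the tent is khaki", "Khaki"))
```

A long answer that contains the short ground truth scores full recall but low precision, which is why verbose model answers tend to receive a low F1 even when they are correct.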


In [10]:
from promptflow.evals.evaluate import evaluate

result = evaluate(
    evaluation_name="rai-workshop-test",  # name your evaluation to view it in AI Studio
    data='../data/evaluation_dataset.jsonl',  # path to your dataset; must be a string
    target=llm_tool,
    evaluators={
        #"relevance": relevance_eval,
        "coherence": coherence_eval,
        #"groundedness": groundedness_eval,
        "fluency": fluency_eval,
        "similarity": similarity_eval,
        "f1score": f1score_eval

    },
    # column mapping
    evaluator_config={
        "default": {
            "question": "${data.question}",  # dataset column providing the input to the model
            #"context": "${data.context}",  # dataset column providing the context for each input
            "answer": "${target.answer}",  # output returned by the target
            "ground_truth": "${data.truth}"  # dataset column providing the ground-truth answer; optional for default metrics
        }
    },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI studio project
    azure_ai_project = azure_ai_project,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path=output_dir
)
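
The column mapping in `evaluator_config` uses `${data.<column>}` for values taken from the dataset and `${target.<key>}` for values returned by the target. A malformed expression, such as a missing closing brace, is easy to overlook. The sketch below shows how such expressions could be validated and resolved; the regular expression and `resolve_mapping` are hypothetical illustrations, not the SDK's implementation.

```python
import re

# Accepts exactly "${data.<name>}" or "${target.<name>}".
MAPPING_RE = re.compile(r"^\$\{(data|target)\.([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_mapping(expr, data_row, target_output):
    """Look up the value an expression refers to, or raise ValueError if malformed."""
    match = MAPPING_RE.match(expr)
    if match is None:
        raise ValueError(f"malformed mapping expression: {expr!r}")
    source, column = match.groups()
    scope = data_row if source == "data" else target_output
    return scope[column]


data_row = {"question": "How much does TrailWalker Hiking Shoes cost? ", "truth": "$110"}
target_output = {"answer": "I don't have real-time access to current prices."}

print(resolve_mapping("${data.truth}", data_row, target_output))
print(resolve_mapping("${target.answer}", data_row, target_output))
```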

[2024-08-02 15:46:12 +1000][promptflow._sdk._orchestrator.run_submitter][INFO] - Upload run to cloud: True


Prompt flow service has started...
You can view the traces in local from http://127.0.0.1:23333/v1.0/ui/traces/?#run=n0_prerequisites_20240802_154612_455515
You can view the traces in azure portal since trace destination is set to: azureml://subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635. The link will be printed once the run is finished.


[2024-08-02 15:46:16 +1000][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run n0_prerequisites_20240802_154612_455515, log path: /home/krbock/.promptflow/.runs/n0_prerequisites_20240802_154612_455515/logs.txt


2024-08-02 15:46:39 +1000   30491 execution.bulk     INFO     Process 30521 terminated.
2024-08-02 15:46:39 +1000   30491 execution.bulk     INFO     Process 30533 terminated.


[2024-08-02 15:46:40 +1000][promptflow._sdk._orchestrator.run_submitter][INFO] - Uploading run 'n0_prerequisites_20240802_154612_455515' to cloud...
[2024-08-02 15:47:00 +1000][promptflow._sdk._orchestrator.run_submitter][INFO] - Updating run 'n0_prerequisites_20240802_154612_455515' portal url to 'https://ai.azure.com/projectflows/trace/run/n0_prerequisites_20240802_154612_455515/details?wsid=/subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourcegroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635'.


Portal url: https://ai.azure.com/projectflows/trace/run/n0_prerequisites_20240802_154612_455515/details?wsid=/subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourcegroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635
2024-08-02 15:46:16 +1000   29780 execution.bulk     INFO     Current thread is not main thread, skip signal handler registration in BatchEngine.
2024-08-02 15:46:16 +1000   29780 execution.bulk     INFO     Set process count to 4 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 13}.
2024-08-02 15:46:24 +1000   29780 execution.bulk     INFO     Process name(ForkProcess-4:4)-Process id(30533)-Line number(0) start execution.
2024-08-02 15:46:24 +1000   29780 execution.bulk     INFO     Process name(ForkProcess-4:1)-Process id(30516)-Line number(1) start execution.
2024-08-02 15:46:24 +1000   29780 execution.bulk     INFO     Process name(ForkProcess-4:2)-Process id(30521)-Line number(2) start exe

[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Getting connections from pf client with provider from args: local...
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Getting connections from pf client with provider from args: local...
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Promptflow get connections successfully. keys: dict_keys([])
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Promptflow get connections successfully. keys: dict_keys([])
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Promptflow executor starts initializing...
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Promptflow executor starts initializing...
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Promptflow executor initiated successfully.
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Promptflow executor initiated successfully.
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "When looking for a camping table that holds the most weight, you'll want to consi

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "When looking for a highly waterproof tent, you'll want to consider the hydrostatic head rating, which measures the waterproofness of the tent fabric. A higher rating indicates better water resistance. Here are some key features to look for in a waterproof tent:\n\n1. **Hydrostatic Head Rating**: A rating of 1500 mm to 3000 mm is generally considered waterproof for tents. However, for heavy rain conditions, look for ratings above 3000 mm.\n\n2. **Material**: High-quality polyester or nylon with a durable waterproof coating (like polyurethane or silicone) tends to be more waterproof.\n\n3. **Seam Sealing**: Ensure the tent has fully taped or sealed seams to prevent water from leaking through the stitching.\n\n4. **Design Features**: Look for tents with a full-coverage rainfly, bathtub-style floors that extend a few inches up the walls, and well-ventilated designs to reduce condensation.\n\nSome p

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "When looking for a highly waterproof tent, you'll want to consider the hydrostatic head rating, which measures the waterproofness of the tent fabric. A higher rating indicates better water resistance. Here are some key features to look for in a waterproof tent:\n\n1. **Hydrostatic Head Rating**: A rating of 1500 mm to 3000 mm is generally considered waterproof for tents. However, for heavy rain conditions, look for ratings above 3000 mm.\n\n2. **Material**: High-quality polyester or nylon with a durable waterproof coating (like polyurethane or silicone) tends to be more waterproof.\n\n3. **Seam Sealing**: Ensure the tent has fully taped or sealed seams to prevent water from leaking through the stitching.\n\n4. **Design Features**: Look for tents with a full-coverage rainfly, bathtub-style floors that extend a few inches up the walls, and well-ventilated designs to reduce condensation.\n\nSome popular wa

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: cda7e023-24f3-405a-8878-3bd06eff0b64_validate_inputs_3fbcc017-c1ec-45e3-8588-dafe6026bd38
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: cda7e023-24f3-405a-8878-3bd06eff0

[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "I don't have real-time access to current prices, but the cost of TrailWalker Hiking Shoes can vary based on the retailer, model, and any ongoing promotions or discounts. For the most accurate and up-to-date pricing, I recommend checking the official website of the brand, online marketplaces like Amazon or eBay, or visiting a local outdoor or sporting goods store. Is there a specific model or retailer you're interested in?", 'ground_truth': '$110'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: 68d191c7-2f5a-4bb0-9207-8cccffee5dd5_compute_f1_score_d3137d17-087c-4942-b078-32352a3d1d05


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "I don't have real-time access to current prices, but the cost of TrailWalker Hiking Shoes can vary based on the retailer, model, and any ongoing promotions or discounts. For the most accurate and up-to-date pricing, I recommend checking the official website of the brand, online marketplaces like Amazon or eBay, or visiting a local outdoor or sporting goods store. Is there a specific model or retailer you're interested in?", 'ground_truth': '$110'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node compute_f1_score completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "Taking proper care of your trailwalker hiking shoes will help extend their lifespan and maintain their performance. Here are some tips for caring for your hiking shoes:\n\n1. **Cleaning**:\n   - **After Each Hike**: Remove dirt and debris by brushing off the shoes with a soft brush or cloth.\n   - **Deep Cleaning**: If the shoes are very dirty, use a mixture of water and mild soap to scrub them gently. Avoid using harsh chemicals or detergents as they can damage the materials.\n   - **Rinse**: Rinse the shoes thoroughly with clean water to remove any soap residue.\n\n2. **Drying**:\n   - **Air Dry**: Let the shoes air dry naturally in a well-ventilated area away from direct sunlight and heat sources (like radiators or heaters), which can cause the materials to crack or warp.\n   - **Remove Insoles and Laces**: Take out the insoles and laces to allow all parts of the shoe to dry completely.\n   

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "Taking proper care of your trailwalker hiking shoes will help extend their lifespan and maintain their performance. Here are some tips for caring for your hiking shoes:\n\n1. **Cleaning**:\n   - **After Each Hike**: Remove dirt and debris by brushing off the shoes with a soft brush or cloth.\n   - **Deep Cleaning**: If the shoes are very dirty, use a mixture of water and mild soap to scrub them gently. Avoid using harsh chemicals or detergents as they can damage the materials.\n   - **Rinse**: Rinse the shoes thoroughly with clean water to remove any soap residue.\n\n2. **Drying**:\n   - **Air Dry**: Let the shoes air dry naturally in a well-ventilated area away from direct sunlight and heat sources (like radiators or heaters), which can cause the materials to crack or warp.\n   - **Remove Insoles and Laces**: Take out the insoles and laces to allow all parts of the shoe to dry completely.\n   - **Stuff

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: 8a131f1b-fc2b-4280-8c25-41c4057ca478_validate_inputs_6231dbc1-5a23-4b99-9538-8a327c95cebe
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: 8a131f1b-fc2b-4280-8c25-41c4057ca

[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': 'TrailMaster tents are often associated with the brand Wenzel. Wenzel is known for producing a variety of outdoor gear, including tents, and the TrailMaster model is one of their offerings. Is there anything specific you would like to know about the TrailMaster tent or Wenzel products in general?', 'ground_truth': 'OutdoorLiving'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': 'TrailMaster tents are often associated with the brand Wenzel. Wenzel is known for producing a variety of outdoor gear, including tents, and the TrailMaster model is one of their offerings. Is there anything specific you would like to know about the TrailMaster tent or Wenzel products in general?', 'ground_truth': 'OutdoorLiving'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: adb678d9-3c6b-4f58-ac4d-aa73b5d22fa6_compute_f1_score_c9a8424f-0abb-4418-9323-84836109d3c9
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node compute_f1_score completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': 'Carrying the TrailMaster tent is straightforward with a few steps:\n\n1. **Disassemble and Pack**: After using the tent, ensure it is fully disassembled. Remove all stakes, poles, and other components, and fold the tent fabric neatly.\n\n2. **Storage Bag**: Most TrailMaster tents come with a storage bag. Carefully place the folded tent fabric, poles, stakes, and any additional components into the bag.\n\n3. **Secure the Bag**: Once everything is inside, securely close the bag, ensuring the zippers or drawstrings are fastened to prevent anything from falling out.\n\n4. **Use Carrying Straps**: The storage bag typically has carrying straps. Use these straps to carry the bag over your shoulder or by hand.\n\n5. **Distribute Weight**: If you are hiking or traveling a long distance, consider distributing the weight of the tent among your group members or within your backpack to make it easier to car

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: cb25c0f6-60cf-4ef2-8009-ce7caa4ed057_validate_inputs_db1211f0-d6a3-4b15-8724-7bba21927b98
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNod

[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': 'It looks like you\'re asking about the floor area, but I\'m not sure if you\'re referring to a specific context or if there\'s a typo. Could you clarify what you mean by "Floor Area"? Are you asking how to calculate the floor area of a room, building, or something else?', 'ground_truth': '80 square feet'}
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': 'It looks like you\'re asking about the floor area, but I\'m not sure if you\'re referring to a specific context or if there\'s a typo. Could you clarify what you mean by "Floor Area"? Are you asking how to calculate the floor area of a room, building, or something else?', 'ground_truth': '80 square feet'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: c7cd1c16-a210-4f0d-84b4-5cd597e818d1_compute_f1_score_90db4ae2-33e5-487a-80dd-25122feccdaa
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node compute_f1_score completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "TrailBlaze Hiking Pants are typically made from a blend of materials designed to offer durability, flexibility, and comfort. Common materials include nylon, polyester, and spandex. Nylon provides durability and resistance to abrasion, polyester offers moisture-wicking properties, and spandex adds stretch for ease of movement. The exact material composition can vary depending on the specific model and brand, so it's always a good idea to check the product details for the most accurate information.", 'ground_truth': 'Made of high-quality nylon fabric'}
[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "TrailBlaze Hiking Pants are typically made from a blend of materials designed to offer durability, flexibility, and comfort. Common materials include nylon, polyester, and spandex. Nylon provides durability and resistance to abrasion, polyester offers moisture-wicki

2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: 1defb51a-299f-49ad-94df-5f33c2e6053f_validate_inputs_2a8a4746-5bac-412a-b121-c16061c07d71
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node

[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "I'm not sure about the specific product details of TrailBlaze Hiking Pants. However, hiking pants in general often come in a variety of colors like khaki, olive green, black, navy blue, and sometimes even more vibrant colors like red or orange. To get the most accurate information, you might want to check the manufacturer's website or a retailer that sells them.", 'ground_truth': 'Khaki'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node compute_f1_score completes.


[2024-08-02 15:47:03 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "I'm not sure about the specific product details of TrailBlaze Hiking Pants. However, hiking pants in general often come in a variety of colors like khaki, olive green, black, navy blue, and sometimes even more vibrant colors like red or orange. To get the most accurate information, you might want to check the manufacturer's website or a retailer that sells them.", 'ground_truth': 'Khaki'}


2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: 3543175e-3f38-41ac-a810-9b060eb4243b_validate_inputs_05a264c7-39c4-4328-857e-dc7cbef54e9c
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:47:03 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: 3543175e-3f38-41ac-a810-9b060eb42

[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "Warranty policies can vary depending on the manufacturer or retailer from which you purchased the TrailBlaze pants. Generally, most warranties are non-transferable and apply only to the original purchaser. However, it's always a good idea to check the specific warranty terms provided by the manufacturer or retailer. You can usually find this information on their website or in the documentation that came with the product. If you're still unsure, contacting their customer service directly would be the best way to get a definitive answer.", 'ground_truth': 'he warranty is non-transferable and applies only to the original purchaser of the TrailBlaze Hiking Pants. It is valid only when the product is purchased from an authorized retailer.'}
[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "Warranty policies can vary depending on the manufacturer or retailer from whi

2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: d07f08f0-4102-46bd-ac14-b3f5df3e2c33_validate_inputs_3b98460d-b6cc-4f7a-a1d7-db383fee0b82
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Node validate_inputs completes.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: d07f08f0-4102-46bd-ac14-b3f5df3e2

[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "The warranty period for TrailBlaze pants can vary depending on the brand and retailer you purchased them from. Most outdoor clothing brands offer a warranty ranging from one year to a lifetime against manufacturing defects. It's best to check the warranty information provided by the specific brand or retailer where you bought the pants. If you have that information on hand, I can help you look it up!", 'ground_truth': ' The TrailBlaze Hiking Pants are backed by a 1-year limited warranty from the date of purchase.'}
[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "The warranty period for TrailBlaze pants can vary depending on the brand and retailer you purchased them from. Most outdoor clothing brands offer a warranty ranging from one year to a lifetime against manufacturing defects. It's best to check the warranty information provided by the specific brand or 

2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: 49349024-b5a3-4fdd-8dfd-2ab2a36e3234_validate_inputs_822f0801-d089-43dd-a6ff-69caeb4e1f9f
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Node validate_inputs completes.


[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "The PowerBurner Camping Stove is typically made from a combination of durable materials designed to withstand high temperatures and outdoor conditions. Common materials include stainless steel for the burner and body, aluminum for lightweight components, and heat-resistant plastics for handles and knobs. These materials ensure the stove is both robust and portable, making it ideal for camping and outdoor use. Always check the specific model's specifications for precise material details.", 'ground_truth': 'Stainless Steel'}
[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "The PowerBurner Camping Stove is typically made from a combination of durable materials designed to withstand high temperatures and outdoor conditions. Common materials include stainless steel for the burner and body, aluminum for lightweight components, and heat-resistant plastics for handles

2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     The node 'compute_f1_score' will be executed because the activate condition is met, i.e. '${validate_inputs.output}' is equal to 'True'.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.


[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Validating flow input with data {'answer': "Yes, that's correct! France is a country located in Western Europe. It's known for its rich history, culture, cuisine, and landmarks such as the Eiffel Tower and the Louvre Museum. Is there something specific you'd like to know about France?", 'ground_truth': 'Sorry, I can only truth questions related to outdoor/camping gear and equipment'}
[2024-08-02 15:47:06 +1000][flowinvoker][INFO] - Execute flow with data {'answer': "Yes, that's correct! France is a country located in Western Europe. It's known for its rich history, culture, cuisine, and landmarks such as the Eiffel Tower and the Louvre Museum. Is there something specific you'd like to know about France?", 'ground_truth': 'Sorry, I can only truth questions related to outdoor/camping gear and equipment'}


2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Executing node compute_f1_score. node run id: 49349024-b5a3-4fdd-8dfd-2ab2a36e3234_compute_f1_score_b2515c57-5ee0-467f-a674-83abf9d7f546
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Node compute_f1_score completes.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start executing nodes in thread pool mode.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Current thread is not main thread, skip signal handler registration in AsyncNodesScheduler.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Start to run 2 nodes with concurrency level 16.
2024-08-02 15:47:06 +1000   29780 execution.flow     INFO     Executing node validate_inputs. node run id: 59d474ea-8fd2-4e51-a65f-8ca367a0019e_validate_inputs_07d71376-6257-4278-8a38-ea2720fc0a4e
2024-08-02 15:47:06 +1000   29

### Create a Retrieval Augmented Generation (RAG) application using the PromptFlow SDK

We will use the RAG pattern and validate the model's answers against ground truth.

#### Data
This sample uses files from the `data/` folder in this repo. Clone the repo or copy the folder so the files are available when you run the sample.

In [12]:
# from earlier step
# import json
# def load_jsonl(path):
#     with open(path, "r") as f:
#         return [json.loads(line) for line in f.readlines()]

#mydata = load_jsonl('../data/evaluation_dataset.jsonl')

mydata[:2]

[{'question': 'Which tent is the most waterproof?',
  'truth': 'The Alpine Explorer Tent has the highest rainfly waterproof rating at 3000m'},
 {'question': 'Which camping table holds the most weight?',
  'truth': 'The Adventure Dining Table has a higher weight capacity than all of the other camping tables mentioned'}]

#### Create a local FAISS index from your local files
https://learn.microsoft.com/en-us/azure/ai-studio/how-to/develop/index-build-consume-sdk

Install the FAISS package with one of:
```
pip install faiss-gpu  # for a CUDA-supported GPU
pip install faiss-cpu  # otherwise (depending on Python version)
```
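
Before building the index, it can help to confirm that a FAISS package is importable in the active environment. The following is a minimal sketch using only the standard library; the helper name `faiss_available` is made up for this example and is not part of any SDK.

```python
import importlib.util


def faiss_available() -> bool:
    """Return True if a module named `faiss` can be found on the import path."""
    # Both faiss-cpu and faiss-gpu install a top-level module called `faiss`.
    return importlib.util.find_spec("faiss") is not None


if not faiss_available():
    print("FAISS not found: install faiss-cpu or faiss-gpu before building the index")
```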

In [11]:
# connect to the AI Studio project
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

client=MLClient(
    DefaultAzureCredential(), 
    subscription_id=os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name=os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name=os.environ.get("PROJECT_NAME") 
)

#### Use AI Studio's Azure OpenAI connection

In [14]:
from promptflow.rag.config import ConnectionConfig
# embedding_model_config = ConnectionConfig(
#     subscription_id = os.environ.get("SUBSCRIPTION_ID"),
#     resource_group_name = os.environ.get("RESOURCE_GROUP_NAME"),
#     workspace_name = os.environ.get("PROJECT_NAME"),
#     connection_name = "Default_AzureOpenAI"
# )

embedding_model_config = ConnectionConfig(
    subscription_id = os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name = os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name = os.environ.get("PROJECT_NAME"),
    connection_name = os.environ.get("AISTUDIO_AOAI_CONNECTION_NAME"),
)

In [15]:
from promptflow.rag.config import LocalSource, EmbeddingsModelConfig
from promptflow.rag import build_index

faiss_index_name = "product-info-faiss-index"
embedding_output_dir = "../data"

# build the index
faiss_index=build_index(
    name=faiss_index_name,  # name of your index
    vector_store="faiss",  # the type of vector store
    embeddings_model_config=EmbeddingsModelConfig(
        model_name=os.getenv("AZURE_OPENAI_EMBEDDING_MODEL"),
        deployment_name=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
        connection_config=embedding_model_config
    ),
    input_source=LocalSource(input_data="../data/product-info/"),  # the location of your file/folders
    #index_config=LocalSource(input_data="../data/product-info/"
        #ai_search_index_name="<your-index-name>" + "-aoai-store", # the name of the index store inside the azure ai search service
    #),
    tokens_per_chunk = 800, # Optional field - Maximum number of tokens per chunk
    token_overlap_across_chunks = 0, # Optional field - Number of tokens to overlap between chunks
    embeddings_cache_path=embedding_output_dir, # Optional field - Path to store embeddings cache
)

Crack and chunk files from local path: ../data/product-info/


INFO:azureml.rag.connections:Using ml_client base_url: https://management.azure.com, original_base_url: https://management.azure.com.
INFO:azureml.rag.connections:Parsed Connection: /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureai4034688619_aoai
INFO:azureml.rag.connections:Got connection: /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureai4034688619_aoai as <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'>.
INFO:azureml.rag.connections:Getting workspace connection: mssecureai4034688619_aoai, with input credential: <class 'NoneType'>.
INFO:azureml.rag.connections:Getting workspace connection via MLClient with auth: <class 'azure.identity._credentials.default.DefaultAzureCredential'>, subscription_id: 3c897

Start embedding using connection with id = /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureai4034688619_aoai


INFO:azureml.rag.connections:Parsed Connection: /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureai4034688619_aoai
INFO:azureml.rag.connections:Got connection: /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureai4034688619_aoai as <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'>.
INFO:azureml.rag.connections:The connection 'mssecureai4034688619_aoai' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'> with api_key auth type.
INFO:azureml.rag.azureml.rag.documents:[DocumentChunksIterator::filter_extensions] Filtered 0 files out of 20
INFO:azureml.rag.azureml.rag.documents.cracking:[DocumentChunksIterator::crack_documents] Total time to load files: 0.001402854919

Successfully created index at ../data/product-info-faiss-index-mlindex
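
To build intuition for `tokens_per_chunk` and `token_overlap_across_chunks`, here is a rough sketch of sliding-window chunking over a token list. It is an illustration only: the function `chunk_tokens` is hypothetical, and the actual chunking inside `build_index` may split on document structure and behave differently.

```python
from typing import List, Sequence


def chunk_tokens(tokens: Sequence[str], tokens_per_chunk: int, token_overlap: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most tokens_per_chunk, sharing token_overlap tokens."""
    if tokens_per_chunk <= 0:
        raise ValueError("tokens_per_chunk must be positive")
    if not 0 <= token_overlap < tokens_per_chunk:
        raise ValueError("token_overlap must be in [0, tokens_per_chunk)")

    step = tokens_per_chunk - token_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + tokens_per_chunk]))
        if start + tokens_per_chunk >= len(tokens):
            break  # this window already reached the end of the input
    return chunks


tokens = [f"t{i}" for i in range(10)]
no_overlap = chunk_tokens(tokens, tokens_per_chunk=4, token_overlap=0)
with_overlap = chunk_tokens(tokens, tokens_per_chunk=4, token_overlap=2)
print(len(no_overlap), "chunks without overlap")
print(len(with_overlap), "chunks with an overlap of 2")
```

With an overlap of 0, as used in the cell above, chunks never share tokens; a non-zero overlap repeats the tail of each chunk at the head of the next, which can help when an answer straddles a chunk boundary.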


#### Consume index

In [16]:
from promptflow.rag import get_langchain_retriever_from_index

# Get a retriever for the index built with Azure OpenAI embeddings
retriever=get_langchain_retriever_from_index(faiss_index)
retriever.get_relevant_documents("Which tent is the most waterproof")


INFO:azureml.rag.connections:The connection 'mssecureai4034688619_aoai' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'> with api_key auth type.
INFO:azureml.rag.connections:The connection 'mssecureai4034688619_aoai' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'> with api_key auth type.
INFO:azureml.rag.connections:The connection 'mssecureai4034688619_aoai' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'> with api_key auth type.


TypeError: argument of type 'NoneType' is not iterable

#### Register Index (Optional)

In [None]:
# connect to the AI Studio project
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

client=MLClient(
    DefaultAzureCredential(), 
    subscription_id=os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name=os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name=os.environ.get("PROJECT_NAME") 
)

In [None]:
from azure.ai.ml.entities import Index

# register the index with Azure OpenAI embeddings
client.indexes.create_or_update(
    Index(name=faiss_index_name + "aoai", 
          path=faiss_index, 
          version="1")
          )

#### Option 2: Use Azure AI Search to create an index


In [17]:
from promptflow.rag.config import ConnectionConfig
embedding_model_config = ConnectionConfig(
    subscription_id = os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name = os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name = os.environ.get("PROJECT_NAME"),
    connection_name = os.environ.get("AISTUDIO_AOAI_CONNECTION_NAME"),
)

ais_model_config = ConnectionConfig(
    subscription_id = os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name = os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name = os.environ.get("PROJECT_NAME"),
    connection_name = os.environ.get("AISTUDIO_AIS_CONNECTION_NAME"),
)

In [18]:
# connect to the AI Studio project
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient

client=MLClient(
    DefaultAzureCredential(), 
    subscription_id=os.environ.get("SUBSCRIPTION_ID"),
    resource_group_name=os.environ.get("RESOURCE_GROUP_NAME"),
    workspace_name=os.environ.get("PROJECT_NAME")
    )

In [19]:
from promptflow.rag.config import AzureAISearchConfig, EmbeddingsModelConfig, LocalSource
from promptflow.rag import build_index

ais_index_name = "product-info-ais-index"
embedding_output_dir = "../data"

local_index_aoai=build_index(
    name=ais_index_name,  # name of your index
    vector_store="azure_ai_search",  # the type of vector store
    embeddings_model_config=EmbeddingsModelConfig(
        model_name=os.getenv("AZURE_OPENAI_EMBEDDING_MODEL"),
        deployment_name=os.getenv("AZURE_OPENAI_EMBEDDING_DEPLOYMENT"),
        connection_config=embedding_model_config
    ),
    input_source=LocalSource(input_data="../data/product-info/"),  # the location of your file/folders
    index_config=AzureAISearchConfig(
        ai_search_index_name="product-info-ais-index" + "-aoai-store", # the name of the index store inside the azure ai search service
        ai_search_connection_config=ais_model_config
    ),
    tokens_per_chunk = 800, # Optional field - Maximum number of tokens per chunk
    token_overlap_across_chunks = 0, # Optional field - Number of tokens to overlap between chunks
    embeddings_cache_path=embedding_output_dir, # Optional field - Path to store embeddings cache
)

Crack and chunk files from local path: ../data/product-info/
Start embedding using connection with id = /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureai4034688619_aoai


INFO:azureml.rag.azureml.rag.documents:[DocumentChunksIterator::filter_extensions] Filtered 0 files out of 20
INFO:azureml.rag.azureml.rag.documents.cracking:[DocumentChunksIterator::crack_documents] Total time to load files: 0.0018575191497802734
{
  ".txt": 0.0,
  ".md": 20.0,
  ".html": 0.0,
  ".htm": 0.0,
  ".py": 0.0,
  ".pdf": 0.0,
  ".ppt": 0.0,
  ".pptx": 0.0,
  ".doc": 0.0,
  ".docx": 0.0,
  ".xls": 0.0,
  ".xlsx": 0.0,
  ".csv": 0.0,
  ".json": 0.0
}
INFO:azureml.rag.azureml.rag.documents.chunking:[DocumentChunksIterator::split_documents] Total time to split 20 documents into 75 chunks: 0.40856122970581055
INFO:azureml.rag.azureml.rag.embeddings:Processing document: product_info_8.md0
INFO:azureml.rag.azureml.rag.embeddings:Processing document: product_info_8.md1
INFO:azureml.rag.azureml.rag.embeddings:Processing document: product_info_8.md2
INFO:azureml.rag.azureml.rag.embeddings:Processing document: product_info_8.md3
INFO:azureml.rag.azureml.rag.embeddings:Processing docum

Start creating index from embeddings.


INFO:azureml.rag.connections:Using ml_client base_url: https://management.azure.com, original_base_url: https://management.azure.com.
INFO:azureml.rag.connections:Parsed Connection: /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureaisearch
INFO:azureml.rag.connections:Got connection: /subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourceGroups/secure-ai-rg/providers/Microsoft.MachineLearningServices/workspaces/krbock-0635/connections/mssecureaisearch as <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureAISearchConnection'>.
INFO:azureml.rag.connections:The connection 'mssecureaisearch' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureAISearchConnection'> with api_key auth type.
INFO:azureml.rag.update_acs:Using Index fields: {
  "content": "content",
  "url": "url",
  "filename": "filepath",
  "title": "t

Successfully created index at ../data/product-info-ais-index-mlindex


### Consume the local index

In [21]:
from promptflow.rag import get_langchain_retriever_from_index

# Get the OpenAI embedded Index
retriever=get_langchain_retriever_from_index(local_index_aoai)
#retriever=get_langchain_retriever_from_index("product-info-ais-index")
retriever.get_relevant_documents("Which tent is the most waterproof")



INFO:azureml.rag.mlindex:Get ACS credential for with credential:<class 'NoneType'>.
INFO:azureml.rag.connections:The connection 'mssecureaisearch' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureAISearchConnection'> with api_key auth type.
INFO:azureml.rag.connections:The connection 'mssecureai4034688619_aoai' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'> with api_key auth type.
INFO:azureml.rag.connections:The connection 'mssecureai4034688619_aoai' is a <class 'azure.ai.ml.entities._workspace.connections.connection_subtypes.AzureOpenAIConnection'> with api_key auth type.
INFO:azureml.rag.embeddings.openai:Attempt 0 to embed 1 documents.
  warn_deprecated(
INFO:azureml.rag.embeddings.openai:Attempt 0 to embed 1 documents.


[Document(page_content='# Information about product item_number: 1\n\n5. **Resolution Options**:\n   - Upon receipt of the warranty claim, our customer support team will assess the issue and determine the appropriate resolution.\n   - Options may include repair, replacement of the defective parts, or, if necessary, replacement of the entire tent.\n\n6. **Limitations and Exclusions**:\n   - Our warranty is non-transferable and applies only to the original purchaser of the TrailMaster X4 Tent.\n   - The warranty does not cover any incidental or consequential damages resulting from the use or inability to use the tent.\n   - Any unauthorized repairs or alterations void the warranty.\n\n### Contact Information\n\nIf you have any questions or need further assistance, please contact our customer support:\n\n- Customer Support Phone: +1-800-123-4567\n- Customer Support Email: support@example.com\n\n## Return Policy\n- **If Membership status "None        ":**\tReturns are accepted within 30 da

In [20]:
from azure.ai.ml.entities import Index
# register the index so that it shows up in the project
cloud_index = client.indexes.create_or_update(Index(name=ais_index_name, path=local_index_aoai))

print(f"Created index '{cloud_index.name}'")
print(f"Cloud Path: {cloud_index.path}")

Method indexes: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class Index: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.


Created index 'product-info-ais-index'
Cloud Path: azureml://subscriptions/3c8972d9-f541-46b2-b70b-d81baba3595d/resourcegroups/secure-ai-rg/workspaces/krbock-0635/datastores/workspaceblobstore/paths/LocalUpload/409b87928c3c49dbbc1daf85bfac8699/product-info-ais-index-mlindex


### Use a Flow to evaluate

| Evaluator | question | answer | context | ground_truth |
| --- | --- | --- | --- | --- | 
| GroundednessEvaluator | N/A | Required: String | Required: String | N/A |
| RelevanceEvaluator | Required: String | Required: String | Required: String | N/A |
| CoherenceEvaluator | Required: String | Required: String | N/A | N/A |
| FluencyEvaluator | Required: String | Required: String | N/A | N/A |
| SimilarityEvaluator | Required: String | Required: String | N/A | Required: String |
| F1ScoreEvaluator | N/A | Required: String | N/A | Required: String |
| ViolenceEvaluator | Required: String | Required: String | N/A | N/A |
| SexualEvaluator | Required: String | Required: String | N/A | N/A |
| SelfHarmEvaluator | Required: String | Required: String | N/A | N/A |
| HateUnfairnessEvaluator | Required: String | Required: String | N/A | N/A |
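
The table can also be expressed as data, which makes it easy to check which evaluators a given row can feed. This is a small sketch; `REQUIRED_INPUTS` and `usable_evaluators` are names invented here, and the sets simply mirror the "Required" cells of the table.

```python
# Required inputs per evaluator, transcribed from the table above.
REQUIRED_INPUTS = {
    "GroundednessEvaluator": {"answer", "context"},
    "RelevanceEvaluator": {"question", "answer", "context"},
    "CoherenceEvaluator": {"question", "answer"},
    "FluencyEvaluator": {"question", "answer"},
    "SimilarityEvaluator": {"question", "answer", "ground_truth"},
    "F1ScoreEvaluator": {"answer", "ground_truth"},
    "ViolenceEvaluator": {"question", "answer"},
    "SexualEvaluator": {"question", "answer"},
    "SelfHarmEvaluator": {"question", "answer"},
    "HateUnfairnessEvaluator": {"question", "answer"},
}


def usable_evaluators(row: dict) -> list:
    """Return the evaluators whose required inputs are all non-empty strings in row."""
    available = {key for key, value in row.items() if isinstance(value, str) and value}
    return sorted(name for name, needed in REQUIRED_INPUTS.items() if needed <= available)


row = {
    "question": "What color are the TrailBlaze Hiking Pants?",
    "answer": "They come in khaki.",
    "ground_truth": "Khaki",
}
print(usable_evaluators(row))
```

Because the example row has no `context`, the groundedness and relevance evaluators are left out of the result.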

In [16]:
from promptflow.evals.evaluate import evaluate

In [1]:
myflow = "../3-metaprompt-grounding/prompt-flow/product-chat"

In [14]:
from promptflow.client import load_flow


flow_path = myflow
sample_input = '../data/evaluation_dataset.jsonl'  # data to be evaluated

f = load_flow(source=flow_path)

f.context.connections = {"DetermineIntent": {"connection": model_connect}}

result = f(url=sample_input)

print(result)

UserErrorException: Connection auth_mode: key
name: mssecureai4034688619_aoai
module: promptflow.connections
type: azure_open_ai
api_key: '******'
api_base: https://mssecureai4034688619.openai.azure.com
api_type: azure
api_version: '2024-02-01'
 contains scrubbed secrets with key dict_keys(['api_key']), please make sure connection has decrypted secrets to use in flow execution. 

In [12]:
def copilot_wrapper(*, chat_input, **kwargs):
    from copilot_flow.copilot import get_chat_response

    result = get_chat_response(chat_input)

    parsedResult = {"answer": str(result["reply"]), "context": str(result["context"])}
    return parsedResult

In [13]:
## Test using a flow
result = evaluate( 
    evaluation_name="qa-eval-with-flow", #name your evaluation to view in AI Studio
    target=llm_tool, # pass in a flow that you want to run then evaluate results on 
    data='../data/evaluation_dataset.jsonl', # data to be evaluated
    task_type="qa", # for different task types, different metrics are available
    metrics_list=["gpt_groundedness", "gpt_relevance", "gpt_coherence", "gpt_fluency", "gpt_similarity"], #optional superset over default set of metrics
    # model_config= { #for AI-assisted metrics, need to hook up AOAI GPT model for doing the measurement
    #         "api_version": "2023-05-15",
    #         "api_base": os.getenv("AZURE_OPENAI_ENDPOINT"),
    #         "api_type": "azure",
    #         "api_key": os.getenv("AZURE_OPENAI_KEY"),
    #         "deployment_id": os.getenv("AZURE_OPENAI_EVALUATION_DEPLOYMENT")
    # },
    data_mapping={
        "questions":"question", #column of data providing input to model
        "contexts":"context", #column of data providing context for each input
        "y_pred":"answer", #column of data providing output from model
        "y_test":"groundtruth" #column of data providing ground truth answer, optional for default metrics
        },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI studio project
    azure_ai_project = azure_ai_project,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path=output_dir
)


NameError: name 'evaluate' is not defined

In [None]:
result = evaluate(
    target=copilot_wrapper,
    evaluation_name="qa-eval-with-flow", #name your evaluation to view in AI Studio
    data='../data/evaluation_dataset.jsonl', # data to be evaluated
    evaluators={
        "relevance": relevance_eval,
        "groundedness": groundedness_eval,
        "coherence": coherence_eval,
        "fluency": fluency_eval,
        "similarity": similarity_eval,
        "f1score": f1score_eval
    },
    evaluator_config={
        "relevance": {"question": "${data.question}"},
        "coherence": {"question": "${data.question}"},
        "groundedness": {"question": "${data.question}"},
        "fluency": {"question": "${data.question}"},
        "default": {
            "question": "${data.question}", #column of data providing input to model
            #"contexts": "${data.context}", #column of data providing context for each input
            "answer": "${target.answer}", #column of data providing output from model
            "ground_truth":"${data.truth}" #column of data providing ground truth answer, optional for default metrics
        }
    },
    # to log evaluation to the cloud AI Studio project
    azure_ai_project={
        "subscription_id": os.getenv("SUBSCRIPTION_ID"),
        "resource_group_name": os.getenv("RESOURCE_GROUP_NAME"),
        "project_name": os.getenv("PROJECT_NAME"),
    },
)
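
The `${data.<column>}` and `${target.<key>}` expressions in `evaluator_config` select a value either from the dataset row or from the target's output. The sketch below shows one way such expressions could be resolved and validated; `resolve_mapping` is a hypothetical helper written for illustration, and the SDK's own resolution logic may differ.

```python
import re

# Matches exactly "${data.<name>}" or "${target.<name>}".
_PATTERN = re.compile(r"^\$\{(data|target)\.([A-Za-z_][A-Za-z0-9_]*)\}$")


def resolve_mapping(mapping: dict, data_row: dict, target_output: dict) -> dict:
    """Replace each mapping expression with the value it points to."""
    sources = {"data": data_row, "target": target_output}
    resolved = {}
    for name, expr in mapping.items():
        match = _PATTERN.match(expr)
        if match is None:
            raise ValueError(f"Malformed mapping expression for '{name}': {expr!r}")
        source, column = match.groups()
        resolved[name] = sources[source][column]
    return resolved


mapping = {
    "question": "${data.question}",
    "answer": "${target.answer}",
    "ground_truth": "${data.truth}",
}
data_row = {
    "question": "Which tent is the most waterproof?",
    "truth": "The Alpine Explorer Tent has the highest rainfly waterproof rating",
}
target_output = {"answer": "The Alpine Explorer Tent."}  # made-up target output
resolved = resolve_mapping(mapping, data_row, target_output)
print(resolved)
```

A check like this catches a mismatched closing brace such as `${data.question)` before a long evaluation run starts.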

In [17]:
## Test using a flow
result = evaluate( 
    evaluation_name="qa-eval-with-flow", #name your evaluation to view in AI Studio
    target=myflow, # pass in a flow that you want to run then evaluate results on 
    data=mydata, # data to be evaluated
    task_type="qa", # for different task types, different metrics are available
    metrics_list=["gpt_groundedness", "gpt_relevance", "gpt_coherence", "gpt_fluency", "gpt_similarity"], #optional superset over default set of metrics
    model_config= { #for AI-assisted metrics, need to hook up AOAI GPT model for doing the measurement
            "api_version": "2023-05-15",
            "api_base": os.getenv("AZURE_OPENAI_ENDPOINT"),
            "api_type": "azure",
            "api_key": os.getenv("AZURE_OPENAI_KEY"),
            "deployment_id": os.getenv("AZURE_OPENAI_EVALUATION_DEPLOYMENT")
    },
    data_mapping={
        "questions":"question", #column of data providing input to model
        "contexts":"context", #column of data providing context for each input
        "y_pred":"answer", #column of data providing output from model
        "y_test":"groundtruth" #column of data providing ground truth answer, optional for default metrics
        },
    # Optionally provide your AI Studio project information to track your evaluation results in your Azure AI studio project
    azure_ai_project = azure_ai_project,
    # Optionally provide an output path to dump a json of metric summary, row level data and metric and studio URL
    output_path=output_dir
)


ValueError: target must be a callable function.
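
The `ValueError` indicates that `evaluate` expects `target` to be a callable rather than a path string such as `myflow`. One possible workaround, sketched here under assumptions, is to wrap a loaded flow in a plain function. The input name `question` and the output keys `answer` and `context` are assumptions about the flow, and `fake_flow` is a stand-in so the sketch runs without Prompt flow installed; in the notebook, the object returned by `load_flow(source=myflow)` from `promptflow.client` would presumably take its place.

```python
from typing import Any, Callable, Dict


def make_target(flow: Callable[..., Dict[str, Any]]) -> Callable[..., Dict[str, str]]:
    """Wrap a flow-like callable in a keyword-only function that returns string fields."""

    def target(*, question: str, **kwargs: Any) -> Dict[str, str]:
        # Extra dataset columns arrive in kwargs and are ignored here.
        output = flow(question=question)
        return {
            "answer": str(output.get("answer", "")),
            "context": str(output.get("context", "")),
        }

    return target


def fake_flow(*, question: str) -> Dict[str, Any]:
    """Stand-in for a loaded flow; returns a canned answer (hypothetical)."""
    return {"answer": f"echo: {question}", "context": ["doc-1"]}


flow_target = make_target(fake_flow)
print(flow_target(question="Which tent is the most waterproof?", truth="ignored"))
```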

### Generate ground truth

In [None]:
#pip install azure-ai-generative[simulator]
from azure.ai.generative.synthetic.simulator import Simulator

In [None]:
from azure.identity import DefaultAzureCredential
from azure.ai.resources.client import AIClient
from azure.ai.resources.entities import AzureOpenAIModelConfiguration

# initialize ai_client. This assumes that the config.json downloaded from the AI workspace is present in the working directory
ai_client = AIClient.from_config(DefaultAzureCredential())
# Retrieve default aoai connection if it exists
aoai_connection = ai_client.get_default_aoai_connection()
# alternatively, retrieve connection by name
# aoai_connection = ai_client.connections.get("<name of connection>")

# # Specify model and deployment name for your system large language model
# aoai_config = AzureOpenAIModelConfiguration.from_connection(
#     connection=aoai_connection,
#     model_name=os.getenv('AZURE_OPENAI_EVALUATION_MODEL'),
#     deployment_name=os.getenv('AZURE_OPENAI_EVALUATION_DEPLOYMENT'),
#     temperature=0.1,
#     max_tokens=300
# )
# # Specify model and deployment name for your system large language model
aoai_config = AzureOpenAIModelConfiguration.from_connection(
    connection=aoai_connection,
    model_name='gpt-4-32k',
    deployment_name='gpt-4-32k-0613',
    temperature=0.1,
    max_tokens=300
)

In [None]:
import os

In [None]:
from openai import AsyncAzureOpenAI
oai_client = AsyncAzureOpenAI(api_key=os.getenv('AZURE_OPENAI_KEY'), azure_endpoint=os.getenv('AZURE_OPENAI_ENDPOINT'), api_version="2024-02-15-preview")
async_oai_chat_completion_fn = oai_client.chat.completions.create

In [None]:
function_simulator = Simulator.from_fn(
    fn=async_oai_chat_completion_fn, # Simulate against a local function OR callback function
    simulator_connection=aoai_config # Configure the simulator
) 

In [None]:
template = Simulator.get_template("summarization")

In [None]:
template_params = [
    {
        "name": "John Doe",
        "chatbot_name": "AI Chatbot",
        "filename": "company_report.txt",
        "file_content": "The company is doing well. The stock price is up 10% this quarter. The company is expanding into new markets. The company is investing in new technology. The company is hiring new employees. The company is launching new products. The company is opening new stores. The company is increasing its market share. The company is increasing its revenue. The company is increasing its profits.",
    },
    {
        "name": "Jane Doe",
        "chatbot_name": "AI Chatbot",
        "filename": "sales_report.txt",
        "file_content": "The sales team is doing well. The sales team is meeting its targets. The sales team is increasing its revenue. The sales team is increasing its market share. The sales team is increasing its profits. The sales team is expanding into new markets. The sales team is launching new products. The sales team is opening new stores. The sales team is hiring new employees. The sales team is investing in new technology.",
    },
]

In [None]:
outputs = await function_simulator.simulate_async(
    template,
    parameters=template_params,
    max_conversation_turns=2,
    api_call_delay_sec=10,
    max_simulation_results=10,
)

### Generate QA from files

In [None]:
from pathlib import Path
# product sample data
texts_glob = Path("../data/product-info/")
# azureai-samples data
#texts_glob = Path("../../azureai-samples/scenarios/generate-synthetic-data/ai-generated-data-qna/data/data_generator_texts/")
files = [file for file in texts_glob.glob("**/*") if file.is_file()]

In [None]:
from langchain.document_loaders import UnstructuredFileLoader
from langchain.text_splitter import NLTKTextSplitter
import nltk

# download pre-trained Punkt tokenizer for sentence splitting
nltk.download("punkt")

text_splitter = NLTKTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # encoding for gpt-4 and gpt-35-turbo
    chunk_size=300,  # number of tokens to split on
    chunk_overlap=0,
)
texts = []
for file in files:
    loader = UnstructuredFileLoader(file)
    docs = loader.load()
    data = docs[0].page_content
    texts += text_splitter.split_text(data)
print(f"Number of texts after splitting: {len(texts)}")

In [None]:
from azure.ai.generative.synthetic.qa import QADataGenerator, QAType

## Uses AzureOpenAI environment variables

# For granular logs you may set DEBUG log level:
import logging
#logging.basicConfig(level=logging.DEBUG)
logging.basicConfig(level=logging.ERROR)

model_config = {
    "deployment": "gpt-4-1106-preview",
    "model": "gpt-4",
    "max_tokens": 2000,
}

qa_generator = QADataGenerator(model_config=model_config)

In [None]:
QADataGenerator(model_config=model_config)

#### Generate QA asynchronously

In [None]:
from azure.ai.generative.synthetic.qa import QADataGenerator, QAType
import asyncio
from collections import Counter
from typing import Dict

concurrency = 3  # number of concurrent calls
sem = asyncio.Semaphore(concurrency)

qa_type = QAType.CONVERSATION


async def generate_async(text: str) -> Dict:
    async with sem:
        return await qa_generator.generate_async(
            text=text,
            qa_type=qa_type,
            num_questions=3,  # Number of questions to generate per text
        )


results = await asyncio.gather(*[generate_async(text) for text in texts], return_exceptions=True)

question_answer_list = []
token_usage = Counter()
for result in results:
    if isinstance(result, Exception):
        raise result  # exception raised inside generate_async()
    question_answer_list.append(result["question_answers"])
    token_usage += result["token_usage"]

print("Successfully generated QAs")
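
The cell above re-raises the first exception it finds, which discards any results that did succeed. An alternative, sketched here with a stub in place of the real generator, is to keep successes and failures in separate lists. `fake_generate` and `run_all` are hypothetical names and the stub does not call any model. The sketch uses `asyncio.run` so it works as a script; in a notebook cell, where an event loop is already running, `await run_all(...)` would be used instead.

```python
import asyncio


async def fake_generate(text: str) -> dict:
    """Stub for a QA generation call; fails on empty input."""
    await asyncio.sleep(0.01)  # simulate network latency
    if not text:
        raise ValueError("empty text")
    return {
        "question_answers": [(f"Q about {text}", f"A about {text}")],
        "token_usage": {"total_tokens": len(text)},
    }


async def run_all(texts, concurrency: int = 3):
    """Run fake_generate over texts with at most `concurrency` calls in flight."""
    sem = asyncio.Semaphore(concurrency)

    async def guarded(text: str) -> dict:
        async with sem:
            return await fake_generate(text)

    results = await asyncio.gather(*(guarded(t) for t in texts), return_exceptions=True)
    succeeded = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return succeeded, failed


succeeded, failed = asyncio.run(run_all(["tent", "", "stove", "table"]))
print(len(succeeded), "succeeded;", len(failed), "failed")
```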

In [None]:
print(f"Tokens used: {token_usage}")

In [None]:
question_answer_list

### Save the generated data for later use
Save the generated QnA in a format that prompt flow can read (for evaluation and batch runs).

In [None]:
import os
generated_dir = "../data/generated_qa"
os.makedirs(generated_dir, exist_ok=True)
output_file = os.path.join(generated_dir, "generated_qa.jsonl")
qa_generator.export_to_file(output_file, qa_type, question_answer_list)
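
To sanity-check an exported file, it can be read back line by line. This sketch assumes the export is JSON Lines (one JSON object per line), as the `.jsonl` extension suggests; the demo rows and their field names are invented for illustration and do not reflect the generator's actual schema.

```python
import json
import os
import tempfile


def read_jsonl(path: str) -> list:
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    with open(path, "r", encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo on a temporary file so the sketch is self-contained.
rows = [
    {"question": "q1", "answer": "a1"},
    {"question": "q2", "answer": "a2"},
]
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.jsonl")
    with open(demo_path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")
    loaded = read_jsonl(demo_path)

print(f"Read back {len(loaded)} rows")
```

In the notebook, calling `read_jsonl(output_file)` after the export would show whether every line parses.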