# Upload of evaluators
This notebook demonstrates how to save a standard evaluator locally and upload it to Azure.

### Import

In [35]:
import os
import json
import pandas as pd
import shutil
import uuid
import yaml

from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
from azure.ai.ml.entities import (
    Model
)

from promptflow.client import PFClient
from promptflow.evals.evaluate import evaluate
from promptflow.evals.evaluators import F1ScoreEvaluator

## End-to-end demonstration of evaluator saving and uploading to Azure
### Saving the standard evaluators to the flex format
First we will create the promptflow client, which will be used to save the existing flows.

In [2]:
pf = PFClient()

We will use the F1 score evaluator from the standard evaluator set and save it to a local directory.

In [3]:
pf.flows.save(F1ScoreEvaluator, path='./f1_score')

Let us inspect what has been saved.

In [5]:
print('\n'.join(os.listdir('f1_score')))

f1_score.py
flow
flow.flex.yaml
__init__.py


The file that defines the entry point of our evaluator is called flow.flex.yaml. Let us display it.

In [6]:
with open(os.path.join('f1_score', 'flow.flex.yaml')) as fp:
    flex_definition = yaml.safe_load(fp)
print(f"The evaluator entrypoint is {flex_definition['entry']}")

The evaluator entrypoint is f1_score:F1ScoreEvaluator
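
The `entry` value printed above has the shape `module:ClassName`. As a rough illustration of how a string of that shape can be turned into a Python object, here is a minimal sketch using `importlib`. The helper `resolve_entry` is hypothetical and is not promptflow's own loader; it only shows the convention.

```python
import importlib


def resolve_entry(entry: str):
    """Resolve a 'module:attribute' string to the Python object it names."""
    module_name, sep, attr_name = entry.partition(":")
    if not sep or not module_name or not attr_name:
        raise ValueError(f"Expected 'module:attribute', got {entry!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Illustration with a standard-library class. For the saved flow, the entry
# 'f1_score:F1ScoreEvaluator' would need the 'f1_score' folder on sys.path.
ordered_dict_cls = resolve_entry("collections:OrderedDict")
print(ordered_dict_cls.__name__)
```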


In [7]:
pf = PFClient()
run = pf.run(
    flow='f1_score',
    data='data.jsonl',
    name=f'test_{uuid.uuid1()}',
    stream=True
)

Prompt flow service has started...


[2024-04-23 11:51:20,257][promptflow._sdk._orchestrator.run_submitter][INFO] - Submitting run test_76ad6428-01a2-11ef-b9d7-00224877b1ea, log path: C:\Users\anksing\.promptflow\.runs\test_76ad6428-01a2-11ef-b9d7-00224877b1ea\logs.txt


You can view the traces from local: http://localhost:64667/v1.0/ui/traces/?#run=test_76ad6428-01a2-11ef-b9d7-00224877b1ea
2024-04-23 11:51:20 -0700   40292 execution.bulk     INFO     Current system's available memory is 31091.125MB, memory consumption of current process is 297.828125MB, estimated available worker count is 31091.125/297.828125 = 104
2024-04-23 11:51:20 -0700   40292 execution.bulk     INFO     Set process count to 3 by taking the minimum value among the factors of {'default_worker_count': 4, 'row_count': 3, 'estimated_worker_count_based_on_memory_usage': 104}.
2024-04-23 11:51:22 -0700   40292 execution.bulk     INFO     Process name(SpawnProcess-2)-Process id(17416)-Line number(0) start execution.
2024-04-23 11:51:22 -0700   40292 execution.bulk     INFO     Process name(SpawnProcess-4)-Process id(27636)-Line number(1) start execution.
2024-04-23 11:51:22 -0700   40292 execution.bulk     INFO     Process name(SpawnProcess-3)-Process id(25448)-Line number(2) start exec

Now let us test the flow with a simple dataset consisting of one ground truth sentence and one answer sentence.

In [8]:
data = pd.DataFrame({
    "ground_truth": ["January is the coldest winter month."],
    "answer": ["June is the coldest summer month."]
})
in_file = 'sample_data.jsonl'
data.to_json(in_file, orient='records', lines=True, index=False)
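
The file written above is in JSON Lines format: one JSON object per line, one line per record. The following sketch writes the same record with only the standard library and reads it back, which can be a quick way to confirm the column names the evaluator will see. The file name `sample_data_check.jsonl` is hypothetical and is placed in a temporary folder so the notebook's own files are not touched.

```python
import json
import os
import tempfile

records = [
    {
        "ground_truth": "January is the coldest winter month.",
        "answer": "June is the coldest summer month.",
    }
]

# Write one JSON object per line.
jsonl_path = os.path.join(tempfile.mkdtemp(), "sample_data_check.jsonl")
with open(jsonl_path, "w", encoding="utf-8") as fp:
    for record in records:
        fp.write(json.dumps(record) + "\n")

# Read the file back, skipping any blank lines.
with open(jsonl_path, encoding="utf-8") as fp:
    loaded = [json.loads(line) for line in fp if line.strip()]

print(sorted(loaded[0].keys()))
```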

Load the evaluator saved in the flex format and test it.

In [9]:
flow_result = pf.test(flow='f1_score', inputs='sample_data.jsonl')
print(f"Flow outputs: {flow_result}")

Prompt flow service has started...
You can view the traces from local: http://localhost:64667/v1.0/ui/traces/?#collection=f1_score
Flow outputs: {'f1_score': 0.6}
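
The score of 0.6 is consistent with a token-level F1 computed after lowercasing and removing punctuation and English articles. The sketch below is one common way to compute such a score and is an assumption about the metric, not a copy of the `F1ScoreEvaluator` source; the helpers `normalize` and `token_f1` are hypothetical names.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, strip punctuation and English articles, then split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(answer: str, ground_truth: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    answer_tokens = normalize(answer)
    truth_tokens = normalize(ground_truth)
    # Multiset intersection counts each shared token at most as often as it
    # appears on both sides.
    shared = sum((Counter(answer_tokens) & Counter(truth_tokens)).values())
    if shared == 0:
        return 0.0
    precision = shared / len(answer_tokens)
    recall = shared / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


score = token_f1(
    answer="June is the coldest summer month.",
    ground_truth="January is the coldest winter month.",
)
print(round(score, 3))  # 3 shared tokens out of 5 on each side
```

With these two sentences, three of the five normalized tokens on each side are shared ("is", "coldest", "month"), so precision and recall are both 3/5.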


Now we have all the tools to upload our model to Azure.
### Uploading data to Azure
First we need to authenticate to Azure. For this purpose we will use a configuration file with the following structure.
```json
{
    "resource_group_name": "resource-group-name",
    "workspace_name": "ws-name",
    "subscription_id": "subscription-uuid",
    "registry_name": "registry-name"
}
```


In [10]:
with open('config.json') as f:
    configuration = json.load(f)
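
Before building a client, it can be useful to confirm that the loaded configuration contains every key from the structure shown above. This is a minimal sketch; `EXPECTED_KEYS` and `missing_keys` are hypothetical names, and the example dictionary uses the placeholder values from the JSON sample.

```python
EXPECTED_KEYS = (
    "resource_group_name",
    "workspace_name",
    "subscription_id",
    "registry_name",
)


def missing_keys(config: dict) -> list:
    """Return the expected keys that are absent or empty in the configuration."""
    return [key for key in EXPECTED_KEYS if not config.get(key)]


# Placeholder configuration with the registry name left out on purpose.
example = {
    "resource_group_name": "resource-group-name",
    "workspace_name": "ws-name",
    "subscription_id": "subscription-uuid",
}
print(missing_keys(example))
```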

#### Uploading to the workspace
In this scenario we will not need the `registry_name` in our configuration.

In [12]:
config_ws = configuration.copy()
del config_ws["registry_name"]

credential = DefaultAzureCredential()
ml_client = MLClient(
    credential=credential,
    **config_ws,
)

We will use the evaluator operations API to upload our model to the workspace.

In [17]:
# Named evaluator_model to avoid shadowing the built-in eval().
evaluator_model = Model(
    path="f1_score",
    name='F1Score-Evaluator',
    description="Measures the ratio of the number of shared words between the model generation and the ground truth answers.",
)
ml_client.evaluators.create_or_update(evaluator_model)

Model({'job_name': None, 'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'F1Score-Evaluator', 'description': 'Measures the ratio of the number of shared words between the model generation and the ground truth answers.', 'tags': {}, 'properties': {'is-promptflow': 'true', 'is-evaluator': 'true'}, 'print_as_yaml': False, 'id': '/subscriptions/b17253fa-f327-42d6-9686-f3e553e24763/resourceGroups/anksing-vanilla-eval/providers/Microsoft.MachineLearningServices/workspaces/anksing-vanilla-eval/models/F1Score-Evaluator/versions/1', 'Resource__source_path': '', 'base_path': 'c:\\Users\\anksing\\Repos\\AML\\promptflow\\src\\promptflow-evals\\samples', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x00000209025CD2B0>, 'serialize': <msrest.serialization.Serializer object at 0x00000209025CDA90>, 'version': '1', 'latest_version': None, 'path': 'azureml://subscriptions/b17253fa-f327-42d6-9686-f3e

Now we will retrieve the model and check that it is functional.

In [19]:
ml_client.evaluators.download('F1Score-Evaluator', version='1', download_path='f1_score_downloaded')

Downloading the model LocalUpload/2d2a340d57888049d2a0ab4f58c25eb5/f1_score at f1_score_downloaded\F1Score-Evaluator\f1_score



In [22]:
flow_result = pf.test(flow=os.path.join('f1_score_downloaded', 'F1Score-Evaluator', 'f1_score'), inputs='data.jsonl')
print(f"Flow outputs: {flow_result}")

Prompt flow service has started...
You can view the traces from local: http://localhost:64667/v1.0/ui/traces/?#collection=f1_score
Flow outputs: {'f1_score': 0.015384615384615384}
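
Running the downloaded flow checks that it executes. A complementary check is to compare the downloaded folder with the local one file by file. The sketch below uses the standard-library `filecmp` module on two temporary folders; the helper `same_tree` is hypothetical, and whether the real download is byte-identical to the local `f1_score` folder is an assumption that would need to be verified.

```python
import filecmp
import os
import tempfile


def same_tree(left: str, right: str) -> bool:
    """Return True when both folders hold the same files with the same contents."""
    # dircmp skips some names by default, for example __pycache__.
    comparison = filecmp.dircmp(left, right)
    if comparison.left_only or comparison.right_only or comparison.common_funny:
        return False
    # Compare file contents byte by byte rather than by os.stat signature.
    _, mismatch, errors = filecmp.cmpfiles(
        left, right, comparison.common_files, shallow=False
    )
    if mismatch or errors:
        return False
    return all(
        same_tree(os.path.join(left, name), os.path.join(right, name))
        for name in comparison.common_dirs
    )


# Demonstration on two small temporary folders.
with tempfile.TemporaryDirectory() as root:
    original = os.path.join(root, "original")
    downloaded = os.path.join(root, "downloaded")
    for folder in (original, downloaded):
        os.makedirs(folder)
        with open(os.path.join(folder, "flow.flex.yaml"), "w") as fp:
            fp.write("entry: f1_score:F1ScoreEvaluator\n")
    trees_match = same_tree(original, downloaded)

    # Change one file and compare again.
    with open(os.path.join(downloaded, "flow.flex.yaml"), "a") as fp:
        fp.write("# changed\n")
    trees_differ = not same_tree(original, downloaded)

print(trees_match, trees_differ)
```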


In [23]:
shutil.rmtree('f1_score_downloaded')
assert not os.path.isdir('f1_score_downloaded')

#### Uploading to the registry
In this scenario we will not need the `workspace_name` in our configuration.

In [24]:
config_reg = configuration.copy()
del config_reg["workspace_name"]

ml_client = MLClient(
    credential=credential,
    **config_reg
)
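
The workspace and registry clients are built from the same configuration with one key removed. A small helper can make that choice explicit and leave the original dictionary untouched. This is a sketch only; `client_settings` is a hypothetical helper, and the sample values are the placeholders from the configuration structure shown earlier.

```python
def client_settings(config: dict, scope: str) -> dict:
    """Return a copy of the configuration suited to a workspace or registry client."""
    unused_key = {"workspace": "registry_name", "registry": "workspace_name"}
    if scope not in unused_key:
        raise ValueError(f"scope must be 'workspace' or 'registry', got {scope!r}")
    return {key: value for key, value in config.items() if key != unused_key[scope]}


sample_configuration = {
    "resource_group_name": "resource-group-name",
    "workspace_name": "ws-name",
    "subscription_id": "subscription-uuid",
    "registry_name": "registry-name",
}
workspace_settings = client_settings(sample_configuration, "workspace")
registry_settings = client_settings(sample_configuration, "registry")
print(sorted(workspace_settings))
print(sorted(registry_settings))
```

The resulting dictionaries could then be unpacked into the client constructor in the same way as `config_ws` and `config_reg`.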

We create a new `Model` object here because `create_or_update` modifies the model in place, adding a workspace link that does not exist.

In [27]:
# Named evaluator_model to avoid shadowing the built-in eval().
evaluator_model = Model(
    path="f1_score",
    name='F1Score-Evaluator',
    description="Measures the ratio of the number of shared words between the model generation and the ground truth answers.",
    properties={"show-artifact": "true"}
)
ml_client.evaluators.create_or_update(evaluator_model)

Uploading f1_score (0.01 MBs): 100%|##########| 11837/11837 [00:03<00:00, 3089.58it/s]



Model({'job_name': None, 'intellectual_property': None, 'is_anonymous': False, 'auto_increment_version': False, 'auto_delete_setting': None, 'name': 'F1Score-Evaluator', 'description': 'Measures the ratio of the number of shared words between the model generation and the ground truth answers.', 'tags': {}, 'properties': {'show-artifact': 'true', 'is-promptflow': 'true', 'is-evaluator': 'true'}, 'print_as_yaml': False, 'id': 'azureml://registries/azureml-dev/models/F1Score-Evaluator/versions/2', 'Resource__source_path': '', 'base_path': 'c:\\Users\\anksing\\Repos\\AML\\promptflow\\src\\promptflow-evals\\samples', 'creation_context': <azure.ai.ml.entities._system_data.SystemData object at 0x0000020902407C10>, 'serialize': <msrest.serialization.Serializer object at 0x0000020954791310>, 'version': '2', 'latest_version': None, 'path': 'https://amldevos5mguse01.blob.core.windows.net/azureml-de-c8c82dfc-230d-5228-a79b-1e6ef394d7be/f1_score', 'datastore': None, 'utc_time_created': None, 'flavo

Now we will perform the same sanity check that we did for the workspace.

In [29]:
ml_client.evaluators.download('F1Score-Evaluator', version='1', download_path='f1_score_downloaded')
flow_result = pf.test(flow=os.path.join('f1_score_downloaded', 'F1Score-Evaluator', 'f1_score'), inputs='data.jsonl')
print(f"Flow outputs: {flow_result}")

Downloading the model f1_score at f1_score_downloaded\F1Score-Evaluator\f1_score



Prompt flow service has started...
You can view the traces from local: http://localhost:64667/v1.0/ui/traces/?#collection=f1_score
Flow outputs: {'f1_score': 0.015384615384615384}


In [None]:
from promptflow.core import Flow

# This is not working but it should. Will uncomment once PF team provides a fix.
# f = Flow.load('f1_score_downloaded/F1Score-Evaluator/f1_score')
# f(question='What is the capital of France?', answer='Paris', ground_truth='Paris is the capital of France.')

Finally, we will do the cleanup.

In [40]:
shutil.rmtree('f1_score_downloaded')
assert not os.path.isdir('f1_score_downloaded')