### Evaluate accuracy on production data

We are going to:

-   connect to Label Studio and retrieve the details of all the “tasks” associated with our Food11 project
-   connect to MinIO, and get the predicted class (from the tag!) of every object in the “production” bucket

and then compare the two to evaluate the accuracy of our system on “production” data.
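
The comparison itself can be sketched in a few lines. This is only an illustration under stated assumptions: `accuracy` is a hypothetical helper, both inputs are assumed to be dictionaries keyed by object name, and the example data is made up rather than taken from the real bucket.

```python
def accuracy(predicted, labeled):
    """Fraction of human-labeled objects whose predicted class matches the label."""
    # only score objects that have both a prediction and a human label
    shared = [key for key in labeled if key in predicted]
    if not shared:
        return 0.0
    correct = sum(predicted[key] == labeled[key] for key in shared)
    return correct / len(shared)

# hypothetical example data, not taken from the real bucket
predicted = {"a.jpg": "Bread", "b.jpg": "Soup", "c.jpg": "Rice"}
labeled = {"a.jpg": "Bread", "b.jpg": "Dessert"}
print(accuracy(predicted, labeled))
```

Objects that have not been labeled yet (here `c.jpg`) are left out of the score, so the result reflects only the tasks a human has reviewed.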

In [17]:
# runs inside Jupyter container on node-eval-loop
import requests
import boto3 
from urllib.parse import urlparse
from collections import defaultdict, Counter
import os
import json

First, we need the credentials for MinIO and Label Studio. We passed these to the Jupyter container as environment variables:

In [2]:
# runs inside Jupyter container on node-eval-loop
LABEL_STUDIO_URL = os.environ['LABEL_STUDIO_URL']
LABEL_STUDIO_TOKEN = os.environ['LABEL_STUDIO_USER_TOKEN']
PROJECT_ID = 1  # use the first project set up in Label Studio

MINIO_URL = os.environ['MINIO_URL']
MINIO_ACCESS_KEY = os.environ['MINIO_USER']
MINIO_SECRET_KEY = os.environ['MINIO_PASSWORD']
BUCKET_NAME = "production"
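
If one of these variables was not passed to the container, the cell above fails with a `KeyError` on the first missing name. As a rough sketch, we could check for all of them up front; `missing_env` is a hypothetical helper, not part of the original notebook.

```python
import os

def missing_env(names, environ=os.environ):
    """Return the names from `names` that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]

# the variable names read by the notebook cell above
REQUIRED = [
    "LABEL_STUDIO_URL",
    "LABEL_STUDIO_USER_TOKEN",
    "MINIO_URL",
    "MINIO_USER",
    "MINIO_PASSWORD",
]

missing = missing_env(REQUIRED)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```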

In [30]:
# labeling interface: show the prompt and the response, then ask for a single Good/Bad choice
LABEL_CONFIG = """
<View>
  <Text name="prompt" value="$prompt"/>
  <Text name="response" value="$response"/>
  <View style="box-shadow: 2px 2px 5px #999;
               padding: 20px; margin-top: 2em;
               border-radius: 5px;">
    <Header value="Choose text sentiment"/>
    <Choices name="sentiment" toName="response"
             choice="single" showInLine="true">
      <Choice value="Good Response"/>
      <Choice value="Bad Response"/>
    </Choices>
  </View>
</View>
"""

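A labeling config ties together two things: the `$variables` that each task's data must supply, and the `toName` attributes that must point at a named element. A minimal sketch of a sanity check using only the standard library is shown here; `summarize_label_config` is a hypothetical helper, and the config in the example is a shortened copy of the one above.

```python
import xml.etree.ElementTree as ET

def summarize_label_config(xml_text):
    """Return (data_fields, dangling) for a labeling config.

    data_fields: names referenced as "$name" in value attributes
    dangling:    toName targets that do not match any element's name
    """
    root = ET.fromstring(xml_text)  # raises ParseError if the XML is malformed
    names = {el.get("name") for el in root.iter() if el.get("name")}
    data_fields = sorted(
        el.get("value")[1:]
        for el in root.iter()
        if (el.get("value") or "").startswith("$")
    )
    dangling = sorted(
        el.get("toName")
        for el in root.iter()
        if el.get("toName") and el.get("toName") not in names
    )
    return data_fields, dangling

# shortened copy of the config above, for illustration only
config = """
<View>
  <Text name="prompt" value="$prompt"/>
  <Text name="response" value="$response"/>
  <Choices name="sentiment" toName="response" choice="single">
    <Choice value="Good Response"/>
    <Choice value="Bad Response"/>
  </Choices>
</View>
"""
print(summarize_label_config(config))
```

The first element of the result lists the keys every task will need in its `data` dictionary; the second should be empty for a consistent config.
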
Create the Label Studio project:

In [31]:
# runs inside Jupyter container on node-eval-loop
headers = {"Authorization": f"Token {LABEL_STUDIO_TOKEN}"}

# configure a project - set up its name and the appearance of the labeling interface
project_config = {
    "title": "Taigi Medical LLM Random Test",
    "label_config": LABEL_CONFIG
}

# send it to Label Studio API
res = requests.post(f"{LABEL_STUDIO_URL}/api/projects", json=project_config, headers=headers)
if res.status_code == 201:
    PROJECT_ID = res.json()['id']
    print(f"Created new project: Taigi Medical LLM Random Test (ID {PROJECT_ID})")
else:
    raise Exception(f"Failed to create project: {res.status_code} {res.text}")

Created new project: Taigi Medical LLM Random Test (ID 4)
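
Re-running the cell above would likely create another project with the same title. One possible guard, sketched under assumptions: we already have a list of existing projects as dictionaries with `id` and `title` keys. How that list is fetched, and its exact shape, should be checked against the Label Studio API reference. `find_project_id` is a hypothetical helper and the listing below is made up.

```python
def find_project_id(projects, title):
    """Return the id of the first project whose title matches, or None."""
    for project in projects:
        if project.get("title") == title:
            return project["id"]
    return None

# hypothetical listing; in practice this would come from Label Studio
existing = [
    {"id": 1, "title": "Some other project"},
    {"id": 4, "title": "Taigi Medical LLM Random Test"},
]
print(find_project_id(existing, "Taigi Medical LLM Random Test"))
print(find_project_id(existing, "Not created yet"))
```

When the helper returns an ID, we could reuse it as `PROJECT_ID` instead of creating a new project.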


Let’s authenticate to MinIO:

In [32]:
# runs inside Jupyter container on node-eval-loop
# look up this node's public IP from the instance metadata service; MinIO is reached on port 9000
public_ip = requests.get("http://169.254.169.254/latest/meta-data/public-ipv4").text.strip()
s3 = boto3.client(
    "s3",
    endpoint_url=f"http://{public_ip}:9000",
    aws_access_key_id=MINIO_ACCESS_KEY,
    aws_secret_access_key=MINIO_SECRET_KEY,
    region_name="us-east-1"
)

Get a list of objects in the “production” bucket:

In [33]:
all_keys = []
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET_NAME):
    for obj in page.get("Contents", []):
        all_keys.append(obj["Key"])


Then, send those objects as tasks to Label Studio:

In [34]:
# read each object as JSON and wrap it as a task; its fields (prompt, response) fill the $variables in the labeling config
tasks = []
for key in all_keys:
    obj = s3.get_object(Bucket=BUCKET_NAME, Key=key)
    body = obj["Body"].read().decode("utf-8")
    conversation = json.loads(body)

    tasks.append({"data": conversation, "meta": {"original_key": key}})

# then, send the list of tasks to the Label Studio project
res = requests.post(
    f"{LABEL_STUDIO_URL}/api/projects/{PROJECT_ID}/import",
    json=tasks,
    headers=headers
)
if res.status_code == 201:
    print(f"Imported {len(tasks)} tasks into project {PROJECT_ID}")
else:
    raise Exception(f"Failed to import tasks: {res.status_code} {res.text}")


Imported 4 tasks into project 4
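
The labeling config refers to `$prompt` and `$response`, so each stored object needs both fields for the task to display properly. As a rough sketch, we could validate each object before importing it; `build_task` and `REQUIRED_FIELDS` are hypothetical names, and the object key and content in the example are made up.

```python
import json

REQUIRED_FIELDS = ("prompt", "response")  # the $variables used in the labeling config

def build_task(key, body):
    """Turn one stored JSON document (as text) into a Label Studio task dict."""
    conversation = json.loads(body)
    missing = [field for field in REQUIRED_FIELDS if field not in conversation]
    if missing:
        raise ValueError(f"{key} is missing fields: {', '.join(missing)}")
    # same task shape as in the import cell above
    return {"data": conversation, "meta": {"original_key": key}}

# hypothetical object key and content
task = build_task("example.json", '{"prompt": "Hello", "response": "Hi"}')
print(task)
```

Failing early on a malformed object is a design choice; skipping it and logging the key would also be reasonable if a partial import is acceptable.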
