# LangWatch Evaluation Tracking

## Simple Evaluation Loop

In [1]:
import langwatch

langwatch.login()

LangWatch API key is already set, if you want to login again, please call as langwatch.login(relogin=True)


In [2]:
import random
import pandas as pd
import time

df = pd.DataFrame(
    [
        {
            "question": "What is LangWatch?",
            "answer": "LangWatch is a platform for evaluating and improving language models.",
        },
        {
            "question": "How do I use LangWatch?",
            "answer": "You can use LangWatch by installing the LangWatch SDK and then calling the LangWatch API.",
        },
        {
            "question": "Does LangWatch support multiple language models?",
            "answer": "Yes, LangWatch is compatible with all language models by using LiteLLM under the hood.",
        },
        {
            "question": "Can I visualize evaluation metrics in LangWatch?",
            "answer": "Yes, LangWatch provides dashboards for visualizing key evaluation metrics.",
        },
        {
            "question": "Is there a free tier for LangWatch?",
            "answer": "LangWatch offers a free tier with limited usage, ideal for small projects and evaluation.",
        },
        {
            "question": "Where can I find documentation for LangWatch?",
            "answer": "You can find the official documentation on the LangWatch website or GitHub repository.",
        },
    ]
)

# Initialize the experiment that all results below are logged under.
evaluation = langwatch.evaluation.init("my-incredible-experiment")


@langwatch.trace()
def agent(question):
    # Placeholder agent: simulates 0-10 s of latency and returns a canned response.
    time.sleep(random.randint(0, 10))
    return {"text": "foo bar"}


for index, row in evaluation.loop(df.iterrows()):
    result = agent(row["question"])  # your code

    # Placeholder metric: a random score in [0.2, 1.0]; replace with a real evaluator.
    score = random.randint(0, 80) / 100 + 0.2
    evaluation.log("sample_metric", index=index, score=score, passed=score > 0.5)


2025-06-04 23:07:57,550 - langwatch.utils.initialization - INFO - Setting up LangWatch client...
2025-06-04 23:07:57,555 - langwatch.client - INFO - Configuring OTLP exporter with endpoint: http://localhost:5560/api/otel/v1/traces
2025-06-04 23:07:57,556 - langwatch.client - INFO - Registering atexit handler to flush tracer provider on exit
2025-06-04 23:07:57,556 - langwatch.client - INFO - Successfully configured tracer provider with OTLP exporter
2025-06-04 23:07:57,556 - langwatch.utils.initialization - INFO - LangWatch client setup complete
Follow the results at: http://localhost:5560/inbox-narrator/experiments/my-incredible-experiment?runId=purring-hungry-orca


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]
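
The loop above logs a random placeholder score. As a minimal sketch of a deterministic alternative, assuming the agent's output is meant to be compared against the dataset's `answer` column, a simple word-overlap score could look like the following. `token_overlap` is a hypothetical helper written for illustration, not part of the LangWatch SDK, and the 0.5 pass threshold simply mirrors the one used above.

```python
import re


def token_overlap(expected: str, actual: str) -> float:
    """Fraction of the expected answer's distinct words that appear in the actual answer."""
    expected_tokens = set(re.findall(r"\w+", expected.lower()))
    actual_tokens = set(re.findall(r"\w+", actual.lower()))
    if not expected_tokens:
        return 0.0  # nothing to match against
    return len(expected_tokens & actual_tokens) / len(expected_tokens)


# Illustrative strings only; in the notebook these would come from the dataset row and the agent.
expected = "LangWatch offers a free tier with limited usage."
actual = "There is a free tier."

score = token_overlap(expected, actual)
passed = score > 0.5
```

Inside the loop, `score = token_overlap(row["answer"], result["text"])` could then replace the random score before it is passed to `evaluation.log` as before.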

## Parallel Evaluation Loop

In [3]:
import random
import time

langwatch.setup()
# Same experiment name as above; `df` is reused from the previous cell.
evaluation = langwatch.evaluation.init("my-incredible-experiment")

@langwatch.trace()
def agent(question):
    # Placeholder agent: simulates 0-10 s of latency and returns a canned response.
    time.sleep(random.randint(0, 10))
    return "foo parallel"

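# threads=4 lets up to four submitted evaluations run concurrently.
for index, row in evaluation.loop(df.iterrows(), threads=4):
    def evaluate(index, row):
        result = agent(row["question"])
        evaluation.log("sample_metric", index=index, data={"response": result}, score=1)
    # Submit the work to run in a worker thread rather than calling evaluate() inline.
    evaluation.submit(evaluate, index, row)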

2025-06-04 23:08:39,668 - langwatch.client - INFO - Registering atexit handler to flush tracer provider on exit
Follow the results at: http://localhost:5560/inbox-narrator/experiments/my-incredible-experiment?runId=ruddy-numbat-of-domination


Evaluating:   0%|          | 0/6 [00:00<?, ?it/s]
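
The `threads=4` and `evaluation.submit` pattern above has the same shape as submitting tasks to a thread pool. As a rough analogy only (the notebook does not show how LangWatch implements this, so nothing here should be read as its internals), the same structure can be sketched with the standard library's `concurrent.futures`. `fake_agent`, `evaluate`, and the shortened question list are hypothetical stand-ins.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# A few questions from the dataset above, used here as plain strings.
questions = [
    "What is LangWatch?",
    "How do I use LangWatch?",
    "Is there a free tier for LangWatch?",
]


def fake_agent(question: str) -> str:
    # Hypothetical stand-in for the traced agent; sleeps briefly to simulate latency.
    time.sleep(0.01)
    return "foo parallel"


def evaluate(index: int, question: str) -> dict:
    # One unit of work: call the agent and build the record that would be logged.
    response = fake_agent(question)
    return {"index": index, "response": response, "score": 1}


results = {}
with ThreadPoolExecutor(max_workers=4) as pool:
    # Submit every item up front; at most four run at the same time.
    futures = [pool.submit(evaluate, i, q) for i, q in enumerate(questions)]
    for future in as_completed(futures):
        record = future.result()  # re-raises any exception from the worker
        results[record["index"]] = record
```

Because tasks can finish in any order, the sketch keys each record by its index; this is presumably also why `index` is passed explicitly to `evaluation.log` in the cell above.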