# Exporting LLM Runs and Feedback
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/langchain-ai/langsmith-cookbook/blob/main/exploratory-data-analysis/exporting-llm-runs-and-feedback/llm_run_etl.ipynb)

Understanding how your LLM app interacts with users is crucial. LangSmith offers a number of useful ways to interact with and annotate trace data directly in the app. You can also easily query that trace data so you can process it with your tool of choice.

This tutorial guides you through exporting LLM traces and associated feedback from LangSmith for further analysis. By the end, you'll be able to export a flat table of LLM run information that you can analyze, enrich, or use for model training.

Before we start, make sure you have a LangSmith project with some logged traces. You can generate some using almost any of the other recipes in this cookbook. The overall steps will be:

1. Query runs, filtering by time, tags, or other attributes.
2. Add in associated feedback metrics (if captured).
3. Export to an analysis tool.

To keep things simple, we will load the data into a pandas DataFrame. The ETL below runs on LLM runs logged from LangChain, but you can modify the code to handle whatever schema your deployed model uses. Now let's set up our environment!

#### Setup

First, install `langsmith` and `pandas`, then set your LangSmith API key to connect to your project. We will also install LangChain to use one of its formatting utilities.

In [1]:
%pip install -U langchain langsmith pandas seaborn --quiet

[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m974.6/974.6 kB[0m [31m6.2 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m125.3/125.3 kB[0m [31m5.3 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m13.0/13.0 MB[0m [31m28.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m294.9/294.9 kB[0m [31m12.4 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m321.8/321.8 kB[0m [31m5.8 MB/s[0m eta [36m0:00:00[0m
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m145.0/145.0 kB[0m [31m6.9 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
cudf-cu12 24.4.1 requires pandas<2.2.2dev0,>=2.0, but you have pandas 2.2.2 which 

In [2]:
%env LANGCHAIN_API_KEY=paste_langchain_key_here

env: LANGCHAIN_API_KEY=paste_langchain_key_here


In [3]:
from langsmith import Client

client = Client()

## 1. Query Runs

Now that the environment is ready, we will load the run data from LangSmith. Let's load all LLM runs from the past two days by filtering for runs whose `run_type` is "llm" and whose start time falls inside that window.

Please reference the [docs](https://docs.smith.langchain.com/tracing/faq/querying_traces) for guidance on more complex filters (using metadata, tags, and other attributes).
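As a rough sketch of how a tag filter might be combined with the time window, the helper below only assembles keyword arguments and does not call LangSmith. `build_run_query` and the `production` tag are hypothetical, and the `has(tags, ...)` filter string is an assumption about the query syntax covered in the linked docs, so verify it there before relying on it.

```python
from datetime import datetime, timedelta, timezone


def build_run_query(project_name, days, tag=None, now=None):
    """Assemble keyword arguments for client.list_runs (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    query = {
        "project_name": project_name,
        "run_type": "llm",
        "start_time": now - timedelta(days=days),
    }
    if tag is not None:
        # Assumed filter syntax: has(tags, "<tag>") matches runs carrying that tag.
        query["filter"] = f'has(tags, "{tag}")'
    return query


query = build_run_query("default", days=2, tag="production")
# With a configured client, the query could then be run as:
# runs = list(client.list_runs(**query))
```

Keeping the query in a plain dictionary makes it easy to log or reuse the exact parameters behind an export.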


In [10]:
from datetime import datetime, timedelta

start_time = datetime.utcnow() - timedelta(days=2)

runs = list(
    client.list_runs(
        project_name="default",
        run_type="llm",
        start_time=start_time,
    )
)

In [12]:
import pandas as pd

df = pd.DataFrame(
    [
        {
            "name": run.name,
            "model": run.extra["invocation_params"][
                "model"
            ],  # The parameters used when invoking the model are nested in the extra info
            **run.inputs,
            **(run.outputs or {}),
            "error": run.error,
            "latency": (run.end_time - run.start_time).total_seconds()
            if run.end_time
            else None,  # Pending runs have no end time
            "prompt_tokens": run.prompt_tokens,
            "completion_tokens": run.completion_tokens,
            "total_tokens": run.total_tokens,
        }
        for run in runs
    ],
    index=[run.id for run in runs],
)

df.head(5)

| | name | model | messages | llm_output | run | generations | error | latency | prompt_tokens | completion_tokens | total_tokens |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0d529c8a-6162-4179-9791-37db70734fbc | ChatOpenAI | gpt-3.5-turbo | `[{'lc': 1, 'type': 'constructor', 'id': ['lang...` | `{'model_name': 'gpt-3.5-turbo', 'system_finger...` | | `[{'text': '{"follow_up_questions": ["Can you p...` | | 1.606764 | 1435 | 47 | 1482 |
| 18971d27-61f4-44b1-8bbf-30a27958dd28 | ChatOpenAI | gpt-3.5-turbo | `[{'lc': 1, 'type': 'constructor', 'id': ['lang...` | `{'model_name': 'gpt-3.5-turbo', 'system_finger...` | | `[{'text': ' ### Average Hydrophobicity of PF00...` | | 0.99936 | 109 | 36 | 145 |
| 4a917e49-9e76-41e5-acfd-ad98697443b1 | ChatOpenAI | gpt-4 | `[{'lc': 1, 'type': 'constructor', 'id': ['lang...` | `{'model_name': 'gpt-4', 'system_fingerprint': ...` | | `[{'text': 'SELECT AVG(avg_hydrophobicity) FROM...` | | 2.039317 | 949 | 22 | 971 |
| 818397a9-ec23-4c02-8169-dfe5f6fbe143 | ChatOpenAI | gpt-4 | `[{'lc': 1, 'type': 'constructor', 'id': ['lang...` | `{'model_name': 'gpt-4', 'system_fingerprint': ...` | | `[{'text': '', 'generation_info': {'finish_reas...` | | 1.698907 | 543 | 22 | 565 |
| 984d8981-44f6-48e2-b4e2-4dadfb495406 | ChatOpenAI | gpt-4 | `[{'lc': 1, 'type': 'constructor', 'id': ['lang...` | `{'model_name': 'gpt-4', 'system_fingerprint': ...` | | `[{'text': 'query', 'generation_info': {'finish...` | | 0.76966 | 243 | 1 | 244 |
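The row-building code above indexes `run.extra["invocation_params"]["model"]` directly, which raises a `KeyError` for any run logged without those keys. If your runs come from a different schema, a more forgiving lookup may help. The following is a minimal sketch; `nested_get` and `latency_seconds` are hypothetical helpers, not part of LangSmith or pandas.

```python
from datetime import datetime


def nested_get(mapping, *keys, default=None):
    """Walk nested dicts, returning default as soon as a key is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


def latency_seconds(start_time, end_time):
    """Seconds between two datetimes, or None for runs that have not finished."""
    if start_time is None or end_time is None:
        return None
    return (end_time - start_time).total_seconds()


# Stand-in for run.extra on a run that recorded its model name.
extra = {"invocation_params": {"model": "gpt-4"}}
model = nested_get(extra, "invocation_params", "model")
missing = nested_get({}, "invocation_params", "model", default="unknown")
elapsed = latency_seconds(datetime(2024, 1, 1, 0, 0, 0), datetime(2024, 1, 1, 0, 0, 2))
```

In the DataFrame construction, the `"model"` entry could then read `nested_get(run.extra, "invocation_params", "model")`, so runs without invocation parameters produce an empty cell instead of aborting the export.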


#### Stringify

If you are using a regular "Completion" style model that expects string input and returns a single string output, you can easily view the text data without further formatting.

For chat models, the message dictionaries contain a lot of information that can be hard to read. If you want to _just_ see the string, you may consider parsing the message content into a human-readable format. You can use LangChain's `get_buffer_string` helper to do so, as shown below.

In [47]:
from typing import Optional
from langchain_core.load import load
from langchain_core.messages import get_buffer_string


def stringify_inputs(inputs: dict) -> dict:
    return {"messages": get_buffer_string(load(inputs["messages"]))}


def stringify_outputs(outputs: Optional[dict]) -> dict:
    if not outputs:
        return {}
    if isinstance(outputs["generations"], dict):
        # Function Message
        return {
            "generated_message": get_buffer_string(
                [load(outputs["generations"]["message"])]
            )
        }
    else:
        return {
            "generated_message": get_buffer_string(
                [load(outputs["generations"][0]["message"])]
            )
        }


df = pd.DataFrame(
    [
        {
            "model": run.extra["invocation_params"][
                "model"
            ],  # The parameters used when invoking the model are nested in the extra info
            **stringify_inputs(run.inputs),
            **stringify_outputs(run.outputs),
            "error": run.error,
            "latency": (run.end_time - run.start_time).total_seconds()
            if run.end_time
            else None,  # Pending runs have no end time
            "prompt_tokens": run.prompt_tokens,
            "completion_tokens": run.completion_tokens,
            "total_tokens": run.total_tokens,
        }
        for run in runs
    ],
    index=[run.id for run in runs],
)

df.head(5)

| | model | messages | generated_message | error | latency | prompt_tokens | completion_tokens | total_tokens |
|---|---|---|---|---|---|---|---|---|
| 0d529c8a-6162-4179-9791-37db70734fbc | gpt-3.5-turbo | `System: \n\n You are an intelligent assista...` | `AI: {"follow_up_questions": ["Can you provide ...` | | 1.606764 | 1435 | 47 | 1482 |
| 18971d27-61f4-44b1-8bbf-30a27958dd28 | gpt-3.5-turbo | `Human: \nGiven the following user question, co...` | `AI: \n### Average Hydrophobicity of PF00063\n\...` | | 0.99936 | 109 | 36 | 145 |
| 4a917e49-9e76-41e5-acfd-ad98697443b1 | gpt-4 | `System: \nYou are a MySQL expert. Given an inp...` | `AI: SELECT AVG(avg_hydrophobicity) FROM protei...` | | 2.039317 | 949 | 22 | 971 |
| 818397a9-ec23-4c02-8169-dfe5f6fbe143 | gpt-4 | `Human: \nReturn the names of ALL the SQL table...` | `AI:` | | 1.698907 | 543 | 22 | 565 |
| 984d8981-44f6-48e2-b4e2-4dadfb495406 | gpt-4 | `Human: \nPlease classify the following input s...` | `AI: query` | | 0.76966 | 243 | 1 | 244 |


In [48]:
df.to_csv("langsmith_data.csv")
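The overview lists attaching feedback as the second step, but none of the cells above do it. The sketch below shows one way the aggregation might look, using plain dictionaries as stand-ins for feedback records. `feedback_columns` and the `user_score` key are hypothetical, and the `run_id`, `key`, and `score` fields are assumed to mirror what the LangSmith client returns from `client.list_feedback(run_ids=...)`.

```python
from collections import defaultdict
from statistics import mean


def feedback_columns(feedback_records):
    """Average feedback scores per run and key: {run_id: {"feedback.<key>": mean}}."""
    scores = defaultdict(lambda: defaultdict(list))
    for record in feedback_records:
        if record["score"] is not None:  # comment-only feedback carries no score
            scores[record["run_id"]][record["key"]].append(record["score"])
    return {
        run_id: {f"feedback.{key}": mean(values) for key, values in by_key.items()}
        for run_id, by_key in scores.items()
    }


# Stand-in records; with a live client these might be built from
# client.list_feedback(run_ids=[run.id for run in runs]).
sample = [
    {"run_id": "run-a", "key": "user_score", "score": 1},
    {"run_id": "run-a", "key": "user_score", "score": 0},
    {"run_id": "run-b", "key": "user_score", "score": None},
]
columns = feedback_columns(sample)
```

Because the result is keyed by run ID, it could be turned into a frame with `pd.DataFrame.from_dict(columns, orient="index")` and joined onto `df` before calling `to_csv`, provided both use the same ID type for the index.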

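## Extracting Data for the Classification Step in the LangChain Flow

The cells below keep only the runs whose output is a classification label ("AI: query" or "AI: conversation") and pull the classified user input out of each prompt.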

In [39]:
# Keep only rows whose generated_message contains "AI: query" or "AI: conversation".
# na=False treats rows with no output as non-matches; .copy() creates an independent
# frame so the column assignment below does not trigger SettingWithCopyWarning.
classification = df[
    df["generated_message"].str.contains("AI: query|AI: conversation", na=False)
].copy()

# Add a userQuery column holding the prompt text between "input:\nInput: " and
# "\nClassification:\n"; rows missing either marker get None.
classification["userQuery"] = classification["messages"].apply(
    lambda x: x.split("input:\nInput: ")[1].split("\nClassification:\n")[0]
    if "input:\nInput: " in x and "\nClassification:\n" in x
    else None
)


In [45]:
classification = classification[["userQuery", "generated_message"]]
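The split logic used to build `userQuery` is easier to check when pulled out into a named function. The sketch below applies the same two markers; `extract_user_query` is a hypothetical helper, and the sample prompt is made up for illustration rather than taken from a real run.

```python
def extract_user_query(message, start="input:\nInput: ", end="\nClassification:\n"):
    """Return the text between the two prompt markers, or None if either is absent."""
    if start not in message or end not in message:
        return None
    return message.split(start)[1].split(end)[0]


# Made-up prompt that follows the marker layout the notebook expects.
prompt = (
    "Human: \nNow, classify the following input:\nInput: "
    '"How can you help me?"'
    "\nClassification:\n"
)
user_query = extract_user_query(prompt)
no_match = extract_user_query("a prompt without the markers")
```

The function can be passed straight to `classification["messages"].apply(extract_user_query)`, and the marker strings become parameters you can adjust if the classification prompt changes.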

In [49]:
classification.head(5)

| | userQuery | generated_message |
|---|---|---|
| 984d8981-44f6-48e2-b4e2-4dadfb495406 | `"How can I find the average hydrophobicity of ...` | `AI: query` |
| b5fda785-38aa-49a4-820e-d7e64536d57c | `"What specific details are you looking to retr...` | `AI: conversation` |
| 250be867-86d3-46c7-8a03-f29ff5e4205a | `"What kind of information is stored in the gen...` | `AI: query` |
| 4404e737-ce2d-4ec9-828a-6ad2d7f0957a | `"How can you help me?"` | `AI: conversation` |
| 6781a6cb-2a84-45f6-a009-05c6d40ed2be | `"Can you provide more details about the codon_...` | `AI: query` |


In [46]:
# Export the classification DataFrame to a CSV file
classification.to_csv("classification.csv")