# Running Chains on Traced Datasets

Developing applications with language models can be uniquely challenging. To manage this complexity and ensure reliable performance, LangChain provides tracing and evaluation functionality. This notebook demonstrates how to run Chains (functions built around language model calls), chat models, and LLMs on previously captured datasets or traces. Common use cases for this approach include:

- Running an evaluation chain to grade previous runs.
- Comparing different chains, LLMs, and agents on traced datasets.
- Executing a stochastic chain multiple times over a dataset to generate metrics before deployment.

This notebook assumes you have LangChain+ tracing running in the background, and it works only with the V2 endpoints. To set up tracing, follow the [tracing directions here](../../tracing/local_installation.md).
 
We'll start by creating a client to connect to LangChain+.

In [1]:
from langchain.client import LangChainPlusClient

client = LangChainPlusClient(
    api_url="http://localhost:8000",
    api_key=None,
    # tenant_id="your_tenant_uuid",  # This is required when connecting to a hosted LangChain instance
)
print("You can click the link below to view the UI")
client

You can click the link below to view the UI


## Capture traces

If you have already been using LangChain+, you may have datasets available. To view all saved datasets, run:

```python
datasets = client.list_datasets()
print(datasets)
```

Datasets can be created in a number of ways, most often by collecting `Run`s captured through the LangChain tracing API and converting a set of runs into a dataset.
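
Conceptually, converting runs into a dataset means copying each run's inputs and outputs into an example. The sketch below illustrates that idea with hypothetical `Run` and `Example` dataclasses and a hypothetical `runs_to_examples` helper; these are not LangChain's classes. Skipping runs that ended in an error is a policy chosen here for illustration only.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Run:
    """Hypothetical stand-in for a traced run."""
    inputs: Dict[str, Any]
    outputs: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


@dataclass
class Example:
    """Hypothetical stand-in for one dataset example."""
    inputs: Dict[str, Any]
    outputs: Dict[str, Any]


def runs_to_examples(runs: List[Run]) -> List[Example]:
    """Copy inputs/outputs of successful runs; errored runs are skipped (assumed policy)."""
    return [
        Example(inputs=dict(run.inputs), outputs=dict(run.outputs))
        for run in runs
        if run.error is None and run.outputs is not None
    ]


traced_runs = [
    Run({"input": "what is 1213 divided by 4345?"},
        {"output": "1213 divided by 4345 is approximately 0.2791."}),
    Run({"input": "what is dua lipa's boyfriend age raised to the .43 power?"},
        error="'age'. Please try again with a valid numerical expression"),
]
examples = runs_to_examples(traced_runs)
print(len(examples))  # 1
```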

The V2 tracing API is currently accessible through the `tracing_v2_enabled` context manager. Assuming the tracing server was successfully started as described above, LangChain agents, chains, LLMs, and other primitives run inside this context manager will automatically capture traces. We'll start with a simple search-and-math agent example.

In [2]:
from langchain.callbacks.manager import tracing_v2_enabled

In [3]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, load_tools
from langchain.agents import AgentType

llm = ChatOpenAI(temperature=0)
tools = load_tools(['serpapi', 'llm-math'], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)

In [4]:
inputs = [
    'How many people live in canada as of 2023?',
    "who is dua lipa's boyfriend? what is his age raised to the .43 power?",
    "what is dua lipa's boyfriend age raised to the .43 power?",
    'how far is it from paris to boston in miles',
    'what was the total number of points scored in the 2023 super bowl? what is that number raised to the .23 power?',
    'what was the total number of points scored in the 2023 super bowl raised to the .23 power?',
    'how many more points were scored in the 2023 super bowl than in the 2022 super bowl?',
    'what is 153 raised to .1312 power?',
    "who is kendall jenner's boyfriend? what is his height (in inches) raised to .13 power?",
    'what is 1213 divided by 4345?',
]
with tracing_v2_enabled(session_name="search_and_math_chain"):
    for input_example in inputs:
        try:
            print(agent.run(input_example))
        except Exception as e:
            # The agent sometimes makes mistakes! These will be captured by the tracing.
            print(e)
           



The current population of Canada as of 2023 is 38,681,797.
Anwar Hadid is Dua Lipa's boyfriend and his age raised to the 0.43 power is approximately 3.87.
'age'. Please try again with a valid numerical expression
The distance between Paris and Boston is approximately 3448 miles.
unknown format from LLM: Assuming we don't have any information about the actual number of points scored in the 2023 super bowl, we cannot provide a mathematical expression to solve this problem.
invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression
0
1.9347796717823205
1.275494929129063
1213 divided by 4345 is approximately 0.2791.


## Creating the Dataset

Now that you've captured a session entitled 'search_and_math_chain', it's time to create a dataset:

   1. Navigate to the UI by clicking on the link below.
   2. Select the 'search_and_math_chain' session from the list.
   3. Next to the first example, click "+ to Dataset".
   4. Click "Create Dataset" and give it the title **"calculator-example-dataset"**.
   5. Add the other examples to the dataset as well.

In [9]:
dataset_name = "calculator-example-dataset"

In [5]:
client

**Optional:** If you didn't run the trace above, you can also create datasets by uploading dataframes or CSV files.

In [None]:
# !pip install datasets > /dev/null
# !pip install pandas > /dev/null

In [None]:
# import pandas as pd
# from langchain.evaluation.loading import load_dataset

# dataset = load_dataset("agent-search-calculator")
# df = pd.DataFrame(dataset, columns=["question", "answer"])
# df.columns = ["input", "output"] # The chain we want to evaluate below expects inputs with the "input" key 
# df.head()
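
The commented cell above renames the `question`/`answer` columns to the `input`/`output` keys the chain expects. If you start from a CSV file instead, the same renaming can be sketched with only the standard-library `csv` module. The two sample rows reuse inputs and agent outputs from the traces earlier in this notebook, and the `RENAME` mapping is an assumption you would adapt to your own column names. With pandas, `df.rename(columns=RENAME)` does the same thing.

```python
import csv
import io

# Two sample rows in the question/answer layout used by the commented cell above.
raw_csv = """question,answer
what is 153 raised to .1312 power?,1.9347796717823205
what is 1213 divided by 4345?,1213 divided by 4345 is approximately 0.2791.
"""

# Map source column names to the keys the chain expects; unknown columns pass through.
RENAME = {"question": "input", "answer": "output"}

rows = [
    {RENAME.get(column, column): value for column, value in record.items()}
    for record in csv.DictReader(io.StringIO(raw_csv))
]
print(rows[0])
# {'input': 'what is 153 raised to .1312 power?', 'output': '1.9347796717823205'}
```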

In [None]:
# dataset_name = "calculator-example-dataset"

# if dataset_name not in set([dataset.name for dataset in client.list_datasets()]):
#     dataset = client.upload_dataframe(
#         df,
#         name=dataset_name,
#         description="A calculator example dataset",
#         input_keys=["input"],
#         output_keys=["output"],
#     )
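
The `if dataset_name not in ...` guard makes the upload idempotent: rerunning the cell reuses the existing dataset instead of creating a duplicate. Below is a minimal sketch of that pattern against a hypothetical in-memory `StubClient`; its `dataset_names` and `upload` methods are illustrative only and are not `LangChainPlusClient` methods.

```python
from typing import Dict, List


class StubClient:
    """Hypothetical in-memory client; method names are illustrative only."""

    def __init__(self) -> None:
        self.datasets: Dict[str, List[dict]] = {}
        self.upload_calls = 0

    def dataset_names(self) -> List[str]:
        return list(self.datasets)

    def upload(self, name: str, rows: List[dict]) -> None:
        self.upload_calls += 1
        self.datasets[name] = list(rows)


def ensure_dataset(store: StubClient, name: str, rows: List[dict]) -> bool:
    """Upload `rows` as `name` unless that dataset already exists.

    Returns True if an upload happened, False if the existing dataset was reused.
    """
    if name in store.dataset_names():
        return False
    store.upload(name, rows)
    return True


stub_client = StubClient()
sample_rows = [{"input": "what is 1213 divided by 4345?", "output": "approximately 0.2791"}]
first = ensure_dataset(stub_client, "calculator-example-dataset", sample_rows)
second = ensure_dataset(stub_client, "calculator-example-dataset", sample_rows)
print(first, second, stub_client.upload_calls)  # True False 1
```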

## Running a Chain on a Traced Dataset

Once you have a dataset, you can run a compatible chain or other object over it to see its results. The run traces will automatically be associated with the dataset for easy attribution and analysis.

**First, we'll define the chain we wish to run over the dataset.**

In this case, we're using an agent, but it could just as well be a simple chain.

In [6]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, load_tools
from langchain.agents import AgentType

llm = ChatOpenAI(temperature=0)
tools = load_tools(['serpapi', 'llm-math'], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)

**Now we're ready to run the chain!**

The signature and docstring below show the ways you can configure the method.

In [7]:
?client.arun_on_dataset

[0;31mSignature:[0m
[0mclient[0m[0;34m.[0m[0marun_on_dataset[0m[0;34m([0m[0;34m[0m
[0;34m[0m    [0mdataset_name[0m[0;34m:[0m [0;34m'str'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mllm_or_chain[0m[0;34m:[0m [0;34m'Union[Chain, BaseLanguageModel]'[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mconcurrency_level[0m[0;34m:[0m [0;34m'int'[0m [0;34m=[0m [0;36m5[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mnum_repetitions[0m[0;34m:[0m [0;34m'int'[0m [0;34m=[0m [0;36m1[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0msession_name[0m[0;34m:[0m [0;34m'Optional[str]'[0m [0;34m=[0m [0;32mNone[0m[0;34m,[0m[0;34m[0m
[0;34m[0m    [0mverbose[0m[0;34m:[0m [0;34m'bool'[0m [0;34m=[0m [0;32mFalse[0m[0;34m,[0m[0;34m[0m
[0;34m[0m[0;34m)[0m [0;34m->[0m [0;34m'Dict[str, Any]'[0m[0;34m[0m[0;34m[0m[0m
[0;31mDocstring:[0m
Run the chain on a dataset and store traces to the specified session name.

Args:
    dataset_name: Name of th
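
To make `concurrency_level` and `num_repetitions` concrete, here is a plain-`asyncio` sketch of the same idea: every example is run `num_repetitions` times, an `asyncio.Semaphore` caps how many calls are in flight, and a failing call is recorded rather than stopping the whole run. `run_over_examples` and `fake_model` are hypothetical, and in this sketch each (example, repetition) pair takes one concurrency slot, which may differ from how the client counts. It uses `asyncio.run`, so run it as a script; inside Jupyter, `await run_over_examples(...)` directly, as the cells in this notebook do.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_over_examples(
    examples: List[str],
    model: Callable[[str], Awaitable[str]],
    concurrency_level: int = 5,
    num_repetitions: int = 1,
) -> Dict[int, List[Any]]:
    """Run `model` on each example `num_repetitions` times, with at most
    `concurrency_level` calls in flight. Exceptions are stored, not raised."""
    semaphore = asyncio.Semaphore(concurrency_level)
    results: Dict[int, List[Any]] = {i: [] for i in range(len(examples))}

    async def run_one(index: int, example: str) -> None:
        async with semaphore:
            try:
                results[index].append(await model(example))
            except Exception as exc:  # record the failure and keep going
                results[index].append(exc)

    await asyncio.gather(
        *(run_one(i, ex) for i, ex in enumerate(examples) for _ in range(num_repetitions))
    )
    return results


in_flight = 0
max_in_flight = 0


async def fake_model(prompt: str) -> str:
    """Hypothetical model: echoes the prompt, fails on prompts mentioning 'age'."""
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    try:
        await asyncio.sleep(0.01)  # simulate network latency
        if "age" in prompt:
            raise ValueError("'age'. Please try again with a valid numerical expression")
        return prompt.upper()
    finally:
        in_flight -= 1


demo_results = asyncio.run(
    run_over_examples(
        ["what is 1213 divided by 4345?", "what is his age raised to the .43 power?"],
        fake_model,
        concurrency_level=2,
        num_repetitions=3,
    )
)
print(max_in_flight, [len(v) for v in demo_results.values()])  # 2 [3, 3]
```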

In [10]:
chain_results = await client.arun_on_dataset(
    dataset_name=dataset_name,
    llm_or_chain=agent,
    verbose=True
)

# Sometimes, the agent will error due to parsing issues, incompatible tool inputs, etc.
# These are logged as warnings here and captured as errors in the tracing UI.

Chain failed for example bb17c39c-c90c-4706-8410-e501e89a81fb. Error: 'age'. Please try again with a valid numerical expression


Processed examples: 1

Chain failed for example e66a366d-f5d3-4dcc-9ec4-aae88369f45e. Error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression


Processed examples: 6

## Reviewing the Chain Results

You can review the results of the run by clicking the link below and navigating to the session titled 'calculator-example-dataset-AgentExecutor-YYYY-MM-DD-HH-MM-SS'.
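
If you run several evaluations, it helps to know which session to look for. The helper below is a hypothetical sketch that simply reproduces the `<dataset>-<class name>-YYYY-MM-DD-HH-MM-SS` pattern described above; the client's actual naming (for example, which clock or time zone it uses) may differ. For the agent here, the class name is `type(agent).__name__`, i.e. `AgentExecutor`. Alternatively, pass your own `session_name` to `arun_on_dataset`, which the signature above accepts.

```python
from datetime import datetime


def default_style_session_name(dataset_name: str, runner_name: str, when: datetime) -> str:
    """Format '<dataset>-<runner class>-YYYY-MM-DD-HH-MM-SS' (hypothetical helper)."""
    return f"{dataset_name}-{runner_name}-{when:%Y-%m-%d-%H-%M-%S}"


example_session_name = default_style_session_name(
    "calculator-example-dataset", "AgentExecutor", datetime(2023, 5, 1, 14, 30, 5)
)
print(example_session_name)
# calculator-example-dataset-AgentExecutor-2023-05-01-14-30-05
```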

In [11]:
# You can navigate to the UI by clicking on the link below
client

## Running a Chat Model over a Traced Dataset

We've shown how to run a _chain_ over a dataset, but you can also run an LLM or chat model over a dataset formed from runs.

First, we'll show an example using a chat model. This is useful for things like:
- Comparing results under different decoding parameters
- Comparing model providers
- Testing for regressions in model behavior
- Running multiple times with a nonzero temperature to gauge stability

To speed things up, we'll upload a dataset we've previously captured directly to the tracing service.

In [12]:
import pandas as pd
from langchain.evaluation.loading import load_dataset

chat_dataset = load_dataset("two-player-dnd")
chat_df = pd.DataFrame(chat_dataset)
chat_df.head()

Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--two-player-dnd-2e84407830cdedfc/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)



| | prompts | generations |
|---|---|---|
| 0 | [System: Here is the topic for a Dungeons & Dr... | [[{'generation_info': None, 'message': {'conte... |
| 1 | [System: Here is the topic for a Dungeons & Dr... | [[{'generation_info': None, 'message': {'conte... |
| 2 | [System: Here is the topic for a Dungeons & Dr... | [[{'generation_info': None, 'message': {'conte... |
| 3 | [System: Here is the topic for a Dungeons & Dr... | [[{'generation_info': None, 'message': {'conte... |
| 4 | [System: Here is the topic for a Dungeons & Dr... | [[{'generation_info': None, 'message': {'conte... |


In [13]:
chat_dataset_name = "two-player-dnd"

if chat_dataset_name not in set([dataset.name for dataset in client.list_datasets()]):
    client.upload_dataframe(
        chat_df,
        name=chat_dataset_name,
        description="An example dataset traced from chat models in a multiagent bidding dialogue",
        input_keys=["prompts"],
        output_keys=["generations"],
    )

#### Reviewing behavior with temperature

Here, we will set `num_repetitions > 1` (3 in the cell below) and set the temperature to 0.3 to see the variety of responses for each example.
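
Once the repeated runs finish, one simple way to quantify that variety is the share of repetitions that agree with the most common response for each example. The sketch assumes you have already gathered the responses into a dict keyed by example; `response_agreement` and the sample responses are hypothetical, and exact string matching is a deliberately crude measure (paraphrases count as different responses).

```python
from collections import Counter
from typing import Dict, List


def response_agreement(responses_by_example: Dict[str, List[str]]) -> Dict[str, float]:
    """Fraction of repetitions matching each example's most common response.

    1.0 means every repetition returned the same text; lower means more variety.
    """
    agreement: Dict[str, float] = {}
    for example_id, responses in responses_by_example.items():
        if not responses:
            continue  # no repetitions recorded for this example
        top_count = Counter(responses).most_common(1)[0][1]
        agreement[example_id] = top_count / len(responses)
    return agreement


# Hypothetical responses from num_repetitions=3 on two examples.
sample_responses = {
    "example-a": ["You draw your sword.", "You draw your sword.", "You cast a spell."],
    "example-b": ["The dragon sleeps."] * 3,
}
print(response_agreement(sample_responses))
```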


In [14]:
from langchain.chat_models import ChatAnthropic

chat_model = ChatAnthropic(temperature=.3)

In [15]:
chat_model_results = await client.arun_on_dataset(
    dataset_name=chat_dataset_name,
    llm_or_chain=chat_model,
    concurrency_level=5, # Optional, sets the number of examples to run at a time
    num_repetitions=3,
    verbose=True
)

# The 'experimental tracing v2' warning is expected, as we are still actively developing the v2 tracing API.
# Since we are running examples concurrently, you may run into some rate-limit warnings from your model
# provider. In most cases, the tests will still run to completion (the model wrappers retry with backoff).



Processed examples: 36
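
The comment in the cell above notes that the model wrappers back off when they hit rate limits. For readers unfamiliar with the technique, here is a generic exponential-backoff sketch; it is not LangChain's retry code, and `RateLimitError`, `call_with_backoff`, and the delay values are assumptions chosen for illustration. Injecting `sleep` lets the usage example record the delays instead of actually waiting.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit (HTTP 429) error."""


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on RateLimitError with exponentially growing waits.

    The wait before retry n (1-based) is base_delay * 2**(n - 1) plus up to 10% jitter.
    """
    attempt = 0
    while True:
        try:
            return fn()
        except RateLimitError:
            attempt += 1
            if attempt >= max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))


# Usage: a hypothetical call that is rate limited twice before succeeding.
call_count = 0


def flaky_call() -> str:
    global call_count
    call_count += 1
    if call_count < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


recorded_delays: List[float] = []
backoff_result = call_with_backoff(flaky_call, sleep=recorded_delays.append)
print(backoff_result, call_count, len(recorded_delays))  # ok 3 2
```

The random jitter spreads retries out so that many concurrent requests do not all retry at the same instant.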

## Reviewing the Chat Model Results

You can review the latest runs by clicking on the link below and navigating to the "two-player-dnd" session.

In [16]:
client

## Running an LLM over a Traced Dataset

You can run an LLM over a dataset in much the same way as the chain and chat models, provided the dataset you've captured is in the appropriate format. We've cached one for you here, but traces captured from your own application will be much more useful for your use case.

In [17]:
from langchain.llms import OpenAI

llm = OpenAI(model_name='text-curie-001', temperature=0)

In [18]:
completions_dataset = load_dataset("state-of-the-union-completions")
completions_df = pd.DataFrame(completions_dataset)
completions_df.head()

Found cached dataset parquet (/Users/wfh/.cache/huggingface/datasets/LangChainDatasets___parquet/LangChainDatasets--state-of-the-union-completions-ae7542e7bbd0ae0a/0.0.0/2a3b91fbd88a2c90d1dbbb32b460cf621d31bd5b05b934492fdef7d8d6f236ec)



| | prompts | generations | ground_truth |
|---|---|---|---|
| 0 | [Putin may circle Kyiv with tanks, but he will... | [[{'generation_info': {'finish_reason': 'stop'... | The pandemic has been punishing. \n\nAnd so ma... |
| 1 | [Madam Speaker, Madam Vice President, our Firs... | [[]] | With a duty to one another to the American peo... |
| 2 | [With a duty to one another to the American pe... | [[{'generation_info': {'finish_reason': 'stop'... | He thought he could roll into Ukraine and the ... |
| 3 | [Madam Speaker, Madam Vice President, our Firs... | [[{'generation_info': {'finish_reason': 'lengt... | With a duty to one another to the American peo... |
| 4 | [Please rise if you are able and show that, Ye... | [[]] | And the costs and the threats to America and t... |
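
Rows 1 and 4 above have an empty `[[]]` in the `generations` column, so not every row carries a recorded completion. Before uploading, you might want to count such rows. This sketch assumes `generations` is a list of lists, as the `[[...]]` values suggest; `has_generation` and the abbreviated sample rows are hypothetical. On the DataFrame, something like `completions_df["generations"].apply(lambda g: has_generation({"generations": g}))` should give the same per-row flags.

```python
from typing import Any, Dict, List


def has_generation(row: Dict[str, Any]) -> bool:
    """True if any inner list in `generations` is non-empty."""
    return any(len(inner) > 0 for inner in row["generations"])


# Abbreviated, hypothetical rows shaped like rows 0 and 1 of the table above.
completion_rows: List[Dict[str, Any]] = [
    {
        "prompts": ["Putin may circle Kyiv with tanks, but he will..."],
        "generations": [[{"generation_info": {"finish_reason": "stop"}}]],
    },
    {
        "prompts": ["Madam Speaker, Madam Vice President, our Firs..."],
        "generations": [[]],
    },
]
rows_without_generation = [
    i for i, row in enumerate(completion_rows) if not has_generation(row)
]
print(rows_without_generation)  # [1]
```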


In [19]:
completions_dataset_name = "state-of-the-union-completions"

if completions_dataset_name not in set([dataset.name for dataset in client.list_datasets()]):
    client.upload_dataframe(
        completions_df,
        name=completions_dataset_name,
        description="An example dataset traced from completion endpoints over the state of the union address",
        input_keys=["prompts"],
        output_keys=["generations"],
    )

In [20]:
# We also offer a synchronous method for running examples if a chain's async methods aren't yet implemented.
completions_model_results = client.run_on_dataset(
    dataset_name=completions_dataset_name,
    llm_or_chain=llm,
    num_repetitions=1,
    verbose=True
)



55 processed

## Reviewing the LLM Results

You can once again inspect the latest runs by clicking on the link below and navigating to the session for the "state-of-the-union-completions" dataset.

In [21]:
client