## Running chains on Datasets

In [1]:
from langchain.client import LangChainPlusClient

client = LangChainPlusClient()

## Seed an example dataset

If you have already been using LangChainPlus, you may have datasets available. You can generate these from `Run`s captured through the LangChain tracing API.

```
datasets = client.list_datasets()
datasets
```

Assuming you're running locally for the first time, you may not have runs stored, so we'll first upload an example evaluation dataset.

In [2]:
# !pip install datasets > /dev/null
# !pip install pandas > /dev/null

In [3]:
import pandas as pd
from langchain.evaluation.loading import load_dataset

dataset = load_dataset("agent-search-calculator")
df = pd.DataFrame(dataset, columns=["question", "answer"])
df.columns = ["input", "output"] # The chain we want to evaluate below expects inputs with the "input" key 
df.head()


| | input | output |
|---|---|---|
| 0 | How many people live in canada as of 2023? | approximately 38,625,801 |
| 1 | who is dua lipa's boyfriend? what is his age r... | her boyfriend is Romain Gravas. his age raised... |
| 2 | what is dua lipa's boyfriend age raised to the... | her boyfriend is Romain Gravas. his age raised... |
| 3 | how far is it from paris to boston in miles | approximately 3,435 mi |
| 4 | what was the total number of points scored in ... | approximately 2.682651500990882 |


In [4]:
from uuid import uuid4
dataset_name = f"calculator_example_{uuid4()}.csv"

In [5]:
dataset = client.upload_dataframe(
    df,
    name=dataset_name,
    description="A calculator example dataset",
    input_keys=["input"],
    output_keys=["output"],
)
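The `input_keys` and `output_keys` arguments name the columns that make up each example's inputs and reference outputs. As a rough illustration of that split (not the client's actual implementation), the sketch below does the same thing with plain dictionaries. `rows_to_examples` and the `inputs`/`outputs` layout are hypothetical names chosen for this example.

```python
from typing import Any, Dict, List


def rows_to_examples(
    rows: List[Dict[str, Any]],
    input_keys: List[str],
    output_keys: List[str],
) -> List[Dict[str, Dict[str, Any]]]:
    """Split each row into an inputs dict and an outputs dict (hypothetical helper)."""
    examples = []
    for row in rows:
        examples.append(
            {
                "inputs": {key: row[key] for key in input_keys},
                "outputs": {key: row[key] for key in output_keys},
            }
        )
    return examples


# One row taken from the dataframe preview above.
rows = [
    {
        "input": "how far is it from paris to boston in miles",
        "output": "approximately 3,435 mi",
    }
]
examples = rows_to_examples(rows, input_keys=["input"], output_keys=["output"])
```

Each resulting example keeps the question under `inputs` and the reference answer under `outputs`, which is why the dataframe columns were renamed to match the key the chain expects.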

## Running a Chain on a Traced Dataset

Once you have a dataset, you can run a chain over it to see its results. The run traces will automatically be associated with the dataset for easy attribution and analysis.
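Conceptually, running a chain over a dataset means calling the chain once per example and recording each result against that example's ID. The sketch below shows the idea with a stand-in callable; `run_over_examples`, `toy_chain`, and the example dictionary layout are assumptions made for illustration and are not part of the LangChainPlus client.

```python
from typing import Any, Callable, Dict, List


def run_over_examples(
    chain: Callable[[Dict[str, Any]], Dict[str, Any]],
    examples: List[Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Call the chain on every example and key each result by example ID."""
    results = {}
    for example in examples:
        results[example["id"]] = chain(example["inputs"])
    return results


def toy_chain(inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a real chain: reports the length of the question."""
    return {"output": len(inputs["input"])}


examples = [
    {"id": "ex-1", "inputs": {"input": "2 + 2"}},
    {"id": "ex-2", "inputs": {"input": "10 / 4"}},
]
results = run_over_examples(toy_chain, examples)
```

Keying results by example ID is what makes it possible to line up a chain's output with the reference answer for the same example later on.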

**First, we'll define the chain we wish to run over the dataset.**

In this case, we're using an agent, but it can be any simple chain.

In [6]:
from langchain.chat_models import ChatOpenAI
from langchain.agents import initialize_agent, load_tools
from langchain.agents import AgentType

llm = ChatOpenAI(temperature=0)
tools = load_tools(['serpapi', 'llm-math'], llm=llm)
agent = initialize_agent(tools, llm, agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION, verbose=False)

**Now we're ready to run the chain!**

In [7]:
chain_results = await client.arun_chain_on_dataset(
    dataset_name=dataset_name,
    chain=agent,
    batch_size=5,  # Optional. Sets the number of examples to run at a time
    session_name="Calculator Dataset Runs",  # Optional. Runs are sent to the 'default' session otherwise
)

Chain failed for example 514f5a58-1594-4e51-85c6-ea301a66fe57. Error: 'age'. Please try again with a valid numerical expression
Chain failed for example c8a38da2-5f07-4204-b340-ccb75a127c47. Error: invalid syntax. Perhaps you forgot a comma? (<expr>, line 1). Please try again with a valid numerical expression
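The output above shows that a failure on one example is reported without stopping the rest of the run. One way to get that behavior, sketched here under assumptions, is to cap concurrency with a semaphore and catch exceptions per example. This is not the client's actual implementation; `run_in_batches` and `toy_chain` are hypothetical names, and treating `batch_size` as a concurrency limit is an assumption.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def run_in_batches(
    chain: Callable[[Dict[str, Any]], Awaitable[Any]],
    examples: List[Dict[str, Any]],
    batch_size: int = 5,
) -> Dict[str, Dict[str, Any]]:
    """Run the chain on all examples, at most `batch_size` at a time."""
    semaphore = asyncio.Semaphore(batch_size)
    results: Dict[str, Dict[str, Any]] = {}

    async def run_one(example: Dict[str, Any]) -> None:
        async with semaphore:
            try:
                output = await chain(example["inputs"])
                results[example["id"]] = {"output": output}
            except Exception as exc:
                # Record the failure and keep going with the other examples.
                print(f"Chain failed for example {example['id']}. Error: {exc}")
                results[example["id"]] = {"error": str(exc)}

    await asyncio.gather(*(run_one(example) for example in examples))
    return results


async def toy_chain(inputs: Dict[str, Any]) -> str:
    """Stand-in for a real chain: fails on blank questions."""
    if not inputs["input"].strip():
        raise ValueError("empty question")
    return inputs["input"][::-1]


examples = [
    {"id": "a", "inputs": {"input": "abc"}},
    {"id": "b", "inputs": {"input": " "}},
]
results = asyncio.run(run_in_batches(toy_chain, examples, batch_size=2))
```

Inside a notebook, where an event loop is already running, you would write `results = await run_in_batches(...)` instead of calling `asyncio.run`.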
