# Pandas DataFrame

This notebook shows how to use agents to interact with a `Pandas DataFrame`. 

It is mostly optimized for question answering.

**NOTE: this agent calls the `Python` agent under the hood, which executes LLM-generated Python code:**
* This is dangerous if the generated code is harmful.
* Use it cautiously.

## Get dataframe

Download [`titanic.csv`](https://github.com/datasciencedojo/datasets/blob/master/titanic.csv).

In [5]:
import pandas as pd

df = pd.read_csv("titanic.csv")

## Initialize agent

In [7]:
from langchain import hub
from langchain.agents import AgentExecutor

In [8]:
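# Prompt: pull a template from the LangChain Hub and fill in its `instructions` variable.
# Note: `create_pandas_dataframe_agent` below is not passed this prompt.
instructions = """You are an assistant."""
base_prompt = hub.pull("langchain-ai/openai-functions-template")
prompt = base_prompt.partial(instructions=instructions)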

#### Using `ZERO_SHOT_REACT_DESCRIPTION`

Initialize the agent with the `ZERO_SHOT_REACT_DESCRIPTION` agent type, which `create_pandas_dataframe_agent` uses when no `agent_type` is passed.

In [20]:
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0)
agent = create_pandas_dataframe_agent(llm, df, verbose=True)

#### Using OpenAI Functions

This shows how to initialize the agent using the `OPENAI_FUNCTIONS` agent type.
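A minimal sketch of that initialization, assuming `AgentType.OPENAI_FUNCTIONS` from `langchain.agents.agent_types` and a chat model that supports function calling. The helper name `build_openai_functions_agent` and its `model_name` parameter are hypothetical. The imports sit inside the function so the block can be defined without the packages or an API key being available. Recent `langchain_experimental` releases may also require `allow_dangerous_code=True`; treat that as version-dependent.

```python
def build_openai_functions_agent(df, model_name=None):
    """Hypothetical helper: create a pandas agent with the OPENAI_FUNCTIONS agent type."""
    # Local imports: defining this helper does not need the packages installed.
    from langchain.agents.agent_types import AgentType
    from langchain_experimental.agents.agent_toolkits import (
        create_pandas_dataframe_agent,
    )
    from langchain_openai import ChatOpenAI

    # Assumption: the default ChatOpenAI model supports function calling;
    # pass model_name to choose a specific model instead.
    llm_kwargs = {"temperature": 0}
    if model_name is not None:
        llm_kwargs["model"] = model_name
    llm = ChatOpenAI(**llm_kwargs)

    return create_pandas_dataframe_agent(
        llm,
        df,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
        # Newer langchain_experimental versions may additionally need:
        # allow_dangerous_code=True,
    )
```

With this helper, `agent = build_openai_functions_agent(df)` would give an agent whose traces show `Invoking: python_repl_ast ...` lines, as in the first two runs below.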

#### Execute

In [23]:
agent.run("how many rows are there?")

> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': 'df.shape[0]'}`

891
There are 891 rows in the dataframe.

> Finished chain.


'There are 891 rows in the dataframe.'

In [24]:
agent.run("how many people have more than 3 siblings")

> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df[df['SibSp'] > 3]['PassengerId'].count()"}`

30
There are 30 people in the dataframe who have more than 3 siblings.

> Finished chain.


'There are 30 people in the dataframe who have more than 3 siblings.'
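The expression the agent generated can be checked by hand. The sketch below reruns it on a small made-up frame (the values are illustrative, not taken from `titanic.csv`) and compares it with summing the boolean mask directly. In the Titanic dataset `SibSp` counts siblings and spouses aboard, so "more than 3 siblings" is the agent's reading of the column name.

```python
import pandas as pd

# Illustrative data only; the column names mirror titanic.csv.
toy = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4, 5],
        "SibSp": [0, 4, 1, 5, 8],
    }
)

# Same expression the agent produced: filter rows, then count non-null IDs.
agent_style_count = int(toy[toy["SibSp"] > 3]["PassengerId"].count())

# Equivalent for this data: sum the boolean mask directly.
mask_count = int((toy["SibSp"] > 3).sum())

print(agent_style_count, mask_count)
```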

In [6]:
agent.run("whats the square root of the average age?")

> Entering new AgentExecutor chain...
Thought: I need to calculate the average age first
Action: python_repl_ast
Action Input: df['Age'].mean()
Observation: 29.69911764705882
Thought: I now need to calculate the square root of the average age
Action: python_repl_ast
Action Input: math.sqrt(df['Age'].mean())
Observation: NameError("name 'math' is not defined")
Thought: I need to import the math library
Action: python_repl_ast
Action Input: import math
Observation:
Thought: I now need to calculate the square root of the average age
Action: python_repl_ast
Action Input: math.sqrt(df['Age'].mean())
Observation: 5.449689683556195
Thought: I now know the final answer
Final Answer: The square root of the average age is 5.449689683556195.

> Finished chain.


'The square root of the average age is 5.449689683556195.'
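The trace above needed an extra step because `math` had not been imported in the agent's REPL. Written by hand, the same calculation takes two lines. The sketch uses made-up ages, including one missing value, to show that `Series.mean()` skips `NaN` by default.

```python
import math

import pandas as pd

# Illustrative ages only (not from titanic.csv); one value is missing.
toy = pd.DataFrame({"Age": [16.0, float("nan"), 34.0, 25.0]})

# NaN is skipped by default, so this is (16 + 34 + 25) / 3.
mean_age = toy["Age"].mean()
root_mean_age = math.sqrt(mean_age)

print(mean_age, root_mean_age)
```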

## Multi DataFrame Example

This next part shows how the agent can interact with multiple dataframes passed in as a list.

In [7]:
df1 = df.copy()
df1["Age"] = df1["Age"].fillna(df1["Age"].mean())

In [8]:
agent = create_pandas_dataframe_agent(
    ChatOpenAI(temperature=0), [df, df1], verbose=True
)
agent.run("how many rows in the age column are different?")

> Entering new AgentExecutor chain...
Thought: I need to compare the age columns in both dataframes
Action: python_repl_ast
Action Input: len(df1[df1['Age'] != df2['Age']])
Observation: 177
Thought: I now know the final answer
Final Answer: 177 rows in the age column are different.

> Finished chain.


'177 rows in the age column are different.'
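Two details in this trace are worth spelling out. The agent refers to its inputs as `df1` and `df2`, which appears to be how a list of dataframes is exposed to it (by position), so its `df1` is the original `df` and its `df2` is the filled copy. The differing rows are the ones whose `Age` was missing in the original, because `NaN != x` is always true. A small sketch with made-up values:

```python
import pandas as pd

# Illustrative data only; two of the four ages are missing.
original = pd.DataFrame({"Age": [22.0, float("nan"), 30.0, float("nan")]})

# Same preparation as above: fill missing ages with the column mean (26.0 here).
filled = original.copy()
filled["Age"] = filled["Age"].fillna(filled["Age"].mean())

# Same comparison the agent ran. NaN never equals anything, so every
# originally missing row counts as different.
n_different = len(original[original["Age"] != filled["Age"]])
n_missing = int(original["Age"].isna().sum())

print(n_different, n_missing)
```

The two counts agree, so `df["Age"].isna().sum()` is a direct way to cross-check the agent's answer on the real data.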