-----

# **Multiple DataFrames**

----

- In LangChain, a pandas DataFrame agent can work with several DataFrames at once.

## **Install Required Packages**

In [None]:
!pip install langchain langchain-experimental langchain-community watermark openai langchain-openai

Collecting langchain
  Downloading langchain-0.3.3-py3-none-any.whl.metadata (7.1 kB)
Collecting langchain-experimental
  Downloading langchain_experimental-0.3.2-py3-none-any.whl.metadata (1.7 kB)
Collecting langchain-community
  Downloading langchain_community-0.3.2-py3-none-any.whl.metadata (2.8 kB)
Collecting watermark
  Downloading watermark-2.5.0-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting openai
  Downloading openai-1.51.2-py3-none-any.whl.metadata (24 kB)
Collecting langchain-openai
  Downloading langchain_openai-0.2.2-py3-none-any.whl.metadata (2.6 kB)
Collecting langchain-core<0.4.0,>=0.3.10 (from langchain)
  Downloading langchain_core-0.3.10-py3-none-any.whl.metadata (6.3 kB)
Collecting langchain-text-splitters<0.4.0,>=0.3.0 (from langchain)
  Downloading langchain_text_splitters-0.3.0-py3-none-any.whl.metadata (2.3 kB)
Collecting langsmith<0.2.0,>=0.1.17 (from langchain)
  Downloading langsmith-0.1.134-py3-none-any.whl.metadata (13 kB)
Collecting tenacity!=8.4.0,<9.0

## **Import Required Libraries**

In [None]:
import os
import warnings
warnings.filterwarnings("ignore")

In [None]:
import pandas as pd
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import OpenAI

## **Set Up OpenAI API Key**

In [None]:
os.environ['OPENAI_API_KEY'] = "YOUR_API_KEY"
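Hardcoding the key in a notebook makes it easy to leak. As a minimal sketch, assuming the key has already been exported in the shell, a small helper can read it from the environment and fail early when it is missing. The helper name `get_openai_api_key` is hypothetical and not part of LangChain or the OpenAI SDK.

```python
import os


def get_openai_api_key(env=os.environ):
    """Return the OpenAI API key from the environment or raise a clear error.

    Hypothetical helper for illustration; `env` is injectable so the
    function can be exercised without touching the real environment.
    """
    key = env.get("OPENAI_API_KEY", "").strip()
    # Treat the notebook's placeholder value as "not set".
    if not key or key == "YOUR_API_KEY":
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before starting the notebook."
        )
    return key
```

With this in place, the cell above could become `os.environ["OPENAI_API_KEY"] = get_openai_api_key()`, which leaves the key out of the saved notebook.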

## **Read Data**

In [None]:
url = "https://raw.githubusercontent.com/adamerose/datasets/master/titanic.csv"
df=pd.read_csv(url)
print(df.shape)
df.head()

(891, 15)


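|   | survived | pclass | sex | age | sibsp | parch | fare | embarked | class | who | adult_male | deck | embark_town | alive | alone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 3 | male | 22.0 | 1 | 0 | 7.25 | S | Third | man | True | NaN | Southampton | no | False |
| 1 | 1 | 1 | female | 38.0 | 1 | 0 | 71.2833 | C | First | woman | False | C | Cherbourg | yes | False |
| 2 | 1 | 3 | female | 26.0 | 0 | 0 | 7.925 | S | Third | woman | False | NaN | Southampton | yes | True |
| 3 | 1 | 1 | female | 35.0 | 1 | 0 | 53.1 | S | First | woman | False | C | Southampton | yes | False |
| 4 | 0 | 3 | male | 35.0 | 0 | 0 | 8.05 | S | Third | man | True | NaN | Southampton | no | True |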


### **Agent to Interact With Single Dataframe**

In [None]:
llm=OpenAI()

In [None]:
# Creating an agent that can interact with a Pandas DataFrame using a language model (llm)
# The agent is initialized with the DataFrame (df) and configured with the following options:
# - verbose: If set to True, the agent will provide detailed output during its operation
# - allow_dangerous_code: If set to True, the agent is permitted to execute code that could be potentially unsafe

agent = create_pandas_dataframe_agent(llm, df, verbose=True, allow_dangerous_code=True)

In [None]:
agent.invoke("How many rows are there")



> Entering new AgentExecutor chain...
Thought: I need to count the number of rows in the dataframe
Action: python_repl_ast
Action Input: len(df)
891
891 is the number of rows in the dataframe
Final Answer: There are 891 rows in the dataframe.

> Finished chain.


{'input': 'How many rows are there',
 'output': 'There are 891 rows in the dataframe.'}
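As the output above shows, `invoke` returns a dictionary with `input` and `output` keys. The sketch below pulls out the answer text and, as one possible approach, extracts the first integer from it. The dictionary is copied from the run above so the example works without an API call; in a live session it would come from `agent.invoke(...)`.

```python
import re

# Result copied from the run above; a live call would be
# `result = agent.invoke("How many rows are there")`.
result = {
    "input": "How many rows are there",
    "output": "There are 891 rows in the dataframe.",
}

answer = result["output"]  # the agent's final answer as plain text

# The answer is free-form text, so a number has to be parsed out of it.
match = re.search(r"\d+", answer)
row_count = int(match.group()) if match else None

print(answer)
print(row_count)
```

Because the wording of the answer can change between runs, parsing it like this is fragile. It is shown only to illustrate the shape of the returned value.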

In [None]:
agent.invoke("How many people have more than 23 age")



> Entering new AgentExecutor chain...
Thought: I need to filter the dataframe for rows where age is greater than 23 and then count the number of rows
Action: python_repl_ast
Action Input: len(df[df['age'] > 23])
468
468 people have more than 23 age
Final Answer: 468

> Finished chain.


{'input': 'How many people have more than 23 age', 'output': '468'}
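The agent's answer can be cross-checked by running the same filter directly in pandas. The sketch below uses a small made-up frame (the `toy` data is hypothetical, not the Titanic file) to show the comparison the agent generated and how missing ages behave in it.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the filter.
toy = pd.DataFrame({"age": [22.0, 38.0, None, 35.0, 19.0]})

# Comparing NaN with a number yields False, so rows with a missing age
# are never counted as "older than 23".
older_than_23 = int((toy["age"] > 23).sum())

print(older_than_23)
```

On the real data, the equivalent check is the expression from the trace, `len(df[df['age'] > 23])`.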

## **Multi DataFrame Example**

- The agent can also work with multiple DataFrames passed as a list.

In [None]:
df1 = df.copy()  # make a copy of the original data

In [None]:
df1["age"] = df1["age"].fillna(df1["age"].mean())  # fill missing ages in df1 with the column mean
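To make the imputation step concrete, here is a minimal sketch on a made-up column (the `toy` frame is hypothetical). It shows that `mean()` skips missing values and that `fillna` on a copy leaves the original frame untouched.

```python
import pandas as pd

# Hypothetical toy data with two missing ages.
toy = pd.DataFrame({"age": [22.0, None, 26.0, None, 36.0]})

# Series.mean() skips NaN by default: (22 + 26 + 36) / 3 = 28.0
mean_age = toy["age"].mean()

# Work on a copy so the original frame keeps its missing values.
filled = toy.copy()
filled["age"] = filled["age"].fillna(mean_age)

print(filled["age"].tolist())
```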

In [None]:
# Creating an agent that can interact with multiple Pandas DataFrames using a language model (llm)
# The agent is initialized with a list of DataFrames ([df, df1]) and configured with the following options:
# - verbose: If set to True, the agent will provide detailed output during its operation
# - allow_dangerous_code: If set to True, the agent is permitted to execute code that could be potentially unsafe

agent = create_pandas_dataframe_agent(llm, [df, df1], verbose=True, allow_dangerous_code=True)
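In the trace that follows, the agent refers to the frames as `df1` and `df2`, even though the notebook variables are `df` and `df1`. This suggests the frames are named by their position in the list. The sketch below illustrates that mapping. It is inferred from the trace, not taken from LangChain's source, and the string stand-ins are hypothetical.

```python
# Stand-ins for the two DataFrames passed to the agent as [df, df1].
frames = ["original frame", "imputed frame"]

# Name each frame by its 1-based position in the list.
names = {f"df{i}": frame for i, frame in enumerate(frames, start=1)}

for name, frame in names.items():
    print(name, "->", frame)
```

Under this assumption, the notebook's `df` appears to the agent as `df1` and the imputed copy as `df2`, which is worth keeping in mind when reading the agent's output.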

In [None]:
agent.invoke("How many rows in the age column are different")



> Entering new AgentExecutor chain...
Thought: I should count the number of unique values in the age column
Action: python_repl_ast
Action Input: df1['age'].nunique()
88
88 unique values in the age column
Action: python_repl_ast
Action Input: df2['age'].nunique()
89
89 unique values in the age column
Action: python_repl_ast
Action Input: df1['age'].nunique() - df2['age'].nunique()
-1
88 - 89 = -1, so there is one more unique value in df2 compared to df1
Final Answer: There is 1 more unique value in the age column of df2 compared to df1.

> Finished chain.


{'input': 'How many rows in the age column are different',
 'output': 'There is 1 more unique value in the age column of df2 compared to df1.'}
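The question asks how many rows differ, while the trace above compares counts of unique values, which is a different quantity. A direct pandas check makes the distinction visible. The sketch below uses made-up data (the `original` and `imputed` frames are hypothetical stand-ins for the two frames passed to the agent).

```python
import pandas as pd

# Hypothetical stand-ins for the original and mean-imputed frames.
original = pd.DataFrame({"age": [22.0, None, 26.0, None, 36.0]})
imputed = original.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

# Row-wise comparison: NaN never equals a number, so every row that was
# imputed counts as changed.
changed_rows = int(original["age"].ne(imputed["age"]).sum())

# What the agent computed instead: the difference in unique-value counts.
unique_gap = imputed["age"].nunique() - original["age"].nunique()

print(changed_rows, unique_gap)
```

Here two rows changed, yet the unique-value count grew by only one, because both missing ages were filled with the same mean. Rephrasing the prompt more precisely, or checking the generated code in the verbose trace, helps catch this kind of mismatch.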