# Agent

For complex questions, it helps if an LLM can execute code iteratively while keeping the inputs and outputs of its previous executions. This is where agents come into play. An agent lets the LLM decide how many times a tool needs to be invoked and keeps track of the executions it has made so far. The create_pandas_dataframe_agent constructor, used later in this notebook, is a built-in agent that makes it easy to work with dataframes.
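To make the idea of an agent loop concrete, here is a minimal sketch in plain Python. It is not how LangChain implements its agents: `scripted_model`, `calculator_tool`, and `run_agent` are hypothetical names, and the "model" is a hard-coded stand-in for an LLM. The sketch only shows the two properties described above: the model decides at each step whether to call the tool again, and every earlier call is kept in a scratchpad.

```python
def scripted_model(question, scratchpad):
    """Hypothetical stand-in for an LLM.

    Returns ("call", tool_input) to request another tool call,
    or ("final", answer) once it has what it needs.
    """
    if len(scratchpad) == 0:
        return ("call", "2 + 3")
    if len(scratchpad) == 1:
        return ("call", "10 * 4")
    # Both results are available, so compare them and answer.
    first, second = (output for _, output in scratchpad)
    return ("final", "yes" if int(first) > int(second) else "no")


def calculator_tool(expression):
    """Toy tool: evaluates 'a + b' or 'a * b' for two integers."""
    left, op, right = expression.split()
    a, b = int(left), int(right)
    return str(a + b if op == "+" else a * b)


def run_agent(question, model, tool, max_steps=5):
    scratchpad = []  # (tool_input, tool_output) pairs from earlier steps
    for _ in range(max_steps):
        kind, payload = model(question, scratchpad)
        if kind == "final":
            return {"input": question, "output": payload, "steps": scratchpad}
        # Execute the requested call and remember it for the next step.
        scratchpad.append((payload, tool(payload)))
    raise RuntimeError("agent did not finish within max_steps")


result = run_agent("Is 2 + 3 greater than 10 * 4?", scripted_model, calculator_tool)
print(result["output"])
```

The `max_steps` cap is a design choice in this sketch: without it, a model that never returns a final answer would loop forever.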

In [1]:
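import os
from dotenv import load_dotenv

# Load environment variables from a local .env file
load_dotenv()
# Fail early with a clear message if the LangSmith key is missing
if not os.environ.get("LANGCHAIN_API_KEY"):
    raise RuntimeError("LANGCHAIN_API_KEY is not set")
os.environ["LANGCHAIN_TRACING_V2"] = "true"  # enable LangSmith tracing
os.environ["LANGCHAIN_PROJECT"] = "Q&A_over_CSV"  # project name for the traces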

In [2]:
import pandas as pd

df = pd.read_csv("titanic.csv")
print(df.shape)
print(df.columns.tolist())

(891, 12)
['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked']


In [3]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

# Pandas

Instead of SQL, we can also use data analysis libraries like pandas, together with the code-generating abilities of LLMs, to interact with CSV data. Again, this approach is not fit for production use cases unless you have extensive safeguards in place, because the code the model writes is executed directly. For this reason, our code-execution utilities and constructors live in the langchain-experimental package.
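As one illustration of what a safeguard could look like, the sketch below checks a model-generated expression against an allow-list of syntax nodes before it would be evaluated. This is an assumption-laden example, not part of langchain-experimental: the function name `looks_like_plain_df_expression` is hypothetical, it assumes Python 3.9 or later, and it is not a sandbox. Even an expression that passes can call pandas methods that read or write files, so a check like this would be one layer among several.

```python
import ast

# Only the dataframe variable may be referenced by name.
ALLOWED_NAMES = {"df"}

# Syntax nodes permitted in a generated expression (Python 3.9+ AST layout).
ALLOWED_NODE_TYPES = (
    ast.Expression, ast.Name, ast.Attribute, ast.Subscript, ast.Call,
    ast.keyword, ast.Constant, ast.Compare, ast.BinOp, ast.BoolOp,
    ast.UnaryOp, ast.expr_context, ast.operator, ast.cmpop, ast.boolop,
    ast.unaryop,
)


def looks_like_plain_df_expression(code):
    """Return True if `code` is a single expression built only from `df`,
    constants, attribute access, indexing, calls, and operators."""
    try:
        tree = ast.parse(code, mode="eval")  # statements such as `import` fail here
    except (SyntaxError, ValueError):
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODE_TYPES):
            return False
        if isinstance(node, ast.Name) and node.id not in ALLOWED_NAMES:
            return False
        # Block dunder and private attributes such as df.__class__.
        if isinstance(node, ast.Attribute) and node.attr.startswith("_"):
            return False
    return True


print(looks_like_plain_df_expression("df['Age'].corr(df['Fare'])"))
print(looks_like_plain_df_expression("__import__('os').system('ls')"))
```

The first expression is the kind of query the agent issues later in this notebook; the second is rejected because it references a name other than `df`.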



# Agent

In [4]:
from langchain_experimental.agents import create_pandas_dataframe_agent

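# Note: depending on the installed langchain-experimental version, this call may also
# require allow_dangerous_code=True, since the agent executes model-generated Python.
agent = create_pandas_dataframe_agent(llm, df, agent_type="openai-tools", verbose=True)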
agent.invoke(
    {
        "input": "What's the correlation between age and fare? is that greater than the correlation between fare and survival?"
    }
)



> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df['Age'].corr(df['Fare'])"}`

0.09606669176903887

Invoking: `python_repl_ast` with `{'query': "df['Fare'].corr(df['Survived'])"}`

0.2573065223849625

The correlation between age and fare is approximately 0.096, while the correlation between fare and survival is approximately 0.257. Therefore, the correlation between fare and survival is greater than the correlation between age and fare.

> Finished chain.


{'input': "What's the correlation between age and fare? is that greater than the correlation between fare and survival?",
 'output': 'The correlation between age and fare is approximately 0.096, while the correlation between fare and survival is approximately 0.257. Therefore, the correlation between fare and survival is greater than the correlation between age and fare.'}
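
Both numbers in the trace come from pandas `Series.corr`, which by default computes the Pearson correlation and leaves out rows where either value is missing. The pure-Python sketch below mirrors that calculation to show what the agent's tool call is doing. The function name `pearson_pairwise` and the sample values are made up for illustration and are not taken from the Titanic data.

```python
import math


def pearson_pairwise(xs, ys):
    """Pearson correlation over positions where neither value is None."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / math.sqrt(var_x * var_y)


# Made-up values; the None entry is skipped together with its partner.
ages = [22.0, 38.0, None, 35.0, 54.0]
fares = [7.25, 71.28, 8.46, 53.10, 51.86]
print(pearson_pairwise(ages, fares))
```

A value near 0, such as the 0.096 reported for age and fare, indicates a weak linear relationship. The sketch does not handle a column with zero variance, which would cause a division by zero.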

# Printing only the final output

In [10]:
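def print_final_generation(agent_input):
    # Run the agent, then print only the final answer from the returned dictionary.
    # The chain trace still appears because the agent was created with verbose=True.
    agent_output = agent.invoke({"input": agent_input})
    print(agent_output.get('output', ''))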


In [11]:
while True:
    user_input = input("User: ")
    if user_input.lower() in ["quit", "exit", "q"]:
        print("Goodbye!")
        break
    print_final_generation(user_input)




> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df['Age'].corr(df['Fare'])"}`

0.09606669176903887

Invoking: `python_repl_ast` with `{'query': "df['Fare'].corr(df['Survived'])"}`

0.2573065223849625

The correlation between age and fare is approximately \(0.096\), while the correlation between fare and survival is approximately \(0.257\). Therefore, the correlation between fare and survival is greater than the correlation between age and fare.

> Finished chain.
The correlation between age and fare is approximately \(0.096\), while the correlation between fare and survival is approximately \(0.257\). Therefore, the correlation between fare and survival is greater than the correlation between age and fare.
Goodbye!
