In [1]:
import pandas as pd
from IPython.display import Markdown, display
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
from langchain.agents import load_tools, initialize_agent
import os

In [2]:
from langchain.agents import AgentExecutor, Tool, ZeroShotAgent
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.utilities import GoogleSearchAPIWrapper

In [3]:
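# Read the key from a local file and strip the trailing newline,
# which would otherwise end up inside the API key.
with open("OPENAI_API_KEY") as key_file:
    os.environ["OPENAI_API_KEY"] = key_file.read().strip()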

In [4]:
data = pd.read_csv('titanic.csv')

In [5]:
data.head()

,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.925,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.05,,S


In [6]:
data.shape

(891, 12)

In [11]:
memory = ConversationBufferMemory(memory_key="chat_history")

In [18]:
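# Build a pandas agent backed by GPT-4 and attach the conversation memory.
# Note: the default prompt printed in the next cell has no {chat_history}
# placeholder, so the stored history may never reach the model.
chayGPTAgent = create_pandas_dataframe_agent(
    ChatOpenAI(model_name='gpt-4'),
    df=data,
    verbose=True,
    agent_executor_kwargs={"memory": memory},
)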

In [7]:
Markdown(chayGPTAgent.agent.llm_chain.prompt.template)   


You are working with a pandas dataframe in Python. The name of the dataframe is `df`.
You should use the tools below to answer the question posed of you:

python_repl_ast: A Python shell. Use this to execute python commands. Input should be a valid python command. When using this tool, sometimes output is abbreviated - make sure it does not look abbreviated before using it in your answer.

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [python_repl_ast]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question


This is the result of `print(df.head())`:
{df_head}

Begin!
Question: {input}
{agent_scratchpad}
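
The template above asks the model to answer in a fixed Thought / Action / Action Input / Final Answer layout, which the agent then has to parse back into a tool call. The following is a minimal sketch of such a parser, not LangChain's actual implementation; the names `STEP_RE` and `parse_step` are hypothetical and exist only for this illustration.

```python
import re

# Matches one "Action: ... / Action Input: ..." pair in the layout shown
# in the prompt template. This regex is an assumption made for this sketch.
STEP_RE = re.compile(
    r"Action\s*:\s*(?P<tool>.+?)\s*\nAction\s*Input\s*:\s*(?P<tool_input>.*)",
    re.DOTALL,
)
FINAL_MARKER = "Final Answer:"


def parse_step(text):
    """Return ("final", answer) or ("action", tool, tool_input)."""
    if FINAL_MARKER in text:
        # Everything after the marker is the answer handed back to the user.
        return ("final", text.split(FINAL_MARKER, 1)[1].strip())
    match = STEP_RE.search(text)
    if match is None:
        raise ValueError("Could not find an Action / Action Input pair")
    return ("action", match.group("tool").strip(), match.group("tool_input").strip())


step = parse_step(
    "Thought: use the shape attribute\n"
    "Action: python_repl_ast\n"
    "Action Input: df.shape[0]"
)
final = parse_step("Thought: I now know the final answer\nFinal Answer: 891")
print(step)
print(final)
```

In this sketch a final answer takes priority over an action, so a reply containing both is treated as finished.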

In [19]:
chayGPTAgent.run('How many rows does this dataset contain ?')



> Entering new AgentExecutor chain...
Thought: To find the number of rows in a dataframe, we can use the shape attribute, which returns a tuple representing the dimensionality of the DataFrame. The first element of the tuple is the number of rows.
Action: python_repl_ast
Action Input: df.shape[0]
Observation: 891
Thought: The dataset contains 891 rows.
Final Answer: 891

> Finished chain.


'891'

In [20]:
# French for "Who are the oldest passengers?"
prompt = 'Qui sont les passagers les plus vieux ?'
chayGPTAgent.run(prompt)



> Entering new AgentExecutor chain...
Thought: To identify the oldest passengers, I need to sort the dataframe by the 'Age' column in descending order.
Action: python_repl_ast
Action Input: df.sort_values('Age', ascending=False).head()
Observation:      PassengerId  Survived  Pclass                                  Name  \
630          631         1       1  Barkworth, Mr. Algernon Henry Wilson   
851          852         0       3                   Svensson, Mr. Johan   
493          494         0       1               Artagaveytia, Mr. Ramon   
96            97         0       1             Goldschmidt, Mr. George B   
116          117         0       3                  Connors, Mr. Patrick   

      Sex   Age  SibSp  Parch    Ticket     Fare Cabin Embarked  
630  male  80.0      0      0     27042  30.0000   A23        S  
851  male  74.0      0      0    347060   7.7750   NaN        S  
493  male  71.0      0      0  PC 17609  49.5042   NaN  

'Les passagers les plus âgés sont Barkworth, Mr. Algernon Henry Wilson âgé de 80 ans, Svensson, Mr. Johan âgé de 74 ans, Artagaveytia, Mr. Ramon et Goldschmidt, Mr. George B tous deux âgés de 71 ans, et Connors, Mr. Patrick âgé de 70,5 ans.'
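
The agent answered by sorting on `Age` in descending order and taking the first five rows. The same call can be checked directly in pandas. The sketch below uses a small stand-in frame rather than `titanic.csv`: the names and ages are copied from the outputs above, and the last row is a hypothetical passenger with a missing age, added only to show that `sort_values` places missing values last by default.

```python
import pandas as pd

# Stand-in data: names and ages taken from the notebook outputs above,
# plus one hypothetical row with a missing age.
sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Barkworth, Mr. Algernon Henry Wilson",
        "Heikkinen, Miss. Laina",
        "Svensson, Mr. Johan",
        "Artagaveytia, Mr. Ramon",
        "Goldschmidt, Mr. George B",
        "Connors, Mr. Patrick",
        "Passenger, Hypothetical",
    ],
    "Age": [22.0, 80.0, 26.0, 74.0, 71.0, 71.0, 70.5, None],
})

# Same expression the agent ran: oldest first, keep the top five rows.
oldest = sample.sort_values("Age", ascending=False).head()
top_ages = oldest["Age"].tolist()
print(oldest)
```

The two passengers aged 71 are tied, so their relative order is not guaranteed unless a stable sort is requested, for example with `kind="stable"`.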

In [21]:
# French for "Did they survive?" (a follow-up to the previous question)
prompt = 'Ont-ils survécu ?'
chayGPTAgent.run(prompt)



> Entering new AgentExecutor chain...
Thought: The question is in French and it translates to "Did they survive?". It seems like the question is asking for the survival status of the passengers in the dataframe. However, the question is too vague as it doesn't specify which passengers it's referring to. We can check the unique values in the 'Survived' column to understand the data better.

Action: python_repl_ast
Action Input: df['Survived'].unique()
Observation: [0 1]
Thought: The 'Survived' column contains two unique values: 0 and 1. This suggests that the data is binary, where 1 likely represents survival and 0 represents non-survival. However, without specific passengers to refer to, we cannot provide a more detailed answer.
Final Answer: The 'Survived' column indicates whether a passenger survived or not, with 1 for survival and 0 for non-survival. Without specific passenger information, we cannot provide details on individu

"The 'Survived' column indicates whether a passenger survived or not, with 1 for survival and 0 for non-survival. Without specific passenger information, we cannot provide details on individual survival."
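
The agent treated the follow-up as a standalone question, even though a `ConversationBufferMemory` with `memory_key="chat_history"` was attached. One plausible explanation, offered here as an assumption rather than a confirmed diagnosis, is that the prompt template printed earlier has no `{chat_history}` placeholder, so the stored history is never shown to the model. The sketch below checks which placeholders the tail of that template actually contains, using only the standard library; `TEMPLATE_TAIL` and `placeholders` are hypothetical names for this illustration.

```python
from string import Formatter

# Tail of the prompt template as printed earlier in the notebook.
TEMPLATE_TAIL = """This is the result of `print(df.head())`:
{df_head}

Begin!
Question: {input}
{agent_scratchpad}"""


def placeholders(template):
    """Return the set of {field} names that appear in a format string."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


fields = placeholders(TEMPLATE_TAIL)
has_history = "chat_history" in fields
print(sorted(fields))
print("chat_history present:", has_history)
```

If the placeholder is indeed missing, the history would need to be added to the prompt (for instance through a custom suffix) before follow-up questions such as this one can be resolved against earlier answers.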