# Lesson 2: Interacting with CSV Data

## Setup and connect to the Azure OpenAI endpoint

**Note**: The pre-configured cloud resource grants you access to the Azure OpenAI GPT model. The key and endpoint used below are intended for teaching purposes only. Your notebook environment is already set up with the necessary keys, which may differ from those used by the instructor during filming.

In [1]:

```python
import os

import pandas as pd
from IPython.display import HTML, Markdown, display
from langchain_core.messages import HumanMessage
from langchain_openai import AzureChatOpenAI

# Connect to the Azure OpenAI deployment; the endpoint and key are read
# from environment variables that the course environment already sets.
model = AzureChatOpenAI(
    openai_api_version="2024-05-01-preview",
    azure_deployment="gpt-35-turbo",
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)
```
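Both connection settings come from environment variables. If one is unset, `os.getenv` returns `None` and the resulting error is raised from inside the client library, away from its cause. A minimal stdlib sketch that checks them up front with an explicit message; `require_env` is a hypothetical helper, not part of LangChain or the course code:

```python
import os


def require_env(*names: str) -> dict[str, str]:
    """Return the named environment variables, raising if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}
```

Calling `require_env("AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_API_KEY")` before building `AzureChatOpenAI` would then fail with a message naming the missing variable.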

## Load the dataset

**Note**: To download the data and work with it locally, use the following code in a notebook cell:

```
os.makedirs("data", exist_ok=True)
# `!` runs a shell command from the notebook; wget saves the CSV into ./data/
!wget https://covidtracking.com/data/download/all-states-history.csv -P ./data/
file_url = "./data/all-states-history.csv"
df = pd.read_csv(file_url).fillna(value=0)
```
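`!wget` is IPython shell syntax, so the snippet above runs only inside a notebook on a machine with `wget` installed. For a plain Python script, a stdlib-only sketch of the same download step could look like this; `download_file` is a hypothetical helper, and it assumes the covidtracking.com URL above is still reachable:

```python
import os
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_file(url: str, dest_dir: str = "data") -> str:
    """Download url into dest_dir (created if needed) and return the local path."""
    os.makedirs(dest_dir, exist_ok=True)
    # Keep the remote file name, as `wget -P <dir>` does.
    filename = Path(urlparse(url).path).name
    local_path = os.path.join(dest_dir, filename)
    urllib.request.urlretrieve(url, local_path)
    return local_path
```

For example, `df = pd.read_csv(download_file("https://covidtracking.com/data/download/all-states-history.csv")).fillna(value=0)` would stand in for the whole notebook snippet.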

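In [2]:

```python
# DATA_PATH is set in the course environment; fall back to the local ./data folder otherwise.
data_path = os.getenv("DATA_PATH", "./data")
csv_path = os.path.join(data_path, "all-states-history.csv")
# Replace missing values (NaN) with 0.
df = pd.read_csv(csv_path).fillna(value=0)
```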

In [3]:

```python
df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21 entries, 0 to 20
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   date          21 non-null     object
 1   state         21 non-null     object
 2   death         21 non-null     int64
 3   hospitalized  21 non-null     float64
 4   negative      21 non-null     float64
 5   positive      21 non-null     int64
 6   recovered     21 non-null     float64
dtypes: float64(3), int64(2), object(2)
memory usage: 1.3+ KB
```


## Prepare the LangChain DataFrame agent

In [41]:

```python
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent

# The agent answers by writing pandas code and running it on `df` via its
# python_repl_ast tool. (Newer langchain_experimental releases also require
# allow_dangerous_code=True, since the agent executes LLM-generated Python.)
agent = create_pandas_dataframe_agent(
    llm=model, df=df, verbose=True,
    # Turn unparseable LLM replies into an error observation instead of raising.
    agent_executor_kwargs={"handle_parsing_errors": True},
)

agent.invoke("how many rows are there?")
```

```
> Entering new AgentExecutor chain...
Thought: To find out the number of rows in the dataframe, I can use the `len()` function on the dataframe.

Action: python_repl_ast

Action Input: `len(df)`

Observation: 21
Thought:There are 21 rows in the dataframe.
Final Answer: 21

> Finished chain.

{'input': 'how many rows are there?', 'output': '21'}
```
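The verbose trace shows the ReAct loop the agent runs: the model writes a `Thought`, names a tool in `Action` and its argument in `Action Input`; the executor runs that tool and feeds the result back as an `Observation`, until the model emits a `Final Answer`. The sketch below parses and dispatches a single step in plain Python. It is a simplified illustration, not LangChain's actual output parser; `parse_step` and `run_step` are hypothetical names, and the "is not a valid tool" text copies the message that appears in this lesson's agent logs.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# "Action: <tool>" followed on a later line by "Action Input: <argument>".
ACTION_RE = re.compile(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", re.DOTALL)
FINAL_RE = re.compile(r"Final Answer:\s*(.*)", re.DOTALL)


def parse_step(llm_output: str) -> Tuple[Optional[str], str]:
    """Return (None, answer) for a final answer, otherwise (tool_name, tool_input)."""
    final = FINAL_RE.search(llm_output)
    if final:
        return None, final.group(1).strip()
    action = ACTION_RE.search(llm_output)
    if action is None:
        # The kind of failure that handle_parsing_errors is meant to absorb.
        raise ValueError(f"Could not parse LLM output: {llm_output!r}")
    # Drop surrounding backticks, as in "Action Input: `len(df)`".
    return action.group(1).strip(), action.group(2).strip().strip("`")


def run_step(llm_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Run one step and return the final answer or the Observation text."""
    tool_name, tool_input = parse_step(llm_output)
    if tool_name is None:
        return tool_input
    if tool_name not in tools:
        # The model named something that is not a registered tool.
        return f"{tool_name} is not a valid tool, try one of [{', '.join(tools)}]."
    return tools[tool_name](tool_input)
```

With a stub tool such as `{"python_repl_ast": lambda code: "21"}`, the first step of the trace above returns `"21"`, while a reply with `Action: df.columns` produces the invalid-tool message.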

## Design your prompt and ask your question

In [47]:

```python
CSV_PROMPT_PREFIX = """
First set the pandas display options to show all the columns,
get the column names, then answer the question.
"""

CSV_PROMPT_SUFFIX = """
- **ALWAYS** before giving the Final Answer, try another method.
Then reflect on the answers of the two methods you did and ask yourself
if it answers correctly the original question.
If you are not sure, try another method.
- If the methods tried do not give the same result,reflect and
try again until you have two methods that have the same result.
- If you still cannot arrive to a consistent result, say that
you are not sure of the answer.
- If you are sure of the correct answer, create a beautiful
and thorough response using Markdown.
- **DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE,
ONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE**.
- **ALWAYS**, as part of your "Final Answer", explain how you got
to the answer on a section that starts with: "\n\nExplanation:\n".
In the explanation, mention the column names that you used to get
to the final answer.
"""

QUESTION = "Analyse the different states data and provide the 3 most surprising numbers/facts."

# The prefix, question, and suffix are concatenated into a single input string.
# invoke() silently ignores a handle_parsing_errors keyword argument: that option
# belongs to the AgentExecutor and is set through the agent_executor_kwargs
# parameter of create_pandas_dataframe_agent.
agent.invoke(CSV_PROMPT_PREFIX + QUESTION + CSV_PROMPT_SUFFIX)
```

```
> Entering new AgentExecutor chain...
Thought: To set the pandas display options to show all the columns, I need to use the `pd.set_option()` function. To get the column names, I can use the `df.columns` attribute. To analyze the different states data, I will need to examine the values in the 'death', 'hospitalized', 'negative', 'positive', and 'recovered' columns.

Action: pd.set_option()
Action Input: pd.set_option('display.max_columns', None)
Observation: pd.set_option() is not a valid tool, try one of [python_repl_ast].
Thought:To set the pandas display options, I can use the `pd.set_option()` function.

Action: python_repl_ast
Action Input: pd.set_option('display.max_columns', None)
Observation:
Thought:I have set the pandas display options to show all the columns. Now I need to get the column names.

Action: df.columns
Action Input: df.columns
Observation: df.columns is not a valid tool, try one of [p

{'input': '\nFirst set the pandas display options to show all the columns,\nget the column names, then answer the question.\nAnalyse the different states data and provide the 3 most surprising numbers/facts.\n- **ALWAYS** before giving the Final Answer, try another method.\nThen reflect on the answers of the two methods you did and ask yourself\nif it answers correctly the original question.\nIf you are not sure, try another method.\n- If the methods tried do not give the same result,reflect and\ntry again until you have two methods that have the same result.\n- If you still cannot arrive to a consistent result, say that\nyou are not sure of the answer.\n- If you are sure of the correct answer, create a beautiful\nand thorough response using Markdown.\n- **DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE,\nONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE**.\n- **ALWAYS**, as part of your "Final Answer", explain how you got\nto the answer on a section that starts with: "\n\nExplana
```
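The prompt suffix asks the model to answer each question with two independent methods, keep trying until two agree, and admit uncertainty otherwise. The same self-consistency check, written as ordinary Python outside the agent, might look like the minimal sketch below; `cross_check` is a hypothetical helper, and the toy rows are made up for illustration rather than taken from the dataset.

```python
from typing import Any, Callable, Sequence, Tuple


def cross_check(methods: Sequence[Callable[[], Any]]) -> Tuple[bool, Any]:
    """Run methods in order; return (True, result) once two of them agree, else (False, None)."""
    seen = []
    for method in methods:
        result = method()
        if result in seen:
            return True, result
        seen.append(result)
    # No two methods agreed: the prompt tells the model to say it is not sure.
    return False, None


# Toy rows, made up for illustration (not taken from the dataset).
rows = [
    {"state": "A", "death": 5},
    {"state": "B", "death": 9},
    {"state": "C", "death": 2},
]

# Two independent ways to find the state with the most deaths.
agreed, answer = cross_check([
    lambda: max(rows, key=lambda row: row["death"])["state"],
    lambda: sorted(rows, key=lambda row: row["death"], reverse=True)[0]["state"],
])
print(agreed, answer)  # prints: True B
```

Requiring agreement between methods, rather than a single run, is what lets the check report "not sure" instead of an unverified answer.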