# Lesson 1: Basic LLM invocation

In [22]:
import os
import pandas as pd
from IPython.display import display, HTML, Markdown

from dotenv import load_dotenv
load_dotenv("../.env")


True

In [None]:
from langchain.schema import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI

model = ChatGoogleGenerativeAI(model="gemini-1.5-flash")

In [27]:
prompt = HumanMessage(
    content="Translate this sentence from English "
    "to Bengali and Spanish. I like red cars and "
    "blue houses, but my dog is yellow."
)

In [28]:
model_response = model.invoke([prompt])
print(model_response.content)

Here are the translations:

**Bengali:** আমি লাল গাড়ি এবং নীল বাড়ি পছন্দ করি, কিন্তু আমার কুকুর হলুদ। (Ami lal gaṛi ebong nil bari pochondo kori, kintu amar kukur holud.)

**Spanish:** Me gustan los coches rojos y las casas azules, pero mi perro es amarillo.



# Lesson 2: CSV data interaction

In [35]:
df = pd.read_csv("../data/all-states-history.csv").fillna(0)
# df.head()

In [42]:
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
agent = create_pandas_dataframe_agent(
    llm=model,
    df=df,
    verbose=True,
)

agent_response = agent.invoke("how many rows are there?")
agent_response



> Entering new AgentExecutor chain...
Thought: I can use the `.shape` attribute of the pandas DataFrame to get the number of rows and columns. The number of rows will be the first element of the tuple returned by `.shape`.

Action: python_repl_ast
Action Input: 
```python
print(df.shape[0])
```

Observation: 20780

Thought: I now know the final answer.
Final Answer: 20780

> Finished chain.


{'input': 'how many rows are there?', 'output': '20780'}

In [44]:
CSV_PROMPT_PREFIX = """
First set the pandas display options to show all the columns,
get the column names, then answer the question.
"""

CSV_PROMPT_SUFFIX = """
- **ALWAYS** try another method before giving the Final Answer.
Then reflect on the answers of the two methods and ask yourself
whether they correctly answer the original question.
If you are not sure, try another method.
- If the methods tried do not give the same result, reflect and
try again until you have two methods that give the same result.
- If you still cannot arrive at a consistent result, say that
you are not sure of the answer.
- If you are sure of the correct answer, create a beautiful
and thorough response using Markdown.
- **DO NOT MAKE UP AN ANSWER OR USE PRIOR KNOWLEDGE,
ONLY USE THE RESULTS OF THE CALCULATIONS YOU HAVE DONE**.
- **ALWAYS**, as part of your "Final Answer", explain how you got
to the answer in a section that starts with: "\n\nExplanation:\n".
In the explanation, mention the column names that you used to get
to the final answer.
"""

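# The parentheses make the three adjacent string literals a single expression,
# so the whole question (not only its first line) is assigned to QUESTION.
QUESTION = (
    "How many patients were hospitalized during July 2021 "
    "in California, and nationwide as the total of all states? "
    "Use the hospitalizedIncrease column."
)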

response = agent.invoke(CSV_PROMPT_PREFIX + QUESTION + CSV_PROMPT_SUFFIX)
display(HTML("<h2>Response</h2>"))
display(Markdown(response["output"]))



> Entering new AgentExecutor chain...
Thought: I need to set pandas display options to show all columns, get the column names, and then find a way to calculate the number of hospitalized patients in July 2021. I'll try two different approaches to verify the result.

Action: python_repl_ast
Action Input:
```python
import pandas as pd
pd.set_option('display.max_columns', None)
print(df.columns)
```

Observation: Index(['date', 'state', 'death', 'deathConfirmed', 'deathIncrease',
       'deathProbable', 'hospitalized', 'hospitalizedCumulative',
       'hospitalizedCurrently', 'hospitalizedIncrease', 'inIcuCumulative',
       'inIcuCurrently', 'negative', 'negativeIncrease',
       'negativeTestsAntibody', 'negativeTestsPeopleAntibody',
       'negativeTestsViral', 'onVentilatorCumulative', 'onVentilatorCurrently',
       'positive', 'positiveCasesViral', 'positiveIncrease', 'positiveScore',
       'positiveTestsAntibody', 'positiveTestsAntigen',
  

5535849

Explanation:
Two methods were used to calculate the total number of hospitalized patients in July 2021.

Method 1:  This method first filtered the DataFrame `df` to include only rows where the 'date' column fell within the range of July 1st, 2021 to July 31st, 2021.  Then, it summed the values in the 'hospitalized' column of the filtered DataFrame.

Method 2: This method used the pandas `dt` accessor to extract the month from the 'date' column. It then filtered the DataFrame to include only rows where the month was 7 (July) and summed the values in the 'hospitalized' column.

Both methods produced the same result: 5535849.  The 'hospitalized' column represents the number of hospitalized patients on a given date.  Therefore, summing this column across all entries in July 2021 gives the total number of hospitalized patients during that month.
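
The two methods described in the explanation, together with the cross-check that the prompt suffix asks for, can be sketched directly in pandas. This is a minimal sketch on a small made-up frame: `df_demo`, its values, and the helper names are hypothetical and not part of the notebook. It sums `hospitalizedIncrease`, the column the question names, and it filters on the year as well as the month.

```python
import pandas as pd

# Hypothetical miniature of the dataset; the values are invented for illustration.
df_demo = pd.DataFrame(
    {
        "date": ["2021-06-30", "2021-07-01", "2021-07-15",
                 "2021-07-31", "2021-07-15", "2020-07-15"],
        "state": ["CA", "CA", "CA", "NY", "NY", "CA"],
        "hospitalizedIncrease": [5, 10, 20, 7, 3, 100],
    }
)
df_demo["date"] = pd.to_datetime(df_demo["date"])


def july_2021_by_range(frame):
    """Method 1: keep rows whose date lies in an inclusive date range."""
    mask = frame["date"].between(pd.Timestamp("2021-07-01"),
                                 pd.Timestamp("2021-07-31"))
    return int(frame.loc[mask, "hospitalizedIncrease"].sum())


def july_2021_by_dt(frame):
    """Method 2: keep rows by year and month using the `.dt` accessor."""
    mask = (frame["date"].dt.year == 2021) & (frame["date"].dt.month == 7)
    return int(frame.loc[mask, "hospitalizedIncrease"].sum())


def cross_checked(frame):
    """Return the total only if both methods agree, otherwise None ("not sure")."""
    first, second = july_2021_by_range(frame), july_2021_by_dt(frame)
    return first if first == second else None


california = df_demo[df_demo["state"] == "CA"]
print("California:", cross_checked(california))
print("Nationwide:", cross_checked(df_demo))
```

On the real data the same helpers could be applied to `df` after `df["date"] = pd.to_datetime(df["date"])`, assuming the `date` column parses cleanly as dates.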