# Objective
- Allow users to explore the Game of Thrones episode data by asking questions in plain English

# Solution Approach
We will use the following tools:
- LangChain - framework for building LLM-based applications
- pandas - Python library for data manipulation and analysis
- An OpenAI chat model (`gpt-3.5-turbo`)

In [1]:
from langchain_openai import ChatOpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent
import pandas as pd

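df = pd.read_csv("data/got_data.csv")

# temperature=0 reduces randomness in the model's output
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

# The agent lets the LLM write pandas code and runs it through the `python_repl_ast` tool
agent_executor = create_pandas_dataframe_agent(
    llm,
    df,
    agent_type="tool-calling",
    verbose=True,              # print each tool call and its result
    allow_dangerous_code=True  # explicit opt-in: the agent executes LLM-generated Python
)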

In [2]:
result = agent_executor.invoke({
    "input": "How many seasons and episodes are present?"
})



> Entering new AgentExecutor chain...
Invoking: `python_repl_ast` with `{'query': "df[['Season', 'Episode Number']].max()"}`


Season             8
Episode Number    73
dtype: int64
There are 8 seasons and 73 episodes present in the dataframe.

> Finished chain.


## Observation
- As we can observe, the agent built by `create_pandas_dataframe_agent` orchestrates the whole flow.
- The LLM analysed the user request and generated the query `df[['Season', 'Episode Number']].max()`, which was then passed to the `python_repl_ast` tool. Taking the maximum works as an episode count here because `Episode Number` runs across the whole series (1 to 73) rather than restarting each season.
- The `python_repl_ast` tool executes the query and passes the result back.
- The LLM then reads the result and generates the final response.
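To make the execution step concrete, here is a minimal sketch of what happens between the LLM and the tool: a query string comes in, it is evaluated against `df`, and the printed result goes back to the LLM as the next message. `run_pandas_query` is a hypothetical helper that only loosely mimics `python_repl_ast`; it is not LangChain's implementation. Like the real tool, it runs arbitrary code, which is why the agent above needs `allow_dangerous_code=True`.

```python
import pandas as pd


def run_pandas_query(df: pd.DataFrame, query: str) -> str:
    """Evaluate one pandas expression against `df` and return its printed form.

    Hypothetical stand-in for the execution step of `python_repl_ast`,
    NOT LangChain's implementation. eval() runs arbitrary code, so never
    use it on untrusted input.
    """
    result = eval(query, {"pd": pd}, {"df": df})
    return str(result)


# Tiny illustrative frame (not the real dataset)
df = pd.DataFrame({"Season": [1, 1, 8], "Episode Number": [1, 2, 73]})

# The query string is the one the LLM generated in the trace above
observation = run_pandas_query(df, "df[['Season', 'Episode Number']].max()")
print(observation)  # this text is what would be sent back to the LLM as the tool result
```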

In [8]:
# Manually execute the query to double check the results
df[['Season', 'Episode Number']].max()

Season             8
Episode Number    73
dtype: int64

## Hypothesis
* Anything we could solve by writing code against the pandas DataFrame API can now be automated with `create_pandas_dataframe_agent`, allowing users to ask questions in plain English and get descriptive answers.
* Additionally, we expect the LLM to handle misspellings and other variations in user queries without us having to write complex parsing rules.

## Experiment
Let us consider a few end-user queries of varying complexity:
### Query: Tell me about the season1, episode 4

### Pandas Dataframe query

> df[df['Season'] == 1][df['Episode Number'] == 4]
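The chained filter above returns the right row, but pandas emits a `UserWarning` ("Boolean Series key will be reindexed to match DataFrame index") because the second mask is built from the full `df` while being applied to an already filtered frame. A minimal sketch of the more idiomatic form combines both conditions into one mask. It also filters on `Number in Season` rather than `Episode Number`: in this dataset `Episode Number` counts across the whole series, so the two columns only coincide in Season 1. The small frame below is illustrative, not the real CSV.

```python
import pandas as pd

# Small illustrative frame; column names follow the dataset used in this notebook
df = pd.DataFrame({
    "Season": [1, 1, 1, 2],
    "Number in Season": [3, 4, 5, 4],
    "Episode Name": ["Lord Snow", "Cripples, Bastards, and Broken Things",
                     "The Wolf and the Lion", "Garden of Bones"],
})

# One boolean mask combined with &; the parentheses are required because
# & binds more tightly than == in Python
mask = (df["Season"] == 1) & (df["Number in Season"] == 4)
episode = df.loc[mask]
print(episode["Episode Name"].tolist())
```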

### Expected Answer:

**Episode Title:** "Cripples, Bastards, and Broken Things"

**Director:** Brian Kirk

**Writer:** Bryan Cogman

**Original Air Date:** May 8, 2011

**US Viewers:** 2.45 million

**Runtime:** 56 minutes

**IMDB Rating:** 8.8

**Notable Death Count:** 1

**Plot Summary:**

Eddard investigates Jon Arryn's murder. Jon befriends Samwell Tarly, a coward who has come to join the Night's Watch.

### Observed Answer:

Season 1, Episode 4 is titled "Cripples, Bastards, and Broken Things". Here are some details about this episode:

- Director: Brian Kirk
- Writer: Bryan Cogman
- Original Air Date: May 8, 2011
- US viewers (million): 2.45
- Runtime: 56 minutes
- IMDB Description: Eddard investigates Jon Arryn's murder. Jon befriends Samwell Tarly, a coward who has come to join the Night's Watch.
- IMDB Rating: 8.8
- Notable Death Count: 1

In [9]:
result = agent_executor.invoke({
    "input": "Tell me about the season1, episode 4"
})
result



> Entering new AgentExecutor chain...
Invoking: `python_repl_ast` with `{'query': "df[df['Season'] == 1][df['Episode Number'] == 4]"}`


[0m[36;1m[1;3m   Season  Episode Number  Number in Season  \
3       1               4                 4   

                            Episode Name    Director        Writer  \
3  Cripples, Bastards, and Broken Things  Brian Kirk  Bryan Cogman   

  Original Air Date  US viewers (million)  Runtime (mins)  \
3       May 8, 2011                  2.45              56   

                                    IMDB Description  IMDB votes  Imdb Rating  \
3  Eddard investigates Jon Arryn's murder. Jon be...       19123          8.8   

   Notable Death Count  
3                    1



[32;1m[1;3mSeason 1, Episode 4 is titled "Cripples, Bastards, and Broken Things". Here are some details about this episode:
- Director: Brian Kirk
- Writer: Bryan Cogman
- Original Air Date: May 8, 2011
- US viewers (million): 2.45
- Runtime: 56 minutes
- IMDB Description: Eddard investigates Jon Arryn's murder. Jon befriends Samwell Tarly, a coward who has come to join the Night's Watch.
- IMDB Rating: 8.8
- Notable Death Count: 1

> Finished chain.


{'input': 'Tell me about the season1, episode 4',
 'output': 'Season 1, Episode 4 is titled "Cripples, Bastards, and Broken Things". Here are some details about this episode:\n- Director: Brian Kirk\n- Writer: Bryan Cogman\n- Original Air Date: May 8, 2011\n- US viewers (million): 2.45\n- Runtime: 56 minutes\n- IMDB Description: Eddard investigates Jon Arryn\'s murder. Jon befriends Samwell Tarly, a coward who has come to join the Night\'s Watch.\n- IMDB Rating: 8.8\n- Notable Death Count: 1'}

## Query: Tell me about the episode 6
### Expected Output
- There are multiple episodes numbered 6 across different seasons. Which season are you referring to?
### Observed Output


In [10]:
result = agent_executor.invoke({
    "input": "Tell me more about the episode 6"
})
result



> Entering new AgentExecutor chain...
Invoking: `python_repl_ast` with `{'query': 'df.iloc[5]'}`


[0m[36;1m[1;3mSeason                                                                  1
Episode Number                                                          6
Number in Season                                                        6
Episode Name                                               A Golden Crown
Director                                                   Daniel Minahan
Writer                  Story by : Jane Espenson and David Benioff & D...
Original Air Date                                            May 22, 2011
US viewers (million)                                                 2.44
Runtime (mins)                                                         53
IMDB Description        While recovering from his battle with Jaime, E...
IMDB votes                                                          19908
Imdb Rating                                   

{'input': 'Tell me more about the episode 6',
 'output': 'The details for episode 6 are as follows:\n- **Season:** 1\n- **Episode Number:** 6\n- **Number in Season:** 6\n- **Episode Name:** A Golden Crown\n- **Director:** Daniel Minahan\n- **Writer:** Story by Jane Espenson and David Benioff & D. B. Weiss\n- **Original Air Date:** May 22, 2011\n- **US viewers (million):** 2.44\n- **Runtime (mins):** 53\n- **IMDB Description:** While recovering from his battle with Jaime, Eddard is forced to run the kingdom while Robert goes hunting. Tyrion demands a trial by combat for his freedom. Viserys is losing his patience with Drogo.\n- **IMDB votes:** 19908\n- **IMDB Rating:** 9.2\n- **Notable Death Count:** 4'}

## Justification
Perhaps, because we are using a chat model, it picked up the Season 1 context from the previous conversation.
Let us start a fresh session and re-run the query.

In [3]:
# Fresh session
result = agent_executor.invoke({
    "input": "Tell me more about the episode 6"
})
result



> Entering new AgentExecutor chain...
Invoking: `python_repl_ast` with `{'query': 'df.iloc[5]'}`


[0m[36;1m[1;3mSeason                                                                  1
Episode Number                                                          6
Number in Season                                                        6
Episode Name                                               A Golden Crown
Director                                                   Daniel Minahan
Writer                  Story by : Jane Espenson and David Benioff & D...
Original Air Date                                            May 22, 2011
US viewers (million)                                                 2.44
Runtime (mins)                                                         53
IMDB Description        While recovering from his battle with Jaime, E...
IMDB votes                                                          19908
Imdb Rating                                   

{'input': 'Tell me more about the episode 6',
 'output': 'The details for episode 6 are as follows:\n- **Season:** 1\n- **Episode Number:** 6\n- **Number in Season:** 6\n- **Episode Name:** A Golden Crown\n- **Director:** Daniel Minahan\n- **Writer:** Story by Jane Espenson and David Benioff & D. B. Weiss\n- **Original Air Date:** May 22, 2011\n- **US viewers (million):** 2.44\n- **Runtime (mins):** 53\n- **IMDB Description:** While recovering from his battle with Jaime, Eddard is forced to run the kingdom while Robert goes hunting. Tyrion demands a trial by combat for his freedom. Viserys is losing his patience with Drogo.\n- **IMDB votes:** 19908\n- **IMDB Rating:** 9.2\n- **Notable Death Count:** 4'}

Let us enable debugging to see exactly which prompts are passed to the LLM.
Note: LangSmith can also be used for this purpose.

In [3]:
import langchain
langchain.debug = True

In [8]:
# Fresh session
result = agent_executor.invoke({
    "input": "Tell me more about the episode 6"
})
result

[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor] Entering Chain run with input:
[0m{
  "input": "Tell me more about the episode 6"
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "input": ""
}
[36;1m[1;3m[chain/end][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain

{'input': 'Tell me more about the episode 6',
 'output': 'Episode 6 of Season 1 is titled "A Golden Crown". Here are some details about this episode:\n- Director: Daniel Minahan\n- Writers: Story by Jane Espenson and David Benioff & D. B. Weiss\n- Original Air Date: May 22, 2011\n- US viewers (million): 2.44\n- Runtime: 53 minutes\n- IMDB Description: While recovering from his battle with Jaime, Eddard is forced to run the kingdom while Robert goes hunting. Tyrion demands a trial by combat for his freedom. Viserys is losing his patience with Drogo.\n- IMDB Rating: 9.2\n- Notable Death Count: 4'}

## Observation on Debug Output
Viewing the full output in a text editor shows that the actual prompt sent to the LLM is:
```json
{
  "prompts": [
    "System: \nYou are working with a pandas dataframe in Python. The name of the dataframe is `df`.\nThis is the result of `print(df.head())`:\n|    |   Season |   Episode Number |   Number in Season | Episode Name                          | Director       | Writer                      | Original Air Date   |   US viewers (million) |   Runtime (mins) | IMDB Description                                                                                                                                                                                                                           |   IMDB votes |   Imdb Rating |   Notable Death Count |\n|---:|---------:|-----------------:|-------------------:|:--------------------------------------|:---------------|:----------------------------|:--------------------|-----------------------:|-----------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------:|--------------:|----------------------:|\n|  0 |        1 |                1 |                  1 | Winter Is Coming                      | Tim Van Patten | David Benioff & D. B. Weiss | April 17, 2011      |                   2.22 |               62 | Jon Arryn, the Hand of the King, is dead. King Robert Baratheon plans to ask his oldest friend, Eddard Stark, to take Jon's place. Across the sea, Viserys Targaryen plans to wed his sister to a nomadic warlord in exchange for an army. |        27685 |           9   |                     4 |\n|  1 |        1 |                2 |                  2 | The Kingsroad                         | Tim Van Patten | David Benioff & D. B. Weiss | April 24, 2011      |                   2.2  |               56 | While Bran recovers from his fall, Ned takes only his daughters to King's Landing. Jon Snow goes with his uncle Benjen to the Wall. 
Tyrion joins them.                                                                                     |        21256 |           8.8 |                     3 |\n|  2 |        1 |                3 |                  3 | Lord Snow                             | Brian Kirk     | David Benioff & D. B. Weiss | May 1, 2011         |                   2.44 |               58 | Lord Stark and his daughters arrive at King's Landing to discover the intrigues of the king's realm.                                                                                                                                       |        20090 |           8.7 |                     0 |\n|  3 |        1 |                4 |                  4 | Cripples, Bastards, and Broken Things | Brian Kirk     | Bryan Cogman                | May 8, 2011         |                   2.45 |               56 | Eddard investigates Jon Arryn's murder. Jon befriends Samwell Tarly, a coward who has come to join the Night's Watch.                                                                                                                      |        19123 |           8.8 |                     1 |\n|  4 |        1 |                5 |                  5 | The Wolf and the Lion                 | Brian Kirk     | David Benioff & D. B. Weiss | May 15, 2011        |                   2.58 |               55 | Catelyn has captured Tyrion and plans to bring him to her sister, Lysa Arryn, at the Vale, to be tried for his, supposed, crimes against Bran. Robert plans to have Daenerys killed, but Eddard refuses to be a part of it and quits.      |        20062 |           9.1 |                     5 |\nHuman: Tell me more about the episode 6"
  ]
}
```
So the LLM sees only the column headers and the first five rows (the output of `print(df.head())`). From that pattern it infers that the row position tracks the episode number and generates `df.iloc[5]`. Nothing in those five Season 1 rows shows that `Number in Season` repeats across seasons, so the LLM has no reason to ask a clarifying question.
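Instead of relying on the LLM to notice the ambiguity, it can be checked in code. The sketch below counts how many rows share the requested `Number in Season` and returns a clarifying question when more than one season matches. `lookup_episode` is a hypothetical helper, not part of LangChain, and wiring it into the agent is not shown; the sample rows are real titles but the frame is illustrative.

```python
from typing import Optional, Union

import pandas as pd


def lookup_episode(df: pd.DataFrame, episode: int,
                   season: Optional[int] = None) -> Union[dict, str]:
    """Hypothetical guardrail: ask for the season when the episode number is ambiguous."""
    mask = df["Number in Season"] == episode
    if season is not None:
        mask &= df["Season"] == season
    matches = df.loc[mask]
    if matches.empty:
        return f"No episode {episode} found."
    if len(matches) > 1:
        seasons = sorted(matches["Season"].unique().tolist())
        return (f"There are {len(matches)} episodes numbered {episode} "
                f"(seasons {seasons}). Which season are you referring to?")
    # Exactly one match: return the row as a plain dict
    return matches.iloc[0].to_dict()


df = pd.DataFrame({
    "Season": [1, 1, 2, 2],
    "Number in Season": [5, 6, 5, 6],
    "Episode Name": ["The Wolf and the Lion", "A Golden Crown",
                     "The Ghost of Harrenhal", "The Old Gods and the New"],
})

print(lookup_episode(df, 6))                             # clarifying question
print(lookup_episode(df, 6, season=1)["Episode Name"])   # single, unambiguous row
```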

## Query: Tell me about the Game of thrones, winer is comig

### Expected Output

- It should handle the spelling mistakes, understand that the query refers to the Game of Thrones episode titled "Winter Is Coming", and give its details

### Observed Output

"Winter Is Coming" is the first episode of Season 1 of Game of Thrones. It was directed by Tim Van Patten and written by David Benioff and D. B. Weiss. The episode originally aired on April 17, 2011, with a viewership of 2.22 million. The episode has a runtime of 62 minutes and received an IMDb rating of 9.0 based on 27685 votes. In this episode, Jon Arryn, the Hand of the King, is dead, and King Robert Baratheon plans to ask his oldest friend, Eddard Stark, to take Jon's place. Across the sea, Viserys Targaryen plans to wed his sister to a nomadic warlord in exchange for an army. The notable death count in this episode is 4.
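The model resolved the misspelling on its own here. If a deterministic fallback were needed, one option (an assumption, not something the notebook does) is fuzzy title matching with Python's standard `difflib` module. The sketch assumes the title fragment (`"winer is comig"`) has already been separated from the rest of the question; scoring the whole sentence against each title would give much lower similarity. `match_title` is a hypothetical helper.

```python
import difflib
from typing import List, Optional


def match_title(fragment: str, titles: List[str], cutoff: float = 0.6) -> Optional[str]:
    """Return the episode title most similar to `fragment`, or None if none passes `cutoff`."""
    # Compare case-insensitively, but return the title with its original casing
    by_lower = {title.lower(): title for title in titles}
    hits = difflib.get_close_matches(fragment.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


titles = ["Winter Is Coming", "The Kingsroad", "Lord Snow", "Baelor"]
print(match_title("winer is comig", titles))
```

`get_close_matches` ranks candidates by `SequenceMatcher` similarity, so the `cutoff` controls how tolerant the match is; a lower value accepts worse typos at the cost of more false matches.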

In [9]:
# Fresh session
result = agent_executor.invoke({
    "input": "Tell me about the Game of thrones, winer is comig"
})
result

[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor] Entering Chain run with input:
[0m{
  "input": "Tell me about the Game of thrones, winer is comig"
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "input": ""
}
[36;1m[1;3m[chain/end][0m [1m[chain:AgentExecutor > chain:Runnabl

{'input': 'Tell me about the Game of thrones, winer is comig',
 'output': '"Winter Is Coming" is the first episode of Season 1 of Game of Thrones. It was directed by Tim Van Patten and written by David Benioff and D. B. Weiss. The episode originally aired on April 17, 2011, with a viewership of 2.22 million. The episode has a runtime of 62 minutes and received an IMDb rating of 9.0 based on 27685 votes. In this episode, Jon Arryn, the Hand of the King, is dead, and King Robert Baratheon plans to ask his oldest friend, Eddard Stark, to take Jon\'s place. Across the sea, Viserys Targaryen plans to wed his sister to a nomadic warlord in exchange for an army. The notable death count in this episode is 4.'}

## Query: In which episode did Ned die?
### Expected Output
- This question requires more reasoning over the data, since the answer cannot be inferred directly from the column names or the first few rows.
- The LLM should recognize that this information can be inferred from the episode descriptions, and write appropriate pandas queries to search them before answering.
- The correct answer is: Ned Stark died in the episode "Baelor".
### Observed Output
- Ned Stark died in the episode "Baelor".
- The query used to get the answer was `df[df['Notable Death Count'] > 0]['Episode Name']`.
- This is surprising. That query only lists the titles of episodes with at least one notable death, so how could the LLM pick the right episode from those titles alone?
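For comparison, a query that actually uses the descriptions might search `IMDB Description` for the character's names. This is a sketch under assumptions: a description that mentions Ned or Eddard does not mean he dies in that episode, so it only narrows the candidates, and a description may not mention the death at all. The rows below reuse descriptions shown in the prompt above.

```python
import pandas as pd

df = pd.DataFrame({
    "Episode Name": ["Winter Is Coming", "The Kingsroad", "Lord Snow"],
    "IMDB Description": [
        "Jon Arryn, the Hand of the King, is dead. King Robert Baratheon plans to ask his "
        "oldest friend, Eddard Stark, to take Jon's place. Across the sea, Viserys Targaryen "
        "plans to wed his sister to a nomadic warlord in exchange for an army.",
        "While Bran recovers from his fall, Ned takes only his daughters to King's Landing. "
        "Jon Snow goes with his uncle Benjen to the Wall. Tyrion joins them.",
        "Lord Stark and his daughters arrive at King's Landing to discover the intrigues "
        "of the king's realm.",
    ],
})

# Whole-word match on either name; a non-capturing group avoids pandas' match-group warning
pattern = r"\b(?:Ned|Eddard)\b"
mask = df["IMDB Description"].str.contains(pattern, case=False, regex=True, na=False)
print(df.loc[mask, "Episode Name"].tolist())
```

Note that "Lord Snow" is not returned even though it is about Ned, because its description calls him "Lord Stark"; a real search would need a broader list of aliases.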


In [10]:
# Fresh session
result = agent_executor.invoke({
    "input": "In which episode did Ned die?"
})
result

[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor] Entering Chain run with input:
[0m{
  "input": "In which episode did Ned die?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "input": ""
}
[36;1m[1;3m[chain/end][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:Ru

{'input': 'In which episode did Ned die?',
 'output': 'Ned Stark died in the episode "Baelor".'}

In [11]:
df[df['Notable Death Count'] > 0]['Episode Name']

0                          Winter Is Coming
1                             The Kingsroad
3     Cripples, Bastards, and Broken Things
4                     The Wolf and the Lion
5                            A Golden Crown
                      ...                  
67                               Winterfell
69                           The Long Night
70                   The Last of the Starks
71                                The Bells
72                          The Iron Throne
Name: Episode Name, Length: 66, dtype: object

## Justification
Perhaps because Game of Thrones is well known and OpenAI models were likely trained on text about it, the answer may come from knowledge stored in the model weights rather than from the dataframe.

## Hypothesis
If we add a fictional character and his death to one of the episodes, the LLM should not be able to answer correctly from its prior knowledge alone. It would have to query the data differently, for example by searching the episode descriptions.

Let us add a fictional character named "Rahul", mention his death in episode 10, then start a fresh session and ask about Rahul's death.
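The notebook does not show how Rahul was added to the data. A minimal sketch, assuming the fact is appended to one episode's `IMDB Description`, is below; `inject_fact` and the sentence "Rahul dies." are hypothetical, and the description text is a placeholder. It edits a copy so the original data stays intact. Because the agent receives the DataFrame (and a `df.head()` preview in its prompt) when it is created, the safest option is to recreate the agent with the modified frame before asking the question.

```python
import pandas as pd


def inject_fact(df: pd.DataFrame, season: int, number_in_season: int,
                sentence: str) -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with `sentence` appended to one description."""
    out = df.copy()
    mask = (out["Season"] == season) & (out["Number in Season"] == number_in_season)
    if mask.sum() != 1:
        raise ValueError("expected exactly one matching episode")
    out.loc[mask, "IMDB Description"] = out.loc[mask, "IMDB Description"] + " " + sentence
    return out


# Illustrative rows; the descriptions are placeholders, not the real data
df = pd.DataFrame({
    "Season": [1, 1],
    "Number in Season": [9, 10],
    "Episode Name": ["Baelor", "Fire and Blood"],
    "IMDB Description": ["placeholder description", "placeholder description"],
})

test_df = inject_fact(df, season=1, number_in_season=10, sentence="Rahul dies.")
found = test_df.loc[test_df["IMDB Description"].str.contains("Rahul"), "Episode Name"].tolist()
print(found)
```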

In [4]:
# Fresh session
result = agent_executor.invoke({
    "input": "In which episode did Rahul die?"
})
result

[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor] Entering Chain run with input:
[0m{
  "input": "In which episode did Rahul die?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "input": ""
}
[36;1m[1;3m[chain/end][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:

{'input': 'In which episode did Rahul die?',
 'output': 'Rahul\'s death occurred in multiple episodes throughout the series. Here are the episodes where notable deaths, including Rahul\'s, took place:\n\n- Season 1, Episode 1: "Winter Is Coming"\n- Season 1, Episode 2: "The Kingsroad"\n- Season 1, Episode 4: "Cripples, Bastards, and Broken Things"\n- Season 1, Episode 5: "The Wolf and the Lion"\n- Season 1, Episode 6: "A Golden Crown"\n- ... and more\n\nIf you need more specific information about Rahul\'s death in a particular episode, please let me know.'}

## Observation
What? Rahul's death occurred in multiple episodes?
Where is it getting this information from?

The generated query was `df[df['Notable Death Count'] > 0]`, and the prompt sent to the LLM just before the final answer is shown below:

```json
{
  "prompts": [
    "System: \nYou are working with a pandas dataframe in Python. The name of the dataframe is `df`.\nThis is the result of `print(df.head())`:\n|    |   Season |   Episode Number |   Number in Season | Episode Name                          | Director       | Writer                      | Original Air Date   |   US viewers (million) |   Runtime (mins) | IMDB Description                                                                                                                                                                                                                           |   IMDB votes |   Imdb Rating |   Notable Death Count |\n|---:|---------:|-----------------:|-------------------:|:--------------------------------------|:---------------|:----------------------------|:--------------------|-----------------------:|-----------------:|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------:|--------------:|----------------------:|\n|  0 |        1 |                1 |                  1 | Winter Is Coming                      | Tim Van Patten | David Benioff & D. B. Weiss | April 17, 2011      |                   2.22 |               62 | Jon Arryn, the Hand of the King, is dead. King Robert Baratheon plans to ask his oldest friend, Eddard Stark, to take Jon's place. Across the sea, Viserys Targaryen plans to wed his sister to a nomadic warlord in exchange for an army. |        27685 |           9   |                     4 |\n|  1 |        1 |                2 |                  2 | The Kingsroad                         | Tim Van Patten | David Benioff & D. B. Weiss | April 24, 2011      |                   2.2  |               56 | While Bran recovers from his fall, Ned takes only his daughters to King's Landing. Jon Snow goes with his uncle Benjen to the Wall. 
Tyrion joins them.                                                                                     |        21256 |           8.8 |                     3 |\n|  2 |        1 |                3 |                  3 | Lord Snow                             | Brian Kirk     | David Benioff & D. B. Weiss | May 1, 2011         |                   2.44 |               58 | Lord Stark and his daughters arrive at King's Landing to discover the intrigues of the king's realm.                                                                                                                                       |        20090 |           8.7 |                     0 |\n|  3 |        1 |                4 |                  4 | Cripples, Bastards, and Broken Things | Brian Kirk     | Bryan Cogman                | May 8, 2011         |                   2.45 |               56 | Eddard investigates Jon Arryn's murder. Jon befriends Samwell Tarly, a coward who has come to join the Night's Watch.                                                                                                                      |        19123 |           8.8 |                     1 |\n|  4 |        1 |                5 |                  5 | The Wolf and the Lion                 | Brian Kirk     | David Benioff & D. B. Weiss | May 15, 2011        |                   2.58 |               55 | Catelyn has captured Tyrion and plans to bring him to her sister, Lysa Arryn, at the Vale, to be tried for his, supposed, crimes against Bran. Robert plans to have Daenerys killed, but Eddard refuses to be a part of it and quits.      |        20062 |           9.1 |                     5 |\nHuman: In which episode did Rahul die?\nAI: To find out in which episode Rahul died, I will search the dataframe for the episode where the \"Notable Death Count\" column indicates Rahul's death. 
Let me check the dataframe for this information.\nTool:     Season  Episode Number  Number in Season  \\\n0        1               1                 1   \n1        1               2                 2   \n3        1               4                 4   \n4        1               5                 5   \n5        1               6                 6   \n..     ...             ...               ...   \n67       8              68                 1   \n69       8              70                 3   \n70       8              71                 4   \n71       8              72                 5   \n72       8              73                 6   \n\n                             Episode Name                     Director  \\\n0                        Winter Is Coming               Tim Van Patten   \n1                           The Kingsroad               Tim Van Patten   \n3   Cripples, Bastards, and Broken Things                   Brian Kirk   \n4                   The Wolf and the Lion                   Brian Kirk   \n5                          A Golden Crown               Daniel Minahan   \n..                                    ...                          ...   \n67                             Winterfell                 David Nutter   \n69                         The Long Night             Miguel Sapochnik   \n70                 The Last of the Starks                 David Nutter   \n71                              The Bells             Miguel Sapochnik   \n72                        The Iron Throne  David Benioff & D. B. Weiss   \n\n                                               Writer Original Air Date  \\\n0                         David Benioff & D. B. Weiss    April 17, 2011   \n1                         David Benioff & D. B. Weiss    April 24, 2011   \n3                                        Bryan Cogman       May 8, 2011   \n4                         David Benioff & D. B. Weiss      May 15, 2011   \n5   Story by : Jane Espenson and David Benioff & D...      
May 22, 2011   \n..                                                ...               ...   \n67                                          Dave Hill    April 14, 2019   \n69                        David Benioff & D. B. Weiss    April 28, 2019   \n70                        David Benioff & D. B. Weiss       May 5, 2019   \n71                        David Benioff & D. B. Weiss      May 12, 2019   \n72                        David Benioff & D. B. Weiss      May 19, 2019   \n\n    US viewers (million)  Runtime (mins)  \\\n0                   2.22              62   \n1                   2.20              56   \n3                   2.45              56   \n4                   2.58              55   \n5                   2.44              53   \n..                   ...             ...   \n67                 11.76              54   \n69                 12.02              82   \n70                 11.80              78   \n71                 12.40              78   \n72                 13.61              80   \n\n                                     IMDB Description  IMDB votes  \\\n0   Jon Arryn, the Hand of the King, is dead. King...       27685   \n1   While Bran recovers from his fall, Ned takes o...       21256   \n3   Eddard investigates Jon Arryn's murder. Jon be...       19123   \n4   Catelyn has captured Tyrion and plans to bring...       20062   \n5   While recovering from his battle with Jaime, E...       19908   \n..                                                ...         ...   \n67  Jon and Daenerys arrive in Winterfell and are ...      120508   \n69  The Night King and his army have arrived at Wi...      198482   \n70  The Battle of Winterfell is over and a new cha...      151229   \n71  Forces have arrived at King's Landing for the ...      176140   \n72  In the aftermath of the devastating attack on ...      
216879   \n\n    Imdb Rating  Notable Death Count  \n0           9.0                    4  \n1           8.8                    3  \n3           8.8                    1  \n4           9.1                    5  \n5           9.2                    4  \n..          ...                  ...  \n67          7.6                    1  \n69          7.5                    7  \n70          5.5                    2  \n71          6.0                    7  \n72          4.1                    1  \n\n[66 rows x 13 columns]"
  ]
}
```

## Observation
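The intermediate AI message in the prompt shows the root cause: the LLM assumed the `Notable Death Count` column could indicate *who* died, when it only counts deaths. The resulting tool output is large (66 rows by 13 columns), and the printed DataFrame is truncated, so the LLM receives only a summary: the first and last five rows, with long descriptions cut off by `...`. Since the query never searched for "Rahul", nothing in that summary supports an answer, and the LLM hallucinated a link between Rahul and every episode with a notable death.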
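The truncation appears to come from pandas' own display settings rather than from the agent: when a DataFrame has more rows than `display.max_rows` (60 by default), its printed form shows only a few rows from each end, and strings longer than `display.max_colwidth` (50 by default) are cut off with `...`. The sketch below reproduces the effect with a 66-row frame of placeholder text.

```python
import pandas as pd

# Current display limits that govern how a DataFrame is printed
print(pd.get_option("display.max_rows"), pd.get_option("display.max_colwidth"))

# 66 rows of placeholder text, matching the size of the tool result in the trace
df = pd.DataFrame({
    "Episode Number": range(1, 67),
    "IMDB Description": ["A long placeholder description " * 3] * 66,
})

text = repr(df)  # the same string form that gets printed
print(text)      # only head and tail rows, long strings truncated with "..."
```

The last line of the printed output is the shape summary that also appears at the end of the tool result above.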

## Query: How many views did the GOT s2e5 get?

### Expected Output
Season 2, Episode 5 ("The Ghost of Harrenhal") got 16846 views.

### Observed Output

"It seems there was an error because there is no data for Season 2, Episode 5 in the provided dataframe."
- The query used is `df[df['Season'] == 2][['Episode Number', 'Episode Name']]`


In [4]:
# Fresh session
result = agent_executor.invoke({
    "input": "How many views did the GOT s2e5 get?"
})
result

[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor] Entering Chain run with input:
[0m{
  "input": "How many views did the GOT s2e5 get?"
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad>] Entering Chain run with input:
[0m{
  "input": ""
}
[32;1m[1;3m[chain/start][0m [1m[chain:AgentExecutor > chain:RunnableSequence > chain:RunnableAssign<agent_scratchpad> > chain:RunnableParallel<agent_scratchpad> > chain:RunnableLambda] Entering Chain run with input:
[0m{
  "input": ""
}
[36;1m[1;3m[chain/end][0m [1m[chain:AgentExecutor > chain:RunnableSequence > c



[36;1m[1;3m[llm/end][0m [1m[chain:AgentExecutor > chain:RunnableSequence > llm:ChatOpenAI] [1.42s] Exiting LLM run with output:
[0m{
  "generations": [
    [
      {
        "text": "It seems there was an error because there is no data for Season 2, Episode 5 in the provided dataframe. Let me check the available episodes in Season 2 first.",
        "generation_info": {
          "finish_reason": "tool_calls",
          "model_name": "gpt-3.5-turbo-0125"
        },
        "type": "ChatGenerationChunk",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessageChunk"
          ],
          "kwargs": {
            "content": "It seems there was an error because there is no data for Season 2, Episode 5 in the provided dataframe. Let me check the available episodes in Season 2 first.",
            "additional_kwargs": {
              "tool_calls": [
          

{'input': 'How many views did the GOT s2e5 get?',
 'output': 'Season 2 of Game of Thrones has episodes numbered from 11 to 20. There is no data for Season 2, Episode 5 in the provided dataframe. Let me know if you would like information on any other episode.'}

## Justification
It looks like the LLM did not recognize that `Number in Season` holds the per-season episode number. It relied only on `Episode Number`, whose values run from 11 to 20 for Season 2, and so concluded that Season 2 has no Episode 5.
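One way to remove this ambiguity, sketched here as an assumption rather than something tried in the notebook, is to normalize shorthand such as `s2e5` in code and look the episode up by `Number in Season`. `parse_season_episode` is a hypothetical helper; the sample rows reuse the first five Season 2 titles from the dataset, numbered in airing order.

```python
import re
from typing import Optional, Tuple

import pandas as pd


def parse_season_episode(text: str) -> Optional[Tuple[int, int]]:
    """Extract (season, episode) from shorthand such as 's2e5' or 'S02E05'."""
    match = re.search(r"\bs(\d{1,2})\s*e(\d{1,2})\b", text, flags=re.IGNORECASE)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# First five Season 2 rows; 'Number in Season' restarts at 1 each season
df = pd.DataFrame({
    "Season": [2] * 5,
    "Episode Number": [11, 12, 13, 14, 15],
    "Number in Season": [1, 2, 3, 4, 5],
    "Episode Name": ["The North Remembers", "The Night Lands", "What Is Dead May Never Die",
                     "Garden of Bones", "The Ghost of Harrenhal"],
})

season, episode = parse_season_episode("How many views did the GOT s2e5 get?")
row = df[(df["Season"] == season) & (df["Number in Season"] == episode)]
print(row[["Episode Number", "Episode Name"]])
```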

In [6]:
df[df['Season'] == 2][['Episode Number', 'Episode Name']]

    Episode Number                Episode Name
10              11         The North Remembers
11              12             The Night Lands
12              13  What Is Dead May Never Die
13              14             Garden of Bones
14              15      The Ghost of Harrenhal
15              16    The Old Gods and the New
16              17         A Man Without Honor
17              18    The Prince of Winterfell
18              19                  Blackwater
19              20             Valar Morghulis


## Tokens And Cost
Across all the runs in this notebook:
* Total tokens: 25,799
  * Input tokens: 24,212
  * Output tokens: 1,587
* Number of OpenAI requests: 28
* Total cost: $0.02
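The token totals are consistent (24,212 + 1,587 = 25,799). To sanity-check the cost, multiply each count by its per-token price. The prices below are an assumption (the published rates for `gpt-3.5-turbo-0125` at the time of writing: $0.50 per million input tokens and $1.50 per million output tokens); check OpenAI's current pricing before relying on them. Under those rates the estimate is about $0.0145, the same order as the reported $0.02; rounding in the usage dashboard could account for the gap, but that is not verified here.

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      input_price_per_million: float,
                      output_price_per_million: float) -> float:
    """Cost = tokens x price, with prices quoted in USD per million tokens."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


input_tokens, output_tokens = 24_212, 1_587

# ASSUMED prices (USD per 1M tokens) for gpt-3.5-turbo-0125; substitute current rates
cost = estimate_cost_usd(input_tokens, output_tokens, 0.50, 1.50)
print(input_tokens + output_tokens, f"${cost:.4f}")
```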
