<a href="https://colab.research.google.com/github/forgivefarouk/RAG_with_agent/blob/main/02-%20Agent_Powered_Barbenheimer_Application.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Questioning Barbie and Oppenheimer Through the Use of Agents

In this notebook we will build an application that queries the Wikipedia pages for both the Barbie and Oppenheimer movies, as well as their reviews.

The main focus of this notebook is to showcase a brief introduction to Agents.

## Build 🏗️

There are 3 main tasks in this notebook:

1. Construct a Barbie retriever
2. Construct an Oppenheimer retriever
3. Combine the two and allow users to query both resources from a single input through the use of Agents


In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Dependencies

As always, let's start with some dependencies!

In [2]:
!pip install -q -U langchain openai

In [3]:
import getpass
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('openai_key')

### LLM

We will use OpenAI's `gpt-3.5-turbo` for every LLM call in this notebook, which keeps behavior consistent across the retrievers and agents.

In [4]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature = 0)



### Data Collection and Transformation

We'll be leveraging the `WikipediaLoader` tool to collect information from Wikipedia.

Be sure to set the `doc_content_chars_max` parameter so that you capture the *entire* Wikipedia article content.

In [5]:
!pip install -q -U wikipedia



In [6]:
from langchain.document_loaders import WikipediaLoader, CSVLoader

barbie_wikipedia_docs = WikipediaLoader(
    query="Barbie (film)",
    load_max_docs= 1,
    doc_content_chars_max=1_000_000
    ).load()

barbie_csv_docs = CSVLoader(
    file_path='/content/drive/MyDrive/thotron/RAG/Week 1/Tuesday/barbie.csv',
    source_column='Review_Url',
    ).load()

Since the source documents for both topics share the same formats (a Wikipedia article and a CSV of reviews), we can save ourselves some effort and settle on our splitter settings once.

We're going to leverage the `RecursiveCharacterTextSplitter` again, this time paying close attention to the format our Wikipedia articles and reviews are in so we can be sure to chunk them appropriately.

> HINT: You can pass a list of separators when you initialize your `RecursiveCharacterTextSplitter`! They are tried in order, from element 0 to element len(list) - 1.

RELEVANT DOCS:
- [`RecursiveCharacterTextSplitter`](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html#langchain.text_splitter.RecursiveCharacterTextSplitter)
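
To make the "separators are tried in order" idea concrete, here is a minimal pure-Python sketch of recursive splitting. It is a simplified illustration, not LangChain's implementation: the function name `recursive_split` is hypothetical, there is no overlap handling, and separators are dropped at chunk boundaries.

```python
def recursive_split(text, separators, chunk_size):
    """Simplified recursive splitting: try separators in list order, coarsest first."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
        else:
            if current:
                chunks.append(current)
            if len(piece) > chunk_size:
                # Still too big: recurse with the next, finer-grained separator.
                chunks.extend(recursive_split(piece, rest, chunk_size))
                current = ""
            else:
                current = piece
    if current:
        chunks.append(current)
    return chunks


# Made-up sample text that mimics Wikipedia's "== Heading ==" layout.
sample = (
    "Intro line\n== Plot ==\nBarbie lives in Barbieland."
    "\n== Cast ==\nMargot Robbie as Barbie"
)
chunks = recursive_split(sample, ["\n==", "\n", " "], chunk_size=40)
for chunk in chunks:
    print(repr(chunk))
```

With these settings each heading section fits in one chunk, so the finer separators (`"\n"` and `" "`) are never needed. Shrinking `chunk_size` forces the sketch to fall through to them.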

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

wikipedia_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 500,
    chunk_overlap = 0,
    length_function = len,
    is_separator_regex= False,
    separators = ["\n==","\n"," "]            # split on headings first, then line breaks, then spaces
)

csv_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 50,
    length_function = len,
    is_separator_regex= False,
    separators = ["\n"," "] # split on line breaks first, then spaces
)

chunked_barbie_wikipedia_docs = wikipedia_text_splitter.transform_documents(barbie_wikipedia_docs)
chunked_barbie_csv_docs = csv_text_splitter.transform_documents(barbie_csv_docs)

#### Retrieval and Embedding Strategy

We've already discussed the useful application of `CacheBackedEmbeddings`, so let's do it again!

RELEVANT DOCS:
- [`CacheBackedEmbeddings`](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain-embeddings-cache-cachebackedembeddings)
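
As a reminder of why caching helps, the sketch below shows the general idea in plain Python: hash each text under a namespace, and only call the (expensive) embedding function on a cache miss. The `CachedEmbedder` class and `fake_embed` function are hypothetical stand-ins, not LangChain's API, and the dictionary plays the role that a `LocalFileStore` would play on disk.

```python
import hashlib


class CachedEmbedder:
    """Toy cache in front of an embedding function (illustrative only)."""

    def __init__(self, embed_fn, namespace):
        self.embed_fn = embed_fn
        self.namespace = namespace  # keeps vectors from different models apart
        self.store = {}             # stands in for an on-disk byte store
        self.misses = 0             # counts how often embed_fn is really called

    def _key(self, text):
        return self.namespace + hashlib.sha256(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts):
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:
                self.misses += 1
                self.store[key] = self.embed_fn(text)
            vectors.append(self.store[key])
        return vectors


def fake_embed(text):
    # Stand-in for a paid embedding API call; returns a tiny made-up vector.
    return [float(len(text)), float(text.count(" "))]


embedder_demo = CachedEmbedder(fake_embed, namespace="fake-model")
embedder_demo.embed_documents(["Barbie review", "Oppenheimer review"])
embedder_demo.embed_documents(["Barbie review", "a new review"])
print(embedder_demo.misses)  # 3: the repeated text was served from the cache
```

Using the model name as the namespace, as the next cell does, is what prevents vectors produced by one embedding model from being reused for another.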

In [8]:
!pip install -q -U rank_bm25 tiktoken faiss-cpu

In [9]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# set up cached embeddings store
store = LocalFileStore('./store/')

core_embeddings_model = OpenAIEmbeddings()

embedder = CacheBackedEmbeddings.from_bytes_store(
    core_embeddings_model , store ,namespace=core_embeddings_model.model
)



We'll implement a `FAISS` vectorstore, and create a retriever from it.

In [10]:
barbie_csv_faiss_retriever = FAISS.from_documents(chunked_barbie_csv_docs,embedder).as_retriever()


There are a number of excellent options to retrieve documents - we'll be looking at an additional example today, which is called the `EnsembleRetriever`.

It combines results using Reciprocal Rank Fusion, the method outlined in [this paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).

The brief explanation is:

1. We collect results from two different retrieval methods over the same corpus
2. We apply a reranking algorithm so that the *most relevant* source documents rise to the top, without losing specific, information-rich documents that one of the methods ranked low
3. We feed the top-k results into the LLM with our query as context.

> HINT: Your weight list should be of type `List[float]` and the `sum(List[float])` should be `1`.
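
The steps above can be sketched as a weighted Reciprocal Rank Fusion in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `weighted_rrf` and the document IDs are made up, and the constant `c=60` is the value suggested in the paper. It is not a copy of what `EnsembleRetriever` does internally.

```python
from collections import defaultdict


def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each list adds weight / (rank + c) to a document's score."""
    scores = defaultdict(float)
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical rankings of chunk IDs from a keyword and a vector retriever.
bm25_ranking = ["plot", "cast", "music"]
faiss_ranking = ["themes", "plot", "cast"]

fused = weighted_rrf([bm25_ranking, faiss_ranking], weights=[0.25, 0.75])
print(fused)  # ['plot', 'cast', 'themes', 'music']
```

Note that `plot` wins even though the higher-weighted list ranked it second: documents that appear in both rankings accumulate score from each, which is how a chunk that one method ranked low can still surface.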

In [11]:
# set up BM25 retriever
barbie_wikipedia_bm25_retriever = BM25Retriever.from_documents(
    chunked_barbie_wikipedia_docs,
)
barbie_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
barbie_wikipedia_faiss_store = FAISS.from_documents(
    chunked_barbie_wikipedia_docs,
    embedder
)
barbie_wikipedia_faiss_retriever = barbie_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
barbie_ensemble_retriever = EnsembleRetriever(
    retrievers=[barbie_wikipedia_bm25_retriever,barbie_wikipedia_faiss_retriever],
    weights= [0.25,0.75]
)

#### Retrieval Agent

We can create a simple conversational retrieval Agent by using the built-ins provided by LangChain!

> HINT: Be sure to provide good natural language descriptions of what the tool should be used for to get the best results.

RELEVANT DOCS:
- [`create_retriever_tool`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool.html#langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool)

In [12]:
from langchain.agents.agent_toolkits import create_retriever_tool

barbie_wikipedia_retrieval_tool = create_retriever_tool(
    barbie_ensemble_retriever,
    "wikipedia",
    "Searches and returns documents regarding the plot, history, and cast of the Barbie movie"
)

barbie_csv_retrieval_tool = create_retriever_tool(
    barbie_csv_faiss_retriever,
    "PublicReviews",
    "Searches and returns documents regarding public reviews of the Barbie movie"
)

barbie_retriever_tools = [barbie_wikipedia_retrieval_tool,barbie_csv_retrieval_tool]

Now that we've created our tools, we can combine them into an agent!

RELEVANT DOCS:
- [`create_conversational_retrieval_agent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent.html#langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent)

In [13]:
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent

barbie_retriever_agent_executor = create_conversational_retrieval_agent(llm,barbie_retriever_tools,verbose=True)

In [14]:
barbie_retriever_agent_executor({"input" : "Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?"})



> Entering new AgentExecutor chain...

Invoking: `PublicReviews` with `{'query': 'Barbie movie reviews'}`


Review_Date: 29 July 2023
Author: planktonrules
Rating: 7
Review_Title: Good...not great
Review: "Barbie" is a movie that is setting all sorts of records, so it's obvious that the film is touching many people. For some, it's the fun of seeing a doll come to life, for others it's the theme of empowerment that make the movie a big hit. As for me, I did like it and thought it was a pretty good movie...but it's not for everyone.
Review_Url: /review/rw9243378/?ref_=tt_urv

Review_Date: 7 August 2023
Author: bradykitchen-33648
Rating: 6
Review_Title: Barbie: Review
Review: Good movie not a great one.
Review_Url: /review/rw9243378/?ref_=tt_urv

Review_Date: 19 July 2023
Author: mieriks
Rating: 7
Review_Title: Fantastic attention to details
Review: This movie, about Barbie who suffers a crisis that leads her to question her world and her existence,

{'input': 'Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"Barbie movie reviews"}', 'name': 'PublicReviews'}}),
  FunctionMessage(content='Review_Date: 29 July 2023\nAuthor: planktonrules\nRating: 7\nReview_Title: Good...not great\nReview: "Barbie" is a movie that is setting all sorts of records, so it\'s obvious that the film is touching many people. For some, it\'s the fun of seeing a doll come to life, for others it\'s the theme of empowerment that make the movie a big hit. As for me, I did like it and thought it was a pretty good movie...but it\'s not for everyone.\nReview_Url: /review/rw9243378/?ref_=tt_urv\n\nReview_Date: 7 August 2023\nAuthor: bradyki

In [15]:
barbie_retriever_agent_executor({"input" : "What is a very quick summary of the plot of the Barbie movie?"})



> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'Barbie movie plot'}`


Barbie in a toy box for remanufacturing, but she escapes with Gloria and Sasha's help, and the three travel to Barbieland with Mattel executives in pursuit.

song, director Greta Gerwig said that she was a fan of Matchbox Twenty and "I never put anything in a movie I don't love." "Spice Up Your Life" by Spice Girls plays in a flashback scene showing a little girl's rough play creating "Weird Barbie."

The quick summary of the Barbie movie plot is that Barbie is in a toy box for remanufacturing but escapes with the help of Gloria and Sasha. They travel to Barbieland while being pursued by Mattel executives.

> Finished chain.


{'input': 'What is a very quick summary of the plot of the Barbie movie?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?'),
  AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"query":"Barbie movie reviews"}', 'name': 'PublicReviews'}}),
  FunctionMessage(content='Review_Date: 29 July 2023\nAuthor: planktonrules\nRating: 7\nReview_Title: Good...not great\nReview: "Barbie" is a movie that is setting all sorts of records, so it\'s obvious that the film is touching many people. For some, it\'s the fun of seeing a doll come to life, for others it\'s the theme of empowerment that make the movie a big hit. As for me, I did like it and thought it was a pretty good movie...but it\'s not for everyone.\nReview_Url: /review/rw9243378/?ref_=tt_urv\n\nReview_Date: 7 August 2023\nAuthor: bradykitchen-33648\nRating: 6\nReview_Title: Barbie: Review\nReview: Go

### Oppenheimer Retrieval System

We're going to repurpose some of what we created previously, but this time we'll explore a different multi-source retrieval system.

In [17]:
oppenheimer_wikipedia_docs = WikipediaLoader(
    query="oppenheimer (movie)",
    load_max_docs=1,
    doc_content_chars_max=1_000_000,
).load()

oppenheimer_csv_docs = CSVLoader(
    file_path="/content/drive/MyDrive/thotron/RAG/Week 1/Thursday/oppenheimer.csv",
    source_column="Review_Url"
).load()

In [19]:
opp_wikipedia_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=0,
    length_function = len,
    is_separator_regex= False,
    separators=["\n==","\n"," "]
)
opp_csv_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=50,
    length_function = len,
    is_separator_regex= False,
    separators=["\n"," "]
)

chunked_opp_wikipedia_docs = opp_wikipedia_text_splitter.transform_documents(oppenheimer_wikipedia_docs)
chunked_opp_csv_docs = opp_csv_text_splitter.transform_documents(oppenheimer_csv_docs)

In [21]:
opp_csv_faiss_retriever = FAISS.from_documents(chunked_opp_csv_docs,embedder).as_retriever()

# set up BM25 retriever
opp_wikipedia_bm25_retriever = BM25Retriever.from_documents(
    chunked_opp_wikipedia_docs
)
opp_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
opp_wikipedia_faiss_store = FAISS.from_documents(chunked_opp_wikipedia_docs,embedder)
opp_wikipedia_faiss_retriever = opp_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
opp_ensemble_retriever = EnsembleRetriever(
    retrievers=[opp_wikipedia_bm25_retriever,opp_wikipedia_faiss_retriever],
    weights=[0.25,0.75]
)

#### Multi-source chain

We're going to allow the LLM to decide which information is most -> least valuable.

The way we'll do this is with LangChain's rather powerful "Expression Language"!

> HINT: You can leverage [this](https://python.langchain.com/docs/use_cases/question_answering/how_to/multiple_retrieval) resource if you get stuck - but experiment with different prompts/formats.
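
Conceptually, the chain we are about to build fans the same question out to two retrievers and drops each result into its own slot of the prompt. The plain-Python sketch below shows that data flow without any LangChain machinery. The two stub retrievers, their placeholder return values, and `build_prompt` are all hypothetical names used only for illustration.

```python
def reviews_retriever(question):
    # Placeholder result; a real retriever would search the review chunks.
    return ["Review: a compelling, visually stunning film."]


def wikipedia_retriever(question):
    # Placeholder result; a real retriever would search the Wikipedia chunks.
    return ["Oppenheimer is a 2023 film directed by Christopher Nolan."]


TEMPLATE = """Source 1: public user reviews
<source1>
{source1}
</source1>

Source 2: the Wikipedia page
<source2>
{source2}
</source2>

Question: {question}"""


def build_prompt(inputs):
    """Send one question to both retrievers and fill every prompt slot."""
    question = inputs["question"]
    return TEMPLATE.format(
        source1="\n".join(reviews_retriever(question)),
        source2="\n".join(wikipedia_retriever(question)),
        question=question,
    )


demo_prompt = build_prompt({"question": "What did people think of the movie?"})
print(demo_prompt)
```

The important detail is that each retriever must feed the slot whose description matches its content, otherwise the LLM is told that Wikipedia text is a user review (or vice versa).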

In [22]:
from langchain.prompts import ChatPromptTemplate

system_message = """Use the information from the below two sources to answer any questions.

Source 1: public user reviews about the Oppenheimer movie
<source1>
{source1}
</source1>

Source 2: the wikipedia page for the Oppenheimer movie including the plot summary, cast, and production information
<source2>
{source2}
</source2>
"""

prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])

In [23]:
oppenheimer_multisource_chain = {
    # The prompt describes source1 as public reviews and source2 as the Wikipedia page
    "source1": (lambda x: x["question"]) | opp_csv_faiss_retriever,
    "source2": (lambda x: x["question"]) | opp_ensemble_retriever,
    "question": lambda x: x["question"],
} | prompt | llm

In [24]:
oppenheimer_multisource_chain.invoke({"question" : "What did people think of the Oppenheimer movie?"})

AIMessage(content="People generally had positive opinions about the Oppenheimer movie. Critics praised the film for its direction, cast performances (especially from Cillian Murphy, Emily Blunt, and Robert Downey Jr.), visuals, screenplay, score, and acting. The movie received critical acclaim and was ranked as one of Christopher Nolan's best films. It also received high ratings on review aggregator websites like Rotten Tomatoes. Audience members found the film compelling, engaging, and visually stunning, with some praising the actors' performances. However, there were some criticisms about historical accuracy and emotional depth in the portrayal of the main character, J. Robert Oppenheimer.")

### Agent Creation

Now we can finally start building our Agent!

The first thing we'll need to do is provide our Agent with a Toolbelt (a list of tools). Much like Batman, our LLM-powered Agent can use these tools as it sees fit.

While the examples we're constructing in this notebook are straightforward for brevity and simplicity's sake, there is no limit to what you can build with Agents, as we'll see as we progress through the program.

So, let's begin by setting up our Tools!

You'll notice that we have to set up a function to let our `OppenheimerInfo` tool interface with the Agent. This is because the underlying chain requires a specific input format (a dictionary with a `question` key), while a tool receives a plain string. Creating custom tools is a pattern that you'll want to grow accustomed to as you use LangChain more and more.
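
The adapter pattern itself is small enough to sketch without LangChain. In the illustration below, `FakeChain` and `FakeMessage` are hypothetical stand-ins for a chain that expects `{"question": ...}` and returns a message object, and `make_tool_func` is a made-up helper name. Returning the message's `.content` as a plain string is one possible design choice, not something the notebook's own function does.

```python
class FakeMessage:
    """Stand-in for a chat message object with a .content attribute."""

    def __init__(self, content):
        self.content = content


class FakeChain:
    """Stand-in for a chain that requires a dict with a 'question' key."""

    def invoke(self, inputs):
        return FakeMessage("Answer to: " + inputs["question"])


def make_tool_func(chain, input_key="question"):
    """Wrap a dict-in chain as a string-in, string-out function for a tool."""

    def tool_func(query: str) -> str:
        result = chain.invoke({input_key: query})
        # Fall back to str() if the result has no .content attribute.
        return getattr(result, "content", str(result))

    return tool_func


oppenheimer_tool_func = make_tool_func(FakeChain())
print(oppenheimer_tool_func("Who directed it?"))  # Answer to: Who directed it?
```

Handing the agent a short string rather than a whole message or dictionary keeps the observation it has to read compact.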

In [25]:
from langchain.agents import Tool

def query_oppenheimer(input):
    return oppenheimer_multisource_chain.invoke({"question" : input})

tools = [
    Tool(
        name = "BarbieInfo",
        func=barbie_retriever_agent_executor.invoke,
        description="useful for when you need to answer questions about Barbie. Input should be a fully formed question."
    ),
    Tool(
        name = "OppenheimerInfo",
        func=query_oppenheimer,
        description="useful for when you need to answer questions about Oppenheimer. Input should be a fully formed question."
    ),
]

Now that we've set up our Agent's toolbelt, let's set up the LLM that will be powering it!

I would suggest playing around with these prompts and experimenting to find what works best for you.

RELEVANT DOCS:
- [`ZeroShotAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.mrkl.base.ZeroShotAgent.html#langchain-agents-mrkl-base-zeroshotagent)

In [26]:
from langchain.agents import ZeroShotAgent, AgentExecutor

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools,
    prefix=prefix,
    suffix =suffix,
    input_variables=["input","agent_scratchpad"]
)

In [28]:
from langchain.chains import LLMChain

llm_chain = LLMChain(llm=llm , prompt=prompt)

All that's left to do now is create our `ZeroShotAgent` and our `AgentExecutor`, which are the "reasoner" and "actor" halves of the `ReAct` method of Agent implementation.

Read all about the `ReAct` framework [here](https://react-lm.github.io/).
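
To show how the "reasoner" and "actor" halves interact, here is a minimal, self-contained sketch of a ReAct-style loop. It is an illustration under stated assumptions, not LangChain's implementation: `react_loop`, `scripted_llm`, and `barbie_stub` are hypothetical names, and the scripted function simply replays canned text where a real LLM would generate it.

```python
import re

# Matches the "Action: ... / Action Input: ..." pair in the model's text.
ACTION_RE = re.compile(r"Action:\s*(.*?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def react_loop(llm, tools, question, max_steps=5):
    """Alternate between reasoning (llm) and acting (tools) until a final answer."""
    scratchpad = ""
    for _ in range(max_steps):
        text = llm(question, scratchpad)
        if "Final Answer:" in text:
            return text.split("Final Answer:")[-1].strip()
        match = ACTION_RE.search(text)
        if match is None:
            raise ValueError(f"Could not parse LLM output: {text!r}")
        tool_name = match.group(1).strip()
        tool_input = match.group(2).strip().strip('"')
        observation = tools[tool_name](tool_input)  # the "actor" step
        # Feed the observation back so the next reasoning step can use it.
        scratchpad += f"{text}\nObservation: {observation}\nThought: "
    return "Stopped: step limit reached."


def scripted_llm(question, scratchpad):
    # Canned responses standing in for a real model.
    if "Observation:" not in scratchpad:
        return (
            "Thought: I should look this up.\n"
            "Action: BarbieInfo\n"
            "Action Input: " + question
        )
    return "Thought: I now know the final answer.\nFinal Answer: See the observation above."


calls = []


def barbie_stub(query):
    calls.append(query)  # record what the agent asked the tool
    return f"(stub result for {query!r})"


answer = react_loop(scripted_llm, {"BarbieInfo": barbie_stub}, "What did people like about Barbie?")
print(answer)
```

The `Thought` / `Action` / `Action Input` / `Observation` labels are the same ones that appear in the verbose agent traces in this notebook.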

In [31]:
barbenheimer_agent = ZeroShotAgent(
    llm_chain=llm_chain,
    tools=tools,
    verbose=True)

barbenheimer_agent_chain = AgentExecutor.from_agent_and_tools(
    agent=barbenheimer_agent,
    tools=tools,
    verbose=1)



## Conclusion

All that is left to do now is feed inputs to your Agent and watch it work!

Remember to use the `{"input" : "YOUR QUERY HERE"}` format when prompting the Agent.

In [32]:
barbenheimer_agent_chain.invoke({"input" : "What did people like about the Barbie movie?"})



> Entering new AgentExecutor chain...
Thought: I need to find out what people liked about the Barbie movie.
Action: BarbieInfo
Action Input: What did people like about the Barbie movie?

> Entering new AgentExecutor chain...
People liked the Barbie movie for various reasons, including:

1. Fun of seeing a doll come to life
2. Theme of empowerment
3. Attention to details
4. Margot Robbie's performance as Barbie
5. Highlighting the absurdity of Barbie's world
6. Incorporation of adult humor
7. Mix of fantasy elements

These aspects contributed to the positive reception of the movie among viewers.

> Finished chain.

Observation: {'input': 'What did people like about the Barbie movie?', 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?'), AIMessage(content='', additional_kwargs={'function_call': {'argume

{'input': 'What did people like about the Barbie movie?',
 'output': "People liked the Barbie movie for various reasons, including the fun of seeing a doll come to life, the theme of empowerment, attention to details, Margot Robbie's performance as Barbie, highlighting the absurdity of Barbie's world, incorporation of adult humor, and mix of fantasy elements."}

In [33]:
barbenheimer_agent_chain.run({"input" : "What did people like about the Oppenheimer movie?"})



> Entering new AgentExecutor chain...
Thought: I need to find out what specific aspects of the Oppenheimer movie people enjoyed.
Action: OppenheimerInfo
Action Input: "What aspects of the Oppenheimer movie did people enjoy?"
Observation: content="People enjoyed various aspects of the Oppenheimer movie, as highlighted in the reviews:\n\n1. **Direction**: Critics praised the direction of the film.\n2. **Cast Performances**: Particularly, the performances of Cillian Murphy, Emily Blunt, and Robert Downey Jr. were highly acclaimed.\n3. **Visuals**: The cinematography and visuals were crafted beautifully, enhancing the overall viewing experience.\n4. **Screenplay**: The screenplay was well-received for its storytelling.\n5. **Score**: The film's score was also appreciated.\n6. **Historical Context**: The movie made the complicated story of J. Robert Oppenheimer compelling, engaging, and simple to understand.\n7. **Interest in History**: It sparked an 

'People enjoyed the direction, cast performances, visuals, screenplay, score, historical context, interest in history, acting, cinematography, and entertainment value of the Oppenheimer movie.'

In [None]:
barbenheimer_agent_chain.run({"input" : "Did the movies Barbie and Oppenheimer share similar themes or ideas?"})



> Entering new AgentExecutor chain...
Thought: I need to gather information about both Barbie and Oppenheimer to see if they share similar themes or ideas.
Action: BarbieInfo
Action Input: "What are some common themes in Barbie movies?"

> Entering new AgentExecutor chain...

Invoking: `wikipedia` with `{'query': 'Barbie movies common themes'}`


Barbie has been characterized as exploring themes of existentialism. Lucy Ford of GQ wrote that the film "ruminates on the very idea of what makes us human, the idea of 'the other', whether there's truly such a thing as autonomy or if we're all simply pawns to be picked up and disposed of when we are no longer useful." Ford observed that, in the film, Barbie and Ken go on "opposite but equal" journeys of self-discovery, after venturing out into the Real World and learning that it is an

qualifying that its themes were too often advanced by slogans rather than the experiences of i