# Questioning Barbie and Oppenheimer Through the Use of Agents

In the following notebook we will build an application that queries the Wikipedia pages for both the Barbie and Oppenheimer movies, as well as their reviews.

The main focus of this notebook is to showcase a brief introduction to Agents.

## Build 🏗️

There are 3 main tasks in this notebook:

1. Construct a Barbie retriever
2. Construct an Oppenheimer retriever
3. Combine the two and allow users to query both resources from a single input through the use of Agents

## Ship 🚢

Based on Tuesday's session - construct a Chainlit (or Gradio) application that allows users to interface with the application.

## Share 🚀

Make a social media post about your final application.

### Dependencies

As always, let's start with some dependencies!

In [1]:
!pip install -q -U langchain openai

In [2]:
import getpass
import os
openai_api_key = getpass.getpass("Enter your OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key


### LLM 

We will be leveraging OpenAI's `gpt-3.5-turbo` throughout the notebook, and we can keep it consistent throughout!

In [3]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature = 0)



### Data Collection and Transformation

We'll be leveraging the `WikipediaLoader` tool to collect information from Wikipedia. 

Be sure to set the `doc_content_chars_max` parameter so that you capture the *entire* Wikipedia article content.

In [4]:
!pip install -q -U wikipedia

In [5]:
from langchain.document_loaders import WikipediaLoader, CSVLoader

# Don't know if there's a better way to do this, but I copy and pasted the web page into a text file and "wc"ed it to see it's 117839 characters, so let's do 150000 max chars
barbie_wikipedia_docs = WikipediaLoader(
    query="Barbie (film)", 
    load_max_docs= 1, 
    doc_content_chars_max=150000
    ).load()

barbie_csv_docs = CSVLoader(
    file_path="barbie_data/barbie.csv", 
    source_column="Review_Url"
    ).load()

Since the source documents for each movie come in the same formats, we can save ourselves some extra effort and set up our splitters once.

We're going to leverage the `RecursiveCharacterTextSplitter` again, this time paying close attention to the format our Wikipedia articles and reviews are in so we can be sure to chunk them appropriately.

> HINT: You can pass a list of separators when you initialize your `RecursiveCharacterTextSplitter`! They are tried in order, from element 0 to element `len(list) - 1`.

RELEVANT DOCS:
- [`RecursiveCharacterTextSplitter`](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html#langchain.text_splitter.RecursiveCharacterTextSplitter)
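
To make the separator ordering concrete, here is a minimal pure-Python sketch of the idea. It is not LangChain's implementation: `recursive_split` is a hypothetical helper, it ignores `chunk_overlap`, and it drops separators that fall on chunk boundaries. It only illustrates that coarse separators are tried first and finer ones are used when a piece is still too long.

```python
def recursive_split(text, separators, chunk_size):
    """Simplified illustration: split on the first separator, recurse with the rest."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard cuts of chunk_size characters
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    if sep not in text:
        return recursive_split(text, rest, chunk_size)

    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces
        else:
            if current:
                chunks.append(current)
            if len(piece) <= chunk_size:
                current = piece
            else:
                # Piece is still too long: try the finer separators
                chunks.extend(recursive_split(piece, rest, chunk_size))
                current = ""
    if current:
        chunks.append(current)
    return chunks


sample = "Heading\n\nFirst paragraph. Second sentence.\n\nShort."
sample_chunks = recursive_split(sample, ["\n\n", "\n", "."], chunk_size=20)
print(sample_chunks)
```

Only the middle paragraph is longer than 20 characters, so it is the only piece that falls through to the `"."` separator.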

In [6]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

wikipedia_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100,
    length_function = len,
    is_separator_regex= False,
    separators = ["\n\n", "\n", "."]  # split on blank lines (sections) first, then line breaks, then sentences
)

csv_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100,
    length_function = len,
    is_separator_regex= False,
    separators = ["\n", "."]  # split on line breaks (CSV fields) first, then sentences
)

# removing because I don't use these variables
#chunked_barbie_wikipedia_docs = "barbie_wiki.txt"
#chunked_barbie_csv_docs = "barbie_csv.txt"

#### Retrieval and Embedding Strategy

We've already discussed the useful application of `CacheBackedEmbeddings`, so let's do it again!

RELEVANT DOCS:
- [`CacheBackedEmbeddings`](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain-embeddings-cache-cachebackedembeddings)
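
As a reminder of why caching helps, here is a toy sketch of the idea behind a cache-backed embedder. Everything in it is an assumption for illustration (`ToyCachedEmbedder`, `fake_embed`, the SHA-1 key scheme); it is not how `CacheBackedEmbeddings` is implemented. It only shows that each distinct text is embedded once per namespace and then served from the store.

```python
import hashlib


class ToyCachedEmbedder:
    """Hypothetical cache wrapper: embed each distinct text only once."""

    def __init__(self, embed_fn, namespace):
        self.embed_fn = embed_fn    # the expensive call (an API request in practice)
        self.namespace = namespace  # keeps vectors from different models apart
        self.store = {}             # stand-in for a file-backed byte store
        self.misses = 0

    def _key(self, text):
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        return f"{self.namespace}:{digest}"

    def embed_documents(self, texts):
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:
                self.misses += 1
                self.store[key] = self.embed_fn(text)
            vectors.append(self.store[key])
        return vectors


def fake_embed(text):
    # Stand-in for a real embedding model: two cheap numeric features
    return [float(len(text)), float(text.count(" "))]


toy_embedder = ToyCachedEmbedder(fake_embed, namespace="fake-model")
first_vectors = toy_embedder.embed_documents(["Barbie review", "Oppenheimer review"])
second_vectors = toy_embedder.embed_documents(["Barbie review"])
print(toy_embedder.misses)  # the repeated text did not trigger a second embedding call
```

Using the model name as the namespace, as the cell below does, avoids mixing vectors produced by different embedding models in the same cache directory.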

In [7]:
!pip install -q -U rank_bm25 tiktoken faiss-cpu

In [9]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# set up cached embeddings store
store = LocalFileStore("./cache/")

core_embeddings_model = OpenAIEmbeddings()

embedder = CacheBackedEmbeddings.from_bytes_store(
    core_embeddings_model,
    store,
    namespace=core_embeddings_model.model
)


We'll implement a `FAISS` vectorstore, and create a retriever from it.

In [89]:
barbie_wikipedia_content = [d.page_content for d in barbie_wikipedia_docs]
barbie_csv_content = [d.page_content for d in barbie_csv_docs]

barbie_wikipedia_documents = wikipedia_text_splitter.create_documents(barbie_wikipedia_content)
barbie_csv_documents = csv_text_splitter.create_documents(barbie_csv_content)

vector_store = FAISS.from_documents(barbie_csv_documents, embedder)
barbie_csv_faiss_retriever = vector_store.as_retriever()


There are a number of excellent options to retrieve documents - we'll be looking at an additional example today, which is called the `EnsembleRetriever`.

The method this is using is outlined in [this paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).

The brief explanation is:

1. We collect results from two different retrieval methods over the same corpus
2. We apply a reranking algorithm so that the *most relevant* source documents rise to the top, without losing specific or low-ranked but information-rich documents
3. We feed the top-k results into the LLM with our query as context.

> HINT: Your weight list should be of type `List[float]` and the `sum(List[float])` should be `1`.
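
The reranking step can be sketched in a few lines of plain Python. This is a hedged illustration of weighted reciprocal rank fusion as described in the linked paper, not LangChain's code: `weighted_rrf` and the document ids are hypothetical, and the constant `c=60` is the value suggested in the paper.

```python
def weighted_rrf(ranked_lists, weights, c=60):
    """Fuse several rankings: each document earns weight / (rank + c) per list."""
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + weight / (rank + c)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical document ids returned by two retrievers for the same query
dense_ranking = ["review_a", "review_b", "wiki_plot"]
bm25_ranking = ["wiki_plot", "wiki_cast", "review_a"]

fused = weighted_rrf([dense_ranking, bm25_ranking], weights=[0.25, 0.75])
print(fused)
```

With these weights, `wiki_plot` edges out `review_a` because it is ranked first by the more heavily weighted retriever, while `review_b` still survives at the bottom of the fused list instead of being dropped.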

In [90]:
# set up BM25 retriever
barbie_wikipedia_bm25_retriever = BM25Retriever.from_documents(
    barbie_wikipedia_documents
)
barbie_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store over the Wikipedia chunks
# (barbie_wikipedia_faiss_retriever is needed later by the Barbie multi-source chain)
barbie_wikipedia_faiss_store = FAISS.from_documents(
    barbie_wikipedia_documents,
    embedder
)
barbie_wikipedia_faiss_retriever = barbie_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
barbie_ensemble_retriever = EnsembleRetriever(
    retrievers=[barbie_csv_faiss_retriever, barbie_wikipedia_bm25_retriever],
    weights= [0.25, 0.75]  # give more weight to wikipedia
)

#### Retrieval Agent

We can create a simple conversational retrieval Agent by using the built-ins provided by LangChain!

> HINT: Be sure to provide good natural language descriptions of what the tool should be used for to get the best results.

RELEVANT DOCS:
- [`create_retriever_tool`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool.html#langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool)

In [91]:
from langchain.agents.agent_toolkits import create_retriever_tool

barbie_wikipedia_retrieval_tool = create_retriever_tool(
    barbie_wikipedia_bm25_retriever,
    "Barbie_Wikipedia_Retriever",
    "Retrieves text about the Barbie movie from Wikipedia"
)

barbie_csv_retrieval_tool = create_retriever_tool(
    barbie_csv_faiss_retriever,
    "Barbie_CSV_Retiever",
    "Retrieves text about the Barbie movie from a CSV file of user-generated reviews"
)

barbie_retriever_tools = [barbie_wikipedia_retrieval_tool, barbie_csv_retrieval_tool]

Now that we've created our tools, we can combine them into an agent!

RELEVANT DOCS:
- [`create_conversational_retrieval_agent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent.html#langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent)

In [92]:
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent

barbie_retriever_agent_executor = create_conversational_retrieval_agent(llm, barbie_retriever_tools, verbose=True)



In [93]:
barbie_retriever_agent_executor({"input" : "Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?"})



> Entering new AgentExecutor chain...

Invoking: `Barbie_CSV_Retiever` with `Barbie`


[Document(page_content=": 122\nReview_Date: 30 July 2023\nAuthor: walfordior\nRating: 9\nReview_Title: This is more than just a 'lighthearted comedy'\nReview: Barbie is no longer just a toy; rather, this idea of 'Barbie' has evolved into one of the most famous and significant topics in the world at this moment in time. It is so much more than merely a toy. It will not longer be just 'a toy'\nReview_Url: /review/rw9199947/?ref_=tt_urv", metadata={}), Document(page_content=': 64\nReview_Date: 22 July 2023\nAuthor: fernandoschiavi\nRating: 5\nReview_Title: "Barbie" is fun and visually beautiful, but unfortunately Barbie doll was dragged into the cultural war, used as a puppet by political militancy', metadata={}), Document(page_content=": 39\nReview_Date: 23 July 2023\nAuthor: Anurag-Shetty\nRating: 10\nReview_Title: A wholesome delight!\nReview: Barbie is based o

{'input': 'Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?', additional_kwargs={}, example=False),
  AIMessage(content='', additional_kwargs={'function_call': {'name': 'Barbie_CSV_Retiever', 'arguments': '{\n  "__arg1": "Barbie"\n}'}}, example=False),
  FunctionMessage(content='[Document(page_content=": 122\\nReview_Date: 30 July 2023\\nAuthor: walfordior\\nRating: 9\\nReview_Title: This is more than just a \'lighthearted comedy\'\\nReview: Barbie is no longer just a toy; rather, this idea of \'Barbie\' has evolved into one of the most famous and significant topics in the world at this moment in time. It is so much more than merely a toy. It will not longer be just \'a toy\'\\nReview_Url: /review/rw9199947/?ref_=tt_urv", metadata={}), Document(

In [86]:
barbie_retriever_agent_executor({"input" : "What is a very quick summary of the plot of the Barbie movie?"})



> Entering new AgentExecutor chain...
The Barbie movie follows the journey of Barbie, who lives in Barbieland, a matriarchal society of dolls. When Barbie faces an existential crisis and realizes the impact of unrealistic beauty standards, she embarks on a journey of self-discovery. Along the way, she forms friendships, challenges societal norms, and confronts the expectations placed on her. The movie explores themes of identity, self-acceptance, and the importance of female empowerment.

> Finished chain.


{'input': 'What is a very quick summary of the plot of the Barbie movie?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?', additional_kwargs={}, example=False),
  AIMessage(content='', additional_kwargs={'function_call': {'name': 'Barbie_CSV_Retiever', 'arguments': '{\n  "__arg1": "Barbie"\n}'}}, example=False),
  FunctionMessage(content='[Document(page_content=\'== Plot ==\\nStereotypical Barbie ("Barbie") and fellow dolls reside in Barbieland; a matriarchal society with different variations of Barbies, Kens, and a group of discontinued models, who are treated like outcasts due to their unconventional traits. While the Kens spend their days playing at the beach, considering it as their profession, the Barbies hold prestigious jobs such as doctors, lawyers, and politicians. Beach Ken ("Ken") is only happy when he is with Barbie and seeks a closer relationship, but Bar

### Oppenheimer Retrieval System

We're going to repurpose some of what we created previously, but this time we'll explore a different multi-source retrieval system.

In [87]:
# Don't know if there's a better way to do this, but I copy and pasted the web page into a text file and "wc"ed it to see it's 82359 characters, so let's do 100000 max chars
oppenheimer_wikipedia_docs = WikipediaLoader(
    query="Oppenheimer (film)", 
    load_max_docs= 1, 
    doc_content_chars_max=100000
    ).load()

oppenheimer_csv_docs = CSVLoader(
    file_path="oppenheimer_data/oppenheimer.csv", 
    source_column="Review_Url"
    ).load()

In [88]:
oppenheimer_wikipedia_content = [d.page_content for d in oppenheimer_wikipedia_docs]
oppenheimer_csv_content = [d.page_content for d in oppenheimer_csv_docs]

chunked_opp_wikipedia_docs = wikipedia_text_splitter.create_documents(oppenheimer_wikipedia_content)
chunked_opp_csv_docs = csv_text_splitter.create_documents(oppenheimer_csv_content)

In [96]:
vector_store = FAISS.from_documents(chunked_opp_csv_docs, embedder)
opp_csv_faiss_retriever = vector_store.as_retriever()

# set up BM25 retriever (used by the ensemble retriever below)
opp_wikipedia_bm25_retriever = BM25Retriever.from_documents(
    chunked_opp_wikipedia_docs
)
opp_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
opp_wikipedia_faiss_store = FAISS.from_documents(
    chunked_opp_wikipedia_docs,
    embedder
)
opp_wikipedia_faiss_retriever = opp_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
opp_ensemble_retriever = EnsembleRetriever(
    retrievers=[opp_csv_faiss_retriever, opp_wikipedia_bm25_retriever],
    weights= [0.25, 0.75]  # give more weight to wikipedia
)

#### Multi-source chain

We're going to allow the LLM to decide which information is most -> least valuable.

The way we'll do this is with LangChain's rather powerful "Expression Language"!

> HINT: You can leverage [this](https://python.langchain.com/docs/use_cases/question_answering/how_to/multiple_retrieval) resource if you get stuck - but experiment with different prompts/formats.
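
Conceptually, the dictionary at the head of such a chain fans the same input out to every branch and collects the results under the matching prompt variables. The sketch below mimics that behavior with plain functions. All names in it (`run_parallel_map`, the fake retrievers, `TEMPLATE`) are hypothetical stand-ins, and it says nothing about how the Expression Language is implemented internally.

```python
def run_parallel_map(mapping, payload):
    """Call every branch with the same payload and collect the results by key."""
    return {key: branch(payload) for key, branch in mapping.items()}


def fake_review_retriever(question):
    # Stand-in for a vector-store retriever over the reviews CSV
    return [f"review about: {question}"]


def fake_wiki_retriever(question):
    # Stand-in for a retriever over the Wikipedia article
    return [f"wiki text about: {question}"]


mapping = {
    "source1": lambda x: fake_review_retriever(x["question"]),
    "source2": lambda x: fake_wiki_retriever(x["question"]),
    "question": lambda x: x["question"],
}

TEMPLATE = (
    "<source1>{source1}</source1>\n"
    "<source2>{source2}</source2>\n"
    "Question: {question}"
)

prompt_inputs = run_parallel_map(mapping, {"question": "Who directed it?"})
prompt_text = TEMPLATE.format(**prompt_inputs)
print(prompt_text)
```

The keys of the mapping must match the placeholder names in the prompt, which is why the system message in the next cell uses `{source1}` and `{source2}`.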

In [97]:
from langchain.prompts import ChatPromptTemplate

system_message = """Use the information from the below two sources to answer any questions.

Source 1: public user reviews about the Oppenheimer movie
<source1>
{source1}
</source1>

Source 2: the wikipedia page for the Oppenheimer movie including the plot summary, cast, and production information
<source2>
{source2}
</source2>
"""

prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])

In [98]:
# source1 pulls from the reviews CSV and source2 from the Wikipedia article, matching the prompt above

oppenheimer_multisource_chain = {
    "source1": (lambda x: x["question"]) | opp_csv_faiss_retriever,
    "source2": (lambda x: x["question"]) | opp_wikipedia_faiss_retriever,
    "question": lambda x: x["question"],
} | prompt | llm


# now do the same for Barbie
system_message = """Use the information from the below two sources to answer any questions.

Source 1: public user reviews about the Barbie movie
<source1>
{source1}
</source1>

Source 2: the wikipedia page for the Barbie movie including the plot summary, cast, and production information
<source2>
{source2}
</source2>
"""

prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])

barbie_multisource_chain = {
    "source1": (lambda x: x["question"]) | barbie_csv_faiss_retriever,
    "source2": (lambda x: x["question"]) | barbie_wikipedia_faiss_retriever,
    "question": lambda x: x["question"],
} | prompt | llm


In [99]:
oppenheimer_multisource_chain.invoke({"question" : "What did people think of the Oppenheimer movie?"})

AIMessage(content='Based on the public user reviews and critical reception, opinions about the Oppenheimer movie are mixed. Some viewers found the movie compelling, engaging, and simple to understand. They praised the actors\' performances, particularly Robert Downey, and appreciated the cinematography. These viewers enjoyed the film and expressed interest in seeing more biographical films from Christopher Nolan.\n\nHowever, there were also criticisms of the movie. One reviewer felt that the film lacked emotion and failed to capture the relationships and charisma of J. Robert Oppenheimer as portrayed in the book it was based on. They expressed disappointment that the film focused more on the Los Alamos project and testimonies rather than delving into the man himself. Another reviewer mentioned conflicting themes and internal conflicts portrayed in the movie, but still found it fascinating.\n\nIn terms of critical reception, Richard Roeper of the Chicago Sun-Times and The A.V. Club\'s M

# Agent Creation

Now we can finally start building our Agent!

The first thing we'll need to do is provide our Agent with a Toolbelt (a list of tools). Much like Batman, our LLM-powered Agent can use these tools as it sees fit.

While the examples we're constructing in this notebook are straightforward for brevity and simplicity's sake, there is no limit to what you can build with Agents, as we'll see as we progress through the program.

So, let's begin by setting up our Tools!

You'll notice that we have to wrap each multi-source chain in a small function so it can interface with the Agent. This is because a `Tool` passes its function a single string, while our chains require a dictionary with a `question` key. Creating custom tools is a pattern that you'll want to grow accustomed to as you use LangChain more and more.
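
The adapter pattern itself is small enough to sketch without LangChain. The names below (`FakeMessage`, `fake_chain`, `make_question_tool`) are hypothetical stand-ins; the sketch also unwraps a `.content` attribute when one is present, which is an optional design choice and not something the notebook's own wrappers do.

```python
class FakeMessage:
    """Hypothetical stand-in for a chat model's message object."""

    def __init__(self, content):
        self.content = content


def fake_chain(inputs):
    # Stand-in for a chain that expects {"question": ...} and returns a message
    return FakeMessage(f"answer to: {inputs['question']}")


def make_question_tool(chain_fn):
    """Adapt a dict-input chain to the single-string signature a tool expects."""

    def tool_fn(query: str) -> str:
        result = chain_fn({"question": query})
        # Return plain text if the chain produced a message-like object
        return getattr(result, "content", result)

    return tool_fn


query_fake = make_question_tool(fake_chain)
tool_result = query_fake("Who plays Ken?")
print(tool_result)
```

Returning plain text keeps the agent's observation free of message metadata, at the cost of discarding any extra fields the message carried.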

In [110]:
from langchain.agents import Tool

def query_oppenheimer(input):
    return oppenheimer_multisource_chain.invoke({"question" : input})

def query_barbie(input):
    return barbie_multisource_chain.invoke({"question" : input})

tools = [
    Tool(
        name = "opp_retriever",
        func=query_oppenheimer,
        description="Useful for getting information about the Oppenheimer film"
    ),
    Tool(
        name = "barbie_retriever",
        func=query_barbie,
        description="Useful for getting information about the Barbie film"
    ),
]

Now that we've set up our Agent's toolbelt, let's set up the LLM that will be powering it!

I would suggest playing around with these prompts and experimenting to find what works best for you.

RELEVANT DOCS:
- [`ZeroShotAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.mrkl.base.ZeroShotAgent.html#langchain-agents-mrkl-base-zeroshotagent)

In [111]:
from langchain.agents import ZeroShotAgent, AgentExecutor

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    tools=tools,
    prefix=prefix,
    suffix=suffix,
    input_variables=["input", "agent_scratchpad"]
)

In [112]:
from langchain.chains import LLMChain

llm_chain = LLMChain(
    llm=llm,
    prompt=prompt
)

All that's left to do now is create our `ZeroShotAgent` and our `AgentExecutor`, which are the "reasoner" and "actor" halves of the `ReAct` method of Agent implementation.

Read all about the `ReAct` framework [here](https://react-lm.github.io/).
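
To show how the "reasoner" and "actor" halves interact, here is a toy reason-and-act loop in plain Python. It is a sketch under stated assumptions, not LangChain's `AgentExecutor`: `react_loop`, `scripted_llm`, and `barbie_lookup` are hypothetical, and the "LLM" is a scripted function so the example runs without an API key.

```python
import re


def react_loop(llm_fn, tools, question, max_steps=5):
    """Alternate between asking the 'LLM' what to do and running the chosen tool."""
    scratchpad = ""
    for _ in range(max_steps):
        output = llm_fn(question, scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", output, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", output, re.DOTALL)
        if action is None:
            raise ValueError(f"Could not parse LLM output: {output!r}")
        tool_name = action.group(1).strip()
        tool_input = action.group(2).strip().strip('"')
        observation = tools[tool_name](tool_input)  # the "act" half
        # Record what happened so the next reasoning step can see it
        scratchpad += f"{output}\nObservation: {observation}\nThought: "
    raise RuntimeError("Agent stopped: step limit reached")


def scripted_llm(question, scratchpad):
    # Stand-in for the reasoning model: look something up once, then answer
    if "Observation:" not in scratchpad:
        return 'Thought: I should look this up.\nAction: barbie_lookup\nAction Input: "plot"'
    return "Thought: I now know the final answer.\nFinal Answer: The lookup returned a stub result."


toy_tools = {"barbie_lookup": lambda query: f"stub result for {query}"}
toy_answer = react_loop(scripted_llm, toy_tools, "What is the plot of Barbie?")
print(toy_answer)
```

The `{agent_scratchpad}` variable in the prompt above plays the same role as `scratchpad` here: it carries earlier thoughts, actions, and observations into the next model call.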

In [113]:
barbenheimer_agent = ZeroShotAgent(
    llm_chain=llm_chain, 
    tools=tools, 
    verbose=True)


# the tools are passed again here because the executor is what actually calls whichever tool the agent selects
barbenheimer_agent_chain = AgentExecutor.from_agent_and_tools(
    agent=barbenheimer_agent, 
    tools=tools, 
    verbose=True)

## Conclusion

All that is left to do now is feed inputs to your Agent and watch it work!

Remember to use the `{"input" : "YOUR QUERY HERE"}` format when prompting the Agent.

In [107]:
barbenheimer_agent_chain.invoke({"input" : "What did people like about the Barbie movie?"})



> Entering new AgentExecutor chain...
Thought: I need to find information about the Barbie movie to answer this question.
Action: barbie_retriever
Action Input: "What did people like about the Barbie movie?"
Observation: content='Based on the information from the sources, people liked the following aspects of the Barbie movie:\n\n- The movie was considered the best movie based on a toy.\n- The set design, colors, and camera movements were enjoyed.\n- The movie had references to the Barbie universe, including the dolls and their impact on society.\n- Some moments were funny, while others were more serious or dramatic.\n- The movie addressed societal and business problems.\n- The movie was considered a great Barbie movie overall.\n- The movie was praised for its production design and brand extension.\n- The partnership between the director and performer was applauded.\n- The movie was described as brilliant, beautiful, and fun.' additional_kwargs={

{'input': 'What did people like about the Barbie movie?',
 'output': 'People liked the Barbie movie for its set design, colors, camera movements, references to the Barbie universe, funny and serious moments, addressing societal and business problems, overall quality as a Barbie movie, production design and brand extension, and the partnership between the director and performer.'}

In [108]:
barbenheimer_agent_chain.run({"input" : "What did people like about the Oppenheimer movie?"})



> Entering new AgentExecutor chain...
Thought: I need to retrieve information about the Oppenheimer film to answer this question.
Action: opp_retriever
Action Input: "What did people like about the Oppenheimer movie?"
Thought: I now know the final answer.

> Finished chain.




In [114]:
barbenheimer_agent_chain.run({"input" : "Did the movies Barbie and Oppenheimer share similar themes or ideas?"})



> Entering new AgentExecutor chain...
Thought: I need to gather information about both movies to determine if they share similar themes or ideas.
Action: opp_retriever
Action Input: "Oppenheimer film"
Observation: content='The Oppenheimer film is a 2023 epic biographical thriller written and directed by Christopher Nolan. It is based on the 2005 biography "American Prometheus" by Kai Bird and Martin J. Sherwin. The film chronicles the career of American theoretical physicist J. Robert Oppenheimer, with a focus on his studies, his direction of the Manhattan Project during World War II, and his eventual fall from grace due to his 1954 security hearing.\n\nThe film stars Cillian Murphy as J. Robert Oppenheimer, Emily Blunt as his wife "Kitty", Matt Damon as head of the Manhattan Project Leslie Groves, Robert Downey Jr. as U.S. Atomic Energy Commission member Lewis Strauss, and Florence Pugh as Communist Party USA member Jean Tatlock. The ensemble su

'The Oppenheimer film is a biographical thriller that focuses on the career of J. Robert Oppenheimer and his involvement in the Manhattan Project. It explores themes of scientific discovery, the consequences of technology, and the moral implications of creating weapons of mass destruction. On the other hand, the Barbie film is a fantasy comedy that follows Barbie and Ken on a journey of self-discovery. It tackles themes of identity, self-acceptance, and challenging gender stereotypes. While both films have their own unique themes and ideas, they do not share similar themes or ideas.'

## Next Steps

It's time to build a Chainlit (or Gradio) application and host it on Hugging Face Spaces! :ship: