# Questioning Barbie and Oppenheimer Through the Use of Agents

In the following notebook we will build an application that queries the Wikipedia pages for both the Barbie and Oppenheimer movies, as well as their reviews.

The main focus of this notebook is to showcase a brief introduction to Agents.

## Build 🏗️

There are 3 main tasks in this notebook:

1. Construct a Barbie retriever
2. Construct an Oppenheimer retriever
3. Combine the two and allow users to query both resources from a single input through the use of Agents

## Ship 🚢

Based on Tuesday's session - construct a Chainlit (or Gradio) application that allows users to interface with the application.

## Share 🚀

Make a social media post about your final application.

### Dependencies

As always, let's start with some dependencies!

In [None]:
!pip install -q -U langchain openai

In [1]:
import getpass
import os
openai_api_key = getpass.getpass("Enter your OpenAI API Key: ")
os.environ["OPENAI_API_KEY"] = openai_api_key

### LLM 

We will be leveraging OpenAI's `gpt-3.5-turbo` throughout the notebook, and we can keep it consistent throughout!

In [2]:
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature = 0)

### Data Collection and Transformation

We'll be leveraging the `WikipediaLoader` tool to collect information from Wikipedia. 

Be sure to set the `doc_content_chars_max` parameter so that you capture the *entire* Wikipedia article content.

In [3]:
!pip install -q -U wikipedia

In [7]:
from langchain.document_loaders import WikipediaLoader, CSVLoader
import logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

BARBIE_REVIEW_CSV_FILE_PATH = "../../llm_retrieval_augmented_question_answering/data/barbie_review_data.csv"

barbie_wikipedia_docs = WikipediaLoader(
    query="Barbie (film)", 
    load_max_docs=10000, 
    doc_content_chars_max=1000
    ).load()

barbie_csv_docs = CSVLoader(
    file_path=BARBIE_REVIEW_CSV_FILE_PATH, 
    source_column="Review_Url"
    ).load()





Since the source documents for both topics share the same format, we can save ourselves some extra effort and set up our splitters once.

We're going to leverage the `RecursiveCharacterTextSplitter` again, this time paying close attention to the format our Wikipedia articles and reviews are in so we can be sure to chunk them appropriately.

> HINT: You can pass a list of separators when you initialize your `RecursiveCharacterTextSplitter`! They are tried in order, from element 0 to the last element of the list.
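
To make the "in order" behavior concrete, here is a minimal pure-Python sketch of the idea. It is a toy: the hypothetical `toy_recursive_split` is not LangChain's implementation, and unlike the real splitter it never merges small pieces back together or applies overlap. It only shows that a later separator is tried solely on pieces that are still too large.

```python
from typing import List


def toy_recursive_split(text: str, separators: List[str], chunk_size: int) -> List[str]:
    """Split on separators[0] first; recurse with the remaining separators
    only for pieces that are still longer than chunk_size."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    first, rest = separators[0], separators[1:]
    chunks: List[str] = []
    for piece in text.split(first):
        chunks.extend(toy_recursive_split(piece, rest, chunk_size))
    return chunks


sample = "Intro paragraph.\n\nFirst line\nSecond line. Third sentence"
toy_chunks = toy_recursive_split(sample, ["\n\n", "\n", ". "], chunk_size=20)
print(toy_chunks)
```

With this sample, the first paragraph already fits after the `"\n\n"` split, so `"\n"` and `". "` are only applied to the longer second paragraph.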

RELEVANT DOCS:
- [`RecursiveCharacterTextSplitter`](https://api.python.langchain.com/en/latest/text_splitter/langchain.text_splitter.RecursiveCharacterTextSplitter.html#langchain.text_splitter.RecursiveCharacterTextSplitter)

In [8]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

wikipedia_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100,
    length_function = len,
    is_separator_regex= False,
    separators = ["\n\n", "\n", ". "] # keep headings, then paragraphs, then sentences
)

csv_text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = 1000,
    chunk_overlap = 100,
    length_function = len,
    is_separator_regex= False,
    separators = ["\n", ". "]  # keep paragraphs, then sentences
)

chunked_barbie_wikipedia_docs = wikipedia_text_splitter.transform_documents(barbie_wikipedia_docs)
chunked_barbie_csv_docs = csv_text_splitter.transform_documents(barbie_csv_docs)

#### Retrieval and Embedding Strategy

We've already discussed the useful application of `CacheBackedEmbeddings`, so let's do it again!

RELEVANT DOCS:
- [`CacheBackedEmbeddings`](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain-embeddings-cache-cachebackedembeddings)
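
As a refresher, the caching idea can be pictured with a small standard-library sketch. Everything in it is hypothetical (`ToyCachedEmbedder`, `fake_embedding`, and the SHA-1 key scheme are illustration only, not how LangChain implements its cache). The point is simply that a text is sent to the underlying model once, and repeated texts are served from the store under a namespaced key.

```python
import hashlib
from typing import Callable, Dict, List


class ToyCachedEmbedder:
    """Wraps an embedding function and remembers vectors by (namespace, text hash)."""

    def __init__(self, embed_fn: Callable[[str], List[float]], namespace: str):
        self.embed_fn = embed_fn
        self.namespace = namespace  # keeps vectors from different models apart
        self.store: Dict[str, List[float]] = {}
        self.misses = 0  # how many times the underlying model was called

    def _key(self, text: str) -> str:
        return self.namespace + hashlib.sha1(text.encode("utf-8")).hexdigest()

    def embed_documents(self, texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            key = self._key(text)
            if key not in self.store:
                self.misses += 1
                self.store[key] = self.embed_fn(text)
            vectors.append(self.store[key])
        return vectors


def fake_embedding(text: str) -> List[float]:
    # Stand-in for a real (paid) embedding call.
    return [float(len(text)), float(text.count(" "))]


toy_embedder = ToyCachedEmbedder(fake_embedding, namespace="toy-model:")
first_batch = toy_embedder.embed_documents(["barbie review", "oppenheimer review"])
second_batch = toy_embedder.embed_documents(["barbie review", "a new review"])
print(toy_embedder.misses)
```

Only three distinct texts were embedded across the two batches, so the stand-in model is called three times rather than four.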

In [9]:
!pip install -q -U rank_bm25 tiktoken faiss-cpu

In [13]:
from langchain.retrievers import BM25Retriever, EnsembleRetriever
from langchain.vectorstores import FAISS as faiss
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

# set up cached embeddings store
store = LocalFileStore("vector_database/cache/")

core_embeddings_model = OpenAIEmbeddings(model="text-embedding-ada-002")

embedder = CacheBackedEmbeddings.from_bytes_store(core_embeddings_model, store,
    namespace=core_embeddings_model.model
)

We'll implement a `FAISS` vectorstore, and create a retriever from it.

In [14]:
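# Build the FAISS vector store over the review (CSV) chunks using the cached embedder.
# Note: this is still a vector store; it is converted with .as_retriever() when the tool is created.
barbie_csv_faiss_retriever = faiss.from_documents(chunked_barbie_csv_docs, embedder)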

INFO:faiss.loader:Loading faiss.
INFO:faiss.loader:Successfully loaded faiss.


There are a number of excellent options to retrieve documents - we'll be looking at an additional example today, which is called the `EnsembleRetriever`.

The method it uses, Reciprocal Rank Fusion, is outlined in [this paper](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf).

The brief explanation is:

1. We collect results from two different retrieval methods over the same corpus
2. We apply a reranking algorithm so that the *most relevant* source documents rise to the top, without losing specific documents that one method ranked low but that are rich in information
3. We feed the top-k results into the LLM with our query as context.

> HINT: Your weight list should be of type `List[float]` and the `sum(List[float])` should be `1`.
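
The reranking step can be sketched in a few lines of plain Python. This is a minimal illustration of weighted Reciprocal Rank Fusion, assuming the constant `c = 60` used in the paper; the hypothetical `weighted_rrf` helper and the toy document IDs are made up for the example, and the exact scoring inside `EnsembleRetriever` may differ.

```python
from collections import defaultdict
from typing import Dict, List


def weighted_rrf(rankings: List[List[str]], weights: List[float], c: int = 60) -> List[str]:
    """Fuse several ranked lists: each document earns weight / (rank + c) per list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += weight / (rank + c)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


bm25_ranking = ["a", "b", "c"]    # toy keyword-search results
dense_ranking = ["b", "c", "d"]   # toy vector-search results
fused_ranking = weighted_rrf([bm25_ranking, dense_ranking], weights=[0.5, 0.5])
print(fused_ranking)
```

Documents that appear in both lists (`b` and `c`) collect score from each, which is why they end up ahead of `a` even though `a` was ranked first by one method.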

In [15]:
# set up BM25 retriever
barbie_wikipedia_bm25_retriever = BM25Retriever.from_documents(
    documents=chunked_barbie_wikipedia_docs
)
barbie_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
barbie_wikipedia_faiss_store = faiss.from_documents(documents=chunked_barbie_wikipedia_docs, embedding=core_embeddings_model)
barbie_wikipedia_faiss_retriever = barbie_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
barbie_ensemble_retriever = EnsembleRetriever(
    retrievers=[barbie_wikipedia_bm25_retriever, barbie_wikipedia_faiss_retriever],
    weights=[0.5, 0.5]
)

#### Retrieval Agent

We can create a simple conversational retrieval Agent by using the built-ins provided by LangChain!

> HINT: Be sure to provide good natural language descriptions of what the tool should be used for to get the best results.

RELEVANT DOCS:
- [`create_retriever_tool`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool.html#langchain.agents.agent_toolkits.conversational_retrieval.tool.create_retriever_tool)

In [17]:
from langchain.agents.agent_toolkits import create_retriever_tool

barbie_wikipedia_retrieval_tool = create_retriever_tool(
    retriever=barbie_wikipedia_bm25_retriever,
    name="Wikipedia_BM25",
    description="A tool using the BM25 retriever algorithm to source information from Barbie's Wikipedia page."
)

barbie_csv_retrieval_tool = create_retriever_tool(
    retriever=barbie_csv_faiss_retriever.as_retriever(search_kwargs={"k": 1}), # converting barbie_csv_faiss_retriever from vector store to retriever
    name="CSV_FAISS",
    description="A FAISS-based retrieval tool for sourcing data from Barbie-related CSV files using a cache vector store."
)

barbie_retriever_tools = [barbie_wikipedia_retrieval_tool, barbie_csv_retrieval_tool]

Now that we've created our tools, we can combine them into an agent!

RELEVANT DOCS:
- [`create_conversational_retrieval_agent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent.html#langchain.agents.agent_toolkits.conversational_retrieval.openai_functions.create_conversational_retrieval_agent)

In [18]:
from langchain.agents.agent_toolkits import create_conversational_retrieval_agent

barbie_retriever_agent_executor = create_conversational_retrieval_agent(
    llm=llm,
    tools=barbie_retriever_tools,
    verbose=True
)

In [19]:
barbie_retriever_agent_executor({"input" : "Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?"})



> Entering new AgentExecutor chain...

Invoking: `Wikipedia_BM25` with `Barbie`


[Document(page_content='The Barbie Diaries is a 2006 computer-animated motion capture teen drama film directed by Eric Fogel and written by Elise Allen and Laura McCreary which premiered on Nickelodeon on April 30 and on DVD on May 9.The eighth entry in the Barbie film series, it features Kelly Sheridan as the talking vocal provider for Barbie and Skye Sweetnam as her singing vocals. The only Barbie film produced by Curious Pictures and thus the only film in the first decade of the Barbie film franchise not to be produced by Mainframe Entertainment (currently Mainframe Studios), this is also the last Barbie film to be originally distributed by Lionsgate in the United States and Canada, as Universal Studios (now Universal Pictures) was set to take over the home video distribution rights from the next film, Barbie in the 12 Dancing Princesses.\n\n\n== Plot ==\nIn this

{'input': 'Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?', additional_kwargs={}, example=False),
  AIMessage(content='', additional_kwargs={'function_call': {'name': 'Wikipedia_BM25', 'arguments': '{\n  "__arg1": "Barbie"\n}'}}, example=False),
  FunctionMessage(content="[Document(page_content='The Barbie Diaries is a 2006 computer-animated motion capture teen drama film directed by Eric Fogel and written by Elise Allen and Laura McCreary which premiered on Nickelodeon on April 30 and on DVD on May 9.The eighth entry in the Barbie film series, it features Kelly Sheridan as the talking vocal provider for Barbie and Skye Sweetnam as her singing vocals. The only Barbie film produced by Curious Pictures and thus the only film in the first decade 

In [20]:
barbie_retriever_agent_executor({"input" : "What is a very quick summary of the plot of the Barbie movie?"})



> Entering new AgentExecutor chain...
The Barbie Diaries is a film that follows Barbie, a typical American teenager who is a sophomore in high school. She faces the challenges and problems that real-life teens often encounter, such as navigating friendships, discovering her true self, and dealing with the ups and downs of adolescence. The movie explores themes of friendship, self-discovery, and personal growth.

> Finished chain.


{'input': 'What is a very quick summary of the plot of the Barbie movie?',
 'chat_history': [HumanMessage(content='Did people like Barbie, or did they find it too Philosphical? If they did, can you tell me why the movie is so Philosophical?', additional_kwargs={}, example=False),
  AIMessage(content='', additional_kwargs={'function_call': {'name': 'Wikipedia_BM25', 'arguments': '{\n  "__arg1": "Barbie"\n}'}}, example=False),
  FunctionMessage(content="[Document(page_content='The Barbie Diaries is a 2006 computer-animated motion capture teen drama film directed by Eric Fogel and written by Elise Allen and Laura McCreary which premiered on Nickelodeon on April 30 and on DVD on May 9.The eighth entry in the Barbie film series, it features Kelly Sheridan as the talking vocal provider for Barbie and Skye Sweetnam as her singing vocals. The only Barbie film produced by Curious Pictures and thus the only film in the first decade of the Barbie film franchise not to be produced by Mainframe Ent

### Oppenheimer Retrieval System

We're going to repurpose some of what we created previously, but this time we'll explore a different multi-source retrieval system.

In [23]:
OPPENHEIMER_REVIEW_CSV_FILE_PATH = "../data/oppenheimer_review_data.csv"

oppenheimer_wikipedia_docs = WikipediaLoader(
    query="J. Robert Oppenheimer",
    load_max_docs=1000,
    doc_content_chars_max=100
).load()

oppenheimer_csv_docs = CSVLoader(
    file_path=OPPENHEIMER_REVIEW_CSV_FILE_PATH, # Modify the file path to the correct location
    source_column="Review_Url"
).load()





In [24]:
chunked_opp_wikipedia_docs = wikipedia_text_splitter.transform_documents(oppenheimer_wikipedia_docs)
chunked_opp_csv_docs = csv_text_splitter.transform_documents(oppenheimer_csv_docs)

In [25]:
# Build the FAISS vector store over the review (CSV) chunks, matching the variable name
opp_csv_faiss_retriever = faiss.from_documents(chunked_opp_csv_docs, core_embeddings_model)

# set up BM25 retriever
opp_wikipedia_bm25_retriever = BM25Retriever.from_documents(documents=chunked_opp_wikipedia_docs)
opp_wikipedia_bm25_retriever.k = 1

# set up FAISS vector store
opp_wikipedia_faiss_store = faiss.from_documents(documents=chunked_opp_wikipedia_docs, embedding=core_embeddings_model)
opp_wikipedia_faiss_retriever = opp_wikipedia_faiss_store.as_retriever(search_kwargs={"k": 1})

# set up ensemble retriever
opp_ensemble_retriever = EnsembleRetriever(
    retrievers=[opp_wikipedia_bm25_retriever, opp_wikipedia_faiss_retriever],
    weights=[0.5, 0.5]
)


#### Multi-source chain

We're going to let the LLM decide which information is most valuable and which is least valuable.

The way we'll do this is with LangChain's rather powerful "Expression Language"!

> HINT: You can leverage [this](https://python.langchain.com/docs/use_cases/question_answering/how_to/multiple_retrieval) resource if you get stuck - but experiment with different prompts/formats.
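
Before wiring this up with real retrievers, it can help to picture what the multi-source mapping does in plain Python: every key is computed from the same question, and the results fill the matching slots in the prompt. The sketch below uses only hypothetical stand-ins (`build_multisource_prompt`, `TOY_TEMPLATE`, and two lambda "retrievers" returning placeholder text), so it illustrates the data flow and is not the Expression Language itself.

```python
from typing import Callable, Dict, List


def build_multisource_prompt(
    question: str,
    retrievers: Dict[str, Callable[[str], List[str]]],
    template: str,
) -> str:
    """Run every retriever on the same question and fill the template slots."""
    context = {name: "\n".join(retrieve(question)) for name, retrieve in retrievers.items()}
    return template.format(question=question, **context)


TOY_TEMPLATE = "Reviews:\n{source1}\n\nWikipedia:\n{source2}\n\nQuestion: {question}"

# One retriever per source, so each slot receives a different kind of document.
toy_retrievers = {
    "source1": lambda q: ["(review) placeholder review text"],
    "source2": lambda q: ["(wiki) placeholder article text"],
}

prompt_text = build_multisource_prompt("What did people think?", toy_retrievers, TOY_TEMPLATE)
print(prompt_text)
```

The design point worth noticing is that each source slot only helps the LLM if it is fed by a retriever built over that source's documents.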

In [26]:
from langchain.prompts import ChatPromptTemplate

system_message = """Use the information from the below two sources to answer any questions.

Source 1: public user reviews about the Oppenheimer movie
<source1>
{source1}
</source1>

Source 2: the wikipedia page for the Oppenheimer movie including the plot summary, cast, and production information
<source2>
{source2}
</source2>
"""

prompt = ChatPromptTemplate.from_messages([("system", system_message), ("human", "{question}")])

In [28]:
oppenheimer_multisource_chain = {
    "source1": (lambda x: x["question"]) | opp_ensemble_retriever,
    "source2": (lambda x: x["question"]) | opp_ensemble_retriever, # changed opp_csv_faiss_retriever to opp_ensemble_retriever
    "question": lambda x: x["question"],
} | prompt | llm

In [29]:
oppenheimer_multisource_chain.invoke({"question" : "What did people think of the Oppenheimer movie?"})

AIMessage(content='There is no specific information provided in the given sources about public user reviews or opinions on the Oppenheimer movie.', additional_kwargs={}, example=False)

# Agent Creation

Now we can finally start building our Agent!

The first thing we'll need to do is provide our Agent with a toolbelt (a list of tools). Much like Batman, our LLM-powered Agent can use these tools as it sees fit.

While the examples we're constructing in this notebook are straightforward for the sake of brevity and simplicity, there is no limit to what you can build with Agents, as we'll see as we progress through the program.

So, let's begin by setting up our Tools!

You'll notice that we have to set up a function to allow our `Oppenheimer Movie Query System` tool to interface with the Agent. This is because the underlying chain has a specific required input: a dict with a `question` key, rather than a plain string. Creating custom tools is a pattern that you'll want to grow accustomed to as you use LangChain more and more.
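
The adapter pattern itself is small enough to show without LangChain. In this sketch, `FakeMessage`, `fake_chain_invoke`, and `as_string_tool` are all hypothetical names standing in for a chat message, a chain's `invoke`, and the wrapper function; the assumption is only that the agent hands a tool a plain string and expects a plain string back.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class FakeMessage:
    """Stand-in for a chat model message that carries its text in .content."""
    content: str


def fake_chain_invoke(inputs: Dict[str, str]) -> FakeMessage:
    # Stand-in for a chain that requires a dict with a "question" key.
    return FakeMessage(content=f"Answer to: {inputs['question']}")


def as_string_tool(invoke: Callable[[Dict[str, str]], Any], input_key: str) -> Callable[[str], str]:
    """Adapt a dict-in / message-out callable to the str-in / str-out shape a tool expects."""
    def tool_fn(query: str) -> str:
        result = invoke({input_key: query})
        # Unwrap message-like results; fall back to str() for anything else.
        return getattr(result, "content", str(result))
    return tool_fn


toy_oppenheimer_tool = as_string_tool(fake_chain_invoke, input_key="question")
toy_answer = toy_oppenheimer_tool("Who directed it?")
print(toy_answer)
```

Unwrapping `.content` keeps the observation the agent sees limited to the answer text instead of the full message representation.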

In [30]:
from langchain.agents import Tool

def query_oppenheimer_movie_system(question: str) -> str:
    """
    Queries the Oppenheimer Movie Q&A system.
    
    Args:
    - question (str): A fully formed question related to the Oppenheimer movie.
    
    Returns:
    - str: The response from the Oppenheimer Movie Q&A system.
    """
    return oppenheimer_multisource_chain.invoke({"question": question})

def query_barbie_information_system(question: str) -> str:
    """
    Queries the Barbie Information Retrieval system.
    
    Args:
    - question (str): A fully formed question about Barbie or related topics.
    
    Returns:
    - str: The response from the Barbie Information Retrieval system.
    """
    # Query the Barbie agent (not the Oppenheimer chain) and return only its text answer
    return barbie_retriever_agent_executor.invoke({"input": question})["output"]

tools = [
    Tool(
        name="Barbie Knowledge Retriever",
        func=barbie_retriever_agent_executor.invoke,
        description="Retrieves detailed information about Barbie dolls, history, and associated topics. Ensure your query is a complete question for best results."
    ),
    Tool(
        name="Oppenheimer Movie Query System",
        func=query_oppenheimer_movie_system,
        description="Fetches answers related to the Oppenheimer movie, including its plot, characters, and behind-the-scenes details. Ensure your query is a complete question for accurate answers."
    ),
]


Now that we've set up our Agent's toolbelt, let's set up the LLM that will be powering it!

I would suggest playing around with these prompts and experimenting to find what works best for you.

RELEVANT DOCS:
- [`ZeroShotAgent`](https://api.python.langchain.com/en/latest/agents/langchain.agents.mrkl.base.ZeroShotAgent.html#langchain-agents-mrkl-base-zeroshotagent)

In [31]:
from langchain.agents import ZeroShotAgent, AgentExecutor

prefix = """Have a conversation with a human, answering the following questions as best you can. You have access to the following tools:"""
suffix = """Begin!

Question: {input}
{agent_scratchpad}"""

prompt = ZeroShotAgent.create_prompt(
    prefix=prefix,
    suffix=suffix,
    tools=tools
)

In [32]:
from langchain import LLMChain

llm_chain = LLMChain(llm=llm, prompt=prompt)

All that's left to do now is create our `ZeroShotAgent` and our `AgentExecutor`, which are the "reasoner" and "actor" halves of the `ReAct` method of Agent implementation.

Read all about the `ReAct` framework [here](https://react-lm.github.io/).
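
To see how the "reasoner" and "actor" halves fit together, here is a toy loop in plain Python. It is a sketch under loud assumptions: `scripted_llm` is a hard-coded stand-in for a real model, `toy_react_loop` and the `lookup` tool are hypothetical, and the `Action:` / `Action Input:` / `Final Answer:` text format is assumed for illustration. It is not LangChain's `AgentExecutor`.

```python
import re
from typing import Callable, Dict


def toy_react_loop(
    llm: Callable[[str, str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate between asking the 'reasoner' what to do and running the chosen tool."""
    scratchpad = ""
    for _ in range(max_steps):
        reply = llm(question, scratchpad)
        final = re.search(r"Final Answer:\s*(.*)", reply, re.DOTALL)
        if final:
            return final.group(1).strip()
        action = re.search(r"Action:\s*(.*?)\nAction Input:\s*(.*)", reply, re.DOTALL)
        if action is None:
            return reply.strip()  # nothing to parse, treat the reply as the answer
        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        observation = tools[tool_name](tool_input)  # the "actor" half
        scratchpad += f"{reply}\nObservation: {observation}\nThought: "
    return "Stopped after reaching max_steps."


def scripted_llm(question: str, scratchpad: str) -> str:
    # Hard-coded stand-in: look something up once, then answer with what came back.
    if "Observation:" not in scratchpad:
        return "Thought: I should look this up.\nAction: lookup\nAction Input: Barbie"
    observation = scratchpad.split("Observation: ")[-1].split("\n")[0]
    return f"Thought: I have what I need.\nFinal Answer: {observation}"


toy_tools = {"lookup": lambda query: f"notes about {query}"}
toy_result = toy_react_loop(scripted_llm, toy_tools, "What is Barbie about?")
print(toy_result)
```

The scratchpad plays the same role as `{agent_scratchpad}` in the prompt: it carries earlier thoughts, actions, and observations into the next call.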

In [33]:
barbenheimer_agent = ZeroShotAgent(
    llm_chain=llm_chain,
    tools=tools,
    verbose=True)

barbenheimer_agent_chain = AgentExecutor.from_agent_and_tools(
    agent=barbenheimer_agent,
    tools=tools,
    verbose=True)

## Conclusion

All that is left to do now is feed inputs to your Agent and watch it work!

Remember to use the `{"input" : "YOUR QUERY HERE"}` format when prompting the Agent.

In [34]:
barbenheimer_agent_chain.invoke({"input" : "What did people like about the Barbie movie?"})



> Entering new AgentExecutor chain...
Thought: I need to find out what people liked about the Barbie movie.
Action: Barbie Knowledge Retriever
Action Input: What did people like about the Barbie movie?

> Entering new AgentExecutor chain...

Invoking: `Wikipedia_BM25` with `Barbie`


[Document(page_content='The Barbie Diaries is a 2006 computer-animated motion capture teen drama film directed by Eric Fogel and written by Elise Allen and Laura McCreary which premiered on Nickelodeon on April 30 and on DVD on May 9.The eighth entry in the Barbie film series, it features Kelly Sheridan as the talking vocal provider for Barbie and Skye Sweetnam as her singing vocals. The only Barbie film produced by Curious Pictures and thus the only film in the first decade of the Barbie film franchise not to be produced by Mainframe Entertainment (currently Mainframe Studios), this is also the last Barbie film to be originally distributed b

{'input': 'What did people like about the Barbie movie?',
 'output': 'The specific reasons why people liked the Barbie movie are not mentioned, but the positive messages, strong female characters, and themes of friendship, self-discovery, and personal growth in the Barbie film series may contribute to its appeal.'}

In [35]:
barbenheimer_agent_chain.run({"input" : "What did people like about the Oppenheimer movie?"})



> Entering new AgentExecutor chain...
Thought: I need to find out what people liked about the Oppenheimer movie.
Action: Oppenheimer Movie Query System
Action Input: "What did people like about the Oppenheimer movie?"
Observation: content='Based on the provided information, there is no specific mention of what people liked about the Oppenheimer movie.' additional_kwargs={} example=False
Thought: I couldn't find any specific information about what people liked about the Oppenheimer movie.
Final Answer: There is no specific mention of what people liked about the Oppenheimer movie.

> Finished chain.


'There is no specific mention of what people liked about the Oppenheimer movie.'

In [36]:
barbenheimer_agent_chain.run({"input" : "Did the movies Barbie and Oppenheimer share similar themes or ideas?"})



> Entering new AgentExecutor chain...
Thought: I need to find information about the themes and ideas in both movies to determine if they are similar.
Action: Oppenheimer Movie Query System
Action Input: What are the main themes in the Oppenheimer movie?
Observation: content='Based on the provided information, the main themes in the Oppenheimer movie are not explicitly mentioned. To determine the main themes, it would be necessary to analyze the plot summary and watch the series.' additional_kwargs={} example=False
Thought: I couldn't find specific information about the main themes in the Oppenheimer movie. I'll need to try another approach to compare the themes in both movies.
Action: Barbie Knowledge Retriever
Action Input: What are the main themes in Barbie movies?

> Entering new AgentExecutor chain...

Invoking: `Wikipedia_BM25` with `Barbie`


[Document(page_content='The Barbie Diarie

'It is unclear if the movies Barbie and Oppenheimer share similar themes or ideas.'

## Next Steps

It's time to build a Chainlit (or Gradio) application and host it on Hugging Face Spaces! :ship: