<a href="https://colab.research.google.com/github/AI-Maker-Space/LLM-Ops-Vault/blob/main/Week%201/First%20Session/Barbie_Retrieval_Augmented_Question_Answering_(RAQA)_Assignment%20(Completed).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Questioning Barbie Reviews with RAQA (Retrieval Augmented Question Answering)

In the following notebook, you are tasked with creating a system that answers questions based on information found in reviews of the 2023 Barbie movie.

## Build 🏗️

There are 3 main tasks in this notebook:

1. Obtain and parse reviews from a review website
2. Create a Vectorstore from the reviews
3. Create a `RetrievalQA` chain 

## Ship 🚢

Create a Hugging Face Space that hosts your application.

## Share 🚀

Make a social media post about your final application.

>### Why RAQA and not RAG?
>If we look at the original [paper](https://arxiv.org/abs/2005.11401), we find that RAG is a fairly specific and well defined term that isn't exactly the same as "retrieve context, feed context to model in the prompt".
>For that reason, we're making the decision to delineate between "actual" RAG, and Retrieval Augmented Question Answering - which is not a well defined phrase.

### Pre-task Work

All we really need to do to get started is to get our prerequisites!

We'll be leveraging `langchain`, `openai`, and `pinecone` today.

Check out the docs:
- [LangChain](https://docs.langchain.com/docs/)
- [OpenAI](https://github.com/openai/openai-python)
- [Pinecone](https://docs.pinecone.io/docs/overview)

In [3]:
!pip install -q -U openai langchain "pinecone-client[grpc]"

In [4]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Open AI API Key:")

In [5]:
os.environ["PINECONE_API_KEY"] = getpass.getpass("Pinecone API Key:")

In [6]:
os.environ["PINECONE_ENV"] = getpass.getpass("Pinecone Environment:")

### Task 1: Data Preparation

In this task we'll be collecting, and then parsing, our data.

#### Scraping IMDB Reviews of Barbie

We'll use some Selenium based trickery to get the reviews we need to make our application.

Check out the docs here:
- [Selenium](https://www.selenium.dev/documentation/)

In [7]:
!pip install -q -U requests

In [8]:
!pip install -q -U scrapy selenium

You will need to install the `chromium-chromedriver` in order to use the method presented. 

`!apt install chromium-chromedriver` will install the chromedriver. 

You may have to use your terminal if you receive an error related to `sudo`. In that case, please use (in your terminal) `sudo apt install chromium-chromedriver`.

Otherwise, the `.csv` is provided and can be loaded through `pandas` if you're experiencing issues relating to the web-scraping portion of the notebook.

In [9]:
import numpy as np
import pandas as pd
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")

In [10]:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')
driver = webdriver.Chrome(options=chrome_options)

In [11]:
url = "https://www.imdb.com/title/tt1517268/reviews/?ref_=tt_ov_rt"
driver.get(url)

In [12]:
sel = Selector(text = driver.page_source)
review_counts = sel.css('.lister .header span::text').extract_first().replace(',','').split(' ')[0]
more_review_pages = int(int(review_counts)/25)

In [13]:
for i in tqdm(range(more_review_pages)):
    try:
        css_selector = 'load-more-trigger'
        driver.find_element(By.ID, css_selector).click()
    except:
        pass

100%|██████████| 54/54 [00:01<00:00, 45.63it/s]


In [14]:
rating_list = []
review_date_list = []
review_title_list = []
author_list = []
review_list = []
review_url_list = []
error_url_list = []
error_msg_list = []
reviews = driver.find_elements(By.CSS_SELECTOR, 'div.review-container')

for d in tqdm(reviews):
    try:
        sel2 = Selector(text = d.get_attribute('innerHTML'))
        try:
            rating = sel2.css('.rating-other-user-rating span::text').extract_first()
        except:
            rating = np.NaN
        try:
            review = sel2.css('.text.show-more__control::text').extract_first()
        except:
            review = np.NaN
        try:
            review_date = sel2.css('.review-date::text').extract_first()
        except:
            review_date = np.NaN
        try:
            author = sel2.css('.display-name-link a::text').extract_first()
        except:
            author = np.NaN
        try:
            review_title = sel2.css('a.title::text').extract_first()
        except:
            review_title = np.NaN
        try:
            review_url = sel2.css('a.title::attr(href)').extract_first()
        except:
            review_url = np.NaN
        rating_list.append(rating)
        review_date_list.append(review_date)
        review_title_list.append(review_title)
        author_list.append(author)
        review_list.append(review)
        review_url_list.append(review_url)
    except Exception as e:
        error_url_list.append(url)
        error_msg_list.append(e)
review_df = pd.DataFrame({
    'Review_Date':review_date_list,
    'Author':author_list,
    'Rating':rating_list,
    'Review_Title':review_title_list,
    'Review':review_list,
    'Review_Url':review_url
    })

  0%|          | 0/75 [00:00<?, ?it/s]

100%|██████████| 75/75 [00:00<00:00, 115.15it/s]


In [15]:
review_df

Unnamed: 0,Review_Date,Author,Rating,Review_Title,Review,Review_Url
0,21 July 2023,LoveofLegacy,6,"Beautiful film, but so preachy\n","Margot does the best with what she's given, bu...",/review/rw9207456/?ref_=tt_urv
1,26 July 2023,aherdofbeautifulwildponies,6,A Hot Pink Mess\n,"Before making Barbie (2023),",/review/rw9207456/?ref_=tt_urv
2,22 July 2023,imseeg,7,3 reasons FOR seeing it and 1 reason AGAINST.\n,The first reason to go see it:,/review/rw9207456/?ref_=tt_urv
3,31 July 2023,ramair350,10,"As a guy I felt some discomfort, and that's o...",As much as it pains me to give a movie called ...,/review/rw9207456/?ref_=tt_urv
4,22 July 2023,Natcat87,6,Too heavy handed\n,"As a woman that grew up with Barbie, I was ver...",/review/rw9207456/?ref_=tt_urv
...,...,...,...,...,...,...
70,23 July 2023,nethy-nho,10,Barbie is not a simple live action of a timel...,"But it is also in its smallest details, from t...",/review/rw9207456/?ref_=tt_urv
71,23 August 2023,the_oak,7,"Brilliant first part, second part a bit confu...",I loved the first part of this movie. There is...,/review/rw9207456/?ref_=tt_urv
72,1 August 2023,tforbes-2,10,Mind blowing\n,"Barbie is simply a mind-blowing experience, an...",/review/rw9207456/?ref_=tt_urv
73,26 July 2023,JPARM-IMDb,6,Expected more from Greta Gerwig\n,Even though I'm not the target audience for th...,/review/rw9207456/?ref_=tt_urv


Let's save this `pd.DataFrame` as a `.csv` to our local session (this will be terminated when you terminate the Colab session) so we can leverage it in LangChain!

In [16]:
review_df.to_csv("./barbie.csv")

In [17]:
data = review_df

#### Data Parsing

Now that we have our data - let's go ahead and set up some tools to parse it into a more usable format for LangChain!

Our reviews might contain a lot of information, and in order to ensure they don't exceed the context window of our model and to allow us to include a few reviews as context for each query - let's construct a system to "chunk" our data into smaller pieces.

We'll be leveraging the `RecursiveCharacterTextSplitter` for this task today.

While splitting our text seems like a simple enough task - getting this correct/incorrect can have massive downstream impacts on your application's performance.

You can read the docs here:
- [RecursiveCharacterTextSplitter](https://python.langchain.com/docs/modules/data_connection/document_transformers/text_splitters/recursive_text_splitter)

> ### HINT:
>It's always worth it to check out the LangChain source code if you're ever in a bind - for instance, if you want to know how to transform a set of documents, check it out [here](https://github.com/langchain-ai/langchain/blob/5e9687a196410e9f41ebcd11eb3f2ca13925545b/libs/langchain/langchain/text_splitter.py#L268C18-L268C18)

In [18]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size = ### YOUR CODE HERE, # the character length of the chunk
    chunk_overlap = ### YOUR CODE HERE, # the character length of the overlap between chunks
    length_function = ### YOUR CODE HERE, # the length function - in this case, character length (aka the python len() fn.)
)

Now that we have our `RecursiveCharacterTextSplitter` set up - let's look at how it might split our source text. 

Keep in mind that the source text is split by `["\n\n", "\n", " ", ""]` in that order.

We know that each of the subheadings in our review `page_content` is separated by a newline character, so it will preferably chunk the review subheadings together. 

That's great! Let's move on to creating our index!

### Task 2: Creating an "Index"

The term "index" is used largely to mean: Structured documents parsed into a useful format for querying, retrieving, and use in the LLM application stack.

#### Selecting Our VectorStore

There are a number of different VectorStores, and a number of different strengths and weaknesses to each.

In this notebook, we will be keeping it very simple by leveraging Pinecone's API Vector Database.

In [19]:
!pip install -q -U tiktoken

Let's set up a Pinecone index using the methods provided in their [documentation](https://docs.pinecone.io/docs/langchain)!

In [20]:
import pinecone

YOUR_API_KEY = os.environ["PINECONE_API_KEY"]
YOUR_ENV = os.environ["PINECONE_ENV"]

index_name = 'barbie-review-index'

pinecone.init(
    api_key= ### YOUR CODE HERE,
    environment= ### YOUR CODE HERE
)

if index_name not in pinecone.list_indexes():
    # we create a new index
    pinecone.create_index(
        name=index_name,
        metric='cosine',
        dimension= ### YOU CODE HERE - REMEMBER TO USE THE SAME DIMENSION AS THE EMBEDDING MODEL (text-embedding-ada-002)
    )

Now we can connect to our index and view some statistics about it.

In [21]:
index = pinecone.GRPCIndex(index_name)

index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 117}},
 'total_vector_count': 117}

We're going to be setting up our VectorStore with the OpenAI embeddings model. While this embeddings model does not need to be consistent with the LLM selection, it does need to be consistent between embedding our index and embedding our queries over that index.

While we don't have to worry too much about that in this example - it's something to keep in mind for more complex applications.

We're going to leverage a [`CacheBackedEmbeddings`](https://python.langchain.com/docs/modules/data_connection/caching_embeddings ) flow to prevent us from re-embedding similar queries over and over again.

Not only will this save time, it will also save us precious embedding tokens, which will reduce the overall cost for our application.

>#### Note:
>The overall cost savings needs to be compared against the additional cost of storing the cached embeddings for a true cost/benefit analysis. If your users are submitting the same queries often, though, this pattern can be a massive reduction in cost.

Documentation:
 - [`CacheBackedEmbeddings.from_bytes_store()`](https://api.python.langchain.com/en/latest/embeddings/langchain.embeddings.cache.CacheBackedEmbeddings.html#langchain.embeddings.cache.CacheBackedEmbeddings.from_bytes_store)

In [22]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore

store = LocalFileStore("./cache/")

core_embeddings_model = OpenAIEmbeddings()

embedder = CacheBackedEmbeddings.from_bytes_store(
    ### YOUR CODE HERE, 
    ### YOUR CODE HERE, 
    namespace= ### YOUR CODE HERE
)

Now that we have our `CacheBackedEmbeddings` pipeline set-up, let's index our documents into our Pinecone Vector Database. 

We'll add some useful metadata as well!

In [23]:
data.head(1)

Unnamed: 0,Review_Date,Author,Rating,Review_Title,Review,Review_Url
0,21 July 2023,LoveofLegacy,6,"Beautiful film, but so preachy\n","Margot does the best with what she's given, bu...",/review/rw9207456/?ref_=tt_urv


In [24]:
from tqdm.auto import tqdm
from uuid import uuid4

BATCH_LIMIT = 100

texts = []
metadatas = []

for i in tqdm(range(len(data))):

    record = data.iloc[i]

    metadata = {
        'review-url': str(record["Review_Url"]),
        'review-date' : ### YOUR CODE HERE,
        'author' : ### YOUR CODE HERE,
        'rating' : ### YOUR CODE HERE,
        'review-title' : ### YOUR CODE HERE,
    }

    record_texts = text_splitter.split_text(
        ### YOUR CODE HERE
        )

    record_metadatas = [{
        "chunk": j, "text": text, **metadata
    } for j, text in enumerate(record_texts)]
    texts.extend(record_texts)
    metadatas.extend(record_metadatas)
    
    if len(texts) >= BATCH_LIMIT:
        ids = [str(uuid4()) for _ in range(len(texts))]
        embeds = embedder.embed_documents(texts)
        index.upsert(vectors=zip(ids, embeds, metadatas))
        texts = []
        metadatas = []

if len(texts) > 0:
    ids = [str(uuid4()) for _ in range(len(texts))]
    embeds = embedder.embed_documents(texts)
    index.upsert(vectors=zip(ids, embeds, metadatas))

100%|██████████| 75/75 [00:00<00:00, 9635.29it/s]


In [25]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 205}},
 'total_vector_count': 205}

Now that we've created our index, let's convert it to a LangChain `VectorStroe` so we can use it in the rest of the LangChain ecosystem!

In [26]:
from langchain.vectorstores import Pinecone

text_field = "text"

index = pinecone.Index(index_name)

vectorstore = Pinecone(
    ### YOUR CODE HERE, 
    ### YOUR CODE HERE, 
    ### YOUR CODE HERE
)

Now that we've created the VectorStore, we can check that it's working by embedding a query and retrieving passages from our reviews that are close to it.

In [27]:
query = "Who played Ken?"

vectorstore.similarity_search(
    query, 
    k=3  
)

[Document(page_content="The film's universe and settings are fantastic. The casting is really good too, with Gosling excelling in the role of Ken.", metadata={'author': 'hamsterination', 'chunk': 0.0, 'rating': '6', 'review-date': datetime.datetime(2023, 7, 19, 0, 0), 'review-title': ' It could have been so much better...\n', 'review-url': '/review/rw9207456/?ref_=tt_urv'}),
 Document(page_content="The film's universe and settings are fantastic. The casting is really good too, with Gosling excelling in the role of Ken.", metadata={'author': 'hamsterination', 'chunk': 0.0, 'rating': '6', 'review-date': datetime.datetime(2023, 7, 19, 0, 0), 'review-title': ' It could have been so much better...\n', 'review-url': '/review/rw9221648/?ref_=tt_urv'}),
 Document(page_content='This movie is so much fun. It starts off really strong although the story does move away from "Barbieland" sooner than I would have liked. Nonetheless, it regains its footing with the final act in particular and I could 

Let's see how much time the `CacheBackedEmbeddings` pattern saves us:

In [28]:
%%timeit
query = "I really wanted to enjoy this and I know that I am not the target audience but there were massive plot holes and no real flow."
vectorstore.similarity_search(
    query, 
    k=3  
)

236 ms ± 34.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


As we can see, even over a significant number of runs - the cached query is significantly faster than the first instance of the query!

With that, we're ready to move onto Task 3!

### Task 3: Building a Retrieval Chain

In this task, we'll be making a Retrieval Chain which will allow us to ask semantic questions over our data.

This part is rather abstracted away from us in LangChain and so it seems very powerful.

Be sure to check the documentation, the source code, and other provided resources to build a deeper understanding of what's happening "under the hood"!

#### A Basic RetrievalQA Chain

We're going to leverage `return_source_documents=True` to ensure we have proper sources for our reviews - should the end user want to verify the reviews themselves.

Hallucinations [are](https://arxiv.org/abs/2202.03629) [a](https://arxiv.org/abs/2305.15852) [massive](https://arxiv.org/abs/2303.16104) [problem](https://arxiv.org/abs/2305.18248) in LLM applications.

Though it has been tenuously shown that using Retrieval Augmentation [reduces hallucination in conversations](https://arxiv.org/pdf/2104.07567.pdf), one sure fire way to ensure your model is not hallucinating in a non-transparent way is to provide sources with your responses. This way the end-user can verify the output.

#### Our LLM

In this notebook, we'll continue to leverage OpenAI's suite of models - this time we'll be using the `gpt-3.5-turbo` model to power our RetrievalQAWithSources chain.

In [29]:
from langchain.llms.openai import OpenAIChat

llm = OpenAIChat(
    model=### YOUR CODE HERE, 
    temperature=### YOUR CODE HERE
)

Now we can set up our chain.

First, we need to leverage our `VectorStore` as a retriever!

In [30]:
retriever = ### YOUR CODE HERE

In [31]:
from langchain.chains import RetrievalQA
from langchain.callbacks import StdOutCallbackHandler

handler = StdOutCallbackHandler()

qa_with_sources_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=retriever,
    callbacks=[handler],
    return_source_documents=True
)

In [32]:
qa_with_sources_chain({"query" : "How was Will Ferrell in this movie?"})["result"]



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


'Will Ferrell ruined every scene he was in.'

In [33]:
qa_with_sources_chain({"query" : "Do reviewers consider this movie Kenough?"})["result"]



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


'Based on the given context, it is not possible to determine whether reviewers consider this movie "Kenough" or not.'

Let's look at the available metadata we have, thanks to our index-creation!

In [34]:
result = qa_with_sources_chain({"query" : "Was Will Ferrel funny?"})



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


In [35]:
for k, v in result.items():
    print(f"Key: {k}")
    print(f"Value: {v}")
    print("")

Key: query
Value: Was Will Ferrel funny?

Key: result
Value: No, the person states that Will Ferrell ruined every scene he was in.

Key: source_documents
Value: [Document(page_content="I really wanted to enjoy this and I know that I am not the target audience but there were massive plot holes and no real flow. The film was very disjointed. Ryan Gosling as good as he is seemed to old to play Ken and Will Ferrell ruined every scene he was in. I just didn't get it, it seemed hollow artificial and hackneyed. A waste of some great talent. It was predictable without being reassuring and trying so hard to be woke in the most superficial way in that but trying to tick so many boxes it actually ticked none. Margo Robbie looks beautiful throughout, the costumes and the sets were amazing but the story was way too weak and didn't make much sense at all.", metadata={'author': 'agjbull', 'chunk': 0.0, 'rating': '6', 'review-date': datetime.datetime(2023, 7, 23, 0, 0), 'review-title': ' Just a little

In [36]:
for page_content, metadata in result["source_documents"]:
    print(f"Metadata: {metadata}")
    print(f"Page Content: {page_content}")
    print("")

Metadata: ('metadata', {'author': 'agjbull', 'chunk': 0.0, 'rating': '6', 'review-date': datetime.datetime(2023, 7, 23, 0, 0), 'review-title': ' Just a little empty\n', 'review-url': '/review/rw9207456/?ref_=tt_urv'})
Page Content: ('page_content', "I really wanted to enjoy this and I know that I am not the target audience but there were massive plot holes and no real flow. The film was very disjointed. Ryan Gosling as good as he is seemed to old to play Ken and Will Ferrell ruined every scene he was in. I just didn't get it, it seemed hollow artificial and hackneyed. A waste of some great talent. It was predictable without being reassuring and trying so hard to be woke in the most superficial way in that but trying to tick so many boxes it actually ticked none. Margo Robbie looks beautiful throughout, the costumes and the sets were amazing but the story was way too weak and didn't make much sense at all.")

Metadata: ('metadata', {'author': 'agjbull', 'chunk': 0.0, 'rating': '6', 'rev

### Adding Prompt Caching and Monitoring

Now that we have the basic `RetrievalQAChain` set up and working - let's add a few more tools to help us built a more performant application and add a visibility tool as well!

#### Visibility Tooling

We'll be once again leveraging Weights and Biases as our visibility tool, so let's add that first!

You'll want to use the same Weights and Biases account that you set-up last Thursday here!

In [37]:
os.environ["WANDB_API_KEY"] = getpass.getpass("Weights and Biases API Key:")
os.environ["WANDB_PROJECT"] = "barbie-retrieval-qa"

Now, to set up WandB, all we have to do is...

In [None]:
!pip install -q -U wandb

In [38]:
os.environ["LANGCHAIN_WANDB_TRACING"] = "true"

Yes, that's it. 

Let's use our `RetrievalQA` chain to test it out!

In [39]:
qa_with_sources_chain({"query" : "Do reviewers consider this movie Kenough?"})["result"]

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
[34m[1mwandb[0m: Streaming LangChain activity to W&B at https://wandb.ai/fourthbrain/barbie-retrieval-qa/runs/q3no0qvl
[34m[1mwandb[0m: `WandbTracer` is currently in beta.
[34m[1mwandb[0m: Please report any issues to https://github.com/wandb/wandb/issues with the tag `langchain`.




[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m


'Based on the given context, it is not possible to determine whether reviewers consider this movie "Kenough" or not.'

With those simple lines of code - we've added full visibility to our prompts and responses through Weights and Biases! 

#### Prompt Caching


### Adding A Prompt Cache

The basic idea of Prompt Caching is to provide a way to circumvent going to the LLM for prompts we have already seen.

Similar to cached embeddings, the idea is simple:

- Keep track of all the input/output pairs
- If a user query is (in the case of semantic similarity caches) close enough to a previous prompt contained in the cache, return the output associated with that pair

### Initializing a Prompt Cache

There are many different tools you can use to implement a Prompt Cache - from a "build it yourself" VectorStore implementation - to Redis - to custom libraries - there are upsides and downsides to each solution. 

Let's look at the Redis-backed Cache vs. `InMemoryCache` as an example:

Redis Cache
| Pros  | Cons  |
|---|---|
| Managed and Robust  | Expensive to Host  |
| Integrations on all Major Cloud Platforms  | Non-trivial to Integrate |
| Easily Scalable  | Does not have a ChatModel implementation |

`InMemoryCache`
| Pros  | Cons  |
|---|---|
| Easily implemented  | Consumes potentially precious memory |
| Completely Cloud Agnostic  | Does not offer inter-session caching |

For the sake of ease of use - and to allow functionality with our `ChatOpenAI` model - we'll leverage `InMemoryCache`.

We need to set our `langchain.llm_cache` to use the `InMemoryCache`.

- [`InMemoryCache`](https://api.python.langchain.com/en/latest/cache/langchain.cache.InMemoryCache.html)

In [40]:
import langchain
from langchain.cache import InMemoryCache
langchain.llm_cache = InMemoryCache()

One more important fact about the `InMemoryCache` is that it is what's called an "exact-match" cache - meaning it will only trigger when the user query is *exactly* represented in the cache. 

This is a safer cache, as we can guarentee the user's query exactly matches with previous queries and we don't have to worry about edge-cases where semantic similarity might fail - but it does reduce the potential to hit the cache.

We could leverage tools like `GPTCache`, or `RedisCache` (for non-chat model implementations) to get a "semantic similarity" cache, if desired!

In [41]:
%%time
qa_with_sources_chain({"query" : "Do reviewers consider this movie Kenough?"})["result"]



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m
CPU times: user 109 ms, sys: 781 µs, total: 109 ms
Wall time: 1.8 s


'Based on the given context, it is not possible to determine whether reviewers consider this movie "Kenough" or not.'

In [42]:
%%time
qa_with_sources_chain({"query" : "Do reviewers consider this movie Kenough?"})["result"]



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m
CPU times: user 23 ms, sys: 4 µs, total: 23 ms
Wall time: 269 ms


'Based on the given context, it is not possible to determine whether reviewers consider this movie "Kenough" or not.'

Let's look at an example that is extremely close - but is not the exact query.

In [43]:
%%time
qa_with_sources_chain({"query" : "Do reviewers consider this here movie Kenough?"})["result"]



[1m> Entering new RetrievalQA chain...[0m

[1m> Finished chain.[0m
CPU times: user 65.2 ms, sys: 4.1 ms, total: 69.3 ms
Wall time: 1.77 s


'Based on the given context, it is difficult to determine whether reviewers consider the movie "Kenough" or not.'

As you can see, adding an exact-match prompt cache is a very small lift - but it can significantly improve the latency of your end-user application experience!

### Conclusion

And with that, we have our Barbie Review RAQA Application built!

Let's port it into a Chainlit app and put it up on a Hugging Face Space!