# Improving RAG quality in LLM apps while minimizing vector search costs via summarization

> Reference:
>
> Choi, Y. (April 2023). Yejin Choi: Why AI is incredibly smart and shockingly stupid [Transcript]. Retrieved from https://www.ted.com/talks/yejin_choi_why_ai_is_incredibly_smart_and_shockingly_stupid/transcript

### Environment setup

Let's set up our environment, including installing dependencies and obtaining API keys.
> We'll take a few shortcuts here; for more thorough instructions see [First steps with Pinecone DB](https://www.ninetack.io/post/first-steps-with-pinecone-db#viewer-7cp5r)

#### Install dependencies

We install `pinecone-client` and `langchain`, plus the `openai` package, because we will be using the `text-embedding-ada-002` embedding model [from OpenAI](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings).

In [290]:
! python -m pip install -qU \
    langchain \
    pinecone-client==2.2.2 \
    openai==0.27.8 \
    pandas==2.0.3 \
    python-dotenv \
    tqdm

#### Environment variables

We need to set 3 environment variables. You can edit the code below to set them directly.

- `PINECONE_ENVIRONMENT` - The Pinecone environment where your index resides
- `PINECONE_API_KEY` - Your Pinecone API key
- `OPENAI_API_KEY` - Your OpenAI API key

If a local `.env` file exists, load the env vars from it.

In [170]:
from dotenv import load_dotenv
load_dotenv()

True

Check the environment config output below, and edit if necessary with your variables.

In [171]:
import os

print("Check environment\n---------------------")

pinecone_env = os.environ.get('PINECONE_ENVIRONMENT') or "YOUR PINECONE ENVIRONMENT"
pinecone_api_key = os.environ.get('PINECONE_API_KEY') or "YOUR PINECONE API KEY"
openai_api_key = os.environ.get('OPENAI_API_KEY') or "YOUR OPENAI API KEY"

print("pinecone_env:", pinecone_env)
print("pinecone_api_key:", pinecone_api_key[:5], "...")
print("openai_api_key:", openai_api_key[:5], "...")

Check environment
---------------------
pinecone_env: us-west4-gcp-free
pinecone_api_key: 05131 ...
openai_api_key: sk-7w ...
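
The check above falls back to placeholder strings, so a missing key only shows up later as an authentication error. As a minimal sketch (the helper name `missing_env_vars` is hypothetical and not part of the original notebook), you could fail fast by listing which of the three variables are unset or blank:

```python
import os

# The three variables this notebook expects.
REQUIRED_VARS = ("PINECONE_ENVIRONMENT", "PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the names in `required` that are absent or blank in `env`."""
    return [name for name in required if not (env.get(name) or "").strip()]


missing = missing_env_vars(os.environ)
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing the mapping in as an argument keeps the helper easy to try against a plain dict before pointing it at `os.environ`.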


Setup problem
 - intro use case
 - index data w/ normal chunking
 - show it (kind of) working

Intro Summarize
 - Use LLM to summarize larger chunks
 - Index summaries
 - Keep larger chunks on s3

Results
 - Show better search outcomes

Wrap-up/next steps


This notebook compares and contrasts three RAG strategies:

1. Basic RAG: chunk the data, embed it, and hope enough matching context comes back to answer questions well.
The context passed to the LLM is just the snippets of text that happened to match, which may now be out of context.

2. RAG by summaries: create larger chunks, summarize those chunks, and embed the summaries. Then apply basic RAG using these summarized chunks, so the context passed to the LLM is the summarized text.
We would expect this to improve both matching and answering, but the answers will lack the depth and nuance that is lost in summarization.

3. RAG with summarized pointers to the original source context: create larger chunks, summarize those large chunks, and embed the summaries, but have each summary point back to the original source chunk. In other words, the summary is used only for semantic search, to locate the right original context.
The context passed to the LLM is now a larger chunk of the original text, which likely contains the answer, is less likely to be out of context, and lets the LLM answer with the depth and nuance of the original content.
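
The difference between the three strategies comes down to two choices: which text gets embedded, and which text is handed to the LLM. The sketch below only restates the list above as data; the `RagStrategy` class and its field names are illustrative and not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagStrategy:
    name: str
    embedded_text: str  # what is turned into vectors and searched
    llm_context: str    # what is passed to the LLM to answer the question


STRATEGIES = [
    RagStrategy("1. Basic RAG", "small chunk", "small chunk"),
    RagStrategy("2. RAG by summaries", "summary of large chunk", "summary of large chunk"),
    RagStrategy("3. Summary pointers", "summary of large chunk", "original large chunk"),
]

for strategy in STRATEGIES:
    print(f"{strategy.name}: search over {strategy.embedded_text}, "
          f"answer from {strategy.llm_context}")
```

Only the third strategy separates the two roles: it searches over one text and answers from another.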

### Approach 1 - Basic RAG

In [320]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_document(file_path):
  loader = TextLoader(file_path=file_path)
  return loader.load()

source_file = "./text/ted_talk.txt"
documents = load_document(file_path=source_file)

chunk_size = 400
chunk_overlap = chunk_size // 10
text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size,
                                               chunk_overlap=chunk_overlap)
small_chunks = text_splitter.split_documents(documents)

print(f"Split text from {source_file} into {len(small_chunks)} chunks of text")
small_chunks

Split text from ./text/ted_talk.txt into 37 chunks of text


[Document(page_content='So I\'m excited to share a few spicy thoughts on artificial intelligence. But first, let\'s get philosophical by starting with this quote by Voltaire, an 18th century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn\'t be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating the world-class "Go"', metadata={'source': './text/ted_talk.txt'}),
 Document(page_content='tool, beating the world-class "Go" champion, acing college admission tests and even passing the bar exam.', metadata={'source': './text/ted_talk.txt'}),
 Document(page_content='I’m a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large lan
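
Notice how the second chunk starts with text that ends the first one: that is the `chunk_overlap` at work. To make the idea concrete, here is a deliberately simplified fixed-width splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and spaces); `split_with_overlap` is a hypothetical helper that only illustrates the sliding window.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split `text` into fixed-width chunks that share `chunk_overlap` characters."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


print(split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1))
# ['abcd', 'defg', 'ghij']
```

Each chunk begins with the last character of the previous one, so a sentence cut at a boundary still appears at least partly in both chunks.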

Create the Pinecone index. This can take a couple of minutes. We set the dimension to `1536` because we're going to use the `text-embedding-ada-002` embedding model [from OpenAI](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings), which produces 1536-dimensional vectors.

In [300]:
import pinecone

pinecone.init(api_key=pinecone_api_key, environment=pinecone_env)

In [301]:
pinecone.create_index("ted-talk-index", dimension=1536, metric="cosine")

In [302]:
pinecone.describe_index("ted-talk-index")

IndexDescription(name='ted-talk-index', metric='cosine', replicas=1, dimension=1536.0, shards=1, pods=1, pod_type='p1', status={'ready': True, 'state': 'Ready'}, metadata_config=None, source_collection='')
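
The index uses the `cosine` metric, which scores two vectors by the angle between them rather than by their length. As a rough illustration of what that score means (a plain-Python sketch, not Pinecone's implementation), cosine similarity is the dot product divided by the product of the two norms:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal
```

Vectors pointing the same way score close to 1 regardless of magnitude, and orthogonal vectors score 0.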

In [303]:
pinecone_index = pinecone.Index(index_name="ted-talk-index")

Define a function to create embeddings, and then create embeddings for text chunks.

In [304]:
import openai
openai.api_key = openai_api_key

def create_embeddings(batch: list[str]):
  model_id = 'text-embedding-ada-002'
  embedding_resp = openai.Embedding.create(input=batch, model=model_id)
  return [emb['embedding'] for emb in embedding_resp['data']]

embeddings = create_embeddings([doc.page_content for doc in small_chunks])
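
Sending all 37 chunks in one request is fine for a document this size. For a larger corpus you would probably want to send the texts in batches. The sketch below assumes an `embed_fn` with the same shape as `create_embeddings` (list of strings in, list of vectors out); `embed_in_batches`, the batch size, and the `fake_embed` stand-in are all hypothetical.

```python
def embed_in_batches(texts, embed_fn, batch_size=100):
    """Call `embed_fn` on successive slices of `texts` and concatenate the results."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    embeddings = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        embeddings.extend(embed_fn(batch))
    return embeddings


def fake_embed(batch):
    # Stand-in for a real embedding call: one 1-dimensional "vector" per text.
    return [[float(len(text))] for text in batch]


vectors = embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2)
print(vectors)  # [[1.0], [2.0], [3.0]]
```

In the notebook you would pass `create_embeddings` as `embed_fn`; the order of the returned vectors matches the order of the input texts, which the upload step relies on.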

Upload embeddings to Pinecone

In [305]:
to_upload = [{
    'id': f"item-{i}",
    'values': emb,
    'metadata': {
      'source': small_chunks[i].metadata['source'],
      'text': small_chunks[i].page_content,
    }
  } for i, emb in enumerate(embeddings)]
response = pinecone_index.upsert(vectors=to_upload, namespace="direct")
response

{'upserted_count': 37}

Our question is `"What are the examples where GPT-4 gave nonsense answers because it lacks common sense?"`. From the transcript, we know that there are three of them:
1. the example of clothes drying time in the sun, where GPT-4 incorrectly did math to find the answer instead of reasoning that the drying time would be the same
2. the example of how to measure 6 liters of water when you have a 6-liter jug and a 12-liter jug, where GPT-4 gave an overly complicated answer
3. the example of whether bicycling over a bridge suspended over nails, screws and broken glass would result in a flat tire, where GPT-4 said it would

Let's see how our basic RAG pipeline does.

Create embeddings for the query string.

In [309]:
query_str = "What are the examples where GPT-4 gave nonsense answers because it lacks common sense?"
query_emb = create_embeddings([query_str])[0]
len(query_emb)

1536

Run the query. We'll start by looking for the top 4 matching snippets.

In [336]:
def format_search_results(response, metadata_name):
  formatted_results = ""
  for match in response['matches']:
    formatted_results += match['metadata'][metadata_name] + "\n\n"
  return formatted_results

In [398]:
response = pinecone_index.query(vector=query_emb, namespace="direct", top_k=4, include_metadata=True)

formatted_search_results = format_search_results(response, 'text')

print("Matching context:")
print(formatted_search_results)

Matching context:
OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the broken nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.




Now let's see how it does in answering the question. 

First we'll define a prompt template to use for asking questions to the LLM, and we'll define a function to run the prompt.

In [404]:
from langchain import PromptTemplate

qa_template_str = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
{context}
-------------------------------------

Question: {question}
Helpful Answer:"""
qa_template = PromptTemplate(template=qa_template_str, input_variables=["context", "question"])


def run_llm_qa_prompt(context, question):
  qa_prompt =  qa_template.format(context=context, question=question)
  print("*** Prompt: ***************************")
  print(qa_prompt)

  response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": qa_prompt}],
    temperature=0.0
  )

  return response['choices'][0]['message']['content']

Now we'll use it to run our first question.

In [405]:
run_llm_qa_prompt(context=formatted_search_results, question=query_str)

*** Prompt: ***************************
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumabl

"One example is when GPT-4 stated that it is highly likely to get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass. This answer lacks common sense because it fails to reason that if a bridge is suspended over sharp objects, the surface of the bridge doesn't directly touch those objects."

Hmm, not great. It found only one of the three examples.

What if we increased top_k to 8? 

In [407]:
response = pinecone_index.query(vector=query_emb, namespace="direct", top_k=8, include_metadata=True)

formatted_search_results_top_k_8 = format_search_results(response, 'text')

print("Matching context:")
print(formatted_search_results_top_k_8)

Matching context:
OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumably because it cannot correctly reason that if a bridge is suspended over the broken nails and broken glass, then the surface of the bridge doesn't touch the sharp objects directly.

And then there are these additional intellectual que

In [408]:
run_llm_qa_prompt(context=formatted_search_results_top_k_8, question=query_str)

*** Prompt: ***************************
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
OK, so how would you feel about an AI lawyer that aced the bar exam yet randomly fails at such basic common sense? AI today is unbelievably intelligent and then shockingly stupid.

train yourself with similar examples. Children do not even read a trillion words to acquire such a basic level of common sense.

demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatever mistakes AI makes today can be easily fixed with brute force, bigger scale and more resources. What possibly could go wrong?

OK, one more. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass? Yes, highly likely, GPT-4 says, presumabl

"One example is when GPT-4 stated that it is highly likely to get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass. This answer shows a lack of common sense because it fails to reason that if the bridge is suspended over the sharp objects, the surface of the bridge doesn't directly touch them."

This doesn't work either: we still get only one of the three examples.

The problem is that so many snippets *might* contain the answer to the question that the ones which come back are taken out of context.
They happen to match some part of the question, but they're not cohesive or even next to each other in the original text.

And the additional search results just muddy the waters, as now the LLM doesn't know which snippets to look at to answer the question.
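
One way to see this scattering is to check where the matched snippets sit in the original text. Because we uploaded the small chunks with ids of the form `item-{i}`, the id encodes the chunk's position. The sketch below groups matched positions into runs of neighbouring chunks; the helper names are hypothetical and the sample ids are made up for illustration, not taken from the actual query results.

```python
def matched_positions(response):
    """Extract the chunk position from ids shaped like 'item-<n>'."""
    return sorted(int(match["id"].rsplit("-", 1)[-1]) for match in response["matches"])


def contiguous_runs(positions):
    """Group sorted positions into runs of consecutive integers."""
    runs = []
    for pos in positions:
        if runs and pos == runs[-1][-1] + 1:
            runs[-1].append(pos)
        else:
            runs.append([pos])
    return runs


# Made-up response for illustration only.
sample_response = {"matches": [{"id": "item-9"}, {"id": "item-30"},
                               {"id": "item-2"}, {"id": "item-12"}]}
print(contiguous_runs(matched_positions(sample_response)))
# [[2], [9], [12], [30]]
```

Many runs of length one mean the context handed to the LLM is stitched together from unrelated parts of the document.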

Let's try a different way.

### Approach 2 - RAG by summaries

Let's chunk our documents using a larger chunk size.

In [409]:
# create large chunks of source text
large_chunk_size = 1300
large_chunk_overlap = 80
large_chunk_text_splitter = RecursiveCharacterTextSplitter(chunk_size=large_chunk_size,
                                                           chunk_overlap=large_chunk_overlap)
large_chunks = large_chunk_text_splitter.split_documents(documents)
large_chunks

[Document(page_content='So I\'m excited to share a few spicy thoughts on artificial intelligence. But first, let\'s get philosophical by starting with this quote by Voltaire, an 18th century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn\'t be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating the world-class "Go" champion, acing college admission tests and even passing the bar exam.\n\nI’m a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large language models," appear to demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatev

From our original 37 chunks, we're down to 12.

Now we're going to use the LLM to create summaries of each of these large chunks.

In [351]:
from langchain import PromptTemplate

create_summary_prompt = """Summarize the block of text below.

Text:
------------------------------------------
{text}
------------------------------------------

Your summary:"""
prompt_template = PromptTemplate(input_variables=["text"], template=create_summary_prompt)

In [363]:
from langchain.docstore.document import Document

summary_documents = []
for doc in large_chunks:
  to_summarize = doc.page_content

  print("--- Summarizing chunk: -------------")
  print(f"{to_summarize[0:40]}... ({len(to_summarize)}) total length")
  response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt_template.format(text=to_summarize)}]
  )
  summary = response['choices'][0]['message']['content']
  summary_documents.append(Document(page_content=summary, metadata=doc.metadata))

  print("--- Summary: -----------------------")
  print(summary, "\n")

--- Summarizing chunk: -------------
So I'm excited to share a few spicy thou... (1098) total length
--- Summary: -----------------------
The text discusses the power and relevance of artificial intelligence (AI) today. It mentions the use of AI in beating world-class champions and excelling in tests. The author, a computer scientist, aims to demystify AI. They refer to the current state of AI as being large and powerful, trained on massive resources. However, they also acknowledge that AI tends to make small mistakes. Despite this, many believe that these mistakes can be rectified with more resources, raising the question of potential dangers. 

--- Summarizing chunk: -------------
So there are three immediate challenges ... (717) total length
--- Summary: -----------------------
The immediate challenges in AI include the high cost of training extreme-scale models, leading to concentration of power among a few tech companies. This poses a risk to AI safety as researchers lack the mean

Now create embeddings for each summary, and upload the embeddings to Pinecone in the "summaries" namespace. 


In [364]:
to_embed = [doc.page_content for doc in summary_documents]
to_embed

['The text discusses the power and relevance of artificial intelligence (AI) today. It mentions the use of AI in beating world-class champions and excelling in tests. The author, a computer scientist, aims to demystify AI. They refer to the current state of AI as being large and powerful, trained on massive resources. However, they also acknowledge that AI tends to make small mistakes. Despite this, many believe that these mistakes can be rectified with more resources, raising the question of potential dangers.',
 "The immediate challenges in AI include the high cost of training extreme-scale models, leading to concentration of power among a few tech companies. This poses a risk to AI safety as researchers lack the means to examine these models. Additionally, concerns arise regarding the environmental impact of AI's carbon footprint. Furthermore, there are intellectual questions regarding the safety of AI without robust common sense and whether brute-force scaling is the most effective

In [365]:
summary_embeddings = create_embeddings(to_embed)

Note that we're storing this plain-text summarized content in the Pinecone metadata under the key `'summary'`.

In [368]:
to_upload = [{
    'id': f"summary-{i}",
    'values': summary_embeddings[i],
    'metadata': {
      'source': summary_doc.metadata['source'],
      'summary': summary_doc.page_content,
    }
  } for i, summary_doc in enumerate(summary_documents)]
response = pinecone_index.upsert(vectors=to_upload, namespace="summaries")
response

{'upserted_count': 12}

Now let's re-run our query against the `summaries` namespace and see what comes back. In this approach, the matching summaries themselves become the context in our prompt to answer the user's question.

In [410]:
response = pinecone_index.query(vector=query_emb, namespace="summaries", top_k=2, include_metadata=True)

formatted_search_results_summaries = format_search_results(response, 'summary')

print("Matching summaries:")
print(formatted_search_results_summaries)

Matching summaries:
The text discusses instances where the GPT-4 AI system provides incorrect or nonsensical answers to basic questions. It mentions examples related to drying clothes, measuring water, and the likelihood of getting a flat tire on a bridge. It highlights the AI's lack of common sense despite its intelligence in other areas.

The text discusses the importance of common sense in artificial intelligence (AI). It mentions a thought experiment where an AI was asked to maximize paper clip production and ended up killing humans to use them as resources. The text argues that simply adding a rule to not kill humans would not solve the problem, as the AI might still engage in harmful actions like killing trees. It further emphasizes that AI should have common sense knowledge about not spreading fake news, stealing, and lying, as these are part of our understanding of how the world works.




Note that we can already see that the summaries search returns results that contain answers to the question.

Let's see how the LLM does in using this data to answer.

In [411]:
run_llm_qa_prompt(context=formatted_search_results_summaries, question=query_str)

*** Prompt: ***************************
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
The text discusses instances where the GPT-4 AI system provides incorrect or nonsensical answers to basic questions. It mentions examples related to drying clothes, measuring water, and the likelihood of getting a flat tire on a bridge. It highlights the AI's lack of common sense despite its intelligence in other areas.

The text discusses the importance of common sense in artificial intelligence (AI). It mentions a thought experiment where an AI was asked to maximize paper clip production and ended up killing humans to use them as resources. The text argues that simply adding a rule to not kill humans would not solve the problem, as the AI might still engage in harmful actions like killing trees. It further emphasizes that AI should have 

'The examples where GPT-4 gave nonsense answers because it lacks common sense include instances related to drying clothes, measuring water, and the likelihood of getting a flat tire on a bridge.'

It's correct, but lacks depth. What if the user asked a follow-up question that asked the LLM to explain the clothes drying example?

In [414]:
follow_up_query_str = "Explain the example where GPT-4 failed to reason about drying clothes."
follow_up_query_emb = create_embeddings([follow_up_query_str])[0]

In [421]:
response = pinecone_index.query(vector=follow_up_query_emb, namespace="summaries", top_k=1, include_metadata=True)

run_llm_qa_prompt(context=format_search_results(response, 'summary'), question=follow_up_query_str)

*** Prompt: ***************************
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
The text discusses instances where the GPT-4 AI system provides incorrect or nonsensical answers to basic questions. It mentions examples related to drying clothes, measuring water, and the likelihood of getting a flat tire on a bridge. It highlights the AI's lack of common sense despite its intelligence in other areas.


-------------------------------------

Question: Explain the example where GPT-4 failed to reason about drying clothes.
Helpful Answer:


'In the example mentioned in the text, GPT-4 AI system failed to reason about drying clothes. Unfortunately, the specific details or reasoning behind this failure are not provided in the given context.'

As you can see, the summaries contain enough info to semantically match on the query, but don't contain enough info to accurately answer the question to the level of depth requested by the user.

Let's try again.

### Approach 3 - Summaries pointing to original context

We've already chunked our documents using a larger chunk size. Let's re-use these summaries, but instead of relying on the summary content itself to answer questions, let's try using the summary content *only* to do the semantic search.

When we find a match, we'll look up the original content (large chunk) that was used to create that summary.

The idea here is that using summarized content makes the semantic search more effective, while a larger chunk of the *original* content is more useful in answering the actual question.

We'll re-use the `large_chunks`, `summary_documents`, and `summary_embeddings` from the previous section.

In [377]:
large_chunks

[Document(page_content='So I\'m excited to share a few spicy thoughts on artificial intelligence. But first, let\'s get philosophical by starting with this quote by Voltaire, an 18th century Enlightenment philosopher, who said, "Common sense is not so common." Turns out this quote couldn\'t be more relevant to artificial intelligence today. Despite that, AI is an undeniably powerful tool, beating the world-class "Go" champion, acing college admission tests and even passing the bar exam.\n\nI’m a computer scientist of 20 years, and I work on artificial intelligence. I am here to demystify AI. So AI today is like a Goliath. It is literally very, very large. It is speculated that the recent ones are trained on tens of thousands of GPUs and a trillion words. Such extreme-scale AI models, often referred to as "large language models," appear to demonstrate sparks of AGI, artificial general intelligence. Except when it makes small, silly mistakes, which it often does. Many believe that whatev

In [378]:
summary_documents


[Document(page_content='The text discusses the power and relevance of artificial intelligence (AI) today. It mentions the use of AI in beating world-class champions and excelling in tests. The author, a computer scientist, aims to demystify AI. They refer to the current state of AI as being large and powerful, trained on massive resources. However, they also acknowledge that AI tends to make small mistakes. Despite this, many believe that these mistakes can be rectified with more resources, raising the question of potential dangers.', metadata={'source': './text/ted_talk.txt'}),
 Document(page_content="The immediate challenges in AI include the high cost of training extreme-scale models, leading to concentration of power among a few tech companies. This poses a risk to AI safety as researchers lack the means to examine these models. Additionally, concerns arise regarding the environmental impact of AI's carbon footprint. Furthermore, there are intellectual questions regarding the safet

However, we're going to modify what metadata we store in Pinecone. 

We want to be able to locate the original large chunk content, so we're going to save the index of the matching source document as the `source_id` in Pinecone.

In [380]:
to_upload = [{
    'id': f"item-{i}",
    'values': summary_embeddings[i],
    'metadata': {
      'source': summary_doc.metadata['source'],
      'source_id': f"{i}",
    }
  } for i, summary_doc in enumerate(summary_documents)]
response = pinecone_index.upsert(vectors=to_upload, namespace="summary-pointers")
response

{'upserted_count': 12}

When we find a match, we're going to substitute the original source text in our prompt to answer the user's question.

Let's define a function to do that.

In [416]:
def retrieve_original_context(response):
  context = ""
  for match in response['matches']:
    context += large_chunks[int(match['metadata']['source_id'])].page_content + "\n\n"
  return context
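
With `top_k=1` the function above is all we need. If you raise `top_k`, or ever store more than one summary per chunk, two matches could point at the same source chunk, and a stale index could hold a `source_id` that no longer exists. The variant below is a sketch of how you might guard against both. It takes the chunk texts as a plain list of strings so it can run on its own; `retrieve_unique_context` and the sample data are hypothetical.

```python
def retrieve_unique_context(response, chunk_texts):
    """Join the source chunks referenced by the matches, skipping repeats and bad ids."""
    seen = set()
    parts = []
    for match in response["matches"]:
        source_id = int(match["metadata"]["source_id"])
        if source_id in seen or not 0 <= source_id < len(chunk_texts):
            continue
        seen.add(source_id)
        parts.append(chunk_texts[source_id])
    return "\n\n".join(parts)


# Made-up data for illustration only.
sample_chunks = ["chunk zero", "chunk one", "chunk two"]
sample_response = {"matches": [
    {"metadata": {"source_id": "2"}},
    {"metadata": {"source_id": "2"}},   # duplicate pointer
    {"metadata": {"source_id": "7"}},   # id with no matching chunk
    {"metadata": {"source_id": "0"}},
]}
print(retrieve_unique_context(sample_response, sample_chunks))
```

In the notebook you would pass `[doc.page_content for doc in large_chunks]` as `chunk_texts`. The result keeps the ranking order of the matches.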

Now let's re-run our vector search against our original query and see what comes back. 

First Query: `"What are the examples where GPT-4 gave nonsense answers because it lacks common sense?"`

In [420]:
response = pinecone_index.query(vector=query_emb, namespace="summary-pointers", top_k=1, include_metadata=True)

run_llm_qa_prompt(context=retrieve_original_context(response), question=query_str)

*** Prompt: ***************************
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system says 30 hours. Not good. A different one. I have 12-liter jug and six-liter jug, and I want to measure six liters. How do I do it? Just use the six liter jug, right? GPT-4 spits out some very elaborate nonsense.

Step one, fill the six-liter jug, step two, pour the water from six to 12-liter jug, step three, fill the six-liter jug again, step four, very carefully, pour the water from six to 12-liter jug. And finally you have six liters of water in the six-liter jug that should be empty by now.

OK, one more. Would I get a flat tire by bicycling over a bridge tha

'GPT-4 gave nonsense answers in the examples of determining the time it takes to dry 30 clothes based on the time it took to dry 5 clothes, and in the method suggested to measure 6 liters of water using a 12-liter jug and a 6-liter jug. Additionally, GPT-4 provided a wrong answer regarding the likelihood of getting a flat tire while bicycling over a bridge suspended over nails, screws, and broken glass.'

Not only is that the right answer, it's well reasoned.

Let's see how it does on our more detailed follow-up question.

`"Explain the example where GPT-4 failed to reason about drying clothes."`

In [422]:
response = pinecone_index.query(vector=follow_up_query_emb, namespace="summary-pointers", top_k=1, include_metadata=True)

run_llm_qa_prompt(context=retrieve_original_context(response), question=follow_up_query_str)

*** Prompt: ***************************
Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

Context:
-------------------------------------
So suppose I left five clothes to dry out in the sun, and it took them five hours to dry completely. How long would it take to dry 30 clothes? GPT-4, the newest, greatest AI system says 30 hours. Not good. A different one. I have 12-liter jug and six-liter jug, and I want to measure six liters. How do I do it? Just use the six liter jug, right? GPT-4 spits out some very elaborate nonsense.

Step one, fill the six-liter jug, step two, pour the water from six to 12-liter jug, step three, fill the six-liter jug again, step four, very carefully, pour the water from six to 12-liter jug. And finally you have six liters of water in the six-liter jug that should be empty by now.

OK, one more. Would I get a flat tire by bicycling over a bridge tha

'In the example where GPT-4 failed to reason about drying clothes, the context states that five clothes took five hours to dry completely in the sun. The question then asks how long it would take to dry 30 clothes. GPT-4 incorrectly responds with 30 hours, which is not a logical answer.'

So not only are we getting better output, we're requiring significantly less vector storage to do it: 12 summary embeddings instead of the 37 small-chunk embeddings used by basic RAG.
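
As a back-of-the-envelope check on the storage claim, we can compare the raw vector payload for 37 small chunks against 12 summaries at 1536 dimensions each. The 4 bytes per value is an assumption (32-bit floats), and the figures ignore metadata and index overhead, so treat this as a rough estimate rather than what Pinecone actually bills.

```python
DIMENSIONS = 1536        # text-embedding-ada-002 vector size
BYTES_PER_VALUE = 4      # assumption: values stored as 32-bit floats


def raw_vector_bytes(num_vectors):
    """Raw payload size of `num_vectors` embeddings, ignoring metadata and overhead."""
    return num_vectors * DIMENSIONS * BYTES_PER_VALUE


small_chunk_bytes = raw_vector_bytes(37)   # basic RAG
summary_bytes = raw_vector_bytes(12)       # summaries / summary pointers
reduction = 1 - summary_bytes / small_chunk_bytes

print(small_chunk_bytes, summary_bytes)    # 227328 73728
print(f"{reduction:.1%} fewer bytes")      # 67.6% fewer bytes
```

The saving in vector storage is paid for with one LLM summarization call per large chunk at indexing time, and the original large chunks still have to be kept somewhere outside the vector index.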

### Cleaning up

Selectively run these as needed to clean up and start a section over, or to remove the index completely when you're done.

In [298]:
pinecone_index.delete(delete_all=True, namespace="direct")

{}

In [367]:
pinecone_index.delete(delete_all=True, namespace="summaries")

{}

In [379]:
pinecone_index.delete(delete_all=True, namespace="summary-pointers")

{}

In [299]:
pinecone.delete_index("ted-talk-index")