# Question and Answer with OpenAI and RedisVL

This example shows how to use RedisVL to create a question and answer system using OpenAI's API.

In this notebook we will:
1. Download a dataset of Wikipedia articles (thanks to OpenAI's CDN)
2. Create embeddings for each article
3. Create a RedisVL index and store the embeddings with metadata
4. Construct a simple QnA system using the index and GPT-3.5
5. Improve the QnA system with LLMCache


The image below shows the architecture of the system we will create in this notebook.

![Diagram](https://github.com/RedisVentures/redis-openai-qna/raw/main/app/assets/RedisOpenAI-QnA-Architecture.drawio.png)

## Setup

To run this example, you need a Redis instance with RediSearch running locally. You can start one by running the following command in your terminal:

```bash
docker run --name redis-vecdb -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
```

This also provides the RedisInsight GUI at http://localhost:8001.

Next, we will install the dependencies for this notebook.

In [None]:
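# first we need to install a few things
# (redisvl is required by the imports below; the module paths used in this
# notebook match an early redisvl release and may have moved in newer ones)

%pip install pandas wget tenacity tiktoken openai==0.28.1 redisvl

In [1]:
import redisvl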

In [None]:
import wget
import pandas as pd

embeddings_url = 'https://cdn.openai.com/API/examples/data/wikipedia_articles_2000.csv'

wget.download(embeddings_url)

In [None]:
df = pd.read_csv('wikipedia_articles_2000.csv')
df = df.drop(columns=['Unnamed: 0'])
df.head()

## Data Preparation



### Text Chunking

To create embeddings for the articles, we first split the text into smaller chunks, because the OpenAI API limits how much text can be sent in a single embedding request. The code that follows borrows heavily from this [notebook](https://github.com/openai/openai-cookbook/blob/main/apps/enterprise-knowledge-retrieval/enterprise_knowledge_retrieval.ipynb) by OpenAI.


In [None]:
TEXT_EMBEDDING_CHUNK_SIZE = 1000
EMBEDDINGS_MODEL = "text-embedding-ada-002"


def chunks(text, n, tokenizer):
    """Yield successive chunks of roughly n tokens from text.

    Split a text into smaller chunks of about n tokens, preferably ending at the end of a sentence
    """
    tokens = tokenizer.encode(text)
    i = 0
    while i < len(tokens):
        # Find the nearest end of sentence within a range of 0.5 * n and 1.5 * n tokens
        j = min(i + int(1.5 * n), len(tokens))
        while j > i + int(0.5 * n):
            # Decode the tokens and check for full stop or newline
            chunk = tokenizer.decode(tokens[i:j])
            if chunk.endswith(".") or chunk.endswith("\n"):
                break
            j -= 1
        # If no end of sentence found, use n tokens as the chunk size
        if j == i + int(0.5 * n):
            j = min(i + n, len(tokens))
        yield tokens[i:j]
        i = j

def get_unique_id_for_file_chunk(title, chunk_index):
    return str(title+"-!"+str(chunk_index))

def chunk_text(record, tokenizer):
    """Return a list of chunk records (dicts with id, url, title, content, file_chunk_index) for one article."""
    chunked_records = []

    url = record['url']
    title = record['title']
    file_body_string = record['text']

    # Split the article into token chunks, then decode each chunk back to text
    token_chunks = list(chunks(file_body_string, TEXT_EMBEDDING_CHUNK_SIZE, tokenizer))
    text_chunks = [f'Title: {title};\n' + tokenizer.decode(chunk) for chunk in token_chunks]

    for i, text_chunk in enumerate(text_chunks):
        doc_id = get_unique_id_for_file_chunk(title, i)
        chunked_records.append(({"id": doc_id,
                                "url": url,
                                "title": title,
                                "content": text_chunk,
                                "file_chunk_index": i}))
    return chunked_records

In [None]:
# Initialise tokenizer
import tiktoken
oai_tokenizer = tiktoken.get_encoding("cl100k_base")

records = []
for _, record in df.iterrows():
    records.extend(chunk_text(record, oai_tokenizer))

In [None]:
chunked_data = pd.DataFrame(records)
chunked_data.head()
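To see the sentence-boundary rule in isolation, here is a minimal sketch that applies the same 0.5 * n to 1.5 * n window to whitespace-separated words instead of tiktoken tokens. The helper `sentence_chunks` and the sample text are illustrative assumptions and are not part of the notebook or of any library.

```python
def sentence_chunks(words, n):
    """Yield lists of roughly n words, preferring to end on a full stop."""
    i = 0
    while i < len(words):
        # Start from the upper bound of the window and walk backwards
        j = min(i + int(1.5 * n), len(words))
        while j > i + int(0.5 * n):
            # Stop as soon as the chunk ends with a sentence-final word
            if words[j - 1].endswith("."):
                break
            j -= 1
        # No sentence end found inside the window: fall back to n words
        if j == i + int(0.5 * n):
            j = min(i + n, len(words))
        yield words[i:j]
        i = j


text = "Redis is fast. It stores vectors. Search is simple. Chunks end here."
pieces = [" ".join(chunk) for chunk in sentence_chunks(text.split(), 4)]
for piece in pieces:
    print(piece)
```

With a target size of 4 words, each chunk stretches to 6 words so that it can end on a full stop, which is the same trade-off the token-based `chunks` function makes.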

### Embedding Creation

With the text broken up into chunks, we can create embeddings with the RedisVL `OpenAITextVectorizer`. This vectorizer calls the OpenAI API to create an embedding for each text chunk, as shown in the code below.

In [None]:
import os
import getpass
from redisvl.vectorize.text import OpenAITextVectorizer

api_key = os.getenv("OPENAI_API_KEY") or getpass.getpass("Enter your OpenAI API key: ")
oaip = OpenAITextVectorizer(EMBEDDINGS_MODEL, api_config={"api_key": api_key})

chunked_data["embedding"] = oaip.embed_many(chunked_data["content"].tolist(), as_buffer=True)
chunked_data
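Passing `as_buffer=True` returns each embedding as raw bytes rather than a list of floats, which is the form needed to store a vector in a Redis hash. The sketch below illustrates that conversion under the assumption that vectors are encoded as 32-bit floats (4 bytes per dimension); the helper names `to_float32_buffer` and `from_float32_buffer` are hypothetical and are not RedisVL functions.

```python
import struct


def to_float32_buffer(vector):
    """Pack a list of floats into little-endian float32 bytes."""
    return struct.pack(f"<{len(vector)}f", *vector)


def from_float32_buffer(buffer):
    """Unpack float32 bytes back into a list of floats."""
    return list(struct.unpack(f"<{len(buffer) // 4}f", buffer))


vector = [0.5, -1.0, 0.25]
buffer = to_float32_buffer(vector)

print(len(buffer))                  # 4 bytes per dimension
print(from_float32_buffer(buffer))  # round-trips back to the original values
```

Under this assumption, a 1536-dimensional `text-embedding-ada-002` embedding occupies 1536 * 4 = 6144 bytes.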

## Construct the ``SearchIndex``

Now that we have the embeddings, we can create a ``SearchIndex`` to store them in Redis, together with the metadata for each chunk.

In [None]:
%%writefile wiki_schema.yaml

index:
    name: wiki
    prefix: oaiWiki

fields:
    text:
        - name: content
        - name: title
    tag:
        - name: id
    vector:
        - name: embedding
          dims: 1536
          distance_metric: cosine
          algorithm: flat

In [None]:
from redisvl.index import AsyncSearchIndex

index = AsyncSearchIndex.from_yaml("wiki_schema.yaml")
index.connect("redis://localhost:6379")

await index.create()

In [None]:
!rvl index listall

In [None]:
await index.load(chunked_data.to_dict(orient="records"))
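The schema above selects the `flat` algorithm with the `cosine` distance metric. Conceptually, a flat index compares the query vector against every stored vector and returns the closest ones. The following is a minimal pure-Python sketch of that idea, not the Redis implementation; `cosine_distance`, `flat_search`, and the three-dimensional toy vectors are assumptions made for illustration.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity, so smaller values mean closer vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def flat_search(query, documents, k=1):
    """Score every stored vector against the query and return the k closest ids."""
    ranked = sorted(documents.items(), key=lambda item: cosine_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy 3-dimensional "embeddings" keyed by document id
documents = {
    "dinosaurs": [0.9, 0.1, 0.0],
    "music": [0.0, 0.2, 0.9],
    "cities": [0.1, 0.9, 0.1],
}

print(flat_search([1.0, 0.0, 0.1], documents))
```

Exhaustive comparison is exact but scales linearly with the number of stored vectors, which is acceptable for a dataset of this size.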

## Build the QnA System

Now that we have the data and the embeddings, we can build the QnA system. The system will perform three actions:

1. Embed the user question and search for the most similar content
2. Make a prompt with the query and retrieved content
3. Send the prompt to the OpenAI API and return the answer


In [None]:
import openai
from redisvl.query import VectorQuery

In [None]:
CHAT_MODEL = "gpt-3.5-turbo"

def make_prompt(query, content):
    retrieval_prompt = f'''Use the content to answer the search query the customer has sent.
    If you can't answer the user's question, do not guess. If there is no content, respond with "I don't know".

    Search query:

    {query}

    Content:

    {content}

    Answer:
    '''
    return retrieval_prompt

async def retrieve_context(query):
    # Embed the query
    query_embedding = oaip.embed(query)

    # Get the top result from the index
    vector_query = VectorQuery(
        vector=query_embedding,
        vector_field_name="embedding",
        return_fields=["content"],
        num_results=1
    )

    results = await index.query(vector_query)
    content = ""
    # num_results=1 returns at most one hit, so check for any result at all
    if len(results) > 0:
        content = results[0]["content"]
    return content

async def answer_question(query):
    # Retrieve the context
    content = await retrieve_context(query)

    prompt = make_prompt(query, content)
    retrieval = await openai.ChatCompletion.acreate(
        model=CHAT_MODEL,
        messages=[{'role':"user",
                   'content': prompt}],
        max_tokens=500)

    # Response provided by GPT-3.5
    return retrieval['choices'][0]['message']['content']

In [None]:
import textwrap

question = "What is a Brontosaurus?"
textwrap.wrap(await answer_question(question), width=80)

In [None]:
# Question that makes no sense
question = "What is a trackiosamidon?"
await answer_question(question)

In [None]:
question = "Tell me about the life of Alanis Morissette"
textwrap.wrap(await answer_question(question), width=80)

## Improve the QnA System with LLMCache

The QnA system we built above works, but every question triggers a new call to the OpenAI API. We can use the ``LLMCache`` to improve on this. The ``LLMCache`` stores the answers to previous queries and returns a stored answer when a new query is similar enough to an earlier one. This reduces the number of requests sent to the OpenAI API and, provided similar questions recur, increases the overall QPS of the system.
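The idea behind a semantic cache can be sketched in a few lines: keep each answer next to the embedding of the question that produced it, and on lookup return the best stored answer whose similarity to the new question reaches a threshold. The sketch below is an in-memory illustration, not the RedisVL implementation; `ToySemanticCache`, its methods, and the two-dimensional vectors are hypothetical, and it assumes the threshold is a cosine similarity where higher means closer.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class ToySemanticCache:
    """In-memory stand-in for a semantic cache (illustrative only)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (question_embedding, answer) pairs

    def store(self, embedding, answer):
        self.entries.append((embedding, answer))

    def check(self, embedding):
        """Return the most similar cached answer at or above the threshold, else None."""
        best_answer, best_score = None, self.threshold
        for stored_embedding, answer in self.entries:
            score = cosine_similarity(embedding, stored_embedding)
            if score >= best_score:
                best_answer, best_score = answer, score
        return best_answer


toy_cache = ToySemanticCache(threshold=0.8)
toy_cache.store([1.0, 0.0], "cached answer")

print(toy_cache.check([0.9, 0.1]))  # close to the stored question: cache hit
print(toy_cache.check([0.0, 1.0]))  # unrelated question: cache miss
```

A higher threshold makes the cache stricter (fewer hits, less risk of returning an answer to a different question), while a lower one trades accuracy for more savings.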

In [None]:
from redisvl.llmcache.semantic import SemanticCache

cache = SemanticCache(redis_url="redis://localhost:6379", threshold=0.8)

In [None]:
async def answer_question(query):

    # check the cache
    result = cache.check(prompt=query)
    if result:
        return result[0]

    # Retrieve the context
    content = await retrieve_context(query)

    prompt = make_prompt(query, content)
    retrieval = await openai.ChatCompletion.acreate(
        model=CHAT_MODEL,
        messages=[{'role':"user",
                   'content': prompt}],
        max_tokens=500)

    # Response provided by GPT-3.5
    answer = retrieval['choices'][0]['message']['content']

    # cache the query and its answer
    cache.store(query, answer)
    return answer

In [None]:
# ask a question to cache an answer
import time
start = time.time()
question = "Tell me about the life of Alanis Morissette"
answer = await answer_question(question)
print(f"Time taken: {time.time() - start}\n")
textwrap.wrap(answer, width=80)

In [None]:
# Same question, return cached answer, save time, save money :)
start = time.time()
answer = await answer_question(question)
print(f"Time taken with cache: {time.time() - start}\n")
textwrap.wrap(answer, width=80)

In [None]:
# A semantically similar (but not identical) question returns the same answer
# from the cache, because the similarity between the two questions is greater
# than the threshold of 0.8 that the cache is set to.
start = time.time()
question = "Who is Alanis Morissette?"
answer = await answer_question(question)
print(f"Time taken with the cache: {time.time() - start}\n")
textwrap.wrap(answer, width=80)