# Basic RAG using Lexy

## Introduction

Let's go through a basic implementation of Retrieval Augmented Generation (RAG) using Lexy. RAG is the process of using a retriever to find relevant documents to include in the prompt as context for a language model. 
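In code, the retrieve-then-generate loop looks roughly like the sketch below. It is library-agnostic and rests on assumptions. `rag_answer`, `retrieve`, and `generate` are hypothetical names that stand in for a retriever (such as a Lexy index) and a language model call. They are not Lexy or OpenAI APIs. The prompt layout mirrors the template used later in this tutorial.

```python
from typing import Callable, List


def rag_answer(
    question: str,
    retrieve: Callable[[str, int], List[str]],
    generate: Callable[[str], str],
    k: int = 3,
) -> str:
    """Answer a question with retrieval-augmented generation.

    `retrieve(question, k)` returns the k most relevant document texts and
    `generate(prompt)` returns the model's completion. Both are hypothetical
    callables supplied by the caller.
    """
    # 1. Retrieve the documents most relevant to the question
    docs = retrieve(question, k)
    # 2. Include the retrieved documents in the prompt as context
    context = "\n".join(docs)
    prompt = f"Question:\n{question}\n\nContext:\n{context}\n"
    # 3. Ask the language model to answer using that prompt
    return generate(prompt)


# Toy stand-ins: a retriever that returns fixed documents, and a "model" that echoes its prompt
def toy_retrieve(question: str, k: int) -> List[str]:
    return ["doc one", "doc two", "doc three"][:k]


def toy_generate(prompt: str) -> str:
    return prompt


print(rag_answer("Who rides Vhagar?", toy_retrieve, toy_generate, k=2))
```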

In this example, we'll use Lexy to store and retrieve documents describing characters from the TV show House of the Dragon. We'll then use those documents to construct a prompt that GPT-4 can use to answer questions about the characters.

This tutorial is a **simplified introduction** to RAG, not a real-world application. The simplification is intentional: the goal is to teach the basic concept of RAG and how it's implemented. We'll point out some of these simplifications as we go along and discuss them in more detail in the section [Real-world considerations](#Real-world-considerations). We'll also provide links to additional tutorials that cover the complexities typically encountered in real-world applications.

### OpenAI API Key

This example requires an OpenAI API key in order to (1) generate embeddings and (2) interact with GPT-4. Although this example uses OpenAI, you can use Lexy with any language model and any embedding model, including free, open-source models such as those loaded with `SentenceTransformer`.

You can add your API key as an environment variable using the `.env` file in the root directory of the Lexy repository. See [How do I add a new environment variable](../faq.md#how-do-i-add-a-new-environment-variable) on the [FAQ page](../faq.md) for more details.

```shell title=".env"
OPENAI_API_KEY=your_secret_api_key
```

Remember to rebuild your containers after adding the environment variable. Otherwise, the containers won't see the newly added variable. Run the following on the command line:

```shell
make update-dev-containers
```

Then run the following in your notebook to load the environment variables.

In [None]:
import os
from dotenv import load_dotenv

# load environment variables, including OPENAI_API_KEY
load_dotenv()

# alternatively, you can set the environment variable directly
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = "your_secret_api_key"
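
If the key is missing, or still set to the placeholder above, the failure won't surface until the first embedding or chat request. A small fail-fast check can help. `require_env` is a hypothetical helper and is not part of Lexy or `python-dotenv`.

```python
import os


def require_env(name: str, placeholder: str = "your_secret_api_key") -> str:
    """Return an environment variable's value, failing fast if it is unset
    or still set to the tutorial's placeholder value."""
    value = os.environ.get(name, "")
    if not value or value == placeholder:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file and rebuild the containers"
        )
    return value


# Example: call this before any step that needs the OpenAI API
# api_key = require_env("OPENAI_API_KEY")
```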

## Sample data

Our data is in the [`sample_data/documents`](https://github.com/lexy-ai/lexy/tree/main/sample_data/documents) directory of the Lexy repo. Let's load it and take a look at the first few lines.

In [None]:
with open("../sample_data/documents/hotd.txt") as f:
    lines = f.read().splitlines()
    
lines[:3]

## Add documents to Lexy

Let's instantiate a Lexy client and create a new collection for our documents.

In [None]:
from lexy_py import LexyClient

lexy = LexyClient()

In [None]:
# create a new collection
collection = lexy.create_collection(
    collection_id="house_of_the_dragon", 
    description="House of the Dragon characters"
)
collection

We can add documents to our new collection using the `Collection.add_documents` method.

In [None]:
collection.add_documents([
    {"content": line} for line in lines
])
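
`add_documents` takes a list of dictionaries, each with a `content` key. Your source file might contain blank lines, for example as spacing between entries. In that case, you may want to drop them first so that empty strings aren't stored and embedded as documents. Here is a minimal sketch. `lines_to_documents` is a hypothetical helper and is not part of `lexy_py`.

```python
from typing import Dict, List


def lines_to_documents(lines: List[str]) -> List[Dict[str, str]]:
    """Convert raw text lines into document dicts with a `content` key,
    skipping lines that are empty or whitespace-only."""
    return [{"content": line.strip()} for line in lines if line.strip()]


docs = lines_to_documents(["Rhaenyra Targaryen ...", "", "   ", "Daemon Targaryen ..."])
print(len(docs))
```

With this helper, the cell above could be written as `collection.add_documents(lines_to_documents(lines))`.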

## Create an Index and Binding

We'll create a binding to embed each document, and an index to store the resulting embeddings. We're going to use the OpenAI embedding model `text-embedding-3-small` to embed our documents. See the [OpenAI API documentation](https://platform.openai.com/docs/guides/embeddings) for more information on the available embedding models.

In [None]:
# create an index
index_fields = {
    "embedding": {"type": "embedding", "extras": {"dims": 1536, "model": "text.embeddings.openai-3-small"}}
}
index = lexy.create_index(
    index_id="hotd_embeddings", 
    description="Text embeddings for House of the Dragon collection",
    index_fields=index_fields
)
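
The `dims` value declared for the index field must match the length of the vectors the transformer produces. OpenAI's `text-embedding-3-small` returns 1536-dimensional embeddings by default. If you swap in a different embedding model, a quick sanity check like the one below can catch a mismatch early. This is a sketch. `check_embedding_dims` is hypothetical, and the check assumes you already have one sample embedding as a list of floats.

```python
from typing import Dict, List


def check_embedding_dims(index_fields: Dict[str, dict], field: str, embedding: List[float]) -> None:
    """Raise ValueError if an embedding's length doesn't match the `dims`
    declared for the given index field."""
    expected = index_fields[field]["extras"]["dims"]
    if len(embedding) != expected:
        raise ValueError(f"field {field!r} expects {expected} dims, got {len(embedding)}")


index_fields = {
    "embedding": {"type": "embedding", "extras": {"dims": 1536, "model": "text.embeddings.openai-3-small"}}
}
check_embedding_dims(index_fields, "embedding", [0.0] * 1536)  # passes silently
```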

To embed each document and store the result in our index, we'll create a `Binding` which connects our "**house_of_the_dragon**" collection to our "**hotd_embeddings**" index using a `Transformer`. The `LexyClient.transformers` property shows a list of available transformers.

In [None]:
# list of available transformers
lexy.transformers

For this example, we'll use `text.embeddings.openai-3-small`. Let's create our binding.

In [None]:
# create a binding
binding = lexy.create_binding(
    collection_id="house_of_the_dragon",
    index_id="hotd_embeddings",
    transformer_id="text.embeddings.openai-3-small"
)
binding

Our binding was created successfully and is now active (i.e., `binding.status = ON`). The documents already in our collection are embedded and added to our index, and any new documents added to our collection will automatically be embedded and added as well. The diagram below shows the relationship between our collection, transformer, and index.

<div style="text-align: center;">

```mermaid
flowchart LR
    collection["Collection
      
    &quot;house_of_the_dragon&quot;"] 
    --> 
    transformer["Transformer 
    
    &quot;text.embeddings.openai-3-small&quot;"]
    -->
    index["Index
    
    &quot;hotd_embeddings&quot;"];   
```

</div>
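To make the collection → transformer → index relationship concrete, here's a toy, in-memory model of what an active binding does. It only mimics the behavior described above and is not how Lexy is implemented. `ToyCollection`, `ToyIndex`, and the vowel-count "embedding" are all hypothetical. When the binding is added, it also backfills documents that were already in the collection. This mirrors how the documents we added earlier are available when we query the index below.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Transformer = Callable[[str], List[float]]


@dataclass
class ToyIndex:
    # maps document id -> embedding vector
    rows: Dict[int, List[float]] = field(default_factory=dict)


@dataclass
class ToyCollection:
    # maps document id -> document text
    documents: Dict[int, str] = field(default_factory=dict)
    # active bindings: (transformer, index) pairs applied to every document
    bindings: List[Tuple[Transformer, ToyIndex]] = field(default_factory=list)

    def add_binding(self, transformer: Transformer, index: ToyIndex) -> None:
        self.bindings.append((transformer, index))
        # backfill: transform documents already in the collection
        for doc_id, text in self.documents.items():
            index.rows[doc_id] = transformer(text)

    def add_document(self, text: str) -> int:
        doc_id = len(self.documents)
        self.documents[doc_id] = text
        # each active binding transforms the new document into its index
        for transformer, index in self.bindings:
            index.rows[doc_id] = transformer(text)
        return doc_id


def toy_embed(text: str) -> List[float]:
    """Stand-in 'embedding': counts of each vowel (not a real model)."""
    return [float(text.lower().count(v)) for v in "aeiou"]


coll, idx = ToyCollection(), ToyIndex()
coll.add_document("Viserys Targaryen")   # added before the binding exists
coll.add_binding(toy_embed, idx)         # backfills the existing document
coll.add_document("Alicent Hightower")   # embedded automatically on add
print(sorted(idx.rows))  # [0, 1]
```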

## Retrieve documents

Now that our documents are embedded and those embeddings are stored in our index, we can use the `Index.query` method to retrieve the most relevant documents for a given query. Specifically, the method embeds our query string and returns the `k` documents whose embeddings are most similar to it, as measured by **cosine similarity**.
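Lexy performs this search for us inside the index. To illustrate the scoring, the sketch below computes cosine similarity, defined as the dot product of two vectors divided by the product of their lengths. It then returns the top `k` matches by brute force. The 2-dimensional vectors are made up. Real embeddings from `text-embedding-3-small` have 1536 dimensions, and production vector stores usually rely on specialized indexes rather than scoring every row.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Sequence[float], embeddings: List[Sequence[float]], k: int = 2) -> List[Tuple[int, float]]:
    """Return (position, similarity) pairs for the k embeddings most similar to the query."""
    scored = [(i, cosine_similarity(query, e)) for i, e in enumerate(embeddings)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
print(top_k([1.0, 0.1], toy_embeddings, k=2))  # positions 0 and 2 rank highest
```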

Let's test this out by retrieving the 2 most relevant documents for the query "parents in Westeros".

In [None]:
index.query(query_text="parents in Westeros", k=2)

The documents returned by our query are the profiles of Viserys Targaryen and Alicent Hightower, both of which describe them as parents. Notice that none of the returned documents contains any of the exact words in the phrase "parents in Westeros". Yet the embedding model identifies these documents as semantically similar to our query, most likely because they contain the phrases "_...the father of..._" and "_...the mother of..._".
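A quick way to confirm that this match isn't driven by shared keywords is to compute the word overlap directly. In this sketch, the document text is a hypothetical paraphrase rather than the actual sample data, and `shared_words` is an illustrative helper:

```python
import re
from typing import Set


def _words(text: str) -> Set[str]:
    """Lowercased alphabetic words in a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def shared_words(query: str, document: str) -> Set[str]:
    """Whole words that appear in both the query and the document."""
    return _words(query) & _words(document)


doc = "Viserys Targaryen is the king of the Seven Kingdoms and the father of Rhaenyra."
print(shared_words("parents in Westeros", doc))  # set()
```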

## Context for GPT-4

The documents we've retrieved may not be super useful on their own, but we can provide them as context to a language model in order to generate a more informative response. Let's construct a prompt for GPT-4 to answer questions about House of the Dragon.

### Construct a prompt

With RAG, we construct our prompt **dynamically** using our retrieved documents. Given a question, we'll first retrieve the documents that are most relevant, and then include them in our prompt as context. Below is a basic template for our prompt.

In [None]:
system_prompt = (
    "You are an exceptionally intelligent AI assistant. Answer the following "
    "questions using the context provided. PLEASE CITE YOUR SOURCES. Be concise."
)

question_template = """\
Question: 
{question}

Context:
{context}
"""

As an example, let's construct a prompt for the question "who is the dragon ridden by Daemon Targaryen?".

In [None]:
# retrieve most relevant documents
question_ex = "who is the dragon ridden by Daemon Targaryen?"
results_ex = index.query(query_text=question_ex)

# format results as context
context_ex = "\n".join([
    f'[doc_id: {er["document_id"]}] {er["document.content"]}' for er in results_ex
])

# construct prompt
prompt_ex = question_template.format(question=question_ex, context=context_ex)
print(prompt_ex)

### Chat completion

Now we can use this prompt to generate a response from GPT-4. To interact with the OpenAI API, we'll reuse the same OpenAI client that Lexy uses internally.

In [None]:
# import OpenAI client
from lexy.transformers.openai import openai_client

In [None]:
# generate response
oai_response = openai_client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt_ex}
    ]
)
print(oai_response.choices[0].message.content)

We can see that GPT-4 has used the context we provided to answer the question, and has specifically cited the first two documents in our search results.
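Every context line is tagged with `[doc_id: ...]`, and the system prompt asks GPT-4 to cite its sources. This means you can check which retrieved documents a response actually cites, and flag any cited IDs that weren't retrieved. This sketch assumes that the model repeats the `doc_id: <id>` tag verbatim, which isn't guaranteed. The response text below is made up.

```python
import re
from typing import List, Set


def cited_doc_ids(response_text: str) -> Set[str]:
    """Extract every id that appears after 'doc_id:' in a model response."""
    return set(re.findall(r"doc_id:\s*([\w-]+)", response_text))


def unknown_citations(response_text: str, retrieved_ids: List[str]) -> Set[str]:
    """Cited ids that were not among the retrieved documents (possible hallucinated citations)."""
    return cited_doc_ids(response_text) - set(retrieved_ids)


response = "The answer is X [doc_id: doc-1], see also [doc_id: doc-9]."
print(unknown_citations(response, ["doc-1", "doc-2"]))  # {'doc-9'}
```

In the notebook, you might call `unknown_citations(oai_response.choices[0].message.content, [str(r["document_id"]) for r in results_ex])`.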

Let's put everything together into two functions: `construct_prompt` will construct a prompt for a given user question, and `chat_completion` will request a completion from GPT-4.

In [None]:
def construct_prompt(question: str,  
                     result_template: str = "[doc_id: {r[document_id]}] {r[document.content]}",
                     **query_kwargs):
    # retrieve most relevant results
    results = index.query(query_text=question, **query_kwargs)
    # format results for context
    context = "\n".join([
        result_template.format(r=r) for r in results
    ])
    # format prompt
    return question_template.format(question=question, context=context)

def chat_completion(message: str,
                    system: str = system_prompt, 
                    **chat_kwargs):
    # generate response
    return openai_client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": message}
        ],
        **chat_kwargs,
    )
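
As retrieved documents get longer, it can help to cap how much context goes into the prompt. One option is to add formatted results in ranked order until a character budget is reached. `build_context` is hypothetical and is not part of Lexy. It could stand in for the `"\n".join(...)` step inside `construct_prompt`, and it uses characters as a rough proxy for tokens.

```python
from typing import Dict, List


def build_context(results: List[Dict], result_template: str, max_chars: int = 2000) -> str:
    """Format results in ranked order, stopping before the context exceeds max_chars."""
    lines: List[str] = []
    used = 0
    for r in results:
        line = result_template.format(r=r)
        # +1 accounts for the newline that joins this line to the previous one
        cost = len(line) + (1 if lines else 0)
        if used + cost > max_chars:
            break
        lines.append(line)
        used += cost
    return "\n".join(lines)


toy_results = [
    {"document_id": "a", "document.content": "x" * 30},
    {"document_id": "b", "document.content": "y" * 30},
]
print(build_context(toy_results, "[doc_id: {r[document_id]}] {r[document.content]}", max_chars=50))
```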

Now let's try asking GPT-4 some more questions.

In [None]:
q = "which one is the blue dragon?"
oai_response = chat_completion(message=construct_prompt(q))
print(oai_response.choices[0].message.content)

In [None]:
q = "who rides Vhagar?"
oai_response = chat_completion(message=construct_prompt(q))
print(oai_response.choices[0].message.content)

In [None]:
q = "who is the second son of King Viserys?"
oai_response = chat_completion(message=construct_prompt(q))
print(oai_response.choices[0].message.content)

In [None]:
q = "who is the heir to the throne?"
oai_response = chat_completion(message=construct_prompt(q))
print(oai_response.choices[0].message.content)

## Using metadata as context

We often want to use additional metadata in our prompts to provide even more context or useful instructions to our language model. Let's look at an example where we might want to include document metadata when constructing our prompts.

First, let's ask "which is the largest Targaryen dragon?". We get the correct answer, Balerion.   

In [None]:
q = "which is the largest Targaryen dragon?"
oai_response = chat_completion(message=construct_prompt(q))
print(oai_response.choices[0].message.content)

But what if we want to add new documents to our collection, and those documents contain new or contradictory information? In that case, we'll want to include additional metadata in our prompt which the language model can use when deriving an answer.

Let's add a new document to our collection that describes a new dragon, one that is larger than Balerion. Because the binding we created earlier has its status set to ON, our new document will automatically be embedded and added to our index. This new document will be more recent than the others, as measured by the value of its `updated_at` field.

In [None]:
# add a new document
collection.add_documents([
    {"content": "Lexy was by far the largest of the Targaryen dragons, and was ridden by AGI the Conqueror."}
])

Now let's ask the same question as before, but this time we'll include the `updated_at` field in our prompt. We'll use the `return_fields` parameter to return the document's `updated_at` field along with our search results, and we'll update our `result_template` to include its value. Let's take a look at our new prompt.

In [None]:
new_result_template = \
    "[doc_id: {r[document_id]}, updated_at: {r[document.updated_at]}] {r[document.content]}"

new_prompt = construct_prompt(
    question="which is the largest Targaryen dragon?", 
    result_template=new_result_template, 
    return_fields=["document.content", "document.updated_at"]
)
print(new_prompt)

We can see that our prompt now includes the `updated_at` field for each document. Now let's update our system prompt to tell GPT-4 to use the latest document when faced with conflicting information.

In [None]:
new_system_prompt = (
    "You are an exceptionally intelligent AI assistant. Answer the following "
    "questions using the context provided. PLEASE CITE YOUR SOURCES. Be concise. "
    "If the documents provided contain conflicting information, use the most "
    "recent document as determined by the `updated_at` field."
)

Now let's ask GPT-4 again.

In [None]:
q = "which is the largest Targaryen dragon?"
oai_response = chat_completion(
    message=construct_prompt(
        question=q, 
        result_template=new_result_template, 
        return_fields=["document.content", "document.updated_at"]
    ),
    system=new_system_prompt
)
print(oai_response.choices[0].message.content)
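
Instructing the model is one way to handle conflicting documents. A complementary option is to order the context so that the most recent documents appear first. This sketch assumes that `document.updated_at` comes back as either a `datetime` or an ISO 8601 string, since this tutorial doesn't show the exact type Lexy returns. `newest_first` is hypothetical, and you could apply it to the query results before formatting them.

```python
from datetime import datetime
from typing import Dict, List, Union


def newest_first(results: List[Dict], field: str = "document.updated_at") -> List[Dict]:
    """Sort results so the most recently updated documents come first.

    All timestamps should be either timezone-aware or naive; mixing the two
    raises TypeError when they are compared.
    """
    def as_datetime(value: Union[str, datetime]) -> datetime:
        # accept datetime objects as-is; parse ISO 8601 strings otherwise
        return value if isinstance(value, datetime) else datetime.fromisoformat(value)

    return sorted(results, key=lambda r: as_datetime(r[field]), reverse=True)


dated_results = [
    {"document_id": "old", "document.updated_at": "2024-01-01T10:00:00+00:00"},
    {"document_id": "new", "document.updated_at": "2024-03-05T09:30:00+00:00"},
]
print([r["document_id"] for r in newest_first(dated_results)])  # ['new', 'old']
```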

## Real-world considerations

As mentioned earlier, this tutorial is intended to teach the basics of RAG and how it's implemented. Let's briefly review some of the simplifications we've made. Our future tutorials will cover these topics in more detail, and show how Lexy helps to address them when building real-world AI applications.

- **Dataset size**: Our [sample data](https://github.com/lexy-ai/lexy/blob/main/sample_data/documents/hotd.txt) is small, both in the number of documents and the size of each document. In fact, our dataset is so small that we don't even need to perform retrieval; we could simply include all of our documents in the prompt with each API call. But in real-world applications, we might have thousands or millions of documents, in which case we'll need to dynamically retrieve the documents that are most relevant to a particular query.
- **Document chunking**: We've used the full text of each document as context for our language model. Documents in real-world applications are typically much longer. We'll often want to break them up into smaller pieces (i.e., chunks) and use those pieces to construct more informative prompts for our language model.
- **Multimodal data**: Our documents only contain text. In practice, documents often include other types of data, such as images, audio, and video. We'll often want to embed and retrieve multimodal data, and to query for one modality using another (e.g., search for images using audio, or search for text using video).
- **File-based documents**: Our documents consist of "free-form" text. In practice, documents may be stored as external files in a variety of formats, including PDFs, Word documents, and images. We'll often want to catalogue, ingest, and process these file-based documents, and to apply different parsing logic depending on the file type or the specific application.
- **Metadata and relationships**: We've only used the `updated_at` field as an example of metadata. In the real world, document metadata will contain many more fields, including complex relationships with other documents and entities. For example, we may choose to chunk and embed a function docstring, which resides in a single file of Python code, which is part of a larger source code repository, which could be accessible to one or more organizations.
- **Retrieval methods**: We've used a simple cosine similarity search to retrieve documents. In a real-world application, we may want to use other retrieval methods, such as keyword-based ranking with BM25, or to combine several methods (i.e., hybrid search).
- **Custom transformations**: We've used the OpenAI API to transform our text documents into vector embeddings. We may want to use more advanced transformations, such as a custom (i.e., fine-tuned) embedding model, or a combination of multiple Transformer models, some of which might require running your own servers.
- **Topic relevance**: In practice, one of the most difficult aspects of this type of dynamic RAG application is knowing **_when_** to use it (i.e., which requests should trigger it) and **_how_** to use it (i.e., which template should be populated). This is especially true in cases where the language model already contains some information on the underlying topic (i.e., the information contained in our documents is part of the dataset used to train the language model). This is certainly the case with our example (GPT-4 already knows about House of the Dragon, and can answer our questions without the need to refer to our documents). We plan on discussing this topic as part of a future blog post.
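
Of these, document chunking is easy to illustrate. Below is a minimal sketch that splits text into overlapping windows of words. Real pipelines often split on tokens, sentences, or document structure instead. `chunk_words` is a hypothetical helper, not a Lexy transformer.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into chunks of at most chunk_size words, where consecutive
    chunks share `overlap` words so context isn't lost at the boundaries."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # stop once this chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks


print(chunk_words("one two three four five six seven", chunk_size=3, overlap=1))
# ['one two three', 'three four five', 'five six seven']
```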

## Next steps

In this tutorial we learned how to implement Retrieval Augmented Generation (RAG) using Lexy. Specifically, we've seen how to use Lexy to store and retrieve documents, and how to include those documents and their metadata as context for a language model like GPT-4.

While this is a simple example, the basic principles are powerful. As we'll see, they can be applied to build far more complex AI applications. In the coming examples we'll learn:

- How to parse and store custom metadata along with our documents.
- How to use Lexy to summarize documents, and then leverage those summaries to retrieve the most relevant documents.
- How to use document filters and custom Transformers to build flexible pipelines for our data.
- How to ingest and process file-based documents (including PDFs and images) for use in our AI applications.
