# LLM RAG for Beginners: A Practical Guide with Elasticsearch and OpenSearch

Have you ever tried searching for a movie quote with just a vague phrase or feeling? We've all been there, and sometimes, finding exactly what we're looking for can be tough. Retrieval-Augmented Generation (RAG) offers a more intuitive approach, allowing us to search with the fluidity of human-ish memory.

RAG blends the power of Large Language Models (LLMs) with the precision of information retrieval systems like Elasticsearch and OpenSearch. It moves beyond simple keyword matching, using LLMs to understand the nuances of our search intent and deliver relevant results along with their context.

For example, imagine trying to recall that iconic line from _The Fifth Element_ where a Mondoshawan says, "Time not important, only life important." Even if you only remember the phrase "life important," RAG can pinpoint the exact quote and provide context.

## Pre-requisites

Before we dive into building our RAG pipeline, let's get our tech stack in order. We'll be using the following:

- **Bonsai.io Sandbox:** Bonsai.io provides fully managed OpenSearch clusters, making it incredibly easy to get started without any complex installation or configuration. We'll leverage a free Bonsai Sandbox for this tutorial. You can sign up for an account and launch a cluster at [bonsai.io](https://bonsai.io).
  
> Important:
> 
> Once your Bonsai sandbox cluster is created, you'll see your credentials in the cluster overview page:
> 
> <img src="./assets/bonsai-cluster-details.png" width=800 style="margin-left: auto; margin-right: auto;">

- **Cornell Movie-Dialogs Corpus:** This rich dataset contains conversations extracted from movie scripts. We'll use this corpus to populate our OpenSearch indexes. 

> Important:
> 
> The Cornell Movie-Dialogs Corpus is part of Cornell's ConvoKit project, a toolkit for analyzing conversations. You can find the dataset and learn more about ConvoKit at [https://github.com/CornellNLP/ConvoKit](https://github.com/CornellNLP/ConvoKit). 
> 
> Download the `movie-corpus.zip` file from [this link](https://zissou.infosci.cornell.edu/convokit/datasets/movie-corpus/movie-corpus.zip) and extract it to a location that can be referenced by our code later on.

- **OpenAI Text API**: OpenAI's GPT-4o mini model is a good fit for our small, focused prompts, and is quite affordable!

> Important:
> For this tutorial, you'll need an OpenAI API key. The OpenAI API is a paid service, so the steps below may incur charges against your account.
>
> See [OpenAI's documentation](https://help.openai.com/en/articles/7039783-how-can-i-access-the-chatgpt-api) for details on how to create an OpenAI API Key and associated pricing.


## Understanding Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) enhances LLMs by connecting them to external knowledge sources. Think of it as giving your LLM a library card to access a vast collection of information, allowing it to generate responses grounded in factual data. 

But while you might remember details about all of the books you've read at the library, the LLM has a limited ability to keep information in its "working memory" (or *context*), so we need to help it by filtering the external knowledge down to what is most relevant to the task at hand.

To that end, RAG involves two key steps:

1. **Retrieval:** Finding the most relevant information from your knowledge base. 
2. **Generation:** Feeding that information to the LLM so it can generate a comprehensive response grounded in the added context.

Essentially, RAG combines the power of information retrieval with the generative capabilities of LLMs, resulting in a search experience that is both accurate and insightful.
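
To make the two steps concrete, here is a minimal, self-contained sketch that needs no search cluster or LLM. It is an illustration under stated assumptions, not the pipeline built later in this guide: the in-memory `corpus`, the term-overlap scoring in `retrieve`, and the `buildPrompt` helper are all hypothetical stand-ins for OpenSearch and the OpenAI API.

```typescript
// Toy stand-in for a search index. Only the first passage comes from this guide;
// the others are placeholders.
interface Passage {
  id: string;
  text: string;
}

// Lowercase and split on non-alphanumerics so "important." matches "important".
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

// Retrieval step: rank passages by how many distinct query terms they contain.
// (A real engine would use BM25 or similar; term overlap keeps the sketch short.)
function retrieve(query: string, corpus: Passage[], k: number): Passage[] {
  const terms = Array.from(new Set(tokenize(query)));
  return corpus
    .map((passage) => {
      const words = new Set(tokenize(passage.text));
      const overlap = terms.filter((term) => words.has(term)).length;
      return { passage, overlap };
    })
    .filter((scored) => scored.overlap > 0)
    .sort((a, b) => b.overlap - a.overlap)
    .slice(0, k)
    .map((scored) => scored.passage);
}

// Generation step (input side): the prompt an LLM would receive,
// with only the retrieved passages inlined as context.
function buildPrompt(question: string, context: Passage[]): string {
  const contextBlock = context.map((p) => `- ${p.text}`).join("\n");
  return `Answer using only this context:\n${contextBlock}\n\nQuestion: ${question}`;
}

const corpus: Passage[] = [
  { id: "q1", text: "Time not important, only life important." },
  { id: "q2", text: "Placeholder line about the weather." },
  { id: "q3", text: "Another placeholder line about life on a farm." },
];

const question = "life important";
console.log(buildPrompt(question, retrieve(question, corpus, 2)));
```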


## Setting Up Your OpenSearch Environment

Now that you have a Bonsai Sandbox cluster up and running, let's get our movie data indexed in OpenSearch. We'll be using the Cornell Movie-Dialogs Corpus, which we downloaded in our pre-requisites. But first, let's visualize how we'll organize this data.

### Understanding the Data Structure

Since this particular dataset is a bit denormalized, we'll create and use two indexes:

- **`speakers`**: Details about the speaking characters in each movie.
- **`utterances`**: A detailed index of all the conversations within the movies, line by line, with speaker and movie identified.
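
As a rough picture of what each index holds, the document shapes might be sketched as the interfaces below. The field names follow the indexing code later in this guide; the sample values are placeholders rather than real corpus records, and the assumption that `speaker` stores a speaker ID is based on the ConvoKit format.

```typescript
// One document per speaking character in the `speakers` index.
interface SpeakerDoc {
  speakerId: string;
  name: string;
  movieId: string;
  movieName: string;
  gender: string;
}

// One document per spoken line in the `utterances` index.
interface UtteranceDoc {
  id: string;
  conversationId: string;
  text: string;
  speaker: string; // assumed to hold the speakerId of whoever said the line
  movieId: string;
  replyTo: string | null; // id of the line this one responds to, if any
}

// Placeholder values for illustration only; not taken from the corpus.
const exampleSpeaker: SpeakerDoc = {
  speakerId: "u-example",
  name: "EXAMPLE CHARACTER",
  movieId: "m-example",
  movieName: "an example movie",
  gender: "?",
};

const exampleUtterance: UtteranceDoc = {
  id: "L-example-2",
  conversationId: "L-example-1",
  text: "An example line of dialogue.",
  speaker: "u-example",
  movieId: "m-example",
  replyTo: "L-example-1",
};

// The two indexes are linked by IDs instead of nested documents.
function belongsTo(utterance: UtteranceDoc, speaker: SpeakerDoc): boolean {
  return utterance.speaker === speaker.speakerId &&
    utterance.movieId === speaker.movieId;
}

console.log(belongsTo(exampleUtterance, exampleSpeaker));
```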

### Creating and Indexing the Movie Data

We'll use the OpenSearch JavaScript client and the Deno TypeScript kernel for Jupyter notebooks to interact with our OpenSearch cluster. You'll need the credentials from your Bonsai Sandbox to connect to the cluster. 

Now, let's create the index mappings for our `speakers` and `utterances` indexes:

> Tip:
>
> Remember to set the `BONSAI_CLUSTER_URL` environment variable to safely access your Bonsai Cluster's credentials within your code!

In [None]:
import { Client } from "npm:@opensearch-project/opensearch";

In [None]:
const client = new Client({
  // Deno.env.get works in the Deno kernel without relying on Node's `process` global
  node: Deno.env.get("BONSAI_CLUSTER_URL"),
});
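
Because notebooks make it easy to print values by accident, it can help to mask credentials before logging a connection URL. The helper below is a small sketch using the standard `URL` class; `redactClusterUrl` is a hypothetical name, and the example URL assumes the `user:password@host` form and does not point at a real cluster.

```typescript
// Return the URL with its password masked, so it is safer to print in a notebook.
function redactClusterUrl(rawUrl: string): string {
  const url = new URL(rawUrl);
  if (url.password) {
    url.password = "REDACTED";
  }
  return url.toString();
}

// Hypothetical URL for illustration; not a real cluster.
const exampleUrl = "https://myuser:s3cret@my-cluster.example.com";
console.log(redactClusterUrl(exampleUrl));
```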

In [None]:
const speakersIndexName = "speakers";

const speakersIndexBody = {
  settings: {
    number_of_shards: 1,
    number_of_replicas: 0,
  },
  mappings: {
    properties: {
      speakerId: { type: "keyword" },
      movieId: { type: "keyword" },
      gender: { type: "keyword" },
      name: { type: "text" },
      movieName: { type: "text" },
    },
  },
};

await client.indices.create({ index: speakersIndexName, body: speakersIndexBody });

In [None]:
const utterancesIndexName = "utterances";

const utterancesIndexBody = {
  settings: {
    number_of_shards: 1,
    number_of_replicas: 0,
  },
  mappings: {
    properties: {
        id: { type: "keyword" },
        conversationId: { type: "keyword" },
        text: { type: "text" },
        speaker: { type: "text" },
        movieId: { type: "keyword" },
        replyTo: { type: "keyword" },
    },
  },
};

await client.indices.create({ index: utterancesIndexName, body: utterancesIndexBody });

Next, we'll use the bulk API to index the movie data.

In [None]:
async function indexSpeakers(filePath: string) {
  const data = JSON.parse(await Deno.readTextFile(filePath));
  const actions = [];

  for (const [speakerId, speaker] of Object.entries(data)) {
    actions.push({ create: {} });
    actions.push({
      speakerId: speakerId,
      name: speaker.meta.character_name,
      movieId: speaker.meta.movie_idx,
      movieName: speaker.meta.movie_name,
      gender: speaker.meta.gender,
    });
  }

  await client.bulk({ index: "speakers", body: actions });
}

await indexSpeakers("./movie-data/speakers.json");
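
The bulk request body used above alternates one action line with one document line. The sketch below isolates that pattern and adds two things the notebook cell skips: splitting documents into batches, and checking the response for per-item failures, which the bulk API reports in the response body. The helper names (`toBulkBody`, `chunk`, `failedItems`) and the sample response are hypothetical and only show the shape.

```typescript
type BulkLine = Record<string, unknown>;

// Interleave one action line and one document line per document.
function toBulkBody(docs: BulkLine[]): BulkLine[] {
  const body: BulkLine[] = [];
  for (const doc of docs) {
    body.push({ create: {} });
    body.push(doc);
  }
  return body;
}

// Split documents into batches so that no single bulk request grows too large.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Minimal view of a bulk response body: `errors` flags that at least one item failed.
type BulkItem = Record<string, { status: number; error?: unknown }>;
interface BulkResponseBody {
  errors: boolean;
  items: BulkItem[];
}

// Collect the items that carry an `error` entry.
function failedItems(body: BulkResponseBody): BulkItem[] {
  return body.items.filter((item) =>
    Object.values(item).some((result) => result.error !== undefined)
  );
}

// Hand-written sample response for illustration; not real cluster output.
const sampleResponse: BulkResponseBody = {
  errors: true,
  items: [
    { create: { status: 201 } },
    { create: { status: 400, error: { type: "example_error" } } },
  ],
};

console.log(toBulkBody([{ speakerId: "u-example" }]));
console.log(failedItems(sampleResponse).length);
```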

In [None]:
import JSONL from "npm:jsonl-parse-stringify";

// Shape of one record in utterances.jsonl. The snake_case and hyphenated
// field names are assumed to follow the ConvoKit corpus format.
interface Utterance {
    id: string;
    conversation_id: string;
    text: string;
    speaker: string;
    meta: {
        movie_id: string;
        parsed: Array<{ [key: string]: any }>;
    };
    "reply-to": string | null;
    timestamp?: string | null;
    vectors: Array<any>;
}

async function indexUtterances(filePath: string, limit: number, movieIdFilter?: RegExp) {
    const data: Utterance[] = JSONL.default.parse(await Deno.readTextFile(filePath));
    const actions = [];

    // Never read past the end of the file, even if `limit` is larger than it.
    const count = Math.min(limit, data.length);

    for (let i = 0; i < count; i++) {
        const utterance = data[i];

        // Skip utterances from movies outside the requested subset.
        if (movieIdFilter && !movieIdFilter.test(utterance.meta.movie_id)) {
            continue;
        }

        actions.push({ create: {} });
        actions.push({
            id: utterance.id,
            conversationId: utterance.conversation_id,
            text: utterance.text,
            speaker: utterance.speaker,
            movieId: utterance.meta.movie_id,
            replyTo: utterance["reply-to"],
            timestamp: utterance.timestamp,
        });
    }

    await client.bulk({ index: "utterances", body: actions });
}

await indexUtterances("./movie-data/utterances.jsonl", 31500, /^m\d$/);
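
The regular expression passed to `indexUtterances` decides which movies are indexed, so it is worth checking what it matches. The short sketch below runs the same pattern against a few made-up IDs; `widerFilter` is a hypothetical alternative for a larger subset.

```typescript
// Same pattern as in the call above: "m" followed by exactly one digit.
const movieIdFilter = /^m\d$/;

// Made-up IDs for illustration.
const sampleIds = ["m0", "m5", "m9", "m10", "m123", "x5"];

// Only single-digit movie IDs survive: m0, m5, m9.
const kept = sampleIds.filter((id) => movieIdFilter.test(id));
console.log(kept);

// A hypothetical wider filter that also accepts two-digit IDs (m0 through m99).
const widerFilter = /^m\d{1,2}$/;
console.log(sampleIds.filter((id) => widerFilter.test(id)));
```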

> Note:
> 
> For the purposes of this demonstration, we're only indexing a handful of movies' utterance data, filtered by a regular expression on their corpus ID.

Once the data is indexed, we can run basic OpenSearch queries to test our setup. For example, we can search the `speakers` index by movie title, or search the `utterances` index for keywords within the dialogue.

In [None]:
const query = {
  query: {
    match: {
      text: "life important" 
    }
  }
};

const response = await client.search({
  index: "utterances",
  body: query,
});

console.log(response.body.hits);
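
Raw `hits` output can be hard to scan. As a sketch, a small helper could reduce each hit to its rank, score, movie ID, and text. The `SearchBody` type below covers only the fields used here, and the sample response is hand-written with placeholder scores and IDs, not real output from the cluster.

```typescript
// Minimal shape of the parts of a search response body used in this sketch.
interface SearchHit<T> {
  _id: string;
  _score: number | null;
  _source: T;
}

interface SearchBody<T> {
  hits: { hits: SearchHit<T>[] };
}

interface UtteranceSource {
  text: string;
  movieId: string;
}

// One line per hit: rank, relevance score, movie ID, and the utterance text.
function summarizeHits(body: SearchBody<UtteranceSource>): string[] {
  return body.hits.hits.map((hit, index) => {
    const score = (hit._score ?? 0).toFixed(2);
    return `${index + 1}. (score ${score}) [${hit._source.movieId}] ${hit._source.text}`;
  });
}

// Hand-written sample with placeholder scores and IDs.
const sampleBody: SearchBody<UtteranceSource> = {
  hits: {
    hits: [
      {
        _id: "a",
        _score: 7.25,
        _source: { text: "Time not important, only life important.", movieId: "m-example" },
      },
      {
        _id: "b",
        _score: 3.5,
        _source: { text: "A placeholder line about life.", movieId: "m-other" },
      },
    ],
  },
};

for (const line of summarizeHits(sampleBody)) {
  console.log(line);
}
```

In the notebook, the same helper could be called as `summarizeHits(response.body)`.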

## Building the RAG Pipeline

With our OpenSearch environment set up and movie data indexed, we're ready to assemble the pieces of our RAG pipeline. This involves three main steps:

1. Query Parsing with an LLM
2. Retrieving Relevant Documents
3. Generating the Response

### Step 1: Query Parsing with an LLM

The first step is to understand what the user is asking. We'll use an LLM to analyze their natural language query and extract four key pieces of information:

- **Query Category:** What type of information are they looking for? For this scenario, we're expecting a "quote recall" request, but we could expand to other categories like "significant event identification" or "plot explanation" in the future.
- **Query Content:** This is the heart of the query - the specific words or phrases the user remembers.
- **Movie**: If available, the LLM should extract the likely movie title intended.
- **Quote**: We're short-cutting a step here, since we're targeting the "quote recall" functionality, and asking the LLM to include the quote if available!

To do this, we'll guide the LLM through carefully crafted prompts. Here's the one we'll be working with. 


> You are a helpful AI assistant that can analyze search queries related to movies.  
> 
> Here's a user query: {user_query}
> 
> Based on this query, identify the following:
> 
> - **Category:** Choose from the following categories: "quote recall", "significant event", "plot explanation", "character information". If none of these fit, choose "unknown".
> - **Content:** Extract the specific phrase or words related to the identified category.
> - **Movie:** If the query contains a probable movie title, extract it into this field.
> - **Quote:** If the query contains part of a quote, line, or utterance or describes one, extract it into this field to be used in a keyword match. Don't infer what the intended quote is, but rather extract the user's intended quote query from the overall query.
>
> Provide your answer in JSON format:
> 
> {
>    "category": "...",
>    "content": "...",
>    "movie": "...",
>    "quote": "..."
> }
> 

> Tip:
>
> Try tweaking your RAG prompts for performance! Getting these right often requires iteration and continuous testing, as LLMs tend to evolve over time!

Let's see this in action with some TypeScript code:

In [None]:
import OpenAI from "https://deno.land/x/openai@v4.69.0/mod.ts";

const openai = new OpenAI({
  apiKey: Deno.env.get("OPENAI_API_KEY"), // Read from the environment; never hard-code API keys
});

In [None]:
const userQuery = "what's that line from The Fifth Element about life being important?";

In [None]:
const prompt = `
You are a helpful AI assistant that can analyze search queries related to movies.

Here's a user query: ${userQuery}

Based on this query, identify the following:

- **Category:** Choose from the following categories: "quote recall", "significant event", "plot explanation", "character information". If none of these fit, choose "unknown".
- **Content:** Extract the specific phrase or words related to the identified category. It's important *not* to try to answer the question directly, yourself. Try to only extract the intended query.
- **Movie:** If the query contains a probable movie title, extract it into this field.
- **Quote:** If the query contains part of a quote, line, or utterance or describes one, extract it into this field to be used in a keyword match. Don't infer what the intended quote is, but rather extract the user's intended quote query from the overall query.

Provide your answer in JSON format. An example response is below.

{
  "category": "...",
  "content": "...",
  "movie": "...",
  "quote": "..."
}

`;

In [None]:
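const response = await openai.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    model: 'gpt-4o-mini',
    // Ask for a bare JSON object so the reply can be passed straight to JSON.parse
    response_format: { type: 'json_object' },
});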

In [None]:
const parsedQuery = JSON.parse(response.choices[0].message.content);

In [None]:
console.log(parsedQuery); 

By accurately parsing the query, we set the stage for finding the most relevant information in our OpenSearch index.
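
The `JSON.parse` call above assumes the reply is bare, valid JSON with the expected fields. Models sometimes add surrounding text or omit a field, so a more defensive parse might look like the sketch below. `parseLlmJson` and `KNOWN_CATEGORIES` are hypothetical names, and falling back to the "unknown" category is one possible design choice, not the only one.

```typescript
interface ParsedQuery {
  category: string;
  content: string;
  movie: string;
  quote: string;
}

// Categories offered to the model in the prompt.
const KNOWN_CATEGORIES = [
  "quote recall",
  "significant event",
  "plot explanation",
  "character information",
];

function parseLlmJson(raw: string): ParsedQuery {
  const fallback: ParsedQuery = { category: "unknown", content: "", movie: "", quote: "" };

  // Keep only the outermost {...}, dropping any prose or fencing around it.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) return fallback;

  let data: Record<string, unknown>;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return fallback;
  }

  // Treat missing or non-string fields as empty strings.
  const asString = (value: unknown): string =>
    typeof value === "string" ? value.trim() : "";

  const category = asString(data.category);
  return {
    category: KNOWN_CATEGORIES.includes(category) ? category : "unknown",
    content: asString(data.content),
    movie: asString(data.movie),
    quote: asString(data.quote),
  };
}

// A reply with extra prose around the JSON still parses.
const noisyReply =
  'Here is the JSON:\n{"category": "quote recall", "content": "life being important", "movie": "The Fifth Element", "quote": "life important"}';
console.log(parseLlmJson(noisyReply));
```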

### Step 2: Retrieving Relevant Documents

Now that we understand the user's request, let's find the relevant documents in our OpenSearch index. Since we're dealing with movie *quotes*, we'll focus on the `utterances` index, but we'll also extract the most likely movie title from our `speakers` index, since it contains the `movieName` and `movieId` in its dataset.

#### Discovering our intended movie

If the LLM was able to extract a movie title from the user's query, let's use it to find the ID of the most probable movie in our corpus. This should improve our accuracy when searching for the quote!

OpenSearch's keyword search, powered by the BM25 algorithm, will help us pinpoint the best match:

In [None]:
// Holds the full search response (not a string), or null if no movie was extracted
let possibleMovieResults: any = null;

if (parsedQuery.movie && parsedQuery.movie.length > 0) {
  possibleMovieResults = await client.search({
    "_source": false,
    index: "speakers",
    size: 1,
    body: {
      fields: ["movieName", "movieId"],
      query: {
        match: {
          movieName: {
            query: parsedQuery.movie
          }
        }
      }
    }
  });
}
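
For intuition about how BM25 ranks the `match` query above, the textbook per-term score can be sketched in a few lines. This is the classic formula, not OpenSearch's exact implementation (which may differ in constant factors), and the default `k1 = 1.2` and `b = 0.75` are commonly cited values that are assumed here rather than read from the cluster. The numbers in the usage example are invented.

```typescript
interface Bm25Params {
  k1: number; // how quickly repeated occurrences of a term stop adding score
  b: number;  // how strongly long fields are penalized
}

// Inverse document frequency: rarer terms get a larger weight.
function idf(totalDocs: number, docsWithTerm: number): number {
  return Math.log(1 + (totalDocs - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
}

// Score contribution of a single query term for a single document field.
function bm25Term(
  termFreq: number,
  fieldLength: number,
  avgFieldLength: number,
  totalDocs: number,
  docsWithTerm: number,
  params: Bm25Params = { k1: 1.2, b: 0.75 },
): number {
  const lengthNorm = 1 - params.b + params.b * (fieldLength / avgFieldLength);
  const tfPart = (termFreq * (params.k1 + 1)) / (termFreq + params.k1 * lengthNorm);
  return idf(totalDocs, docsWithTerm) * tfPart;
}

// Invented numbers: the same term scores higher in a short title than a long one.
const shortTitle = bm25Term(1, 3, 4, 600, 10);
const longTitle = bm25Term(1, 8, 4, 600, 10);
console.log(shortTitle > longTitle);
```

A full query score is the sum of these per-term contributions over the query terms.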

Next, we'll try to find the target quote. If we found a `possibleMovieId` match, we'll boost any quote results that come from that movie with an OpenSearch [boolean query](https://opensearch.org/docs/latest/query-dsl/compound/bool/); otherwise, we'll fall back to a plain keyword match query:

In [None]:
// If we have a possible result, let's store it here!
let possibleMovieId: null | string = null;

if (possibleMovieResults?.body?.hits &&
  possibleMovieResults.body.hits.hits.length > 0 &&
  possibleMovieResults.body.hits.hits[0].fields.movieId &&
  possibleMovieResults.body.hits.hits[0].fields.movieId.length > 0) {
    
    possibleMovieId = possibleMovieResults.body.hits.hits[0].fields.movieId[0]
}

In [None]:
// We'll reference the possibleMovieId if available
let quoteQuery: Record<string, unknown>;

if (possibleMovieId) {
  // Require a text match, and boost hits that come from the likely movie
  quoteQuery = {
    query: {
      bool: {
        must: {
          match: {
            text: {
              query: parsedQuery.quote
            }
          }
        },
        should: {
          match: {
            movieId: {
              query: possibleMovieId
            }
          }
        },
      }
    }
  };
} else {
  // No movie to boost: fall back to a plain match on the quote text
  quoteQuery = {
    query: {
      match: {
        text: { query: parsedQuery.quote }
      }
    }
  };
}

console.log(`Our quote query: ${JSON.stringify(quoteQuery)}`);

In [None]:
let quoteResults = await client.search({
  index: "utterances",
  body: quoteQuery,
});

And, investigating our results, we should see our expected quote float up to the top of our result set!

In [None]:
console.log(quoteResults.body.hits);
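
If the expected quote does not rank first, the search can be narrowed. The sketch below builds a request body that limits the number of results and restricts hits to a movie or speaker with a `bool` filter, which excludes non-matching documents instead of just boosting matching ones. `buildRefinedQuoteSearch` and its options are hypothetical names, and the IDs in the usage example are placeholders.

```typescript
interface QuoteSearchOptions {
  quote: string;
  size?: number;      // how many hits to return
  movieId?: string;   // restrict (not just boost) to one movie
  speakerId?: string; // restrict to one character
}

function buildRefinedQuoteSearch(options: QuoteSearchOptions) {
  const filter: Array<Record<string, unknown>> = [];

  // movieId is mapped as a keyword, so an exact term filter fits.
  if (options.movieId) {
    filter.push({ term: { movieId: options.movieId } });
  }
  // speaker is mapped as text in this guide, so a match query is used instead.
  if (options.speakerId) {
    filter.push({ match: { speaker: options.speakerId } });
  }

  return {
    size: options.size ?? 10,
    query: {
      bool: {
        must: [{ match: { text: { query: options.quote } } }],
        filter,
      },
    },
  };
}

// Placeholder movie ID for illustration.
const refined = buildRefinedQuoteSearch({ quote: "life important", size: 3, movieId: "m-example" });
console.log(JSON.stringify(refined, null, 2));
```

The resulting object could be passed as the `body` of `client.search`, in the same way as `quoteQuery` above.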

> Info:
> 
> We can fine-tune the utterance search by changing the number of results (via the `size` parameter), adding filters (such as character), or combining multiple fields for a more refined search.

### Step 3: Generating the Response

This is where it all comes together. The goal is to use the retrieved documents to craft a comprehensive response for the user.

To do that, we give the LLM clear instructions along with the relevant context, using a prompt like this:

> You are a helpful AI assistant that can provide information about movies.
>
> A user is looking for a movie quote that contains the following phrase: "...". Their original query was for the content: "...".
>
> Here is the most relevant utterance: "..."
> That utterance is from the movie: "..."
>
> Based on this utterance, provide the following information to the user:
>
> - **Movie Quote:** The exact quote containing the provided phrase.
> - **Movie Title:** The title of the movie where the quote appears.
>
> Format your response in a clear, concise, cohesive natural language response, and try to reference their original query to add a justification. That is, don't just use the format of bullet-points noted above. If your response includes their quote, try to use correct grammar around the inclusion.
>
> Absolutely DO NOT include any prefixed or suffixed niceties like, "Certainly, based on your original query...". Your words will be displayed directly to the user, so think of yourself as an iframe.
>
> The only additional context that could be helpful in this scenario would already be included in this prompt.

> Important:
> 
> Although we're stopping at our first result for this demonstration, by looping in surrounding utterances, we could provide the user with additional details like what was happening when their quote was uttered in the movie!

Here's how to send this prompt to the LLM using the OpenAI API:

In [None]:
// Select our top quote result, for demonstration:
let bestQuoteResult = quoteResults.body.hits.hits[0];
console.log(bestQuoteResult);
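
The cell above assumes the search returned at least one hit; with an empty result set, `bestQuoteResult` would be `undefined` and the following cells would fail. A small guard, sketched below with hypothetical names (`pickBestHit`, `minScore`), makes that case explicit. The sample hits and the score threshold are invented for illustration.

```typescript
interface QuoteHit {
  _score: number | null;
  _source: { text: string; movieId: string };
}

// Return the top hit only if it exists and clears a minimum relevance score.
function pickBestHit(hits: QuoteHit[], minScore = 0): QuoteHit | undefined {
  const top = hits[0];
  if (!top) return undefined;
  if ((top._score ?? 0) < minScore) return undefined;
  return top;
}

// Invented hits for illustration.
const sampleHits: QuoteHit[] = [
  { _score: 6.5, _source: { text: "Time not important, only life important.", movieId: "m-example" } },
  { _score: 2.0, _source: { text: "A placeholder line.", movieId: "m-other" } },
];

const best = pickBestHit(sampleHits, 1.0);
if (best) {
  console.log(best._source.text);
} else {
  console.log("No sufficiently relevant quote found; skip the generation step.");
}
```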

In [None]:
// If the best result is from our target movie, we're good to go!
// Otherwise, we'll need to look up the movie in our speakers index.
//
// Note: data denormalization or using more complex queries would allow us to skip this step!
let bestQuoteMovie!: string;

if (possibleMovieId === bestQuoteResult._source.movieId) {
  bestQuoteMovie = possibleMovieResults.body.hits.hits[0].fields.movieName[0];
} else {
  const movieResults = await client.search({
    "_source": false,
    index: "speakers",
    size: 1,
    body: {
      fields: ["movieName", "movieId"],
      query: {
        match: {
          movieId: {
            query: bestQuoteResult._source.movieId
          }
        }
      }
    }
  });
  bestQuoteMovie = movieResults.body.hits.hits[0].fields.movieName[0];
}

In [None]:
const prompt = `
You are a helpful AI assistant that can provide information about movies.

A user is looking for a movie quote that contains the following phrase: "${parsedQuery.quote}". Their original query was for the content: "${parsedQuery.content}".

Here is the most relevant utterance: ${JSON.stringify(bestQuoteResult._source.text)}
That utterance is from the movie: "${bestQuoteMovie}"

Based on this utterance, provide the following information to the user:

- **Movie Quote:** The exact quote containing the provided phrase.
- **Movie Title:** The title of the movie where the quote appears.

Format your response in a clear, concise, cohesive natural language response, and try to reference their original query to add a justification. That is, don't just use the format of bullet-points noted above. If your response includes their quote, try to use correct grammar around the inclusion.

Absolutely DO NOT include any prefixed or suffixed niceties like, "Certainly, based on your original query...". Your words will be displayed directly to the user, so think of yourself as an iframe.

The only additional context that could be helpful in this scenario would already be included in this prompt.
`;
console.log(prompt);

In [None]:
const response = await openai.chat.completions.create({
    messages: [{ role: 'user', content: prompt }],
    model: 'gpt-4o-mini',
});

In [None]:
const generatedResponse = response.choices[0].message.content;
console.log(generatedResponse);
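
As a possible extension, the lines leading up to the quote could be added to the prompt as extra context. The sketch below assumes the utterances of the relevant conversation have already been fetched (for example, by filtering on `conversationId`) and walks backwards through their `replyTo` links. `precedingLines` is a hypothetical helper, and the sample conversation is invented except for the quote itself.

```typescript
interface ContextUtterance {
  id: string;
  text: string;
  replyTo: string | null;
}

// Follow replyTo links backwards from the target and return up to `maxSteps`
// earlier lines, oldest first.
function precedingLines(
  all: ContextUtterance[],
  targetId: string,
  maxSteps: number,
): ContextUtterance[] {
  const byId = new Map<string, ContextUtterance>();
  for (const utterance of all) {
    byId.set(utterance.id, utterance);
  }

  const chain: ContextUtterance[] = [];
  let current = byId.get(targetId);
  // The maxSteps bound also protects against accidental cycles in the data.
  while (current && current.replyTo && chain.length < maxSteps) {
    const parent = byId.get(current.replyTo);
    if (!parent) break;
    chain.unshift(parent);
    current = parent;
  }
  return chain;
}

// Invented conversation for illustration.
const conversation: ContextUtterance[] = [
  { id: "L1", text: "A placeholder opening line.", replyTo: null },
  { id: "L2", text: "A placeholder reply.", replyTo: "L1" },
  { id: "L3", text: "Time not important, only life important.", replyTo: "L2" },
];

console.log(precedingLines(conversation, "L3", 2).map((u) => u.text));
```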