# Tutorial: Retrieval Augmented Generation (RAG) with Voyage AI Embeddings and Reranking and OpenAI Chat via the Elastic open inference API

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/voyageai/updated-voyageai-elasticsearch-inference-api.ipynb)

This tutorial shows you how to compute embeddings with Voyage AI through the Elastic open inference API
and store them in Elasticsearch for efficient vector or hybrid search.
It also demonstrates reranking with Voyage AI and RAG with OpenAI.
All operations are performed with the Python Elasticsearch client.

You'll learn how to:
* create inference endpoints that use the Voyage AI service for embeddings and reranking,
* create an index mapping that uses the `semantic_text` field type for semantic search,
* rerank search results with retrievers and Voyage AI's rerank model,
* implement a RAG system that sends the retrieved recipes to OpenAI's Chat API through the `openai` Python client.

This tutorial is adapted from examples using other providers and discusses embeddings, reranking, and RAG.

The tutorial uses the [josephrmartinez/recipe-dataset](https://github.com/josephrmartinez/recipe-dataset) data set.

Refer to the official Elasticsearch and provider documentation for more details.

## 🧰 Requirements

For this example, you will need:
- Familiarity with [semantic search](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search.html)
- An Elastic deployment:
   - We'll be using [Elastic serverless](https://www.elastic.co/docs/current/serverless) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))
- A [Voyage AI account](https://www.voyageai.com/) with an API key.
- An [OpenAI account](https://openai.com/) with an API key.
- Python 3.7+.
- The Python Elasticsearch client 8.15+.

## Install and import required packages

Install the Elasticsearch and OpenAI Python clients:

In [1]:
!pip install elasticsearch==8.15 openai



Import the required packages:

In [1]:
from elasticsearch import Elasticsearch, helpers
import csv
from io import StringIO
import requests
from getpass import getpass

## Create an Elasticsearch client

Now you can instantiate the Python Elasticsearch client.

First provide your Elastic API key and serverless endpoint.
Then create a `client` object, which is an instance of the `Elasticsearch` class.

In [2]:
ELASTIC_SERVERLESS_ENDPOINT = getpass("Elastic serverless endpoint: ")
ELASTIC_API_KEY = getpass("Elastic API key: ")

# Create the client instance
client = Elasticsearch(
    # For local development
    # hosts=["http://localhost:9200"],
    hosts=[ELASTIC_SERVERLESS_ENDPOINT],
    api_key=ELASTIC_API_KEY,
    request_timeout=120,
    max_retries=10,
    retry_on_timeout=True,
)

# Confirm the client has connected
print(client.info())

Elastic serverless endpoint:  ········
Elastic API key:  ········


{'name': 'runTask-0', 'cluster_name': 'runTask', 'cluster_uuid': 'b75VHj-LRXWGjpXDqmNITg', 'version': {'number': '9.1.0-SNAPSHOT', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': 'b1d5d7aa393d6eb079f6013922fb911efed59469', 'build_date': '2025-02-21T17:33:07.095237Z', 'build_snapshot': True, 'lucene_version': '10.1.0', 'minimum_wire_compatibility_version': '8.19.0', 'minimum_index_compatibility_version': '8.0.0'}, 'tagline': 'You Know, for Search'}


## Create the text embedding inference endpoint (Voyage AI)

Create the inference endpoint first. In this example, the inference endpoint
uses Voyage AI's `voyage-3-large` model with 2048-dimensional embeddings.

In [3]:
VOYAGE_API_KEY = getpass("Voyage AI API key: ")

Voyage AI API key:  ········


In [4]:
client.inference.put(
    task_type="text_embedding",
    inference_id="voyageai_embeddings",
    body={
        "service": "voyageai",
        "service_settings": {
            "api_key": VOYAGE_API_KEY,
            "model_id": "voyage-3-large",
            "dimensions": 2048
        },
    },
)

ObjectApiResponse({'inference_id': 'voyageai_embeddings', 'task_type': 'text_embedding', 'service': 'voyageai', 'service_settings': {'model_id': 'voyage-3-large', 'rate_limit': {'requests_per_minute': 2000}}, 'chunking_settings': {'strategy': 'sentence', 'max_chunk_size': 250, 'sentence_overlap': 1}})
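
Before building an index on top of the endpoint, it can be worth checking that it returns vectors of the size you configured. The following is a minimal sketch, not part of the original notebook: `embedding_dimension` is a hypothetical helper name, and it assumes the client's `inference.inference` call returns text embeddings under a `text_embedding` key, as the 8.15 Python client does at the time of writing.

```python
def embedding_dimension(es_client, inference_id, text):
    """Embed one string through an inference endpoint and return the vector length."""
    response = es_client.inference.inference(
        inference_id=inference_id,
        task_type="text_embedding",
        input=[text],
    )
    # Assumed response shape: {"text_embedding": [{"embedding": [...]}]}
    return len(response["text_embedding"][0]["embedding"])
```

Calling `embedding_dimension(client, "voyageai_embeddings", "spicy noodles")` should report the `dimensions` value set in `service_settings` if the endpoint is configured as intended.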

In [5]:
OPENAI_API_KEY = getpass("OpenAI API key: ")

OpenAI API key:  ········


You can find your API keys in your Voyage AI and OpenAI dashboards.

## Create the index mapping

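Create the index mapping for the index that will contain the embeddings. The `infer_field` field has the `semantic_text` type and points at the `voyageai_embeddings` endpoint, and the `Title` field is copied into it with `copy_to`, so embeddings are generated from recipe titles at ingest time.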

In [6]:
client.indices.create(
    index="recipes-index-voyageai",
    mappings={
        "properties": {
            "infer_field": {
                "type": "semantic_text",
                "inference_id": "voyageai_embeddings",
            },
            "Title": {"type": "text", "copy_to": "infer_field"},
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'recipes-index-voyageai'})
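
The mapping above embeds only the `Title` field. If you wanted other columns to be searchable semantically as well, one option is to generate the mapping from a list of source fields. This is a sketch under assumptions: `semantic_mapping` is a hypothetical helper, and copying `Ingredients` is an illustration rather than something the original notebook does.

```python
def semantic_mapping(inference_id, source_fields, semantic_field="infer_field"):
    """Build index mappings that copy several text fields into one semantic_text field."""
    properties = {
        semantic_field: {"type": "semantic_text", "inference_id": inference_id}
    }
    for name in source_fields:
        # Each source field stays a regular text field and feeds the semantic field
        properties[name] = {"type": "text", "copy_to": semantic_field}
    return {"properties": properties}


mapping = semantic_mapping("voyageai_embeddings", ["Title", "Ingredients"])
```

The result can be passed as `mappings=mapping` to `client.indices.create`. Keep in mind that embedding more text per document means more tokens sent to Voyage AI during ingestion.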

## Prepare data and ingest documents
The tutorial uses the [josephrmartinez/recipe-dataset](https://github.com/josephrmartinez/recipe-dataset) data set.

In [7]:
url = "https://raw.githubusercontent.com/josephrmartinez/recipe-dataset/main/13k-recipes.csv"

# Fetch the CSV data from the URL
response = requests.get(url)
response.raise_for_status()  # Ensure we notice bad responses

file = StringIO(
    "ID" + response.text
)  # cast the csv String to a file (the ID field name is missing for this dataset)
reader = csv.DictReader(file)  # load the data as a dict


# Prepare the documents to be indexed
documents = []
for i, line in enumerate(reader):
    if i % 1000 == 999:
        print(i, "of", 13000)
        print(helpers.bulk(client, documents))  # Use the bulk endpoint to index
        print("last doc:", documents[-1])
        documents = []
    documents.append(
        {
            "_index": "recipes-index-voyageai",
            "_source": line,
        }
    )


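# Index the final partial batch that the loop above leaves unflushed
if documents:
    print(helpers.bulk(client, documents))

print("Data ingestion completed, text embeddings generated!")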

999 of 13000
(999, [])
last doc: {'_index': 'recipes-index', '_source': {'ID': '998', 'Title': "Mr. Tingles' Punch", 'Ingredients': "['1 (750 ml) bottle light rum', '2 tablespoons Sichuan peppercorns', '25 ounces pomegranate juice', '8 1/2 ounces fresh lemon juice', '8 1/2 ounces 1:1 simple syrup (see note)', '4 ounces water', 'GARNISH: ice block, about 20 lemon wheels, 1/4 cup pomegranate seeds, and 1 tablespoon each black and pink peppercorns (optional)']", 'Instructions': 'At least 24 hours before you plan to serve the punch, fill a Tupperware or cake pan with water and freeze to make an ice block that will fit in your serving vessel, or make several trays of large ice cubes.\nMeanwhile, make the infused rum: Carefully spoon Sichuan peppercorns directly into the bottle of rum, using a funnel if desired. Reseal the bottle and let sit at room temperature for 24 hours, jostling occasionally to move the peppercorns around. Strain the infused rum through a fine-mesh strainer and discard 
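
One way to make sure every row reaches the bulk helper, including a final partial batch, is to separate batching from indexing. The sketch below uses two hypothetical helpers, `batched` and `to_actions`, that are not part of the original notebook; they only use the standard library and can be tried without a cluster.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of at most `size` items until the iterable is exhausted."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def to_actions(rows, index_name):
    """Wrap each CSV row in the action format expected by helpers.bulk."""
    for row in rows:
        yield {"_index": index_name, "_source": row}
```

With these in place, the ingestion loop could become `for batch in batched(to_actions(reader, "recipes-index-voyageai"), 1000): helpers.bulk(client, batch)`, and the last short batch is sent like any other.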

## Add the rerank inference endpoint (Voyage AI)

To improve the ordering of the search results, use
Voyage AI's rerank model through the
inference API to provide a more precise semantic reranking.

Create an inference endpoint with your Voyage AI API key and the name of the rerank model as
the `model_id` (for example, `rerank-2`).

In [8]:
client.inference.put(
    task_type="rerank",
    inference_id="voyageai_rerank",
    body={
        "service": "voyageai",
        "service_settings": {
            "api_key": VOYAGE_API_KEY,
            "model_id": "rerank-2",
        },
        "task_settings": {"top_k": 100, "return_documents": True},
    },
)

ObjectApiResponse({'inference_id': 'voyageai_rerank', 'task_type': 'rerank', 'service': 'voyageai', 'service_settings': {'model_id': 'rerank-2', 'rate_limit': {'requests_per_minute': 2000}}, 'task_settings': {'top_k': 100, 'return_documents': True}})

## Semantic search with reranking

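Start by defining a function that runs a semantic query against `infer_field` and then reranks the top 100 hits on their `Title` field with the `voyageai_rerank` endpoint: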

In [9]:
def semantic_search_with_reranking(query):
    return client.search(
        index="recipes-index-voyageai",
        retriever={
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {"semantic": {"field": "infer_field", "query": query}}
                    }
                },
                "field": "Title",
                "inference_id": "voyageai_rerank",
                "inference_text": query,
                "rank_window_size": 100,
            }
        },
    )
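
To see what reranking changes, it can help to build the retriever body separately so the same query can run with and without the reranker. This is a minimal sketch: `build_retriever` is a hypothetical helper, while the field and endpoint names are the ones used in this tutorial.

```python
def build_retriever(query, rerank=True, window=100):
    """Return a retriever body for client.search, optionally wrapped in a reranker."""
    standard = {
        "standard": {
            "query": {"semantic": {"field": "infer_field", "query": query}}
        }
    }
    if not rerank:
        return standard
    return {
        "text_similarity_reranker": {
            "retriever": standard,
            "field": "Title",
            "inference_id": "voyageai_rerank",
            "inference_text": query,
            "rank_window_size": window,
        }
    }
```

Passing `retriever=build_retriever(query, rerank=False)` to `client.search` returns the plain semantic ranking, which you can compare against the reranked order.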

In [10]:
noodles = semantic_search_with_reranking("best easy spicy noodles")

## Extract the data

In [11]:
def pretty_print_recipe(recipes):
    hits = recipes["hits"]["hits"]
    if len(hits) > 0:
        for hit in hits:
            source = hit["_source"]
            print(
                f"Title: {source['Title']};\t Ingredients: {source['Ingredients']} ;\t Instructions: {source['Instructions']}\n"
            )
    else:
        print("No recipes found.")

In [12]:
pretty_print_recipe(noodles)

Title: Liu Shaokun's Spicy Buckwheat Noodles with Chicken;	 Ingredients: ['3 cups chicken broth or water (24 fl oz)', '1 lb skinless boneless chicken breast halves (2)', '1/2 lb dried buckwheat noodles such as soba noodles', '1 tablespoon peanut oil', '3 tablespoons Chinese black vinegar', '1 tablespoon light soy sauce', '1 tablespoon dark soy sauce', '1 tablespoon chile oil containing sesame oil (such as Chiu Chow Chili Oil from Lee Kum Kee) plus some of sediment from jar', '2 garlic cloves, minced', '1/2 teaspoon sugar', '1/8 teaspoon salt', '3 scallions (green parts only), thinly sliced', '2 tablespoons soy nuts (roasted salted soybeans)'] ;	 Instructions: Bring broth to a simmer in a 3-quart saucepan, then add chicken and simmer, uncovered, 6 minutes. Remove pan from heat and cover, then let stand until chicken is cooked through, about 15 minutes. Transfer chicken to a plate and cool at least 10 minutes, reserving broth for another use.
While chicken is poaching, bring 4 quarts sal
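
Full recipes make for long output. When you only want to check the ranking, a compact summary may be easier to read. The following sketch assumes a hypothetical helper, `summarize_hits`, that reads the same response structure as `pretty_print_recipe` and tolerates missing fields.

```python
def summarize_hits(response, limit=5, width=80):
    """Return one short line per hit: rank, title, score, and truncated ingredients."""
    lines = []
    for rank, hit in enumerate(response["hits"]["hits"][:limit], start=1):
        source = hit.get("_source", {})
        title = source.get("Title", "(untitled)")
        ingredients = str(source.get("Ingredients", ""))
        if len(ingredients) > width:
            # Keep the line at most `width` characters of ingredient text
            ingredients = ingredients[: width - 3] + "..."
        lines.append(f"{rank}. {title} (score={hit.get('_score')}): {ingredients}")
    return lines
```

For example, `print("\n".join(summarize_hits(noodles)))` lists the top five reranked recipes on one line each.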

## Retrieval Augmented Generation (RAG) with OpenAI Chat API

Now use the OpenAI Chat API to generate a response based on the retrieved documents. The example below passes only the top-ranked recipe to the model, together with the user's question:

In [13]:
from openai import OpenAI

openai_client = OpenAI(api_key=OPENAI_API_KEY)

query = "How do I make spicy noodles?"
search_results = semantic_search_with_reranking(query)

summary = openai_client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "you are a personal assistant helping to explain the best recipes for users based on their query and the supplied recipes (separated by \n\n). You must pick one recipe which best fits the query, and print out the title, ingredients, and instructions in an easy to understand way. If none of the recipes fit the query, supply a query the user might be able to use to find a recipe. Feel free to provide comments about the recipe that are related to the query, including alternative query."},
        {
            "role": "user",
            # Spaces keep the query from running into the surrounding text
            "content": "Answer the following question: "
            + query
            + " by using the following text: "
            + str(search_results['hits']['hits'][0]["_source"]),
        },
    ],
)

print(summary.choices[0].message.content)

The recipe that best fits your query for making spicy noodles is "Liu Shaokun's Spicy Buckwheat Noodles with Chicken." Here's how you can make it:

**Title**: Liu Shaokun's Spicy Buckwheat Noodles with Chicken

**Ingredients**:
- 3 cups chicken broth or water (24 fl oz)
- 1 lb skinless boneless chicken breast halves (2)
- 1/2 lb dried buckwheat noodles such as soba noodles
- 1 tablespoon peanut oil
- 3 tablespoons Chinese black vinegar
- 1 tablespoon light soy sauce
- 1 tablespoon dark soy sauce
- 1 tablespoon chile oil containing sesame oil (such as Chiu Chow Chili Oil from Lee Kum Kee), plus some of the sediment from the jar
- 2 garlic cloves, minced
- 1/2 teaspoon sugar
- 1/8 teaspoon salt
- 3 scallions (green parts only), thinly sliced
- 2 tablespoons soy nuts (roasted salted soybeans)

**Instructions**:
1. Bring the chicken broth to a simmer in a 3-quart saucepan. Add chicken and simmer, uncovered, for 6 minutes. Remove the pan from heat and cover, letting it stand until the chick
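
The system prompt describes several recipes separated by blank lines, while the call above supplies only the first hit. If you would rather let the model choose among the top results, the context could be assembled as follows. This is a sketch under assumptions: `build_context` and `build_user_message` are hypothetical helpers, and the choice of three recipes is arbitrary.

```python
def build_context(response, top_k=3, fields=("Title", "Ingredients", "Instructions")):
    """Join the top_k hits into one string, with recipes separated by blank lines."""
    blocks = []
    for hit in response["hits"]["hits"][:top_k]:
        source = hit["_source"]
        blocks.append(
            "\n".join(f"{field}: {source.get(field, '')}" for field in fields)
        )
    return "\n\n".join(blocks)


def build_user_message(query, context):
    """Combine the user's question and the retrieved recipes into one prompt."""
    return (
        f"Answer the following question: {query}\n\n"
        f"Use only the following recipes:\n\n{context}"
    )
```

The result of `build_user_message(query, build_context(search_results))` can replace the `content` of the user message. Sending more recipes increases the prompt size, so `top_k` is worth keeping small.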

## Clean up resources

Delete the index and then the inference endpoints created in this tutorial.

In [14]:
client.indices.delete(index="recipes-index-voyageai")
client.inference.delete(inference_id="voyageai_embeddings")
client.inference.delete(inference_id="voyageai_rerank")


ObjectApiResponse({'acknowledged': True, 'pipelines': [], 'indexes': []})
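
If the notebook is run more than once, some of these resources may already be gone and the delete calls will raise an error. A minimal sketch of a repeatable cleanup follows; `cleanup` is a hypothetical helper, and it relies on the 8.x Python client's `options(ignore_status=404)` to skip resources that do not exist.

```python
def cleanup(es_client, index_name, inference_ids):
    """Delete the index first, then each inference endpoint, ignoring 404 responses."""
    quiet = es_client.options(ignore_status=404)
    quiet.indices.delete(index=index_name)
    for inference_id in inference_ids:
        quiet.inference.delete(inference_id=inference_id)
```

For this tutorial the call would be `cleanup(client, "recipes-index-voyageai", ["voyageai_embeddings", "voyageai_rerank"])`.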