# Tutorial: Retrieval Augmented Generation (RAG) with Cohere via the Elastic open inference API

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/cohere/updated-cohere-elasticsearch-inference-api.ipynb)

This tutorial shows you how to compute embeddings with Cohere using the
Elastic open inference API and store them for efficient vector or hybrid
search in Elasticsearch. It uses the Python Elasticsearch client to perform
the operations.

You'll learn how to:
* create inference endpoints to use the Cohere service
* create index mappings to use semantic search
* rerank with retrievers using Cohere's rerank model
* implement a RAG system with Cohere's Chat API

This tutorial is based on [a blog post from April 2024](https://www.elastic.co/search-labs/blog/elasticsearch-cohere-rerank) discussing rerank and RAG.

The tutorial uses the [josephrmartinez/recipe-dataset](https://github.com/josephrmartinez/recipe-dataset) data set.

Refer to [Cohere's tutorial](https://docs.cohere.com/docs/elasticsearch-and-cohere) for an example using a different data set.

## 🧰 Requirements

For this example, you will need:
- Familiarity with [semantic search](https://www.elastic.co/guide/en/elasticsearch/reference/current/semantic-search.html)
- An Elastic deployment:
   - We'll be using [Elastic serverless](https://www.elastic.co/docs/current/serverless) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))
   
- A paid [Cohere account](https://cohere.com/), because API usage on the Cohere free trial is limited

- Python 3.7+

- Python Elasticsearch client 8.15+

## Install and import required packages

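Install the Python Elasticsearch client. Cohere is reached through the inference API, so this tutorial installs no separate Cohere package: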

In [None]:
!pip install elasticsearch==8.15

Import the required packages:

In [1]:
from elasticsearch import Elasticsearch, helpers
import csv
from io import StringIO
import requests
from getpass import getpass

## Create an Elasticsearch client

Now you can instantiate the Python Elasticsearch client.

First provide your API key and Serverless endpoint.
Then create a `client` object, which is an instance of the `Elasticsearch` class.

In [2]:
ELASTIC_SERVERLESS_ENDPOINT = getpass("Elastic serverless endpoint: ")
ELASTIC_API_KEY = getpass("Elastic API key: ")

# Create the client instance
client = Elasticsearch(
    # For local development
    # hosts=["http://localhost:9200"]
    hosts=[ELASTIC_SERVERLESS_ENDPOINT],
    api_key=ELASTIC_API_KEY,
    request_timeout=120,
    max_retries=10,
    retry_on_timeout=True,
)

# Confirm the client has connected
print(client.info())

{'name': 'serverless', 'cluster_name': 'e13f203a74d34057a6079bb6ad73afb9', 'cluster_uuid': 't5Vrz6xHRgeDfV0f5t_vqQ', 'version': {'number': '8.11.0', 'build_flavor': 'serverless', 'build_type': 'docker', 'build_hash': '00000000', 'build_date': '2023-10-31', 'build_snapshot': False, 'lucene_version': '9.7.0', 'minimum_wire_compatibility_version': '8.11.0', 'minimum_index_compatibility_version': '8.11.0'}, 'tagline': 'You Know, for Search'}


## Create the text embedding inference endpoint

Create the inference endpoint first. In this example, the inference endpoint 
uses Cohere's `embed-english-v3.0` model and the `embedding_type` is set to
`byte`.

In [22]:
COHERE_API_KEY = getpass("Cohere API key: ")

In [24]:
client.inference.put(
    task_type="text_embedding",
    inference_id="cohere_embeddings",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "embed-english-v3.0",
            "embedding_type": "byte",
        },
    },
)

ObjectApiResponse({'model_id': 'cohere_embeddings', 'inference_id': 'cohere_embeddings', 'task_type': 'text_embedding', 'service': 'cohere', 'service_settings': {'similarity': 'dot_product', 'dimensions': 1024, 'model_id': 'embed-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}, 'embedding_type': 'byte'}, 'task_settings': {}})

You can find your API keys in your Cohere dashboard under the
[API keys section](https://dashboard.cohere.com/api-keys).
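
To make the `byte` setting more concrete, the sketch below compares the raw size of one 1024-dimension vector stored as 32-bit floats with the same vector stored as signed 8-bit integers, and computes a dot product on two made-up byte vectors. It covers only the raw vector values. How Elasticsearch stores and scores vectors internally is not modeled here, and `dot_product` is an illustrative helper, not part of any library.

```python
import struct

DIMENSIONS = 1024  # reported in the endpoint response above

# Raw size of one vector: 4 bytes per float32 value, 1 byte per int8 value
float_vector_bytes = struct.calcsize("<f") * DIMENSIONS
byte_vector_bytes = struct.calcsize("<b") * DIMENSIONS

print(float_vector_bytes, byte_vector_bytes)  # 4096 1024


def dot_product(a, b):
    """Dot product of two equal-length integer vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Made-up 4-dimension byte vectors, each value in the int8 range [-128, 127]
query_vector = [29, -29, -30, -59]
doc_vector = [31, -48, -71, 5]
print(dot_product(query_vector, doc_vector))  # 4126
```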

## Create the index mapping

Create the index mapping for the index that will contain the embeddings.

In [5]:
client.indices.create(
    index="recipes-index",
    mappings={
        "properties": {
            "infer_field": {
                "type": "semantic_text",
                "inference_id": "cohere_embeddings",
            },
            "Title": {"type": "text", "copy_to": "infer_field"},
        }
    },
)

ObjectApiResponse({'acknowledged': True, 'shards_acknowledged': True, 'index': 'recipes-index'})
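
If you want to experiment with which fields feed the `semantic_text` field, a small builder can generate the mapping. This is a sketch only: `build_recipe_mappings` is a hypothetical helper, and copying fields other than `Title` is an assumption that this tutorial does not test.

```python
def build_recipe_mappings(inference_id, copied_fields):
    """Build index mappings that copy each text field into one semantic_text field."""
    properties = {
        "infer_field": {"type": "semantic_text", "inference_id": inference_id},
    }
    for field in copied_fields:
        # Each listed field stays searchable as text and is also copied for embedding
        properties[field] = {"type": "text", "copy_to": "infer_field"}
    return {"properties": properties}


# Reproduces the mapping used in this tutorial
mappings = build_recipe_mappings("cohere_embeddings", ["Title"])
print(mappings)
```

The returned dictionary has the same shape as the `mappings` argument passed to `client.indices.create` above.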

## Prepare data and ingest documents
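
Download the CSV file from the [josephrmartinez/recipe-dataset](https://github.com/josephrmartinez/recipe-dataset) repository and index it in batches with the bulk helper. The name of the first column is missing from this file, so the code prepends `ID` to the CSV text before parsing it.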

In [6]:
url = "https://raw.githubusercontent.com/josephrmartinez/recipe-dataset/main/13k-recipes.csv"

# Fetch the CSV data from the URL
response = requests.get(url)
response.raise_for_status()  # Ensure we notice bad responses

file = StringIO(
    "ID" + response.text
)  # cast the csv String to a file (the ID field name is missing for this dataset)
reader = csv.DictReader(file)  # load the data as a dict


# Prepare the documents to be indexed
documents = []
for i, line in enumerate(reader):
    if i % 1000 == 999:
        print(i, "of", 13000)
        print(helpers.bulk(client, documents))  # Use the bulk endpoint to index
        print("last doc:", documents[-1])
        documents = []
    documents.append(
        {
            "_index": "recipes-index",
            "_source": line,
        }
    )

# Index the documents left over after the last full batch
if documents:
    print(helpers.bulk(client, documents))

print("Data ingestion completed, text embeddings generated!")

999 of 13000
(999, [])
last doc: {'_index': 'recipes-index', '_source': {'ID': '998', 'Title': "Mr. Tingles' Punch", 'Ingredients': "['1 (750 ml) bottle light rum', '2 tablespoons Sichuan peppercorns', '25 ounces pomegranate juice', '8 1/2 ounces fresh lemon juice', '8 1/2 ounces 1:1 simple syrup (see note)', '4 ounces water', 'GARNISH: ice block, about 20 lemon wheels, 1/4 cup pomegranate seeds, and 1 tablespoon each black and pink peppercorns (optional)']", 'Instructions': 'At least 24 hours before you plan to serve the punch, fill a Tupperware or cake pan with water and freeze to make an ice block that will fit in your serving vessel, or make several trays of large ice cubes.\nMeanwhile, make the infused rum: Carefully spoon Sichuan peppercorns directly into the bottle of rum, using a funnel if desired. Reseal the bottle and let sit at room temperature for 24 hours, jostling occasionally to move the peppercorns around. Strain the infused rum through a fine-mesh strainer and discard 
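
The header fix and the batching logic can be tried offline on a few rows before running the full ingest. The sketch below uses a made-up three-row CSV and two hypothetical helpers, `read_rows` and `batched_actions`; it needs no Elasticsearch connection.

```python
import csv
from io import StringIO

# Made-up sample that mimics the dataset: the first header cell is empty
sample_csv = ",Title,Ingredients\n0,Soup,water\n1,Salad,lettuce\n2,Toast,bread\n"


def read_rows(csv_text):
    """Parse CSV text whose first column header is missing, naming it "ID"."""
    return list(csv.DictReader(StringIO("ID" + csv_text)))


def batched_actions(rows, index, batch_size):
    """Yield lists of bulk actions, including the final partial batch."""
    batch = []
    for row in rows:
        batch.append({"_index": index, "_source": row})
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush whatever is left after the last full batch
        yield batch


rows = read_rows(sample_csv)
batches = list(batched_actions(rows, "recipes-index", batch_size=2))
print([len(b) for b in batches])  # [2, 1]
```

Each yielded batch has the same shape as the `documents` list in the cell above, so it could be passed to `helpers.bulk(client, batch)`.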

## Add the rerank inference endpoint

To combine the results more effectively, use 
[Cohere's Rerank v3](https://docs.cohere.com/docs/rerank-2) model through the
inference API to provide a more precise semantic reranking of the results.

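Create an inference endpoint with your Cohere API key and the model name as
the `model_id` (`rerank-english-v3.0` in this example). The `task_settings`
ask for the top 100 results (`top_n`) and for the document text to be returned
with each result (`return_documents`).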

In [25]:
client.inference.put(
    task_type="rerank",
    inference_id="cohere_rerank",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "rerank-english-v3.0",
        },
        "task_settings": {"top_n": 100, "return_documents": True},
    },
)

ObjectApiResponse({'model_id': 'cohere_rerank', 'inference_id': 'cohere_rerank', 'task_type': 'rerank', 'service': 'cohere', 'service_settings': {'model_id': 'rerank-english-v3.0', 'rate_limit': {'requests_per_minute': 10000}}, 'task_settings': {'top_n': 100, 'return_documents': True}})
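
As a rough mental model of reranking, the sketch below scores each candidate against the query, sorts by score, and keeps the `top_n` best. The scorer `toy_relevance` is a deliberately naive stand-in based on word overlap; it is not how Cohere's rerank model works, and both function names are hypothetical.

```python
def toy_relevance(query, text):
    """Stand-in scorer: fraction of query words found in the text (not Cohere's model)."""
    query_words = set(query.lower().split())
    if not query_words:
        return 0.0
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words)


def rerank(query, candidates, top_n):
    """Score every candidate against the query and keep the top_n best."""
    scored = [(toy_relevance(query, text), text) for text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


candidates = [
    "Chocolate Cake",
    "Spicy Sesame Noodles",
    "Easy Spicy Noodles",
    "Noodles with Butter",
]
for score, title in rerank("easy spicy noodles", candidates, top_n=2):
    print(round(score, 2), title)
```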

## Semantic search with reranking

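Start by defining a function that runs a semantic query against `infer_field`, then reranks the top 100 results on the `Title` field with the `cohere_rerank` endpoint.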

In [8]:
def semantic_search_with_reranking(query):
    return client.search(
        index="recipes-index",
        retriever={
            "text_similarity_reranker": {
                "retriever": {
                    "standard": {
                        "query": {"semantic": {"field": "infer_field", "query": query}}
                    }
                },
                "field": "Title",
                "inference_id": "cohere_rerank",
                "inference_text": query,
                "rank_window_size": 100,
            }
        },
    )

In [9]:
noodles = semantic_search_with_reranking("best easy spicy noodles")
print(noodles)

{'took': 594, 'timed_out': False, '_shards': {'total': 5, 'successful': 5, 'skipped': 0, 'failed': 0}, 'hits': {'total': {'value': 75, 'relation': 'eq'}, 'max_score': 0.5445592, 'hits': [{'_index': 'recipes-index', '_id': 'uMGue5EBXCyFCBTsEnIC', '_score': 0.5445592, '_rank': 1, '_ignored': ['Cleaned_Ingredients.keyword', 'Ingredients.keyword', 'Instructions.keyword'], '_source': {'infer_field': {'inference': {'inference_id': 'cohere_embeddings', 'model_settings': {'task_type': 'text_embedding', 'dimensions': 1024, 'similarity': 'dot_product', 'element_type': 'byte'}, 'chunks': [{'text': 'Spicy Sesame Noodles with Chopped Peanuts and Thai Basil', 'embeddings': [29, -29, -30, -59, -35, -20, -19, 3, -45, 12, -1, -1, -34, -43, 9, -58, 29, -37, -19, -13, 3, -15, 3, 3, 31, -48, -71, 5, -37, -37, -2, 59, 51, 81, -105, 67, -64, 47, -5, 29, 61, -35, -32, 28, -43, -25, 47, -6, 17, -56, 5, 13, 10, -86, -44, -38, 62, 62, 74, -91, -128, -31, 68, -14, 19, -4, -19, 10, 7, 29, 54, -34, 21, -8, -2, -11
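
The raw response is dominated by the embedding values, so a compact summary is easier to read. The sketch below pulls the rank, score, and title out of each hit. `summarize_hits` is a hypothetical helper, and `sample_response` is a trimmed-down dictionary shaped like the output above, with the embeddings left out.

```python
def summarize_hits(response):
    """Return (rank, score, title) tuples from a search response, skipping embeddings."""
    return [
        (hit.get("_rank"), hit["_score"], hit["_source"]["Title"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed-down response shaped like the output above (embeddings omitted)
sample_response = {
    "hits": {
        "hits": [
            {
                "_score": 0.5445592,
                "_rank": 1,
                "_source": {
                    "Title": "Spicy Sesame Noodles with Chopped Peanuts and Thai Basil"
                },
            }
        ]
    }
}
print(summarize_hits(sample_response))
```

The same function should work on the `noodles` response from the cell above, since it only uses dictionary-style access.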

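## Extract the data

The raw response includes the embeddings, so define two helpers: one prints the title, ingredients, and instructions of each hit, and the other returns them as strings for use in the RAG prompt.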

In [10]:
def pretty_print_recipe(recipes):
    hits = recipes["hits"]["hits"]
    if len(hits) > 0:
        for hit in hits:
            source = hit["_source"]
            print(
                "Title:",
                source["Title"] + ";\t Ingredients:",
                source["Ingredients"],
                ";\t Instructions:",
                source["Instructions"],
            )
            print()
    else:
        print("No hits")


def extract_recipe_as_string(recipes):
    # Build one "Title; Ingredients; Instructions" string per hit
    hits = recipes["hits"]["hits"]
    recipe_strings = []
    for hit in hits:
        source = hit["_source"]
        recipe_strings.append(
            "Title: "
            + source["Title"]
            + ";\t Ingredients: "
            + source["Ingredients"]
            + ";\t Instructions: "
            + source["Instructions"]
        )
    # An empty list means the search returned no hits
    return recipe_strings

In [43]:
noodles = semantic_search_with_reranking("best easy spicy noodles")
pretty_print_recipe(noodles)

Title: Spicy Sesame Noodles with Chopped Peanuts and Thai Basil;	 Ingredients: ['1 tablespoon peanut oil', '2 tablespoons minced peeled fresh ginger', '2 garlic cloves, minced', '3 tablespoons Asian sesame oil', '2 tablespoons soy sauce', '2 tablespoons balsamic vinegar', '1 1/2 tablespoons sugar', '1 tablespoon (or more) hot chili oil*', '1 1/2 teaspoons salt', '1 pound fresh Chinese egg noodles or fresh angel hair pasta', '12 green onions (white and pale green parts only), thinly sliced', '1/2 cup coarsely chopped roasted peanuts', '1/4 cup thinly sliced fresh Thai basil leaves', '*Available in the Asian foods section of many supermarkets and at Asian markets.'] ;	 Instructions: Heat peanut oil in small skillet over medium heat. Add ginger and garlic; sauté 1 minute. Transfer to large bowl. Add next 6 ingredients; whisk to blend.
Place noodles in sieve over sink. Separate noodles with fingers and shake to remove excess starch. Cook in large pot of boiling salted water until just tend

## Retrieval Augmented Generation (RAG) with Cohere and Elasticsearch

RAG is a method for generating text using additional information fetched from an
external data source. With the ranked results, you can build a RAG system on
top of what you created with 
[Cohere's Chat API](https://docs.cohere.com/docs/chat-api).

First create a completion inference endpoint that uses Cohere's generative
model [Command R+](https://docs.cohere.com/docs/command-r-plus).

Then pass the query and the retrieved documents to that endpoint to receive a
grounded response, and print it out.

In [44]:
client.inference.put(
    task_type="completion",
    inference_id="cohere_completion",
    body={
        "service": "cohere",
        "service_settings": {
            "api_key": COHERE_API_KEY,
            "model_id": "command-r-plus",
        },
    },
)

ObjectApiResponse({'model_id': 'cohere_completion', 'inference_id': 'cohere_completion', 'task_type': 'completion', 'service': 'cohere', 'service_settings': {'model_id': 'command-r-plus', 'rate_limit': {'requests_per_minute': 10000}}, 'task_settings': {}})

In [45]:
def RAG(query):
    prompt_parts = [
        """Context: you are a personal assistant helping to explain the best recipes for users based on their query and the supplied recipes (separated by \n\n).
                You must pick one recipe which best fits the query, and print out the title, ingredients, and instructions in an easy to understand way.
                If none of the recipes fit the query, supply a query the user might be able to use to find a recipe.
                Feel free to provide comments about the recipe that are related to the query, including alternative queries."""
    ]

    prompt_parts.append("Query:" + str(query))

    # Use semantic search with our previously configured reranker to find our small set of matching recipes
    recipes = extract_recipe_as_string(semantic_search_with_reranking(query))
    if len(recipes) > 0:
        # If at least one recipe matched our query, combine them into a single string, with each recipe separated by two new lines
        prompt_parts.append("Recipes:" + "\n\n".join(recipes))

        # Combine the Context, Query, and Recipes sections (each separated by three new lines) into a single string
        input_as_string = "\n\n\n".join(prompt_parts)

        # Pass the full combined instructions and recipes to the Command R+ model
        chat_completion = client.inference.inference(
            inference_id="cohere_completion", input=input_as_string
        )

        # Print the response, which should contain our recipe
        print(chat_completion["completion"][0]["result"])
    else:
        print("No hits")
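
The prompt layout used by `RAG` can be checked without calling the model. The sketch below rebuilds the same three-section structure as a pure function. `build_prompt` and its `max_recipes` cap are hypothetical additions for illustration, and `INSTRUCTIONS` is a shortened stand-in for the full context text.

```python
# Shortened stand-in for the full context text used in the tutorial
INSTRUCTIONS = "Context: pick the one recipe that best fits the query."


def build_prompt(query, recipes, max_recipes=3):
    """Join instructions, query, and at most max_recipes recipes into one prompt."""
    if not recipes:
        return None  # nothing to ground the answer on
    sections = [
        INSTRUCTIONS,
        "Query: " + str(query),
        # Recipes are separated by two new lines, sections by three
        "Recipes: " + "\n\n".join(recipes[:max_recipes]),
    ]
    return "\n\n\n".join(sections)


prompt = build_prompt("best easy spicy noodles", ["Title: A", "Title: B"], max_recipes=1)
print(prompt)
```

Capping the number of recipes is one possible way to keep the prompt short; whether it helps answer quality is not something this tutorial measures.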

In [46]:
RAG("best easy spicy noodles")

Here is a recipe that fits your query:

## Spicy Soba Noodles with Shiitakes and Cabbage

### Ingredients:
- 1/3 cup water
- 1/3 cup soy sauce
- 2 to 3 teaspoons Korean hot-pepper paste (gochujang)
- 1 tablespoon packed brown sugar
- 3 tablespoons sesame seeds
- 1/4 cup vegetable oil
- 2 tablespoons finely chopped peeled ginger
- 1 tablespoon finely chopped garlic
- 10 oz fresh shiitake mushrooms, stemmed and thinly sliced
- 1 1/4 pound Napa cabbage, thinly sliced (about 8 cups)
- 6 scallions, thinly sliced
- 8 to 9 ounces soba (buckwheat noodles)
- 1 cup frozen shelled edamame

### Instructions:
1. Stir together all sauce ingredients until the brown sugar is dissolved, and set aside.
2. Toast the sesame seeds in a dry skillet over medium heat until pale golden, then transfer to a bowl.
3. Heat oil in the skillet over medium-high heat, then add ginger and garlic, stirring until fragrant (about 30 seconds).
4. Add shiitake mushrooms and sauté until tender and browned (about 6 minutes).
