# Tutorial: Using Cohere with Elasticsearch

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/integrations/cohere/cohere-elasticsearch.ipynb)

This tutorial shows you how to compute embeddings with
Cohere using the inference API and store them for efficient vector or hybrid
search in Elasticsearch. This tutorial uses the Python Elasticsearch client
to perform the operations.

You'll learn how to:
* create an inference endpoint for text embedding using the Cohere service,
* create the necessary index mapping for the Elasticsearch index,
* build an inference pipeline to ingest documents into the index together with the embeddings,
* perform hybrid search on the data,
* rerank search results by using Cohere's rerank model,
* design a RAG system with Cohere's Chat API.

The tutorial uses the [SciFact](https://huggingface.co/datasets/mteb/scifact) data set.

Refer to [Cohere's tutorial](https://docs.cohere.com/docs/elasticsearch-and-cohere) for an example using a different data set.

## 🧰 Requirements

For this example, you will need:

- An Elastic deployment with a **machine learning node of at least 4GB**
   - We'll be using [Elastic Cloud](https://www.elastic.co/guide/en/cloud/current/ec-getting-started.html) for this example (available with a [free trial](https://cloud.elastic.co/registration?utm_source=github&utm_content=elasticsearch-labs-notebook))
   
- A paid [Cohere account](https://cohere.com/). The inference API with the
Cohere service needs one because API usage on the Cohere free trial is limited.

- Python 3.7 or later.

## Install and import required packages

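Install the Elasticsearch and Cohere clients, plus `python-dotenv`, which the code below uses to load credentials from a `.env` file:

In [58]:
!pip install elasticsearch
!pip install cohere
!pip install python-dotenv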



Import the required packages:

In [59]:
from elasticsearch import Elasticsearch, helpers
import cohere
import json
import requests
from dotenv import load_dotenv
import os

## Create an Elasticsearch client

Next, instantiate the Python Elasticsearch client.

The code below reads the connection settings from environment variables (loaded from a `.env` file by `load_dotenv()`) and creates a `client` object, an instance of the `Elasticsearch` class.

In [60]:
load_dotenv()
 
ES_USER = os.getenv("ES_USER")
ES_PASSWORD = os.getenv("ES_PASSWORD")
ES_ENDPOINT = os.getenv("ES_ENDPOINT")
COHERE_API_KEY = os.getenv("COHERE_API_KEY")
 
# Pass the credentials separately instead of embedding them in the URL,
# and do not print them.
url = f"https://{ES_ENDPOINT}:9200"
 
client = Elasticsearch(
    url,
    basic_auth=(ES_USER, ES_PASSWORD),
    ca_certs="./http_ca.crt",
    verify_certs=True,
)
print(client.info())

{'name': 'liuxgm.local', 'cluster_name': 'elasticsearch', 'cluster_uuid': 'coIKHIPsTf2_aWWQ8TO4bw', 'version': {'number': '8.14.1', 'build_flavor': 'default', 'build_type': 'tar', 'build_hash': '93a57a1a76f556d8aee6a90d1a95b06187501310', 'build_date': '2024-06-10T23:35:17.114581191Z', 'build_snapshot': False, 'lucene_version': '9.10.0', 'minimum_wire_compatibility_version': '7.17.0', 'minimum_index_compatibility_version': '7.0.0'}, 'tagline': 'You Know, for Search'}
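
Every credential above comes from the environment, so a missing variable silently becomes `None` and only surfaces later as a confusing connection or authentication error. The following is a minimal sketch of an early check, assuming the variable names used in this tutorial. `require_env` is a hypothetical helper written for this example, not part of any library.

```python
import os


def require_env(names, env=None):
    """Return {name: value} for each name, or raise if any is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: env[name] for name in names}


# Example with an explicit mapping instead of the real environment.
settings = require_env(
    ["ES_USER", "ES_ENDPOINT"],
    env={"ES_USER": "elastic", "ES_ENDPOINT": "localhost"},
)
print(settings["ES_ENDPOINT"])
```

In the notebook, calling `require_env(["ES_USER", "ES_PASSWORD", "ES_ENDPOINT", "COHERE_API_KEY"])` right after `load_dotenv()` would stop the run before any request is sent.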


## Create the inference endpoint

Create the inference endpoint first. In this example, the inference endpoint 
uses Cohere's `embed-english-v3.0` model and the `embedding_type` is set to
`byte`.

In [61]:
co = cohere.Client(api_key=COHERE_API_KEY)

In [62]:
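from elasticsearch import BadRequestError, NotFoundError

# Delete any endpoint left over from a previous run; ignore it if none exists.
try:
    client.inference.delete_model(inference_id="cohere_embeddings")
except NotFoundError:
    pass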

try: 
    client.inference.put_model(
        task_type="text_embedding",
        inference_id="cohere_embeddings",
        body={
            "service": "cohere",
            "service_settings": {
                "api_key": COHERE_API_KEY,
                "model_id": "embed-english-v3.0",
                "embedding_type": "byte",
            },
        },
    )
except BadRequestError as e:
    print(e)

You can find your API keys in your Cohere dashboard under the
[API keys section](https://dashboard.cohere.com/api-keys).

## Create the index mapping

Create the index mapping for the index that will contain the embeddings.

In [63]:
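index_name = "cohere-embeddings"

# Start from a clean index; ignore_unavailable avoids an error on the first run.
client.indices.delete(index=index_name, ignore_unavailable=True)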

if not client.indices.exists(index=index_name):
    client.indices.create(
        index=index_name,
        settings={"index": {"default_pipeline": "cohere_embeddings"}},
        mappings={
            "properties": {
                "text_embedding": {
                    "type": "dense_vector",
                    "dims": 1024,
                    "element_type": "byte",
                },
                "text": {"type": "text"},
                "id": {"type": "integer"},
                "title": {"type": "text"},
            }
        },
    )
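
The mapping stores each embedding as 1024 values with `element_type` set to `byte`, which means every dimension must be a signed 8-bit integer between -128 and 127. The sketch below illustrates that constraint and the raw size difference compared with 4-byte `float` elements. Both helper functions are hypothetical and exist only for illustration; the size figure ignores any index overhead.

```python
DIMS = 1024  # must match "dims" in the mapping above


def check_byte_vector(vector, dims=DIMS):
    """Raise ValueError unless `vector` fits a byte dense_vector of `dims` dimensions."""
    if len(vector) != dims:
        raise ValueError(f"expected {dims} dimensions, got {len(vector)}")
    for value in vector:
        if not isinstance(value, int) or not -128 <= value <= 127:
            raise ValueError(f"{value!r} is not a signed 8-bit integer")
    return True


def raw_vector_bytes(dims, element_type):
    """Raw size of one vector in bytes, ignoring any index overhead."""
    bytes_per_dim = {"byte": 1, "float": 4}[element_type]
    return dims * bytes_per_dim


print(check_byte_vector([0] * DIMS))
print(raw_vector_bytes(DIMS, "byte"), raw_vector_bytes(DIMS, "float"))
```

This is why the `embedding_type` of the inference endpoint and the `element_type` of the field need to agree: float embeddings would not fit a `byte` field.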

## Create the inference pipeline

Now you have an inference endpoint and an index ready to store embeddings. The next
step is to create an ingest pipeline that creates the embeddings using the
inference endpoint and stores them in the index.

In [64]:
client.ingest.put_pipeline(
    id="cohere_embeddings",
    description="Ingest pipeline for Cohere inference.",
    processors=[
        {
            "inference": {
                "model_id": "cohere_embeddings",
                "input_output": {
                    "input_field": "text",
                    "output_field": "text_embedding",
                },
            }
        }
    ],
)

ObjectApiResponse({'acknowledged': True})
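
To make the `input_output` setting concrete, the toy sketch below mimics what the inference processor does to each incoming document: it reads `input_field`, obtains an embedding, and writes it to `output_field`. The real processor runs inside Elasticsearch and calls the Cohere endpoint; `fake_embed` is a meaningless stand-in and `apply_input_output` is a hypothetical name used only here.

```python
def fake_embed(text):
    """Stand-in for the inference endpoint: returns a tiny, meaningless vector."""
    return [len(text) % 128, text.count(" ") % 128]


def apply_input_output(doc, input_field, output_field, embed=fake_embed):
    """Return a copy of `doc` with the embedding of `input_field` stored in `output_field`."""
    enriched = dict(doc)
    enriched[output_field] = embed(doc[input_field])
    return enriched


doc = {"id": 1, "title": "Example", "text": "a short example text"}
enriched = apply_input_output(doc, "text", "text_embedding")
print(sorted(enriched))
```

The original document is left untouched and the enriched copy gains one extra field, which is the shape of the documents that end up in the index.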

## Prepare data and insert documents

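This example uses the [SciFact](https://huggingface.co/datasets/mteb/scifact) data
set, which is available on Hugging Face. The code below reads a local copy of
`corpus.jsonl` and indexes only the first 20 documents to keep the run short.
With a sample this small, a query may not find any relevant documents.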

In [65]:
# To download the corpus instead of using a local copy, fetch it with requests:
# url = "https://huggingface.co/datasets/mteb/scifact/raw/main/corpus.jsonl"
# response = requests.get(url)
# response.raise_for_status()  # Ensure we notice bad responses

# Read the local copy of the corpus
with open("./corpus.jsonl", "r", encoding="utf-8") as file:
    content = file.read()

# Split the content by new lines and parse each line as JSON
data = [json.loads(line) for line in content.strip().split("\n") if line]

data = data[:20]
print(f"Successfully loaded {len(data)} documents")

# Change `_id` key to `id` as `_id` is a reserved key in Elasticsearch.
for item in data:
    if "_id" in item:
        item["id"] = item.pop("_id")

# Prepare the documents to be indexed
documents = [{"_index": index_name, "_source": item} for item in data]

# Use the bulk endpoint to index
helpers.bulk(client, documents)

print("Data ingestion completed, text embeddings generated!")

Successfully loaded 20 documents
Data ingestion completed, text embeddings generated!


Your index is populated with the SciFact data and text embeddings for the text
field.
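
For the full corpus, building every action in memory first is unnecessary, because `helpers.bulk` accepts any iterable of actions. A minimal sketch of a generator that performs the same steps as the cell above (skip empty lines, rename `_id` to `id`, wrap each record in a bulk action); `generate_actions` and its `limit` parameter are hypothetical names introduced for this example.

```python
import json


def generate_actions(lines, index_name, limit=None):
    """Yield one bulk action per non-empty JSONL line, renaming `_id` to `id`."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if limit is not None and count >= limit:
            break
        record = json.loads(line)
        if "_id" in record:
            record["id"] = record.pop("_id")
        yield {"_index": index_name, "_source": record}
        count += 1


sample = [
    '{"_id": "1", "title": "A", "text": "first"}',
    "",
    '{"_id": "2", "title": "B", "text": "second"}',
]
actions = list(generate_actions(sample, "cohere-embeddings", limit=20))
print(len(actions))
```

Because an open file iterates line by line, something like `helpers.bulk(client, generate_actions(file, index_name))` inside the `with open(...)` block would stream the corpus instead of loading it whole.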

## Hybrid search

Let's start querying the index!

The code below performs a hybrid search. The `kNN` query computes the relevance
of search results based on vector similarity using the `text_embedding` field.
The lexical search query uses BM25 retrieval to compute keyword similarity on
the `title` and `text` fields.

In [66]:
query = "What is biosimilarity?"

response = client.search(
    index="cohere-embeddings",
    size=100,
    knn={
        "field": "text_embedding",
        "query_vector_builder": {
            "text_embedding": {
                "model_id": "cohere_embeddings",
                "model_text": query,
            }
        },
        "k": 10,
        "num_candidates": 50,
    },
    query={"multi_match": {"query": query, "fields": ["text", "title"]}},
)

raw_documents = response["hits"]["hits"]

# Display the first 10 results
for document in raw_documents[0:10]:
    print(
        f'Title: {document["_source"]["title"]}\nText: {document["_source"]["text"]}\n'
    )

# Format the documents for ranking
documents = []
for hit in response["hits"]["hits"]:
    documents.append(hit["_source"]["text"])

Title: BC1 RNA, the transcript from a master gene for ID element amplification, is able to prime its own reverse transcription.
Text: ID elements are short interspersed elements (SINEs) found in high copy number in many rodent genomes. BC1 RNA, an ID-related transcript, is derived from the single copy BC1 RNA gene. The BC1 RNA gene has been shown to be a master gene for ID element amplification in rodent genomes. ID elements are dispersed through a process termed retroposition. The retroposition process involves a number of potential regulatory steps. These regulatory steps may include transcription in the appropriate tissue, transcript stability, priming of the RNA transcript for reverse transcription and integration. This study focuses on priming of the RNA transcript for reverse transcription. BC1 RNA gene transcripts are shown to be able to prime their own reverse transcription in an efficient intramolecular and site-specific fashion. This self-priming ability is a consequence of t
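
When a search request carries both `knn` and `query`, the Elasticsearch kNN search documentation describes the two result sets as combined like a boolean OR, with each hit scored by the sum of its kNN and query scores (optionally weighted by a `boost` on each side). The sketch below reproduces that arithmetic on made-up scores. It is an illustration of the scoring rule, not of how Elasticsearch implements it, and `combine_scores` is a hypothetical name.

```python
def combine_scores(lexical, vector, lexical_boost=1.0, vector_boost=1.0):
    """Sum boosted scores per document id; a document may appear in only one input."""
    combined = {}
    for doc_id, score in lexical.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + lexical_boost * score
    for doc_id, score in vector.items():
        combined[doc_id] = combined.get(doc_id, 0.0) + vector_boost * score
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three hypothetical document ids.
bm25_scores = {"a": 2.0, "b": 0.5}
knn_scores = {"b": 0.75, "c": 0.5}
ranking = combine_scores(bm25_scores, knn_scores)
print(ranking)
```

BM25 scores are unbounded while vector similarity scores usually sit in a narrow range, so a plain sum can let the lexical side dominate. That imbalance is one reason a reranking step on top of the hybrid results is useful.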

## Rerank search results

To combine the results more effectively, use 
[Cohere's Rerank v3](https://docs.cohere.com/docs/rerank-2) model through the
inference API to provide a more precise semantic reranking of the results.

Create an inference endpoint with your Cohere API key and the name of the model
to use as the `model_id` (`rerank-english-v3.0` in this example).

In [67]:
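from elasticsearch import NotFoundError

# Delete any rerank endpoint left over from a previous run. The embeddings
# endpoint stays in place because the ingest pipeline and kNN queries use it.
try:
    client.inference.delete_model(inference_id="cohere_rerank")
except NotFoundError:
    pass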

try:
    client.inference.put_model(
        task_type="rerank",
        inference_id="cohere_rerank",
        body={
            "service": "cohere",
            "service_settings": {
                "api_key": COHERE_API_KEY,
                "model_id": "rerank-english-v3.0",
            },
            "task_settings": {
                "top_n": 10,
            },
        },
    )
except BadRequestError as e:
    print(e)

Rerank the results using the new inference endpoint.

In [68]:
response = client.inference.inference(
    inference_id="cohere_rerank",
    body={
        "query": query,
        "input": documents,
        "task_settings": {"return_documents": False},
    },
)

# Reconstruct the input documents based on the index provided in the rerank response
ranked_documents = []
for document in response.body["rerank"]:
    ranked_documents.append(
        {
            "title": raw_documents[int(document["index"])]["_source"]["title"],
            "text": raw_documents[int(document["index"])]["_source"]["text"],
        }
    )

# Print the top 10 results
for document in ranked_documents[0:10]:
    print(f"Title: {document['title']}\nText: {document['text']}\n")

Title: The sacroiliac joint in the spondyloarthropathies.
Text: The term spondyloarthropathy (SpA) describes and defines a group of related inflammatory joint disease that share characteristic clinical features and a unique association with the major histocompatibility complex class I molecule HLA-B27. Five subgroups can be differentiated: ankylosing spondylitis, reactive arthritis, psoriatic arthritis, arthritis associated with inflammatory bowel disease, and undifferentiated SpA. The sacroiliac joints are centrally involved in the SpA, most clearly and pathognomonic in ankylosing spondylitis, in which most patients are affected early in the disease. Overcoming some of the diagnostic difficulties of early sacroiliitis, dynamic magnetic resonance imaging was shown to visualize both acute and chronic changes in the sacroiliac joints. The inflammation in the sacroiliac joints in patients with SpA was recently examined in more detail; using immunohistology and in situ hybridrization, T ce
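
The rerank response does not return the documents themselves (`return_documents` is `False`), only positions into the `input` list, so the mapping back to the search hits is the key step. The sketch below isolates that step on mock data. The `relevance_score` field name is an assumption about the response shape, and `reorder_hits` is a hypothetical helper.

```python
def reorder_hits(hits, rerank_items):
    """Return hits reordered by the `index` of each rerank item, attaching its score."""
    reordered = []
    for item in rerank_items:
        hit = hits[int(item["index"])]  # int() tolerates string or integer indexes
        reordered.append(
            {
                "title": hit["_source"]["title"],
                "text": hit["_source"]["text"],
                # Assumed field name; adjust if the response uses a different key.
                "relevance_score": float(item["relevance_score"]),
            }
        )
    return reordered


# Hypothetical hits and rerank output, shaped like the responses used above.
hits = [
    {"_source": {"title": "T0", "text": "zero"}},
    {"_source": {"title": "T1", "text": "one"}},
    {"_source": {"title": "T2", "text": "two"}},
]
rerank_items = [
    {"index": 2, "relevance_score": 0.9},
    {"index": "0", "relevance_score": 0.4},
]
top = reorder_hits(hits, rerank_items)
print([doc["title"] for doc in top])
```

Keeping the score next to each document makes it easy to drop low-relevance results before they reach a later generation step.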

## Retrieval Augmented Generation (RAG) with Cohere and Elasticsearch

RAG is a method for generating text using additional information fetched from an
external data source. With the ranked results, you can build a RAG system on
top of what you created with 
[Cohere's Chat API](https://docs.cohere.com/docs/chat-api).

Pass the query and the reranked documents to the Chat API to receive a response
grounded in those documents from Cohere's generative model
[Command R+](https://docs.cohere.com/docs/command-r-plus), then print out the
response.

In [75]:
response = co.chat(message=query, documents=ranked_documents, model="command-r-plus")

#source_documents = []
#for citation in response.citations:
#    for document_id in citation.document_ids:
#        if document_id not in source_documents:
#            source_documents.append(document_id)

print(f"Query: {query}")
print(f"Response: {response.text}")
#print("Sources:")
#for document in response.documents:
#    if document["id"] in source_documents:
#        print(f"{document['title']}: {document['text']}")

Query: What is biosimilarity?
Response: Sorry, I do not have any information about biosimilarity. Can I help you with something else?
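
The commented-out lines in the cell above hint at how to list the sources behind an answer. The following sketch shows that logic on stand-in objects so it can run without calling the API. The attribute names (`citations`, `document_ids`) and the `id` key on each document are taken from the commented-out code and are not verified against a specific Cohere SDK version; `cited_documents` is a hypothetical helper.

```python
from types import SimpleNamespace


def cited_documents(citations, documents):
    """Return the documents referenced by at least one citation, in first-cited order."""
    by_id = {doc["id"]: doc for doc in documents}
    seen = []
    for citation in citations or []:  # tolerate a response with no citations
        for document_id in citation.document_ids:
            if document_id in by_id and document_id not in seen:
                seen.append(document_id)
    return [by_id[document_id] for document_id in seen]


# Hypothetical stand-ins for `response.citations` and `response.documents`.
citations = [
    SimpleNamespace(document_ids=["doc_1"]),
    SimpleNamespace(document_ids=["doc_1", "doc_0"]),
]
documents = [
    {"id": "doc_0", "title": "T0", "text": "zero"},
    {"id": "doc_1", "title": "T1", "text": "one"},
]
for doc in cited_documents(citations, documents):
    print(f"{doc['title']}: {doc['text']}")
```

Handling a missing citation list matters for answers like the one above, where the model finds nothing to ground its response in.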
