# semantic_text with Amazon Bedrock

This notebook demonstrates how to use the `semantic_text` field type with Amazon Bedrock. It accompanies the article [semantic_text with Amazon Bedrock](https://www.elastic.co/search-labs/blog/semantic-text-with-amazon-bedrock).

## Install Packages and Import Necessary Modules

In [None]:
# install packages
!python3 -m pip install elasticsearch==8.14


# import modules
from elasticsearch import Elasticsearch, exceptions
from getpass import getpass
import json

Collecting elasticsearch==8.14
  Downloading elasticsearch-8.14.0-py3-none-any.whl (480 kB)
Collecting elastic-transport<9,>=8.13 (from elasticsearch==8.14)
  Downloading elastic_transport-8.13.1-py3-none-any.whl (64 kB)
Installing collected packages: elastic-transport, elasticsearch
Successfully installed elastic-transport-8.13.1 elasticsearch-8.14.0


## Declaring Variables

The following cell prompts for your credentials without echoing them to the screen.
To learn how to retrieve your Elasticsearch credentials, see [Finding Your Cloud ID](https://www.elastic.co/search-labs/tutorials/install-elasticsearch/elastic-cloud#finding-your-cloud-id).

In [None]:
ELASTIC_CLUSTER_ID = getpass("Elastic Cloud ID: ")
ELASTIC_API_KEY = getpass("Elastic API key: ")

AWS_ACCESS_KEY = getpass("AWS access key: ")
AWS_SECRET_KEY = getpass("AWS secret key: ")

# AWS region
region = "us-east-1"

Elastic Cloud ID: ··········
Elastic API key: ··········
AWS access key: ··········
AWS secret key: ··········
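If you run this notebook non-interactively, the `getpass` prompts will block. A minimal sketch, assuming you export the credentials as environment variables, reads them from the environment and prompts only for values that are missing. The helper `load_credentials` and the names `ELASTIC_CLOUD_ID` and `ELASTIC_API_KEY` are hypothetical choices for this example; `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` follow the usual AWS CLI convention.

```python
import os
from getpass import getpass

# Notebook variable name -> environment variable to read it from.
# The ELASTIC_* names are an assumption; the AWS_* names follow the AWS CLI convention.
CREDENTIAL_ENV_VARS = {
    "ELASTIC_CLUSTER_ID": "ELASTIC_CLOUD_ID",
    "ELASTIC_API_KEY": "ELASTIC_API_KEY",
    "AWS_ACCESS_KEY": "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_KEY": "AWS_SECRET_ACCESS_KEY",
}


def load_credentials(env=None, prompt=getpass):
    """Read each credential from the environment, prompting only for missing ones."""
    env = os.environ if env is None else env
    creds = {}
    for name, env_var in CREDENTIAL_ENV_VARS.items():
        value = env.get(env_var)
        if not value:
            # Fall back to an interactive, non-echoing prompt
            value = prompt(f"{name} ({env_var}): ")
        creds[name] = value
    return creds
```

For example, `creds = load_credentials()` followed by `ELASTIC_CLUSTER_ID = creds["ELASTIC_CLUSTER_ID"]` (and likewise for the other three) replaces the four `getpass` calls.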


## Instantiate an Elasticsearch client

In [None]:
# Create the client instance
es_client = Elasticsearch(
    cloud_id=ELASTIC_CLUSTER_ID,
    api_key=ELASTIC_API_KEY,
)

## Create Embeddings Task

Let's create the inference endpoint using the [Create inference API](https://www.elastic.co/guide/en/elasticsearch/reference/current/put-inference-api.html).

In [None]:
try:
    es_client.inference.delete_model(inference_id="bedrock-embeddings")
except exceptions.NotFoundError:
    # Inference endpoint does not exist
    pass

try:
    es_client.options(
        request_timeout=60, max_retries=3, retry_on_timeout=True
    ).inference.put_model(
        task_type="text_embedding",
        inference_id="bedrock-embeddings", # The name of your inference endpoint
        body={
            "service": "amazonbedrock",
            "service_settings": {
                "access_key": AWS_ACCESS_KEY,
                "secret_key": AWS_SECRET_KEY,
                "region": region,
                "provider": "amazontitan",
                "model": "amazon.titan-embed-text-v1",
            },
        },
    )
    print("Inference endpoint created successfully")
except exceptions.BadRequestError as e:
    if e.error == "resource_already_exists_exception":
        print("Inference is already created.")
    else:
        raise e

Inference endpoint created successfully




## Completion Task

In [None]:
try:
    es_client.options(
        request_timeout=60, max_retries=3, retry_on_timeout=True
    ).inference.put_model(
        task_type="completion",
        inference_id="bedrock-completion",  # The name of your inference endpoint
        body={
            "service": "amazonbedrock",
            "service_settings": {
                "access_key": AWS_ACCESS_KEY,
                "secret_key": AWS_SECRET_KEY,
                "region": region,
                "model": "anthropic.claude-3-haiku-20240307-v1:0",
                "provider": "anthropic",
            },
        },
    )
    print("Completion task created successfully")
except exceptions.BadRequestError as e:
    if e.error == "resource_already_exists_exception":
        print("Completion already created.")
    else:
        raise e

Completion already created.
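The embeddings and completion endpoints above send request bodies that differ only in task type, provider, and model. As a sketch, a hypothetical helper `bedrock_endpoint_body` builds that body and rejects providers outside the lists quoted in the article text indexed later in this notebook (embeddings: `amazontitan`, `cohere`; completion: `amazontitan`, `anthropic`, `ai21labs`, `cohere`, `meta`, `mistral`). Those lists reflect the article at the time of writing and may change in later Elasticsearch versions.

```python
# Providers accepted per task type, as listed in the article (may change across versions)
ALLOWED_PROVIDERS = {
    "text_embedding": {"amazontitan", "cohere"},
    "completion": {"amazontitan", "anthropic", "ai21labs", "cohere", "meta", "mistral"},
}


def bedrock_endpoint_body(task_type, provider, model, access_key, secret_key, region):
    """Build the body for an `amazonbedrock` inference endpoint, checking the provider."""
    allowed = ALLOWED_PROVIDERS.get(task_type)
    if allowed is None:
        raise ValueError(f"unsupported task_type: {task_type!r}")
    if provider not in allowed:
        raise ValueError(
            f"provider {provider!r} is not valid for {task_type}; "
            f"expected one of {sorted(allowed)}"
        )
    return {
        "service": "amazonbedrock",
        "service_settings": {
            "access_key": access_key,
            "secret_key": secret_key,
            "region": region,
            "provider": provider,
            "model": model,
        },
    }
```

Usage, mirroring the completion cell: `es_client.inference.put_model(task_type="completion", inference_id="bedrock-completion", body=bedrock_endpoint_body("completion", "anthropic", "anthropic.claude-3-haiku-20240307-v1:0", AWS_ACCESS_KEY, AWS_SECRET_KEY, region))`. Validating locally surfaces a provider typo before any request reaches the cluster.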


## Creating Mappings

In [None]:
try:
    es_client.indices.create(
        index="semantic-text-bedrock",
        mappings={
            "properties": {
                "super_body": {
                    "type": "semantic_text",
                    "inference_id": "bedrock-embeddings",
                }
            }
        },
    )
    print("Index created successfully")
except exceptions.BadRequestError as e:
    if e.error == "resource_already_exists_exception":
        print("Index already exists.")
    else:
        raise e
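The mapping needs nothing beyond the field type and the inference endpoint; `semantic_text` handles chunking and the embedding configuration. If you want several fields embedded with the same endpoint, a small hypothetical helper can generate the mapping:

```python
def semantic_text_mappings(fields, inference_id):
    """Return index mappings where every named field is `semantic_text` backed by one endpoint."""
    return {
        "properties": {
            name: {"type": "semantic_text", "inference_id": inference_id}
            for name in fields
        }
    }
```

For example, `es_client.indices.create(index="semantic-text-bedrock", mappings=semantic_text_mappings(["super_body"], "bedrock-embeddings"))` produces the same mapping as the cell above. Keep in mind that each additional `semantic_text` field means additional Bedrock inference calls at indexing time.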

## Indexing data

In [None]:
document_content = "Answer the question _what's the cat thing about?_ , based on the following context \n ---\n---\ntitle: \"semantic_text with Amazon Bedrock\"\nslug: \"semantic-text-with-amazon-bedrock\"\ndate: \"2024-07-11\"\ndescription: \"Using semantic_text new feature, and AWS Bedrock as inference endpoint service\"\nauthor:\n  - slug: gustavo-llermaly\nimage: \"semantic-text-with-amazon-bedrock/cover.png\"\ncategory:\n  - slug: integrations\n  - slug: how-to\n  - slug: generative-ai\n  - slug: vector-database\ntags:\n  - slug: rag\n  - slug: search\n---\n\n## Introduction\n\nSome of the biggest challenges on RAG systems are chunking text, generating embeddings, and then retrieving them.\nDeciding which settings to use, and how to actually generate the chunks requires developing additional code or using frameworks like [LangChain](https://www.elastic.co/search-labs/integrations/langchain) or [LlamaIndex](https://www.elastic.co/search-labs/integrations/llama-index).\n\nFew months ago, we provided a way to [chunk documents using ingest pipelines](https://www.elastic.co/search-labs/blog/chunking-via-ingest-pipelines) , leveraging the recent addition of [nested vector fields](https://www.elastic.co/search-labs/blog/multi-vector-relevance).\n\nWith the addition of the [semantic_text mapping type](https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text) the process of chunking text, generating embeddings, and then retrieving them comes to a single place.\n\nIn this article, we are going to create an end-to-end RAG application without leaving Elastic, using Bedrock as our inference service.\n\n![Diagram](/assets/images/semantic-text-with-amazon-bedrock/diagram.png)\n\n### Steps\n\n1. [Creating Endpoints](#creating-endpoints)\n2. [Creating mappings](#creating-mappings)\n3. [Indexing data](#indexing-data)\n4. 
[Asking questions](#asking-questions)\n\n## Creating Endpoints\n\nBefore creating our index, we must create the endpoints we are going to use for our inference process. The endpoints will be named:\n\n1. Embeddings Task\n2. Completion Task\n\nWe will use Bedrock as our provider for both of them. With these two endpoints we can create a full [RAG](https://www.elastic.co/search-labs/blog/retrieval-augmented-generation-rag) application only using Elastic tools!\n\nIf you want to read more about how to configure Bedrock, I recommend you read [this article](https://www.elastic.co/search-labs/blog/elasticsearch-amazon-bedrock-support) first.\n\n### Embeddings Task\n\nThis task will help us create [vector embeddings](https://www.elastic.co/search-labs/tutorials/search-tutorial/vector-search/embeddings-intro) for our documents content and for the questions the user will ask.\n\nWith these vectors we can find the chunks that are more relevant to the question and retrieve the documents that contain the answer.\n\nGo ahead and run in [Kibana DevTools Console](https://www.elastic.co/guide/en/kibana/current/console-kibana.html) to create the endpoint:\n\n```json\nPUT _inference/text_embedding/bedrock-embeddings\n {\n    \"service\": \"amazonbedrock\",\n    \"service_settings\": {\n        \"access_key\": \"{AWS_ACCESS_KEY}\",\n        \"secret_key\": \"{AWS_SECRET_KEY}\",\n        \"region\": \"{AWS_REGION}\",\n        \"provider\": \"amazontitan\",\n        \"model\": \"amazon.titan-embed-text-v1\"\n    }\n}\n```\n\n- _`provider` must be one of `amazontitan, cohere`_\n- _`model` must be one \\_model_id_ [you have access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html) to in Bedrock\\_\n\nOptional additional settings\n\n- `dimensions`: The output dimensions to use for the inference\n- `max_input_tokens`:The maximum number of input tokens\n- `similarity`: The similarity measure to use\n\n### Completion Task\n\nAfter we find the best chunks, we must send them 
to the LLM model so it can generate an answer for us.\n\nRun the following to add the completion endpoint:\n\n```json\nPUT _inference/completion/bedrock-completion\n{\n    \"service\": \"amazonbedrock\",\n    \"service_settings\": {\n        \"access_key\": \"{AWS_ACCESS_KEY}\",\n        \"secret_key\": \"{AWS_SECRET_KEY}\",\n        \"region\": \"{AWS_REGION}\",\n        \"model\": \"anthropic.claude-3-haiku-20240307-v1:0\",\n        \"provider\": \"anthropic\",\n    }\n}\n```\n\n- _`provider` must be one of `amazontitan, anthropic, ai21labs, cohere, meta, mistral`_\n- _`model` must be one \\_model_id_ or ARN [you have access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-ids.html) to in Bedrock\\_\n\n## Creating Mappings\n\nThe new [semantic_text](https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text) mapping type will make things super easy. It will take care of inferring the embedding mappings and configurations, and doing the passage chunking for you! If you want to read more you can go to this great [article](https://www.elastic.co/search-labs/blog/semantic-search-simplified-semantic-text).\n\n```json\nPUT semantic-text-bedrock\n{\n  \"mappings\": {\n    \"properties\": {\n      \"super_body\": {\n        \"type\": \"semantic_text\",\n        \"inference_id\": \"bedrock-embeddings\"\n      }\n    }\n  }\n}\n```\n\nYES. That\'s it. 
`super_body` is ready to be searched with vectors, and to handle chunking.\n\n## Indexing data\n\nFor data indexing we have many [methods available](https://www.elastic.co/search-labs/blog/ES-data-ingestion), you can pick the one of your preference.\n\nFor simplicity, and _recursivity_, I will just copy this whole article as rich text and store it as a document.\n\n![a cat looking into a screen that displays a live feed of the same cat, creating an infinite loop effect.](/assets/images/semantic-text-with-amazon-bedrock/recursive_cat.gif)\n\n```json\nPOST semantic-text-bedrock/_doc\n{\n  \"super_body\": \"<The content of this article>\"\n}\n```\n\nWe have it. Time to test.\n\n## Asking questions\n\nThe question and answer is a two steps process. First we must retrieve the text chunks relevant to the question, and then we must send the chunks to the LLM to generate the answer.\n\nWe will explore two strategies to do that, as promised, without any additional code or framework.\n\n### Strategy 1: API Calls\n\nWe can run two API calls: one to the `_search` endpoint to retrieve the chunk, and another one to the `inference` endpoint to do the LLM completion step.\n\n#### Retrieving chunks\n\nWe are going to try a sort of \"needle in a haystack\" query, to make sure the answer from the LLM is obtained from this article, and not from the LLM base knowledge. We are going to ask about the cat gif referring to the recursivity of this article.\n\nWe could run the nice and short default query for semantic-text:\n\n```json\nGET semantic-text-bedrock/_search\n{\n  \"query\": {\n    \"semantic\": {\n      \"field\": \"super_body\",\n      \"query\": \"what\'s the cat thing about?\"\n    }\n  }\n}\n```\n\nThe problem is this query will not sort the inner hits (chunks) by relevance, which is what we need if we don\'t want to send the entire document to the LLM as context. 
It will sort the document´s relevance _per document_, and not _per chunk_.\n\nThis longer query will sort inner hits (chunks) by relevance, so we can grab the juicy ones.\n\n```json\nGET semantic-text-bedrock/_search\n{\n  \"_source\": false,\n  \"retriever\": {\n    \"standard\": {\n      \"query\": {\n        \"nested\": {\n          \"path\": \"super_body.inference.chunks\",\n          \"query\": {\n            \"knn\": {\n              \"field\": \"super_body.inference.chunks.embeddings\",\n              \"query_vector_builder\": {\n                \"text_embedding\": {\n                  \"model_id\": \"bedrock-embeddings\",\n                  \"model_text\": \"what\'s the cat thing about?\"\n                }\n              }\n            }\n          },\n          \"inner_hits\": {\n            \"size\": 1,\n            \"name\": \"semantic-text-bedrock.super_body\",\n            \"_source\": \"*.text\"\n          }\n        }\n      }\n    }\n  }\n}\n```\n\n_we set root level `_source` to false, because we are interested on the relevant chunks only_\n\nAs you can see, we are using [retrievers](https://www.elastic.co/search-labs/blog/elasticsearch-retrievers) for this query, and the response looks like this:\n\nNow from the response we can copy the top chunk and combine the text in one big string. 
What some frameworks do, is to add metadata to each of the chunks.\n\n#### Answering the question\n\nNow we can use the bedrock completion endpoint we created previously to send this question along with the relevant chunks and get the answer.\n\n```json\nPOST _inference/completion/bedrock-completion\n{\n    \"input\": \"\"\"Answer the question:\\n\n\n    _what\'s the cat thing about?_ ,\n    based on the following context \\n\n\n    <paste the relevant chunks here>\"\"\"\n}\n```\n\nLet\'s take a look at the answer!\n\n### Strategy 2: Playground\n\nNow you learned how things work internally, let me show you how you can do this nice and easy, and with a nice UI on top. Using [Elastic Playground](https://www.elastic.co/search-labs/blog/rag-playground-introduction).\n\nGo to Playground, configure the Bedrock connector, and then select the index we just created and you are ready to go.\n\n![Configuring Playground](/assets/images/semantic-text-with-amazon-bedrock/playground_config.gif)\n\nFrom here you can start asking questions to your brand new index.\n\n![Playground chat](/assets/images/semantic-text-with-amazon-bedrock/playgroud_chat.png)\n\n## Conclusion\n\nThe new `semantic_text` mapping type makes creating a RAG setup extremely easy, without having to leave the Elastic ecosystem. Things like chunking and mapping settings are not a challenge anymore (at least not initially!), and there are various alternatives to ask questions to the data.\n\nAWS Bedrock is fully integrated by providing both embeddings and completion endpoints, and also being included as a Playground connector!.\n\n_If you are interested on reproducing the examples of this article, you can find the [Postman collection](https://www.postman.com/collection/) with the requests [here](https://github.com/elastic/elasticsearch-labs/blob/main/supporting-blog-content/semantic-text-with-amazon-bedrock/postman_collection.json)_\n\n---\n"

In [None]:
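# Index the article; refresh="wait_for" makes it searchable before the call returns
es_client.index(
    index="semantic-text-bedrock",
    document={"super_body": document_content},
    refresh="wait_for",
)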



ObjectApiResponse({'_index': 'semantic-text-bedrock', '_id': 'eNG-rZABYjkNAUNO6B0A', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 2, 'failed': 0}, '_seq_no': 0, '_primary_term': 1})
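To index more than one text, a sketch like the following builds bulk actions for the `super_body` field. The generator `semantic_text_actions` is a hypothetical helper; the action format (`_index` plus `_source`) is the one accepted by `elasticsearch.helpers.bulk`.

```python
def semantic_text_actions(index, texts, field="super_body"):
    """Yield bulk actions that index each text into a `semantic_text` field."""
    for text in texts:
        # Skip blank strings; there is nothing to embed
        if text and text.strip():
            yield {"_index": index, "_source": {field: text}}
```

Pass the generator to the bulk helper, for example `from elasticsearch import helpers` and then `helpers.bulk(es_client, semantic_text_actions("semantic-text-bedrock", texts))`. Because `semantic_text` calls the Bedrock embeddings endpoint during ingestion, large batches may take noticeably longer than plain-text indexing.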

## Asking questions
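The search in this section uses a `nested` kNN query on the chunks so that the inner hits come back ranked per chunk rather than per document. A sketch of a hypothetical builder, `chunk_retriever_query`, parameterizes the question, field, and number of chunks. The `<field>.inference.chunks` paths mirror the `semantic_text` layout used in the version this notebook targets; newer releases may store chunks differently, so verify them against your mapping.

```python
def chunk_retriever_query(question, index="semantic-text-bedrock", field="super_body",
                          inference_id="bedrock-embeddings", size=5):
    """Build a search body that ranks the chunks of a semantic_text field by kNN similarity."""
    chunks_path = f"{field}.inference.chunks"
    return {
        "_source": False,  # only the matching chunks are needed, not the whole document
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": chunks_path,
                        "query": {
                            "knn": {
                                "field": f"{chunks_path}.embeddings",
                                # Embed the question with the same endpoint used at index time
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": inference_id,
                                        "model_text": question,
                                    }
                                },
                            }
                        },
                        "inner_hits": {
                            "size": size,
                            "name": f"{index}.{field}",
                            "_source": "*.text",
                        },
                    }
                }
            }
        },
    }
```

With the defaults, `es_client.search(index="semantic-text-bedrock", body=chunk_retriever_query("what's the cat thing about?"))` sends the same request as the cell that follows.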

In [None]:
response = es_client.search(
    index="semantic-text-bedrock",
    body={
        "_source": False,
        "retriever": {
            "standard": {
                "query": {
                    "nested": {
                        "path": "super_body.inference.chunks",
                        "query": {
                            "knn": {
                                "field": "super_body.inference.chunks.embeddings",
                                "query_vector_builder": {
                                    "text_embedding": {
                                        "model_id": "bedrock-embeddings",
                                        "model_text": "what's the cat thing about?",
                                    }
                                },
                            }
                        },
                        "inner_hits": {
                            "size": 5,
                            "name": "semantic-text-bedrock.super_body",
                            "_source": "*.text",
                        },
                    }
                }
            }
        },
    },
)

# Print results
formatted_json = json.dumps(response.body, indent=4)

print(formatted_json)

{
    "took": 551,
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.7386911,
        "hits": [
            {
                "_index": "semantic-text-bedrock",
                "_id": "eNG-rZABYjkNAUNO6B0A",
                "_score": 0.7386911,
                "inner_hits": {
                    "semantic-text-bedrock.super_body": {
                        "hits": {
                            "total": {
                                "value": 9,
                                "relation": "eq"
                            },
                            "max_score": 0.7386911,
                            "hits": [
                                {
                                    "_index": "semantic-text-bedrock",
                                    "_id": "eNG-rZABYjkNAUNO6B0A",
 

## Answering the question

In [None]:
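# The question we want the LLM to answer
question = "what's the cat thing about?"

# Extract the text of each relevant chunk from the inner hits of the first (and only) document
inner_hits = response.body["hits"]["hits"][0]["inner_hits"]["semantic-text-bedrock.super_body"]
chunks_arr = [hit["_source"]["text"] for hit in inner_hits["hits"]["hits"]]
chunks_str = "\n".join(chunks_arr)

# Build the prompt: the question followed by the retrieved context
input_content = {
    "input": f"Answer the question:\n\n_{question}_\n\nbased on the following context:\n\n{chunks_str}"
}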
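The extraction above reads only `hits[0]`, which fits this single-document index but raises an `IndexError` when nothing matches and ignores chunks from other documents. A sketch of a hypothetical `top_chunks` helper collects chunks from every hit, keeps their scores, and re-sorts them across documents; the `min_score` cutoff is an arbitrary knob you would tune yourself.

```python
def top_chunks(search_body, inner_hits_name="semantic-text-bedrock.super_body", min_score=0.0):
    """Return (score, text) pairs for the inner-hit chunks of all hits, best first."""
    chunks = []
    for hit in search_body.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(inner_hits_name, {})
        for chunk in inner.get("hits", {}).get("hits", []):
            score = chunk.get("_score") or 0.0
            if score >= min_score:
                chunks.append((score, chunk["_source"]["text"]))
    # Inner hits are ordered within each document; re-sort across documents
    chunks.sort(key=lambda pair: pair[0], reverse=True)
    return chunks
```

Call it before `response` is reused for the completion result, for example `chunks_str = "\n".join(text for _, text in top_chunks(response.body))`.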

In [None]:
response = es_client.options(
    request_timeout=60, max_retries=3, retry_on_timeout=True
).inference.inference(
    task_type="completion", inference_id="bedrock-completion", body=input_content
)

# Print results
formatted_json = json.dumps(response.body, indent=4)

print(formatted_json)

{
    "completion": [
        {
            "result": "It seems the \"cat thing\" is referring to the recursive GIF shown in the article, which depicts a cat looking at a screen that displays the same cat, creating an infinite loop effect. This recursive cat GIF is used as an example to illustrate the concept of the article, which is about creating an end-to-end Retrieving Augmented Generation (RAG) application using Elastic's semantic_text mapping type and AWS Bedrock as the inference service. The article is guiding the reader through the process of setting up the necessary components, indexing the data, and then retrieving and answering questions based on the content."
        }
    ]
}
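The completion response wraps the generated text in a `completion` list, as the output above shows. A small hypothetical helper, `completion_text`, pulls out just the answer and fails loudly if the list is empty:

```python
def completion_text(inference_body):
    """Return the generated text from a completion inference response body."""
    results = inference_body.get("completion", [])
    if not results:
        raise ValueError("response contains no completion results")
    # Join in case the service returns more than one result
    return "\n".join(item["result"] for item in results)
```

For example, `print(completion_text(response.body))` prints only the model's answer instead of the full JSON.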




## Deleting

Finally, delete the index and both inference endpoints so they stop consuming resources.

In [None]:
# Cleanup - Delete Index (ignore 400/404 if it no longer exists)
es_client.options(ignore_status=[400, 404]).indices.delete(index="semantic-text-bedrock")

# Cleanup - Delete Completion Endpoint
es_client.options(ignore_status=[400, 404]).inference.delete_model(inference_id="bedrock-completion")

# Cleanup - Delete Embeddings Endpoint
es_client.options(ignore_status=[400, 404]).inference.delete_model(inference_id="bedrock-embeddings")



ObjectApiResponse({'acknowledged': True, 'pipelines': [], 'indexes': []})