<h1 style="background: linear-gradient(to right, #ff6b6b, #4ecdc4); 
           color: white; 
           padding: 20px; 
           border-radius: 10px; 
           text-align: center; 
           font-family: Arial, sans-serif; 
           text-shadow: 2px 2px 4px rgba(0,0,0,0.5);">
  Benchmark Titan Text Embeddings V2: Embeddings model on Amazon Bedrock
</h1>

Embeddings are integral to many natural language processing applications, and their quality directly affects how well those applications perform. They are commonly used in knowledge bases to represent text as dense vectors that enable efficient similarity search and retrieval. In Retrieval Augmented Generation (RAG), embeddings are used to retrieve relevant passages from a corpus, which give a language model the context it needs to generate informed, knowledge-grounded responses. Embeddings also play a key role in personalization and recommendation systems: by representing user preferences, item characteristics, and historical interactions as vectors, a system can compute similarities and make personalized recommendations based on user behavior and item embeddings. As new embedding models are released with incremental quality improvements, organizations must weigh the potential benefits of upgrading against the associated costs, considering factors such as computational resources, data preprocessing, integration effort, and the projected impact on business metrics.

To learn more about Titan Text Embeddings V2, see the [Titan Text Embeddings V2](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-models.html) documentation.

#### How is a piece of text converted into a vector?
A common approach is to use models that produce contextualized embeddings for entire sentences. These models are based on deep learning architectures such as Transformers, which capture contextual information and the relationships between words in a sentence more effectively than earlier word-level embedding methods.
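As a rough intuition only, the sketch below turns a sentence into one fixed-size vector by looking up a vector for each token, averaging them (mean pooling), and scaling the result to unit length. The vocabulary, the whitespace tokenizer, and the random embedding table are hypothetical stand-ins; a model such as Titan Text Embeddings V2 uses trained Transformer layers so that each token's vector depends on its context, and its internal pooling is not described in this notebook.

```python
import numpy as np

# Hypothetical toy vocabulary and a random "embedding table" (one row per token).
# A real model learns these weights and contextualizes them with Transformer layers.
VOCAB = {"<unk>": 0, "python": 1, "is": 2, "a": 3, "programming": 4, "language": 5}
DIM = 8
rng = np.random.default_rng(seed=0)
EMBEDDING_TABLE = rng.normal(size=(len(VOCAB), DIM))


def toy_sentence_embedding(text: str) -> np.ndarray:
    """Map text to one unit-length vector by mean-pooling per-token vectors."""
    # Naive whitespace tokenizer; unknown words map to the <unk> row.
    token_ids = [VOCAB.get(tok, VOCAB["<unk>"]) for tok in text.lower().split()]
    if not token_ids:
        raise ValueError("text must contain at least one token")
    token_vectors = EMBEDDING_TABLE[token_ids]  # shape: (num_tokens, DIM)
    pooled = token_vectors.mean(axis=0)         # shape: (DIM,)
    return pooled / np.linalg.norm(pooled)      # scale to unit length


vec = toy_sentence_embedding("Python is a programming language")
print(vec.shape)  # (8,)
```

Because averaging static token vectors ignores word order, this toy gives "python is a language" and "language a is python" the same vector. Contextual Transformer embeddings are designed to avoid exactly this kind of information loss.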

![Embedding Model](imgs/vector_embedding.png)

In addition to semantic search, you can use embeddings to augment your prompts with relevant context through Retrieval Augmented Generation (RAG), which leads to more accurate results. To do this, you need to store the embeddings in a database with vector search capabilities.

![Vector database](imgs/vector_db.jpg)
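To illustrate what "vector capabilities" means, here is a minimal in-memory sketch of the core operation a vector database performs for RAG: store document embeddings and return the top-k documents most similar to a query embedding. The class name `InMemoryVectorStore` and the toy 3-dimensional vectors are hypothetical; in practice the embeddings would come from Titan Text Embeddings V2, and a real database adds persistence and approximate-nearest-neighbor indexing.

```python
import numpy as np


class InMemoryVectorStore:
    """Hypothetical minimal vector store: keeps unit-length vectors in a matrix
    and returns the top-k most similar documents by cosine similarity."""

    def __init__(self, dim: int):
        self.dim = dim
        self.vectors = np.empty((0, dim), dtype=np.float32)
        self.documents: list[str] = []

    def add(self, document: str, embedding) -> None:
        vec = np.asarray(embedding, dtype=np.float32).reshape(1, self.dim)
        vec = vec / np.linalg.norm(vec)  # unit length, so dot product == cosine similarity
        self.vectors = np.vstack([self.vectors, vec])
        self.documents.append(document)

    def search(self, query_embedding, k: int = 3) -> list[tuple[str, float]]:
        query = np.asarray(query_embedding, dtype=np.float32)
        query = query / np.linalg.norm(query)
        scores = self.vectors @ query          # cosine similarity for every stored document
        top = np.argsort(scores)[::-1][:k]     # indices of the k highest scores
        return [(self.documents[i], float(scores[i])) for i in top]


# Usage with toy 3-dimensional vectors standing in for real embeddings
store = InMemoryVectorStore(dim=3)
store.add("doc about cats", [1.0, 0.0, 0.0])
store.add("doc about dogs", [0.9, 0.1, 0.0])
store.add("doc about cars", [0.0, 0.0, 1.0])
results = store.search([1.0, 0.0, 0.0], k=2)
print(results)
```

The retrieved documents would then be inserted into the prompt as context for the language model.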


In September 2023, Amazon announced the launch of Amazon Titan Text Embeddings V1, a multilingual text embeddings model that converts text inputs such as single words, phrases, or large documents into high-dimensional numerical vector representations. Since then, thousands of customers have used this first version of the model, which supports over 25 languages, accepts inputs of up to 8,192 tokens, and outputs vectors of 1,536 dimensions. Amazon Titan Text Embeddings V2 takes this further with flexible output dimensions. It supports over 100 languages and lets you choose a smaller output size, which reduces the cost of storing the embeddings. The supported output dimensions are 256, 512, and 1,024. The model is designed to perform well on multilingual data and use cases.
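To make the storage argument concrete, the sketch below estimates the raw size of a collection of embeddings. It assumes each value is stored as a 4-byte float32 and ignores index and metadata overhead, which real vector databases add on top; the corpus size of one million documents is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored as 32-bit floats


def raw_storage_bytes(num_vectors: int, dimensions: int,
                      bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw size of num_vectors dense vectors, excluding index/metadata overhead."""
    return num_vectors * dimensions * bytes_per_value


num_docs = 1_000_000  # hypothetical corpus size
for dims in (1536, 1024, 256):
    gib = raw_storage_bytes(num_docs, dims) / 1024**3
    print(f"{dims:>5} dims: {gib:.2f} GiB")
# prints:
#  1536 dims: 5.72 GiB
#  1024 dims: 3.81 GiB
#   256 dims: 0.95 GiB
```

Raw storage scales linearly with the dimension count, so moving from 1,024 to 256 dimensions cuts it by 4x. Whether retrieval quality holds up at the smaller size is something to benchmark on your own data.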

<h1 style="background: linear-gradient(to right, #ff6b6b, #4ecdc4); 
           color: white; 
           padding: 20px; 
           border-radius: 10px; 
           text-align: center; 
           font-family: Arial, sans-serif; 
           text-shadow: 2px 2px 4px rgba(0,0,0,0.5);">
  Create a Bedrock client
</h1>

```python
import boto3

# Simple analogy: think of it like a restaurant.
#   bedrock         = the manager's office (see the menu, check what's available, manage operations)
#   bedrock-runtime = the kitchen (actually cook and serve the food, i.e. run inference)
# Both clients use the default AWS Region and credentials from your environment.

# Control-plane client: list and inspect foundation models
bedrock_client = boto3.client('bedrock')

# Runtime client: invoke models to generate text or embeddings
bedrock_runtime_client = boto3.client('bedrock-runtime')
```


```python
bedrock_client.list_foundation_models()
```

```
{'ResponseMetadata': {'RequestId': 'b9d9d5c8-3d3c-42cb-8005-35ca3627e2cb',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'date': 'Fri, 01 Aug 2025 21:41:49 GMT',
   'content-type': 'application/json',
   'content-length': '19666',
   'connection': 'keep-alive',
   'x-amzn-requestid': 'b9d9d5c8-3d3c-42cb-8005-35ca3627e2cb'},
  'RetryAttempts': 0},
 'modelSummaries': [{'modelArn': 'arn:aws:bedrock:eu-central-1::foundation-model/amazon.titan-text-express-v1:0:8k',
   'modelId': 'amazon.titan-text-express-v1:0:8k',
   'modelName': 'Titan Text G1 - Express',
   'providerName': 'Amazon',
   'inputModalities': ['TEXT'],
   'outputModalities': ['TEXT'],
   'responseStreamingSupported': True,
   'customizationsSupported': [],
   'inferenceTypesSupported': [],
   'modelLifecycle': {'status': 'ACTIVE'}},
  {'modelArn': 'arn:aws:bedrock:eu-central-1::foundation-model/amazon.titan-text-express-v1',
   'modelId': 'amazon.titan-text-express-v1',
   'modelName': 'Titan Text G1 - Express',
   'providerNam
```

```python
# List all the model IDs
all_models = [model['modelId'] for model in bedrock_client.list_foundation_models()['modelSummaries']]
all_models
```

```
['amazon.titan-text-express-v1:0:8k',
 'amazon.titan-text-express-v1',
 'amazon.titan-text-lite-v1:0:4k',
 'amazon.titan-text-lite-v1',
 'amazon.titan-embed-text-v1:2:8k',
 'amazon.titan-embed-text-v1',
 'amazon.titan-embed-image-v1:0',
 'amazon.titan-embed-image-v1',
 'amazon.titan-embed-text-v2:0',
 'amazon.rerank-v1:0',
 'amazon.nova-pro-v1:0',
 'amazon.nova-lite-v1:0',
 'amazon.nova-micro-v1:0',
 'anthropic.claude-instant-v1',
 'anthropic.claude-v2:1:18k',
 'anthropic.claude-v2:1:200k',
 'anthropic.claude-v2:1',
 'anthropic.claude-v2',
 'anthropic.claude-3-sonnet-20240229-v1:0',
 'anthropic.claude-3-haiku-20240307-v1:0',
 'anthropic.claude-3-5-sonnet-20240620-v1:0',
 'anthropic.claude-3-7-sonnet-20250219-v1:0',
 'anthropic.claude-sonnet-4-20250514-v1:0',
 'cohere.embed-english-v3',
 'cohere.embed-multilingual-v3',
 'cohere.rerank-v3-5:0',
 'mistral.pixtral-large-2502-v1:0',
 'meta.llama3-2-1b-instruct-v1:0',
 'meta.llama3-2-3b-instruct-v1:0']
```

```python
# List only the embedding models
embedding_models = [model for model in all_models if 'embed' in model.lower()]
embedding_models
```

```
['amazon.titan-embed-text-v1:2:8k',
 'amazon.titan-embed-text-v1',
 'amazon.titan-embed-image-v1:0',
 'amazon.titan-embed-image-v1',
 'amazon.titan-embed-text-v2:0',
 'cohere.embed-english-v3',
 'cohere.embed-multilingual-v3']
```
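Matching on the substring `'embed'` works for the IDs listed here, but it depends on naming conventions. A sketch of an alternative that checks the `outputModalities` field included in each model summary, assuming embedding models report `'EMBEDDING'` there (the helper name `embedding_model_ids` is hypothetical). Boto3's `list_foundation_models` also accepts a `byOutputModality` argument if you prefer to filter on the server side.

```python
def embedding_model_ids(model_summaries: list[dict]) -> list[str]:
    """Return the IDs of models whose declared output modalities include EMBEDDING."""
    return [
        summary["modelId"]
        for summary in model_summaries
        if "EMBEDDING" in summary.get("outputModalities", [])
    ]


# Usage with the client created earlier (requires AWS credentials):
# summaries = bedrock_client.list_foundation_models()["modelSummaries"]
# embedding_model_ids(summaries)
#
# Server-side alternative:
# bedrock_client.list_foundation_models(byOutputModality="EMBEDDING")
```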

<h1 style="background: linear-gradient(to right, #ff6b6b, #4ecdc4); 
           color: white; 
           padding: 20px; 
           border-radius: 10px; 
           text-align: center; 
           font-family: Arial, sans-serif; 
           text-shadow: 2px 2px 4px rgba(0,0,0,0.5);">
  Generate Embeddings
</h1>

```python
import json

prompt = ("Amazon Bedrock supports foundation models from industry-leading providers such as "
          "AI21 Labs, Anthropic, Stability AI, and Amazon. Choose the model that is best suited to achieving "
          "your unique goals.")

model_id = "amazon.titan-embed-text-v2:0"
body = json.dumps({
    "inputText": prompt,
    "dimensions": 1024,   # length of the output vector
    "normalize": False    # return the vector without unit-length scaling
})

response = bedrock_runtime_client.invoke_model(
    modelId=model_id,
    body=body,
    accept="application/json",
    contentType="application/json",
)

# The response body is a stream; read it and parse the JSON payload
response_body = json.loads(response["body"].read())

embedding = response_body.get("embedding")
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3] + ['...'] + embedding[-3:]}")
```

```
The embedding vector has 1024 values
[-0.08286014944314957, 0.034856218844652176, -0.030087871477007866, '...', -0.025296829640865326, -0.03928922861814499, -0.012503423728048801]
```


```python
type(embedding)
```

```
list
```

```python
# Magnitude (L2 norm) of the vector
import numpy as np

np.linalg.norm(embedding)
```

```
0.9999999481242833
```

At the time of writing, you can use `amazon.titan-embed-text-v2:0` as the embedding model through the API. The input can be up to 8,192 tokens, and the output vector length can be 256, 512, or 1,024.

To generate embeddings, call the `InvokeModel` API operation, for example through the `invoke_model` method of the Boto3 `bedrock-runtime` client, to retrieve the vector representation of the input text from the specified model. The request and response bodies have the following shape:
Input:

```
{
    "inputText": text,
    "dimensions": dimensions,  # one of 256, 512, 1024 (default 1024)
    "normalize": normalize     # true or false
}
```

Output:

```
{
    "embedding": [ ... ]  # list of floats with `dimensions` values
}
```
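Because invalid request bodies only fail once they reach the service, it can help to build and check them locally. A minimal sketch, assuming the request and response shapes above; `build_titan_v2_body` and `parse_embedding` are hypothetical helper names, and the accepted dimension values follow the Amazon Bedrock user guide for Titan Text Embeddings V2 at the time of writing (verify against the current documentation).

```python
import json

# Accepted output sizes for Titan Text Embeddings V2 (per the Bedrock user guide
# at the time of writing; verify against the current documentation).
ALLOWED_DIMENSIONS = {256, 512, 1024}


def build_titan_v2_body(text: str, dimensions: int = 1024, normalize: bool = True) -> str:
    """Serialize an InvokeModel request body, failing fast on invalid arguments."""
    if not text:
        raise ValueError("inputText must be a non-empty string")
    if dimensions not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {sorted(ALLOWED_DIMENSIONS)}")
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": normalize})


def parse_embedding(raw_body) -> list:
    """Extract the 'embedding' list from the JSON (str or bytes) returned by InvokeModel."""
    payload = json.loads(raw_body)
    return payload["embedding"]


# Usage with the runtime client (requires AWS credentials):
# body = build_titan_v2_body("Hello, Bedrock", dimensions=256)
# response = bedrock_runtime_client.invoke_model(
#     modelId="amazon.titan-embed-text-v2:0", body=body,
#     accept="application/json", contentType="application/json")
# embedding = parse_embedding(response["body"].read())
```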

#### Normalization of a vector 

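Normalizing a vector means scaling it to unit length, that is, a magnitude of 1. This gives all vectors the same scale, so they contribute equally during vector operations and no vector dominates a similarity computation merely because it has a larger magnitude. For the sample prompt above, the vector returned with `"normalize": False` already has a magnitude of about 1, so the normalized vector in the next cell is nearly identical.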
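Normalization divides a vector by its magnitude, v̂ = v / ‖v‖. A useful consequence is that, for unit-length vectors, the plain dot product equals the cosine similarity, which is why many vector stores expect normalized inputs. A small NumPy sketch with toy 2-dimensional vectors (the helper name `l2_normalize` is hypothetical):

```python
import numpy as np


def l2_normalize(vec) -> np.ndarray:
    """Scale a vector to unit length: v / ||v||."""
    v = np.asarray(vec, dtype=np.float64)
    norm = np.linalg.norm(v)
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return v / norm


a = np.array([3.0, 4.0])  # magnitude 5
b = np.array([1.0, 0.0])  # magnitude 1
a_hat, b_hat = l2_normalize(a), l2_normalize(b)

# After normalization the magnitude is (up to floating-point rounding) 1
print(np.linalg.norm(a_hat))

# For unit vectors, the dot product equals the cosine similarity of the originals
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.dot(a_hat, b_hat), cosine)
```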

#### When should you normalize?
Use normalization as the default for most use cases, such as retrieval and RAG.

#### When should you not normalize?
Normalization works well for most use cases, but for tasks such as classification or entity extraction it can be worth experimenting with unnormalized vectors.


```python
prompt = ("Amazon Bedrock supports foundation models from industry-leading providers such as "
          "AI21 Labs, Anthropic, Stability AI, and Amazon. Choose the model that is best suited to achieving "
          "your unique goals.")

model_id = "amazon.titan-embed-text-v2:0"

body = json.dumps({
    "inputText": prompt,
    "dimensions": 1024,
    "normalize": True     # ask the model to return a unit-length vector
})

response = bedrock_runtime_client.invoke_model(
    modelId=model_id,
    body=body,
    accept="application/json",
    contentType="application/json",
)

response_body = json.loads(response["body"].read())

embedding = response_body.get("embedding")
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3] + ['...'] + embedding[-3:]}")
```

```
The embedding vector has 1024 values
[-0.08286015689373016, 0.034856222569942474, -0.030087873339653015, '...', -0.025296831503510475, -0.03928923234343529, -0.012503424659371376]
```


```python
import numpy as np

# Magnitude (L2 norm) of the vector
np.linalg.norm(embedding)
```

```
1.0000000347182876
```

```python
# A helper function that returns the embedding for a piece of text
def get_embedding(text, dimensions=1024, normalize=True, model_id="amazon.titan-embed-text-v2:0"):
    body = json.dumps({
        "inputText": text,
        "dimensions": dimensions,
        "normalize": normalize
    })
    response = bedrock_runtime_client.invoke_model(
        modelId=model_id, body=body, accept="application/json", contentType="application/json"
    )
    response_body = json.loads(response["body"].read())
    return response_body.get("embedding")
```
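Each call to `get_embedding` is a separate `InvokeModel` request, so embedding the same text repeatedly costs time and money. A minimal sketch of an in-process cache using the standard library's `functools.lru_cache`; the wrapper name `make_cached_embedder` is hypothetical. The cache is keyed on the text only, so create a separate cached embedder if you change `dimensions` or `normalize`.

```python
from functools import lru_cache


def make_cached_embedder(embed_fn, maxsize=1024):
    """Wrap an embedding function so repeated texts are embedded only once."""

    @lru_cache(maxsize=maxsize)
    def cached_embed(text: str) -> tuple:
        # Store results as tuples so callers cannot mutate the cached values
        return tuple(embed_fn(text))

    return cached_embed


# Usage with the helper defined above (each cache miss is one InvokeModel call):
# cached_get_embedding = make_cached_embedder(get_embedding)
# v1 = cached_get_embedding("Python is a programming language")  # calls the API
# v2 = cached_get_embedding("Python is a programming language")  # served from the cache
```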


```python
# Compare the embeddings of two different texts and check their similarity

text1 = "Python is a programming language"
text2 = "i have to go to the market"

text1_embedding = get_embedding(text1)
text2_embedding = get_embedding(text2)

# Cosine similarity: dot product divided by the product of the vector magnitudes
def cosine_similarity(vec1, vec2):
    dot_product = np.dot(vec1, vec2)
    norm_vec1 = np.linalg.norm(vec1)
    norm_vec2 = np.linalg.norm(vec2)
    similarity = dot_product / (norm_vec1 * norm_vec2)
    return similarity

cosine_similarity(text1_embedding, text2_embedding)
```

```
-0.012595531336709377
```
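Calling `cosine_similarity` pair by pair gets tedious as the number of texts grows. A vectorized sketch (the helper name `cosine_similarity_matrix` is hypothetical) that computes every pairwise cosine similarity at once by normalizing the rows of a matrix and taking a single matrix product:

```python
import numpy as np


def cosine_similarity_matrix(embeddings) -> np.ndarray:
    """Return an (n, n) matrix whose [i, j] entry is cos(embeddings[i], embeddings[j])."""
    m = np.asarray(embeddings, dtype=np.float64)      # shape: (n, d)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)  # scale each row to unit length
    return m @ m.T                                    # all pairwise dot products


# Usage with the embeddings from this notebook:
# sims = cosine_similarity_matrix([text1_embedding, text2_embedding])
# sims[0, 1] matches cosine_similarity(text1_embedding, text2_embedding)
```

The diagonal is always (about) 1, since every text is identical to itself; the off-diagonal entries are the scores you would otherwise compute one call at a time.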

```python
text3 = "Its a sunny day"

text3_embedding = get_embedding(text3)

cosine_similarity(text1_embedding, text3_embedding)
```

```
0.009236946209794724
```

```python
text4 = "Today it may not be raining"

text4_embedding = get_embedding(text4)

cosine_similarity(text3_embedding, text4_embedding)
```

```
0.29483532451036876
```