## Benchmark Titan Text Embeddings V2, a new state-of-the-art embeddings model on Amazon Bedrock, using the MTEB dataset

Embeddings are integral to many natural language processing applications, and their quality is crucial for good performance. They are commonly used in knowledge bases to represent textual data as dense vectors, enabling efficient similarity search and retrieval. In Retrieval Augmented Generation (RAG), embeddings are used to retrieve relevant passages from a corpus that give a language model the context it needs to generate informed, knowledge-grounded responses. Embeddings also play a key role in personalization and recommendation systems: user preferences, item characteristics, and historical interactions are represented as vectors, so similarities between them can be computed to produce personalized recommendations based on user behavior and item embeddings. As new embedding models are released with incremental quality improvements, organizations must weigh the potential benefits of upgrading against its costs, considering factors such as computational resources, data preprocessing, integration effort, and the expected impact of performance gains on business metrics.

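**Some useful links for MTEB (Massive Text Embedding Benchmark):**
1. MTEB GitHub repository: https://github.com/embeddings-benchmark/mteb
2. Hugging Face blog post introducing MTEB: https://huggingface.co/blog/mteb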

#### How is a piece of text converted into a vector?
A common approach is to use models that produce contextualized embeddings for entire sentences. These models are based on deep learning architectures such as Transformers, which capture contextual information and the relationships between words in a sentence more effectively than static, per-word embeddings.

![Embedding Model](./images/vector_embedding.png)
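To make the idea concrete, here is a toy sketch of one common way to turn per-token contextual vectors into a single sentence vector: mean pooling over the non-padding tokens, followed by L2 normalization. The token vectors and attention mask are made-up numbers, and this is a generic illustration only; it is not a description of how Titan Text Embeddings V2 pools its outputs internally.

```python
import numpy as np


def mean_pool(token_vectors: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the vectors of real (non-padding) tokens into one sentence vector.

    token_vectors: shape (num_tokens, hidden_dim), e.g. a Transformer's last hidden layer.
    attention_mask: shape (num_tokens,), 1 for real tokens and 0 for padding.
    """
    mask = attention_mask.astype(token_vectors.dtype)[:, None]  # (num_tokens, 1)
    summed = (token_vectors * mask).sum(axis=0)                 # sum over real tokens only
    count = max(mask.sum(), 1.0)                                # avoid division by zero
    return summed / count


def l2_normalize(vector: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = np.linalg.norm(vector)
    return vector if norm == 0 else vector / norm


# Toy example: 4 token positions, hidden size 3; the last position is padding.
tokens = np.array([
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 1.0, 1.0],
    [9.0, 9.0, 9.0],  # padding, ignored via the mask
])
mask = np.array([1, 1, 1, 0])

sentence_vector = l2_normalize(mean_pool(tokens, mask))
print(sentence_vector)
```

The mask matters: without it, the padding row would pull the average toward `[9, 9, 9]` and inputs of different lengths would not be comparable.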

In addition to semantic search, you can use embeddings to augment your prompts for more accurate results through Retrieval Augmented Generation (RAG). To use them this way, you need to store them in a database with vector search capabilities.

![Embedding Model](./images/vector_db.jpg)
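The core operation a vector database performs is nearest-neighbor search over stored embeddings. The sketch below does an exact (brute-force) top-k search by cosine similarity in NumPy. The `documents` list and its 3-dimensional vectors are hypothetical placeholders for real embeddings, and a production vector database would typically use an approximate index rather than scanning every vector.

```python
import numpy as np


def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int = 2) -> list[tuple[int, float]]:
    """Return (row index, cosine similarity) for the k rows of `matrix` most similar to `query`."""
    # Normalize so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    scores = m @ q
    best = np.argsort(-scores)[:k]  # indices sorted by descending similarity
    return [(int(i), float(scores[i])) for i in best]


# Hypothetical stored passages and their toy embeddings.
documents = ["Bedrock pricing", "Titan embeddings", "Vector databases"]
doc_vectors = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.9, 0.2],
    [0.0, 0.3, 0.9],
])
query_vector = np.array([0.2, 1.0, 0.1])  # toy embedding of a user question

for idx, score in top_k_cosine(query_vector, doc_vectors, k=2):
    print(f"{score:.3f}  {documents[idx]}")
```

In a RAG pipeline, the text of the top-ranked passages would then be inserted into the prompt as context.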


In September 2023, Amazon announced the launch of Amazon Titan Text Embeddings V1, a multilingual text embeddings model that converts text inputs such as single words, phrases, or large documents into high-dimensional numerical vector representations. Since then, thousands of our customers have used the first version of the model, which supports over 25 languages, accepts inputs of up to 8,192 tokens, and outputs vectors of 1,536 dimensions. Today we take that to the next level with a model that has flexible output dimensions. The Amazon Titan Text Embeddings V2 model supports over 100 languages and lets you choose the output dimension, which can lower cost by reducing the storage needed for the embeddings. The supported output dimensions are 256, 512, and 1,024. The model is designed to perform well on multilingual data and use cases.
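To put the storage savings in numbers, the sketch below estimates raw vector storage for each output size. It assumes each value is stored as a 4-byte float32 and ignores index and metadata overhead (both are assumptions; your vector store may use a different precision and will add its own overhead). The one-million-vector corpus size is an illustrative figure.

```python
BYTES_PER_FLOAT32 = 4  # assumption: embeddings stored as 32-bit floats


def raw_storage_bytes(num_vectors: int, dimensions: int, bytes_per_value: int = BYTES_PER_FLOAT32) -> int:
    """Raw bytes needed to store num_vectors embeddings of the given dimensionality."""
    return num_vectors * dimensions * bytes_per_value


num_docs = 1_000_000  # hypothetical corpus size
baseline = raw_storage_bytes(num_docs, 1024)
for dims in (1024, 512, 256):
    size = raw_storage_bytes(num_docs, dims)
    print(f"{dims:>4} dims: {size / 1e9:.3f} GB ({size / baseline:.0%} of the 1,024-dim size)")
```

For one million vectors this gives about 4.1 GB at 1,024 dimensions, 2.0 GB at 512, and 1.0 GB at 256, so dropping to 256 dimensions cuts raw storage by 75%. Whether retrieval quality holds up at the smaller size is what a benchmark such as MTEB helps you check.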

#### Install dependencies

You need a few libraries, such as MTEB, to run the benchmark. To simply invoke the model, you only need boto3.

In [None]:
#!pip install -U boto3  huggingface_hub mteb

#### Boto3 client to connect to the model

This helper is optional: you can instead invoke the model with a simple client such as **boto3.client('bedrock-runtime')**, which uses your default AWS profile to access Bedrock.

In [None]:
import os
from typing import Optional

# External Dependencies:
import boto3
from botocore.config import Config


def get_bedrock_client(
    assumed_role: Optional[str] = None,
    region: Optional[str] = "us-east-1",
    runtime: Optional[bool] = True,
    external_id: Optional[str] = None,
    ep_url: Optional[str] = None,
):
    """Create a boto3 client for Amazon Bedrock, with optional configuration overrides.

    Args:
        assumed_role: Optional IAM role ARN to assume via STS before creating the client.
        region: AWS Region to connect to.
        runtime: If True, create a 'bedrock-runtime' client (needed for invoke_model);
            otherwise create a 'bedrock' control-plane client.
        external_id: Optional external ID passed to sts.assume_role.
        ep_url: Optional custom endpoint URL.
    """
    target_region = region

    print(f"Create new client\n  Using region: {target_region}: external_id={external_id}")
    session_kwargs = {"region_name": target_region}
    client_kwargs = {**session_kwargs}

    # Use a named profile if AWS_PROFILE is set; otherwise fall back to the default credential chain.
    profile_name = os.environ.get("AWS_PROFILE")
    if profile_name:
        print(f"  Using profile: {profile_name}")
        session_kwargs["profile_name"] = profile_name

    # Retry throttled or transient failures; useful when embedding many documents in a loop.
    retry_config = Config(
        region_name=target_region,
        retries={
            "max_attempts": 10,
            "mode": "standard",
        },
    )
    session = boto3.Session(**session_kwargs)

    if assumed_role:
        print(f"  Using role: {assumed_role}", end="")
        sts = session.client("sts")
        assume_role_kwargs = {
            "RoleArn": str(assumed_role),
            "RoleSessionName": "langchain-llm-1",
        }
        if external_id:
            assume_role_kwargs["ExternalId"] = external_id
        response = sts.assume_role(**assume_role_kwargs)
        print(" ... sts::successful!")
        # Pass the temporary credentials of the assumed role to the Bedrock client.
        client_kwargs["aws_access_key_id"] = response["Credentials"]["AccessKeyId"]
        client_kwargs["aws_secret_access_key"] = response["Credentials"]["SecretAccessKey"]
        client_kwargs["aws_session_token"] = response["Credentials"]["SessionToken"]

    # 'bedrock-runtime' serves InvokeModel; 'bedrock' is the control plane (e.g. listing models).
    service_name = "bedrock-runtime" if runtime else "bedrock"

    if ep_url:
        client_kwargs["endpoint_url"] = ep_url

    bedrock_client = session.client(service_name=service_name, config=retry_config, **client_kwargs)

    print("boto3 Bedrock client successfully created!")
    print(bedrock_client.meta.endpoint_url)
    return bedrock_client

#### Helper class to connect and run the embeddings

This is not production code; treat it as a reference.

In [None]:
import json
import boto3

class TitanEmbeddings(object):
    accept = "application/json"
    content_type = "application/json"
    
    def __init__(self, model_id="amazon.titan-embed-text-v2:0", boto3_client=None, region_name='us-east-1'):
        # Reuse a client you already created (e.g. from get_bedrock_client), or create a default one.
        if boto3_client:
            self.bedrock_boto3 = boto3_client
        else:
            self.bedrock_boto3 = boto3.client(
                service_name='bedrock-runtime',
                region_name=region_name,
            )
        self.model_id = model_id

    def __call__(self, text, dimensions, normalize=True):
        """
        Returns Titan Embeddings

        Args:
            text (str): text to embed
            dimensions (int): Number of output dimensions (256, 512, or 1024).
            normalize (bool): Whether to return the normalized embedding or not.

        Returns:
            List[float]: Embedding
        """

        body = json.dumps({
            "inputText": text,
            "dimensions": dimensions,
            "normalize": normalize
        })

        response = self.bedrock_boto3.invoke_model(
            body=body, modelId=self.model_id, accept=self.accept, contentType=self.content_type
        )

        response_body = json.loads(response.get('body').read())

        return response_body['embedding']


### Test the connection to Bedrock

Use the Bedrock boto3 client. If you need to assume a role, pass its ARN as `assumed_role`. If you have a named profile set up, select it by setting the `AWS_PROFILE` environment variable.

In [None]:
import json
import os
import sys

import boto3

boto3_bedrock_runtime = get_bedrock_client()  # or simply boto3.client('bedrock-runtime')

bedrock_embeddings = TitanEmbeddings(model_id="amazon.titan-embed-text-v2:0", boto3_client=boto3_bedrock_runtime)
bedrock_embeddings

### Generate Embeddings

At the time of writing, you can use `amazon.titan-embed-text-v2:0` as the embedding model via the API. The input text can be up to 8k tokens, and the output vector length can be 256, 512, or 1,024.

To use a text embeddings model, call the InvokeModel API operation, for example through the Python SDK (boto3). InvokeModel returns the vector representation of the input text from the specified model.

```
# Input (request body)
{
    "inputText": text,
    "dimensions": dimensions,  # one of 256, 512, 1024
    "normalize": normalize
}

# Output (response body)
{
    "embedding": []  # list of floats with `dimensions` values
}
```
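The request and response shapes above can be wrapped in two small helpers. This is a sketch: `build_request_body` and `parse_embedding` are hypothetical names introduced here, the dimension check simply enforces the 256/512/1,024 options listed above, and the empty-text check mirrors the placeholder substitution the MTEB wrapper later in this notebook performs. The returned string is what you would pass as `body=` to `invoke_model`; the usage below runs offline with a fake response.

```python
import io
import json

ALLOWED_DIMENSIONS = (256, 512, 1024)  # output sizes listed for amazon.titan-embed-text-v2:0


def build_request_body(text: str, dimensions: int = 1024, normalize: bool = True) -> str:
    """Serialize an InvokeModel request body for Titan Text Embeddings V2."""
    if dimensions not in ALLOWED_DIMENSIONS:
        raise ValueError(f"dimensions must be one of {ALLOWED_DIMENSIONS}, got {dimensions}")
    if not text:
        raise ValueError("inputText must be a non-empty string")
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": normalize})


def parse_embedding(response: dict) -> list[float]:
    """Read the streaming 'body' of an invoke_model response and return the embedding list."""
    payload = json.loads(response["body"].read())
    return payload["embedding"]


# Offline usage: a fake response whose 'body' is file-like, like the real StreamingBody.
body = build_request_body("hello world", dimensions=256)
fake_response = {"body": io.BytesIO(json.dumps({"embedding": [0.1] * 256}).encode("utf-8"))}
embedding = parse_embedding(fake_response)
print(body)
print(len(embedding))
```

Validating locally lets you catch a wrong `dimensions` value before spending a request on it.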

#### Normalization of a vector

Normalizing a vector means scaling it to unit length (an L2 norm, or magnitude, of 1). This ensures that all vectors have the same scale and contribute equally during vector operations, preventing some vectors from dominating others because of their larger magnitudes. For unit-length vectors, the dot product equals the cosine similarity.
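A short NumPy sketch of L2 normalization: after scaling, each vector has length 1, and the dot product of two normalized vectors equals their cosine similarity, which is why normalized embeddings let a vector store use a cheaper inner-product comparison. The vectors are arbitrary toy values, not Titan outputs.

```python
import numpy as np


def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale v to unit length; a zero vector is returned unchanged."""
    norm = np.linalg.norm(v)
    return v if norm == 0 else v / norm


a = np.array([3.0, 4.0])  # length 5
b = np.array([1.0, 0.0])  # length 1

a_unit, b_unit = l2_normalize(a), l2_normalize(b)
cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(a_unit)           # [0.6 0.8]
print(a_unit @ b_unit)  # 0.6, the same value as the cosine similarity
print(cosine)           # 0.6
```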

#### When you should normalize
Use normalization as the default for most use cases, such as retrieval and RAG.

#### When you should not normalize
Normalization generally works well for all use cases, but you can experiment with unnormalized embeddings for certain use cases, such as classification or entity extraction.


In [None]:
prompt_data = "Amazon Bedrock supports foundation models from industry-leading providers such as \
AI21 Labs, Anthropic, Stability AI, and Amazon. Choose the model that is best suited to achieving \
your unique goals."


modelId = "amazon.titan-embed-text-v2:0"
accept = "application/json"
contentType = "application/json"



sample_model_input={
    "inputText": prompt_data,
    "dimensions": 256,
    "normalize": True
}

body = json.dumps(sample_model_input)

response = boto3_bedrock_runtime.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)

response_body = json.loads(response.get('body').read())

embedding = response_body.get("embedding")
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3]+['...']+embedding[-3:]}")


#### Use the helper class to get the same result

In [None]:
embedding = bedrock_embeddings(text=prompt_data, dimensions=256, normalize=True)
print(f"The embedding vector has {len(embedding)} values\n{embedding[0:3]+['...']+embedding[-3:]}")

### Optional MTEB testing

To create your own MTEB tests, you can use the code below as a starting point. Note that this is not production code. For more details, refer to the MTEB GitHub repository at https://github.com/embeddings-benchmark/mteb/tree/main.
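MTEB evaluates any object that exposes an `encode` method taking a list of sentences and returning one embedding per sentence; retrieval tasks in some MTEB versions also use `encode_queries` and `encode_corpus` when present, which is why the `TitanV2Model` class below defines all three. To check that interface offline before spending Bedrock calls, here is a minimal sketch in which a hypothetical deterministic `fake_embed` function stands in for Titan. The keyword arguments MTEB passes to `encode` vary between versions, so the sketch accepts and ignores `**kwargs`.

```python
import hashlib

import numpy as np


def fake_embed(text: str, dimensions: int = 256) -> list[float]:
    """Hypothetical stand-in for a Titan call: a deterministic, L2-normalized pseudo-embedding."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "little")
    vec = np.random.default_rng(seed).standard_normal(dimensions)
    return (vec / np.linalg.norm(vec)).tolist()


class MinimalEncoder:
    """Smallest shape of model object that MTEB-style evaluation code calls into."""

    def __init__(self, embed_fn=fake_embed, dimensions: int = 256) -> None:
        self.embed_fn = embed_fn
        self.dimensions = dimensions

    def encode(self, sentences: list[str], **kwargs) -> np.ndarray:
        # One row per input sentence, in the same order as the input.
        return np.array([self.embed_fn(s, self.dimensions) for s in sentences], dtype=np.float32)


encoder = MinimalEncoder()
vectors = encoder.encode(["this is a test", "another sentence"])
print(vectors.shape)  # (2, 256)
```

Swapping `fake_embed` for a function that calls Titan turns this into a real, if slow, evaluation model.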

In [None]:
import numpy as np
import json
import os
import sys
import time
import torch

import boto3

class TitanV2Model():
    def __init__(self) -> None:

        self.br_embeddings = None     
        self._init_connection()
        
        print(f"TitanV2Model:__init__::ready:to:Invoke:::successful::") 
    
    def _init_connection(self, dim=256):
        boto3_bedrock_runtime = get_bedrock_client()  # or simply boto3.client('bedrock-runtime')

        self.br_embeddings = TitanEmbeddings(model_id="amazon.titan-embed-text-v2:0", boto3_client=boto3_bedrock_runtime)
        self.dim = dim
        
    def process_dict_text(self, single_text_dict):
        """**IMPORTANT** Adapt this code to your dataset; DO NOT USE IT AS IS. See
        https://github.com/embeddings-benchmark/mteb/blob/main/mteb/abstasks/AbsTaskRetrieval.py
        for an example, and review that GitHub repository in detail."""
        # Prefix title fields with their key name, join all fields, and truncate to 30,000 characters.
        single_text = [str(key).strip() + " " + str(val).strip() if 'title' in key.lower() else str(val).strip() for key, val in single_text_dict.items()]
        return " ".join(single_text)[:30000]
        
    def reorg_text(self, single_text):
        """**IMPORTANT** Adapt this code to your dataset; DO NOT USE IT AS IS. See
        https://github.com/embeddings-benchmark/mteb/blob/main/mteb/abstasks/AbsTaskRetrieval.py
        for an example, and review that GitHub repository in detail."""
        # MTEB retrieval corpora can pass documents as dicts (e.g. {"title": ..., "text": ...}).
        if isinstance(single_text, dict):
            single_text = self.process_dict_text(single_text)
        # Avoid sending an empty string to the model by substituting a placeholder.
        single_text = "0" if not single_text else single_text
        # If the text is itself a JSON object, flatten it the same way as a dict.
        try:
            parsed = json.loads(single_text)
        except (json.JSONDecodeError, TypeError):
            parsed = None
        if isinstance(parsed, dict):
            single_text = self.process_dict_text(parsed)
        return single_text
    
    def invoke_model(self, text_list: list[str]):
        """**IMPORTANT** Adapt this code to your dataset; DO NOT USE IT AS IS. See
        https://github.com/embeddings-benchmark/mteb/blob/main/mteb/abstasks/AbsTaskRetrieval.py
        for an example, and review that GitHub repository in detail."""
        list_embeddings = []

        for single_text in text_list:
            single_text = self.reorg_text(single_text)
            # Use this instance's client (not the global bedrock_embeddings) so the class is self-contained.
            single_embed = self.br_embeddings(text=single_text, dimensions=self.dim, normalize=True)
            list_embeddings.append(single_embed)

        return list_embeddings

    def reshape_titan_embeddings(self, query_embeddings: np.ndarray, **kwargs) -> list[np.ndarray]:
        # Reshape the embeddings here if your evaluation needs a different layout.
        return query_embeddings

    def encode(self, queries: list[str], **kwargs) -> list[np.ndarray] | list[torch.Tensor]:
        """
        Returns a list of embeddings for the given sentences.
        Args:
            queries: List of sentences to encode

        Returns:
            List of embeddings for the given sentences
        """
        embedding_list = self.invoke_model(queries)
        return self.reshape_titan_embeddings(np.array(embedding_list))

    def encode_queries(self, queries: list[str], **kwargs) -> list[np.ndarray] | list[torch.Tensor]:
        """
        Returns a list of embeddings for the given queries.
        Args:
            queries: List of queries to encode

        Returns:
            List of embeddings for the given queries
        """
        embedding_list = self.invoke_model(queries)
        return self.reshape_titan_embeddings(np.array(embedding_list))

    def encode_corpus(self, corpus: list[str] | list[dict[str, str]], **kwargs) -> list[np.ndarray] | list[torch.Tensor]:
        """
        Returns a list of embeddings for the given corpus documents.
        Args:
            corpus: List of documents to encode,
                or list of dictionaries with keys "title" and "text"

        Returns:
            List of embeddings for the given documents
        """
        embedding_list = self.invoke_model(corpus)
        return self.reshape_titan_embeddings(np.array(embedding_list))


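The `invoke_model` method of `TitanV2Model` sends one request per text, sequentially, which is slow for a retrieval corpus with many thousands of documents. One option, sketched below under the assumption that your account's Bedrock throughput quota allows several concurrent requests, is to fan the calls out over a thread pool while keeping the output order aligned with the input (boto3 low-level clients can be shared across threads). `embed_concurrently` is a hypothetical helper, and `embed_fn` is a placeholder: in this notebook it could be `lambda t: bedrock_embeddings(text=t, dimensions=256, normalize=True)`. The offline usage below passes a trivial function instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def embed_concurrently(
    texts: Sequence[str],
    embed_fn: Callable[[str], list[float]],
    max_workers: int = 8,
) -> list[list[float]]:
    """Embed texts in parallel threads; result i is the embedding of texts[i]."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of completion order.
        return list(pool.map(embed_fn, texts))


# Offline usage with a trivial stand-in embedding function.
def toy_embed(text: str) -> list[float]:
    return [float(len(text)), float(text.count(" "))]


results = embed_concurrently(["a b", "hello", "x y z"], toy_embed, max_workers=2)
print(results)  # [[3.0, 1.0], [5.0, 0.0], [5.0, 2.0]]
```

The retry configuration in `get_bedrock_client` (standard mode, up to 10 attempts) helps absorb throttling when you raise `max_workers`; start with a small pool and increase it while watching for throttling errors.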
In [None]:
final_output = TitanV2Model().encode_queries(["this is a test"])

print("Embeddings generated ::")
print(f"shape:of:embeddings -- > length of embeddings={len(final_output)}::")
print(f"shape:of:embeddings -- > {len(final_output[0])}::")

### Sample run of one of the MTEB retrieval tasks

In [None]:
from mteb import MTEB
# This task import path depends on the installed mteb version and may differ in newer releases.
from mteb.tasks.Retrieval.eng.CQADupstackEnglishRetrieval import CQADupstackEnglishRetrieval

model = TitanV2Model()
evaluation = MTEB(tasks=[CQADupstackEnglishRetrieval(langs=["en"]),])

# Optionally pass output_folder=... to control where results are written.
evaluation.run(model, eval_splits=["test"], trust_remote_code=True, overwrite_results=True)

### For more information, see the [Amazon Titan models documentation](https://docs.aws.amazon.com/bedrock/latest/userguide/titan-models.html)