![Comparing Embeddings](../../images/headings/01_embeddings_02_01_comparing_embeddings.png)

# Comparing Embeddings

## Configure embedding model

Embedding model IDs available in Amazon Bedrock include:

- `cohere.embed-english-v3`
- `cohere.embed-multilingual-v3`
- `amazon.titan-embed-text-v1`
- `amazon.titan-embed-text-v2:0`
- `amazon.titan-embed-image-v1`

In [1]:
from langchain_aws.embeddings import BedrockEmbeddings

# Requires AWS credentials with access to the chosen model in Bedrock
model = BedrockEmbeddings(model_id='amazon.titan-embed-text-v2:0')

## Generate embeddings for documents

In [2]:
# The second sentence is Korean for "That person is a happy person"
sentences = ['That is a happy person', '그 사람은 행복한 사람이야', 'That is a very happy person', 'These are some fairly unhappy people']
embeddings = model.embed_documents(sentences)
print(embeddings)

[[-0.032252535, 0.022927823, -0.023366632, -0.01579716, 0.10180392, 0.012670639, -0.029619675, -0.02501217, ...
(output truncated)
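
Printing every value makes the output hard to read. A short summary (how many vectors there are, how long each one is, and the length of each vector) is usually more informative. The following is a minimal sketch, assuming NumPy is installed; `describe_embeddings` and `toy_embeddings` are hypothetical names used only for illustration, and the toy vectors stand in for the real `embeddings` list.

```python
import numpy as np

def describe_embeddings(vectors):
    """Return (count, dimension, L2 norms) for a list of equal-length vectors."""
    matrix = np.asarray(vectors, dtype=float)  # shape: (n_texts, n_dimensions)
    if matrix.ndim != 2:
        raise ValueError("expected a list of equal-length vectors")
    norms = np.linalg.norm(matrix, axis=1)  # one L2 norm per vector
    return matrix.shape[0], matrix.shape[1], norms

# Toy stand-in for the `embeddings` list returned by embed_documents
toy_embeddings = [[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]]
count, dimension, norms = describe_embeddings(toy_embeddings)
print(f"{count} vectors, {dimension} dimensions each")
print("L2 norms:", norms.round(3))
```

In the notebook, calling `describe_embeddings(embeddings)` should report one vector per sentence. Whether the norms come out close to 1 depends on whether the chosen model returns normalized vectors.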

## Use cosine similarity to compare embeddings

### Generate cosine similarities between each pair of embeddings

In [3]:
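import itertools
from langchain_community.utils.math import cosine_similarity

# cosine_similarity takes two matrices (lists of vectors) and returns a matrix,
# so wrap each embedding in a list and read the single entry at [0][0].
results = [
    {
        'items': [a, b],
        'similarity': cosine_similarity([embeddings[a]], [embeddings[b]])[0][0],
    }
    for a, b in itertools.combinations(range(len(sentences)), 2)
]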
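
Cosine similarity is the dot product of two vectors divided by the product of their lengths, so it measures how closely the vectors point in the same direction regardless of magnitude. The sketch below computes it directly with NumPy for a single pair of vectors. The `cosine` helper is a hypothetical name for illustration, not part of LangChain, and the 2-D vectors are toy inputs rather than real embeddings.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(u, v) / denom)

print(cosine([1.0, 0.0], [2.0, 0.0]))   # same direction
print(cosine([1.0, 0.0], [0.0, 3.0]))   # orthogonal
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Values near 1 indicate vectors pointing the same way, values near 0 indicate unrelated directions, and negative values indicate opposing directions.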

### Sort results by similarity (high to low)

In [4]:
results.sort(key=lambda x: x['similarity'], reverse=True)

### Display results

In [5]:
for result in results:
    a, b = result['items']
    similarity = result['similarity']
    print(f'Similarity between "{sentences[a]}" and "{sentences[b]}": {similarity}')

Similarity between "That is a happy person" and "That is a very happy person": 0.8817359082049356
Similarity between "That is a happy person" and "그 사람은 행복한 사람이야": 0.6091536232872617
Similarity between "그 사람은 행복한 사람이야" and "That is a very happy person": 0.5854100566756841
Similarity between "그 사람은 행복한 사람이야" and "These are some fairly unhappy people": 0.16842978140995024
Similarity between "That is a happy person" and "These are some fairly unhappy people": 0.12526323491040173
Similarity between "That is a very happy person" and "These are some fairly unhappy people": 0.10801701571063511


## Exercises

- Try different input sentences to get a feel for how wording, meaning, and language affect the similarity between embeddings
- Embed the same sentences with different models and compare how the embeddings and similarity scores change
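
For the second exercise, it can help to wrap the embed, compare, and sort steps in one function that accepts any embedding callable. This is a sketch under stated assumptions: `rank_pairs` and `toy_embed` are hypothetical names, and `toy_embed` is a deliberately crude stand-in so the example runs without AWS access.

```python
import itertools
import numpy as np

def rank_pairs(sentences, embed_documents):
    """Embed sentences and return (similarity, a, b) tuples, most similar first."""
    vectors = np.asarray(embed_documents(sentences), dtype=float)
    # Normalize each row so that a dot product equals cosine similarity
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    pairs = [
        (float(unit[a] @ unit[b]), sentences[a], sentences[b])
        for a, b in itertools.combinations(range(len(sentences)), 2)
    ]
    return sorted(pairs, key=lambda p: p[0], reverse=True)

def toy_embed(texts):
    """Hypothetical stand-in for a model: counts of the letters a, e, and p,
    each offset by one so that no vector is all zeros."""
    return [[t.lower().count(c) + 1.0 for c in "aep"] for t in texts]

for similarity, first, second in rank_pairs(["happy", "apple", "tree"], toy_embed):
    print(f"{similarity:.3f}  {first!r} vs {second!r}")
```

In the notebook, the same function could be called once per model ID by passing `BedrockEmbeddings(model_id=...).embed_documents` in place of `toy_embed`, which makes it easy to compare the rankings side by side.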

## Discussion Questions

- Do you notice any differences in the similarity scores between inputs or between models? If so, what do you think causes them?