## Embedding Example

In [1]:
# Import the Ollama module
import ollama

In [3]:
# Instruction prefix that is prepended to the text before embedding
context = 'Represent this sentence for searching relevant passages:'
text = 'The sky is blue because of Rayleigh scattering'

embeds = ollama.embeddings(model='mxbai-embed-large', 
                           prompt=f'{context} {text}')

In [9]:
type(embeds)

dict

In [10]:
embeds.keys()

dict_keys(['embedding'])

In [13]:
# Access item via key
type(embeds['embedding'])

list

In [14]:
# Check length of list
len(embeds['embedding'])

1024

In [15]:
# Show the first 10 elements of the list
embeds['embedding'][:10]

[-0.5315825343132019,
 0.08747989684343338,
 0.46782198548316956,
 -0.6539711952209473,
 -0.5936126112937927,
 0.07853017747402191,
 -0.09366538375616074,
 0.33488044142723083,
 0.5769988894462585,
 1.097411870956421]

## Embedding Comparisons

### Cosine Similarity

In [25]:
context = 'Represent this sentence for searching relevant passages:'

# Text & Embeddings 1
text = 'Apple, oranges, and grapes are good fruits.'
embeds_1 = ollama.embeddings(model='mxbai-embed-large', 
                             prompt=f'{context} {text}')

# Text & Embeddings 2
text = 'Eating a good balance of meat, vegetables, and fruits everyday is good for you.'
embeds_2 = ollama.embeddings(model='mxbai-embed-large', 
                             prompt=f'{context} {text}')


# Text & Embeddings 3
text = 'How to be a good data engineer?'
embeds_3 = ollama.embeddings(model='mxbai-embed-large', 
                             prompt=f'{context} {text}')

The cosine similarity between two vectors $ \mathbf{A} $ and $ \mathbf{B} $ is calculated using the following formula:

$$
\text{cosine\_similarity}(\mathbf{A}, \mathbf{B}) = \frac{\mathbf{A} \cdot \mathbf{B}}{\|\mathbf{A}\| \cdot \|\mathbf{B}\|}
$$

Where:
- $ \mathbf{A} \cdot \mathbf{B} $ denotes the dot product of vectors $ \mathbf{A} $ and $ \mathbf{B} $.
- $ \|\mathbf{A}\| $ denotes the Euclidean norm (magnitude) of vector $ \mathbf{A} $.
- $ \|\mathbf{B}\| $ denotes the Euclidean norm (magnitude) of vector $ \mathbf{B} $.

The dot product of two vectors is the sum of the products of their corresponding components. Mathematically, if $ \mathbf{A} = [a_1, a_2, \ldots, a_n] $ and $ \mathbf{B} = [b_1, b_2, \ldots, b_n] $, then the dot product $ \mathbf{A} \cdot \mathbf{B} $ is:

$$
\mathbf{A} \cdot \mathbf{B} = a_1 \times b_1 + a_2 \times b_2 + \cdots + a_n \times b_n
$$

The Euclidean norm of a vector is the square root of the sum of the squares of its components. For a vector $ \mathbf{V} = [v_1, v_2, \ldots, v_n] $, the Euclidean norm $ \|\mathbf{V}\| $ is:

$$
\|\mathbf{V}\| = \sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}
$$

So, to calculate the cosine similarity between two vectors, we take their dot product and divide it by the product of their magnitudes.
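
As a quick check of the arithmetic, the definitions above can be applied by hand to two small vectors. This is a minimal sketch using only the standard library; the vectors `a` and `b` are made-up values chosen for easy arithmetic, not real embeddings.

```python
import math

# Hypothetical 3-dimensional vectors (illustrative values only).
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]

# Dot product: 1*4 + 2*5 + 3*6 = 32
dot = sum(x * y for x, y in zip(a, b))

# Euclidean norms: sqrt(1 + 4 + 9) and sqrt(16 + 25 + 36)
norm_a = math.sqrt(sum(x * x for x in a))
norm_b = math.sqrt(sum(y * y for y in b))

# Cosine similarity: dot product divided by the product of the norms
similarity = dot / (norm_a * norm_b)
print(dot, round(similarity, 4))  # prints: 32.0 0.9746
```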

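This formula computes the cosine of the angle between the two vectors, so the result always lies between -1 and 1 and depends only on the vectors' directions, not their magnitudes. A value close to 1 (small angle) means the vectors point in nearly the same direction and are similar. A value of 0 (angle of 90 degrees) means the vectors are orthogonal and have no similarity. A value close to -1 (angle near 180 degrees) means they point in opposite directions and are dissimilar.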
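
The three reference cases can be reproduced with 2-dimensional vectors. The sketch below is illustrative only: the helper name `cosine` and the sample vectors are assumptions made for this example, and `math.hypot` with more than two arguments needs Python 3.8 or later.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two non-zero vectors (plain-Python sketch)."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Same direction, different magnitude: angle of 0 degrees
same_direction = cosine([1.0, 0.0], [3.0, 0.0])
# Perpendicular vectors: angle of 90 degrees
orthogonal = cosine([1.0, 0.0], [0.0, 2.0])
# Opposite direction: angle of 180 degrees
opposite = cosine([1.0, 0.0], [-2.0, 0.0])

print(same_direction, orthogonal, opposite)  # prints: 1.0 0.0 -1.0
```

Note that scaling a vector does not change the result: `[1.0, 0.0]` and `[3.0, 0.0]` still score exactly 1.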


In [18]:
import numpy as np

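def cosine_similarity(vector1, vector2):
    """Return the cosine of the angle between two equal-length, non-zero vectors."""
    # Numerator: sum of the element-wise products
    dot_product = np.dot(vector1, vector2)
    # Denominator: product of the two Euclidean norms
    norm_vector1 = np.linalg.norm(vector1)
    norm_vector2 = np.linalg.norm(vector2)
    similarity = dot_product / (norm_vector1 * norm_vector2)
    return similarity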
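
One edge case worth keeping in mind: if either vector has zero norm, the division is 0/0, which NumPy typically reports as `nan` with a runtime warning. A possible guard is sketched below. The name `safe_cosine_similarity` and the choice to return 0.0 for a zero vector are assumptions made for illustration, not part of the notebook, and the sketch uses plain Python rather than NumPy so it runs on its own.

```python
import math

def safe_cosine_similarity(vector1, vector2):
    """Hypothetical variant that returns 0.0 when either vector has zero norm."""
    if len(vector1) != len(vector2):
        raise ValueError("vectors must have the same length")
    dot_product = sum(x * y for x, y in zip(vector1, vector2))
    norm1 = math.sqrt(sum(x * x for x in vector1))
    norm2 = math.sqrt(sum(y * y for y in vector2))
    # Assumption: treat a zero vector as having no similarity to anything.
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0
    return dot_product / (norm1 * norm2)

print(safe_cosine_similarity([0.0, 0.0], [1.0, 2.0]))  # prints: 0.0
```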



In [28]:
# Compare embeddings 1 and 2
cosine_similarity(np.array(embeds_1['embedding']), np.array(embeds_2['embedding']))

0.7484848742146336

In [29]:
# Compare embeddings 1 and 3
cosine_similarity(np.array(embeds_1['embedding']), np.array(embeds_3['embedding']))

0.3626350686270389
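
The pairwise comparisons above extend naturally to ranking several candidates against one query vector. The sketch below uses tiny made-up vectors in place of real embeddings so that it runs without an Ollama server. The names `query`, `candidates`, and `doc_a` through `doc_c` are hypothetical; in the notebook the vectors would instead come from the `'embedding'` entry of each response.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional stand-ins for embedding vectors.
query = [1.0, 0.0, 0.0]
candidates = {
    'doc_a': [0.9, 0.1, 0.0],
    'doc_b': [0.5, 0.5, 0.0],
    'doc_c': [0.0, 0.0, 1.0],
}

# Score every candidate against the query, then sort from most to least similar.
scores = {name: cosine(query, vec) for name, vec in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # prints: ['doc_a', 'doc_b', 'doc_c']
```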