# How to measure vector similarity?

To measure the similarity or closeness between two word embeddings, you can use cosine similarity. Cosine similarity is the cosine of the angle between two vectors and ranges from -1 to 1: 1 indicates that the vectors point in exactly the same direction (whatever their lengths), 0 indicates that they are orthogonal (i.e., have no directional similarity), and -1 indicates that they point in opposite directions.
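
Written as a formula, this definition can be sketched as follows. The symbols $\mathbf{a}$ and $\mathbf{b}$ (two non-zero vectors) and $\theta$ (the angle between them) are names chosen here for illustration:

$$
\cos\theta = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
$$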

In [1]:
import numpy as np
from numpy.linalg import norm

def cosine_similarity(embedding1, embedding2):
    """
    Calculate cosine similarity between two embeddings.
    
    Parameters:
        embedding1 (numpy.ndarray): First embedding vector.
        embedding2 (numpy.ndarray): Second embedding vector.
        
    Returns:
        float: Cosine similarity between the two embeddings.
    """
    # Calculate dot product of the two embeddings
    dot_product = np.dot(embedding1, embedding2)
    
    # Calculate magnitudes of the embeddings
    norm1 = norm(embedding1)
    norm2 = norm(embedding2)
    
    # Calculate cosine similarity
    similarity = dot_product / (norm1 * norm2)
    
    return similarity

# Example usage
embedding1 = np.array([0.5, 0.3, -0.2])
embedding2 = np.array([0.7, 0.2, -0.1])

similarity = cosine_similarity(embedding1, embedding2)
print("Cosine Similarity:", similarity)


Cosine Similarity: 0.9492481892299481


Imagine you have two vectors represented by arrows on a piece of paper. The dot product of these two vectors measures how closely they are aligned in direction, scaled by their lengths. Geometrically, you place the two vectors tail-to-tail, project one onto the other, and multiply the signed length of that projection by the length of the vector you projected onto. If the vectors point in broadly the same direction (the angle between them is less than 90°), the dot product is positive; if they point in broadly opposite directions (more than 90°), the dot product is negative; and if they are perpendicular, the dot product is zero.
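
As a small illustration of the sign behaviour described above, here is a minimal sketch using 2-D vectors. The vector values and variable names are made up for this example:

```python
import numpy as np

# Hypothetical 2-D vectors chosen only to illustrate the three cases.
a = np.array([2.0, 0.0])
same_direction = np.array([3.0, 0.0])
perpendicular = np.array([0.0, 5.0])
opposite_direction = np.array([-1.0, 0.0])

dot_same = np.dot(a, same_direction)          # positive
dot_perp = np.dot(a, perpendicular)           # zero
dot_opposite = np.dot(a, opposite_direction)  # negative

# Projection view: the signed length of b's projection onto a,
# multiplied by the length of a, gives back the dot product.
b = np.array([3.0, 4.0])
projection_length = np.dot(a, b) / np.linalg.norm(a)
dot_from_projection = projection_length * np.linalg.norm(a)

print(dot_same, dot_perp, dot_opposite)
print(projection_length, dot_from_projection)
```

Note that the dot product grows with the lengths of the vectors, so on its own it is not a pure measure of direction.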

Now, let's talk about normalization. The length of a vector (its norm) represents its magnitude or size. Normalizing a vector means dividing it by its length, which rescales it to a unit vector (a vector with length 1) pointing in the same direction. This allows us to compare vectors of different lengths on an equal footing, because only the direction remains.
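
A minimal sketch of normalization might look like the following. The helper name `normalize` and the zero-vector check are assumptions added for illustration, not part of the code above:

```python
import numpy as np

def normalize(vector):
    """Return a unit vector pointing in the same direction as `vector`."""
    length = np.linalg.norm(vector)
    if length == 0:
        # A zero vector has no direction, so it cannot be normalized.
        raise ValueError("cannot normalize a zero vector")
    return vector / length

v = np.array([3.0, 4.0])          # length 5
unit_v = normalize(v)             # same direction, length 1
unit_scaled = normalize(10 * v)   # a longer vector in the same direction

print(unit_v, np.linalg.norm(unit_v))
print(np.allclose(unit_v, unit_scaled))
```

Both `v` and `10 * v` normalize to the same unit vector, which is why normalization removes the effect of length.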

So, when we compute the cosine similarity between two vectors, we first compute the dot product, which reflects both their alignment and their lengths. We then divide the dot product by the product of the two norms to remove the effect of length. This is equivalent to normalizing each vector to a unit vector first and then taking the dot product of the two unit vectors. The result is a value between -1 and 1, representing the cosine of the angle between the vectors. It indicates how closely the vectors are aligned in direction, with higher values indicating greater similarity.
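
To check that the two routes agree, here is a short sketch that reuses the example embedding values from earlier. The variable names `direct`, `via_unit_vectors`, and `scaled` are introduced only for this illustration:

```python
import numpy as np
from numpy.linalg import norm

embedding1 = np.array([0.5, 0.3, -0.2])
embedding2 = np.array([0.7, 0.2, -0.1])

# Route 1: divide the dot product by the product of the norms.
direct = np.dot(embedding1, embedding2) / (norm(embedding1) * norm(embedding2))

# Route 2: normalize each vector first, then take the dot product.
via_unit_vectors = np.dot(embedding1 / norm(embedding1),
                          embedding2 / norm(embedding2))

# Rescaling one vector changes its length but not its direction,
# so the cosine similarity should stay the same.
scaled = np.dot(5 * embedding1, embedding2) / (norm(5 * embedding1) * norm(embedding2))

# Recover the angle itself; clip guards against tiny floating-point overshoot.
angle_degrees = np.degrees(np.arccos(np.clip(direct, -1.0, 1.0)))

print(direct, via_unit_vectors, scaled)
print(angle_degrees)
```

Dividing once by the product of the norms (route 1) avoids building two intermediate unit vectors, which is presumably why the function above is written that way.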