## Word2Vec Cosine Similarity Calculation

This notebook demonstrates how to compute the cosine similarity between two sentences using pre-trained Word2Vec embeddings. The sections below describe the model, the similarity metric, and the implementation.

#### Overview


1. **Pre-trained Model**:
    - We use `word2vec-google-news-300`, a pre-trained Word2Vec model that provides 300-dimensional vectors trained on a large corpus of Google News text. It is loaded through gensim's downloader API.

2. **Cosine Similarity**:
    - The similarity between the two sentences is computed using the cosine similarity metric. This metric measures the cosine of the angle between two vectors in a multi-dimensional space, providing a value between -1 and 1. A value closer to 1 indicates higher similarity.
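
As a minimal sketch of the metric itself, the cosine similarity of two vectors $u$ and $v$ is $\cos\theta = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert}$. The helper name `cosine` and the zero-vector convention below are assumptions made for illustration; the notebook itself relies on scikit-learn's `cosine_similarity`.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between vectors u and v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    norm_product = np.linalg.norm(u) * np.linalg.norm(v)
    if norm_product == 0.0:
        # Convention chosen for this sketch: a zero vector has no direction,
        # so report 0.0 instead of dividing by zero.
        return 0.0
    return float(np.dot(u, v) / norm_product)

print(cosine([1, 0], [2, 0]))   # same direction  -> 1.0
print(cosine([1, 0], [0, 3]))   # orthogonal      -> 0.0
print(cosine([1, 0], [-1, 0]))  # opposite        -> -1.0
```

Note that the magnitude of the vectors does not matter: `[1, 0]` and `[2, 0]` point the same way, so their similarity is 1.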

#### Implementation Details

1. **Sentence Vectorization**:
    - Each sentence is split on whitespace and lowercased, then converted into a single vector by averaging the Word2Vec embeddings of its words. Words not found in the model's vocabulary are ignored; if no word is found, the sentence vector is all zeros.

2. **Similarity Calculation**:
    - The cosine similarity between the two sentence vectors is computed with scikit-learn's `cosine_similarity`.
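
The two steps above can be tried out without downloading the full model. The sketch below uses a hypothetical three-word vocabulary with made-up 3-dimensional vectors (`toy_vectors` and `toy_sentence_vector` are illustrative names, not part of gensim) to show how unknown words are skipped and how the averaged vectors are compared.

```python
import numpy as np

# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_vectors = {
    "dogs": np.array([1.0, 0.0, 0.0]),
    "cats": np.array([0.8, 0.6, 0.0]),
    "are":  np.array([0.0, 0.0, 1.0]),
}

def toy_sentence_vector(sentence, vectors, dim=3):
    # Lowercase, split on whitespace, and keep only known words.
    tokens = [t for t in sentence.lower().split() if t in vectors]
    if not tokens:
        return np.zeros(dim)
    # Average the word vectors component-wise.
    return np.mean([vectors[t] for t in tokens], axis=0)

v1 = toy_sentence_vector("Dogs are friendly", toy_vectors)  # "friendly" is skipped
v2 = toy_sentence_vector("Cats are quiet", toy_vectors)     # "quiet" is skipped
similarity = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
print(round(float(similarity), 4))  # 0.9
```

Here `v1` is `[0.5, 0.0, 0.5]` and `v2` is `[0.4, 0.3, 0.5]`, so the shared word "are" pulls the two sentence vectors toward each other.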


In [3]:
!pip install gensim 

Collecting gensim
  Downloading gensim-4.3.3-cp312-cp312-macosx_11_0_arm64.whl.metadata (8.1 kB)
Collecting scipy<1.14.0,>=1.7.0 (from gensim)
  Downloading scipy-1.13.1-cp312-cp312-macosx_12_0_arm64.whl.metadata (60 kB)
Downloading gensim-4.3.3-cp312-cp312-macosx_11_0_arm64.whl (24.0 MB)
Downloading scipy-1.13.1-cp312-cp312-macosx_12_0_arm64.whl (30.4 MB)
Installing collected packages: scipy, gensim
  Attempting uninstall: scipy
    Found existing installation: scipy 1.15.2
    Uninstalling scipy-1.15.2:
      Successfully uninstalled scipy-1.15.2
Successfully installed gensim-4.3.3 scipy-1.13.1


In [4]:
import gensim.downloader as api
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

In [5]:

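def average_vector(sentence, model):
    # Lowercase first, then keep only tokens the model knows. Checking the
    # original casing but looking up the lowercased form could raise a KeyError.
    words = [word for word in sentence.lower().split() if word in model]
    if not words:
        # No known words: fall back to a zero vector of the right size.
        return np.zeros(model.vector_size)
    # Average the word vectors component-wise.
    return np.mean([model[word] for word in words], axis=0)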

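def word2vec_cosine(sent1, sent2):
    # api.load downloads the model on first use (a large file) and caches it.
    # It is loaded inside the function here, so every call reloads it.
    model = api.load('word2vec-google-news-300')
    vec1 = average_vector(sent1, model)
    vec2 = average_vector(sent2, model)
    # cosine_similarity expects 2D inputs and returns a 1x1 matrix here.
    return cosine_similarity([vec1], [vec2])[0][0]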


sent1 = "Dogs are wonderful pets."
sent2 = "Cats are amazing companions."
score = word2vec_cosine(sent1, sent2)
print(f"Word2Vec Cosine Similarity: {score:.4f}")


Word2Vec Cosine Similarity: 0.8374
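
One caveat about the tokenization used above: `str.split()` leaves punctuation attached, so tokens such as `pets.` may not match any vocabulary entry and would then be ignored. A minimal sketch of one alternative follows, assuming a simple regex-based tokenizer is acceptable; `tokenize` is a hypothetical helper and is not used in the cell above.

```python
import re

def tokenize(sentence):
    """Return runs of letters, digits, and apostrophes, dropping other punctuation."""
    return re.findall(r"[A-Za-z0-9']+", sentence)

print("Dogs are wonderful pets.".split())
# ['Dogs', 'are', 'wonderful', 'pets.']
print(tokenize("Dogs are wonderful pets."))
# ['Dogs', 'are', 'wonderful', 'pets']
```

Swapping this tokenizer into `average_vector` would likely change the score reported above, because more words would contribute to each sentence vector.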
