# Word Mover's Distance as a Dissimilarity/Similarity Measure

### Import the libraries

In [1]:
import gensim
from gensim.models import KeyedVectors

### Load the pre-trained Word2Vec model

In [2]:
# Path to the downloaded GoogleNews vectors; adjust it to where the file is stored
model = KeyedVectors.load_word2vec_format('GoogleNews-vectors-negative300.bin', binary=True)

### Define the sentences

In [3]:
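# wmdistance expects lists of tokens; a raw string would be iterated character by character
sentence_1 = "Obama speaks to the media in Illinois".split()
sentence_2 = "President greets the press in Chicago".split()
sentence_3 = "Apple is my favorite company".split()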
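`wmdistance` treats each document as a sequence of tokens and silently drops tokens that are not in the vocabulary, so passing a raw string makes it compare individual characters instead of words. The sketch below shows that pitfall and one possible preprocessing step. It assumes whitespace tokenization, lowercasing, and a small hypothetical stopword list (`STOPWORDS` and `preprocess` are illustrative names, not part of gensim).

```python
# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "to", "in", "is", "my", "of", "and"}


def preprocess(sentence):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [token for token in sentence.lower().split() if token not in STOPWORDS]


sentence_1 = "Obama speaks to the media in Illinois"

# Iterating over a raw string yields characters, not words.
print(list(sentence_1)[:5])    # ['O', 'b', 'a', 'm', 'a']
print(preprocess(sentence_1))  # ['obama', 'speaks', 'media', 'illinois']
```

Lowercasing is a trade-off with these particular vectors: the GoogleNews vocabulary is case-sensitive, so a lowercased token can map to a different vector or be missing entirely.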

### Compute the Word Mover's Distance between the sentences

In [4]:
word_mover_distance = model.wmdistance(sentence_1, sentence_2)
word_mover_distance

1.1642040735998236

In [5]:
word_mover_distance = model.wmdistance(sentence_1, sentence_3)
word_mover_distance

1.365806580758697
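WMD treats each sentence as a normalized bag of words and finds the cheapest way to move all the word mass of one sentence onto the other, where moving mass from one word to another costs the Euclidean distance between their vectors. The sketch below covers only a special case: both documents hold the same number n of distinct words, each weighted 1/n. There, an optimal transport plan is a one-to-one matching of words, so checking every permutation gives the exact WMD. The 2-D vectors in `TOY_VECTORS` and the function name `wmd_equal_weights` are hypothetical. Real GoogleNews vectors have 300 dimensions, and gensim hands the general case to an earth mover's distance solver.

```python
import itertools
import math

# Hypothetical 2-D vectors for illustration only; real Word2Vec vectors
# such as the GoogleNews ones have 300 dimensions.
TOY_VECTORS = {
    "obama": (1.0, 0.9),
    "speaks": (0.2, 1.0),
    "media": (0.8, 0.1),
    "illinois": (1.5, -0.5),
    "president": (1.1, 1.0),
    "greets": (0.3, 0.9),
    "press": (0.9, 0.2),
    "chicago": (1.4, -0.4),
}


def wmd_equal_weights(doc1, doc2, vectors):
    """Exact WMD when both documents hold n distinct words, each weighted 1/n.

    With uniform weights an optimal transport plan is a one-to-one matching,
    so trying every permutation finds the minimum total cost.
    """
    if len(doc1) != len(doc2) or len(set(doc1)) != len(doc1) or len(set(doc2)) != len(doc2):
        raise ValueError("expects two equally long lists of distinct words")
    n = len(doc1)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # Each word ships its whole 1/n mass to the word it is matched with.
        cost = sum(math.dist(vectors[doc1[i]], vectors[doc2[perm[i]]]) for i in range(n)) / n
        best = min(best, cost)
    return best


doc_a = ["obama", "speaks", "media", "illinois"]
doc_b = ["president", "greets", "press", "chicago"]
print(wmd_equal_weights(doc_a, doc_b, TOY_VECTORS))  # about 0.141
print(wmd_equal_weights(doc_a, doc_a, TOY_VECTORS))  # 0.0
```

Brute force grows factorially with n and cannot handle unequal lengths or repeated words, which is why a dedicated solver is needed for sentences such as sentence_3.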

### Normalizing the word embeddings to unit length for a better measure of distance

In [6]:
# Scale all vectors to unit length in place (gensim >= 4.0: model.unit_normalize_all())
model.init_sims(replace=True)
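Scaling every vector to unit length removes magnitude from the comparison. For unit vectors u and v, `||u - v||^2 = 2 - 2 cos(theta)`, so each word-to-word cost depends only on the angle between the vectors and never exceeds 2. As a result, the recomputed distances are on a different scale from the earlier ones. Below is a minimal pure-Python sketch of the normalization step using hypothetical vectors. `unit_normalize` and `cosine` are illustrative helpers, not gensim functions.

```python
import math


def unit_normalize(vector):
    """Return the vector scaled to Euclidean length 1."""
    norm = math.hypot(*vector)
    return tuple(x / norm for x in vector)


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


# Hypothetical vectors: u and v point the same way but differ in length.
u = (3.0, 4.0)
v = (1.5, 2.0)
w = (4.0, -3.0)

print(math.dist(u, v))                                  # 2.5 before normalization
print(math.dist(unit_normalize(u), unit_normalize(v)))  # 0.0 after normalization
# For unit vectors, squared distance equals 2 - 2 * cosine similarity (both about 2.0 here).
print(math.dist(unit_normalize(u), unit_normalize(w)) ** 2, 2 - 2 * cosine(u, w))
```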

### Recomputing the Word Mover's Distance between the sentences based on normalized embeddings

In [7]:
word_mover_distance = model.wmdistance(sentence_1, sentence_2)
word_mover_distance

0.4277553083600646

In [8]:
word_mover_distance = model.wmdistance(sentence_1, sentence_3)
word_mover_distance

0.47793400675650705
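WMD is a dissimilarity measure, so lower values mean more similar sentences. When a similarity score is more convenient, one common conversion is 1 / (1 + distance), which is the transform gensim's `WmdSimilarity` class applies. It maps a distance of 0 to a similarity of 1 and decays toward 0 as the distance grows. The sketch below applies it to the normalized distances reported above. The function name `wmd_to_similarity` is hypothetical.

```python
def wmd_to_similarity(distance):
    """Map a non-negative WMD value to a similarity in (0, 1]."""
    if distance < 0:
        raise ValueError("WMD is never negative")
    return 1.0 / (1.0 + distance)


# Normalized-embedding distances from sentence_1, as reported in the cells above.
distances = {
    "sentence_2": 0.4277553083600646,
    "sentence_3": 0.47793400675650705,
}

# Rank the candidate sentences by similarity to sentence_1, most similar first.
ranked = sorted(distances, key=lambda name: wmd_to_similarity(distances[name]), reverse=True)
for name in ranked:
    print(name, round(wmd_to_similarity(distances[name]), 3))
```

The transform is monotonic, so it never changes the ranking; it only rescales the scores into a bounded range.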

### Summary

#### Libraries Used

1. Gensim: its KeyedVectors class is used to load and work with pre-trained Word2Vec models.

#### Steps Followed

1. Import the libraries: Gensim and its KeyedVectors class are imported for handling word embeddings and computing distances.

2. Load the pre-trained Word2Vec model: The 300-dimensional Word2Vec vectors trained on Google News are loaded to provide the word embeddings.

3. Define the sentences: Three example sentences are defined. Two describe a similar event in different words, and the third is on an unrelated topic.

4. Compute the Word Mover's Distance: WMD is computed between sentence 1 and each of the other two sentences. Because WMD measures dissimilarity, a lower value means the sentences are semantically closer, and sentence 1 comes out closer to sentence 2 than to sentence 3.

5. Normalize the word embeddings: Every word vector is scaled to unit length, so the distances depend on the direction of the embeddings rather than their magnitude.

6. Recompute the Word Mover's Distance: With normalized embeddings both distances are smaller in absolute terms, and sentence 1 remains closer to sentence 2 than to sentence 3.