## Implement Word Embeddings using Word2Vec, GloVe, and FastText

### What are Word Embeddings?

Words are symbols, so a model cannot process them directly; they first have to be converted into numbers.

Older techniques such as Bag of Words (BoW) and TF-IDF represent text through word counts or count-based weights, but they:

- Ignore meaning.
- Cannot capture relationships between words (e.g., “cat” and “dog” are both animals).

Word embeddings are dense numerical vectors in which words with similar meanings lie close together in the vector space.

They capture semantic meaning and relationships between words.
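To make "closer in space" concrete, here is a minimal sketch that compares one-hot vectors (the per-word representation behind BoW) with small dense vectors using cosine similarity. The dense values are hand-picked for illustration and are an assumption, not learned embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# One-hot vectors over a 3-word vocabulary: distinct words never overlap.
one_hot = {"cat": [1, 0, 0], "dog": [0, 1, 0], "car": [0, 0, 1]}

# Hypothetical dense vectors, chosen by hand (NOT trained on any data).
dense = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_one_hot = cosine_similarity(one_hot["cat"], one_hot["dog"])
sim_cat_dog = cosine_similarity(dense["cat"], dense["dog"])
sim_cat_car = cosine_similarity(dense["cat"], dense["car"])

print("one-hot cat vs dog:", sim_one_hot)
print("dense   cat vs dog:", round(sim_cat_dog, 3))
print("dense   cat vs car:", round(sim_cat_car, 3))
```

With one-hot vectors every pair of different words has similarity 0, so "cat" is no closer to "dog" than to "car". In the dense toy vectors, "cat" and "dog" point in nearly the same direction, which is the property trained embeddings aim for.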

### 1. Word2Vec (by Google, 2013)

Learns embeddings by training a shallow neural network to predict words from their surrounding context (or the context from a word).

Two architectures:

- CBOW (Continuous Bag of Words): predicts the target word from its context words.
- Skip-Gram: predicts the context words from the target word.

Example: with vectors trained on a large corpus, vector arithmetic captures analogies such as "king - man + woman ≈ queen".
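The difference between the two architectures is easiest to see in the training examples each one builds from a sentence. The sketch below is illustrative only: `training_pairs` is a hypothetical helper, not part of gensim, and it skips the sampling tricks a real implementation uses.

```python
def training_pairs(tokens, window=2):
    """Yield (target, context_words) for every position in a token list."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield target, left + right

tokens = "i love deep learning and nlp".split()

for target, context in training_pairs(tokens, window=2):
    # CBOW: one example per position, all context words -> target word
    cbow_example = (context, target)
    # Skip-Gram: one example per (target word, single context word) pair
    skipgram_examples = [(target, c) for c in context]
    if target == "deep":
        print("CBOW:     ", cbow_example)
        print("Skip-Gram:", skipgram_examples)
```

In gensim, the `sg` argument of `Word2Vec` selects the architecture: `sg=0` (the default) trains CBOW and `sg=1` trains Skip-Gram.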

### 2. GloVe (Global Vectors, by Stanford, 2014)

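Learns embeddings from global word co-occurrence statistics: it counts how often word pairs appear together across the whole corpus and fits vectors to those counts.

Stanford publishes pre-trained vectors built from large text collections (e.g., Wikipedia, Twitter).

Advantage: the pre-trained vectors are ready to use without any training and are a strong general-purpose baseline.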
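As a rough illustration of the co-occurrence statistics GloVe starts from, the sketch below counts word pairs inside a symmetric window and weights each pair by 1/distance, as described in the GloVe paper. The function name and the toy corpus are assumptions made for this example; the actual GloVe training step (fitting vectors so that their dot products approximate the log of these counts) is not shown.

```python
from collections import defaultdict

def cooccurrence_counts(tokenized_sentences, window=2):
    """Distance-weighted co-occurrence counts over a symmetric window."""
    counts = defaultdict(float)
    for tokens in tokenized_sentences:
        for i, word in enumerate(tokens):
            # Look back at most `window` positions; add both directions
            for j in range(max(0, i - window), i):
                weight = 1.0 / (i - j)  # nearer words contribute more
                counts[(word, tokens[j])] += weight
                counts[(tokens[j], word)] += weight
    return dict(counts)

corpus = [
    "i love natural language processing".split(),
    "i love deep learning and nlp".split(),
]
counts = cooccurrence_counts(corpus, window=2)

print(counts[("i", "love")])     # adjacent in both sentences
print(counts[("i", "natural")])  # two positions apart in one sentence
print(("i", "language") in counts)  # outside the window, so no entry
```

Because the counts are gathered over the entire corpus before any vectors are fitted, GloVe uses global statistics, whereas Word2Vec only ever sees one local window at a time.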

### 3. FastText (by Facebook, 2016)

Extends Word2Vec by representing each word as a bag of character n-grams (subwords).

Can build embeddings for unseen or misspelled words from the n-grams they share with known words.

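Example (simplified):

"playing" ≈ play + ing

"player" ≈ play + er
→ both contain the root play, so they share character n-grams such as "pla" and "lay" and end up with related vectors.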
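FastText does not split words into morphemes; it extracts every character n-gram within a length range after wrapping the word in boundary markers `<` and `>`. The sketch below reproduces that extraction step only. `char_ngrams` is a hypothetical helper, and the 3 to 6 range mirrors FastText's default `minn`/`maxn` settings; hashing n-grams into buckets and combining their vectors are left out.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the set of character n-grams of `word` with boundary markers."""
    marked = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(marked) - n + 1):
            grams.add(marked[i:i + n])
    return grams

playing = char_ngrams("playing")
player = char_ngrams("player")
shared = playing & player

print(sorted(shared))
# ['<pl', '<pla', '<play', 'lay', 'pla', 'play']
```

The shared n-grams all come from the common prefix, which is why the two words end up with related vectors. An unseen word such as "playful" would also contain these n-grams, so a vector can still be assembled for it.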

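```python
# Word2Vec learns the meaning of words from the contexts they appear in.
import nltk
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize

# word_tokenize needs the Punkt tokenizer data (one-time download).
# Newer NLTK releases look for "punkt_tab", older ones for "punkt".
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

# Sample text corpus
sentences = [
    "I love natural language processing",
    "Word embeddings capture semantic meaning",
    "Machine learning is fun and powerful",
    "I love deep learning and NLP"
]

# Preprocess: lowercase and tokenize each sentence
tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]

# Train Word2Vec model (CBOW by default; pass sg=1 for Skip-Gram)
model = Word2Vec(
    sentences=tokenized_sentences,
    vector_size=50,   # dimensionality of each word vector
    window=5,         # max distance between target and context word
    min_count=1,      # drop words seen fewer than this many times (1 keeps all)
    workers=4         # number of training threads
)

# Print word embedding
print("\nWord vector for 'learning':")
print(model.wv['learning'])

# Find similar words
# With only four sentences, the similarity scores are close to noise.
print("\nTop similar words to 'learning':")
print(model.wv.most_similar('learning'))
```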



Word vector for 'learning':
[ 1.56351421e-02 -1.90203730e-02 -4.11062239e-04  6.93839323e-03
 -1.87794445e-03  1.67635437e-02  1.80215668e-02  1.30730132e-02
 -1.42324204e-03  1.54208085e-02 -1.70686692e-02  6.41421322e-03
 -9.27599426e-03 -1.01779103e-02  7.17923651e-03  1.07406788e-02
  1.55390287e-02 -1.15330126e-02  1.48667218e-02  1.32509926e-02
 -7.41960062e-03 -1.74912829e-02  1.08749345e-02  1.30195115e-02
 -1.57510047e-03 -1.34197120e-02 -1.41718509e-02 -4.99412045e-03
  1.02865072e-02 -7.33047491e-03 -1.87401194e-02  7.65347946e-03
  9.76895820e-03 -1.28571270e-02  2.41711619e-03 -4.14975407e-03
  4.88066689e-05 -1.97670180e-02  5.38400887e-03 -9.50021297e-03
  2.17529293e-03 -3.15244915e-03  4.39334614e-03 -1.57631524e-02
 -5.43436781e-03  5.32639725e-03  1.06933638e-02 -4.78302967e-03
 -1.90201886e-02  9.01175756e-03]

Top similar words to 'learning':
[('meaning', 0.19010192155838013), ('processing', 0.0449172779917717), ('word', -0.010146019048988819), ('i', -0.0144752655