# 🧠 Day 2: Exploring Word Embeddings with Word2Vec and GloVe
In this notebook, we'll explore and visualize word embeddings from popular models such as Word2Vec and GloVe, working hands-on with Google's pre-trained Word2Vec vectors. We'll then project the embeddings to 2D with PCA and t-SNE so their structure can be inspected visually. Let's get started!

## 1️⃣ What are Word Embeddings?
- Word embeddings represent each word as a dense vector of real numbers in a continuous vector space, placed so that words used in similar contexts end up close to one another.
- They capture semantic meaning, allowing us to find relationships like:
  - `king - man + woman ≈ queen`
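- Popular models include **Word2Vec**, which learns vectors by predicting words from nearby context words (or context from a word), and **GloVe**, which fits vectors to global word co-occurrence statistics.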
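To see why vector arithmetic can express an analogy, here is a toy sketch with hand-made 3-dimensional vectors. The values and the meaning of each axis are invented for illustration and do not come from Word2Vec or GloVe; `analogy` and `cosine` are hypothetical helpers. The sketch computes `king - man + woman` and returns the closest remaining word by cosine similarity, skipping the three input words.

```python
import numpy as np

# Hypothetical toy embeddings (invented values, NOT from any trained model).
# Axes loosely mean: [royalty, masculinity, fruit-ness]
toy_vectors = {
    "king":  np.array([0.9, 0.8, 0.0]),
    "queen": np.array([0.9, -0.8, 0.0]),
    "man":   np.array([0.1, 0.8, 0.0]),
    "woman": np.array([0.1, -0.8, 0.0]),
    "apple": np.array([0.0, 0.0, 1.0]),
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Solve 'a - b + c ≈ ?' by returning the closest word that is not an input."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("king", "man", "woman", toy_vectors))  # → queen
```

With the real model loaded later in this notebook, the equivalent gensim query is `model.most_similar(positive=['king', 'woman'], negative=['man'], topn=1)`, which likewise excludes the input words from its results.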

In [1]:
# Import necessary libraries
import gensim.downloader as api
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import plotly.express as px


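If this cell fails with `ModuleNotFoundError: No module named 'gensim'`, install the missing packages (for example with `pip install gensim plotly`) and re-run it.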

## 2️⃣ Loading Pre-trained Word Embeddings
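We'll use **Google's pre-trained Word2Vec** model (`word2vec-google-news-300`), trained on part of the Google News dataset (about 100 billion words). It provides 300-dimensional vectors for 3 million words and phrases; the first load downloads well over a gigabyte, so it can take a while.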

In [None]:
# Load Word2Vec pre-trained model
model = api.load('word2vec-google-news-300')
print('Model loaded successfully!')

## 3️⃣ Finding Similar Words
Let's find words similar to an input word using **cosine similarity**.

In [None]:
# Define a function to get similar words
def get_similar_words(word, top_n=10):
    try:
        similar_words = model.most_similar(word, topn=top_n)
        return similar_words
    except KeyError:
        print(f'Word "{word}" not in vocabulary!')
        return []

# Example usage
similar_words = get_similar_words('king')
similar_words
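`most_similar` ranks words in the vocabulary by the cosine similarity of their vectors to the query vector, cos(u, v) = u·v / (‖u‖ ‖v‖). A common way to compute this efficiently is to L2-normalize the vectors first, so that a single matrix-vector product yields every cosine score at once. The sketch below applies that idea to a hypothetical four-word vocabulary with invented 5-dimensional vectors; `top_n_similar` is an illustrative helper, not part of gensim.

```python
import numpy as np

def top_n_similar(query_idx, vectors, words, top_n=3):
    """Rank words by cosine similarity to vectors[query_idx], excluding the query itself."""
    # L2-normalize each row so that a dot product equals cosine similarity
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit[query_idx]      # one cosine score per word
    order = np.argsort(-sims)          # highest similarity first
    return [(words[i], float(sims[i])) for i in order if i != query_idx][:top_n]

# Hypothetical vocabulary with invented vectors (not real Word2Vec data)
demo_words = ["cat", "dog", "kitten", "car"]
demo_matrix = np.array([
    [0.9, 0.1, 0.0, 0.3, 0.0],
    [0.7, 0.3, 0.1, 0.1, 0.0],
    [0.9, 0.0, 0.0, 0.4, 0.1],
    [0.0, 0.9, 0.8, 0.0, 0.1],
])

for word, score in top_n_similar(0, demo_matrix, demo_words):
    print(f"{word:8s} {score:.3f}")
```

Here `kitten` and `dog` score close to 1 while `car` scores near 0; the scores returned by `get_similar_words` on the real model are read the same way.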

## 4️⃣ Visualizing Word Embeddings with PCA
Let's reduce the 300-dimensional vectors to 2D using **PCA** and visualize them with a scatter plot.

In [None]:
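# Get vectors for the query word and its nearest neighbours
query = 'king'
words = [query] + [word for word, _ in similar_words]
vectors = np.array([model[word] for word in words])

# Reduce dimensions with PCA
pca = PCA(n_components=2)
pca_result = pca.fit_transform(vectors)

# Create a DataFrame for visualization
df_pca = pd.DataFrame(pca_result, columns=['PC1', 'PC2'])
df_pca['word'] = words

# Plot using Seaborn, labelling each point with its word instead of a colour legend
plt.figure(figsize=(10, 7))
sns.scatterplot(data=df_pca, x='PC1', y='PC2', s=100)
for _, row in df_pca.iterrows():
    plt.annotate(row['word'], (row['PC1'], row['PC2']), textcoords='offset points', xytext=(5, 5))
plt.title('PCA Visualization of Word Embeddings')
plt.show()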
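PCA itself is only a few lines of linear algebra: center the data, take its singular value decomposition, and project onto the first two right-singular vectors (the directions of greatest variance). The NumPy sketch below mirrors what `PCA(n_components=2).fit_transform` computes, up to the sign of each component and floating-point error. The random 10×300 matrix is a placeholder standing in for `vectors`, not actual Word2Vec data, and `pca_2d` is a hypothetical helper.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their top-2 principal components via SVD."""
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)               # PCA works on mean-centered data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    projected = X_centered @ Vt[:2].T             # coordinates along PC1 and PC2
    explained = S**2 / np.sum(S**2)               # fraction of variance per component
    return projected, explained[:2]

# Placeholder data: 10 random "embeddings" of dimension 300 (not real Word2Vec vectors)
rng = np.random.default_rng(0)
fake_vectors = rng.normal(size=(10, 300))

coords, ratio = pca_2d(fake_vectors)
print(coords.shape)  # → (10, 2)
print(f"PC1 + PC2 keep {ratio.sum():.1%} of the variance")
```

On the notebook's real data, `pca.explained_variance_ratio_` reports the same fractions. It is worth printing: if the first two components keep only a small share of the variance, distances in the 2D scatter plot should be read loosely.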

## 5️⃣ Exploring Clusters with t-SNE
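Now let's use **t-SNE** (t-distributed Stochastic Neighbor Embedding) to look for clusters of similar words. Unlike PCA, t-SNE is non-linear and tries to keep each word's nearest neighbors close together in 2D. Its `perplexity` parameter (roughly, how many neighbors each point takes into account) must be smaller than the number of points, so with only a handful of words it has to stay small.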

In [None]:
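# Reduce dimensions with t-SNE
# perplexity must be smaller than the number of points, so keep it low for a handful of words.
# The default iteration count is used; the old `n_iter` argument was renamed `max_iter` in recent scikit-learn.
vectors = np.asarray(vectors)
tsne = TSNE(n_components=2, perplexity=min(5, len(vectors) - 1), random_state=42)
tsne_result = tsne.fit_transform(vectors)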

# Create a DataFrame for visualization
df_tsne = pd.DataFrame(tsne_result, columns=['x', 'y'])
df_tsne['word'] = words

# Plot using Plotly
fig = px.scatter(df_tsne, x='x', y='y', text='word', title='t-SNE Visualization of Word Embeddings')
fig.update_traces(textposition='top center')
fig.show()
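t-SNE turns distances into neighbor probabilities using a Gaussian kernel whose width is tuned separately for every point, so that the perplexity of that point's distribution, 2 raised to its Shannon entropy in bits, matches the value you pass in. The simplified NumPy sketch below performs that per-point binary search for a single point, in the spirit of the original t-SNE paper; it is not scikit-learn's implementation, `conditional_probs` is a hypothetical helper, and the points are random placeholders. It also shows why perplexity has to stay below the number of points: even a perfectly uniform distribution over the n − 1 other points only reaches perplexity n − 1.

```python
import numpy as np

def conditional_probs(dists_sq, target_perplexity, tol=1e-5, max_steps=100):
    """Binary-search the Gaussian precision beta = 1/(2*sigma^2) for one point so that
    the perplexity 2**H(P) of its neighbor distribution matches the target."""
    lo, hi = 0.0, np.inf
    beta = 1.0
    for _ in range(max_steps):
        # Shift by the minimum distance for numerical stability (P is unchanged)
        p = np.exp(-beta * (dists_sq - dists_sq.min()))
        p /= p.sum()
        entropy_bits = -np.sum(p * np.log2(p + 1e-12))
        perplexity = 2.0 ** entropy_bits
        if abs(perplexity - target_perplexity) < tol:
            break
        if perplexity > target_perplexity:
            # Distribution too flat: narrow the Gaussian (increase beta)
            lo = beta
            beta = beta * 2 if hi == np.inf else (beta + hi) / 2
        else:
            # Distribution too peaked: widen the Gaussian (decrease beta)
            hi = beta
            beta = (beta + lo) / 2
    return p, perplexity

# Placeholder data: 10 random points; squared distances from point 0 to the other 9
rng = np.random.default_rng(42)
points = rng.normal(size=(10, 300))
d2 = np.sum((points[1:] - points[0]) ** 2, axis=1)

p, perp = conditional_probs(d2, target_perplexity=5.0)
print(f"achieved perplexity: {perp:.3f}")
```

With 9 neighbors, any target up to 9 is reachable; asking for 15, as a perplexity of 15 on about ten words would, has no solution, which is why scikit-learn rejects it.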

## 6️⃣ Conclusion
- **Word2Vec** captures semantic relationships between words effectively.
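- **PCA** is a fast, linear projection onto the directions of greatest variance; it preserves the global layout but may squeeze together words that differ along the discarded dimensions.
- **t-SNE** is non-linear and preserves local neighborhoods, which makes clusters of similar words easier to spot; distances between clusters and the axis values themselves carry little meaning.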
Play around with different words and see what you discover!