<a href="https://colab.research.google.com/github/mohammadreza-mohammadi94/Deep-Learning-Projects/blob/main/Embeddings-and-Analogies/Word2Vec-Model-Comparison/training_word2vec_text8.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook investigates empirically how the **context window size** affects the word embeddings learned by the **Skip-gram** architecture. It trains two **Word2Vec** models on the `text8` corpus, identical except for `window` (2 versus 10), and compares their nearest neighbors. Small windows tend to capture **syntactic/functional** similarity, while larger windows tend to capture **topical/thematic** associations, so the experiment shows how distributional semantics can be tuned toward either local grammatical roles or broader conceptual domains.
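To make the role of the window concrete, here is a minimal sketch of how Skip-gram (center, context) pairs depend on the window size. The helper `skipgram_pairs` and the toy sentence are hypothetical illustrations, not part of gensim. The sketch also assumes a fixed symmetric window, whereas gensim by default samples a smaller effective window for each target word, so treat this as a simplification.

```python
def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for a fixed symmetric window.

    Hypothetical helper for illustration only; gensim builds its
    training pairs internally and may shrink the window at random.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)                 # left edge of the window
        hi = min(len(tokens), i + window + 1)   # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                          # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs


# Toy sentence (made up for this example, not taken from text8).
tokens = "the apple macintosh was a popular personal computer".split()

for window in (2, 10):
    contexts = [ctx for center, ctx in skipgram_pairs(tokens, window)
                if center == "apple"]
    print(f"window={window}: {contexts}")
```

With `window=2`, "apple" is paired only with its immediate neighbors; with `window=10`, it is paired with every other word in the sentence, which is why larger windows pull in more topical context.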

# Imports

In [1]:
!pip install -q gensim


In [2]:
import gensim.downloader as api
from gensim.models import Word2Vec
import multiprocessing

# Loading the Dataset

In [3]:
print("Loading `text8` dataset...")
dataset = api.load("text8")

Loading `text8` dataset...


In [5]:
cores = multiprocessing.cpu_count()
print(f"Number of CPU cores: {cores}")

Number of CPU cores: 2


# Training Model A

* Config:
    * `window`: 2
    * `skip-gram`: True

In [6]:
print("Training Model A (window=2)...")
model_a = Word2Vec(
    sentences=dataset,
    vector_size=100,
    window=2,          # Small window
    min_count=5,
    sg=1,              # Skip-gram architecture
    workers=cores,
    negative=5
)

Training Model A (window=2)...


# Training Model B
* Config:
    * `window`: 10
    * `skip-gram`: True

In [7]:
print("Training Model B (window=10)...")
model_b = Word2Vec(
    sentences=dataset,
    vector_size=100,
    window=10,        # Large window
    min_count=5,
    sg=1,              # Skip-gram architecture
    negative=5,
    workers=cores)

Training Model B (window=10)...


# Comparison & Analysis

In [8]:
target_word = "apple"

print(f"\nResults for '{target_word}' in Model A (Local - window=2):")
for word, sim in model_a.wv.most_similar(target_word, topn=5):
    print(f"   -> {word}: {sim:.4f}")

print(f"\nResults for '{target_word}' in Model B (Global - window=10):")
for word, sim in model_b.wv.most_similar(target_word, topn=5):
    print(f"   -> {word}: {sim:.4f}")


Results for 'apple' in Model A (Local - window=2):
   -> amiga: 0.7803
   -> macintosh: 0.7687
   -> iic: 0.7650
   -> intel: 0.7446
   -> iigs: 0.7332

Results for 'apple' in Model B (Global - window=10):
   -> macintosh: 0.8291
   -> microsoft: 0.7312
   -> hypercard: 0.7213
   -> iic: 0.7094
   -> workstations: 0.7035
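
One simple way to quantify how much the two neighbor lists differ is their Jaccard overlap. The sketch below is an illustration rather than part of the original experiment: `neighbor_overlap` is a hypothetical helper, and the two lists are copied by hand from the output above.

```python
def neighbor_overlap(neighbors_a, neighbors_b):
    """Return the shared words and the Jaccard overlap of two neighbor lists.

    Hypothetical helper for illustration; ranking order is ignored.
    """
    set_a, set_b = set(neighbors_a), set(neighbors_b)
    shared = set_a & set_b
    union = set_a | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Top-5 neighbors of "apple", copied from the output above.
top_a = ["amiga", "macintosh", "iic", "intel", "iigs"]                     # window=2
top_b = ["macintosh", "microsoft", "hypercard", "iic", "workstations"]     # window=10

shared, jaccard = neighbor_overlap(top_a, top_b)
print(f"shared: {shared}, Jaccard: {jaccard:.2f}")
# shared: ['iic', 'macintosh'], Jaccard: 0.25
```

With the trained models in memory, the lists can be built directly, for example `[w for w, _ in model_a.wv.most_similar(target_word, topn=5)]`. Exact neighbors and scores may differ between runs, since multi-worker training is not deterministic.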
