## First, we install and import gensim, a tool that gives us ready-made word meanings (word vectors)

In [2]:
!pip install gensim

Collecting gensim
  Downloading gensim-4.3.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (8.1 kB)
Collecting numpy<2.0,>=1.18.5 (from gensim)
  Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (61 kB)
Collecting scipy<1.14.0,>=1.7.0 (from gensim)
  Downloading scipy-1.13.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (60 kB)
Downloading gensim-4.3.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (26.7 MB)
Downloading numpy-1.26.4-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (18.3 MB)

In [1]:
from gensim.downloader import load

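## Now we will load the GloVe model
This version ('glove-wiki-gigaword-100') describes each word with a vector of 100 numbers.
The first call downloads the model, so it can take a while.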

In [2]:
model = load('glove-wiki-gigaword-100')



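## Skip-Gram Example
Let's say we give the model one word, like 'man'.
It will tell us other words that often appear in similar contexts, which is the idea behind Skip-Gram.
Note that GloVe itself is not trained with Skip-Gram; most_similar simply returns the words whose vectors have the highest cosine similarity to the input word.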

In [3]:
word = 'man'
print(f"Top 5 words similar to '{word}' (Skip-Gram style):")
similar_words = model.most_similar(word, topn=5)

Top 5 words similar to 'man' (Skip-Gram style):


### Show each similar word and its similarity score


In [4]:
for w, score in similar_words:
    print(f"{w}: {score:.4f}")

woman: 0.8323
boy: 0.7915
one: 0.7789
person: 0.7527
another: 0.7522
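
The scores above are cosine similarities between word vectors. As a rough sketch of what such a lookup does, here is a pure-Python version on made-up 3-dimensional vectors. The `toy_vectors` values and the helper names `cosine_similarity` and `most_similar_toy` are assumptions for illustration only; they are not part of gensim or of the real GloVe model.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# The real GloVe vectors loaded above have 100 dimensions.
toy_vectors = {
    "man":   [0.9, 0.1, 0.3],
    "woman": [0.8, 0.2, 0.4],
    "boy":   [0.7, 0.0, 0.5],
    "bread": [0.0, 0.9, 0.1],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar_toy(query_word, vectors, topn=5):
    """Rank every other word by cosine similarity to query_word."""
    query = vectors[query_word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != query_word  # leave the query word itself out
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

for w, score in most_similar_toy("man", toy_vectors, topn=3):
    print(f"{w}: {score:.4f}")
```

With these toy numbers, 'woman' and 'boy' point in nearly the same direction as 'man', so they rank well above 'bread'.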


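## CBOW Example
Now we will give the model a group of context words.

CBOW tries to guess which word fits best in the middle of its context. Here we imitate that by averaging the context vectors and looking up the words closest to that average.
Because similar_by_vector does not exclude the input words, the context words themselves can appear in the results.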

In [12]:
context_words = ['bread', 'butter', 'cheese', 'milk']
average_vector = sum(model[word] for word in context_words) / len(context_words)
print(f"\nTop 5 words similar to the context {context_words} (CBOW style):")
predicted_words = model.similar_by_vector(average_vector, topn=5)


Top 5 words similar to the context ['bread', 'butter', 'cheese', 'milk'] (CBOW style):


### Show each predicted word and how close it is

In [13]:
for word, score in predicted_words:
    print(f"{word}: {score:.4f}")

butter: 0.9039
cheese: 0.9017
bread: 0.8798
cream: 0.8507
milk: 0.8370
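
Four of the five results above are the context words themselves, so only 'cream' is a new prediction. One possible way to get only new words is to skip the context words when ranking candidates. The sketch below shows the idea in pure Python on made-up 3-dimensional vectors. The `toy_vectors` values and the helper names `average_vector`, `cosine_similarity` and `predict_from_context` are assumptions for illustration, not gensim functions.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "bread":  [0.9, 0.8, 0.1],
    "butter": [0.8, 0.9, 0.2],
    "cheese": [0.7, 0.9, 0.3],
    "milk":   [0.6, 0.7, 0.4],
    "cream":  [0.7, 0.8, 0.3],
    "car":    [0.1, 0.0, 0.9],
}

def average_vector(words, vectors):
    """Element-wise mean of the vectors for the given words."""
    dims = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dims)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def predict_from_context(context, vectors, topn=5):
    """Rank words by similarity to the context average, skipping the context itself."""
    centre = average_vector(context, vectors)
    candidates = [
        (w, cosine_similarity(centre, vec))
        for w, vec in vectors.items()
        if w not in context  # drop the input words from the results
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:topn]

toy_context = ["bread", "butter", "cheese", "milk"]
for w, score in predict_from_context(toy_context, toy_vectors, topn=2):
    print(f"{w}: {score:.4f}")
```

With the real model, `model.most_similar(positive=context_words, topn=5)` is a related built-in call that leaves the input words out of its results. It normalizes the vectors before combining them, so its scores may differ slightly from the plain average used above.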
