# Word2Vec Implementation and Usage Example

This notebook demonstrates how to work with the Word2Vec model for natural language processing (NLP). Word2Vec is a popular neural network-based model for learning vector representations of words, capturing semantic relationships between them.

---

## 1. Installing Necessary Libraries

Before running the Word2Vec model, we need to install the required Python libraries. These include:
- `gensim`: A library for topic modeling and document similarity that supports Word2Vec.
- `numpy`: A library for numerical computations.

The first code cell installs `gensim`. As the installation log shows, `numpy` and `scipy` are pulled in automatically as its dependencies.

---

## 2. Importing Libraries

Once the libraries are installed, we import the necessary modules:
- `gensim.models.Word2Vec`: The main class for training and using the Word2Vec model.
- `gensim.models.KeyedVectors`: Used to handle word vectors once they are trained.

We also import `gensim.downloader`, which fetches pre-trained models. `numpy` is not imported explicitly, but the word vectors returned by the model are `numpy` arrays.

---

## 3. Downloading a Pre-trained Model

Training a Word2Vec model requires a text corpus. The model learns a vector for each word from the contexts in which that word appears. The corpus can be any form of textual data, such as:
- Text from documents, books, or articles.
- Cleaned and tokenized sentences.
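When you train your own model, the corpus must first be split into sentences of tokens. gensim's `Word2Vec` expects an iterable of sentences, where each sentence is a list of string tokens. The sketch below is a minimal example of that preparation. It assumes a hypothetical toy corpus and a simple lowercase/regex tokenizer; neither comes from this notebook.

```python
import re

# Hypothetical toy corpus: a few raw sentences standing in for real documents.
raw_corpus = [
    "The king spoke to the queen.",
    "A man and a woman walked into the palace!",
    "Word2Vec learns vectors from context.",
]

def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", sentence.lower())

# Word2Vec training input: one list of string tokens per sentence.
tokenized_sentences = [tokenize(s) for s in raw_corpus]

for tokens in tokenized_sentences:
    print(tokens)
```

With gensim 4.x, a list like this could be passed as `Word2Vec(sentences=tokenized_sentences, vector_size=100, window=5, min_count=1)`. Real pipelines often use a dedicated tokenizer such as `gensim.utils.simple_preprocess` instead of the regex assumed here.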

In this example, we instead load Google's pre-trained `word2vec-google-news-300` model through gensim's downloader API, so there is no need to train a model from scratch.

---

## 4. Exploring Word Embeddings

Once the model is loaded, we can access its word vectors. To retrieve the vector for any word in the model's vocabulary, index the `wv` object with that word. For example:
```python
vec_king = wv['king']
```
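Indexing works only for words in the vocabulary. A missing word raises a `KeyError`, and gensim's `KeyedVectors` does the same. The sketch below shows a membership check before lookup. It uses a hypothetical 3-dimensional dictionary as a stand-in for `wv`; the real Google News model has 300 dimensions and a vocabulary of about 3 million words and phrases. The helper `get_vector_or_none` is illustrative and not part of gensim.

```python
# Toy stand-in for gensim's KeyedVectors: hypothetical 3-dimensional vectors.
toy_wv = {
    "king": [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.6],
    "football": [-0.2, 0.9, 0.0],
}

def get_vector_or_none(wv, word):
    """Return the vector for `word`, or None if the word is out of vocabulary.

    Indexing a missing key raises KeyError, so check membership first.
    """
    if word in wv:
        return wv[word]
    return None

print(get_vector_or_none(toy_wv, "king"))
print(get_vector_or_none(toy_wv, "notaword123"))
```

The same `word in wv` check works on the real model, because `KeyedVectors` supports membership tests.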

In [1]:
!pip install gensim

Collecting gensim
  Downloading gensim-4.3.3-cp310-cp310-win_amd64.whl.metadata (8.2 kB)
Collecting numpy<2.0,>=1.18.5 (from gensim)
  Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl.metadata (61 kB)
Collecting scipy<1.14.0,>=1.7.0 (from gensim)
  Downloading scipy-1.13.1-cp310-cp310-win_amd64.whl.metadata (60 kB)
Collecting smart-open>=1.8.1 (from gensim)
  Downloading smart_open-7.0.4-py3-none-any.whl.metadata (23 kB)
Collecting wrapt (from smart-open>=1.8.1->gensim)
  Downloading wrapt-1.16.0-cp310-cp310-win_amd64.whl.metadata (6.8 kB)
Downloading gensim-4.3.3-cp310-cp310-win_amd64.whl (24.0 MB)

In [2]:
import gensim   

In [3]:
from gensim.models import Word2Vec, KeyedVectors

In [4]:
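import gensim.downloader as api

# Load Google's pre-trained 300-dimensional vectors (roughly 1.6 GB on first download).
# The result is a KeyedVectors object that maps words to vectors.
wv = api.load('word2vec-google-news-300')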



In [6]:
vec_king = wv['king']
vec_king

array([ 1.25976562e-01,  2.97851562e-02,  8.60595703e-03,  1.39648438e-01,
       -2.56347656e-02, -3.61328125e-02,  1.11816406e-01, -1.98242188e-01,
        5.12695312e-02,  3.63281250e-01, -2.42187500e-01, -3.02734375e-01,
       -1.77734375e-01, -2.49023438e-02, -1.67968750e-01, -1.69921875e-01,
        3.46679688e-02,  5.21850586e-03,  4.63867188e-02,  1.28906250e-01,
        1.36718750e-01,  1.12792969e-01,  5.95703125e-02,  1.36718750e-01,
        1.01074219e-01, -1.76757812e-01, -2.51953125e-01,  5.98144531e-02,
        3.41796875e-01, -3.11279297e-02,  1.04492188e-01,  6.17675781e-02,
        1.24511719e-01,  4.00390625e-01, -3.22265625e-01,  8.39843750e-02,
        3.90625000e-02,  5.85937500e-03,  7.03125000e-02,  1.72851562e-01,
        1.38671875e-01, -2.31445312e-01,  2.83203125e-01,  1.42578125e-01,
        3.41796875e-01, -2.39257812e-02, -1.09863281e-01,  3.32031250e-02,
       -5.46875000e-02,  1.53198242e-02, -1.62109375e-01,  1.58203125e-01,
       -2.59765625e-01,  

In [7]:
# Check how many dimensions each word vector has
vec_king.shape

(300,)

Let's look at how some other words are represented.

In [9]:
vec_tryout = wv['football']
vec_tryout

array([-9.76562500e-02,  3.19824219e-02,  2.57812500e-01, -4.15039062e-02,
        1.01562500e-01, -1.00585938e-01,  1.46484375e-01, -1.99218750e-01,
        1.53320312e-01,  6.34765625e-02,  8.39843750e-02, -3.00781250e-01,
        6.34765625e-02,  2.08984375e-01, -2.11914062e-01,  1.88476562e-01,
       -8.34960938e-02,  3.28125000e-01,  2.79296875e-01, -1.40625000e-01,
       -1.68945312e-01,  2.04101562e-01,  4.90722656e-02, -6.98852539e-03,
        9.42382812e-02,  9.84191895e-04,  3.12500000e-02,  2.48046875e-01,
        3.35937500e-01,  2.63671875e-01,  5.68847656e-02,  3.04687500e-01,
        1.21582031e-01, -1.97265625e-01,  1.72119141e-02,  9.96093750e-02,
        2.27539062e-01, -1.20605469e-01,  1.23535156e-01,  3.78906250e-01,
        2.36816406e-02, -1.86523438e-01,  6.29882812e-02,  1.52343750e-01,
        3.73535156e-02, -1.69921875e-01,  1.06445312e-01, -4.98046875e-02,
       -6.20117188e-02,  1.68945312e-01,  4.41894531e-02,  2.78320312e-02,
       -1.10839844e-01,  

In [10]:
vec_tryout = wv['amigo']
vec_tryout

array([-0.13476562, -0.31640625,  0.16894531,  0.484375  , -0.07421875,
       -0.10986328, -0.25390625,  0.09912109,  0.15820312,  0.01940918,
        0.01989746, -0.12988281,  0.00509644, -0.12353516,  0.05126953,
        0.12255859,  0.3203125 ,  0.23925781,  0.10742188,  0.15136719,
        0.23535156,  0.16894531,  0.37304688, -0.0703125 , -0.11083984,
       -0.2578125 , -0.11816406,  0.14257812,  0.01403809,  0.03686523,
        0.06225586,  0.08251953,  0.10888672,  0.09228516, -0.24316406,
        0.13964844,  0.19628906, -0.07519531,  0.33398438,  0.02758789,
        0.12695312, -0.07519531,  0.38085938,  0.40625   , -0.00958252,
        0.05419922, -0.27929688, -0.0267334 ,  0.08007812,  0.22363281,
       -0.40820312,  0.2734375 ,  0.26953125, -0.07177734,  0.15917969,
        0.05883789, -0.04931641,  0.16308594, -0.15527344,  0.06982422,
       -0.02490234,  0.28320312, -0.21875   ,  0.09033203, -0.10205078,
       -0.34179688, -0.49609375, -0.38476562, -0.16992188,  0.18

In [18]:
vec_tryout = wv['AI']
vec_tryout

array([ 0.18066406,  0.01342773,  0.14746094,  0.00302124, -0.16699219,
        0.00540161, -0.25976562,  0.01556396, -0.18457031, -0.11035156,
       -0.02893066,  0.00170135,  0.10107422, -0.19433594, -0.05249023,
        0.00146484,  0.28125   , -0.02954102, -0.06030273, -0.03833008,
       -0.0378418 ,  0.08984375,  0.234375  ,  0.10888672, -0.10839844,
       -0.06103516,  0.02307129,  0.16601562, -0.11669922, -0.17285156,
       -0.14160156, -0.2265625 , -0.08935547, -0.08496094, -0.27539062,
        0.17480469,  0.02062988,  0.12158203, -0.0703125 , -0.00286865,
        0.328125  , -0.00318909, -0.07666016,  0.43554688,  0.00619507,
       -0.39453125,  0.16699219, -0.11621094,  0.14648438, -0.04101562,
       -0.12695312, -0.04980469, -0.09082031,  0.05712891, -0.21484375,
        0.04101562,  0.21875   , -0.20117188,  0.05078125,  0.32617188,
       -0.046875  , -0.05395508,  0.08349609,  0.04516602, -0.20410156,
       -0.07910156,  0.35351562, -0.06787109, -0.24804688,  0.11

In [13]:
wv.most_similar('AI')

[('Steven_Spielberg_Artificial_Intelligence', 0.5575934052467346),
 ('Index_MDE_##/###/####', 0.5415324568748474),
 ('Enemy_AI', 0.5256390571594238),
 ('Ace_Combat_Zero', 0.522663414478302),
 ('DOA4', 0.5182536244392395),
 ('mechs', 0.5137375593185425),
 ('mech', 0.5077533721923828),
 ('playstyle', 0.507252037525177),
 ('AI_bots', 0.5051203370094299),
 ('deathmatch_mode', 0.5045916438102722)]

In [14]:
wv.most_similar('glad')

[('thankful', 0.7440484762191772),
 ('happy', 0.7408890724182129),
 ('grateful', 0.6907246708869934),
 ('thrilled', 0.6789539456367493),
 ('pleased', 0.6634493470191956),
 ('proud', 0.6573251485824585),
 ('delighted', 0.6416581273078918),
 ('excited', 0.619755208492279),
 ('sorry', 0.6091007590293884),
 ('overjoyed', 0.6063767671585083)]
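`most_similar` ranks the vocabulary by cosine similarity to the query word, excludes the query itself, and returns the top results (10 by default, `topn=10`). The brute-force sketch below illustrates that ranking idea. It is not gensim's implementation, which works on precomputed normalized vectors with vectorized operations. The vectors are hypothetical 3-dimensional values chosen so that the emotion words cluster together.

```python
import math

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def most_similar(vocab, query, topn=3):
    """Rank vocabulary words by cosine similarity to `query`, excluding `query` itself."""
    target = vocab[query]
    scores = [(w, cosine(target, vec)) for w, vec in vocab.items() if w != query]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical toy vectors (not taken from the Google News model).
toy_vocab = {
    "glad": [0.9, 0.1, 0.0],
    "happy": [0.8, 0.2, 0.1],
    "thankful": [0.7, 0.3, 0.0],
    "football": [0.0, 0.1, 0.9],
}

for word, score in most_similar(toy_vocab, "glad"):
    print(f"{word}: {score:.3f}")
```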

In [15]:
wv.similarity('happy', 'glad')

0.74088913
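`similarity` returns the cosine similarity of the two word vectors, which lies between -1 and 1. `most_similar` uses the same measure, which is why this score matches the `'happy'` entry in the `most_similar('glad')` output above (0.7408...). For two $d$-dimensional vectors ($d = 300$ for this model):

$$
\operatorname{sim}(\mathbf{u}, \mathbf{v}) = \frac{\mathbf{u} \cdot \mathbf{v}}{\lVert \mathbf{u} \rVert \, \lVert \mathbf{v} \rVert} = \frac{\sum_{i=1}^{d} u_i v_i}{\sqrt{\sum_{i=1}^{d} u_i^2} \, \sqrt{\sum_{i=1}^{d} v_i^2}}
$$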

In [16]:
# Analogy arithmetic: king - man + woman
operating_vec = wv['king'] - wv['man'] + wv['woman']
operating_vec

array([ 4.29687500e-02, -1.78222656e-01, -1.29089355e-01,  1.15234375e-01,
        2.68554688e-03, -1.02294922e-01,  1.95800781e-01, -1.79504395e-01,
        1.95312500e-02,  4.09919739e-01, -3.68164062e-01, -3.96484375e-01,
       -1.56738281e-01,  1.46484375e-03, -9.30175781e-02, -1.16455078e-01,
       -5.51757812e-02, -1.07574463e-01,  7.91015625e-02,  1.98974609e-01,
        2.38525391e-01,  6.34002686e-02, -2.17285156e-02,  0.00000000e+00,
        4.72412109e-02, -2.17773438e-01, -3.44726562e-01,  6.37207031e-02,
        3.16406250e-01, -1.97631836e-01,  8.59375000e-02, -8.11767578e-02,
       -3.71093750e-02,  3.15551758e-01, -3.41796875e-01, -4.68750000e-02,
        9.76562500e-02,  8.39843750e-02, -9.71679688e-02,  5.17578125e-02,
       -5.00488281e-02, -2.20947266e-01,  2.29492188e-01,  1.26403809e-01,
        2.49023438e-01,  2.09960938e-02, -1.09863281e-01,  5.81054688e-02,
       -3.35693359e-02,  1.29577637e-01,  2.41699219e-02,  3.48129272e-02,
       -2.60009766e-01,  

In [17]:
# A raw vector carries no word labels, so no input words are excluded from the results
wv.most_similar(operating_vec)

[('king', 0.8449392318725586),
 ('queen', 0.7300517559051514),
 ('monarch', 0.645466148853302),
 ('princess', 0.6156251430511475),
 ('crown_prince', 0.5818676352500916),
 ('prince', 0.5777117609977722),
 ('kings', 0.5613663792610168),
 ('sultan', 0.5376775860786438),
 ('Queen_Consort', 0.5344247817993164),
 ('queens', 0.5289887189865112)]
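Here `'king'` ranks first because `operating_vec` is passed as a raw vector, so gensim cannot tell which words produced it and does not exclude them. The usual gensim call for analogies is `wv.most_similar(positive=['king', 'woman'], negative=['man'])`. It excludes the input words from the results and normalizes the vectors before combining them, so its scores differ somewhat from the raw arithmetic above. The sketch below illustrates that exclusion. It uses hypothetical 3-dimensional vectors and a helper, `analogy`, that is not part of gensim. It also skips gensim's normalization step.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def analogy(vocab, positive, negative, topn=3):
    """Add `positive` vectors, subtract `negative` ones, then rank the remaining words."""
    dim = len(next(iter(vocab.values())))
    target = [0.0] * dim
    for w in positive:
        target = [t + x for t, x in zip(target, vocab[w])]
    for w in negative:
        target = [t - x for t, x in zip(target, vocab[w])]
    # Exclude the input words so they cannot trivially win.
    exclude = set(positive) | set(negative)
    scores = [(w, cosine(target, vec)) for w, vec in vocab.items() if w not in exclude]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:topn]

# Hypothetical vectors: dimension 0 ~ "royalty", 1 ~ "male", 2 ~ "female".
toy_vocab = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

print(analogy(toy_vocab, positive=["king", "woman"], negative=["man"]))
```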