**Code examples for each embedding topic, using the Word2Vec model from the Gensim library**

**1. Training Data**

In [10]:
import gensim
from gensim.models import Word2Vec

# Sample training data (a corpus of sentences)
sentences = [
    ['I', 'love', 'deep', 'learning'],
    ['deep', 'learning', 'is', 'fun'],
    ['natural', 'language', 'processing', 'is', 'a', 'branch', 'of', 'AI'],
    ['Word2Vec', 'is', 'a', 'great', 'embedding', 'technique']
]
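
The corpus above is already tokenized. For raw text, a minimal sketch of producing the same list-of-token-lists shape might look like the following; the `tokenize` helper and its regular expression are illustrative assumptions, not part of Gensim.

```python
import re

def tokenize(text):
    """Split raw text into word tokens (hypothetical helper, not a Gensim API)."""
    # Keep runs of letters and digits; punctuation is dropped.
    return re.findall(r"[A-Za-z0-9]+", text)

raw_corpus = [
    "I love deep learning.",
    "Deep learning is fun!",
]

# Word2Vec expects an iterable of token lists, one list per sentence.
tokenized = [tokenize(line) for line in raw_corpus]
print(tokenized)
```

Case is preserved here, so `'Deep'` and `'deep'` would become separate vocabulary entries; lowercasing before training is a common choice when that distinction is not wanted.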

**2. Context Window**

In [11]:
# Define the context window size
context_window = 2  # Example window size

**3. Model Training**

In [12]:
# Train Word2Vec model
model = Word2Vec(sentences, vector_size=50, window=context_window, min_count=1, sg=1)

# Save the model
model.save("word2vec.model")

**4. Output (Word Embeddings)**


In [13]:
# Load the model (if needed)
model = Word2Vec.load("word2vec.model")

# Get the embedding for the word 'deep'
vector = model.wv['deep']
print("Embedding for 'deep':", vector)

Embedding for 'deep': [-0.01631583  0.0089916  -0.00827415  0.00164907  0.01699724 -0.00892435
  0.009035   -0.01357392 -0.00709698  0.01879702 -0.00315531  0.00064274
 -0.00828126 -0.01536538 -0.00301602  0.00493959 -0.00177605  0.01106732
 -0.00548595  0.00452013  0.01091159  0.01669191 -0.00290748 -0.01841629
  0.0087411   0.00114357  0.01488382 -0.00162657 -0.00527683 -0.01750602
 -0.00171311  0.00565313  0.01080286  0.01410531 -0.01140624  0.00371764
  0.01217773 -0.0095961  -0.00621452  0.01359526  0.00326295  0.00037983
  0.00694727  0.00043555  0.01923765  0.01012121 -0.01783478 -0.01408312
  0.00180291  0.01278507]


In [14]:
# Example sentence
sentence = "I love deep learning"

# Context window of size 2
context_window_size = 2

# Tokenizing the sentence
words = sentence.split()

# Creating context windows: up to context_window_size words on each side of
# the target, excluding the target itself
context_windows = [
    (
        words[max(0, i - context_window_size):i]
        + words[i + 1:i + 1 + context_window_size],
        words[i],
    )
    for i in range(len(words))
]

print("Context windows:")
for context, target in context_windows:
    print(f"Context: {context}, Target: {target}")

Context windows:
Context: ['love', 'deep'], Target: I
Context: ['I', 'deep', 'learning'], Target: love
Context: ['I', 'love', 'learning'], Target: deep
Context: ['love', 'deep'], Target: learning
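
Because the model in this notebook is trained with `sg=1` (skip-gram), each window is conceptually broken into (target, context) pairs, and the model learns to predict the context word from the target. A minimal sketch of that pairing, using a hypothetical `skipgram_pairs` helper; Gensim's actual training loop includes further details, such as randomly shrinking the window, that are omitted here.

```python
def skipgram_pairs(words, window):
    """Return (target, context) pairs for every word within `window` positions."""
    pairs = []
    for i, target in enumerate(words):
        start = max(0, i - window)
        end = min(len(words), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, words[j]))
    return pairs

pairs = skipgram_pairs("I love deep learning".split(), window=2)
for target, context in pairs:
    print(f"target={target!r} -> context={context!r}")
```

A larger `window` produces more pairs per sentence and tends to emphasize topical relatedness, while a smaller one emphasizes immediate neighbors.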


**5. Improved Performance:**
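Word embeddings are typically used as input features for a downstream machine learning task. As a simple example, we check the cosine similarity between words. The corpus here has only four sentences, which is far too little data to produce meaningful scores, so the values below are close to noise.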


In [15]:
# Check similarity between words
similarity = model.wv.similarity('deep', 'learning')
print("Similarity between 'deep' and 'learning':", similarity)

# Find most similar words to 'deep'
similar_words = model.wv.most_similar('deep')
print("Words most similar to 'deep':", similar_words)

Similarity between 'deep' and 'learning': 0.011071963
Words most similar to 'deep': [('I', 0.22978782653808594), ('technique', 0.12486250698566437), ('of', 0.08061248809099197), ('love', 0.07399576157331467), ('is', 0.04237300902605057), ('language', 0.018277151510119438), ('Word2Vec', 0.011398451402783394), ('learning', 0.011071980930864811), ('processing', 0.0013571369927376509), ('AI', -0.01201754529029131)]
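
The `similarity` score is the cosine of the angle between two word vectors. As a sketch of what that computes, here is the same formula in plain Python on made-up 3-dimensional vectors (the vectors are for illustration only and are not taken from the trained model).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors for illustration only.
u = [1.0, 2.0, 0.0]
v = [2.0, 4.0, 0.0]   # same direction as u
w = [-2.0, 1.0, 0.0]  # perpendicular to u

print(cosine_similarity(u, v))  # close to 1: same direction
print(cosine_similarity(u, w))  # close to 0: unrelated directions
```

Scores range from -1 to 1, so a value near zero, like the one reported for `'deep'` and `'learning'`, indicates vectors that are nearly unrelated in direction.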


**6. Semantic Understanding:**
Embeddings trained on a large corpus can capture semantic relationships that show up in vector arithmetic. The example below performs the same kind of arithmetic on this toy model.

In [16]:
# Example of vector arithmetic: 'deep' - 'learning' + 'fun'
vector_deep = model.wv['deep']
vector_learning = model.wv['learning']
vector_fun = model.wv['fun']

# Vector arithmetic
result_vector = vector_deep - vector_learning + vector_fun

# Find the word most similar to result_vector
# (input words are not excluded, so 'fun' itself may be returned)
most_similar_word = model.wv.similar_by_vector(result_vector, topn=1)
print("Result of 'deep' - 'learning' + 'fun':", most_similar_word)


Result of 'deep' - 'learning' + 'fun': [('fun', 0.6047247648239136)]
