## **Word Embeddings**

**Word embeddings** in NLP are dense vector representations of words that capture meaning from the contexts in which words appear. Unlike sparse representations such as Bag of Words or TF-IDF, which treat each word as an independent dimension with no notion of similarity, word embeddings map words into a continuous vector space where semantically similar words lie close together.

Key characteristics of word embeddings:
- They preserve **semantic relationships** between words (e.g., "king" and "queen" are close in the vector space, while "king" and "apple" are far apart).
- They capture some relational structure through vector arithmetic, such as analogies (e.g., **"king" - "man" + "woman" ≈ "queen"**, meaning the resulting vector lies closest to the vector for "queen").
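
The two properties above can be illustrated with a minimal sketch. The 4-dimensional vectors below are hand-made and purely hypothetical (real embeddings are learned from text and their dimensions are not interpretable); the helper names `cosine` and `analogy` are also just illustrative.

```python
import math

# Hypothetical 4-D vectors: (person, royalty, gender, fruit).
# Real embeddings are learned and typically have 100+ dimensions.
toy_vectors = {
    "king":  [1.0, 1.0,  0.5, 0.0],
    "queen": [1.0, 1.0, -0.5, 0.0],
    "man":   [1.0, 0.0,  0.5, 0.0],
    "woman": [1.0, 0.0, -0.5, 0.0],
    "apple": [0.0, 0.0,  0.0, 1.0],
}

def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

# "king" is closer to "queen" than to "apple" in this toy space.
print(cosine(toy_vectors["king"], toy_vectors["queen"]))
print(cosine(toy_vectors["king"], toy_vectors["apple"]))

# king - man + woman lands nearest to "queen".
print(analogy("king", "man", "woman", toy_vectors))
```

Excluding the three query words from the candidates is a common convention for analogy lookups; without it, the nearest vector is often one of the inputs.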
  
Popular techniques for generating word embeddings include:
- **Word2Vec**: Trains a shallow neural network on local context windows in a large text corpus, either predicting the surrounding words from a target word (Skip-Gram) or the target word from its surrounding words (Continuous Bag of Words, CBOW).
- **GloVe**: Fits embeddings to global word co-occurrence counts aggregated over the entire corpus.
- **FastText**: Extends Word2Vec by representing each word as a combination of character n-gram (subword) vectors, which improves performance on morphologically rich languages and allows vectors to be built for unseen words.
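
To make the training signal more concrete, here is a rough sketch of two of the ideas above: the (center, context) pairs that Skip-Gram learns from, and the character n-grams that FastText-style subword models use. The function names `skipgram_pairs` and `char_ngrams` are hypothetical helpers, not library APIs, and both are simplified: real implementations typically vary the effective window size per word, and FastText combines several n-gram lengths rather than a single `n`.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in '<' and '>' boundary markers."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

print(skipgram_pairs(["she", "looks", "awesome"], window=2))
print(char_ngrams("awful"))
```

CBOW uses the same windows but reverses the direction: the context words together are used to predict the center word.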

Word embeddings are widely used as input features for tasks like machine translation, text classification, and sentiment analysis, because they let models generalize across words with similar meanings.

In [2]:
import gensim

In [3]:
doc=['she looks awesome',
     'she looks awful',
     'she looks amazing']

In [4]:
# Lowercase and tokenize each sentence into a list of words
doc_processed = [gensim.utils.simple_preprocess(sentence) for sentence in doc]

In [5]:
doc_processed

[['she', 'looks', 'awesome'],
 ['she', 'looks', 'awful'],
 ['she', 'looks', 'amazing']]
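
For readers without gensim at hand, the preprocessing step can be approximated in plain Python. This is only a sketch of the behaviour, assuming the library's default token length limits of 2 and 15 characters; `simple_preprocess_sketch` is a hypothetical name, and the real function handles more edge cases.

```python
import re

def simple_preprocess_sketch(text, min_len=2, max_len=15):
    """Lowercase, split into alphabetic tokens, and drop very short or long ones."""
    tokens = re.findall(r"[^\W\d]+", text.lower())
    return [t for t in tokens if min_len <= len(t) <= max_len]

doc = ['she looks awesome',
       'she looks awful',
       'she looks amazing']

print([simple_preprocess_sketch(sentence) for sentence in doc])
```

Note that single-character tokens such as "a" or "I" are dropped under these assumed defaults, which matters for very short sentences.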

In [6]:
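# window=2: use up to 2 words on each side as context; min_count=1: keep every word, since this toy corpus is tiny
w2v = gensim.models.Word2Vec(window=2, min_count=1)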

In [7]:
w2v.build_vocab(doc_processed,progress_per=10)

In [8]:
print(w2v.wv.index_to_key)

['looks', 'she', 'amazing', 'awful', 'awesome']


In [9]:
w2v.corpus_count

3

In [10]:
w2v.train(doc_processed,total_examples=w2v.corpus_count,epochs=w2v.epochs)

(5, 45)
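
The tuple returned by `train` can be partly reproduced by hand. The second value, 45, matches the raw number of tokens seen across all epochs, as the sketch below shows. It assumes `w2v.epochs` is 5, which is gensim's default. The first value, 5, is the number of words actually trained on; it is much smaller most likely because gensim's default downsampling of frequent words discards most tokens in a corpus this small.

```python
doc_processed = [
    ["she", "looks", "awesome"],
    ["she", "looks", "awful"],
    ["she", "looks", "amazing"],
]
epochs = 5  # assumed: gensim's default number of epochs for Word2Vec

# Raw word count = tokens in the corpus, repeated once per epoch
tokens_per_epoch = sum(len(sentence) for sentence in doc_processed)
raw_word_count = tokens_per_epoch * epochs
print(tokens_per_epoch, raw_word_count)  # 9 45
```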

In [12]:
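# Rank the remaining vocabulary by cosine similarity to 'amazing'.
# With only three sentences, the scores largely reflect random initialization rather than meaning.
w2v.wv.most_similar('amazing')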

[('awesome', 0.17018885910511017),
 ('awful', -0.013514922931790352),
 ('she', -0.023671654984354973),
 ('looks', -0.05234673619270325)]

In [15]:
# similarity() returns the cosine similarity between the two word vectors
w2v.wv.similarity('awesome', 'she')

0.0045030187