#### Word Embeddings

- ##### Why Should We Represent Words as Vectors?

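  - Traditional one-hot encoding does not capture semantic relationships: every pair of distinct one-hot vectors is orthogonal, so "king" is no closer to "queen" than to "apple".
  - Word embeddings map words to dense, real-valued vectors in which words used in similar contexts lie close together, capturing aspects of meaning.
  - This lets NLP models exploit similarities and analogies between words.
  - Embeddings are also far more compact: a one-hot vector has one dimension per vocabulary word (often tens or hundreds of thousands), whereas embeddings typically use a few hundred dimensions.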
    <br>
    <br>
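To make the first point concrete, the sketch below compares cosine similarity for one-hot vectors and for dense vectors. The three-word vocabulary and the dense values are hand-picked assumptions for illustration, not learned embeddings.

```python
import numpy as np

vocab = ["king", "queen", "apple"]

# One-hot: each word is a vector as long as the vocabulary with a single 1.
one_hot = np.eye(len(vocab))

def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Distinct one-hot vectors are orthogonal, so every pair scores 0.
print(cosine(one_hot[0], one_hot[1]))  # 0.0 (king vs queen)
print(cosine(one_hot[0], one_hot[2]))  # 0.0 (king vs apple)

# Hypothetical 3-d dense embeddings, hand-picked (not trained).
dense = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.1]),
    "apple": np.array([0.1, 0.0, 0.9]),
}
# Dense vectors can express graded similarity: king is much closer to queen.
print(round(cosine(dense["king"], dense["queen"]), 3))
print(round(cosine(dense["king"], dense["apple"]), 3))
```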

---

#### 1. Word2Vec

##### What is Word2Vec?

- Word2Vec is a shallow neural network model that learns word representations in a continuous vector space from the contexts in which words appear, so that words occurring in similar contexts end up with nearby vectors.

##### Key Features and Disadvantages

- Features:

  - Learns vector representations based on context.
  - Can capture word analogies (e.g., `king - man + woman ≈ queen`).
  - Efficient and scalable for large corpora.

- Disadvantages:

  - Produces static word embeddings (the same vector for a word in different contexts).
  - Cannot handle out-of-vocabulary words.
  - Ignores subword information.
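A minimal sketch of the first two disadvantages, assuming a trained model is just a word-to-vector lookup table (the table and its values are hypothetical): the lookup never consults context, and an unseen word has no entry at all.

```python
import numpy as np

# Hypothetical trained lookup table: one fixed vector per vocabulary word.
embeddings = {
    "river": np.array([0.2, 0.9]),
    "bank":  np.array([0.5, 0.5]),
    "money": np.array([0.9, 0.1]),
}

def embed(sentence):
    """Look up each token independently; surrounding words are ignored."""
    return [embeddings[token] for token in sentence.split()]

v1 = embed("river bank")[1]
v2 = embed("money bank")[1]
# Static embeddings: "bank" gets the identical vector in both sentences.
print(np.array_equal(v1, v2))  # True

# Out-of-vocabulary word: there is no row to return, so the lookup fails.
try:
    embed("riverbank")
except KeyError as err:
    print("OOV:", err)
```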

##### Who Developed It?

- Developed by Tomas Mikolov and his team at Google in 2013.

##### Dataset Used

- Trained on large corpora; the widely used pretrained vectors were learned from the Google News dataset (about 100 billion words).

##### Core Models

###### Continuous Bag of Words (CBOW)

- Predicts the center word from its surrounding context words, whose embeddings are averaged.
- Faster to train than Skip-gram.
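The CBOW prediction step can be sketched as a toy forward pass with random, untrained weights; the names `W_in`, `W_out` and `cbow_probs` are illustrative. It uses a full softmax for clarity, whereas the original Word2Vec implementation relies on hierarchical softmax or negative sampling to avoid scoring the whole vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 4  # vocabulary size and embedding dimension (toy values)

# Two weight matrices, randomly initialised (training would update them).
W_in = rng.normal(scale=0.1, size=(V, D))   # input (context) embeddings
W_out = rng.normal(scale=0.1, size=(D, V))  # output (prediction) weights

def cbow_probs(context_words):
    """Return P(center word | context) over the whole vocabulary."""
    ids = [word_to_id[w] for w in context_words]
    h = W_in[ids].mean(axis=0)           # average the context embeddings
    scores = h @ W_out                   # one score per vocabulary word
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

# Context around "sat"; training would push probs[word_to_id["sat"]] upward.
probs = cbow_probs(["the", "cat", "on", "mat"])
print(probs.shape, round(float(probs.sum()), 6))
```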

###### Skip-gram

- Predicts the surrounding context words given the center word.
- Slower to train, but tends to learn better representations for rare words.
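Skip-gram turns a corpus into (center, context) training pairs, and the model learns to predict the second element from the first. The helper below (a hypothetical name) uses a fixed window; the original implementation additionally samples a smaller effective window per word and subsamples frequent words, both omitted here.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
print(pairs[:4])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```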

<br>
<br>

---

#### 2. FastText

##### What is FastText?

- FastText is an extension of Word2Vec that represents each word as a bag of character n-grams (plus the word itself) and builds the word's vector from the vectors of those n-grams.
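The n-gram decomposition can be sketched in a few lines. Following the FastText paper, the word is wrapped in `<` and `>` boundary markers and n-grams of length 3 to 6 are extracted (the library's defaults for `minn` and `maxn` in unsupervised training). The function name is illustrative, and details such as Unicode handling differ in the official library.

```python
def char_ngrams(word, minn=3, maxn=6):
    """Return FastText-style character n-grams of `word`.

    The boundary markers make prefixes and suffixes distinct from
    word-internal n-grams; the full wrapped word is kept as a feature too.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for i in range(len(wrapped) - n + 1):
            grams.append(wrapped[i:i + n])
    grams.append(wrapped)  # the whole word gets its own vector as well
    return grams

print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```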

##### Key Features and Disadvantages

- Features:

  - Uses subword embeddings, allowing it to handle out-of-vocabulary (OOV) words by composing a vector from their character n-grams.
  - More effective for morphologically rich languages.
  - Retains advantages of Word2Vec while improving rare word representation.

- Disadvantages:

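  - Higher training cost, since each word update also touches the vectors of its n-grams.
  - Requires more storage than standard Word2Vec, because it keeps an n-gram embedding table (hashed into buckets) in addition to the word vectors.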
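How FastText can produce a vector for an OOV word can be sketched as follows: each n-gram is hashed into a fixed number of buckets that index a shared embedding table, and the word vector is the average of the selected rows. Assumptions: the table is random rather than trained, CRC32 stands in for the library's own FNV-1a hash, and the sizes are tiny (the library defaults to 2,000,000 buckets). The paper defines the word vector as a sum of n-gram vectors; averaging only rescales it and leaves cosine similarity unchanged.

```python
import zlib

import numpy as np

DIM, BUCKETS = 8, 1000  # toy sizes for illustration
rng = np.random.default_rng(42)
# Hypothetical n-gram embedding table (random here; learned during training).
ngram_table = rng.normal(size=(BUCKETS, DIM))

def char_ngrams(word, minn=3, maxn=6):
    """Boundary-marked n-grams; for an OOV word these are all we have."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(minn, maxn + 1)
            for i in range(len(wrapped) - n + 1)]

def bucket(gram):
    # Stand-in hash (CRC32), not the official library's hash function.
    return zlib.crc32(gram.encode("utf-8")) % BUCKETS

def word_vector(word):
    """Compose a vector from hashed n-gram rows, so unseen words still get one."""
    rows = [ngram_table[bucket(g)] for g in char_ngrams(word)]
    return np.mean(rows, axis=0)

v = word_vector("unfriendliness")  # works even if never seen in training
print(v.shape)  # (8,)
```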

##### Who Developed It?

- Developed by Facebook AI Research (FAIR) in 2016.

##### Dataset Used

- The pretrained vectors released by FAIR were trained mainly on Wikipedia and Common Crawl, including models for 157 languages.

<br>
<br>

---

#### Word2Vec vs FastText Comparison

| Feature               | Word2Vec       | FastText   |
| --------------------- | -------------- | ---------- |
| Subword Handling      | ❌ No          | ✅ Yes     |
| OOV Words             | ❌ Not Handled | ✅ Handled |
| Computational Cost    | ✅ Lower       | ❌ Higher  |
| Rare Word Performance | ❌ Weak        | ✅ Strong  |

<br>
<br>

---

#### Vector Space and Semantic Similarity

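- Vector Space Representation: Each word is mapped to a point in a dense vector space (typically a few hundred dimensions), where similar words appear close to each other.
- Semantic Similarity: Words with similar meanings have higher cosine similarity, i.e. (u · v) / (‖u‖ ‖v‖), the cosine of the angle between their vectors u and v.
- Word Analogies: Relationships like _Paris - France + Italy ≈ Rome_ emerge during training and can be recovered with vector arithmetic: the word whose vector is nearest to vec(Paris) - vec(France) + vec(Italy) is expected to be Rome.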
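The similarity and analogy operations can be sketched with hand-built 4-dimensional vectors (hypothetical: the first three dimensions identify the country and the fourth marks "capital city"). The solver returns the nearest neighbour of `paris - france + italy` by cosine similarity, excluding the three query words, which is also what libraries such as gensim do in `KeyedVectors.most_similar(positive=..., negative=...)`.

```python
import numpy as np

# Hypothetical hand-built embeddings; real ones are learned and much denser.
vectors = {
    "france":  np.array([1.0, 0.0, 0.0, 0.0]),
    "paris":   np.array([1.0, 0.0, 0.0, 1.0]),
    "italy":   np.array([0.0, 1.0, 0.0, 0.0]),
    "rome":    np.array([0.0, 1.0, 0.0, 1.0]),
    "germany": np.array([0.0, 0.0, 1.0, 0.0]),
    "berlin":  np.array([0.0, 0.0, 1.0, 1.0]),
}

def cosine(u, v):
    """Cosine of the angle between u and v."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vecs):
    """Solve a - b + c ≈ ? by nearest cosine neighbour, excluding the inputs."""
    target = vecs[a] - vecs[b] + vecs[c]
    candidates = (w for w in vecs if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vecs[w], target))

print(analogy("paris", "france", "italy", vectors))      # rome
print(round(cosine(vectors["paris"], vectors["france"]), 3))  # 0.707
```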
