# Embeddings

Embeddings represent data, especially text, as numerical vectors that machine learning models can process. In NLP, embeddings map words, phrases, or even entire documents to dense vectors in a continuous vector space. These vectors capture semantic relationships, so words with similar meanings lie closer together.


## Common Types of Embeddings in NLP
- **One-Hot Encoding**
A basic method where each word is represented by a binary vector. It’s simple but doesn’t capture meaning or relationships between words.
- **TF-IDF (Term Frequency–Inverse Document Frequency)**
Weights words based on how frequently they appear in a document versus across all documents. It’s useful for feature extraction but lacks semantic understanding.
- **Word2Vec**
A neural embedding technique that learns word associations from large corpora. It comes in two variants: Skip-gram, which predicts context words from a target word, and CBOW (Continuous Bag of Words), which predicts a target word from its context. It is famous for capturing analogies like king - man + woman ≈ queen.
- **GloVe (Global Vectors for Word Representation)**
Combines global word co-occurrence statistics with local context to produce word vectors. In effect, it blends count-based matrix factorization with Word2Vec-style context-window learning.
- **FastText**
Developed by Facebook AI Research, it extends Word2Vec with subword information: each word is represented through its character n-grams, which helps with rare, misspelled, or out-of-vocabulary words.
- **ELMo (Embeddings from Language Models)**
Contextual embeddings that generate different vectors for the same word depending on its usage in a sentence.
- **BERT and Transformer-based Embeddings**
These are deep contextual embeddings from Transformer models like BERT, RoBERTa, and GPT. BERT and RoBERTa are bidirectional: they use both the left and right context of a word. GPT-style models are autoregressive and use only the left context.
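
The analogy arithmetic mentioned for Word2Vec can be sketched with a few hand-made vectors. This is only an illustration: the `toy_vectors` values and the `analogy` helper are assumptions invented for this example, not output from a trained model, and real embeddings have hundreds of dimensions with no readable axes.

```python
import math

# Hand-made 3-D toy vectors (NOT learned). The axes are roughly
# [royalty, maleness, femaleness], chosen only to make the example readable.
toy_vectors = {
    "king":   [0.9, 0.8, 0.1],
    "queen":  [0.9, 0.1, 0.8],
    "man":    [0.1, 0.9, 0.1],
    "woman":  [0.1, 0.1, 0.9],
    "banana": [0.0, 0.1, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

print(analogy("king", "man", "woman", toy_vectors))  # queen
```

With learned vectors the same nearest-neighbor search runs over the whole vocabulary, and the result is approximate rather than guaranteed.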


## Vectorization vs Embeddings

### Vectorization: The Classic Approach
Vectorization is a broad term for turning text into numbers. Traditional methods include:

- **One-Hot Encoding**: Each word is a binary vector with a single 1. No context or similarity captured.
- **Bag of Words (BoW)**: Counts word frequency but ignores order and semantics.
- **TF-IDF**: Weights words by importance across documents. Still sparse and context-agnostic.

These methods are sparse, high-dimensional, and don’t capture meaning: “cat” and “dog” are just as unrelated as “cat” and “banana.”
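
The three classic methods above can be sketched in plain Python. The corpus, the helper names, and the specific TF-IDF formula (`tf = count / document length`, `idf = log(N / df)`) are assumptions made for illustration; libraries typically use smoothed variants of this formula.

```python
import math
from collections import Counter

# Tiny illustrative corpus (an assumption for this example).
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the banana is yellow",
]
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})
index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    """Binary vector with a single 1 at the word's vocabulary index."""
    vec = [0] * len(vocab)
    vec[index[word]] = 1
    return vec

def bag_of_words(tokens):
    """Raw term counts per vocabulary word; word order is discarded."""
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def tf_idf(tokens):
    """Unsmoothed TF-IDF: (count / doc length) * log(N / document frequency)."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    vec = []
    for w in vocab:
        tf = counts[w] / len(tokens)
        df = sum(1 for doc in tokenized if w in doc)
        vec.append(tf * math.log(n_docs / df))
    return vec

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Any two distinct one-hot vectors are orthogonal, so no pair is "closer".
print(dot(one_hot("cat"), one_hot("dog")))     # 0
print(dot(one_hot("cat"), one_hot("banana")))  # 0
```

Every vector has one entry per vocabulary word, so dimensionality grows with the vocabulary. In this sketch "the" appears in all three documents, so its TF-IDF weight is zero.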


### Embeddings: The Semantic Revolution

Embeddings are dense, comparatively low-dimensional vectors learned from data, typically a few hundred dimensions rather than one per vocabulary word. They capture semantic relationships between words.

- **Word2Vec, GloVe, and FastText** learn static vectors from co-occurrence patterns: each word gets one vector regardless of context.
- Contextual embeddings like **BERT or ELMo** generate different vectors for the same word depending on its context.

For example, “bank” in “river bank” vs. “savings bank” gets a different embedding from BERT in each phrase, whereas TF-IDF maps both occurrences to the same vocabulary dimension.
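
The difference between a static lookup and a context-dependent vector can be illustrated with a toy sketch. Everything here is an assumption for illustration: the 2-D vectors are hand-made, and `context_mixed_vector` simply averages a word with its neighbors. This is a crude stand-in for contextualization and is not how BERT or ELMo compute their embeddings.

```python
# Hand-made 2-D toy vectors (NOT learned); axes are roughly [nature, finance].
static = {
    "river":   [1.0, 0.0],
    "savings": [0.0, 1.0],
    "bank":    [0.5, 0.5],
}

def static_vector(word, sentence):
    """Static lookup: the sentence is ignored, as with one vector per word."""
    return static[word]

def context_mixed_vector(word, sentence):
    """Crude contextualization: average the word's vector with its neighbors'."""
    neighbors = [static[w] for w in sentence if w != word]
    vectors = [static[word]] + neighbors
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

river_bank = ["river", "bank"]
savings_bank = ["savings", "bank"]

# Static: identical in both phrases.
print(static_vector("bank", river_bank) == static_vector("bank", savings_bank))  # True

# Context-mixed: pulled toward "nature" or "finance" by the neighboring word.
print(context_mixed_vector("bank", river_bank))    # [0.75, 0.25]
print(context_mixed_vector("bank", savings_bank))  # [0.25, 0.75]
```

Real contextual models replace the simple average with learned layers, but the observable property is the same: the vector for "bank" depends on the words around it.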
