# 4. Embedding Model
> **Embedding models** are specialized algorithms that map high-dimensional data (such as text, images, audio, and video) into a lower-dimensional space of dense vectors.

> **LLMs** (Large Language Models) are large artificial neural networks pre-trained on massive corpora of text.

> Although embedding models and LLMs are both rooted in neural networks, they use distinct methodologies. LLMs are designed to generate coherent, contextually relevant text, while embedding models focus on mapping words, phrases, and sentences to dense vectors that capture semantic relationships.

**Points to remember:**
1. **Embedding Models:**  
   These models map words, phrases, or entire sentences into a dense vector space that preserves semantic relationships: items with similar meanings are positioned close together.

2. **Contrastive Loss:**  
   A loss function used to train embedding models on similar and dissimilar pairs. It teaches the model which items should be close in the embedding space and which should be far apart.

3. **Positive and Negative Sampling:**  
   - **Positive Samples (training minimizes the distance between positive pairs):** Similar items, such as synonyms or related sentences. They teach the model what should be grouped together.  
   - **Negative Samples (training maximizes the distance between negative pairs):** Dissimilar items, such as unrelated words or sentences. They teach the model what should be kept apart.

<img src="./assets/contrastive_loss.png"/>
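To make the contrastive loss concrete, here is a minimal Python sketch of one common margin-based form, $L = y\,d^2 + (1 - y)\max(0, m - d)^2$, where $d$ is the Euclidean distance between two embeddings, $y = 1$ for a positive pair, and $m$ is a margin. Positive pairs are penalized by their squared distance; negative pairs are penalized only while they are closer than the margin. The toy 2-D vectors, the helper names, and `margin=1.0` are illustrative assumptions, and published variants differ in constant factors and label conventions.

```python
import math

def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(a, b, is_similar, margin=1.0):
    """Margin-based contrastive loss for a single pair of embeddings.

    is_similar=True  -> positive pair: loss grows with distance (pull together).
    is_similar=False -> negative pair: loss is zero once the pair is at least
                        `margin` apart (push apart only up to the margin).
    """
    d = euclidean_distance(a, b)
    if is_similar:
        return d ** 2
    return max(0.0, margin - d) ** 2

# Toy 2-D embeddings (made-up values for illustration only)
cat = [0.9, 0.1]
kitten = [0.8, 0.2]
car = [0.1, 0.9]

print(contrastive_loss(cat, kitten, is_similar=True))  # ≈ 0.02: close positive pair, small loss
print(contrastive_loss(cat, car, is_similar=False))    # 0.0: negative pair already beyond the margin
```

The `cat`/`car` pair is already farther apart than the margin, so it contributes nothing; treating `cat`/`kitten` as a negative pair instead would produce a large loss that pushes them apart. In real training this is computed over batches and back-propagated through the encoder.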

### **When to Use Embedding Models**
Embedding models are best for tasks that involve understanding and leveraging semantic relationships within data. They are ideal for:
- **Semantic Similarity:** Finding or recommending similar items (documents, products) based on content.  
- **Clustering:** Grouping entities with similar semantic properties.  
- **Information Retrieval:** Enhancing search capabilities by understanding query meanings.
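Semantic similarity and retrieval usually come down to comparing vectors, most often with cosine similarity. The sketch below ranks a tiny corpus against a query; the 3-dimensional vectors are made-up stand-ins for what a real embedding model would output, and `search` is a hypothetical helper, not a library API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query_vec, corpus, top_k=2):
    """Rank documents by cosine similarity to the query embedding."""
    scored = [(cosine_similarity(query_vec, vec), doc) for doc, vec in corpus.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc for _, doc in scored[:top_k]]

# Hypothetical 3-D embeddings; a real embedding model would produce these vectors
corpus = {
    "How to train a puppy": [0.9, 0.1, 0.0],
    "Dog obedience tips": [0.8, 0.2, 0.1],
    "Stock market basics": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]  # e.g. an embedding of "teaching my dog to sit"

print(search(query, corpus))
```

The two dog-related documents score near 1.0 while the finance document scores near 0, even though the query shares no exact keywords with them. Larger systems typically replace this linear scan with an approximate nearest-neighbor index, but the scoring idea is the same.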

### **When to Use Large Language Models (LLMs)**
LLMs are suited for tasks that require deep understanding, text generation, or both. They excel in:
- **Content Creation:** Generating coherent and stylistically correct text (e.g., writing a movie synopsis).  
- **Conversational AI:** Building chatbots and virtual assistants capable of natural, human-like conversations.  
- **Language Translation:** Handling complex translations with cultural and linguistic nuances.

### **Key Takeaway**
- **Embedding Models:** Focus on compact, semantic representation for search, recommendation, and clustering.  
- **LLMs:** Excel in generating, interpreting, and understanding complex text.  

---

### **Key Differences Across Embedding Types**

| **Embedding Type**               | **Purpose**                                      | **Strengths**                                   | **Examples**                       |
|----------------------------------|--------------------------------------------------|-------------------------------------------------|-----------------------------------|
| **Word Embeddings**              | Represent individual words.                      | Captures basic semantic similarity.             | Word2Vec, GloVe, fastText         |
| **Sentence/Document Embeddings** | Represent longer texts.                          | Captures overall meaning of sentences/docs.     | Doc2Vec, Sentence-BERT (SBERT)   |
| **Contextual Embeddings**        | Dynamic word meaning based on context.           | Context-aware, handles polysemy (multiple meanings). | BERT, ELMo, GPT                |
| **Specialized Embeddings**       | Focus on specific linguistic properties.         | Handles rare words, dialects, and complex languages. | fastText                       |
| **Image Embeddings**             | Represent visual data as vectors.                | Captures image features (edges, textures, patterns). | VGG, ResNet                    |
| **Audio Embeddings**             | Represent audio signals in vector space.         | Captures temporal and spectral audio features.  | OpenL3, VGGish                 |
| **Video Embeddings**             | Capture spatial and temporal video information. | Useful for video analysis and action recognition. | 3D CNNs, I3D                  |
| **Graph Embeddings**             | Represent nodes and their relationships.         | Models complex network structures.              | Node2Vec, DeepWalk             |
| **JSON Embeddings**              | Represent structured hierarchical data.          | Captures nested relationships in JSON data.     | Tree-LSTM, json2vec           |
| **Multi-modal Embeddings**       | Integrate multiple data types (text, image, audio). | Enables cross-modal understanding and reasoning. | CLIP, LXMERT, ViLBERT, VisualBERT |
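The rare-word strength attributed to fastText above comes from subword information: a word is represented through its character n-grams (3 to 6 characters by default, with `<` and `>` marking word boundaries), so an unseen word still shares n-grams with known words. The sketch below covers only the n-gram extraction step; `char_ngrams` is a hypothetical helper, not the fastText library's API.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Character n-grams of a word, in the style of fastText's subword model.

    The word is wrapped in '<' and '>' boundary markers so that prefixes and
    suffixes get distinct n-grams; the full bracketed word is also included.
    """
    token = f"<{word}>"
    grams = set()
    for n in range(min_n, max_n + 1):
        for i in range(len(token) - n + 1):
            grams.add(token[i:i + n])
    grams.add(token)  # the whole word as its own unit
    return grams

known = char_ngrams("embedding")
rare = char_ngrams("embeddings")  # imagine this form never appeared in training
shared = known & rare
print(len(shared), sorted(shared)[:5])
```

In fastText each n-gram has its own learned vector, and a word's vector is built from the vectors of its n-grams. That is why `embeddings` can still get a sensible vector if only `embedding` was seen during training.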

---

### **Summary of Use Cases**

- **Word Embeddings:**  
  Best for basic semantic similarity tasks (e.g., word similarity, basic search).  

- **Sentence/Document Embeddings:**  
  Ideal for document classification, summarization, and information retrieval.  

- **Contextual Embeddings:**  
  Best for sentiment analysis, question-answering, and tasks needing deep contextual understanding.  

- **Specialized Embeddings:**  
  Useful for handling rare words, multilingual content, and domain-specific analysis.  

- **Image Embeddings:**  
  Well suited to image classification, object detection, and pattern recognition tasks.  

- **Audio Embeddings:**  
  Commonly used for audio event detection, music genre classification, and speech-related tasks.  

- **Video Embeddings:**  
  Applied in video action recognition, surveillance, and sports analytics.  

- **Graph Embeddings:**  
  Useful for social network analysis, fraud detection, and recommendation systems.  

- **JSON Embeddings:**  
  Suitable for tasks involving hierarchical data structures like semantic parsing.  

- **Multi-modal Embeddings:**  
  Critical for combining text, image, audio, and video data in applications like AI assistants, autonomous driving, and multimedia search engines.
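For graph embeddings, DeepWalk (listed in the table) turns a graph into "sentences" by running truncated random walks from every node, then trains a Word2Vec-style skip-gram model on those walks so that nodes appearing in similar walk contexts get nearby vectors. The sketch below implements only the walk-generation step with uniform neighbor sampling; the toy graph, the parameter values, and the `random_walks` helper are assumptions for illustration.

```python
import random

def random_walks(graph, walks_per_node=2, walk_length=5, seed=0):
    """Truncated uniform random walks, the first stage of DeepWalk.

    graph: dict mapping each node to a list of neighbor nodes.
    Each walk is a list of nodes; DeepWalk then treats walks as 'sentences'
    and trains a skip-gram (Word2Vec-style) model on them.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    walks = []
    nodes = list(graph)
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = graph[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Toy undirected social graph (hypothetical)
graph = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "carol"],
    "carol": ["alice", "bob", "dave"],
    "dave": ["carol"],
}

for walk in random_walks(graph)[:3]:
    print(" -> ".join(walk))
```

Node2Vec follows the same pipeline but biases each step with return and in-out parameters (`p`, `q`) instead of sampling neighbors uniformly.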

---