# 🧠 Lesson: Embedding Models for RAG

## 📌 What are embeddings?

**Embeddings** are dense vector representations of text that capture the semantic meaning of sentences or documents.

They are used for:

* 🔍 Semantic retrieval
* 🤖 Classification or clustering
* 🤝 Measuring similarity between questions and contexts in RAG pipelines

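Similarity between a question and a context is usually measured as the cosine of the angle between their embedding vectors. A minimal sketch in plain Python, using made-up 3-dimensional vectors (the names `question`, `chunk_about_food`, and `chunk_about_taxes` are illustrative, and real embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors invented for illustration; they are not model output.
question = [0.9, 0.1, 0.0]
chunk_about_food = [0.8, 0.2, 0.1]
chunk_about_taxes = [0.0, 0.1, 0.9]

# The chunk pointing in nearly the same direction as the question scores higher.
print(cosine_similarity(question, chunk_about_food))
print(cosine_similarity(question, chunk_about_taxes))
```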
---

## 🧭 Two categories of embedding models

| Category           | Examples                                                        | Characteristics                     |
| ------------------ | --------------------------------------------------------------- | ----------------------------------- |
| **🔓 Open Source** | `sentence-transformers`, `BAAI/bge-*`, `jina-embeddings-v4`     | Requires download + local inference |
| **🔑 API-based**   | `OpenAI`, `Cohere`, `Clarifai`, `Mistral`, `Google`, `NLPCloud` | Requires an API key + payment       |

---

## 🧪 Case study: Open Source with `sentence-transformers`

### ✅ Environment setup

Install the packages (a CPU is enough for this lesson):

```bash
pip install torch torchvision torchaudio
pip install --user sentence-transformers
```

### ✅ Example: using `paraphrase-MiniLM-L6-v2`

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")
embeddings = model.encode(["Restaurant text"])  # array of shape (1, 384)
```

🧾 *Model characteristics:*

* 384 dimensions
* Well suited to limited hardware (runs on CPU)
* Good quality for basic tasks

📉 *Drawback:* retrieval quality is generally lower than that of API models such as OpenAI's

```python
from dotenv import load_dotenv
import os

load_dotenv()

openai_api_key = os.getenv("OPENAI_API_KEY")
```

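```python
# Explicit UTF-8 avoids garbled quotes and dashes when the platform default encoding differs.
with open("./data/restaurant.txt", encoding="utf-8") as f:
    raw_data = f.read()
```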

```python
from langchain.text_splitter import CharacterTextSplitter

text_splitter = CharacterTextSplitter(
    chunk_size=200,
    chunk_overlap=20,
    length_function=len,
    is_separator_regex=False,
    separator="\n",
)

texts = text_splitter.split_text(raw_data)

texts
```

```text
Created a chunk of size 329, which is longer than the specified 200
Created a chunk of size 331, which is longer than the specified 200
Created a chunk of size 291, which is longer than the specified 200
Created a chunk of size 376, which is longer than the specified 200
Created a chunk of size 291, which is longer than the specified 200

['In the charming streets of Palermo, tucked away in a quaint alley, stood Chef Amico, a restaurant that was more than a mere eatery—it was a slice of Sicilian heaven. Founded by Amico, a chef whose name was synonymous with passion and creativity, the restaurant was a mosaic of his life’s journey through the flavors of Italy.',
 'Chef Amico’s doors opened to a world where the aromas of garlic and olive oil were as welcoming as a warm embrace. The walls, adorned with photos of Amico’s travels and family recipes, spoke of a rich culinary heritage. The chatter and laughter of patrons filled the air, creating a symphony as delightful as the dishes served.',
 "One evening, as the sun cast a golden glow over the city, a renowned food critic, Elena Rossi, stepped into Chef Amico. Her mission was to uncover the secret behind the restaurant's growing fame. She was greeted by Amico himself, whose eyes sparkled with the joy of a man who loved his work.",
 'Elena was led to a table adorned
```

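The warnings above appear because `CharacterTextSplitter` only cuts at the separator: a paragraph longer than `chunk_size` is kept whole rather than split mid-text. The sketch below is a simplified illustration of that greedy merge, not LangChain's implementation; the function `split_on_separator` is hypothetical and ignores `chunk_overlap`.

```python
def split_on_separator(text, separator="\n", chunk_size=200):
    """Greedily merge separator-delimited pieces into chunks.

    Simplified illustration only: it ignores chunk_overlap and is not
    LangChain's implementation.
    """
    chunks = []
    current = ""
    for piece in text.split(separator):
        if not piece:
            continue  # skip empty lines
        candidate = piece if not current else current + separator + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single piece is never cut, so it may exceed chunk_size.
        current = piece
    if current:
        chunks.append(current)
    return chunks


sample = "short line\n" + "x" * 300 + "\nanother short line"
for chunk in split_on_separator(sample):
    print(len(chunk), "OVERSIZED" if len(chunk) > 200 else "ok")
```

When a hard size limit matters, LangChain's `RecursiveCharacterTextSplitter` falls back to progressively finer separators and may be worth trying instead.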
```python
import torch
print(torch.cuda.is_available())
```

```text
True
```


```python
from sentence_transformers import SentenceTransformer  # library for creating embeddings

model = SentenceTransformer("paraphrase-MiniLM-L6-v2")

embedding_huggingface = model.encode(texts)
```


```python
embedding_huggingface[0]
```

```text
array([ 1.26175195e-01,  4.17751312e-01, -1.12065352e-01,  4.59996685e-02,
       -3.20104808e-01, -2.77941704e-01,  1.84422135e-01, -1.41149208e-01,
        9.47151929e-02, -1.57855358e-02,  2.84207791e-01, -2.01127782e-01,
       -5.12927212e-02,  1.25655025e-01,  2.72517800e-01, -3.62387508e-01,
        2.35507414e-01, -8.82826075e-02,  2.03624561e-01,  4.81936894e-02,
       -3.39871831e-02, -1.03866749e-01, -9.32255909e-02,  2.22074986e-01,
        3.85922521e-01, -1.90588295e-01,  3.89328897e-01,  2.90763766e-01,
       -6.22040890e-02, -6.92462847e-02,  1.97223097e-01, -1.65435240e-01,
        1.78786457e-01, -2.32762955e-02, -1.31499365e-01,  2.63680458e-01,
       -7.40773231e-02, -2.39875495e-01,  1.49778888e-01,  2.47147977e-02,
        1.14711896e-01,  1.52374089e-01, -1.07586913e-01, -2.28516608e-01,
        1.58248842e-01, -1.97335720e-01,  1.25389770e-01,  1.16207048e-01,
       -3.07203364e-02, -1.14177540e-01, -5.13785899e-01,  5.75092360e-02,
        2.72504222e-02, -
```

```python
len(embedding_huggingface[0])
```

```text
384
```

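Once every chunk has a vector, retrieval amounts to ranking the chunks by similarity to the query vector. A rough sketch, assuming vectors are normalized to unit length and then compared by dot product; `top_k` and the 4-dimensional toy vectors are hypothetical stand-ins for real 384-dimensional `model.encode(...)` output:

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def top_k(query_vec, chunk_vecs, k=2):
    """Return (index, score) pairs for the k chunks most similar to the query."""
    q = normalize(query_vec)
    scored = []
    for index, vec in enumerate(chunk_vecs):
        # For unit vectors the dot product equals the cosine similarity.
        score = sum(a * b for a, b in zip(q, normalize(vec)))
        scored.append((index, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up vectors for illustration; they are not model output.
toy_chunks = [
    [0.1, 0.9, 0.0, 0.2],
    [0.8, 0.1, 0.3, 0.0],
    [0.7, 0.2, 0.4, 0.1],
]
toy_query = [0.9, 0.0, 0.3, 0.0]

for index, score in top_k(toy_query, toy_chunks):
    print(index, round(score, 3))
```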
---

## ☁️ Embeddings via the OpenAI API

### ✅ Usage example

```python
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dimensions by default

vectors = [embeddings.embed_query(text) for text in texts]
```

```python
vectors[0]
```

```text
[-0.011215649545192719,
 -0.05614090710878372,
 -0.034062765538692474,
 -0.000228556600632146,
 0.05477383732795715,
 -0.06972043961286545,
 -0.0009626433602534235,
 -0.0018996541621163487,
 8.872365810930205e-07,
 -0.07605452090501785,
 -0.00364836142398417,
 -0.04980682581663132,
 0.003622728865593672,
 -0.015345333144068718,
 -0.01475293654948473,
 0.0038135487120598555,
 -0.006641669664531946,
 0.04130822420120239,
 0.05554850772023201,
 -0.0150833111256361,
 0.057280126959085464,
 0.01673518493771553,
 -0.06402432918548584,
 0.009466942399740219,
 0.030667880550026894,
 0.006778376176953316,
 -0.02506290376186371,
 0.03807282820343971,
 0.026019850745797157,
 -0.006932171527296305,
 0.026954013854265213,
 -0.029232461005449295,
 0.036250073462724686,
 -0.03732094168663025,
 0.016871891915798187,
 0.01193905621767044,
 0.027478056028485298,
 -0.0422423854470253,
 -0.01721365749835968,
 -0.019127553328871727,
 -0.04162720590829849,
 0.008082786574959755,
 0.06101677939295769,
 0.040
```

```python
len(vectors[0])
```

```text
1536
```

```python
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # 3072 dimensions by default

vectors = [embeddings.embed_query(text) for text in texts]

len(vectors[0])
```

```text
3072
```

With `text-embedding-3-large`, the `dimensions` parameter lets you request shorter embeddings, from the default 3072 down to sizes such as 256. This matters because not every vector store can handle 3072-dimensional vectors.

For example, pgvector (the PostgreSQL extension behind PGVector) can only index `vector` columns of up to 2,000 dimensions, so 1536 is a common choice there.

### 📏 Vector dimensions

| Model                    | Dimensions | Speed     | Quality      | Notes                                              |
| ------------------------ | ---------- | --------- | ------------ | -------------------------------------------------- |
| `text-embedding-ada-002` | 1536       | ⚡ Fast    | 🟡 Average   | Older model; default in `OpenAIEmbeddings`         |
| `text-embedding-3-small` | 1536       | ⚡ Fast    | 🟢 Good      | Recommended                                        |
| `text-embedding-3-large` | 3072       | 🐢 Slower | 🔵 Excellent | ⚠️ Some vector stores cannot index 3072 dimensions |

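Dimension count translates directly into storage. As a back-of-the-envelope estimate, assuming 32-bit floats and ignoring index and metadata overhead (the helper `raw_vector_bytes` and the 100,000-chunk corpus are hypothetical):

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Bytes needed to store the raw vectors alone (no index, no metadata)."""
    return num_vectors * dimensions * bytes_per_value


models = [
    ("paraphrase-MiniLM-L6-v2", 384),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]

for name, dims in models:
    mib = raw_vector_bytes(100_000, dims) / (1024 ** 2)
    print(f"{name}: {mib:.1f} MiB for 100,000 chunks")
```

Halving the dimension halves the raw storage, which is one reason to request shorter vectors when the quality loss is acceptable.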
### 🔧 Changing the dimension (e.g. for vector store compatibility)

```python
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1536,  # fits within pgvector's 2,000-dimension index limit
)
```

🔽 Smallest size considered in this lesson: **256**

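Conceptually, a shortened embedding keeps the leading components of the full vector and rescales them to unit length. OpenAI's documentation describes this manual truncate-and-normalize step for cases where the dimension must change after the embedding was generated; requesting `dimensions` up front remains the simpler route. A sketch under that assumption, with a hypothetical helper `shorten_embedding` and a toy 4-dimensional vector:

```python
import math


def shorten_embedding(vector, target_dim):
    """Keep the first target_dim components and rescale to unit L2 norm."""
    if not 0 < target_dim <= len(vector):
        raise ValueError("target_dim must be between 1 and len(vector)")
    head = list(vector[:target_dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # nothing to rescale
    return [x / norm for x in head]


full = [0.5, 0.5, 0.5, 0.5]  # toy unit-length vector, not real model output
short = shorten_embedding(full, 2)

# The shortened vector has 2 components and unit length again.
print(len(short), math.sqrt(sum(x * x for x in short)))
```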
---

## ⚖️ Open Source vs OpenAI comparison

| Criterion   | Open Source                                      | OpenAI (API)                                                             |
| ----------- | ------------------------------------------------ | ------------------------------------------------------------------------ |
| Setup       | 🧱 Requires installation & RAM                   | ☁️ Requires an API key                                                   |
| Inference   | 🖥️ Local (CPU/GPU)                              | 🌍 Cloud-based                                                           |
| Latency     | 🐌 First run is slower (model download and load) | ⚡ Low, but network-dependent                                             |
| Quality     | 🟡 Good (MiniLM), 🟢 Excellent (bge)             | 🔵 Excellent (text-embedding-3)                                          |
| Output size | 384–1024                                         | 256–3072                                                                 |
| Cost        | ✅ Free                                           | 💰 Paid, billed per token (e.g. on the order of \$0.0001 per 1K tokens) |

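API cost scales linearly with the number of tokens embedded. A quick estimate, where the helper `embedding_cost_usd`, the workload, and the price per million tokens are all placeholders to replace with your own numbers and the provider's current price list:

```python
def embedding_cost_usd(total_tokens, usd_per_million_tokens):
    """Estimated cost of embedding total_tokens at the given price."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical workload and price, not quoted rates.
chunks = 10_000
avg_tokens_per_chunk = 80
assumed_price = 0.10  # USD per 1M tokens (assumption)

total_tokens = chunks * avg_tokens_per_chunk
print(f"{total_tokens} tokens -> ${embedding_cost_usd(total_tokens, assumed_price):.4f}")
```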
---

## ✅ Practical recommendations

* ⚙️ **For prototypes and local tests:** use `sentence-transformers` (e.g. MiniLM)
* 📦 **For production apps:** prefer `text-embedding-3-small` at **1536 dimensions** (a good quality/price balance)
* 🧠 **For maximum quality:** `text-embedding-3-large` at **3072 dimensions**, if your vector store supports it
* 🚧 Check what your vector store supports:

  * PGVector: indexes cover up to 2,000 dimensions for the `vector` type, so 1536 is a safe choice
  * Chroma: flexible
  * Qdrant: up to 4096

---

## 👀 Next lesson

➡️ **Prompt Engineering for Queries**: how to write effective questions that improve retrieval and precision.
