# Lesson 3: Embedding Models for Retrieval

**Objective**: Understand the role of embeddings in representing document chunks for retrieval.

**Topics**:
- Overview of embedding models: LLM-Embedder, BAAI/bge, etc.
- Selecting the right embedding model
- Integrating embeddings into the retrieval pipeline

**Practical Task**: Implement and test embedding models on the chunked documents.

**Resources**:
- Choosing an embedding model
- How to select an embedding model
- [Mastering RAG: How to Select an Embedding Model](https://www.rungalileo.io/blog/mastering-rag-how-to-select-an-embedding-model#:~:text=Embeddings%20encode%20the%20semantics%20of,efficient%20and%20user%20friendly%20experience.)
- [Vector Embeddings in RAG Applications](https://wandb.ai/mostafaibrahim17/ml-articles/reports/Vector-Embeddings-in-RAG-Applications--Vmlldzo3OTk1NDA5)
- [Vector Embeddings in RAG Applications](https://medium.com/thedeephub/vector-embeddings-in-rag-applications-9ea8043c172b)
- [Embeddings leaderboard](https://huggingface.co/spaces/mteb/leaderboard)


## Load the datasets

In [3]:
from langchain_community.document_loaders import PyPDFLoader
from dotenv import load_dotenv

load_dotenv()

file_path = (
    "./data/Regulaciones cacao y chocolate 2003.pdf"
)
loader = PyPDFLoader(file_path)
docs = loader.load_and_split()

In [4]:
docs

[Document(metadata={'source': './data/Regulaciones cacao y chocolate 2003.pdf', 'page': 0}, page_content='Status:  This is the original version (as it was originally made).\nSTATUTORY INSTRUMENTS\n2003 No. 1659\nFOOD, ENGLAND\nThe Cocoa and Chocolate Products (England) Regulations 2003\nMade       -      -      -      - 25th June 2003\nLaid before Parliament 3rd July 2003\nComing into force       -      - 3rd August 2003\nThe Secretary of State, in exercise of the powers conferred by sections 16(1)(e), 17(1), 26(1) and (3)\nand 48(1) of the Food Safety Act 1990(1) and now vested in him(2) and of all other powers enabling\nhim in that behalf, having had regard in accordance with section 48(4A) of that Act to relevant\nadvice given by the Food Standards Agency, and after consultation both as required by Article 9\nof Regulation (EC) No. 178/2002 of the European Parliament and of the Council laying down the\ngeneral principles and requirements of food law, establishing the European Food S

## Dense Embeddings

- Dense embeddings are continuous vectors of fixed, relatively low dimensionality (typically a few hundred to a few thousand values) in which nearly every dimension is non-zero. Neural networks generate them, compressing a text into a representation that captures its meaning rather than its exact wording, so documents and queries with similar semantics end up close together in the vector space. In RAG, dense embeddings retrieve chunks that are semantically similar to a query even when they share few words with it.
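
Closeness between dense vectors is commonly measured with cosine similarity. The following is a minimal sketch of that idea in plain Python; the three-dimensional vectors and their names are made up for illustration and do not come from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (real models use hundreds of dimensions).
query_vec = [0.9, 0.1, 0.0]
chunk_about_cocoa = [0.8, 0.2, 0.1]
chunk_about_tax = [0.0, 0.1, 0.9]

scores = {
    "cocoa": cosine_similarity(query_vec, chunk_about_cocoa),
    "tax": cosine_similarity(query_vec, chunk_about_tax),
}

# The chunk whose vector points in nearly the same direction as the query wins.
best = max(scores, key=scores.get)
print(best, scores)
```

A vector store performs the same comparison at scale, usually with an approximate nearest-neighbour index instead of a full scan.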

In [5]:
from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import FastEmbedSparse, QdrantVectorStore, RetrievalMode

load_dotenv()  # loads environment variables (e.g. OPENAI_API_KEY) from .env

# Three embedding options; only the open-source dense model is used below.
embeddings = OpenAIEmbeddings()
open_source_embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-MiniLM-L6-v2")
sparse_embeddings = FastEmbedSparse(model_name="Qdrant/bm25")


qdrant = QdrantVectorStore.from_documents(
    docs,
    embedding=open_source_embeddings,
    location=":memory:",
    collection_name="my_documents",
    retrieval_mode=RetrievalMode.DENSE,
)

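# Note: this query is unrelated to the cocoa regulations, so the search simply returns the nearest chunks.
query = "What did the president say about Ketanji Brown Jackson"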
found_docs = qdrant.similarity_search(query)



In [6]:
found_docs

[Document(metadata={'source': './data/Regulaciones cacao y chocolate 2003.pdf', 'page': 0, '_id': '6889bf9aaf1d4daf963b7d8fa5ff43ac', '_collection_name': 'my_documents'}, page_content='Status:  This is the original version (as it was originally made).\nSTATUTORY INSTRUMENTS\n2003 No. 1659\nFOOD, ENGLAND\nThe Cocoa and Chocolate Products (England) Regulations 2003\nMade       -      -      -      - 25th June 2003\nLaid before Parliament 3rd July 2003\nComing into force       -      - 3rd August 2003\nThe Secretary of State, in exercise of the powers conferred by sections 16(1)(e), 17(1), 26(1) and (3)\nand 48(1) of the Food Safety Act 1990(1) and now vested in him(2) and of all other powers enabling\nhim in that behalf, having had regard in accordance with section 48(4A) of that Act to relevant\nadvice given by the Food Standards Agency, and after consultation both as required by Article 9\nof Regulation (EC) No. 178/2002 of the European Parliament and of the Council laying down the\nge

### Hugging Face embeddings

In [7]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

In [8]:
# Reuses the `embeddings` model created in the previous cell.
text = "This is a test document."
query_result = embeddings.embed_query(text)
query_result[:3]  # first three components of the dense vector

[-0.04895174130797386, -0.03986189514398575, -0.021562783047556877]

In [9]:
len(query_result)

768
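
Two different sentence-transformers models are loaded in this lesson, which raises the question of how to choose between them. One simple option is to measure hit rate at k on a small set of queries with known relevant chunks. The sketch below assumes you have already collected the ranked chunk IDs each model returns; the IDs, rankings, and the `hit_rate_at_k` helper are hypothetical and exist only for illustration.

```python
def hit_rate_at_k(ranked_ids_per_query, relevant_id_per_query, k):
    """Fraction of queries whose relevant chunk appears in the top-k results."""
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query)
        if relevant in ranked[:k]
    )
    return hits / len(relevant_id_per_query)


# Hypothetical ground truth: one relevant chunk ID per test query.
relevant = ["p3", "p7", "p1"]

# Hypothetical rankings returned by two embedding models for the same queries.
model_a_rankings = [["p3", "p2"], ["p5", "p7"], ["p4", "p9"]]
model_b_rankings = [["p2", "p3"], ["p5", "p6"], ["p1", "p4"]]

for name, rankings in [("model_a", model_a_rankings), ("model_b", model_b_rankings)]:
    print(name, hit_rate_at_k(rankings, relevant, k=1), hit_rate_at_k(rankings, relevant, k=2))
```

Leaderboard scores are a useful starting point, but a small evaluation on your own chunks is often more telling for a specific domain.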

## Sparse embeddings

- Sparse embeddings are high-dimensional vectors, typically with one dimension per vocabulary term, in which most elements are zero and only a few hold non-zero values. They are often derived from traditional methods like TF-IDF or BM25, which weight the specific keywords or tokens present in the text. In RAG, sparse embeddings match documents and queries based on exact term overlap rather than broader semantic relationships.
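
To make the term-overlap idea concrete, here is a minimal sketch that stores only the non-zero entries of each vector and scores by a dot product over shared tokens. It uses raw token counts as weights, which is a simplification: BM25 additionally applies inverse document frequency and length normalization. The helper names and example sentences are hypothetical.

```python
from collections import Counter


def sparse_vector(text):
    """Map each lowercase token to its count; absent tokens are implicit zeros."""
    return Counter(text.lower().split())


def sparse_dot(a, b):
    """Dot product that only visits tokens present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[token] for token, weight in a.items() if token in b)


query = sparse_vector("cocoa butter content")
doc_match = sparse_vector("minimum cocoa butter content of chocolate")
doc_synonym = sparse_vector("fat derived from cacao beans")

print(sparse_dot(query, doc_match))    # three shared tokens
print(sparse_dot(query, doc_synonym))  # no shared tokens, so the score is zero
```

The second document is about the same topic but scores zero because it shares no exact tokens with the query, which is the gap dense embeddings are meant to close.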

In [10]:
from fastembed.sparse.bm25 import Bm25

bm25_embedding_model = Bm25("Qdrant/bm25")
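# passage_embed returns a generator over the input texts, so wrap the single page in a list.
bm25_embeddings = list(bm25_embedding_model.passage_embed([docs[0].page_content]))
len(bm25_embeddings[0].values)  # number of stored (non-zero) weights in the sparse vector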




135

In [11]:
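sparse_embeddings_model = FastEmbedSparse(model_name="Qdrant/bm25")
# Use a distinct name so the model object created earlier as `sparse_embeddings` is not overwritten.
sparse_query_vector = sparse_embeddings_model.embed_query("This is a test document.")
sparse_query_vector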



SparseVector(indices=[926244272, 1167338989], values=[1.0, 1.0])

## Late interaction embeddings

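- Late interaction embeddings represent a text as one vector per token instead of a single vector for the whole text. Queries and documents are encoded independently, so document vectors can be computed and indexed ahead of time, and the interaction between query tokens and document tokens is delayed until scoring. ColBERT, the model used below, scores a document by taking, for each query token, its maximum similarity to any document token and summing these maxima (MaxSim). In RAG, this preserves fine-grained token-level matching that a single dense vector can lose, at the cost of storing many more vectors per document; it is often used to rerank candidates retrieved by a cheaper method.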

In [12]:
from fastembed.late_interaction import LateInteractionTextEmbedding

late_interaction_embedding_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0")
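# passage_embed returns a generator over the input texts, so wrap the single page in a list.
late_interaction_embeddings = list(late_interaction_embedding_model.passage_embed([docs[0].page_content]))
len(late_interaction_embeddings[0])  # number of token vectors produced for this page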



512
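
ColBERT-style models such as the one loaded above return one vector per token rather than one vector per text, and relevance is computed with the MaxSim operator: for each query token, take its best match among the document tokens, then sum those best matches. The sketch below illustrates this with tiny two-dimensional token vectors that are made up for the example; it is not the fastembed or Qdrant implementation.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def maxsim_score(query_tokens, doc_tokens):
    """Sum over query tokens of the highest dot product with any document token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Hypothetical token embeddings: one short vector per token.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # covers both query tokens
doc_b_tokens = [[1.0, 0.0], [0.9, 0.1]]              # covers only the first one

score_a = maxsim_score(query_tokens, doc_a_tokens)
score_b = maxsim_score(query_tokens, doc_b_tokens)
print(score_a, score_b)
```

Document A ranks higher because every query token finds a close match in it, whereas document B only matches the first query token well.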