### Fase 1: Ingestion pipeline

#### 1.1 Extract

In [1]:
from langchain_community.document_loaders import PyPDFDirectoryLoader

loader = PyPDFDirectoryLoader("../../aulas/1_LangChain_RAG")
docs = loader.load()

  from .autonotebook import tqdm as notebook_tqdm


#### 1.2 Transform

In [2]:
from langchain_classic.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,
    chunk_overlap=200,
    separators=["\n\n", "\n", ". ", " ", ""]
)

chunks = text_splitter.split_documents(docs)
print(f"Total created chunks: {len(chunks)}")

Total created chunks: 98


In [3]:
from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings_model = GoogleGenerativeAIEmbeddings(model="gemini-embedding-001")

#### 1.3 Load

In [4]:
from langchain_community.vectorstores import Chroma

vectorstore = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings_model
)

print("Database created successfully!")

Database created successfully!


### Fase 2: Building a advanced retrieval system

#### 2.1 Hybrid search

In [6]:
from langchain_classic.retrievers import BM25Retriever, EnsembleRetriever

# 1. Lexical retriever (BM25)
bm25_rtvr = BM25Retriever.from_documents(chunks, k=5)

# 2. Semantic retriever (ChromaDB)
vctr_rtvr = vectorstore.as_retriever(search_kwargs={"k": 5})

# 1. Ensemble retriever (BM25)
ensemble_rtvr = EnsembleRetriever(
    retrievers=[bm25_rtvr, vctr_rtvr],
    weights=[0.5, 0.5]
)

### Fase 3: Robust convesational chain

#### 3.1 Memory chain and query transformation

In [7]:
from langchain_classic.chains import ConversationalRetrievalChain
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain_classic.memory import ConversationBufferMemory

llm = ChatGoogleGenerativeAI(model="gemini-2.5-flash", temperature=0)

memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"
)

qa_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=ensemble_rtvr,
    memory=memory,
    return_source_documents=True
)

In [8]:
resp1 = qa_chain.invoke({"question": "What's adaptative chunking?"})
print(resp1["answer"])

Adaptive chunking is a technique for intelligently dividing long documents into smaller pieces (chunks) while preserving their semantic context. This process is crucial for efficient processing and retrieval, and for the quality of the responses.

The strategy for chunking should be adapted to the type of data, such as continuous texts, tables, source code, or structured documents. Common methods include:
*   **Fixed-Size Chunking:** Dividing by a fixed number of tokens or characters.
*   **Sliding Window:** Using overlap (overlap) to maintain context between chunks.
*   **Recursive Splitting:** Dividing based on the semantic structure of the text.


In [9]:
resp2 = qa_chain.invoke({"question": "And what are the main strategies?"})
print(resp2["answer"])

The main strategies for adaptive chunking are:

*   **Fixed-Size Chunking:** Dividing documents by a fixed number of tokens or characters.
*   **Sliding Window:** Using overlap between chunks to maintain context.
*   **Recursive Splitting:** Dividing based on the semantic structure of the text.


### Fase 4: RAGAS evaluation system

In [10]:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (
    _faithfulness,
    _answer_relevancy,
    _context_recall,
    _context_precision
)


All support for the `google.generativeai` package has ended. It will no longer be receiving 
updates or bug fixes. Please switch to the `google.genai` package as soon as possible.
See README for more details:

https://github.com/google-gemini/deprecated-generative-ai-python/blob/main/README.md

  import google.generativeai as genai  # type: ignore[import-not-found]


In [13]:
questions_list = [
    "O que é RAG e qual problema ele soluciona?",
    "Quais os componentes essenciais do RAG?",
    "Qual a diferença entre busca lexical e semântica?",
    "O que mede a métrica faithfulness do RAGAS?"
]
rag_key_points = [
    "RAG (Retrieval-Augmented Generation) é uma arquitetura que combina um motor de busca para recuperar informações com um LLM para gerar respostas. Ele soluciona problemas como alucinações e conhecimento desatualizado dos LLMs.",
    "Os componentes essenciais são: Embeddings, Banco de Dados Vetorial, Chunking e um Modelo de Linguagem (LLM).",
    "Busca lexical (como BM25) encontra correspondências exatas de termos, enquanto a busca semântica captura o significado e o contexto, mesmo com palavras diferentes.",
    "A métrica Faithfulness mede se a resposta gerada é suportada e factualmente consistente com os documentos recuperados, evitando alucinações."
]

In [15]:
generated_responses = []
retrieved_contexts = []
for question in questions_list:
    result = qa_chain.invoke({"question": question})
    generated_responses.append(result['answer'])
    retrieved_contexts.append([doc.page_content for doc in result['source_documents']])

dataset_dict = {
    'question': questions_list,
    'answer': generated_responses,
    'contexts': retrieved_contexts,
    'ground_truth': rag_key_points
}
dataset = Dataset.from_dict(dataset_dict)

evaluation_result = evaluate(
    dataset=dataset,
    metrics=[
        _faithfulness,
        _answer_relevancy,
        _context_precision,
        _context_recall,
    ],
    llm=ChatGoogleGenerativeAI(model="gemini-1.5-pro-latest"),
    embeddings=embeddings_model
)

evaluation_results_df = evaluation_result.to_pandas()
print("\nEvaluation results:")
display(evaluation_results_df)

Evaluating:   0%|          | 0/16 [00:00<?, ?it/s]Exception raised in Job[13]: ChatGoogleGenerativeAIError(Error calling model 'gemini-1.5-pro-latest' (NOT_FOUND): 404 NOT_FOUND. {'error': {'code': 404, 'message': 'models/gemini-1.5-pro-latest is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}})
Evaluating:   6%|▋         | 1/16 [01:07<16:50, 67.38s/it]Exception raised in Job[2]: ChatGoogleGenerativeAIError(Error calling model 'gemini-1.5-pro-latest' (NOT_FOUND): 404 NOT_FOUND. {'error': {'code': 404, 'message': 'models/gemini-1.5-pro-latest is not found for API version v1beta, or is not supported for generateContent. Call ListModels to see the list of available models and their supported methods.', 'status': 'NOT_FOUND'}})
Evaluating:  12%|█▎        | 2/16 [01:11<07:06, 30.44s/it]Exception raised in Job[6]: ChatGoogleGenerativeAIError(Error calling model


Evaluation results:


Unnamed: 0,user_input,retrieved_contexts,response,reference,faithfulness,answer_relevancy,context_precision,context_recall
0,O que é RAG e qual problema ele soluciona?,[POR QUE RAG É A REVOLUÇÃO\nReduz Alucinações\...,RAG (Retrieval-Augmented Generation) é uma sol...,RAG (Retrieval-Augmented Generation) é uma arq...,,,,
1,Quais os componentes essenciais do RAG?,[alura\nNormalização de Embeddings\n● A normal...,Os componentes essenciais do RAG são:\n\n* *...,"Os componentes essenciais são: Embeddings, Ban...",,,,
2,Qual a diferença entre busca lexical e semântica?,[alura\nHybrid Search (Busca Híbrida)\nBusca s...,A diferença entre busca lexical e semântica é ...,Busca lexical (como BM25) encontra correspondê...,,,,
3,O que mede a métrica faithfulness do RAGAS?,[alura\nMétricas de Geração\nO RAGAS fornece m...,"A métrica ""Faithfulness"" (Factualidade) do RAG...",A métrica Faithfulness mede se a resposta gera...,,,,


### Fundamental Concepts

**BM25**, part of the **Okapi BM** family of functions, is a classic ranking technique used to estimate the relevance of documents in information retrieval systems. It is based on the frequency with which query terms appear in each document, balancing this factor against the document's length and the rarity of the terms across the entire collection. This approach was designed to overcome the limitations of simpler models by adjusting the impact of high-frequency words and favoring documents where the terms are more significant.



### How BM25 Works

The algorithm uses a formula that sums the individual contributions of each query term. Generally, for each term, it considers:

* **Term Frequency (TF):** How often the term appears in the document.
* **Inverse Document Frequency (IDF):** The relative importance of the term in the collection, based on how rare it is.
* **Normalization Factor:** An adjustment for document length.

The combination of these elements allows BM25 to compute a score that reflects how well a document matches a specific query. This score is calculated taking into account that longer documents tend to have higher term counts; therefore, **normalization** is essential to avoid bias toward length.

### Comparative Analysis with Other Approaches

While semantic search techniques—which utilize vector representations—are on the rise, BM25 maintains significant advantages. As a **lexical method**, its interpretation is straightforward, allowing for a better understanding of how and why a document was deemed relevant. On the other hand, because it does not capture deep semantic relationships between terms, it may be less effective in situations where context or synonyms play a crucial role.



In practical scenarios, combining BM25 with vector search approaches (**Hybrid Search**) has proven highly advantageous. It allows systems to leverage the best of both worlds: the robust lexical analysis of BM25 and the semantic similarity capabilities of embeddings. This integration leads to more precise and robust information retrieval, especially in diverse databases.

This deep dive demonstrates that even in a world increasingly driven by AI techniques, established methods like BM25 remain relevant and effectively complement modern data retrieval strategies.

In this lesson, we learned:

* **Using PDFs as a knowledge base** in RAG systems and chatbots.
* **Setting up the environment** with essential libraries such as LangChain, Google Generative AI, and ChromaDB.
* **Processing and splitting PDF documents into chunks** using the `RecursiveCharacterTextSplitter`.



* **Implementing embeddings with Google Generative AI** for high-performance vector representations.
* **Creating a vector database with ChromaDB** and indexing both chunks and embeddings.
* **Developing a retrieval system using hybrid search**, combining BM25 and a vector retriever.



* **Configuring a conversational chain with memory** using `ConversationBufferMemory`.
* **Utilizing the Google Gemini model** to generate deterministic responses within conversational chains.