<a href="https://colab.research.google.com/github/nalpata/proyecto_aplicado_preservantes/blob/main/01_naive_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Build a RAG agent with LangChain

In [21]:
!pip install -U "langchain>=1.0" "langchain-community" "langchain-core"




## Overview

Among the most powerful applications enabled by LLMs are sophisticated question-answering (Q&A) chatbots: applications that can answer questions about specific source information. They rely on a technique known as Retrieval Augmented Generation (RAG).

In this class we will learn how to build a simple Q&A application over an unstructured text data source: a collection of PDF articles on food preservatives.

We will demonstrate:
- A RAG agent that executes searches with a simple tool. This is a good general-purpose implementation.
- A two-step RAG chain that uses just a single LLM call per query. This is a fast and effective method for simple queries.

## Concepts
We will cover the following concepts:
- **Indexing**: a pipeline for ingesting data from a source and indexing it. This usually happens in a separate process.
- **Retrieval** and **generation**: the actual RAG process, which takes the user query at run time and retrieves the relevant data from the index, then passes that to the model.

Once we've indexed our data, we will use an agent as our orchestration framework to implement the retrieval and generation steps.

In [22]:
!pip install langchain langchain-community langchain-text-splitters langchain-huggingface "langchain[openai]" pypdf python-dotenv




In [23]:
from dotenv import load_dotenv

load_dotenv()

True

# Components
We will need to select three components from LangChain's suite of integrations.
1. Select a chat model:

In [24]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0
)



2. Select an embeddings model:

In [26]:
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

3. Select a vector store:

In [27]:
from langchain_core.vectorstores import InMemoryVectorStore
vector_store = InMemoryVectorStore(embeddings)




# 1. Indexing

Indexing commonly works as follows:
1. **Load**: First we need to load our data. This is done with Document Loaders.
2. **Split**: Text splitters break large Documents into smaller chunks. This is useful both for indexing data and for passing it into a model, as large chunks are harder to search over and won't fit in a model's finite context window.
3. **Store**: We need somewhere to store and index our splits, so that they can be searched over later. This is often done using a VectorStore and Embeddings model.

## Loading documents
We first need to load the source documents. We can use `DocumentLoaders` for this, which are objects that load data from a source and return a list of `Document` objects.

In this case the sources are scientific articles on food preservatives, stored as PDF files under `data/pdfs` in the project repository. We use `PyPDFDirectoryLoader`, which reads every PDF in a directory with `pypdf` and returns one `Document` per page, recording the file path (`source`) and the page number (`page`) in each document's metadata.
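Because `PyPDFDirectoryLoader` returns one `Document` per PDF page, the number of loaded documents counts pages, not articles. A minimal sketch for checking how pages map to files, assuming only that each document carries a `source` entry in its metadata: `Document` below is a stand-in dataclass with the same two attributes as LangChain's, and the file names are hypothetical. With the real loader, you would pass the `docs` list it returns.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def pages_per_source(docs):
    """Count how many page-level Documents each PDF file produced."""
    return Counter(doc.metadata.get("source", "<unknown>") for doc in docs)


# Hypothetical file names, used only to make the sketch runnable
sample_docs = [
    Document("page text", {"source": "data/pdfs/a.pdf", "page": 0}),
    Document("page text", {"source": "data/pdfs/a.pdf", "page": 1}),
    Document("page text", {"source": "data/pdfs/b.pdf", "page": 0}),
]
print(pages_per_source(sample_docs))
```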

In [29]:
%cd /content
!git clone https://github.com/nalpata/proyecto_aplicado_preservantes.git


/content
fatal: destination path 'proyecto_aplicado_preservantes' already exists and is not an empty directory.


In [30]:
%cd /content/proyecto_aplicado_preservantes
!ls


/content/proyecto_aplicado_preservantes
 CHECKLIST_INSTALACION.md   notebooks		   run_pipeline.py
 data			   'Pauta proyecto.pdf'    SOLUCION_INSTALACION.md
 DIAGRAMAS_HITO_1.md	    PROBLEMAS_COMUNES.md   src
 examples.py		    QUICK_START.md	   streamlit_app.py
 FIX_CHROMADB.md	    README.md		   test_improvements.py
 install_dependencies.sh    requirements.txt	   TESTING_LOCAL.md
 install_fix.sh		    RESUMEN_FINAL.md	   test_single_pdf.py
 INSTRUCCIONES_PRUEBA.md    RESUMEN_HITO_1.md
 MEJORAS_CHUNKING.md	    RESUMEN_PROBLEMAS.md


In [31]:
# Load the PDF documents from the food-preservatives domain

from langchain_community.document_loaders import PyPDFDirectoryLoader

# Directory containing the PDFs
pdf_path = "data/pdfs"

# Returns one Document per PDF page
loader = PyPDFDirectoryLoader(pdf_path)
docs = loader.load()

print(f"Documents loaded: {len(docs)}")
print(f"First excerpt (500 characters):\n\n{docs[0].page_content[:500]}")


Documents loaded: 475
First excerpt (500 characters):

antibiotics 
Review
Food Safety through Natural Antimicrobials
Emiliano J. Quinto 1, *
 , Irma Caro 1, Luz H. Villalobos-Delgado 2, Javier Mateo 3
 ,
Beatriz De-Mateo-Silleras 1 and María P . Redondo-Del-Río 1
1 Department of Nutrition and Food Science, Faculty of Medicine, University of Valladolid, 47005 Valladolid,
Spain; irma.caro@uva.es (I.C.); bdemateo@yahoo.com (B.D.-M.-S.); pazr@ped.uva.es (M.P .R.-D.-R.)
2 Institute of Agroindustry, Technological University of the Mixteca, Huajuapan de L


## Splitting documents

Our loaded documents (475 PDF pages) add up to far more text than fits into the context window of many models. Even models that could fit all of it can struggle to find information in very long inputs.

To handle this we'll split the Documents into chunks for embedding and vector storage. This should help us retrieve only the most relevant passages of the articles at run time.

As in the semantic search tutorial, we use a `RecursiveCharacterTextSplitter`, which recursively splits the documents using common separators such as new lines until each chunk is the appropriate size. This is the recommended text splitter for generic text use cases.
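To make `chunk_size` and `chunk_overlap` concrete, here is a simplified fixed-window splitter in plain Python. It is not LangChain's recursive algorithm, which prefers to break on separators such as paragraph breaks and new lines (so its chunks vary in size and the overlap is approximate); it only shows the arithmetic. Each window holds at most `chunk_size` characters and starts `chunk_size - chunk_overlap` characters after the previous one, so with the values used in the next cell (1500 and 200) consecutive windows would start 1300 characters apart. `fixed_window_chunks` is a hypothetical helper.

```python
def fixed_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list:
    """Split text into windows of at most chunk_size characters; each window
    starts chunk_size - chunk_overlap characters after the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# 10 characters, windows of 4 with 1 character of overlap
print(fixed_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Consecutive windows share their boundary character (`d`, then `g`), which is what the overlap buys: a sentence cut at a chunk boundary still appears whole in at least one chunk if it is shorter than the overlap.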

In [32]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Chunking tuned for long scientific articles
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1500,   # maximum chunk size, in characters
    chunk_overlap=200, # overlap so ideas are not cut in half
)

all_splits = text_splitter.split_documents(docs)

print(f"🔹 Total chunks: {len(all_splits)}")
print("🔹 Example chunk:\n")
print(all_splits[0].page_content[:500])


🔹 Total chunks: 1429
🔹 Example chunk:

antibiotics 
Review
Food Safety through Natural Antimicrobials
Emiliano J. Quinto 1, *
 , Irma Caro 1, Luz H. Villalobos-Delgado 2, Javier Mateo 3
 ,
Beatriz De-Mateo-Silleras 1 and María P . Redondo-Del-Río 1
1 Department of Nutrition and Food Science, Faculty of Medicine, University of Valladolid, 47005 Valladolid,
Spain; irma.caro@uva.es (I.C.); bdemateo@yahoo.com (B.D.-M.-S.); pazr@ped.uva.es (M.P .R.-D.-R.)
2 Institute of Agroindustry, Technological University of the Mixteca, Huajuapan de L


In [33]:
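from langchain_huggingface import HuggingFaceEmbeddings

# Switch to a smaller, faster embedding model than all-mpnet-base-v2.
# Any vector store built with the previous `embeddings` object must be recreated.
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)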


In [34]:
from langchain_core.vectorstores import InMemoryVectorStore

# Recreate the in-memory vector store with the new embeddings model.
# The chunks are indexed once, in the "Storing documents" step below;
# calling add_documents here as well would store every chunk twice.
vector_store = InMemoryVectorStore(embeddings)

print("📦 Empty vector store created.")


📦 Empty vector store created.


## Storing documents
Now we need to index our text chunks so that we can search over them at runtime.

Our approach is to embed the contents of each document split and insert these embeddings into a vector store. Given an input query, we can then use vector search to retrieve relevant documents.

We can embed and store all of our document splits in a single command using the vector store and embeddings model created above.
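The embed-and-search idea can be sketched without downloading a model. This toy version rests on stated assumptions: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model (which returns dense float vectors), and `ToyVectorStore` only mimics the shape of `add_documents`/`similarity_search` while taking plain strings. It ranks stored chunks by cosine similarity to the query; the last lines show why each chunk should be indexed only once.

```python
import math
import re
import uuid
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical bag-of-words 'embedding': word -> count."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Stores (id, text, vector) rows in memory and ranks them by cosine similarity."""

    def __init__(self, embed):
        self.embed = embed
        self.rows = []

    def add_texts(self, texts):
        """Embed each text once and return the generated ids."""
        ids = []
        for text in texts:
            doc_id = str(uuid.uuid4())
            self.rows.append((doc_id, text, self.embed(text)))
            ids.append(doc_id)
        return ids

    def similarity_search(self, query, k=2):
        """Embed the query and return the k most similar stored texts."""
        q = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[2]), reverse=True)
        return [text for _, text, _ in ranked[:k]]


texts = [
    "Nisin is a bacteriocin used as a natural food preservative.",
    "Sodium benzoate is a synthetic preservative used in acidic drinks.",
    "The samples were stored at 4 C for seven days.",
]
store = ToyVectorStore(toy_embed)
ids = store.add_texts(texts)
print(store.similarity_search("natural preservative nisin", k=1))

# Indexing the same texts a second time stores duplicates: the top two
# hits are now the same chunk, wasting half of the retrieved context.
store.add_texts(texts)
print(store.similarity_search("natural preservative nisin", k=2))
```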

In [35]:
document_ids = vector_store.add_documents(documents=all_splits)

print(document_ids[:3])

['a0b7d67d-0ffe-4d08-b5b1-f69c52762232', '4283909d-7081-45be-9022-841b0ab1335e', 'c719735f-e37f-441b-bb53-f23e565552e3']


# 2. Retrieval and Generation
RAG applications commonly work as follows:
1. **Retrieve**: Given a user input, relevant splits are retrieved from storage using a Retriever.
2. **Generate**: A model produces an answer using a prompt that includes both the question and the retrieved data.
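The two steps can be sketched as a plain function that retrieves once and calls the model once. `two_step_rag`, `fake_retrieve` and `fake_llm` are hypothetical stand-ins (no LangChain or OpenAI calls), so the sketch runs offline; in this notebook, `retrieve` would wrap `vector_store.similarity_search` and `llm` would wrap the chat model.

```python
from typing import Callable, List


def two_step_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str], str],
) -> str:
    # Step 1 (Retrieve): one search using the raw user question
    context = "\n\n".join(retrieve(question))
    # Step 2 (Generate): one model call whose prompt holds question and context
    prompt = (
        "Use the following context to answer the question.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
    return llm(prompt)


# Hypothetical stand-ins so the sketch runs offline
calls = []


def fake_retrieve(query: str) -> List[str]:
    return ["Nisin is a natural antimicrobial used in processed cheese."]


def fake_llm(prompt: str) -> str:
    calls.append(prompt)  # record every model call
    return "Nisin." if "Nisin" in prompt else "I don't know."


answer = two_step_rag("Which antimicrobial is used in cheese?", fake_retrieve, fake_llm)
print(answer, len(calls))
```

Unlike an agent, this chain always searches exactly once and calls the model exactly once, which keeps latency and cost predictable but cannot issue follow-up searches.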

Now let's write the actual application logic.

We want to create a simple application that takes a user question, searches for documents relevant to that question, passes the retrieved documents and initial question to a model, and returns an answer.

## RAG agents
One formulation of a RAG application is as a simple agent with a tool that retrieves information. We can assemble a minimal RAG agent by implementing a tool that wraps our vector store:
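A tool is, roughly, a function plus the metadata a tool-calling model needs to decide when and how to call it: a name, a description and typed arguments. With `@tool`, LangChain takes the name from the function and the description from its docstring, and builds an argument schema from the type hints. The hypothetical `describe_tool` helper below gathers the same three pieces with the standard library's `inspect` module; it is an illustration of the idea, not LangChain's implementation, and `retrieve_context_stub` is a placeholder.

```python
import inspect


def describe_tool(fn):
    """Hypothetical helper: collect the name, description and argument
    types a tool-calling model needs in order to call `fn`."""
    sig = inspect.signature(fn)
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {name: p.annotation.__name__ for name, p in sig.parameters.items()},
    }


def retrieve_context_stub(query: str):
    """Retrieve information to help answer a query."""
    return "stub result"


print(describe_tool(retrieve_context_stub))
```

This is why the docstring of `retrieve_context` matters: it is the text the model reads when choosing whether to search.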

In [36]:
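from langchain.tools import tool

# response_format="content_and_artifact": the function returns a
# (content, artifact) tuple. The content string is sent to the model;
# the artifact (the raw Documents) is attached to the ToolMessage.
@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    # Top-2 most similar chunks for the query
    retrieved_docs = vector_store.similarity_search(query, k=2)
    # Serialize source metadata and text into a single string for the model
    serialized = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}"
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs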

Here we use the `@tool` decorator to configure the tool to attach the raw documents as artifacts to each `ToolMessage`.

This will let us access document metadata in our application, separate from the stringified representation that is sent to the model.
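The `content_and_artifact` split can be illustrated in plain Python. `Document` here is a minimal stand-in dataclass, `serialize` reproduces the string format used by `retrieve_context`, and the file name and page number are made up. The string (`content`) is what the model reads, while the list of objects (`artifact`) keeps the metadata in structured form for the application.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for langchain_core.documents.Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs):
    """Build the model-facing string in the same format as retrieve_context."""
    return "\n\n".join(f"Source: {d.metadata}\nContent: {d.page_content}" for d in docs)


# Hypothetical retrieved chunk
retrieved = [Document("Nisin inhibits Gram-positive bacteria.", {"source": "a.pdf", "page": 3})]
content, artifact = serialize(retrieved), retrieved

print(content)
# The artifact keeps structured objects, so metadata stays machine-readable
print([d.metadata["page"] for d in artifact])
```

Parsing page numbers back out of `content` would be fragile; reading them from the artifact is not.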

Given our tool, we can construct the agent and then test it with a question that would typically require an iterative sequence of retrieval steps to answer:

In [37]:
# 1. Import the helper that creates the agent
from langchain.agents import create_agent

# 2. Register the tools available to the agent
tools = [retrieve_context]

# 3. System prompt with instructions for the model
prompt = (
    "You have access to a tool that retrieves context from a collection of "
    "scientific articles. Use the tool to help answer user queries."
)

# 4. Create the agent (uses the `model` defined earlier)
agent = create_agent(model, tools, system_prompt=prompt)


In [41]:
query = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

In [40]:
response = agent.invoke({"messages": [{"role": "user", "content": query}]})
response["messages"][-1].pretty_print()


RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}

Note that, with a working API key and sufficient quota, the agent is expected to:
1. Generate a query to search for a standard method for task decomposition.
2. After receiving the answer, generate a second query to search for common extensions of it.
3. Having received all the necessary context, answer the question.
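The loop behind these steps can be sketched without an LLM. In this minimal sketch, a hypothetical scripted `scripted_decide` function stands in for the model and mirrors the three steps above; `run_agent`, `toy_retrieve` and the two-entry toy corpus are assumptions for illustration, not the internals of LangChain's `create_agent`.

```python
def run_agent(question, decide, tools, max_steps=5):
    """Minimal tool-calling loop. `decide` plays the role of the LLM: given the
    history so far, it returns either a tool call or a final answer."""
    history = [("user", question)]
    for _ in range(max_steps):
        action = decide(history)
        if action["type"] == "final":
            return action["content"], history
        # Run the requested tool and append its output to the history
        observation = tools[action["tool"]](action["input"])
        history.append(("tool", observation))
    raise RuntimeError("agent did not produce an answer within max_steps")


# Toy two-entry corpus standing in for the vector store (hypothetical data)
toy_corpus = {
    "task decomposition": "Chain of Thought asks the model to think step by step.",
    "extensions of chain of thought": "Tree of Thoughts explores several reasoning paths.",
}


def toy_retrieve(query):
    return toy_corpus.get(query, "no results")


def scripted_decide(history):
    """Hypothetical fixed policy mirroring the three steps listed above."""
    n_results = sum(1 for role, _ in history if role == "tool")
    if n_results == 0:
        return {"type": "tool", "tool": "retrieve", "input": "task decomposition"}
    if n_results == 1:
        return {"type": "tool", "tool": "retrieve", "input": "extensions of chain of thought"}
    # Enough context gathered: answer from the retrieved results
    return {"type": "final", "content": " ".join(t for role, t in history if role == "tool")}


answer, history = run_agent(
    "What is the standard method for task decomposition, and its extensions?",
    scripted_decide,
    {"retrieve": toy_retrieve},
)
print(answer)
```

In the real agent the model chooses each query itself, so the number of retrieval rounds can vary from question to question; the `max_steps` guard is this sketch's own way of preventing an endless loop.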