<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/docs/examples/low_level/oss_ingestion_retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


# Building RAG from Scratch (Open-source only!)

In this tutorial, we show you how to build a data ingestion pipeline into a vector database, and then build a retrieval pipeline from that vector database, from scratch.

Notably, we use a fully open-source stack:

- Sentence Transformers as the embedding model
- Postgres as the vector store (we support many other [vector stores](https://gpt-index.readthedocs.io/en/stable/module_guides/storing/vector_stores.html) too!)
- Llama 2 as the LLM (through [llama.cpp](https://github.com/ggerganov/llama.cpp))

## Setup

We set up our open-source components:
1. Sentence Transformers (embedding model)
2. Llama 2 through llama.cpp (LLM)
3. Postgres, initialized and wrapped with our vector store abstraction

#### Sentence Transformers

In [1]:
# %pip install llama-index-readers-file pymupdf
# %pip install llama-index-vector-stores-postgres
# %pip install llama-index-embeddings-huggingface
# %pip install llama-index-llms-llama-cpp

In [2]:
import os
import torch
import gc

os.environ["TORCHDYNAMO_DISABLE"] = "1"
os.environ["TOKENIZERS_PARALLELISM"] = "false"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

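def clear_vram(variable=None):
    # Note: `del` only drops this function's local reference; the caller must
    # also release its own reference before the object can be collected.
    if variable is not None:
        del variable
    # Collect garbage first so freed tensors can be released from the CUDA cache.
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()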

In [3]:
# Sentence Transformers embedding model
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Hub ID: "cnmoro/bert-tiny-embeddings-english-portuguese"; here we load a local copy
model_embed_name = "./results/bert-tiny-embeddings-english-portuguese"

embed_model = HuggingFaceEmbedding(
    model_name=model_embed_name,
    trust_remote_code=True,
    device="cpu",
    normalize=True,
    embed_batch_size=4,
)

# Output dimension of the bert-tiny embedding model; must match the vector store's embed_dim
EMBED_DIMENSION = 128
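
The `normalize=True` flag asks the embedding model for unit-length vectors, in which case cosine similarity and the plain dot product coincide. A minimal sketch in plain Python, using made-up 3-dimensional vectors rather than real 128-dimensional embeddings (`l2_normalize` and `cosine_similarity` are hypothetical helper names, not LlamaIndex APIs):

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0 else [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity computed from scratch for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings (an assumption of this sketch).
a, b = [3.0, 4.0, 0.0], [1.0, 2.0, 2.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# For unit vectors, the plain dot product equals the cosine similarity.
dot_of_units = sum(x * y for x, y in zip(a_unit, b_unit))
```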

#### Llama CPP

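This guide was written for the [`llama-2-chat-13b-ggml`](https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML) model, along with the proper prompt formatting. The cell below loads a local quantized GGUF file through `model_path` instead; to download a Llama 2 model automatically, set `model_url` to one of the commented-out URLs.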

Check out our [Llama CPP guide](https://gpt-index.readthedocs.io/en/stable/examples/llm/llama_2_llama_cpp.html) for full setup instructions/details.

In [4]:
# !pip install llama-cpp-python

In [5]:
from llama_index.llms.llama_cpp import LlamaCPP

# model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGML/resolve/main/llama-2-13b-chat.ggmlv3.q4_0.bin"
# model_url = "https://huggingface.co/TheBloke/Llama-2-13B-chat-GGUF/resolve/main/llama-2-13b-chat.Q4_0.gguf"

model_name = "../12-ollamadatasetapp/results/lora_model"
model_path = f"{model_name}/unsloth.Q4_K_M.gguf"

llm = LlamaCPP(
    # You can pass in the URL to a GGML model to download it automatically
    model_url=None,
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    model_path=model_path,
    temperature=0.01,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 1},
    verbose=True,
)

ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3070 Ti Laptop GPU, compute capability 8.6, VMM: yes
llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 3070 Ti Laptop GPU) - 7665 MiB free
llama_model_loader: loaded meta data with 35 key-value pairs and 291 tensors from ../12-ollamadatasetapp/results/lora_model/unsloth.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Lora_Model
llama_model_loader: - kv   3:                         general.size_label str              = 7.2B
llama_model_loader:

#### Initialize Postgres

Using an existing Postgres instance running on localhost, create the database we'll be using.

**NOTE**: Of course there are plenty of other open-source/self-hosted databases you can use! e.g. Chroma, Qdrant, Weaviate, and many more. Take a look at our [vector store guide](https://gpt-index.readthedocs.io/en/stable/module_guides/storing/vector_stores.html).

**NOTE**: You will need to set up Postgres on your local system. Here's an example of how to set it up on macOS: https://www.sqlshack.com/setting-up-a-postgresql-database-on-mac/.

**NOTE**: You will also need to install pgvector (https://github.com/pgvector/pgvector).

You can add a role like the following:
```
CREATE ROLE <user> WITH LOGIN PASSWORD '<password>';
ALTER ROLE <user> SUPERUSER;
```

In [6]:
# !pip install psycopg2-binary pgvector asyncpg "sqlalchemy[asyncio]" greenlet

In [7]:
import psycopg2

db_name = "vectordb"
host = "127.0.0.1"
password = "postgres"
port = "5432"
user = "postgres"
# Connect to the default "postgres" database so we can (re)create `db_name`
conn = psycopg2.connect(
    dbname="postgres",
    host=host,
    password=password,
    port=port,
    user=user,
)
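# CREATE/DROP DATABASE cannot run inside a transaction block
conn.autocommit = True

with conn.cursor() as c:
    # WARNING: this wipes any existing database named `db_name`
    c.execute(f"DROP DATABASE IF EXISTS {db_name}")
    c.execute(f"CREATE DATABASE {db_name}")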
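
Because `db_name` is interpolated directly into the SQL text above (a database name cannot be passed as an ordinary query parameter), it may be worth validating it first. A minimal sketch, assuming a deliberately conservative naming rule of lowercase letters, digits, and underscores; `validate_identifier` is a hypothetical helper, not part of psycopg2:

```python
import re

# Conservative rule (an assumption of this sketch): start with a lowercase letter
# or underscore, then lowercase letters, digits, or underscores, 63 chars at most.
_IDENTIFIER_RE = re.compile(r"[a-z_][a-z0-9_]{0,62}")


def validate_identifier(name: str) -> str:
    """Return `name` unchanged if it is a safe unquoted identifier, else raise."""
    if not _IDENTIFIER_RE.fullmatch(name):
        raise ValueError(f"unsafe SQL identifier: {name!r}")
    return name


safe_name = validate_identifier("vectordb")
statement = f"CREATE DATABASE {safe_name}"
```

psycopg2 also ships a `psycopg2.sql` module (`sql.Identifier`) for composing quoted identifiers, which is the more general option when names are not under your control.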

### Vector Store

In [8]:
from llama_index.vector_stores.postgres import PGVectorStore

vector_store = PGVectorStore.from_params(
    database=db_name,
    host=host,
    password=password,
    port=port,
    user=user,
    table_name="mistral_7b_portuguese",
    embed_dim=EMBED_DIMENSION,  # bert-tiny embedding dimension
)

# Alternative: an in-memory FAISS index instead of Postgres
# (requires `faiss` and `llama-index-vector-stores-faiss`).
# import faiss
# from llama_index.vector_stores.faiss import FaissVectorStore
#
# faiss_index = faiss.IndexFlatL2(EMBED_DIMENSION)
# vector_store = FaissVectorStore(faiss_index=faiss_index)
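
The connection parameters used in this section can also be expressed as a single connection URI, the form accepted by libpq-based clients. A minimal sketch, assuming the standard `postgresql://user:password@host:port/dbname` layout; `build_postgres_uri` is a hypothetical helper, not a LlamaIndex or psycopg2 function:

```python
from urllib.parse import quote


def build_postgres_uri(user, password, host, port, database):
    """Assemble a libpq-style URI, percent-encoding credentials and database name."""
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


# Same values as the notebook's connection parameters.
uri = build_postgres_uri("postgres", "postgres", "127.0.0.1", "5432", "vectordb")
```

Percent-encoding matters as soon as a password contains characters such as `@` or `/`, which would otherwise be parsed as part of the URI structure.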

## Build an Ingestion Pipeline from Scratch

We show how to build an ingestion pipeline as mentioned in the introduction.

We fast-track the steps here and skip metadata extraction. More details can be found [in our dedicated ingestion guide](https://gpt-index.readthedocs.io/en/latest/examples/low_level/ingestion.html).

### 1. Load Data

In [9]:
# !mkdir data
# !wget --user-agent "Mozilla" "https://arxiv.org/pdf/2307.09288.pdf" -O "data/llama2.pdf"

In [10]:
from llama_index.readers.file import PagedCSVReader
from llama_index.core.readers import SimpleDirectoryReader

# To ingest the PDF downloaded above instead:
# from llama_index.readers.file import PyMuPDFReader
# loader = PyMuPDFReader()
# documents = loader.load(file_path="./tests/inputs/llama2.pdf")

file_path = "./tests/inputs/SELLOUT_SELLIN_ESTOQUE_HISTORICO_26246169000110_20250606_032528_v02.csv"

# PagedCSVReader turns each CSV row into its own document
csv_reader = PagedCSVReader()

reader = SimpleDirectoryReader(
    input_files=[file_path], file_extractor={".csv": csv_reader}
)

docs = reader.load_data()

# Check a sample chunk
print(docs[0].text)

CODIGO_GRUPO_EMPRESARIAL_GE: 41
NOME_GE: Drogaria Campeã
CODIGO_EMPRESA_GE: 84
NOME_EMPRESA_GE: DROGARIA CAMPEA POPULAR LAPA LTDA
NOME_FANTASIA_EMPRESA_GE: LOJA 84
CNPJ_EMPRESA_GE: 26.246.169/0001-10
CODIGO_INTERNO_SKU: 23010
NOME_SKU: DES ROLLON REXONA 50ML FEM POWDER
EAN_TEXTO_SKU: 78924338
EAN_NUMERICO_SKU: 78924338
EAN_PRINCIPAL_SKU: 78924338.0
UNIDADE_MEDIDA_SKU: UN
CODIGO_NCM_SKU: 3307.20.10
CODIGO_CEST_SKU: 
DATA_CADASTRO_SKU: 
SITUACAO_SKU: 0
CODIGO_FABRICANTE_SKU: 0
NOME_FABRICANTE_SKU: DES ROLLON REXONA 50ML FEM POWDER
CODIGO_MARCA_SKU: 319
NOME_MARCA_SKU: DES ROLLON REXONA 50ML FEM POWDER
CODIGO_CANAL_VENDA: 
NOME_CANAL_VENDA: 
CODIGO_ORIGEM_VENDA: 
NOME_ORIGEM_VENDA: 
CODIGO_CFOP_VENDA: 
ORDERID_CUPOM_VENDA: 
SKU_DATA_VENDA: 15/06/2024
SKU_QUANTIDADE_VENDA: 2
SKU_TOTAL_BRUTO_VENDA: 27.98
SKU_TOTAL_DESCONTO_VENDA: 0.0
SKU_TOTAL_LIQUIDO_VENDA: 27.98
SKU_PRECO_MEDIO_VENDA: 13.99
SKU_TOTAL_CMV_VENDA: 0
SKU_TOTAL_COMISSAO_VENDA: 0.0
SKU_TOTAL_BONIFICACAO_VENDA: 0.0
SKU_TOTAL_IMP

### 2. Use a Text Splitter to Split Documents

In [11]:
from llama_index.core.node_parser import SentenceSplitter

text_parser = SentenceSplitter(
    chunk_size=1024,
    # separator=" ",
)

text_chunks = []
# maintain relationship with source doc index, to help inject doc metadata in (3)
idxs = []
for idx, doc in enumerate(docs):
    text_str = doc.text
    cur_text_chunks = text_parser.split_text(text_str)
    text_chunks.extend(cur_text_chunks)
    idxs.extend([idx] * len(cur_text_chunks))
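
The bookkeeping in the loop above (one entry in `idxs` per chunk, pointing back at its source document) can be seen in isolation with a toy splitter. A minimal sketch, assuming a naive line-based splitter with a character budget as a stand-in for `SentenceSplitter`, which measures chunk size in tokens; `split_by_lines` and the sample strings are made up for illustration:

```python
def split_by_lines(text, max_chars):
    """Greedily pack whole lines into chunks of at most `max_chars` characters.

    A single line longer than `max_chars` becomes its own (oversized) chunk.
    """
    chunks, current = [], ""
    for line in text.splitlines():
        candidate = line if not current else current + "\n" + line
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Made-up documents in the same "KEY: value" style as the CSV rows.
toy_docs = ["A: 1\nB: 2\nC: 3", "D: 4"]

toy_chunks, toy_idxs = [], []
for idx, text in enumerate(toy_docs):
    cur = split_by_lines(text, max_chars=9)
    toy_chunks.extend(cur)
    toy_idxs.extend([idx] * len(cur))  # one source-document index per chunk
```

After the loop, `toy_chunks[i]` came from `toy_docs[toy_idxs[i]]`, which is the invariant step 3 relies on when copying metadata onto each node.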

### 3. Manually Construct Nodes from Text Chunks

In [12]:
from llama_index.core.schema import TextNode

nodes = []
for idx, text_chunk in enumerate(text_chunks):
    node = TextNode(
        text=text_chunk,
    )
    src_doc = docs[idxs[idx]]
    node.metadata = src_doc.metadata
    nodes.append(node)

print(nodes[0].metadata)
print(nodes[0].get_content(metadata_mode="all"))

{'file_path': 'tests/inputs/SELLOUT_SELLIN_ESTOQUE_HISTORICO_26246169000110_20250606_032528_v02.csv', 'file_name': 'SELLOUT_SELLIN_ESTOQUE_HISTORICO_26246169000110_20250606_032528_v02.csv', 'file_type': 'text/csv', 'file_size': 22352135, 'creation_date': '2025-08-01', 'last_modified_date': '2025-07-25'}
file_path: tests/inputs/SELLOUT_SELLIN_ESTOQUE_HISTORICO_26246169000110_20250606_032528_v02.csv
file_name: SELLOUT_SELLIN_ESTOQUE_HISTORICO_26246169000110_20250606_032528_v02.csv
file_type: text/csv
file_size: 22352135
creation_date: 2025-08-01
last_modified_date: 2025-07-25

CODIGO_GRUPO_EMPRESARIAL_GE: 41
NOME_GE: Drogaria Campeã
CODIGO_EMPRESA_GE: 84
NOME_EMPRESA_GE: DROGARIA CAMPEA POPULAR LAPA LTDA
NOME_FANTASIA_EMPRESA_GE: LOJA 84
CNPJ_EMPRESA_GE: 26.246.169/0001-10
CODIGO_INTERNO_SKU: 23010
NOME_SKU: DES ROLLON REXONA 50ML FEM POWDER
EAN_TEXTO_SKU: 78924338
EAN_NUMERICO_SKU: 78924338
EAN_PRINCIPAL_SKU: 78924338.0
UNIDADE_MEDIDA_SKU: UN
CODIGO_NCM_SKU: 3307.20.10
CODIGO_CEST_SKU: 
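
The second `print` above shows the layout produced with `metadata_mode="all"`: one `key: value` line per metadata entry, a blank line, then the node text. A minimal sketch that mimics that layout (it mirrors the printed output, not LlamaIndex's internal implementation; `render_with_metadata` is a hypothetical helper):

```python
def render_with_metadata(metadata, text):
    """Prefix `text` with one 'key: value' line per metadata entry."""
    if not metadata:
        return text
    header = "\n".join(f"{key}: {value}" for key, value in metadata.items())
    return f"{header}\n\n{text}"


# Two of the metadata fields printed above, plus the first line of the node text.
rendered = render_with_metadata(
    {"file_type": "text/csv", "file_size": 22352135},
    "CODIGO_GRUPO_EMPRESARIAL_GE: 41",
)
```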

### 4. Generate Embeddings for each Node

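Here we generate an embedding for each node with our Hugging Face embedding model. We embed only the node text (`MetadataMode.NONE`), so the file-level metadata shared by every node does not affect the vectors.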

In [13]:
from llama_index.core.schema import MetadataMode

for node in nodes:
    # Embed the text only; use MetadataMode.ALL to include metadata as well
    node_embedding = embed_model.get_text_embedding(
        node.get_content(metadata_mode=MetadataMode.NONE)
    )
    node.embedding = node_embedding
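
The loop above embeds one node per call. With many nodes it is usually cheaper to embed in batches, and LlamaIndex embedding models expose `get_text_embedding_batch` for that purpose. The sketch below shows only the batching pattern in plain Python; `fake_embed_batch` is a made-up stand-in for the real model so the example runs without any dependencies:

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed_batch(texts):
    """Stand-in for a real embedding call: one 2-d vector per input text."""
    return [[float(len(t)), float(t.count(" "))] for t in texts]


texts = ["a b", "cc", "d e f", "gggg", "h"]
embeddings = []
for batch in batched(texts, batch_size=2):
    embeddings.extend(fake_embed_batch(batch))  # input order is preserved
```

Because order is preserved, `embeddings[i]` can be assigned back to the `i`-th node exactly as in the per-node loop.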

### 5. Load Nodes into a Vector Store

We now insert these nodes into our `PGVectorStore`.

In [14]:
vector_store.add(nodes)

['8453c6af-88f9-4d92-b753-834dcaf82696',
 '4f038df6-f8a4-4051-b74a-92b6108df7b8',
 '5ad0dd05-0f45-42c4-9a09-3395637b15c7',
 '55714792-683d-4245-94cc-84908dfc0fdb',
 'c74cab34-cd31-490c-be92-443086f985f6',
 '958bcf62-b825-4471-abfe-733104a832f2',
 'ac0b1773-771f-4778-9ae7-baf398bd9bad',
 'fbeb879f-1229-41af-9d31-23c35a9048a2',
 '36b61175-f4a8-4b65-bc23-68fa990dd737',
 'e5f434a6-d282-4726-866c-a325edd47168',
 'f71d9ee4-996a-449f-bdc8-484c48c72118',
 'b2212bc4-0085-4964-b156-66016d623e21',
 'd0380d9d-d800-4673-98c5-bc42faa4261e',
 '77f612a0-9f71-41a0-8838-f1b49062d09a',
 '992b89ba-c093-4739-91e5-c92850769f1a',
 'e5bee84c-38b3-4db7-9197-9330c7848431',
 '34c364b5-c89d-4f7f-a4fb-83bc45640638',
 '5988e3fc-5ea3-449f-8944-f2ef1b58ca1b',
 '3db072dd-aa59-49b0-b54b-36f68d46ffb0',
 'e985f397-735f-4710-9606-e07a7de02c48',
 '77712d54-4253-4537-ac00-747dd60fcf6e',
 '56dfbdf7-de46-4053-bf22-ad8c71e5a3c3',
 '8d6e8f57-a1b9-45df-a5f3-126acea734cd',
 '771b1415-1c83-42ef-a122-11a4333b59c3',
 '630fa19e-9244-

## Build Retrieval Pipeline from Scratch

We show how to build a retrieval pipeline. Similar to ingestion, we fast-track the steps. Take a look at our [retrieval guide](https://gpt-index.readthedocs.io/en/latest/examples/low_level/retrieval.html) for more details!

In [15]:
# English: "What is the CNPJ (company tax ID) of Drogaria Campeã?"
query_str = "Qual o CNPJ da Drogaria Campeã?"

### 1. Generate a Query Embedding

In [16]:
query_embedding = embed_model.get_query_embedding(query_str)

### 2. Query the Vector Database

In [17]:
from llama_index.core import VectorStoreIndex
from llama_index.core.vector_stores import VectorStoreQuery

# Option A (commented out): query the Postgres vector store directly.
# query_mode = "default"  # or "sparse" / "hybrid"
# vector_store_query = VectorStoreQuery(
#     query_embedding=query_embedding, similarity_top_k=2, mode=query_mode
# )
# query_result = vector_store.query(vector_store_query)  # VectorStoreQueryResult
# print(query_result.nodes[0].get_content())

# Option B (used here): build an in-memory index over the already-embedded
# nodes and let a query engine retrieve and synthesize an answer with the LLM.
# Note that this path does not read from Postgres.
vector_store_index = VectorStoreIndex(nodes, embed_model=embed_model)
query_engine = vector_store_index.as_query_engine(similarity_top_k=2, llm=llm)

query_result = query_engine.query(query_str)
query_result.response

llama_perf_context_print:        load time =    5807.77 ms
llama_perf_context_print: prompt eval time =    5807.24 ms /  2545 tokens (    2.28 ms per token,   438.25 tokens per second)
llama_perf_context_print:        eval time =    1768.39 ms /    18 runs   (   98.24 ms per token,    10.18 tokens per second)
llama_perf_context_print:       total time =    7581.51 ms /  2563 tokens


'26.246.169/0001-10'
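
Conceptually, a dense vector-store query ranks the stored embeddings by similarity to the query embedding and returns the top `k`. A brute-force sketch of that idea in plain Python, using made-up 2-dimensional vectors and node IDs (pgvector performs the equivalent ranking inside Postgres; `top_k_by_cosine` is a hypothetical helper):

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_by_cosine(query, store, k):
    """Return the k (node_id, score) pairs with the highest cosine similarity."""
    scored = ((node_id, cosine(query, vec)) for node_id, vec in store.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Made-up store: node ID -> embedding.
store = {"n1": [1.0, 0.0], "n2": [0.0, 1.0], "n3": [1.0, 1.0]}
results = top_k_by_cosine([1.0, 0.2], store, k=2)
```

Here `results` holds `n1` followed by `n3`, since the query points mostly along the first axis.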

### 3. Parse Result into a Set of Nodes

In [18]:
from llama_index.core.schema import NodeWithScore
from typing import Optional

# `source_nodes` already holds NodeWithScore objects, so we can copy them as-is
nodes_with_scores = list(query_result.source_nodes)

### 4. Put into a Retriever

In [19]:
from llama_index.core import QueryBundle
from llama_index.core.retrievers import BaseRetriever
from typing import Any, List

class VectorDBRetriever(BaseRetriever):
    """Retriever over a postgres vector store."""

    def __init__(
        self,
        vector_store: PGVectorStore,
        embed_model: Any,
        query_mode: str = "default",
        similarity_top_k: int = 2,
    ) -> None:
        """Init params."""
        self._vector_store = vector_store
        self._embed_model = embed_model
        self._query_mode = query_mode
        self._similarity_top_k = similarity_top_k
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve."""
        # Use the instances passed to __init__ rather than notebook globals
        query_embedding = self._embed_model.get_query_embedding(
            query_bundle.query_str
        )
        vector_store_query = VectorStoreQuery(
            query_embedding=query_embedding,
            similarity_top_k=self._similarity_top_k,
            mode=self._query_mode,
        )
        query_result = self._vector_store.query(vector_store_query)

        nodes_with_scores = []
        for index, node in enumerate(query_result.nodes):
            score: Optional[float] = None
            if query_result.similarities is not None:
                score = query_result.similarities[index]
            nodes_with_scores.append(NodeWithScore(node=node, score=score))

        return nodes_with_scores
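
The retriever's shape can be illustrated without any dependencies: the embedding function and the store are passed to the constructor and accessed only through `self`, so the class never relies on notebook globals. A minimal sketch with made-up stand-ins (`ToyRetriever`, `ScoredText`, `toy_embed`, and `toy_store` are all hypothetical and not LlamaIndex classes):

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ScoredText:
    text: str
    score: Optional[float] = None


class ToyRetriever:
    """Embeds the query, scores every stored vector by dot product, keeps top-k."""

    def __init__(
        self,
        embed: Callable[[str], List[float]],
        store: Dict[str, List[float]],
        similarity_top_k: int = 2,
    ) -> None:
        self._embed = embed
        self._store = store
        self._similarity_top_k = similarity_top_k

    def retrieve(self, query: str) -> List[ScoredText]:
        q = self._embed(query)
        scored = [
            ScoredText(text, sum(x * y for x, y in zip(q, vec)))
            for text, vec in self._store.items()
        ]
        scored.sort(key=lambda s: s.score, reverse=True)
        return scored[: self._similarity_top_k]


def toy_embed(text: str) -> List[float]:
    """Made-up 2-d 'embedding': counts of the letters 'a' and 'b'."""
    return [float(text.count("a")), float(text.count("b"))]


toy_store = {t: toy_embed(t) for t in ["aaa", "bbb", "aab"]}
hits = ToyRetriever(toy_embed, toy_store, similarity_top_k=2).retrieve("aab")
```

Swapping in a different embedding function or store only requires changing the constructor arguments, which is the same property the `VectorDBRetriever` above aims for.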

## Plug this into our RetrieverQueryEngine to synthesize a response

In [20]:
from llama_index.core.query_engine import RetrieverQueryEngine

retriever = VectorDBRetriever(
    vector_store, embed_model, query_mode="default", similarity_top_k=2
)
query_engine = RetrieverQueryEngine.from_args(retriever, llm=llm)

response = query_engine.query(query_str)

print(str(response))
print(response.source_nodes[0].get_content())

Llama.generate: 2544 prefix-match hit, remaining 1 prompt tokens to eval
llama_perf_context_print:        load time =    5807.77 ms
llama_perf_context_print: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_perf_context_print:        eval time =    1824.77 ms /    19 runs   (   96.04 ms per token,    10.41 tokens per second)
llama_perf_context_print:       total time =    1830.48 ms /    20 tokens


26.246.169/0001-10
CODIGO_GRUPO_EMPRESARIAL_GE: 41
NOME_GE: Drogaria Campeã
CODIGO_EMPRESA_GE: 84
NOME_EMPRESA_GE: DROGARIA CAMPEA POPULAR LAPA LTDA
NOME_FANTASIA_EMPRESA_GE: LOJA 84
CNPJ_EMPRESA_GE: 26.246.169/0001-10
CODIGO_INTERNO_SKU: 7676
NOME_SKU: DICLORIDRATO DE LEVOCETIRIZINA 5MG 10S RANBAXY
EAN_TEXTO_SKU: 7897076921499
EAN_NUMERICO_SKU: 7897076921499
EAN_PRINCIPAL_SKU: 7897076921499.0
UNIDADE_MEDIDA_SKU: UN
CODIGO_NCM_SKU: 3004.90.69
CODIGO_CEST_SKU: 
DATA_CADASTRO_SKU: 
SITUACAO_SKU: 0
CODIGO_FABRICANTE_SKU: 0
NOME_FABRICANTE_SKU: DICLORIDRATO DE LEVOCETIRIZINA 5MG 10S RANBAXY
CODIGO_MARCA_SKU: 265
NOME_MARCA_SKU: DICLORIDRATO DE LEVOCETIRIZINA 5MG 10S RANBAXY
CODIGO_CANAL_VENDA: 
NOME_CANAL_VENDA: 
CODIGO_ORIGEM_VENDA: 
NOME_ORIGEM_VENDA: 
CODIGO_CFOP_VENDA: 
ORDERID_CUPOM_VENDA: 
SKU_DATA_VENDA: 14/01/2025
SKU_QUANTIDADE_VENDA: 1
SKU_TOTAL_BRUTO_VENDA: 42.62
SKU_TOTAL_DESCONTO_VENDA: 15.77
SKU_TOTAL_LIQUIDO_VENDA: 26.85
SKU_PRECO_MEDIO_VENDA: 26.85
SKU_TOTAL_CMV_VENDA: 0
SK