# R2R RAG Quickstart (Linux)

This notebook shows how to use R2R for Retrieval-Augmented Generation (RAG):
- Install the Python SDK
- Verify the API is reachable
- Ingest a sample document
- Run search and RAG
- Try the agent for deeper analysis

Before running the cells, start the R2R API server in a separate Linux shell, as described below.

## Prerequisites (run in a separate terminal)

1) Create a Python 3.12 venv and install the server (from this repo):
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ./py
```

2) Configure your providers (OpenAI-compatible):
These examples mirror defaults in docker/compose.dev.yaml.
```bash
# Text generation (e.g., LM Studio / OpenAI-compatible)
export LMSTUDIO_API_BASE="http://localhost:8000/v1"  # from docker/compose.dev.yaml
export LMSTUDIO_API_KEY="123"  # example key used in compose.dev.yaml; replace as needed

# Embeddings (can be a different OpenAI-compatible base)
export OPENAI_API_BASE="http://athena.skhms.com/rse/embedding/qwen3-4b/v1"
export OPENAI_API_KEY="<embed-token>"  # set your key; compose.dev.yaml shows a sample

# Postgres (adjust as needed)
export R2R_POSTGRES_HOST=127.0.0.1  # from compose.dev.yaml
export R2R_POSTGRES_PORT=5432
export R2R_POSTGRES_USER=r2r
export R2R_POSTGRES_PASSWORD=r2rpassword
export R2R_POSTGRES_DBNAME=r2r
export R2R_PROJECT_NAME=r2r_local
```

Note: Make sure the embedding dimension set in the config (see py/r2r/r2r.toml) matches the output dimension of your embedding model.

3) Start the API server:
```bash
export R2R_PORT=8002
python -m r2r.serve
# Server listens on http://0.0.0.0:8002
```
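
Before moving on, it can help to confirm that something is actually listening on the chosen port. The following is a minimal sketch using only the standard library; the helper name `wait_for_port` and the timeout values are illustrative assumptions, not part of R2R.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the setup above, `wait_for_port("127.0.0.1", 8002)` should return `True` shortly after the server has started.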

Then continue here in the notebook.

In [1]:
# Install the Python SDK for the client
%pip -q install r2r requests


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Configure providers and Postgres via Python env (this kernel only)
# If your R2R server runs in a separate process,
# set these in that process as well (compose/env).
import os

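def setdefault_env(key, value):
    # Set the variable only if it is unset or empty, so values already
    # exported in the environment are not overwritten.
    if not os.environ.get(key):
        os.environ[key] = str(value)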

# LM Studio / text generation (compose.dev.yaml defaults)
setdefault_env("LMSTUDIO_API_BASE", "http://localhost:8000/v1")
setdefault_env("LMSTUDIO_API_KEY", "123")  # example; replace with your token

# OpenAI-compatible embeddings
setdefault_env("OPENAI_API_BASE", "http://athena.skhms.com/rse/embedding/qwen3-4b/v1")
setdefault_env("OPENAI_API_KEY", "<embed-token>")  # replace with your embeddings token; avoid committing real keys


# Postgres
setdefault_env("R2R_POSTGRES_HOST", "127.0.0.1")
setdefault_env("R2R_POSTGRES_PORT", "5432")
setdefault_env("R2R_POSTGRES_USER", "r2r")
setdefault_env("R2R_POSTGRES_PASSWORD", "r2rpassword")
setdefault_env("R2R_POSTGRES_DBNAME", "r2r")
setdefault_env("R2R_PROJECT_NAME", "r2r_local")

print("Configured:")
for k in [
    "LMSTUDIO_API_BASE","LMSTUDIO_API_KEY",
    "OPENAI_API_BASE",
    "R2R_POSTGRES_HOST","R2R_POSTGRES_PORT","R2R_POSTGRES_DBNAME","R2R_PROJECT_NAME"
]:
    print(f" - {k}={os.environ.get(k)}")


Configured:
 - LMSTUDIO_API_BASE=http://localhost:8000/v1
 - LMSTUDIO_API_KEY=123
 - OPENAI_API_BASE=http://athena.skhms.com/rse/embedding/qwen3-4b/v1
 - R2R_POSTGRES_HOST=127.0.0.1
 - R2R_POSTGRES_PORT=5432
 - R2R_POSTGRES_DBNAME=r2r
 - R2R_PROJECT_NAME=r2r_local
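
The listing above prints `LMSTUDIO_API_KEY` in clear text. If the notebook is shared, a masked printout may be preferable. This is a minimal sketch; `mask_secret`, `describe_env`, and the rule that names ending in `KEY` or `PASSWORD` are secrets are assumptions made for illustration.

```python
import os


def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


def describe_env(keys, environ=os.environ):
    """Return 'KEY=value' lines, masking keys whose names end in KEY or PASSWORD."""
    lines = []
    for key in keys:
        value = environ.get(key, "")
        if key.endswith(("KEY", "PASSWORD")):
            value = mask_secret(value)
        lines.append(f"{key}={value}")
    return lines
```

Calling `print("\n".join(describe_env(["LMSTUDIO_API_KEY", "R2R_POSTGRES_HOST"])))` would then show the host as-is and the key masked.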


In [3]:
# Point to your running R2R server
import os
BASE_URL = os.getenv('R2R_BASE_URL', 'http://0.0.0.0:8002')
print('Using BASE_URL =', BASE_URL)

Using BASE_URL = http://0.0.0.0:8002


In [4]:
# Sanity check: hit the OpenAPI spec
import requests
spec_url = BASE_URL + '/openapi_spec'
try:
    r = requests.get(spec_url, timeout=10)
    r.raise_for_status()
    print('OpenAPI available at', spec_url)
    # print a small part of the spec
    data = r.json() if r.headers.get('content-type','').startswith('application/json') else r.text
    if isinstance(data, dict):
        print('Title:', data.get('info', {}).get('title'))
    else:
        print(str(data)[:200] + '...')
except Exception as e:
    print('Failed to reach server:', e)
    print('Make sure the server is running as described above.')

OpenAPI available at http://0.0.0.0:8002/openapi_spec
Title: R2R Application API


In [5]:
# Initialize the client
from r2r import R2RClient
client = R2RClient(base_url=BASE_URL)
client

<sdk.sync_client.R2RClient at 0x7fd8667b1970>

## Ingest a sample document

You can upload your own `.pdf`, `.txt`, `.md`, and other supported file types.
The cell below ingests a `.txt` file; a commented-out block shows how to write a small demo file to ingest instead.

In [6]:
# File to ingest. This run points at a large spec document, which failed
# during ingestion (see the error below).
sample_path = 'NCB-PCI_Express_Base_6.3.txt'

# To create and ingest a small demo file instead, uncomment:
# sample_path = 'sample_doc.txt'
# with open(sample_path, 'w', encoding='utf-8') as f:
#     f.write('DeepSeek R1 is a reasoning model. This file is a simple demo.\n')
#     f.write('RAG combines retrieval with generation to produce grounded answers.')
# print('Wrote', sample_path)

# Ingest the file
doc = client.documents.create(file_path=sample_path)
doc

R2RException: An error '500: Error during ingestion: Error code: 400 - {'object': 'error', 'message': "This model's maximum context length is 131072 tokens. However, you requested 133475 tokens (13475 in the messages, 120000 in the completion). Please reduce the length of the messages or completion.", 'type': 'BadRequestError', 'param': None, 'code': 400}' occurred during create_document
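
The error above is a token-budget problem: the prompt (13475 tokens) plus the requested completion (120000 tokens) exceeds the model's 131072-token context window. The arithmetic can be sketched as follows; `max_completion_tokens` is a hypothetical helper, and where the completion limit is configured in a given R2R setup is not shown in this notebook.

```python
def max_completion_tokens(context_window: int, prompt_tokens: int) -> int:
    """Largest completion budget that still fits next to the prompt."""
    return max(context_window - prompt_tokens, 0)


context_window = 131072  # "maximum context length" from the error message
prompt_tokens = 13475    # "in the messages"
requested = 120000       # "in the completion"

# How far the request overshoots the window, and what would have fit.
overflow = prompt_tokens + requested - context_window
print("Over budget by", overflow, "tokens")
print("Largest completion that fits:", max_completion_tokens(context_window, prompt_tokens))
```

With these numbers the request is 2403 tokens over budget, so a completion limit of at most 117597 tokens would fit next to this prompt.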

In [None]:
# List your documents
docs = client.documents.list()
len(docs)
docs[0] if docs else None

DocumentResponse(id=UUID('40030f07-db03-59c3-ba46-7376251cd7b1'), collection_ids=[UUID('122fdf6a-e116-546b-a8f6-e4cb2e2c0a09')], owner_id=UUID('2acb499e-8428-543b-bd85-0d9098718220'), document_type=<DocumentType.TXT: 'txt'>, metadata={'version': 'v0'}, title='sample_doc.txt', version='v0', size_in_bytes=129, ingestion_status=<IngestionStatus.AUGMENTING: 'augmenting'>, extraction_status=<GraphExtractionStatus.PENDING: 'pending'>, created_at=datetime.datetime(2025, 9, 5, 18, 22, 24, 212274, tzinfo=TzInfo(UTC)), updated_at=datetime.datetime(2025, 9, 5, 18, 22, 24, 218906, tzinfo=TzInfo(UTC)), ingestion_attempt_number=None, summary=None, summary_embedding=None, total_tokens=27, chunks=None)
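
The document above is still in the `augmenting` state, so it may not be searchable yet. Below is a generic polling sketch; `wait_for_status`, the `fetch_status` callable, and the terminal state names `success` and `failed` are assumptions for illustration, so check the status values your R2R version actually reports.

```python
import time


def wait_for_status(fetch_status, done=("success", "failed"), timeout=60.0,
                    interval=2.0, sleep=time.sleep):
    """Poll fetch_status() until it returns a value in `done` or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status in done:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout} seconds")
        sleep(interval)


# Usage with a stand-in status source; replace the lambda with a call that
# reads the document's ingestion_status through your client.
fake_states = iter(["augmenting", "augmenting", "success"])
print(wait_for_status(lambda: next(fake_states), interval=0.0))
```

Passing the status source as a callable keeps the helper independent of any particular SDK version.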

## Search

Run a semantic or hybrid search over ingested content.

In [None]:
search_res = client.retrieval.search(query='What is RAG?')
search_res

R2RResults[AggregateSearchResult](results=AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.660, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.)], graph_search_results=[], web_page_search_results=None, web_search_results=None, document_search_results=None, generic_tool_result=None))
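
The raw result object is verbose. Here is a small sketch for printing one line per hit, relying only on the attribute names visible in the output above (`results.chunk_search_results`, `score`, `text`); `summarize_chunks` is a hypothetical helper, and the stand-in object below only mimics that shape.

```python
from types import SimpleNamespace


def summarize_chunks(search_res, width=60):
    """Return 'score  text' lines for each chunk hit, truncating long text."""
    lines = []
    for hit in search_res.results.chunk_search_results or []:
        text = " ".join(hit.text.split())  # collapse newlines and repeated spaces
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{hit.score:.3f}  {text}")
    return lines


# Stand-in object with the same attribute layout as the output above.
demo = SimpleNamespace(results=SimpleNamespace(chunk_search_results=[
    SimpleNamespace(
        score=0.660,
        text="DeepSeek R1 is a reasoning model.\nRAG combines retrieval with generation.",
    ),
]))
print("\n".join(summarize_chunks(demo)))
```

In the notebook, `summarize_chunks(search_res)` should work as long as the result object exposes the same attributes.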

## RAG (with citations)

Ask a question and let R2R retrieve + generate an answer grounded in your documents.

In [None]:
rag_res = client.retrieval.rag(query='Explain RAG briefly.')
rag_res

R2RResults[RAGResponse](results=RAGResponse(generated_answer='RAG (Retrieval-Augmented Generation) is a technique that combines retrieval from a knowledge source with a generation model to produce more accurate and grounded answers [e921491].', search_results=AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.827, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.)], graph_search_results=[], web_page_search_results=None, web_search_results=None, document_search_results=None, generic_tool_result=None), citations=[Citation(id='e921491', object='citation', is_new=True, span=None, source_type=None, payload={'id': 'e921491c-3bd6-57f7-ae96-fa615ecb8f99', 'document_id': '40030f07-db03-59c3-ba46-7376251cd7b1', 'owner_id': '2acb499e-8428-543b-bd85-0d9098718220', 'collection_ids': ['122fdf6a-e116-546b-a8f6-e4cb2e2c0a09'], 'score': 0.8274951657302877, 'text': 'DeepSeek R1 is a reasoning model. T
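
Citation markers such as `[e921491]` in the generated answer correspond to the `id` fields in `citations`. This sketch pulls the markers out of an answer string; the seven-hex-character pattern is inferred from the single example above and is an assumption, as is the helper name `extract_citation_ids`.

```python
import re

# Assumed marker format: seven lowercase hex characters in square brackets.
CITATION_RE = re.compile(r"\[([0-9a-f]{7})\]")


def extract_citation_ids(answer: str) -> list[str]:
    """Return citation ids in order of first appearance, without duplicates."""
    seen = []
    for cid in CITATION_RE.findall(answer):
        if cid not in seen:
            seen.append(cid)
    return seen


answer = "RAG combines retrieval with generation to produce grounded answers [e921491]."
print(extract_citation_ids(answer))
```

The extracted ids can then be looked up against the `id` of each entry in `rag_res.results.citations`.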

## Agentic RAG

Use the conversational agent with retrieval tools for richer, multi-step answers.

In [None]:
agent_res = client.retrieval.agent(
    message={"role": "user", "content": "Give a short analysis of RAG."},
    rag_tools=["search_file_knowledge", "get_file_content"],
)
agent_res

R2RException: {'message': '500: Internal Server Error - Error code: 400 - {\'object\': \'error\', \'message\': \'"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set\', \'type\': \'BadRequestError\', \'param\': None, \'code\': 400}', 'error_type': 'R2RException'}

## Advanced: custom search settings

You can pass `search_mode` and `search_settings` to control hybrid search, filters, etc.

In [None]:
advanced = client.retrieval.rag(
    query='What does the sample say about RAG?',
    search_mode='advanced',
    search_settings={
        "use_hybrid_search": True,
        "hybrid_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 50,
            "rrf_k": 50
        }
    }
)
advanced

R2RResults[RAGResponse](results=RAGResponse(generated_answer='The sample states that RAG combines retrieval with generation to produce grounded answers [e921491].', search_results=AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.018, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.), ChunkSearchResult(score=0.018, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.), ChunkSearchResult(score=0.018, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.), ChunkSearchResult(score=0.018, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.)], graph_search_results=[], web_page_search_results=None, web_search_results=None, document_search_results
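
For intuition about `rrf_k` and the two weights, reciprocal rank fusion (RRF) scores an item by summing `weight / (k + rank)` over the rankings it appears in. The sketch below is a generic weighted RRF and is not taken from R2R's source; the exact formula and normalization R2R uses may differ, and the chunk ids are made up for the example.

```python
def weighted_rrf(rankings, weights, k=50):
    """Fuse ranked id lists: score(id) = sum of weight / (k + rank), rank starting at 1."""
    scores = {}
    for name, ranked_ids in rankings.items():
        for rank, item_id in enumerate(ranked_ids, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + weights[name] / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical rankings from the two retrievers.
rankings = {
    "semantic": ["chunk_a", "chunk_b", "chunk_c"],
    "full_text": ["chunk_c", "chunk_a"],
}
weights = {"semantic": 5.0, "full_text": 1.0}  # mirrors semantic_weight / full_text_weight
fused = weighted_rrf(rankings, weights, k=50)
for item_id, score in fused:
    print(item_id, round(score, 4))
```

A larger `k` flattens the difference between adjacent ranks, while the weights control how much each retriever contributes to the fused score.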