# R2R RAG Quickstart (Linux)

This notebook shows how to use R2R for Retrieval-Augmented Generation (RAG):
- Install the Python SDK
- Verify the API is reachable
- Ingest a sample document
- Run search and RAG
- Try the agent for deeper analysis

Before running the cells, start the R2R API server in a separate Linux shell, as described below.

## Prerequisites (run in a separate terminal)

1) Create a Python 3.12 venv and install the server (from this repo):
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ./py
```

2) Configure your providers (OpenAI-compatible):
These examples mirror defaults in docker/compose.dev.yaml.
```bash
# Text generation (e.g., LM Studio / OpenAI-compatible)
export LMSTUDIO_API_BASE="http://localhost:8000/v1"  # from docker/compose.dev.yaml
export LMSTUDIO_API_KEY="123"  # example key used in compose.dev.yaml; replace as needed

# Embeddings (can be a different OpenAI-compatible base)
export OPENAI_API_BASE="http://athena.skhms.com/rse/embedding/qwen3-4b/v1"
export OPENAI_API_KEY="<embed-token>"  # set your key; compose.dev.yaml shows a sample

# Postgres (adjust as needed)
export R2R_POSTGRES_HOST=127.0.0.1  # from compose.dev.yaml
export R2R_POSTGRES_PORT=5432
export R2R_POSTGRES_USER=r2r
export R2R_POSTGRES_PASSWORD=r2rpassword
export R2R_POSTGRES_DBNAME=r2r
export R2R_PROJECT_NAME=r2r_local
```

Note: Ensure embedding dimensions in config match (see py/r2r/r2r.toml).
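
One way to check the dimension before ingesting is to request a single embedding and compare its length with the value configured for the server. The sketch below assumes the embeddings endpoint returns the OpenAI-style response shape (`data[0].embedding`); the helper names and the `<embedding-model>` placeholder are illustrative and not taken from this repo.

```python
import json
import urllib.request


def embedding_dimension(response: dict) -> int:
    """Return the vector length from an OpenAI-style /embeddings response."""
    return len(response["data"][0]["embedding"])


def fetch_embedding_response(base_url: str, api_key: str, model: str, text: str = "ping") -> dict:
    """POST one input to {base_url}/embeddings (assumes an OpenAI-compatible API)."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/embeddings",
        data=json.dumps({"model": model, "input": text}).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.load(reply)


# Offline illustration with a hand-written response of the assumed shape.
example = {"data": [{"embedding": [0.1, 0.2, 0.3, 0.4]}]}
print(embedding_dimension(example))
```

Against a live endpoint, `embedding_dimension(fetch_embedding_response(base, key, "<embedding-model>"))` gives the number to compare with the dimension set in `py/r2r/r2r.toml`.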

3) Start the API server:
```bash
export R2R_PORT=8002
python -m r2r.serve
# Server listens on http://0.0.0.0:8002
```
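
The server may need a moment before it accepts requests. A minimal sketch for waiting on it from Python, using only the standard library, is shown here; the function names, timeout, and polling interval are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def http_ok(url: str) -> bool:
    """Return True if a GET on `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=5) as reply:
            return 200 <= reply.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_for_server(url: str, timeout_s: float = 60.0, interval_s: float = 2.0, probe=http_ok) -> bool:
    """Poll `url` until `probe` succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_server("http://localhost:8002/openapi_spec")` returns `True` once the spec endpoint used later in this notebook responds, and `False` if the timeout is reached first.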

Then continue here in the notebook.

In [1]:
# Install the Python SDK for the client
%pip -q install r2r requests

Note: you may need to restart the kernel to use updated packages.


In [2]:
# Configure providers and Postgres via Python env (this kernel only)
# If your R2R server runs in a separate process,
# set these in that process as well (compose/env).
import os

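def setdefault_env(key, value):
    # Only set the variable when it is unset or empty, so values already
    # present in the environment take precedence over these defaults.
    if not os.environ.get(key):
        os.environ[key] = str(value)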

# LM Studio / text generation (compose.dev.yaml defaults)
setdefault_env("LMSTUDIO_API_BASE", "http://localhost:8000/v1")
setdefault_env("LMSTUDIO_API_KEY", "123")  # example; replace with your token

# OpenAI-compatible embeddings
setdefault_env("OPENAI_API_BASE", "http://athena.skhms.com/rse/embedding/qwen3-4b/v1")
setdefault_env("OPENAI_API_KEY", "<embed-token>")  # placeholder; set your own key and keep real tokens out of the notebook


# Postgres
setdefault_env("R2R_POSTGRES_HOST", "127.0.0.1")
setdefault_env("R2R_POSTGRES_PORT", "5432")
setdefault_env("R2R_POSTGRES_USER", "r2r")
setdefault_env("R2R_POSTGRES_PASSWORD", "r2rpassword")
setdefault_env("R2R_POSTGRES_DBNAME", "r2r")
setdefault_env("R2R_PROJECT_NAME", "r2r_local")

print("Configured:")
for k in [
    "LMSTUDIO_API_BASE","LMSTUDIO_API_KEY",
    "OPENAI_API_BASE",
    "R2R_POSTGRES_HOST","R2R_POSTGRES_PORT","R2R_POSTGRES_DBNAME","R2R_PROJECT_NAME"
]:
    print(f" - {k}={os.environ.get(k)}")


Configured:
 - LMSTUDIO_API_BASE=http://localhost:8000/v1
 - LMSTUDIO_API_KEY=123
 - OPENAI_API_BASE=http://athena.skhms.com/rse/embedding/qwen3-4b/v1
 - R2R_POSTGRES_HOST=127.0.0.1
 - R2R_POSTGRES_PORT=5432
 - R2R_POSTGRES_DBNAME=r2r
 - R2R_PROJECT_NAME=r2r_local
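
When printing configuration in a shared notebook, it can help to mask values that look like secrets. The following is a small sketch of that idea; the `SECRET_HINTS` list and the `mask_value` helper are hypothetical names, and the masking rule (keep two characters of longer values) is an arbitrary choice.

```python
import os
from typing import Optional

# Substrings that suggest a variable holds a secret (an assumption, extend as needed).
SECRET_HINTS = ("KEY", "PASSWORD", "TOKEN", "SECRET")


def mask_value(name: str, value: Optional[str]) -> str:
    """Return a printable form of an environment value, hiding likely secrets."""
    if value is None:
        return "<unset>"
    if any(hint in name.upper() for hint in SECRET_HINTS):
        # Keep a short prefix only when the value is long enough to stay private.
        return (value[:2] + "***") if len(value) > 4 else "***"
    return value


for name in ("LMSTUDIO_API_KEY", "OPENAI_API_KEY", "R2R_POSTGRES_PASSWORD", "R2R_POSTGRES_HOST"):
    print(f" - {name}={mask_value(name, os.environ.get(name))}")
```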


In [3]:
# Point to your running R2R server
import os
BASE_URL = os.getenv('R2R_BASE_URL', 'http://0.0.0.0:8002')
print('Using BASE_URL =', BASE_URL)

Using BASE_URL = http://0.0.0.0:8002


In [4]:
# Sanity check: hit the OpenAPI spec
import requests, json
spec_url = BASE_URL + '/openapi_spec'
try:
    r = requests.get(spec_url, timeout=10)
    r.raise_for_status()
    print('OpenAPI available at', spec_url)
    # print a small part of the spec
    data = r.json() if r.headers.get('content-type','').startswith('application/json') else r.text
    if isinstance(data, dict):
        print('Title:', data.get('info', {}).get('title'))
    else:
        print(str(data)[:200] + '...')
except Exception as e:
    print('Failed to reach server:', e)
    print('Make sure the server is running as described above.')

OpenAPI available at http://0.0.0.0:8002/openapi_spec
Title: R2R Application API
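
Beyond the title, the spec can also be used to see which operations the server exposes. The sketch below assumes the response is a standard OpenAPI document with a top-level `paths` mapping; the sample paths are made-up placeholders, not actual R2R routes.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return (METHOD, path) pairs from an OpenAPI spec, sorted by path."""
    operations = []
    for path, methods in sorted(spec.get("paths", {}).items()):
        for method in sorted(methods):
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return operations


# Hand-written stand-in for a spec; the paths are placeholders.
example_spec = {"paths": {"/example/items": {"get": {}, "post": {}}}}
for method, path in list_operations(example_spec):
    print(method, path)
```

With the `data` dictionary from the cell above, `list_operations(data)` would list whatever the running server actually publishes.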


In [5]:
# Initialize the client
from r2r import R2RClient
client = R2RClient(base_url=BASE_URL)
client

<sdk.sync_client.R2RClient at 0x7f90a5df95e0>

## Ingest a sample document

You can upload your own `.pdf`, `.txt`, `.md`, etc.
The cell below ingests an existing Markdown file; the commented-out lines show how to create a small sample file instead.

In [19]:
# Path of the file to ingest (here, an existing Markdown file)
sample_path = 'NCB-PCI_Express_Base_6.3.md'
# Optional: to use a small demo file instead, point sample_path at a new
# file name and uncomment the lines below.
# with open(sample_path, 'w', encoding='utf-8') as f:
#     f.write('DeepSeek R1 is a reasoning model. This file is a simple demo.\n')
#     f.write('RAG combines retrieval with generation to produce grounded answers.')
# print('Wrote', sample_path)
# Optional: delete a previously ingested document by ID
# client.documents.delete("15df7e62-584a-5257-ae77-91bf10d27028")
# Ingest the file
doc = client.documents.create(file_path=sample_path)
doc

R2RResults[IngestionResponse](results=IngestionResponse(message='Document created and ingested successfully.', task_id=None, document_id=UUID('5d4b2a06-ddfd-5235-b98d-bc616039dbf1')))
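
If you do not have a document at hand, the commented-out sample from the cell above can be written to a temporary directory and ingested from there. This is a sketch only: `write_sample`, `ingest_sample`, and the `sample.txt` file name are hypothetical, and the only SDK call used is the `client.documents.create(file_path=...)` call already shown.

```python
import tempfile
from pathlib import Path

SAMPLE_TEXT = (
    "DeepSeek R1 is a reasoning model. This file is a simple demo.\n"
    "RAG combines retrieval with generation to produce grounded answers."
)


def write_sample(directory: str, name: str = "sample.txt") -> Path:
    """Write the demo text into `directory` and return the file path."""
    path = Path(directory) / name
    path.write_text(SAMPLE_TEXT, encoding="utf-8")
    return path


def ingest_sample(client, directory: str):
    """Write the sample file and ingest it with the call used in the cell above."""
    path = write_sample(directory)
    return client.documents.create(file_path=str(path))


# Offline illustration: only writes the file, no server needed.
with tempfile.TemporaryDirectory() as tmp:
    sample = write_sample(tmp)
    print(sample.name, sample.stat().st_size, "bytes")
```

With a running server, `ingest_sample(client, tmp)` inside the `with` block would upload the file before the temporary directory is removed.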

In [None]:
# Optional: extract entities and relationships for a document (uncomment to run)
# client.documents.extract(id="15df7e62-584a-5257-ae77-91bf10d27028")

R2RResults[GenericMessageResponse](results=GenericMessageResponse(message='Document entities and relationships extracted successfully.'))

In [15]:
# List your documents
docs = client.documents.list()
len(docs)
for doc in docs:
    print(doc)


id=UUID('5d4b2a06-ddfd-5235-b98d-bc616039dbf1') collection_ids=[UUID('122fdf6a-e116-546b-a8f6-e4cb2e2c0a09')] owner_id=UUID('2acb499e-8428-543b-bd85-0d9098718220') document_type=<DocumentType.MD: 'md'> metadata={'version': 'v0'} title='NCB-PCI_Express_Base_6.3.md' version='v0' size_in_bytes=5299326 ingestion_status=<IngestionStatus.SUCCESS: 'success'> extraction_status=<GraphExtractionStatus.PENDING: 'pending'> created_at=datetime.datetime(2025, 9, 5, 23, 20, 24, 440216, tzinfo=TzInfo(UTC)) updated_at=datetime.datetime(2025, 9, 5, 23, 21, 39, 365151, tzinfo=TzInfo(UTC)) ingestion_attempt_number=None summary='The' summary_embedding=None total_tokens=2774971 chunks=None
id=UUID('15df7e62-584a-5257-ae77-91bf10d27028') collection_ids=[UUID('122fdf6a-e116-546b-a8f6-e4cb2e2c0a09')] owner_id=UUID('2acb499e-8428-543b-bd85-0d9098718220') document_type=<DocumentType.MD: 'md'> metadata={'version': 'v0'} title='CONTRIBUTING.md' version='v0' size_in_bytes=1006 ingestion_status=<IngestionStatus.SUCC
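
The full repr of each document is hard to scan. A compact summary can be built from the attribute names visible in the output above (`id`, `title`, `size_in_bytes`, `ingestion_status`). The sketch uses a stand-in object so it runs without a server; `summarize_documents` is a hypothetical helper, not part of the SDK.

```python
from types import SimpleNamespace


def summarize_documents(documents) -> list[str]:
    """Return one short line per document: id, title, size, ingestion status."""
    lines = []
    for doc in documents:
        # The status may be an enum (with .value) or a plain string.
        status = getattr(doc.ingestion_status, "value", doc.ingestion_status)
        lines.append(f"{doc.id}  {doc.title}  {doc.size_in_bytes} B  {status}")
    return lines


# Stand-in with the same attribute names as the output above.
example = [
    SimpleNamespace(
        id="5d4b2a06-ddfd-5235-b98d-bc616039dbf1",
        title="NCB-PCI_Express_Base_6.3.md",
        size_in_bytes=5299326,
        ingestion_status="success",
    )
]
for line in summarize_documents(example):
    print(line)
```

In the notebook, the same helper could be applied to the `docs` object iterated in the cell above.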

## Search

Run a search over the ingested content. The cell below enables semantic, full-text, and hybrid search together and selects the `hyde` search strategy.

In [24]:
search_res = client.retrieval.search(query='correctable error',
                                     search_mode='advanced',
                                     search_settings={"use_semantic_search": True,
                                                      "use_fulltext_search": True,
                                                      "use_hybrid_search": True,
                                                      "search_strategy": "hyde"})
search_res

R2RResults[AggregateSearchResult](results=AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.019, text=Page 1091
6.3-1.0-PUB — PCI Express® Base Specification Revision 6.3
15
6
5
4
3
2
1
0
RsvdP
System Error on Correctable Error Enable
System Error on Non-Fatal Error Enable
System Error on Fatal Error Enable
PME Interrupt Enable
Configuration RRS Software Visibility Enable
No NFM Subtree Below This Root Port
Figure 7-33 Root Control Register
§
Table 7-30 Root Control Register
§
Bit Location Register Description
0
1
2
3
4
System Error on Correctable Error Enable - If Set, this bit indicates that a System Error should be
generated if a correctable error (ERR_COR) is reported by any of the devices in the Hierarchy Domain
associated with this Root Port, or by the Root Port itself. The mechanism for signaling a System Error to
the system is system specific.
Root Complex Event Collectors provide support for the above-described functionality for RCiEPs.
Default value of thi
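
Raw chunk text from a PDF-derived document can be long and full of line breaks, as the output above shows. A small helper can print one line per chunk instead. The sketch assumes each chunk exposes `score` and `text`, as in the repr above; `top_chunks` and its defaults are illustrative.

```python
from types import SimpleNamespace


def top_chunks(chunk_results, limit: int = 5, width: int = 80) -> list[str]:
    """Return 'score  text' lines for the highest-scoring chunks."""
    ranked = sorted(chunk_results, key=lambda chunk: chunk.score, reverse=True)[:limit]
    lines = []
    for chunk in ranked:
        # Collapse newlines and repeated spaces so each chunk fits on one line.
        text = " ".join(chunk.text.split())
        if len(text) > width:
            text = text[: width - 3] + "..."
        lines.append(f"{chunk.score:.3f}  {text}")
    return lines


# Stand-in chunks so the example runs without a server.
example = [
    SimpleNamespace(score=0.019, text="System Error on Correctable Error Enable\nFigure 7-33 Root Control Register"),
    SimpleNamespace(score=0.018, text="PME Interrupt Enable"),
]
for line in top_chunks(example):
    print(line)
```

In the notebook this would be called as `top_chunks(search_res.results.chunk_search_results)`.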

## RAG (with citations)

Ask a question and let R2R retrieve + generate an answer grounded in your documents.

In [18]:
rag_res = client.retrieval.rag(query='how to find receiver error?')
rag_res

R2RResults[RAGResponse](results=RAGResponse(generated_answer='To find a receiver error in the context of PCI Express®, specific mechanisms and registers are involved, particularly related to error checking and status reporting. Here\'s how to identify receiver errors based on the provided context:\n\n1. **Check for Correctable Errors and Receiver Error Implementation**:\n   - Some level of checking for Receiver Errors is required in all cases, even if certain bits are optional. Specifically, if the implementation does not support a particular bit (for historical reasons), bit 0 of the Correctable Error Mask Register must also not be implemented [e06963b].\n\n2. **Use Margining Registers for Per-Lane Error Detection**:\n   - The **Margining Lane Status Register** provides per-lane status information, including the "Receiver Number Status" field (bits 2:0), which can indicate the status of the receiver [c8c6243]. This register helps monitor lane-specific receiver behavior during marginin
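
The bracketed markers in the answer, such as `[e06963b]`, are the citations. A short sketch for pulling them out of the text is shown here; it assumes citation ids are seven lowercase hexadecimal characters, which matches the outputs in this notebook but is not a documented guarantee.

```python
import re

# Assumed citation format: seven lowercase hex characters in square brackets.
CITATION = re.compile(r"\[([0-9a-f]{7})\]")


def extract_citations(answer: str) -> list[str]:
    """Return citation ids in order of first appearance, without duplicates."""
    return list(dict.fromkeys(CITATION.findall(answer)))


example = (
    "Some level of checking for Receiver Errors is required [e06963b]. "
    "The Margining Lane Status Register reports status in bits 2:0 [c8c6243]."
)
print(extract_citations(example))
```

In the notebook, `extract_citations(rag_res.results.generated_answer)` would list the ids cited in the answer above.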

## Agentic RAG

Use the conversational agent with retrieval tools for richer, multi-step answers.

In [None]:
agent_res = client.retrieval.agent(
    message={"role": "user", "content": "Give a short analysis of AER."},
    rag_tools=["search_file_knowledge", "get_file_content"],
)
agent_res

R2RException: {'message': '500: Internal Server Error - Error code: 400 - {\'object\': \'error\', \'message\': \'"auto" tool choice requires --enable-auto-tool-choice and --tool-call-parser to be set\', \'type\': \'BadRequestError\', \'param\': None, \'code\': 400}', 'error_type': 'R2RException'}
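
The error above is reported by the model server rather than by R2R itself: as the message states, the backend rejects automatic tool choice unless it was started with `--enable-auto-tool-choice` and `--tool-call-parser`. Until the backend is reconfigured, one option is to fall back to plain RAG when the agent call fails. The helper below is a sketch; `ask_with_fallback` is a hypothetical name, and it deliberately catches any exception rather than assuming a specific exception class.

```python
def ask_with_fallback(client, question: str):
    """Try the agent first; fall back to plain RAG if the agent call fails."""
    try:
        result = client.retrieval.agent(
            message={"role": "user", "content": question},
            rag_tools=["search_file_knowledge", "get_file_content"],
        )
        return "agent", result
    except Exception as error:  # broad on purpose: the exact exception type is not assumed
        print("Agent call failed, falling back to RAG:", error)
        return "rag", client.retrieval.rag(query=question)
```

For example, `mode, answer = ask_with_fallback(client, "Give a short analysis of AER.")` returns which path was taken along with the response.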

## Advanced: custom search settings

You can pass `search_mode` and `search_settings` to control hybrid search, filters, etc.

In [13]:
advanced = client.retrieval.rag(
    query='What does the sample say about RAG?',
    search_mode='advanced',
    search_settings={
        "use_hybrid_search": True,
        "hybrid_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 50,
            "rrf_k": 50
        }
    }
)
advanced

R2RResults[RAGResponse](results=RAGResponse(generated_answer='The sample states that RAG (Retrieval-Augmented Generation) combines retrieval with generation to produce grounded answers [e921491].', search_results=AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.018, text=DeepSeek R1 is a reasoning model. This file is a simple demo.
RAG combines retrieval with generation to produce grounded answers.), ChunkSearchResult(score=0.018, text=R2R Contribution Guide
Quick Start

Pre-Discussion: Feel free to propose your ideas via issues, Discord if you want to get early feedback.
Code of Conduct: Adhere to our Code of Conduct in all interactions.
Pull Requests (PRs): Follow the PR process for contributions.

Pull Request Process

Dependencies: Ensure all dependencies are necessary and documented.
Documentation: Update README.md with any changes to interfaces, including new environment variables, exposed ports, and other relevant details.
Versioning: Increment version numbe
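
To build intuition for `semantic_weight`, `full_text_weight`, and `rrf_k`, the sketch below implements a generic weighted reciprocal rank fusion, where each result list contributes `weight / (rrf_k + rank)` to a document's score. This is the textbook form of the technique and an assumption about how the settings interact; R2R's exact scoring and normalization may differ.

```python
def weighted_rrf(semantic_ids, full_text_ids, semantic_weight=5.0, full_text_weight=1.0, rrf_k=50):
    """Fuse two ranked id lists with weighted reciprocal rank fusion."""
    scores = {}
    for weight, ranking in ((semantic_weight, semantic_ids), (full_text_weight, full_text_ids)):
        for rank, doc_id in enumerate(ranking, start=1):
            # A higher rank (smaller number) gives a larger contribution.
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings: ids are placeholders, not real chunk ids.
fused = weighted_rrf(semantic_ids=["a", "b", "c"], full_text_ids=["c", "a"])
for doc_id, score in fused:
    print(doc_id, round(score, 4))
```

A larger `rrf_k` flattens the difference between ranks, while the two weights control how much each result list counts toward the fused order.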