# R2R RAG Quickstart (Linux)

This notebook shows how to use R2R for Retrieval-Augmented Generation (RAG):
- Install the Python SDK
- Verify the API is reachable
- Ingest a sample document
- Run search and RAG
- Try the agent for deeper analysis

Before running the cells, start the R2R API server in a separate Linux shell, as described below.

## Prerequisites (run in a separate terminal)

1) Create a Python 3.12 venv and install the server (from this repo):
```bash
python3.12 -m venv .venv
source .venv/bin/activate
pip install -e ./py
```

2) Configure your providers (OpenAI-compatible):
These examples mirror defaults in docker/compose.dev.yaml.
```bash
# Text generation (e.g., LM Studio / OpenAI-compatible)
export LMSTUDIO_API_BASE="http://localhost:8000/v1"  # from docker/compose.dev.yaml
export LMSTUDIO_API_KEY="123"  # example key used in compose.dev.yaml; replace as needed

# Embeddings (can be a different OpenAI-compatible base)
export OPENAI_API_BASE="http://athena.skhms.com/rse/embedding/qwen3-4b/v1"
export OPENAI_API_KEY="<embed-token>"  # set your key; compose.dev.yaml shows a sample

# Postgres (adjust as needed)
export R2R_POSTGRES_HOST=127.0.0.1  # from compose.dev.yaml
export R2R_POSTGRES_PORT=5432
export R2R_POSTGRES_USER=r2r
export R2R_POSTGRES_PASSWORD=r2rpassword
export R2R_POSTGRES_DBNAME=r2r
export R2R_PROJECT_NAME=r2r_local
```

Note: Make sure the embedding dimension set in the config (see py/r2r/r2r.toml) matches the output dimension of your embedding model.

3) Start the API server:
```bash
export R2R_PORT=8002
python -m r2r.serve
# Server listens on http://0.0.0.0:8002
```

Then continue here in the notebook.

In [1]:
# Install the Python SDK for the client
%pip -q install r2r requests


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m24.3.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m
Note: you may need to restart the kernel to use updated packages.


In [2]:
# Configure providers and Postgres via Python env (this kernel only)
# If your R2R server runs in a separate process,
# set these in that process as well (compose/env).
import os

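def setdefault_env(key, value):
    # Only fill in a value when the variable is missing or empty,
    # so anything already exported in the environment takes precedence.
    if not os.environ.get(key):
        os.environ[key] = str(value)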

# LM Studio / text generation (compose.dev.yaml defaults)
setdefault_env("LMSTUDIO_API_BASE", "http://localhost:8000/v1")
setdefault_env("LMSTUDIO_API_KEY", "123")  # example; replace with your token

# OpenAI-compatible embeddings
setdefault_env("OPENAI_API_BASE", "http://athena.skhms.com/rse/embedding/qwen3-4b/v1")
setdefault_env("OPENAI_API_KEY", "<embed-token>")  # placeholder; replace with your own key


# Postgres
setdefault_env("R2R_POSTGRES_HOST", "127.0.0.1")
setdefault_env("R2R_POSTGRES_PORT", "5432")
setdefault_env("R2R_POSTGRES_USER", "r2r")
setdefault_env("R2R_POSTGRES_PASSWORD", "r2rpassword")
setdefault_env("R2R_POSTGRES_DBNAME", "r2r")
setdefault_env("R2R_PROJECT_NAME", "r2r_local")

print("Configured:")
for k in [
    "LMSTUDIO_API_BASE","LMSTUDIO_API_KEY",
    "OPENAI_API_BASE",
    "R2R_POSTGRES_HOST","R2R_POSTGRES_PORT","R2R_POSTGRES_DBNAME","R2R_PROJECT_NAME"
]:
    print(f" - {k}={os.environ.get(k)}")


Configured:
 - LMSTUDIO_API_BASE=http://localhost:8000/v1
 - LMSTUDIO_API_KEY=123
 - OPENAI_API_BASE=http://athena.skhms.com/rse/embedding/qwen3-4b/v1
 - R2R_POSTGRES_HOST=127.0.0.1
 - R2R_POSTGRES_PORT=5432
 - R2R_POSTGRES_DBNAME=r2r
 - R2R_PROJECT_NAME=r2r_local


In [3]:
# Point to your running R2R server
import os
BASE_URL = os.getenv('R2R_BASE_URL', 'http://0.0.0.0:8002')
print('Using BASE_URL =', BASE_URL)

Using BASE_URL = http://0.0.0.0:8002
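`0.0.0.0` is the address a server binds to in order to listen on all interfaces. Connecting to it as a client works on Linux, as the run above shows, but it is not portable to every platform. A small sketch of a helper that rewrites such a URL to loopback before handing it to the client; the function name is hypothetical and not part of the R2R SDK.

```python
from urllib.parse import urlsplit, urlunsplit


def to_client_url(base_url: str) -> str:
    """Replace a 0.0.0.0 host with 127.0.0.1; leave other URLs untouched."""
    parts = urlsplit(base_url)
    if parts.hostname != "0.0.0.0":
        return base_url
    # Keep the port if one was given
    netloc = "127.0.0.1" if parts.port is None else f"127.0.0.1:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


print(to_client_url("http://0.0.0.0:8002"))
```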


In [4]:
# Sanity check: hit the OpenAPI spec
import requests
spec_url = BASE_URL + '/openapi_spec'
try:
    r = requests.get(spec_url, timeout=10)
    r.raise_for_status()
    print('OpenAPI available at', spec_url)
    # print a small part of the spec
    data = r.json() if r.headers.get('content-type','').startswith('application/json') else r.text
    if isinstance(data, dict):
        print('Title:', data.get('info', {}).get('title'))
    else:
        print(str(data)[:200] + '...')
except Exception as e:
    print('Failed to reach server:', e)
    print('Make sure the server is running as described above.')

OpenAPI available at http://0.0.0.0:8002/openapi_spec
Title: R2R Application API
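If the server was started only moments ago, the first request may fail while it is still booting. A minimal retry sketch follows; `wait_until_ready` is a hypothetical helper, not part of R2R, and it takes the probe as a callable so it can wrap any check, including the `/openapi_spec` request used above.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool], attempts: int = 10, delay: float = 1.0) -> bool:
    """Call probe() until it returns True or the attempts run out.

    Exceptions raised by the probe (e.g. connection refused) count as "not ready".
    """
    for attempt in range(attempts):
        try:
            if probe():
                return True
        except Exception:
            pass  # server not reachable yet
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

With `requests` imported and `BASE_URL` defined as in the cells above, a call could look like `wait_until_ready(lambda: requests.get(BASE_URL + '/openapi_spec', timeout=5).ok)`.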


In [5]:
# Initialize the client
from r2r import R2RClient
client = R2RClient(base_url=BASE_URL)
client

<sdk.sync_client.R2RClient at 0x7f22dd9c72c0>

In [6]:
os.environ['OPENAI_API_KEY']  # avoid displaying real keys in a shared notebook

'<redacted>'

In [7]:
os.environ['OPENAI_API_BASE']

'http://athena.skhms.com/rse/embedding/qwen3-4b/v1'

## Ingest a sample document

You can upload your own `.pdf`, `.txt`, `.md`, etc.
Below we ingest `CONTRIBUTING.md`; the commented-out lines show how to write a small demo file to ingest instead.

In [8]:
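# Pick the file to ingest
# sample_path = 'NCB-PCI_Express_Base_6.3.md'
sample_path = 'CONTRIBUTING.md'
# To ingest a tiny demo file instead, point sample_path at a new file name
# (e.g. a hypothetical 'sample.txt') and uncomment the lines below.
# with open(sample_path, 'w', encoding='utf-8') as f:
#     f.write('DeepSeek R1 is a reasoning model. This file is a simple demo.\n')
#     f.write('RAG combines retrieval with generation to produce grounded answers.')
# print('Wrote', sample_path)

# Delete the copy ingested by a previous run (this is the document_id from an earlier ingestion).
# On a first run there may be nothing to delete, so a failure here is reported and ignored.
try:
    client.documents.delete("15df7e62-584a-5257-ae77-91bf10d27028")
except Exception as e:
    print('Nothing deleted:', e)
# Ingest the file
doc = client.documents.create(file_path=sample_path, run_with_orchestration=True)
doc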

R2RResults[IngestionResponse](results=IngestionResponse(message='Document created and ingested successfully.', task_id=None, document_id=UUID('15df7e62-584a-5257-ae77-91bf10d27028')))

In [9]:
# client.documents.extract(id="15df7e62-584a-5257-ae77-91bf10d27028")

In [10]:
# List your documents
docs = client.documents.list()
len(docs)
for doc in docs:
    print(doc)


id=UUID('15df7e62-584a-5257-ae77-91bf10d27028') collection_ids=[UUID('122fdf6a-e116-546b-a8f6-e4cb2e2c0a09')] owner_id=UUID('2acb499e-8428-543b-bd85-0d9098718220') document_type=<DocumentType.MD: 'md'> metadata={'version': 'v0'} title='CONTRIBUTING.md' version='v0' size_in_bytes=1006 ingestion_status=<IngestionStatus.SUCCESS: 'success'> extraction_status=<GraphExtractionStatus.PENDING: 'pending'> created_at=datetime.datetime(2025, 9, 10, 21, 39, 2, 780000, tzinfo=TzInfo(UTC)) updated_at=datetime.datetime(2025, 9, 10, 21, 39, 2, 783006, tzinfo=TzInfo(UTC)) ingestion_attempt_number=None summary='The document contains a contribution guide for R2R that outlines the process for submitting code contributions. It describes the pre-discussion process for proposing ideas via issues or Discord, emphasizes adherence to a Code of Conduct adapted from the Contributor Covenant, and details the Pull Request process including dependency management, documentation updates, SemVer versioning, and the req

## Search

Run a semantic or hybrid search over ingested content.

In [11]:
search_res = client.retrieval.search(
    query='correctable error',
    search_mode='advanced',
    search_settings={
        "use_semantic_search": True,
        "use_fulltext_search": True,
        "use_hybrid_search": True,
        "search_strategy": "hyde",
    },
)
search_res

R2RResults[AggregateSearchResult](results=AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.018, text=Fatal or Non-fatal with the Error Severity register. Device Functions without Advanced Error Reporting Extended
Capability use the default associations and are not reprogrammable.
The detecting agent action for Downstream Ports that implement Downstream Port Containment (DPC) and have it
enabled will be different if the error triggers DPC. DPC behavior is not described in the following tables. See § Section
6.2.11 for the description of DPC behavior.
References
§ Section
6.2.10
§ Section
6.2.10
§ Section
6.2.4.2
Error Name
Error Type
(Default Severity)
Table 6-2 General PCI Express Error List
Detecting Agent Action 122
§
Corrected Internal
Error
Correctable
(masked by default)
Component:
Send ERR_COR to Root Complex.
Uncorrectable Internal
Error
Uncorrectable
(Fatal and masked by
default)
Component: Send ERR_FATAL to Root Complex.
Optionally, log the prefix/header o
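The raw result object is hard to scan. Going by the repr above, each hit sits under `results.chunk_search_results` and carries `score` and `text`. A small sketch that turns the top hits into one-line summaries; `summarize_chunks` is a hypothetical helper, and the stand-in object below only mimics the shape shown in the output.

```python
from types import SimpleNamespace


def summarize_chunks(search_response, limit: int = 3, width: int = 80) -> list[str]:
    """Return 'score  snippet' lines for the first `limit` chunk results."""
    chunks = search_response.results.chunk_search_results
    lines = []
    for chunk in chunks[:limit]:
        # Collapse newlines and repeated spaces, then truncate
        snippet = " ".join(chunk.text.split())[:width]
        lines.append(f"{chunk.score:.3f}  {snippet}")
    return lines


# Stand-in with the same attribute names as the search response above
fake_response = SimpleNamespace(
    results=SimpleNamespace(
        chunk_search_results=[
            SimpleNamespace(score=0.018, text="Corrected Internal\nError\nCorrectable"),
        ]
    )
)
for line in summarize_chunks(fake_response):
    print(line)
```

With a live server, passing `search_res` in place of `fake_response` should give the same kind of listing.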

## RAG (with citations)

Ask a question and let R2R retrieve + generate an answer grounded in your documents.

In [12]:
rag_res = client.retrieval.rag(query='how to find receiver error?')
rag_res

R2RResults[RAGResponse](results=RAGResponse(generated_answer='To find receiver errors, you can use several methods based on the provided search results:\n\n1. Check the mandatory receiver error checking mentioned in [e06963b], which states that some checking for Receiver Errors is required in all cases (see § Section 4.2.1.1.3, § Section 4.2.5.8, and § Section 4.2.7).\n\n2. Use margining capabilities as described in [c8c6243] and [5a38de4]. The Margining Lane Control and Status Registers include Receiver Number fields that can help identify specific receivers experiencing errors.\n\n3. Implement error injection functionality as described in [7d24bc2], which includes the "Flit Error Injection Enable" bit that allows for injecting errors in transmitted or received flits to test error detection.\n\n4. Monitor the Margining Lane Status Register as mentioned in [d50422d], which reflects error status information when a Control SKP Ordered Set is received and passes CRC and parity checks.\n\n
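The generated answer marks its sources with short bracketed IDs such as `[e06963b]`. To list which sources an answer cites, a regular expression is enough. This sketch assumes the IDs are always seven lowercase hexadecimal characters, which matches the output above but is not a documented guarantee; `cited_ids` is a hypothetical helper.

```python
import re

# Assumption: citation markers look like [e06963b] (7 lowercase hex characters)
CITATION = re.compile(r"\[([0-9a-f]{7})\]")


def cited_ids(answer: str) -> list[str]:
    """Return the cited short IDs in order of first appearance, without duplicates."""
    seen = []
    for short_id in CITATION.findall(answer):
        if short_id not in seen:
            seen.append(short_id)
    return seen


sample = "Checking is required [e06963b]. Margining helps [c8c6243] and [5a38de4]; see also [e06963b]."
print(cited_ids(sample))
```

For the response above, `cited_ids(rag_res.results.generated_answer)` would be the corresponding call.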

## Agentic RAG

Use the conversational agent with retrieval tools for richer, multi-step answers.

In [13]:
agent_res = client.retrieval.agent(
    message={"role": "user", "content": "Give a short analysis of AER."},
    rag_tools=["search_file_knowledge", "get_file_content"],
)
agent_res

R2RResults[AgentResponse](results=AgentResponse(messages=[Message(role='assistant', content='\nBased on the search results, I can provide a short analysis of AER (Advanced Error Reporting) in the context of PCI Express:\n\n## AER Analysis\n\n**Advanced Error Reporting (AER)** is an enhanced error reporting capability defined in the PCI Express specification that goes beyond the baseline error reporting requirements.\n\n### Key Characteristics:\n\n1. **Two-tier error reporting system**: PCI Express defines two paradigms:\n   - **Baseline capability**: Required for all PCI Express devices, defining minimum error reporting requirements\n   - **AER capability**: An advanced, optional enhancement for more sophisticated error handling\n\n2. **Comprehensive error coverage**: AER handles errors that:\n   - Occur on the PCI Express interface itself\n   - Occur on behalf of transactions initiated on PCI Express\n   - Occur within a component and are related to the PCI Express interface\n\n3. **E
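The agent response wraps a list of messages, each with a `role` and `content`, as the repr above shows. A short sketch for pulling out just the final assistant text; `last_assistant_text` is a hypothetical helper and the stand-in object only mirrors those attribute names.

```python
from types import SimpleNamespace


def last_assistant_text(agent_response):
    """Return the content of the most recent assistant message, or None if there is none."""
    for message in reversed(agent_response.results.messages):
        if message.role == "assistant":
            return message.content
    return None


# Stand-in with the same shape as the agent response above
fake_agent = SimpleNamespace(
    results=SimpleNamespace(
        messages=[SimpleNamespace(role="assistant", content="## AER Analysis")]
    )
)
print(last_assistant_text(fake_agent))
```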

## Advanced: custom search settings

You can pass `search_mode` and `search_settings` to control hybrid search, filters, etc.
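As an illustration of the filter case, the sketch below builds a `search_settings` dictionary that restricts retrieval to one document. The `filters` key, the `$eq` operator, and the `limit` key are assumptions about the settings schema rather than something shown in this notebook, so verify them against the R2R documentation for your version. `settings_for_document` is a hypothetical helper.

```python
def settings_for_document(document_id: str, limit: int = 5) -> dict:
    """Build search settings restricted to a single document.

    Assumed schema: "filters" with a "$eq" operator, and "limit" for the result count.
    """
    return {
        "use_hybrid_search": True,
        "filters": {"document_id": {"$eq": document_id}},
        "limit": limit,
    }


settings = settings_for_document("15df7e62-584a-5257-ae77-91bf10d27028")
print(settings)
```

The resulting dictionary would be passed as `search_settings=settings` to `client.retrieval.search` or `client.retrieval.rag`.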

In [None]:
advanced = client.retrieval.rag(
    query='Hot reset failure',
    search_mode='advanced',
    search_settings={
        "use_hybrid_search": True,
        "hybrid_settings": {
            "full_text_weight": 1.0,
            "semantic_weight": 5.0,
            "full_text_limit": 50,
            "rrf_k": 50
        }
    }
)


'Based on the provided context, I don\'t find specific information about "hot reset failure" directly. However, there is some related information about hot reset in the context of Retimers:\n\nIn [af3a17d], there\'s information about handling hot reset in Retimers:\n- "The Retimer follows these additional rules if the Retimer is exiting Electrical Idle after entering Electrical Idle as a result of Hot Reset, and the Retimer Enter Compliance bit is Set in the Retimer."\n- When a Lane receives an EIOS (Electrical Idle Ordered Set), the system is expected to set the RT_next_data_rate and RT_error_data_rate variables to 2.5 GT/s.\n\nThis suggests that hot reset is related to Electrical Idle states and may involve data rate adjustments, but the provided context doesn\'t specifically address failure scenarios for hot reset operations.\n\nFor more detailed information about hot reset failures, additional context would be needed that specifically addresses error conditions or failure modes rel
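The `hybrid_settings` used above weight a full-text ranking against a semantic ranking and include an `rrf_k` constant, which suggests reciprocal rank fusion (RRF). The sketch below shows weighted RRF in its textbook form, where each list contributes `weight / (rrf_k + rank)` to a document's score. It illustrates the technique only and is not R2R's actual implementation, which may normalize or combine scores differently.

```python
def weighted_rrf(
    semantic_ranked: list[str],
    full_text_ranked: list[str],
    semantic_weight: float = 5.0,
    full_text_weight: float = 1.0,
    rrf_k: int = 50,
) -> list[str]:
    """Fuse two ranked lists of IDs with weighted reciprocal rank fusion.

    Each list adds weight / (rrf_k + rank) to an ID's score, with ranks starting at 1.
    Returns the IDs sorted by fused score, best first.
    """
    scores: dict[str, float] = {}
    for weight, ranking in (
        (semantic_weight, semantic_ranked),
        (full_text_weight, full_text_ranked),
    ):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (rrf_k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Toy rankings: "c" tops the full-text list but is third semantically
print(weighted_rrf(["a", "b", "c"], ["c", "a"]))
```

With a semantic weight of 5.0 against 1.0, the semantic order dominates, while a larger `rrf_k` flattens the difference between adjacent ranks.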

In [18]:
advanced.results

RAGResponse(generated_answer='Based on the provided context, I don\'t find specific information about "hot reset failure" directly. However, there is some related information about hot reset in the context of Retimers:\n\nIn [af3a17d], there\'s information about handling hot reset in Retimers:\n- "The Retimer follows these additional rules if the Retimer is exiting Electrical Idle after entering Electrical Idle as a result of Hot Reset, and the Retimer Enter Compliance bit is Set in the Retimer."\n- When a Lane receives an EIOS (Electrical Idle Ordered Set), the system is expected to set the RT_next_data_rate and RT_error_data_rate variables to 2.5 GT/s.\n\nThis suggests that hot reset is related to Electrical Idle states and may involve data rate adjustments, but the provided context doesn\'t specifically address failure scenarios for hot reset operations.\n\nFor more detailed information about hot reset failures, additional context would be needed that specifically addresses error co

In [None]:
advanced.results.generated_answer

In [17]:
advanced.results.search_results

AggregateSearchResult(chunk_search_results=[ChunkSearchResult(score=0.018, text=truncating a Translation Completion the TA is not allowed to pad the response with invalid entries (R = 0b, W = 0b).
Note: There are multiple reasons that the TA may break a Translation Completion into multiple TLPs. As an example, if
the virtual address of the Translation Completion resolves to a table access that crosses an implementation specific
address boundary, the completion to the TA may be broken into two completions. Rather than require that the TA
accumulate the results, the TA is permitted to send each portion of the Translation Completion to a Function when it is
received from memory.
Page 1626
6.3-1.0-PUB — PCI Express® Base Specification Revision 6.3
10.3 ATS Invalidation §
ATS uses the messages shown in this section to maintain consistency between the TA and the ATC. This specification
assumes there is a single TA associated with each ATC. The TA (in conjunction with its associated software)