# Retrieval Layer — Example Query Notebook

This notebook demonstrates the **agentic-rag-kubeflow** retrieval layer.
It uses a lightweight in-memory vector store, so it is fully
self-contained and needs no Chroma server.

**What you'll see:**
1. Vector DB abstraction (`VectorStoreBase` / `FakeVectorStore`)
2. Top-k semantic search
3. Metadata-aware filtering with `MetadataFilter`
4. Citation tracking with `Citation` and `RetrievalResult`
5. Score-threshold filtering
6. Formatted results table

## 1 — Imports

In [1]:
from __future__ import annotations

import pathlib
import sys

# Ensure the project src/ is on the path so we can import agentic_rag
_project_root = pathlib.Path.cwd().parent if pathlib.Path.cwd().name == "notebooks" else pathlib.Path.cwd()
if str(_project_root / "src") not in sys.path:
    sys.path.insert(0, str(_project_root / "src"))

from typing import Any

from agentic_rag.retrieval.base import VectorStoreBase
from agentic_rag.retrieval.models import Citation, MetadataFilter, RetrievalResult
from agentic_rag.retrieval.retriever import SemanticRetriever

print("✓ All retrieval imports successful")

✓ All retrieval imports successful


## 2 — In-Memory Fake Vector Store

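We create a simple in-memory backend that implements `VectorStoreBase`.
This lets us demo the full retrieval pipeline without a running Chroma server.
Note that the fake does no ranking of its own: it ignores the query text and
embedding, applies the metadata filters, and returns the first `k` seeded
documents in stored order with their preset scores.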

In [2]:
class FakeVectorStore(VectorStoreBase):
    """Deterministic in-memory store seeded with sample documents."""

    def __init__(self, docs: list[dict[str, Any]]) -> None:
        super().__init__("demo-collection")
        self._docs = docs

    def similarity_search(
        self,
        query_embedding: list[float],
        *,
        k: int = 5,
        filters: list[MetadataFilter] | None = None,
    ) -> list[dict[str, Any]]:
        # The embedding is ignored: return the first k seeded docs that pass the filters.
        return self._apply_filters(filters)[:k]

    def similarity_search_by_text(
        self,
        query: str,
        *,
        k: int = 5,
        filters: list[MetadataFilter] | None = None,
    ) -> list[dict[str, Any]]:
        # The query text is ignored: return the first k seeded docs that pass the filters.
        return self._apply_filters(filters)[:k]

    def health_check(self) -> bool:
        return True

    # ---- helpers ----
    def _apply_filters(self, filters: list[MetadataFilter] | None) -> list[dict[str, Any]]:
        if not filters:
            return list(self._docs)
        results = []
        for doc in self._docs:
            meta = doc.get("metadata", {})
            if all(self._match(f, meta) for f in filters):
                results.append(doc)
        return results

    @staticmethod
    def _match(f: MetadataFilter, meta: dict[str, Any]) -> bool:
        val = meta.get(f.field)
        if f.operator == "eq":
            return val == f.value
        if f.operator == "ne":
            return val != f.value
        if f.operator == "in":
            return val in f.value
        if f.operator == "gt":
            return val is not None and val > f.value
        if f.operator == "gte":
            return val is not None and val >= f.value
        if f.operator == "lt":
            return val is not None and val < f.value
        if f.operator == "lte":
            return val is not None and val <= f.value
        # Any other operator is treated as a match (no filtering).
        return True

print("✓ FakeVectorStore defined")

✓ FakeVectorStore defined
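
`FakeVectorStore` skips the ranking step. For contrast, here is a minimal sketch of what a real backend does conceptually: score every stored vector against the query by cosine similarity and keep the `k` best. The three-dimensional vectors and the `cosine_top_k` helper are invented for illustration and are not part of `agentic_rag`.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_top_k(
    query: list[float], vectors: dict[str, list[float]], k: int
) -> list[tuple[str, float]]:
    """Hypothetical helper: return the k (id, score) pairs with the highest similarity."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings, invented for this sketch.
toy_vectors = {
    "chunk-a": [1.0, 0.0, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [1.0, 1.0, 0.0],
}

top = cosine_top_k([1.0, 0.1, 0.0], toy_vectors, k=2)
for doc_id, score in top:
    print(f"{doc_id}  {score:.3f}")
```

A production store would typically use an approximate nearest-neighbour index instead of this exhaustive scan, but the contract is the same: a query vector and `k` go in, scored hits come out.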


## 3 — Seed Sample Documents

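We populate the store with six chunks, listed in descending order of their
preset `score`. Each chunk carries metadata (`source`, `chunk_index`,
`category`, `date`, and in some cases `page`) so that filtering and citation
tracing are easy to follow.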

In [3]:
SAMPLE_DOCS: list[dict[str, Any]] = [
    {
        "id": "chunk-001",
        "content": (
            "Kubeflow Pipelines is a platform for building and deploying portable, "
            "scalable ML workflows based on Docker containers."
        ),
        "score": 0.95,
        "metadata": {
            "source": "kubeflow_pipelines_guide.md",
            "chunk_index": 0,
            "page": 1,
            "category": "orchestration",
            "date": "2025-06-15",
        },
    },
    {
        "id": "chunk-002",
        "content": (
            "KServe provides a Kubernetes CRD for serving ML models with "
            "autoscaling, canary rollouts, and pre-/post-processing."
        ),
        "score": 0.91,
        "metadata": {
            "source": "kserve_docs.md",
            "chunk_index": 3,
            "page": 12,
            "category": "serving",
            "date": "2025-08-01",
        },
    },
    {
        "id": "chunk-003",
        "content": (
            "LangGraph enables building stateful, multi-actor applications with "
            "LLMs by modelling workflows as graphs."
        ),
        "score": 0.88,
        "metadata": {
            "source": "langgraph_overview.md",
            "chunk_index": 1,
            "category": "agents",
            "date": "2025-09-20",
        },
    },
    {
        "id": "chunk-004",
        "content": (
            "ChromaDB is an open-source embedding database designed for AI "
            "applications with first-class LangChain integration."
        ),
        "score": 0.82,
        "metadata": {
            "source": "chroma_overview.md",
            "chunk_index": 0,
            "category": "vector-db",
            "date": "2025-04-10",
        },
    },
    {
        "id": "chunk-005",
        "content": (
            "Retrieval-Augmented Generation (RAG) combines a retriever over a "
            "knowledge base with a generative LLM to reduce hallucinations."
        ),
        "score": 0.78,
        "metadata": {
            "source": "rag_survey_arxiv.md",
            "chunk_index": 2,
            "page": 5,
            "category": "research",
            "date": "2024-11-30",
        },
    },
    {
        "id": "chunk-006",
        "content": (
            "Sentence-Transformers maps sentences to a 384-dimensional dense "
            "vector space suitable for semantic search and clustering."
        ),
        "score": 0.40,
        "metadata": {
            "source": "sentence_transformers_docs.md",
            "chunk_index": 0,
            "category": "embeddings",
            "date": "2025-01-05",
        },
    },
]

store = FakeVectorStore(SAMPLE_DOCS)
print(f"✓ Seeded store with {len(SAMPLE_DOCS)} chunks")

✓ Seeded store with 6 chunks


## 4 — Basic Top-k Semantic Search

Run an unfiltered search over all documents. `default_k=3` limits the output
to the top three results, each carrying its citation.

In [4]:
retriever = SemanticRetriever(store=store, default_k=3)

results = retriever.search("How does Kubeflow orchestrate ML workflows?")

print(f"{'#':<3} {'Score':>6}  {'Source':<35} {'Chunk':>5}  Content")
print("-" * 100)
for i, r in enumerate(results, 1):
    c = r.citation
    # chunk_index 0 is falsy, so test for None explicitly instead of using `or`.
    chunk = "-" if c.chunk_index is None else c.chunk_index
    print(f"{i:<3} {c.score:>6.2f}  {c.source:<35} {chunk:>5}  {r.content[:55]}…")

#    Score  Source                              Chunk  Content
----------------------------------------------------------------------------------------------------
1     0.95  kubeflow_pipelines_guide.md             0  Kubeflow Pipelines is a platform for building and deplo…
2     0.91  kserve_docs.md                          3  KServe provides a Kubernetes CRD for serving ML models …
3     0.88  langgraph_overview.md                   1  LangGraph enables building stateful, multi-actor applic…


## 5 — Metadata Filtering

Filter results to only documents in the **"orchestration"** or **"serving"** categories.

In [5]:
# Filter: category must be one of "orchestration" or "serving"
category_filter = MetadataFilter.one_of("category", ["orchestration", "serving"])

filtered = retriever.search(
    "ML deployment",
    k=5,
    filters=[category_filter],
)

print(f"Results with category ∈ {{orchestration, serving}}  ({len(filtered)} hits)\n")
for r in filtered:
    c = r.citation
    print(f"  {c.short_ref()}  score={c.score:.2f}  category={c.metadata.get('category')}")
    print(f"    → {r.content[:80]}…\n")

Results with category ∈ {orchestration, serving}  (2 hits)

  [kubeflow_pipelines_guide.md§0]  score=0.95  category=orchestration
    → Kubeflow Pipelines is a platform for building and deploying portable, scalable M…

  [kserve_docs.md§3]  score=0.91  category=serving
    → KServe provides a Kubernetes CRD for serving ML models with autoscaling, canary …
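
The filter semantics can be sketched without the library. `SimpleFilter` and `matches` below are hypothetical stand-ins, not the real `MetadataFilter`; only the operator names (`eq`, `ne`, `in`, `gt`, `gte`, `lt`, `lte`) are taken from `FakeVectorStore._match`. As a design choice of this sketch, an unknown operator raises an error.

```python
from __future__ import annotations

import operator as op
from dataclasses import dataclass
from typing import Any, Callable

# Operator names mirror the ones handled in the notebook's fake store.
_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": op.eq,
    "ne": op.ne,
    "gt": op.gt,
    "gte": op.ge,
    "lt": op.lt,
    "lte": op.le,
    "in": lambda value, allowed: value in allowed,
}
_ORDERING = {"gt", "gte", "lt", "lte"}


@dataclass(frozen=True)
class SimpleFilter:
    """Hypothetical stand-in for a metadata filter: field, operator, value."""

    field: str
    operator: str
    value: Any


def matches(flt: SimpleFilter, metadata: dict[str, Any]) -> bool:
    """Return True if the metadata satisfies the filter."""
    if flt.operator not in _OPS:
        raise ValueError(f"unknown operator: {flt.operator!r}")
    actual = metadata.get(flt.field)
    # A missing field never satisfies an ordering comparison.
    if actual is None and flt.operator in _ORDERING:
        return False
    return _OPS[flt.operator](actual, flt.value)


docs = [
    {"id": "a", "metadata": {"category": "orchestration", "page": 1}},
    {"id": "b", "metadata": {"category": "agents"}},
    {"id": "c", "metadata": {"category": "serving", "page": 12}},
]
filters = [
    SimpleFilter("category", "in", ["orchestration", "serving"]),
    SimpleFilter("page", "gt", 5),
]
# Filters combine with AND, as in the fake store's `all(...)`.
hits = [d["id"] for d in docs if all(matches(f, d["metadata"]) for f in filters)]
print(hits)
```

A table of operators keeps the matching logic in one place, so supporting a new operator means adding one entry, not another `if` branch.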



## 6 — Score Threshold Filtering

Create a stricter retriever that drops any result scoring below 0.5.
With the sample data, only `chunk-006` (score 0.40) is removed.

In [6]:
strict_retriever = SemanticRetriever(store=store, default_k=10, score_threshold=0.5)

strict_results = strict_retriever.search("embeddings and vector search")

print(f"All chunks: {len(SAMPLE_DOCS)} | After threshold ≥ 0.5: {len(strict_results)}\n")
for r in strict_results:
    print(f"  {r.citation.score:.2f}  {r.citation.source}")

All chunks: 6 | After threshold ≥ 0.5: 5

  0.95  kubeflow_pipelines_guide.md
  0.91  kserve_docs.md
  0.88  langgraph_overview.md
  0.82  chroma_overview.md
  0.78  rag_survey_arxiv.md
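
Conceptually, the threshold is a post-filter on the backend's hits. The sketch below reproduces the counts above from the six sample scores. `apply_threshold` is a hypothetical helper, and the inclusive `>=` comparison is an assumption based on the `≥ 0.5` label printed by the cell, not on the `SemanticRetriever` source.

```python
from __future__ import annotations


def apply_threshold(hits: list[dict], threshold: float | None) -> list[dict]:
    """Keep hits whose score is at least `threshold`; None disables the filter."""
    if threshold is None:
        return list(hits)
    return [h for h in hits if h["score"] >= threshold]


# Scores copied from SAMPLE_DOCS in section 3.
hits = [
    {"id": "chunk-001", "score": 0.95},
    {"id": "chunk-002", "score": 0.91},
    {"id": "chunk-003", "score": 0.88},
    {"id": "chunk-004", "score": 0.82},
    {"id": "chunk-005", "score": 0.78},
    {"id": "chunk-006", "score": 0.40},
]

kept = apply_threshold(hits, 0.5)
print(f"{len(hits)} hits, {len(kept)} kept")
```

Filtering after retrieval means a query can return fewer than `k` results, or none at all, so callers should handle an empty list.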


## 7 — Citation Deep-Dive

Inspect the full `Citation` object to show how every result traces
back to its source document, chunk position, and page.
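
The provenance fields named above can be sketched with the standard library. `SketchCitation` and `citation_from_hit` are hypothetical stand-ins, not the library's `Citation` model. In particular, the random 12-character hex `citation_id` and the `[source]` fallback for a missing chunk index are assumptions. Only the `[source§chunk]` shape of `short_ref()` is taken from the output in section 5.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class SketchCitation:
    """Hypothetical stand-in for a citation record; not the library's model."""

    document_id: str
    source: str
    score: float
    chunk_index: int | None = None
    page: int | None = None
    metadata: dict[str, Any] = field(default_factory=dict)
    # Assumption: a random 12-character hex token, fresh for every retrieval.
    citation_id: str = field(default_factory=lambda: uuid.uuid4().hex[:12])
    retrieved_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

    def short_ref(self) -> str:
        """Compact reference in the `[source§chunk]` shape printed by the notebook."""
        if self.chunk_index is None:
            return f"[{self.source}]"
        return f"[{self.source}§{self.chunk_index}]"


def citation_from_hit(hit: dict[str, Any]) -> SketchCitation:
    """Build a citation from a raw hit dict shaped like the entries in SAMPLE_DOCS."""
    meta = hit.get("metadata", {})
    return SketchCitation(
        document_id=hit["id"],
        source=meta.get("source", "unknown"),
        score=hit["score"],
        chunk_index=meta.get("chunk_index"),
        page=meta.get("page"),
        metadata=dict(meta),
    )


hit = {
    "id": "chunk-002",
    "score": 0.91,
    "metadata": {"source": "kserve_docs.md", "chunk_index": 3, "page": 12},
}
c = citation_from_hit(hit)
print(c.short_ref(), c.page, len(c.citation_id))
```

Keeping `retrieved_at` timezone-aware avoids ambiguity when citations are logged or compared across services.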

In [7]:
import json

full_results = retriever.search("Tell me about RAG and vector databases", k=4)

for i, r in enumerate(full_results, 1):
    c = r.citation
    print(f"── Result {i} ──────────────────────────────────────────────")
    print(f"  Citation ID  : {c.citation_id}")
    print(f"  Document ID  : {c.document_id}")
    print(f"  Source       : {c.source}")
    print(f"  Chunk index  : {c.chunk_index}")
    print(f"  Page         : {'N/A' if c.page is None else c.page}")
    print(f"  Score        : {c.score:.4f}")
    print(f"  Retrieved at : {c.retrieved_at.isoformat()}")
    print(f"  Short ref    : {c.short_ref()}")
    print(f"  Metadata     : {json.dumps(c.metadata, indent=2)}")
    print(f"  Content      : {r.content[:90]}…")
    print()

── Result 1 ──────────────────────────────────────────────
  Citation ID  : 712063ff1ba9
  Document ID  : chunk-001
  Source       : kubeflow_pipelines_guide.md
  Chunk index  : 0
  Page         : 1
  Score        : 0.9500
  Retrieved at : 2026-02-12T05:52:16.886602+00:00
  Short ref    : [kubeflow_pipelines_guide.md§0]
  Metadata     : {
  "source": "kubeflow_pipelines_guide.md",
  "chunk_index": 0,
  "page": 1,
  "category": "orchestration",
  "date": "2025-06-15"
}
  Content      : Kubeflow Pipelines is a platform for building and deploying portable, scalable ML workflow…

── Result 2 ──────────────────────────────────────────────
  Citation ID  : 2230fe02bbbc
  Document ID  : chunk-002
  Source       : kserve_docs.md
  Chunk index  : 3
  Page         : 12
  Score        : 0.9100
  Retrieved at : 2026-02-12T05:52:16.886701+00:00
  Short ref    : [kserve_docs.md§3]
  Metadata     : {
  "source": "kserve_docs.md",
  "chunk_index": 3,
  "page": 12,
  "category": "serving",
  "date": "2

## 8 — Combined: Metadata Filter + Score Threshold

Filter for documents dated after 2025-05-01 and apply a strict score threshold.
The dates are stored as ISO 8601 strings, so the `gt` filter is a plain string comparison.

In [8]:
# Combine: date > 2025-05-01 AND score ≥ 0.5
date_filter = MetadataFilter(field="date", operator="gt", value="2025-05-01")

combo_retriever = SemanticRetriever(store=store, default_k=10, score_threshold=0.5)
combo_results = combo_retriever.search("ML infrastructure", filters=[date_filter])

print(f"Filters: date > 2025-05-01 + score ≥ 0.5  →  {len(combo_results)} results\n")
print(f"{'Source':<35} {'Date':<12} {'Category':<15} {'Score':>6}")
print("-" * 75)
for r in combo_results:
    c = r.citation
    print(
        f"{c.source:<35} {c.metadata.get('date', 'N/A'):<12} "
        f"{c.metadata.get('category', 'N/A'):<15} {c.score:>6.2f}"
    )

Filters: date > 2025-05-01 + score ≥ 0.5  →  3 results

Source                              Date         Category         Score
---------------------------------------------------------------------------
kubeflow_pipelines_guide.md         2025-06-15   orchestration     0.95
kserve_docs.md                      2025-08-01   serving           0.91
langgraph_overview.md               2025-09-20   agents            0.88
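
Comparing date strings is safe here only because zero-padded ISO 8601 dates (`YYYY-MM-DD`) sort lexicographically in chronological order. The check below confirms this on the sample dates using the standard library, and shows the comparison going wrong once the zero padding is dropped.

```python
from datetime import date

# Dates copied from SAMPLE_DOCS in section 3.
sample_dates = [
    "2025-06-15", "2025-08-01", "2025-09-20",
    "2025-04-10", "2024-11-30", "2025-01-05",
]
cutoff = "2025-05-01"

# String comparison, as the metadata filter does it.
by_string = [d for d in sample_dates if d > cutoff]

# Comparison on parsed date objects, for reference.
by_date = [
    d for d in sample_dates
    if date.fromisoformat(d) > date.fromisoformat(cutoff)
]

print(by_string == by_date, by_string)

# Counter-example: without zero padding, string order and date order disagree.
# 1 October is later than 20 September, yet the string comparison says otherwise.
print("2025-10-01" > "2025-9-20")
```

If the corpus might contain dates in other formats, normalising them to ISO 8601 at ingestion time keeps string-based filters reliable.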


## 9 — Serialization (API / Agent Handoff)

`RetrievalResult` and `Citation` are Pydantic models, so they serialize
cleanly for JSON APIs or agent tool responses.

In [9]:
# Serialize the first result to JSON (e.g. for an API response)
first = results[0]
payload = first.model_dump_json(indent=2)
print(payload)

{
  "content": "Kubeflow Pipelines is a platform for building and deploying portable, scalable ML workflows based on Docker containers.",
  "citation": {
    "citation_id": "36ba2cc70969",
    "document_id": "chunk-001",
    "source": "kubeflow_pipelines_guide.md",
    "chunk_index": 0,
    "page": 1,
    "score": 0.95,
    "metadata": {
      "source": "kubeflow_pipelines_guide.md",
      "chunk_index": 0,
      "page": 1,
      "category": "orchestration",
      "date": "2025-06-15"
    },
    "retrieved_at": "2026-02-12T05:51:53.556635Z"
  }
}
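
On the receiving end of the handoff, a consumer that does not depend on `agentic_rag` can read the payload with the standard library alone. The sketch below parses a trimmed copy of the JSON printed above. `format_for_prompt` is a hypothetical helper showing one way an agent might turn a result into a cited context line.

```python
import json
from datetime import datetime

# Trimmed copy of the payload printed above (the nested "metadata" is omitted).
payload = """
{
  "content": "Kubeflow Pipelines is a platform for building and deploying portable, scalable ML workflows based on Docker containers.",
  "citation": {
    "citation_id": "36ba2cc70969",
    "document_id": "chunk-001",
    "source": "kubeflow_pipelines_guide.md",
    "chunk_index": 0,
    "page": 1,
    "score": 0.95,
    "retrieved_at": "2026-02-12T05:51:53.556635Z"
  }
}
"""


def format_for_prompt(result: dict) -> str:
    """Hypothetical helper: render one result as a cited context line."""
    cit = result["citation"]
    return f'{result["content"]} [{cit["source"]}§{cit["chunk_index"]}]'


result = json.loads(payload)

# datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
# so normalise it to an explicit UTC offset first.
retrieved_at = datetime.fromisoformat(
    result["citation"]["retrieved_at"].replace("Z", "+00:00")
)

print(format_for_prompt(result))
print(retrieved_at.isoformat())
```

Because the citation travels inside the payload, the consumer can attribute the text to its source without a second lookup against the vector store.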
