Discovery
autobot-backend/knowledge/rag_benchmarks.py (Issue #58) benchmarks RAG operations using randomly generated mock embeddings and documents. It never instantiates a real KnowledgeBase, connects to ChromaDB, or runs queries against actual indexed content. All results measure synthetic numpy array operations, not real retrieval quality.
Evidence
@pytest.fixture
def mock_embeddings(self):
    return [[random.random() for _ in range(384)] for _ in range(100)]

@pytest.fixture
def mock_documents(self):
    return [{"id": f"doc_{i}", "content": f"This is test document {i}...",
             "embedding": [random.random() for _ in range(384)]}
            for i in range(1000)]
All benchmark tests operate on these fixtures. No fixture mounts a real KB or ChromaDB collection. The benchmarks measure:
- Raw cosine similarity on random vectors (not semantic similarity)
- Top-k selection on random data (not real document relevance)
- A simulated pipeline with time.sleep() calls for "realism"
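For illustration, the kind of synthetic microbenchmark these fixtures support looks roughly like this (a sketch of the pattern, not the file's actual code):

```python
import random
import numpy as np

# Random 384-dim "embeddings", mirroring the mock fixtures above
embeddings = np.array([[random.random() for _ in range(384)] for _ in range(100)])
query = np.array([random.random() for _ in range(384)])

# Raw cosine similarity on random vectors -- this measures vector math speed,
# not semantic similarity
sims = embeddings @ query / (np.linalg.norm(embeddings, axis=1) * np.linalg.norm(query))

# Top-k selection on random data -- "relevance" here is meaningless
k = 5
top_k = np.argsort(sims)[::-1][:k]
```

Nothing in this path touches a document index, which is why the resulting numbers say nothing about retrieval quality.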
The file is also not wired into any CI pipeline, scheduler, or feedback loop.
Impact
- AdvancedRAGOptimizer cannot be validated by running benchmarks
Fix
Add a RealKBBenchmarks test class alongside the existing mock class that:
- Connects to a real (or test-fixture) ChromaDB instance with seeded documents
- Runs AdvancedRAGOptimizer.advanced_search() with real queries
- Scores results against known-good ground truth (precision@k, MRR)
- Can run in CI with a lightweight ChromaDB fixture (in-memory mode)
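The scoring step could be two pure-Python helpers (hypothetical names; the real class would apply them to advanced_search() output against the seeded ground truth):

```python
def precision_at_k(retrieved, relevant, k):
    # Fraction of the top-k retrieved ids that appear in the ground-truth set
    top = retrieved[:k]
    return sum(1 for doc_id in top if doc_id in relevant) / k

def mean_reciprocal_rank(results):
    # results: list of (retrieved_ids, relevant_ids) pairs, one per query;
    # each query contributes 1/rank of its first relevant hit, 0 if none
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)

# 2 of the top 3 retrieved ids are relevant -> precision@3 = 2/3
score = precision_at_k(["doc_3", "doc_7", "doc_1"], {"doc_3", "doc_1"}, 3)
```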
The mock benchmarks can remain for pure performance microbenchmarks (vector math speed, etc.).
Affected File
autobot-backend/knowledge/rag_benchmarks.py — add real-KB benchmark class
Prerequisite For
- RetrievalLearner