Conversation
Code ReviewUser Requirement Compliance: All four stated goals are implemented. No explicit requirements are violated. Overall Assessment: Solid work with a few correctness gaps in test mocking and one transparency inconsistency. Nothing that prevents merging, but two issues should be fixed before merge. Strengths
Issues Found1. Test mocking bug in four reranker tests (Medium — tests pass for wrong reasons) Location:
What actually happens: the density check code does
Fix: add dense_links = Mock()
dense_links.get_as_df.return_value = pd.DataFrame({"total_links": [20]})
dense_articles = Mock()
dense_articles.get_as_df.return_value = pd.DataFrame({"total_articles": [10]})
mock_result = Mock()
mock_result.get_as_df.return_value = pd.DataFrame(
{"article_id": [1, 2], "centrality": [0.3, 0.8]}
)
mock_kuzu_conn.execute.side_effect = [dense_links, dense_articles, mock_result]2. Transparency cypher string has mismatched parameter name (Low — documentation-only bug) Location: query_plan = {
"type": "vector_search",
"cypher": "CALL QUERY_VECTOR_INDEX('Section', 'embedding_idx', $query, K) RETURN *",
"cypher_params": {"q": question},
}The Fix: change the transparency string to match what "cypher": "CALL QUERY_VECTOR_INDEX('Section', 'embedding_idx', $emb, $k) RETURN *", # transparency only
"cypher_params": {"emb": "<embedding vector>", "k": max_results},Minor Observations (no fix required)Timing label includes augmentation work.
Philosophy Compliance (within user constraints)
Recommendation: Fix the four test mocking gaps (issue #1) before merge. Issue #2 is low-priority but straightforward to fix in the same pass. |
Security Review (Step 16c)PR: #168 issue-163-kg-agent-retrieval-pipeline 1. Cypher Injection via Vector Search User InputFinding: PASS — Parameterised correctly throughout The vector primary retrieval path calls result = self.conn.execute(
"CALL QUERY_VECTOR_INDEX('Section', 'embedding_idx', $emb, $k)",
{"emb": query_embedding, "k": top_k * 3},
)The user's The fast-path title lookup also uses parameterised binding: conn.execute(
"MATCH (a:Article {title: $query})-[:HAS_SECTION]->(s:Section) RETURN s.embedding AS embedding LIMIT 1",
{"query": query},
)No injection risk is introduced by this PR. 2.
|
| Check | Result | Severity |
|---|---|---|
| Cypher injection via user question | Parameterised via embedding vector | Pass |
$query/$emb parameter binding |
Correct typed binding, not string | Pass |
--disable-* CLI flags attack surface |
Boolean only, no path/query influence | Pass |
| Sparse graph detection with malformed data | Hardcoded queries, integer arithmetic only | Pass |
| Error handling in vector retrieval | Logged warning, graceful fallback | Informational |
No blocking security findings. This PR is clean from a security perspective. The vector search path correctly avoids any form of query string injection by routing user input exclusively through the embedding layer before it reaches Kuzu.
query_plan["cypher"] previously showed wrong param name ($query vs $emb)
and bare K literal. Now matches actual semantic_search() execution:
CALL QUERY_VECTOR_INDEX('Section', 'embedding_idx', $emb, N) RETURN *
cypher_params: {"emb": "<embedding_vector>"}
Addresses: #168 (comment)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Step 17: Review Feedback AddressedFixed in commit e7a85f2: Issue 2 (Cypher transparency string):
Issue 1 (test mocking): |
- audit_pack_content.py: positional db paths, not --pack flag - validate_pack_urls.py: positional urls_file path, not --pack flag - build_dotnet_pack.py: only --test-mode exists, no --min-content-words flag (threshold is enforced via WebContentSource default, not CLI) - run_enhancement_evaluation.py: no --pack or --use-enhancements flags; targets physics pack by default, only --disable-* flags added in PR #168 - Add PR reference notes so readers know which PRs these features require - Remove broken reference to non-existent docs/reference/evaluation-scripts.md Addresses: #170 (comment) #170 (comment) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Philosophy Review (Step 16d)Zen-Architect Review: PR #168 — Vector Primary Retrieval, Sparse Graph Fix, A/B Flags Philosophy Score: B+Strengths
Concerns
Violations
Recommendations
Regeneration Assessment
|
Add warning when enable_reranker/multidoc/fewshot flags are explicitly set to non-default values but use_enhancements=False, so callers know the flags have no effect. Addresses: #168 (comment) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Philosophy Review Response (Step 17)Fixed the deceptive enable_ no-op* in commit ade4180:
Deferring the other concerns:
None of the concerns were zero-BS violations. |
Step 13: Local Testing ResultsTest Environment: issue-163-kg-agent-retrieval-pipeline branch, 2026-02-27
Issues Found: None — all scenarios work as expected. |
…#170) * Add how-to docs for Phase 3 features (vector search, eval questions, .NET quality) - docs/howto/vector-search-primary-retrieval.md: Vector-first retrieval pipeline, sparse graph detection for Rust pack, A/B testing CLI flags - docs/howto/generating-evaluation-questions.md: generate_eval_questions.py usage, run_all_packs_evaluation.py, statistical significance guidance - docs/howto/dotnet-content-quality.md: audit_pack_content.py, content threshold, URL validation, expected accuracy improvements - docs/index.md: link all three new guides from docs index Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Fix pre-commit issues: ruff auto-fixes, format, end-of-file newlines Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Step 17: Fix doc CLI commands to match actual script arguments - audit_pack_content.py: positional db paths, not --pack flag - validate_pack_urls.py: positional urls_file path, not --pack flag - build_dotnet_pack.py: only --test-mode exists, no --min-content-words flag (threshold is enforced via WebContentSource default, not CLI) - run_enhancement_evaluation.py: no --pack or --use-enhancements flags; targets physics pack by default, only --disable-* flags added in PR #168 - Add PR reference notes so readers know which PRs these features require - Remove broken reference to non-existent docs/reference/evaluation-scripts.md Addresses: #170 (comment) #170 (comment) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…, A/B flags
Priority 5 - Vector Search as Primary Retrieval:
- Add _vector_primary_retrieve() method to KnowledgeGraphAgent
- Use QUERY_VECTOR_INDEX('Section', 'embedding_idx') as PRIMARY retrieval path
- Fall back to LLM-generated Cypher when max cosine similarity < 0.6
- Reduces 30% query failure rate from bad Cypher generation
Priority 1 - Fix Enhancement Degradation on Rust Pack:
- Add _check_graph_density() to GraphReranker
- Check LINKS_TO edge count / Article count before applying centrality
- If avg links/article < 2.0, disable centrality scoring for the session
- Logs warning: 'Sparse graph detected (avg N links/article), disabling centrality'
- Prevents score degradation on web-doc packs like Rust (sparse LINKS_TO)
Priority 7 - A/B Testing Enhancement CLI Flags:
- Add enable_reranker, enable_multidoc, enable_fewshot to KnowledgeGraphAgent.__init__
- Each defaults to True when use_enhancements=True
- Add --disable-reranker, --disable-multidoc, --disable-fewshot to both evaluation scripts
- Enables measuring isolated impact of each enhancement component
Tests:
- TestSparseGraphDetection: 5 new tests for sparse graph detection in reranker
- TestVectorPrimaryRetrieval: 6 new tests for vector-first retrieval logic
- TestABTestingFlags: 6 new tests for individual component flags
- Update existing reranker tests to account for density check mock calls
- Add conftest.py to ensure local workstream takes precedence over installed package
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Complete grand_summary dict comprehension (was cut off mid-expression) - Add save block to write results to data/packs/all_packs_evaluation.json - Fix 5 ruff issues (f-strings, formatting) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Split 8 semicolon-joined mock statements in tests/packs/test_reranker.py - Apply ruff-format to 7 files (reformatted by hook) - Fix end-of-file-fixer on evaluation JSON files Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
query_plan["cypher"] previously showed wrong param name ($query vs $emb)
and bare K literal. Now matches actual semantic_search() execution:
CALL QUERY_VECTOR_INDEX('Section', 'embedding_idx', $emb, N) RETURN *
cypher_params: {"emb": "<embedding_vector>"}
Addresses: #168 (comment)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Add warning when enable_reranker/multidoc/fewshot flags are explicitly set to non-default values but use_enhancements=False, so callers know the flags have no effect. Addresses: #168 (comment) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
ade4180 to
7e97757
Compare
Summary
Priority 5: Vector search is now the PRIMARY retrieval path in
KnowledgeGraphAgent.query(). UsesCALL QUERY_VECTOR_INDEX('Section', 'embedding_idx', $query, K) RETURN *as the first strategy. Falls back to LLM-generated Cypher only when max cosine similarity < 0.6. Reduces the 30% query failure rate from bad Cypher generation.Priority 1: Fixed enhancement degradation on the Rust pack.
GraphReranker.rerank()now checks graph density before applying centrality scores. If avg LINKS_TO edges/article < 2.0 (sparse graph like Rust web docs), centrality is disabled for the session with a warning log. Prevents the -1.2 score degradation observed on non-Wikipedia packs.Priority 7: Added
enable_reranker,enable_multidoc,enable_fewshotconstructor params toKnowledgeGraphAgent(all default True whenuse_enhancements=True). Added corresponding--disable-reranker,--disable-multidoc,--disable-fewshotCLI flags to both evaluation scripts for isolated component testing.Changes
wikigr/agent/kg_agent.py: Add_vector_primary_retrieve(), restructurequery()for vector-first, add A/B flag params, updatefrom_connection()wikigr/agent/reranker.py: Add_check_graph_density(),_sparse_graphcache, sparse check inrerank()scripts/run_enhancement_evaluation.py: Add--disable-*flagsscripts/run_all_packs_evaluation.py: Add--disable-*flags and pass toevaluate_pack()tests/agent/test_kg_agent_semantic.py: 12 new tests for vector primary retrieval and A/B flagstests/packs/test_reranker.py: 5 new tests for sparse graph detection; update 4 existing tests for new density check callsconftest.py: Ensure local workstream source takes precedence over installed package in testsTest plan
tests/agent/test_kg_agent_semantic.py- 20 tests pass (12 new)tests/packs/test_reranker.py- 23 tests pass (5 new + 4 updated)🤖 Generated with Claude Code