
Phase 1 Production Hardening: E2E tests, RF-7 privacy, perf baselines, Tantivy consistency #53

Merged
juice094 merged 12 commits into main from docs/agents-trim
May 18, 2026

@juice094
Owner

Summary

This PR delivers the Phase 1 production-readiness hardening for devbase v0.20.0, spanning end-to-end testing, path privacy (RF-7), performance regression baselines, and Tantivy index consistency.

Changes

Testing (Wave 1-3)

  • Registry Wave 1: relation + health module coverage
  • Registry Wave 2: code-symbols, call-graph, dead-code coverage
  • MCP Scenario E2E: Claude onboarding (devkit_health → project_brief → query_repos) and semantic exploration (hybrid_search → project_context → vault_search) chains
  • Workflow E2E: MCP-tool-level DAG execution tests with Condition steps, verifying registration → run → status query full chain and ErrorPolicy::Fail propagation
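The failure-propagation behavior exercised by the workflow tests can be sketched roughly as follows. This is a simplified illustration, not devbase's workflow engine: it models a linear chain rather than a full DAG, and `Step`, `RunStatus`, `run_chain`, and the `ErrorPolicy::Continue` variant are hypothetical names introduced here. Only `ErrorPolicy::Fail` and the idea of Condition steps come from the PR.

```rust
/// How a failed step affects the overall run.
/// `Fail` is named in the PR; `Continue` is a hypothetical counterpart.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ErrorPolicy {
    Fail,
    Continue,
}

/// Hypothetical execution record status.
#[derive(Debug, PartialEq)]
enum RunStatus {
    Succeeded,
    Failed { step: String },
}

/// Hypothetical Condition step: succeeds when `condition` returns true.
struct Step {
    name: &'static str,
    condition: fn() -> bool,
    on_error: ErrorPolicy,
}

fn always() -> bool {
    true
}

fn never() -> bool {
    false
}

/// Runs steps in order. A failed condition under `ErrorPolicy::Fail`
/// stops the chain and is recorded as the run's final status.
fn run_chain(steps: &[Step]) -> RunStatus {
    for step in steps {
        let passed = (step.condition)();
        if !passed && step.on_error == ErrorPolicy::Fail {
            return RunStatus::Failed { step: step.name.to_string() };
        }
    }
    RunStatus::Succeeded
}
```

A status query after the run would then read back the stored `RunStatus`, which is the round-trip the tests above check at the MCP tool level.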

MCP Tooling

  • MCP oplog analytics: devkit_oplog_query gains analytics mode with latency percentiles (p50/p95/p99), tool breakdown, error classification, and success-rate metrics
  • Vault search fix: Replaced broken VaultNote deserialization with direct Value traversal, eliminating silent empty-result failures
  • 5 Stable Tool schemas frozen: devkit_health, devkit_project_brief, devkit_query_repos, devkit_hybrid_search, devkit_vault_search
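
For the oplog analytics mode, the p50/p95/p99 figures could be computed along these lines. This is a minimal sketch using the nearest-rank method; the PR does not say which percentile definition devkit_oplog_query uses, and `percentile` and `latency_summary` are hypothetical names.

```rust
/// Nearest-rank percentile over already-sorted latency samples (milliseconds).
/// Returns None for empty input or a percentile above 100.
fn percentile(sorted: &[u64], p: usize) -> Option<u64> {
    if sorted.is_empty() || p > 100 {
        return None;
    }
    // rank = ceil(p * n / 100), clamped to [1, n] so p = 0 maps to the minimum.
    let rank = (p * sorted.len() + 99) / 100;
    let rank = rank.clamp(1, sorted.len());
    Some(sorted[rank - 1])
}

/// Returns (p50, p95, p99) for unsorted samples, or None if there are none.
fn latency_summary(samples: &[u64]) -> Option<(u64, u64, u64)> {
    let mut sorted = samples.to_vec();
    sorted.sort_unstable();
    Some((
        percentile(&sorted, 50)?,
        percentile(&sorted, 95)?,
        percentile(&sorted, 99)?,
    ))
}
```

Nearest-rank always returns an observed sample rather than an interpolated value, which keeps the report easy to cross-check against raw NDJSON entries.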

Security / Privacy (RF-7)

  • Added sanitize_path() helper: replaces the dirs::home_dir() prefix with ~ and normalizes \ to /
  • Applied across project_context output fields: repo.path, modules.path, symbols.file, calls.caller_file, assets.path
  • 5 unit tests verify home-prefix replacement, separator normalization, and edge cases
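
The behavior described for sanitize_path() might look roughly like the sketch below. It is an assumption-laden illustration, not the PR's code: the home directory is passed as a parameter (the real helper reportedly reads dirs::home_dir()) so the example needs no external crate, and the component-boundary check is a guess at what the "edge cases" cover.

```rust
/// Hypothetical sketch: replaces a leading home-directory prefix with `~`
/// and normalizes `\` to `/`. `home` is passed explicitly for testability.
fn sanitize_path(path: &str, home: Option<&str>) -> String {
    // Normalize separators first so the prefix check is separator-agnostic.
    let normalized = path.replace('\\', "/");
    let Some(home) = home else {
        return normalized;
    };
    let home = home.replace('\\', "/");
    let home = home.trim_end_matches('/');
    if home.is_empty() {
        return normalized;
    }
    match normalized.strip_prefix(home) {
        // The path is the home directory itself.
        Some("") => "~".to_string(),
        // Replace only on a component boundary, so `/home/alice2` is not
        // treated as a child of `/home/alice`.
        Some(rest) if rest.starts_with('/') => format!("~{rest}"),
        _ => normalized,
    }
}
```

Under these assumptions, `sanitize_path("/home/alice/src/devbase", Some("/home/alice"))` yields `~/src/devbase`, and a Windows-style path is returned with forward slashes.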

Performance

  • Fixed keyword_search_latency_regression_* tests (missing signature column caused panic)
  • Profile-aware thresholds via cfg!(debug_assertions): release 1k<200ms/10k<500ms; debug relaxed to 1k<800ms/10k<2000ms to avoid false positives
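
A profile-aware threshold of this kind can be sketched as below. The millisecond values are the ones stated in this PR's commit log; the function names and the rule that any corpus larger than 1k uses the 10k budget are assumptions made for illustration.

```rust
use std::time::Duration;

/// Latency budget for a keyword search over `repo_count` repositories.
/// `debug_build` is a plain parameter so both profiles can be tested.
fn latency_budget(repo_count: usize, debug_build: bool) -> Duration {
    let millis = match (repo_count, debug_build) {
        (0..=1_000, false) => 200, // release, 1k corpus
        (_, false) => 500,         // release, 10k corpus
        (0..=1_000, true) => 800,  // debug, 1k corpus
        (_, true) => 2_000,        // debug, 10k corpus
    };
    Duration::from_millis(millis)
}

/// Picks the budget for the profile this binary was compiled with.
/// `cfg!(debug_assertions)` is true in debug builds and false in
/// default release builds.
fn current_budget(repo_count: usize) -> Duration {
    latency_budget(repo_count, cfg!(debug_assertions))
}
```

A regression test would then time the search and assert that the elapsed duration stays below `current_budget(n)`, so unoptimized local test runs do not fail spuriously while release CI keeps the strict limits.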

Storage / Index Consistency

  • Critical fix: AppContext::with_storage() now uses the actual storage backend's index_path for repair_tantivy_consistency_at and sync_index_to_db_at — previously hardcoded to DefaultStorageBackend, causing TempStorageBackend tests to check the wrong directory
  • repair_tantivy_consistency_at no longer returns early on Tantivy read failure; SQLite IDs are loaded first, and on failure missing_from_index = sqlite_ids.len() is reported
  • Added 3 tests: fresh workspace consistency, empty-index-with-DB-repos detection, AppContext index-path correctness
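
The corrected control flow in repair_tantivy_consistency_at (load SQLite IDs first, then treat an unreadable index as fully missing) can be illustrated with plain sets. This is a simplified sketch: `RepairReport`, `check_consistency`, and the `String` error type are hypothetical stand-ins, and no real Tantivy or SQLite calls are made.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified report; the PR's actual result type may differ.
#[derive(Debug, PartialEq)]
struct RepairReport {
    /// IDs present in SQLite but absent from the search index.
    missing_from_index: usize,
    /// IDs present in the index but absent from SQLite.
    orphaned_in_index: usize,
}

/// `sqlite_ids` is loaded before the index is touched, so it is still
/// available when reading the index fails.
fn check_consistency(
    sqlite_ids: &HashSet<String>,
    index_ids: Result<HashSet<String>, String>,
) -> RepairReport {
    match index_ids {
        Ok(index_ids) => RepairReport {
            missing_from_index: sqlite_ids.difference(&index_ids).count(),
            orphaned_in_index: index_ids.difference(sqlite_ids).count(),
        },
        // Unreadable index: report every SQLite row as missing instead of
        // returning early with a misleading zero.
        Err(_) => RepairReport {
            missing_from_index: sqlite_ids.len(),
            orphaned_in_index: 0,
        },
    }
}
```

Reporting `sqlite_ids.len()` on a read failure makes an empty or corrupt index visible to callers, which is the "empty-index-with-DB-repos" case the new tests cover.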

Sync / Registry

  • MANAGED_TAGS constant + RepoEntry::is_managed() for sync transparency
  • devkit_repo_status command with managed/unmanaged/dirty/behind/ahead counts
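
The MANAGED_TAGS / is_managed() pairing might be shaped like this. The tag value, the `RepoEntry` fields, and `count_managed` are hypothetical; the PR states only that the managed flag lives in repo tags and is exposed through `RepoEntry::is_managed()`.

```rust
/// Hypothetical tag values; the PR does not list the real contents.
const MANAGED_TAGS: &[&str] = &["managed"];

/// Hypothetical, trimmed-down registry entry.
struct RepoEntry {
    id: String,
    tags: Vec<String>,
}

impl RepoEntry {
    /// A repo is managed when it carries at least one managed tag.
    fn is_managed(&self) -> bool {
        self.tags.iter().any(|t| MANAGED_TAGS.contains(&t.as_str()))
    }
}

/// Returns (managed, unmanaged) counts, as a status summary might.
fn count_managed(repos: &[RepoEntry]) -> (usize, usize) {
    let managed = repos.iter().filter(|r| r.is_managed()).count();
    (managed, repos.len() - managed)
}
```

Centralizing the check in one method means sync code and status reporting cannot drift apart on what "managed" means.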

Test Plan

  • cargo test --lib --tests --bins --examples passes
  • cargo fmt --check clean
  • cargo clippy --all-targets -- -W warnings clean
  • Release-mode perf tests pass (1k < 200ms, 10k < 500ms)

Risk Assessment

  • Low risk: All changes are additive (tests, new functions) or bug fixes (vault_search, Tantivy path hardcoding). No breaking schema changes.
  • Medium attention: sanitize_path() affects all project_context consumers; paths now use ~ prefix and forward slashes.

🤖 Generated with Claude Code

juice094 and others added 12 commits May 16, 2026 22:50
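- AGENTS.md: 705→258 lines, keeping the core constraints (environment guidance / key conventions / security principles / architectural red lines / prohibited actions)
- docs/AGENTS-full.md: retains the full version (history / roadmap / detailed discussion)
- docs/production-readiness.md: new production-readiness checklist
  - 6 dimensions: stability / performance / MCP / Agent integration / documentation / release process
  - 4-phase rollout plan: Phase0 (current) → Phase1 (stabilization) → Phase2 (Agent pilot) → Phase3 (v1.0)
  - All items are objectively verifiable; subjective descriptions are prohibited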
- Add description fields to 12 crates missing them in Cargo.toml
- Remove dead code: sync_skills_to_clarity (Client-Agnostic violation),
  update_repo_last_synced_at, list_workspaces_by_tier
- Add SAFETY comments to 4 unsafe env var blocks in mcp/tests.rs
- RepairResult caller now logs orphan/missing counts; remove allow(dead_code)
- Remove unused FolderScheduler::new, add NOTE for retained dead_code

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- relation: add test_list_relations, test_find_related_entities_bidirectional,
  test_save_relation_upsert (was only 1 smoke test)
- health: add test_get_health_batch covering batch query, empty input,
  and partial miss scenarios
- 5 tested crates now at 88-97% region coverage

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Add MANAGED_TAGS constant and RepoEntry::is_managed() in registry.rs
  Document that managed lives in repo_tags (queryable) not metadata
- Replace inline MANAGED_TAGS check in sync/tasks.rs with repo.is_managed()
- Add devbase repo list: shows managed flag, type, tier, path
- Add devbase repo status: batch git health (ahead/behind/dirty/managed)
  with health cache TTL reuse and --json support
- Enhance sync output: categorize skipped repos by reason with counts

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Promote devkit_hybrid_search from Beta to Stable
- Add docs/reference/stable-tools/{health,project_brief,hybrid_search,
  vault_search,session_recall}.md with frozen input/output schemas,
  example requests/responses, and error catalogs
- Add stable-tools/README.md with stability guarantee contract
- Update mcp-tools.md cross-links and tier markings
- Add docs/clients/claude/scenarios.md with 5 usage scenarios

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- code-symbols: 9 tests covering query_all, type filter, name filter,
  file_path filter, combined filters, limit, cross-repo isolation,
  optional field preservation
- call-graph: 8 tests covering all_edges, callee/caller/file filters,
  combined filters, limit, cross-repo isolation
- dead-code: 6 tests covering include_pub/exclude_pub, caller exclusion,
  tests.rs exclusion, limit, empty repo
- All 3 crates raised from 0% to functional coverage

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Scenario validation tests for Claude onboarding and semantic code
exploration revealed a production-grade silent-failure bug in
devkit_vault_search: VaultNote deserialization from partial JSON
failed and was masked by unwrap_or_default(), causing all queries
to return empty results.

- Add append_mcp_oplog() NDJSON tracing for tool call latency
  and error classification
- Add seed_scenario_data() + two scenario integration tests
- Fix vault_search to operate on serde_json::Value directly,
  eliminating the deserialization trap

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Extend devkit_oplog_query with an `analytics` flag that reads
mcp-oplog.ndjson and returns statistical reports:
- tool call frequency and success rate per tool
- latency percentiles (P50, P95, P99)
- error classification breakdown
- time range coverage

Also fixes clippy lint issues in sort_by closures.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…rf baselines, Tantivy consistency

End-to-end workflow tests (Task #16):
- Add MCP-tool-level DAG workflow integration tests:
  test_workflow_run_dag_success: 3-step Condition chain via
  DevkitWorkflowRunTool -> DevkitWorkflowStatusTool round-trip
  test_workflow_run_failure_propagation: verifies ErrorPolicy::Fail
  status propagation to execution record

Performance regression baseline calibration (Task #17):
- Fix schema bug in perf tests (missing `signature` column caused panic)
- Add profile-aware thresholds via cfg!(debug_assertions):
  release 1k<200ms/10k<500ms; debug 1k<800ms/10k<2000ms
- Add latency eprintln reporting for CI observability

RF-7 path privacy fix (Task #18):
- Add sanitize_path() helper: replaces home dir prefix with ~,
  normalizes \ to /
- Apply across project_context output: repo.path, modules.path,
  symbols.file, calls.caller_file, assets.path
- Add 5 unit tests for path desensitization logic

Tantivy consistency hardening (Task #19):
- Fix AppContext to use actual storage backend's index_path for
  repair_tantivy_consistency_at and sync_index_to_db_at (was
  hardcoded to DefaultStorageBackend, breaking TempStorageBackend)
- Fix repair_tantivy_consistency_at early-return bug: now loads
  SQLite IDs first; on Tantivy read failure reports
  missing_from_index = sqlite_ids.len() instead of silently 0
- Add tests: fresh_workspace consistency, empty-index+DB-repos
  detection, AppContext correct index path verification

Formatting:
- Run cargo fmt across workspace to satisfy CI fmt --check

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The G5 RF-6 rule was flagging  in
as production code because the skip regex  did not match
 (no trailing slash).  in test modules is
idiomatic Rust and should not be flagged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The release workflow uses --locked which requires Cargo.lock
to be in sync with Cargo.toml version bumps.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@juice094 juice094 merged commit 77a79a8 into main May 18, 2026
8 of 9 checks passed
@juice094 juice094 deleted the docs/agents-trim branch May 18, 2026 01:31