fix(tests): eliminate flaky tests via hermetic StorageBackend #4
Merged
Root cause: the DEVBASE_DATA_DIR environment variable is process-global; parallel tests overwrite each other's value, causing PermissionDenied and assertion failures.

Production-grade fix:
- Extract TempStorageBackend as pub(crate) test helper (isolated temp dirs)
- Add explicit-path APIs to search module: init_index_at, list_indexed_repo_ids_at, sync_index_to_db_at, index_is_empty_at, search_repos_at
- Add repair_tantivy_consistency_at to storage module
- Add nl_filter_repos_at to repo tool
- Refactor vault tools to pre-compute vault_dir from ctx.storage before spawn_blocking
- Migrate all 6 flaky tests to TempStorageBackend + AppContext::with_storage() without any environment variable mutations

Flaky tests fixed:
- test_nl_filter_repos_tantivy_finds_devbase
- test_vault_graph_basic
- test_vault_graph_filtered_by_repo
- test_workflow_run_not_found
- test_sync_index_to_db_removes_orphans
- test_repair_tantivy_consistency_detects_orphan

Parallel test execution (--test-threads=4) now stable: 390 passed, 0 failed.
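The injection approach described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `StorageBackend` trait shape, the `data_dir` method, the directory naming scheme, and the fields of `AppContext` are all assumptions made for the example. Only the names `TempStorageBackend` and `AppContext::with_storage` come from the commit message. The sketch uses the standard library only, so cleanup is done by hand in `Drop`.

```rust
use std::fs;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the project's storage abstraction.
pub trait StorageBackend {
    /// Root directory under which all data for this backend lives.
    fn data_dir(&self) -> &Path;
}

/// Counter that keeps directory names unique within one test process.
static NEXT_ID: AtomicU64 = AtomicU64::new(0);

/// Test helper that owns a private directory and deletes it on drop.
pub struct TempStorageBackend {
    dir: PathBuf,
}

impl TempStorageBackend {
    pub fn new() -> std::io::Result<Self> {
        let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
        // Assumed naming scheme: process id plus a per-process counter.
        let dir = std::env::temp_dir()
            .join(format!("devbase-test-{}-{}", std::process::id(), id));
        fs::create_dir_all(&dir)?;
        Ok(Self { dir })
    }
}

impl StorageBackend for TempStorageBackend {
    fn data_dir(&self) -> &Path {
        &self.dir
    }
}

impl Drop for TempStorageBackend {
    fn drop(&mut self) {
        // Best-effort cleanup; errors are ignored on purpose.
        let _ = fs::remove_dir_all(&self.dir);
    }
}

/// Hypothetical context type: storage is injected, never read from the environment.
pub struct AppContext {
    pub storage: Box<dyn StorageBackend + Send + Sync>,
}

impl AppContext {
    pub fn with_storage(storage: Box<dyn StorageBackend + Send + Sync>) -> Self {
        Self { storage }
    }
}
```

Because each test builds its own context, two tests running on different threads never share a data directory, and nothing in the test touches process-global state.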
cc7d375 to 1f2853e
- scan.rs: canonicalize root and ignored_dirs to fix path comparison on Windows, where TempDir returns short (8.3) filenames while dunce::canonicalize expands them, causing starts_with to fail.
- sync/tests.rs: replace repo.signature() with explicit git2::Signature::now() to avoid failure on CI runners without global git config (user.name / user.email).
- mcp/tests.rs: migrate test_ctx() from DEVBASE_DATA_DIR env var to TempStorageBackend injection, eliminating global state pollution that caused flaky test_tools_call_devkit_query under --test-threads=4.
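The scan.rs fix comes down to canonicalizing both sides before calling `starts_with`. Below is a minimal sketch of that idea, not the code from the PR. The function name `is_ignored` and its signature are hypothetical, and it uses `std::fs`-backed `Path::canonicalize` rather than `dunce::canonicalize` so that it needs no external crate. On Windows the standard library returns `\\?\`-prefixed paths, which is harmless here because both operands go through the same call.

```rust
use std::io;
use std::path::{Path, PathBuf};

/// Returns true when `candidate` lies under any directory in `ignored_dirs`.
///
/// Both the candidate and each ignored directory are canonicalized first,
/// so two different spellings of the same location compare equal.
/// Hypothetical helper: name and signature are not taken from the PR.
pub fn is_ignored(candidate: &Path, ignored_dirs: &[PathBuf]) -> io::Result<bool> {
    let candidate = candidate.canonicalize()?;
    for dir in ignored_dirs {
        let dir = dir.canonicalize()?;
        // Path::starts_with compares whole components, not raw bytes.
        if candidate.starts_with(&dir) {
            return Ok(true);
        }
    }
    Ok(false)
}
```

Note that canonicalization requires the paths to exist, so a caller scanning a live directory tree would likely canonicalize the root and the ignore list once up front rather than per entry.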
Replace all 7 occurrences of std::env::set_var with TempStorageBackend injection or direct tmp path usage across vault.rs, workflow.rs, executor.rs, state.rs. Enforces RF-2.1.
- Replace LICENSE with AGPL-3.0 full text
- Add LICENSE-COMMERCIAL.md for commercial licensing inquiries
- Update Cargo.toml: license = AGPL-3.0-or-later
- Update README.md badge + dual license notice in Chinese
- Update AGENTS.md with licensing policy section
cargo test --all-targets passes --test-threads to bench binaries, which criterion does not accept. Use --lib --tests --bins --examples instead.
test_run_skill_success intermittently times out on Windows CI runners under load. 5s is too tight for process spawn + Python interpreter startup on shared runners.
Problem
CI Test job fails intermittently on Windows due to 6 flaky tests caused by DEVBASE_DATA_DIR environment-variable races under parallel execution (--test-threads=4).
Root Cause
DEVBASE_DATA_DIR is process-global. Parallel tests overwrite each other's value, causing PermissionDenied errors and assertion failures.
Fix
Production-grade injection architecture:
- Add repair_tantivy_consistency_at to storage module
- Add nl_filter_repos_at to repo tool
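The `_at` suffix convention can be illustrated with a small sketch. The name `index_is_empty_at` appears in this PR, but everything else below is an assumption: the body is a plain directory-emptiness check standing in for whatever the real Tantivy-backed function does, and the environment-reading wrapper `index_is_empty`, the `index` subdirectory, and the `.` fallback are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Explicit-path variant: every input arrives as an argument,
/// so parallel tests can each pass their own directory.
/// Stand-in logic: treats a missing or empty directory as an empty index.
pub fn index_is_empty_at(index_dir: &Path) -> io::Result<bool> {
    match fs::read_dir(index_dir) {
        Ok(mut entries) => Ok(entries.next().is_none()),
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}

/// Hypothetical convenience wrapper for production callers.
/// It is the only place that reads the process-global variable,
/// and it delegates straight to the explicit-path variant.
pub fn index_is_empty() -> io::Result<bool> {
    let base = std::env::var_os("DEVBASE_DATA_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from(".")); // assumed fallback
    index_is_empty_at(&base.join("index")) // assumed subdirectory name
}
```

Tests would call only the `_at` variant with a directory they own, which keeps them independent of whatever DEVBASE_DATA_DIR happens to hold at the time.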
Verification
```
cargo test --workspace -- --test-threads=4
result: ok. 390 passed; 0 failed; 3 ignored
```
Scope