Description
Bug Description
Daemon mode incorrectly runs semantic indexing (hashing 1244 files) when the user specifies the --index-commits flag, which should ONLY perform temporal indexing of git commits. This wastes 30-60 seconds on a repository of this size, and longer on larger ones.
Environment
- Version: feature/temporal-git-history branch (f93084c)
- OS: Linux 5.14.0-570.55.1.el9_6.x86_64
- Mode: Daemon mode (enabled via cidx config --daemon)
- Repository: code-indexer project (1244 files)
- Configuration: VoyageAI embedding provider, FilesystemVectorStore backend
Steps to Reproduce
- Enable daemon mode:
    cidx config --daemon
    cidx start
- Run temporal indexing with clear flag:
    cidx index --index-commits --all-branches --clear
- Observe output showing semantic indexing starting:
    🔧 Running in daemon mode
    ℹ️ 🗑️ Cleared collection 'voyage-code-3' (collection was empty)
    ℹ️ 🔍 Discovering files in repository...
    ℹ️ 📁 Found 1244 files for indexing # ← WRONG: Should skip semantic
    🔍 Hashing ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━ 68% • 849/1244 files
Expected Behavior
When --index-commits flag is specified:
- Should skip semantic indexing entirely
- Should ONLY index git commits into temporal collection
- Output should show:
    🕒 Starting temporal git history indexing...
    Indexing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • N/N commits
Actual Behavior
- Daemon starts semantic indexing first (hashing all 1244 files)
- This takes 30-60 seconds to complete
- THEN it presumably runs temporal indexing (user cancelled before completion)
- User waited ~18 seconds watching semantic hash 849/1244 files before cancelling
Error Messages / Logs
User hit Ctrl+C during semantic hashing phase:
KeyboardInterrupt
🔍 Hashing ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━ 68% • 0:00:18 • 0:00:07 • 849/1244 files
No other errors (bug is logic flow issue, not crash).
Impact Assessment
Severity: High
Affected Users: All users running cidx index --index-commits in daemon mode
Frequency: Every temporal indexing operation in daemon mode
Workaround:
- Option 1: Use standalone mode (cidx config --no-daemon)
- Option 2: Let semantic indexing complete, then temporal runs (wastes time but works)
- Option 3: Don't use the --clear flag (but then old data persists)
Performance Impact:
- Small repos (100 files): +5-10 seconds wasted
- Medium repos (1000 files): +30-60 seconds wasted
- Large repos (10K+ files): +5-10 minutes wasted
Root Cause Analysis
File: src/code_indexer/daemon/service.py
Method: exposed_index_blocking() (lines 370-556)
Problem: Temporal indexing check happens AFTER SmartIndexer initialization
Code Flow:
    def exposed_index_blocking(self, project_path: str, callback=None, **kwargs):
        # Line 400-418: Creates SmartIndexer for SEMANTIC indexing
        indexer = SmartIndexer(config, embedding_provider, vector_store_client, metadata_path)
        # Line 463: Checks for temporal (TOO LATE - semantic already initialized)
        if kwargs.get("index_commits", False):
            # Temporal indexing mode
            temporal_indexer = TemporalIndexer(...)
            result = temporal_indexer.index_commits(...)
        else:
            # Standard workspace indexing (SEMANTIC)
            stats = indexer.smart_index(...)  # ← Runs semantic
Why It Happens:
- Method initializes SmartIndexer unconditionally (line 400)
- SmartIndexer initialization triggers file discovery (1244 files found)
- Then it checks if index_commits (line 463)
- By then, semantic infrastructure already built and file hashing started
Confirmed Root Cause: Logic ordering bug - temporal check must be FIRST, before any semantic initialization.
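The ordering problem can be reproduced in isolation. The sketch below is a toy model, not the daemon code: FakeSmartIndexer, FakeTemporalIndexer, and both index_blocking_* functions are hypothetical stand-ins, and it assumes (per the analysis above) that constructing the semantic indexer is what triggers file discovery.

```python
# Toy model of the ordering bug. All names here are hypothetical stand-ins,
# not the real code_indexer classes.
events = []


class FakeSmartIndexer:
    def __init__(self):
        # Assumption from the analysis above: construction alone
        # kicks off file discovery.
        events.append("semantic: discover files")

    def smart_index(self):
        events.append("semantic: index")


class FakeTemporalIndexer:
    def index_commits(self):
        events.append("temporal: index commits")


def index_blocking_buggy(**kwargs):
    # Semantic indexer is built unconditionally, before the flag is read.
    indexer = FakeSmartIndexer()
    if kwargs.get("index_commits", False):
        FakeTemporalIndexer().index_commits()
    else:
        indexer.smart_index()


def index_blocking_fixed(**kwargs):
    # Flag is checked first; the temporal path returns early.
    if kwargs.get("index_commits", False):
        FakeTemporalIndexer().index_commits()
        return
    FakeSmartIndexer().smart_index()


index_blocking_buggy(index_commits=True)
buggy_trace = list(events)
events.clear()

index_blocking_fixed(index_commits=True)
fixed_trace = list(events)

print(buggy_trace)  # semantic discovery appears before the temporal step
print(fixed_trace)  # only the temporal step appears
```

In the buggy trace the semantic discovery step is recorded even though only temporal indexing was requested, which matches the hashing phase seen in the reproduction output.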
Fix Implementation
Changes Required
- Move temporal check to line 391 (immediately after logger, before any initialization)
- Create early return path for temporal-only indexing
- Only initialize SmartIndexer if NOT temporal mode
- Add test case for daemon temporal indexing (no semantic)
Testing Required
- Unit test: test_daemon_temporal_skips_semantic()
- Integration test: E2E daemon temporal indexing validates no semantic
- Regression test: Ensure semantic still works when NOT using --index-commits
- Manual validation: Run cidx index --index-commits in daemon, confirm NO hashing phase
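One possible shape for the unit and regression tests, sketched against a toy service rather than the real daemon. ToyDaemonService, its injectable factories, and the "mode" key in the result are assumptions made for illustration; the real service constructs its indexers internally, so an actual test would more likely patch those constructors with unittest.mock.patch.

```python
from unittest.mock import MagicMock


class ToyDaemonService:
    """Hypothetical stand-in for the daemon service, with injectable factories."""

    def __init__(self, make_semantic, make_temporal):
        self._make_semantic = make_semantic
        self._make_temporal = make_temporal

    def exposed_index_blocking(self, project_path, callback=None, **kwargs):
        # Temporal check comes first, before any semantic setup.
        if kwargs.get("index_commits", False):
            temporal = self._make_temporal(project_path)
            try:
                temporal.index_commits(
                    all_branches=kwargs.get("all_branches", False),
                    progress_callback=callback,
                )
            finally:
                # Release resources even if indexing raises.
                temporal.close()
            return {"status": "completed", "mode": "temporal"}
        self._make_semantic(project_path).smart_index()
        return {"status": "completed", "mode": "semantic"}


def test_daemon_temporal_skips_semantic():
    make_semantic, make_temporal = MagicMock(), MagicMock()
    service = ToyDaemonService(make_semantic, make_temporal)

    result = service.exposed_index_blocking(
        "/repo", index_commits=True, all_branches=True
    )

    # The semantic factory must never be touched in temporal mode.
    make_semantic.assert_not_called()
    make_temporal.return_value.index_commits.assert_called_once_with(
        all_branches=True, progress_callback=None
    )
    make_temporal.return_value.close.assert_called_once()
    assert result["mode"] == "temporal"


def test_semantic_still_runs_without_flag():
    make_semantic, make_temporal = MagicMock(), MagicMock()
    service = ToyDaemonService(make_semantic, make_temporal)

    result = service.exposed_index_blocking("/repo")

    # Regression guard: the default path is unchanged.
    make_temporal.assert_not_called()
    make_semantic.return_value.smart_index.assert_called_once()
    assert result["mode"] == "semantic"
```

Asserting that the semantic factory was never called checks the property this bug violates directly, rather than inferring it from output or timing.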
Implementation Status
- [x] Root cause identified
- [ ] Core fix implemented
- [ ] Tests added and passing
- [ ] Code review approved
- [ ] Manual testing completed
- [ ] Documentation updated
Completion: 1/6 tasks complete (17%)
Proposed Fix
    def exposed_index_blocking(self, project_path: str, callback=None, **kwargs):
        logger.info(f"exposed_index_blocking: project={project_path}")
        # CHECK TEMPORAL FIRST (before any initialization)
        if kwargs.get("index_commits", False):
            # TEMPORAL-ONLY PATH - skip all semantic infrastructure
            from code_indexer.services.temporal.temporal_indexer import TemporalIndexer
            from code_indexer.storage.filesystem_vector_store import FilesystemVectorStore
            from code_indexer.config import ConfigManager
            config_manager = ConfigManager.create_with_backtrack(Path(project_path))
            index_dir = Path(project_path) / ".code-indexer" / "index"
            vector_store = FilesystemVectorStore(base_path=index_dir, project_root=Path(project_path))
            temporal_indexer = TemporalIndexer(config_manager, vector_store)
            result = temporal_indexer.index_commits(
                all_branches=kwargs.get("all_branches", False),
                max_commits=kwargs.get("max_commits"),
                since_date=kwargs.get("since_date"),
                progress_callback=callback,
            )
            temporal_indexer.close()
            # Invalidate temporal cache
            with self.cache_lock:
                if self.cache_entry:
                    self.cache_entry.invalidate_temporal()
            return {"status": "completed", "stats": {...}}
        # SEMANTIC PATH (only if NOT temporal)
        # ... existing SmartIndexer code from line 400 onwards ...
Verification Evidence
[To be added after fix implementation]
Reported By: User (jsbattig)
Assigned To: [Unassigned]
Found In: f93084c (feature/temporal-git-history branch)
Fixed In: [To be updated]