
[BUG] Daemon mode runs semantic indexing before --index-commits check #473

@jsbattig


Bug Description

Daemon mode incorrectly runs semantic indexing (hashing all 1244 files) when the user specifies the --index-commits flag, which should perform ONLY temporal indexing of git commits. On this 1244-file repository that wastes an estimated 30-60 seconds, and the cost grows with repository size.

Environment

  • Version: feature/temporal-git-history branch (f93084c)
  • OS: Linux 5.14.0-570.55.1.el9_6.x86_64
  • Mode: Daemon mode (enabled via cidx config --daemon)
  • Repository: code-indexer project (1244 files)
  • Configuration: VoyageAI embedding provider, FilesystemVectorStore backend

Steps to Reproduce

  1. Enable daemon mode:

    cidx config --daemon
    cidx start
  2. Run temporal indexing with clear flag:

    cidx index --index-commits --all-branches --clear
  3. Observe output showing semantic indexing starting:

    🔧 Running in daemon mode
    ℹ️  🗑️ Cleared collection 'voyage-code-3' (collection was empty)
    ℹ️  🔍 Discovering files in repository...
    ℹ️  📁 Found 1244 files for indexing  # ← WRONG: Should skip semantic
    🔍 Hashing ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━ 68% • 849/1244 files
    

Expected Behavior

When --index-commits flag is specified:

  • Should skip semantic indexing entirely
  • Should ONLY index git commits into temporal collection
  • Output should show:
    🕒 Starting temporal git history indexing...
    Indexing ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% • N/N commits
    

Actual Behavior

  1. Daemon starts semantic indexing first (hashing all 1244 files)
  2. Semantic indexing is estimated to take 30-60 seconds to complete
  3. Only then, presumably, does it run temporal indexing (the user cancelled before this could be confirmed)
  4. The user waited ~18 seconds watching the semantic phase hash 849/1244 files before cancelling

Error Messages / Logs

User hit Ctrl+C during semantic hashing phase:

KeyboardInterrupt
🔍 Hashing ━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━ 68% • 0:00:18 • 0:00:07 • 849/1244 files

No other errors appeared; this is a logic-flow bug, not a crash.

Impact Assessment

Severity: High
Affected Users: All users running cidx index --index-commits in daemon mode
Frequency: Every temporal indexing operation in daemon mode
Workaround:

  • Option 1: Use standalone mode (cidx config --no-daemon)
  • Option 2: Let semantic indexing complete, then temporal runs (wastes time but works)
  • Option 3: Don't use --clear flag (but then old data persists)

Performance Impact:

  • Small repos (100 files): +5-10 seconds wasted
  • Medium repos (1000 files): +30-60 seconds wasted
  • Large repos (10K+ files): +5-10 minutes wasted
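As a rough cross-check, the hashing rate can be read off the log line captured when the user hit Ctrl+C (849 files in 18 seconds, about 47 files/s on the reporter's machine). The sketch below extrapolates that rate linearly. It covers only the hashing phase, so its figures are a lower bound on the wasted time, not a replacement for the estimates listed. For the 1244-file repository it gives about 26 seconds, consistent with the log's 18 s elapsed plus 7 s remaining, and roughly 3.5 minutes for 10K files.

```python
# Rough, linear extrapolation of the hashing phase only, based on the log line
# "68% • 0:00:18 • 0:00:07 • 849/1244 files" (reporter's machine).
ELAPSED_S = 18
FILES_HASHED = 849
RATE = FILES_HASHED / ELAPSED_S  # about 47 files/s


def hashing_seconds(n_files: int) -> float:
    """Estimated seconds spent hashing n_files at the observed rate."""
    return n_files / RATE


for n in (100, 1244, 10_000):
    print(f"{n:>6} files: ~{hashing_seconds(n):.0f} s of hashing")
```

Real throughput depends on disk, CPU and file sizes, so treat these numbers as order-of-magnitude only.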

Root Cause Analysis

File: src/code_indexer/daemon/service.py
Method: exposed_index_blocking() (lines 370-556)

Problem: Temporal indexing check happens AFTER SmartIndexer initialization

Code Flow:

def exposed_index_blocking(self, project_path: str, callback=None, **kwargs):
    # Lines 400-418: Creates SmartIndexer for SEMANTIC indexing
    indexer = SmartIndexer(config, embedding_provider, vector_store_client, metadata_path)

    # Line 463: Checks for temporal (TOO LATE - semantic already initialized)
    if kwargs.get("index_commits", False):
        # Temporal indexing mode
        temporal_indexer = TemporalIndexer(...)
        result = temporal_indexer.index_commits(...)
    else:
        # Standard workspace indexing (SEMANTIC)
        stats = indexer.smart_index(...)  # ← Runs semantic

Why It Happens:

  1. The method initializes SmartIndexer unconditionally (line 400)
  2. SmartIndexer initialization triggers file discovery (1244 files found)
  3. Only then does it check index_commits (line 463)
  4. By that point the semantic infrastructure is already built and file hashing has started

Confirmed Root Cause: Logic-ordering bug. The temporal check must come FIRST, before any semantic initialization.
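The ordering problem can be reproduced in miniature. The sketch below uses hypothetical stand-ins (FakeSmartIndexer, fake_temporal_index) rather than the real classes, and assumes, as the analysis above states, that merely constructing the semantic indexer triggers file discovery. Building the indexer before branching records a discovery event even when index_commits=True; checking the flag first does not.

```python
# Toy model of the ordering bug; all names here are hypothetical stand-ins.
events = []  # records which phases ran, in order


class FakeSmartIndexer:
    """Stand-in for SmartIndexer. Assumes construction alone triggers discovery."""

    def __init__(self):
        events.append("semantic:discover")

    def smart_index(self):
        events.append("semantic:index")


def fake_temporal_index():
    events.append("temporal:index")


def index_blocking_buggy(**kwargs):
    indexer = FakeSmartIndexer()  # built unconditionally: the bug
    if kwargs.get("index_commits", False):
        fake_temporal_index()
    else:
        indexer.smart_index()


def index_blocking_fixed(**kwargs):
    if kwargs.get("index_commits", False):  # temporal check comes first
        fake_temporal_index()
        return
    FakeSmartIndexer().smart_index()  # semantic path only when not temporal


index_blocking_buggy(index_commits=True)
print(events)  # ['semantic:discover', 'temporal:index']
events.clear()
index_blocking_fixed(index_commits=True)
print(events)  # ['temporal:index']
```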

Fix Implementation

Changes Required

  • Move the temporal check to line 391 (immediately after the logger call, before any initialization)
  • Create early return path for temporal-only indexing
  • Only initialize SmartIndexer if NOT temporal mode
  • Add test case for daemon temporal indexing (no semantic)

Testing Required

  • Unit test: test_daemon_temporal_skips_semantic()
  • Integration test: E2E daemon temporal indexing validates no semantic
  • Regression test: Ensure semantic still works when NOT using --index-commits
  • Manual validation: Run cidx index --index-commits in daemon, confirm NO hashing phase
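A minimal sketch of the unit and regression tests listed above, using unittest.mock. Because the real exposed_index_blocking() constructs SmartIndexer and TemporalIndexer internally, this sketch assumes a hypothetical DaemonServiceSketch with injected factories; against the real service one would instead patch SmartIndexer and TemporalIndexer where the service looks them up. The first test asserts that no semantic indexer is ever constructed in temporal mode; the second guards the regression case.

```python
import unittest
from unittest import mock


class DaemonServiceSketch:
    """Hypothetical, dependency-injected stand-in for the daemon service.

    Injecting factories is an assumption made for testability; the real
    service builds its indexers itself.
    """

    def __init__(self, smart_indexer_factory, temporal_indexer_factory):
        self._make_smart = smart_indexer_factory
        self._make_temporal = temporal_indexer_factory

    def exposed_index_blocking(self, project_path, callback=None, **kwargs):
        if kwargs.get("index_commits", False):  # temporal check first
            temporal = self._make_temporal(project_path)
            try:
                temporal.index_commits(
                    all_branches=kwargs.get("all_branches", False),
                    progress_callback=callback,
                )
            finally:
                temporal.close()
            return {"status": "completed"}
        self._make_smart(project_path).smart_index()
        return {"status": "completed"}


class DaemonTemporalTests(unittest.TestCase):
    def setUp(self):
        self.smart_factory = mock.MagicMock()
        self.temporal_factory = mock.MagicMock()
        self.service = DaemonServiceSketch(self.smart_factory, self.temporal_factory)

    def test_daemon_temporal_skips_semantic(self):
        self.service.exposed_index_blocking("/repo", index_commits=True, all_branches=True)
        self.smart_factory.assert_not_called()  # no semantic indexer at all
        temporal = self.temporal_factory.return_value
        temporal.index_commits.assert_called_once_with(all_branches=True, progress_callback=None)
        temporal.close.assert_called_once()

    def test_semantic_still_runs_without_index_commits(self):
        self.service.exposed_index_blocking("/repo")
        self.smart_factory.return_value.smart_index.assert_called_once()
        self.temporal_factory.assert_not_called()
```

Run it with `python -m unittest` pointed at the file containing the tests.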

Implementation Status

  • [x] Root cause identified
  • [ ] Core fix implemented
  • [ ] Tests added and passing
  • [ ] Code review approved
  • [ ] Manual testing completed
  • [ ] Documentation updated

Completion: 1/6 tasks complete (17%)

Proposed Fix

from pathlib import Path  # module level in service.py, if not already imported

def exposed_index_blocking(self, project_path: str, callback=None, **kwargs):
    logger.info(f"exposed_index_blocking: project={project_path}")

    # CHECK TEMPORAL FIRST (before any initialization)
    if kwargs.get("index_commits", False):
        # TEMPORAL-ONLY PATH: skip all semantic infrastructure
        from code_indexer.services.temporal.temporal_indexer import TemporalIndexer
        from code_indexer.storage.filesystem_vector_store import FilesystemVectorStore
        from code_indexer.config import ConfigManager

        project_root = Path(project_path)
        config_manager = ConfigManager.create_with_backtrack(project_root)
        index_dir = project_root / ".code-indexer" / "index"
        vector_store = FilesystemVectorStore(base_path=index_dir, project_root=project_root)

        temporal_indexer = TemporalIndexer(config_manager, vector_store)
        try:
            result = temporal_indexer.index_commits(
                all_branches=kwargs.get("all_branches", False),
                max_commits=kwargs.get("max_commits"),
                since_date=kwargs.get("since_date"),
                progress_callback=callback,
            )
        finally:
            # Release indexer resources even if indexing fails or is interrupted
            temporal_indexer.close()

        # Invalidate temporal cache
        with self.cache_lock:
            if self.cache_entry:
                self.cache_entry.invalidate_temporal()

        return {"status": "completed", "stats": {...}}

    # SEMANTIC PATH (only reached when NOT in temporal mode)
    # ... existing SmartIndexer code from line 400 onwards ...

Verification Evidence

[To be added after fix implementation]


Reported By: User (jsbattig)
Assigned To: [Unassigned]
Found In: f93084c (feature/temporal-git-history branch)
Fixed In: [To be updated]
