
[EPIC] Daemon Temporal Watch Integration #19

@jsbattig

Epic: Daemon Temporal Watch Integration

Epic Completion Status: ⏳ PENDING

Story Completion: 0/3 stories complete

COMPLETED STORIES:

  • None

IN PROGRESS:

  • None

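PENDING:

  • Enable Temporal Queries in Daemon Mode with mmap Cache
  • Watch Mode Auto-Updates All Indexes Including Temporal with Git Commit Detection
  • Efficient Unindexed Commit Detection
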
Overall Progress: 0% complete


Main Intent

Enable full daemon-mode support for temporal (git history) queries and integrate temporal indexing into watch mode with automatic git commit detection, so that temporal searches get the same fast, cached query experience as HEAD collection queries.

To do this, the epic extends CIDX's daemon to serve temporal queries from the same caching infrastructure used for the HEAD collection, and detects commits in watch mode by monitoring git refs via inotify.

Conversation Context

User Requirements:

  1. Temporal Query Daemon Support (MANDATORY):

    • User: "for indexing, I want the same reporting experience on the CLI as it happens with regular indexing or standalone operation"
    • User: "make a plan to enable querying for temporal indexes in daemon mode"
    • User decision: "1. Mandatory"
    • Current blocker: cli.py:4708-4710 prevents daemon delegation when time_range is set
  2. Watch Mode Auto-Detection:

    • User: "review watch behavior, we need to add a parameter to auto update FTS and Temporal indexes, incrementally"
    • User decision: "3. C, auto-detect based on all indexes we have. keep everything updated. easier to the user"
    • Requirement: cidx watch automatically detects ALL existing indexes (semantic, FTS, temporal)
  3. Git Commit Detection via Inotify:

    • User: "can you detect a commit by a change in some file in .git folder with the same inode technique we have now?"
    • Evidence: .git/refs/heads/<branch> changes inode on every commit
    • User decision: "5. No hooks. Explore inode git file change detection and if that won't work, use polling"
  4. Temporal Watch - Current Branch Only:

    • User decision: "4. current branch. You need to ensure that temporal index can 'catch up' if the user changes branch"
    • Requirement: Incremental indexing when branch switches
  5. Identical HNSW mmap Caching:

    • User: "daemon uses mmap for semantic hnsw indexes. I want the same exact approach for temporal indexes"
    • Requirement: Temporal collection uses IDENTICAL HNSWIndexManager.load_index()
  6. JSON-Based Metadata:

    • Context: User previously removed SQLite completely
    • Current: temporal_progress.json tracks completed commits
    • Requirement: NO database queries, all JSON-based

System Architecture

Current Daemon Architecture (HEAD Collection)

CLI Query Request
    ↓
cli.py checks daemon_config.enabled
    ↓
Daemon RPC Call: exposed_query(query, filters...)
    ↓
CIDXDaemonService.exposed_query()
    ↓
Cache Check: cache_entry.hnsw_index exists?
    ├─ YES: Use cached mmap HNSW index (<5ms query)
    └─ NO: Load from disk via HNSWIndexManager.load_index()
              ↓
         Cache in CacheEntry (hnsw_index, id_mapping)
              ↓
         Query cached index

New Temporal Architecture (This Epic)

CLI Temporal Query Request (--time-range)
    ↓
cli.py checks daemon_config.enabled AND time_range
    ↓
[STORY 1.3] Remove blocking at cli.py:4710
    ↓
Daemon RPC Call: exposed_query_temporal(query, time_range, filters...)
    ↓
[STORY 1.2] CIDXDaemonService.exposed_query_temporal()
    ↓
[STORY 1.1] Cache Check: cache_entry.temporal_hnsw_index exists?
    ├─ YES: Use cached mmap HNSW index for temporal collection
    └─ NO: Load from disk via HNSWIndexManager.load_index()
              ↓
         Cache in CacheEntry (temporal_hnsw_index, temporal_id_mapping)
              ↓
         Query cached temporal index with time-range filtering
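
The cache check in the flow above can be sketched as follows. This is a minimal illustration, not the daemon's actual code: CacheEntrySketch, get_temporal_index, and the injected loader callable are hypothetical names, and the loader stands in for HNSWIndexManager.load_index(), whose exact signature is not shown in this epic.

```python
import threading
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Tuple


class CacheEntrySketch:
    """Hypothetical stand-in for CacheEntry; only the temporal fields are shown."""

    def __init__(self) -> None:
        self.temporal_hnsw_index: Optional[Any] = None
        self.temporal_id_mapping: Optional[Dict[str, Any]] = None
        self.lock = threading.Lock()  # stands in for the daemon's cache_lock


def get_temporal_index(
    entry: CacheEntrySketch,
    collection_path: Path,
    loader: Callable[[Path], Tuple[Any, Dict[str, Any]]],
) -> Tuple[Any, Dict[str, Any]]:
    """Return the cached temporal index, loading it once on a cache miss."""
    with entry.lock:
        if entry.temporal_hnsw_index is None:
            # Cache miss: load from disk. In the epic this step is
            # HNSWIndexManager.load_index(); here it is an injected callable.
            index, id_mapping = loader(collection_path)
            entry.temporal_hnsw_index = index
            entry.temporal_id_mapping = id_mapping
        return entry.temporal_hnsw_index, entry.temporal_id_mapping or {}
```

Injecting the loader keeps the sketch runnable without the real index manager, and it makes the "load once, then reuse" behavior easy to check in a unit test by counting loader calls.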

Watch Mode Architecture (Enhanced)

cidx watch (no flags)
    ↓
[STORY 2.1] Auto-detect existing indexes:
    - .code-indexer/index/code-indexer-HEAD/ → Semantic watch
    - .code-indexer/index/tantivy-fts/ → FTS watch
    - .code-indexer/index/code-indexer-temporal/ → Temporal watch
    ↓
Start multi-index watch handlers:
    ├─ SemanticWatchHandler (existing)
    ├─ FTSWatchHandler (existing, Story 02_Story_RealTimeFTSMaintenance.md)
    └─ [STORY 2.2, 2.3] TemporalWatchHandler (NEW)
           ↓
        Watch .git/refs/heads/<current_branch> via inotify
           ↓
        On inode change (commit detected):
           ↓
        [STORY 2.3] Run incremental temporal indexing:
           - Load temporal_progress.json
           - Get new commits since last indexed
           - Index only new commits
           - Update temporal_progress.json
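
A minimal sketch of the auto-detection step, assuming an index "exists" when its directory is present. The directory names come from the diagram above; detect_existing_indexes and the INDEX_DIRS mapping are hypothetical names used only for illustration.

```python
from pathlib import Path
from typing import List

# Directory names are taken from the watch-mode diagram; the mapping is a sketch.
INDEX_DIRS = {
    "semantic": "code-indexer-HEAD",
    "fts": "tantivy-fts",
    "temporal": "code-indexer-temporal",
}


def detect_existing_indexes(project_root: Path) -> List[str]:
    """Return the index kinds whose directory exists under .code-indexer/index/."""
    index_root = project_root / ".code-indexer" / "index"
    return [kind for kind, name in INDEX_DIRS.items() if (index_root / name).is_dir()]
```

Watch mode could then start one handler per returned kind, so a project with no temporal index never starts temporal watching.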

Branch Switch Detection (Story 3.1)

Watch Mode Active
    ↓
Detect branch switch:
    - .git/HEAD file change (ref: refs/heads/new-branch)
    ↓
[STORY 3.1] Load temporal_progress.json
    ↓
Get all commits in new branch: git rev-list new-branch
    ↓
Build in-memory set of completed commits (O(1) lookup)
    ↓
Filter out indexed commits → unindexed commits list
    ↓
[STORY 3.2] Index unindexed commits incrementally
    ↓
Update temporal_progress.json
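
The filtering step above can be sketched as below. The schema of temporal_progress.json is not shown in this epic, so the "completed_commits" key is an assumption, and both function names are hypothetical. The branch_commits argument would be the hashes printed by git rev-list for the new branch.

```python
import json
from pathlib import Path
from typing import Iterable, List, Set


def load_completed_commits(progress_file: Path) -> Set[str]:
    """Read completed commit hashes; the 'completed_commits' key is an assumed schema."""
    if not progress_file.exists():
        return set()
    data = json.loads(progress_file.read_text(encoding="utf-8"))
    return set(data.get("completed_commits", []))


def filter_unindexed(branch_commits: Iterable[str], completed: Set[str]) -> List[str]:
    """Keep commits that are not yet indexed, preserving the input order."""
    # Set membership is O(1) on average, so the whole pass is linear in the
    # number of commits on the branch.
    return [commit for commit in branch_commits if commit not in completed]
```

Because the completed set is built once per branch switch, a branch with thousands of commits costs one pass over the rev-list output rather than a lookup per commit against the file.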

Technology Stack

Existing Components (Reuse):

  • HNSWIndexManager.load_index() - mmap HNSW loading (identical for temporal)
  • CacheEntry class - Cache structure (extend with temporal fields)
  • BackgroundIndexRebuilder - Atomic HNSW updates (Story 0 pattern)
  • RichLiveProgressManager - Progress display for indexing
  • FilesystemVectorStore - Vector storage backend
  • TemporalIndexer - Temporal indexing logic
  • TemporalSearchService - Temporal search with time-range filtering
  • temporal_progress.json - JSON metadata tracking

New Components (This Epic):

  • exposed_query_temporal() RPC method in CIDXDaemonService
  • TemporalWatchHandler - Git refs inotify monitoring + incremental indexing
  • Cache fields: temporal_hnsw_index, temporal_id_mapping in CacheEntry
  • Branch switch detection in watch mode

Features and Implementation Order

Feature 1: Temporal Query Daemon Support

Objective: Enable temporal queries in daemon mode with identical mmap caching to HEAD collection

Stories:

  1. Enable Temporal Queries in Daemon Mode with mmap Cache - Complete vertical slice: extend CacheEntry with temporal HNSW cache using identical mmap mechanism, implement exposed_query_temporal() RPC method with time-range filtering, and wire CLI delegation to enable daemon-based temporal queries with sub-5ms cached performance

Value Delivered: Users get sub-5ms temporal queries via daemon cache (same experience as HEAD queries)
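
Time-range filtering needs the time_range string turned into concrete dates. The sketch below handles the two formats that appear in this epic ("2024-01-01..2024-12-31" and "last-30-days"). parse_time_range is a hypothetical helper, and treating "last-N-days" as the inclusive span ending today is an assumption, not confirmed behavior of TemporalSearchService.

```python
import re
from datetime import date, timedelta
from typing import Optional, Tuple

_RELATIVE = re.compile(r"^last-(\d+)-days$")


def parse_time_range(time_range: str, today: Optional[date] = None) -> Tuple[date, date]:
    """Parse 'YYYY-MM-DD..YYYY-MM-DD' or 'last-N-days' into (start, end) dates."""
    today = today or date.today()
    match = _RELATIVE.match(time_range)
    if match:
        # Assumed semantics: the window ends today and reaches back N days.
        return today - timedelta(days=int(match.group(1))), today
    start_text, separator, end_text = time_range.partition("..")
    if not separator:
        raise ValueError(f"Unsupported time range: {time_range!r}")
    start, end = date.fromisoformat(start_text), date.fromisoformat(end_text)
    if start > end:
        raise ValueError(f"Start date is after end date: {time_range!r}")
    return start, end
```

Accepting today as a parameter keeps the relative form deterministic in tests.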

Feature 2: Watch Mode Auto-Detection and Git Monitoring

Objective: Automatically detect and watch all existing indexes, including git commit detection via inotify

Stories:

  1. Watch Mode Auto-Updates All Indexes Including Temporal with Git Commit Detection - Complete vertical slice: auto-detect all existing indexes (semantic, FTS, temporal), implement git commit detection via .git/refs/heads/<branch> inotify monitoring with polling fallback, and trigger incremental temporal indexing with progress reporting when commits are detected

Value Delivered: Zero-configuration watch mode keeps all indexes current automatically

Feature 3: Branch Switch Temporal Catch-Up

Objective: Efficiently detect and index unindexed commits when user switches branches

Stories:

  1. Efficient Unindexed Commit Detection - Use in-memory set from temporal_progress.json for O(1) commit existence checks and efficient incremental indexing

Value Delivered: Temporal index stays current across branch switches without re-indexing completed commits

Component Connections

CacheEntry Extension (Story 1.1)

from typing import Any, Dict, Optional

class CacheEntry:
    # Existing HEAD collection cache
    hnsw_index: Optional[Any] = None
    id_mapping: Optional[Dict[str, Any]] = None

    # NEW: Temporal collection cache (IDENTICAL pattern)
    temporal_hnsw_index: Optional[Any] = None
    temporal_id_mapping: Optional[Dict[str, Any]] = None
    temporal_index_version: Optional[str] = None

Daemon Service Extension (Story 1.2)

from typing import Any, Dict, List, Optional

class CIDXDaemonService:
    def exposed_query_temporal(
        self,
        query_text: str,
        time_range: str,  # "2024-01-01..2024-12-31" or "last-30-days"
        limit: int = 10,
        languages: Optional[List[str]] = None,
        # ... other filters
    ) -> Dict[str, Any]:
        """Temporal query via daemon with mmap cache."""

Watch Handler Extension (Story 2.2, 2.3)

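from pathlib import Path
from watchdog.events import FileSystemEventHandler

class TemporalWatchHandler(FileSystemEventHandler):
    def __init__(self, project_root: Path):
        self.git_refs_file = project_root / ".git/refs/heads" / self._get_current_branch()
        self.temporal_indexer = TemporalIndexer(...)
        self.progressive_metadata = TemporalProgressiveMetadata(...)

    def on_modified(self, event):
        # Compare resolved paths so relative and absolute event paths both match
        if Path(event.src_path).resolve() == self.git_refs_file.resolve():
            # Commit detected - run incremental indexing
            self._index_new_commits()

    def on_moved(self, event):
        # Git usually updates a ref by renaming <ref>.lock over it, which arrives as a move
        if Path(event.dest_path).resolve() == self.git_refs_file.resolve():
            self._index_new_commits()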
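
The handler relies on _get_current_branch(), which is not shown. One way to implement it without spawning git is to read .git/HEAD directly, as sketched below. read_current_branch is a hypothetical helper, and the sketch assumes a standard checkout where .git is a directory (not a linked worktree or submodule, where .git is a file).

```python
from pathlib import Path
from typing import Optional

_REF_PREFIX = "ref: refs/heads/"


def read_current_branch(project_root: Path) -> Optional[str]:
    """Return the checked-out branch name, or None when HEAD is detached."""
    head_text = (project_root / ".git" / "HEAD").read_text(encoding="utf-8").strip()
    if head_text.startswith(_REF_PREFIX):
        return head_text[len(_REF_PREFIX):]
    # Detached HEAD: the file holds a raw commit hash instead of a ref.
    return None
```

A None result corresponds to the detached-HEAD case, where there is no branch ref file to watch. The same function can be re-run when .git/HEAD changes to detect a branch switch.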

Integration with Existing Systems

Daemon Service (src/code_indexer/daemon/service.py):

  • Lines 261-309: Existing temporal indexing support (daemon can INDEX temporal)
  • NEW: exposed_query_temporal() method for querying
  • NEW: Temporal cache management in cache_entry

CLI (src/code_indexer/cli.py):

  • Line 4710: Current blocking for temporal + daemon (REMOVE in Story 1.3)
  • NEW: Delegate temporal queries to daemon when enabled

Cache (src/code_indexer/daemon/cache.py):

  • Extend CacheEntry with temporal_hnsw_index, temporal_id_mapping
  • Reuse identical mmap loading via HNSWIndexManager.load_index()

Watch Mode:

  • Extend existing GitAwareWatchHandler pattern
  • Follow FTS watch implementation (02_Story_RealTimeFTSMaintenance.md)

Testing Strategy

Unit Tests:

  • CacheEntry temporal field management
  • exposed_query_temporal() RPC method
  • TemporalWatchHandler commit detection
  • Branch switch detection logic
  • In-memory set commit filtering (O(1) lookup verification)

Integration Tests:

  • End-to-end temporal query via daemon
  • Cache hit/miss scenarios for temporal collection
  • Watch mode auto-detection of all indexes
  • Git commit detection via inotify
  • Incremental temporal indexing on new commits
  • Branch switch catch-up workflow

E2E Manual Tests (per story):

  • Query temporal via daemon, verify cache performance
  • Start watch mode, make commits, verify temporal index updates
  • Switch branches, verify catch-up indexing
  • Compare daemon vs standalone temporal query results

Performance Tests:

  • Temporal query latency: <5ms (cached), <1s (uncached)
  • Commit detection latency: <100ms after commit
  • Branch switch catch-up: Only unindexed commits processed
  • Memory usage: Temporal cache similar to HEAD cache

Definition of Done

Per Story:

  • All acceptance criteria satisfied
  • Unit tests pass (>85% coverage for new code)
  • Integration tests pass
  • E2E manual testing completed by Claude Code
  • Code review approval from code-reviewer agent
  • No regressions in fast-automation.sh

Per Feature:

  • All stories completed
  • Feature integration tests pass
  • Documentation updated (README, --help text)

Epic Complete:

  • All 3 stories delivered
  • Temporal queries work identically in daemon and standalone modes
  • Watch mode auto-detects and maintains all indexes
  • Git commits detected via inotify without hooks
  • Branch switches trigger efficient catch-up indexing
  • Performance targets met (daemon cache <5ms, incremental indexing <1s)
  • Zero breaking changes to existing daemon/watch behavior

Risks and Mitigations

Risk | Impact | Mitigation
Inotify doesn't work on all filesystems | High | Fall back to polling (5s interval) if inotify fails
mmap file descriptor leaks with temporal cache | High | Proper cleanup in CacheEntry.invalidate()
Branch switch with thousands of commits | Medium | Efficient filtering via in-memory set, progress reporting
Concurrent watch updates during daemon queries | Medium | Reuse existing thread-safe cache_lock pattern
Git refs file missing on detached HEAD | Low | Detect detached HEAD, disable temporal watch with warning
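
The polling fallback could compare a small stat signature of the ref file between polls, as in the sketch below. RefPoller and ref_signature are hypothetical names. The choice of (inode, mtime, size) as the signature is an assumption, based on the observation earlier in this epic that the ref file's inode changes on every commit.

```python
import os
from pathlib import Path
from typing import Optional, Tuple

Signature = Optional[Tuple[int, int, int]]


def ref_signature(ref_file: Path) -> Signature:
    """Return (inode, mtime_ns, size) for the ref file, or None if it is missing."""
    try:
        st = os.stat(ref_file)
    except FileNotFoundError:
        return None
    return (st.st_ino, st.st_mtime_ns, st.st_size)


class RefPoller:
    """Detects ref changes by comparing stat signatures between polls."""

    def __init__(self, ref_file: Path) -> None:
        self.ref_file = ref_file
        self._last = ref_signature(ref_file)

    def poll(self) -> bool:
        """Return True if the ref changed since the previous poll."""
        current = ref_signature(self.ref_file)
        changed = current != self._last
        self._last = current
        return changed
```

The caller would invoke poll() on the chosen interval (5s in the table above) and trigger incremental temporal indexing whenever it returns True. A missing file is treated as its own state, so a ref that disappears or reappears also counts as a change.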

Success Criteria

  1. Daemon Temporal Queries: cidx query "auth" --time-range "last-7-days" uses daemon cache (<5ms response)
  2. Zero Configuration Watch: cidx watch detects and updates all indexes (semantic, FTS, temporal)
  3. Commit Detection: Git commit triggers temporal indexing within 100ms (no hooks)
  4. Branch Switch Catch-Up: Only unindexed commits processed (verified via logs)
  5. Cache Parity: Temporal cache behavior identical to HEAD cache (verified via tests)
  6. No Breaking Changes: All existing tests pass, no regressions

References

Conversation Evidence:

  • Temporal query daemon requirement: "make a plan to enable querying for temporal indexes in daemon mode" → "1. Mandatory"
  • Auto-detection decision: "auto-detect based on all indexes we have" → "3. C"
  • Inotify decision: "can you detect a commit by a change in some file in .git folder" → "5. No hooks"
  • Current branch only: "4. current branch. You need to ensure that temporal index can 'catch up'"
  • Identical caching: "daemon uses mmap for semantic hnsw indexes. I want the same exact approach for temporal"

Code References:

  • Daemon blocking: src/code_indexer/cli.py:4710 (prevent temporal + daemon)
  • Temporal indexing in daemon: src/code_indexer/daemon/service.py:261-309
  • Cache structure: src/code_indexer/daemon/cache.py:18-122
  • FTS watch pattern: plans/Completed/full-text-search/01_Feat_FTSIndexInfrastructure/02_Story_RealTimeFTSMaintenance.md
  • Temporal indexer: src/code_indexer/services/temporal/temporal_indexer.py
  • Temporal search: src/code_indexer/services/temporal/temporal_search_service.py

Standards:

  • Epic structure: ~/.claude/standards/epic-writing-standards.md
  • Testing quality: ~/.claude/standards/testing-quality-standards.md
  • Agent delegation: ~/.claude/standards/agent-delegation-mandate.md
