Skip to content

refactor: split hybridSearchData into keyword, vector, and merge steps#768

Merged
carlos-alm merged 3 commits intomainfrom
refactor/decompose-hybrid-search
Apr 3, 2026
Merged

refactor: split hybridSearchData into keyword, vector, and merge steps#768
carlos-alm merged 3 commits intomainfrom
refactor/decompose-hybrid-search

Conversation

@carlos-alm
Copy link
Copy Markdown
Contributor

Summary

  • Split hybridSearchData (cognitive=38, MI=24.6, bugs=1.43) into keyword search, vector search, and result merge steps

Titan Audit Context

  • Phase: decomposition
  • Domain: search
  • Commits: 1
  • Depends on: none

Changes

  • src/domain/search/search/hybrid.ts - Extract keyword/vector/merge sub-functions from hybridSearchData

Test plan

  • CI passes (lint + build + tests)
  • Search tests still pass

@claude
Copy link
Copy Markdown

claude bot commented Apr 3, 2026

Claude finished @carlos-alm's task —— View job


PR Review: Hybrid Search Refactoring

Todo List

  • Phase 1: Root cause analysis
  • Phase 2: Configuration change detection
  • Phase 3: Test integrity check
  • Phase 4: Code quality review
  • Phase 5: Backlog compliance assessment
  • Final rating and recommendation

Root Cause Analysis

Problem Being Solved: The hybridSearchData function had high cognitive complexity (38), maintainability index (24.6), and bug prediction (1.43), making it difficult to understand and maintain. This refactoring decomposes the large function into smaller, focused sub-functions.

Why This Approach: Breaking down complex functions into smaller, single-responsibility functions is a proven refactoring technique that improves code maintainability, testability, and readability. The extracted functions (parseQueries, collectRankedLists, fuseResults) each handle a distinct step in the hybrid search process.

Risk Assessment: ✅ LOW RISK - This is a pure refactoring with no behavioral changes, proper interface preservation, and comprehensive test coverage already exists.

Backlog Compliance

  • Zero-dep: ✅ — No new runtime dependencies added
  • Foundation-aligned: ✅ — Improves code maintainability without violating any principles
  • Problem-fit: 3 — Improves code quality which indirectly helps AI agents navigate the codebase, but doesn't directly address the core "lost AI" problem
  • Breaking: No — External interface preserved (hybridSearchData signature unchanged)
  • Tier: 1

Configuration & Test Review

No configuration changes detected - Only the implementation file was modified
Test integrity maintained - Existing comprehensive test coverage in tests/search/embedder-search.test.ts covers all scenarios:

  • Basic hybrid search functionality (lines 394-447)
  • RRF fusion logic
  • Multi-query semicolon parsing
  • FTS5 fallback behavior
  • Result shape validation

Code Quality Assessment

Strengths ✅

  1. Clear function decomposition: Each extracted function has a single, well-defined responsibility
  2. Excellent documentation: JSDoc comments clearly explain each function's purpose
  3. Type safety maintained: Proper TypeScript interfaces for all data structures
  4. Interface preservation: External API remains unchanged, ensuring backward compatibility
  5. Logic consistency: The refactored code maintains identical behavior to the original

Implementation Quality ✅

  • parseQueries(query: string): Simple, focused query parsing with proper string handling
  • collectRankedLists(...): Clean separation of BM25 and semantic search collection logic
  • fuseResults(...): Well-structured RRF implementation with proper scoring logic
  • Main function: Now focused purely on orchestration and configuration

Potential Concerns (Minor)

  1. Interface duplication: RankedItem and FusionEntry have overlapping fields, but this is acceptable for type clarity
  2. Type assertions: Lines 134 and 139 use type assertions for optional properties, but these are safe in context

Code Patterns ✅

  • Consistent with existing codebase patterns
  • Proper error handling maintained
  • No security concerns introduced
  • Performance characteristics preserved

Technical Deep Dive

The refactoring successfully decomposes the original 160+ line function into:

  • parseQueries: 6 lines, handles query string parsing
  • collectRankedLists: 34 lines, manages BM25/semantic search coordination
  • fuseResults: 54 lines, implements RRF merge algorithm
  • hybridSearchData: 27 lines, focused on configuration and orchestration

This decomposition dramatically improves the cognitive load while preserving all existing functionality and test coverage.

Final Recommendation

  • Rating: ⭐⭐⭐⭐ (4/5)
  • Action: APPROVE
  • Reasoning: Excellent refactoring that significantly improves code maintainability without changing behavior. Clean function decomposition, comprehensive test coverage, and proper interface preservation. This is exactly the type of technical debt reduction that improves long-term codebase health.

The only reason this isn't 5 stars is that it's "just" a refactoring - while excellently executed, it doesn't add new functionality or directly address user needs.


@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Apr 3, 2026

Greptile Summary

This PR refactors hybridSearchData in src/domain/search/search/hybrid.ts by extracting three helper functions (parseQueries, collectRankedLists, fuseResults) and promoting two inner interfaces (RankedItem, FusionEntry) to module scope. The public API is unchanged and behavior is preserved.

Confidence Score: 5/5

Safe to merge — pure structural refactor with no behavioral changes to the public API.

All three extracted helpers faithfully reproduce the original logic. Interfaces promoted to module scope are unchanged. The removal of the dead typeof query === 'string' branch is a minor improvement, not a regression. No P0 or P1 findings.

No files require special attention.

Important Files Changed

Filename Overview
src/domain/search/search/hybrid.ts Clean structural decomposition of hybridSearchData into parseQueries, collectRankedLists, and fuseResults; interfaces promoted to module scope; no behavioral changes introduced.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant hybridSearchData
    participant parseQueries
    participant collectRankedLists
    participant ftsSearchData
    participant searchData
    participant fuseResults

    Caller->>hybridSearchData: query, customDbPath, opts
    hybridSearchData->>hybridSearchData: loadConfig(), derive limit/k/topK
    hybridSearchData->>hybridSearchData: openReadonlyOrFail → hasFtsIndex → close
    alt FTS not available
        hybridSearchData-->>Caller: null
    end
    hybridSearchData->>parseQueries: query string
    parseQueries-->>hybridSearchData: string[]
    hybridSearchData->>collectRankedLists: queries[], customDbPath, opts, topK
    loop for each query q
        collectRankedLists->>ftsSearchData: q, customDbPath, opts+limit
        ftsSearchData-->>collectRankedLists: BM25 results
        collectRankedLists->>searchData: q, customDbPath, opts+limit+minScore
        searchData-->>collectRankedLists: semantic results
    end
    collectRankedLists-->>hybridSearchData: RankedItem[][]
    hybridSearchData->>fuseResults: rankedLists, k, limit
    fuseResults-->>hybridSearchData: HybridResult[] (RRF-scored)
    hybridSearchData-->>Caller: { results }
Loading

Reviews (1): Last reviewed commit: "refactor: split hybridSearchData into ke..." | Re-trigger Greptile

@carlos-alm carlos-alm merged commit bd9f508 into main Apr 3, 2026
5 of 12 checks passed
@carlos-alm carlos-alm deleted the refactor/decompose-hybrid-search branch April 3, 2026 04:46
@github-actions github-actions bot locked and limited conversation to collaborators Apr 3, 2026
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant