refactor: split hybridSearchData into keyword, vector, and merge steps by carlos-alm · Pull Request #768 · optave/ops-codegraph-tool

carlos-alm · 2026-04-03T00:21:51Z

Summary

Split hybridSearchData (cognitive=38, MI=24.6, bugs=1.43) into keyword search, vector search, and result merge steps

Titan Audit Context

Phase: decomposition
Domain: search
Commits: 1
Depends on: none

Changes

src/domain/search/search/hybrid.ts - Extract keyword/vector/merge sub-functions from hybridSearchData

Test plan

CI passes (lint + build + tests)
Search tests still pass

claude · 2026-04-03T00:22:07Z

Claude finished @carlos-alm's task —— View job

PR Review: Hybrid Search Refactoring

Todo List

Phase 1: Root cause analysis
Phase 2: Configuration change detection
Phase 3: Test integrity check
Phase 4: Code quality review
Phase 5: Backlog compliance assessment
Final rating and recommendation

Root Cause Analysis

Problem Being Solved: The hybridSearchData function had high cognitive complexity (38), maintainability index (24.6), and bug prediction (1.43), making it difficult to understand and maintain. This refactoring decomposes the large function into smaller, focused sub-functions.

Why This Approach: Breaking down complex functions into smaller, single-responsibility functions is a proven refactoring technique that improves code maintainability, testability, and readability. The extracted functions (parseQueries, collectRankedLists, fuseResults) each handle a distinct step in the hybrid search process.

Risk Assessment: ✅ LOW RISK - This is a pure refactoring with no behavioral changes, proper interface preservation, and comprehensive test coverage already exists.

Backlog Compliance

Zero-dep: ✅ — No new runtime dependencies added
Foundation-aligned: ✅ — Improves code maintainability without violating any principles
Problem-fit: 3 — Improves code quality which indirectly helps AI agents navigate the codebase, but doesn't directly address the core "lost AI" problem
Breaking: No — External interface preserved (hybridSearchData signature unchanged)
Tier: 1

Configuration & Test Review

✅ No configuration changes detected - Only the implementation file was modified
✅ Test integrity maintained - Existing comprehensive test coverage in tests/search/embedder-search.test.ts covers all scenarios:

Basic hybrid search functionality (lines 394-447)
RRF fusion logic
Multi-query semicolon parsing
FTS5 fallback behavior
Result shape validation

Code Quality Assessment

Strengths ✅

Clear function decomposition: Each extracted function has a single, well-defined responsibility
Excellent documentation: JSDoc comments clearly explain each function's purpose
Type safety maintained: Proper TypeScript interfaces for all data structures
Interface preservation: External API remains unchanged, ensuring backward compatibility
Logic consistency: The refactored code maintains identical behavior to the original

Implementation Quality ✅

parseQueries(query: string): Simple, focused query parsing with proper string handling
collectRankedLists(...): Clean separation of BM25 and semantic search collection logic
fuseResults(...): Well-structured RRF implementation with proper scoring logic
Main function: Now focused purely on orchestration and configuration

Potential Concerns (Minor)

Interface duplication: RankedItem and FusionEntry have overlapping fields, but this is acceptable for type clarity
Type assertions: Lines 134 and 139 use type assertions for optional properties, but these are safe in context

Code Patterns ✅

Consistent with existing codebase patterns
Proper error handling maintained
No security concerns introduced
Performance characteristics preserved

Technical Deep Dive

The refactoring successfully decomposes the original 160+ line function into:

parseQueries: 6 lines, handles query string parsing
collectRankedLists: 34 lines, manages BM25/semantic search coordination
fuseResults: 54 lines, implements RRF merge algorithm
hybridSearchData: 27 lines, focused on configuration and orchestration

This decomposition dramatically improves the cognitive load while preserving all existing functionality and test coverage.

Final Recommendation

Rating: ⭐⭐⭐⭐ (4/5)
Action: APPROVE
Reasoning: Excellent refactoring that significantly improves code maintainability without changing behavior. Clean function decomposition, comprehensive test coverage, and proper interface preservation. This is exactly the type of technical debt reduction that improves long-term codebase health.

The only reason this isn't 5 stars is that it's "just" a refactoring - while excellently executed, it doesn't add new functionality or directly address user needs.

greptile-apps · 2026-04-03T00:23:21Z

Greptile Summary

This PR refactors hybridSearchData in src/domain/search/search/hybrid.ts by extracting three helper functions (parseQueries, collectRankedLists, fuseResults) and promoting two inner interfaces (RankedItem, FusionEntry) to module scope. The public API is unchanged and behavior is preserved.

Confidence Score: 5/5

Safe to merge — pure structural refactor with no behavioral changes to the public API.

All three extracted helpers faithfully reproduce the original logic. Interfaces promoted to module scope are unchanged. The removal of the dead typeof query === 'string' branch is a minor improvement, not a regression. No P0 or P1 findings.

No files require special attention.

Important Files Changed

Filename	Overview
src/domain/search/search/hybrid.ts	Clean structural decomposition of hybridSearchData into parseQueries, collectRankedLists, and fuseResults; interfaces promoted to module scope; no behavioral changes introduced.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant hybridSearchData
    participant parseQueries
    participant collectRankedLists
    participant ftsSearchData
    participant searchData
    participant fuseResults

    Caller->>hybridSearchData: query, customDbPath, opts
    hybridSearchData->>hybridSearchData: loadConfig(), derive limit/k/topK
    hybridSearchData->>hybridSearchData: openReadonlyOrFail → hasFtsIndex → close
    alt FTS not available
        hybridSearchData-->>Caller: null
    end
    hybridSearchData->>parseQueries: query string
    parseQueries-->>hybridSearchData: string[]
    hybridSearchData->>collectRankedLists: queries[], customDbPath, opts, topK
    loop for each query q
        collectRankedLists->>ftsSearchData: q, customDbPath, opts+limit
        ftsSearchData-->>collectRankedLists: BM25 results
        collectRankedLists->>searchData: q, customDbPath, opts+limit+minScore
        searchData-->>collectRankedLists: semantic results
    end
    collectRankedLists-->>hybridSearchData: RankedItem[][]
    hybridSearchData->>fuseResults: rankedLists, k, limit
    fuseResults-->>hybridSearchData: HybridResult[] (RRF-scored)
    hybridSearchData-->>Caller: { results }

_{Reviews (1): Last reviewed commit: "refactor: split hybridSearchData into ke..." | Re-trigger Greptile}

refactor: split hybridSearchData into keyword, vector, and merge steps

38dd4ca

carlos-alm added 2 commits April 2, 2026 20:04

Merge branch 'main' into refactor/decompose-hybrid-search

08627b9

Merge branch 'main' into refactor/decompose-hybrid-search

628b370

carlos-alm merged commit bd9f508 into main Apr 3, 2026
5 of 12 checks passed

carlos-alm deleted the refactor/decompose-hybrid-search branch April 3, 2026 04:46

github-actions bot locked and limited conversation to collaborators Apr 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor: split hybridSearchData into keyword, vector, and merge steps#768

refactor: split hybridSearchData into keyword, vector, and merge steps#768
carlos-alm merged 3 commits intomainfrom
refactor/decompose-hybrid-search

carlos-alm commented Apr 3, 2026

Uh oh!

claude bot commented Apr 3, 2026 •

edited

Loading

Uh oh!

greptile-apps bot commented Apr 3, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

carlos-alm commented Apr 3, 2026

Summary

Titan Audit Context

Changes

Test plan

Uh oh!

claude bot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Review: Hybrid Search Refactoring

Todo List

Root Cause Analysis

Backlog Compliance

Configuration & Test Review

Code Quality Assessment

Strengths ✅

Implementation Quality ✅

Potential Concerns (Minor)

Code Patterns ✅

Technical Deep Dive

Final Recommendation

Uh oh!

greptile-apps bot commented Apr 3, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude bot commented Apr 3, 2026 •

edited

Loading