Wave 1e: Query engine (n-gram extraction, posting list intersection) #14

@leesharon

Description

Overview

Query engine: text query → n-grams → index lookup → posting list intersection → BM25F scoring → ranked results.
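For reference, one common formulation of BM25F is sketched below. The issue does not say which variant, fields, or parameters the scorer uses, so everything here is an assumption: the fields $f$ (for example identifiers vs. comments), their weights $w_f$, the length-normalisation factors $b_f$, the saturation constant $k_1$, and the smoothed IDF. Here $t$ ranges over the query's n-grams, $tf_{t,f,d}$ is the count of $t$ in field $f$ of file $d$, $\ell_{f,d}$ is that field's length, $\bar{\ell}_f$ its average length over the index, $N$ the number of indexed files, and $n_t$ the posting-list length of $t$.

```math
\widetilde{tf}_{t,d} = \sum_{f} w_f \, \frac{tf_{t,f,d}}{1 - b_f + b_f \, \ell_{f,d} / \bar{\ell}_f}
\qquad
\mathrm{idf}(t) = \ln\!\left(1 + \frac{N - n_t + 0.5}{n_t + 0.5}\right)
\qquad
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{idf}(t) \, \frac{\widetilde{tf}_{t,d}}{k_1 + \widetilde{tf}_{t,d}}
```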

Module: crates/rskim-search/src/lexical/query.rs (~200 lines)

Pipeline

Query string → extract_query_ngrams → lookup each in index (binary search .skidx) → load posting lists (mmap .skpost) → intersect (sorted merge) → score (BM25F) → sort by score → top-K
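The first two stages of the pipeline above could look roughly like the sketch below. Only the name `extract_query_ngrams` comes from this issue. The trigram width, the `IndexEntry` layout, and the `lookup` helper are hypothetical stand-ins, since the real `.skidx` format is not described here.

```rust
/// Assumed n-gram width; the issue does not specify one.
const NGRAM_LEN: usize = 3;

/// Extract the distinct character n-grams of a query, sorted ascending.
/// Windows are taken over `char`s, so multi-byte UTF-8 input is never
/// split in the middle of a code point. Queries shorter than the n-gram
/// width (including the empty query) yield no n-grams.
pub fn extract_query_ngrams(query: &str) -> Vec<String> {
    let chars: Vec<char> = query.chars().collect();
    let mut grams: Vec<String> = chars
        .windows(NGRAM_LEN)
        .map(|w| w.iter().collect::<String>())
        .collect();
    grams.sort();
    grams.dedup();
    grams
}

/// Hypothetical in-memory view of one index entry: the n-gram key plus
/// the offset and length of its posting list in the postings file.
#[derive(Debug, PartialEq)]
pub struct IndexEntry {
    pub ngram: String,
    pub offset: u64,
    pub len: u32,
}

/// Binary-search a key-sorted index for `ngram`.
pub fn lookup<'a>(index: &'a [IndexEntry], ngram: &str) -> Option<&'a IndexEntry> {
    index
        .binary_search_by(|entry| entry.ngram.as_str().cmp(ngram))
        .ok()
        .map(|i| &index[i])
}
```

Deduplicating the n-grams up front means each posting list is looked up at most once per query.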

Optimizations

  • Process rarest n-gram first (smallest posting list) — "cheapest first" strategy
  • Lazy posting list iteration (never fully materialize)
  • Early termination once no remaining candidate can beat the current top-K scores
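A minimal sketch of the first two optimizations, assuming each posting list is a sorted, duplicate-free slice of `u32` document IDs (a stand-in for the mmapped `.skpost` data, whose real encoding is not given here). The `Intersection` type is hypothetical. It drives from the shortest list and yields matches one at a time, so the full result set is never materialized. Score-based early termination is not shown.

```rust
/// Lazily yields the document IDs present in every posting list.
/// Each list must be sorted ascending with no duplicates.
pub struct Intersection<'a> {
    /// Shortest list; candidate IDs are taken from here.
    driver: &'a [u32],
    /// Remaining lists; each slice shrinks as its cursor advances.
    others: Vec<&'a [u32]>,
}

impl<'a> Intersection<'a> {
    pub fn new(mut lists: Vec<&'a [u32]>) -> Self {
        // Rarest n-gram first: the shortest list bounds the work.
        lists.sort_by_key(|list| list.len());
        let mut iter = lists.into_iter();
        let driver = iter.next().unwrap_or(&[]);
        Intersection { driver, others: iter.collect() }
    }
}

impl<'a> Iterator for Intersection<'a> {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        'candidates: while let Some((&doc, rest)) = self.driver.split_first() {
            self.driver = rest;
            for list in self.others.iter_mut() {
                // Skip every ID below the candidate with a binary search.
                let skip = list.partition_point(|&d| d < doc);
                *list = &list[skip..];
                match list.first() {
                    // One list is exhausted, so no further match can exist.
                    None => {
                        self.driver = &[];
                        return None;
                    }
                    Some(&d) if d == doc => {}
                    // Candidate missing from this list; try the next one.
                    Some(_) => continue 'candidates,
                }
            }
            return Some(doc);
        }
        None
    }
}
```

Because the result is an ordinary `Iterator`, a caller can stop pulling matches as soon as it has enough, for example with `.take(k)`.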

Dependencies

Acceptance Criteria

  • Query "handleRequest" returns files with that identifier in top 3
  • Query latency < 50ms for 10k-file index
  • Empty query returns empty results (not error)
  • Unicode queries work
  • Results include file path, line, score, snippet
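The result shape and the top-K step implied by these criteria might be sketched as follows. `SearchHit`, its field types, and `top_k` are hypothetical names chosen for illustration. The issue only states that results carry a file path, line, score, and snippet.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// One ranked result (hypothetical shape based on the criteria above).
#[derive(Debug, Clone, PartialEq)]
pub struct SearchHit {
    pub path: String,
    pub line: u32,
    pub score: f32,
    pub snippet: String,
}

/// Wrapper whose ordering is reversed by score, so that `BinaryHeap`
/// (a max-heap) keeps the lowest-scoring hit at the top.
struct ByScore(SearchHit);

impl PartialEq for ByScore {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}
impl Eq for ByScore {}
impl PartialOrd for ByScore {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}
impl Ord for ByScore {
    fn cmp(&self, other: &Self) -> Ordering {
        other.0.score.total_cmp(&self.0.score)
    }
}

/// Keep the `k` best hits from a stream, returned best first.
/// An empty stream (for example from an empty query) yields an empty
/// vector rather than an error.
pub fn top_k(hits: impl Iterator<Item = SearchHit>, k: usize) -> Vec<SearchHit> {
    if k == 0 {
        return Vec::new();
    }
    let mut heap = BinaryHeap::new();
    for hit in hits {
        heap.push(ByScore(hit));
        if heap.len() > k {
            heap.pop(); // evict the current lowest score
        }
    }
    let mut out: Vec<SearchHit> = heap.into_iter().map(|wrapped| wrapped.0).collect();
    out.sort_by(|a, b| b.score.total_cmp(&a.score));
    out
}
```

The heap never holds more than `k + 1` hits, which fits the goal of not materializing the full candidate set.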

Implementation Guidelines Checklist

  • No file exceeds 400 lines
  • Every public function has /// doc comment
  • All dependencies injected
  • All fallible operations return Result<T, SearchError>
  • Module has #[cfg(test)] mod tests with: happy path, edge case, error path
  • No hardcoded language-specific match arms
  • No Vec::clone() in hot paths
  • Posting list intersection uses lazy evaluation
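Several of these items (injected dependencies, `Result<T, SearchError>`, doc comments on public functions) can be illustrated together. In the sketch below only the name `SearchError` comes from this issue. Its variants, the `PostingSource` trait, and `QueryEngine` are assumptions made for the example, and a `HashMap` serves as an in-memory test double for the real index.

```rust
use std::collections::HashMap;
use std::fmt;

/// Error type for fallible search operations (variants are assumed).
#[derive(Debug)]
pub enum SearchError {
    Io(std::io::Error),
    CorruptIndex(String),
}

impl fmt::Display for SearchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SearchError::Io(e) => write!(f, "index I/O failed: {e}"),
            SearchError::CorruptIndex(why) => write!(f, "corrupt index: {why}"),
        }
    }
}

impl std::error::Error for SearchError {}

impl From<std::io::Error> for SearchError {
    fn from(e: std::io::Error) -> Self {
        SearchError::Io(e)
    }
}

/// Injected dependency: anything that can hand out sorted posting lists.
pub trait PostingSource {
    /// Returns the document IDs for `ngram`, or an empty slice if unknown.
    fn doc_ids(&self, ngram: &str) -> Result<&[u32], SearchError>;
}

/// In-memory test double; rejects lists that are not strictly ascending.
impl PostingSource for HashMap<String, Vec<u32>> {
    fn doc_ids(&self, ngram: &str) -> Result<&[u32], SearchError> {
        let list = self.get(ngram).map(Vec::as_slice).unwrap_or(&[]);
        if list.windows(2).any(|pair| pair[0] >= pair[1]) {
            return Err(SearchError::CorruptIndex(format!(
                "unsorted postings for {ngram:?}"
            )));
        }
        Ok(list)
    }
}

/// Query engine that receives its index through the constructor.
pub struct QueryEngine<S: PostingSource> {
    source: S,
}

impl<S: PostingSource> QueryEngine<S> {
    /// Build an engine over the given posting source.
    pub fn new(source: S) -> Self {
        QueryEngine { source }
    }

    /// Number of documents containing `ngram`.
    pub fn doc_frequency(&self, ngram: &str) -> Result<usize, SearchError> {
        Ok(self.source.doc_ids(ngram)?.len())
    }
}
```

With the source injected this way, the happy path, the unknown-n-gram edge case, and the corrupt-index error path can each be exercised in `mod tests` without touching the filesystem.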

Metadata

Assignees

No one assigned

    Labels

    search (Layer 1: Code search system)
    wave-1 (Wave 1: Lexical index)
