Add optional note-level diversity control (MMR / redundancy filter) #28

Context
Even with good candidate retrieval and reranking, top_k results can cluster on near-duplicates (the same topic phrased differently, or multiple chunks from highly similar notes). ELF currently aggregates results by note, keeping each note's top-1 chunk. This removes repeated chunks of the same note but not near-duplicate notes, and no explicit diversity step runs after reranking.
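To make the gap concrete, here is a minimal sketch of top-1-chunk note aggregation. The `ChunkHit` type and `aggregate_top1` function are hypothetical names, not ELF's actual code, and the tie-breaking rule (lower chunk_id wins) is an assumption added for determinism. Note that two different but near-identical notes both survive this step.

```python
from typing import Dict, Iterable, List, NamedTuple


class ChunkHit(NamedTuple):
    note_id: str
    chunk_id: str
    score: float  # reranked chunk score, higher is better


def aggregate_top1(hits: Iterable[ChunkHit]) -> List[ChunkHit]:
    """Keep each note's best-scoring chunk, then rank notes by that score."""
    best: Dict[str, ChunkHit] = {}
    for hit in hits:
        current = best.get(hit.note_id)
        # Prefer the higher score; on an exact tie, prefer the lower chunk_id.
        if (
            current is None
            or hit.score > current.score
            or (hit.score == current.score and hit.chunk_id < current.chunk_id)
        ):
            best[hit.note_id] = hit
    # Score descending, note_id ascending as a deterministic tie-breaker.
    return sorted(best.values(), key=lambda h: (-h.score, h.note_id))
```

Duplicate chunks of one note collapse to a single entry, but nothing here compares one note against another, which is the step this issue proposes to add.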

Goal
Increase coverage and reduce redundancy in top_k without a meaningful loss in ranking quality (NDCG@k / recall@k).

Scope
- Add an optional, deterministic note-level diversity step after rerank.
- Use pooled note embeddings (Postgres source of truth) to compute similarity between candidate notes.
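- Selection strategy: MMR-style scoring or redundancy-threshold filtering. For the threshold variant:
  - Always select the highest-scoring note first.
  - Walk the remaining notes in rerank order, skipping any note whose similarity to an already-selected note exceeds the threshold, until top_k is filled.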
- Add explain outputs for diversity decisions (for example: selected_reason, skipped_reason, nearest_selected_note_id, similarity).
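A minimal sketch of the threshold variant with explain records, under several assumptions: note vectors are mean-pooled chunk embeddings compared by cosine similarity (the issue only says "pooled"), ties in rerank score are broken by note_id, and `max_skips` is read as "after this many skips, stop skipping and admit notes in rerank order". `Candidate`, `Decision`, `diversify`, and the reason strings are hypothetical names chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


def mean_pool(chunk_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average chunk embeddings into one note vector and L2-normalize it."""
    dim = len(chunk_vectors[0])
    summed = [sum(vec[i] for vec in chunk_vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in summed)) or 1.0
    return [x / norm for x in summed]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class Candidate:
    note_id: str
    score: float                # rerank score, higher is better
    embedding: Sequence[float]  # pooled note embedding


@dataclass
class Decision:
    note_id: str
    selected: bool
    reason: str                             # selected_reason or skipped_reason
    nearest_selected_note_id: Optional[str]
    similarity: Optional[float]             # similarity to that nearest note


def diversify(
    candidates: Sequence[Candidate],
    top_k: int,
    sim_threshold: float,
    max_skips: Optional[int] = None,
) -> Tuple[List[Candidate], List[Decision]]:
    # Score descending, note_id ascending, so ties resolve the same way every run.
    ordered = sorted(candidates, key=lambda c: (-c.score, c.note_id))
    selected: List[Candidate] = []
    decisions: List[Decision] = []
    skips = 0
    for cand in ordered:
        if len(selected) >= top_k:
            break
        # Find the most similar already-selected note (earliest wins on ties).
        nearest_id: Optional[str] = None
        nearest_sim: Optional[float] = None
        for chosen in selected:
            sim = cosine(cand.embedding, chosen.embedding)
            if nearest_sim is None or sim > nearest_sim:
                nearest_id, nearest_sim = chosen.note_id, sim
        redundant = nearest_sim is not None and nearest_sim > sim_threshold
        if redundant and (max_skips is None or skips < max_skips):
            skips += 1
            decisions.append(Decision(cand.note_id, False, "redundant",
                                      nearest_id, nearest_sim))
            continue
        if not selected:
            reason = "top_score"
        elif redundant:
            reason = "max_skips_reached"  # guardrail hit: admitted despite similarity
        else:
            reason = "below_threshold"
        selected.append(cand)
        decisions.append(Decision(cand.note_id, True, reason,
                                  nearest_id, nearest_sim))
    return selected, decisions
```

As sketched, the loop can return fewer than top_k notes when diverse candidates run out; backfilling skipped notes in rerank order is one alternative. An MMR-style variant would instead pick, at each step, the remaining note that maximizes λ·score − (1 − λ)·(max similarity to the selected set), trading relevance against redundancy with a single weight λ rather than a hard cutoff.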

Non-goals
- No LLM calls.
- No changes to candidate retrieval (Qdrant hybrid, RRF fusion).
- No changes to reranker provider calls.

Configuration
- Add ranking knobs with safe defaults (disabled by default, or defaults that preserve current behavior).
  - diversity.enabled
  - diversity.sim_threshold
  - diversity.max_skips (optional guardrail)
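One way the knobs could be modeled, sketched in Python: `DiversityConfig` and `from_mapping` are hypothetical names, and the 0.9 threshold is a placeholder rather than a tuned value. Keeping `enabled` false by default is what makes the default configuration preserve current ranking behavior.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class DiversityConfig:
    # Off by default, so ranking output is unchanged unless explicitly enabled.
    enabled: bool = False
    # Cosine similarity above which a candidate counts as redundant (placeholder).
    sim_threshold: float = 0.9
    # Optional cap on skipped candidates; None means no cap.
    max_skips: Optional[int] = None

    def __post_init__(self) -> None:
        if not -1.0 <= self.sim_threshold <= 1.0:
            raise ValueError("diversity.sim_threshold must be in [-1, 1]")
        if self.max_skips is not None and self.max_skips < 0:
            raise ValueError("diversity.max_skips must be >= 0")


def from_mapping(raw: Mapping[str, Any]) -> DiversityConfig:
    """Read the diversity.* keys from a parsed config, falling back to defaults."""
    section = raw.get("diversity", {})
    raw_skips = section.get("max_skips")
    return DiversityConfig(
        enabled=bool(section.get("enabled", False)),
        sim_threshold=float(section.get("sim_threshold", 0.9)),
        max_skips=None if raw_skips is None else int(raw_skips),
    )
```

Validating at load time means a bad threshold fails fast instead of silently dropping or keeping every note.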

Testing and evaluation
- Unit tests for selection logic and determinism.
- Retrieval harness comparison on a fixed dataset:
  - duplicate_rate@k, unique_note_rate@k
  - NDCG@k / recall@k
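The issue does not define duplicate_rate@k, so the sketch below assumes the eval set labels near-duplicate notes with a shared group id and counts a result as a duplicate when its group already appeared at a higher rank. NDCG@k uses linear gains (some harnesses use 2^rel − 1 instead). All function names are hypothetical; unique_note_rate@k is left to the harness's own definition.

```python
import math
from typing import Mapping, Sequence, Set


def duplicate_rate_at_k(ranked: Sequence[str], dup_group: Mapping[str, str], k: int) -> float:
    """Fraction of top-k notes whose duplicate group already appeared at a higher rank."""
    top = ranked[:k]
    if not top:
        return 0.0
    seen: Set[str] = set()
    dups = 0
    for note_id in top:
        group = dup_group.get(note_id, note_id)  # unlabeled notes form their own group
        if group in seen:
            dups += 1
        seen.add(group)
    return dups / len(top)


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def ndcg_at_k(ranked: Sequence[str], gains: Mapping[str, float], k: int) -> float:
    # Rank i (0-based) is discounted by log2(i + 2).
    dcg = sum(gains.get(n, 0.0) / math.log2(i + 2) for i, n in enumerate(ranked[:k]))
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

Running these on the same fixed dataset with diversity off and on gives the paired comparison the acceptance criteria call for.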

Acceptance criteria
- With diversity enabled, duplicate_rate@k decreases and unique_note_rate@k increases on the eval set.
- Default configuration preserves current ranking behavior.

References
- qmd: note-level coverage via post-fusion selection heuristics
- Graphiti: redundancy penalties / MMR-like reranking patterns
