generated from hack-ink/vibe-mono
Closed
Labels
- area:service: Retrieval logic, ranking, and request orchestration.
- kind:research: Investigation, evaluation, or spike that produces a decision memo or research artifact.
- theme:cost: Latency, compute, storage, and cost controls.
- theme:evaluation: Quality measurement, gold sets, regressions, and metrics.
Description
Context
Reranker scores can be noisy and are not always directly comparable across runs, so sorting purely by rerank_score can overreact to small score deltas. ELF already has basic tie-break signals (importance and recency) and optional hit logging, but additional deterministic signals can improve stability.
Goal
Add small, explainable ranking signals that reduce noise and improve stability, without changing default behavior.
Scope
- Lexical overlap bonus (deterministic)
  - Compute a simple overlap signal between query tokens and:
    - stitched snippet tokens
    - structured fields (when #17, "Add structured memory fields with field-level embeddings", is enabled)
  - Add an optional small bonus term when overlap is high.
- Reinforcement from hits (deterministic)
  - Use existing note hit signals (hit_count, last_hit_at) to add a small, saturating reinforcement term.
  - Ensure reinforcement is bounded and does not overwhelm rerank.
- Optional decay (separate from TTL)
  - Add an optional decay score that gradually reduces ranking weight for stale notes even before TTL expiration.
  - Keep TTL as a hard filter; decay is only a soft ranking term.
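
To make the three terms above concrete, here is a minimal Python sketch. It is illustrative only and not ELF's implementation: the tokenizer, the function names, the saturation formula `hits / (hits + k)`, the exponential half-life decay, and every default constant are assumptions chosen to show bounded, deterministic terms. `age_days` is assumed to be measured from whichever timestamp the service treats as freshness (for example last_hit_at).

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def lexical_bonus(query: str, snippet: str,
                  max_bonus: float = 0.05, min_overlap: float = 0.5) -> float:
    """Bonus scaled by the fraction of query tokens present in the snippet.

    Returns 0.0 below min_overlap, so only high overlap earns a bonus.
    The result is always within [0, max_bonus].
    """
    query_tokens = tokenize(query)
    if not query_tokens:
        return 0.0
    overlap = len(query_tokens & tokenize(snippet)) / len(query_tokens)
    return max_bonus * overlap if overlap >= min_overlap else 0.0


def hit_boost(hit_count: int,
              max_boost: float = 0.05, half_saturation: float = 5.0) -> float:
    """Saturating reinforcement: approaches but never reaches max_boost."""
    hits = max(hit_count, 0)
    return max_boost * hits / (hits + half_saturation)


def decay_penalty(age_days: float,
                  max_penalty: float = 0.05, half_life_days: float = 30.0) -> float:
    """Soft penalty that grows toward max_penalty as a note becomes stale.

    TTL filtering is assumed to happen elsewhere; this only adjusts ranking.
    """
    age = max(age_days, 0.0)
    return max_penalty * (1.0 - 0.5 ** (age / half_life_days))
```

Because each term is capped by its own small maximum, the combined adjustment stays small relative to rerank_score as long as the maxima are small compared with typical score gaps, which is the property the "bounded" requirement asks for.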
Explainability
- Record each term contribution in traces (for example: lexical_bonus, hit_boost, decay_penalty).
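
One possible shape for such a trace, sketched in Python. The `ScoreTrace` type and `score_with_trace` function are hypothetical names; only the term names (lexical_bonus, hit_boost, decay_penalty) come from the list above. The sketch assumes the terms are combined additively with rerank_score, which this issue does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class ScoreTrace:
    """Keeps the base score and each signed term contribution side by side."""
    rerank_score: float
    terms: dict[str, float] = field(default_factory=dict)

    @property
    def final_score(self) -> float:
        # Assumed additive combination: base score plus all signed terms.
        return self.rerank_score + sum(self.terms.values())


def score_with_trace(rerank_score: float, lexical_bonus: float = 0.0,
                     hit_boost: float = 0.0, decay_penalty: float = 0.0) -> ScoreTrace:
    """Build a trace; the decay penalty is stored as a negative contribution."""
    return ScoreTrace(
        rerank_score=rerank_score,
        terms={
            "lexical_bonus": lexical_bonus,
            "hit_boost": hit_boost,
            "decay_penalty": -decay_penalty,
        },
    )
```

Storing signed contributions means a reader of the trace can add the values up and arrive at the final score without knowing which terms are bonuses and which are penalties.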
Non-goals
- No LLM calls.
- No changes to evidence binding rules.
- No changes to candidate retrieval.
Configuration
- Add ranking knobs with safe defaults (disabled by default, or defaults that preserve current behavior).
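
A rough sketch of what "safe defaults" could look like, written in Python for illustration. All field names and default values here are hypothetical, not ELF's actual configuration keys. The point it shows is that every signal is gated by a flag that defaults to off, so the default configuration returns rerank_score unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RankingSignalsConfig:
    """Hypothetical knobs; every signal is disabled unless explicitly enabled."""
    lexical_bonus_enabled: bool = False
    lexical_bonus_max: float = 0.05
    hit_boost_enabled: bool = False
    hit_boost_max: float = 0.05
    decay_enabled: bool = False
    decay_max_penalty: float = 0.05


def apply_signals(rerank_score: float, cfg: RankingSignalsConfig,
                  lexical: float = 0.0, hits: float = 0.0, decay: float = 0.0) -> float:
    """Add only the enabled terms, clamping each one to its configured maximum."""
    score = rerank_score
    if cfg.lexical_bonus_enabled:
        score += min(max(lexical, 0.0), cfg.lexical_bonus_max)
    if cfg.hit_boost_enabled:
        score += min(max(hits, 0.0), cfg.hit_boost_max)
    if cfg.decay_enabled:
        score -= min(max(decay, 0.0), cfg.decay_max_penalty)
    return score
```

With `RankingSignalsConfig()` the function is an identity on rerank_score, which is one straightforward way to satisfy the requirement that defaults preserve current behavior.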
Testing and evaluation
- Unit tests for term computation and bounds.
- Harness: measure rank churn reduction across multiple runs of the same query.
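
The issue does not define rank churn, so the following Python sketch picks one simple, assumed definition: for every pair of runs of the same query, the fraction of top-k positions whose note ID differs, averaged over all pairs. The function names are hypothetical, and other definitions (for example set overlap at k or a rank correlation) may suit the harness better.

```python
from itertools import combinations


def positional_churn(run_a: list[str], run_b: list[str], k: int) -> float:
    """Fraction of the top-k positions where the two runs disagree."""
    a, b = run_a[:k], run_b[:k]
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    # A position missing from the shorter list counts as a disagreement.
    changed = sum(
        1 for i in range(n)
        if i >= len(a) or i >= len(b) or a[i] != b[i]
    )
    return changed / n


def mean_rank_churn(runs: list[list[str]], k: int = 10) -> float:
    """Average positional churn over all pairs of runs for one query."""
    pairs = list(combinations(runs, 2))
    if not pairs:
        return 0.0
    return sum(positional_churn(a, b, k) for a, b in pairs) / len(pairs)
```

Running the same query several times with the signals disabled and then enabled, and comparing the two `mean_rank_churn` values, gives a single number per query for the churn-reduction comparison.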
Acceptance criteria
- With features enabled, rank churn decreases on representative queries without meaningful quality loss.
- Default configuration preserves current ranking behavior.
References
- OpenMemory: decay/reinforcement signals (conceptual inspiration)