Closed
Labels
kind:perf: Performance and efficiency improvements (latency, throughput, storage, cost).
Description
Context
QMD caches LLM outputs to keep query expansion and reranking consistent and inexpensive on repeated queries.
Goal
Reduce latency and cost for repeated queries while keeping outputs consistent.
Scope
- Cache expansion and rerank results keyed by query, model, and version.
- Define cache invalidation rules and a retention policy.
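A minimal sketch of the scope items, assuming an in-memory store and a time-to-live (TTL) retention policy. `cache_key`, `LlmCache`, `ttl_seconds`, and the whitespace normalization are hypothetical illustrations, not existing QMD APIs. The key hashes the operation kind (expansion or rerank), the query, the model, and the version, so bumping the version or switching models invalidates old entries implicitly.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple


def cache_key(kind: str, query: str, model: str, version: str) -> str:
    """Derive a stable key from operation kind, query, model, and version."""
    # Hypothetical normalization: collapse whitespace so trivially different
    # spellings of the same query share one entry.
    normalized = " ".join(query.split())
    payload = json.dumps([kind, normalized, model, version], separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class LlmCache:
    ttl_seconds: float  # retention window (assumed policy, not from QMD)
    clock: Callable[[], float] = time.monotonic
    _entries: Dict[str, Tuple[float, str]] = field(
        default_factory=dict, init=False, repr=False
    )

    def get(self, key: str) -> Optional[str]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self.clock() - stored_at > self.ttl_seconds:
            # Expired: drop it so the next lookup triggers a fresh LLM call.
            del self._entries[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._entries[key] = (self.clock(), value)

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        # Only call the (expensive) LLM when there is no live cached entry.
        value = self.get(key)
        if value is None:
            value = compute()
            self.put(key, value)
        return value
```

With version-scoped keys, invalidation needs no explicit purge: entries written under an old model or version are never looked up again and simply age out under the TTL. A real deployment would likely also bound the store's size (for example with LRU eviction), which this sketch omits.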
Acceptance Criteria
- Repeated queries avoid redundant LLM calls without correctness regressions.
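One way to check this criterion is sketched below, assuming the expansion step can be wrapped as a plain function. `CountingLlm`, `cached`, and `check_repeated_query` are hypothetical names, not QMD APIs, and the cache is keyed by query alone for brevity. The check counts LLM calls across two identical queries and compares the cached result with an uncached baseline.

```python
from typing import Callable, Dict


class CountingLlm:
    """Hypothetical stand-in for an LLM client that records how often it is called."""

    def __init__(self) -> None:
        self.calls = 0

    def expand(self, query: str) -> str:
        self.calls += 1
        return f"expanded({query})"


def cached(fn: Callable[[str], str], store: Dict[str, str]) -> Callable[[str], str]:
    """Wrap fn so each distinct input is computed at most once per store."""

    def wrapper(query: str) -> str:
        if query not in store:
            store[query] = fn(query)
        return store[query]

    return wrapper


def check_repeated_query(query: str) -> None:
    llm = CountingLlm()
    expand = cached(llm.expand, {})
    baseline = llm.expand(query)  # uncached reference output
    llm.calls = 0
    first = expand(query)
    second = expand(query)
    # No correctness regression: cached output matches the uncached result.
    assert first == second == baseline
    # Redundant call avoided: only the first lookup reached the LLM.
    assert llm.calls == 1
```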
References