## Observation

`Memory` eagerly loads propositions into the LLM context via `withEagerQuery()`, `withEagerTopicSearch()`, and `withEagerSearchAbout()`. These methods control which propositions to load but not how many tokens they consume. There's no mechanism for staying within a token budget.
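For reference, a minimal sketch of that eager-loading surface. The `Memory(repository)` constructor is an assumption; only the `withEagerQuery { it.copy(...) }` lambda shape (a `PropositionQuery` transformed into a modified copy) comes from the API described in this issue:

```kotlin
// Hypothetical construction; only the withEagerQuery lambda shape is
// taken from this issue. Note the limit caps a *count* of propositions,
// saying nothing about how many tokens they occupy.
val memory = Memory(repository)
    .withEagerQuery { it.copy(limit = 50) }
```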
For small contexts with few propositions this works fine. For large contexts with hundreds of propositions, eager loading can overflow the context window. Consumers need a way to:
- Estimate the token cost of each proposition
- Select which propositions to include given a budget
- Respect priority ordering when dropping propositions
- Guarantee that critical propositions are never dropped
## What DICE already has
- `Memory` — implements `Tool` and `EagerSearch<Memory>`. Eager loading via `withEagerQuery()`, `withEagerTopicSearch()`, `withEagerSearchAbout()`. On-demand retrieval via `call()` (the `Tool` interface).
- `MemoryRetriever` — `recall()`, `recallAbout()`, `recallByType()`, `recallRecent()`. Returns propositions without considering token cost.
- `PropositionRepository.query(PropositionQuery)` — composable queries with `limit`, but the limit is count-based, not token-based.
- `Proposition.importance` and `Proposition.confidence` — could drive drop priority, but nothing uses them for budget decisions today.
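To make the gap concrete, a sketch of the assumed shapes. Only `limit`, `importance`, `confidence`, and `lastAccessed` are named in this issue; every other field here is illustrative:

```kotlin
import java.time.Instant

// Assumed shape of the existing query type; only `limit` is confirmed above.
data class PropositionQuery(
    val topic: String? = null,
    val limit: Int? = null, // "at most 50 propositions", never "at most 50 tokens"
)

// Illustrative Proposition shape. importance/confidence come from the list
// above; lastAccessed is referenced by the BudgetEnforcer option below.
data class Proposition(
    val text: String,
    val importance: Double,
    val confidence: Double,
    val lastAccessed: Instant,
)
```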
## The question
Should DICE enforce token budgets when injecting propositions into the LLM context?
Some possibilities:
- **Count-based limits** — `PropositionQuery` already supports `limit`. Just use `withEagerQuery { it.copy(limit = 50) }`. Crude but simple.
- **TokenCounter SPI** — a `fun interface TokenCounter { fun countTokens(text: String): Int }` with a built-in character heuristic (~4 chars/token). Consumers can plug in model-specific tokenizers when precision matters. Sketched after this list.
- **BudgetEnforcer** — given a token budget and a priority ordering, include as many propositions as fit. Drop priority: lowest `importance` first, oldest `lastAccessed` as the tie-breaker. If the priority/authority model (#13) is adopted, PROVISIONAL drops before RELIABLE, and CANON never drops. Also sketched below.
- **Budget-aware Memory** — `Memory` accepts a `maxTokens` parameter, and eager loading respects it automatically.
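A sketch of the TokenCounter SPI option. The interface signature and the ~4 chars/token heuristic come from the option above; the `heuristicCounter` name is illustrative:

```kotlin
// SPI: consumers plug in a model-specific tokenizer when the built-in
// heuristic is too coarse for their context-window math.
fun interface TokenCounter {
    fun countTokens(text: String): Int
}

// Built-in default: the ~4 characters/token heuristic. Rounds up so a
// non-empty string never counts as zero tokens.
val heuristicCounter = TokenCounter { text -> (text.length + 3) / 4 }
```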
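And a greedy sketch of the BudgetEnforcer option, reusing the `TokenCounter` and `Proposition` shapes sketched earlier. Candidates are ordered by descending `importance`, then by most-recent `lastAccessed`, so the lowest-importance, oldest entries are exactly the ones that fall off when the budget runs out:

```kotlin
// Greedy selection under a token budget. Items are considered in keep
// priority (high importance, recently accessed first); anything that does
// not fit in the remaining budget is dropped.
class BudgetEnforcer(
    private val counter: TokenCounter,
    private val maxTokens: Int,
) {
    fun select(candidates: List<Proposition>): List<Proposition> {
        var remaining = maxTokens
        val kept = mutableListOf<Proposition>()
        val byKeepPriority = candidates.sortedWith(
            compareByDescending<Proposition> { it.importance }
                .thenByDescending { it.lastAccessed }
        )
        for (p in byKeepPriority) {
            val cost = counter.countTokens(p.text)
            if (cost <= remaining) {
                kept += p
                remaining -= cost
            }
        }
        return kept
    }
}
```

If the authority model from #13 lands, CANON propositions would bypass this loop entirely and PROVISIONAL would sort below RELIABLE before `importance` is consulted.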
## Open questions
- **Is count-based limiting sufficient?** If propositions are roughly uniform in length, count limits approximate token budgets well enough. Variable-length propositions (short facts vs. long narratives) would benefit from actual token counting.
- **Who owns the budget?** Should it live on `Memory`, on `MemoryRetriever`, or in a standalone enforcer that wraps either?
- **How does this interact with pinning (Proposition pinning and eviction immunity, #9)?** Pinned propositions consume budget first. If pinned propositions alone exceed the budget, what happens? A sketch of one possible ordering follows.
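As a hedged illustration of that interaction, reusing the sketches above: pinned propositions draw from the budget first, and the rest compete for the remainder. The `isPinned` predicate is an assumption (since #9 is not settled), and failing fast on pinned overflow is one possible answer to the open question, not decided behavior:

```kotlin
// Hypothetical allocation order: pinned first, then everything else
// competes for what remains. Here pinned overflow surfaces as an error
// rather than silently truncating pinned content.
fun allocate(
    all: List<Proposition>,
    counter: TokenCounter,
    maxTokens: Int,
    isPinned: (Proposition) -> Boolean,
): List<Proposition> {
    val (pinned, rest) = all.partition(isPinned)
    val pinnedCost = pinned.sumOf { counter.countTokens(it.text) }
    require(pinnedCost <= maxTokens) {
        "Pinned propositions alone ($pinnedCost tokens) exceed the budget ($maxTokens)"
    }
    return pinned + BudgetEnforcer(counter, maxTokens - pinnedCost).select(rest)
}
```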