Token budget enforcement for context injection #16

@jimador

Description

Observation

Memory eagerly loads propositions into the LLM context via withEagerQuery(), withEagerTopicSearch(), and withEagerSearchAbout(). These methods control which propositions to load but not how many tokens they consume. There's no mechanism for staying within a token budget.

For small contexts with few propositions this works fine. For large contexts with hundreds of propositions, eager loading can overflow the context window. Consumers need a way to:

  1. Estimate token cost of each proposition
  2. Select which propositions to include given a budget
  3. Respect priority ordering when dropping propositions
  4. Guarantee that critical propositions are never dropped

What DICE already has

  • Memory — implements Tool and EagerSearch<Memory>. Eager loading via withEagerQuery(), withEagerTopicSearch(), withEagerSearchAbout(). On-demand retrieval via call() (the Tool interface).
  • MemoryRetriever — recall(), recallAbout(), recallByType(), recallRecent(). Returns propositions without considering token cost.
  • PropositionRepository.query(PropositionQuery) — composable queries with limit, but limit is count-based, not token-based.
  • Proposition.importance and Proposition.confidence — could drive drop priority, but nothing uses them for budget decisions today.

The question

Should DICE enforce token budgets when injecting propositions into LLM context?

Some possibilities:

  1. Count-based limits — PropositionQuery already supports limit. Just use withEagerQuery { it.copy(limit = 50) }. Crude but simple.

  2. TokenCounter SPI — a fun interface TokenCounter { fun countTokens(text: String): Int } with a built-in character heuristic (~4 chars/token). Consumers can plug in model-specific tokenizers when precision matters.

  3. BudgetEnforcer — given a token budget and a priority ordering, include as many propositions as fit. Drop priority: lowest importance first, oldest lastAccessed breaks ties. If the priority/authority model (Proposition priority and authority model, #13) is adopted, PROVISIONAL drops before RELIABLE, and CANON never drops.

  4. Budget-aware Memory — Memory accepts a maxTokens parameter. Eager loading respects it automatically.
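Options 2 and 3 compose naturally. A minimal sketch, using the TokenCounter signature proposed in option 2; the Proposition shape, heuristicCounter, and selectWithinBudget are hypothetical names for illustration, not existing DICE API:

```kotlin
// TokenCounter SPI as proposed in option 2.
fun interface TokenCounter {
    fun countTokens(text: String): Int
}

// Built-in character heuristic: roughly 4 characters per token.
val heuristicCounter = TokenCounter { text -> (text.length + 3) / 4 }

// Illustrative stand-in for the real Proposition type.
data class Proposition(
    val text: String,
    val importance: Double,
    val lastAccessedEpochMs: Long,
)

// Greedy budget enforcement (option 3): sort by drop priority (highest
// importance first, most recently accessed breaks ties), then include
// propositions until the token budget is exhausted.
fun selectWithinBudget(
    propositions: List<Proposition>,
    maxTokens: Int,
    counter: TokenCounter = heuristicCounter,
): List<Proposition> {
    var remaining = maxTokens
    return propositions
        .sortedWith(
            compareByDescending<Proposition> { it.importance }
                .thenByDescending { it.lastAccessedEpochMs }
        )
        .filter { p ->
            val cost = counter.countTokens(p.text)
            (cost <= remaining).also { fits -> if (fits) remaining -= cost }
        }
}
```

Consumers wanting precision could plug a model-specific tokenizer in behind the same TokenCounter interface; the heuristic stays as the zero-dependency default.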

Open questions

  • Is count-based limiting sufficient? If propositions are roughly uniform in length, count limits approximate token budgets well enough. Variable-length propositions (short facts vs. long narratives) would benefit from actual token counting.
  • Who owns the budget? Should it be on Memory, on MemoryRetriever, or as a standalone enforcer that wraps either?
  • How does this interact with pinning (Proposition pinning and eviction immunity, #9)? Pinned propositions consume budget first. If pinned propositions alone exceed the budget, what happens?
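One hedged answer to the pinning question: allocate pinned propositions first and fail loudly rather than silently truncate when they alone overflow. Everything here (the pinned flag, tokensOf, allocateBudget) is a hypothetical sketch, not existing DICE API:

```kotlin
// Illustrative stand-in: a proposition with a pin flag (see #9).
data class Prop(val text: String, val pinned: Boolean, val importance: Double)

// ~4 chars/token heuristic, as in the TokenCounter proposal.
fun tokensOf(text: String): Int = (text.length + 3) / 4

// Pinned propositions consume budget first and are never dropped; if they
// alone exceed the budget, surface an error instead of truncating.
fun allocateBudget(props: List<Prop>, maxTokens: Int): List<Prop> {
    val (pinned, unpinned) = props.partition { it.pinned }
    val pinnedCost = pinned.sumOf { tokensOf(it.text) }
    require(pinnedCost <= maxTokens) {
        "Pinned propositions need $pinnedCost tokens but the budget is $maxTokens"
    }
    var remaining = maxTokens - pinnedCost
    val extras = unpinned
        .sortedByDescending { it.importance }
        .filter { p ->
            val cost = tokensOf(p.text)
            (cost <= remaining).also { fits -> if (fits) remaining -= cost }
        }
    return pinned + extras
}
```

The require() is one design choice; a softer alternative would be to include pinned propositions unconditionally and emit a warning when they overflow, which keeps eager loading infallible at the cost of a blown budget.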
