Token budget enforcement for context injection #16

@jimador

Description

Observation

Memory eagerly loads propositions into the LLM context via withEagerQuery(), withEagerTopicSearch(), and withEagerSearchAbout(). These methods control which propositions to load but not how many tokens they consume. There's no mechanism for staying within a token budget.

For small contexts with few propositions this works fine. For large contexts with hundreds of propositions, eager loading can overflow the context window. Consumers need a way to:

  1. Estimate token cost of each proposition
  2. Select which propositions to include given a budget
  3. Respect priority ordering when dropping propositions
  4. Guarantee that critical propositions are never dropped

What DICE already has

  • Memory — implements Tool and EagerSearch<Memory>. Eager loading via withEagerQuery(), withEagerTopicSearch(), withEagerSearchAbout(). On-demand retrieval via call() (the Tool interface).
  • MemoryRetriever — recall(), recallAbout(), recallByType(), recallRecent(). Returns propositions without considering token cost.
  • PropositionRepository.query(PropositionQuery) — composable queries with limit, but limit is count-based, not token-based.
  • Proposition.importance and Proposition.confidence — could drive drop priority, but nothing uses them for budget decisions today.

The question

Should DICE enforce token budgets when injecting propositions into LLM context?

Some possibilities:

  1. Count-based limits — PropositionQuery already supports limit. Just use withEagerQuery { it.copy(limit = 50) }. Crude but simple.

  2. TokenCounter SPI — a fun interface TokenCounter { fun countTokens(text: String): Int } with a built-in character heuristic (~4 chars/token). Consumers can plug in model-specific tokenizers when precision matters.

  3. BudgetEnforcer — given a token budget and a priority ordering, include as many propositions as fit. Drop priority: lowest importance first, oldest lastAccessed breaks ties. If the priority/authority model (Proposition priority and authority model, #13) is adopted, PROVISIONAL drops before RELIABLE, and CANON never drops.

  4. Budget-aware Memory — Memory accepts a maxTokens parameter. Eager loading respects it automatically.
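Options 2 and 3 compose naturally. A minimal sketch, using the TokenCounter signature proposed in option 2; the Proposition shape, heuristicCounter, and selectWithinBudget are hypothetical names for illustration, not existing DICE API:

```kotlin
// TokenCounter SPI as proposed in option 2.
fun interface TokenCounter {
    fun countTokens(text: String): Int
}

// Built-in character heuristic: roughly 4 characters per token.
val heuristicCounter = TokenCounter { text -> (text.length + 3) / 4 }

// Illustrative stand-in for the real Proposition type.
data class Proposition(
    val text: String,
    val importance: Double,
    val lastAccessedEpochMs: Long,
)

// Greedy budget enforcement (option 3): sort by drop priority (highest
// importance first, most recently accessed breaks ties), then include
// propositions until the token budget is exhausted.
fun selectWithinBudget(
    propositions: List<Proposition>,
    maxTokens: Int,
    counter: TokenCounter = heuristicCounter,
): List<Proposition> {
    var remaining = maxTokens
    return propositions
        .sortedWith(
            compareByDescending<Proposition> { it.importance }
                .thenByDescending { it.lastAccessedEpochMs }
        )
        .filter { p ->
            val cost = counter.countTokens(p.text)
            (cost <= remaining).also { fits -> if (fits) remaining -= cost }
        }
}
```

Consumers wanting precision could plug a model-specific tokenizer in behind the same TokenCounter interface; the heuristic stays as the zero-dependency default.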

Open questions

  • Is count-based limiting sufficient? If propositions are roughly uniform in length, count limits approximate token budgets well enough. Variable-length propositions (short facts vs. long narratives) would benefit from actual token counting.
  • Who owns the budget? Should it be on Memory, on MemoryRetriever, or as a standalone enforcer that wraps either?
  • How does this interact with pinning (Proposition pinning and eviction immunity, #9)? Pinned propositions consume budget first. If pinned propositions alone exceed the budget, what happens?
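One hedged answer to the pinning question: allocate pinned propositions first and fail loudly rather than silently truncate when they alone overflow. Everything here (the pinned flag, tokensOf, allocateBudget) is a hypothetical sketch, not existing DICE API:

```kotlin
// Illustrative stand-in: a proposition with a pin flag (see #9).
data class Prop(val text: String, val pinned: Boolean, val importance: Double)

// ~4 chars/token heuristic, as in the TokenCounter proposal.
fun tokensOf(text: String): Int = (text.length + 3) / 4

// Pinned propositions consume budget first and are never dropped; if they
// alone exceed the budget, surface an error instead of truncating.
fun allocateBudget(props: List<Prop>, maxTokens: Int): List<Prop> {
    val (pinned, unpinned) = props.partition { it.pinned }
    val pinnedCost = pinned.sumOf { tokensOf(it.text) }
    require(pinnedCost <= maxTokens) {
        "Pinned propositions need $pinnedCost tokens but the budget is $maxTokens"
    }
    var remaining = maxTokens - pinnedCost
    val extras = unpinned
        .sortedByDescending { it.importance }
        .filter { p ->
            val cost = tokensOf(p.text)
            (cost <= remaining).also { fits -> if (fits) remaining -= cost }
        }
    return pinned + extras
}
```

The require() is one design choice; a softer alternative would be to include pinned propositions unconditionally and emit a warning when they overflow, which keeps eager loading infallible at the cost of a blown budget.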
