fix(services): Trigger auto-compact from gateway-reported tokens by edenreich · Pull Request #454 · inference-gateway/cli

edenreich · 2026-04-27T13:01:54Z

Summary

Closes #453. Auto-compact and session rollover never fired because the trigger was gated on tokenizer.EstimateMessagesTokens(entries), which counts only message content. The real prompt also includes the system prompt (~2K tokens) and tool definitions (~3–5K tokens for 15+ tools). As a result, sessions exceeded the 80% threshold in practice while the underestimated count never reached it.
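To make the size of the gap concrete, here is a small Go illustration. The 32K window and the 22,000-token message estimate are made-up numbers. The 2K system prompt and 4K tool overhead sit inside the ranges quoted above. The real gate may compare with `>` instead of `>=`; that detail is an assumption here.

```go
package main

import "fmt"

// thresholdRatio mirrors the 80% auto-compact threshold described in the PR.
const thresholdRatio = 0.80

// triggers reports whether a token count reaches the threshold for a window.
// Using >= rather than > is an assumption for this illustration.
func triggers(tokens, window int) bool {
	return float64(tokens) >= thresholdRatio*float64(window)
}

func main() {
	// Illustrative numbers only, not measurements.
	window := 32768
	messages := 22000    // what an entries-only estimate would see
	systemPrompt := 2000 // ~2K, per the PR description
	toolDefs := 4000     // inside the ~3-5K range quoted for 15+ tools
	actual := messages + systemPrompt + toolDefs

	fmt.Println("threshold:", int(thresholdRatio*float64(window)))
	fmt.Println("estimate triggers:", triggers(messages, window))
	fmt.Println("actual triggers:", triggers(actual, window))
}
```

With these numbers the real prompt (28,000 tokens) is past the 26,214-token threshold. The entries-only estimate stays below it, so the gate never opens.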

- Trigger now prefers repo.GetSessionTokens().LastInputTokens (the gateway's authoritative count, already shown by /context) and falls back to the entries-only estimate before the first round-trip.
- Refreshed ContextMatchers to track the gateway's actual /v1/models output. Added Gemini 3, Gemma 3/3n/4, GPT-OSS, Ministral 3, Devstral, MiniMax M2, GLM 4/5, Nemotron 3, and deep-research (1M), and dropped patterns for families the gateway doesn't serve.
- Model picker (/switch) now shows the context window beside each row (1M / 128K / ? when unknown) alongside pricing.
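A minimal Go sketch of the trigger's selection rule follows. `SessionTokens` is a hypothetical stand-in for whatever repo.GetSessionTokens() returns; only its `LastInputTokens` field comes from the PR. The function names are illustrative, not the ones used in the CLI, and the 80% threshold is written as integer math (`tokens*5 >= window*4`).

```go
package main

import "fmt"

// SessionTokens is a hypothetical stand-in for the value returned by
// repo.GetSessionTokens(); only LastInputTokens matters for this sketch.
type SessionTokens struct {
	LastInputTokens int
}

// effectiveTokens prefers the gateway-reported prompt size. It falls back to
// the entries-only estimate while LastInputTokens is still 0, i.e. before
// the first round-trip.
func effectiveTokens(st SessionTokens, estimate int) int {
	if st.LastInputTokens > 0 {
		return st.LastInputTokens
	}
	return estimate
}

// shouldCompact applies an 80% threshold without floating point:
// tokens >= 0.8*window  <=>  tokens*5 >= window*4.
func shouldCompact(st SessionTokens, estimate, window int) bool {
	return effectiveTokens(st, estimate)*5 >= window*4
}

func main() {
	window := 32768
	// Before the first round-trip only the estimate is available.
	fmt.Println(shouldCompact(SessionTokens{}, 22000, window))
	// After a round-trip the gateway reports the full prompt size.
	fmt.Println(shouldCompact(SessionTokens{LastInputTokens: 28000}, 22000, window))
}
```

The fallback matters only for the very first turn. Once the gateway has answered, the estimate is ignored, so the system prompt and tool definitions are counted automatically.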

Auto-compact and session rollover were gated on tokenizer.EstimateMessagesTokens(entries), which counts only message content. The actual prompt also includes the system prompt (~2K tokens) and tool definitions (~3-5K tokens for 15+ tools). The gate therefore underestimated by several thousand tokens, and sessions never crossed the 80% threshold even when /context showed the window full.

- SessionRolloverManager.tokenTriggerFires and ConversationOptimizer.OptimizeMessages now prefer repo.GetSessionTokens().LastInputTokens (the gateway's authoritative count, already shown by /context). They fall back to the entries-only estimate before the first round-trip, when LastInputTokens is still 0.
- Wire ConversationRepository into the optimizer config in the container.

Also in this commit:

- Refresh ContextMatchers to track the gateway's /v1/models output. Drop patterns for families the gateway doesn't serve (OpenAI direct, Llama, Cohere). Add Gemini 3 (1M), Gemma 3/3n/4, GPT-OSS, Ministral 3, Devstral, MiniMax M2, GLM 4/5, Nemotron 3, and deep-research (Gemini 3.1 Pro backed, 1M input window). Reorder so specific patterns precede generic ones.
- Add LookupContextWindow returning a hit boolean so callers can distinguish a real match from the 8192 fallback.
- Model picker (/switch) now shows the context window beside each row ("1M" / "128K" / "?" when no matcher hits) alongside the existing pricing.
- Test coverage for Gemini, Gemma, Deep Research, DeepSeek, Mistral family, GPT-OSS / MiniMax / GLM / Nemotron, Qwen3 served variants, and the LastInputTokens trigger path on both the rollover manager and optimizer.
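The sketch below shows how an ordered matcher list with a hit boolean and an 8192 fallback fits together, and how the picker label can be derived from it. Several details are assumptions, not the CLI's actual ContextMatchers code: substring matching, the lowercase names, and the decimal K/M rounding. Only the two 1M entries come from the PR. The generic `gemini` entry and its 128K value are placeholders that show why specific patterns must precede generic ones, and the model IDs in `main` are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultContextWindow is the 8192 fallback mentioned in the PR.
const defaultContextWindow = 8192

// contextMatcher pairs a substring of a model ID with a window size.
type contextMatcher struct {
	pattern string
	window  int
}

// matchers is ordered because the first match wins: "gemini-3" must come
// before the generic "gemini". The 1M values come from the PR; the generic
// entry's 128K is a placeholder.
var matchers = []contextMatcher{
	{"deep-research", 1_000_000},
	{"gemini-3", 1_000_000},
	{"gemini", 128_000}, // placeholder value
}

// lookupContextWindow returns the window for a model ID and whether any
// matcher hit. On a miss it returns the 8192 fallback with hit=false.
func lookupContextWindow(model string) (int, bool) {
	id := strings.ToLower(model)
	for _, m := range matchers {
		if strings.Contains(id, m.pattern) {
			return m.window, true
		}
	}
	return defaultContextWindow, false
}

// windowLabel renders the picker column: "1M", "128K", or "?" on a miss.
func windowLabel(model string) string {
	n, hit := lookupContextWindow(model)
	switch {
	case !hit:
		return "?"
	case n >= 1_000_000:
		return fmt.Sprintf("%dM", n/1_000_000)
	case n >= 1_000:
		return fmt.Sprintf("%dK", n/1_000)
	default:
		return fmt.Sprintf("%d", n)
	}
}

func main() {
	// Hypothetical model IDs, chosen only to exercise each branch.
	for _, id := range []string{"provider/gemini-3-pro", "provider/gemini-legacy", "provider/unknown-model"} {
		fmt.Printf("%-24s %s\n", id, windowLabel(id))
	}
}
```

Returning the hit flag separately keeps the fallback useful for budgeting while letting the picker show "?" instead of presenting 8192 as if it were a known window.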

## [0.104.1](v0.104.0...v0.104.1) (2026-04-27)

### 🐛 Bug Fixes

* **ui:** Restore typing while agent is busy ([#455](#455)) ([92840d6](92840d6)), closes [#410](#410)
* **services:** Trigger auto-compact from gateway-reported tokens ([#454](#454)) ([1fc19dd](1fc19dd))
* **config:** Update model context windows and pricing for current model lineup ([#452](#452)) ([655f9f8](655f9f8))

### ♻️ Code Refactoring

* **config:** Centralize config loading and remove service indirection ([#443](#443)) ([babf173](babf173))
* **config:** Centralize sub-config definitions and consolidate prompts ([#448](#448)) ([9979ac7](9979ac7))
* Extract keybindings configuration to separate file ([#438](#438)) ([6d04195](6d04195))
* **config:** Generate viper defaults via reflection over DefaultConfig() ([#436](#436)) ([85c6e0a](85c6e0a))
* **config:** Make tool prompts configurable via prompts.yaml ([#450](#450)) ([9fc1bb5](9fc1bb5)), closes [#446](#446)
* Move channels to separate channels.yaml config ([#444](#444)) ([aa43e0e](aa43e0e)), closes [#441](#441)
* Move computer_use to a separate config ([#447](#447)) ([a762c64](a762c64)), closes [#444](#444)
* Move prompts to separate prompts.yaml config ([#442](#442)) ([45e4fb6](45e4fb6))
* **config:** Unify sub-configs behind CollectionConfig + utils.{Load,Save}YAML ([#445](#445)) ([aab481c](aab481c))

### 📚 Documentation

* Add directory-structure reference ([#451](#451)) ([95d26f8](95d26f8))

### 👷 CI/CD

* Reduce runs of nix ([acac364](acac364))

### 🧹 Maintenance

* **deps:** Bump modernc.org/sqlite from 1.49.1 to 1.50.0 ([#449](#449)) ([06a535d](06a535d))
* **nix:** Update package to v0.104.0 ([#432](#432)) ([5564697](5564697))

ig-semantic-release-bot · 2026-04-27T13:40:41Z

🎉 This PR is included in version 0.104.1 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

## Summary

- `GenerateLLMSummary` hardcoded `maxTokens := 200` and ignored `FinishReason`. When the model hit the cap, the truncated mid-sentence text was rendered as a complete `--- Context Summary ---` block. Raise the default to **1024** and expose it as `compact.summary_max_tokens` for tuning.
- Warn-log when `response.Choices[0].FinishReason == sdk.Length` so future truncations are visible in logs instead of failing silently.
- Bug surfaced after #454 (`fix(services): Trigger auto-compact from gateway-reported tokens`) made auto-compact actually fire on the gateway-reported token count. The 200-cap was always too tight, but compaction rarely triggered before, so the truncation was rarely visible.
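The two changes can be sketched in a few lines of Go. This is not the CLI's code. `FinishReason` and `FinishLength` are hypothetical stand-ins for the SDK type and `sdk.Length`, and the helper names are illustrative. The sketch also assumes that treating an unset or non-positive `compact.summary_max_tokens` as "use the default" is how the setting behaves.

```go
package main

import (
	"fmt"
	"log"
)

// defaultSummaryMaxTokens is the new default described above (previously 200).
const defaultSummaryMaxTokens = 1024

// FinishReason is a hypothetical stand-in for the SDK's finish-reason type;
// FinishLength plays the role of sdk.Length.
type FinishReason string

const FinishLength FinishReason = "length"

// summaryMaxTokens resolves a configured compact.summary_max_tokens value,
// treating an unset (zero) or negative value as "use the default".
func summaryMaxTokens(configured int) int {
	if configured <= 0 {
		return defaultSummaryMaxTokens
	}
	return configured
}

// checkSummaryFinish logs a warning when generation stopped at the token cap,
// so a truncated summary shows up in logs instead of passing silently.
// It reports whether truncation was detected.
func checkSummaryFinish(reason FinishReason, maxTokens int) bool {
	if reason == FinishLength {
		log.Printf("warning: compact summary hit max_tokens=%d and may be truncated", maxTokens)
		return true
	}
	return false
}

func main() {
	limit := summaryMaxTokens(0)
	fmt.Println("limit:", limit)
	fmt.Println("truncated:", checkSummaryFinish(FinishLength, limit))
}
```

Logging rather than failing keeps compaction working when the cap is hit, while still leaving a trail that points at `compact.summary_max_tokens` as the knob to raise.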

## [0.104.2](v0.104.1...v0.104.2) (2026-04-27)

### 🐛 Bug Fixes

* **services:** Prevent compact summary truncation at 200-token cap ([#457](#457)) ([36b9612](36b9612)), closes [#454](#454)
* Reconcile token displays and persist metadata-only saves ([#459](#459)) ([8bc8767](8bc8767))

### 📚 Documentation

* Update agents MD ([#458](#458)) ([27bfaea](27bfaea))

### 🧹 Maintenance

* **nix:** Update package to v0.104.1 ([#456](#456)) ([784e4bc](784e4bc))

edenreich merged commit 1fc19dd into main Apr 27, 2026
5 checks passed

edenreich deleted the fix/auto-compact-gateway-tokens branch April 27, 2026 13:23

ig-semantic-release-bot Bot added the released label Apr 27, 2026

edenreich mentioned this pull request Apr 27, 2026

fix(services): Prevent compact summary truncation at 200-token cap #457

Merged

edenreich merged 1 commit into main from fix/auto-compact-gateway-tokens
