fix(services): Prevent compact summary truncation at 200-token cap by edenreich · Pull Request #457 · inference-gateway/cli

edenreich · 2026-04-27T14:33:30Z

Summary

GenerateLLMSummary hardcoded maxTokens := 200 and ignored FinishReason, so when the model hit the cap the truncated mid-sentence text was rendered as a complete --- Context Summary --- block. Raise the default to 1024 and expose it as compact.summary_max_tokens for tuning.
Warn-log when response.Choices[0].FinishReason == sdk.Length so future truncations are visible in logs instead of failing silently.
Bug surfaced after #454 (fix(services): Trigger auto-compact from gateway-reported tokens) made auto-compact actually fire on the gateway-reported token count. The 200-token cap was always too tight, but compaction rarely triggered before, so the truncation was rarely visible.
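The two changes described above can be illustrated with a minimal, self-contained Go sketch. It is not the code from this PR: `CompactConfig`, `EffectiveMaxTokens`, `Choice`, `SummaryTruncated`, and the local `FinishReason` type are hypothetical stand-ins for the CLI's config struct and the SDK's response types, and the fallback to 1024 when the setting is unset or non-positive is an assumption about how the default might be applied.

```go
package main

import (
	"fmt"
	"log"
)

// FinishReason is a hypothetical local stand-in for the SDK's finish-reason type.
type FinishReason string

const (
	FinishStop   FinishReason = "stop"
	FinishLength FinishReason = "length" // the model stopped because it hit max tokens
)

// defaultSummaryMaxTokens is the new default named in the PR summary.
const defaultSummaryMaxTokens = 1024

// CompactConfig is a hypothetical stand-in for the config section that
// would hold compact.summary_max_tokens.
type CompactConfig struct {
	SummaryMaxTokens int
}

// EffectiveMaxTokens returns the configured cap, falling back to the
// default when the value is unset or non-positive (assumed behaviour).
func (c CompactConfig) EffectiveMaxTokens() int {
	if c.SummaryMaxTokens <= 0 {
		return defaultSummaryMaxTokens
	}
	return c.SummaryMaxTokens
}

// Choice is a hypothetical stand-in for one completion choice in a response.
type Choice struct {
	Content      string
	FinishReason FinishReason
}

// SummaryTruncated reports whether the first choice ended because the
// token cap was reached, meaning the summary text may stop mid-sentence.
func SummaryTruncated(choices []Choice) bool {
	return len(choices) > 0 && choices[0].FinishReason == FinishLength
}

func main() {
	cfg := CompactConfig{} // nothing configured, so the default applies
	maxTokens := cfg.EffectiveMaxTokens()

	// Simulated response whose summary was cut off at the cap.
	choices := []Choice{{
		Content:      "The user asked to refactor the",
		FinishReason: FinishLength,
	}}

	if SummaryTruncated(choices) {
		// Make the truncation visible in logs instead of failing silently.
		log.Printf("WARN: compact summary truncated at max_tokens=%d", maxTokens)
	}
	fmt.Println(choices[0].Content)
}
```

Keeping the truncation check in a small predicate makes it easy to unit-test separately from the model call, and lets the caller decide whether to only warn or also retry with a larger cap.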

The summary was hardcoded at 200 max tokens with no FinishReason check, so when the model hit the cap the truncated mid-sentence text was rendered as a complete summary. Raise the default to 1024, expose it as compact.summary_max_tokens, and warn-log on FinishReason=length.

## [0.104.2](v0.104.1...v0.104.2) (2026-04-27)

### 🐛 Bug Fixes

* **services:** Prevent compact summary truncation at 200-token cap ([#457](#457)) ([36b9612](36b9612)), closes [#454](#454)
* Reconcile token displays and persist metadata-only saves ([#459](#459)) ([8bc8767](8bc8767))

### 📚 Documentation

* Update agents MD ([#458](#458)) ([27bfaea](27bfaea))

### 🧹 Maintenance

* **nix:** Update package to v0.104.1 ([#456](#456)) ([784e4bc](784e4bc))

ig-semantic-release-bot · 2026-04-27T19:28:17Z

🎉 This PR is included in version 0.104.2 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀

edenreich added 2 commits April 27, 2026 16:24

chore: Regenerate the config

3199b83

edenreich merged commit 36b9612 into main Apr 27, 2026
5 checks passed

edenreich deleted the fix/compact-summary-truncation branch April 27, 2026 14:44

ig-semantic-release-bot Bot added the released label Apr 27, 2026

edenreich merged 2 commits into main from fix/compact-summary-truncation
