fix(discord): raise thread title max tokens for reasoning models#64172
Conversation
Greptile SummaryRaises Confidence Score: 5/5Safe to merge — the change is minimal, correctly targets the root cause, and is backed by live end-to-end verification. Only a single P2 style finding (preferred test model constant not updated when the test file was touched). No logic, correctness, or security issues found. The token budget increase is well-justified and scoped correctly. No files require special attention.
|
0237f06 to
87ac961
Compare
Local cherry-pick of upstream PR openclaw#64172. Keeps the live gateway build on this fork in sync with the upstream fix while the PR is under review. Root cause: when the simple-completion model used for thread-title generation is a reasoning model, the 24-token output budget is consumed entirely by the internal thinking block, leaving no tokens for the text output. extractAssistantText returns empty, the rename is silently skipped. Raising the ceiling to 512 gives enough headroom for thinking plus a short title. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
87ac961 to
228e6ed
Compare
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on openclaw#64172).
|
The failing CI checks ( Also addressed the Greptile review's P2 note by updating the |
2d1c1cd to
78301d9
Compare
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on openclaw#64172).
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on openclaw#64172).
78301d9 to
0c54fbb
Compare
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on openclaw#64172).
0c54fbb to
8e36c35
Compare
When the simple-completion model selected for thread-title generation is a reasoning model (e.g. MiniMax M2, Claude thinking models, OpenAI o-series), the 24-token output budget is entirely consumed by the internal thinking block before any user-visible text is emitted. extractAssistantText then returns an empty string, generateThreadTitle returns null, and the auto-thread rename is silently skipped while the feature appears to do nothing. Raise DISCORD_THREAD_TITLE_MAX_TOKENS to 512 so there is enough headroom for a short thinking pass plus the 3-6 word title output. The generous ceiling only matters when the provider actually reasons; non-reasoning models still emit a short title and stop early at end-of-sequence. Verified live against a MiniMax M2 reasoning model served through an Anthropic-compatible API endpoint: before the fix, the rename never fired; after the fix, the thread is renamed with a concise generated title. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on openclaw#64172).
8e36c35 to
4d721a4
Compare
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on #64172).
|
Landed via rebase onto
Thanks @hanamizuki! |
Follow repo testing guideline to prefer sonnet-4.6 for Anthropic model constants in tests (per CLAUDE.md, flagged by Greptile review on openclaw#64172).
…models Follow-up to openclaw#64172. That PR raised DISCORD_THREAD_TITLE_MAX_TOKENS from 24 to 512, which unblocked very short messages but left two failure modes for moderately complex inputs on reasoning models: 1. 512 is still too tight when the thinking block alone exceeds the budget - extractAssistantText returns empty, generateThreadTitle returns null, rename is silently skipped. 2. DEFAULT_THREAD_TITLE_TIMEOUT_MS = 10_000 kills ~15% of real production samples: observed MiniMax M2.7 thinking latency ranges 1.7s to 17s for the same task, median 7s, p95 16.9s (n=20). Raise MAX_TOKENS 512 -> 4096 and TIMEOUT 10_000 -> 60_000. Both headrooms are safe because maybeRenameDiscordAutoThread is dispatched without await - a longer rename cannot block message delivery. Worst case is the thread keeps its original title for up to a minute longer before being renamed, which is strictly better than the 15% silent failure rate today. Update the matching maxTokens assertion in the unit test. Full data, percentile distribution, and why-these-numbers analysis in the PR description. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
autoThreadName: "generated"silently fails to rename auto-created threads when the selected simple-completion model is a reasoning model (e.g. MiniMax M2, Claude thinking models, OpenAI o-series).DISCORD_THREAD_TITLE_MAX_TOKENSraised from24to512inextensions/discord/src/monitor/thread-title.ts, plus the matching expectation in the existing unit test.Change Type
Scope
Linked Issue/PR
autoThreadName: "generated"feature)Root Cause
The simple-completion call in
generateThreadTitlesetsmaxTokens: 24as a cost/latency guard, assuming the model's entire output budget will go to a 3-6 word title. That assumption holds for instruct-only models but breaks for reasoning models whose API response contains athinkingcontent block before thetextblock:content: [{ type: "thinking", thinking: "..." }, { type: "text", text: "..." }].maxTokens: 24, the entire budget is consumed by the thinking block before any text token is produced.extractAssistantText(response)walks the content array looking for text blocks, finds none, and returns"".normalizeGeneratedThreadTitle("") → ""→generated || null → null→maybeRenameDiscordAutoThreadearly-returns.logVerbose— invisible unless the gateway is explicitly in verbose mode — so the failure is silent in normal operation.Bumping the ceiling to
512gives enough headroom for a short thinking pass plus the title output. The generous ceiling only costs more tokens when the provider actually reasons; instruct-only models still emit a short title and stop early at natural end-of-sequence.Test plan
pnpm test:extension discord— 928/928 tests passing locally (112 files)extractAssistantTextreturned"";generateThreadTitlereturnednull;maybeRenameDiscordAutoThreadearly-returned with no visible log; the thread kept the raw first-message name.thinkingblock and atextblock (total token usage well under 512);rawTextwas non-empty; a concise title was produced; the DiscordPATCH /channels/{id}call succeeded; the thread was visibly renamed in the Discord UI.thread-title.generate.test.tsupdated to expectmaxTokens: 512(it was the only test asserting the constant value).AI-Assisted
Notes
pnpm tsgoerrors inextensions/discord/src/components.tsand siblings (DiscordModalEntry,DiscordComponentModalEntry, etc.) reproduce on an otherwise-cleanupstream/maincheckout — pre-existing and untouched by this PR.