fix(docs): unblock markdown --append on Thai/CJK/emoji content #560

Open
ninyawee wants to merge 1 commit into openclaw:main from ninyawee:fix/markdown-thai-hang

@ninyawee ninyawee commented May 5, 2026

Summary

Fixes #559. gog docs write --markdown --append hung indefinitely (CPU-bound infinite loop) on any markdown content ending in a non-ASCII rune. Thai content triggered it reliably because Thai paragraphs and headings naturally end with multi-byte runes; the same bug applies to CJK and emoji content.

The hang was in nextRune (internal/cmd/docs_markdown.go). For a string containing exactly one multi-byte rune, the previous range-based implementation:

for i, r := range s {
    if i > 0 { return s[:i], i }   // never fires (only one iteration)
    if len(s) == 1 { return s, 1 } // false: len() is bytes, not runes
    _ = r
}
return "", 0   // ← falls through here

returned size 0, and the caller in ParseInlineFormatting advanced currentByte += 0 forever inside its for currentByte < len(text) loop.
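The stall can be reproduced in isolation. The sketch below reconstructs the old logic from the snippet above; the `(string, int)` signature is an assumption, and `consumesAll` is a hypothetical, simplified stand-in for the loop in ParseInlineFormatting with a step cap added so the demonstration terminates.

```go
package main

// oldNextRune reconstructs the pre-fix logic from the snippet in this PR.
// The signature is assumed; only the loop body appears in the description.
func oldNextRune(s string) (string, int) {
	for i := range s {
		if i > 0 {
			return s[:i], i // reached only when a second rune exists
		}
		if len(s) == 1 {
			return s, 1 // single ASCII byte
		}
	}
	return "", 0 // a lone multi-byte rune (or empty input) lands here
}

// consumesAll is a hypothetical stand-in for the caller's loop. It reports
// whether the whole text is consumed within maxSteps iterations; the real
// loop had no such cap and therefore spun forever.
func consumesAll(text string, maxSteps int) bool {
	currentByte := 0
	for steps := 0; currentByte < len(text); steps++ {
		if steps >= maxSteps {
			return false // no progress: size 0 keeps being added
		}
		_, size := oldNextRune(text[currentByte:])
		currentByte += size
	}
	return true
}
```

With this reconstruction, text ending in an ASCII byte is consumed normally, while text whose final rune is multi-byte stalls as soon as only that rune remains.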

--markdown --replace is unaffected because that path uploads to Drive's server-side markdown converter (Media(strings.NewReader(cleaned), ContentType(mimeTextMarkdown)) in docs_edit.go). Only --append and the find-replace/sed paths invoke the local parser, so this issue affects:

  • docs write --markdown --append
  • docs find-replace --format markdown (when replacement text ends in a multi-byte rune)
  • docs sed family commands using markdown replacement content

Fix

Replace the hand-rolled rune iteration with utf8.DecodeRuneInString. It returns a width of at least 1 for any non-empty input: ASCII, multi-byte runes, and invalid UTF-8 (utf8.RuneError with width 1). Empty input yields width 0.
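A minimal sketch of what the replacement could look like follows. It is not copied from the diff: the signature is assumed to match the old helper, and the explicit empty-string guard is only there for readability, since `utf8.DecodeRuneInString("")` already reports width 0.

```go
package main

import "unicode/utf8"

// nextRune returns the first rune of s as a string, plus its width in bytes.
// Sketch only; the signature is an assumption, not taken from the PR diff.
func nextRune(s string) (string, int) {
	if s == "" {
		return "", 0
	}
	// For non-empty input the width is always >= 1. Invalid UTF-8 decodes
	// to utf8.RuneError with width 1, so a caller always advances.
	_, size := utf8.DecodeRuneInString(s)
	return s[:size], size
}
```

Because the width is never 0 for a non-empty string, a loop that advances by the returned size always terminates.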

Test plan

  • Added unit tests for nextRune covering the previously-broken paths (single Thai rune, single emoji, empty input, mixed strings).
  • Added timeout-guarded tests for ParseInlineFormatting on Thai headings, FAQ-style mixed text, bold/italic/code mixed with Thai, and trailing emoji. Each call is wrapped in a 2s timeout so any future infinite-loop regression fails fast in CI rather than stalling it.
  • Added an end-to-end test TestMarkdownToDocsRequests_ThaiAppend exercising the exact path used by docs write --markdown --append (ParseMarkdown → MarkdownToDocsRequests at a non-zero base index) on a multi-element Thai sample (heading + paragraph + bullet list + blockquote).
  • Verified the new tests fail on main HEAD (each times out and the goroutine stack confirms it's stuck inside ParseInlineFormatting at line 434) and pass with the fix.
  • make lint clean. Full go test ./internal/cmd/... passes apart from a pre-existing failure in TestRequireAccount_Missing that is unrelated to this change (also fails on stock main).
  • Verified end-to-end against the live Google Docs API:
    • The repro case (--markdown --append on a single-line Thai heading) used to time out at 30s; now completes in ~1.4s.
    • A 14 KB Thai markdown doc with 136 generated batch-update requests now appends in ~1.7s.
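The timeout guard mentioned in the test plan can be sketched as a small helper. `finishesWithin` is a hypothetical name, not the helper used in this PR; it assumes the function under test takes no context and can only be abandoned, not cancelled.

```go
package main

import "time"

// finishesWithin runs fn in its own goroutine and reports whether it
// returned before the deadline d. Hypothetical helper for illustration.
func finishesWithin(d time.Duration, fn func()) bool {
	done := make(chan struct{})
	go func() {
		defer close(done)
		fn()
	}()
	select {
	case <-done:
		return true
	case <-time.After(d):
		// fn is still running; its goroutine is leaked, which is
		// acceptable in a test that is about to fail anyway.
		return false
	}
}
```

In a test this would be used along the lines of `if !finishesWithin(2*time.Second, func() { ParseInlineFormatting(input) }) { t.Fatal("timed out") }`, where the exact ParseInlineFormatting call shape is an assumption.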

Notes

…ing markdown append

`gog docs write --markdown --append` looped forever on any markdown content
ending in a non-ASCII rune (Thai, CJK, emoji, etc.). The trigger was
`nextRune` in docs_markdown.go: for a string containing exactly one
multi-byte rune, the range-loop implementation never returned a non-zero
size, and the caller in ParseInlineFormatting spun on `currentByte += 0`.

`--markdown --replace` was unaffected because that path uploads the
markdown to Drive's server-side converter; only `--append` and the
find-replace/sed paths invoke the local parser.

Replace the hand-rolled iteration with `utf8.DecodeRuneInString`, which
handles empty input, single-byte ASCII, multi-byte runes, and invalid
UTF-8 (returns `utf8.RuneError` with width 1) consistently.

Adds three regression tests covering the unit (`nextRune`),
the immediate caller (`ParseInlineFormatting` on Thai/emoji), and the
end-to-end markdown→Docs-API path used by --append (`MarkdownToDocsRequests`
at a non-zero base index). Each is wrapped in a 2–5s timeout so a future
infinite-loop regression fails fast instead of stalling CI.

Fixes openclaw#559

Development

Successfully merging this pull request may close these issues.

docs write --markdown --append hangs forever on Thai content (infinite loop in nextRune)