fix(gmail): skip charset re-decoding when body is already valid UTF-8 #511

Closed
dinakars777 wants to merge 1 commit into openclaw:main from dinakars777:fix/gmail-charset-redecode

@dinakars777
Contributor

Summary

decodeBodyCharset re-decodes already-UTF-8 data as the original charset (e.g., ISO-2022-JP, Shift-JIS, GBK) when the MIME header declares a non-UTF-8 charset, producing garbled output. The Gmail API normalizes body.data to UTF-8 before base64url-encoding but preserves the original MIME headers verbatim.
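The failure mode described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the PR: `decodeLatin1` is a hypothetical stand-in for a legacy-charset decoder, ISO-8859-1 is used in place of ISO-2022-JP because it needs no external package, and the payload is constructed locally rather than fetched from the Gmail API.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeLatin1 interprets each byte as one ISO-8859-1 code point.
// Hypothetical helper: it stands in for any legacy-charset decoder.
func decodeLatin1(raw []byte) string {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	// Assumed payload: the body is already UTF-8 and base64url-encoded,
	// while the MIME header still declares charset="iso-8859-1".
	bodyData := base64.URLEncoding.EncodeToString([]byte("caf\u00e9"))

	raw, err := base64.URLEncoding.DecodeString(bodyData)
	if err != nil {
		panic(err)
	}

	fmt.Println(string(raw))       // bytes used as-is: café
	fmt.Println(decodeLatin1(raw)) // re-decoded per the stale header: cafÃ©
}
```

The second line shows the double-decoding: each byte of the two-byte UTF-8 sequence for "é" is treated as a separate Latin-1 character.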

Steps to Reproduce (before fix)

  1. Receive an email with Content-Type: text/plain; charset="iso-2022-jp"
  2. Run gog gmail read <threadId>
  3. Body text is garbled and contains U+FFFD replacement characters (\ufffd)

Changes

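Add a utf8.Valid(data) guard before charset conversion. If the data is already valid UTF-8, skip re-decoding. Raw bytes in 8-bit legacy encodings (like Shift-JIS or GBK) are almost never valid UTF-8 once they contain non-ASCII text, so the guard is safe for those. ISO-2022-JP is the exception: it is a 7-bit encoding, so raw ISO-2022-JP bytes always pass utf8.Valid and the guard alone cannot tell them apart from normalized data.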
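A minimal sketch of the guard, assuming a simplified signature. `decodeBody`, `decodeFunc`, and `latin1` are hypothetical names for illustration; the real `decodeBodyCharset` is not shown in this PR description and may look different.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeFunc converts raw bytes in some legacy charset to a UTF-8 string.
// Hypothetical type, used so the sketch needs no external charset package.
type decodeFunc func(raw []byte) (string, error)

// decodeBody skips charset conversion when data is already valid UTF-8.
// Hypothetical stand-in for decodeBodyCharset.
func decodeBody(data []byte, decode decodeFunc) (string, error) {
	if utf8.Valid(data) {
		return string(data), nil
	}
	return decode(data)
}

// latin1 maps each byte to the ISO-8859-1 code point with the same value.
func latin1(raw []byte) (string, error) {
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return string(runes), nil
}

func main() {
	normalized := []byte("caf\u00e9")      // already UTF-8
	legacy := []byte{'c', 'a', 'f', 0xE9} // raw ISO-8859-1, invalid as UTF-8

	for _, data := range [][]byte{normalized, legacy} {
		s, err := decodeBody(data, latin1)
		if err != nil {
			panic(err)
		}
		fmt.Println(s) // both inputs print: café
	}
}
```

Pure-ASCII input also passes `utf8.Valid`, which is harmless for ASCII-compatible charsets because decoding it would not change the text.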

Affected Encodings

Confirmed: iso-2022-jp. Likely also: shift_jis, euc-jp, gb2312, gbk, euc-kr, windows-1252, iso-8859-1.

Test plan

  • go build ./... compiles cleanly
  • go test ./internal/cmd/... -run GmailGet -v passes
  • Manual test: read an ISO-2022-JP encoded email and verify it displays correctly

Closes #446

The Gmail API normalizes body.data to UTF-8 before base64url-encoding
but preserves the original MIME Content-Type headers verbatim. This
caused decodeBodyCharset to re-decode already-UTF-8 bytes as the
original charset (e.g., ISO-2022-JP, Shift-JIS, GBK), producing
garbled output for emails in those encodings.

Add a utf8.Valid() guard before charset conversion: if the data is
already valid UTF-8, skip re-decoding. Genuine non-UTF-8 raw bytes
are almost never valid UTF-8, so this guard is safe.

Closes steipete#446
@steipete
Collaborator

Landed manually on main as part of a grouped Gmail fix commit.

Note: review kept the UTF-8 guard but preserved ISO-2022-JP escape-sequence decoding, with regression coverage.
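Why ISO-2022-JP needs separate handling can be sketched as follows. This is an assumption about the general approach, not the code that landed on main: `hasISO2022JPEscape` and `needsDecoding` are hypothetical names, and the escape sequences are the ones RFC 1468 defines for switching away from ASCII.

```go
package main

import (
	"bytes"
	"fmt"
	"unicode/utf8"
)

// iso2022jpEscapes lists the RFC 1468 escape sequences that switch
// from ASCII to another character set.
var iso2022jpEscapes = [][]byte{
	{0x1b, '$', '@'}, // JIS X 0208-1978
	{0x1b, '$', 'B'}, // JIS X 0208-1983
	{0x1b, '(', 'J'}, // JIS X 0201-Roman
}

// hasISO2022JPEscape reports whether data contains one of those sequences.
func hasISO2022JPEscape(data []byte) bool {
	for _, esc := range iso2022jpEscapes {
		if bytes.Contains(data, esc) {
			return true
		}
	}
	return false
}

// needsDecoding is a hypothetical predicate: decode when the bytes are not
// valid UTF-8, or when they still carry raw ISO-2022-JP escape sequences.
func needsDecoding(data []byte) bool {
	return !utf8.Valid(data) || hasISO2022JPEscape(data)
}

func main() {
	// "日本" as raw ISO-2022-JP: ESC $ B, two JIS X 0208 characters, ESC ( B.
	raw := []byte{0x1b, '$', 'B', 0x46, 0x7c, 0x4b, 0x5c, 0x1b, '(', 'B'}
	normalized := []byte("日本") // the same text, already UTF-8

	fmt.Println(utf8.Valid(raw))           // true: every byte is 7-bit
	fmt.Println(needsDecoding(raw))        // true: escape sequence present
	fmt.Println(needsDecoding(normalized)) // false: skip re-decoding
}
```

Because every byte of raw ISO-2022-JP is below 0x80, `utf8.Valid` alone accepts it, so a guard that only checks validity would leave such bodies undecoded.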

Thanks!

@steipete steipete closed this Apr 20, 2026
Development

Successfully merging this pull request may close these issues.

bug(gmail): ISO-2022-JP emails garbled — decodeBodyCharset re-decodes UTF-8 bytes

2 participants