fix(gmail): prevent URL corruption in quoted-printable decoding#186

Merged
steipete merged 3 commits into steipete:main from 100menotu001:fix/quoted-printable-url-corruption
Feb 14, 2026

@100menotu001

Summary

Fixes #159: URLs containing = characters were corrupted with U+FFFD replacement characters when an email declared Content-Transfer-Encoding: quoted-printable but the Gmail API had already decoded the content.

Root Cause

The Gmail API's format=full response may return body.data already decoded from its original transfer encoding, while the Content-Transfer-Encoding header still says quoted-printable. The existing code then QP-decoded the content a second time, causing:

  • Raw = characters (from URLs like ?foo=bar) to be treated as the start of QP escape sequences
  • Go's quotedprintable.Reader to decode them into bytes that are not valid UTF-8 (it accepts lowercase hex, so =ba becomes 0xBA), which then show up as U+FFFD

Solution

Added a looksLikeQuotedPrintable() check (similar to the existing looksLikeBase64()) that looks for actual QP markers before decoding:

  • Soft line breaks (=\r\n or =\n)
  • Uppercase hex sequences (=XX where X is 0-9 or A-F)

Using uppercase-only hex detection avoids false positives from URLs containing lowercase letters after = (e.g., ?foo=bar).
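
The heuristic described above could look roughly like the following sketch. The function name comes from this PR, but the body is a guess at one reasonable implementation, not the merged code; isUpperHex and decodeBody are hypothetical helpers, and falling back to the raw body on a decode error is an assumption.

```go
package main

import (
	"fmt"
	"io"
	"mime/quotedprintable"
	"strings"
)

// isUpperHex reports whether b is 0-9 or A-F. Lowercase a-f is
// deliberately excluded so that "?foo=bar" does not match.
func isUpperHex(b byte) bool {
	return (b >= '0' && b <= '9') || (b >= 'A' && b <= 'F')
}

// looksLikeQuotedPrintable reports whether s contains a soft line break
// ("=\r\n" or "=\n") or an uppercase "=XX" escape.
// Sketch only; the merged implementation may differ.
func looksLikeQuotedPrintable(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] != '=' {
			continue
		}
		rest := s[i+1:]
		if strings.HasPrefix(rest, "\n") || strings.HasPrefix(rest, "\r\n") {
			return true
		}
		if len(rest) >= 2 && isUpperHex(rest[0]) && isUpperHex(rest[1]) {
			return true
		}
	}
	return false
}

// decodeBody QP-decodes s only when it still looks encoded.
func decodeBody(s string) string {
	if !looksLikeQuotedPrintable(s) {
		return s // assume the API already decoded it
	}
	out, err := io.ReadAll(quotedprintable.NewReader(strings.NewReader(s)))
	if err != nil {
		return s // assumption: prefer the raw body over a partial decode
	}
	return string(out)
}

func main() {
	fmt.Println(decodeBody("https://example.com/?foo=bar")) // left untouched
	fmt.Println(decodeBody("caf=C3=A9 =\r\nau lait"))       // decoded
}
```

A check like this remains a heuristic: already-decoded text that happens to contain = followed by two uppercase hex characters (for example a URL parameter such as ?id=AB12) would still be decoded.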

Testing

Added unit tests covering:

  • Original issue (URL preservation when already decoded)
  • Various QP patterns (uppercase hex, soft breaks)
  • False positive prevention (lowercase URL params)
  • Edge cases (short input, mixed case)

Contributed via OpenClaw agent — active gog CLI users contributing back.

steipete self-assigned this on Feb 14, 2026
Agent and others added 3 commits on February 14, 2026 01:16

When Gmail API returns already-decoded content (format=full), the
Content-Transfer-Encoding header may still say 'quoted-printable'.
Previously, we would attempt to QP-decode again, causing raw '='
characters in URLs to be replaced with U+FFFD (replacement char).

This adds looksLikeQuotedPrintable() to detect actual QP sequences
(=XX hex or =CRLF soft breaks) and skip decoding when content appears
pre-decoded.

Fixes steipete#159

Matt's security review caught that URLs like '?foo=bar' would incorrectly
trigger QP detection because '=ba' matches as hex digits.

Changed to only match UPPERCASE hex (0-9, A-F) since:
- RFC 2045 requires uppercase hex digits in QP encoding (decoders may still accept lowercase)
- Most encoders use uppercase in practice
- This avoids false positives from lowercase URL params

steipete force-pushed the fix/quoted-printable-url-corruption branch from f4b8744 to aea2d90 on February 14, 2026 00:23
steipete merged commit 9eaff65 into steipete:main on Feb 14, 2026
1 check passed
@steipete
Owner

Landed via temp rebase onto main.

Thanks @100menotu001!

Development

Successfully merging this pull request may close these issues.

Bug: gmail get corrupts URLs in email body (quoted-printable decoding issue)

2 participants