Summary
Add a --safe flag to gmail thread get (aka gmail read) and gmail get that strips all HTML, removes URLs, and decodes HTML entities, leaving only plain text in the output. The flag is fully opt-in; without it, behavior is unchanged.
Motivation
This tool is increasingly used in automated pipelines and by LLM-based agents that consume --json output to read and reason about emails. This creates several categories of risk:
- Phishing & malicious content: URLs in email bodies (both text/plain and stripped text/html) are displayed as-is. A user or automation that follows a link from an untrusted email is exposed to phishing, credential harvesting, or malware downloads.
- Prompt injection via email: When LLMs process raw email content from --json output, malicious emails can embed prompt injection payloads in HTML, URLs, or entity-encoded text. Sanitizing the content before it reaches the LLM reduces this attack surface.
- Tracking: Even in a CLI context, URLs copied from output can trigger open-tracking pixels and link-click tracking. Users reading emails in a "read-only" mindset may not expect tracking URLs to survive into the output.
Currently there is no way to request a "safe" read of an email.
Current behavior
- text/plain bodies are displayed verbatim, including all URLs
- text/html fallback uses regex-based tag stripping (<[^>]*>), which can be bypassed by malformed HTML
- HTML entities (e.g. an entity-encoded https://evil.com) are not decoded, allowing obfuscated URLs to pass through
- --json output includes the full raw Gmail API response with unsanitized HTML body data
- No mechanism exists to strip URLs or dangerous content
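To make the first three points concrete, here is a minimal sketch that applies the tag-stripping pattern quoted above to a few hostile inputs. Only the regex comes from this proposal; the stripTags helper and the sample inputs are illustrative assumptions, not the tool's actual code.

```go
package main

import (
	"fmt"
	"regexp"
)

// tagPattern is the tag-stripping regex quoted in the list above.
var tagPattern = regexp.MustCompile(`<[^>]*>`)

// stripTags is a hypothetical stand-in for the current text/html fallback:
// it deletes anything that looks like a tag and nothing else.
func stripTags(s string) string {
	return tagPattern.ReplaceAllString(s, "")
}

func main() {
	inputs := []string{
		// The tags are removed, but the script body survives as text.
		`<script>steal()</script>Hello`,
		// A '>' inside an attribute value ends the match early,
		// so part of the tag leaks into the output.
		`<img alt="a>b">text`,
		// Entities are never decoded, so the encoded URL passes through.
		`Visit &#104;ttps://evil.com`,
	}
	for _, in := range inputs {
		fmt.Printf("%q -> %q\n", in, stripTags(in))
	}
}
```

In each case the output still carries content a reader would expect to be gone: script text, a tag fragment, or an entity-encoded URL.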
Expected behavior (with --safe)
- HTML is converted to text using a proper HTML parser (golang.org/x/net/html tokenizer), not regex, correctly handling malformed tags, nested structures, and edge cases. This package is already an indirect dependency (imported for charset support), so it adds no new dependencies
- All http:// and https:// URLs are replaced with [url removed]
- HTML entities are decoded before URL detection, catching obfuscated URLs like https://...
- <script> and <style> blocks are fully removed
- Headers (Subject, etc.) are also sanitized
- In JSON mode: a bodies map (keyed by message ID) provides sanitized text, and raw body data is cleared from the message payload to prevent downstream tools from accessing unsanitized content
- Without --safe: zero changes to existing behavior
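The ordering in the list above (decode entities first, then detect URLs) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the implementation behind this proposal: sanitizeText and urlPattern are hypothetical names, the URL regex is a deliberately simple guess, and the HTML-to-text step with the golang.org/x/net/html tokenizer is left out.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// urlPattern is an assumed, simplified matcher for http:// and https:// URLs.
// It stops at whitespace, angle brackets, and quotes.
var urlPattern = regexp.MustCompile(`(?i)https?://[^\s<>"']+`)

// sanitizeText (hypothetical) decodes HTML entities first, so that
// entity-obfuscated URLs become visible, and then replaces every URL
// with the placeholder named in this proposal.
func sanitizeText(s string) string {
	decoded := html.UnescapeString(s)
	return urlPattern.ReplaceAllString(decoded, "[url removed]")
}

func main() {
	// "&#104;" decodes to "h", so the URL is caught after decoding.
	fmt.Println(sanitizeText("Click &#104;ttps://evil.com/login now"))
	// Entities inside a query string are decoded before matching too.
	fmt.Println(sanitizeText("See http://example.com?a=1&amp;b=2 for details"))
}
```

Running the URL match before decoding would miss the first input entirely, which is why the order matters.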
Implementation
I've already implemented this and have it working with tests. Happy to open a PR if this proposal is approved.