Feature Request: --safe flag for gmail read/get to sanitize email content #220

## Summary

Add a `--safe` flag to `gmail thread get` (aka `gmail read`) and `gmail get` that strips all HTML, removes URLs, and decodes HTML entities, leaving only plain text in the output. The flag is fully opt-in; without it, behavior is unchanged.

## Motivation

This tool is increasingly used in automated pipelines and by LLM-based agents that consume `--json` output to read and reason about emails. That exposes three categories of risk:

1. **Phishing & malicious content:** URLs in email bodies (both `text/plain` and stripped `text/html`) are displayed as-is. A user or automation that follows a link from an untrusted email is exposed to phishing, credential harvesting, or malware downloads.

2. **Prompt injection via email:** When LLMs process raw email content from `--json` output, malicious emails can embed prompt injection payloads in HTML, URLs, or entity-encoded text. Sanitizing the content before it reaches the LLM reduces this attack surface.

3. **Tracking:** Even in a CLI context, URLs that survive into the output can trigger link-click tracking or open-tracking pixels once they are copied and opened or fetched. Users reading emails in a "read-only" mindset may not expect tracking URLs to appear in the output.

Currently there is no way to request a "safe" read of an email.

## Current behavior

- `text/plain` bodies are displayed verbatim, including all URLs
- `text/html` fallback uses regex-based tag stripping (`<[^>]*>`), which can be bypassed by malformed HTML
- HTML entities (e.g. `&#104;ttps://evil.com`) are not decoded, allowing obfuscated URLs to pass through
- `--json` output includes the full raw Gmail API response with unsanitized HTML body data
- No mechanism exists to strip URLs or dangerous content
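
The gaps listed above can be reproduced with the standard library alone. The following is a minimal sketch, not the tool's actual code: `stripTags` reuses the regex quoted above, while `hasURL` and `hasURLDecoded` are hypothetical helpers introduced only for illustration.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
	"strings"
)

// tagRe is the tag-stripping regex quoted in this issue.
var tagRe = regexp.MustCompile(`<[^>]*>`)

// stripTags approximates regex-based tag stripping (assumed, not the tool's real function).
func stripTags(s string) string {
	return tagRe.ReplaceAllString(s, "")
}

// hasURL is a hypothetical naive check for a literal URL scheme.
func hasURL(s string) bool {
	return strings.Contains(s, "http://") || strings.Contains(s, "https://")
}

// hasURLDecoded runs the same check after decoding HTML entities.
func hasURLDecoded(s string) bool {
	return hasURL(html.UnescapeString(s))
}

func main() {
	// A ">" inside a quoted attribute ends the regex match early,
	// so the rest of the tag (including the href) leaks into the text.
	fmt.Printf("%q\n", stripTags(`<a title=">" href="https://evil.example">click</a>`))

	// Script contents survive because only the tags themselves are removed.
	fmt.Printf("%q\n", stripTags(`<script>steal()</script>hello`))

	// An entity-encoded scheme hides the URL until entities are decoded.
	obfuscated := "&#104;ttps://evil.example/login"
	fmt.Println(hasURL(obfuscated), hasURLDecoded(obfuscated))
}
```

The first call shows why a tokenizer is preferable to a regex, and the last line shows why entity decoding has to happen before any URL detection.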

## Expected behavior (with `--safe`)

- HTML is converted to text with a proper HTML parser (the `golang.org/x/net/html` tokenizer) rather than a regex, so malformed tags, nested structures, and edge cases are handled correctly; the package is already an indirect dependency (imported for `charset` support), so this adds no new dependencies
- All `http://` and `https://` URLs are replaced with `[url removed]`
- HTML entities are decoded before URL detection, catching obfuscated URLs like `&#104;ttps://...`
- `<script>` and `<style>` blocks are fully removed
- Headers (Subject, etc.) are also sanitized
- In JSON mode: a `bodies` map (keyed by message ID) provides sanitized text, and raw body data is cleared from the message payload to prevent downstream tools from accessing unsanitized content
- Without `--safe`: zero changes to existing behavior
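
The decode-then-replace ordering and the `bodies` map could look roughly like the sketch below. It covers only entity decoding and URL replacement; HTML-to-text conversion with the tokenizer is left out. The names `sanitizeText` and `safeBodies`, the URL regex, and the exact JSON shape are assumptions for illustration, not the implementation referred to in this issue.

```go
package main

import (
	"encoding/json"
	"fmt"
	"html"
	"regexp"
)

// urlRe matches http:// and https:// URLs up to whitespace or a delimiter.
// The pattern is an assumption; the real implementation may differ.
var urlRe = regexp.MustCompile(`(?i)https?://[^\s<>"']+`)

// sanitizeText decodes HTML entities first, then replaces URLs,
// so entity-obfuscated schemes are caught by the URL pattern.
func sanitizeText(s string) string {
	return urlRe.ReplaceAllString(html.UnescapeString(s), "[url removed]")
}

// safeBodies builds a hypothetical "bodies" map keyed by message ID.
func safeBodies(raw map[string]string) map[string]string {
	out := make(map[string]string, len(raw))
	for id, body := range raw {
		out[id] = sanitizeText(body)
	}
	return out
}

func main() {
	raw := map[string]string{
		"msg-1": "Reset your password at &#104;ttps://evil.example/login today",
	}
	// Emit only the sanitized bodies; raw body data is omitted entirely.
	b, err := json.Marshal(map[string]map[string]string{"bodies": safeBodies(raw)})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Reversing the two steps in `sanitizeText` would let `&#104;ttps://...` through, because the URL pattern would run before the scheme becomes visible.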

## Implementation

I've already implemented this and have it working with tests. Happy to open a PR if this proposal is approved.

