Skip to content

markdown: sanitize HTML on render (URL scheme allowlist + raw-HTML handling) #128

Description

@moshloop

Summary

markdown.Document.HTML() can emit untrusted markdown as unsanitized HTML, exposing XSS when the rendered output is shown in a browser (e.g. the entity webapp). There are two related vectors that should be addressed together:

  1. Unvalidated URL schemesmarkdown/document.go (link ~:182, image ~:188-194) writes href/src from n.Href through html.EscapeString only. html.EscapeString does not block dangerous schemes, so javascript: / data: URIs survive into the attribute.
  2. Raw HTML passthrough (dominant vector)markdown/options.go defaults PreserveHTML: true, and markdown/document.go:226 (case "raw-html", "html") returns n.RawHTML verbatim. Raw <script>, <img onerror=>, etc. pass straight through.

Why this is deferred (not a drive-by fix)

  • A scheme allowlist on href/src alone is incomplete — the raw-HTML path is the bigger hole, so fixing only (1) gives a false sense of security.
  • Simply flipping PreserveHTML to false is behavior-breaking: it silently strips/flattens embedded HTML, breaks markdown round-trip fidelity, and breaks <details>/admonition handling (parseDetailsHTML depends on PreserveHTML: true).

Correct fix

Sanitize on render (allowlist of tags / attributes / URL schemes — e.g. bluemonday or an equivalent), keeping HTML preservation intact, rather than a piecemeal href allowlist or a default flip. Confirm the intended trust model first:

  • Is Document.HTML() expected to be safe for untrusted input, or only for trusted local CLI/PDF rendering?
  • Should sanitization be opt-in (a Sanitize option) or the default for the webapp consumer?

Context

Deferred from PR #119 CodeRabbit review (comments on markdown/document.go:194 and markdown/options.go:25). Both are the same underlying surface.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions