Security Hardening Plan #2

@dereuromark

Current Vulnerabilities

| Issue | Severity | Vector | Example |
|---|---|---|---|
| javascript: URLs in links | High | XSS | [click](javascript:alert(1)) |
| javascript: URLs in images | High | XSS | ![x](javascript:alert(1)) |
| javascript: autolinks | High | XSS | <javascript:alert(1)> |
| Event handlers in attributes | High | XSS | [text]{onclick=alert(1)} |
| Raw HTML blocks | High | XSS | ``` =html <script>... ``` |
| Raw HTML inline | High | XSS | `<script>`{=html} |
| Spaces in image src | Medium | Attribute injection | ![x](x onerror=alert(1)) |

Proposed Solution

1. Safe Mode Option

Add a safeMode option to the converter that enables all security measures:

$converter = new DjotConverter(safeMode: true);

When enabled:

  • Dangerous URL schemes are blocked
  • Only whitelisted attributes are allowed
  • Raw HTML is stripped

2. URL Sanitization

Block dangerous schemes:

  • javascript:
  • vbscript:
  • data: (except safe image types)
  • file:

Implementation:

  • Add HtmlRenderer::sanitizeUrl(string $url): string
  • Check scheme against blocklist
  • Return empty string or # for blocked URLs
  • Apply to: links, images, autolinks
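
The scheme check described above could look roughly like the following. This is a minimal sketch, not the actual djot-php implementation: the standalone function name, the choice of `#` as the replacement value, and the list of allowed `data:` image types are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone version of the proposed HtmlRenderer::sanitizeUrl().
 * Returns '#' for blocked URLs (assumption: the proposal leaves '' vs '#' open).
 */
function sanitizeUrl(string $url): string
{
    // Browsers drop tabs/newlines inside a URL and trim leading control
    // characters, so "java\tscript:" can still run. Removing all whitespace
    // and control characters before the check is deliberately conservative.
    $normalized = preg_replace('/[\x00-\x20\x7F]+/', '', $url) ?? '';

    // No scheme (relative path, fragment, query string): nothing to block.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $normalized, $m)) {
        return $url;
    }

    $scheme = strtolower($m[1]);
    if (in_array($scheme, ['javascript', 'vbscript', 'file'], true)) {
        return '#';
    }

    if ($scheme === 'data') {
        // Assumed "safe image types": raster formats only. SVG is left out
        // because an SVG document can itself contain script.
        $safeData = '/^data:image\/(png|gif|jpeg|webp)[;,]/i';

        return preg_match($safeData, $normalized) === 1 ? $url : '#';
    }

    return $url;
}
```

With this sketch, `sanitizeUrl('javascript:alert(1)')` returns `#`, while `https://` URLs and relative paths pass through unchanged. A stricter variant would invert the logic and accept only an allowlist of schemes, which is closer to the `setAllowedUrlSchemes()` call in the API design.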

3. Attribute Whitelist

Allowed attributes:

  • class
  • id
  • href (links only, sanitized)
  • src (images only, sanitized)
  • alt
  • title
  • width, height (images)
  • start (ordered lists)
  • data-* (with value sanitization)

Blocked attributes (event handlers):

  • onclick, ondblclick, onmousedown, onmouseup, onmouseover, onmousemove, onmouseout
  • onkeydown, onkeypress, onkeyup
  • onfocus, onblur, onchange, onsubmit, onreset
  • onload, onerror, onabort
  • onscroll, onresize
  • Any attribute starting with on

Implementation:

  • Add HtmlRenderer::sanitizeAttributes(array $attrs): array
  • Filter against whitelist
  • Block any attribute starting with on
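
As an illustration of the filtering step, here is a minimal sketch of how the whitelist and the `on*` rule might combine. The standalone function, its second parameter, and the `data-*` prefix matching are assumptions for this example; value escaping (for instance via `htmlspecialchars()` at render time) is assumed to happen elsewhere.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical standalone version of the proposed HtmlRenderer::sanitizeAttributes().
 * $allowed holds exact names, or a prefix pattern ending in '*' (e.g. 'data-*').
 *
 * @param array<string, string> $attrs
 * @param list<string> $allowed
 * @return array<string, string>
 */
function sanitizeAttributes(
    array $attrs,
    array $allowed = ['class', 'id', 'alt', 'title', 'data-*'],
): array {
    $safe = [];
    foreach ($attrs as $name => $value) {
        // Cast first: PHP turns numeric-looking array keys into integers.
        $lower = strtolower((string) $name);

        // Event handlers are rejected first, even if allowlisted by mistake.
        if (str_starts_with($lower, 'on')) {
            continue;
        }
        // Names with unexpected characters could break out of the tag.
        if (!preg_match('/^[a-z][a-z0-9_\-]*$/', $lower)) {
            continue;
        }

        foreach ($allowed as $pattern) {
            $matches = str_ends_with($pattern, '*')
                ? str_starts_with($lower, substr($pattern, 0, -1))
                : $lower === $pattern;
            if ($matches) {
                $safe[$lower] = $value;
                break;
            }
        }
    }

    return $safe;
}
```

For `[text]{.ok onclick=bad}`, which would arrive as `['class' => 'ok', 'onclick' => 'bad']`, the sketch returns `['class' => 'ok']`. Checking the `on` prefix before the whitelist keeps the explicit event-handler list from having to be exhaustive.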

4. Raw HTML Handling

Options:

  1. Strip - Remove raw HTML entirely (safest)
  2. Escape - Convert to visible escaped text
  3. Allow - Keep as-is (unsafe, only for trusted input)

Implementation:

  • Add rawHtmlMode option: strip | escape | allow
  • Default to strip when safeMode is enabled
  • Default to allow when safeMode is disabled (current behavior)
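
The three modes map naturally onto a single dispatch. The helper below is a hypothetical sketch of what `renderRawBlock()` and `renderRawInline()` might delegate to; the function name and the exception on unknown modes are assumptions, not part of the proposal.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper applying the proposed rawHtmlMode to a raw HTML node.
 */
function renderRawHtml(string $html, string $mode): string
{
    return match ($mode) {
        // Safest: the raw HTML never reaches the output.
        'strip' => '',
        // The markup becomes visible text instead of live HTML.
        'escape' => htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        // Current behavior; only appropriate for trusted input.
        'allow' => $html,
        default => throw new InvalidArgumentException("Unknown rawHtmlMode: {$mode}"),
    };
}
```

In `escape` mode, `<script>alert(1)</script>` is rendered as `&lt;script&gt;alert(1)&lt;/script&gt;`, so readers see the source text rather than executing it.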

5. Image Source Validation

Rules:

  • No spaces in URL (prevents attribute injection)
  • No newlines
  • Must be valid URL or relative path
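
One possible reading of these rules, as a minimal sketch: reject any source containing whitespace, control characters, or characters that could terminate an attribute value. The function name and the exact character set are assumptions; "valid URL or relative path" is approximated here by character checks rather than full URL parsing.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator for image sources (not part of djot-php).
 */
function isValidImageSrc(string $src): bool
{
    if ($src === '') {
        return false;
    }

    // Whitespace and newlines could start a new attribute (e.g. " onerror=");
    // quotes, angle brackets and backticks could end the value or the tag.
    return preg_match('/[\s\x00-\x1F\x7F"\'<>`]/', $src) !== 1;
}
```

With this check, `x onerror=alert(1)` is rejected because of the space, while `images/cat.png` and `https://example.com/a.png?w=100` are accepted. Authors who need spaces in a path can percent-encode them as `%20`. This is meant as an extra layer on top of escaping and quoting the attribute value, not a replacement for it.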

API Design

// Simple: enable all protections
$converter = new DjotConverter(safeMode: true);

// Granular control
$converter = new DjotConverter();
$renderer = $converter->getRenderer();
$renderer->setSafeMode(true);
$renderer->setRawHtmlMode('escape'); // 'strip', 'escape', 'allow'
$renderer->setAllowedUrlSchemes(['http', 'https', 'mailto']);
$renderer->setAllowedAttributes(['class', 'id', 'title', 'data-*']);

Implementation Tasks

  • Add safeMode property to HtmlRenderer
  • Implement sanitizeUrl() method
  • Implement sanitizeAttributes() method
  • Add rawHtmlMode option with strip/escape/allow
  • Update renderLink() to sanitize URLs
  • Update renderImage() to sanitize URLs and validate src
  • Update renderAutolink() to sanitize URLs
  • Update renderAttributes() to filter dangerous attributes
  • Update renderRawBlock() to respect rawHtmlMode
  • Update renderRawInline() to respect rawHtmlMode
  • Add constructor parameter to DjotConverter
  • Add tests for all XSS vectors
  • Document safe mode in README

Test Cases

// All of these should be safe when safeMode is enabled:

// URLs
'[click](javascript:alert(1))'        // → link with href="#" or stripped
'![x](javascript:alert(1))'           // → image with src="#" or stripped
'<javascript:alert(1)>'               // → not rendered as link

// Attributes
'[text]{onclick=alert(1)}'            // → span without onclick
'[text]{.ok onclick=bad}'             // → span with class="ok" only

// Raw HTML
'`<script>alert(1)</script>`{=html}'  // → stripped or escaped
"``` =html\n<script>...\n```"         // → stripped or escaped (double quotes so \n is a real newline)

// Image src
'![x](x onerror=alert(1))'            // → blocked (space in src)

Backwards Compatibility

  • Default behavior remains unchanged (safeMode: false)
  • Safe mode is opt-in
  • Document clearly that untrusted input requires safeMode: true

Alternative: HTMLPurifier

Instead of implementing security measures in the renderer, use HTMLPurifier to sanitize the final HTML output. This is a battle-tested library specifically designed for this purpose.

Installation

composer require ezyang/htmlpurifier

Usage

use Djot\DjotConverter;

function convertDjotSafe(string $djot): string
{
    $converter = new DjotConverter();
    $html = $converter->convert($djot);

    $config = HTMLPurifier_Config::createDefault();
    $config->set('Cache.DefinitionImpl', null);
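    // <mark> is not listed: HTMLPurifier's built-in definitions do not include
    // HTML5 elements, and naming an unsupported element raises a warning.
    // id attributes are stripped unless Attr.EnableID is also set to true.
    $config->set('HTML.Allowed', 'p,br,strong,em,u,s,del,ins,sub,sup,a[href|title],img[src|alt|title],ul,ol,li,dl,dt,dd,blockquote,pre,code[class],h1,h2,h3,h4,h5,h6,table,thead,tbody,tr,th[align],td[align],hr,div[class|id],span[class|id]');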
    $config->set('HTML.TargetBlank', true);
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($html);
}

Pros

  • Battle-tested, widely used library
  • Handles edge cases we might miss
  • Maintained by security experts
  • Configurable whitelist approach
  • No changes needed to djot-php

Cons

  • Additional dependency
  • Performance overhead (parses HTML again)
  • Must remember to apply it (easy to forget)

Comparison

| Approach | Security | Performance | Simplicity |
|---|---|---|---|
| Built-in safe mode | Good | Fast | Simple API |
| HTMLPurifier | Excellent | Slower | Extra step |
| Both combined | Best | Slowest | Defense in depth |

Recommendation

For maximum security with untrusted input, use both:

// Belt and suspenders approach
$converter = new DjotConverter(safeMode: true);
$html = $converter->convert($untrustedInput);
// $purifier is an HTMLPurifier instance configured as in the Usage example above
$safeHtml = $purifier->purify($html);

This provides defense in depth - the built-in safe mode catches issues at the source, while HTMLPurifier catches anything that slips through.
