# Security Hardening Plan


## Current Vulnerabilities

| Issue | Severity | Vector | Example |
|-------|----------|--------|---------|
| `javascript:` URLs in links | High | XSS | `[click](javascript:alert(1))` |
| `javascript:` URLs in images | High | XSS | `![x](javascript:alert(1))` |
| `javascript:` autolinks | High | XSS | `<javascript:alert(1)>` |
| Event handlers in attributes | High | XSS | `[text]{onclick=alert(1)}` |
| Raw HTML blocks | High | XSS | `` ``` =html <script>... ``` `` |
| Raw HTML inline | High | XSS | `` `<script>`{=html} `` |
| Spaces in image src | Medium | Attribute injection | `![x](x onerror=alert(1))` |

## Proposed Solution

### 1. Safe Mode Option

Add a `safeMode` option to the converter that enables all security measures:

```php
$converter = new DjotConverter(safeMode: true);
```

When enabled:
- Dangerous URL schemes are blocked
- Only whitelisted attributes are allowed
- Raw HTML is stripped

### 2. URL Sanitization

**Block dangerous schemes:**
- `javascript:`
- `vbscript:`
- `data:` (except safe image types)
- `file:`

**Implementation:**
- Add `HtmlRenderer::sanitizeUrl(string $url): string`
- Check scheme against blocklist
- Return empty string or `#` for blocked URLs
- Apply to: links, images, autolinks
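
The scheme check above could look roughly like the following. This is a minimal sketch, not the renderer's actual code: it is written as a standalone function rather than a method, it picks `#` as the replacement for blocked URLs, and the list of `data:` image types treated as safe is an assumption.

```php
<?php
/**
 * Hypothetical standalone version of the proposed HtmlRenderer::sanitizeUrl().
 * The '#' fallback and the data: image exception are assumptions.
 */
function sanitizeUrl(string $url): string
{
    // Browsers strip tabs and newlines from URLs and ignore leading control
    // characters, so remove all of them before inspecting the scheme.
    // This errs on the side of blocking.
    $normalized = preg_replace('/[\x00-\x20\x7F]+/', '', $url) ?? '';

    // No scheme (relative path, fragment, query string): nothing to block.
    if (!preg_match('/^([a-z][a-z0-9+.\-]*):/i', $normalized, $m)) {
        return $url;
    }

    $scheme = strtolower($m[1]);
    $blocked = ['javascript', 'vbscript', 'data', 'file'];
    if (!in_array($scheme, $blocked, true)) {
        return $url;
    }

    // Assumed exception: base64-encoded raster images in data: URLs.
    if ($scheme === 'data'
        && preg_match('#^data:image/(png|gif|jpeg|webp);base64,#i', $normalized)) {
        return $url;
    }

    return '#';
}
```

The returned value still has to be HTML-escaped when it is written into `href` or `src`; this function only decides whether the scheme is acceptable.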

### 3. Attribute Whitelist

**Allowed attributes:**
- `class`
- `id`
- `href` (links only, sanitized)
- `src` (images only, sanitized)
- `alt`
- `title`
- `width`, `height` (images)
- `start` (ordered lists)
- `data-*` (with value sanitization)

**Blocked attributes (event handlers):**
- `onclick`, `ondblclick`, `onmousedown`, `onmouseup`, `onmouseover`, `onmousemove`, `onmouseout`
- `onkeydown`, `onkeypress`, `onkeyup`
- `onfocus`, `onblur`, `onchange`, `onsubmit`, `onreset`
- `onload`, `onerror`, `onabort`
- `onscroll`, `onresize`
- Any attribute starting with `on`

**Implementation:**
- Add `HtmlRenderer::sanitizeAttributes(array $attrs): array`
- Filter against whitelist
- Block any attribute starting with `on`
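
A sketch of the whitelist filter, assuming attributes arrive as a name-to-value array. The `$allowed` parameter and its default are hypothetical; element-specific attributes such as `href`, `src`, `width`, or `start` would be added by the caller for the elements that accept them.

```php
<?php
/**
 * Hypothetical standalone version of the proposed
 * HtmlRenderer::sanitizeAttributes(). The $allowed parameter is an assumption.
 *
 * @param array<string, string> $attrs
 * @param list<string> $allowed
 * @return array<string, string>
 */
function sanitizeAttributes(
    array $attrs,
    array $allowed = ['class', 'id', 'alt', 'title', 'data-*']
): array {
    $safe = [];
    foreach ($attrs as $name => $value) {
        $lower = strtolower((string) $name);

        // Reject anything that is not a plain attribute name
        // (spaces, quotes, "=", ">" and similar characters).
        if (!preg_match('/^[a-z][a-z0-9_-]*\z/', $lower)) {
            continue;
        }
        // Event handlers are dropped even if a caller whitelists one by mistake.
        if (str_starts_with($lower, 'on')) {
            continue;
        }
        $isData = str_starts_with($lower, 'data-') && in_array('data-*', $allowed, true);
        if ($isData || in_array($lower, $allowed, true)) {
            $safe[$lower] = (string) $value;
        }
    }

    return $safe;
}
```

Checking the `on` prefix before the whitelist means a misconfigured whitelist cannot re-enable event handlers. Attribute values are passed through unchanged here and still need HTML escaping on output.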

### 4. Raw HTML Handling

**Options:**
1. **Strip** - Remove raw HTML entirely (safest)
2. **Escape** - Convert to visible escaped text
3. **Allow** - Keep as-is (unsafe, only for trusted input)

**Implementation:**
- Add `rawHtmlMode` option: `strip` | `escape` | `allow`
- Default to `strip` when `safeMode` is enabled
- Default to `allow` when `safeMode` is disabled (current behavior)
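
The three modes can be illustrated with a small dispatch function. The function names and the exception thrown for unknown modes are assumptions made for this sketch; only the mode names and defaults come from the plan above.

```php
<?php
/**
 * Hypothetical helper showing the three proposed rawHtmlMode values.
 */
function renderRawHtml(string $html, string $mode): string
{
    return match ($mode) {
        // Drop the raw HTML entirely.
        'strip'  => '',
        // Show the markup as visible text instead of interpreting it.
        'escape' => htmlspecialchars($html, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8'),
        // Pass through unchanged: only for trusted input.
        'allow'  => $html,
        default  => throw new InvalidArgumentException("Unknown rawHtmlMode: $mode"),
    };
}

/** Default mode as described above: strip in safe mode, allow otherwise. */
function defaultRawHtmlMode(bool $safeMode): string
{
    return $safeMode ? 'strip' : 'allow';
}
```

Failing loudly on an unknown mode avoids silently falling back to `allow` after a typo in configuration.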

### 5. Image Source Validation

**Rules:**
- No spaces in URL (prevents attribute injection)
- No newlines
- Must be valid URL or relative path
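
These rules could be checked along the following lines. The function name is hypothetical, and rejecting quotes, angle brackets, and backticks in addition to whitespace is an extra precaution assumed for this sketch rather than something the rules above require.

```php
<?php
/**
 * Hypothetical validator for the image source rules.
 */
function isValidImageSrc(string $src): bool
{
    if ($src === '') {
        return false;
    }
    // Spaces, newlines, and other control characters are rejected, along with
    // characters that could terminate or break out of an attribute value.
    if (preg_match('/[\x00-\x20\x7F"\'<>`]/', $src)) {
        return false;
    }
    // parse_url() returns false for seriously malformed URLs;
    // ordinary absolute URLs and relative paths pass.
    return parse_url($src) !== false;
}
```

A source that fails validation would then be dropped or replaced before rendering, and a source that passes still goes through the URL scheme check and HTML escaping.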

## API Design

```php
// Simple: enable all protections
$converter = new DjotConverter(safeMode: true);

// Granular control
$converter = new DjotConverter();
$renderer = $converter->getRenderer();
$renderer->setSafeMode(true);
$renderer->setRawHtmlMode('escape'); // 'strip', 'escape', 'allow'
$renderer->setAllowedUrlSchemes(['http', 'https', 'mailto']);
$renderer->setAllowedAttributes(['class', 'id', 'title', 'data-*']);
```

## Implementation Tasks

- [ ] Add `safeMode` property to `HtmlRenderer`
- [ ] Implement `sanitizeUrl()` method
- [ ] Implement `sanitizeAttributes()` method
- [ ] Add `rawHtmlMode` option with strip/escape/allow
- [ ] Update `renderLink()` to sanitize URLs
- [ ] Update `renderImage()` to sanitize URLs and validate src
- [ ] Update `renderAutolink()` to sanitize URLs
- [ ] Update `renderAttributes()` to filter dangerous attributes
- [ ] Update `renderRawBlock()` to respect rawHtmlMode
- [ ] Update `renderRawInline()` to respect rawHtmlMode
- [ ] Add constructor parameter to `DjotConverter`
- [ ] Add tests for all XSS vectors
- [ ] Document safe mode in README

## Test Cases

```php
// All of these should be safe when safeMode is enabled:

// URLs
'[click](javascript:alert(1))'        // → link with href="#" or stripped
'![x](javascript:alert(1))'           // → image with src="#" or stripped
'<javascript:alert(1)>'               // → not rendered as link

// Attributes
'[text]{onclick=alert(1)}'            // → span without onclick
'[text]{.ok onclick=bad}'             // → span with class="ok" only

// Raw HTML
'`<script>alert(1)</script>`{=html}'  // → stripped or escaped
'``` =html\n<script>...\n```'         // → stripped or escaped

// Image src
'![x](x onerror=alert(1))'            // → blocked (space in src)
```

## Backwards Compatibility

- Default behavior remains unchanged (`safeMode: false`)
- Safe mode is opt-in
- Document clearly that untrusted input requires `safeMode: true`

---

## Alternative: HTMLPurifier

Instead of implementing security measures in the renderer, use [HTMLPurifier](http://htmlpurifier.org/) to sanitize the final HTML output. This is a battle-tested library specifically designed for this purpose.

### Installation

```bash
composer require ezyang/htmlpurifier
```

### Usage

```php
use Djot\DjotConverter;

function convertDjotSafe(string $djot): string
{
    $converter = new DjotConverter();
    $html = $converter->convert($djot);

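    $config = HTMLPurifier_Config::createDefault();
    // Disable the on-disk definition cache (simplest setup, slower per request)
    $config->set('Cache.DefinitionImpl', null);
    // Element[attribute] whitelist; anything not listed is removed.
    // Note: stock HTMLPurifier may warn that HTML5 elements such as `mark` are
    // unsupported; drop them from the list or register them in a custom definition.
    $config->set('HTML.Allowed', 'p,br,strong,em,u,s,del,ins,mark,sub,sup,a[href|title],img[src|alt|title],ul,ol,li,dl,dt,dd,blockquote,pre,code[class],h1,h2,h3,h4,h5,h6,table,thead,tbody,tr,th[align],td[align],hr,div[class|id],span[class|id]');
    // Open outgoing links in a new tab
    $config->set('HTML.TargetBlank', true);
    // Only these URL schemes survive purification
    $config->set('URI.AllowedSchemes', ['http' => true, 'https' => true, 'mailto' => true]);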

    $purifier = new HTMLPurifier($config);

    return $purifier->purify($html);
}
```

### Pros

- Battle-tested, widely used library
- Handles edge cases we might miss
- Maintained by security experts
- Configurable whitelist approach
- No changes needed to djot-php

### Cons

- Additional dependency
- Performance overhead (parses HTML again)
- Must remember to apply it (easy to forget)

### Comparison

| Approach | Security | Performance | Simplicity |
|----------|----------|-------------|------------|
| Built-in safe mode | Good | Fast | Simple API |
| HTMLPurifier | Excellent | Slower | Extra step |
| Both combined | Best | Slowest | Defense in depth |

### Recommendation

For maximum security with untrusted input, use both:

```php
// Belt and suspenders approach
$converter = new DjotConverter(safeMode: true);
$html = $converter->convert($untrustedInput);
// $purifier is an HTMLPurifier instance configured as in the Usage section
$safeHtml = $purifier->purify($html);
```

This provides defense in depth: the built-in safe mode blocks known vectors at the source, while HTMLPurifier catches anything that slips through.

