ian-hickey/rewriter

Rewriter — Web Parsing & Edge-Infrastructure Security Research

A collection of offensive-security research projects centered on a single theme: what happens when two systems disagree about how to parse the same bytes. Browsers, edge rewriters, caches, proxies, and template engines all claim to understand HTML/HTTP — but they implement different subsets of the spec, and the gaps between them are exploitable.

⚠️ Research / defensive use only. Everything here is a proof-of-concept built in local labs against software the author controls. The point is to document divergences so defenders can reason about them. Don't point any of it at systems you don't own.


The through-line

The flagship project (FINDINGS.md + src/) studies the parsing differential between lol-html (Cloudflare's bounded-memory streaming HTML rewriter, used in Cloudflare Workers) and html5ever (Mozilla/Servo's spec-compliant WHATWG parser, used as a browser-equivalent oracle). A streaming rewriter that never builds a DOM tree necessarily sees the document differently than a browser does — and an attacker who understands where the two disagree can smuggle content past the rewriter that the browser will still execute.

The other directories generalize that idea to other layers of the stack: edge caches (ESI), web servers (SSI), HTTP proxies (request smuggling / desync), client-side morph libraries (MutationObserver script recreation), innerHTML injection, QR-code rendering, an acoustic side-channel experiment, and ultrasonic data transmission.


Projects

src/ + FINDINGS.md — lol-html parsing differential harness

The core project. A Rust differential-testing harness that feeds the same HTML to lol-html and html5ever and reports where their parse trees diverge.

  • 85 / 90 corpus cases produced structural divergences; 7 / 7 targeted experiments produced confirmed security findings, 3 critical with demonstrated CSP-bypass capability.
  • Headline findings include the selector-defense impossibility (operators cannot write correct position-based defenses against foster-parenting in a streaming rewriter), the noscript RCDATA bypass family, template nonce smuggling, and 26 enumerated attribute-value "invisible channels."
  • Build: cargo build (see Cargo.toml). Full writeup in FINDINGS.md. Corpus in src/corpus/, targeted PoCs in src/experiments/ and src/bin/.
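As a rough illustration of what a differential harness does, here is a minimal Python sketch. It is not the Rust harness in src/ and uses neither lol-html nor html5ever: `naive_view` (a regex tag scanner) and `parser_view` (Python's stdlib `html.parser`) are hypothetical stand-ins for "two consumers of the same bytes", and `run_corpus` is an assumed name.

```python
import re
from html.parser import HTMLParser


class _StartTags(HTMLParser):
    """Collects start-tag names as seen by Python's stdlib HTML parser."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parser_view(html):
    # Stand-in for the "browser-like" side: knows <script> bodies are raw text.
    collector = _StartTags()
    collector.feed(html)
    collector.close()
    return collector.tags


def naive_view(html):
    # Stand-in for a context-unaware scanner: every "<name" looks like a tag.
    return [m.group(1).lower() for m in re.finditer(r"<([a-zA-Z][a-zA-Z0-9]*)", html)]


def run_corpus(cases):
    """Feed each case to both views and report the ones that disagree."""
    report = []
    for name, html in cases.items():
        a, b = naive_view(html), parser_view(html)
        if a != b:
            report.append((name, a, b))
    return report


if __name__ == "__main__":
    corpus = {
        "plain": "<p>hi</p>",
        "script-string": '<script>var s = "<img src=x>";</script>',
    }
    for name, a, b in run_corpus(corpus):
        print(f"{name}: naive={a} parser={b}")
```

The second case diverges because one side treats the `<img` inside the script body as a tag and the other treats it as text. The real harness compares far richer structure than a flat tag list.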

desync/ — HTTP request smuggling / desync lab

A Docker Compose lab pitting multiple proxies (nginx, HAProxy, Envoy, Apache, Caddy) against backends in several languages to hunt for cross-user request-smuggling primitives. Includes a test runner and an echo/instrumentation server. Conclusion so far: standard vectors (CL.TE, TE.CL, CL.0, H2.CL, …) are exhausted against current mainstream proxies; see desync/FINDINGS.md and desync/CSD-FINDINGS.md.
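For readers new to the vector names, the sketch below shows the framing disagreement behind CL.TE in isolation. It is a toy, not the lab's test runner: the two functions are hypothetical models of a front end that trusts Content-Length and a back end that trusts Transfer-Encoding, the request and host name are made up, and trailers are assumed absent.

```python
def _split_head(raw: bytes):
    """Return (lower-cased header dict, bytes after the blank line)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest


def consumed_by_content_length(raw: bytes):
    """Model a parser that frames the body with Content-Length only."""
    headers, rest = _split_head(raw)
    n = int(headers[b"content-length"])
    return rest[:n], rest[n:]  # (body, bytes left on the connection)


def consumed_by_chunked(raw: bytes):
    """Model a parser that frames the body with chunked encoding only."""
    _, rest = _split_head(raw)
    body, pos = b"", 0
    while True:
        eol = rest.index(b"\r\n", pos)
        size = int(rest[pos:eol].split(b";")[0], 16)  # drop chunk extensions
        pos = eol + 2
        if size == 0:
            pos = rest.index(b"\r\n", pos) + 2  # final CRLF (no trailers assumed)
            break
        body += rest[pos:pos + size]
        pos += size + 2  # chunk data plus its CRLF
    return body, rest[pos:]


REQUEST = (
    b"POST / HTTP/1.1\r\n"
    b"Host: lab.local\r\n"
    b"Content-Length: 13\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n\r\nSMUGGLED"
)

if __name__ == "__main__":
    print(consumed_by_content_length(REQUEST))
    print(consumed_by_chunked(REQUEST))
```

The Content-Length view swallows all 13 body bytes, while the chunked view stops at the zero-size chunk and leaves `SMUGGLED` on the connection as the start of a "next" request. A proxy that rejects or normalizes requests carrying both headers closes this gap.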

esi-injection-research/ — Edge Side Includes injection

Tests how Varnish's ESI parser diverges from HTML5 parsing semantics — e.g. processing <esi:include> inside <script type="application/json">, JS string literals, and <style> bodies, opening SSR-state-injection vectors. Lab uses Varnish 7.4 + nginx origin. Results in esi-injection-research/RESULTS.md (and Round 2 / breakout writeups).
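The divergence can be illustrated with a small Python sketch. This is a toy model, not Varnish's parser: `esi_process` is a hypothetical context-unaware substitution pass, `html_start_tags` uses Python's stdlib `html.parser` as a rough stand-in for HTML-aware parsing, and the `/frag` path is invented.

```python
import re
from html.parser import HTMLParser

# Toy ESI pass: plain text substitution with no notion of HTML context.
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]*)"\s*/>')


def esi_process(body, fetch):
    """Replace every include tag with whatever fetch(src) returns."""
    return ESI_INCLUDE.sub(lambda m: fetch(m.group(1)), body)


class _StartTags(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def html_start_tags(body):
    """Start tags as an HTML-aware parser sees them."""
    parser = _StartTags()
    parser.feed(body)
    parser.close()
    return parser.tags


PAGE = '<script type="application/json">{"user":"<esi:include src="/frag"/>"}</script>'

if __name__ == "__main__":
    # The HTML-aware view sees only <script>; the include is just script text.
    print(html_start_tags(PAGE))
    # The ESI pass still expands it, splicing fetched bytes into the JSON.
    print(esi_process(PAGE, lambda src: f"[{src}]"))
```

If the fetched fragment is attacker-influenced, the substituted bytes land inside serialized state that the HTML layer considered opaque text.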

ssi-rpfi/ — Server Side Includes / reflected-path file inclusion

A Docker lab exploring SSI processing across web servers and frameworks. content/secret.txt is a deliberate canary token used to prove information leakage in PoCs — not a real secret.

mo-recreate-survey/ — MutationObserver script-recreation bypass

Surveys 21 frameworks/morph libraries (htmx, Turbo/Hotwire, …) for the pattern where innerHTML-inserted <script> elements, which are inert by spec, get recreated via createElement('script'). The fresh element does not carry the "already started" flag that the fragment parser set on the original, so it executes once inserted. This turns innerHTML injection into full XSS in affected apps. See mo-recreate-survey/FINDINGS.md and mo-recreate-survey/IMPACT.md.

innerhtml/ — constrained innerHTML XSS research

Research prompts and PoCs for achieving JS execution under hostile constraints (e.g. no = sign permitted, escaped-quote contexts), covering encoding bypasses and mutation-XSS.

qrcode/ — HTML/CSS QR-code rendering

Generators that render QR codes as DOM/CSS structures (divs styled with CSS), plus payload tooling and test pages. A small Node server drives generation.

FFT/ — acoustic keystroke side-channel prototype

An experiment in classifying keystrokes from microphone audio: a browser data collector, a Python ML pipeline (feature engineering, training, evaluation), and a real-time demo. Note: the trained model (keystroke_classifier.pkl, model_browser.json) and the raw dataset (keystroke-data.json) are excluded from git because they exceed GitHub's 100 MB file limit — regenerate them with the scripts in FFT/ml-pipeline/ and FFT/data-collection/. normalization.json is kept.
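To give a sense of what frequency-domain feature engineering can look like, here is a minimal stdlib-only Python sketch. It is not taken from FFT/ml-pipeline/: the function names and the two features (peak frequency and spectral centroid) are assumptions, and a naive DFT stands in for a real FFT to keep the example dependency-free.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 (fine for short frames)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_features(frame, sample_rate):
    """Two simple per-frame features a classifier could consume."""
    mags = magnitude_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    total = sum(mags) or 1.0  # avoid dividing by zero on silence
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    centroid = sum(k * bin_hz * m for k, m in enumerate(mags)) / total
    return {"peak_hz": peak_bin * bin_hz, "centroid_hz": centroid}


if __name__ == "__main__":
    # Synthetic 1 kHz tone sampled at 8 kHz, 64 samples long.
    tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
    print(spectral_features(tone, 8000))
```

A real pipeline would window the audio around each detected keystroke and likely use many more features than these two.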

ultrasonic/ — ultrasonic data transmission

FSK-based near-ultrasonic transmitter/receiver pages and a CSP demo exploring inaudible cross-device signaling as a covert/exfil channel.
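The idea behind binary FSK can be sketched in a few lines of Python. Every parameter here is an assumption chosen for illustration (48 kHz sample rate, 18 kHz and 19 kHz tones, 10 ms symbols), not a value read from the pages in ultrasonic/, and the channel is a noiseless in-memory list rather than a speaker and microphone.

```python
import math

SAMPLE_RATE = 48_000      # assumed
F0, F1 = 18_000, 19_000   # assumed tones for bit 0 and bit 1
SYMBOL_SAMPLES = 480      # assumed: 10 ms per bit


def modulate(bits):
    """Emit one sine burst per bit, at F0 or F1."""
    samples = []
    for bit in bits:
        freq = F1 if bit else F0
        for n in range(SYMBOL_SAMPLES):
            samples.append(math.sin(2 * math.pi * freq * n / SAMPLE_RATE))
    return samples


def goertzel_power(frame, freq):
    """Signal power at a single frequency (Goertzel algorithm)."""
    k = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + k * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - k * s1 * s2


def demodulate(samples):
    """Decide each symbol by comparing power at the two tones."""
    bits = []
    for i in range(0, len(samples) - SYMBOL_SAMPLES + 1, SYMBOL_SAMPLES):
        frame = samples[i:i + SYMBOL_SAMPLES]
        bits.append(1 if goertzel_power(frame, F1) > goertzel_power(frame, F0) else 0)
    return bits


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    print(demodulate(modulate(message)) == message)
```

Goertzel is used because the receiver only needs power at two known frequencies, which is cheaper than a full FFT per symbol. A real receiver also has to handle symbol synchronization and noise, which this sketch ignores.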


Repository notes

  • Excluded from git: .env files, node_modules/, Rust target/, Python __pycache__/, .DS_Store, the .claude/ workspace config, and the large trained-model/dataset artifacts noted above. See .gitignore.
  • Languages/stacks: Rust (core harness), Docker Compose (proxy/cache/server labs), Python (ML pipeline), and a lot of HTML/JS PoC pages.

License

No license is currently specified; all rights reserved by the author. Open an issue if you'd like to use any of this.
