Rewriter — Web Parsing & Edge-Infrastructure Security Research

A collection of offensive-security research projects centered on a single theme: what happens when two systems disagree about how to parse the same bytes. Browsers, edge rewriters, caches, proxies, and template engines all claim to understand HTML/HTTP — but they implement different subsets of the spec, and the gaps between them are exploitable.

⚠️ Research / defensive use only. Everything here is a proof-of-concept built in local labs against software the author controls. The point is to document divergences so defenders can reason about them. Don't point any of it at systems you don't own.

The through-line

The flagship project (FINDINGS.md + src/) studies the parsing differential between lol-html (Cloudflare's bounded-memory streaming HTML rewriter, used in Cloudflare Workers) and html5ever (Mozilla/Servo's spec-compliant WHATWG parser, used as a browser-equivalent oracle). A streaming rewriter that never builds a DOM tree necessarily sees the document differently than a browser does — and an attacker who understands where the two disagree can smuggle content past the rewriter that the browser will still execute.
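To make the "streaming view" concrete, here is a minimal sketch in Python. It uses the standard library's `html.parser` as a stand-in for a tokenizer that never builds a tree; it is not lol-html, and the class and function names are hypothetical. It records the stack of open tags at each start tag, in source order. For `<table><div>`, this view places the `div` inside the `table`, whereas a WHATWG tree builder foster-parents the `div` out of the table, so in the browser's DOM it is a sibling that precedes the table.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not stay on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class SourceOrderView(HTMLParser):
    """Records, for each start tag, the tags open at that point in the source."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = []

    def handle_starttag(self, tag, attrs):
        self.paths.append(self.stack + [tag])
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop through the nearest matching open tag.
            while self.stack.pop() != tag:
                pass


def source_order_paths(html):
    view = SourceOrderView()
    view.feed(html)
    view.close()
    return view.paths


paths = source_order_paths("<table><div>x</div></table>")
print(paths)  # [['table'], ['table', 'div']]
```

A position-based rule such as "match `div` elements inside `table`" would fire in this view even though no such parent/child relationship exists in the browser's tree.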

The other directories generalize that idea to other layers of the stack: edge caches (ESI), web servers (SSI), HTTP proxies (request smuggling / desync), client-side morph libraries (MutationObserver script recreation), innerHTML injection, QR-code rendering, and an acoustic side-channel experiment.

Projects

`src/` + `FINDINGS.md` — lol-html parsing differential harness

The core project. A Rust differential-testing harness that feeds the same HTML to lol-html and html5ever and reports where their parse trees diverge.

85 / 90 corpus cases produced structural divergences; 7 / 7 targeted experiments produced confirmed security findings, 3 critical with demonstrated CSP-bypass capability.
Headline findings include the selector-defense impossibility (operators cannot write correct position-based defenses against foster-parenting in a streaming rewriter), the noscript RCDATA bypass family, template nonce smuggling, and 26 enumerated attribute-value "invisible channels."
Build: cargo build (see Cargo.toml). Full writeup in FINDINGS.md. Corpus in src/corpus/, targeted PoCs in src/experiments/ and src/bin/.
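The shape of a differential harness can be sketched in a few lines. The real harness is written in Rust and compares lol-html with html5ever; the Python below is only an illustration with two toy "views" (a context-blind regex and the standard library tokenizer), and every name in it is an assumption made for the example. The second corpus case also hints at why attribute values can act as a channel: a `<` inside a quoted attribute is a tag to one view and plain text to the other.

```python
import re
from html.parser import HTMLParser


def tags_by_regex(html):
    """Naive view: every '<name' counts as a tag, regardless of context."""
    return [t.lower() for t in re.findall(r"<([a-zA-Z][a-zA-Z0-9]*)", html)]


class _TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_by_tokenizer(html):
    """Tokenizer view: start tags as reported by html.parser."""
    collector = _TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def run_differential(corpus, view_a, view_b):
    """Return (case name, view A output, view B output) for each divergent case."""
    divergent = []
    for name, html in corpus.items():
        a, b = view_a(html), view_b(html)
        if a != b:
            divergent.append((name, a, b))
    return divergent


# Hypothetical two-case corpus, not taken from src/corpus/.
CORPUS = {
    "plain": "<p>hello</p>",
    "attr-value": '<a title="<script>">x</a>',
}

report = run_differential(CORPUS, tags_by_regex, tags_by_tokenizer)
for name, a, b in report:
    print(f"{name}: {a} != {b}")
print(f"{len(report)} / {len(CORPUS)} cases diverged")
```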

`desync/` — HTTP request smuggling / desync lab

A Docker Compose lab pitting multiple proxies (nginx, HAProxy, Envoy, Apache, Caddy) against backends in several languages to hunt for cross-user request-smuggling primitives. Includes a test runner and an echo/instrumentation server. Conclusion so far: the standard vectors (CL.TE, TE.CL, CL.0, H2.CL, …) yielded no exploitable desync against current mainstream proxies; see desync/FINDINGS.md and desync/CSD-FINDINGS.md.
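For readers unfamiliar with the vector names, the CL.TE case comes down to two parsers framing the same bytes differently. The sketch below is a simplified illustration, not code from the lab: the request, host name, and function names are hypothetical, and the chunked parser assumes there are no trailer fields. One function frames the body by `Content-Length`, the other by `Transfer-Encoding: chunked`, and the leftover bytes show what a desynchronized back-end would treat as the start of the next request.

```python
RAW = (b"POST / HTTP/1.1\r\n"
       b"Host: lab.test\r\n"
       b"Content-Length: 6\r\n"
       b"Transfer-Encoding: chunked\r\n"
       b"\r\n"
       b"0\r\n\r\nG")


def split_head(raw):
    """Split a raw request into a lowercase header dict and the bytes after it."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, rest


def frame_by_content_length(raw):
    """Front-end style framing: trust Content-Length."""
    headers, rest = split_head(raw)
    n = int(headers[b"content-length"])
    return rest[:n], rest[n:]


def frame_by_chunked(raw):
    """Back-end style framing: trust chunked encoding (no trailers assumed)."""
    _, rest = split_head(raw)
    pos, body = 0, b""
    while True:
        eol = rest.index(b"\r\n", pos)
        size = int(rest[pos:eol].split(b";")[0], 16)  # ignore chunk extensions
        pos = eol + 2
        if size == 0:
            pos = rest.index(b"\r\n", pos) + 2  # CRLF that ends the message
            return body, rest[pos:]
        body += rest[pos:pos + size]
        pos += size + 2  # skip chunk data and its trailing CRLF


print(frame_by_content_length(RAW))  # (b'0\r\n\r\nG', b'')
print(frame_by_chunked(RAW))         # (b'', b'G')
```

The stray `G` is the classic marker: under the chunked reading it is not part of this request at all, so it would be prepended to whatever arrives next on the same connection.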

`esi-injection-research/` — Edge Side Includes injection

Tests how Varnish's ESI parser diverges from HTML5 parsing semantics — e.g. processing <esi:include> inside <script type="application/json">, JS string literals, and <style> bodies, opening SSR-state-injection vectors. Lab uses Varnish 7.4 + nginx origin. Results in esi-injection-research/RESULTS.md (and Round 2 / breakout writeups).
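The divergence described above can be illustrated with a toy model. The sketch below does not use Varnish or reproduce its parser; it assumes a context-blind processor modeled by a regular expression, compares it with Python's `html.parser` (which treats `<script>` bodies as raw text), and uses a hypothetical page and fragment path.

```python
import re
from html.parser import HTMLParser

# Hypothetical SSR state blob with an ESI tag inside a JSON string.
PAGE = """<script type="application/json">{"bio": "<esi:include src='/lab/fragment'/>"}</script>"""

# Toy stand-in for a processor that scans for ESI tags without tracking HTML context.
ESI_RE = re.compile(r"<esi:include\s+src=['\"]([^'\"]+)['\"]\s*/>")


def esi_includes_context_blind(page):
    return ESI_RE.findall(page)


class HtmlView(HTMLParser):
    """Collects start tags and text as an HTML tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def html_view(page):
    view = HtmlView()
    view.feed(page)
    view.close()
    return view.tags, "".join(view.text)


print(esi_includes_context_blind(PAGE))  # ['/lab/fragment']
tags, text = html_view(PAGE)
print(tags)  # ['script']
```

The HTML view sees one `script` element whose body is opaque text, while the context-blind view finds an include to expand. If the expansion happens, the fetched fragment lands inside the JSON string that the page later deserializes.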

`ssi-rpfi/` — Server Side Includes / reflected-path file inclusion

A Docker lab exploring SSI processing across web servers and frameworks. content/secret.txt is a deliberate canary token used to prove information leakage in PoCs — not a real secret.

`mo-recreate-survey/` — MutationObserver script-recreation bypass

Surveys 21 frameworks/morph libraries (htmx, Turbo/Hotwire, …) for the pattern where <script> elements inserted via innerHTML, which are inert by spec, get recreated via createElement('script'). The recreated element is a new node whose "already started" flag is unset, so it executes. This turns innerHTML injection into full XSS in affected apps. See mo-recreate-survey/FINDINGS.md and mo-recreate-survey/IMPACT.md.

`innerhtml/` — constrained `innerHTML` XSS research

Research prompts and PoCs for achieving JS execution under hostile constraints (e.g. no = sign permitted, escaped-quote contexts), covering encoding bypasses and mutation-XSS.

`qrcode/` — HTML/CSS QR-code rendering

Generators that render QR codes as DOM/CSS structures (div elements styled with CSS), plus payload tooling and test pages. A small Node server drives generation.
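As a rough idea of what "QR code as DOM/CSS" means, the sketch below turns a boolean module matrix into a CSS grid of divs. It is an assumption about the general technique, not the repository's Node generator: the function name, class names, and the 2x2 matrix are hypothetical, and the matrix is not a valid QR symbol.

```python
def render_matrix_as_divs(matrix, cell_px=8):
    """Return an HTML snippet that draws a square boolean matrix with a CSS grid."""
    size = len(matrix)
    cells = "".join(
        '<div class="d"></div>' if dark else '<div class="l"></div>'
        for row in matrix
        for dark in row
    )
    style = (
        "<style>"
        f".qr{{display:grid;grid-template-columns:repeat({size},{cell_px}px);}}"
        f".qr div{{width:{cell_px}px;height:{cell_px}px;}}"
        ".qr .d{background:#000;}.qr .l{background:#fff;}"
        "</style>"
    )
    return f'{style}<div class="qr">{cells}</div>'


# Placeholder matrix for illustration only.
snippet = render_matrix_as_divs([[True, False], [False, True]])
print(snippet)
```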

`FFT/` — acoustic keystroke side-channel prototype

An experiment in classifying keystrokes from microphone audio: a browser data collector, a Python ML pipeline (feature engineering, training, evaluation), and a real-time demo. Note: the trained model (keystroke_classifier.pkl, model_browser.json) and the raw dataset (keystroke-data.json) are excluded from git because they exceed GitHub's 100 MB file limit — regenerate them with the scripts in FFT/ml-pipeline/ and FFT/data-collection/. normalization.json is kept.
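The feature-engineering step is not described in detail here, so the following is only a guess at the general approach the directory name suggests: take a short audio frame, compute its magnitude spectrum, and summarize it as normalized band energies. It uses a naive DFT from the standard library to stay self-contained (a real pipeline would more likely use an FFT library), and the function names, frame length, and band count are assumptions.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive O(n^2) DFT magnitudes for bins 0..n/2 of a real-valued frame."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def band_energies(spectrum, bands=4):
    """Sum squared magnitudes over contiguous bands, normalized to sum to 1."""
    width = len(spectrum) // bands
    energies = []
    for b in range(bands):
        lo = b * width
        hi = len(spectrum) if b == bands - 1 else lo + width
        energies.append(sum(m * m for m in spectrum[lo:hi]))
    total = sum(energies) or 1.0
    return [e / total for e in energies]


# Synthetic test frame: a pure tone that lands exactly on bin 8 of a 64-sample frame.
frame = [math.sin(2 * math.pi * 8 * t / 64) for t in range(64)]
spectrum = magnitude_spectrum(frame)
features = band_energies(spectrum)
print(spectrum.index(max(spectrum)))  # 8
```

A vector like `features` is the kind of fixed-length input a classifier can be trained on, one vector per detected keystroke.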

`ultrasonic/` — ultrasonic data transmission

FSK-based near-ultrasonic transmitter/receiver pages and a CSP demo exploring inaudible cross-device signaling as a covert/exfil channel.
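A minimal sketch of binary FSK, to show the principle behind the transmitter/receiver pages. The actual pages run in the browser; this Python version works on sample lists only, and the sample rate, tone frequencies, and symbol length are assumed values chosen so that each tone completes a whole number of cycles per symbol. Demodulation uses the Goertzel algorithm to compare the energy at the two tone frequencies.

```python
import math

SAMPLE_RATE = 48_000       # assumed sample rate in Hz
F0, F1 = 18_000, 19_000    # assumed tones for bit 0 and bit 1
SYMBOL_SAMPLES = 480       # 10 ms per bit at 48 kHz


def modulate(bits):
    """Emit one sine burst per bit at F0 or F1."""
    samples = []
    for i, bit in enumerate(bits):
        freq = F1 if bit else F0
        for n in range(SYMBOL_SAMPLES):
            t = (i * SYMBOL_SAMPLES + n) / SAMPLE_RATE
            samples.append(math.sin(2 * math.pi * freq * t))
    return samples


def goertzel_power(block, freq):
    """Squared magnitude of the block's content at a single frequency."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s1 = s2 = 0.0
    for x in block:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def demodulate(samples):
    """Decide each symbol by comparing energy at F1 and F0."""
    bits = []
    for start in range(0, len(samples) - SYMBOL_SAMPLES + 1, SYMBOL_SAMPLES):
        block = samples[start:start + SYMBOL_SAMPLES]
        bits.append(1 if goertzel_power(block, F1) > goertzel_power(block, F0) else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1, 0]
print(demodulate(modulate(message)) == message)  # True
```

This round trip is noise-free and perfectly symbol-aligned. A real acoustic link would also need synchronization, and speakers and microphones often attenuate tones near the top of the audible band.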

Repository notes

Excluded from git: .env files, node_modules/, Rust target/, Python __pycache__/, .DS_Store, the .claude/ workspace config, and the large trained-model/dataset artifacts noted above. See .gitignore.
Languages/stacks: Rust (core harness), Docker Compose (proxy/cache/server labs), Python (ML pipeline), and a lot of HTML/JS PoC pages.
