A collection of offensive-security research projects centered on a single theme: what happens when two systems disagree about how to parse the same bytes. Browsers, edge rewriters, caches, proxies, and template engines all claim to understand HTML/HTTP — but they implement different subsets of the spec, and the gaps between them are exploitable.
⚠️ Research / defensive use only. Everything here is a proof-of-concept built in local labs against software the author controls. The point is to document divergences so defenders can reason about them. Don't point any of it at systems you don't own.
The flagship project (FINDINGS.md + src/) studies the parsing
differential between lol-html (Cloudflare's bounded-memory streaming HTML rewriter, used
in Cloudflare Workers) and html5ever (Mozilla/Servo's spec-compliant WHATWG parser, used
as a browser-equivalent oracle). A streaming rewriter that never builds a DOM tree
necessarily sees the document differently than a browser does — and an attacker who
understands where the two disagree can smuggle content past the rewriter that the browser
will still execute.
The other directories generalize that idea to other layers of the stack: edge caches (ESI),
web servers (SSI), HTTP proxies (request smuggling / desync), client-side morph libraries
(MutationObserver script recreation), innerHTML injection, QR-code rendering, and an
acoustic side-channel experiment.
The core project. A Rust differential-testing harness that feeds the same HTML to lol-html and html5ever and reports where the two disagree about the document's structure.
- 85 / 90 corpus cases produced structural divergences; 7 / 7 targeted experiments produced confirmed security findings, 3 critical with demonstrated CSP-bypass capability.
- Headline findings include the selector-defense impossibility (operators cannot write correct position-based defenses against foster-parenting in a streaming rewriter), the noscript RCDATA bypass family, template nonce smuggling, and 26 enumerated attribute-value "invisible channels."
- Build: cargo build (see Cargo.toml). Full writeup in FINDINGS.md. Corpus in src/corpus/, targeted PoCs in src/experiments/ and src/bin/.
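The differential idea can be illustrated with a toy that is unrelated to the real Rust harness. In the sketch below, a regex tag scanner stands in for a tokenizer that ignores raw-text contexts, and Python's standard `html.parser` stands in for a context-aware parser. Neither models lol-html or html5ever; `naive_tags`, `context_aware_tags`, `diverges`, and the two-entry corpus are hypothetical names chosen for this illustration.

```python
import re
from html.parser import HTMLParser

# Toy stand-in for a tokenizer that does not track raw-text contexts:
# anything that looks like "<name" is reported as a start tag.
TAG_RE = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)")


def naive_tags(html: str) -> list[str]:
    return [m.group(1).lower() for m in TAG_RE.finditer(html)]


class _TagCollector(HTMLParser):
    """Collects start tags as seen by html.parser, which treats the
    bodies of <script> and <style> as raw text, not markup."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def context_aware_tags(html: str) -> list[str]:
    parser = _TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


def diverges(html: str) -> bool:
    # A "divergence" here is simply a different start-tag sequence.
    return naive_tags(html) != context_aware_tags(html)


# Hypothetical two-case corpus: one benign document, one where a tag-like
# string sits inside a script body.
corpus = {
    "plain": "<p>hello <b>world</b></p>",
    "script-string": '<script>var s = "<img src=x>";</script>',
}
report = {name: diverges(doc) for name, doc in corpus.items()}
print(report)
```

The second case diverges because the naive scanner reports an `img` tag that the context-aware parser treats as script text. The real harness compares far richer structure than a flat tag list.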
A Docker Compose lab pitting multiple proxies (nginx, HAProxy, Envoy, Apache, Caddy) against
backends in several languages to hunt for cross-user request-smuggling primitives. Includes
a test runner and an echo/instrumentation server. Conclusion so far: the standard vectors
(CL.TE, TE.CL, CL.0, H2.CL, …) no longer work against current versions of these
mainstream proxies; see desync/FINDINGS.md and desync/CSD-FINDINGS.md.
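For readers new to the vector names: CL.TE and TE.CL both depend on a request whose body length is ambiguous, usually because it carries both Content-Length and Transfer-Encoding. The sketch below is a minimal defensive check for that ambiguity on a raw HTTP/1.1 request. It is not part of the lab's test runner, and `framing_conflicts` is a hypothetical helper name.

```python
def framing_conflicts(raw_request: bytes) -> list[str]:
    """Return reasons why the body length of this request is ambiguous."""
    head, _, _body = raw_request.partition(b"\r\n\r\n")
    header_lines = head.split(b"\r\n")[1:]  # skip the request line

    content_lengths: list[bytes] = []
    transfer_encodings: list[bytes] = []
    issues: list[str] = []

    for line in header_lines:
        name, sep, value = line.partition(b":")
        if not sep:
            continue  # not a "name: value" line; ignored in this sketch
        if name != name.rstrip():
            # Parsers differ on whether "Name : value" is the same header.
            issues.append("whitespace before colon in header name")
        key = name.strip().lower()
        if key == b"content-length":
            content_lengths.append(value.strip())
        elif key == b"transfer-encoding":
            transfer_encodings.append(value.strip().lower())

    if content_lengths and transfer_encodings:
        issues.append("both Content-Length and Transfer-Encoding present")
    if len(set(content_lengths)) > 1:
        issues.append("conflicting Content-Length values")
    return issues


clean = b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 3\r\n\r\nabc"
both = (
    b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 3\r\n"
    b"Transfer-Encoding: chunked\r\n\r\n0\r\n\r\n"
)
print(framing_conflicts(clean))
print(framing_conflicts(both))
```

A front end and a back end that resolve any of these conditions differently will disagree on where one request ends and the next begins, which is the precondition the lab probes for.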
Tests how Varnish's ESI parser diverges from HTML5 parsing semantics — e.g. processing
<esi:include> inside <script type="application/json">, JS string literals, and <style>
bodies, opening SSR-state-injection vectors. Lab uses Varnish 7.4 + nginx origin. Results in
esi-injection-research/RESULTS.md (and Round 2 /
breakout writeups).
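On the defensive side, one way to reason about this divergence is to lint origin responses for ESI tags that an HTML parser would treat as raw text. The sketch below uses Python's standard `html.parser` as the HTML-side view. It is an assumption-laden illustration rather than the lab's tooling: `EsiInRawText` and `esi_in_raw_text` are hypothetical names, and only `<script>` and `<style>` bodies are checked.

```python
from html.parser import HTMLParser

RAW_TEXT_ELEMENTS = {"script", "style"}


class EsiInRawText(HTMLParser):
    """Records raw-text elements whose body contains an ESI-looking tag."""

    def __init__(self) -> None:
        super().__init__()
        self._current = None          # raw-text element we are inside, if any
        self._buffer: list[str] = []  # body text collected for that element
        self.hits: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in RAW_TEXT_ELEMENTS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            if "<esi:" in "".join(self._buffer).lower():
                self.hits.append(tag)
            self._current = None


def esi_in_raw_text(html: str) -> list[str]:
    parser = EsiInRawText()
    parser.feed(html)
    parser.close()
    return parser.hits


page = '<script type="application/json">{"u": "<esi:include src=/frag />"}</script>'
print(esi_in_raw_text(page))
```

A hit means the HTML view sees inert text where an ESI processor that scans the byte stream could still see a directive.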
A Docker lab exploring SSI processing across web servers and frameworks. content/secret.txt
is a deliberate canary token used to prove information leakage in PoCs — not a real secret.
Surveys 21 frameworks/morph libraries (htmx, Turbo/Hotwire, …) for the pattern where
innerHTML-inserted <script> elements, which are inert by spec, get recreated via
createElement('script'). The fresh element does not carry the "already started" flag, so
it executes on insertion. This turns innerHTML injection into full XSS in affected apps.
See mo-recreate-survey/FINDINGS.md and
mo-recreate-survey/IMPACT.md.
Research prompts and PoCs for achieving JS execution under hostile constraints (e.g. no =
sign permitted, escaped-quote contexts), covering encoding bypasses and mutation-XSS.
Generators that render QR codes as DOM structures (divs styled with CSS), plus payload tooling and test pages. A small Node server drives generation.
An experiment in classifying keystrokes from microphone audio: a browser data collector, a
Python ML pipeline (feature engineering, training, evaluation), and a real-time demo.
Note: the trained model (keystroke_classifier.pkl, model_browser.json) and the raw
dataset (keystroke-data.json) are excluded from git because they exceed GitHub's 100 MB
file limit — regenerate them with the scripts in FFT/ml-pipeline/ and
FFT/data-collection/. normalization.json is kept.
FSK-based near-ultrasonic transmitter/receiver pages and a CSP demo exploring inaudible cross-device signaling as a covert/exfil channel.
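As background on the FSK side, a receiver has to decide which of two tones is present in each symbol window. A common way to do that is the Goertzel algorithm, sketched below in plain Python. This is not the code behind the pages in this directory: the sample rate, the two tone frequencies, and the symbol length are assumed values chosen so that both tones fall on exact bins, and `goertzel_power`, `decode_symbol`, and `tone` are hypothetical names.

```python
import math

# Assumed parameters for illustration only.
SAMPLE_RATE = 48_000       # samples per second
F0, F1 = 18_000, 19_000    # tone for bit 0 and bit 1, in Hz
SYMBOL_SAMPLES = 480       # 10 ms per symbol at 48 kHz


def goertzel_power(samples, freq, sample_rate=SAMPLE_RATE):
    """Squared magnitude of the DFT bin nearest to `freq` (Goertzel)."""
    n = len(samples)
    k = round(n * freq / sample_rate)
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode_symbol(samples) -> int:
    """Pick the bit whose tone carries more energy in this window."""
    return 1 if goertzel_power(samples, F1) > goertzel_power(samples, F0) else 0


def tone(freq, n=SYMBOL_SAMPLES):
    """Synthesize one clean symbol window for testing."""
    return [math.sin(2.0 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


sent = (1, 0, 1, 1)
bits = [decode_symbol(tone(F1 if b else F0)) for b in sent]
print(bits)
```

Real microphone input would also need symbol synchronization and a noise threshold, both of which this sketch leaves out.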
- Excluded from git: .env files, node_modules/, Rust target/, Python __pycache__/, .DS_Store, the .claude/ workspace config, and the large trained-model/dataset artifacts noted above. See .gitignore.
- Languages/stacks: Rust (core harness), Docker Compose (proxy/cache/server labs), Python (ML pipeline), and a lot of HTML/JS PoC pages.
No license is currently specified; all rights reserved by the author. Open an issue if you'd like to use any of this.