Catch visual drift before it ships. DriftGate turns design-system conformance into a
CI gate: when front-end code changes, it renders the page in a real browser, scores it
against a DESIGN_SYSTEM.md, and runs a bounded fix loop until the screen conforms —
or an iteration cap is hit.
Live demo: https://pragatig25.github.io/driftgate/ — a pre-recorded playback of the real loop converging (38 → 64 → 88 → 93) on a sample page. Zero backend, zero cost.
Off-brand colours, broken spacing and drifted typography slip past code review and reach production, where they erode brand trust and turn into costly hotfixes. Visual QA is usually manual, slow, and inconsistent.
DriftGate makes it automatic and cheap:
| Without DriftGate | With DriftGate | |
|---|---|---|
| Visual QA / PR | ~15–30 min manual review | < 60 s automated gate |
| Where drift is caught | In production (expensive) | Pre-merge, in CI |
| Cost / run | Reviewer time | ~$0.01 (Haiku + prompt caching) |
| Consistency | Varies by reviewer | Deterministic hard gate |
The gate is hybrid, and the split is deliberate:
| Layer | Role | Deterministic? |
|---|---|---|
| Pixel diff vs baseline | Hard gate — fails the build | Yes |
| Design-token assertions (computed styles) | Hard gate — fails the build | Yes |
| Claude-vision conformance score | Advisory — informs, never blocks alone | No |
A non-deterministic model can therefore never wrongly block a PR. The vision critic explains why a screen drifts from the design language and proposes a fix; the deterministic checks decide pass/fail.
- CI gate (
visual-qe gate) — runs in GitHub Actions on PRs, posts a conformance report. Never auto-edits; it only reports a suggested diff. - Hosted demo (
visual_qe_loop.api.app) — sandboxed, rate-limited, access-code gated. Screenshots a submitted URL or a built-in sample and applies CSS-only suggestions. Never executes untrusted code.
The capture layer has two drivers behind one interface: the Playwright MCP driver for the local interactive agent (Claude edits code, then drives the browser as tools), and the Playwright library driver for CI and the demo (no Claude-Code runtime).
pip install -e ".[playwright,api,dev]"
python -m playwright install --with-deps chromium
cp .env.example .env # add ANTHROPIC_API_KEY (and VQE_ACCESS_CODE for the live demo)# Score a single URL against the design system
visual-qe score --url https://example.com
# Run the bounded fix loop on a local HTML file
visual-qe loop --file ./samples/saas-landing.html
# CI gate (writes a markdown report, fails on hard-gate)
visual-qe gate --report-path conformance-report.md
# Hosted demo backend (the static demo/ talks to this)
uvicorn visual_qe_loop.api.app:app --reloadFront-end change ─ URL or .html
│
▼
CAPTURE ── Playwright ──► screenshot.png + computed styles
(MCP driver = local agent · library driver = CI & demo)
│
├───────────────► DETERMINISTIC HARD GATE ── blocks the build
│ • pixel diff vs baseline (Pillow/numpy)
│ • design-token assertions (computed styles)
▼
CLAUDE VISION CRITIC ──► ConformanceReport {score, violations} ◄ ADVISORY only
cached DESIGN_SYSTEM.md · forced tool-use · effort(Opus)/temp(Haiku)
│
▼
BOUNDED FIX LOOP guardrails: max_iters · threshold · no-improvement
score → propose CSS patch → apply → re-render → re-score
│
├──► CI GATE → markdown report on the PR, fails on hard-gate
└──► DEMO → iteration cards: before → after, score climbing
- The
DESIGN_SYSTEM.md+ rubric are sent as a cached prompt block — they are large, static, and re-sent every loop iteration, so prompt caching is the main cost lever. - The critic is model-aware: Opus uses
output_config.effort; Haiku/Sonnet usetemperature. (The Anthropic API has noseedparam, and Opus rejectstemperature.) - Token / iteration / cost are logged per run via
structlog.
tests/fixtures/golden/ holds screenshots with known expected scores. The critic is
regression-tested against them, so a prompt or model change that shifts scoring is caught
in CI.
visual_qe_loop/
capture/ base interface + Playwright library driver + MCP driver
diff/ pixel diff + design-token extractor
critic/ Claude-vision critic + prompts (cached design system)
loop/ bounded fixer with guardrails
models/ Pydantic contracts (ConformanceReport, DesignSystem, LoopResult)
observability/ structlog config + cost accounting
api/ FastAPI demo backend
demo/ static, self-contained demo page (GitHub Pages)
- Secrets live only in
.env(git-ignored). Never commit API keys. - The demo backend blocks SSRF (rejects non-public hosts), is rate-limited, and is access-code gated so only people you share the code with can spend your tokens.
- URL runs are report-only and rendered read-only — untrusted code is never executed.
MIT © Pragati Gupta