webEmbedding is a source-first website cloning workflow packaged as a Skill + MCP server for AI coding agents.
It does more than ask an AI to "clone this site." It inspects the URL, chooses the safest reuse or rebuild path, captures the live page, extracts structured page evidence, generates reusable frontend reconstruction artifacts when direct reuse is blocked, and then self-verifies the result with visual, DOM, computed-style, interaction, and responsive-breakpoint checks.
The current pipeline is strongest for static and semi-static web pages:
- company, brand, marketing, and documentation pages
- public landing pages
- iframe-blocked pages that need capture-based reconstruction
- responsive page snapshots across desktop, tablet, and mobile
It is not a full backend or app-logic clone engine. Login-only screens, captcha-heavy sites, maps, games, canvas/WebGL-heavy pages, real-time feeds, payments, booking flows, and private server behavior still need separate handling.
Recent local benchmark runs from this repo:

| URL | Path | Score |
|---|---|---|
| https://developer.mozilla.org/en-US/ | iframe-blocked bounded rebuild | root 94, visual 95, mobile 94, tablet 93, breakpoint average 93.5 |
| https://www.mozilla.org/ | bounded rebuild | root 91, visual 100 |
| https://www.example.com | exact reuse | ready yes |

These are generated by the local self-verify pipeline, not manually assigned ratings.
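
The `breakpoint average` in the first row is consistent with a plain arithmetic mean of the mobile and tablet scores. The sketch below assumes that is how the figure is derived; the helper name `breakpoint_average` is hypothetical and not taken from the pipeline.

```python
from statistics import mean


def breakpoint_average(scores: dict[str, float]) -> float:
    """Arithmetic mean of per-breakpoint scores (assumed aggregation)."""
    if not scores:
        raise ValueError("at least one breakpoint score is required")
    return mean(scores.values())


# Scores reported for https://developer.mozilla.org/en-US/ in the table above.
mdn_breakpoints = {"mobile": 94, "tablet": 93}
print(breakpoint_average(mdn_breakpoints))  # 93.5
```
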

- Source-first routing:
  - direct iframe or embed reuse when it is safe and frameable
  - original preview, export, remix, or source routes when available
  - bounded rebuild only when exact reuse is unavailable
- Live browser capture:
  - DOM snapshot
  - runtime HTML
  - full-page screenshot
  - computed style summaries
  - CSS analysis
  - asset inventory
  - HAR-like network metadata
  - interaction states and replay traces
  - storage state export for session-aware flows
- Blocked-site rebuild:
  - handles `X-Frame-Options` and CSP-blocked pages by rebuilding from captured evidence
  - generates reusable frontend reconstruction artifacts from captured page structure
  - preserves custom tags, shadow-root host structure, and semantic document structure where captured
- Self-verification:
  - screenshot similarity
  - DOM snapshot similarity
  - computed-style similarity
  - hover/focus/click interaction state parity
  - interaction trace parity
  - desktop/mobile/tablet breakpoint reports
- Responsive benchmark support:
  - primary desktop viewport: `1440x1200`
  - tablet profile: `768x1024`
  - mobile profile: `390x844`
- Repair loop:
  - bounded self-repair can run when the first scaffold misses the readiness threshold
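
The source-first routing order listed above can be sketched as a small decision function. This is a simplified illustration, not the project's routing code: `is_frameable`, `choose_route`, the `has_source_route` flag, and the `"source route"` label are hypothetical, and the header check only covers `X-Frame-Options` and the CSP `frame-ancestors` directive in their most common forms.

```python
def is_frameable(headers: dict[str, str]) -> bool:
    """Rough check of whether response headers allow third-party framing.

    Conservative by design: the page counts as blocked if either header blocks.
    """
    lowered = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block third-party embedding.
    xfo = lowered.get("x-frame-options", "").strip().upper()
    if xfo in {"DENY", "SAMEORIGIN"}:
        return False

    # CSP frame-ancestors: only a wildcard is treated as allowing. 'none',
    # 'self', and explicit origin lists are treated as blocking, because this
    # sketch does not know the embedding origin.
    csp = lowered.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            return "*" in parts[1:]
    return True


def choose_route(headers: dict[str, str], has_source_route: bool) -> str:
    """Pick the first available route, in the source-first order listed above."""
    if is_frameable(headers):
        return "exact reuse"
    if has_source_route:
        return "source route"
    return "bounded rebuild"


print(choose_route({"X-Frame-Options": "DENY"}, has_source_route=False))
```

Keeping the header check separate from the routing decision means the fallback order stays readable even if the frameability test grows more detailed.
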

Requirements:

- Node.js 18 or newer
- Python 3.9 or newer
- Chrome or Chromium available locally for Playwright runtime capture

The package uses playwright-core; it does not download a browser by itself.
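
A quick way to check these prerequisites before installing is sketched below. It is an illustrative stand-in, not the logic behind `web-embedding doctor`; the helper names and the list of browser executable names are assumptions.

```python
import re
import shutil
import subprocess
import sys
from typing import Optional

MIN_NODE = (18, 0)
MIN_PYTHON = (3, 9)
# Common executable names; adjust for your platform (assumption).
BROWSER_CANDIDATES = ("google-chrome", "chromium", "chromium-browser", "chrome")


def parse_version(text: str) -> tuple[int, ...]:
    """Extract the leading numeric version from strings such as 'v18.17.0'."""
    match = re.search(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def node_version() -> Optional[tuple[int, ...]]:
    """Return the installed Node.js version, or None if it cannot be determined."""
    node = shutil.which("node")
    if node is None:
        return None
    try:
        result = subprocess.run(
            [node, "--version"], capture_output=True, text=True, check=True
        )
        return parse_version(result.stdout)
    except (OSError, subprocess.CalledProcessError, ValueError):
        return None


def check_requirements() -> dict[str, bool]:
    """Report whether each prerequisite listed above appears to be satisfied."""
    node = node_version()
    return {
        "node>=18": node is not None and node[:2] >= MIN_NODE,
        "python>=3.9": sys.version_info[:2] >= MIN_PYTHON,
        "chrome-or-chromium": any(shutil.which(name) for name in BROWSER_CANDIDATES),
    }


if __name__ == "__main__":
    for name, ok in check_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```
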
Installing this project adds the source-first-clone plugin bundle, the exact-clone-intake skill, and the MCP server that exposes the URL inspection, capture, rebuild, and verification tools.

    npm install -g web-embedding
    web-embedding install
    web-embedding doctor

If you already have an older local plugin installed, overwrite it with:

    web-embedding install --force
    web-embedding doctor

You can also run the installer without a global install:

    npx web-embedding install

Or run the release install script:

    curl -fsSL https://github.com/jongko54/webEmbedding/releases/latest/download/install.sh | bash

Or install from a source checkout:

    git clone https://github.com/jongko54/webEmbedding.git
    cd webEmbedding
    npm install
    node ./bin/web-embedding.mjs install
    node ./bin/web-embedding.mjs doctor

To test the installer without touching your real agent home, point it at a temporary target:

    python3 python/web_embedding/installer.py install --target-home ./.tmp/home
    python3 python/web_embedding/installer.py doctor --target-home ./.tmp/home
    python3 python/web_embedding/installer.py uninstall --target-home ./.tmp/home

Inspect a URL and get route hints:

    node ./bin/web-embedding.mjs inspect \
      --url https://developer.mozilla.org/en-US/

Run the full clone workflow:

    node ./bin/web-embedding.mjs clone \
      --url https://developer.mozilla.org/en-US/ \
      --output-dir ./.tmp/mdn-clone \
      --wait-seconds 2 \
      --timeout-seconds 35 \
      --breakpoints mobile tablet

Run a lightweight quality benchmark:

    python3 scripts/check_clone_quality_bench.py \
      https://developer.mozilla.org/en-US/ \
      --output-root ./.tmp/clone-quality-bench \
      --wait-seconds 1 \
      --timeout-seconds 35 \
      --breakpoints mobile tablet

The benchmark prints compact rows for root, visual, and breakpoint scores. The full artifacts are written under the output directory.
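
If you drive the CLI from a Python script, the clone invocation above can be assembled as an argument list. This is a sketch that assumes it runs from the repository root; `build_clone_command` and `run_clone` are hypothetical helpers, and they use only the flags shown in the clone command above.

```python
import subprocess
from typing import Sequence


def build_clone_command(
    url: str,
    output_dir: str,
    wait_seconds: int = 2,
    timeout_seconds: int = 35,
    breakpoints: Sequence[str] = ("mobile", "tablet"),
) -> list[str]:
    """Build the argv list for the clone command shown above."""
    command = [
        "node", "./bin/web-embedding.mjs", "clone",
        "--url", url,
        "--output-dir", output_dir,
        "--wait-seconds", str(wait_seconds),
        "--timeout-seconds", str(timeout_seconds),
    ]
    if breakpoints:
        command += ["--breakpoints", *breakpoints]
    return command


def run_clone(url: str, output_dir: str) -> int:
    """Run the clone workflow and return the CLI's exit code."""
    return subprocess.run(build_clone_command(url, output_dir)).returncode


print(" ".join(build_clone_command(
    "https://developer.mozilla.org/en-US/", "./.tmp/mdn-clone"
)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a URL contains characters such as `&` or `?`.
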
CLI commands:

    node ./bin/web-embedding.mjs capabilities
    node ./bin/web-embedding.mjs install
    node ./bin/web-embedding.mjs doctor
    node ./bin/web-embedding.mjs uninstall
    node ./bin/web-embedding.mjs paths

    node ./bin/web-embedding.mjs inspect --url https://www.mozilla.org/

    node ./bin/web-embedding.mjs capture \
      --url https://www.mozilla.org/ \
      --output-dir ./.tmp/capture-mozilla \
      --breakpoints mobile tablet

    node ./bin/web-embedding.mjs reproduce \
      --url https://www.mozilla.org/ \
      --output-dir ./.tmp/reproduce-mozilla \
      --breakpoints mobile tablet

    node ./bin/web-embedding.mjs clone \
      --url https://www.mozilla.org/ \
      --output-dir ./.tmp/clone-mozilla \
      --breakpoints mobile tablet

    node ./bin/web-embedding.mjs verify \
      --reference-bundle ./.tmp/reference/capture.json \
      --candidate-bundle ./.tmp/candidate/capture.json

A clone run can produce:

- `capture.json`
- `dom/snapshot.json`
- `dom/runtime.html`
- `styles/computed-summary.json`
- `styles/css-analysis.json`
- `network/manifest.json`
- `network/har.json`
- `network/har-like.json`
- `assets/inventory.json`
- `interactions/states.json`
- `interactions/trace.json`
- `screenshots/runtime.png`
- `session/storage-state.json`
- `reproduction/rebuild/starter.html`
- `reproduction/rebuild/starter.css`
- `reproduction/rebuild/starter.tsx`
- `reproduction/rebuild/next-app/`
- `reproduction/self-verify/summary.json`
- `reproduction/self-verify/renderers/*/verification.json`
- `reproduction/self-verify/renderers/*/visual-qa.json`
- `reproduction/self-verify/renderers/*/breakpoints/*-verification.json`

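
After a run, you may want to confirm which of these artifacts were actually written. The sketch below checks a subset of the listed paths with `pathlib`; `missing_artifacts` is a hypothetical helper, and because a run *can* produce these files rather than always producing all of them, a missing entry is not necessarily an error.

```python
from pathlib import Path

# A subset of the artifact paths listed above, relative to --output-dir.
EXPECTED_ARTIFACTS = (
    "capture.json",
    "dom/snapshot.json",
    "dom/runtime.html",
    "screenshots/runtime.png",
    "reproduction/self-verify/summary.json",
)


def missing_artifacts(output_dir: str, expected=EXPECTED_ARTIFACTS) -> list[str]:
    """Return the expected relative paths that do not exist under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Example output directory taken from the clone command earlier in this README.
    for rel in missing_artifacts("./.tmp/mdn-clone"):
        print(f"not found: {rel}")
```
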
Run the default small benchmark:

    npm run check:clone-bench:local

Run specific URLs:

    python3 scripts/check_clone_quality_bench.py \
      https://www.example.com \
      https://www.mozilla.org/ \
      --no-breakpoints

Run a responsive benchmark:

    python3 scripts/check_clone_quality_bench.py \
      https://developer.mozilla.org/en-US/ \
      --breakpoints mobile tablet

Local validation checks:

    python3 -m py_compile \
      bundle/source-first-clone/mcp/source_first_clone/*.py \
      scripts/check_integration_smoke.py \
      scripts/check_clone_quality_bench.py
    npm run check:integration:local
    git diff --check

Repository layout:

- `bundle/source-first-clone`: Installed plugin bundle, MCP server, and exact-clone intake skill.
- `bundle/source-first-clone/mcp/source_first_clone`: Capture, planning, rebuild, repair, and verification engine.
- `bin/web-embedding.mjs`: Node CLI wrapper.
- `python/web_embedding/installer.py`: Shared installer and command dispatcher.
- `scripts/check_clone_quality_bench.py`: URL clone quality benchmark helper.
- `scripts/check_integration_smoke.py`: Release, install, and URL-only clone smoke test.
- `scripts/release_bundle.py`: Release artifact builder.
- `docs/`: Architecture notes and universal benchmark documentation.

The strongest claim for this project is:
A benchmark-first Skill + MCP workflow for URL-based website cloning that handles iframe-blocked pages and reports reproducible visual, DOM, style, interaction, and responsive breakpoint scores.
Avoid treating the output as a legal or ownership bypass. The engine can reconstruct public page structure, but permission, licensing, and acceptable use still matter.
License: MIT
