webEmbedding

webEmbedding is a source-first website cloning workflow packaged as a Skill + MCP server for AI coding agents.

It does more than ask an AI to "clone this site." It inspects the URL, chooses the safest reuse or rebuild path, captures the live page, extracts structured page evidence, generates reusable frontend reconstruction artifacts when direct reuse is blocked, and then self-verifies the result with visual, DOM, computed-style, interaction, and responsive-breakpoint checks.

Current Status

The current pipeline is strongest for static and semi-static web pages:

- company, brand, marketing, and documentation pages
- public landing pages
- iframe-blocked pages that need capture-based reconstruction
- responsive page snapshots across desktop, tablet, and mobile

It is not a full backend or application-logic cloning engine. Login-gated screens, CAPTCHA-heavy sites, maps, games, canvas- or WebGL-heavy pages, real-time feeds, payments, booking flows, and private server-side behavior still need separate handling.

Measured Checkpoints

Recent local benchmark runs from this repo:

| URL | Path | Scores |
| --- | --- | --- |
| `https://developer.mozilla.org/en-US/` | iframe-blocked bounded rebuild | root `94`, visual `95`, mobile `94`, tablet `93`, breakpoint average `93.5` |
| `https://www.mozilla.org/` | bounded rebuild | root `91`, visual `100` |
| `https://www.example.com` | exact reuse | ready `yes` |

These scores are produced by the local self-verify pipeline; they are not manually assigned ratings.
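The breakpoint average in the MDN row matches an unweighted mean of its mobile and tablet scores: (94 + 93) / 2 = 93.5. A minimal sketch of that arithmetic, assuming equal weighting per breakpoint (the self-verify pipeline's actual aggregation is not documented here and may differ); `breakpoint_average` is a hypothetical helper:

```python
from statistics import mean


def breakpoint_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-breakpoint scores (assumed aggregation, not the engine's code)."""
    if not scores:
        raise ValueError("no breakpoint scores to average")
    return mean(scores.values())


# Mobile and tablet scores from the MDN benchmark row.
mdn = {"mobile": 94, "tablet": 93}
print(breakpoint_average(mdn))  # 93.5
```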

Core Features

Source-first routing:
- direct iframe or embed reuse when it is safe and frameable
- original preview, export, remix, or source routes when available
- bounded rebuild only when exact reuse is unavailable
Live browser capture:
- DOM snapshot
- runtime HTML
- full-page screenshot
- computed style summaries
- CSS analysis
- asset inventory
- HAR-like network metadata
- interaction states and replay traces
- storage state export for session-aware flows
Blocked-site rebuild:
- handles pages that block framing via X-Frame-Options or a CSP frame-ancestors directive by rebuilding them from captured evidence
- generates reusable frontend reconstruction artifacts from captured page structure
- preserves custom tags, shadow-root host structure, and semantic document structure where captured
Self-verification:
- screenshot similarity
- DOM snapshot similarity
- computed-style similarity
- hover/focus/click interaction state parity
- interaction trace parity
- desktop/mobile/tablet breakpoint reports
Responsive benchmark support:
- primary desktop viewport: 1440x1200
- tablet profile: 768x1024
- mobile profile: 390x844
Repair loop:
- bounded self-repair can run when the first scaffold misses the readiness threshold
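Source-first routing hinges on whether a page may be framed at all. A rough sketch of that decision, assuming only the two standard signals (the `X-Frame-Options` header and the CSP `frame-ancestors` directive); `is_frameable` and `choose_route` are hypothetical helpers, and the engine's real routing also weighs preview, export, and source routes, which are not modeled here:

```python
def is_frameable(headers: dict[str, str], embedder_origin: str) -> bool:
    """Rough check of whether a response may be shown in a cross-origin iframe.

    Only exact origins, '*' and keywords are considered; real browsers also
    handle wildcard hosts, scheme sources, and report-only policies.
    """
    h = {k.lower(): v for k, v in headers.items()}

    # A header may carry several comma-separated policies; each is enforced.
    verdicts = []
    for policy in h.get("content-security-policy", "").split(","):
        for directive in policy.split(";"):
            parts = directive.strip().split()
            if parts and parts[0].lower() == "frame-ancestors":
                sources = [s.lower() for s in parts[1:]]
                verdicts.append("*" in sources or embedder_origin.lower() in sources)
    if verdicts:
        # When frame-ancestors is present, browsers ignore X-Frame-Options,
        # and every policy that sets it must allow the embedder.
        return all(verdicts)

    # DENY blocks all framing; SAMEORIGIN blocks cross-origin embedders.
    xfo = h.get("x-frame-options", "").strip().lower()
    return xfo not in ("deny", "sameorigin")


def choose_route(headers: dict[str, str], embedder_origin: str) -> str:
    """Hypothetical router: exact reuse when frameable, otherwise bounded rebuild."""
    return "exact-reuse" if is_frameable(headers, embedder_origin) else "bounded-rebuild"


print(choose_route({"X-Frame-Options": "DENY"}, "https://embedder.test"))
```

Checking CSP first mirrors browser behavior: a page that sends both headers is judged by `frame-ancestors` alone.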

Install

Requirements

- Node.js 18 or newer
- Python 3.9 or newer
- Chrome or Chromium available locally for Playwright runtime capture

The package uses playwright-core; it does not download a browser by itself.
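Because no browser is downloaded, it can help to confirm that one is on `PATH` before running a capture. A minimal pre-flight sketch using only the standard library; the candidate executable names are assumptions that vary by OS and distribution (macOS and Windows installs are often not on `PATH`), and this is not how `web-embedding doctor` works internally:

```python
import shutil
from typing import Optional, Sequence

# Assumed executable names; adjust for your platform.
CANDIDATES = (
    "google-chrome",
    "google-chrome-stable",
    "chromium",
    "chromium-browser",
    "chrome",
)


def find_chrome(candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the first Chrome/Chromium executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    print(find_chrome() or "No Chrome/Chromium found on PATH")
```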

Installing this project adds the source-first-clone plugin bundle, the exact-clone-intake skill, and the MCP server that exposes the URL inspection, capture, rebuild, and verification tools.

Install From npm

npm install -g web-embedding
web-embedding install
web-embedding doctor

If you already have an older local plugin installed, overwrite it with:

web-embedding install --force
web-embedding doctor

You can also run the installer without a global install:

npx web-embedding install

Install From Release

curl -fsSL https://github.com/jongko54/webEmbedding/releases/latest/download/install.sh | bash

Install From This Checkout

git clone https://github.com/jongko54/webEmbedding.git
cd webEmbedding
npm install
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor

Install Into A Temporary Home

Useful for testing without touching your real agent home:

python3 python/web_embedding/installer.py install --target-home ./.tmp/home
python3 python/web_embedding/installer.py doctor --target-home ./.tmp/home
python3 python/web_embedding/installer.py uninstall --target-home ./.tmp/home

Quick Start

Inspect a URL and get route hints:

node ./bin/web-embedding.mjs inspect \
  --url https://developer.mozilla.org/en-US/

Run the full clone workflow:

node ./bin/web-embedding.mjs clone \
  --url https://developer.mozilla.org/en-US/ \
  --output-dir ./.tmp/mdn-clone \
  --wait-seconds 2 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

Run a lightweight quality benchmark:

python3 scripts/check_clone_quality_bench.py \
  https://developer.mozilla.org/en-US/ \
  --output-root ./.tmp/clone-quality-bench \
  --wait-seconds 1 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

The benchmark prints compact rows for root, visual, and breakpoint scores. The full artifacts are written under the output directory.

CLI Commands

node ./bin/web-embedding.mjs capabilities
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor
node ./bin/web-embedding.mjs uninstall
node ./bin/web-embedding.mjs paths

node ./bin/web-embedding.mjs inspect --url https://www.mozilla.org/

node ./bin/web-embedding.mjs capture \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/capture-mozilla \
  --breakpoints mobile tablet

node ./bin/web-embedding.mjs reproduce \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/reproduce-mozilla \
  --breakpoints mobile tablet

node ./bin/web-embedding.mjs clone \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/clone-mozilla \
  --breakpoints mobile tablet

node ./bin/web-embedding.mjs verify \
  --reference-bundle ./.tmp/reference/capture.json \
  --candidate-bundle ./.tmp/candidate/capture.json
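`verify` compares a reference capture bundle with a candidate bundle. The engine's scoring formulas are not documented here; to illustrate the general idea behind a DOM-similarity score only, the sketch below compares two tag sequences with `difflib.SequenceMatcher` and scales the ratio to 0 to 100. `dom_similarity` and the tag lists are hypothetical, and the code does not read `capture.json`:

```python
from difflib import SequenceMatcher


def dom_similarity(reference_tags: list[str], candidate_tags: list[str]) -> int:
    """Illustrative 0-100 score: how closely two tag sequences match in order."""
    if not reference_tags and not candidate_tags:
        return 100
    # ratio() = 2 * matched / (len(a) + len(b)); autojunk off so common tags count.
    ratio = SequenceMatcher(None, reference_tags, candidate_tags, autojunk=False).ratio()
    return round(ratio * 100)


reference = ["html", "head", "body", "header", "nav", "main", "section", "footer"]
candidate = ["html", "head", "body", "header", "main", "section", "footer"]
print(dom_similarity(reference, candidate))  # 93
```

Here the candidate drops one `nav` element, so 7 of 8 tags match: 2 * 7 / 15 is about 0.933.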

Output Artifacts

A clone run can produce:

capture.json
dom/snapshot.json
dom/runtime.html
styles/computed-summary.json
styles/css-analysis.json
network/manifest.json
network/har.json
network/har-like.json
assets/inventory.json
interactions/states.json
interactions/trace.json
screenshots/runtime.png
session/storage-state.json
reproduction/rebuild/starter.html
reproduction/rebuild/starter.css
reproduction/rebuild/starter.tsx
reproduction/rebuild/next-app/
reproduction/self-verify/summary.json
reproduction/self-verify/renderers/*/verification.json
reproduction/self-verify/renderers/*/visual-qa.json
reproduction/self-verify/renderers/*/breakpoints/*-verification.json
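The per-renderer reports follow a predictable layout, so they can be gathered with a glob. A minimal sketch, assuming only the paths listed above and that each file is valid JSON; renderer directory names and report fields are not documented here, so the code loads each report without interpreting it. `load_renderer_reports` is a hypothetical helper:

```python
import json
from pathlib import Path


def load_renderer_reports(output_dir: str) -> dict[str, object]:
    """Map each renderer directory name to its parsed verification.json."""
    base = Path(output_dir) / "reproduction" / "self-verify" / "renderers"
    reports: dict[str, object] = {}
    # A missing directory simply yields no matches.
    for path in sorted(base.glob("*/verification.json")):
        with path.open(encoding="utf-8") as fh:
            reports[path.parent.name] = json.load(fh)
    return reports


if __name__ == "__main__":
    for renderer, report in load_renderer_reports("./.tmp/mdn-clone").items():
        print(renderer, type(report).__name__)
```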

Quality Benchmark

Run the default small benchmark:

npm run check:clone-bench:local

Run specific URLs:

python3 scripts/check_clone_quality_bench.py \
  https://www.example.com \
  https://www.mozilla.org/ \
  --no-breakpoints

Run a responsive benchmark:

python3 scripts/check_clone_quality_bench.py \
  https://developer.mozilla.org/en-US/ \
  --breakpoints mobile tablet
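When benchmarking many URLs from another tool, building the command programmatically avoids quoting mistakes. A minimal sketch, assuming it runs from the repository root and uses only the flags shown above; `build_bench_command` is a hypothetical helper, and mapping "no breakpoints given" to `--no-breakpoints` is a design choice of this sketch, not documented script behavior:

```python
import shlex
import sys
from typing import Optional, Sequence


def build_bench_command(
    urls: Sequence[str], breakpoints: Optional[Sequence[str]] = None
) -> list[str]:
    """Assemble argv for scripts/check_clone_quality_bench.py using README flags only."""
    cmd = [sys.executable, "scripts/check_clone_quality_bench.py", *urls]
    if breakpoints:
        cmd += ["--breakpoints", *breakpoints]
    else:
        cmd.append("--no-breakpoints")
    return cmd


if __name__ == "__main__":
    cmd = build_bench_command(["https://developer.mozilla.org/en-US/"], ["mobile", "tablet"])
    # Pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(shlex.join(cmd))
```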

Development Checks

python3 -m py_compile \
  bundle/source-first-clone/mcp/source_first_clone/*.py \
  scripts/check_integration_smoke.py \
  scripts/check_clone_quality_bench.py

npm run check:integration:local

git diff --check
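The syntax check can also be run from Python with the standard-library `py_compile` module, which `python3 -m py_compile` wraps. A minimal sketch, assuming it runs from the repository root; the patterns mirror the paths above, and unlike the shell form, a pattern that matches nothing is skipped silently rather than reported as a missing file. `compile_all` is a hypothetical helper:

```python
import glob
import py_compile
from typing import Sequence

PATTERNS = (
    "bundle/source-first-clone/mcp/source_first_clone/*.py",
    "scripts/check_integration_smoke.py",
    "scripts/check_clone_quality_bench.py",
)


def compile_all(patterns: Sequence[str] = PATTERNS) -> list[str]:
    """Byte-compile each matching file; return error messages (empty list means success)."""
    errors = []
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                py_compile.compile(path, doraise=True)
            except py_compile.PyCompileError as exc:
                errors.append(exc.msg)
    return errors


if __name__ == "__main__":
    problems = compile_all()
    print("\n".join(problems) or "all files compiled")
```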

Repo Layout

| Path | Contents |
| --- | --- |
| `bundle/source-first-clone` | Installed plugin bundle, MCP server, and exact-clone intake skill. |
| `bundle/source-first-clone/mcp/source_first_clone` | Capture, planning, rebuild, repair, and verification engine. |
| `bin/web-embedding.mjs` | Node CLI wrapper. |
| `python/web_embedding/installer.py` | Shared installer and command dispatcher. |
| `scripts/check_clone_quality_bench.py` | URL clone quality benchmark helper. |
| `scripts/check_integration_smoke.py` | Release, install, and URL-only clone smoke test. |
| `scripts/release_bundle.py` | Release artifact builder. |
| `docs/` | Architecture notes and universal benchmark documentation. |

Positioning

The strongest claim for this project is:

A benchmark-first Skill + MCP workflow for URL-based website cloning that handles iframe-blocked pages and reports reproducible visual, DOM, style, interaction, and responsive breakpoint scores.

Avoid treating the output as a legal or ownership bypass. The engine can reconstruct public page structure, but permission, licensing, and acceptable use still matter.

License

MIT
