webEmbedding

webEmbedding is a source-first website cloning workflow packaged as a Skill + MCP server for AI coding agents.

It does more than ask an AI to "clone this site." It inspects the URL, chooses the safest reuse or rebuild path, captures the live page, and extracts structured page evidence. When direct reuse is blocked, it generates reusable frontend reconstruction artifacts. It then self-verifies the result with visual, DOM, computed-style, interaction, and responsive-breakpoint checks.

[Figure: webEmbedding Skill and MCP workflow]

Current Status

The current pipeline is strongest for static and semi-static web pages:

  • company, brand, marketing, and documentation pages
  • public landing pages
  • iframe-blocked pages that need capture-based reconstruction
  • responsive page snapshots across desktop, tablet, and mobile

It is not a full backend or app-logic clone engine. Login-only screens, captcha-heavy sites, maps, games, canvas/WebGL-heavy pages, real-time feeds, payments, booking flows, and private server behavior still need separate handling.

Measured Checkpoints

Recent local benchmark runs from this repo:

| URL | Path | Score |
| --- | --- | --- |
| https://developer.mozilla.org/en-US/ | iframe-blocked bounded rebuild | root 94, visual 95, mobile 94, tablet 93, breakpoint average 93.5 |
| https://www.mozilla.org/ | bounded rebuild | root 91, visual 100 |
| https://www.example.com | exact reuse | ready: yes |

These scores are generated by the local self-verify pipeline; they are not manually assigned ratings.
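As a worked check on the developer.mozilla.org row, the breakpoint average is consistent with a plain mean of the mobile and tablet scores. The sketch below assumes an unweighted mean; the pipeline's actual aggregation is not documented in this README, and `breakpoint_average` is a hypothetical helper name.

```python
from statistics import mean


def breakpoint_average(scores: dict) -> float:
    """Unweighted mean of per-breakpoint scores (assumed aggregation)."""
    return mean(list(scores.values()))


# Per-breakpoint scores reported for https://developer.mozilla.org/en-US/
mdn_breakpoints = {"mobile": 94, "tablet": 93}
print(breakpoint_average(mdn_breakpoints))  # 93.5
```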

Core Features

  • Source-first routing:
    • direct iframe or embed reuse when it is safe and frameable
    • original preview, export, remix, or source routes when available
    • bounded rebuild only when exact reuse is unavailable
  • Live browser capture:
    • DOM snapshot
    • runtime HTML
    • full-page screenshot
    • computed style summaries
    • CSS analysis
    • asset inventory
    • HAR-like network metadata
    • interaction states and replay traces
    • storage state export for session-aware flows
  • Blocked-site rebuild:
    • handles X-Frame-Options and CSP-blocked pages by rebuilding from captured evidence
    • generates reusable frontend reconstruction artifacts from captured page structure
    • preserves custom tags, shadow-root host structure, and semantic document structure where captured
  • Self-verification:
    • screenshot similarity
    • DOM snapshot similarity
    • computed-style similarity
    • hover/focus/click interaction state parity
    • interaction trace parity
    • desktop/mobile/tablet breakpoint reports
  • Responsive benchmark support:
    • primary desktop viewport: 1440x1200
    • tablet profile: 768x1024
    • mobile profile: 390x844
  • Repair loop:
    • bounded self-repair can run when the first scaffold misses the readiness threshold
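
The choice between direct iframe reuse and a bounded rebuild depends on whether the target page allows framing. The following is a minimal sketch of such a check based on standard `X-Frame-Options` and CSP `frame-ancestors` semantics. It is not the project's routing code: `is_frame_blocked` is a hypothetical helper, and the logic is deliberately simplified (for example, an explicit origin allowlist is treated as "possibly frameable").

```python
from typing import Mapping


def is_frame_blocked(headers: Mapping[str, str]) -> bool:
    """Return True if the response headers clearly forbid third-party framing."""
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    # X-Frame-Options: DENY and SAMEORIGIN both block embedding from other origins.
    xfo = normalized.get("x-frame-options", "").strip().lower()
    if xfo in {"deny", "sameorigin"}:
        return True

    # CSP frame-ancestors: 'none', 'self', or an empty source list block
    # third-party embedding.
    csp = normalized.get("content-security-policy", "")
    for directive in csp.split(";"):
        parts = directive.strip().split()
        if parts and parts[0].lower() == "frame-ancestors":
            sources = {part.lower() for part in parts[1:]}
            # Any other source (an origin allowlist or "*") may still permit embedding.
            return sources <= {"'none'", "'self'"}

    return False


print(is_frame_blocked({"X-Frame-Options": "DENY"}))  # True
print(is_frame_blocked({"Content-Type": "text/html"}))  # False
```

A page that returns True here would be a candidate for the capture-based rebuild path rather than direct embedding.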

Install

Requirements

  • Node.js 18 or newer
  • Python 3.9 or newer
  • Chrome or Chromium available locally for Playwright runtime capture

The package uses playwright-core; it does not download a browser by itself.

Installing this project adds the source-first-clone plugin bundle, the exact-clone-intake skill, and the MCP server that exposes the URL inspection, capture, rebuild, and verification tools.

Install From npm

npm install -g web-embedding
web-embedding install
web-embedding doctor

If you already have an older local plugin installed, overwrite it with:

web-embedding install --force
web-embedding doctor

You can also run the installer without a global install:

npx web-embedding install

Install From Release

curl -fsSL https://github.com/jongko54/webEmbedding/releases/latest/download/install.sh | bash

Install From This Checkout

git clone https://github.com/jongko54/webEmbedding.git
cd webEmbedding
npm install
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor

Install Into A Temporary Home

Useful for testing without touching your real agent home:

python3 python/web_embedding/installer.py install --target-home ./.tmp/home
python3 python/web_embedding/installer.py doctor --target-home ./.tmp/home
python3 python/web_embedding/installer.py uninstall --target-home ./.tmp/home

Quick Start

Inspect a URL and get route hints:

node ./bin/web-embedding.mjs inspect \
  --url https://developer.mozilla.org/en-US/

Run the full clone workflow:

node ./bin/web-embedding.mjs clone \
  --url https://developer.mozilla.org/en-US/ \
  --output-dir ./.tmp/mdn-clone \
  --wait-seconds 2 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

Run a lightweight quality benchmark:

python3 scripts/check_clone_quality_bench.py \
  https://developer.mozilla.org/en-US/ \
  --output-root ./.tmp/clone-quality-bench \
  --wait-seconds 1 \
  --timeout-seconds 35 \
  --breakpoints mobile tablet

The benchmark prints compact rows for root, visual, and breakpoint scores. The full artifacts are written under the output directory.
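
To drive the clone workflow from a Python script, the command line can be assembled as an argument list. This is a minimal sketch that uses only the flags shown in this README; `build_clone_command` is a hypothetical helper, not part of the package, and it assumes the script runs from the repository checkout.

```python
from typing import List, Sequence


def build_clone_command(
    url: str,
    output_dir: str,
    breakpoints: Sequence[str] = ("mobile", "tablet"),
    wait_seconds: int = 2,
    timeout_seconds: int = 35,
) -> List[str]:
    """Build the argv list for the `clone` subcommand shown in Quick Start."""
    cmd = [
        "node", "./bin/web-embedding.mjs", "clone",
        "--url", url,
        "--output-dir", output_dir,
        "--wait-seconds", str(wait_seconds),
        "--timeout-seconds", str(timeout_seconds),
    ]
    # --breakpoints takes space-separated profile names, as in the examples above.
    if breakpoints:
        cmd += ["--breakpoints", *breakpoints]
    return cmd


cmd = build_clone_command("https://developer.mozilla.org/en-US/", "./.tmp/mdn-clone")
print(" ".join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```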

CLI Commands

node ./bin/web-embedding.mjs capabilities
node ./bin/web-embedding.mjs install
node ./bin/web-embedding.mjs doctor
node ./bin/web-embedding.mjs uninstall
node ./bin/web-embedding.mjs paths
node ./bin/web-embedding.mjs inspect --url https://www.mozilla.org/
node ./bin/web-embedding.mjs capture \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/capture-mozilla \
  --breakpoints mobile tablet
node ./bin/web-embedding.mjs reproduce \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/reproduce-mozilla \
  --breakpoints mobile tablet
node ./bin/web-embedding.mjs clone \
  --url https://www.mozilla.org/ \
  --output-dir ./.tmp/clone-mozilla \
  --breakpoints mobile tablet
node ./bin/web-embedding.mjs verify \
  --reference-bundle ./.tmp/reference/capture.json \
  --candidate-bundle ./.tmp/candidate/capture.json
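
To give a feel for what a DOM-level comparison between a reference and a candidate can look like, here is a small illustrative sketch. It is not the metric used by `verify`: it assumes two HTML strings as input (for example, the contents of two `dom/runtime.html` files) and compares only the order of opening tags with the standard library's `difflib`. All names in it are hypothetical.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from typing import List


class TagCollector(HTMLParser):
    """Collects opening tag names in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: List[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tag_sequence(html: str) -> List[str]:
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


def dom_similarity(reference_html: str, candidate_html: str) -> float:
    """Ratio in [0, 1] of matching opening tags between two documents."""
    reference = tag_sequence(reference_html)
    candidate = tag_sequence(candidate_html)
    return SequenceMatcher(None, reference, candidate).ratio()


reference = "<main><h1>Title</h1><p>Body</p></main>"
print(dom_similarity(reference, reference))  # 1.0
```

A real comparison would also weigh attributes, text content, and nesting depth; this sketch ignores all three.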

Output Artifacts

A clone run can produce:

  • capture.json
  • dom/snapshot.json
  • dom/runtime.html
  • styles/computed-summary.json
  • styles/css-analysis.json
  • network/manifest.json
  • network/har.json
  • network/har-like.json
  • assets/inventory.json
  • interactions/states.json
  • interactions/trace.json
  • screenshots/runtime.png
  • session/storage-state.json
  • reproduction/rebuild/starter.html
  • reproduction/rebuild/starter.css
  • reproduction/rebuild/starter.tsx
  • reproduction/rebuild/next-app/
  • reproduction/self-verify/summary.json
  • reproduction/self-verify/renderers/*/verification.json
  • reproduction/self-verify/renderers/*/visual-qa.json
  • reproduction/self-verify/renderers/*/breakpoints/*-verification.json
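
After a run, a quick way to see which of these files were written is to check the output directory against the list. A minimal sketch follows: `missing_artifacts` is a hypothetical helper, and the expected list is a hand-picked subset of the fixed paths above. Because a clone run "can" produce these files, a missing entry is not necessarily an error; which artifacts appear likely depends on the route taken.

```python
from pathlib import Path
from typing import List, Sequence

# Subset of the fixed-path artifacts listed above (wildcard entries omitted).
EXPECTED_ARTIFACTS = (
    "capture.json",
    "dom/snapshot.json",
    "dom/runtime.html",
    "screenshots/runtime.png",
    "reproduction/self-verify/summary.json",
)


def missing_artifacts(
    output_dir: str, expected: Sequence[str] = EXPECTED_ARTIFACTS
) -> List[str]:
    """Return the expected relative paths that are not files under output_dir."""
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).is_file()]


for rel in missing_artifacts("./.tmp/mdn-clone"):
    print(f"not found: {rel}")
```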

Quality Benchmark

Run the default small benchmark:

npm run check:clone-bench:local

Run specific URLs:

python3 scripts/check_clone_quality_bench.py \
  https://www.example.com \
  https://www.mozilla.org/ \
  --no-breakpoints

Run a responsive benchmark:

python3 scripts/check_clone_quality_bench.py \
  https://developer.mozilla.org/en-US/ \
  --breakpoints mobile tablet

Development Checks

python3 -m py_compile \
  bundle/source-first-clone/mcp/source_first_clone/*.py \
  scripts/check_integration_smoke.py \
  scripts/check_clone_quality_bench.py
npm run check:integration:local
git diff --check

Repo Layout

  • bundle/source-first-clone: Installed plugin bundle, MCP server, and exact-clone intake skill.
  • bundle/source-first-clone/mcp/source_first_clone: Capture, planning, rebuild, repair, and verification engine.
  • bin/web-embedding.mjs: Node CLI wrapper.
  • python/web_embedding/installer.py: Shared installer and command dispatcher.
  • scripts/check_clone_quality_bench.py: URL clone quality benchmark helper.
  • scripts/check_integration_smoke.py: Release, install, and URL-only clone smoke test.
  • scripts/release_bundle.py: Release artifact builder.
  • docs/: Architecture notes and universal benchmark documentation.

Positioning

The strongest claim for this project is:

A benchmark-first Skill + MCP workflow for URL-based website cloning that handles iframe-blocked pages and reports reproducible visual, DOM, style, interaction, and responsive breakpoint scores.

Do not treat the output as a way around legal or ownership restrictions. The engine can reconstruct public page structure, but permission, licensing, and acceptable use still matter.

License

MIT
