Webstract

CLI to snapshot a web page for offline use: downloads HTML plus CSS/JS/images on the same site, rewrites references, and saves everything locally.

Install

npm install -g webstract          # after publishing
# or run without installing
npx webstract <url> <outputDir>

Quick start

yarn start <url> <outputDir> [--concurrency <n>] [--timeout <ms>]
yarn start https://example.com ./dump

Build once for distribution:

yarn build
node dist/cli.js <url> <outputDir>

What it does

Follows redirects; the final URL is the base for asset rewriting.
Saves index.html under <outputDir>/<domain>/ and rewrites references to point to downloaded files.
Downloads assets on the same registrable domain or same root label (e.g., daum.net → daumcdn.net), not just strict origin.
Collects linked CSS (link[rel=stylesheet]), JS (script[src]), images (img[src|srcset], source[src|srcset], icons), inline CSS in <style> blocks and style= attributes, and meta images (OG/Twitter).
Parses downloaded CSS for @import and url(...) references on the same domain/root label.
References to external origins keep their absolute URLs; skipped items are listed in missing-assets.json. Use --download-external to force-download assets from other domains (saved under a hostname-prefixed path).
Writes _WST.md summary with request/final URLs and download/skip/fail counts.
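
The CSS parsing step above can be illustrated with a minimal sketch. `extractCssRefs` is a hypothetical helper, not Webstract's actual parser: it assumes a regex pass is good enough to find `@import` and `url(...)` references, resolves them against the stylesheet's URL, and leaves the same-domain filtering to a later step.

```typescript
// Hypothetical helper for illustration; not taken from the Webstract source.
// Returns absolute URLs for every @import and url(...) reference in a stylesheet.
function extractCssRefs(css: string, baseUrl: string): string[] {
  const refs = new Set<string>();
  // url(...) with optional single or double quotes.
  const urlPattern = /url\(\s*(['"]?)([^'")]+)\1\s*\)/gi;
  // @import "x.css" / @import 'x.css' (the @import url(...) form is caught above).
  const importPattern = /@import\s+(['"])([^'"]+)\1/gi;

  for (const pattern of [urlPattern, importPattern]) {
    for (const match of css.matchAll(pattern)) {
      const raw = (match[2] ?? "").trim();
      // Skip empty values, inline data URLs, and fragment-only references.
      if (raw === "" || raw.startsWith("data:") || raw.startsWith("#")) continue;
      try {
        // Resolve relative references against the stylesheet's own URL.
        refs.add(new URL(raw, baseUrl).href);
      } catch {
        // Ignore references that cannot be parsed as URLs.
      }
    }
  }
  return [...refs];
}
```

Resolving against the stylesheet URL rather than the page URL matters because relative paths in CSS are relative to the CSS file itself.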

CLI options

Option	Description	Default
`-c, --concurrency <n>`	Concurrent downloads	`WEBSTRACT_CONCURRENCY` or `5`
`-t, --timeout <ms>`	Request timeout in ms	`WEBSTRACT_TIMEOUT_MS` or `15000`
`-r, --retries <n>`	Retry attempts per request	`WEBSTRACT_MAX_RETRIES` or `3`
`--retry-delay <ms>`	Delay between retries (exponential backoff)	`WEBSTRACT_RETRY_DELAY_MS` or `1000`
`--user-agent <ua>`	Custom User-Agent string	`WEBSTRACT_USER_AGENT`
`--no-follow-redirects`	Do not follow HTTP redirects	follow redirects
`--insecure`	Allow insecure TLS (self-signed)	off
`--download-external`	Force download of external-domain assets (prefixed by hostname)	off
`--no-css-parse`	Skip CSS @import/url() parsing	CSS parsing on
`--no-meta`	Skip meta (OG/Twitter) image discovery	meta discovery on
`--summary-format <md|json>`	`_WST` summary format	`md`
`--output-name <name>`	Override output folder name	derived from domain
`--quiet` / `--verbose`	Control log verbosity	normal
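
The `--retries` and `--retry-delay` rows mention exponential backoff. A minimal sketch of what that could look like follows. It assumes the delay doubles after each failed attempt and that `--retries` counts attempts after the first one; neither detail is confirmed by this README, and `backoffDelays` and `withRetries` are hypothetical names.

```typescript
// Hypothetical helpers for illustration; the doubling factor is an assumption.
// Delay before each retry: base, 2x base, 4x base, ...
function backoffDelays(retries: number, baseDelayMs: number): number[] {
  return Array.from({ length: retries }, (_, attempt) => baseDelayMs * 2 ** attempt);
}

// Runs `task`, retrying up to `retries` times with the delays above.
async function withRetries<T>(
  task: () => Promise<T>,
  retries = 3,
  baseDelayMs = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  const delays = backoffDelays(retries, baseDelayMs);
  let lastError: unknown;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      // No delay after the final attempt; fall through and rethrow.
      if (attempt < retries) await sleep(delays[attempt] ?? baseDelayMs);
    }
  }
  throw lastError;
}
```

With the documented defaults (`3` retries, `1000` ms), this sketch would wait 1000, 2000, and 4000 ms between attempts.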

Environment variables: WEBSTRACT_CONCURRENCY, WEBSTRACT_TIMEOUT_MS, WEBSTRACT_MAX_RETRIES, WEBSTRACT_RETRY_DELAY_MS, WEBSTRACT_USER_AGENT.
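
The Default column reads as "environment variable, else built-in value". A minimal sketch of that lookup is below, under the assumption that an explicit CLI flag wins over the environment variable (the table does not state this). `resolveNumber` and `resolveOptions` are hypothetical names, not Webstract functions.

```typescript
// Hypothetical helpers for illustration; flag-over-env precedence is an assumption.
type Env = Record<string, string | undefined>;

// Returns the first candidate that parses as a finite, non-negative number.
function resolveNumber(
  flagValue: string | undefined,
  envValue: string | undefined,
  fallback: number,
): number {
  for (const candidate of [flagValue, envValue]) {
    if (candidate === undefined || candidate.trim() === "") continue;
    const parsed = Number(candidate);
    if (Number.isFinite(parsed) && parsed >= 0) return parsed;
  }
  return fallback;
}

// Built-in defaults are the ones listed in the options table.
function resolveOptions(flags: { concurrency?: string; timeout?: string }, env: Env) {
  return {
    concurrency: resolveNumber(flags.concurrency, env["WEBSTRACT_CONCURRENCY"], 5),
    timeoutMs: resolveNumber(flags.timeout, env["WEBSTRACT_TIMEOUT_MS"], 15000),
  };
}
```

Passing the environment in as a plain object (for example `process.env` in Node) keeps the lookup easy to test.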

Output layout

<outputDir>/<domain>/
├─ index.html
├─ _WST.md                  # summary
├─ missing-assets.json      # only if something was skipped
├─ css/...
├─ js/...
└─ images/...               # file tree mirrors remote paths

Open index.html in a browser for the offline copy. Check _WST.md for a quick summary and missing-assets.json to see which external assets stayed remote.

Programmatic use

import { webstract } from "webstract";

await webstract("https://example.com", "./dump/example.com");
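
For snapshotting several pages, a small wrapper along these lines may help. It is a sketch under assumptions: `snapshotAll` is a hypothetical helper, the snapshot function is passed in so that `webstract` (or a stub) can be supplied, and it assumes the returned promise rejects on failure, which this README does not confirm. The per-host output folder follows the `./dump/example.com` pattern used above.

```typescript
// Hypothetical wrapper for illustration; the call shape mirrors webstract(url, outputDir).
type Snapshot = (url: string, outputDir: string) => Promise<unknown>;

async function snapshotAll(
  urls: string[],
  outputRoot: string,
  snapshot: Snapshot,
): Promise<{ ok: string[]; failed: string[] }> {
  const ok: string[] = [];
  const failed: string[] = [];
  for (const url of urls) {
    // One folder per hostname, e.g. ./dump/example.com
    const outputDir = `${outputRoot}/${new URL(url).hostname}`;
    try {
      await snapshot(url, outputDir);
      ok.push(url);
    } catch {
      // Assumes a failed snapshot rejects; record it and continue with the rest.
      failed.push(url);
    }
  }
  return { ok, failed };
}
```

Usage would then look like `await snapshotAll(["https://example.com"], "./dump", webstract)`.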

Project layout

src/webstract.ts: Orchestrates extraction and options.
src/lib/: Shared utilities (HTTP client, logger).
src/extract/: Core extraction logic (collector, CSS parsing, downloader, rewriter, output).
CLI entry: src/cli.ts.

Environment variables are loaded via dotenv (quiet mode); keep your .env out of version control.
