Universal Web Scraper — Extract Everything from Any Page

Scrape any webpage and extract every data point: text content, links, images, meta tags, headings (h1-h6), HTML tables, JSON-LD structured data, email addresses, and phone numbers. CSS selector targeting for specific content. Recursive crawling to follow internal links. $0.003/page.

What it extracts per page

Data	Description
Text	All visible text (scripts/styles stripped), up to 50K chars
Links	Every `<a>` tag — href, anchor text, internal/external flag
Images	Every `<img>` — src, alt text, width, height
Meta tags	All `<meta>` — description, og:title, keywords, robots, etc
Headings	All h1-h6 with level and text
Tables	HTML tables as structured arrays (headers + rows)
JSON-LD	Schema.org structured data from `<script type="application/ld+json">`
Emails	Email addresses found anywhere in the HTML
Phones	Phone numbers (7+ digits) found in the HTML
Selected	Content matching your CSS selector

Every extraction type can be toggled on/off.

Quick start

Scrape a single page:

{
    "urls": ["https://example.com"]
}

Crawl a site (follow links):

{
    "urls": ["https://example.com"],
    "maxDepth": 2,
    "maxPages": 50
}

Target specific content:

{
    "urls": ["https://example.com"],
    "cssSelector": ".main-content"
}

Input

Field	Type	Default	Description
`urls`	array	(required)	URLs to scrape
`extractText`	boolean	`true`	Visible text content
`extractLinks`	boolean	`true`	All links with anchor text
`extractImages`	boolean	`true`	All images with alt/dimensions
`extractMeta`	boolean	`true`	Meta tags
`extractHeadings`	boolean	`true`	h1-h6 headings
`extractTables`	boolean	`true`	HTML tables as arrays
`extractStructuredData`	boolean	`true`	JSON-LD schema.org data
`extractEmails`	boolean	`true`	Email addresses
`extractPhones`	boolean	`true`	Phone numbers
`cssSelector`	string	(optional)	Target specific element
`maxDepth`	integer	`0`	0 = listed URLs only. 1+ = follow links
`maxPages`	integer	`100`	Max pages to scrape total
`dryRun`	boolean	`false`	Scrape without charges

Pricing

$0.003 per page scraped (pay-per-event pricing).

Errors and dry runs are never charged.
100 pages = $0.30
1,000 pages = $3.00

Performance

Uses CheerioCrawler — pure HTTP, no headless browser
Fast: 100-500 pages/minute depending on target site
Low memory: 256MB handles most scraping jobs

Limitations

No JavaScript rendering. This scraper reads the initial HTML response. Content injected by JavaScript (React, Vue, Angular SPAs) won't be captured. For JS-heavy sites, use a Playwright-based scraper.
Email/phone extraction uses regex — may include false positives from code snippets or malformed patterns.
Tables are extracted as flat text arrays. Complex nested tables may not parse correctly.
Rate limiting. Crawlee handles basic rate limiting, but aggressive crawling may trigger bot protection.

Related Tools by manchittlab

Broken Link Checker — Find broken links across your website.
Email Validator Pro — Validate extracted emails with SMTP check.
Tech Stack Detector — Detect what technology a site uses.
Lighthouse Auditor — Performance and SEO audits.
Sitemap Analyzer — Parse and validate XML sitemaps.
DNS/WHOIS Suite — DNS records + RDAP domain lookup.

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
.actor		.actor
.github/workflows		.github/workflows
src		src
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json
tsconfig.json		tsconfig.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Universal Web Scraper — Extract Everything from Any Page

What it extracts per page

Quick start

Input

Pricing

Performance

Limitations

Related Tools by manchittlab

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Languages

Folders and files

Latest commit

History

Repository files navigation

Universal Web Scraper — Extract Everything from Any Page

What it extracts per page

Quick start

Input

Pricing

Performance

Limitations

Related Tools by manchittlab

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 0

Languages

Packages

Contributors