3 changes: 2 additions & 1 deletion .env.example
@@ -5,4 +5,5 @@ FIRECRAWL_API_KEY=<your Firecrawl api key here>
SCRAPINGBEE_API_KEY=<your ScrapingBee api key here>
SCRAPERAPI_API_KEY=<your ScraperAPI api key here>
TAVILY_API_KEY=<your Tavily api key here>
-ZYTE_API_KEY=<your ZYTE api key>
+ZYTE_API_KEY=<your ZYTE api key>
+SCRAPEGRAPHAI_API_KEY=<your ScrapeGraphAI api key>
5 changes: 3 additions & 2 deletions README.md
@@ -2,15 +2,16 @@

Scrape-Evals is an evaluation framework for web scraping engines ("engines") that benchmarks quality and robustness on a fixed dataset. We focus on: (1) whether an engine successfully retrieves page content (Coverage/Success Rate); and (2) how well the retrieved content captures a human-curated core snippet while avoiding noise (Recall/Precision/F1). The F1 score measures content quality by balancing how much important content is captured (recall) against how much irrelevant content is filtered out (precision). In our results, we refer to the F1 score as "quality" for simplicity.

-This framework supports APIs for Firecrawl, Apify, ScraperAPI, ScrapingBee, Zyte, and more but also some self-hosted engines like Crawl4AI, Playwright, Puppeteer, Rest, Scrapy, and Selenium. Additional APIs can be easily integrated.
+This framework supports APIs for Firecrawl, Apify, ScraperAPI, ScrapingBee, Zyte, and more but also some self-hosted engines like Crawl4AI, Playwright, Puppeteer, Rest, Scrapy ScrapegraphAi, and Selenium. Additional APIs can be easily integrated.

## Results

Below are evaluation results across different engines.

| Engine | Coverage (Success Rate) (%) | Quality (F1) |
|-----------------|-----------------------------|--------------|
-| Firecrawl | 80.9 | 0.68 |
+| ScrapegraphAi | 82.5 | 0.61 |
+| Firecrawl | 81.6 | 0.66 |
| Exa | 76.3 | 0.53 |
| Tavily | 67.6 | 0.50 |
| ScraperAPI | 63.5 | 0.45 |
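
The README text in this diff describes the "Quality (F1)" column as a balance between recall (how much of the human-curated core snippet is captured) and precision (how much of the retrieved content is not noise). A minimal sketch of that calculation follows. It assumes a whitespace-token, bag-of-words overlap; the function names `tokenize` and `precision_recall_f1` are hypothetical, and the tokenization and matching actually used by Scrape-Evals may differ.

```python
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokens; the real framework may tokenize differently.
    return text.lower().split()


def precision_recall_f1(retrieved, core_snippet):
    """Token-overlap precision, recall, and F1 of scraped text against a core snippet."""
    retrieved_counts = Counter(tokenize(retrieved))
    core_counts = Counter(tokenize(core_snippet))

    # Multiset intersection: each token counts at most as often as it appears in both texts.
    overlap = sum((retrieved_counts & core_counts).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / sum(retrieved_counts.values())  # share of retrieved tokens that are relevant
    recall = overlap / sum(core_counts.values())          # share of core tokens that were captured
    f1 = 2 * precision * recall / (precision + recall)    # harmonic mean of the two
    return precision, recall, f1


if __name__ == "__main__":
    scraped = "home login the quick brown fox"
    core = "the quick brown fox jumps"
    p, r, f1 = precision_recall_f1(scraped, core)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this example the scraped text carries two noise tokens (`home`, `login`) and misses one core token (`jumps`), so both precision and recall fall below 1 and F1 sits between them.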