
hejiheji001/Web-Scraper-Plus

 
 


Web Scraper Plus

Web Scraper Plus is a Chrome browser extension built for extracting data from web pages. With this extension you create a plan (sitemap) that describes how a website should be traversed and what should be extracted. Web Scraper Plus then navigates the site according to the sitemap and extracts the data. Scraped data can later be exported as CSV.
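To make the idea of a sitemap concrete, here is a minimal sketch of one as a plain JavaScript object, together with a helper that lists the order in which its selectors would be visited. The field names (`_id`, `startUrl`, `selectors`, `parentSelectors`, and the `type` values) are assumptions loosely modeled on sitemaps exported by the upstream Web Scraper and may differ in this fork; `childSelectorIds` and `traversalOrder` are hypothetical helpers, not part of the extension.

```javascript
// Hypothetical sitemap: field names and type values are assumed for
// illustration and may not match what the extension actually stores.
const sitemap = {
  _id: "example-shop",
  startUrl: "https://example.com/products",
  selectors: [
    { id: "product", type: "SelectorLink", parentSelectors: ["_root"], selector: "a.product", multiple: true },
    { id: "title", type: "SelectorText", parentSelectors: ["product"], selector: "h1", multiple: false },
    { id: "price", type: "SelectorText", parentSelectors: ["product"], selector: ".price", multiple: false },
  ],
};

// Return the ids of the selectors that run directly under the given parent.
function childSelectorIds(map, parentId) {
  return map.selectors
    .filter((s) => s.parentSelectors.includes(parentId))
    .map((s) => s.id);
}

// Depth-first order in which selectors are reached from the root.
// The `seen` set guards against selectors that list themselves as a parent.
function traversalOrder(map, parentId = "_root", seen = new Set()) {
  const order = [];
  for (const id of childSelectorIds(map, parentId)) {
    if (seen.has(id)) continue;
    seen.add(id);
    order.push(id, ...traversalOrder(map, id, seen));
  }
  return order;
}

console.log(traversalOrder(sitemap));
```

In this sketch the link selector `product` is followed first, and the two text selectors are then applied on each page it leads to.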

Install the extension from the Chrome Web Store.

Documentation for the new features: wiki

This tool is forked from Web-Scraper and adds many more features.

New Features

  1. CLI Support: Start scraping from CMD/Terminal
  2. MySQL Support: Supports MySQL databases (v5.7+)
  3. Anti Lazy-Loading: Handles lazy-loaded content on pages
  4. Data Filter: Supports user-defined JS code for data preprocessing and more
  5. Distinct: Removes duplicate data before the end of every task
  6. Custom Columns: Define the columns you want to display; use this feature together with Data Filter
  7. Easy Scrape: Create and scrape a sitemap more easily (based on https://github.com/aagiss)
  8. Random Interval: Adds a random delay between requests (provided by https://github.com/Euphorbium)
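The Data Filter, Distinct, and Random Interval features can be pictured with the short sketch below. It is not the extension's implementation: `filterRecord`, `distinct`, `randomInterval`, and `sleep` are hypothetical names, and the record fields (`title`, `price`) are assumed for illustration. See the wiki for how the real features are configured.

```javascript
// Hypothetical data filter: user-defined JS that cleans one scraped record.
function filterRecord(record) {
  return {
    title: record.title.trim(),
    // Keep only digits and the decimal point, then convert to a number.
    price: Number(record.price.replace(/[^0-9.]/g, "")),
  };
}

// Drop duplicate records, comparing only the listed columns.
function distinct(records, columns) {
  const seen = new Set();
  return records.filter((record) => {
    const key = JSON.stringify(columns.map((c) => record[c]));
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

// Pick a delay in milliseconds between minMs and maxMs, inclusive.
// `rng` is injectable so the function can be tested deterministically.
function randomInterval(minMs, maxMs, rng = Math.random) {
  return minMs + Math.floor(rng() * (maxMs - minMs + 1));
}

// Promise-based pause, usable as `await sleep(randomInterval(1000, 3000))`.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

const raw = [
  { title: " Widget ", price: "$1,299.50" },
  { title: "Widget", price: "1299.50" },
  { title: "Gadget", price: "$20" },
];
const cleaned = distinct(raw.map(filterRecord), ["title", "price"]);
console.log(cleaned);
```

Filtering before deduplicating matters here: the first two records only become identical once whitespace and currency formatting are removed.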

Features (forked from the original work)

  1. Scrape multiple pages
  2. Sitemaps and scraped data are stored in the browser's local storage or in CouchDB
  3. Multiple data selection types
  4. Extract data from dynamic pages (JavaScript+AJAX)
  5. Browse scraped data
  6. Export scraped data as CSV
  7. Import and export sitemaps
  8. Depends only on the Chrome browser
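As a rough illustration of what exporting scraped data as CSV involves, the sketch below serializes an array of records with RFC 4180 style quoting. `toCsv` is a hypothetical helper and the column names are assumed; the extension's own export code may behave differently.

```javascript
// Serialize records to CSV text, one header row followed by one row per record.
function toCsv(records, columns) {
  const escape = (value) => {
    const text = value === null || value === undefined ? "" : String(value);
    // Quote fields containing commas, quotes, or line breaks; double inner quotes.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const lines = [columns.map(escape).join(",")];
  for (const record of records) {
    lines.push(columns.map((c) => escape(record[c])).join(","));
  }
  return lines.join("\r\n");
}

const csv = toCsv(
  [{ title: "Widget, large", note: 'Marked "new"' }],
  ["title", "note"]
);
console.log(csv);
```

Missing fields are written as empty cells, so records with different sets of keys still line up under the same header.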

Help

Basic documentation and tutorials are available on webscraper.io

Submit bugs and suggest features via GitHub issues

Bugs

When submitting a bug, please attach an exported sitemap if possible.

License

LGPLv3

About

Web data extraction tool implemented as a Chrome extension with many more features


Languages

  • JavaScript 80.3%
  • HTML 11.9%
  • CSS 7.8%