gventuri/python-scraper

Simple python web scraper (WIP)

This is a very simple Python web scraper (still a work in progress). It scrapes data from a website you specify by following the rules you provide. Better documentation will be provided once the first official release is complete.

Setup

To set up, run yarn setup. It installs all the required dependencies.

Run

To run the scraper, run yarn scrape. It runs the scraper using the settings in config.json (see config.sample.json for an example of the expected format).
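As a rough illustration of how a script might read config.json, here is a minimal sketch. The function name load_config is hypothetical and is not taken from this repository, and the sketch makes no assumption about which keys the config contains; config.sample.json remains the reference for the actual format.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read a JSON config file and return its parsed content.

    Hypothetical helper for illustration only; the real scraper may
    load its configuration differently.
    """
    config_path = Path(path)
    if not config_path.exists():
        # Point the user at the sample file mentioned in this README.
        raise FileNotFoundError(
            f"{config_path} not found; copy config.sample.json to "
            "config.json and edit it"
        )
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Failing early with a message that names the sample file is one way to make a missing config easier to diagnose.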

Todo

[ ] Improve documentation

[ ] Create a user-friendly helper to start scraping without needing to know JSON

[ ] Add custom export settings

[ ] Add CSV export

[x] Add a save file, so that the script can resume from it after being interrupted

[ ] Add error handling to prevent the script from crashing on errors

[ ] Add concurrent scraping (the ability to run multiple scrapes at the same time)
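The save-file and error-handling items in the list above could work roughly as in the sketch below. This is not the repository's implementation: the file name progress.json, the function names, and the scrape_one callback are all hypothetical, chosen only to illustrate resuming after an interruption and skipping a failed URL instead of crashing.

```python
import json
import os
from pathlib import Path


def load_progress(path):
    """Return saved progress, or an empty state if no save file exists."""
    path = Path(path)
    if not path.exists():
        return {"done": [], "results": []}
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def save_progress(path, progress):
    """Write progress to a temp file, then swap it into place."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    with tmp.open("w", encoding="utf-8") as handle:
        json.dump(progress, handle)
    # os.replace avoids leaving a half-written save file behind.
    os.replace(tmp, path)


def scrape_all(urls, scrape_one, save_path="progress.json"):
    """Scrape each URL once, saving after every success.

    scrape_one is a hypothetical callback that takes a URL and returns
    JSON-serializable data.
    """
    progress = load_progress(save_path)
    done = set(progress["done"])
    for url in urls:
        if url in done:
            continue  # already scraped in an earlier run
        try:
            result = scrape_one(url)
        except Exception as error:
            # Report the failure and move on instead of crashing.
            print(f"skipping {url}: {error}")
            continue
        progress["results"].append(result)
        progress["done"].append(url)
        done.add(url)
        save_progress(save_path, progress)
    return progress
```

Because failed URLs are never marked as done, rerunning the script retries only those and leaves the saved results untouched.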
