This is a very simple Python web scraper (still a work in progress). It lets you scrape data from a given website by following rules that you define. Better documentation will be provided once the first official release is complete.
To set up, run yarn setup. It will install all the required dependencies.
To run the scraper, run yarn scrape. It runs the scraper using the settings in config.json (have a look at config.sample.json for a better understanding of the format).
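As a rough illustration of how a script might read that file, here is a minimal sketch. Only the file names config.json and config.sample.json come from this README; the load_config helper is hypothetical, and nothing is assumed about the keys inside the file.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read the scraper configuration and return it as a Python object.

    Hypothetical helper: the real scraper may load its config differently.
    """
    config_path = Path(path)
    if not config_path.exists():
        # Point the user at the sample file shipped with the project.
        raise FileNotFoundError(
            f"{config_path} not found; copy config.sample.json to "
            f"{config_path} and edit it"
        )
    with config_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Failing early with a pointer to the sample file is one way to make a missing config easier to diagnose.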
[ ] Improve documentation
[ ] Create a user-friendly helper to start scraping without having to know JSON
[ ] Add custom export settings
[ ] Add CSV export
[x] Add a save file, so that you can resume from there if the script is interrupted
[ ] Add error handling to prevent the script from crashing on errors
[ ] Add concurrent scraping (ability to run multiple scrapes at the same time)
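
For readers curious how resuming from a save file can work in general, here is a minimal sketch. It is not the project's actual implementation: the progress.json file name and the load_progress, save_progress and scrape_all functions are all assumptions made for illustration.

```python
import json
import os
from pathlib import Path


def load_progress(path="progress.json"):
    """Return the set of URLs already scraped, or an empty set on a first run."""
    progress_path = Path(path)
    if not progress_path.exists():
        return set()
    with progress_path.open(encoding="utf-8") as handle:
        return set(json.load(handle))


def save_progress(done, path="progress.json"):
    """Write the finished URLs to a temporary file, then swap it into place."""
    tmp_path = f"{path}.tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(sorted(done), handle)
    # Replace the old save file in one step, so an interruption during
    # writing does not leave a half-written progress file behind.
    os.replace(tmp_path, path)


def scrape_all(urls, scrape_one, path="progress.json"):
    """Scrape each URL not yet recorded, saving progress after every page."""
    done = load_progress(path)
    for url in urls:
        if url in done:
            continue  # already handled in an earlier, interrupted run
        scrape_one(url)  # hypothetical callback that does the real work
        done.add(url)
        save_progress(done, path)
    return done
```

If scrape_one raises partway through, the pages finished before the failure stay recorded, and calling scrape_all again with the same path skips them.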