At the moment the project is in draft mode. I've already built many crawlers, and I'd like to share this one soon and keep improving it so more people can use it.

Stack:

- Node.js
- Redis (for the job queue)
- MongoDB (for storing crawled data)
- [website-to-json](https://github.com/itemsapi/website-to-json) (turns crawled pages into JSON)
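
As a rough illustration of how these pieces could fit together: a worker pulls URL jobs from the Redis queue, fetches each page, and writes the result to MongoDB. The names below (`crawl:jobs`, the `crawler` database, the `pages` collection) and the plain `fetch` call standing in for the extraction step are placeholders, not the final implementation.

```js
// Minimal worker sketch. The queue key, database, and collection names are
// placeholders for illustration only.
const Redis = require('ioredis');
const { MongoClient } = require('mongodb');

async function worker() {
  const redis = new Redis();                          // redis://localhost:6379
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const pages = mongo.db('crawler').collection('pages');

  // Every worker process (local or on a remote machine) runs this same loop,
  // so scaling out means simply starting more workers.
  while (true) {
    // Block until a URL job appears on the queue.
    const [, url] = await redis.brpop('crawl:jobs', 0);
    try {
      const res = await fetch(url);                   // stand-in for the extraction step
      const body = await res.text();
      await pages.insertOne({ url, body, fetchedAt: new Date() });
    } catch (err) {
      console.error('failed to crawl', url, err.message);
    }
  }
}

worker().catch(console.error);
```

With a setup like this, enqueuing work from anywhere is a single call such as `redis.lpush('crawl:jobs', 'https://example.com')`.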

Goals:

- easy to scale (just add new workers on remote machines)
- easy to deploy (DigitalOcean, AWS, etc.)
- intuitive (very easy to get started)
- easy to manage data (import / export)
- crawl many different websites at the same time
- CLI
- easy to monitor status (see what's going on; a small status sketch follows this list)
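
For the monitoring goal, even before any dashboard exists, the same assumed Redis list and MongoDB collection could be polled for a quick status readout. This is only a sketch under the same placeholder names as above.

```js
// Quick status check (same placeholder names as the worker sketch above).
const Redis = require('ioredis');
const { MongoClient } = require('mongodb');

async function status() {
  const redis = new Redis();
  const mongo = await MongoClient.connect('mongodb://localhost:27017');

  const pending = await redis.llen('crawl:jobs');     // jobs still waiting in the queue
  const crawled = await mongo.db('crawler').collection('pages').countDocuments();

  console.log(`pending jobs: ${pending}, pages crawled: ${crawled}`);

  await mongo.close();
  redis.disconnect();
}

status().catch(console.error);
```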