At the moment the project is in draft mode. I've already built many crawlers, and I'd like to share this one soon and keep improving it so more people can use it.

Stack:

- Node.js
- Redis (for the job queue)
- MongoDB (for storing crawled data)
- [website-to-json](https://github.com/itemsapi/website-to-json) (turns crawled pages into JSON)
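
As a rough illustration of how these pieces could fit together: a worker pulls URL jobs from the Redis queue, fetches each page, and writes the result to MongoDB. The names below (`crawl:jobs`, the `crawler` database, the `pages` collection) and the plain `fetch` call standing in for the extraction step are placeholders, not the final implementation.

```js
// Minimal worker sketch. The queue key, database, and collection names are
// placeholders for illustration only.
const Redis = require('ioredis');
const { MongoClient } = require('mongodb');

async function worker() {
  const redis = new Redis();                          // redis://localhost:6379
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const pages = mongo.db('crawler').collection('pages');

  // Every worker process (local or on a remote machine) runs this same loop,
  // so scaling out means simply starting more workers.
  while (true) {
    // Block until a URL job appears on the queue.
    const [, url] = await redis.brpop('crawl:jobs', 0);
    try {
      const res = await fetch(url);                   // stand-in for the extraction step
      const body = await res.text();
      await pages.insertOne({ url, body, fetchedAt: new Date() });
    } catch (err) {
      console.error('failed to crawl', url, err.message);
    }
  }
}

worker().catch(console.error);
```

With a setup like this, enqueuing work from anywhere is a single call such as `redis.lpush('crawl:jobs', 'https://example.com')`.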

Goals:

- easy to scale (just add new workers on remote machines)
- easy to deploy (DigitalOcean, AWS, etc.)
- intuitive (very easy to get started)
- easy to manage data (import / export)
- crawl many different websites at the same time
- CLI
- easy to monitor status (see what's going on; a small status sketch follows this list)
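
For the monitoring goal, even before any dashboard exists, the same assumed Redis list and MongoDB collection could be polled for a quick status readout. This is only a sketch under the same placeholder names as above.

```js
// Quick status check (same placeholder names as the worker sketch above).
const Redis = require('ioredis');
const { MongoClient } = require('mongodb');

async function status() {
  const redis = new Redis();
  const mongo = await MongoClient.connect('mongodb://localhost:27017');

  const pending = await redis.llen('crawl:jobs');     // jobs still waiting in the queue
  const crawled = await mongo.db('crawler').collection('pages').countDocuments();

  console.log(`pending jobs: ${pending}, pages crawled: ${crawled}`);

  await mongo.close();
  redis.disconnect();
}

status().catch(console.error);
```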