Crawljs

A basic crawler written in Node.js.

Note

Crawljs depends on jsdom, which in turn depends on contextify, a native Node.js extension. To run this crawler, you need a C++ compiler on your machine. Details.

Usage

npm install -g crawljs
crawljs http://nodejs.org

To limit the number of URLs crawled, pass the limit as a second argument:

crawljs http://nodejs.org 500

This crawls only the first 500 URLs encountered.
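As a rough illustration of how a command line like the one above could be interpreted, here is a minimal sketch. The `parseArgs` helper is hypothetical and is not taken from the Crawljs source. Treating a missing limit as "no limit" is also an assumption, since this README does not state the default.

```javascript
// Hypothetical helper, NOT part of Crawljs: turns CLI arguments into options.
// argv is expected to look like ["http://nodejs.org", "500"].
function parseArgs(argv) {
  const [seed, rawLimit] = argv;
  if (!seed) {
    throw new Error("usage: crawljs <seed-url> [limit]");
  }
  // Assumption: when no limit is given, crawl without an upper bound.
  const limit = rawLimit === undefined ? Infinity : Number.parseInt(rawLimit, 10);
  if (Number.isNaN(limit) || limit <= 0) {
    throw new Error("limit must be a positive integer");
  }
  return { seed, limit };
}

console.log(parseArgs(["http://nodejs.org", "500"]));
```

In a real command-line script the arguments would typically come from `process.argv.slice(2)`.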

Programmatic API

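// Load the Crawler class; the path is relative to the requiring file, so adjust it to your layout.
var Crawler = require("../lib/Crawler");
var seed = "http://nodejs.org"; // URL to start crawling from
var limit = 500;                // maximum number of URLs to crawl

var crawler = new Crawler(limit);
crawler.crawl(seed);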
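To give a feel for what a limit-bounded crawl does, here is a minimal breadth-first sketch. It is an illustration only and not the Crawljs implementation: `crawlWithLimit`, the `fetchLinks` callback, and the in-memory `graph` are all hypothetical stand-ins, and no HTTP requests are made.

```javascript
// Illustrative sketch only: NOT how Crawljs is implemented.
// fetchLinks is a hypothetical callback returning the URLs a page links to.
function crawlWithLimit(seed, limit, fetchLinks) {
  const visited = new Set(); // URLs already crawled
  const queue = [seed];      // URLs waiting to be crawled (FIFO gives breadth-first order)
  const order = [];          // URLs in the order they were crawled

  while (queue.length > 0 && order.length < limit) {
    const url = queue.shift();
    if (visited.has(url)) continue; // skip URLs queued more than once
    visited.add(url);
    order.push(url);
    for (const link of fetchLinks(url)) {
      if (!visited.has(link)) queue.push(link);
    }
  }
  return order;
}

// Hypothetical in-memory link graph standing in for real page fetches.
const graph = {
  "http://example.com/": ["http://example.com/a", "http://example.com/b"],
  "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
  "http://example.com/b": ["http://example.com/"],
  "http://example.com/c": [],
};

const crawled = crawlWithLimit("http://example.com/", 3, (url) => graph[url] || []);
console.log(crawled);
```

The visited set keeps the crawl from looping on pages that link back to each other, and the limit check stops the loop once enough URLs have been crawled.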