
crawlstream

A website crawler that gives a readable stream of request streams.

Development of this module has been sponsored by Knowit

Installation

$ npm install crawlstream

Running the tests

$ npm test

Examples

Printing out the paths of all the pages found.

Streaming API

var crawlstream = require('crawlstream');

crawlstream('mysite.com', 10)
    .on('data', function(req) {
        console.log(req.uri.path);
    });

Callback API

var crawlstream = require('crawlstream');

crawlstream('mysite.com', 10, function(err, req) {
    // req is undefined when the crawl reports an error
    if (err) return console.error(err);
    console.log(req.uri.path);
});

Methods

var crawlstream = require('crawlstream');

crawlstream(baseUrl, concurrency, limit, [callback])

Crawl all pages under baseUrl.

Optionally supply a callback(err, req), which receives the request stream (!) for each page crawled.
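
Because the return value is a readable stream, its items can be gathered with ordinary stream events. The sketch below is a rough illustration only: fakeCrawl is a hypothetical stand-in for crawlstream() so that the snippet runs without network access, and collectPaths is an illustrative helper, not part of this module. It assumes each emitted item exposes uri.path, as in the examples above, and that the stream emits 'end' once the crawl is finished.

```javascript
var Readable = require('stream').Readable;

// Hypothetical stand-in for crawlstream(): emits objects shaped like the
// requests in the examples above (only `uri.path` is filled in).
function fakeCrawl(paths) {
    var stream = new Readable({ objectMode: true, read: function() {} });
    paths.forEach(function(path) {
        stream.push({ uri: { path: path } });
    });
    stream.push(null); // signal the end of the stream
    return stream;
}

// Illustrative helper: collect the path of every emitted request,
// then hand the full list to done(err, paths).
function collectPaths(crawl, done) {
    var paths = [];
    crawl
        .on('data', function(req) { paths.push(req.uri.path); })
        .on('error', done)
        .on('end', function() { done(null, paths); });
}

collectPaths(fakeCrawl(['/', '/about']), function(err, paths) {
    if (err) throw err;
    console.log(paths.join('\n'));
});
```

With the real module, the first argument to collectPaths would presumably be the stream returned by crawlstream(...) in place of fakeCrawl(...).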

License

Copyright 2012 Knowit

MIT