A website crawler that gives a readable stream of request streams.

edmellum/crawlstream

crawlstream

A website crawler that gives a readable stream of request streams.

Development of this module has been sponsored by Knowit.

Installation

$ npm install crawlstream

Running the tests

$ npm test

Examples

Printing out the paths of all the pages found.

Streaming API

var crawlstream = require('crawlstream');

crawlstream('mysite.com', 10)
	.on('data', function(req) {
		console.log(req.uri.path);
	});
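
Each `data` event carries a request stream rather than a finished page, so the page body has to be read from that inner stream. The following is a minimal sketch of that pattern using only Node's built-in `stream` module. `fakePage`, `fakeCrawl` and `collectSizes` are hypothetical stand-ins written for this illustration, not part of crawlstream, and the sketch assumes each request stream is a readable stream carrying a `uri.path` property, as in the example above.

```javascript
var Readable = require('stream').Readable;

// Hypothetical stand-in for one request stream: a readable stream of body
// chunks that also carries a `uri.path`.
function fakePage(path, body) {
	var page = Readable.from([body]);
	page.uri = { path: path };
	return page;
}

// Hypothetical stand-in for crawlstream(): an object-mode readable stream
// whose items are themselves streams.
function fakeCrawl() {
	return Readable.from([
		fakePage('/', '<h1>Home</h1>'),
		fakePage('/about', '<h1>About</h1>')
	]);
}

// Read every inner stream to the end and record the body length per path.
// Calls done(err, sizes) once the outer stream and all inner streams ended.
function collectSizes(crawl, done) {
	var sizes = {};
	var pending = 0;
	var ended = false;

	function finish() {
		if (ended && pending === 0) done(null, sizes);
	}

	crawl.on('data', function(req) {
		var length = 0;
		pending++;
		req.on('data', function(chunk) { length += chunk.length; });
		req.on('end', function() {
			sizes[req.uri.path] = length;
			pending--;
			finish();
		});
	});
	crawl.on('end', function() { ended = true; finish(); });
	crawl.on('error', done);
}

collectSizes(fakeCrawl(), function(err, sizes) {
	if (err) throw err;
	console.log(sizes);
});
```

Against a real crawl, the same `collectSizes` function would presumably be handed the stream returned by `crawlstream(...)` in place of `fakeCrawl()`.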

Callback API

var crawlstream = require('crawlstream');

crawlstream('mysite.com', 10, function(err, req) {
	// Stop on errors instead of reading `uri` from an undefined request.
	if (err) throw err;
	console.log(req.uri.path);
});

Methods

var crawlstream = require('crawlstream');

crawlstream(baseUrl, concurrency, limit, [callback])

Crawl all pages under baseUrl.

Optionally supply a callback(err, req), which is called with the request stream(!) of each page found.
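
One way to picture the relationship between the two APIs is to treat the callback form as a thin wrapper around the streaming form. The sketch below illustrates that idea only; it is not crawlstream's actual implementation. `toCallback` is a hypothetical helper, and the stand-in stream emits plain objects shaped like the `req` in the examples rather than real request streams.

```javascript
var Readable = require('stream').Readable;

// Hypothetical helper: call callback(null, item) for every item and
// callback(err) if the stream reports an error.
function toCallback(stream, callback) {
	stream.on('data', function(item) { callback(null, item); });
	stream.on('error', function(err) { callback(err); });
	return stream;
}

// Stand-in for a crawl: objects that only carry the `uri.path` property.
var crawl = Readable.from([
	{ uri: { path: '/' } },
	{ uri: { path: '/contact' } }
]);

var paths = [];
toCallback(crawl, function(err, req) {
	if (err) return console.error(err);
	paths.push(req.uri.path);
	console.log(req.uri.path);
});
```

Returning the stream from the helper keeps both styles available, so a caller can still attach its own `end` or `error` listeners.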

License

Copyright 2012 Knowit

MIT
