Skip to content
This repository has been archived by the owner on Nov 17, 2020. It is now read-only.

gpolitis/node-tarantula

Repository files navigation

node-tarantula

nodejs crawler/spider which provides a simple interface for crawling the Web. Its API has been inspired by crawler4j.

Quick Examples

var brain = {

    legs: 8,

    shouldVisit: function(uri) {
        return true;
    }

};

var tarantula = new Tarantula(brain);

tarantula.on('data', function (uri) {
	  console.info('200', uri);
});

tarantula.on('done', function() { 
    console.log('done'); 
});

tarantula.start(["http://stackoverflow.com"]);

Phantom Usage

If you would like to use the included PhantomJS plugin, you'll need to download and install the PhantomJS from their website. It's also on popular OS Package Managers:

  • brew install phantomjs
  • apt-get install phantomjs