
node-tarantula

A Node.js crawler/spider that provides a simple interface for crawling the Web. Its API is inspired by crawler4j.

Quick Examples

var brain = {

    legs: 8,

    shouldVisit: function(uri) {
        return true;
    }

};

var tarantula = new Tarantula(brain);

tarantula.on('data', function (uri) {
    console.info('200', uri);
});

tarantula.on('done', function () {
    console.log('done');
});

tarantula.start(["http://stackoverflow.com"]);
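
The `shouldVisit` callback in the example above accepts every URI, so the crawl is unbounded. A minimal sketch of a more restrictive brain follows, which keeps the crawl on one host and its subdomains. It assumes that `shouldVisit` receives the candidate URI as a string and that returning `false` skips it, as the quick example suggests; `ALLOWED_HOST` and `makeBrain` are hypothetical names used only for illustration and are not part of node-tarantula.

```javascript
// Hypothetical helper: builds a brain object that only allows one host.
// ALLOWED_HOST and makeBrain are illustrative names, not library API.
var ALLOWED_HOST = 'stackoverflow.com';

function makeBrain(allowedHost) {
    return {
        legs: 8,

        // Assumption: uri is a string and a boolean result decides
        // whether the crawler follows it.
        shouldVisit: function (uri) {
            try {
                var host = new URL(uri).hostname;
                // Accept the host itself and any of its subdomains.
                return host === allowedHost ||
                    host.endsWith('.' + allowedHost);
            } catch (err) {
                // URIs that cannot be parsed are skipped.
                return false;
            }
        }
    };
}

var brain = makeBrain(ALLOWED_HOST);
```

The resulting `brain` can be passed to `new Tarantula(brain)` in the same way as the object in the quick example.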

Phantom Usage

To use the included PhantomJS plugin, download and install PhantomJS from its website. It is also available through common OS package managers:

  • brew install phantomjs
  • apt-get install phantomjs