node-tarantula

A Node.js crawler/spider that provides a simple interface for crawling the Web. Its API is inspired by crawler4j.
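In crawler4j, shouldVisit is where a crawler usually rejects unwanted URLs. Its sample crawler uses a regular expression over file extensions plus a check that the URL stays on one site. A tarantula brain written in the same spirit might look like the sketch below. It assumes shouldVisit receives the URI as a string. ALLOWED_HOST and SKIP_EXTENSIONS are hypothetical names, and legs is simply copied from the Quick Example.

```javascript
// Hypothetical brain: stays on one host and skips static assets,
// mirroring the URL filter in crawler4j's sample crawler.
// Assumption: shouldVisit receives the URI as a string.
var ALLOWED_HOST = 'stackoverflow.com'; // hypothetical seed host
var SKIP_EXTENSIONS = /\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$/i;

var brain = {

    legs: 8, // copied from the Quick Example

    shouldVisit: function (uri) {
        var parsed;
        try {
            parsed = new URL(uri);
        } catch (err) {
            return false; // not an absolute URL, so skip it
        }
        if (parsed.hostname !== ALLOWED_HOST) {
            return false; // off-site link
        }
        // Skip links whose path ends in a static-asset extension.
        return !SKIP_EXTENSIONS.test(parsed.pathname);
    }

};
```

Parsing with the built-in URL class, rather than matching the raw string, keeps the host check from being fooled by query strings or look-alike paths.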

Quick Examples

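// Assumption: the module is installed from npm as "tarantula";
// adjust the require path if you use a local checkout instead.
var Tarantula = require('tarantula');

var brain = {

    legs: 8,

    // Decides whether a discovered URI should be crawled; true visits everything.
    shouldVisit: function(uri) {
        return true;
    }

};

var tarantula = new Tarantula(brain);

// Log each URI the crawler reports.
tarantula.on('data', function (uri) {
    console.info('200', uri);
});

// Fired once the crawl has finished.
tarantula.on('done', function () {
    console.log('done');
});

// Begin crawling from the given seed URIs.
tarantula.start(["http://stackoverflow.com"]);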
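Because results arrive through 'data' events and completion through 'done', it can be convenient to wrap a crawl in a Promise. The sketch below uses only on() and start() from the Quick Example. crawlToArray is a hypothetical helper, and it assumes 'done' fires exactly once per start() call. It never rejects, since no error event is documented here.

```javascript
// Hypothetical helper: collect every URI reported via 'data'
// and resolve with the full list once 'done' fires.
// Assumption: 'done' is emitted exactly once per start() call.
function crawlToArray(tarantula, seeds) {
    return new Promise(function (resolve) {
        var visited = [];
        tarantula.on('data', function (uri) {
            visited.push(uri);
        });
        tarantula.on('done', function () {
            resolve(visited);
        });
        tarantula.start(seeds);
    });
}

// Usage with the tarantula instance from the Quick Example:
// crawlToArray(tarantula, ['http://stackoverflow.com']).then(function (uris) {
//     console.log('crawled ' + uris.length + ' pages');
// });
```

Listeners are registered before start() is called, so no events are missed even if the crawler emits synchronously.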

Phantom Usage

If you would like to use the included PhantomJS plugin, you'll need to download and install PhantomJS from its website. It is also available through popular OS package managers:

brew install phantomjs
apt-get install phantomjs
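Before relying on the PhantomJS plugin, it may help to confirm that a phantomjs binary is actually reachable. The sketch below uses only Node's built-in child_process module. hasPhantomJS is a hypothetical helper name, and it assumes the binary is invoked as phantomjs on the PATH, as the package-manager installs above typically provide.

```javascript
var childProcess = require('child_process');

// Hypothetical helper: returns true if `phantomjs --version` runs successfully.
function hasPhantomJS() {
    var result = childProcess.spawnSync('phantomjs', ['--version'], {
        encoding: 'utf8',
        timeout: 5000 // avoid hanging if the binary misbehaves
    });
    // result.error is set (e.g. ENOENT) when the binary cannot be started.
    return !result.error && result.status === 0;
}

if (!hasPhantomJS()) {
    console.warn('phantomjs not found on PATH; install it before using the PhantomJS plugin.');
}
```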

gpolitis/node-tarantula
