Smeagol is a very simple Node.js crawler module that lets you define URL patterns to extract different content from different pages.

Install smeagol

npm install smeagol

How to use

Require Smeagol

var Smeagol = require('smeagol');

Create an instance with your settings

let smeagol = new Smeagol({
    crawl : [
        {
            pattern_url : '^*)?$',
            id : 'news',
            each_item : '#glb-materia',
            find : {
                id    : '$(".share-bar").attr("data-url")',
                title : '$(".entry-title").text()'
            }
        }
    ],
    limit : 6,
    continuous : true,
    maxConcurrency : 6,
    domain : '',
    pattern_to_crawl : '^*)?$'
});

"pattern_url" defines which pages Smeagol will scrape. "id" identifies the result group in Smeagol's results. "each_item" is a CSS selector: Smeagol iterates over every element on the page that matches it and extracts the data defined in "find". "find" is an object that maps a label to a jQuery-style selector expression for each piece of information you want from each "each_item".

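As a rough illustration of how a "pattern_url" can decide which rule applies to a page, the sketch below tests a URL against each rule's regular expression. The rules, the example.com URLs, and the matchingRuleIds helper are hypothetical and not part of Smeagol's API; Smeagol's own matching logic may differ.

```javascript
// Hypothetical rules: these patterns and ids are made up for illustration.
const rules = [
    { id: 'news', pattern_url: '^https?://example\\.com/news/.+$' },
    { id: 'blog', pattern_url: '^https?://example\\.com/blog/.+$' }
];

// Return the ids of every rule whose pattern_url matches the given URL.
// (Assumption: patterns are plain regular expression strings.)
function matchingRuleIds(url, rules) {
    return rules
        .filter(function (rule) { return new RegExp(rule.pattern_url).test(url); })
        .map(function (rule) { return rule.id; });
}

const newsMatches = matchingRuleIds('https://example.com/news/some-story', rules);
const otherMatches = matchingRuleIds('https://example.com/about', rules);

console.log(newsMatches);  // [ 'news' ]
console.log(otherMatches); // []
```

A page whose URL matches no pattern produces no rule ids here, which mirrors the idea that only pages matching a "pattern_url" are scraped.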

Just start crawling!

    uri : ''


Smeagol uses Node.js events to let you decide what to do with the information you scrape.
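The event-driven pattern can be sketched with Node's built-in EventEmitter. FakeCrawler below is a hypothetical stand-in used only to show the flow of a per-item event followed by a final event; it is not Smeagol's implementation, and the page data is invented.

```javascript
const EventEmitter = require('events');

// FakeCrawler is a hypothetical stand-in, not Smeagol itself.
class FakeCrawler extends EventEmitter {
    // Emit 'crawl' once per item, then 'complete' with everything collected.
    run(pages) {
        const results = [];
        for (const page of pages) {
            for (const item of page.items) {
                results.push(item);
                this.emit('crawl', page.url, item);
            }
        }
        this.emit('complete', results);
    }
}

const crawler = new FakeCrawler();
const seen = [];
let finalResults = null;

// Listeners are registered before the crawl starts so no event is missed.
crawler.on('crawl', function (url, result) {
    seen.push(url + ' -> ' + result.title);
});
crawler.on('complete', function (results) {
    finalResults = results;
});

crawler.run([
    { url: 'https://example.com/a', items: [{ title: 'First' }, { title: 'Second' }] },
    { url: 'https://example.com/b', items: [{ title: 'Third' }] }
]);

console.log(seen);
console.log(finalResults.length); // 3
```

Registering the listeners before starting the crawl matters because EventEmitter delivers events synchronously to whichever listeners exist at the moment of the emit.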

####complete(results)####

Emitted when Smeagol finishes scraping or reaches the page limit defined in the settings.

smeagol.on('complete', function(results){
});

####crawl(url, result)####

Emitted for every item ("each_item" in the settings) that Smeagol scrapes.

result is a JSON object. url is the URL of the page where Smeagol scraped the result.

smeagol.on('crawl', function(url, result){
    console.log('crawl', url, result);
});