Tatooine: A pluggable, simple and powerful web scraper.

Installation

$ npm install tatooine --save

Docs

// schemas: Array<Schema> => A list of schemas.
// customEngines?: Array<CustomEngine> => An optional list of custom engines.

const promise = Tatooine(schemas, customEngines)

Standard Engines

For convenience, Tatooine provides three standard engines: spa, json and markup.

Extending Standard Engines

The fork property lets you extend an engine's capabilities to fit your needs when creating schemas for the standard engines spa, json and/or markup.

// index.js

import Tatooine from "tatooine"

const schemas = [{
  engine: "json",
  options: { ... },
  selectors: { ... },
  fork({ sources, error }) {
    // Do anything you want with the data provided, then return it.

    return { sources, error };
  }
}]

const promise = Tatooine(schemas)

Note: The data passed to fork as its parameter has already been processed according to the given schema configuration.
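As a rough illustration of what a fork body might do, the sketch below post-processes the data before handing it back. It assumes that sources is an array of plain objects and that each item may carry a title string; neither is confirmed by this README, so treat the field name and the array shape as hypothetical.

```javascript
// Hypothetical fork function. Assumptions (not confirmed by the README):
//   - `sources` is an array of plain objects
//   - items may have a `title` string worth cleaning up
function fork({ sources, error }) {
  // If the engine reported an error, pass everything through untouched.
  if (error) {
    return { sources, error };
  }

  const cleaned = (sources || [])
    // Drop items that have no usable title.
    .filter((item) => item && typeof item.title === "string")
    // Trim surrounding whitespace, keeping every other field as-is.
    .map((item) => ({ ...item, title: item.title.trim() }));

  return { sources: cleaned, error };
}

// Calling it directly with sample data, outside of Tatooine:
const result = fork({
  sources: [{ title: "  A New Hope " }, { url: "x" }],
  error: null,
});
console.log(result.sources);
```

Keeping fork a pure function of its argument, as here, makes it easy to test on sample data without running a scrape.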

Custom Engines

Beyond the standard engines, you can create custom engines with your own rules. Follow the structure below: an engine module exports an engine name and a run function, and a schema selects that engine by name.

// xyz-engine.js

function getSourcesFromSomewhere(schema) {
  // Your engine logic
}

export default {
  engine: "xyz",
  run: getSourcesFromSomewhere,
}

// xyz-schema.js

export default {
  engine: "xyz",
  ...
};

// index.js

import Tatooine from "tatooine"

import xyzEngine from "./xyz-engine.js"
import xyzSchema from "./xyz-schema.js"

const promise = Tatooine([xyzSchema], [xyzEngine])
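To make the engine skeleton above more concrete, here is a minimal sketch of a custom engine that reads its data from the schema itself instead of fetching anything. The engine name "static", the options.items field, and the { sources, error } return shape (borrowed from what fork receives) are all assumptions for illustration; the README does not specify what run must return.

```javascript
// Hypothetical "static" engine. Assumptions (not confirmed by the README):
//   - run receives the schema object
//   - run may return { sources, error }, mirroring fork's parameter
//   - items live under schema.options.items (an invented field)
function getSourcesFromOptions(schema) {
  try {
    const items = (schema.options && schema.options.items) || [];
    if (!Array.isArray(items)) {
      throw new TypeError("options.items must be an array");
    }
    return { sources: items, error: null };
  } catch (error) {
    // Report the failure instead of throwing.
    return { sources: [], error };
  }
}

const staticEngine = {
  engine: "static",
  run: getSourcesFromOptions,
};

const staticSchema = {
  engine: "static",
  options: { items: [{ name: "Luke" }] },
};

// Exercising the engine directly, without Tatooine:
console.log(staticEngine.run(staticSchema));
```

If this shape matches what Tatooine expects, the pair would be passed in the same way as above, as Tatooine([staticSchema], [staticEngine]).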