Skip to content
master
Go to file
Code

Latest commit

 

Git stats

Files

Permalink
Failed to load latest commit information.
Type
Name
Latest commit message
Commit time
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

README.md

Helper to extract data from DOM

Build Status

Supported environments:

  • Browsers
  • PhantomJS
  • Cheerio
  • jsdom

Example

This example uses Cheerio:

var cheerio = require('cheerio');
var eee = require('dom-eee');
var html = '<ul><li>item1</li><li>item2 <span>with span</span></li></ul>';
var $ = cheerio.load(html);
var result = eee($.root(),
    {
        items: {
            selector: 'li',
            type: 'collection',
            extract: { text: { selector: ':self' } },
            filter: { exists: 'span' }
        }
    },
    { env: 'cheerio', cheerio: $ });
console.log(result);

Prints:

{ items: [ { text: 'item2 with span' } ] }

Extraction expressions

The system works by evaluating an object-formatted DSL expression. The syntax of the DSL and its semantics is described below.

ObjectExpression:

{
    "prop1": Expression,
    "prop2": Expression
}

ObjectExpression returns an object with given properties.

Property values are described by further a Expression(s).

Expression is either CollectionExpression or SingleExpression, returning a value described by it.

CollectionExpression:

{
    "type": "collection",
    "selector": CSSSelector,
    "extract": ObjectExpression,
    "filter": FilterExpression
}

CollectionExpression returns an array of items. Items are extracted by applying extract expression to each element matched by the selector CSS rule. If the rule matches no elements then an empty array is returned.

Optionally, the filter property might be set. Then the array of raw elements is first filtered through the FilterExpression.

SingleExpression:

{
    "type": "single",
    "selector": CSSSelector,
    "property": String,
    "attribute": String,
    "html": Boolean
}

Properties property and attribute are optional. If present the extracted value is either a property or an attribute of the node matched by the selector. If html property is set to true then element's markup is extracted. If not present, the text contents of the element is returned. If selector matches nothing then null is returned.

Property type is optional. When not set, single is assumed as the default.

FilterExpression:

{
    "exists": CSSSelector
}

An element passes a FilterExpression if it has elements that match the CSS rule in the exists property.

Testing

Run npm test.

License

The MIT License.

About

Extraction Expression Evaluator for DOM

Topics

Resources

License

Releases

No releases published

Packages

No packages published
You can’t perform that action at this time.