API Reference

## Objects

parse5 : object

## Typedefs

ElementLocationInfo : Object
LocationInfo : Object
ParserOptions : Object
SerializerOptions : Object
SAXParserOptions : Object

## parse5 : object

**Kind**: global namespace

parse5.ParserStream ⇐ stream.Writable

Kind: instance class of parse5
Extends: stream.Writable

new ParserStream(options)

Streaming HTML parser with scripting support. A writable stream.

| Param | Type | Description |
| --- | --- | --- |
| options | ParserOptions | Parsing options. |

Example

var parse5 = require('parse5');
var http = require('http');

// Fetch google.com content and obtain its <body> node
http.get('http://google.com', function(res) {
 var parser = new parse5.ParserStream();

 parser.on('finish', function() {
     var body = parser.document.childNodes[0].childNodes[1];
 });

 res.pipe(parser);
});

parserStream.document : ASTNode.<document>

Resulting document node.

Kind: instance property of ParserStream

"script" (scriptElement, documentWrite(html), resume)

Raised when the parser encounters a <script> element. If the event has listeners, parsing is suspended when the event is emitted. So, if the <script> has a src attribute, you can fetch the script, execute it, and then resume the parser, as browsers do.

Kind: event emitted by ParserStream

| Param | Type | Description |
| --- | --- | --- |
| scriptElement | ASTNode | Script element that caused the event. |
| documentWrite(html) | function | Writes additional HTML at the current parsing position. Suitable for implementing the DOM document.write and document.writeln methods. |
| resume | function | Resumes the parser. |

Example

var parse5 = require('parse5');
var http = require('http');

var parser = new parse5.ParserStream();

parser.on('script', function(scriptElement, documentWrite, resume) {
  var src = parse5.treeAdapters.default.getAttrList(scriptElement)[0].value;

  http.get(src, function(res) {
     // Fetch script content, execute it with DOM built around `parser.document` and
     // `document.write` implemented using `documentWrite`
     ...
     // Then resume the parser
     resume();
  });
});

parser.end('<script src="example.com/script.js"></script>');
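The example above reads the first entry of the attribute list, which only yields the script URL when src happens to be the first attribute. A minimal sketch of a lookup by name follows. It assumes the attribute list is an array of `{ name, value }` objects; `findAttr` is a hypothetical helper for illustration and is not part of parse5.

```javascript
// Hypothetical helper, not part of parse5: look up an attribute value by name.
// Assumes `attrs` is an array of { name, value } objects.
function findAttr(attrs, name) {
    for (var i = 0; i < attrs.length; i++) {
        if (attrs[i].name === name)
            return attrs[i].value;
    }

    // No attribute with that name
    return null;
}

// Hand-written attribute list standing in for a real <script> element's attributes
var attrs = [
    { name: 'async', value: '' },
    { name: 'src', value: 'example.com/script.js' }
];

var src = findAttr(attrs, 'src');
var type = findAttr(attrs, 'type');
```

Inside a "script" handler, the same helper could be applied to the list returned by `getAttrList(scriptElement)`, with a null result meaning the script is inline.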

parse5.SerializerStream ⇐ stream.Readable

Kind: instance class of parse5
Extends: stream.Readable

new SerializerStream(node, [options])

Streaming AST node to HTML serializer. Readable stream.

| Param | Type | Description |
| --- | --- | --- |
| node | ASTNode | Node to serialize. |
| [options] | SerializerOptions | Serialization options. |

Example

var parse5 = require('parse5');
var fs = require('fs');

var file = fs.createWriteStream('/home/index.html');

// Serialize the parsed document to HTML and write it to the file
var document = parse5.parse('<body>Who is John Galt?</body>');
var serializer = new parse5.SerializerStream(document);

serializer.pipe(file);

parse5.SAXParser ⇐ stream.Transform

Kind: instance class of parse5
Extends: stream.Transform

new SAXParser(options)

Streaming SAX-style HTML parser. Transform stream (which means you can pipe through it, see example).

| Param | Type | Description |
| --- | --- | --- |
| options | SAXParserOptions | Parsing options. |

Example

var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');

var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('text', function(text) {
 // Handle page text content
 ...
});

http.get('http://google.com', function(res) {
 // SAXParser is a Transform stream, which means you can pipe
 // through it. So you can analyze the page content and, e.g., save it
 // to the file at the same time:
 res.pipe(parser).pipe(file);
});

"startTag" (name, attributes, selfClosing, [location])

Raised when the parser encounters a start tag.

Kind: event emitted by SAXParser

| Param | Type | Description |
| --- | --- | --- |
| name | String | Tag name. |
| attributes | Array | List of attributes in { key: String, value: String } form. |
| selfClosing | Boolean | Indicates if the tag is self-closing. |
| [location] | LocationInfo | Start tag source code location info. Available if location info is enabled in SAXParserOptions. |

"endTag" (name, [location])

Raised when the parser encounters an end tag.

Kind: event emitted by SAXParser

| Param | Type | Description |
| --- | --- | --- |
| name | String | Tag name. |
| [location] | LocationInfo | End tag source code location info. Available if location info is enabled in SAXParserOptions. |

"comment" (text, [location])

Raised when the parser encounters a comment.

Kind: event emitted by SAXParser

| Param | Type | Description |
| --- | --- | --- |
| text | String | Comment text. |
| [location] | LocationInfo | Comment source code location info. Available if location info is enabled in SAXParserOptions. |

"doctype" (name, publicId, systemId, [location])

Raised when the parser encounters a document type declaration.

Kind: event emitted by SAXParser

| Param | Type | Description |
| --- | --- | --- |
| name | String | Document type name. |
| publicId | String | Document type publicId. |
| systemId | String | Document type systemId. |
| [location] | LocationInfo | Document type declaration source code location info. Available if location info is enabled in SAXParserOptions. |

"text" (text, [location])

Raised when the parser encounters text content.

Kind: event emitted by SAXParser

| Param | Type | Description |
| --- | --- | --- |
| text | String | Text content. |
| [location] | LocationInfo | Text content source code location info. Available if location info is enabled in SAXParserOptions. |

parse5.treeAdapters

Provides built-in tree adapters which can be used for parsing and serialization.

Kind: instance property of parse5
Properties

| Name | Type | Description |
| --- | --- | --- |
| default | TreeAdapter | Default tree format for parse5. |
| htmlparser2 | TreeAdapter | Tree format of the popular htmlparser2 library (used by, e.g., cheerio and jsdom). |

Example

var parse5 = require('parse5');

// Use default tree adapter for parsing
var document = parse5.parse('<div></div>', { treeAdapter: parse5.treeAdapters.default });

// Use htmlparser2 tree adapter with SerializerStream
var serializer = new parse5.SerializerStream(document, { treeAdapter: parse5.treeAdapters.htmlparser2 });

parse5.parse(html, [options]) ⇒ ASTNode.<Document>

Parses HTML string.

Kind: instance method of parse5
Returns: ASTNode.<Document> - document

| Param | Type | Description |
| --- | --- | --- |
| html | string | Input HTML string. |
| [options] | ParserOptions | Parsing options. |

Example

var parse5 = require('parse5');

var document = parse5.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');

parse5.parseFragment([fragmentContext], html, [options]) ⇒ ASTNode.<DocumentFragment>

Parses HTML fragment.

Kind: instance method of parse5
Returns: ASTNode.<DocumentFragment> - documentFragment

| Param | Type | Description |
| --- | --- | --- |
| [fragmentContext] | ASTNode | Parsing context element. If specified, the given fragment is parsed as if it were assigned to the context element's innerHTML property. |
| html | string | Input HTML fragment string. |
| [options] | ParserOptions | Parsing options. |

Example

var parse5 = require('parse5');

var documentFragment = parse5.parseFragment('<table></table>');

// Parse an HTML fragment in the context of the parsed <table> element
var trFragment = parse5.parseFragment(documentFragment.childNodes[0], '<tr><td>Shake it, baby</td></tr>');

parse5.serialize(node, [options]) ⇒ String

Serializes AST node to HTML string.

Kind: instance method of parse5
Returns: String - html

| Param | Type | Description |
| --- | --- | --- |
| node | ASTNode | Node to serialize. |
| [options] | SerializerOptions | Serialization options. |

Example

var parse5 = require('parse5');

var document = parse5.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');

//Serialize document
var html = parse5.serialize(document);

//Serialize <body> element content
var bodyInnerHtml = parse5.serialize(document.childNodes[0].childNodes[1]);

saxParser.stop()

Stops parsing. Useful if you want the parser to stop consuming CPU time once you've obtained the desired info from the input stream. It doesn't prevent piping, so data will flow through the parser as usual.

Kind: instance method of SAXParser
Example

var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');

var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('doctype', function(name, publicId, systemId) {
 // Process doctype info and stop parsing
 ...
 parser.stop();
});

http.get('http://google.com', function(res) {
 // Even though parser.stop() was called, the whole
 // content of the page will be written to the file
 res.pipe(parser).pipe(file);
});
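The behavior described above (analysis stops, data keeps flowing) can be sketched with a plain Node.js Transform stream. This is an illustrative stand-in, not parse5's implementation: the `analyzer` object, its `stopped` flag, and the byte counting that stands in for "parsing" are all assumptions made for the sketch.

```javascript
var stream = require('stream');

// Stand-in for a SAX-style parser: it "analyzes" incoming chunks (here it only
// counts bytes) until stop() is called, but it always forwards the data.
var analyzer = new stream.Transform({
    transform: function (chunk, encoding, callback) {
        if (!analyzer.stopped)
            analyzer.analyzedBytes += chunk.length;

        // Pass the chunk through unchanged, whether or not it was analyzed
        callback(null, chunk);
    }
});

analyzer.stopped = false;
analyzer.analyzedBytes = 0;

// Hypothetical counterpart of stop(): only disables the analysis step
analyzer.stop = function () {
    analyzer.stopped = true;
};

analyzer.write('<!DOCTYPE html>');
analyzer.stop();
analyzer.write('<p>not analyzed, still forwarded</p>');

// Everything written so far is available on the readable side
var forwarded = analyzer.read().toString();
```

Only the first chunk is counted, while `forwarded` contains both chunks, which mirrors why the file in the example above still receives the whole page.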

ElementLocationInfo : Object

Kind: global typedef
Extends: LocationInfo
Properties

| Name | Type | Description |
| --- | --- | --- |
| startTag | LocationInfo | Element's start tag LocationInfo. |
| endTag | LocationInfo | Element's end tag LocationInfo. |

LocationInfo : Object

Kind: global typedef
Properties

| Name | Type | Description |
| --- | --- | --- |
| line | Number | One-based line index. |
| col | Number | One-based column index. |
| startOffset | Number | Zero-based first character index. |
| endOffset | Number | Zero-based last character index. |
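To make the mix of one-based and zero-based fields concrete, the sketch below derives a one-based line and column from a zero-based character offset. `offsetToLineCol` is a hypothetical helper, not part of parse5, and it assumes lines are separated by a single `\n`; the parser's own handling of other line endings may differ.

```javascript
// Hypothetical helper, not part of parse5: convert a zero-based character
// offset into a one-based { line, col } pair. Assumes '\n' line separators.
function offsetToLineCol(source, offset) {
    var line = 1;
    var col = 1;

    for (var i = 0; i < offset && i < source.length; i++) {
        if (source.charAt(i) === '\n') {
            line++;
            col = 1;
        }
        else
            col++;
    }

    return { line: line, col: col };
}

var html = '<div>\n  <p>Hi</p>\n</div>';

// Offset 8 is the '<' of the <p> start tag
var pos = offsetToLineCol(html, 8);
```

Here `pos` is `{ line: 2, col: 3 }`: the `<p>` tag starts on the second line, after two spaces of indentation.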

ParserOptions : Object

Kind: global typedef
Properties

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| decodeHtmlEntities | Boolean | true | Decode HTML entities like `&amp;`, `&nbsp;`, etc. Warning: disabling this option may result in output that does not conform to the HTML5 specification. |
| locationInfo | Boolean | false | Enables source code location information for the nodes. When enabled, each node (except the root node) has a `__location` property. If the node is an element that is not empty, `__location` is an ElementLocationInfo object; otherwise it is a LocationInfo object. If the element was implicitly created by the parser, its `__location` property is null. |
| treeAdapter | TreeAdapter | parse5.treeAdapters.default | Specifies the resulting tree format. |

SerializerOptions : Object

Kind: global typedef
Properties

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| encodeHtmlEntities | Boolean | true | HTML-encode characters like `<`, `>`, `&`, etc. Warning: disabling this option may result in output that does not conform to the HTML5 specification. |
| treeAdapter | TreeAdapter | parse5.treeAdapters.default | Specifies the input tree format. |
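As a rough illustration of what HTML-encoding characters means for text content, the sketch below escapes `&`, `<`, and `>`. `escapeText` is a hypothetical function written for this example; it is not the serializer's actual implementation, which may handle additional characters.

```javascript
// Hypothetical sketch, not parse5's implementation: HTML-encode the characters
// that would otherwise be read back as markup or as entity starts.
function escapeText(text) {
    return text
        // '&' must be replaced first so the entities added below stay intact
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}

var encoded = escapeText('a < b && c > d');
```

Leaving such characters unencoded is what can make serialized output re-parse differently, which is the risk the warning on encodeHtmlEntities refers to.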

SAXParserOptions : Object

Kind: global typedef
Properties

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| decodeHtmlEntities | Boolean | true | Decode HTML entities like `&amp;`, `&nbsp;`, etc. Warning: disabling this option may result in output that does not conform to the HTML5 specification. |
| locationInfo | Boolean | false | Enables source code location information for the tokens. When enabled, each token event handler will receive a LocationInfo object as the last argument. |
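Because the location argument is only passed when locationInfo is enabled, handlers are easier to reuse if they tolerate its absence. The sketch below shows that handler shape using a plain EventEmitter as a stand-in for a SAXParser instance, so it runs without parse5; `fakeParser` and the hand-emitted events are assumptions made for illustration.

```javascript
var EventEmitter = require('events').EventEmitter;

// Stand-in for a SAXParser instance; events are emitted by hand below
var fakeParser = new EventEmitter();
var seen = [];

fakeParser.on('endTag', function (name, location) {
    // `location` is undefined unless location info is enabled
    if (location)
        seen.push(name + '@' + location.line + ':' + location.col);
    else
        seen.push(name);
});

// Emitted as it would be with locationInfo disabled
fakeParser.emit('endTag', 'div');

// Emitted as it would be with locationInfo enabled (values are made up)
fakeParser.emit('endTag', 'p', { line: 3, col: 7, startOffset: 40, endOffset: 43 });
```

After both events, `seen` holds `'div'` and `'p@3:7'`, so the same handler works with either setting.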