- parse5 : object
- ElementLocationInfo : Object
- LocationInfo : Object
- ParserOptions : Object
- SerializerOptions : Object
- SAXParserOptions : Object
parse5 : object
Kind: global namespace
- parse5 : object
  - .ParserStream ⇐ stream.Writable
    - new ParserStream(options)
    - .document : ASTNode.<Document>
    - "script" (scriptElement, documentWrite(html), resume)
  - .SerializerStream ⇐ stream.Readable
  - .SAXParser ⇐ stream.Transform
  - .treeAdapters
  - .parse(html, [options]) ⇒ ASTNode.<Document>
  - .parseFragment([fragmentContext], html, [options]) ⇒ ASTNode.<DocumentFragment>
  - .serialize(node, [options]) ⇒ String
  - .stop()
.ParserStream ⇐ stream.Writable
Kind: instance class of parse5
Extends: stream.Writable
- new ParserStream(options)
- .document : ASTNode.<Document>
- "script" (scriptElement, documentWrite(html), resume)
new ParserStream(options)
Streaming HTML parser with scripting support. Writable stream.
Param | Type | Description |
---|---|---|
options | ParserOptions | Parsing options. |
Example
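var parse5 = require('parse5');
var http = require('http');

// Fetch google.com content and obtain its <body> node
http.get('http://google.com', function(res) {
    var parser = new parse5.ParserStream();

    parser.on('finish', function() {
        // A doctype node, if present, precedes <html>, so look nodes up by name
        var html = parser.document.childNodes.filter(function(node) {
            return node.nodeName === 'html';
        })[0];
        var body = html.childNodes.filter(function(node) {
            return node.nodeName === 'body';
        })[0];
    });

    res.pipe(parser);
});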
.document : ASTNode.<Document>
Resulting document node.
Kind: instance property of ParserStream
"script" (scriptElement, documentWrite(html), resume)
Raised when the parser encounters a <script> element.
If the event has listeners, parsing is suspended when the event is emitted.
So, if the <script> element has a src attribute, you can fetch the script, execute it and then resume the parser, as browsers do.
Kind: event emitted by ParserStream
Param | Type | Description |
---|---|---|
scriptElement | ASTNode | Script element that caused the event. |
documentWrite(html) | function | Writes additional HTML at the current parsing position. Suitable for implementing the DOM document.write and document.writeln methods. |
resume | function | Resumes the parser. |
Example
var parse5 = require('parse5');
var http = require('http');

var parser = new parse5.ParserStream();

parser.on('script', function(scriptElement, documentWrite, resume) {
    // Look the src attribute up by name instead of assuming it is the first one
    var src = parse5.treeAdapters.default.getAttrList(scriptElement).filter(function(attr) {
        return attr.name === 'src';
    })[0].value;

    http.get(src, function(res) {
        // Fetch script content, execute it with a DOM built around `parser.document`
        // and `document.write` implemented using `documentWrite`
        // ...

        // Then resume the parser
        resume();
    });
});

parser.end('<script src="http://example.com/script.js"></script>');
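Browsers resolve a script's src against the URL of the page before fetching it, whereas the example above passes src to http.get unchanged, which only works for absolute URLs. A minimal sketch of that resolution step, assuming Node's global WHATWG URL class (Node 10 or later); resolveScriptUrl and pageUrl are hypothetical names, not part of parse5:

```javascript
// Hypothetical helper (not part of parse5): resolve a <script src> value
// against the URL of the page being parsed, as browsers do before fetching.
function resolveScriptUrl(src, pageUrl) {
    // new URL(relative, base) applies the standard URL resolution rules;
    // an absolute src is returned as-is (in normalized form).
    return new URL(src, pageUrl).href;
}
```

Inside the script handler, the fetch would then become http.get(resolveScriptUrl(src, pageUrl), ...), where pageUrl is the address the HTML was loaded from.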
.SerializerStream ⇐ stream.Readable
Kind: instance class of parse5
Extends: stream.Readable
new SerializerStream(node, [options])
Streaming AST node to HTML serializer. Readable stream.
Param | Type | Description |
---|---|---|
node | ASTNode | Node to serialize. |
[options] | SerializerOptions | Serialization options. |
Example
var parse5 = require('parse5');
var fs = require('fs');
var file = fs.createWriteStream('/home/index.html');
// Serialize the parsed document to HTML and write it to a file
var document = parse5.parse('<body>Who is John Galt?</body>');
var serializer = new parse5.SerializerStream(document);
serializer.pipe(file);
.SAXParser ⇐ stream.Transform
Kind: instance class of parse5
Extends: stream.Transform
new SAXParser(options)
Streaming SAX-style HTML parser. Transform stream (which means you can pipe through it; see the example).
Param | Type | Description |
---|---|---|
options | SAXParserOptions | Parsing options. |
Example
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');

var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('text', function(text) {
    // Handle page text content
    // ...
});

http.get('http://google.com', function(res) {
    // SAXParser is a Transform stream, which means you can pipe
    // through it. So you can analyze page content and, e.g., save it
    // to a file at the same time:
    res.pipe(parser).pipe(file);
});
"startTag" (name, attributes, selfClosing, [location])
Raised when the parser encounters a start tag.
Kind: event emitted by SAXParser
Param | Type | Description |
---|---|---|
name | String | Tag name. |
attributes | Array | List of attributes in { name: String, value: String } form. |
selfClosing | Boolean | Indicates if the tag is self-closing. |
[location] | LocationInfo | Start tag source code location info. Available if location info is enabled in SAXParserOptions. |
"endTag" (name, [location])
Raised when the parser encounters an end tag.
Kind: event emitted by SAXParser
Param | Type | Description |
---|---|---|
name | String | Tag name. |
[location] | LocationInfo | End tag source code location info. Available if location info is enabled in SAXParserOptions. |
"comment" (text, [location])
Raised when the parser encounters a comment.
Kind: event emitted by SAXParser
Param | Type | Description |
---|---|---|
text | String | Comment text. |
[location] | LocationInfo | Comment source code location info. Available if location info is enabled in SAXParserOptions. |
"doctype" (name, publicId, systemId, [location])
Raised when the parser encounters a document type declaration.
Kind: event emitted by SAXParser
Param | Type | Description |
---|---|---|
name | String | Document type name. |
publicId | String | Document type publicId. |
systemId | String | Document type systemId. |
[location] | LocationInfo | Document type declaration source code location info. Available if location info is enabled in SAXParserOptions. |
"text" (text, [location])
Raised when the parser encounters text content.
Kind: event emitted by SAXParser
Param | Type | Description |
---|---|---|
text | String | Text content. |
[location] | LocationInfo | Text content source code location info. Available if location info is enabled in SAXParserOptions. |
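The event signatures above can be combined into a small consumer. The hypothetical collectVisibleText helper below gathers text while skipping the contents of <script> and <style> elements, relying only on the argument order documented for the startTag, endTag and text events. So that it runs without parse5, it accepts any EventEmitter that emits these events; the sketch is an illustration, not part of the library:

```javascript
const { EventEmitter } = require('events');

// Collect text from SAX-style events, skipping <script> and <style> contents.
// `parser` is any emitter of 'startTag', 'endTag' and 'text' events that uses
// the argument lists documented above (e.g. a SAXParser instance).
function collectVisibleText(parser) {
    const chunks = [];
    let skipDepth = 0; // > 0 while inside <script> or <style>

    parser.on('startTag', function(name, attributes, selfClosing) {
        if ((name === 'script' || name === 'style') && !selfClosing) {
            skipDepth++;
        }
    });
    parser.on('endTag', function(name) {
        if ((name === 'script' || name === 'style') && skipDepth > 0) {
            skipDepth--;
        }
    });
    parser.on('text', function(text) {
        if (skipDepth === 0) {
            chunks.push(text);
        }
    });

    // Return a getter so the result can be read once parsing has finished.
    return function() { return chunks.join(''); };
}
```

With parse5, call collectVisibleText(parser) on a SAXParser instance before piping data into it, then call the returned function once the stream has finished.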
.treeAdapters
Provides built-in tree adapters that can be used for parsing and serialization.
Kind: instance property of parse5
Properties
Name | Type | Description |
---|---|---|
default | TreeAdapter | Default tree format for parse5. |
htmlparser2 | TreeAdapter | The popular htmlparser2 tree format (used, e.g., by cheerio and jsdom). |
Example
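var parse5 = require('parse5');

// Use the default tree adapter for parsing
var document = parse5.parse('<div></div>', { treeAdapter: parse5.treeAdapters.default });

// Use the htmlparser2 tree adapter with SerializerStream; the tree must be
// built with the same adapter that is used to serialize it
var dom = parse5.parse('<div></div>', { treeAdapter: parse5.treeAdapters.htmlparser2 });
var serializer = new parse5.SerializerStream(dom, { treeAdapter: parse5.treeAdapters.htmlparser2 });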
.parse(html, [options]) ⇒ ASTNode.<Document>
Parses an HTML string.
Kind: instance method of parse5
Returns: ASTNode.<Document> - document
Param | Type | Description |
---|---|---|
html | string | Input HTML string. |
[options] | ParserOptions | Parsing options. |
Example
var parse5 = require('parse5');
var document = parse5.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
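Indexing childNodes by position is fragile, because the document's children can include a doctype node ahead of <html>. A sketch of a name-based lookup, assuming the default tree format's nodeName and childNodes fields; findElement is a hypothetical helper, and the document object below is a hand-built stand-in for parse5.parse output (reduced to the fields the helper reads) so that the snippet runs without parse5:

```javascript
// Hypothetical helper: depth-first search for the first element with the
// given tag name, assuming every node has a nodeName and parent nodes keep
// their children in a childNodes array (as in the default tree format).
function findElement(node, tagName) {
    if (node.nodeName === tagName) {
        return node;
    }
    const children = node.childNodes || [];
    for (const child of children) {
        const found = findElement(child, tagName);
        if (found) {
            return found;
        }
    }
    return null;
}

// Hand-built stand-in for the document parsed above (not real parse5 output).
const document = {
    nodeName: '#document',
    childNodes: [
        { nodeName: '#documentType' },
        { nodeName: 'html', childNodes: [
            { nodeName: 'head', childNodes: [] },
            { nodeName: 'body', childNodes: [{ nodeName: '#text', value: 'Hi there!' }] }
        ] }
    ]
};

const body = findElement(document, 'body');
```

Applied to a parsed tree, findElement(document, 'body') finds the <body> node without relying on its position among its siblings.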
.parseFragment([fragmentContext], html, [options]) ⇒ ASTNode.<DocumentFragment>
Parses an HTML fragment.
Kind: instance method of parse5
Returns: ASTNode.<DocumentFragment> - documentFragment
Param | Type | Description |
---|---|---|
[fragmentContext] | ASTNode | Parsing context element. If specified, the given fragment will be parsed as if it were assigned to the context element's innerHTML property. |
html | string | Input HTML fragment string. |
[options] | ParserOptions | Parsing options. |
Example
var parse5 = require('parse5');
var documentFragment = parse5.parseFragment('<table></table>');
// Parse an HTML fragment in the context of the parsed <table> element
var trFragment = parse5.parseFragment(documentFragment.childNodes[0], '<tr><td>Shake it, baby</td></tr>');
.serialize(node, [options]) ⇒ String
Serializes an AST node to an HTML string.
Kind: instance method of parse5
Returns: String - html
Param | Type | Description |
---|---|---|
node | ASTNode | Node to serialize. |
[options] | SerializerOptions | Serialization options. |
Example
var parse5 = require('parse5');
var document = parse5.parse('<!DOCTYPE html><html><head></head><body>Hi there!</body></html>');
// Serialize the document
var html = parse5.serialize(document);
// Serialize the <body> element's content; childNodes[0] is the doctype node,
// childNodes[1] is <html>
var bodyInnerHtml = parse5.serialize(document.childNodes[1].childNodes[1]);
.stop()
Stops parsing. Useful if you want the parser to stop consuming CPU time once you've obtained the desired info from the input stream. Doesn't prevent piping, so data will still flow through the parser as usual.
Kind: instance method of parse5
Example
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');

var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('doctype', function(name, publicId, systemId) {
    // Process doctype info and stop parsing
    // ...
    parser.stop();
});

http.get('http://google.com', function(res) {
    // Even though parser.stop() was called, the whole
    // content of the page will be written to the file
    res.pipe(parser).pipe(file);
});
ElementLocationInfo : Object
Kind: global typedef
Extends: LocationInfo
Properties
Name | Type | Description |
---|---|---|
startTag | LocationInfo | Element's start tag LocationInfo. |
endTag | LocationInfo | Element's end tag LocationInfo. |
LocationInfo : Object
Kind: global typedef
Properties
Name | Type | Description |
---|---|---|
line | Number | One-based line index. |
col | Number | One-based column index. |
startOffset | Number | Zero-based first character index. |
endOffset | Number | Zero-based last character index. |
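The conventions in this table (one-based line and col, zero-based offsets) relate as follows. A minimal sketch with a hypothetical offsetToLineCol helper, assuming lines are separated by \n only; parse5's own treatment of \r\n and other input preprocessing may differ:

```javascript
// Hypothetical helper: convert a zero-based character offset into the
// one-based line/column pair used by LocationInfo. Splits lines on '\n' only.
function offsetToLineCol(source, offset) {
    let line = 1;
    let lineStart = 0; // offset of the first character of the current line
    for (let i = 0; i < offset; i++) {
        if (source[i] === '\n') {
            line++;
            lineStart = i + 1;
        }
    }
    return { line: line, col: offset - lineStart + 1 };
}
```

For the source '<html>\n  <body>', the '<' of <body> sits at zero-based offset 9, which maps to line 2, col 3.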
ParserOptions : Object
Kind: global typedef
Properties
Name | Type | Default | Description |
---|---|---|---|
decodeHtmlEntities | Boolean | true | Decode HTML entities like &amp;, &nbsp;, etc. Warning: disabling this option may result in output that does not conform to the HTML5 specification. |
locationInfo | Boolean | false | Enables source code location information for the nodes. When enabled, each node (except the root node) has a __location property. If the node is not an empty element, __location will be an ElementLocationInfo object; otherwise it's a LocationInfo object. If an element was implicitly created by the parser, its __location property will be null. |
treeAdapter | TreeAdapter | parse5.treeAdapters.default | Specifies the resulting tree format. |
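To make concrete what decodeHtmlEntities toggles, here is a toy decoder. It is a hypothetical decodeSomeEntities helper, not parse5's decoder, and it is deliberately incomplete: only a handful of named references plus numeric references are recognized. Roughly, with the option enabled the parser performs this kind of translation on the input, so the tree holds & where the source had &amp;:

```javascript
// Toy decoder for illustration only (not parse5's implementation): it knows
// a few named references plus decimal and hexadecimal numeric references.
// The HTML spec defines far more named references and many edge cases
// (e.g. references without a trailing semicolon) that this sketch ignores.
const NAMED = { amp: '&', lt: '<', gt: '>', quot: '"', nbsp: '\u00a0' };

function decodeSomeEntities(text) {
    // One pass over the input, so "&amp;lt;" becomes "&lt;", not "<".
    return text.replace(/&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, function(match, ref) {
        if (ref[0] === '#') {
            const isHex = ref[1] === 'x' || ref[1] === 'X';
            const code = isHex ? parseInt(ref.slice(2), 16) : parseInt(ref.slice(1), 10);
            // Leave out-of-range code points untouched instead of throwing.
            return code <= 0x10FFFF ? String.fromCodePoint(code) : match;
        }
        // Unknown names are left as written.
        return Object.prototype.hasOwnProperty.call(NAMED, ref) ? NAMED[ref] : match;
    });
}
```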
SerializerOptions : Object
Kind: global typedef
Properties
Name | Type | Default | Description |
---|---|---|---|
encodeHtmlEntities | Boolean | true | HTML-encode characters like <, >, &, etc. Warning: disabling this option may result in output that does not conform to the HTML5 specification. |
treeAdapter | TreeAdapter | parse5.treeAdapters.default | Specifies the input tree format. |
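As an illustration of the kind of escaping encodeHtmlEntities enables for text content, a minimal sketch follows. escapeText is a hypothetical helper, not parse5's serializer, and the exact set of characters parse5 escapes may differ (attribute values, for instance, also need quotes escaped):

```javascript
// Hypothetical helper: escape text content before writing it out as HTML.
// '&' is replaced first so that the '&' characters introduced by the other
// replacements are not escaped a second time.
function escapeText(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;');
}
```

The replacement order is the design choice that matters here: escaping < before & would turn it into &amp;lt; in the output.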
SAXParserOptions : Object
Kind: global typedef
Properties
Name | Type | Default | Description |
---|---|---|---|
decodeHtmlEntities | Boolean | true | Decode HTML entities like &amp;, &nbsp;, etc. Warning: disabling this option may result in output that does not conform to the HTML5 specification. |
locationInfo | Boolean | false | Enables source code location information for the tokens. When enabled, each token event handler receives a LocationInfo object as its last argument. |