Docs: SAXParser
inikulin committed Oct 2, 2015
1 parent d2bbe2f commit e8e9757
Showing 4 changed files with 281 additions and 50 deletions.
197 changes: 158 additions & 39 deletions docs/04_api_reference.md
@@ -14,6 +14,8 @@
<dd></dd>
<dt><a href="#SerializerOptions">SerializerOptions</a> : <code>Object</code></dt>
<dd></dd>
<dt><a href="#SAXParserOptions">SAXParserOptions</a> : <code>Object</code></dt>
<dd></dd>
</dl>
<a name="parse5"></a>
## parse5 : <code>object</code>
@@ -26,10 +28,18 @@
* ["script" (scriptElement, documentWrite(html), resume)](#parse5+ParserStream+event_script)
* [.SerializerStream](#parse5+SerializerStream) ⇐ <code>stream.Readable</code>
* [new SerializerStream(node, [options])](#new_parse5+SerializerStream_new)
* [.SAXParser](#parse5+SAXParser) ⇐ <code>stream.Transform</code>
* [new SAXParser(options)](#new_parse5+SAXParser_new)
* ["startTag" (name, attributes, selfClosing, [location])](#parse5+SAXParser+event_startTag)
* ["endTag" (name, [location])](#parse5+SAXParser+event_endTag)
* ["comment" (text, [location])](#parse5+SAXParser+event_comment)
* ["doctype" (name, publicId, systemId, [location])](#parse5+SAXParser+event_doctype)
* ["text" (text, [location])](#parse5+SAXParser+event_text)
* [.treeAdapters](#parse5+treeAdapters)
* [.parse(html, [options])](#parse5+parse) ⇒ <code>ASTNode.&lt;Document&gt;</code>
* [.parseFragment([fragmentContext], html, [options])](#parse5+parseFragment) ⇒ <code>ASTNode.&lt;DocumentFragment&gt;</code>
* [.serialize(node, [options])](#parse5+serialize) ⇒ <code>String</code>
* [.stop()](#parse5+stop)

<a name="parse5+ParserStream"></a>
### parse5.ParserStream ⇐ <code>stream.Writable</code>
@@ -43,8 +53,7 @@

<a name="new_parse5+ParserStream_new"></a>
#### new ParserStream(options)
Streaming HTML parser with scripting support.
[Writable stream](https://nodejs.org/api/stream.html#stream_class_stream_writable).


| Param | Type | Description |
@@ -53,19 +62,7 @@ Streaming HTML parser with the scripting support.

**Example**
```js
var parse5 = require('parse5');
var http = require('http');

// Fetch google.com content and obtain its <body> node
http.get('http://google.com', function(res) {
    var parser = new parse5.ParserStream();

    parser.on('finish', function() {
        var body = parser.document.childNodes[0].childNodes[1];
    });

    res.pipe(parser);
});
```
<a name="parse5+ParserStream+document"></a>
#### parserStream.document : <code>ASTNode.&lt;document&gt;</code>
@@ -74,10 +71,7 @@ Resulting document node.
**Kind**: instance property of <code>[ParserStream](#parse5+ParserStream)</code>
<a name="parse5+ParserStream+event_script"></a>
#### "script" (scriptElement, documentWrite(html), resume)
Raised when the parser encounters a `<script>` element.
If the event has listeners, parsing is suspended when the event is emitted.
So, if `<script>` has a `src` attribute, you can fetch the script, execute it, and then
resume the parser as browsers do.

**Kind**: event emitted by <code>[ParserStream](#parse5+ParserStream)</code>

@@ -89,24 +83,7 @@ resume parser like browsers do.

**Example**
```js
var parse5 = require('parse5');
var http = require('http');

var parser = new parse5.ParserStream();

parser.on('script', function(scriptElement, documentWrite, resume) {
    var src = parse5.treeAdapters.default.getAttrList(scriptElement)[0].value;

    http.get(src, function(res) {
        // Fetch script content, execute it with DOM built around `parser.document` and
        // `document.write` implemented using `documentWrite`
        // ...
        // Then resume the parser
        resume();
    });
});

parser.end('<script src="example.com/script.js"></script>');
```
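
The suspend-and-resume behaviour described for the `script` event can be pictured without parse5 at all. The following is a minimal sketch built on Node's `EventEmitter`, not the library's implementation: `TokenRunner`, its `tokens` list and the `processed` array are hypothetical names used only for illustration.

```js
var EventEmitter = require('events').EventEmitter;
var util = require('util');

// Hypothetical token runner, not parse5 code: it walks a list of tokens and
// pauses on every 'script' token until the listener calls `resume`.
function TokenRunner(tokens) {
    EventEmitter.call(this);
    this.tokens = tokens;
    this.pos = 0;
    this.processed = [];
}

util.inherits(TokenRunner, EventEmitter);

TokenRunner.prototype.run = function () {
    var runner = this;

    while (this.pos < this.tokens.length) {
        var token = this.tokens[this.pos++];

        this.processed.push(token);

        // Suspend only if somebody listens; otherwise keep going
        if (token === 'script' && this.listenerCount('script') > 0) {
            this.emit('script', token, function resume() {
                runner.run();
            });
            return;
        }
    }
};

var runner = new TokenRunner(['html', 'script', 'body']);
var pendingResume = null;

runner.on('script', function (token, resume) {
    // A real handler would fetch and execute the script before resuming
    pendingResume = resume;
});

runner.run();    // stops after 'script'; 'body' is still waiting
pendingResume(); // processing continues with 'body'
```

Without a `script` listener the same runner processes every token in one pass, which mirrors the rule that parsing is suspended only when the event has listeners.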
<a name="parse5+SerializerStream"></a>
### parse5.SerializerStream ⇐ <code>stream.Readable</code>
@@ -136,6 +113,110 @@ var serializer = new parse5.SerializerStream(document);

serializer.pipe(file);
```
<a name="parse5+SAXParser"></a>
### parse5.SAXParser ⇐ <code>stream.Transform</code>
**Kind**: instance class of <code>[parse5](#parse5)</code>
**Extends:** <code>stream.Transform</code>
* [.SAXParser](#parse5+SAXParser) ⇐ <code>stream.Transform</code>
* [new SAXParser(options)](#new_parse5+SAXParser_new)
* ["startTag" (name, attributes, selfClosing, [location])](#parse5+SAXParser+event_startTag)
* ["endTag" (name, [location])](#parse5+SAXParser+event_endTag)
* ["comment" (text, [location])](#parse5+SAXParser+event_comment)
* ["doctype" (name, publicId, systemId, [location])](#parse5+SAXParser+event_doctype)
* ["text" (text, [location])](#parse5+SAXParser+event_text)
<a name="new_parse5+SAXParser_new"></a>
#### new SAXParser(options)
Streaming [SAX](https://en.wikipedia.org/wiki/Simple_API_for_XML)-style HTML parser.
[Transform stream](https://nodejs.org/api/stream.html#stream_class_stream_transform)
(which means you can pipe *through* it, see example).
| Param | Type | Description |
| --- | --- | --- |
| options | <code>[SAXParserOptions](#SAXParserOptions)</code> | Parsing options. |
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');

var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('text', function(text) {
    // Handle page text content
    // ...
});

http.get('http://google.com', function(res) {
    // SAXParser is a Transform stream, which means you can pipe
    // through it. So you can analyze page content and, e.g., save it
    // to a file at the same time:
    res.pipe(parser).pipe(file);
});
```
<a name="parse5+SAXParser+event_startTag"></a>
#### "startTag" (name, attributes, selfClosing, [location])
Raised when the parser encounters a start tag.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| name | <code>String</code> | Tag name. |
| attributes | <code>Array</code> | List of attributes in `{ key: String, value: String }` form. |
| selfClosing | <code>Boolean</code> | Indicates if tag is self-closing. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Start tag source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
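
Handlers often need to look attributes up by name rather than scan the list. A minimal sketch, assuming the `{ key, value }` entry shape given in the table above; `attrsToMap` is a hypothetical helper and not part of parse5:

```js
// Hypothetical helper, not part of parse5: converts the attribute list
// passed to a "startTag" handler into a plain lookup object.
function attrsToMap(attributes) {
    // No prototype, so attribute names such as "constructor" are safe keys
    var map = Object.create(null);

    attributes.forEach(function (attr) {
        // A later entry with the same key overwrites an earlier one
        map[attr.key] = attr.value;
    });

    return map;
}

// The list below stands in for the `attributes` argument of a handler
var attrs = attrsToMap([
    { key: 'href', value: '/docs' },
    { key: 'class', value: 'nav' }
]);
```

Inside a `startTag` handler the same call would be `attrsToMap(attributes)`, after which `attrs.href` can be read directly.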
<a name="parse5+SAXParser+event_endTag"></a>
#### "endTag" (name, [location])
Raised when the parser encounters an end tag.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| name | <code>String</code> | Tag name. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | End tag source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_comment"></a>
#### "comment" (text, [location])
Raised when the parser encounters a comment.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| text | <code>String</code> | Comment text. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Comment source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_doctype"></a>
#### "doctype" (name, publicId, systemId, [location])
Raised when the parser encounters a [document type declaration](https://en.wikipedia.org/wiki/Document_type_declaration).
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| name | <code>String</code> | Document type name. |
| publicId | <code>String</code> | Document type publicId. |
| systemId | <code>String</code> | Document type systemId. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Document type declaration source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+SAXParser+event_text"></a>
#### "text" (text, [location])
Raised when the parser encounters text content.
**Kind**: event emitted by <code>[SAXParser](#parse5+SAXParser)</code>
| Param | Type | Description |
| --- | --- | --- |
| text | <code>String</code> | Text content. |
| [location] | <code>[LocationInfo](#LocationInfo)</code> | Text content source code location info. Available if location info is enabled in [SAXParserOptions](#SAXParserOptions). |
<a name="parse5+treeAdapters"></a>
### parse5.treeAdapters
Provides built-in tree adapters which can be used for parsing and serialization.
@@ -222,6 +303,34 @@ var html = parse5.serialize(document);
//Serialize <body> element content
var bodyInnerHtml = parse5.serialize(document.childNodes[0].childNodes[1]);
```
<a name="parse5+stop"></a>
### parse5.stop()
Stops parsing. Useful if you want the parser to stop consuming
CPU time once you've obtained the desired info from the input stream.
Doesn't prevent piping, so data will flow through the parser as usual.
**Kind**: instance method of <code>[parse5](#parse5)</code>
**Example**
```js
var parse5 = require('parse5');
var http = require('http');
var fs = require('fs');

var file = fs.createWriteStream('/home/google.com.html');
var parser = new parse5.SAXParser();

parser.on('doctype', function(name, publicId, systemId) {
    // Process doctype info and stop parsing
    // ...
    parser.stop();
});

http.get('http://google.com', function(res) {
    // Even though parser.stop() was called, the whole
    // content of the page will be written to the file
    res.pipe(parser).pipe(file);
});
```
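
The "stops parsing but keeps piping" behaviour can be pictured with a plain Node.js Transform stream. This is a minimal sketch under stated assumptions, not parse5's implementation: `CountingPassThrough`, `analyzedChunks` and `forwardedBytes` are hypothetical names, and counting chunks stands in for the real parsing work.

```js
var Transform = require('stream').Transform;
var util = require('util');

// Hypothetical stand-in for a SAX-style parser: it "analyzes" chunks by
// counting them and always forwards the data unchanged.
function CountingPassThrough() {
    Transform.call(this);
    this.stopped = false;
    this.analyzedChunks = 0;
    this.forwardedBytes = 0;
}

util.inherits(CountingPassThrough, Transform);

CountingPassThrough.prototype._transform = function (chunk, encoding, callback) {
    // The expensive work is skipped once stop() has been called...
    if (!this.stopped)
        this.analyzedChunks++;

    // ...but every chunk is still passed downstream
    this.forwardedBytes += chunk.length;
    callback(null, chunk);
};

CountingPassThrough.prototype.stop = function () {
    this.stopped = true;
};

var through = new CountingPassThrough();

through.write('<!DOCTYPE html>');
through.stop();
through.end('<p>rest of the page</p>');
```

Only the first chunk is counted as analyzed, while the byte counter covers both chunks. That is the same trade-off the method describes: CPU time is saved on analysis, and the destination stream still receives the complete input.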
<a name="ElementLocationInfo"></a>
## ElementLocationInfo : <code>Object</code>
**Kind**: global typedef
@@ -252,7 +361,7 @@ var bodyInnerHtml = parse5.serialize(document.childNodes[0].childNodes[1]);
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| decodeHtmlEntities | <code>Boolean</code> | <code>true</code> | Decode HTML-entities like `&amp;`, `&nbsp;`, etc. **Warning:** disabling this option may result in output that does not conform to the HTML5 specification. |
| locationInfo | <code>Boolean</code> | <code>false</code> | Enables source code location information for the nodes. When enabled, each node (except the root node) has a `__location` property. If the node is not an empty element, `__location` will be an [ElementLocationInfo](#ElementLocationInfo) object, otherwise it's [LocationInfo](#LocationInfo). If the element was implicitly created by the parser, its `__location` property will be `null`. |
| treeAdapter | <code>TreeAdapter</code> | <code>parse5.treeAdapters.default</code> | Specifies resulting tree format. |
@@ -263,6 +372,16 @@ var bodyInnerHtml = parse5.serialize(document.childNodes[0].childNodes[1]);
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| encodeHtmlEntities | <code>Boolean</code> | <code>true</code> | HTML-encode characters like `<`, `>`, `&`, etc. **Warning:** disabling this option may result in output that does not conform to the HTML5 specification. |
| treeAdapter | <code>TreeAdapter</code> | <code>parse5.treeAdapters.default</code> | Specifies input tree format. |
<a name="SAXParserOptions"></a>
## SAXParserOptions : <code>Object</code>
**Kind**: global typedef
**Properties**
| Name | Type | Default | Description |
| --- | --- | --- | --- |
| decodeHtmlEntities | <code>Boolean</code> | <code>true</code> | Decode HTML-entities like `&amp;`, `&nbsp;`, etc. **Warning:** disabling this option may result in output that does not conform to the HTML5 specification. |
| locationInfo | <code>Boolean</code> | <code>false</code> | Enables source code location information for the tokens. When enabled, each token event handler will receive a [LocationInfo](#LocationInfo) object as its last argument. |
11 changes: 4 additions & 7 deletions lib/index.js
@@ -7,10 +7,9 @@ var Parser = require('./parser'),

/**
* Parses HTML string.
* @function
* @function parse
* @memberof parse5
* @instance
* @name parse
* @param {string} html - Input HTML string.
* @param {ParserOptions} [options] - Parsing options.
* @returns {ASTNode<Document>} document
@@ -27,10 +26,9 @@ exports.parse = function parse(html, options) {

/**
* Parses HTML fragment.
* @function
* @function parseFragment
* @memberof parse5
* @instance
* @name parseFragment
* @param {ASTNode} [fragmentContext] - Parsing context element. If specified, given fragment
* will be parsed as if it was set to the context element's `innerHTML` property.
* @param {string} html - Input HTML fragment string.
@@ -57,10 +55,9 @@ exports.parseFragment = function parseFragment(fragmentContext, html, options) {

/**
* Serializes AST node to HTML string.
* @function
* @function serialize
* @memberof parse5
* @instance
* @name serialize
* @param {ASTNode} node - Node to serialize.
* @param {SerializerOptions} [options] - Serialization options.
* @returns {String} html
@@ -83,7 +80,7 @@ exports.serialize = function (node, options) {

/**
* Provides built-in tree adapters which can be used for parsing and serialization.
* @name treeAdapters
* @var treeAdapters
* @memberof parse5
* @instance
* @property {TreeAdapter} default - Default tree format for parse5.
1 change: 0 additions & 1 deletion lib/parser/index.js
@@ -16,7 +16,6 @@ var $ = HTML.TAG_NAMES,
NS = HTML.NAMESPACES,
ATTRS = HTML.ATTRS;

//Default options
/**
* @typedef {Object} ParserOptions
*