
remarkablemark/html-dom-parser


html-dom-parser


HTML to DOM parser that works on both the server (Node.js) and the client (browser):

HTMLDOMParser(string[, options])

The parser converts an HTML string to a JavaScript object that describes the DOM tree.

Example

const parse = require('html-dom-parser');
parse('<p>Hello, World!</p>');

Output:

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [
      Text {
        type: 'text',
        parent: [Circular],
        prev: null,
        next: null,
        startIndex: null,
        endIndex: null,
        data: 'Hello, World!'
      }
    ],
    name: 'p',
    attribs: {}
  }
]
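The output is an array of plain node objects, so it can be walked with ordinary recursion. The sketch below collects the text content of a tree. `getText` is a hypothetical helper, not part of html-dom-parser, and `nodes` is a hand-built stand-in for the parser output above (reduced to the fields the helper reads) so the snippet runs without the package installed.

```javascript
// Hand-built stand-in for the result of parse('<p>Hello, World!</p>'),
// reduced to the fields this sketch reads (type, name, data, children).
const nodes = [
  {
    type: 'tag',
    name: 'p',
    attribs: {},
    children: [{ type: 'text', data: 'Hello, World!' }],
  },
];

// Hypothetical helper (not part of html-dom-parser):
// concatenates the data of every text node, depth-first.
function getText(nodeList) {
  return nodeList
    .map((node) => {
      if (node.type === 'text') return node.data;
      return node.children ? getText(node.children) : '';
    })
    .join('');
}

console.log(getText(nodes)); // Hello, World!
```

In real use, `nodes` would be the return value of `parse(...)`.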


Install

NPM:

npm install html-dom-parser --save

Yarn:

yarn add html-dom-parser

CDN:

<script src="https://unpkg.com/html-dom-parser@latest/dist/html-dom-parser.min.js"></script>
<script>
  window.HTMLDOMParser(/* string */);
</script>

Usage

Import or require the module:

// ES Modules
import parse from 'html-dom-parser';

// CommonJS
const parse = require('html-dom-parser');

Parse empty string:

parse('');

Output:

[]

Parse string:

parse('Hello, World!');

Output:

[
  Text {
    type: 'text',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    data: 'Hello, World!'
  }
]

Parse element with attributes:

parse('<p class="foo" style="color: #bada55">Hello, <em>world</em>!</p>');

Output:

[
  Element {
    type: 'tag',
    parent: null,
    prev: null,
    next: null,
    startIndex: null,
    endIndex: null,
    children: [ [Text], [Element], [Text] ],
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' }
  }
]
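The abbreviated `children` array above holds a text node, an `em` element, and another text node. As a rough sketch of how such a tree might be queried, the snippet below looks up elements by tag name and reads `attribs`. `findByName` is a hypothetical helper, not part of html-dom-parser, and `nodes` is a hand-written approximation of the output above so the snippet runs on its own.

```javascript
// Hand-written approximation of the parsed tree shown above,
// with the abbreviated children expanded.
const nodes = [
  {
    type: 'tag',
    name: 'p',
    attribs: { class: 'foo', style: 'color: #bada55' },
    children: [
      { type: 'text', data: 'Hello, ' },
      {
        type: 'tag',
        name: 'em',
        attribs: {},
        children: [{ type: 'text', data: 'world' }],
      },
      { type: 'text', data: '!' },
    ],
  },
];

// Hypothetical helper (not part of html-dom-parser):
// collects every node whose name matches, depth-first.
function findByName(nodeList, name, found = []) {
  for (const node of nodeList) {
    if (node.name === name) found.push(node);
    if (node.children) findByName(node.children, name, found);
  }
  return found;
}

const [paragraph] = findByName(nodes, 'p');
console.log(paragraph.attribs.class); // foo
console.log(findByName(nodes, 'em').length); // 1
```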

The server parser wraps htmlparser2's parseDOM but excludes the root parent node, so top-level nodes have parent set to null.

The client parser mimics the server parser by using the DOM API to parse the HTML string.
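One practical consequence of excluding the root parent node is that following `parent` links upward ends at a top-level node. The sketch below illustrates this with a hand-built tree in the shape shown in the earlier outputs. `getDepth` is a hypothetical helper, not part of html-dom-parser, and the tree is an assumption standing in for real parser output.

```javascript
// Hand-built tree mirroring the documented shape: the top-level element
// has parent: null, and its child points back to it.
const p = { type: 'tag', name: 'p', attribs: {}, parent: null, children: [] };
const text = { type: 'text', data: 'Hello, World!', parent: p };
p.children.push(text);

// Hypothetical helper (not part of html-dom-parser):
// counts how many ancestors a node has by following parent links.
function getDepth(node) {
  let depth = 0;
  for (let current = node.parent; current !== null; current = current.parent) {
    depth += 1;
  }
  return depth;
}

console.log(getDepth(p)); // 0
console.log(getDepth(text)); // 1
```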

Testing

Run server and client tests:

npm test

Generate HTML coverage report for server tests:

npx nyc report --reporter=html

Lint files:

npm run lint
npm run lint:fix

Test TypeScript declaration file for style and correctness:

npm run lint:dts

Migration

v3.0.0

domhandler has been upgraded to v5, so some parser options, such as normalizeWhitespace, have been removed.
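If collapsed whitespace is still needed after upgrading, one possible approach is to post-process the parsed tree. The sketch below is an assumption about what a replacement could look like, not the exact behavior of the removed option. `collapseWhitespace` is a hypothetical helper, and `nodes` is a hand-built stand-in for parser output.

```javascript
// Hypothetical helper (not part of html-dom-parser or domhandler):
// replaces each run of whitespace in text nodes with a single space.
// It mutates the tree in place and does not special-case elements
// such as <pre>, where whitespace is significant.
function collapseWhitespace(nodeList) {
  for (const node of nodeList) {
    if (node.type === 'text') {
      node.data = node.data.replace(/\s+/g, ' ');
    } else if (node.children) {
      collapseWhitespace(node.children);
    }
  }
  return nodeList;
}

// Hand-built stand-in for a parsed tree containing a line break
// and extra spaces inside a text node.
const nodes = [
  {
    type: 'tag',
    name: 'p',
    attribs: {},
    children: [{ type: 'text', data: 'Hello,\n   World!' }],
  },
];

collapseWhitespace(nodes);
console.log(nodes[0].children[0].data); // Hello, World!
```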

Release

Release and publish are automated by Release Please.

Special Thanks

License

MIT