parse-xml

A fast, safe, compliant XML parser for Node.js and browsers.

Installation

npm install @rgrove/parse-xml

Or, if you like living dangerously, you can load the minified UMD bundle in a browser via Unpkg and use the parseXml global.

Features

Returns an object tree representing an XML document.
Works great in Node.js 8+ and in modern browsers. Also works in older browsers if you provide polyfills for Object.assign(), Object.freeze(), and String.fromCodePoint().
Provides helpful, detailed error messages with context when a document is not well-formed.
Mostly conforms to XML 1.0 (Fifth Edition) as a non-validating parser (see below for details).
Passes all relevant tests in the XML Conformance Test Suite.
It's fast, tiny, and has no dependencies.

Not Features

This parser is not a complete implementation of the XML specification because parts of the spec aren't very useful or aren't safe when the XML being parsed comes from an untrusted source. However, those parts of XML that are implemented behave as defined in the spec.

The following XML features are ignored by the parser and are not exposed in the document tree:

XML declarations
Document type definitions
Processing instructions

In addition, the only supported character encoding is UTF-8.

Examples

Basic Usage

const parseXml = require('@rgrove/parse-xml');
parseXml('<kittens fuzzy="yes">I like fuzzy kittens.</kittens>');

Output

{
  type: "document",
  children: [
    {
      type: "element",
      name: "kittens",
      attributes: {
        fuzzy: "yes"
      },
      children: [
        {
          type: "text",
          text: "I like fuzzy kittens."
        }
      ]
    }
  ]
}
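The returned tree is made of plain objects, so ordinary recursion is enough to query it. The sketch below collects every element with a given name. `findElements` is a hypothetical helper, not part of parse-xml's API, and the sample tree is the output above typed in by hand.

```javascript
// Hypothetical helper (not part of parse-xml): depth-first search that
// collects every element node whose name matches `name`.
function findElements(node, name, results = []) {
  if (node.type === 'element' && node.name === name) {
    results.push(node);
  }

  // Only document and element nodes have a `children` array.
  for (const child of node.children || []) {
    findElements(child, name, results);
  }

  return results;
}

// Hand-written copy of the Basic Usage output shown above.
const doc = {
  type: 'document',
  children: [
    {
      type: 'element',
      name: 'kittens',
      attributes: { fuzzy: 'yes' },
      children: [{ type: 'text', text: 'I like fuzzy kittens.' }],
    },
  ],
};

const kittens = findElements(doc, 'kittens');
console.log(kittens.length);              // 1
console.log(kittens[0].attributes.fuzzy); // yes
```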

Friendly Errors

When something goes wrong, parse-xml throws an error that tells you exactly what happened and shows you where the problem is so you can fix it.

parseXml('<foo><bar>baz</foo>');

Output

Error: Missing end tag for element bar (line 1, column 14)
  <foo><bar>baz</foo>
               ^

In addition to a helpful message, error objects have the following properties:

column Number

Column where the error occurred (1-based).
excerpt String

Excerpt from the input string that contains the problem.
line Number

Line where the error occurred (1-based).
pos Number

Character position where the error occurred relative to the beginning of the input (0-based).
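The `pos`, `line`, and `column` properties describe the same location in two ways. The following sketch shows how a 0-based character offset maps to a 1-based line and column. `positionToLineColumn` is a hypothetical helper, not parse-xml's internal code, and it assumes lines are separated by `\n` only.

```javascript
// Hypothetical helper: convert a 0-based offset into the input string
// into a 1-based { line, column } pair. Assumes "\n" line endings.
function positionToLineColumn(input, pos) {
  let line = 1;
  let lineStart = 0; // offset of the first character on the current line

  for (let i = 0; i < pos && i < input.length; i++) {
    if (input[i] === '\n') {
      line += 1;
      lineStart = i + 1;
    }
  }

  return { line, column: pos - lineStart + 1 };
}

// Column 14 on line 1 of the "Missing end tag" example corresponds to
// offset 13, the "<" of "</foo>".
console.log(positionToLineColumn('<foo><bar>baz</foo>', 13)); // { line: 1, column: 14 }
```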

API

`parseXml(xml: string, options?: object) => object`

Parses an XML document and returns an object tree.

Options

The following options may be provided as properties of the options argument:

ignoreUndefinedEntities Boolean (default: false)

When true, an undefined named entity like &bogus; will be left as is instead of causing a parse error.
preserveCdata Boolean (default: false)

When true, CDATA sections will be preserved in the document tree as nodes of type cdata. Otherwise CDATA sections will be represented as nodes of type text.
preserveComments Boolean (default: false)

When true, comments will be preserved in the document tree as nodes of type comment. Otherwise comments will not be included in the document tree.
resolveUndefinedEntity Function

When an undefined named entity is encountered, this function will be called with the entity as its only argument. It should return a string value with which to replace the entity, or null or undefined to treat the entity as undefined (which may result in a parse error depending on the value of ignoreUndefinedEntities).
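As a sketch of how these options fit together, the object below enables CDATA and comment preservation and supplies a `resolveUndefinedEntity` function that maps a few HTML entity names to characters. The entity table is illustrative only. This README doesn't say whether the argument includes the surrounding `&` and `;`, so the resolver accepts either form.

```javascript
// Illustrative replacements for a few HTML entities that XML does not predefine.
const htmlEntities = {
  nbsp: '\u00A0',
  copy: '\u00A9',
  hellip: '\u2026',
};

const options = {
  preserveCdata: true,
  preserveComments: true,

  resolveUndefinedEntity(entity) {
    // Accept both "&nbsp;" and "nbsp", since the exact argument format
    // is an assumption here.
    const name = entity.replace(/^&/, '').replace(/;$/, '');

    // Returning undefined treats the entity as undefined, so the
    // ignoreUndefinedEntities setting decides whether it's an error.
    return Object.prototype.hasOwnProperty.call(htmlEntities, name)
      ? htmlEntities[name]
      : undefined;
  },
};

// Usage (requires the package): parseXml(xml, options);
```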

Nodes

An XML document is parsed into a tree of node objects. Each node has the following common properties:

parent Object?

Reference to this node's parent node, or null if this node is the document node (which has no parent).
type String

Node type.

Each node also has a toJSON() method that returns a serializable representation of the node without the parent property (in order to avoid circular references). This means you can safely pass any node to JSON.stringify() to serialize it and its children as JSON.
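`toJSON()` matters because `parent` makes the tree circular, and `JSON.stringify()` throws a `TypeError` on circular structures. The sketch below reproduces the idea with a hypothetical `SketchNode` class. It illustrates the technique and is not parse-xml's actual node implementation.

```javascript
// Hypothetical node class illustrating the toJSON() idea.
class SketchNode {
  constructor(type, props = {}) {
    this.type = type;
    this.parent = null;
    Object.assign(this, props);
  }

  // JSON.stringify() calls toJSON() automatically. Copy every own
  // property except `parent`, which would otherwise create a cycle.
  // Children are serialized recursively through their own toJSON().
  toJSON() {
    const json = Object.assign({}, this);
    delete json.parent;
    return json;
  }
}

const root = new SketchNode('document', { children: [] });
const child = new SketchNode('text', { text: 'hi' });
child.parent = root;
root.children.push(child);

console.log(JSON.stringify(root));
// {"type":"document","children":[{"type":"text","text":"hi"}]}
```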

`cdata`

A CDATA section. Only emitted when the preserveCdata option is true (by default, CDATA sections become text nodes).

Properties

text String

Unescaped text content of the CDATA section.

Example

<![CDATA[kittens are fuzzy & cute]]>

{
  type: "cdata",
  text: "kittens are fuzzy & cute",
  parent: { ... }
}

`comment`

A comment. Only emitted when the preserveComments option is true.

Properties

content String

Comment text.

Example

<!-- I'm a comment! -->

{
  type: "comment",
  content: "I'm a comment!",
  parent: { ... }
}

`document`

The top-level node of an XML document.

Properties

children Object[]

Array of child nodes.

Example

<root />

{
  type: "document",
  children: [
    {
      type: "element",
      name: "root",
      attributes: {},
      children: [],
      parent: { ... }
    }
  ],
  parent: null
}
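When `preserveComments` is true, the document node's `children` may also contain comment nodes that appear before or after the root element. Locating the root element by type is therefore safer than assuming it is `children[0]`. A minimal sketch, using a hypothetical `rootElement` helper and a hand-written tree:

```javascript
// Hypothetical helper: return the document's root element, skipping any
// other node types (such as comments) that may precede it.
function rootElement(documentNode) {
  return documentNode.children.find((node) => node.type === 'element') || null;
}

// Hand-written tree for "<!-- header --><root />" parsed with
// preserveComments: true (parent properties omitted).
const doc = {
  type: 'document',
  children: [
    { type: 'comment', content: 'header' },
    { type: 'element', name: 'root', attributes: {}, children: [] },
  ],
};

console.log(rootElement(doc).name); // root
```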

`element`

An element.

Note that since parse-xml doesn't implement XML Namespaces, no special treatment is given to namespace prefixes in element and attribute names.

In other words, <foo:bar foo:baz="quux" /> will result in the element name "foo:bar" and the attribute name "foo:baz".
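If you need the prefix, you can split names yourself. Below is a minimal sketch with a hypothetical `splitQName` helper. It only separates the prefix from the local part; it does not resolve prefixes to namespace URIs via `xmlns` attributes.

```javascript
// Hypothetical helper: split a qualified name like "foo:bar" into its
// prefix and local part. Names without a colon get a null prefix.
function splitQName(qname) {
  const index = qname.indexOf(':');

  if (index === -1) {
    return { prefix: null, localName: qname };
  }

  return { prefix: qname.slice(0, index), localName: qname.slice(index + 1) };
}

console.log(splitQName('foo:bar')); // { prefix: 'foo', localName: 'bar' }
console.log(splitQName('kittens')); // { prefix: null, localName: 'kittens' }
```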

Properties

attributes Object

Hash of attribute names to values.

Attribute names in this object are always in alphabetical order regardless of their order in the document, and values are normalized and unescaped. Values are always strings.
children Object[]

Array of child nodes.
name String

Name of the element as given in the start and/or end tags.
preserveWhitespace Boolean?

This property will be set to true if the special xml:space attribute on this element or on the closest ancestor with an xml:space attribute has the value "preserve". This indicates that whitespace in the text content of this element should be preserved rather than normalized.

If neither this element nor any of its ancestors has an xml:space attribute set to "preserve", or if the closest xml:space attribute is set to "default", this property will not be defined.
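One way your own code might honor `preserveWhitespace` is sketched below: collapse runs of XML whitespace (space, tab, carriage return, line feed) in an element's text unless the flag is set. `elementText` is hypothetical, and the collapse-and-trim rule is an assumption about what "normalized" should mean for your application.

```javascript
// Hypothetical helper: concatenate the text and CDATA children of an
// element, collapsing whitespace unless preserveWhitespace is set.
function elementText(element) {
  const raw = element.children
    .filter((node) => node.type === 'text' || node.type === 'cdata')
    .map((node) => node.text)
    .join('');

  // preserveWhitespace is either true or not defined at all.
  if (element.preserveWhitespace) {
    return raw;
  }

  // Collapse runs of XML whitespace characters and trim the ends.
  return raw.replace(/[ \t\r\n]+/g, ' ').trim();
}

const pre = {
  type: 'element',
  name: 'code',
  attributes: { 'xml:space': 'preserve' },
  preserveWhitespace: true,
  children: [{ type: 'text', text: '  a\n    b' }],
};

const plain = {
  type: 'element',
  name: 'code',
  attributes: {},
  children: [{ type: 'text', text: '  a\n    b' }],
};

console.log(elementText(plain)); // a b
```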

Example

<kittens description="fuzzy &amp; cute">I &lt;3 kittens</kittens>

{
  type: "element",
  name: "kittens",
  attributes: {
    description: "fuzzy & cute"
  },
  children: [
    {
      type: "text",
      text: "I <3 kittens",
      parent: { ... }
    }
  ],
  parent: { ... }
}

`text`

Text content inside an element.

Properties

text String

Unescaped text content.

Example

kittens are fuzzy &amp; cute

{
  type: "text",
  text: "kittens are fuzzy & cute",
  parent: { ... }
}

Why another XML parser?

There are many XML parsers for Node, and some of them are good. However, most of them suffer from one or more of the following shortcomings:

Native dependencies.
Loose, non-standard, "works for me" parsing behavior that can lead to unexpected or even unsafe results when given input the author didn't anticipate.
Kitchen sink APIs that tightly couple a parser with DOM manipulation functions, a stringifier, or other tooling that isn't directly related to parsing.
Stream-based parsing. This is great in the rare case that you need to parse truly enormous documents, but can be a pain to work with when all you want is an object tree.
Poor error handling.
Too big or too Node-specific to work well in browsers.

parse-xml's goal is to be a small, fast, safe, reasonably compliant, non-streaming, non-validating, browser-friendly parser, because I think this is an under-served niche.

I think parse-xml demonstrates that it's not necessary to jettison the spec entirely or to write complex code in order to implement a small, fast XML parser.

Also, it was fun.

Benchmark

Here's how parse-xml stacks up against two comparable libraries: libxmljs (which is based on the native libxml2 library) and xmldoc (which is based on sax-js).

Node.js v10.1.0 / Darwin x64
Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz

                      Small document (291 bytes)
          27,143 op/s » libxmljs (native)
          67,938 op/s » parse-xml
          35,749 op/s » xmldoc (sax-js)

                      Medium document (72081 bytes)
             571 op/s » libxmljs (native)
             436 op/s » parse-xml
             236 op/s » xmldoc (sax-js)

                      Large document (1162464 bytes)
              50 op/s » libxmljs (native)
              33 op/s » parse-xml
              21 op/s » xmldoc (sax-js)


  Suites:  3
  Benches: 9
  Elapsed: 15,383.87 ms

See the parse-xml-benchmark repo for instructions on running this benchmark yourself.

License

ISC License
