Parser Output

Marshall Lochbaum edited this page Jul 21, 2016 · 2 revisions

htmllint uses htmlparser2 to parse its input. Because we need more information than the parser returns by default, htmllint calls the parser with a modified version of DomHandler, the class that uses htmlparser2 events to build a DOM structure. A specification for the output of our modified parser follows.
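To make the idea of a position-aware handler concrete, here is a minimal sketch. It is not htmllint's actual DomBuilder. The callback names (onopentag, ontext, onclosetag) follow htmlparser2's handler interface, but the class name IndexedDomHandler, the extra index argument, and the data field on text nodes are assumptions made so the example runs without the library; with htmlparser2 itself, positions are read from the parser's startIndex rather than passed in.

```javascript
// Hypothetical, simplified handler: NOT htmllint's actual DomBuilder.
// The `index` argument is an assumption made for illustration.
class IndexedDomHandler {
  constructor() {
    this.dom = [];   // top-level nodes
    this.stack = []; // elements that are currently open
  }

  // Attach a node to the current parent and link it to its previous sibling.
  _add(node) {
    const parent = this.stack[this.stack.length - 1] || null;
    const siblings = parent ? parent.children : this.dom;
    const prev = siblings[siblings.length - 1] || null;
    node.parent = parent;
    node.prev = prev;
    node.next = null;
    if (prev) prev.next = node;
    siblings.push(node);
  }

  onopentag(name, attribs, index) {
    const node = { type: 'tag', name, attribs, index, openIndex: index, children: [] };
    this._add(node);
    this.stack.push(node);
  }

  ontext(data, index) {
    this._add({ type: 'text', data, index });
  }

  onclosetag(name, index) {
    const node = this.stack.pop();
    if (node) node.closeIndex = index;
  }
}

// Drive the handler by hand with the events for the input '<p>hi</p>'.
const handler = new IndexedDomHandler();
handler.onopentag('p', {}, 0);
handler.ontext('hi', 3);
handler.onclosetag('p', 5);
console.log(handler.dom[0].closeIndex); // 5
```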

The parser for htmllint should accept a string and output an array of structures (DOM objects) of the following form:

// A DOM OBJECT
{
  // element information
  "type": <string>,
  "name": <string>,
  "index": <number>,

  // attribute information 
  "attribs": <object>, // detailed below
  "attribsArr": <string>[], // currently NOT set - see DomBuilder.js
  "dupes": <string>[],

  // tree information
  "children": <DOM>[],
  "next": <DOM>,
  "prev": <DOM>,
  "parent": <DOM>,

  // tag information ('close' information is included if element is non-void)
  "open": <string>,
  "openIndex": <number>,
  "openLineCol": <array>,
  "close": <string>,
  "closeIndex": <number>,
  "closeLineCol": <array>,
}
// AN ATTRIBS OBJECT
{
  'attribName1': {
      'value': <string>,
      'nameIndex': <number>,
      'valueIndex': <number>,
      'attributeContext': <string>,
      'nameLineCol': <array>,
      'valueLineCol': <array>,
  },
  'attribName2': {
      'value': <string>,
      'nameIndex': <number>,
      'valueIndex': <number>,
      'attributeContext': <string>,
      'nameLineCol': <array>,
      'valueLineCol': <array>,
  },
}
// A TEXT OBJECT
{
  "type": <string>,
  "index": <number>,
  "lineCol": <array>, // note: this is named 'lineCol', NOT 'openLineCol' or 'closeLineCol'!!
  "next": <DOM>,
  "prev": <DOM>,
  "parent": <DOM>,
}

DOM Properties

Element Information

type

The type of the node, taken from fb55/domelementtype.
Currently one of: text, directive, comment, script, style, tag, cdata

  • Example: 'text'

name

The name given in the open tag.

  • Example: 'div'

index

The index of the '<' character for the open tag of the element.

  • Example: 6

Attribute Information

attribs

An object (associative array) mapping each attribute name on the DOM element to its details; see AN ATTRIBS OBJECT above.
To get the value of an attribute, access element.attribs['name'].value
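As a small illustration of this access pattern, the sketch below reads an attribute value defensively. The element literal is a hand-written stand-in for parser output (position fields are omitted for brevity), and getAttrValue is a hypothetical helper, not part of htmllint.

```javascript
// Hand-written stand-in for a parsed element; values are illustrative only.
const element = {
  type: 'tag',
  name: 'div',
  attribs: {
    class: { value: 'hello' },
    id: { value: 'dogs' }
  }
};

// Hypothetical helper: return the attribute's value, or undefined if absent.
function getAttrValue(el, name) {
  const attribs = el.attribs || {};
  const present = Object.prototype.hasOwnProperty.call(attribs, name);
  return present ? attribs[name].value : undefined;
}

console.log(element.attribs['id'].value);    // 'dogs'
console.log(getAttrValue(element, 'title')); // undefined
```

The hasOwnProperty check avoids mistaking inherited properties such as 'constructor' for attributes.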

attribsArr

An array of the attribute names without duplicates, for guaranteed in-order traversal. Currently not set (see DomBuilder.js).

  • Example: ['class', 'id', 'href']

dupes

A normal array of any duplicated attribute names.

  • Example: ['class']
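The relationship between attribsArr and dupes can be sketched as follows. collectAttribNames is a hypothetical helper, and listing each duplicated name only once is an assumption of this sketch rather than documented behavior.

```javascript
// Hypothetical helper: given attribute names in source order, return the
// unique names (in order) and the names that occur more than once.
function collectAttribNames(names) {
  const seen = new Set();
  const dupes = new Set();
  for (const name of names) {
    if (seen.has(name)) {
      dupes.add(name); // second or later occurrence
    } else {
      seen.add(name);  // first occurrence; Set keeps insertion order
    }
  }
  return { attribsArr: [...seen], dupes: [...dupes] };
}

const result = collectAttribNames(['class', 'id', 'class', 'href']);
console.log(result.attribsArr); // [ 'class', 'id', 'href' ]
console.log(result.dupes);      // [ 'class' ]
```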

Tree Information

children

A list of DOM objects contained in this DOM object, guaranteed in-order.

next

The next sibling of this DOM element.

prev

The previous sibling of this DOM element.

parent

The parent of this DOM element.
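The tree properties can be followed like linked-list pointers. The sketch below builds a tiny tree by hand and walks it; linkChildren, ancestorNames and followingSiblings are hypothetical helpers, and using null for a missing parent or sibling is an assumption.

```javascript
// Hypothetical helper: set children, parent, prev and next for one parent.
// Using null for a missing sibling is an assumption of this sketch.
function linkChildren(parent, children) {
  parent.children = children;
  children.forEach((child, i) => {
    child.parent = parent;
    child.prev = children[i - 1] || null;
    child.next = children[i + 1] || null;
  });
  return parent;
}

// Collect the names of all ancestors, nearest first, by following parent.
function ancestorNames(node) {
  const names = [];
  for (let p = node.parent; p; p = p.parent) names.push(p.name);
  return names;
}

// Collect all later siblings by following next.
function followingSiblings(node) {
  const out = [];
  for (let s = node.next; s; s = s.next) out.push(s);
  return out;
}

const body = { type: 'tag', name: 'body', parent: null };
const div = { type: 'tag', name: 'div' };
const b = { type: 'tag', name: 'b' };
const text = { type: 'text' };
linkChildren(body, [div]);
linkChildren(div, [b, text]);

console.log(ancestorNames(b));            // [ 'div', 'body' ]
console.log(followingSiblings(b).length); // 1
```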

Tag Information

open

The string contained between < and > of the opening tag.

  • Example: 'div class="hello" id="dogs"'

openIndex

The index of the '<' character in the open tag of the element.

  • Example: 36

openLineCol

An array of the line and column that corresponds to the openIndex.

  • Example: [2,2]
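One way to derive such a pair from a character index is sketched below. indexToLineCol is a hypothetical helper; 1-based lines and columns and '\n' line endings are assumptions of this sketch, so check them against the actual parser output before relying on them.

```javascript
// Hypothetical helper: convert a character index into [line, column].
// Assumes 1-based lines and columns and '\n' line endings.
function indexToLineCol(source, index) {
  let line = 1;
  let lastNewline = -1; // index of the most recent '\n' before `index`
  for (let i = 0; i < index; i++) {
    if (source[i] === '\n') {
      line++;
      lastNewline = i;
    }
  }
  return [line, index - lastNewline];
}

const html = '<div>\n <p>hi</p>\n</div>';
console.log(indexToLineCol(html, 0));                   // [ 1, 1 ]
console.log(indexToLineCol(html, html.indexOf('<p>'))); // [ 2, 2 ]
```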

close

Note: only set if the element has an end tag, which is determined by checking that the element is non-void.
The string contained between </ and > of the closing tag.

  • Example: 'div'

closeIndex

Note: only set if the element has an end tag, which is determined by checking that the element is non-void.
The index of the '<' character in the closing tag of the element.

  • Example: 36

closeLineCol

Note: only set if the element has an end tag, which is determined by checking that the element is non-void.
An array of the line and column that corresponds to the closeIndex.

  • Example: [2,2]
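Because the three close properties may be absent, code that reads them should guard for that. The sketch below shows one way to do so. closeInfo and isVoid are hypothetical helpers, and the VOID_ELEMENTS list follows the HTML Standard's void elements, which may differ from the list htmllint itself uses.

```javascript
// Void elements per the HTML Standard; htmllint's own list may differ.
const VOID_ELEMENTS = new Set([
  'area', 'base', 'br', 'col', 'embed', 'hr', 'img',
  'input', 'link', 'meta', 'source', 'track', 'wbr'
]);

// Hypothetical helper: true if the element name is a void element.
function isVoid(name) {
  return VOID_ELEMENTS.has(String(name).toLowerCase());
}

// Hypothetical helper: return the close-tag details, or null when the
// element is void or no close information was recorded.
function closeInfo(el) {
  if (isVoid(el.name) || el.close === undefined) return null;
  return {
    close: el.close,
    closeIndex: el.closeIndex,
    closeLineCol: el.closeLineCol
  };
}

console.log(closeInfo({ type: 'tag', name: 'br' })); // null
console.log(closeInfo({
  type: 'tag', name: 'div',
  close: 'div', closeIndex: 36, closeLineCol: [2, 2]
}).close); // 'div'
```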

Representing text

Text is represented by the TEXT OBJECT structure in the specification. It uses a single 'lineCol' property instead of 'openLineCol' and 'closeLineCol', but the array has the same [line, column] structure.
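Code that reports positions for arbitrary nodes needs to account for this difference. A minimal sketch, assuming that only text nodes use 'lineCol' (the source does not say which property comments or directives carry); startLineCol is a hypothetical helper.

```javascript
// Hypothetical helper: return the starting [line, column] of a node.
// Assumes only text nodes use `lineCol`; all other nodes use `openLineCol`.
function startLineCol(node) {
  return node.type === 'text' ? node.lineCol : node.openLineCol;
}

// Hand-written stand-ins for parser output; values are illustrative only.
const textNode = { type: 'text', index: 8, lineCol: [2, 3] };
const tagNode = { type: 'tag', name: 'div', index: 0, openIndex: 0, openLineCol: [1, 1] };

console.log(startLineCol(textNode)); // [ 2, 3 ]
console.log(startLineCol(tagNode));  // [ 1, 1 ]
```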

Example input and output

Input:

<body prop="value" prop2='val'><div id="Hello, World!">
  Some text</dIv>
  <div>abcd<b/>efgh</div>
</body>

Output (abbreviated; the property names in this example differ from the specification above):

body = {
  location: { "line": 1, "column": 1 },
  name: "body",
  attrib: { prop: "value", prop2: "val" },
  parent: null,
  open: "body prop=\"value\" prop2='val'",
  close: "body",
  content: [div1, div2],
  prev: ""
};

div1 = {
  location: { "line": 1, "column": 32 },
  name: "div",
  attrib: { id: "Hello, World!" },
  parent: body,
  open: 'div id="Hello, World!"',
  close: "dIv",
  content: [text1],
  prev: ""
};

div2 = {
  location: { "line": 3, "column": 3 },
  name: "div",
  attrib: {},
  parent: body,
  open: "div",
  close: "div",
  content: [btag, text2],
  prev: "\n  "
};

text1 = {
  location: { "line": 2, "column": 3 },
  prev: "\n  Some text"
};

btag = {
  location: { "line": 3, "column": 12 },
  name: "b",
  attrib: {},
  parent: div2,
  open: "b",
  prev: "abcd"
};

text2 = {
  location: { "line": 3, "column": 16 },
  prev: "efgh"
};