
DOM

The DOM class, when instantiated, parses the HTML passed as its first argument, and the newly constructed instance acts as the document's root node. The DOM class extends Node, so document instances have the full suite of Node methods and properties, with a few overrides. The properties and methods described below are only those that are unique to the DOM class or that override the Node implementation of the same name.


Construction

new DOM( htmlContent[, options] )

Parameters

  • htmlContent String

    The HTML to parse into an editable DOM object.

  • options Object (optional)

    An object with some or all of the following properties:

    • allowCustomRootElement Boolean (default: false)

      Whether to allow the document type Node to specify the tag name of the root Node. In HTML documents, the root Node is always an <html> element, but in XML documents, the <!DOCTYPE> node's name is used to specify which element is the root.

      For example, <!DOCTYPE doc> indicates that the root node is the <doc> element. However, for that to work, the <doc> element has to be the topmost Node in the document's hierarchy.

    • allowSelfClosingSyntax Boolean (default: false)

      Whether to allow tags to be self-closing (ie. end the opening tag with a forward slash "/" instead of using a closing tag, as in <div />). Useful for parsing XML.

    • allowCDATA Boolean (default: false)

      Whether to parse XML CDATA sections (like <![CDATA[ data ]]>). When this is false, any CDATA sections that are encountered will be parsed as comments.

    • allowProcessingInstructions Boolean (default: false)

      Whether to parse processing instructions (like <?instruction data?>). When this is false, any processing instructions that are encountered will be parsed as comments.

    • decodeEntities Boolean (default: false)

      Whether to decode HTML entities (eg. turning &amp; into &) while parsing HTML.

    • encodeEntities Boolean or RegExp (default: false)

      If specified as a Boolean, whether to encode HTML entities (eg. turning & into &amp;) when serializing nodes as text.

      If specified as a RegExp, the regular expression determines which characters (or sequences of characters) should be encoded as entities. Each match must correspond to an entity in this document's entityEncoder; matches without a corresponding entity are ignored and left unencoded.

    • entities Entities (default: undefined)

      If specified, this will be the set of entities used during entity encoding and decoding. Leaving this undefined, or setting it to a falsy value, results in the set of default entities being used instead. The default set of entities can be changed by assigning a set of entities to EntityEncoder.defaultEntities.

    • collapseWhitespace Boolean (default: false)

      Whether to collapse multiple consecutive whitespace characters in text nodes into a single space character, mimicking how browsers render consecutive whitespace characters. This option is ignored if trimWhitespace is true.

      ℹ️ Note: DOM.minifyWhitespace() may be a better choice depending on your usage of collapseWhitespace.

    • trimWhitespace Boolean (default: false)

      Whether to remove whitespace characters from both ends of text nodes, which is useful for minifying HTML documents. When this option is true, the collapseWhitespace option is ignored.

      ℹ️ Note: DOM.minifyWhitespace() may be a better choice depending on your usage of trimWhitespace.

    • lowerAttributeCase Boolean (default: false)

      Whether to convert attribute names on tags to lower case. When this is false, attribute selectors will be case sensitive (ie. the selector [CaSe] will match the element <div CaSe></div>, but the selector [case] will not). This is false by default for performance reasons, but should be true if you want standards-compliant behaviour.
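The precedence between collapseWhitespace and trimWhitespace can be summarized with a small sketch. The function applyWhitespaceOptions is hypothetical and works on a plain string rather than a text node; it only models the rules stated above (trimWhitespace wins, and collapseWhitespace is then ignored). Treating "whitespace" as whatever JavaScript's \s matches is an assumption and may differ from FauxDOM's own definition.

```javascript
// Hypothetical sketch: models how the collapseWhitespace and trimWhitespace
// options affect the contents of a single text node. Not FauxDOM's code.
function applyWhitespaceOptions( text, options = {} )
{
	// trimWhitespace takes precedence; collapseWhitespace is ignored when it is true.
	if ( options.trimWhitespace )
		return text.trim();
	
	// Collapse each run of whitespace characters into a single space.
	if ( options.collapseWhitespace )
		return text.replace( /\s+/g, " " );
	
	// Neither option is set: the text is left untouched.
	return text;
}

console.log( JSON.stringify( applyWhitespaceOptions( "  a \n\t b  ", { collapseWhitespace: true } ) ) );
// prints " a b "
```

Note that, in this model, trimming leaves interior whitespace runs alone, because the collapse step never runs when trimWhitespace is true.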

Properties

Non-standard

  • entityEncoder EntityEncoder (read-only)

    The EntityEncoder instance used by this DOM instance for entity encoding and decoding. Changing the entities of this EntityEncoder will only affect this document.

  • innerHTML String (override)

    Gets or sets the HTML markup of the entire document, including any <!DOCTYPE> node that may be present. When setting this with markup that either doesn't specify a <!DOCTYPE> or doesn't have an <html> element as the root node, the document's nodeType property will be DOCUMENT_FRAGMENT_NODE instead of DOCUMENT_NODE. When specifying a <!DOCTYPE>, the node that is named in the document type tag will be expected as the document's root instead of <html>.

  • outerHTML Null (override)

    This simply overrides Node.outerHTML, making it do nothing on DOM instances.
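The innerHTML rule for choosing between DOCUMENT_NODE and DOCUMENT_FRAGMENT_NODE, combined with the allowCustomRootElement option, can be sketched as a small decision function. documentNodeType and its inputs (the doctype name or null, and the tag name of the topmost element or null) are hypothetical; 9 and 11 are the standard DOM values of Node.DOCUMENT_NODE and Node.DOCUMENT_FRAGMENT_NODE. Honouring the doctype name only when allowCustomRootElement is true, and comparing names case-insensitively, are assumptions drawn from the option descriptions, not confirmed library behaviour.

```javascript
// Standard DOM nodeType values.
const DOCUMENT_NODE = 9;
const DOCUMENT_FRAGMENT_NODE = 11;

// Hypothetical sketch of the rule described for innerHTML: a document needs a
// <!DOCTYPE> and a topmost element that matches the expected root name.
function documentNodeType( doctypeName, rootTagName, allowCustomRootElement = false )
{
	// No <!DOCTYPE> or no root element: the content is treated as a fragment.
	if ( doctypeName == null || rootTagName == null )
		return DOCUMENT_FRAGMENT_NODE;
	
	// Assumption: the doctype name only replaces "html" when custom roots are allowed.
	const expectedRoot = allowCustomRootElement ? doctypeName : "html";
	
	// Assumption: names are compared case-insensitively.
	return rootTagName.toLowerCase() === expectedRoot.toLowerCase()
		? DOCUMENT_NODE
		: DOCUMENT_FRAGMENT_NODE;
}
```

For example, documentNodeType( "doc", "doc", true ) models an XML document such as <!DOCTYPE doc><doc></doc> parsed with allowCustomRootElement enabled.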

Semi-standard

  • doctype Node, Object, or Null — [standard] [MDN]

    Gets the document type node associated with the document if one is present, or null otherwise. Unlike the standard, you may set this property using a document type node (such as from DOM.createDocumentType()), or an Object with name (required), publicId (optional), and systemId (optional) properties as strings. You may also set this to null to remove an existing document type from a document.

    When set to an Object that either lacks a name property or whose name property isn't a string, the document type's name, publicId, and systemId properties are all set to empty strings.

  • documentElement Node (read-only) — [standard] [MDN]

    Gets the document's root Node, or null if there is no root Node (such as when the DOM instance represents a document fragment).
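The normalization applied when doctype is set to a plain Object can be sketched as follows. normalizeDoctypeObject is a hypothetical helper that only handles the Object case (setting doctype to null or to a document type node is handled separately). The fallback of a missing or non-string publicId or systemId to an empty string is an assumption that mirrors createDocumentType(); the text above only states the rule for name.

```javascript
// Hypothetical sketch of how an Object assigned to document.doctype could be
// normalized into the name/publicId/systemId of a document type node.
function normalizeDoctypeObject( value )
{
	// Without a string name, all three properties become empty strings.
	if ( typeof value.name !== "string" )
		return { name: "", publicId: "", systemId: "" };
	
	// Assumption: missing or non-string identifiers fall back to "", mirroring
	// createDocumentType()'s handling of publicId and systemId.
	return {
		name: value.name,
		publicId: typeof value.publicId === "string" ? value.publicId : "",
		systemId: typeof value.systemId === "string" ? value.systemId : ""
	};
}
```

With FauxDOM itself you would simply assign the object, for example document.doctype = { name: "html" }.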

Standard

  • body Node — [standard] [MDN]

    Gets the document's <body> element, or null if one doesn't exist. When setting, the Node's tag name must be either "BODY" or "FRAMESET": if the document doesn't already have a body element, the Node is added to the document's root node; otherwise, the existing body element is replaced by the Node.

  • head Node (read-only) — [standard] [MDN]

    Gets the document's <head> element, or null if one doesn't exist.

  • title String — [standard] [MDN]

    Gets or sets the value of the document's <title> element if the document has a <head> element, creating the <title> element (when setting) if it doesn't exist.
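The body setter's logic can be sketched on a toy tree of { tagName, childNodes } objects. setBody is a hypothetical name; treating the first BODY or FRAMESET child of the root as the existing body follows the standard DOM definition of document.body. Two details are assumptions not stated above: a Node with any other tag name is simply ignored, and a new body is appended as the root's last child.

```javascript
// Hypothetical sketch of the body setter on a toy tree of
// { tagName, childNodes } objects; not FauxDOM's implementation.
function setBody( root, node )
{
	// Only BODY or FRAMESET elements are accepted (ignored otherwise, by assumption).
	if ( node.tagName !== "BODY" && node.tagName !== "FRAMESET" )
		return false;
	
	// Look for an existing body element among the root's children.
	const index = root.childNodes.findIndex(
		child => child.tagName === "BODY" || child.tagName === "FRAMESET" );
	
	if ( index === -1 )
		root.childNodes.push( node ); // No body yet: add it to the root (assumed to append).
	else
		root.childNodes[index] = node; // Replace the existing body element.
	
	return true;
}
```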

Methods

document.createCDATASection( data )

[standard] [MDN]

Creates a new Node of type CDATA_SECTION_NODE.

Parameters

  • data String

    A String that is used as the contents of the CDATA section (eg. " data " in <![CDATA[ data ]]>).

Return Value

The new Node.

Exceptions

An Error is thrown if data contains the end sequence ]]>.
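The check behind this exception, and the markup a CDATA section produces, can be sketched with a small standalone function. cdataMarkup is hypothetical and only illustrates why "]]>" must be rejected: it would end the section early when the document is serialized.

```javascript
// Hypothetical sketch of the createCDATASection() check and the resulting markup.
function cdataMarkup( data )
{
	// The end sequence would terminate the section early, so it is rejected.
	if ( data.includes( "]]>" ) )
		throw new Error( "CDATA section data must not contain \"]]>\"." );
	
	return "<![CDATA[" + data + "]]>";
}

console.log( cdataMarkup( " data " ) );
// prints: <![CDATA[ data ]]>
```

When data legitimately contains "]]>", a common workaround in XML is to split it across two adjacent CDATA sections.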


document.createComment( data )

[standard] [MDN]

Creates a new Node of type COMMENT_NODE.

Parameters

  • data String

    A String that is used as the contents of the comment.

    ⚠️ Caution: No check is performed for the sequence -->, which can result in the early termination of the comment when the document is output as text, making the remaining contents of the comment appear as text in the document.

Return Value

The new Node.
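Because createComment() performs no check for "-->", you may want to validate comment text yourself before creating the node. isSafeCommentData is a hypothetical helper; besides "-->", it also rejects "--!>", which HTML parsers likewise treat as the end of a comment.

```javascript
// Hypothetical helper: createComment() does not check its data, so validate
// it yourself before passing it in.
function isSafeCommentData( data )
{
	// "-->" (and "--!>" in HTML) would end the comment early when serialized.
	return !/--!?>/.test( data );
}
```

For example, call isSafeCommentData( text ) before document.createComment( text ), and reject or rewrite the text when it returns false.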


document.createDocumentType( name[, publicId[, systemId]] )

[standard] [MDN]

Creates a new Node of type DOCUMENT_TYPE_NODE.

Parameters

  • name String

    A String that is used as the document type's name (eg. "html" in <!DOCTYPE html>).

  • publicId String (optional)

    A String that is used as the document type's public identifier. If this isn't a string, an empty string is used instead.

  • systemId String (optional)

    A String that is used as the document type's system identifier. If this isn't a string, an empty string is used instead.

Return Value

The new Node.


document.createElement( tagName )

[standard] [MDN]

Creates a new Node of type ELEMENT_NODE.

Parameters

  • tagName String

    A String that specifies the tagName of the element (eg. "name" in <name></name>).

Return Value

The new Node, or undefined if the specified tagName parameter isn't a String or is an empty string.


document.createProcessingInstruction( target, data )

[standard] [MDN]

Creates a new Node of type PROCESSING_INSTRUCTION_NODE.

Parameters

  • target String

    A String representing the target of the processing instruction (eg. "target" in <?target?>).

  • data String

    A String containing the data of the processing instruction (eg. "data" in <?target data?>). This string can be anything, except the sequence ?> which indicates the end of the processing instruction node.

Return Value

The new Node.

Exceptions

An Error is thrown if either of the following is true:

  • target is invalid — it must be a String that is a valid XML name.
  • data contains the end sequence ?>.
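The two conditions above can be sketched as a validation function that also builds the instruction's markup. processingInstructionMarkup is hypothetical, and its regular expression is a simplified, ASCII-only approximation of the XML Name production (the real production also allows many non-ASCII characters). Omitting the separating space when data is empty is a choice made for this sketch.

```javascript
// Simplified, ASCII-only approximation of the XML Name production.
const XML_NAME = /^[A-Za-z_:][A-Za-z0-9_:.\-]*$/;

// Hypothetical sketch of the checks createProcessingInstruction() documents.
function processingInstructionMarkup( target, data )
{
	// The target must be a String that is a valid XML name.
	if ( typeof target !== "string" || !XML_NAME.test( target ) )
		throw new Error( "Invalid processing instruction target: " + target );
	
	// "?>" would end the processing instruction early.
	if ( data.includes( "?>" ) )
		throw new Error( "Processing instruction data must not contain \"?>\"." );
	
	// In this sketch, a space separates the target from non-empty data.
	return "<?" + target + ( data ? " " + data : "" ) + "?>";
}

console.log( processingInstructionMarkup( "xml-stylesheet", 'href="a.css"' ) );
// prints: <?xml-stylesheet href="a.css"?>
```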

document.createTextNode( text )

[standard] [MDN]

Creates a new Node of type TEXT_NODE.

Parameters

  • text String

    A String that is used as the contents of the new text node.

Return Value

The new Node.


document.getElementsByName( name )

[standard] [MDN]

Gets all elements (ELEMENT_NODE type nodes) which have a name attribute that matches the specified name parameter string.

Parameters

  • name String

    The name attribute of the elements to locate. The comparison is case sensitive.

Return Value

An Array of all elements in the document whose name attribute matches the specified name. The array can be empty if no elements matched.
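The lookup can be sketched as a depth-first, pre-order walk, which yields matches in document order. getElementsByName here is a standalone, hypothetical function over a toy tree of { nodeType, attributes, childNodes } objects, not the library's method; ELEMENT_NODE is the standard DOM value 1.

```javascript
// Standard DOM nodeType value for elements.
const ELEMENT_NODE = 1;

// Hypothetical sketch of getElementsByName() on a toy tree of
// { nodeType, attributes, childNodes } objects.
function getElementsByName( root, name )
{
	const matches = [];
	
	// Depth-first, pre-order traversal visits elements in document order.
	( function visit( node )
	{
		// Case-sensitive comparison of the name attribute.
		if ( node.nodeType === ELEMENT_NODE && node.attributes?.name === name )
			matches.push( node );
		
		for ( const child of node.childNodes ?? [] )
			visit( child );
	} )( root );
	
	return matches;
}
```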


document.minifyWhitespace( [inlineElements[, transforms[, userValue]]] )

[non-standard]

Minifies whitespace within the entire HTML document in a manner that closely replicates browser rendering behavior. This method replaces line breaks and tab characters with spaces, collapses multiple spaces to a single space, removes leading and trailing spaces inside block elements, and adjusts spaces inside and around inline elements accordingly.

The result is a document that maintains its visual appearance when rendered by a browser, but with reduced file size due to whitespace removal.
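The core whitespace rules can be illustrated with a simplified model of the text inside a single block element. minifyBlockText is hypothetical and is not FauxDOM's implementation: it ignores nested blocks, <pre>, non-breaking spaces, and the inline element list, and only shows how a space is dropped when it follows another space, even across an inline element boundary.

```javascript
// Hypothetical, simplified model of whitespace minification inside ONE block
// element. `segments` holds the text of the block's text nodes in document
// order, including text that sits inside inline elements.
function minifyBlockText( segments )
{
	// The start of the block behaves as if a space had just been output.
	let previousEndsWithSpace = true;
	
	const result = segments.map( text =>
	{
		// Line breaks and tabs become spaces, then runs of spaces collapse to one.
		let out = text.replace( /[\r\n\t]/g, " " ).replace( / {2,}/g, " " );
		
		// Drop a leading space that would follow another space, even one
		// output by a previous segment (across an inline element boundary).
		if ( previousEndsWithSpace && out.startsWith( " " ) )
			out = out.slice( 1 );
		
		// Empty segments don't change what the last output character was.
		if ( out.length > 0 )
			previousEndsWithSpace = out.endsWith( " " );
		
		return out;
	} );
	
	// Remove the trailing space at the end of the block, if any.
	for ( let i = result.length - 1; i >= 0; i-- )
	{
		if ( result[i].length === 0 )
			continue;
		
		if ( result[i].endsWith( " " ) )
			result[i] = result[i].slice( 0, -1 );
		
		break;
	}
	
	return result;
}

// Models <p>\n  Hello <b> world\t</b>\n</p> (escapes written as in JavaScript).
console.log( minifyBlockText( [ "\n  Hello ", " world\t", "\n" ] ).join( "" ) );
// prints: Hello world
```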

Parameters

  • inlineElements Array (optional)

    An Array of strings representing additional HTML tag names to treat as inline elements, supplementing the built-in list.

    Built-in Inline Elements: a, abbr, acronym, audio, b, bdi, bdo, big, button, cite, code, data, del, dfn, em, font, i, img, input, ins, kbd, label, mark, math, meter, nobr, noscript, object, output, picture, progress, q, rp, rt, rtc, ruby, s, samp, select, slot, small, span, strike, strong, sub, sup, svg, textarea, time, tt, u, var, video, wbr

  • transforms Object (optional)

    An Object containing methods for custom transformation of style attribute values and of the contents of style and script elements.

    • inlineStyles( node, value, userValue )

      Transforms style attribute values.

    • style( node, value, userValue )

      Transforms inline CSS within style elements.

    • script( node, value, userValue )

      Transforms inline JavaScript within script elements.

    The return value of these methods replaces the original content. Returning null or undefined removes the element from the document or, in the case of inlineStyles, the style attribute.

    The arguments for the transformation methods are:

    • node Node

      The node whose value or attribute should be transformed.

      ⚠️ Caution: You should not remove this node (or any of its parent nodes) from its document from inside your transformation method or an error will occur. To remove this node, simply return null or undefined from your method and the node will be removed when it is safe to do so.

    • value String

      The value to be transformed.

    • userValue Any

      The value you passed, if any, to minifyWhitespace() in its userValue parameter.

  • userValue Any (optional)

    Any value you wish to pass to the transformation methods. This can be used for tracking any necessary context between transformations.
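The caution about removing nodes reflects a general rule: removing items from a structure while it is being walked can break the walk. The sketch below shows the usual deferred-removal pattern on a flat toy list (parent.childNodes holding { value } objects). applyTransforms is a hypothetical name, and this is an illustration of the pattern, not a description of FauxDOM's internals.

```javascript
// Hypothetical sketch of the deferred-removal pattern on a flat toy list:
// `parent.childNodes` is an array of { value } objects.
function applyTransforms( parent, transform, userValue )
{
	const toRemove = [];
	
	// Run every transform first, without modifying the list being iterated.
	for ( const node of parent.childNodes )
	{
		const result = transform( node, node.value, userValue );
		
		// null or undefined marks the node for removal later.
		if ( result == null )
			toRemove.push( node );
		else
			node.value = result;
	}
	
	// Iteration has finished, so removing nodes is now safe.
	parent.childNodes = parent.childNodes.filter( node => !toRemove.includes( node ) );
}
```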

Return Value

Nothing. Use DOM.innerHTML to retrieve the modified HTML markup.

Example

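// Assumes `document` is a DOM instance. "JSMinifier" is a fictional library,
// and `filePath` is a hypothetical path to the HTML file being processed.
const filePath = "src/index.html";

document.minifyWhitespace( null, {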
	inlineStyles()
	{
		// Remove all "style" attributes from every element.
		return null;
	},
	
	style( node, value )
	{
		// Remove all "style" elements that have the "debug" attribute.
		if ( node.hasAttribute( "debug" ) )
			return null;
		
		// Keep all other "style" elements as they are by simply returning the
		// CSS unchanged.
		return value;
	},
	
	// Minify JavaScript using "JSMinifier" and pass the file path as context.
	script( node, value, filePath )
	{
		// Create the minifier instance.
		const minifier = new JSMinifier( value, filePath );
		
		// Return the minified JavaScript code.
		return minifier.minifiedValue;
	}
}, filePath );

// Output the minified HTML document.
console.log( document.innerHTML );

In this example, we show how minifyWhitespace can be used to optimize an HTML document:

  • Removing Inline Styles: All inline "style" attributes from elements are eliminated.
  • Filtering Out Debug Styles: "style" elements marked with a "debug" attribute are removed.
  • Minifying JavaScript: JavaScript code within script tags is minified using a fictional library called "JSMinifier". This library requires the file path of the script for its processing.

While in this simple example it isn't really necessary, the userValue parameter is used to pass the file path to the "script" transformation method. This is particularly useful when your transformation methods are located in a separate file or module. By passing contextual information like file paths via userValue, you enable these external methods to access and use data that they wouldn’t otherwise have direct access to.

Finally, the processed HTML is output to the console. This shows how the method reduces file size while preserving the visual layout, which is especially useful when preparing HTML files for production environments.


Node.js Only (Non-standard)

The convenience methods below read and parse the file "lib/entities.json" (which includes all standard HTML 5 entities) and are only available when using FauxDOM in Node.js.

ℹ️ Note: As the standard set of entities is extremely comprehensive, using it can lead to undesirable output, such as new-line characters (which would normally be ignored by browsers) being encoded as &NewLine;. This kind of unintended entity encoding can make documents look rather broken.

To avoid this, either create your document using the encodeEntities option as a RegExp that matches only the entities you care about, or use a custom set of entities that only defines the entities you care about.
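The effect of combining a narrow RegExp with a small entity set can be sketched as follows. encodeWithPattern and the character-to-name map are hypothetical stand-ins (FauxDOM's Entities format may differ); the sketch only models the rule that matches without a known entity are left as-is. The pattern needs the g flag, otherwise String.prototype.replace only handles the first match.

```javascript
// A small, custom set of entities: character -> entity name (assumed shape).
const entities = { "&": "amp", "<": "lt", ">": "gt", "\u00A0": "nbsp" };

// Hypothetical sketch of RegExp-driven encoding: only characters matched by
// `pattern` are candidates, and matches without a known entity are left as-is.
function encodeWithPattern( text, pattern, entityMap )
{
	return text.replace( pattern, match =>
		Object.prototype.hasOwnProperty.call( entityMap, match )
			? "&" + entityMap[match] + ";"
			: match );
}

console.log( JSON.stringify( encodeWithPattern( "a < b & c\n", /[&<>\n]/g, entities ) ) );
// prints "a &lt; b &amp; c\n"
```

Here the new-line character matches the pattern but, having no entry in the small entity set, is not turned into &NewLine;.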


document.importStandardEntities()

Sets the entities of this document's entityEncoder to the standard set of HTML 5 entities.

Return Value

Nothing.


DOM.importStandardEntities()

The static version of the above method. Sets EntityEncoder.defaultEntities to the standard set of HTML 5 entities.

Return Value

Nothing.
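The difference between the instance and static methods mirrors the fallback described for the entities option: a document uses its own entities when it has some, and the shared defaults otherwise. SketchEncoder is a hypothetical stand-in, not FauxDOM's EntityEncoder, and the assumption that existing documents pick up later changes to the defaults is made here for illustration only.

```javascript
// Hypothetical sketch of "instance entities fall back to shared defaults".
class SketchEncoder
{
	// Shared defaults, playing the role of EntityEncoder.defaultEntities.
	static defaultEntities = { "&": "amp", "<": "lt", ">": "gt" };
	
	constructor( entities )
	{
		// A falsy value means "use the defaults".
		this.entities = entities || null;
	}
	
	get activeEntities()
	{
		// Assumption: defaults are looked up when needed, so later changes apply.
		return this.entities || SketchEncoder.defaultEntities;
	}
}
```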