Draft specification for DOM "innerText" property

innerText

This repository historically contained a proposed specification for the DOM innerText property, as well as tests. The specification has since moved into the HTML Standard, and the tests have moved to the Web Platform Tests project.

This repository is now kept for historical reference (via the commit history, issue tracker, and the contents of this README).

History

innerText was introduced by Internet Explorer but never specified in any official document. It was later implemented (incompatibly) by WebKit; this implementation was inherited by Blink. The intent of innerText, as it has evolved, seems to be to return the "rendered text" of an element, similar to what browsers produce when equivalent HTML content is copied to the clipboard as plaintext.

There is no "one true way" to convert HTML, or rich text in general, to plain text. Such conversion can generally be implemented effectively in JavaScript with existing DOM APIs. Therefore, there is no need to have this feature in the Web platform, and we might be better off if it were delegated to libraries. However, innerText is implemented in popular browsers and is used by many Web sites, so it should be specified. The goal of this repository was to produce a reasonably simple specification that is Web-compatible and that browser vendors are willing to implement.

Compatibility Notes

Setting innerText

The original intent of the innerText setter seems to be that setting an HTML element's innerText to a string will render that string visually. To that end, both Chrome and Edge (as of October 2015) convert \n and \r characters to <br> elements, and both convert a \r\n pair to a single <br>. Edge goes further, converting some whitespace characters to &nbsp; to avoid whitespace trimming and collapsing; Chrome does not.
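
The line-break conversion described above can be sketched without a DOM. This is only an illustration of the Chrome-style behaviour as stated in this README, not the specification text; the Fragment type and the innerTextFragments function are hypothetical names introduced here.

```typescript
// Hypothetical fragment model: either a run of text or a <br> line break.
type Fragment = { kind: "text"; data: string } | { kind: "br" };

// Splits a string so that each "\r\n" pair, lone "\n", or lone "\r"
// becomes exactly one line break, as described for the setter above.
function innerTextFragments(value: string): Fragment[] {
  const fragments: Fragment[] = [];
  // "\r\n" is listed first in the alternation so a pair yields one break.
  const parts = value.split(/\r\n|\n|\r/);
  parts.forEach((part, index) => {
    // Every boundary between two parts corresponds to one line break.
    if (index > 0) fragments.push({ kind: "br" });
    // Empty runs (e.g. between consecutive breaks) produce no text.
    if (part.length > 0) fragments.push({ kind: "text", data: part });
  });
  return fragments;
}
```

In a browser, each "text" fragment would correspond to document.createTextNode(data) and each "br" fragment to document.createElement("br"). No whitespace is converted to &nbsp;, matching the Chrome behaviour rather than Edge's.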

There is already a simple and completely accurate way to have an HTML element render a string as-is: make the element white-space:pre and set its textContent. Therefore we specified innerText to do the simplest thing compatible with what browsers do, i.e. Chrome's behavior.
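
The white-space:pre alternative mentioned above might look like the following sketch. The PreformattableElement interface and the renderVerbatim helper are hypothetical names used for illustration; the interface exists only so that the snippet type-checks outside a browser, and a real HTMLElement should satisfy it.

```typescript
// Minimal structural type covering only the two properties the sketch uses.
interface PreformattableElement {
  style: { whiteSpace: string };
  textContent: string | null;
}

// Renders a string as-is: whitespace and line breaks are preserved by CSS,
// and textContent inserts the string as a single text node with no parsing.
function renderVerbatim(element: PreformattableElement, text: string): void {
  element.style.whiteSpace = "pre";
  element.textContent = text;
}
```

With this approach no <br> elements or &nbsp; characters are needed, because the string is stored unchanged and CSS handles its presentation.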

Getting innerText

The intent here seems to be that getting an element's innerText returns its text "as rendered". Each browser seems to have implemented a different set of heuristics for approximating that, typically mirroring some subset of CSS processing.

We can avoid specifying processing rules that duplicate and approximate some subset of CSS by specifying innerText's getter in terms of the rendered text produced by CSS layout of the element. This won't work for elements which have no CSS boxes (e.g. display:none elements and elements not in a document), but WebKit/Blink already bail out for those elements and just return their textContent, so we followed suit.
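
The fallback described above can be modelled roughly as follows. This is a simplified sketch, not the specified algorithm: ModelElement and modelInnerText are hypothetical names, renderedText stands in for whatever CSS layout would produce, and the box check only covers the two cases named above (it ignores, for example, elements hidden by a display:none ancestor).

```typescript
// Hypothetical model of an element, reduced to what the fallback needs.
interface ModelElement {
  isConnected: boolean;    // whether the element is in a document
  computedDisplay: string; // computed value of the display property
  textContent: string;     // raw text of all descendant text nodes
  renderedText: string;    // assumed stand-in for the text produced by layout
}

// Returns the raw textContent when the element has no CSS boxes,
// and the layout-derived text otherwise.
function modelInnerText(el: ModelElement): string {
  const hasNoBoxes = !el.isConnected || el.computedDisplay === "none";
  return hasNoBoxes ? el.textContent : el.renderedText;
}
```

For an element with display:none, the model returns the text with its source whitespace intact, whereas a rendered element returns the text as layout would present it.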

Note that descendant nodes of most replaced elements (e.g. <textarea>, <select>, and <video>, but not <button>) are not rendered by CSS, strictly speaking, and therefore have no CSS boxes for the purposes of the specified algorithm.

In the algorithm, processing of text nodes is heavily CSS-based but processing of elements is mainly based on DOM structure and the computed value of display. Alternatively we could have replaced DOM traversal with traversal of the CSS boxes generated by the element and its descendants. For example, that approach would naturally lead to generated content and shadow DOM content being included in innerText, while the specified algorithm naturally excludes such content. That approach seems likely to be less Web-compatible than the specified algorithm, more surprising for Web developers in edge cases, and no simpler than the specified algorithm.

The specified algorithm depends on computed styles and on layout (because ::first-line can affect text-transform); ::first-line and ::first-letter styles are implicitly honoured. Avoiding dependence on layout would add implementation complexity (to Gecko, at least), because it's tricky to compute the correct styles for text as if ::first-line did not apply.

Special handling of <rp> is not in Chrome and is probably not needed for Web compatibility. However, the entire point of the <rp> element is to make plaintext rendering of ruby intelligible, so it seems churlish not to support it. As specified, an <rp> in a context where CSS boxes would not normally be created (e.g. inside a <select>) will contribute its text, but there's no simple way to avoid that. In practice this is unlikely to be a problem.

References