This repository historically contained a proposed specification for the DOM innerText property, as well as tests. The specification has since moved into the HTML Standard, and the tests have moved to the Web Platform Tests project.
This repository is now kept for historical reference (via the commit history, issue tracker, and the contents of this README).
innerText was introduced by Internet Explorer but never specified in any official document. It was later implemented (incompatibly) by WebKit; this implementation was inherited by Blink. The intent of innerText, as it has evolved, seems to be to return the "rendered text" of an element, similar to what browsers produce when equivalent HTML content is copied to the clipboard as plaintext.
innerText is implemented in popular browsers and is used by many Web sites. Therefore it should be specified. The goal of this repository was to produce a reasonably simple specification that is Web-compatible and that browser vendors are willing to implement.
The original intent of the innerText setter seems to be that setting an HTML element's innerText to a string will render that string visually. To that end, both Chrome and Edge (as of October 2015) convert \r characters to <br> elements. They also both convert a \r\n pair to a single <br>. Edge goes further, converting some whitespace characters to non-breaking spaces (&nbsp;) to avoid whitespace trimming and collapsing, but Chrome does not.
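As a rough illustration of the line-break handling described above, here is a minimal TypeScript sketch that splits a string into text runs and line-break markers, with a \r\n pair producing a single break. It does not touch a real DOM: the Fragment type and textToFragments function are hypothetical, and treating a lone \n as a line break too is an assumption (it matches the line-break handling of the setter as it now appears in the HTML Standard, not necessarily every browser in 2015).

```typescript
// Hypothetical model of the setter's output: a run of text or a <br> element.
type Fragment = { kind: "text"; data: string } | { kind: "br" };

function textToFragments(input: string): Fragment[] {
  const fragments: Fragment[] = [];
  let pos = 0;
  while (pos < input.length) {
    // Collect a run of characters that are not CR or LF.
    let end = pos;
    while (end < input.length && input[end] !== "\r" && input[end] !== "\n") end++;
    if (end > pos) fragments.push({ kind: "text", data: input.slice(pos, end) });
    if (end === input.length) break;
    // A CR LF pair counts as one line break, so it yields a single <br>.
    if (input[end] === "\r" && input[end + 1] === "\n") end++;
    fragments.push({ kind: "br" });
    pos = end + 1;
  }
  return fragments;
}
```

A real setter would then replace the element's children with Text nodes and <br> elements built from these fragments; no whitespace is converted to &nbsp;, following Chrome rather than Edge.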
There is already a simple and completely accurate way to have an HTML element render a string as-is: make the element white-space:pre and set its textContent. Therefore we specified innerText to do the simplest thing compatible with what browsers do, i.e. Chrome's behavior.
The intent here seems to be that getting an element's innerText returns its text "as rendered". Each browser seems to have implemented its own set of heuristics for approximating that, typically emulating some subset of CSS processing.
We can avoid specifying processing rules that duplicate and approximate some subset of CSS by specifying innerText's getter in terms of the rendered text produced by CSS layout of the element. This won't work for elements that have no CSS boxes (e.g. display:none elements and elements not in a document), but WebKit/Blink already bail out for those elements and just return their textContent, so we followed suit.
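A minimal TypeScript sketch of that getter structure, under loudly stated assumptions: ModelNode is a hypothetical stand-in for DOM nodes, hasBoxes says whether CSS generated any boxes for a node, and renderedText is assumed to be text that layout has already produced (after whitespace collapsing, text-transform and so on). Computing that text is the hard part and is not shown.

```typescript
// Hypothetical stand-in for a DOM node; field names are illustrative only.
interface ModelNode {
  data?: string;         // text nodes only: raw DOM text
  renderedText?: string; // text nodes only: text as CSS layout rendered it (assumed given)
  hasBoxes: boolean;     // did CSS generate any boxes for this node?
  children: ModelNode[];
}

// Raw concatenated text, ignoring CSS entirely (like Node.textContent).
function textContent(node: ModelNode): string {
  if (node.data !== undefined) return node.data;
  return node.children.map(textContent).join("");
}

// Only text nodes that produced boxes contribute, and they contribute
// their layout-processed text rather than their raw data.
function collectRenderedText(node: ModelNode): string {
  if (node.data !== undefined) return node.hasBoxes ? (node.renderedText ?? "") : "";
  return node.children.map(collectRenderedText).join("");
}

function getInnerText(element: ModelNode): string {
  // No CSS boxes (display:none, not in a document): bail out to textContent,
  // as WebKit/Blink do.
  if (!element.hasBoxes) return textContent(element);
  return collectRenderedText(element);
}
```

For a rendered element holding the text "a   b" (laid out as "a b") and a display:none child containing "hidden", getInnerText returns "a b"; for the same element without boxes it returns the raw "a   bhidden".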
Note that descendant nodes of most replaced elements (e.g. <video>, but not <button>) are not rendered by CSS, strictly speaking, and therefore have no CSS boxes for the purposes of the specified algorithm.
In the algorithm, processing of text nodes is heavily CSS-based but processing of elements is mainly based on DOM structure and the computed value of display. Alternatively, we could have replaced DOM traversal with traversal of the CSS boxes generated by the element and its descendants. For example, that approach would naturally lead to generated content and shadow DOM content being included in innerText, while the specified algorithm naturally excludes such content. That approach seems likely to be less Web-compatible than the specified algorithm, more surprising for Web developers in edge cases, and no simpler than the specified algorithm.
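The element side of the algorithm can be sketched in the same hedged spirit. The TypeScript below walks DOM children, so the hypothetical generatedBefore field (standing in for ::before content, which exists only in the box tree) is never visited, and it uses the computed display value to put line breaks around block-level elements. The numeric "required line break count" items are borrowed from the way the HTML Standard's version of the algorithm is written; the SketchNode type and the set of block-level display values are simplifications invented for illustration.

```typescript
// Hypothetical simplified node: text nodes carry their CSS-rendered text,
// elements carry their computed 'display'. Field names are illustrative only.
interface SketchNode {
  renderedText?: string;    // text nodes only
  display?: string;         // elements only
  generatedBefore?: string; // stands in for ::before content; never read
  children?: SketchNode[];
}

// An item is either text or a required line break count.
type Item = string | number;

// Assumed subset of block-level display values, for illustration only.
const BLOCK_LEVEL = new Set(["block", "list-item", "flex", "grid", "table", "flow-root"]);

function collect(node: SketchNode): Item[] {
  if (node.renderedText !== undefined) return [node.renderedText];
  if (node.display === "none") return []; // no boxes, so no rendered text
  // DOM traversal: only real children are visited, not generated content.
  const items = (node.children ?? []).flatMap(collect);
  if (node.display !== undefined && BLOCK_LEVEL.has(node.display)) return [1, ...items, 1];
  return items;
}

function getInnerTextSketch(root: SketchNode): string {
  const items = (root.children ?? []).flatMap(collect).filter((i) => i !== "");
  // Line break counts at the very start or end produce nothing.
  while (items.length > 0 && typeof items[0] === "number") items.shift();
  while (items.length > 0 && typeof items[items.length - 1] === "number") items.pop();
  let out = "";
  let pending = 0;
  for (const item of items) {
    if (typeof item === "number") {
      pending = Math.max(pending, item); // adjacent counts collapse to their maximum
    } else {
      out += "\n".repeat(pending) + item;
      pending = 0;
    }
  }
  return out;
}
```

Walking boxes instead would have to decide what to do with generatedBefore at every step, which is one concrete form of the extra surprise the paragraph above anticipates.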
The specified algorithm depends on computed styles and on layout (due to ::first-line, since which text ends up on an element's first line depends on line breaking); ::first-letter styles are implicitly honoured as well. Avoiding dependence on layout would add implementation complexity (to Gecko, at least), because it's tricky to compute the correct styles for text as if ::first-line did not apply.
Special handling of <rp> is not in Chrome and probably not needed for Web compatibility. However, the entire point of the <rp> element is to make plaintext rendering of ruby intelligible, so it seems churlish not to support it. As specified, an <rp> in a context where CSS boxes would not normally be created (e.g. inside a <select>) will contribute its text, but there's no simple way to avoid that. In practice this is unlikely to be a problem.