Support document.write

See servo/servo#3704.

(**Edit**: Largely superseded by [this proposal](https://github.com/kmcallister/tendril))

The argument to `document.write` is a sequence of UCS-2 code units, and we need a way to interface this with the UTF-8 parser. My plan is:

- Convert to UTF-8 as soon as possible.
- Convert invalid surrogate sequences to U+FFFD 'REPLACEMENT CHARACTER'. This is a deviation from the spec, but nobody has objected strongly in the course of various discussions.  There was even talk of amending the spec to allow this behavior, since it's currently written under the assumption that all parsers use UCS-2 natively.
- If a `document.write` input ends with a leading surrogate, we can't convert it yet, so save this single `u16` in the `BufferQueue` alongside the UTF-8 buffers.
- If a `document.write` input starts with a trailing surrogate, and there's a saved leading surrogate in the `BufferQueue`, then replace both with the appropriate Unicode character as UTF-8.
- If the parser receives any other input and there's a saved leading surrogate, drop the saved surrogate and prepend U+FFFD to the input. (This happens when a script splits an invalid surrogate sequence across multiple `document.write` calls, or writes a lone leading surrogate and then finishes.)
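
To make the plan above concrete, here is a minimal sketch of the surrogate handling in Rust. It is not the actual `BufferQueue` code: `SurrogateState`, `write_utf16`, `write_utf8`, and `finish` are hypothetical names, and the sketch returns a `String` instead of pushing buffers into a queue. It also assumes that an empty `document.write` leaves a saved leading surrogate in place, which the plan does not specify.

```rust
/// Hypothetical stand-in for the part of `BufferQueue` that remembers a
/// leading surrogate left over from the previous `document.write` call.
#[derive(Default)]
struct SurrogateState {
    pending_lead: Option<u16>,
}

fn is_lead(u: u16) -> bool {
    (0xD800..=0xDBFF).contains(&u)
}

fn is_trail(u: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&u)
}

impl SurrogateState {
    /// Convert one `document.write` argument (UTF-16 code units) to UTF-8.
    fn write_utf16(&mut self, input: &[u16]) -> String {
        let mut out = String::new();
        let mut units = input;

        if let Some(lead) = self.pending_lead.take() {
            match units.first() {
                // Saved lead + new trail: emit the combined character.
                Some(&trail) if is_trail(trail) => {
                    let c = 0x10000
                        + ((lead as u32 - 0xD800) << 10)
                        + (trail as u32 - 0xDC00);
                    // Always a valid scalar value for a lead/trail pair.
                    out.push(char::from_u32(c).unwrap());
                    units = &units[1..];
                }
                // Assumption: an empty write keeps the saved surrogate.
                None => {
                    self.pending_lead = Some(lead);
                    return out;
                }
                // Anything else: the saved lead was unpaired.
                Some(_) => out.push('\u{FFFD}'),
            }
        }

        // A final leading surrogate cannot be converted yet, so save it.
        if let Some((&last, rest)) = units.split_last() {
            if is_lead(last) {
                self.pending_lead = Some(last);
                units = rest;
            }
        }

        // Remaining invalid surrogate sequences become U+FFFD.
        out.extend(
            std::char::decode_utf16(units.iter().copied())
                .map(|r| r.unwrap_or('\u{FFFD}')),
        );
        out
    }

    /// Any other (already UTF-8) input: a saved lead turns into U+FFFD.
    fn write_utf8(&mut self, input: &str) -> String {
        let mut out = String::new();
        if self.pending_lead.take().is_some() {
            out.push('\u{FFFD}');
        }
        out.push_str(input);
        out
    }

    /// End of input: a lone saved lead also turns into U+FFFD.
    fn finish(&mut self) -> Option<char> {
        self.pending_lead.take().map(|_| '\u{FFFD}')
    }
}
```

With this sketch, writing `[0x0061, 0xD83D]` followed by `[0xDE00, 0x0062]` produces `"a"` and then the single character U+1F600 followed by `"b"`, while writing `[0xD83D]` followed by `[0x0063]` produces an empty string and then U+FFFD followed by `"c"`.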


Support document.write #6
