Support document.write #6

@kmcallister

See servo/servo#3704.

The argument to document.write is a sequence of UCS-2 code units and we need a way to interface this with the UTF-8 parser. My plan is:

(Edit: Largely superseded by this proposal)

  • Convert to UTF-8 as soon as possible.
  • Convert invalid surrogate sequences to U+FFFD 'REPLACEMENT CHARACTER'. This is a deviation from the spec, but nobody has objected strongly in the course of various discussions. There was even talk of amending the spec to allow this behavior, since it's currently written under the assumption that all parsers use UCS-2 natively.
  • If a document.write input ends with a leading surrogate, we can't convert it yet, so save this single u16 in the BufferQueue alongside the UTF-8 buffers.
  • If a document.write input starts with a trailing surrogate, and there's a saved leading surrogate in the BufferQueue, then replace both with the appropriate Unicode character as UTF-8.
  • If the parser receives any other input and there's a saved leading surrogate, drop the saved surrogate and prepend U+FFFD to the input. (This case arises when a script splits an invalid surrogate sequence across multiple document.write calls, or writes a lone leading surrogate and then finishes.)
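
A minimal Rust sketch of the plan above, for illustration only. The names `WriteDecoder`, `write`, and `flush` are hypothetical and are not part of the actual BufferQueue API; the saved surrogate lives in a standalone struct here rather than in the BufferQueue itself. It also assumes that an empty write leaves a saved leading surrogate in place, which the plan does not specify.

```rust
/// Hypothetical helper (not the real BufferQueue): converts successive
/// document.write arguments from 16-bit code units to UTF-8.
#[derive(Default)]
pub struct WriteDecoder {
    /// Leading surrogate left over from the end of the previous write.
    pending_lead: Option<u16>,
}

fn is_lead(u: u16) -> bool {
    (0xD800..=0xDBFF).contains(&u)
}

fn is_trail(u: u16) -> bool {
    (0xDC00..=0xDFFF).contains(&u)
}

/// Combine a valid leading/trailing surrogate pair into one character.
fn combine(lead: u16, trail: u16) -> char {
    let cp = 0x10000 + (((lead as u32) - 0xD800) << 10) + ((trail as u32) - 0xDC00);
    char::from_u32(cp).unwrap_or('\u{FFFD}')
}

impl WriteDecoder {
    pub fn new() -> Self {
        Self::default()
    }

    /// Convert one document.write argument to UTF-8.
    pub fn write(&mut self, units: &[u16]) -> String {
        let mut out = String::new();
        let mut rest = units;
        if rest.is_empty() {
            // Assumption: an empty write keeps any saved surrogate waiting.
            return out;
        }

        if let Some(lead) = self.pending_lead.take() {
            if is_trail(rest[0]) {
                // Saved lead + incoming trail: emit the combined character.
                out.push(combine(lead, rest[0]));
                rest = &rest[1..];
            } else {
                // Any other input: the saved lead was unpaired.
                out.push('\u{FFFD}');
            }
        }

        // A lead surrogate at the very end cannot be converted yet; save it.
        if let Some((&last, init)) = rest.split_last() {
            if is_lead(last) {
                self.pending_lead = Some(last);
                rest = init;
            }
        }

        // Everything else converts now; unpaired surrogates become U+FFFD.
        out.extend(
            std::char::decode_utf16(rest.iter().copied())
                .map(|r| r.unwrap_or('\u{FFFD}')),
        );
        out
    }

    /// Call when non-document.write input arrives or the script finishes:
    /// a saved lead surrogate is dropped and replaced by U+FFFD.
    pub fn flush(&mut self) -> Option<char> {
        self.pending_lead.take().map(|_| '\u{FFFD}')
    }
}
```

In this sketch, writing `[0x0061, 0xD83D]` followed by `[0xDE00, 0x0062]` yields `"a"` and then U+1F600 followed by `"b"`, while a saved lead followed by a non-surrogate unit yields U+FFFD before that unit.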
