Skip to content

Recommend avoiding invalid surrogate pairs #12

@GULPF

Description

@GULPF

Unicode code points in the range U+D800 to U+DFFF are used for encoding UTF-16 surrogate pairs, and can not occur on their own in a valid Unicode sequence. However, the \uHHHH escape sequence can be used to construct such a an invalid Unicode sequence, e.g "\uDEAD".

The interpretation of invalid Unicode sequences like "\uDEAD" is not specified in the JSON, JSON5 or ES5 (I think) specs. However, it is specified in the ES6 spec:

A code unit that is in the range 0xD800 to 0xDFFF, but is not part of a surrogate pair, is interpreted as a code point with the same value.

This also matches the behavior of the JSON5 reference implementation. I think it's best if this is also clarified in the JSON5 spec.

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions