`SER` breaks high Unicode values. #153

According to [the specification section 4.1.2](https://python-processing-unit.github.io/Prefix/SPECIFICATION.html#412-escape-sequences):

> The character `\` MUST begin an escape sequence in a `STR` literal.

> - `\uHHHH` = exactly four hexadecimal digits (`0-9`|`A-F`|`a-f`). Produces code point U+HHHH.
>
> - `\UHHHHHHHH` = exactly eight hexadecimal digits (`0-9`|`A-F`|`a-f`). Produces code point U+HHHHHHHH.

The `SER` builtin serializes strings to JSON format. In the implementation, however, `jb_append_json_string` in `builtins.c` walks the UTF-8 string byte by byte and escapes every byte >= 0x7F as `\u00xx`, using the raw byte value rather than the decoded code point. A multi-byte UTF-8 sequence therefore turns into several escapes: U+00E9 `é`, encoded as the bytes 0xC3 0xA9, is emitted as `\u00c3\u00a9` instead of a single Unicode escape for the code point. The test `ser-strings.pre` expects `SER("\u00E9")` to produce `\u00E9` (or the lowercase variant), so its assertion fails.
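One possible direction for a fix, as a minimal sketch and not the actual `builtins.c` code: decode each UTF-8 sequence to a code point first, then emit one escape per code point. The helper names `utf8_decode` and `json_escape_utf8`, the fixed output buffer, and the choice to replace malformed input with U+FFFD are all assumptions made for illustration. Because JSON `\u` escapes are UTF-16 code units, the sketch writes code points above U+FFFF as a surrogate pair.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (n >= 1 bytes available).
 * Stores the code point in *cp and returns the number of bytes consumed.
 * Malformed, overlong, or surrogate input yields U+FFFD and consumes one byte. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len;
    uint32_t c, min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else { *cp = 0xFFFD; return 1; }

    if (len > n) { *cp = 0xFFFD; return 1; }
    for (size_t i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }
    *cp = c;
    return len;
}

/* Hypothetical replacement for the byte-wise loop: writes the JSON-escaped
 * body of `in` (without surrounding quotes) into `out`, NUL-terminated.
 * Stops early if the next escape would not fit. Returns the bytes written. */
static size_t json_escape_utf8(const char *in, size_t in_len,
                               char *out, size_t out_cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, o = 0;

    if (out_cap == 0) return 0;
    while (i < in_len) {
        uint32_t cp;
        char buf[13]; /* longest output: a 12-character surrogate pair + NUL */
        int w;

        i += utf8_decode(s + i, in_len - i, &cp);
        if (cp == '"' || cp == '\\') {
            w = snprintf(buf, sizeof buf, "\\%c", (int)cp);
        } else if (cp >= 0x20 && cp < 0x7F) {
            w = snprintf(buf, sizeof buf, "%c", (int)cp);
        } else if (cp <= 0xFFFF) {
            /* One escape for the whole code point, not one per byte. */
            w = snprintf(buf, sizeof buf, "\\u%04x", (unsigned)cp);
        } else {
            /* Outside the BMP: split into a UTF-16 surrogate pair. */
            uint32_t v = cp - 0x10000;
            w = snprintf(buf, sizeof buf, "\\u%04x\\u%04x",
                         (unsigned)(0xD800 + (v >> 10)),
                         (unsigned)(0xDC00 + (v & 0x3FF)));
        }
        if (o + (size_t)w >= out_cap) break;
        memcpy(out + o, buf, (size_t)w);
        o += (size_t)w;
    }
    out[o] = '\0';
    return o;
}
```

With this sketch, calling `json_escape_utf8("\xC3\xA9", 2, out, sizeof out)` writes `\u00e9` into `out`, which is the lowercase variant the test accepts.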
