Clarify UNICODE_ESCAPE valid token value #2123

ehuss · 2025-12-20T21:50:09Z

This clarifies in the UNICODE_ESCAPE rule that the hex value must be a valid Unicode scalar value. A string like `"\u{ffffff}"` is not a valid token, but the grammar did not reflect that; this change resolves that discrepancy.

I don't see a practical way to define this with character ranges; the resulting expression is huge.
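
For illustration, the restriction can be stated as a check on the numeric value rather than as character ranges. The sketch below is only an assumption about how one might express it: `is_scalar_value_hex` is a hypothetical helper, not part of the reference or of this PR, and it handles only the bare hex digits between the braces. It relies on `char::from_u32`, which returns `None` for surrogates (`0xD800` to `0xDFFF`) and for values above `0x10FFFF`.

```rust
/// Hypothetical helper (not from the reference): returns true if `hex`,
/// the digits between the braces of a `\u{...}` escape, names a valid
/// Unicode scalar value.
pub fn is_scalar_value_hex(hex: &str) -> bool {
    // Assumption for this sketch: 1 to 6 plain ASCII hex digits, nothing else.
    if hex.is_empty() || hex.len() > 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return false;
    }
    match u32::from_str_radix(hex, 16) {
        // `char::from_u32` rejects surrogates and anything above 0x10FFFF.
        Ok(value) => char::from_u32(value).is_some(),
        Err(_) => false,
    }
}
```

Under these assumptions, `"41"` and `"10ffff"` are accepted, while `"d800"` (a surrogate) and `"ffffff"` (out of range) are rejected.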

Note that this restriction means that the UNICODE_ESCAPE rule will not match an invalid value, and that in all the places where UNICODE_ESCAPE is used, the preceding character must *not* be `\`, which forces those rules to fail their match. In turn, the only rules that contain UNICODE_ESCAPE have `'` or `"` characters, which won't match any other rule in the grammar, forcing them to fail the parse.

If all those assumptions seem too fragile, we can consider adding the [cut operator](rust-lang#2104) just after the `\u`, so that it is clear that a failure to match the part from the opening brace onward is an immediate parse failure.


rustbot added the S-waiting-on-review label (Status: The marked PR is awaiting review from a maintainer) on Dec 20, 2025
