Clarify UNICODE_ESCAPE valid token value #2123
Open
+4
−2
This clarifies the UNICODE_ESCAPE rule to state that the hex value must be a valid Unicode scalar value. This resolves the problem that a string like `"\u{ffffff}"` is not a valid token, but the grammar did not reflect that.

I don't see a practical way to define this with character ranges. The resulting expression is huge.

Note that this restriction means that the UNICODE_ESCAPE rule will not match an invalid value. In all the places where UNICODE_ESCAPE is used, no other alternative can match the leading `\`, which forces those rules to fail their match. In turn, the only rules that contain UNICODE_ESCAPE have `'` or `"` characters, which won't match any other rule in the grammar, forcing them to fail the parse.

If all those assumptions seem too fragile, then we can consider adding the cut operator just after the `\u`, so that the interpretation is clear: a failure to match the part from the opening brace onward is an immediate parse failure.
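
For reference, the scalar-value restriction described above can be sketched as a small check. This is only an illustration, not the grammar change itself: `is_unicode_scalar_value` and `parse_unicode_escape` are hypothetical helper names, and the sketch assumes 1 to 6 hex digits and ignores the `_` separators the lexer allows.

```rust
/// Returns true if `value` is a Unicode scalar value, i.e. any code point
/// in 0..=0x10FFFF except the surrogate range 0xD800..=0xDFFF.
fn is_unicode_scalar_value(value: u32) -> bool {
    matches!(value, 0..=0xD7FF | 0xE000..=0x10FFFF)
}

/// Hypothetical helper: takes the source text of a `\u{...}` escape and
/// returns the character it denotes, or `None` if the escape is invalid.
/// Simplification: `_` separators inside the braces are not handled.
fn parse_unicode_escape(escape: &str) -> Option<char> {
    // Strip the `\u{` prefix and the `}` suffix.
    let digits = escape.strip_prefix("\\u{")?.strip_suffix('}')?;

    // Assumed limit: between 1 and 6 hex digits.
    if digits.is_empty()
        || digits.len() > 6
        || !digits.chars().all(|c| c.is_ascii_hexdigit())
    {
        return None;
    }

    let value = u32::from_str_radix(digits, 16).ok()?;

    // `char::from_u32` returns `None` for anything that is not a
    // Unicode scalar value, matching `is_unicode_scalar_value`.
    char::from_u32(value)
}
```

Under these assumptions, `\u{ffffff}` has a well-formed shape (six hex digits) but is rejected because its value is above `0x10FFFF`, and `\u{d800}` is rejected because it is a surrogate. That value check is the part that is impractical to express as character ranges in the grammar.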