Accept underscores in unicode escapes #43716

Merged
merged 1 commit into rust-lang:master from MaloJaffre:_-in-literals on Sep 12, 2017


@MaloJaffre
Contributor

MaloJaffre commented Aug 7, 2017

Fixes #43692.

I don't know if this needs an RFC, but at least the impl is here!

@rust-highfive

Collaborator

rust-highfive commented Aug 7, 2017

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @nikomatsakis (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

@MaloJaffre MaloJaffre force-pushed the MaloJaffre:_-in-literals branch from 47416d3 to 7d1fa02 Aug 7, 2017

///
/// At this point, we have already seen the \ and the u, the { is the current character. We
/// will read at least one digit, and up to 6, and pass over the }.
/// At this point, we have already seen the `\` and the `u`, the `{` is the current character. We

@kennytm

kennytm Aug 7, 2017

Member

The CI is unhappy with this line because it is too long 🙂

[00:03:14] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:968: line longer than 100 chars
[00:03:15] some tidy checks failed
@kennytm

Member

kennytm commented Aug 7, 2017

issue-43692.rs should be a run-pass test, not a compile-fail test. It should check (assert_eq) if '\u{10__FFFF}' and "\u{10_F0FF}foo\u{1_0_0_0}" are equivalent to their no-underscore counterparts.
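A minimal sketch of what such a run-pass test could look like, using exactly the literals from the comment above. It assumes a compiler that includes this change (underscores inside `\u{...}` are skipped, as in integer literals); the layout of the actual test file in the PR may differ.

```rust
// Sketch of a run-pass test for issue #43692: each escape containing
// underscores must denote the same character as its underscore-free form.
fn main() {
    // Six hex digits (1, 0, F, F, F, F) separated by a double underscore.
    assert_eq!('\u{10__FFFF}', '\u{10FFFF}');

    // Underscores in escapes inside a string literal, including one
    // between every digit.
    assert_eq!("\u{10_F0FF}foo\u{1_0_0_0}", "\u{10F0FF}foo\u{1000}");
}
```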

@arielb1

Contributor

arielb1 commented Aug 8, 2017

@petrochenkov

Contributor

petrochenkov commented Aug 9, 2017

@SimonSapin what do you think about this?

Also ping @rust-lang/lang
This probably doesn't need a full RFC, but will certainly need an FCP.

@SimonSapin

Contributor

SimonSapin commented Aug 9, 2017

Making numeric escape sequences consistent with integer literals makes sense to me. 👍

@joshtriplett

Member

joshtriplett commented Aug 9, 2017

Please don't allow prefixes or suffixes, but otherwise this seems like a great idea.

@petrochenkov

Contributor

petrochenkov commented Aug 12, 2017

@MaloJaffre
Could you somehow share the lexing code between unicode escapes and normal hexadecimal literals to ensure the rules are identical?
For example, scan_digits can be reused for unicode escapes.
Unicode escapes are more restrictive, but the restrictions could be enforced after a unicode escape is lexed (this can also give better error reporting and recovery).
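The suggested structure (one shared digit scanner, escape-specific restrictions checked afterwards) could be sketched as follows. This is a standalone illustration over `&str`, not the libsyntax lexer: `scan_digits` here is a hypothetical stand-in for the lexer's function of the same name, `lex_unicode_escape` is a hypothetical name, and the error strings are illustrative rather than rustc's actual diagnostics.

```rust
/// Hypothetical stand-in for the lexer's `scan_digits`: consumes digits of
/// `radix` and `_` separators from the front of `input`. Returns the digits
/// with underscores stripped, plus the number of bytes consumed.
fn scan_digits(input: &str, radix: u32) -> (String, usize) {
    let mut digits = String::new();
    let mut consumed = 0;
    for c in input.chars() {
        if c == '_' {
            // Separators are skipped, exactly as in integer literals.
        } else if c.is_digit(radix) {
            digits.push(c);
        } else {
            break;
        }
        consumed += c.len_utf8();
    }
    (digits, consumed)
}

/// Lexes the body of `\u{...}` (the text after `\u{`) by reusing the shared
/// scanner, then enforces the escape-only restrictions after scanning.
fn lex_unicode_escape(body: &str) -> Result<char, String> {
    // Restriction 1: no leading `_` (integer literals like `0x_1` differ here).
    if body.starts_with('_') {
        return Err("leading `_` is not allowed in unicode escape".to_string());
    }
    let (digits, consumed) = scan_digits(body, 16);
    // Restriction 2: the digits must be followed by the closing brace.
    if !body[consumed..].starts_with('}') {
        return Err("invalid or unterminated unicode escape".to_string());
    }
    // Restriction 3: between one and six hex digits.
    if digits.is_empty() {
        return Err("empty unicode escape".to_string());
    }
    if digits.len() > 6 {
        return Err("overlong unicode escape".to_string());
    }
    // At most six hex digits always fit in a u32, so this cannot fail.
    let value = u32::from_str_radix(&digits, 16).unwrap();
    // Restriction 4: must be a Unicode scalar value (no surrogates, <= 10FFFF).
    char::from_u32(value).ok_or_else(|| "invalid unicode character escape".to_string())
}

fn main() {
    assert_eq!(lex_unicode_escape("10__FFFF}"), Ok('\u{10FFFF}'));
    assert!(lex_unicode_escape("_10}").is_err());
    assert!(lex_unicode_escape("1234567}").is_err());
}
```

Checking the restrictions after the shared scan is what makes the digit/underscore rules identical to integer literals by construction, and it gives each failure its own error message.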

@MaloJaffre MaloJaffre force-pushed the MaloJaffre:_-in-literals branch from 14085a8 to 0bac86c Aug 13, 2017

@MaloJaffre

Contributor Author

MaloJaffre commented Aug 13, 2017

Thanks for the suggestion @petrochenkov.
I've also rebased on master.

Edit: Travis failure looks spurious (workers failed to start)

loop {
match self.ch {
Some('}') => {
if valid && count == 0 {

@petrochenkov

petrochenkov Aug 17, 2017

Contributor

if count == 0 would give the same result

@MaloJaffre

MaloJaffre Aug 17, 2017

Author Contributor

No, because in the case \u{#}, we don't want to say that the escape is empty, so we check that there were no invalid characters before.

@petrochenkov

petrochenkov Aug 17, 2017

Contributor

Ah, right, this is in a loop, okay then.
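A simplified, standalone sketch of why the loop checks `valid && count == 0` rather than `count == 0`: for `\u{#}` the digit count is also zero, but the right diagnosis is the invalid character, not an empty escape. The `EscapeError` enum and `check_escape` function are hypothetical illustrations, not the lexer's real types, and leading-underscore handling is left out to keep the focus on this one check.

```rust
/// Hypothetical diagnostics for this sketch only.
#[derive(Debug, PartialEq)]
enum EscapeError {
    InvalidChar(char),
    Empty,
}

/// Scans the body of `\u{...}` (text after `\u{`) up to the closing `}`.
fn check_escape(body: &str) -> Vec<EscapeError> {
    let mut errors = Vec::new();
    let mut count = 0; // hex digits seen so far
    let mut valid = true; // no invalid character seen so far
    for c in body.chars() {
        match c {
            '}' => {
                // With `count == 0` alone, `\u{#}` would get a second,
                // misleading "empty escape" error on top of the real one.
                if valid && count == 0 {
                    errors.push(EscapeError::Empty);
                }
                break;
            }
            '_' => {} // separators do not count as digits
            c if c.is_ascii_hexdigit() => count += 1,
            c => {
                valid = false;
                errors.push(EscapeError::InvalidChar(c));
            }
        }
    }
    errors
}

fn main() {
    assert_eq!(check_escape("}"), vec![EscapeError::Empty]); // `\u{}`
    assert_eq!(check_escape("#}"), vec![EscapeError::InvalidChar('#')]); // `\u{#}`
    assert!(check_escape("1_F}").is_empty()); // `\u{1_F}`
}
```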

self.err_span_char(start_bpos,
self.pos,
"invalid character in unicode escape",
c);

@petrochenkov

petrochenkov Aug 17, 2017

Contributor

This error can now be reported a lot of times in case of unterminated unicode escapes.
It probably should be reported only the first time.
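One way to get "report only the first time" while still scanning ahead for the closing brace is to gate the diagnostic on the same `valid` flag. Below is a minimal standalone sketch, assuming diagnostics are collected as strings; `lex_escape_body` is a hypothetical name, and the real lexer reports through `err_span_char` with spans instead.

```rust
/// Scans an escape body (the text after `\u{`) up to `}` or end of input.
/// Returns the bytes consumed and the diagnostics emitted; at most one
/// "invalid character" error is produced even if the escape is unterminated
/// and the scan runs over many non-hex characters.
fn lex_escape_body(body: &str) -> (usize, Vec<String>) {
    let mut diagnostics = Vec::new();
    let mut valid = true; // flips to false at the first bad character
    let mut consumed = 0;
    for c in body.chars() {
        if c == '}' {
            break;
        }
        if c != '_' && !c.is_ascii_hexdigit() {
            if valid {
                diagnostics.push(format!("invalid character in unicode escape: {:?}", c));
            }
            valid = false;
        }
        consumed += c.len_utf8();
    }
    (consumed, diagnostics)
}

fn main() {
    // Unterminated escape: the scan continues past the intended end of the
    // literal, but only the first offending character is reported.
    let (consumed, diagnostics) = lex_escape_body("12\"; let x = 1;");
    assert_eq!(diagnostics.len(), 1);
    println!("consumed {} bytes: {:?}", consumed, diagnostics);
}
```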

diag.struct_span_err(span, "invalid unicode character escape")
.help("unicode escape must be at most 10FFFF")
.emit();
None

@petrochenkov

petrochenkov Aug 17, 2017

Contributor

I think you can avoid changing the return type to Option here and just return something like the replacement character U+FFFD.
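A minimal sketch of the suggestion: emit the diagnostic but still return a `char`, falling back to U+FFFD (REPLACEMENT CHARACTER), so callers don't need to handle `Option<char>` and lexing can continue. The function name `escape_value_to_char` and the `Vec<String>` diagnostic sink are hypothetical; the real code reports through the session's diagnostic handler.

```rust
/// Converts a scanned escape value to a `char`. On failure it records a
/// diagnostic and returns U+FFFD instead of `None`, so the caller keeps a
/// plain `char` and error recovery stays simple.
fn escape_value_to_char(value: u32, diagnostics: &mut Vec<String>) -> char {
    match char::from_u32(value) {
        Some(c) => c,
        None => {
            diagnostics.push("invalid unicode character escape".to_string());
            '\u{FFFD}' // REPLACEMENT CHARACTER
        }
    }
}

fn main() {
    let mut diagnostics = Vec::new();
    let ok = escape_value_to_char(0x1F600, &mut diagnostics);
    let bad = escape_value_to_char(0x110000, &mut diagnostics);
    assert_eq!(ok, '\u{1F600}');
    assert_eq!(bad, '\u{FFFD}');
    assert_eq!(diagnostics.len(), 1);
}
```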

@petrochenkov

petrochenkov Aug 17, 2017

Contributor

@MaloJaffre
Could you also squash commits after updating the PR?

@MaloJaffre

MaloJaffre Aug 17, 2017

Author Contributor

Thanks for the review @petrochenkov!

Ok, I will shortly do another round of changes and squash everything.

@petrochenkov

Contributor

petrochenkov commented Aug 17, 2017

Implementation LGTM, modulo comments.

@rfcbot fcp merge

@petrochenkov

Contributor

petrochenkov commented Aug 17, 2017

I have no rights for @rfcbot, could someone start an FCP?

@MaloJaffre MaloJaffre force-pushed the MaloJaffre:_-in-literals branch from 0bac86c to d4e0e52 Aug 17, 2017

@MaloJaffre

Contributor Author

MaloJaffre commented Aug 17, 2017

@petrochenkov Done.
I've also added a more precise help message for surrogates.

Edit: Travis failure looks spurious (OSX jobs failed to start).
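A more precise help message for surrogates amounts to splitting the "not a valid `char`" case in two: values in U+D800..=U+DFFF (surrogates) and values above U+10FFFF. Below is a standalone sketch of that split. The "at most 10FFFF" wording is taken from the diff above; the surrogate wording and the `escape_help` name are hypothetical, not necessarily what the PR uses.

```rust
/// Chooses a help message for an escape value that is not a Unicode scalar
/// value, or returns `None` if the value is a valid `char`.
fn escape_help(value: u32) -> Option<&'static str> {
    match value {
        // UTF-16 surrogate code points are not valid `char`s.
        0xD800..=0xDFFF => Some("unicode escape must not be a surrogate"),
        // Wording taken from the diff under review.
        v if v > 0x10FFFF => Some("unicode escape must be at most 10FFFF"),
        _ => None,
    }
}

fn main() {
    for &value in &[0x41u32, 0xD800, 0xDFFF, 0x10FFFF, 0x110000] {
        // The help is given exactly when `char::from_u32` rejects the value.
        assert_eq!(escape_help(value).is_none(), char::from_u32(value).is_some());
        println!("{:#X}: {:?}", value, escape_help(value));
    }
}
```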

@MaloJaffre

Contributor Author

MaloJaffre commented Aug 25, 2017

Friendly ping @nikomatsakis to start an FCP, if there are no concerns about the implementation.

@petrochenkov

Contributor

petrochenkov commented Aug 25, 2017

@MaloJaffre
There's one more thing that I forgot about: this needs a feature gate (unless the lang team decides it doesn't), #![feature(unicode_escape_underscores)] or something.
See 50ecee2 for an example of how to add it.
"Parse session" is available from the lexer, so it shouldn't be a problem (I think).

@SimonSapin

Contributor

SimonSapin commented Aug 25, 2017

I think this is fine without a feature gate. (Though I’m not in any team that would make that decision.)

@aturon

Member

aturon commented Aug 25, 2017

@rfcbot fcp merge

@rfcbot


rfcbot commented Aug 25, 2017

Team member @aturon has proposed to merge this. The next step is review by the rest of the tagged teams:

No concerns currently listed.

Once these reviewers reach consensus, this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

self.bump();
count += 1;
if let Some('_') = self.ch {
// disallow leading `_`

@pnkfelix

pnkfelix Aug 30, 2017

Member

do we need a compile-fail test checking that leading _ is disallowed?

@MaloJaffre

MaloJaffre Aug 30, 2017

Author Contributor

There is already a parse-fail test that checks that, do I need to move it to compile-fail?

@rfcbot


rfcbot commented Sep 1, 2017

🔔 This is now entering its final comment period, as per the review above. 🔔

@rfcbot


rfcbot commented Sep 11, 2017

The final comment period is now complete.

@petrochenkov

Contributor

petrochenkov commented Sep 11, 2017

The final comment period is now complete.

@bors r+

@bors

Contributor

bors commented Sep 11, 2017

📌 Commit d4e0e52 has been approved by petrochenkov

@bors

Contributor

bors commented Sep 12, 2017

⌛️ Testing commit d4e0e52 with merge 11f64d8...

bors added a commit that referenced this pull request Sep 12, 2017

Auto merge of #43716 - MaloJaffre:_-in-literals, r=petrochenkov
Accept underscores in unicode escapes

Fixes #43692.

I don't know if this needs an RFC, but at least the impl is here!
@bors

Contributor

bors commented Sep 12, 2017

☀️ Test successful - status-appveyor, status-travis
Approved by: petrochenkov
Pushing 11f64d8 to master...

@bors bors merged commit d4e0e52 into rust-lang:master Sep 12, 2017

1 of 2 checks passed

continuous-integration/travis-ci/pr The Travis CI build could not complete due to an error
homu Test successful

@MaloJaffre MaloJaffre deleted the MaloJaffre:_-in-literals branch Sep 12, 2017

@chris-morgan

Member

chris-morgan commented Sep 20, 2017

It is worth noting that most syntax highlighters will need updating to support this. (I just did Vim.)

We need something like a mailing list for syntax highlighters where syntax changes can be announced.

Regular expression highlighters will now need something like \\u\{(?:\x_*){1,6}\}.
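For highlighters or tools that match escapes by hand rather than by regex, the same pattern can be checked without a regex engine. Below is a minimal sketch (hypothetical function name) that mirrors `\\u\{(?:\x_*){1,6}\}`: one to six groups, each a hex digit followed by any number of underscores. As a consequence, a leading `_` is rejected while trailing underscores are accepted. Like the regex, it does not check that the value is at most 10FFFF or outside the surrogate range.

```rust
/// Returns true if `s` is exactly a `\u{...}` escape matching the regex
/// `\\u\{(?:\x_*){1,6}\}`.
fn matches_unicode_escape(s: &str) -> bool {
    let body = match s.strip_prefix("\\u{").and_then(|rest| rest.strip_suffix('}')) {
        Some(body) => body,
        None => return false,
    };
    // Every group starts with a hex digit, so the body must too; this also
    // rejects an empty body.
    if !body.starts_with(|c: char| c.is_ascii_hexdigit()) {
        return false;
    }
    // Each hex digit opens a new group; underscores extend the current one.
    let mut groups = 0;
    for c in body.chars() {
        if c.is_ascii_hexdigit() {
            groups += 1;
        } else if c != '_' {
            return false;
        }
    }
    groups <= 6
}

fn main() {
    assert!(matches_unicode_escape("\\u{10__FFFF}"));
    assert!(matches_unicode_escape("\\u{1F_}"));
    assert!(!matches_unicode_escape("\\u{_10}"));
    assert!(!matches_unicode_escape("\\u{1234567}"));
}
```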

chris-morgan added a commit to rust-lang/rust.vim that referenced this pull request Sep 20, 2017

gnomesysadmins pushed a commit to GNOME/gtksourceview that referenced this pull request Jun 13, 2018
