introduce unescape module #60261

Merged
merged 2 commits into from May 6, 2019

8 participants
@matklad
Member

commented Apr 25, 2019

A WIP PR to gauge early feedback

Currently, we deal with escape sequences twice: once when we lex a string, and a second time when we unescape literals. Note that we also produce different sets of diagnostics in these two cases.

This PR aims to remove this duplication, by introducing a new unescape module as a single source of truth for character escaping rules.

I think this would be a useful cleanup by itself, but I also need this for #59706.

In its current state, the PR has an unescape module which fully (modulo bugs) deals with string and char literals. I am quite happy about the state of this module.

What this PR doesn't have yet are:

  • handling of byte and byte string literals (should be simple to add)
  • good diagnostics
  • actual removal of code from lexer (giant scan_char_or_byte should go away completely)
  • performance check
  • general cleanup of the new code

Diagnostics will be the most labor-consuming bit here, but they are mostly a question of correctly adjusting spans to sub-tokens. The current setup for diagnostics is that unescape produces a plain old enum describing the various problems, and these are rendered into the Handler separately. This bit is not actually required (it is possible to just pass the Handler in), but I like the separation between diagnostics and logic this approach imposes, and such separation should again be useful for #59706.
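To make the "plain enum plus separate rendering" idea concrete, here is a minimal sketch, not the code from this PR. Apart from `OutOfRangeHexEscape`, which appears in the CI log further down, every name here (`EscapeError` and its other variants, `unescape_str`, `scan_escape`, `hex_digit`, `render`) is a hypothetical stand-in, and only a handful of escapes are handled.

```rust
use std::ops::Range;
use std::str::Chars;

/// Hypothetical error enum: plain data, no spans or handlers attached.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum EscapeError {
    LoneSlash,
    InvalidEscape,
    TooShortHexEscape,
    InvalidCharInHexEscape,
    OutOfRangeHexEscape,
}

/// Walks the body of a string literal (quotes already stripped) and reports
/// each unescaped char, or an error, with its byte range inside `src`.
fn unescape_str<F>(src: &str, mut callback: F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut chars = src.chars();
    loop {
        let start = src.len() - chars.as_str().len();
        let c = match chars.next() {
            Some(c) => c,
            None => break,
        };
        let res = if c == '\\' { scan_escape(&mut chars) } else { Ok(c) };
        let end = src.len() - chars.as_str().len();
        callback(start..end, res);
    }
}

/// Parses what follows a backslash (simplified: no `\u{...}` support).
fn scan_escape(chars: &mut Chars<'_>) -> Result<char, EscapeError> {
    match chars.next().ok_or(EscapeError::LoneSlash)? {
        'n' => Ok('\n'),
        't' => Ok('\t'),
        'r' => Ok('\r'),
        '0' => Ok('\0'),
        '\\' => Ok('\\'),
        '"' => Ok('"'),
        '\'' => Ok('\''),
        'x' => {
            let hi = hex_digit(chars)?;
            let lo = hex_digit(chars)?;
            let value = hi * 16 + lo;
            // In str and char literals, \x escapes are limited to ASCII.
            if value > 0x7F {
                return Err(EscapeError::OutOfRangeHexEscape);
            }
            Ok(value as u8 as char)
        }
        _ => Err(EscapeError::InvalidEscape),
    }
}

fn hex_digit(chars: &mut Chars<'_>) -> Result<u32, EscapeError> {
    let c = chars.next().ok_or(EscapeError::TooShortHexEscape)?;
    c.to_digit(16).ok_or(EscapeError::InvalidCharInHexEscape)
}

/// Rendering lives apart from the unescaping logic. A compiler would emit
/// into its diagnostics handler here; this sketch just builds a String.
fn render(src: &str, range: Range<usize>, err: EscapeError) -> String {
    let text = &src[range.clone()];
    format!("error: {:?} at {}..{} ({:?})", err, range.start, range.end, text)
}
```

A caller passes a closure to `unescape_str` and decides per item whether to keep the char, record the error, or hand the error and its range to `render`. Because the ranges are relative to the literal body, the caller can add the literal's start offset to map them onto sub-token spans.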

cc @eddyb , @petrochenkov

@rust-highfive

Collaborator

commented Apr 25, 2019

r? @eddyb

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive

Collaborator

commented Apr 25, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:02400c06:start=1556199436669093120,finish=1556199437433886678,duration=764793558
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:55] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:55] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:1456: line longer than 100 chars
[00:03:55] tidy error: /checkout/src/libsyntax/parse/unescape_error_reporting.rs:10: line longer than 100 chars
[00:03:55] tidy error: /checkout/src/libsyntax/parse/unescape_error_reporting.rs:57: TODO is deprecated; use FIXME
[00:03:57] some tidy checks failed
[00:03:57] 
[00:03:57] 
[00:03:57] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:57] 
[00:03:57] 
[00:03:57] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:57] Build completed unsuccessfully in 0:00:48
[00:03:57] Build completed unsuccessfully in 0:00:48
[00:03:57] Makefile:67: recipe for target 'tidy' failed
[00:03:57] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:0b5480ef
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Apr 25 13:41:26 UTC 2019
---
travis_time:end:033982d0:start=1556199687689453400,finish=1556199687694864454,duration=5411054
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:04cfeb68
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:06b813f7
travis_time:start:06b813f7
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:0401c513
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@matklad changed the title from "introduce unescape module" to "WIP: introduce unescape module" on Apr 25, 2019

@rust-highfive

Collaborator

commented Apr 25, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:10f74b48:start=1556220154917281636,finish=1556220157524918567,duration=2607636931
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:38] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:38] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:1456: line longer than 100 chars
[00:03:38] tidy error: /checkout/src/libsyntax/parse/unescape_error_reporting.rs:10: line longer than 100 chars
[00:03:38] tidy error: /checkout/src/libsyntax/parse/unescape_error_reporting.rs:57: TODO is deprecated; use FIXME
[00:03:40] some tidy checks failed
[00:03:40] 
[00:03:40] 
[00:03:40] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:40] 
[00:03:40] 
[00:03:40] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:40] Build completed unsuccessfully in 0:00:44
[00:03:40] Build completed unsuccessfully in 0:00:44
[00:03:40] Makefile:67: recipe for target 'tidy' failed
[00:03:40] make: *** [tidy] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:1600b5d4
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Thu Apr 25 19:26:28 UTC 2019
---
travis_time:end:0c392e75:start=1556220389503326639,finish=1556220389507923003,duration=4596364
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:23cccc3a
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:04596812
travis_time:start:04596812
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:2d50cf26
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@rust-highfive

Collaborator

commented Apr 28, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:065c394c:start=1556464050685635777,finish=1556464144389422280,duration=93703786503
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---

[00:03:32] travis_fold:start:tidy
travis_time:start:tidy
tidy check
[00:03:32] tidy error: /checkout/src/libsyntax/parse/lexer/mod.rs:1181: line longer than 100 chars
[00:03:32] tidy error: /checkout/src/libsyntax/parse/unescape_error_reporting.rs:10: line longer than 100 chars
[00:03:32] tidy error: /checkout/src/libsyntax/parse/unescape_error_reporting.rs:57: TODO is deprecated; use FIXME
[00:03:34] some tidy checks failed
[00:03:34] 
[00:03:34] 
[00:03:34] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/tidy" "/checkout/src" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0/bin/cargo" "--no-vendor" "--quiet"
[00:03:34] 
[00:03:34] 
[00:03:34] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test src/tools/tidy
[00:03:34] Build completed unsuccessfully in 0:00:44
[00:03:34] Build completed unsuccessfully in 0:00:44
[00:03:34] make: *** [tidy] Error 1
[00:03:34] Makefile:67: recipe for target 'tidy' failed
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:075162db
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Sun Apr 28 15:12:48 UTC 2019
---
travis_time:end:27d9f214:start=1556464368987353964,finish=1556464368991966173,duration=4612209
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:05d3e0d4
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:19a1c122
travis_time:start:19a1c122
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:2260a790
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@petrochenkov

Contributor

commented Apr 28, 2019

Meta: for reviewing convenience it's better to update UI test outputs and satisfy tidy to make CI green, even if the changes in test results are temporarily wrong / intended to disappear.
This way it's clear how exactly they are wrong and what still needs to be fixed.

@rust-highfive

Collaborator

commented Apr 28, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:08b9b73e:start=1556477280331010720,finish=1556477363078024064,duration=82747013344
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---
[00:24:48]    Compiling panic_abort v0.0.0 (/checkout/src/libpanic_abort)
[00:24:54]    Compiling rustc-std-workspace-alloc v1.0.0 (/checkout/src/tools/rustc-std-workspace-alloc)
[00:24:54]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:24:54]    Compiling hashbrown v0.3.0
[00:24:55] error: FIXME: OutOfRangeHexEscape
[00:24:55]      |
[00:24:55]      |
[00:24:55] 1349 |         let s = CString::new(&b"abc\x01\x02\n\xE2\x80\xA6\xFF"[..]).unwrap();
[00:24:55] 
[00:24:55] error: aborting due to previous error
[00:24:55] 
[00:24:55] error: Could not compile `std`.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@matklad

Member Author

commented Apr 28, 2019

Apparently, running ./x.py test src/test/ui --bless is not enough to make tests green :(

@matklad force-pushed the matklad:one-escape branch 2 times, most recently from a39c555 to f91fbdc, on Apr 28, 2019

@rust-highfive

Collaborator

commented Apr 28, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:2098fe88:start=1556482979246652902,finish=1556483063838188965,duration=84591536063
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---
[00:26:44]    Compiling rustc-demangle v0.1.10
[00:26:49]    Compiling rustc-std-workspace-alloc v1.0.0 (/checkout/src/tools/rustc-std-workspace-alloc)
[00:26:49]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:26:49]    Compiling hashbrown v0.3.0
[00:26:51] error: this form of character escape may only be used with characters in the range [\x00-\x7f]
[00:26:51]      |
[00:26:51]      |
[00:26:51] 1349 |         let s = CString::new(&b"abc\x01\x02\n\xE2\x80\xA6\xFF"[..]).unwrap();
[00:26:51] 
[00:26:51] error: aborting due to previous error
[00:26:51] 
[00:26:51] error: Could not compile `std`.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@petrochenkov

Contributor

commented Apr 28, 2019

Question: what happens if a literal is lexed, but never "parsed properly"?
For example, if it's passed to a macro that accepts tts and throws them away.
The errors for incorrect escapes, etc, should be reported in that case as well.

(P.S. I haven't reviewed everything yet, will continue tomorrow.)

@petrochenkov

Contributor

commented Apr 28, 2019

@matklad

Member Author

commented Apr 29, 2019

Question: what happens if a literal is lexed, but never "parsed properly"?

Good question! Given the diag: Option<(Span, &Handler)> argument to the char_lit function, I was under the impression that we always parse literals properly. It turns out that even today we don't do that, so the existing code is buggy.

The following compiles, while it shouldn't (the literal \u{FFFFFF} is out of range for char):

macro_rules! erase {
    ($($tt:tt)*) => {}
}

fn main() {
    erase! {
        '\u{FFFFFF}'
    }
}

playground

If we pursue the approach in this PR, then we should run the unescape_* family of functions twice: once in the lexer, where we just report errors and disregard the unescaped characters, and once in the parser, where we do the opposite: ignore errors but collect the unescaped literals. That means we will be able to remove that diag: Option argument (indeed, "optionally" reporting diagnostics seems like a sure way to have bugs).

@matklad

Member Author

commented Apr 29, 2019

Hm, or is the above example expected behavior? We don't check ranges of integer literals, for example:

macro_rules! erase {
    ($($tt:tt)*) => {}
}

fn main() {
    erase!(999u8);
}

For chars, we do check in the lexer that there are at most six hex digits, but we only do the precise check for range and surrogates in the parser, which seems somewhat arbitrary.
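The gap between the two checks can be illustrated with a small sketch. This is not the compiler's code: `UnicodeEscape` and `classify_unicode_escape` are hypothetical names, and underscore handling is simplified. The point is that the digit-count check and the precise range/surrogate check are both purely lexical, with `char::from_u32` covering the latter.

```rust
/// Hypothetical classification of the payload of a `\u{...}` escape.
#[derive(Debug, PartialEq, Eq)]
enum UnicodeEscape {
    Valid(char),
    NotHex,
    TooManyDigits,
    LoneSurrogate,
    OutOfRange,
}

/// `payload` is the text between the braces, e.g. "FFFFFF" for '\u{FFFFFF}'.
fn classify_unicode_escape(payload: &str) -> UnicodeEscape {
    // Underscores are allowed as separators; this sketch simply drops them.
    let digits: String = payload.chars().filter(|&c| c != '_').collect();

    if digits.is_empty() || !digits.chars().all(|c| c.is_ascii_hexdigit()) {
        return UnicodeEscape::NotHex;
    }
    // The coarse check: at most six hex digits.
    if digits.len() > 6 {
        return UnicodeEscape::TooManyDigits;
    }
    // Six hex digits always fit in a u32, so parsing cannot overflow here.
    let value = u32::from_str_radix(&digits, 16).expect("validated hex digits");

    // The precise check: `char::from_u32` rejects surrogates and anything
    // above 0x10FFFF, and needs no information beyond the literal itself.
    match char::from_u32(value) {
        Some(c) => UnicodeEscape::Valid(c),
        None if (0xD800..=0xDFFF).contains(&value) => UnicodeEscape::LoneSurrogate,
        None => UnicodeEscape::OutOfRange,
    }
}
```

Under these assumptions, "FFFFFF" passes the six-digit check but is still classified as out of range, which is exactly the case the erase! example slips through.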

@matklad force-pushed the matklad:one-escape branch from f91fbdc to 072c0fa on Apr 29, 2019

@rust-highfive

Collaborator

commented Apr 29, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:00e56881:start=1556531611442584547,finish=1556531707530144628,duration=96087560081
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---
[00:25:33]    Compiling panic_abort v0.0.0 (/checkout/src/libpanic_abort)
[00:25:38]    Compiling rustc-std-workspace-alloc v1.0.0 (/checkout/src/tools/rustc-std-workspace-alloc)
[00:25:38]    Compiling panic_unwind v0.0.0 (/checkout/src/libpanic_unwind)
[00:25:38]    Compiling hashbrown v0.3.0
[00:25:39] error: this form of character escape may only be used with characters in the range [\x00-\x7f]
[00:25:39]      |
[00:25:39]      |
[00:25:39] 1349 |         let s = CString::new(&b"abc\x01\x02\n\xE2\x80\xA6\xFF"[..]).unwrap();
[00:25:39] 
[00:25:39] error: aborting due to previous error
[00:25:39] 
[00:25:39] error: Could not compile `std`.
---
travis_time:end:2bbd16f7:start=1556533257014088233,finish=1556533257019279371,duration=5191138
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:1349e401
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:06c40e0a
travis_time:start:06c40e0a
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@eddyb

Member

commented Apr 30, 2019

@matklad I think for out of range integer literals, we warn and truncate later in the compilation, but there's not much we can do for char (there is no meaningful truncation to speak of).

That said, maybe literals should be checked by AST -> HIR lowering, i.e. only unescaped once, to store both versions, or just the unescaped one, in HIR?

One notable exception to "AST literals can be opaque, only HIR needs unescaping" is string literals used in attributes (such as #[path = "foo\\bar\\baz.rs"] mod baz;).

@matklad

Member Author

commented Apr 30, 2019

That said, maybe literals should be checked by AST -> HIR lowering, i.e. only unescaped once, to store both versions, or just the unescaped one, in HIR?

I think we should do validation before macro expansion, for two reasons at least:

  • macro_rules that drop tokens can be used to skip AST -> HIR validation, and that seems bad
  • proc_macros should be able to assume that the tokens they get are well-formed

As for actual unescaping, I agree that, abstractly, it makes sense to do it as late as possible. Ideally (though not necessarily in practice), running cargo check should not construct unescaped strings at all. However, changing the place where unescaping occurs is probably out of scope for this PR.

So my current plan is:

  1. run the unescape_* functions twice: once for validation, in the lexer, and once for actual unescaping, in the parser
  2. treat \u{fff_fff} as a lexical error (that will require lang team sign-off and a crater run). After thinking about this more, I have come to the conclusion that chars and integers are really different: to validate ints, we need type inference results, whereas chars can be validated lexically, and something like "hello \u{lone surrogate} world" just doesn't make sense to me
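Step 1 of this plan could look roughly like the sketch below. It is an illustration under assumptions, not the PR's implementation: `unescape_str` here is a deliberately tiny stand-in that knows only four escapes, and `validate`, `unescape`, and `EscapeError` are hypothetical names. Both passes share one scanning routine and differ only in which half of the `Result` they keep.

```rust
use std::ops::Range;

/// Hypothetical, deliberately tiny error set for this sketch.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum EscapeError {
    LoneSlash,
    InvalidEscape,
}

/// Shared scanner: reports each char or error with its byte range in `src`.
fn unescape_str<F>(src: &str, mut callback: F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut iter = src.char_indices().peekable();
    while let Some((start, c)) = iter.next() {
        let res = if c != '\\' {
            Ok(c)
        } else {
            match iter.next() {
                None => Err(EscapeError::LoneSlash),
                Some((_, 'n')) => Ok('\n'),
                Some((_, 't')) => Ok('\t'),
                Some((_, '\\')) => Ok('\\'),
                Some((_, '"')) => Ok('"'),
                Some(_) => Err(EscapeError::InvalidEscape),
            }
        };
        // The item ends where the next char begins, or at the end of input.
        let end = iter.peek().map_or(src.len(), |&(i, _)| i);
        callback(start..end, res);
    }
}

/// Pass 1 (lexer): keep only the errors, discard the unescaped chars.
fn validate(src: &str) -> Vec<(Range<usize>, EscapeError)> {
    let mut errors = Vec::new();
    unescape_str(src, |range, res| {
        if let Err(err) = res {
            errors.push((range, err));
        }
    });
    errors
}

/// Pass 2 (parser): keep only the chars; errors were reported in pass 1.
fn unescape(src: &str) -> String {
    let mut buf = String::with_capacity(src.len());
    unescape_str(src, |_, res| {
        if let Ok(c) = res {
            buf.push(c);
        }
    });
    buf
}
```

With this split, neither pass needs an optional diagnostics argument: the validation pass always reports, and the unescaping pass never does. The cost is scanning each literal twice, which is one reason a performance check is on the to-do list.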

@matklad force-pushed the matklad:one-escape branch from 072c0fa to a3fd9bd on Apr 30, 2019

@rust-highfive

Collaborator

commented Apr 30, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed on Travis (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
travis_time:end:05a691b8:start=1556638388505783147,finish=1556638473400023216,duration=84894240069
$ git checkout -qf FETCH_HEAD
travis_fold:end:git.checkout

Encrypted environment variables have been removed for security reasons.
See https://docs.travis-ci.com/user/pull-requests/#pull-requests-and-security-restrictions
$ export SCCACHE_BUCKET=rust-lang-ci-sccache2
$ export SCCACHE_REGION=us-west-1
$ export GCP_CACHE_BUCKET=rust-lang-ci-cache
$ export AWS_ACCESS_KEY_ID=AKIA46X5W6CZEJZ6XT55
---
[01:14:58] ..............................................ii...i..ii............................................ 3500/5475
[01:15:02] .................................................................................................... 3600/5475
[01:15:06] .................................................................................................... 3700/5475
[01:15:10] .....................................................ii............................................. 3800/5475
[01:15:12] ...........................................................................iF....................... 3900/5475
[01:15:14] .............................................F...................................................... 4000/5475
[01:15:16] ..............F....................i................................................................ 4100/5475
[01:15:30] .................................................................................................... 4300/5475
[01:15:34] .................................................................................................... 4400/5475
[01:15:37] .................................................................................................... 4500/5475
[01:15:41] .................................................................................................... 4600/5475
---
[01:16:15] failures:
[01:16:15] 
[01:16:15] ---- [ui] ui/parser/byte-string-literals.rs stdout ----
[01:16:15] 
[01:16:15] error: /checkout/src/test/ui/parser/byte-string-literals.rs:3: expected error not found: unknown byte escape
[01:16:15] 
[01:16:15] error: /checkout/src/test/ui/parser/byte-string-literals.rs:6: expected error not found: unknown byte escape
[01:16:15] 
[01:16:15] error: /checkout/src/test/ui/parser/byte-string-literals.rs:7: expected error not found: invalid character in numeric character escape: Z
[01:16:15] error: /checkout/src/test/ui/parser/byte-string-literals.rs:8: expected error not found: byte constant must be ASCII
[01:16:15] 
[01:16:15] error: 0 unexpected errors found, 4 expected errors not found
[01:16:15] status: exit code: 1
[01:16:15] status: exit code: 1
[01:16:15] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/parser/byte-string-literals.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/byte-string-literals/a" "-Crpath" "-O" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "continue-parse-after-error" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/byte-string-literals/auxiliary" "-A" "unused"
[01:16:15]     Error {
[01:16:15]         line_num: 3,
[01:16:15]         kind: Some(
[01:16:15]             Error,
[01:16:15]             Error,
[01:16:15]         ),
[01:16:15]         msg: "unknown byte escape",
[01:16:15]     Error {
[01:16:15]         line_num: 6,
[01:16:15]         kind: Some(
[01:16:15]             Error,
[01:16:15]             Error,
[01:16:15]         ),
[01:16:15]         msg: "unknown byte escape",
[01:16:15]     Error {
[01:16:15]         line_num: 7,
[01:16:15]         kind: Some(
[01:16:15]             Error,
[01:16:15]             Error,
[01:16:15]         ),
[01:16:15]         msg: "invalid character in numeric character escape: Z",
[01:16:15]     Error {
[01:16:15]         line_num: 8,
[01:16:15]         kind: Some(
[01:16:15]             Error,
[01:16:15]             Error,
[01:16:15]         ),
[01:16:15]         msg: "byte constant must be ASCII",
[01:16:15] ]
[01:16:15] 
[01:16:15] thread '[ui] ui/parser/byte-string-literals.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:1402:13
[01:16:15] note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
[01:16:15] note: Run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
[01:16:15] 
[01:16:15] ---- [ui] ui/parser/issue-23620-invalid-escapes.rs stdout ----
[01:16:15] diff of stderr:
[01:16:15] 
[01:16:15] 14   --> $DIR/issue-23620-invalid-escapes.rs:10:15
[01:16:15] 15    |
[01:16:15] 16 LL |     let _ = b'\u';
[01:16:15] -    |               ^^ incorrect unicode escape sequence
[01:16:15] -    |
[01:16:15] -    = help: format of unicode escape sequences is `\u{...}`
[01:16:15] - error: unicode escape sequences cannot be used as a byte or in a byte string
[01:16:15] -   --> $DIR/issue-23620-invalid-escapes.rs:10:15
[01:16:15] -    |
[01:16:15] -    |
[01:16:15] - LL |     let _ = b'\u';
[01:16:15] 26 
[01:16:15] 26 
[01:16:15] 27 error: numeric character escape is too short
[01:16:15] -   --> $DIR/issue-23620-invalid-escapes.rs:14:17
[01:16:15] +   --> $DIR/issue-23620-invalid-escapes.rs:14:15
[01:16:15] 29    |
[01:16:15] 29    |
[01:16:15] 30 LL |     let _ = b'\x5';
[01:16:15] +    |               ^^^
[01:16:15] 32 
[01:16:15] 32 
[01:16:15] 33 error: invalid character in numeric character escape: x
[01:16:15] 
[01:16:15] 
[01:16:15] 36 LL |     let _ = b'\xxy';
[01:16:15] 38 
[01:16:15] 38 
[01:16:15] - error: invalid character in numeric character escape: y
[01:16:15] -    |
[01:16:15] -    |
[01:16:15] - LL |     let _ = b'\xxy';
[01:16:15] - 
[01:16:15] - 
[01:16:15] 45 error: numeric character escape is too short
[01:16:15] +   --> $DIR/issue-23620-invalid-escapes.rs:21:14
[01:16:15] 47    |
[01:16:15] 48 LL |     let _ = '\x5';
[01:16:15] -    |                ^
[01:16:15] -    |                ^
[01:16:15] +    |              ^^^
[01:16:15] 50 
[01:16:15] 51 error: invalid character in numeric character escape: x
[01:16:15] 
[01:16:15] 
[01:16:15] 54 LL |     let _ = '\xxy';
[01:16:15] 56 
[01:16:15] 56 
[01:16:15] - error: invalid character in numeric character escape: y
[01:16:15] -    |
[01:16:15] -    |
[01:16:15] - LL |     let _ = '\xxy';
[01:16:15] - 
[01:16:15] 63 error: unicode escape sequences cannot be used as a byte or in a byte string
[01:16:15] 64   --> $DIR/issue-23620-invalid-escapes.rs:28:15
[01:16:15] 65    |
[01:16:15] 65    |
[01:16:15] 
[01:16:15] 66 LL |     let _ = b"\u{a4a4} \xf \u";
[01:16:15] +    |               ^^^^^^^^^^^^^^^
[01:16:15] 68 
[01:16:15] 68 
[01:16:15] 69 error: invalid character in numeric character escape:  
[01:16:15] 
[01:16:15] 73    |                           ^
[01:16:15] 74 
[01:16:15] 75 error: incorrect unicode escape sequence
[01:16:15] 75 error: incorrect unicode escape sequence
[01:16:15] -   --> $DIR/issue-23620-invalid-escapes.rs:28:28
[01:16:15] +   --> $DIR/issue-23620-invalid-escapes.rs:28:15
[01:16:15] 77    |
[01:16:15] 78 LL |     let _ = b"\u{a4a4} \xf \u";
[01:16:15] -    |                            ^^ incorrect unicode escape sequence
[01:16:15] -    |
[01:16:15] -    = help: format of unicode escape sequences is `\u{...}`
[01:16:15] 82 
[01:16:15] - error: unicode escape sequences cannot be used as a byte or in a byte string
[01:16:15] -   --> $DIR/issue-23620-invalid-escapes.rs:28:28
[01:16:15] -    |
[01:16:15] -    |
[01:16:15] - LL |     let _ = b"\u{a4a4} \xf \u";
[01:16:15] - 
[01:16:15] - 
[01:16:15] 89 error: invalid character in numeric character escape:  
[01:16:15] 91    |
[01:16:15] 
[01:16:15] 
[01:16:15] 92 LL |     let _ = "\xf \u";
[01:16:15] 94 
[01:16:15] 94 
[01:16:15] - error: this form of character escape may only be used with characters in the range [\x00-\x7f]
[01:16:15] -    |
[01:16:15] -    |
[01:16:15] - LL |     let _ = "\xf \u";
[01:16:15] - 
[01:16:15] 101 error: incorrect unicode escape sequence
[01:16:15] -   --> $DIR/issue-23620-invalid-escapes.rs:34:18
[01:16:15] +   --> $DIR/issue-23620-invalid-escapes.rs:34:14
[01:16:15] +   --> $DIR/issue-23620-invalid-escapes.rs:34:14
[01:16:15] 103    |
[01:16:15] 104 LL |     let _ = "\xf \u";
[01:16:15] -    |                  ^^ incorrect unicode escape sequence
[01:16:15] -    |
[01:16:15] -    = help: format of unicode escape sequences is `\u{...}`
[01:16:15] 108 
[01:16:15] 109 error: incorrect unicode escape sequence
[01:16:15] 110   --> $DIR/issue-23620-invalid-escapes.rs:39:14
[01:16:15] 
[01:16:15] 
[01:16:15] 111    |
[01:16:15] 112 LL |     let _ = "\u8f";
[01:16:15] -    |              ^^--
[01:16:15] -    |              |
[01:16:15] -    |              help: format of unicode escape sequences uses braces: `\u{8f}`
[01:16:15] 116 
[01:16:15] - error: aborting due to 18 previous errors
[01:16:15] + error: aborting due to 13 previous errors
[01:16:15] 118 
[01:16:15] 118 
[01:16:15] 119 
[01:16:15] 
[01:16:15] 
[01:16:15] The actual stderr differed from the expected stderr.
[01:16:15] Actual stderr saved to /checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/issue-23620-invalid-escapes/issue-23620-invalid-escapes.stderr
[01:16:15] To update references, rerun the tests and pass the `--bless` flag
[01:16:15] To only update this specific test, also pass `--test-args parser/issue-23620-invalid-escapes.rs`
[01:16:15] error: 1 errors occurred comparing output.
[01:16:15] status: exit code: 1
[01:16:15] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/issue-23620-invalid-escapes/a" "-Crpath" "-O" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "continue-parse-after-error" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/issue-23620-invalid-escapes/auxiliary" "-A" "unused"
[01:16:15] ------------------------------------------
[01:16:15] 
[01:16:15] ------------------------------------------
[01:16:15] stderr:
[01:16:15] ------------------------------------------
[01:16:15] error: unicode escape sequences cannot be used as a byte or in a byte string
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:4:15
[01:16:15]    |
[01:16:15] LL |     let _ = b"\u{a66e}";
[01:16:15] 
[01:16:15] error: unicode escape sequences cannot be used as a byte or in a byte string
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:7:15
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b'\u{a66e}';
[01:16:15] 
[01:16:15] error: incorrect unicode escape sequence
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:10:15
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b'\u';
[01:16:15] 
[01:16:15] 
[01:16:15] error: numeric character escape is too short
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b'\x5';
[01:16:15] 
[01:16:15] 
[01:16:15] error: invalid character in numeric character escape: x
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b'\xxy';
[01:16:15] 
[01:16:15] 
[01:16:15] error: numeric character escape is too short
[01:16:15]    |
[01:16:15] LL |     let _ = '\x5';
[01:16:15]    |              ^^^
[01:16:15] 
[01:16:15] 
[01:16:15] error: invalid character in numeric character escape: x
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = '\xxy';
[01:16:15] 
[01:16:15] error: unicode escape sequences cannot be used as a byte or in a byte string
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:28:15
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b"\u{a4a4} \xf \u";
[01:16:15] 
[01:16:15] 
[01:16:15] error: invalid character in numeric character escape:  
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b"\u{a4a4} \xf \u";
[01:16:15] 
[01:16:15] error: incorrect unicode escape sequence
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:28:15
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = b"\u{a4a4} \xf \u";
[01:16:15] 
[01:16:15] 
[01:16:15] error: invalid character in numeric character escape:  
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = "\xf \u";
[01:16:15] 
[01:16:15] error: incorrect unicode escape sequence
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:34:14
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = "\xf \u";
[01:16:15] 
[01:16:15] error: incorrect unicode escape sequence
[01:16:15]   --> /checkout/src/test/ui/parser/issue-23620-invalid-escapes.rs:39:14
[01:16:15]    |
[01:16:15]    |
[01:16:15] LL |     let _ = "\u8f";
[01:16:15] 
[01:16:15] error: aborting due to 13 previous errors
[01:16:15] 
[01:16:15] 
[01:16:15] 
[01:16:15] ------------------------------------------
[01:16:15] 
[01:16:15] 
[01:16:15] ---- [ui] ui/parser/lex-bare-cr-string-literal-doc-comment.rs stdout ----
[01:16:15] 
[01:16:15] error: /checkout/src/test/ui/parser/lex-bare-cr-string-literal-doc-comment.rs:21: unexpected error: '21:15: 21:22: character constant must be escaped: \r'
[01:16:15] 
[01:16:15] error: /checkout/src/test/ui/parser/lex-bare-cr-string-literal-doc-comment.rs:21: expected error not found: bare CR not allowed in string
[01:16:15] error: 1 unexpected errors found, 1 expected errors not found
[01:16:15] status: exit code: 1
[01:16:15] command: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "/checkout/src/test/ui/parser/lex-bare-cr-string-literal-doc-comment.rs" "-Zthreads=1" "--target=x86_64-unknown-linux-gnu" "--error-format" "json" "-Zui-testing" "-C" "prefer-dynamic" "-o" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/lex-bare-cr-string-literal-doc-comment/a" "-Crpath" "-O" "-Zunstable-options" "-Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "-Z" "continue-parse-after-error" "-L" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui/parser/lex-bare-cr-string-literal-doc-comment/auxiliary" "-A" "unused"
[01:16:15]     Error {
[01:16:15]         line_num: 21,
[01:16:15]         kind: Some(
[01:16:15]             Error,
[01:16:15]         ),
[01:16:15]         msg: "21:15: 21:22: character constant must be escaped: \\r",
[01:16:15] ]
[01:16:15] 
[01:16:15] not found errors (from test file): [
[01:16:15]     Error {
[01:16:15]         line_num: 21,
[01:16:15]         kind: Some(
[01:16:15]             Error,
[01:16:15]         ),
[01:16:15]         msg: "bare CR not allowed in string",
[01:16:15] ]
[01:16:15] 
[01:16:15] thread '[ui] ui/parser/lex-bare-cr-string-literal-doc-comment.rs' panicked at 'explicit panic', src/tools/compiletest/src/runtest.rs:1402:13
[01:16:15] 
---
[01:16:15] 
[01:16:15] thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:517:22
[01:16:15] 
[01:16:15] 
[01:16:15] command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
[01:16:15] 
[01:16:15] 
[01:16:15] failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
[01:16:15] Build completed unsuccessfully in 0:04:27
[01:16:15] Makefile:48: recipe for target 'check' failed
[01:16:15] make: *** [check] Error 1
The command "stamp sh -x -c "$RUN_SCRIPT"" exited with 2.
travis_time:start:06fe3778
$ date && (curl -fs --head https://google.com | grep ^Date: | sed 's/Date: //g' || true)
Tue Apr 30 16:50:58 UTC 2019
---
travis_time:end:2524b0d0:start=1556643059682159105,finish=1556643059687886180,duration=5727075
travis_fold:end:after_failure.3
travis_fold:start:after_failure.4
travis_time:start:010e3312
$ ln -s . checkout && for CORE in obj/cores/core.*; do EXE=$(echo $CORE | sed 's|obj/cores/core\.[0-9]*\.!checkout!\(.*\)|\1|;y|!|/|'); if [ -f "$EXE" ]; then printf travis_fold":start:crashlog\n\033[31;1m%s\033[0m\n" "$CORE"; gdb --batch -q -c "$CORE" "$EXE" -iex 'set auto-load off' -iex 'dir src/' -iex 'set sysroot .' -ex bt -ex q; echo travis_fold":"end:crashlog; fi; done || true
travis_fold:end:after_failure.4
travis_fold:start:after_failure.5
travis_time:start:19bc4f4a
$ cat ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers || true
cat: ./obj/build/x86_64-unknown-linux-gnu/native/asan/build/lib/asan/clang_rt.asan-dynamic-i386.vers: No such file or directory
travis_fold:end:after_failure.5
travis_fold:start:after_failure.6
travis_time:start:021feb02
$ dmesg | grep -i kill

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@petrochenkov

Contributor

commented May 3, 2019

Ok, let's resolve #60494 separately then.

@bors try

@bors

Contributor

commented May 3, 2019

⌛️ Trying commit 1835cbe with merge bfdcf6d...

bors added a commit that referenced this pull request May 3, 2019

Auto merge of #60261 - matklad:one-escape, r=<try>
introduce unescape module

A WIP PR to gauge early feedback

Currently, we deal with escape sequences twice: once when we [lex](https://github.com/rust-lang/rust/blob/112f7e9ac564e2cfcfc13d599c8376a219fde1bc/src/libsyntax/parse/lexer/mod.rs#L928-L1065) a string, and a second time when we [unescape](https://github.com/rust-lang/rust/blob/112f7e9ac564e2cfcfc13d599c8376a219fde1bc/src/libsyntax/parse/mod.rs#L313-L366) literals. Note that we also produce different sets of diagnostics in these two cases.

This PR aims to remove this duplication, by introducing a new `unescape` module as a single source of truth for character escaping rules.

I think this would be a useful cleanup by itself, but I also need this for #59706.

In the current state, the PR has `unescape` module which fully (modulo bugs) deals with string and char literals. I am quite happy about the state of this module

What this PR doesn't have yet are:
* [x] handling of byte and byte string literals (should be simple to add)
* [x] good diagnostics
* [x] actual removal of code from lexer (giant `scan_char_or_byte` should go away completely)
* [ ] performance check
* [x] general cleanup of the new code

Diagnostics will be the most labor-consuming bit here, but they are mostly a question of just correctly adjusting spans to sub-tokens. The current setup for diagnostics is that `unescape` produces a plain old `enum` with various problems, and they are rendered into `Handler` separately. This bit is not actually required (it is possible to just pass the `Handler` in), but I like the separation between diagnostics and logic this approach imposes, and such separation should again be useful for #59706
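The separation described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `EscapeError` variants, the `unescape_str` and `render` names, and the callback shape are hypothetical and much smaller than the module in this PR (for instance, `\u{...}` escapes are not handled). The point is only that the unescaping logic reports a plain `enum` together with a byte range, and a separate function turns that into a message, standing in for the `Handler` step.

```rust
use std::iter::Peekable;
use std::ops::Range;
use std::str::CharIndices;

/// Hypothetical, simplified set of problems; not the PR's actual enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EscapeError {
    LoneSlash,
    InvalidEscape,
    BareCarriageReturn,
    TooShortHexEscape,
    InvalidCharInHexEscape,
    OutOfRangeHexEscape,
}

/// Pure logic: walks the literal body and reports every char (or problem)
/// with its byte range. It knows nothing about how errors are displayed.
fn unescape_str<F>(src: &str, mut callback: F)
where
    F: FnMut(Range<usize>, Result<char, EscapeError>),
{
    let mut chars = src.char_indices().peekable();
    while let Some((start, c)) = chars.next() {
        let res = match c {
            '\\' => scan_escape(&mut chars),
            '\r' => Err(EscapeError::BareCarriageReturn),
            _ => Ok(c),
        };
        // The range ends where the next unread char begins.
        let end = chars.peek().map_or(src.len(), |&(i, _)| i);
        callback(start..end, res);
    }
}

/// Scans what follows a backslash (assumed subset: simple escapes and `\xNN`).
fn scan_escape(chars: &mut Peekable<CharIndices<'_>>) -> Result<char, EscapeError> {
    let (_, c) = chars.next().ok_or(EscapeError::LoneSlash)?;
    match c {
        'n' => Ok('\n'),
        't' => Ok('\t'),
        'r' => Ok('\r'),
        '0' => Ok('\0'),
        '\\' => Ok('\\'),
        '"' => Ok('"'),
        '\'' => Ok('\''),
        'x' => {
            let mut value = 0u32;
            for _ in 0..2 {
                let (_, d) = chars.next().ok_or(EscapeError::TooShortHexEscape)?;
                let digit = d.to_digit(16).ok_or(EscapeError::InvalidCharInHexEscape)?;
                value = value * 16 + digit;
            }
            if value > 0x7f {
                return Err(EscapeError::OutOfRangeHexEscape);
            }
            Ok(value as u8 as char)
        }
        _ => Err(EscapeError::InvalidEscape),
    }
}

/// Rendering lives apart from the logic; message wording is illustrative.
fn render(err: EscapeError) -> &'static str {
    match err {
        EscapeError::LoneSlash => "invalid trailing slash in literal",
        EscapeError::InvalidEscape => "unknown character escape",
        EscapeError::BareCarriageReturn => "bare CR not allowed in string",
        EscapeError::TooShortHexEscape => "numeric character escape is too short",
        EscapeError::InvalidCharInHexEscape => "invalid character in numeric character escape",
        EscapeError::OutOfRangeHexEscape => {
            "this form of character escape may only be used with characters in the range [\\x00-\\x7f]"
        }
    }
}

fn main() {
    let mut out = String::new();
    let mut problems = Vec::new();
    unescape_str(r"a\n\x41\x5", |range, res| match res {
        Ok(c) => out.push(c),
        Err(e) => problems.push((range, e)),
    });
    for (range, e) in &problems {
        eprintln!("error at bytes {:?}: {}", range, render(*e));
    }
    println!("{:?}", out);
}
```

Because the callback receives a range relative to the literal body, a caller could offset it by the literal's start position to build a sub-token span, and a tool that has no `Handler` could consume the same `enum` directly.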

cc @eddyb , @petrochenkov
@bors

Contributor

commented May 3, 2019

☀️ Try build successful - checks-travis
Build commit: bfdcf6d

@matklad

Member Author

commented May 3, 2019

This should probably be tagged with Breaking Change and Waiting on Crater?

@petrochenkov

Contributor

commented May 3, 2019

@craterbot run mode=check-only

@craterbot

Collaborator

commented May 3, 2019

👌 Experiment pr-60261 created and queued.
🤖 Automatically detected try build bfdcf6d
🔍 You can check out the queue and this experiment's details.

ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more

@craterbot

Collaborator

commented May 3, 2019

🚧 Experiment pr-60261 is now running on agent aws-2.


@petrochenkov

Contributor

commented May 3, 2019

@rust-timer


commented May 3, 2019

Success: Queued bfdcf6d with parent 1891bfa, comparison URL.

@rust-timer


commented May 3, 2019

Finished benchmarking try commit bfdcf6d

@matklad

Member Author

commented May 3, 2019

Looks like there are no significant perf differences; let's wait and see what crater says.

@craterbot

Collaborator

commented May 5, 2019

🎉 Experiment pr-60261 is completed!
📊 0 regressed and 0 fixed (60951 total)
📰 Open the full report.

⚠️ If you notice any spurious failure please add them to the blacklist!

@petrochenkov

Contributor

commented May 5, 2019

@bors r+

@bors

Contributor

commented May 5, 2019

📌 Commit 1835cbe has been approved by petrochenkov

@bors

Contributor

commented May 6, 2019

⌛️ Testing commit 1835cbe with merge 46d0ca0...

bors added a commit that referenced this pull request May 6, 2019

Auto merge of #60261 - matklad:one-escape, r=petrochenkov
introduce unescape module

A WIP PR to gauge early feedback

Currently, we deal with escape sequences twice: once when we [lex](https://github.com/rust-lang/rust/blob/112f7e9ac564e2cfcfc13d599c8376a219fde1bc/src/libsyntax/parse/lexer/mod.rs#L928-L1065) a string, and a second time when we [unescape](https://github.com/rust-lang/rust/blob/112f7e9ac564e2cfcfc13d599c8376a219fde1bc/src/libsyntax/parse/mod.rs#L313-L366) literals. Note that we also produce different sets of diagnostics in these two cases.

This PR aims to remove this duplication, by introducing a new `unescape` module as a single source of truth for character escaping rules.

I think this would be a useful cleanup by itself, but I also need this for #59706.

In the current state, the PR has `unescape` module which fully (modulo bugs) deals with string and char literals. I am quite happy about the state of this module

What this PR doesn't have yet are:
* [x] handling of byte and byte string literals (should be simple to add)
* [x] good diagnostics
* [x] actual removal of code from lexer (giant `scan_char_or_byte` should go away completely)
* [x] performance check
* [x] general cleanup of the new code

Diagnostics will be the most labor-consuming bit here, but they are mostly a question of just correctly adjusting spans to sub-tokens. The current setup for diagnostics is that `unescape` produces a plain old `enum` with various problems, and they are rendered into `Handler` separately. This bit is not actually required (it is possible to just pass the `Handler` in), but I like the separation between diagnostics and logic this approach imposes, and such separation should again be useful for #59706

cc @eddyb , @petrochenkov
@bors

Contributor

commented May 6, 2019

☀️ Test successful - checks-travis, status-appveyor
Approved by: petrochenkov
Pushing 46d0ca0 to master...

@bors bors added the merged-by-bors label May 6, 2019

@bors bors merged commit 1835cbe into rust-lang:master May 6, 2019

2 checks passed

Travis CI - Pull Request: Build Passed
homu: Test successful

@matklad matklad deleted the matklad:one-escape branch May 6, 2019

@matklad

Member Author

commented May 7, 2019

FWIW, this is now used by rust-analyzer: rust-analyzer/rust-analyzer#1253
