Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make sure that all file loading happens via SourceMap #63525

Merged
merged 2 commits into from Aug 16, 2019

Conversation

@matklad
Copy link
Member

commented Aug 13, 2019

That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using FileLoader for binary
files in the future as well

cc #62948

r? @petrochenkov

cx.source_map().new_source_file(file.into(), src);

base::MacEager::expr(cx.expr_str(sp, interned_src))
match cx.source_map().load_binary_file(&file) {

This comment has been minimized.

Copy link
@matklad

matklad Aug 13, 2019

Author Member

I am not sure whether include_str should normalize file contents... This approach does not do normalization, just like the previous version.

This comment has been minimized.

Copy link
@petrochenkov

petrochenkov Aug 13, 2019

Contributor

I'm pretty sure it should.
The content included by include_str is still text / source code, if it's a long multi-line string encoded on Windows and stored in a file with UTF-8 BOM, then all those \r\ns and BOM shouldn't get into the string for the same reason why any other (non-include_bytes) inclusion method filters them away from string literals.

Am I right that #62948 applied the normalization to include_str as well and crater found no regression?
This should give as the right to make the change (in this PR or in #62948, not much difference).

@petrochenkov

This comment has been minimized.

Copy link
Contributor

commented Aug 13, 2019

r=me if you don't plan to address the include_str issue in this PR.

@matklad

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

Am I right that #62948 applied the normalization to include_str as well and crater found no regression?

I think that's not the case

  • it was a check run, so it couldn't have detected this.
  • similar to include_bytes, normalization was applied only to the data stored in the SourceMap, the actual included string was not normalized.
  • we know that code tested by crater doesn't include_str files with isolated \r though.

I've added commit that adds normalization to include_str, which currently means BOM removal. It seems reasonable to me because programs (ie, compiler) are supposed to work the same regardless of the presence of BOM. Moreover, if we don't strip BOM, there's a high chance that it ends up being in the middle of some text stream produced by the program.

This is technically a breaking change, so we should probably FCP it?

@matklad

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

we know that code tested by crater doesn't include_str files with isolated \r though.

Actually, even this is not true:

https://github.com/nabijaczleweli/codepage-437/blob/2e17570377121dbd2bab5adebfcb46172804b95b/tests/cp437_control/mod.rs#L7

@rust-highfive

This comment has been minimized.

Copy link
Collaborator

commented Aug 14, 2019

The job x86_64-gnu-llvm-6.0 of your PR failed (raw log). Through arcane magic we have determined that the following fragments from the build log may contain information about the problem.

Click to expand the log.
2019-08-14T07:16:12.7992804Z ##[command]git remote add origin https://github.com/rust-lang/rust
2019-08-14T07:16:12.8215170Z ##[command]git config gc.auto 0
2019-08-14T07:16:12.8281514Z ##[command]git config --get-all http.https://github.com/rust-lang/rust.extraheader
2019-08-14T07:16:12.8340369Z ##[command]git config --get-all http.proxy
2019-08-14T07:16:12.8507856Z ##[command]git -c http.extraheader="AUTHORIZATION: basic ***" fetch --force --tags --prune --progress --no-recurse-submodules --depth=2 origin +refs/heads/*:refs/remotes/origin/* +refs/pull/63525/merge:refs/remotes/pull/63525/merge
---
2019-08-14T07:16:47.9082646Z do so (now or later) by using -b with the checkout command again. Example:
2019-08-14T07:16:47.9082682Z 
2019-08-14T07:16:47.9082951Z   git checkout -b <new-branch-name>
2019-08-14T07:16:47.9082984Z 
2019-08-14T07:16:47.9083058Z HEAD is now at a0394cbd6 Merge a9539d26c8a1ad712ab267cf268cf9382e8e3b51 into 60960a260f7b5c695fd0717311d72ce62dd4eb43
2019-08-14T07:16:47.9241723Z ##[section]Starting: Collect CPU-usage statistics in the background
2019-08-14T07:16:47.9244991Z ==============================================================================
2019-08-14T07:16:47.9245055Z Task         : Bash
2019-08-14T07:16:47.9245124Z Description  : Run a Bash script on macOS, Linux, or Windows
---
2019-08-14T08:20:38.4718807Z .................................................................................................... 1300/8876
2019-08-14T08:20:45.0594073Z .................................................................................................... 1400/8876
2019-08-14T08:20:51.5005907Z .................................................................................................... 1500/8876
2019-08-14T08:21:02.3701079Z ....................................................................................i............... 1600/8876
2019-08-14T08:21:10.2804886Z i................................................................................................... 1700/8876
2019-08-14T08:21:17.1908312Z ...........................................................................iiiii.................... 1800/8876
2019-08-14T08:21:39.9297110Z .................................................................................................... 2000/8876
2019-08-14T08:21:42.4636502Z .................................................................................................... 2100/8876
2019-08-14T08:21:45.3137163Z .................................................................................................... 2200/8876
2019-08-14T08:21:53.2129437Z .................................................................................................... 2300/8876
---
2019-08-14T08:25:54.6872874Z .................................................................................................... 5300/8876
2019-08-14T08:26:02.2079100Z .........i.......................................................................................... 5400/8876
2019-08-14T08:26:07.7927947Z .................................................................................................... 5500/8876
2019-08-14T08:26:20.4199569Z .................................................................................................... 5600/8876
2019-08-14T08:26:34.7056198Z ....ii...i..ii...........i.......................................................................... 5700/8876
2019-08-14T08:26:50.8705475Z .................................................................................................... 5900/8876
2019-08-14T08:26:55.7624602Z .................................................................................................... 6000/8876
2019-08-14T08:26:55.7624602Z .................................................................................................... 6000/8876
2019-08-14T08:27:10.2967102Z .....i..ii.......................................................................................... 6100/8876
2019-08-14T08:27:29.6584902Z ................................................i................................................... 6300/8876
2019-08-14T08:27:31.8903728Z .................................................................................................... 6400/8876
2019-08-14T08:27:34.4588392Z ....................i............................................................................... 6500/8876
2019-08-14T08:27:39.1733742Z .................................................................................................... 6600/8876
---
2019-08-14T08:31:45.6341068Z ------------------------------------------
2019-08-14T08:31:45.6341129Z stderr:
2019-08-14T08:31:45.6341351Z ------------------------------------------
2019-08-14T08:31:45.6341597Z thread 'main' panicked at 'assertion failed: `(left == right)`
2019-08-14T08:31:45.6341707Z   left: `[239, 187, 191, 84, 104, 105, 115, 32, 102, 105, 108, 101, 32, 115, 116, 97, 114, 116, 115, 32, 119, 105, 116, 104, 32, 66, 79, 77, 46, 10, 76, 105, 110, 101, 115, 32, 97, 114, 101, 32, 115, 101, 112, 97, 114, 97, 116, 101, 100, 32, 98, 121, 32, 92, 114, 92, 110, 46, 10]`,
2019-08-14T08:31:45.6342309Z  right: `[239, 187, 191, 84, 104, 105, 115, 32, 102, 105, 108, 101, 32, 115, 116, 97, 114, 116, 115, 32, 119, 105, 116, 104, 32, 66, 79, 77, 46, 13, 10, 76, 105, 110, 101, 115, 32, 97, 114, 101, 32, 115, 101, 112, 97, 114, 97, 116, 101, 100, 32, 98, 121, 32, 92, 114, 92, 110, 46, 13, 10]`', /checkout/src/test/ui/include-macros/normalization.rs:4:5
2019-08-14T08:31:45.6342446Z 
2019-08-14T08:31:45.6342672Z ------------------------------------------
2019-08-14T08:31:45.6342719Z 
2019-08-14T08:31:45.6342745Z 
---
2019-08-14T08:31:45.6385221Z thread 'main' panicked at 'Some tests failed', src/tools/compiletest/src/main.rs:536:22
2019-08-14T08:31:45.6385506Z note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
2019-08-14T08:31:45.6405553Z 
2019-08-14T08:31:45.6405653Z 
2019-08-14T08:31:45.6407500Z command did not execute successfully: "/checkout/obj/build/x86_64-unknown-linux-gnu/stage0-tools-bin/compiletest" "--compile-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib" "--run-lib-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/lib/rustlib/x86_64-unknown-linux-gnu/lib" "--rustc-path" "/checkout/obj/build/x86_64-unknown-linux-gnu/stage2/bin/rustc" "--src-base" "/checkout/src/test/ui" "--build-base" "/checkout/obj/build/x86_64-unknown-linux-gnu/test/ui" "--stage-id" "stage2-x86_64-unknown-linux-gnu" "--mode" "ui" "--target" "x86_64-unknown-linux-gnu" "--host" "x86_64-unknown-linux-gnu" "--llvm-filecheck" "/usr/lib/llvm-6.0/bin/FileCheck" "--host-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--target-rustcflags" "-Crpath -O -Cdebuginfo=0 -Zunstable-options  -Lnative=/checkout/obj/build/x86_64-unknown-linux-gnu/native/rust-test-helpers" "--docck-python" "/usr/bin/python2.7" "--lldb-python" "/usr/bin/python2.7" "--gdb" "/usr/bin/gdb" "--quiet" "--llvm-version" "6.0.0\n" "--system-llvm" "--cc" "" "--cxx" "" "--cflags" "" "--llvm-components" "" "--llvm-cxxflags" "" "--adb-path" "adb" "--adb-test-dir" "/data/tmp/work" "--android-cross-path" "" "--color" "always"
2019-08-14T08:31:45.6408007Z 
2019-08-14T08:31:45.6408033Z 
2019-08-14T08:31:45.6416254Z failed to run: /checkout/obj/build/bootstrap/debug/bootstrap test
2019-08-14T08:31:45.6416339Z Build completed unsuccessfully in 1:08:40
2019-08-14T08:31:45.6416339Z Build completed unsuccessfully in 1:08:40
2019-08-14T08:31:46.3715398Z ##[error]Bash exited with code '1'.
2019-08-14T08:31:46.3776457Z ##[section]Starting: Checkout
2019-08-14T08:31:46.3778418Z ==============================================================================
2019-08-14T08:31:46.3778473Z Task         : Get sources
2019-08-14T08:31:46.3778519Z Description  : Get sources from a repository. Supports Git, TfsVC, and SVN repositories.

I'm a bot! I can only do what humans tell me to, so if this was not helpful or you have suggestions for improvements, please ping or otherwise contact @TimNN. (Feature Requests)

@matklad matklad force-pushed the matklad:centraliza-file-loading branch 2 times, most recently from a5d9445 to 9f09e91 Aug 14, 2019

@petrochenkov

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

Actually, even this is not true:

That example kinda makes me rethink the prohibition of bare \rs.
Raw strings (including include_str) cannot use escaping and have no choice than to use bare \r if the carriage return needs to be in the string.

Actually, we may want to prohibit only \r\r\n rather than all bare \rs at loading time, this will already ensure that the file can survive repeated EOL conversions (e.g. by Git).

@matklad

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

this will already ensure that the file can survive repeated EOL conversions (e.g. by Git).

It looks like Git treats files with lone CR as binary and doesn't convert them:
https://github.com/git/git/blob/e35b8cb8e212e3557efc565157ceb5cbaaf0d87f/convert.c#L90-L103

I also begin to feel that forbidding lone CR causes more problems that it solves. Sole \r does not seem that different from any other weird unicode/ascii junk like \v. I'll implement simple \r\r\n -> \r\n conversion, and we can have a separate "weird whitespace" lint if we want to, but it seems to be unlikely that folks will hit this problem in practice.

@matklad

This comment has been minimized.

Copy link
Member Author

commented Aug 14, 2019

Marking as waiting for review. I've changed this to normalize the string, and I think we should move forward with this regardless of #62948.

@rustbot modify labels: -S-waiting-on-author, +S-waiting-on-review

@petrochenkov

This comment has been minimized.

Copy link
Contributor

commented Aug 14, 2019

Let's just keep the status quo behavior of include_str (at least for now) and not introduce a new third flavor of normalization.

I wanted to write out some possible use cases for various flavor of normalization, but I just need to go and sleep now.

TLDR is that BOM+EOL normalization is the common case requirement for text (mod m;, include!, include_str! (simply long strings, generated strings)) and should attempt to to change include_str in this direction and see what breaks (crater, some other crates.io search, live experiment on nightly), but include_bytes_as_str reading bytes without any normalization like include_bytes and packaging them as str instead of [u8; N] would also make sense, even if rarely (some ASCII-based binary protocols requiring \r\n (do they exist?)? Old WinAPI text controls unable to work with \n?).

@matklad matklad force-pushed the matklad:centraliza-file-loading branch from 9f09e91 to 0885a8d Aug 15, 2019

matklad added some commits Aug 13, 2019

Make sure that all file loading happens via SourceMap
That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using `FileLoader` for binary
files in the future as well

@matklad matklad force-pushed the matklad:centraliza-file-loading branch from 0885a8d to 14bc998 Aug 15, 2019

@matklad

This comment has been minimized.

Copy link
Member Author

commented Aug 15, 2019

based on #63525 (comment)

@bors r=petrochenkov

I've removed the normalization commit, but I've added the test that normalization doesn't happen.

I think the way forward here is to complete #62948 first, and then have a separate discussion/crater runs about include_str normalization.

but include_bytes_as_str reading bytes without any normalization like include_bytes and packaging them as str instead of [u8; N] would also make sense, even if rarely (some ASCII-based binary protocols requiring \r\n (do they exist?)?

I feel this can be build on top of include_bytes! with 80% of usability, which seems ok for a rare use-case.

fn my_text_file_with_crlf() -> &'static str {
    unsafe { 
        std::str::from_utf8_unchecked(&include_bytes!("my_text_file_with_crlf.txt")[..])
    }
}

#[test] 
fn my_text_file_is_acutualy_a_text_file() {
    std::str::from_utf8(&include_bytes!("my_text_file_with_crlf.txt")[..])
        .unwrap()
}
@bors

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

📌 Commit 14bc998 has been approved by petrochenkov

@bors

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

💡 This pull request was already approved, no need to approve it again.

  • There's another pull request that is currently being tested, blocking this pull request: #63583
@bors

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

📌 Commit 14bc998 has been approved by petrochenkov

Centril added a commit to Centril/rust that referenced this pull request Aug 15, 2019

Rollup merge of rust-lang#63525 - matklad:centraliza-file-loading, r=…
…petrochenkov

Make sure that all file loading happens via SourceMap

That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using `FileLoader` for binary
files in the future as well

cc rust-lang#62948

r? @petrochenkov

bors added a commit that referenced this pull request Aug 15, 2019

Auto merge of #63588 - Centril:rollup-9ad6qs2, r=Centril
Rollup of 12 pull requests

Successful merges:

 - #63155 (Add UWP MSVC targets)
 - #63165 (Add builtin targets for mips64(el)-unknown-linux-muslabi64)
 - #63467 (Add Catalyst (iOS apps running on macOS) target)
 - #63470 (Utilize -Zbinary-dep-depinfo in rustbuild)
 - #63495 ( Remove redundant `ty` fields from `mir::Constant` and `hair::pattern::PatternRange`.)
 - #63525 (Make sure that all file loading happens via SourceMap)
 - #63546 (Remove uses of `mem::uninitialized()` from cloudabi)
 - #63559 (rustc_codegen_utils: account for 1-indexed anonymous lifetimes in v0 mangling.)
 - #63572 (remove unused Level::PhaseFatal)
 - #63577 (Test HRTB issue accepted by compiler)
 - #63582 (Fix ICE #63226)
 - #63586 (cleanup: Remove `Spanned` where possible)

Failed merges:

r? @ghost

Centril added a commit to Centril/rust that referenced this pull request Aug 16, 2019

Rollup merge of rust-lang#63525 - matklad:centraliza-file-loading, r=…
…petrochenkov

Make sure that all file loading happens via SourceMap

That way, callers don't need to repeat "let's add this to sm manually
for tracking dependencies" trick.

It should make it easier to switch to using `FileLoader` for binary
files in the future as well

cc rust-lang#62948

r? @petrochenkov

bors added a commit that referenced this pull request Aug 16, 2019

Auto merge of #63640 - Centril:rollup-yeb8o66, r=Centril
Rollup of 10 pull requests

Successful merges:

 - #60492 (Add custom nth_back for Chain)
 - #61780 (Finalize the error type for `try_reserve`)
 - #63495 ( Remove redundant `ty` fields from `mir::Constant` and `hair::pattern::PatternRange`.)
 - #63525 (Make sure that all file loading happens via SourceMap)
 - #63595 (add sparc64-unknown-openbsd target)
 - #63604 (Some update for vxWorks)
 - #63613 (Hygienize use of built-in macros in the standard library)
 - #63632 (A couple of comment fixes.)
 - #63634 (ci: properly set the job name in CPU stats)
 - #63636 (ci: move linkcheck from mingw-2 to mingw-1)

Failed merges:

r? @ghost
@bors

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

☔️ The latest upstream changes (presumably #63640) made this pull request unmergeable. Please resolve the merge conflicts.

@bors bors merged commit 14bc998 into rust-lang:master Aug 16, 2019

4 checks passed

pr Build #20190815.11 succeeded
Details
pr (Linux mingw-check) Linux mingw-check succeeded
Details
pr (Linux x86_64-gnu-llvm-6.0) Linux x86_64-gnu-llvm-6.0 succeeded
Details
pr (LinuxTools) LinuxTools succeeded
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.