Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fall back to the unoptimized implementation in read_binary_file if File::metadata lies #115549

Merged
merged 2 commits into from Sep 21, 2023

Conversation

saethlin
Copy link
Member

@saethlin saethlin commented Sep 4, 2023

Fixes #115458

r? @jackh726 because you approved the previous PR

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 4, 2023
@@ -133,7 +133,29 @@ impl FileLoader for RealFileLoader {
file.read_buf_exact(buf.unfilled())?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is wrong because some files lie and have metadata size much larger than the actual file's size:
e.g. on linux /sys/class/tty/console/dev is like that:
full contents:

5:1

stat gives:

  File: /sys/class/tty/console/dev
  Size: 4096            Blocks: 0          IO Block: 4096   regular file
Device: 17h/23d Inode: 15051       Links: 1
Access: (0444/-r--r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-04 17:01:42.101801391 -0700
Modify: 2023-09-04 17:01:42.101801391 -0700
Change: 2023-09-04 17:01:42.101801391 -0700
 Birth: -

Copy link
Member Author

@saethlin saethlin Sep 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error: couldn't read /sys/class/tty/console/dev: failed to fill buffer
 --> demo.rs:1:22
  |
1 | const BYTES: &[u8] = include_bytes!("/sys/class/tty/console/dev");
  |                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: this error originates in the macro `include_bytes` (in Nightly builds, run with -Z macro-backtrace for more info)

error: aborting due to previous error

I've been wondering if there was such a case! I'll adjust this in a bit...
(does this work here?) (neat it does)
@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 5, 2023
@petrochenkov
Copy link
Contributor

This code was previously using fs::read.
Whatever changes you intend to do, they should probably be go to fs::read first, and then copypasted to the compiler with the Lrc adaptation.

@saethlin
Copy link
Member Author

saethlin commented Sep 5, 2023

I do not think that is possible. The whole point of this change is to allow reading large files without peak memory use which is twice the size of the file. fs::read returns a Vec<u8> which cannot be turned into an Lrc<[u8]> without copying all the data because those types have different alignment requirements for their heap allocations.

@saethlin saethlin added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Sep 12, 2023
@Mark-Simulacrum
Copy link
Member

let text = std::str::from_utf8(&bytes).unwrap_or("").to_string();

It looks like we're allocating a String (i.e., Vec<u8>) anyway in the only(?) code path using this function -- maybe it makes sense to make that reuse the allocation from fs::read and make the copy for Lrc<[u8]>? I suppose that doesn't help for files that aren't UTF-8 where the String isn't allocated, but seems like there might be something to this.

@jackh726
Copy link
Member

I thought I had done a review on this. Maybe i was just looking into this.

I don't really like that we are essentially just duplicating the logic in std for this. Is there a way to reuse the same code?

Or, at the very least, we should do our best to point to std as a "reference impl" for future maintenance.

@saethlin
Copy link
Member Author

saethlin commented Sep 16, 2023

I don't really like that we are essentially just duplicating the logic in std for this. Is there a way to reuse the same code?

The whole point of this is to read file bytes into an Lrc<[u8]> without at some time having two copies of the whole allocation in memory. There is no sound way to convert between a Vec<u8> and an Lrc<[u8]> without creating a second allocation, because the header for the Lrc needs the same alignment as usize so that it can store the reference counts, and allocations must be allocated and deallocated with the same alignment (so even if I create a Vec with the refcount bytes already in it, that is not sufficiently aligned to access the [usize; 2] header).

So unless the standard library produces an API to read a file into a Rc<[u8]> and Arc<[u8]> or to resize one of those types, I do not think we have a better option here. I tried to ask about this in Zulip: https://rust-lang.zulipchat.com/#narrow/stream/219381-t-libs/topic/Growing.20an.20Rc.3C.5BMaybeUninit.3Cu8.3E.5D.3E and the only responses were ways to avoid that whole issue.

@jackh726
Copy link
Member

Is there a test we can add?

If so, can we add it. Otherwise, r=me.

@programmerjake
Copy link
Member

for testing on linux, you could try reading: /sys/class/tty/console/dev (smaller than advertised) and /proc/self/cmdline (bigger than advertised). try reading using both std::fs and read_binary_file and check if the contents match. both of those are files that should be available on basically all linux systems and have stable contents for the duration of the test executable's lifetime. do note that /sys/... isn't available on android though...

@programmerjake
Copy link
Member

turns out /sys/devices/system/cpu/kernel_max is available on android (and other linux) and has the same property of being smaller than advertised and having a stable value.

@saethlin
Copy link
Member Author

I tried to make them reasonably resilient, but I can imagine the tests going wrong on the numerous platforms I don't understand.
@bors r=jackh726 rollup=iffy

@bors
Copy link
Contributor

bors commented Sep 20, 2023

📌 Commit fb50270 has been approved by jackh726

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Sep 20, 2023
@bors
Copy link
Contributor

bors commented Sep 21, 2023

⌛ Testing commit fb50270 with merge c1430b1...

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 21, 2023
…jackh726

Fall back to the unoptimized implementation in read_binary_file if File::metadata lies

Fixes rust-lang#115458

r? `@jackh726` because you approved the previous PR
@@ -567,3 +567,30 @@ fn test_next_point() {
assert_eq!(span.hi().0, 6);
assert!(sm.span_to_snippet(span).is_err());
}

#[cfg(unix)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(unix)]
#[cfg(linux)]

this should be #[cfg(linux)] since those files aren't available on some other systems, e.g. I'd be surprised if /sys/devices/system/cpu/kernel_max was available on macOS or BSDs.

@saethlin
Copy link
Member Author

@bors r=jackh726

@bors
Copy link
Contributor

bors commented Sep 21, 2023

📌 Commit 5f33647 has been approved by jackh726

It is now in the queue for this repository.

@rust-log-analyzer
Copy link
Collaborator

A job failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)

@bors
Copy link
Contributor

bors commented Sep 21, 2023

⌛ Testing commit 5f33647 with merge 4fda889...

@bors
Copy link
Contributor

bors commented Sep 21, 2023

☀️ Test successful - checks-actions
Approved by: jackh726
Pushing 4fda889 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 21, 2023
@bors bors merged commit 4fda889 into rust-lang:master Sep 21, 2023
12 checks passed
@rustbot rustbot added this to the 1.74.0 milestone Sep 21, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (4fda889): comparison URL.

Overall result: ✅ improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-0.4% [-0.4%, -0.4%] 2
All ❌✅ (primary) - - 0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.0% [2.0%, 2.0%] 1
Regressions ❌
(secondary)
2.6% [2.0%, 4.6%] 11
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.0% [2.0%, 2.0%] 1

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 632.213s -> 631.763s (-0.07%)
Artifact size: 317.78 MiB -> 317.88 MiB (0.03%)

@saethlin saethlin deleted the include-bytes-resilient branch November 9, 2023 23:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

#115296 broke include_bytes! on paths with untrustworthy metadata
9 participants