Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate ZST allocations in Box and Vec #113113

Merged
merged 1 commit into from Jul 14, 2023
Merged

Conversation

Amanieu
Copy link
Member

@Amanieu Amanieu commented Jun 27, 2023

This PR fixes 2 issues with Box and RawVec related to ZST allocations. Specifically, the Allocator trait requires that:

  • If you allocate a zero-sized layout then you must later deallocate it, otherwise the allocator may leak memory.
  • You cannot pass a ZST pointer to the allocator that you haven't previously allocated.

These restrictions exist because an allocator implementation is allowed to allocate non-zero amounts of memory for a zero-sized allocation. For example, malloc in libc does this.

Currently, ZSTs are handled differently in Box and Vec:

  • Vec never allocates when T is a ZST or if the vector capacity is 0.
  • Box just blindly passes everything on to the allocator, including ZSTs.

This causes problems due to the free conversions between Box<[T]> and Vec<T>, specifically that ZST allocations could get leaked or a dangling pointer could be passed to deallocate.

This PR fixes this by changing Box to not allocate for zero-sized values and slices. It also fixes a bug in RawVec::shrink where shrinking to a size of zero did not actually free the backing memory.

@rustbot
Copy link
Collaborator

rustbot commented Jun 27, 2023

r? @Mark-Simulacrum

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jun 27, 2023
@Amanieu
Copy link
Member Author

Amanieu commented Jun 27, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 27, 2023
@bors
Copy link
Contributor

bors commented Jun 27, 2023

⌛ Trying commit d75660ace87ed695200ac1b3bf2026144843eb77 with merge 596ac10b39cf84a4f4742f705dc63599654dcf78...

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@Amanieu Amanieu mentioned this pull request Jun 28, 2023
12 tasks
@nbdd0121
Copy link
Contributor

Does this mean we'll need to check for zero for every deallocation of Box<DST>?

@Amanieu
Copy link
Member Author

Amanieu commented Jun 28, 2023

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jun 28, 2023

⌛ Trying commit 853b6048108f88bc6e62e056b27a32c6484b0ffc with merge 58ad3a76f47a26e116e1d152a72ad86aeb650f41...

@bors
Copy link
Contributor

bors commented Jun 28, 2023

☀️ Try build successful - checks-actions
Build commit: 58ad3a76f47a26e116e1d152a72ad86aeb650f41 (58ad3a76f47a26e116e1d152a72ad86aeb650f41)

1 similar comment
@bors
Copy link
Contributor

bors commented Jun 28, 2023

☀️ Try build successful - checks-actions
Build commit: 58ad3a76f47a26e116e1d152a72ad86aeb650f41 (58ad3a76f47a26e116e1d152a72ad86aeb650f41)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (58ad3a76f47a26e116e1d152a72ad86aeb650f41): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.8% [0.8%, 3.3%] 3
Regressions ❌
(secondary)
0.5% [0.5%, 0.5%] 3
Improvements ✅
(primary)
-1.8% [-1.8%, -1.8%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.9% [-1.8%, 3.3%] 4

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.8% [2.6%, 5.0%] 4
Regressions ❌
(secondary)
3.9% [3.9%, 3.9%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 3.8% [2.6%, 5.0%] 4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.0% [1.1%, 3.5%] 3
Regressions ❌
(secondary)
2.6% [2.6%, 2.6%] 1
Improvements ✅
(primary)
-1.8% [-3.0%, -1.2%] 7
Improvements ✅
(secondary)
-5.7% [-8.6%, -2.9%] 10
All ❌✅ (primary) -0.7% [-3.0%, 3.5%] 10

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.4%] 37
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 4
Improvements ✅
(primary)
-0.4% [-1.0%, -0.0%] 5
Improvements ✅
(secondary)
-0.4% [-0.4%, -0.4%] 3
All ❌✅ (primary) 0.0% [-1.0%, 0.4%] 42

Bootstrap: 662.942s -> 665.669s (0.41%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jun 28, 2023
@Amanieu
Copy link
Member Author

Amanieu commented Jun 30, 2023

Does this mean we'll need to check for zero for every deallocation of Box<DST>?

Yes. Unfortunately this is required for correctness.

@Amanieu
Copy link
Member Author

Amanieu commented Jun 30, 2023

The perf results show a slight regression, but fundamentally I feel this is a cost we have to pay for correct handling of ZST allocations.

@rust-log-analyzer

This comment has been minimized.

@Mark-Simulacrum
Copy link
Member

r=me modulo CI

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 8, 2023
@Amanieu
Copy link
Member Author

Amanieu commented Jul 13, 2023

@bors r=Mark-Simulacrum

@bors
Copy link
Contributor

bors commented Jul 13, 2023

📌 Commit d24be14 has been approved by Mark-Simulacrum

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 13, 2023
@bors
Copy link
Contributor

bors commented Jul 14, 2023

⌛ Testing commit d24be14 with merge cca3373...

@bors
Copy link
Contributor

bors commented Jul 14, 2023

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing cca3373 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 14, 2023
@bors bors merged commit cca3373 into rust-lang:master Jul 14, 2023
12 checks passed
@rustbot rustbot added this to the 1.73.0 milestone Jul 14, 2023
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (cca3373): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
1.7% [0.5%, 3.5%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.0% [-1.7%, -0.4%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.6% [-1.7%, 3.5%] 5

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.6% [2.1%, 4.7%] 8
Regressions ❌
(secondary)
3.5% [3.5%, 3.5%] 2
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.0% [-2.0%, -2.0%] 1
All ❌✅ (primary) 3.6% [2.1%, 4.7%] 8

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.2% [1.2%, 3.7%] 3
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.5% [-1.5%, -1.5%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 1.3% [-1.5%, 3.7%] 4

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.1% [0.0%, 0.3%] 34
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 4
Improvements ✅
(primary)
-0.3% [-0.7%, -0.0%] 5
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.0% [-0.7%, 0.3%] 39

Bootstrap: 658.145s -> 659.456s (0.20%)

@pnkfelix
Copy link
Member

  • regressions here were anticipated and unavoidable. This is a bug fix.
  • Marking as triaged.

@rustbot label: +perf-regression-triaged

@rustbot rustbot added the perf-regression-triaged The performance regression has been triaged. label Jul 18, 2023
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 24, 2023
Indicate that multiplication in Layout::array cannot overflow

Since rust-lang#113113, we have added a check that skips calling into the allocator at all if `capacity == 0`. The global, default allocator will not actually try to allocate though; it returns a dangling pointer explicitly. However, these two checks are not merged/deduplicated by LLVM and so we're comparing to zero twice whenever vectors are allocated/grown. Probably cheap, but also potentially expensive in code size and seems like an unfortunate miss.

This removes that extra check by telling LLVM that the multiplication as part of Layout::array can't overflow, turning the original non-zero value into a zero value afterwards. In my checks locally this successfully drops the duplicate comparisons.

See https://rust.godbolt.org/z/b6nPP9dcK for a code example.

```rust
pub fn foo(elements: usize) -> Vec<u32> {
    Vec::with_capacity(elements)
}
```

r? `@scottmcm` since you touched this in a32305a - curious if you have thoughts on doing this / can confirm my model of this being correct.
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 24, 2023
Indicate that multiplication in Layout::array cannot overflow

Since rust-lang#113113, we have added a check that skips calling into the allocator at all if `capacity == 0`. The global, default allocator will not actually try to allocate though; it returns a dangling pointer explicitly. However, these two checks are not merged/deduplicated by LLVM and so we're comparing to zero twice whenever vectors are allocated/grown. Probably cheap, but also potentially expensive in code size and seems like an unfortunate miss.

This removes that extra check by telling LLVM that the multiplication as part of Layout::array can't overflow, turning the original non-zero value into a zero value afterwards. In my checks locally this successfully drops the duplicate comparisons.

See https://rust.godbolt.org/z/b6nPP9dcK for a code example.

```rust
pub fn foo(elements: usize) -> Vec<u32> {
    Vec::with_capacity(elements)
}
```

r? `@scottmcm` since you touched this in a32305a - curious if you have thoughts on doing this / can confirm my model of this being correct.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. perf-regression Performance regression. perf-regression-triaged The performance regression has been triaged. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants