Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

30x performance improvement in with_nix_path #1655

Merged
merged 1 commit into from Feb 14, 2022
Merged

30x performance improvement in with_nix_path #1655

merged 1 commit into from Feb 14, 2022

Conversation

SUPERCILEX
Copy link
Contributor

@SUPERCILEX SUPERCILEX commented Feb 3, 2022

I've been digging into CPU instructions counts and found that nix accounted for an eye-watering 85% of my program's instruction counts (yes, I do a lot of I/O, I know).

The fix is simple: don't initialize the stack memory since we're just going to overwrite it anyway.

Note: I also ran rustfmt in a separate commit, not sure if that's ok.

Before

650,398,225 (85.05%) 5,451,056 (17.31%) 627,714,969 (97.28%) 60 ( 0.54%) 22 ( 0.07%) 3,997 (20.98%) 10 ( 0.18%) 0          752 ( 5.98%) 627,716,244 (97.30%) 267,333 (34.62%) 914,814 (35.11%) 105 ( 0.73%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,469x)
1,677,159 ( 0.22%)       0          762,345 ( 0.12%) 14 ( 0.13%)  0           0          2 ( 0.04%) .           .                .          .                .           .               fn with_nix_path<T, F>(&self, f: F) -> Result<T>
2,287,035 ( 0.30%) 304,938 ( 0.97%) 152,469 ( 0.02%)  5 ( 0.04%)  0           0          0          0           0          304,938 ( 0.05%) .                .           .           => ???:__rust_probestack (152,469x)
        .                .                .           .           .           .          .          .           .                .          .                .           .               where
        .                .                .           .           .           .          .          .           .                .          .                .           .                   F: FnOnce(&CStr) -> T,
        .                .                .           .           .           .          .          .           .                .          .                .           .               {
  457,407 ( 0.06%) 152,469 ( 0.48%) 152,469 ( 0.02%)  0           0          93 ( 0.49%) 0          0          16 ( 0.13%)       0          0          152,469 ( 5.85%) 97 ( 0.67%)          let mut buf = [0u8; PATH_MAX as usize];
627,104,997 (82.00%) 304,938 ( 0.97%) 624,513,024 (96.78%)  1 ( 0.01%)  0          3,814 (20.02%) 1 ( 0.02%) 0          720 ( 5.72%) 625,122,900 (96.90%) 152,495 (19.75%)       .           .           => ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms (152,469x)
        .                .                .           .           .           .          .          .           .                .          .                .           .           
  304,938 ( 0.04%)       0                0           0           0           0          0          0           0          152,469 ( 0.02%) 2 ( 0.00%)       .           .                   if self.len() >= PATH_MAX as usize {
        .                .                .           .           .           .          .          .           .                .          .                .           .                       return Err(Errno::ENAMETOOLONG);
        .                .                .           .           .           .          .          .           .                .          .                .           .                   }
        .                .                .           .           .           .          .          .           .                .          .                .           .           
1,067,283 ( 0.14%) 152,469 ( 0.48%) 152,469 ( 0.02%)  1 ( 0.01%)  4 ( 0.01%)  0          1 ( 0.02%) 0           0                0          0          152,469 ( 5.85%)  1 ( 0.01%)          buf[..self.len()].copy_from_slice(self);
3,202,541 ( 0.42%) 914,814 ( 2.91%) 609,876 ( 0.09%)  1 ( 0.01%)  2 ( 0.01%)  0          0          0           0          458,005 ( 0.07%) 114,562 (14.84%) 152,469 ( 5.85%)  .           => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093/library/core/src/slice/mod.rs:core::slice::<impl [T]>::copy_from_slice (152,469x)
  762,345 ( 0.10%) 304,938 ( 0.97%) 152,469 ( 0.02%)  0           6 ( 0.02%)  0          0          0           0          152,469 ( 0.02%) 6 ( 0.00%) 152,469 ( 5.85%)  1 ( 0.01%)          match CStr::from_bytes_with_nul(&buf[..=self.len()]) {
8,350,190 ( 1.09%) 1,181,828 ( 3.75%) 1,067,283 ( 0.17%) 19 ( 0.17%)  2 ( 0.01%) 87 ( 0.46%) 3 ( 0.05%) 0          16 ( 0.13%) 1,220,525 ( 0.19%) 167 ( 0.02%) 152,469 ( 5.85%)  5 ( 0.03%)  => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093//library/std/src/ffi/c_str.rs:std::ffi::c_str::CStr::from_bytes_with_nul (152,469x)
  609,876 ( 0.08%) 609,876 ( 1.94%)       0           0           1 ( 0.00%)  .          .          .           .                .          .                .           .                       Ok(s) => Ok(f(s)),
        .                .                .           .           .           .          .          .           .                .          .                .           .                       Err(_) => Err(Errno::EINVAL),
        .                .                .           .           .           .          .          .           .                .          .                .           .                   }
  914,814 ( 0.12%) 762,345 ( 2.42%)       .           .           .           .          .          .           .                .          .                .           .               }
        .                .                .           .           .           .          .          .           .                .          .                .           .           }

After

21,462,416 (15.81%) 4,688,455 (15.27%) 2,896,847 (14.17%) 74 ( 0.64%) 11 ( 0.04%) 249 ( 1.58%) 8 ( 0.15%) 0          48 ( 0.38%) 2,593,200 (12.98%) 1,128 ( 0.22%) 762,305 (31.08%) 158 ( 1.32%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,461x)
1,067,227 ( 0.79%)       0          609,844 ( 2.98%)  1 ( 0.01%)  0          0          1 ( 0.02%) .          .                .          .                .           .               fn with_nix_path<T, F>(&self, f: F) -> Result<T>
2,286,915 ( 1.68%) 304,922 ( 0.99%) 152,461 ( 0.75%) 19 ( 0.16%)  0          0          0          0          0          304,922 ( 1.53%) .                .           .           => ???:__rust_probestack (152,461x)
        .                .                .           .           .          .          .          .          .                .          .                .           .               where
        .                .                .           .           .          .          .          .          .                .          .                .           .                   F: FnOnce(&CStr) -> T,
        .                .                .           .           .          .          .          .          .                .          .                .           .               {
  304,922 ( 0.22%)       0                0           0           0          0          0          0          0          152,461 ( 0.76%) 6 ( 0.00%)       .           .                   if self.len() >= PATH_MAX as usize {
        .                .                .           .           .          .          .          .          .                .          .                .           .                       return Err(Errno::ENAMETOOLONG);
        .                .                .           .           .          .          .          .          .                .          .                .           .                   }
        .                .                .           .           .          .          .          .          .                .          .                .           .           
        .                .                .           .           .          .          .          .          .                .          .                .           .                   let mut buf = MaybeUninit::<[u8; PATH_MAX as usize]>::uninit();
        .                .                .           .           .          .          .          .          .                .          .                .           .                   let buf_ptr = buf.as_mut_ptr() as *mut u8;
        .                .                .           .           .          .          .          .          .                .          .                .           .           
        .                .                .           .           .          .          .          .          .                .          .                .           .                   unsafe {
        .                .                .           .           .          .          .          .          .                .          .                .           .                       ptr::copy_nonoverlapping(self.as_ptr(), buf_ptr, self.len());
        .                .                .           .           .          .          .          .          .                .          .                .           .                       buf_ptr.add(self.len()).write(0);
        .                .                .           .           .          .          .          .          .                .          .                .           .                   }
        .                .                .           .           .          .          .          .          .                .          .                .           .           
1,067,227 ( 0.79%) 304,922 ( 0.99%) 152,461 ( 0.75%)  1 ( 0.01%)  3 ( 0.01%) 0          1 ( 0.02%) 0          0          304,922 ( 1.53%) 4 ( 0.00%) 152,461 ( 6.22%)  1 ( 0.01%)          match CStr::from_bytes_with_nul(unsafe { slice::from_raw_parts(buf_ptr, self.len() + 1) }) {
8,349,726 ( 6.15%) 1,181,764 ( 3.85%) 1,067,227 ( 5.22%) 18 ( 0.15%)  1 ( 0.00%) 83 ( 0.53%) 2 ( 0.04%) 0          16 ( 0.13%) 1,220,453 ( 6.11%) 158 ( 0.03%) 152,461 ( 6.22%)  4 ( 0.03%)  => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093//library/std/src/ffi/c_str.rs:std::ffi::c_str::CStr::from_bytes_with_nul (152,461x)
  609,844 ( 0.45%) 609,844 ( 1.99%)       .           .           .          .          .          .          .                .          .                .           .                       Ok(s) => Ok(f(s)),
        .                .                .           .           .          .          .          .          .                .          .                .           .                       Err(_) => Err(Errno::EINVAL),
        .                .                .           .           .          .          .          .          .                .          .                .           .                   }
  762,305 ( 0.56%) 609,844 ( 1.99%)       0           1 ( 0.01%)  0          0          1 ( 0.02%) .          .                .          .                .           .               }
        .                .                .           .           .          .          .          .          .                .          .                .           .           }

src/lib.rs Outdated
if self.len() >= PATH_MAX as usize {
return Err(Errno::ENAMETOOLONG);
}

#[allow(clippy::uninit_assumed_init)] // u8s do nothing when dropped
let mut buf: [u8; PATH_MAX as usize] = unsafe { MaybeUninit::uninit().assume_init() };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rust documentation doesn't make it immediately clear that this is sound upon a quick read (and actually seems to imply it isn't), so I'll request more justification in your comment.

From https://doc.rust-lang.org/nightly/core/mem/union.MaybeUninit.html#safety:

It is up to the caller to guarantee that the MaybeUninit really is in an initialized state. Calling this (.assume_init()) when the content is not yet fully initialized causes immediate undefined behavior.

It seems like it'd be required to call assume_init() after populating buf, but that seems tricky to do without falling back to copy_nonoverlapping/CStr::from_ptr, which has performance implications for large paths (see #1583).

The nomicon has some helpful things to say, but even that example demonstrates the use of an array of MaybeUninit types followed by a transmute, rather than going directly to the desired type.

TL;DR: This ultimately seems reasonable to me, but I'd like to see a more in-depth justification and also get @asomers thoughts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RalfJung hope you don't mind the ping. :) Can you comment on if creating an assume_init array of u8s is legal so long as you write them before reading? And if not, what's the prefered way to let people perform write operations on a MaybeUninit slice when there's no API that supports it? I've actually had similar use cases before where there appears to be no solution besides marking the whole array as assume_init. Here's an example where I want to use a RNG's fill_bytes method:

let mut buf: [u8; 4096] = unsafe { MaybeUninit::uninit().assume_init() };
random.fill_bytes(&mut buf[0..used]);
file.write_all(&buf[0..used])?;

@rtzoeller Agreed, the assume_init requirements are confusing and seem to contradict each other. For example, they say "The assume_init is safe because the type we are claiming to have initialized here is a bunch of MaybeUninits, which do not require initialization." A u8 also doesn't require initialization, so does that make it too? Hopefully @RalfJung can answer. Otherwise, I'll continue digging into the commit history to see if I can find something.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SUPERCILEX can you try the following implementation and report back on performance? It much more directly mirrors the example usage of MaybeUninit, and in my (admittedly very synthetic) benchmarks seems to perform similarly to your original implementation (which surprised me).

fn with_nix_path<T, F>(&self, f: F) -> Result<T>
where
    F: FnOnce(&CStr) -> T,
{
    if self.len() >= PATH_MAX as usize {
        return Err(Errno::ENAMETOOLONG);
    }

    // MaybeUninit usage modelled after
    // https://doc.rust-lang.org/stable/std/mem/union.MaybeUninit.html#initializing-an-array-element-by-element
    let mut buf: [MaybeUninit<u8>; PATH_MAX as usize] =
        unsafe { MaybeUninit::uninit().assume_init() };
    // copy_from_slice() doesn't work for MaybeUninit types
    for (ptr, elt) in buf.iter_mut().zip(self) {
        ptr.write(*elt);
    }
    buf[self.len()].write(0); // NUL terminate the string

    let buf = unsafe { std::mem::transmute::<_, [u8; PATH_MAX as usize]>(buf) };
    match CStr::from_bytes_with_nul(&buf[..=self.len()]) {
        Ok(s) => Ok(f(s)),
        Err(_) => Err(Errno::EINVAL),
    }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's definitely not kosher to call assume_init() anything that isn't actually initialized. This is the easy and safe way to implement that function without over-initializing, though it involves a heap allocation. Is it fast enough?

    fn with_nix_path<T, F>(&self, f: F) -> Result<T>
            where F: FnOnce(&CStr) -> T {
        let mut v: Vec<u8> = Vec::from(self);
        v.push(0);
        match CStr::from_bytes_with_nul(&v) {
            Ok(s) => Ok(f(s)),
            Err(_) => Err(Errno::EINVAL)
        }
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've spent a gazillion hours reading what feels like everything under the sun about uninitialized memory. 😁 Here's my executive summary:

  • Simply creating a variable with uninitialized memory for a type where not all bit patterns are valid is UB. bool is a good example: in the Rust abstract machine, bool can only ever be 0 or 1, so simply the act of declaring an uninitialized bool results in an invalid Rust program. This means the optimizer is free to assume you cannot possibly be in the current state (since a valid Rust program would never be) and therefore you can delete all the code after that variable declaration, insert a panic, etc.
  • Creating a variable with uninitialized memory for a type where every bit pattern is valid is not UB. In LLVM IR, it will be represented as an undef (which is below a poison which is below UB). That said, it means that the optimizer can pick whatever value makes it happy and does not need to be consistent about its choices. Here's a slightly simplified version of a popular example:
    #![feature(bench_black_box)]
    
    use std::mem;
    
    fn always_returns_true(b: u8) -> bool {
        std::hint::black_box(b == 5) || std::hint::black_box(b != 5)
    }
    
    pub fn main() {
       let x: u8 = unsafe { mem::MaybeUninit::uninit().assume_init() };
       println!("{}", always_returns_true(x));
    }
    Run it: https://rust.godbolt.org/z/cPEa5dnxs

So where does that leave us? Well theoretically speaking, my code has one piece of UB (more accurately, unspecified behavior) and one correct undef. First, it creates a reference (buf[..self.len()]) that points to uninitialized memory, but references must always point to initialized memory. Therefore, not a valid Rust program. Second, we have a bunch of u8's that are undef. However, since we never read them before writing, we cannot observe their undef behavior so we don't care.

Now what about from a pragmatic perspective? Neither cases have been specified: https://github.com/rust-lang/unsafe-code-guidelines/blob/master/active_discussion/validity.md. So really, people are already using references "incorrectly" (even in the stdlib IIRC) so it's unlikely that this model will break anytime soon.

That said, I figured out a way to rewrite this with no UB that I can tell. We use an initialized raw pointer. Raw pointers can be whatever the heck we want, it's only when they are dereferenced that the data they point to must be valid. The only part I'm still a little iffy on is buf.assume_init_ref()[..=self.len()]: it creates a reference to an array that is partially initialized, then immediately creates another reference to the part of the array that is initialized. I think that's fine, but if we really wanted to go crazy, we could stick with raw pointers and create the slice directly: slice::from_raw_parts(buf, self.len() + 1) (though this is ~1M instructions slower for whatever reason).

With the new implementation, we get even faster which is cool:

20,704,687 (15.32%) 4,384,493 (14.41%) 2,744,994 (13.52%) 134 ( 1.23%) 13 ( 0.04%) 265 ( 1.72%) 8 ( 0.15%) 0          45 ( 0.37%) 2,441,283 (12.30%) 1,160 ( 0.23%) 609,972 (26.51%)  4 ( 0.04%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,493x)

@rtzoeller your implementation benchmarks similarly in terms of time taken, but suffers significantly on the instruction count side:

180,403,130 (61.18%) 56,855,699 (68.57%) 39,502,833 (69.22%) 590 ( 4.24%) 5,281 (12.19%) 7,548 (32.57%) 12 ( 0.22%) 701 (14.96%) 1,069 ( 8.01%) 18,761,972 (51.87%) 305,759 (37.73%) 6,101,922 (78.29%) 1,513 (14.69%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,496x)

@asomers I'm very strongly opposed to creating an allocation per syscall. The whole point of this PR is to be completely zero-cost. Interestingly enough, allocating is not the worst on the instruction count front:

48,197,762 (29.65%) 13,229,886 (33.70%) 8,234,742 (31.94%) 135 ( 1.23%) 34 ( 0.11%) 9 ( 0.06%) 7 ( 0.13%) 0          0          6,253,608 (26.44%) 2,324 ( 0.46%) 1,524,930 (47.41%) 15 ( 0.20%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,493x)

Though when I ran the benchmarks from #1583, it performed significantly worse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@asomers forgot to mention that I also opened a PR that removes the PATH_MAX restriction with a heap allocation so we can avoid probe frames: #1656

Copy link

@RalfJung RalfJung Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating a variable with uninitialized memory for a type where every bit pattern is valid is not UB.

That is not correct -- the Rust Reference says this is UB, too. Indeed, undef is not just "some unknown bit pattern", it is a value that is clearly distinct from any single 'regular' bit pattern, and u8 is well-defined for any regular bit pattern but not for undef.

let mut buf: [u8; 4096] = unsafe { MaybeUninit::uninit().assume_init() }; is hence UB.

Here's an example where I want to use a RNG's fill_bytes method:

The current Write trait does not adequately support writing into an uninitialized buffer; this RFC describes the planned changes to make it compatible. Other, similar APIs should probably use the same pattern.

When interfacing with C code that does in-place initialization, the suggested pattern is to use raw pointers only until data is actually initialized. (I currently can't dig into the details of what this particular PR does so I hope at least some of these remarks are useful... if you have a self-contained question then I can also try to answer that :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is not correct -- the Rust Reference says this is UB, too.

Isn't that still unspecified though? Or at least, I thought the validity discussions were about deciding whether or not it should be considered UB for Rust. From the LLVM perspective, an undef u8 is defined (as far as I understand). For example, undefu8 | 3 = uuuuuu11. Those us can be whatever and change whenever, but we know that no matter what LLVM chooses for undef, the lower two bits must be 1. That's what I got from these slides.

this RFC describes the planned changes to make it compatible.

That's freakin' awesome! Glad to see progress. :)

Copy link

@RalfJung RalfJung Feb 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is indeed no UB on the LLVM level here, but you are not writing LLVM IR -- you are writing Rust code, so the relevant rules for UB are the ones for Rust, not LLVM.

And for now, Rust is pretty clear in this regard. It is true that there is discussion here for what the final rules should be (rust-lang/unsafe-code-guidelines#71), but the only reason we can have that discussion is that currently this is declared UB -- so we can allow more code later, depending on the outcome of the discussion.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhhh, that makes perfect sense. Thank you for all the feedback!

src/lib.rs Outdated Show resolved Hide resolved
@SUPERCILEX SUPERCILEX changed the title 30X performance improvement in with_nix_path 30x performance improvement in with_nix_path Feb 5, 2022
@rtzoeller
Copy link
Collaborator

@SUPERCILEX the changes look good to me!

Before merging, a few procedural things need to be addressed:

  1. Lets drop 067c1a4 from this PR, to keep the scope limited to performance changes.
  2. Can you squash the remaining commits?
  3. Please update the instruction counts in the PR description to reflect the latest code.

@SUPERCILEX
Copy link
Contributor Author

Sweet!

  1. Is it okay to merge Run rustfmt on lib.rs and ignore .idea #1659 and then I rebase?
  2. Can you squash on the PR side? I can do it but it means we'll lose the history which would be a bummer.
  3. Done!

SUPERCILEX added a commit to SUPERCILEX/ftzz that referenced this pull request Feb 13, 2022
- Update nix with my PRs: nix-rust/nix#1656, nix-rust/nix#1655
- Use raw pointer manipulation to extract name cache values
- Use a faster rand implementation with better statistical guarantees
- Tune the task queue capacity and consumption rate and get rid of copies in drain (by using a VecDeque)

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@rtzoeller
Copy link
Collaborator

@SUPERCILEX apologies for the delay in getting back to this.

My preference would be to avoid submitting bulk formatting changes for now - I'd prefer if those could wait until #1657 is submitted, which I'd like to hold off on for now.

We use bors-ng for merging and validating PRs, and do not have it configured to squash changes. Consequently these changes will need to be manually squashed. I am not particularly worried about losing the history, as most of the intermediate changes were unsound.

Signed-off-by: Alex Saveau <saveau.alexandre@gmail.com>
@SUPERCILEX
Copy link
Contributor Author

Makes sense, done!

@rtzoeller
Copy link
Collaborator

bors r+

bors bot added a commit that referenced this pull request Feb 14, 2022
1655: 30x performance improvement in `with_nix_path` r=rtzoeller a=SUPERCILEX

I've been digging into CPU instructions counts and found that `nix` accounted for an eye-watering 85% of my program's instruction counts (yes, I do a lot of I/O, I know).

The fix is simple: don't initialize the stack memory since we're just going to overwrite it anyway.

> Note: I also ran rustfmt in a separate commit, not sure if that's ok.

### Before

```
650,398,225 (85.05%) 5,451,056 (17.31%) 627,714,969 (97.28%) 60 ( 0.54%) 22 ( 0.07%) 3,997 (20.98%) 10 ( 0.18%) 0          752 ( 5.98%) 627,716,244 (97.30%) 267,333 (34.62%) 914,814 (35.11%) 105 ( 0.73%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,469x)
```

```
1,677,159 ( 0.22%)       0          762,345 ( 0.12%) 14 ( 0.13%)  0           0          2 ( 0.04%) .           .                .          .                .           .               fn with_nix_path<T, F>(&self, f: F) -> Result<T>
2,287,035 ( 0.30%) 304,938 ( 0.97%) 152,469 ( 0.02%)  5 ( 0.04%)  0           0          0          0           0          304,938 ( 0.05%) .                .           .           => ???:__rust_probestack (152,469x)
        .                .                .           .           .           .          .          .           .                .          .                .           .               where
        .                .                .           .           .           .          .          .           .                .          .                .           .                   F: FnOnce(&CStr) -> T,
        .                .                .           .           .           .          .          .           .                .          .                .           .               {
  457,407 ( 0.06%) 152,469 ( 0.48%) 152,469 ( 0.02%)  0           0          93 ( 0.49%) 0          0          16 ( 0.13%)       0          0          152,469 ( 5.85%) 97 ( 0.67%)          let mut buf = [0u8; PATH_MAX as usize];
627,104,997 (82.00%) 304,938 ( 0.97%) 624,513,024 (96.78%)  1 ( 0.01%)  0          3,814 (20.02%) 1 ( 0.02%) 0          720 ( 5.72%) 625,122,900 (96.90%) 152,495 (19.75%)       .           .           => ./string/../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:__memset_avx2_unaligned_erms (152,469x)
        .                .                .           .           .           .          .          .           .                .          .                .           .           
  304,938 ( 0.04%)       0                0           0           0           0          0          0           0          152,469 ( 0.02%) 2 ( 0.00%)       .           .                   if self.len() >= PATH_MAX as usize {
        .                .                .           .           .           .          .          .           .                .          .                .           .                       return Err(Errno::ENAMETOOLONG);
        .                .                .           .           .           .          .          .           .                .          .                .           .                   }
        .                .                .           .           .           .          .          .           .                .          .                .           .           
1,067,283 ( 0.14%) 152,469 ( 0.48%) 152,469 ( 0.02%)  1 ( 0.01%)  4 ( 0.01%)  0          1 ( 0.02%) 0           0                0          0          152,469 ( 5.85%)  1 ( 0.01%)          buf[..self.len()].copy_from_slice(self);
3,202,541 ( 0.42%) 914,814 ( 2.91%) 609,876 ( 0.09%)  1 ( 0.01%)  2 ( 0.01%)  0          0          0           0          458,005 ( 0.07%) 114,562 (14.84%) 152,469 ( 5.85%)  .           => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093/library/core/src/slice/mod.rs:core::slice::<impl [T]>::copy_from_slice (152,469x)
  762,345 ( 0.10%) 304,938 ( 0.97%) 152,469 ( 0.02%)  0           6 ( 0.02%)  0          0          0           0          152,469 ( 0.02%) 6 ( 0.00%) 152,469 ( 5.85%)  1 ( 0.01%)          match CStr::from_bytes_with_nul(&buf[..=self.len()]) {
8,350,190 ( 1.09%) 1,181,828 ( 3.75%) 1,067,283 ( 0.17%) 19 ( 0.17%)  2 ( 0.01%) 87 ( 0.46%) 3 ( 0.05%) 0          16 ( 0.13%) 1,220,525 ( 0.19%) 167 ( 0.02%) 152,469 ( 5.85%)  5 ( 0.03%)  => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093//library/std/src/ffi/c_str.rs:std::ffi::c_str::CStr::from_bytes_with_nul (152,469x)
  609,876 ( 0.08%) 609,876 ( 1.94%)       0           0           1 ( 0.00%)  .          .          .           .                .          .                .           .                       Ok(s) => Ok(f(s)),
        .                .                .           .           .           .          .          .           .                .          .                .           .                       Err(_) => Err(Errno::EINVAL),
        .                .                .           .           .           .          .          .           .                .          .                .           .                   }
  914,814 ( 0.12%) 762,345 ( 2.42%)       .           .           .           .          .          .           .                .          .                .           .               }
        .                .                .           .           .           .          .          .           .                .          .                .           .           }
```

### After

```
21,462,416 (15.81%) 4,688,455 (15.27%) 2,896,847 (14.17%) 74 ( 0.64%) 11 ( 0.04%) 249 ( 1.58%) 8 ( 0.15%) 0          48 ( 0.38%) 2,593,200 (12.98%) 1,128 ( 0.22%) 762,305 (31.08%) 158 ( 1.32%)  => /home/asaveau/Desktop/nix/src/lib.rs:<[u8] as nix::NixPath>::with_nix_path (152,461x)
```

```
1,067,227 ( 0.79%)       0          609,844 ( 2.98%)  1 ( 0.01%)  0          0          1 ( 0.02%) .          .                .          .                .           .               fn with_nix_path<T, F>(&self, f: F) -> Result<T>
2,286,915 ( 1.68%) 304,922 ( 0.99%) 152,461 ( 0.75%) 19 ( 0.16%)  0          0          0          0          0          304,922 ( 1.53%) .                .           .           => ???:__rust_probestack (152,461x)
        .                .                .           .           .          .          .          .          .                .          .                .           .               where
        .                .                .           .           .          .          .          .          .                .          .                .           .                   F: FnOnce(&CStr) -> T,
        .                .                .           .           .          .          .          .          .                .          .                .           .               {
  304,922 ( 0.22%)       0                0           0           0          0          0          0          0          152,461 ( 0.76%) 6 ( 0.00%)       .           .                   if self.len() >= PATH_MAX as usize {
        .                .                .           .           .          .          .          .          .                .          .                .           .                       return Err(Errno::ENAMETOOLONG);
        .                .                .           .           .          .          .          .          .                .          .                .           .                   }
        .                .                .           .           .          .          .          .          .                .          .                .           .           
        .                .                .           .           .          .          .          .          .                .          .                .           .                   let mut buf = MaybeUninit::<[u8; PATH_MAX as usize]>::uninit();
        .                .                .           .           .          .          .          .          .                .          .                .           .                   let buf_ptr = buf.as_mut_ptr() as *mut u8;
        .                .                .           .           .          .          .          .          .                .          .                .           .           
        .                .                .           .           .          .          .          .          .                .          .                .           .                   unsafe {
        .                .                .           .           .          .          .          .          .                .          .                .           .                       ptr::copy_nonoverlapping(self.as_ptr(), buf_ptr, self.len());
        .                .                .           .           .          .          .          .          .                .          .                .           .                       buf_ptr.add(self.len()).write(0);
        .                .                .           .           .          .          .          .          .                .          .                .           .                   }
        .                .                .           .           .          .          .          .          .                .          .                .           .           
1,067,227 ( 0.79%) 304,922 ( 0.99%) 152,461 ( 0.75%)  1 ( 0.01%)  3 ( 0.01%) 0          1 ( 0.02%) 0          0          304,922 ( 1.53%) 4 ( 0.00%) 152,461 ( 6.22%)  1 ( 0.01%)          match CStr::from_bytes_with_nul(unsafe { slice::from_raw_parts(buf_ptr, self.len() + 1) }) {
8,349,726 ( 6.15%) 1,181,764 ( 3.85%) 1,067,227 ( 5.22%) 18 ( 0.15%)  1 ( 0.00%) 83 ( 0.53%) 2 ( 0.04%) 0          16 ( 0.13%) 1,220,453 ( 6.11%) 158 ( 0.03%) 152,461 ( 6.22%)  4 ( 0.03%)  => /rustc/21b4a9cfdcbb1e76f4b36b5c3cfd64d627285093//library/std/src/ffi/c_str.rs:std::ffi::c_str::CStr::from_bytes_with_nul (152,461x)
  609,844 ( 0.45%) 609,844 ( 1.99%)       .           .           .          .          .          .          .                .          .                .           .                       Ok(s) => Ok(f(s)),
        .                .                .           .           .          .          .          .          .                .          .                .           .                       Err(_) => Err(Errno::EINVAL),
        .                .                .           .           .          .          .          .          .                .          .                .           .                   }
  762,305 ( 0.56%) 609,844 ( 1.99%)       0           1 ( 0.01%)  0          0          1 ( 0.02%) .          .                .          .                .           .               }
        .                .                .           .           .          .          .          .          .                .          .                .           .           }
 ```

Co-authored-by: Alex Saveau <saveau.alexandre@gmail.com>
@bors
Copy link
Contributor

bors bot commented Feb 14, 2022

Build failed:

@SUPERCILEX
Copy link
Contributor Author

bors retry

@bors
Copy link
Contributor

bors bot commented Feb 14, 2022

🔒 Permission denied

Existing reviewers: click here to make SUPERCILEX a reviewer

@SUPERCILEX
Copy link
Contributor Author

@rtzoeller

Failed to start an instance! CreateContainerError: context deadline exceeded

Guessing a retry is needed

@rtzoeller
Copy link
Collaborator

bors retry

@bors bors bot merged commit 3ea9b7b into nix-rust:master Feb 14, 2022
@SUPERCILEX SUPERCILEX deleted the perf branch February 27, 2022 21:52
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 9, 2022
Reduce CString allocations in std as much as possible

Currently, every operation involving paths in `fs` allocates memory to hold the path before sending it through the syscall. This PR instead uses a stack allocation (chosen size is somewhat arbitrary) when the path is short before falling back to heap allocations for long paths.

Benchmarks show that the stack allocation is ~2x faster for short paths:

```
test sys::unix::fd::tests::bench_heap_path_alloc                  ... bench:          34 ns/iter (+/- 2)
test sys::unix::fd::tests::bench_stack_path_alloc                 ... bench:          15 ns/iter (+/- 1)
```

For long paths, I couldn't find any measurable difference.

---

I'd be surprised if I was the first to think of this, so I didn't fully flush out the PR. If this change is desirable, I'll make use of `run_with_cstr` across all platforms in every fs method (currently just unix open for testing). I also added an `impl From<FromBytesWithNulError>` which is presumably a no-no (or at least needs to be done in another PR).

---

Also see nix-rust/nix#1655 with a bunch of discussion where I'm doing something similar.
RalfJung pushed a commit to RalfJung/miri that referenced this pull request Oct 10, 2022
Reduce CString allocations in std as much as possible

Currently, every operation involving paths in `fs` allocates memory to hold the path before sending it through the syscall. This PR instead uses a stack allocation (chosen size is somewhat arbitrary) when the path is short before falling back to heap allocations for long paths.

Benchmarks show that the stack allocation is ~2x faster for short paths:

```
test sys::unix::fd::tests::bench_heap_path_alloc                  ... bench:          34 ns/iter (+/- 2)
test sys::unix::fd::tests::bench_stack_path_alloc                 ... bench:          15 ns/iter (+/- 1)
```

For long paths, I couldn't find any measurable difference.

---

I'd be surprised if I was the first to think of this, so I didn't fully flush out the PR. If this change is desirable, I'll make use of `run_with_cstr` across all platforms in every fs method (currently just unix open for testing). I also added an `impl From<FromBytesWithNulError>` which is presumably a no-no (or at least needs to be done in another PR).

---

Also see nix-rust/nix#1655 with a bunch of discussion where I'm doing something similar.
thomcc pushed a commit to tcdi/postgrestd that referenced this pull request Feb 10, 2023
Reduce CString allocations in std as much as possible

Currently, every operation involving paths in `fs` allocates memory to hold the path before sending it through the syscall. This PR instead uses a stack allocation (chosen size is somewhat arbitrary) when the path is short before falling back to heap allocations for long paths.

Benchmarks show that the stack allocation is ~2x faster for short paths:

```
test sys::unix::fd::tests::bench_heap_path_alloc                  ... bench:          34 ns/iter (+/- 2)
test sys::unix::fd::tests::bench_stack_path_alloc                 ... bench:          15 ns/iter (+/- 1)
```

For long paths, I couldn't find any measurable difference.

---

I'd be surprised if I was the first to think of this, so I didn't fully flush out the PR. If this change is desirable, I'll make use of `run_with_cstr` across all platforms in every fs method (currently just unix open for testing). I also added an `impl From<FromBytesWithNulError>` which is presumably a no-no (or at least needs to be done in another PR).

---

Also see nix-rust/nix#1655 with a bunch of discussion where I'm doing something similar.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants