Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe memset for slices #2067

Closed
sethjackson opened this issue Jul 15, 2017 · 20 comments
Closed

Safe memset for slices #2067

sethjackson opened this issue Jul 15, 2017 · 20 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.

Comments

@sethjackson
Copy link

sethjackson commented Jul 15, 2017

RFC 1419 implemented safe memcpy but not memset.
It would be awesome if we could fill slices with data in a safe manner that would lower into memset.

Currently if I want to fill a slice with memset semantics I have to do this:

unsafe {
    ptr::write_bytes(self.slice.as_mut_ptr(), 0, self.slice.len() - 1);
}

I would like to be able to do something like this:

self.slice.fill(0);
@SimonSapin
Copy link
Contributor

Would this only exist for [u8]? Or be unsafe? Writing arbitrary bytes to arbitrary types is not valid in the general case. For examples the items of [&T] are not allowed to be null.

@burdges
Copy link

burdges commented Jul 15, 2017

Would adding a method to Default make sense?

@mark-i-m
Copy link
Member

I think a slightly more general impl would be a blanket impl like

impl<T: Copy> [T] {
    pub fn fill(val: T) {
        for i in self.iter_mut() {
            *i = val;
        }
    }
}

@sethjackson
Copy link
Author

sethjackson commented Jul 16, 2017

@SimonSapin I would like fill (or whatever the chosen method name would be) to lower into memset (ptr::write_bytes) for at least all primitives and be safe. Like copy_from_slice does for memcpy.

Basically I would like a safe memset for slices like there exists a safe memcpy.

@SimonSapin
Copy link
Contributor

@sethjackson Right, I understand the goal, it’s just more tricky to get there than with copy_from_slice.

First, would the argument to [T]::fill be u8 or T? If the former, how do you make it safe for arbitrary Ts? If the latter, for size_of::<T>() > 1, do you check that each byte of the value is identical to call write_bytes, and have a fallback code path if they’re not? Or have a default impl like @mark-i-m wrote, and use specialization for primitive integer types? What about other types like [char] or [*const T] or [Option<&T>] or [(u16, u16)]? memset to zero would be safe for those, but other values not necessarily.

@sethjackson
Copy link
Author

Ah I see.

I was thinking the argument would be T. Like C++'s std::array::fill. I think in that case a default impl could be provided and then specialize the primitive impls like you mentioned.

@mark-i-m
Copy link
Member

@sethjackson Right, that's what I was going for too.

@abonander
Copy link

The implementation doesn't have to lower to memset, it could hit a vectorized write loop instead. It could have a check for if the bytes of the value are all the same (ignoring padding) and lower to memset then.

@mark-i-m
Copy link
Member

mark-i-m commented Jul 23, 2017 via email

@abonander
Copy link

LLVM can still optimize it, of course. I'm just thinking about performance improvement for dynamic values.

@joshtriplett
Copy link
Member

joshtriplett commented Jul 27, 2017

@sethjackson (and others) For anything other than [u8] and equivalent, do you need full memset, or do you just need zeroing?

@sethjackson
Copy link
Author

@joshtriplett not sure I understand. What do you mean by full memset or just zeroing?

For [u8] and equivalent I would like fill(val: T) to memset. For other types where fill can't memset I don't want zeroing I want it to fill the slice with the value specified to fill ( the value passed to fill could be something other than zero).

@bluss
Copy link
Member

bluss commented Aug 3, 2017

I would prefer this a generic method using either T: Copy or T: Clone. The details are a matter of implementation quality, but it's not complicated to guarantee that it's like memset when T is a byte.

impl<T> [T] {
    fn fill(&mut self, value: T) where T: Copy { ... }
}

For T: Copy we can even write our own multi-byte splat function with simd down the line, if needed.

(An alternative could potentially be a generalization, that it instead is a method on iterators - implemented for Iterator<Item=&mut T>. With specialization we can still recover memset for &mut [u8].)

@Techcable
Copy link

Techcable commented Oct 1, 2017

You could have an unsafe trait ArbitraryBytes {} that guarantees a type can be filled with arbitrary bytes and used with fill_bytes.
It could be safely implemented for all the primitive integers, without implementing it for floating point types or references. The fill_bytes method could be safely and directly lowered to a very fastmemset without the cost of a general solution for all Copy types.

In fact, this abstraction is already possible in stable rust. Hopefully, something like this will be added to nightly soon.

@joshtriplett
Copy link
Member

@Techcable We could also have a parent trait of that, for types that permit zero-filled memory as a valid representation. Floating-point types can implement that, too, as can Option.

@SimonSapin
Copy link
Contributor

as can Option

Option<&T> yes, but other options might have other representation.

@eddyb
Copy link
Member

eddyb commented Oct 2, 2017

Indeed, I have a branch where Option<bool> is one byte, and None::<bool> is 2.

@Techcable
Copy link

Techcable commented Oct 2, 2017

@joshtriplett
While I do agree about some sort of parent ZeroableBytes trait,
I don't think the type system can currently require that an Option uses the null-pointer representation. Additionally, this would allow code to depend on the null-pointer representation and make changing the representation a breaking change. I don't think we need to complicate the safe memset wrapper with support for the null-pointer optimization right now.

However, like @bluss suggested specialization could be used to provide much of the benefit transparently when the null-pointer representation is used. Specialization could also be used to optimize filling an AribitraryBytes type into a memset as long as the type is only a single byte wide. When we're zeroing out a ZeroableBytes trait, we can be even more general, and specialize into a memset regardless of the size of the type.

@Centril Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Dec 6, 2017
@scottmcm
Copy link
Member

One curious thing I saw when this came up in Discord: the simple loop will turn into memset for 0_u8 and 0_u16, but not for (0_u8, 0_u16).

I assume that's because the padding is undef, not 0? It makes me wonder if the optimization can be improved to be ok with zero there, or if it implies there's something that would be worth having as a safe API in core to avoid the edge case.

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Apr 5, 2020
Add slice::fill

Adds the `slice::fill` method to fill a slice with an item. This replaces manual for loops where items are copied one-by-one. This is a counterpart to C++20's [`std::fill`](https://en.cppreference.com/w/cpp/algorithm/fill) function.

## Usage

```rust
let mut buf = vec![0; 10];
buf.fill(1);
assert_eq!(buf, vec![1; 10]);
```

## Performance

When compiling in release mode, for `[u8]` and `[u16]` this method will optimize to a `memset(3)` call ([godbolt](https://godbolt.org/z/85El_c)). The initial implementation relies on LLVM's optimizer to make it as fast as possible for any given input. But as @jonas-schievink [pointed out](https://twitter.com/sheevink/status/1245756597453885442) this can later be optimized through specialization to guarantee it has a specific performance profile.

## Why now?

Conversations about adding `slice::fill` are not new. In fact, rust-lang/rfcs#2067 was opened 3 years ago about this exact topic. However discussion stranded while discussing implementation details, and it's not seen much forward motion since.

In ["The Hunt for the Fastest Zero"](https://travisdowns.github.io/blog/2020/01/20/zero.html) Travis Downs provides disects C++'s `std::fill` performance profile on gcc, comparing it among others to `memset(3)`. Even though `memset(3)` outperforms `std::fill` in their tests, the author notes the following:

>  That the optimization fails, perhaps unexpectedly, in some cases is unfortunate but it’s nice that you can fix it yourself. [...] Do we throw out modern C++ idioms, at least where performance matters, for example by replacing std::fill with memset? I don’t think so.

Much of the article focuses on how how to fix the performance of `std::fill` by providing specializations for specific input. In Rust we don't have any dedicated methods to fill slices with values, so it either needs to be optimized at the MIR layer, or more likely rely on LLVM's optimizer.

By adding a dedicated method for filling slices with values it opens up the ability for us to in the future guarantee that e.g. `Vec<u8>` will always optimize to `memset` even in debug mode. Or perhaps provide stronger guarantees about memory when zeroing values when a certain flag is passed. But regardless of that, it improves general ergonomics of working with slices by providing a dedicated method with documentation and examples.

## References
- [slice-fill prototype on docs.rs](https://docs.rs/slice-fill/1.0.1/slice_fill/)
- [The Hunt For The Fastest Zero](https://travisdowns.github.io/blog/2020/01/20/zero.html)
- [Safe memset for slices](rust-lang/rfcs#2067)
- [C++20 std::fill](https://en.cppreference.com/w/cpp/algorithm/fill)
- [ASM output on Godbolt](https://godbolt.org/z/5-XU66)
@mbrubeck
Copy link
Contributor

The slice::fill method is now stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

No branches or pull requests