Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Safe memcpy for slices ([T]::copy_from_slice) #1419

Merged
merged 12 commits into from Feb 18, 2016

Conversation

Projects
None yet
@ubsan
Copy link
Contributor

commented Dec 20, 2015

No description provided.

@Aatch

This comment has been minimized.

Copy link
Contributor

commented Dec 20, 2015

Looks good to me. I'd personally prefer only panicking on src.len() > dst.len() but slicing dst to the appropriate length isn't difficult, so I don't feel strongly about it.

@bluss

This comment has been minimized.

Copy link

commented Dec 20, 2015

I think it's a good call to have a free function for std::slice::copy(src, dst), the two parameters have equal importance.

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2015

@Aatch I also don't feel too strongly about this, but the current design does make this specific use nice:

// let src: &[T];
let mut dst: [T; N] = unsafe { std::mem::uninitialized() };
std::slice::copy(src, &mut dst); // You know that dst is fully initialized from here on
@Dr-Emann

This comment has been minimized.

Copy link

commented Dec 20, 2015

As a note, dlang has special syntax for set and copy, and it only allows arrays of equal size.

e.g.

auto x = new int[10];
auto y = new int[11];
x[] = 5; // fill x with 5's
y[] = x[]; // Array lengths don't match for copy: 10 != 11
However, `std::slice::bytes::copy_memory`, the function I'm basing this on, only
panics if `dst.len() < src.len()`. So... room for discussion, here.

These are necessary functions, in the opinion of the author.

This comment has been minimized.

Copy link
@eddyb

eddyb Dec 20, 2015

Member

They could be methods on slices, not necessarily free functions.

@Amanieu

This comment has been minimized.

Copy link
Contributor

commented Dec 20, 2015

I think fill would be a better name than set. This matches the C++ std::fill function which fills a range with copies of a single.

@eddyb

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

@Amanieu I definitely prefer xs.fill(x). Not sure if there are actual issues around adding methods to slice types.

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2015

@eddyb @Amanieu I don't think there are issues around adding methods to slice types, but I don't personally like having these functions as methods. These feel "right" as free functions (at least, copy does). I can see set as a method called fill however.

Although, perhaps fill would be better as a method for Clone types. This is meant to be lowered to a memset in most cases, or something similar.

@eddyb

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

@ubsan Keep in mind that the main reason for std::slice and std::str (and all of the modules for integers and floats, but I digress) is that once upon a time, there were no inherent impls for primitives, so you had to choose between free functions and traits.
And free functions won for a long time.

I can see how copy looks cleaner as a free function, but I think assignment syntax is optimal there.

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2015

@eddyb true, for set. I think I'll change the RFC to be fn set -> fn [T]::fill. I still think copy looks better as a free function, however.

Switch to fill, Add some language
specifically about being defined for slices with uninitialized values

Also add some language about alternatives
Add one function to `std::slice`.

```rust
pub fn copy<T: Copy>(src: &[T], dst: &mut [T]);

This comment has been minimized.

Copy link
@Manishearth

Manishearth Dec 20, 2015

Member

Could this cause issue when T contains &U? We have to be careful to avoid lifetime lengthening. But I guess variance will fix it.

@Manishearth

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

Note that free functions are harder to discover doc-wise. The general rust community seems to avoid them in code; though I'm not certain of this.

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2015

@Manishearth copy, imho, is better as a free function. copy_from and copy_to are both valid ways of writing copy as a method, and are both fairly ugly to write in regular code, imo.

slice.copy(slice2); // which is src? which is dst?
slice.copy_to(slice2); // okay, it's obvious here, but it's kind of ugly
slice.copy_from(slice2); // same as ^
std::slice::copy(slice, slice2); // nice! also, a connection to std::ptr::copy_nonoverlapping here.
@Manishearth

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

I think copy_to or copy_from are better (I like copy_to better of these, but not for any real reason). They're clearer, don't obfuscate the src/dst, and are methods instead of free fns. copy(src, dst) will invariably have you looking up the docs all the time. The &[T]/&mut [T] distinction helps, but in case both happen to be mutable you're stuck again.

@jFransham

This comment has been minimized.

Copy link

commented Dec 20, 2015

@ubsan equally to your first example:

std::slice::copy(slice, slice2); // which is src? which is dst?

I think this is what @Manishearth was talking about when he mentioned looking up the docs. It's obvious if you've used memcpy for years but for people new to systems programming it will be confusing.

@Manishearth

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

It's obvious if you've used memcpy for years

Not even that, because that convention isn't uniformly followed across programming languages. I think even rust had a deliberately "backwards" memcpy ordering in the past (I think it got fixed in the rush for 1.0 or something)

@tomjakubowski

This comment has been minimized.

Copy link
Contributor

commented Dec 20, 2015

@ubsan equally to your first example:

std::slice::copy(slice, slice2); // which is src? which is dst?

I think this is what @Manishearth was talking about when he mentioned looking up the docs. It's obvious if you've used memcpy for years but for people new to systems programming it will be confusing.

The order proposed in the RFC is the opposite of memcpy's, in fact:

void *memcpy(void *restrict dst, const void *restrict src, size_t n);

Anyway, in many cases the Rust would look more like:

std::slice::copy(&slice, &mut slice2);

which makes it pretty clear which argument is source and which is destination.

```

`fill` loops through slice, setting each member to value. This will lower to a
memset in all possible cases. It is defined to call `fill` on a slice which has

This comment has been minimized.

Copy link
@tomjakubowski

tomjakubowski Dec 20, 2015

Contributor

it might be worth mentioning how the behavior is defined :)

This comment has been minimized.

Copy link
@ubsan

ubsan Dec 20, 2015

Author Contributor

"defined" meaning "will not exhibit undefined behavior". :P

@ubsan ubsan changed the title Safe memcpy, memset for slices (std::slice::{ copy, set }) Safe memcpy, memset for slices [T]::{ copy_from, fill } Dec 20, 2015

@ubsan ubsan changed the title Safe memcpy, memset for slices [T]::{ copy_from, fill } Safe memcpy, memset for slices ([T]::{ copy_from, fill }) Dec 20, 2015

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2015

@Manishearth @tomjakubowski I polled some people in the rust community, and it seems that generally people don't like my free functions, and they've convinced me, so I've decided to switch to methods. This also means that I've switched to the name copy_from. Is there anything else that you see?

@Amanieu

This comment has been minimized.

Copy link
Contributor

commented Dec 20, 2015

Some alternatives that might be worth considering:

  1. Allow direct assignment of slices with the same panic semantics as the proposed copy function:
a[..] = b[..];

This is currently disallowed because [T] is an unsized type.

  1. Make fill an iterator method. This would allow it to be used on any collection type instead of just slices. This method would only be available when Iter::Item is &mut T.

  2. Allow any Clone type for copy and fill instead of only Copy types. This would make the functions more useful for generic code.

  3. Maybe consider the general concept of copying data from one iterator to another. This would allow values to be modified prior to copying by using map for example:

iter::copy(b.iter(), a.iter_mut());
iter::copy(b.iter().map(|x| x * 2), a.iter_mut());

The downside is that this might be getting too generic, and it wouldn't make sense to have the proposed panic semantics when the iterator lengths don't match.

@Manishearth

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

As far as iterators go there's not much value (relatively) there because you can write this as a function already.

The reason this RfC is important is that currently there's no way to copy/fill slices in safe Rust efficiently. You have to use a for loop and hope the optimizer magicks it. This RfC provides a way to do that. For iterators, a one-line for loop just works and can't be optimized further, so there's much less value there.

@Manishearth

This comment has been minimized.

Copy link
Member

commented Dec 20, 2015

@ubsan mostly looks good, 👍 for this rfc.

Small nit: Perhaps fill() should also have a descriptive name (fill_from in its current form)?

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Dec 20, 2015

@Manishearth Hmm, I can see your argument... I do like @eddyb's suggestion of fill_with. Are there any other suggestions?

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Jan 22, 2016

@ubsan, @bluss If we don't guarantee memset, then why do we want to hinder ourselves? Surely LLVM can do better than we at the source level can ever do because of monomorphization?

I guess what I don't understand is that if we don't guarantee memset, then why are we hamstringing ourselves to Copy when Clone is the more general bound here? If we're talking about speed then we're only talking about optimized code, and do you think that we can write a generic fill that is faster than what LLVM would already produce?

@bluss

This comment has been minimized.

Copy link

commented Jan 22, 2016

T: Copy is not a hinderance but a boon to the implementor, i.e. more freedom for them.

slice.fill(x) with T: Clone would be equal with for elt in slice { *elt = x.clone(); }, so there's not much to gain by putting that into a function.

With T: Copy we can provide something of better quality and better performance, even in debug builds.

Specialization will change the parameters we're looking at here, but we don't know if / when that arrives.

@bluss

This comment has been minimized.

Copy link

commented Jan 22, 2016

The T: Copy version can call memset if size of T is 1, so already that is an advantage for debug build performance. No guarantee for T of all sizes or so, but it's still a big improvement.

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Jan 22, 2016

@bluss

so there's not much to gain by putting that into a function.

I somewhat disagree with this personally because I think the same argument could be made about clone_from_slice, e.g. it's just a

for (a, b) in dst.iter_mut().zip(src) { *a = b.clone(); }

These kinds of conveniences on core types (like slices) feel to me like they're often useful enough to provide.

Your point about guaranteeing a memset if the size of T is one seems pretty nice though! To me that's a pretty convincing argument for having some variant with T: Copy as it'll cover all the easy use cases.

That means in terms of naming, however, that we may want both a Copy and a Clone variant for filling a slice as we do with copying a slice (e.g. this is proposing a Copy alternative to the clone_from_slice method). Ideally both filling via clone and copy would both exist and have a similar naming scheme. I'd be fine with just adding the Copy fill for now so long as we ensure there's a gap for fill-via-Clone to exist.

@bluss

This comment has been minimized.

Copy link

commented Jan 22, 2016

The naive version of the "clone_from_slice" loop produces terrible code and the naive fill loop produces good code. It's good that we have clone_from_slice so that the right way to tickle llvm for an optimized copying is abstracted out into a method. I don't think the pragmatic argument is the only one for clone_from_slice though.

I think .fill(x) is a nice API, but it's not as important as .copy_from().

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 23, 2016

Agreed, it isn't as important. The only reason I also added fill is because memset and memcpy kinda go together. copy_from is the important part of this RFC.

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Jan 26, 2016

In the case that fill was a bit of an addition, perhaps that API can be left to its own RFC? (or perhaps an RFC issue for now) It sounds like there's some outstanding questions around it and otherwise copy_from seems fine modulo naming.

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 27, 2016

@alexcrichton I would be fine with that. I don't personally see any pressing need for it, and no one else seems to see any pressing need for it.

@ubsan ubsan changed the title Safe memcpy, memset for slices ([T]::{ copy_from, fill }) Safe memcpy for slices ([T]::copy_from) Jan 27, 2016

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Jan 27, 2016

Yeah, I'm taking out [T]::fill. It can be decided upon later. The important part of this RFC is [T]::copy_from.

@BurntSushi

This comment has been minimized.

Copy link
Member

commented Jan 27, 2016

👍 to this RFC. Guaranteed memcpy is good. I do also think it should be named copy_from_slice to be consistent with clone_from_slice.

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Feb 11, 2016

🔔 This RFC is now entering its week-long final comment period 🔔


My personal opinion is that this is good to go with the name copy_from_slice, the semantics all seem good to me though!

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 11, 2016

@alexcrichton I don't personally like the name, because it's a very common function (at least in my personal code) to have such a long name. However, I do see the benefit of it in terms of consistency, so I am not opposed to it. I'd just prefer a shorter name.

@bluss

This comment has been minimized.

Copy link

commented Feb 12, 2016

Let's get this in. Thanks for pushing it @ubsan (and I like your new avatar).

Using this to replace most uses of clone_from_slice in rustc and libstd will improve performance for debug builds by using memcpy directly, so we benefit immediately.

@ubsan

This comment has been minimized.

Copy link
Contributor Author

commented Feb 12, 2016

@bluss thanks :) I needed a new avatar very badly...

Yeah, I will be interested in io::Write performance on Debug (I think it uses clone_from_slice currently).

@aturon

This comment has been minimized.

Copy link
Member

commented Feb 12, 2016

A couple thoughts:

  • To clarify some points that @bluss made: the compiler and lang teams were against using a compiler hack to do custom specialization for clone_from_slice. But the possibility of doing so through specialization proper is definitely open (and specialization is likely to land in the near future).
    • The main question in actually specializing clone_from_slice for Copy data is the divergence currently possible -- the fact that you can write custom Clone impls for Copy data. This was necessary for various reasons, and would be difficult to change backwards-compatibly. But we could viably consider this a contract violation, and still give clone_from_slice memcpy semantics on Copy data.
  • On the other hand, there is perhaps some value to having a separate method which is purely a safe memcpy, and in particular asserts the Copy-ness of the data -- even if we also specialize clone_from_slice. I personally could go either way on that question.
  • Finally, I agree with the general sentiment that the name of this method should match the idiom for clone_from_slice, which was recently stabilized.

@ubsan ubsan changed the title Safe memcpy for slices ([T]::copy_from) Safe memcpy for slices ([T]::copy_from_slice) Feb 13, 2016

@alexcrichton

This comment has been minimized.

Copy link
Member

commented Feb 18, 2016

The libs team discussed this RFC during triage yesterday, and the conclusion was to merge this RFC. We discussed the possibility of having clone_from_slice guarantee the same semantics as copy_from_slice here if T is Copy (disregarding possible diverging implementations of copy/clone), but our conclusion was that even if we had that behavior we would still want to have copy_from_slice. This method is a good static assertion that the operation is indeed a memcpy and we're also not sure how comfortable we would be relying on specialization for semantics like that.

Anyway, I shall merge soon. Thanks again for the RFC @ubsan!

@alexcrichton alexcrichton merged commit ea0ad1c into rust-lang:master Feb 18, 2016

bors added a commit to rust-lang/rust that referenced this pull request Feb 26, 2016

@lgvz

This comment has been minimized.

Copy link

commented Mar 4, 2016

FWIW at this point, I'd most definitely prefer the shorter name copy_from despite inconsistency with clone_from_slice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.