
Do copy[_nonoverlapping]/swap[_nonoverlapping] do typed copies? #63159

Closed
gnzlbg opened this issue Jul 31, 2019 · 40 comments · Fixed by #97712
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@gnzlbg
Contributor

gnzlbg commented Jul 31, 2019

The following example (godbolt):

#[repr(align(128))] #[derive(Copy, Clone)] pub struct A(u8);

pub unsafe fn foo(x: &A) -> A { 
    *x
}

pub unsafe fn bar(x: &A) -> A { 
    let mut y: A = std::mem::uninitialized();
    std::ptr::copy_nonoverlapping(
        x as *const A, &mut y as *mut A, 1
    );
    y
}

produces the following LLVM-IR:

%A = type { [0 x i8], i8, [127 x i8] }

define void @_ZN7example3foo17h89c067fa0b9a17bcE(%A* noalias nocapture sret dereferenceable(128), %A* noalias nocapture readonly align 128 dereferenceable(128) %x) unnamed_addr #0 {
  %1 = getelementptr inbounds %A, %A* %0, i64 0, i32 0, i64 0
  %2 = getelementptr inbounds %A, %A* %x, i64 0, i32 0, i64 0
  tail call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 128 %1, i8* nonnull align 128 %2, i64 128, i1 false)
  ret void
}

define void @_ZN7example3bar17hd5d27715385ba486E(%A* noalias nocapture sret dereferenceable(128), %A* noalias nocapture readonly align 128 dereferenceable(128) %x) unnamed_addr #0 {
  %1 = getelementptr inbounds %A, %A* %x, i64 0, i32 0, i64 0
  %2 = getelementptr inbounds %A, %A* %0, i64 0, i32 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* nonnull align 128 %2, i8* nonnull align 128 %1, i64 128, i1 false) #2
  ret void
}

and machine code:

example::foo:
        mov     rax, rdi
        movaps  xmm0, xmmword ptr [rsi + 112]
        movaps  xmmword ptr [rdi + 112], xmm0
        movaps  xmm0, xmmword ptr [rsi + 96]
        movaps  xmmword ptr [rdi + 96], xmm0
        movaps  xmm0, xmmword ptr [rsi + 80]
        movaps  xmmword ptr [rdi + 80], xmm0
        movaps  xmm0, xmmword ptr [rsi + 64]
        movaps  xmmword ptr [rdi + 64], xmm0
        movaps  xmm0, xmmword ptr [rsi]
        movaps  xmm1, xmmword ptr [rsi + 16]
        movaps  xmm2, xmmword ptr [rsi + 32]
        movaps  xmm3, xmmword ptr [rsi + 48]
        movaps  xmmword ptr [rdi + 48], xmm3
        movaps  xmmword ptr [rdi + 32], xmm2
        movaps  xmmword ptr [rdi + 16], xmm1
        movaps  xmmword ptr [rdi], xmm0
        ret

example::bar:
        mov     rax, rdi
        movaps  xmm0, xmmword ptr [rsi + 112]
        movaps  xmmword ptr [rdi + 112], xmm0
        movaps  xmm0, xmmword ptr [rsi + 96]
        movaps  xmmword ptr [rdi + 96], xmm0
        movaps  xmm0, xmmword ptr [rsi + 80]
        movaps  xmmword ptr [rdi + 80], xmm0
        movaps  xmm0, xmmword ptr [rsi + 64]
        movaps  xmmword ptr [rdi + 64], xmm0
        movaps  xmm0, xmmword ptr [rsi]
        movaps  xmm1, xmmword ptr [rsi + 16]
        movaps  xmm2, xmmword ptr [rsi + 32]
        movaps  xmm3, xmmword ptr [rsi + 48]
        movaps  xmmword ptr [rdi + 48], xmm3
        movaps  xmmword ptr [rdi + 32], xmm2
        movaps  xmmword ptr [rdi + 16], xmm1
        movaps  xmmword ptr [rdi], xmm0
        ret

where 128 bytes are copied every time a value of type A is moved/copied/read/...

However, one actually only has to copy a single byte, since all other bytes are trailing padding. The expected machine code is (godbolt):

example::foo:
        mov     al, byte ptr [rdi]
        ret

example::bar:
        mov     al, byte ptr [rdi]
        ret

cc @nikic @rkruppe

@gnzlbg gnzlbg changed the title Missed optimization: unnecessary copy of padding bytes Missed optimization: unnecessary copy of trailing padding bytes Jul 31, 2019
@jonas-schievink jonas-schievink added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jul 31, 2019
@hanna-kruppe
Contributor

The least gross and special-cased way I can think of to achieve this would be extending llvm.memcpy (and related intrinsics) with some extra info about bytes which don't need to be copied, e.g. a bitset in metadata. It would be discarded when lowering to a memcpy call, but could hypothetically be used when expanding to a sequence of loads and stores inline. That's still a lot of implementation work for what appears to be a somewhat artificial case, though.

A slightly more hacky, and more limited, fix for this specific case would be to trim the size copied by the amount of trailing padding (in this case 128 -> 1) if we can see statically that it's a typed copy of a single element (intrinsics::copy* call with constant 1 as length, MIR Copy/Move). However, this might regress performance of e.g. (u8, u8, u8) because it would at first glance disallow copying it with a 32 bit load/store (similar issue exists with the proposed stride/size distinction).
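To make the trade-off concrete: the OP's type has 127 trailing padding bytes, while a type like (u8, u8, u8) has none, so trimming by trailing padding buys nothing there, and a wider 4-byte copy would touch memory outside the object. A minimal layout check (the tuple type is just an illustrative example, not from the OP):

```rust
use std::mem::{align_of, size_of};

#[repr(align(128))]
#[derive(Copy, Clone)]
struct A(u8);

fn main() {
    // A occupies 128 bytes, but only 1 byte carries data:
    // the other 127 bytes are trailing padding.
    assert_eq!(size_of::<A>(), 128);
    assert_eq!(align_of::<A>(), 128);

    // (u8, u8, u8) has size 3 and no padding at all, so trimming
    // trailing padding gains nothing, while a single 4-byte
    // load/store would be out of bounds.
    assert_eq!(size_of::<(u8, u8, u8)>(), 3);
}
```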

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

The least gross and special-cased way I can think of achieving this would be extending llvm.memcpy (and related intrinsics) with some extra info about bytes which don't need to be copied, e.g. a bitset in metadata.

Instead of extending memcpy, could the Rust front end generate single loads for each of the fields, and would LLVM be able to merge them as appropriate?

That is, for the OP, I'd expect Rust to emit an 8-bit load. For something like (u8, u16) I'd expect Rust to emit two loads (8-bit and 16-bit), and for (u8, u8, u8) I'd expect Rust to emit three 8-bit loads. For (u8, u16) and for (u8, u8, u8) I do however expect LLVM to merge the loads into a single 32-bit load (EDIT: if possible).

@hanna-kruppe
Contributor

I don't think that can work. Leaving aside the huge inefficiencies of emitting so many instructions and praying that LLVM merges them, if we just emit loads and stores for the non-padding bytes and don't say anything about the padding, then LLVM has no indication that it's even allowed to clobber those parts.

@nikic
Contributor

nikic commented Jul 31, 2019

LLVM already supports specifying padding through !tbaa.struct metadata: https://llvm.org/docs/LangRef.html#tbaa-struct-metadata

We currently don't use it because we generally aren't interested in TBAA. I also haven't checked whether the padding specified there is actually taken into account.

@RalfJung
Member

RalfJung commented Jul 31, 2019

I am not at all convinced that this is a legal optimization, and I think making it legal makes the operational semantics much more annoying than it should be.

copy_nonoverlapping says

Copies count * size_of::&lt;T&gt;() bytes from src to dst

To me this looks like the description of an untyped, memcpy-like operation that just copies a bunch of bytes. I feel fairly strongly that behavior of copy_nonoverlapping should not depend on the type of the pointers.

If you had used y = *x, that would be a totally different situation.

@RalfJung
Member

RalfJung commented Jul 31, 2019

In particular, the docs also say that copy_nonoverlapping is equivalent to memcpy, which in turn is documented as:

Copies count characters from the object pointed to by src to the object pointed to by dest. Both objects are interpreted as arrays of unsigned char.

Given this spec, I see no way to justify not copying padding bytes. As in, I think we have to copy all bytes to comply with the spec, and the proposed optimization is illegal.

@nikic
Contributor

nikic commented Jul 31, 2019

@RalfJung Not for copy_nonoverlapping possibly, but we should be able to elide padding copies for implicit memcpys, as in the foo example (copy of return value into sret).

@RalfJung
Member

Ah, yes, I agree foo can be optimized as described in the OP. This can be explained, for example, by my proposed definition for a "typed copy".

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

@RalfJung

In particular, the docs also say that copy_nonoverlapping is equivalent to memcpy, which in turn is documented as:

https://rust.godbolt.org/z/Ka2VLG (A C call to memcpy does not need to call the memcpy function).

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

If you had used y = *x, that would be a totally different situation.

I did use y = *x in my original comment, e.g., see fn foo.

@RalfJung
Member

RalfJung commented Jul 31, 2019

A C call to memcpy does not need to call the memcpy function

I never claimed it had to. Of course the compiler is allowed to inline memcpy. But I claim it has to copy all the bytes, including padding bytes. I don't understand how this comment responds to anything I said.

I did use y = *x in my original comment, e.g., see fn foo.

Agreed, I had missed that. But if I read your OP correctly, you are saying the optimization should happen for copy_nonoverlapping (that's bar). I vehemently disagree.

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

But if I read your OP correctly, you are saying the optimization should happen for copy_nonoverlapping (that's bar). I vehemently disagree.

Yep, for copy_nonoverlapping as specified, that cannot be done. For foo it is OK.

TIL that copy_nonoverlapping<T> works like memcpy. I expected it to work like std::copy in C++, which is also generic over T and does not copy padding when profitable.

I think I've never needed copy_nonoverlapping<T> with those semantics for a T that is not u8, and when I needed it with memcpy semantics, I've cast the pointers to u8 manually. I am very surprised that there isn't a typed copy intrinsic that does the sane thing and avoids copying padding when profitable.

@hanna-kruppe
Contributor

I am very surprised that there isn't a typed copy intrinsic that does the sane thing and avoids copying padding when profitable.

Likewise, I think we should add such a function (or feel out whether we can get away with redefining the existing functions that way).

@gnzlbg
Contributor Author

gnzlbg commented Jul 31, 2019

or feel out whether we can get away with redefining the existing functions that way).

I think we would need to survey how users are using it in the wild. The API does say that all bytes are copied, and it does not require the memory to be "valid" at T (e.g. the API docs do not suggest that Ts are read or written via the pointers), so users can use any T that fits the operation, as long as size_of::<T>() and the count argument match whatever memory they want to copy.

I don't recall any use in libcore/liballoc/libstd where changing the semantics would break things. I can't imagine why it would make sense for some code to pick an arbitrary T with different padding, instead of the T that you actually want the code to copy.

@RalfJung
Member

I think we should add such a function (or feel out whether we can get away with redefining the existing functions that way).

Just to be sure we are on the same side here, the semantics of that would basically be to do n repeated "typed copies"? So, this does not need any new mechanism in the Abstract Machine and can be explained with more primitive operations that we already have?

If it doesn't change the Abstract Machine, I am fine with whatever. ;)

@RalfJung
Member

I can't imagine why it would make sense for some code to pick an arbitrary T with different padding, instead of the T that you actually want the code to copy.

A typed copy would also imply that this operation is UB if the copied value(s) does not satisfy the validity invariant.

@gnzlbg
Contributor Author

gnzlbg commented Aug 1, 2019

Just to be sure we are on the same side here, the semantics of that would basically to do n repeated "typed copies"?

Yes.

So, this does not need any new mechanism in the Abstract Machine and can be explained with more primitive operations that we already have?

Yes, I think we can just define the operational semantics of such a function as equivalent to:

unsafe fn copy<T>(src: *const T, dest: *mut T, len: usize) {
    for i in 0..len { dest.add(i).write(src.add(i).read()) } // EDIT: see rkruppe below
}

However, that sounds like you want to make copy_nonoverlapping a primitive operation in the Abstract Machine. Is that so? I think the operational semantics of copy_nonoverlapping can be just defined as:

unsafe fn copy_nonoverlapping<T>(src: *const T, dest: *mut T, len: usize) {
    let src = src as *const MaybeUninit<u8>;
    let dest = dest as *mut MaybeUninit<u8>;
    for i in 0..(len * size_of::<T>()) { *dest.add(i) = *src.add(i); }
}

and that should work independently of whether 0xUU is a valid bit-pattern for u8. (note: the function might be special in other ways, e.g., the Rust run-time currently requires a memcpy symbol to actually exist, so this is a lang item, and the Rust frontend generates llvm.memcpy via the lang item, but this is just a valid optimization that follows from its semantics).

A typed copy would also imply that this operation is UB if the copied value(s) does not satisfy the validity invariant.

Arguably, if the Ts are invalid having *mut T lying around is a foot-gun. Today, I would find it clearer if the code would do the copy via a *mut MaybeUninit<T> instead. For older code that just wanted to perform a byte-wise copy, I just always used *mut u8 since I found that clearer as well.

@hanna-kruppe
Contributor

*dest.add(i) = *src.add(i); will run drop glue, you need ptr::write (in the first function). But with that change applied, I agree.

@RalfJung
Member

RalfJung commented Aug 1, 2019

Turns out ptr::write is actually an intrinsic (move_val_init), so that would make sense. (I thought it would be implemented using copy_nonoverlapping. ;)

Or (for the spec) we just say we use the (Pseudo-)MIR-level =, which does not drop implicitly.

that sounds like you want to make copy_nonoverlapping a primitive operation in the Abstract Machine. Is that so?

I don't want to do anything, I was just trying to figure out what it is that you want to do. ;)

Personally I'd feel best about just keeping our current semantics.

I think the operational semantics of copy_nonoverlapping can be just defined as:

Do you mean the current operational semantics? Because that sure looks like it copies the padding bytes.

Arguably, if the Ts are invalid having *mut T lying around is a foot-gun. Today, I would find it clearer if the code would do the copy via a *mut MaybeUninit instead. For older code that just wanted to perform a byte-wise copy, I just always used *mut u8 since I found that clearer as well.

Now I am confused when you are talking about the old copy_nonoverlapping and when you are talking about the new copy_nonoverlapping (maybe copy_nonoverlapping_typed?).

Using MaybeUninit<T> seems reasonable except it opens the door to all our discussions around padding in unions. :/

@gnzlbg
Contributor Author

gnzlbg commented Aug 1, 2019

Do you mean the current operational semantics? Because that sure looks like it copies the padding bytes.

Yes and yes (the current semantics do copy the padding bytes).

@gnzlbg
Contributor Author

gnzlbg commented Aug 1, 2019

and when you are talking about the new copy_nonoverlapping (maybe copy_nonoverlapping_typed?)

I called the new copy_nonoverlapping just copy<T>.

@RalfJung
Member

RalfJung commented Aug 1, 2019

There already is a copy though...

@scottmcm
Member

scottmcm commented Aug 6, 2019

Turns out ptr::write is actually an intrinsic (move_val_init), so that would make sense. (I thought it would be implemented using copy_nonoverlapping. ;)

Does this imply that ptr::read should be an intrinsic too? (Right now it is a copy_nonoverlapping, and thus generates that, not a load.)

@RalfJung
Member

RalfJung commented Aug 6, 2019

Maybe. How we spec those (copying bytes vs. typed copy) and how they are implemented is orthogonal in principle, though if we actually want to exploit that they are typed we likely need intrinsics.

@gnzlbg
Contributor Author

gnzlbg commented Aug 6, 2019

Does this imply that ptr::read should be an intrinsic too?

I think that, at least to fix this bug, the answer is yes.

The current implementation does an untyped copy instead of a typed one. That's correct, but copies too much. One can implement an untyped copy on top of a typed one, but the opposite is not true.

@RalfJung
Member

RalfJung commented Aug 6, 2019

I think that, at least to fix this bug, the answer is yes.

Well, we first need a lang team decision that this is indeed a bug -- i.e., that these operations should act like typed copies.

@Centril
Contributor

Centril commented Aug 6, 2019

Well, we first need a lang team decision that this is indeed a bug -- i.e., that these operations should act like typed copies.

To that end, could someone make a summary of why it should be considered a bug or, better yet, why a typed version needs to exist? (and also why not?) A pros & cons list would be helpful. :)

@hanna-kruppe
Contributor

hanna-kruppe commented Aug 6, 2019

I think we should separate the three questions at hand here. First of all, this function from the OP:

pub unsafe fn foo(x: &A) -> A { 
    *x
}

unarguably performs a typed copy. That it currently results in machine code that copies padding is an implementation detail of the current compiler, not to be relied upon any more than, e.g., in an x = y; statement. This should not be controversial, let alone need a new T-lang decision. People working on the compiler just need to decide whether the demonstrated inefficiency is worth prioritizing (as discussed above, fixing it is non-trivial and/or may regress other programs), but it's pretty clearly compatible with the currently agreed-upon semantics of Rust to not copy the padding in this program.

The second question (raised by the second code example in the OP) is whether ptr::copy and friends should perform typed copies, or gain new variants that perform typed copies. This is a trickier question which I currently don't care about enough to write the sort of summary @Centril asked for.

A third question, which came up in the last couple comments, is whether ptr::read and ptr::write should commit to being typed copies or raw-byte-copies (I think it's currently unclear both ways, unlike ptr::copy* clearly saying it copies bytes in its docs). I strongly think they should be typed copies and will try to write a summary of that situation soon, possibly in a different issue to avoid diluting this one further.

@RalfJung
Member

RalfJung commented Aug 6, 2019

this function from the OP
unarguably performs a typed copy.

Agreed. So there should likely be some issue just tracking the lost optimization potential here. Maybe that should be this one, after removing the other example from the OP.

A third question, which came up in the last couple comments, is whether ptr::read and ptr::write should commit to being typed copies or raw-byte-copies (I think it's currently unclear both ways, unlike ptr::copy* clearly saying it copies bytes in its docs).

Actually, thinking about this again -- both of them pass the input/output by value. So they already do a typed copy for that. Doing two typed copies in a row is indistinguishable from doing a typed copy and a byte-wise copy, so actually we can do what we want (between these two options) for ptr::read/ptr::write and (conforming) code cannot tell the difference.
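This argument can be made concrete with a sketch (not the actual std implementation) of ptr::read written as an untyped byte copy followed by a typed move out; the assume_init plus the by-value return is where the typed copy happens regardless:

```rust
use std::{mem::MaybeUninit, ptr};

// Sketch of ptr::read in terms of an untyped copy (the
// copy_nonoverlapping) followed by a typed copy (returning
// the value). Not the real implementation.
unsafe fn read_sketch<T>(src: *const T) -> T {
    let mut tmp = MaybeUninit::<T>::uninit();
    unsafe {
        // Untyped: copies all size_of::<T>() bytes, padding included.
        ptr::copy_nonoverlapping(src, tmp.as_mut_ptr(), 1);
        // Typed: moving the T out by value is a typed copy.
        tmp.assume_init()
    }
}

fn main() {
    let x = 42u64;
    assert_eq!(unsafe { read_sketch(&x) }, 42);
}
```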

@gnzlbg
Contributor Author

gnzlbg commented Aug 6, 2019

The second question (raised by the second code example in the OP) is whether ptr::copy and friends should perform typed copies, or gain new variants that perform typed copies. This is a trickier question which I currently don't care about enough to write the sort of summary @Centril asked for.

I think we all agree that changing the semantics of copy_nonoverlapping to do typed copies would be a breaking change, but beyond that, I think there is not much to summarize.

Whether we can make this breaking change or whether that's a change worth making are unresolved questions.

It is my personal opinion that typed copy semantics are a better default and they might enable some optimizations, but I wouldn't risk breaking the world for them when we can just add two new APIs without issues. While we could test how much code this change would break using miri, I don't think we can do the same test using crater, e.g., because it is ok to implement typed copies as untyped ones, so nothing might actually break during the crater run, but everything might break later when we update the LLVM version. The same applies to assessing the performance impact of this change, e.g., it might be none today. This makes it a high-risk change with unknown value.

I also think that resolving this is fairly low priority, but I'd be ok with someone implementing these behind a feature gate, and with people experimenting with @nikic's approach -- we'll probably learn something useful from doing that.

@hanna-kruppe
Contributor

I think we all agree that changing the semantics of copy_nonoverlapping to do typed copies would be a breaking change, but beyond that, I think there is not much to summarize.

There is definitely some analysis to be done here (lay out the implications for unsafe code using them & for codegen), if someone wants to push it further. Though FWIW since it's a library API change/expansion it seems more like a libs team thing than a lang team thing (though of course it falls in the subject matter of UCG, too).

I also think that resolving this is fairly low priority, but I'd be ok with someone implementing these behind a feature gate, and with with people experimenting with using @nikic's approach - we'll probably learn something useful from doing that.

If by "@nikic's approach" you mean TBAA metadata on memcpys, one unpleasant thing you'll learn is that it will miscompile Rust programs -- I am reasonably certain there's no way to express padding with the TBAA metadata without making type punning UB.

@Centril
Contributor

Centril commented Aug 8, 2019

unarguably performs typed copy.

Yep; agreed.

Though FWIW since it's a library API change/expansion it seems more like a libs team thing than a lang team thing (though of course it falls in the subject matter of UCG, too).

(The behavior of intrinsics is mostly a lang team thing since the whole point of them is about their "intrinsicness".)

@gnzlbg
Contributor Author

gnzlbg commented Aug 8, 2019

There is definitely some analysis to be done here (lay out the implications for unsafe code using them & for codegen),

What implications for unsafe code do you have in mind? One can already write a copy_nonoverlapping_typed in plain Rust by using x = *v in a loop to perform typed copies, and I would be surprised if a lot of unsafe code doesn't do this already. The authors might not have rationalized why they chose typed vs untyped copies (they probably didn't know copy_nonoverlapping existed). From a documentation point of view, we should just document that users should prefer typed copies over untyped ones, with the simple rationale that they don't necessarily copy padding and have larger optimization potential. They should only use untyped copies when they need to copy all raw bytes, including padding, and we should warn them there again that padding bytes can change the moment a typed copy happens, e.g., when returning a value from a function, performing a typed assignment, etc.
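The plain-Rust typed copy described above can be sketched like this (copy_typed is a hypothetical name for illustration, not a std API):

```rust
// A "typed copy" loop: each element is moved as a value of T, so the
// compiler is free to not preserve padding bytes.
unsafe fn copy_typed<T: Copy>(src: *const T, dst: *mut T, len: usize) {
    for i in 0..len {
        // Typed assignment; T: Copy, so no drop glue runs here.
        unsafe { *dst.add(i) = *src.add(i) };
    }
}

fn main() {
    // (u8, u16) has one padding byte, which a typed copy may skip.
    let src = [(1u8, 2u16), (3, 4)];
    let mut dst = [(0u8, 0u16); 2];
    unsafe { copy_typed(src.as_ptr(), dst.as_mut_ptr(), 2) };
    assert_eq!(dst, src);
}
```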

& for codegen),

I think that experimenting with optimizations for typed copies is worth doing, and that we should try to find ways to make them faster, but this is an orthogonal issue to whether we add the semantic APIs for allowing users to easily perform typed copies. We could implement typed copies as untyped ones forever and that would be fine (miri would, however, implement typed copies appropriately and detect misuses here).

If by "@nikic's approach" you mean TBAA metadata on memcpys, one unpleasant thing you'll learn is that it will miscompile Rust programs -- I am reasonably certain there's no way to express padding with the TBAA metadata without making type punning UB.

Ouch.

@nikic
Contributor

nikic commented Aug 8, 2019

If by "@nikic's approach" you mean TBAA metadata on memcpys, one unpleasant thing you'll learn is that it will miscompile Rust programs -- I am reasonably certain there's no way to express padding with the TBAA metadata without making type punning UB.

I don't think that's the case. TBAA metadata is generic and you don't need to model C++ semantics in particular with it. Making everything alias everything is also possible. (Of course, it's still a much bigger gun than what we actually need here.)

@hanna-kruppe
Contributor

@gnzlbg The implications for unsafe code are exactly the implication of choosing typed copies vs copying bytes (which may be invalid for the type in question!) including padding, and the wider impact of recommending a different default. You're also missing my point re: codegen but I won't be tricked into writing the summary I already decided I don't want to take the time to write ;)

@nikic Oops, I think you're right, I forgot that there can be multiple distinct "roots". (Though I am worried about the explosion in metadata this may cause, and how the AA infrastructure will weigh the MayAlias from TBAA with a Must/NoAlias from other AA implementations.)

@RalfJung
Member

RalfJung commented Aug 8, 2019

miri would, however, implement typed copies appropriately and detect misuses here

True. Note however that right now, while Miri does check the validity invariant on typed copies, it does not "kill" padding. That will be non-trivial to implement, I think. That's clearly a deficiency in Miri, just one I thought you should be aware of. :)

@RalfJung RalfJung changed the title Missed optimization: unnecessary copy of trailing padding bytes Do copy[_nonoverlapping] do typed copies? (was: Missed optimization: unnecessary copy of trailing padding bytes) Apr 30, 2022
@RalfJung RalfJung changed the title Do copy[_nonoverlapping] do typed copies? (was: Missed optimization: unnecessary copy of trailing padding bytes) Do copy[_nonoverlapping]/swap[_nonoverlapping] do typed copies? (was: Missed optimization: unnecessary copy of trailing padding bytes) May 13, 2022
@RalfJung
Member

FWIW the same question arises for swap[_nonoverlapping]. I would think the answer is the same as for the copy functions: we haven't properly documented this, and by now probably too many programmers rely on these being untyped operations, so that's what we should go for.

Surprisingly, however, that is currently not the case -- ptr::swap_nonoverlapping is actually currently implemented as a typed copy! This changed with #94212, cc @scottmcm. Is that really what we want?

@RalfJung RalfJung changed the title Do copy[_nonoverlapping]/swap[_nonoverlapping] do typed copies? (was: Missed optimization: unnecessary copy of trailing padding bytes) Do copy[_nonoverlapping]/swap[_nonoverlapping] do typed copies? May 13, 2022
@RalfJung
Member

RalfJung commented May 13, 2022

Here's a little test I made for checking that all these operations are untyped. Currently, it fails in Miri.
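The test itself is not reproduced here, but the kind of property being checked can be sketched as follows (a sketch assuming the untyped semantics, not the actual test): an untyped copy must move the bytes verbatim even when they are not valid for T.

```rust
use std::ptr;

fn main() {
    // 3 is not a valid bit pattern for `bool` (only 0 and 1 are).
    // Under untyped semantics this copy is fine and moves the byte
    // verbatim; under typed semantics it would be UB, and Miri
    // would flag it.
    let src: u8 = 3;
    let mut dst: u8 = 0;
    unsafe {
        ptr::copy_nonoverlapping(
            &src as *const u8 as *const bool,
            &mut dst as *mut u8 as *mut bool,
            1,
        );
    }
    assert_eq!(dst, 3);
}
```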

@scottmcm
Member

scottmcm commented May 13, 2022

I have no strong feeling about typed vs untyped copies in swap_nonoverlapping. So long as both sides are doing the same kind of copy, that's fine. (It was confusing when L->R was typed but R->L was untyped, so let's not do that again.)

@RalfJung
Member

RalfJung commented Jun 3, 2022

In #97712 I am proposing to document that these functions are all doing untyped copies.

Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Jul 5, 2022
ptr::copy and ptr::swap are doing untyped copies

The consensus in rust-lang#63159 seemed to be that these operations should be "untyped", i.e., they should treat the data as raw bytes, should work when these bytes violate the validity invariant of `T`, and should exactly preserve the initialization state of the bytes that are being copied. This is already somewhat implied by the description of "copying/swapping size*N bytes" (rather than "N instances of `T`").

The implementations mostly already work that way (well, for LLVM's intrinsics the documentation is not precise enough to say what exactly happens to poison, but if this ever gets clarified to something that would *not* perfectly preserve poison, then I strongly assume there will be some way to make a copy that *does* perfectly preserve poison). However, I had to adjust `swap_nonoverlapping`; after `@scottmcm's` [recent changes](rust-lang#94212), that one (sometimes) made a typed copy. (Note that `mem::swap`, which works on mutable references, is unchanged. It is documented as "swapping the values at two mutable locations", which to me strongly indicates that it is indeed typed. It is also safe and can rely on `&mut T` pointing to a valid `T` as part of its safety invariant.)

On top of adding a test (that will be run by Miri), this PR then also adjusts the documentation to indeed stably promise the untyped semantics. I assume this means the PR has to go through t-libs (and maybe t-lang?) FCP.

Fixes rust-lang#63159
@bors bors closed this as completed in 8fa1ed8 Jul 5, 2022
workingjubilee pushed a commit to tcdi/postgrestd that referenced this issue Sep 15, 2022