Missed optimization: unnecessary copy of trailing padding bytes #63159
Comments
The least gross and special-cased way I can think of achieving this would be extending A slightly more hacky, and more limited, fix for this specific case would be to trim the size copied by the amount of trailing padding (in this case 128 -> 1) if we can see statically that it's a typed copy of a single element ( |
Instead of extending memcpy, could the Rust front end generate single loads for each of the fields, and would LLVM be able to merge them as appropriate? That is, for the OP, I'd expect Rust to emit an 8-bit load. For something like |
I don't think that can work. Leaving aside the huge inefficiencies of emitting so many instructions and praying that LLVM merges them, if we just emit loads and stores for the non-padding bytes and don't say anything about the padding, then LLVM has no indication that it's even allowed to clobber those parts. |
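To make the proposal under discussion concrete, here is a minimal source-level sketch of a copy that touches only the fields and never the padding. The `Pair` type and the `copy_fieldwise` helper are hypothetical names introduced for illustration; the sketch says nothing about what IR the compiler would emit, and it does not address the objection above that LLVM would still have no indication that the padding may be clobbered.

```rust
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

/// Hypothetical example type: with `repr(C)` there are 3 padding bytes
/// between `tag` and `value`.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Pair {
    pub tag: u8,
    pub value: u32,
}

/// Copies `*src` one field at a time, never reading or writing padding bytes.
///
/// # Safety
/// `src` must be valid for reads, properly aligned, and point to an
/// initialized `Pair`.
pub unsafe fn copy_fieldwise(src: *const Pair) -> Pair {
    let mut out = MaybeUninit::<Pair>::uninit();
    let dst = out.as_mut_ptr();
    unsafe {
        // One typed load/store per field; the padding of `out` stays uninitialized.
        addr_of_mut!((*dst).tag).write(addr_of!((*src).tag).read());
        addr_of_mut!((*dst).value).write(addr_of!((*src).value).read());
        // Every field has been written, and padding carries no validity
        // requirement, so the value is fully initialized as a `Pair`.
        out.assume_init()
    }
}
```

Calling `copy_fieldwise(&p)` on an initialized `Pair` yields a value that compares equal to `p`, even though the padding bytes were never copied.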
LLVM already supports specifying padding through We currently don't use it because we generally aren't interested in TBAA. I also haven't checked whether the padding specified there is actually taken into account. |
I am not at all convinced that this is a legal optimization, and I think making it legal makes the operational semantics much more annoying than it should be.
To me this looks like the description of an untyped, If you had used |
In particular, the docs also say that
Given this spec, I see no way to justify not copying padding bytes. As in, I think we have to copy all bytes to comply with the spec, and the proposed optimization is illegal. |
@RalfJung Not for copy_nonoverlapping possibly, but we should be able to elide padding copies for implicit memcpy's, as in the |
Ah, yes, I agree |
https://rust.godbolt.org/z/Ka2VLG (A C call to |
I did use |
I never claimed it had to. Of course the compiler is allowed to inline
Agreed, I had missed that. But if I read your OP correctly, you are saying the optimization should happen for |
Yep, for TIL that I think I've never needed |
Likewise, I think we should add such a function (or feel out whether we can get away with redefining the existing functions that way). |
I think we would need to survey how users are using it in the wild. The API does say that all bytes are copied and it does not require the memory to be "valid" at I don't recall any use in libcore/liballoc/libstd where changing the semantics would break things. I can't imagine why it would make sense for some code to pick an arbitrary T with different padding, instead of the T that you actually want the code to copy. |
Just to be sure we are on the same side here, the semantics of that would basically be to do If it doesn't change the Abstract Machine, I am fine with whatever. ;) |
A typed copy would also imply that this operation is UB if the copied value(s) does not satisfy the validity invariant. |
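The difference can be seen with a byte that is not a valid `bool`. The sketch below relies on the byte-copy reading of `copy_nonoverlapping` described earlier in this thread (all bytes are copied and the memory need not be valid at type `T`); the helper name `copy_byte_as_bool` is made up for illustration.

```rust
use std::ptr;

/// Copies one byte through `bool`-typed pointers with `copy_nonoverlapping`.
/// `byte` may be any value, including ones that are not a valid `bool`.
pub fn copy_byte_as_bool(byte: u8) -> u8 {
    let src = byte;
    let mut dst = 0u8;
    unsafe {
        // Byte-wise copy: moves `size_of::<bool>()` (= 1) raw byte and never
        // produces a `bool` value, so no validity invariant is involved.
        ptr::copy_nonoverlapping(
            &src as *const u8 as *const bool,
            &mut dst as *mut u8 as *mut bool,
            1,
        );
        // By contrast, `(&src as *const u8 as *const bool).read()` is a typed
        // copy and would be undefined behavior whenever `byte` is not 0 or 1.
    }
    dst
}
```

If these functions were redefined as typed copies, a call such as `copy_byte_as_bool(2)` would become undefined behavior rather than returning `2`.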
Yes.
Yes, I think we can just define the operational semantics of such a function as equivalent to:
    unsafe fn copy<T>(src: *const T, dest: *mut T, len: usize) {
        for i in 0..len { dest.add(i).write(src.add(i).read()) } // EDIT: see rkruppe below
    }
However, that sounds like you want to make
    unsafe fn copy_nonoverlapping<T>(src: *const T, dest: *mut T, len: usize) {
        let src = src as *const MaybeUninit<u8>;
        let dest = dest as *mut MaybeUninit<u8>;
        for i in 0..(len * size_of::<T>()) { *dest.add(i) = *src.add(i); }
    }
and that should work independently of whether
Arguably, if the |
|
Turns out Or (for the spec) we just say we use the (Pseudo-)MIR-level
I don't want to do anything, I was just trying to figure out what it is that you want to do. ;) Personally I'd feel best about just keeping our current semantics.
Do you mean the current operational semantics? Because that sure looks like it copies the padding bytes.
Now I am confused when you are talking about the old Using |
Yes and yes (the current semantics do copy the padding bytes). |
I called the new copy_nonoverlapping just |
There already is a |
Does this imply that |
Maybe. How we spec those (copying bytes vs. typed copy) and how they are implemented is orthogonal in principle, though if we actually want to exploit that they are typed we likely need intrinsics. |
I think that, at least to fix this bug, the answer is yes. The current implementation does an untyped copy instead of a typed one. That's correct, but copies too much. One can implement an untyped copy on top of a typed one, but the opposite is not true. |
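One direction of that claim can be sketched as follows: an untyped copy of `T` built purely from typed copies at type `MaybeUninit<u8>`, for which every byte (initialized or not) is valid. The function name `untyped_copy_via_typed` is hypothetical, and this is only an illustration of the idea, not how the standard library implements its copy functions.

```rust
use std::mem::{size_of, MaybeUninit};

/// Untyped copy of `len` values of `T`, expressed only in terms of typed
/// copies of `MaybeUninit<u8>`.
///
/// # Safety
/// `src` must be valid for reads and `dst` valid for writes of
/// `len * size_of::<T>()` bytes, and the two regions must not overlap.
pub unsafe fn untyped_copy_via_typed<T>(src: *const T, dst: *mut T, len: usize) {
    let src = src.cast::<MaybeUninit<u8>>();
    let dst = dst.cast::<MaybeUninit<u8>>();
    for i in 0..len * size_of::<T>() {
        // Each step is a typed copy, but at a type with no validity
        // requirement, so padding and invalid bit patterns are preserved.
        unsafe { dst.add(i).write(src.add(i).read()) };
    }
}
```

The reverse does not work: a byte-wise primitive cannot express that the padding bytes of `T` may be skipped, or that the copied values must be valid at type `T`.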
Well, we first need a lang team decision that this is indeed a bug -- i.e., that these operations should act like typed copies. |
To that end, could someone make a summary for why it should be considered a bug or better yet why a typed version needs to exist? (and also why not?) A pros & cons would be helpful. :) |
I think we should separate the three questions at hand here.
First of all, this function from the OP:
    pub unsafe fn foo(x: &A) -> A {
        *x
    }
unarguably performs a typed copy. That it currently results in machine code that copies padding is an implementation detail of the current compiler not to be relied upon any more than e.g. in a
The second question (raised by the second code example in the OP) is whether
A third question, which came up in the last couple comments, is whether |
Agreed. So there should likely be some issue just tracking the lost optimization potential here. Maybe that should be this one, after removing the other example from the OP.
Actually, thinking about this again -- both of them pass the input/output by value. So they already do a typed copy for that. Doing two typed copies in a row is indistinguishable from doing a typed copy and a byte-wise copy, so actually we can do what we want (between these two options) for |
I think we all agree that changing the semantics of Whether we can make this breaking change or whether that's a change worth making are unresolved questions. It is my personal opinion that typed copy semantics are a better default and they might enable some optimizations, but I wouldn't risk breaking the world for them when we can just add two new APIs without issues. While we could test how much code this change would break using I also think that resolving this is fairly low priority, but I'd be ok with someone implementing these behind a feature gate, and with people experimenting with using @nikic's approach - we'll probably learn something useful from doing that. |
There is definitely some analysis to be done here (lay out the implications for unsafe code using them & for codegen), if someone wants to push it further. Though FWIW since it's a library API change/expansion it seems more like a libs team thing than a lang team thing (though of course it falls in the subject matter of UCG, too).
If by "@nikic's approach" you mean TBAA metadata on memcpys, one unpleasant thing you'll learn is that it will miscompile Rust programs -- I am reasonably certain there's no way to express padding with the TBAA metadata without making type punning UB. |
Yep; agreed.
(The behavior of intrinsics is mostly a lang team thing since the whole point of them is about their "intrinsicness".) |
What implications for unsafe code do you have in mind? One can already write a
I think that experimenting with optimizations for typed copies is worth doing, and that we should try to find ways to make them faster, but this is an orthogonal issue to whether we add the semantic APIs for allowing users to easily perform typed copies. We could implement typed copies as untyped ones forever and that would be fine (miri would, however, implement typed copies appropriately and detect misuses here).
Ouch. |
I don't think that's the case. TBAA metadata is generic and you don't need to model C++ semantics in particular with it. Making everything alias everything is also possible. (Of course, it's still a much bigger gun than what we actually need here.) |
@gnzlbg The implications for unsafe code are exactly the implication of choosing typed copies vs copying bytes (which may be invalid for the type in question!) including padding, and the wider impact of recommending a different default. You're also missing my point re: codegen but I won't be tricked into writing the summary I already decided I don't want to take the time to write ;) @nikic Oops, I think you're right, I forgot that there can be multiple distinct "roots". (Though I am worried about the explosion in metadata this may cause, and how the AA infrastructure will weigh the MayAlias from TBAA with a Must/NoAlias from other AA implementations.) |
True. Note however that right now, while Miri does check the validity invariant on typed copies, it does not "kill" padding. That will be non-trivial to implement, I think. That's clearly a deficiency in Miri, just one I thought you should be aware of. :) |
The following example (godbolt):
produces the following LLVM-IR:
and machine code:
where 128 bytes are copied every time a value of type A is moved/copied/read/...
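For reference, a minimal sketch of a type with the layout described. The definition of `A` here is an assumption (a one-byte payload over-aligned to 128 bytes), chosen to match the 128 -> 1 sizes mentioned in the discussion; `foo` is the function quoted in the comments, and `trailing_padding` is a hypothetical helper added for illustration.

```rust
use std::mem::size_of;

/// Assumed layout: one byte of data, over-aligned to 128 bytes, so the
/// remaining bytes of the type are trailing padding.
#[derive(Clone, Copy)]
#[repr(align(128))]
pub struct A(pub u8);

/// The function quoted in the discussion: a typed copy of `A` out of a reference.
pub unsafe fn foo(x: &A) -> A {
    *x
}

/// Number of trailing padding bytes in `A`: total size minus the payload size.
pub const fn trailing_padding() -> usize {
    size_of::<A>() - size_of::<u8>()
}
```

Under this assumed definition, `size_of::<A>()` is 128 while only the first byte carries data, which is why a copy of the whole object moves far more than the payload.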
However, one actually only has to copy a single byte, since all other bytes are trailing padding. The expected machine code is (godbolt):
cc @nikic @rkruppe