Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unions used for type-punning should be repr(C) #588

Closed
RalfJung opened this issue Jun 29, 2019 · 2 comments
Closed

Unions used for type-punning should be repr(C) #588

RalfJung opened this issue Jun 29, 2019 · 2 comments

Comments

@RalfJung
Copy link
Member

At

pub union u8x16 {
vector: __m128i,
bytes: [u8; 16],
}

a union is declared for the purpose of type-punning. However, without repr(C), the fields of that union might not be at offset 0, which makes it unsuited for type punning. To get such layout guarantees, you need to specify repr(C).

Also see https://rust-lang.github.io/unsafe-code-guidelines/layout/unions.html.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 1, 2019

So after thinking about this a bit, I believe this union shouldn't exist. I suppose this performs good for @BurntSushi, but that's because LLVM is able of undoing all the unnecessary work that this union is silently introducing.

The problem is the following. The call ABI of this union is Aggregateand not Vector. That is, it will be passed via memory, just like an array (which are also Aggregates), and not via a SIMD vector register.

That is, every time you use a vector operation, which from looking at the code, happens more often than array operations, we have to copy the union from the stack to a vector register, do the operation, and copy it back to the stack. This appears to be happening all the time.

Sadly, Rust can't do better for you here, e.g., it cannot make this union have a Vector ABI. The problem is, that whether you want an Aggregate ABI or a Vector ABI depends on how you will be using this union most of the time.

Last time I heard, performance of this code is great. But that's only because LLVM is able to undo all the consecutive stack->vector->stack->vector... spills, and keep everything in a vector. Otherwise I'd guess that this would perform slower than having no vectorization at all.

So my recommendation here would be to keep the union if most of the time you are operating on it like an array. The code appears to be operating on it almost always like a vector, though. So if that's the case, I think it would be better to commit to that, and use a type that has the Vector ABI, creating an array locally only when you need it, and being aware that this will spill the contents of the vector to the stack.

For that you can use a type alias or a newtype for the __m128i type, but if you use a newtype, you need to use repr(transparent) here, or otherwise that type might have an Aggregate ABI - I think that right now, Rust will optimize struct(__m128i) to have a Vector ABI (cc @eddyb), but if you want the guarantee that this optimization happens, repr(transparent) makes sure of that.

For copying the vector into an array and back, you probably want to just use mem::transmute instead of an union. It is safer because it checks the sizes, and it will also fix the issue with forgetting to make the union repr(C) like @RalfJung suggested above. You can obviously also use an union, but if you do, then it has to be repr(C).

gnzlbg added a commit to gnzlbg/regex that referenced this issue Jul 1, 2019
Before this commit, u8x16 and u8x32 were repr(Rust)
unions. This introduced unspecified behavior because
the field offsets of repr(Rust) unions are not
guaranteed to be at offset 0, so that field access
was potentially UB.

This commit fixes that, and closes rust-lang#588 .

The unions were also generating a lot of unnecessary
memory operations. This commit fixes that as well.

The issue is that unions have an Aggregate call ABI,
which is the same as the call ABI of arrays. That is,
they are passed around by memory, and not in Vector registers.

This is good, if most of the time one operates on them
as arrays. This was, however, not the case. Most of the
operations on these unions are using SIMD instructions.
This means that the union needs to be copied into a SIMD register,
operated on, and then spilled back to the stack, on every
single operation. That's unnecessary, although apparently
LLVM was able to optimize all the unnecessary memory operations
away and leave these always in registers.

This commit fixes this issue as well, by making the
u8x16 and u8x32 repr(transparent) newtypes over the architecture
specific vector types, giving them the Vector ABI.

The vectors are then copied to the stack only when necessary, and
as little as possible. This is done using mem::transmute, removing
the need for unions altogether (fixing rust-lang#588 by not having to
worry about union layout at all).

To make it clear when the vectors are spilled into the stack, the
vector::replace(index, value) API has been removed, and instead,
only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8; N])
APIs are provided instead. This prevents spilling the vectors back
and forth onto the stack every time an index needs to be modified, by
using vector::bytes to spill the vector to the stack once, making
all the random-access modifications in memory, and then using
vector::from_bytes only once to move the memory back into a SIMD
register.
gnzlbg added a commit to gnzlbg/regex that referenced this issue Jul 1, 2019
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced
unspecified behavior because the field offsets of repr(Rust) unions are not
guaranteed to be at offset 0, so that field access was potentially UB.

This commit fixes that, and closes rust-lang#588 .

The unions were also generating a lot of unnecessary memory operations. This
commit fixes that as well.

The issue is that unions have an Aggregate call ABI, which is the same as the
call ABI of arrays. That is, they are passed around by memory, and not in Vector
registers.

This is good, if most of the time one operates on them as arrays. This was,
however, not the case. Most of the operations on these unions are using SIMD
instructions. This means that the union needs to be copied into a SIMD register,
operated on, and then spilled back to the stack, on every single operation.
That's unnecessary, although apparently LLVM was able to optimize all the
unnecessary memory operations away and leave these always in registers.

This commit fixes this issue as well, by making the u8x16 and u8x32
repr(transparent) newtypes over the architecture specific vector types, giving
them the Vector ABI.

The vectors are then copied to the stack only when necessary, and as little as
possible. This is done using mem::transmute, removing the need for unions
altogether (fixing rust-lang#588 by not having to worry about union layout at all).

To make it clear when the vectors are spilled into the stack, the
vector::replace(index, value) API has been removed, and instead, only a
vector::bytes(self) and a vector::from_bytes(&mut self, [u8; N]) APIs are
provided instead. This prevents spilling the vectors back and forth onto the
stack every time an index needs to be modified, by using vector::bytes to spill
the vector to the stack once, making all the random-access modifications in
memory, and then using vector::from_bytes only once to move the memory back into
a SIMD register.
gnzlbg added a commit to gnzlbg/regex that referenced this issue Jul 1, 2019
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced
unspecified behavior because the field offsets of repr(Rust) unions are not
guaranteed to be at offset 0, so that field access was potentially UB.

This commit fixes that, and closes rust-lang#588 .

The unions were also generating a lot of unnecessary memory operations. This
commit fixes that as well.

The issue is that unions have an Aggregate call ABI, which is the same as the
call ABI of arrays. That is, they are passed around by memory, and not in Vector
registers.

This is good, if most of the time one operates on them as arrays. This was,
however, not the case. Most of the operations on these unions are using SIMD
instructions. This means that the union needs to be copied into a SIMD register,
operated on, and then spilled back to the stack, on every single operation.
That's unnecessary, although apparently LLVM was able to optimize all the
unnecessary memory operations away and leave these always in registers.

This commit fixes this issue as well, by making the u8x16 and u8x32
repr(transparent) newtypes over the architecture specific vector types, giving
them the Vector ABI.

The vectors are then copied to the stack only when necessary, and as little as
possible. This is done using mem::transmute, removing the need for unions
altogether (fixing rust-lang#588 by not having to worry about union layout at all).

To make it clear when the vectors are spilled into the stack, the
vector::replace(index, value) and vector::extract(index) APIs have been removed,
and instead, only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8;
N]) APIs are provided instead. This prevents spilling the vectors back and forth
onto the stack every time an index needs to be modified, by using vector::bytes
to spill the vector to the stack once, making all the random-access
modifications in memory, and then using vector::from_bytes only once to move the
memory back into a SIMD register.

fixup
gnzlbg added a commit to gnzlbg/regex that referenced this issue Jul 1, 2019
Before this commit, u8x16 and u8x32 were repr(Rust) unions. This introduced
unspecified behavior because the field offsets of repr(Rust) unions are not
guaranteed to be at offset 0, so that field access was potentially UB.

This commit fixes that, and closes rust-lang#588 .

The unions were also generating a lot of unnecessary memory operations. This
commit fixes that as well.

The issue is that unions have an Aggregate call ABI, which is the same as the
call ABI of arrays. That is, they are passed around by memory, and not in Vector
registers.

This is good, if most of the time one operates on them as arrays. This was,
however, not the case. Most of the operations on these unions are using SIMD
instructions. This means that the union needs to be copied into a SIMD register,
operated on, and then spilled back to the stack, on every single operation.
That's unnecessary, although apparently LLVM was able to optimize all the
unnecessary memory operations away and leave these always in registers.

This commit fixes this issue as well, by making the u8x16 and u8x32
repr(transparent) newtypes over the architecture specific vector types, giving
them the Vector ABI.

The vectors are then copied to the stack only when necessary, and as little as
possible. This is done using mem::transmute, removing the need for unions
altogether (fixing rust-lang#588 by not having to worry about union layout at all).

To make it clear when the vectors are spilled into the stack, the
vector::replace(index, value) and vector::extract(index) APIs have been removed,
and instead, only a vector::bytes(self) and a vector::from_bytes(&mut self, [u8;
N]) APIs are provided instead. This prevents spilling the vectors back and forth
onto the stack every time an index needs to be modified, by using vector::bytes
to spill the vector to the stack once, making all the random-access
modifications in memory, and then using vector::from_bytes only once to move the
memory back into a SIMD register.
@BurntSushi
Copy link
Member

@RalfJung Thanks so much for reporting this! It should be fixed in regex 1.1.8 on crates.io.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants