Attempt to improve codegen for arrays of repeated enums #104384

Closed
wants to merge 2 commits into from
18 changes: 18 additions & 0 deletions compiler/rustc_codegen_llvm/src/builder.rs
@@ -561,6 +561,24 @@ impl<'a, 'll, 'tcx> BuilderMethods<'a, 'tcx> for Builder<'a, 'll, 'tcx> {
         count: u64,
         dest: PlaceRef<'tcx, &'ll Value>,
     ) -> Self {
+        if let OperandValue::Pair(mut v1, mut v2) = cg_elem.val && count < 1024 {
+            v1 = self.from_immediate(v1);
+            v2 = self.from_immediate(v2);
+            let ty = self.cx().val_ty(v1);
+            // Create a vector of size 2*count and store it in one instruction
+            if ty == self.cx().val_ty(v2) {
+                let count = count * 2;
+                let vec = unsafe { llvm::LLVMGetUndef(self.type_vector(ty, count as u64)) };
+                let vec = (0..count as usize).fold(vec, |acc, x| {
+                    let elt = [v1, v2][x % 2];
+                    self.insert_element(acc, elt, self.cx.const_i32(x as i32))

Review comment (Member):

No idea if this is better, but you could do it in O(1) LLVM instructions by building a <2 x _> vector (with two insert_elements) and then repeating it with a shuffle, like:

shufflevector <2 x i8> %x, <2 x i8> undef, <64 x i32> <i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1, i32 0, i32 1>

https://llvm.godbolt.org/z/5feKaEo5z
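
The shuffle suggested above can be sketched outside the compiler. The following is a minimal illustration only: `repeat_pair_mask` and `shufflevector_ir` are hypothetical helpers, not part of rustc or of LLVM's API. It builds the alternating `0, 1, 0, 1, ...` lane mask and renders the matching `shufflevector` instruction as text for a given repeat count.

```rust
/// Hypothetical helper: lane indices that repeat a 2-element vector `count` times.
fn repeat_pair_mask(count: usize) -> Vec<u32> {
    (0..count * 2).map(|i| (i % 2) as u32).collect()
}

/// Hypothetical helper: renders the instruction text for an `i8` pair named `%x`.
fn shufflevector_ir(count: usize) -> String {
    let mask: Vec<String> = repeat_pair_mask(count)
        .iter()
        .map(|i| format!("i32 {}", i))
        .collect();
    format!(
        "shufflevector <2 x i8> %x, <2 x i8> undef, <{} x i32> <{}>",
        count * 2,
        mask.join(", ")
    )
}

fn main() {
    // 32 repetitions of the pair give the 64-lane mask quoted in the comment.
    println!("{}", shufflevector_ir(32));
}
```

The number of instructions stays constant (two inserts plus one shuffle) regardless of `count`; only the length of the constant mask grows.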

+                });
+                let vec = OperandRef::from_immediate_or_packed_pair(&mut self, vec, dest.layout);
+                vec.val.store(&mut self, dest);
+                return self;
+            }
+        }
+
         let zero = self.const_usize(0);
         let count = self.const_usize(count);
         let start = dest.project_index(&mut self, zero).llval;
10 changes: 1 addition & 9 deletions compiler/rustc_codegen_ssa/src/mir/operand.rs
@@ -12,8 +12,6 @@ use rustc_middle::ty::layout::{LayoutOf, TyAndLayout};
 use rustc_middle::ty::Ty;
 use rustc_target::abi::{Abi, Align, Size};
 
-use std::fmt;
-
 /// The representation of a Rust value. The enum variant is in fact
 /// uniquely determined by the value's type, but is kept as a
 /// safety check.
@@ -38,7 +36,7 @@ pub enum OperandValue<V> {
 /// to avoid nasty edge cases. In particular, using `Builder::store`
 /// directly is sure to cause problems -- use `OperandRef::store`
 /// instead.
-#[derive(Copy, Clone)]
+#[derive(Copy, Clone, Debug)]
 pub struct OperandRef<'tcx, V> {
     // The value.
     pub val: OperandValue<V>,
@@ -47,12 +45,6 @@ pub struct OperandRef<'tcx, V> {
     pub layout: TyAndLayout<'tcx>,
 }
 
-impl<V: CodegenObject> fmt::Debug for OperandRef<'_, V> {
-    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
-        write!(f, "OperandRef({:?} @ {:?})", self.val, self.layout)
-    }
-}
-
 impl<'a, 'tcx, V: CodegenObject> OperandRef<'tcx, V> {
     pub fn new_zst<Bx: BuilderMethods<'a, 'tcx, Value = V>>(
         bx: &mut Bx,
19 changes: 19 additions & 0 deletions src/test/codegen/enum-repeat.rs
@@ -0,0 +1,19 @@
+// compile-flags: -O
+
+#![crate_type = "lib"]
+
+// CHECK-LABEL: @none_repeat
+#[no_mangle]
+pub fn none_repeat() -> [Option<u8>; 64] {
+    // CHECK: store <128 x i8>
+    // CHECK-NEXT: ret void
+    [None; 64]
+}
+
+// CHECK-LABEL: @some_repeat
+#[no_mangle]
+pub fn some_repeat() -> [Option<u8>; 64] {
+    // CHECK: store <128 x i8>
+    // CHECK-NEXT: ret void
+    [Some(0); 64]

Review comment (Member):

It's interesting to me that some_repeat seems to do 16 bytes at a time on x64, but none_repeat doesn't. Might be interesting to look at why LLVM is treating them differently.

Also, out of curiosity, what's the assembly difference before/after this PR?

Reply (Contributor Author):

godbolt: this is none_repeat before and after LLVM opts. It looks like LLVM is optimising out the undefined write and then forgetting about it. This happened even when I explicitly added a store undef.

The current assembly output is:

  • None; 64 - 64 single-byte movs to the tag
  • Some(0); 64 - A 16-byte constant vector (alternating 0/1) is stored into the array with movups
  • Some(1); 64 - The same as Some(0), but with an all-1's pattern

With this patch:

  • None; 64 - A 16-byte vector is created with xorps, then stored with movups
  • Some(0); 64/Some(1); 64 - Same as before
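
To make the byte counts above concrete, here is a small sketch of the array's size. It assumes the layout that current rustc picks for `Option<u8>` (a one-byte tag plus a one-byte payload), which the language does not guarantee; the helper names `array_bytes` and `stores_needed` are illustrative only.

```rust
use std::mem::size_of;

/// Total size in bytes of the array returned by the test functions.
fn array_bytes() -> usize {
    size_of::<[Option<u8>; 64]>()
}

/// Number of stores of `width` bytes needed to cover the whole array.
fn stores_needed(width: usize) -> usize {
    (array_bytes() + width - 1) / width
}

fn main() {
    // Assumption: Option<u8> occupies 2 bytes (tag + payload) on current rustc.
    println!("element size: {} bytes", size_of::<Option<u8>>());
    println!("array size: {} bytes", array_bytes());
    println!("16-byte stores to fill it: {}", stores_needed(16));
}
```

Under that assumption the array is 128 bytes, which is where the `<128 x i8>` in the CHECK lines comes from, and in the None case only the 64 tag bytes carry a defined value.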

+}