Missing optimization with signed pointer offset #56057

Open
TheIronBorn opened this issue Nov 19, 2018 · 5 comments
Labels
A-LLVM (Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.)
I-slow (Issue: Problems and improvements with respect to performance of generated code.)
T-compiler (Relevant to the compiler team, which will review and decide on the PR/issue.)


@TheIronBorn

I am trying to get rid of the redundant shifts in the pointer-offset computation of an array indexing operation.

I tried this code:

pub fn index(table: &[u128; 4], idx: i32) -> u128 {
    table[(idx as usize & 0b11_0000) >> 4]
}

with RUST_BACKTRACE=full RUSTFLAGS='--emit=asm' cargo build --release.

I expected to see this happen:

example::index:
  and esi, 48
  mov rax, qword ptr [rsi + rdi]
  mov rdx, qword ptr [rsi + rdi + 8]
  ret

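(The mask selects two bits that are already in the right position for the byte offset: each u128 is 16 bytes, so shifting right by 4 and then back left by 4 after the mask changes nothing.)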
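A small check of why the shifts are redundant: the byte offset is the index times size_of::<u128>() (16), and the index is the masked value shifted right by 4, so multiplying back by 16 just reproduces the masked bits. The helper names offset_via_index and offset_via_mask are hypothetical, chosen only for this illustration.

```rust
use std::mem::size_of;

/// Byte offset as the source expression computes it:
/// extract the 2-bit index, then scale by the element size.
fn offset_via_index(idx: usize) -> usize {
    ((idx & 0b11_0000) >> 4) * size_of::<u128>()
}

/// Byte offset as the expected assembly computes it: a single mask.
fn offset_via_mask(idx: usize) -> usize {
    idx & 0b11_0000
}

fn main() {
    assert_eq!(size_of::<u128>(), 16);
    // Small inputs plus a few values with high bits set.
    let extra = [usize::MAX, usize::MAX - 17, 1usize << (usize::BITS - 1)];
    for idx in (0..4096usize).chain(extra) {
        assert_eq!(offset_via_index(idx), offset_via_mask(idx));
    }
    println!("mask-shift-scale equals a single mask for all tested inputs");
}
```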

Instead, this happened:

example::index:
  shr esi, 4
  and esi, 3
  shl rsi, 4
  mov rax, qword ptr [rdi + rsi]
  mov rdx, qword ptr [rdi + rsi + 8]
  ret

A godbolt link for comparison with an unsafe version which does apply the optimization: https://godbolt.org/z/0QsA3z
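The unsafe version from the link is not reproduced here. As a rough sketch of the same idea (the name index_unchecked is hypothetical, and this is not necessarily what the linked code does), the mask can be applied directly as a byte offset so the source contains no shifts at all:

```rust
/// Hypothetical unsafe variant: treat the masked bits as a byte offset.
pub fn index_unchecked(table: &[u128; 4], idx: i32) -> u128 {
    // Only bits 4 and 5 survive, so the offset is 0, 16, 32 or 48.
    let byte_offset = idx as usize & 0b11_0000;
    // SAFETY: byte_offset is a multiple of 16 and at most 48, so the 16-byte
    // read stays inside the 64-byte array and starts on an element boundary,
    // which keeps it as aligned as the array itself.
    unsafe {
        let base = table.as_ptr() as *const u8;
        (base.add(byte_offset) as *const u128).read()
    }
}

fn main() {
    let table = [10u128, 20, 30, 40];
    for idx in [-1i32, 0, 0x10, 0x25, 0x3f, i32::MIN] {
        // Compare against the safe indexing expression from the issue.
        let safe = table[(idx as usize & 0b11_0000) >> 4];
        assert_eq!(index_unchecked(&table, idx), safe);
    }
}
```

Whether a particular compiler version emits exactly the expected and/mov sequence for this sketch is not verified here.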

Meta

rustc --version --verbose
rustc 1.32.0-nightly (6b9b97bd9 2018-11-15)
binary: rustc
commit-hash: 6b9b97bd9b704f85f0184f7a213cc4d62bd9654c
commit-date: 2018-11-15
host: x86_64-apple-darwin
release: 1.32.0-nightly
LLVM version: 8.0

Backtrace:
none

@TheIronBorn
Author

Note that

pub fn index(table: &[u128; 4], idx: usize) -> u128 {
    table[(idx & 0b11_0000) >> 4]
}

does apply the optimization. The compiler seems to be worried about the sign bit, even though the bitwise AND discards it.
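The sign bit cannot change the result here: idx as usize sign-extends a negative i32, but the mask 0b11_0000 keeps only bits 4 and 5, so every sign-extended upper bit is discarded. A quick check that sign extension and zero extension give the same index (the helper names are hypothetical):

```rust
/// Index as the i32 version computes it: `as usize` sign-extends.
fn index_sign_extended(idx: i32) -> usize {
    (idx as usize & 0b11_0000) >> 4
}

/// Same computation, but zero-extending the 32 bits instead.
fn index_zero_extended(idx: i32) -> usize {
    (idx as u32 as usize & 0b11_0000) >> 4
}

fn main() {
    // Include negative values, whose upper bits differ after sign extension.
    let samples = (-4096i32..4096).chain([i32::MIN, i32::MAX, -17, 0x7fff_fff0]);
    for idx in samples {
        let signed = index_sign_extended(idx);
        assert_eq!(signed, index_zero_extended(idx));
        assert!(signed < 4, "index must stay within the 4-element table");
    }
}
```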

nikic added the A-LLVM and I-slow labels on Dec 1, 2018
@nikic
Contributor

nikic commented Dec 1, 2018

It seems that the problematic factor here is not the signedness but the integer size. If the index uses isize, it optimizes as expected. With i32 there is an extra zext (zero extension) that seems to inhibit this optimization.

@nikic
Contributor

nikic commented Dec 1, 2018

I just looked into this. With usize, the relevant part of the SelectionDAG looks like this:

      t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t4: i64,ch = CopyFromReg t0, Register:i64 %1
          t7: i64 = srl t4, Constant:i8<4>
        t9: i64 = and t7, Constant:i64<3>
      t10: i64 = shl t9, Constant:i64<4>
    t11: i64 = add t2, t10

and is DAGCombined into

      t2: i64,ch = CopyFromReg t0, Register:i64 %0
        t4: i64,ch = CopyFromReg t0, Register:i64 %1
      t26: i64 = and t4, Constant:i64<48>
    t11: i64 = add t2, t26

With i32 instead, we have

    t2: i64,ch = CopyFromReg t0, Register:i64 %0
            t4: i32,ch = CopyFromReg t0, Register:i32 %1
          t7: i32 = srl t4, Constant:i8<4>
        t9: i32 = and t7, Constant:i32<3>
      t10: i64 = zero_extend t9
    t12: i64 = shl t10, Constant:i64<4>
  t13: i64 = add t2, t12

The additional zero_extend between the shl and the and inhibits the optimization.

The relevant combine for this is https://github.com/llvm-mirror/llvm/blob/master/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L6177.
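To illustrate why the zero_extend gets in the way, here is a toy model of the fold. The types and function names are hypothetical; this is not LLVM's DAGCombiner code, and it only models the equal-shift-amount case seen in the dumps. combine rewrites shl (and (srl x, c), m), c into and x, (m << c); combine_through_zext additionally looks through a zero_extend, which is legal because zero extension commutes with srl and and.

```rust
/// Toy SelectionDAG nodes (hypothetical, for illustration only).
#[derive(Clone, Debug, PartialEq)]
enum Node {
    Reg(&'static str),
    Srl(Box<Node>, u64),
    And(Box<Node>, u64),
    Shl(Box<Node>, u64),
    ZeroExtend(Box<Node>),
}

use Node::*;

/// Folds `shl (and (srl x, c), m), c` into `and x, (m << c)`.
/// Only matches when the `and` sits directly under the `shl`.
fn combine(node: &Node) -> Option<Node> {
    if let Shl(inner, c1) = node {
        if let And(srl, mask) = &**inner {
            if let Srl(x, c2) = &**srl {
                if c1 == c2 {
                    return Some(And(x.clone(), *mask << *c1));
                }
            }
        }
    }
    None
}

/// Same fold, but also accepts a zero_extend between `shl` and `and`,
/// producing `and (zero_extend x), (m << c)`.
fn combine_through_zext(node: &Node) -> Option<Node> {
    if let Some(folded) = combine(node) {
        return Some(folded);
    }
    if let Shl(inner, c1) = node {
        if let ZeroExtend(and_node) = &**inner {
            if let And(srl, mask) = &**and_node {
                if let Srl(x, c2) = &**srl {
                    if c1 == c2 {
                        let widened = ZeroExtend(x.clone());
                        return Some(And(Box::new(widened), *mask << *c1));
                    }
                }
            }
        }
    }
    None
}

fn main() {
    let reg = || Box::new(Reg("t4"));
    // usize case: srl/and/shl all in one width.
    let usize_dag = Shl(Box::new(And(Box::new(Srl(reg(), 4)), 3)), 4);
    assert_eq!(combine(&usize_dag), Some(And(reg(), 48)));

    // i32 case: a zero_extend sits between the and and the shl.
    let i32_dag = Shl(Box::new(ZeroExtend(Box::new(And(Box::new(Srl(reg(), 4)), 3)))), 4);
    assert_eq!(combine(&i32_dag), None);
    assert_eq!(
        combine_through_zext(&i32_dag),
        Some(And(Box::new(ZeroExtend(reg())), 48))
    );
}
```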

@nikic
Contributor

nikic commented Dec 1, 2018

Reported as https://bugs.llvm.org/show_bug.cgi?id=39855.

@steveklabnik
Member

triage: playground still reports

playground::index:
	shrl	$4, %esi
	andl	$3, %esi
	shlq	$4, %rsi
	movq	(%rdi,%rsi), %rax
	movq	8(%rdi,%rsi), %rdx
	retq

Nilstrieb added the T-compiler label on Apr 5, 2023