Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x.trailing_zeros() > n is not optimized as well as x & ((1 << n) - 1) == 0 on x86 #43024

Closed
oli-obk opened this issue Jul 3, 2017 · 6 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@oli-obk
Copy link
Contributor

oli-obk commented Jul 3, 2017

the bitshift and bitand version optimizes to

	testb	$15, %dil
	sete	%al
	retq

while the trailing_zeros version optimizes to

	movl	$32, %eax
	testl	%edi, %edi
	je	.LBB6_2
	bsfl	%edi, %eax
.LBB6_2:
	cmpl	$4, %eax
	seta	%al
	retq

even though I find the trailing_zeros version much more straight forward

Should this be reported upstream in llvm or is this something a mir pass should do?

@Mark-Simulacrum
Copy link
Member

It looks like trailing_zeros currently calls an LLVM intrinsic (cttz) which has a patch posted against it -- https://reviews.llvm.org/D9284 -- perhaps someone familiar with LLVM (cc @arielb1) could push that through review and then change libcore to not have the conditional that it does today: https://github.com/rust-lang/rust/blob/master/src/libcore/num/mod.rs#L1375-L1390.

@Mark-Simulacrum Mark-Simulacrum added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 3, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@nikic
Copy link
Contributor

nikic commented Dec 15, 2018

@Mark-Simulacrum A variant of that patch was implemented via llvm-mirror/llvm@1886c8e in 2016, so we should definitely have it in all LLVM versions we support and can drop that workaround.

However, I don't think this is really related to the issue seen here -- LLVM just doesn't recognize this particular pattern (probably because it would be rather odd in C). Godbolt for reference: https://godbolt.org/z/ovqsCg

I'm a bit stumped about the u8 cttz codegen though. If I disable all optimizations, I get:

define internal i32 @"_ZN4core3num20_$LT$impl$u20$u8$GT$14trailing_zeros17hf3376a11a840d3c1E"(i8 %self) unnamed_addr #0 !dbg !5 {
  %tmp_ret = alloca i8, align 1
  %0 = call i8 @llvm.cttz.i8(i8 %self, i1 false), !dbg !11
  store i8 %0, i8* %tmp_ret, align 1, !dbg !11
  %1 = load i8, i8* %tmp_ret, align 1, !dbg !11
  br label %bb1, !dbg !11

bb1:                                              ; preds = %start
  %2 = zext i8 %1 to i32, !dbg !13
  ret i32 %2, !dbg !14
}

which directly calls @llvm.cttz.i8, without the | 0x100 workaround. Possibly the uint_cttz_call! macro is not working properly?

@nikic
Copy link
Contributor

nikic commented Dec 15, 2018

Yeah, looks like the macro just isn't doing what it's intended to do: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=33539371298f2c808a22ba3cf40db567

Centril added a commit to Centril/rust that referenced this issue Dec 16, 2018
Remove u8 cttz hack

This issue has since been fixed in LLVM: llvm-mirror/llvm@1886c8e

Furthermore this code doesn't actually work, because the 8 literal does not match the $BITS provided from the macro invocation, so effectively this was just dead code. Ref rust-lang#43024.

What LLVM does is still not ideal for CPUs that only have bsf but not tzcnt, will create a patch for that later.

r? @nagisa
@nikic
Copy link
Contributor

nikic commented Dec 16, 2018

Partially implemented in https://reviews.llvm.org/D55745. This will handle only == n and != n.

@nikic nikic self-assigned this Dec 16, 2018
@nikic
Copy link
Contributor

nikic commented Jan 27, 2019

This was fully fixed by https://reviews.llvm.org/D56355, which unfortunately did not make it into the last LLVM update.

@nikic
Copy link
Contributor

nikic commented Oct 31, 2019

There was another LLVM update since then, and this issue is now fully fixed.

@nikic nikic closed this as completed Oct 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

3 participants