`x.trailing_zeros() > n` is not optimized as well as `x & ((1 << n) - 1) == 0` on x86 #43024

oli-obk · 2017-07-03T14:22:20Z

the bitshift and bitand version optimizes to

	testb	$15, %dil
	sete	%al
	retq

while the trailing_zeros version optimizes to

	movl	$32, %eax
	testl	%edi, %edi
	je	.LBB6_2
	bsfl	%edi, %eax
.LBB6_2:
	cmpl	$4, %eax
	seta	%al
	retq

even though I find the trailing_zeros version much more straight forward

Should this be reported upstream in llvm or is this something a mir pass should do?

The text was updated successfully, but these errors were encountered:

Mark-Simulacrum · 2017-07-03T14:52:51Z

It looks like trailing_zeros currently calls an LLVM intrinsic (cttz) which has a patch posted against it -- https://reviews.llvm.org/D9284 -- perhaps someone familiar with LLVM (cc @arielb1) could push that through review and then change libcore to not have the conditional that it does today: https://github.com/rust-lang/rust/blob/master/src/libcore/num/mod.rs#L1375-L1390.

nikic · 2018-12-15T16:12:34Z

@Mark-Simulacrum A variant of that patch was implemented via llvm-mirror/llvm@1886c8e in 2016, so we should definitely have it in all LLVM versions we support and can drop that workaround.

However, I don't think this is really related to the issue seen here -- LLVM just doesn't recognize this particular pattern (probably because it would be rather odd in C). Godbolt for reference: https://godbolt.org/z/ovqsCg

I'm a bit stumped about the u8 cttz codegen though. If I disable all optimizations, I get:

define internal i32 @"_ZN4core3num20_$LT$impl$u20$u8$GT$14trailing_zeros17hf3376a11a840d3c1E"(i8 %self) unnamed_addr #0 !dbg !5 {
  %tmp_ret = alloca i8, align 1
  %0 = call i8 @llvm.cttz.i8(i8 %self, i1 false), !dbg !11
  store i8 %0, i8* %tmp_ret, align 1, !dbg !11
  %1 = load i8, i8* %tmp_ret, align 1, !dbg !11
  br label %bb1, !dbg !11

bb1:                                              ; preds = %start
  %2 = zext i8 %1 to i32, !dbg !13
  ret i32 %2, !dbg !14
}

which directly calls @llvm.cttz.i8, without the | 0x100 workaround. Possibly the uint_cttz_call! macro is not working properly?

nikic · 2018-12-15T16:20:47Z

Yeah, looks like the macro just isn't doing what it's intended to do: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=33539371298f2c808a22ba3cf40db567

@nagisa

Remove u8 cttz hack This issue has since been fixed in LLVM: llvm-mirror/llvm@1886c8e Furthermore this code doesn't actually work, because the 8 literal does not match the $BITS provided from the macro invocation, so effectively this was just dead code. Ref rust-lang#43024. What LLVM does is still not ideal for CPUs that only have bsf but not tzcnt, will create a patch for that later. r? @nagisa

nikic · 2018-12-16T20:00:40Z

Partially implemented in https://reviews.llvm.org/D55745. This will handle only == n and != n.

nikic · 2019-01-27T20:18:58Z

This was fully fixed by https://reviews.llvm.org/D56355, which unfortunately did not make it into the last LLVM update.

nikic · 2019-10-31T20:34:43Z

There was another LLVM update since then, and this issue is now fully fixed.

Mark-Simulacrum added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 3, 2017

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017

nikic mentioned this issue Dec 15, 2018

Remove u8 cttz hack #56855

Merged

nikic self-assigned this Dec 16, 2018

nikic closed this as completed Oct 31, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`x.trailing_zeros() > n` is not optimized as well as `x & ((1 << n) - 1) == 0` on x86 #43024

`x.trailing_zeros() > n` is not optimized as well as `x & ((1 << n) - 1) == 0` on x86 #43024

oli-obk commented Jul 3, 2017

Mark-Simulacrum commented Jul 3, 2017

nikic commented Dec 15, 2018

nikic commented Dec 15, 2018

nikic commented Dec 16, 2018

nikic commented Jan 27, 2019

nikic commented Oct 31, 2019

x.trailing_zeros() > n is not optimized as well as x & ((1 << n) - 1) == 0 on x86 #43024

x.trailing_zeros() > n is not optimized as well as x & ((1 << n) - 1) == 0 on x86 #43024

Comments

oli-obk commented Jul 3, 2017

Mark-Simulacrum commented Jul 3, 2017

nikic commented Dec 15, 2018

nikic commented Dec 15, 2018

nikic commented Dec 16, 2018

nikic commented Jan 27, 2019

nikic commented Oct 31, 2019

`x.trailing_zeros() > n` is not optimized as well as `x & ((1 << n) - 1) == 0` on x86 #43024

`x.trailing_zeros() > n` is not optimized as well as `x & ((1 << n) - 1) == 0` on x86 #43024