Leading zeros and other bit manipulations #190

Merged
merged 3 commits into from Nov 26, 2018

TheIronBorn
Contributor

$id::from_slice_unaligned(elems)
}

#[cfg_attr(not(target_arch = "wasm32"), test)] #[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
Contributor

Might want to split these attributes onto two lines; rustfmt chokes on them when they share a line.
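
A minimal sketch of the suggested split, with one `cfg_attr` per line. The helper `lane_sum` and the test name are hypothetical stand-ins, not code from this PR. On wasm32 the `wasm_bindgen_test` attribute would also have to be in scope (it comes from the `wasm-bindgen-test` crate); on other targets the second attribute expands to nothing.

```rust
// Hypothetical helper standing in for the code under test.
fn lane_sum(elems: &[u8]) -> u32 {
    elems.iter().map(|&x| u32::from(x)).sum()
}

// One attribute per line: the first applies `#[test]` everywhere except
// wasm32, the second applies `#[wasm_bindgen_test]` only on wasm32.
#[cfg_attr(not(target_arch = "wasm32"), test)]
#[cfg_attr(target_arch = "wasm32", wasm_bindgen_test)]
fn lane_sum_adds_all_lanes() {
    assert_eq!(lane_sum(&[1, 2, 3, 4]), 10);
}
```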

Contributor

@gnzlbg gnzlbg left a comment

LGTM, thank you!

@gnzlbg
Contributor

gnzlbg commented Nov 20, 2018

For the following architectures you need to:

  • file an LLVM bug for the respective backends (aarch64 and s390x): ideally you just write LLVM-IR in godbolt for one case that works, say on x86_64, compile it for one of these targets, and show the LLVM error.
  • open a bug here referencing the LLVM bug
  • add a workaround to codegen for these targets that, instead of using the LLVM intrinsic, uses pure Rust code to compute these - this workaround should have a FIXME referencing the bug in this repo
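
The third step could be sketched roughly as below: a lane-by-lane fallback that avoids the vector intrinsic. This is only an illustration on a plain `[u8; 8]` array; the function name `cttz_fallback` is hypothetical, the real crate works on its own SIMD types, and the FIXME deliberately leaves the issue reference open because no bug has been filed yet at this point in the thread.

```rust
/// Hypothetical pure-Rust fallback for `cttz` on eight `u8` lanes.
///
/// FIXME: reference the tracking issue in this repo (and the upstream
/// LLVM bug) once they are filed, and remove this fallback when the
/// backend can select the vector intrinsic.
fn cttz_fallback(v: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for (o, x) in out.iter_mut().zip(v.iter()) {
        // `u8::trailing_zeros` returns 8 for a zero lane.
        *o = x.trailing_zeros() as u8;
    }
    out
}
```

In the crate itself such a function would presumably be selected with `#[cfg(target_arch = "...")]` for the affected targets only, so other targets keep using the intrinsic.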

It looks like these are not available in aarch64:

LLVM ERROR: Cannot select: 0x7fb76ee85270: v8i8 = cttz 0x7fb76ef05270, src/codegen/bit_manip.rs:176:39
  0x7fb76ef05270: v8i8,ch = load<(dereferenceable load 8 from %ir.5)> 0x7fb76ef23758, FrameIndex:i64<2>, undef:i64, src/codegen/bit_manip.rs:176:31
    0x7fb76ee5b138: i64 = FrameIndex<2>
    0x7fb76ee90958: i64 = undef
In function: _ZN110_$LT$packed_simd..Simd$LT$$u5b$u8$u3b$$u20$_$u5d$$GT$$u20$as$u20$packed_simd..codegen..bit_manip..BitManip$GT$4cttz17h3aae67eb204ee427E
error: Could not compile `packed_simd`.

And in s390x:

Intrinsic has incorrect return type!
void (<16 x i8>*, <16 x i8>*)* @llvm.ctpop.v16i8
LLVM ERROR: Broken function found, compilation aborted!
error: Could not compile `packed_simd`.

Both appear to be bugs in the respective LLVM backends.

Contributor

@gnzlbg gnzlbg left a comment

See above.

@TheIronBorn
Contributor Author

Turns out aarch64 has instructions for ctlz and ctpop. We should add those to assert_instr tests.
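
For illustration, a sketch of reaching those instructions from Rust through the NEON intrinsics `vclz_u8` and `vcnt_u8` in `core::arch::aarch64`, with a scalar version on every other target. This is not an `assert_instr` test and not the crate's code; the module name `lanes` is hypothetical, and it assumes NEON is available, as it is on standard aarch64 targets.

```rust
#[cfg(target_arch = "aarch64")]
mod lanes {
    use core::arch::aarch64::{vclz_u8, vcnt_u8, vld1_u8, vst1_u8};

    /// Leading zeros per lane via the NEON `vclz_u8` intrinsic.
    pub fn ctlz(v: [u8; 8]) -> [u8; 8] {
        let mut out = [0u8; 8];
        // SAFETY: both pointers cover exactly 8 valid bytes; NEON is
        // assumed to be enabled for the target.
        unsafe { vst1_u8(out.as_mut_ptr(), vclz_u8(vld1_u8(v.as_ptr()))) };
        out
    }

    /// Population count per lane via the NEON `vcnt_u8` intrinsic.
    pub fn ctpop(v: [u8; 8]) -> [u8; 8] {
        let mut out = [0u8; 8];
        // SAFETY: same reasoning as in `ctlz`.
        unsafe { vst1_u8(out.as_mut_ptr(), vcnt_u8(vld1_u8(v.as_ptr()))) };
        out
    }
}

#[cfg(not(target_arch = "aarch64"))]
mod lanes {
    /// Scalar stand-in used on every other target.
    pub fn ctlz(v: [u8; 8]) -> [u8; 8] {
        v.map(|x| x.leading_zeros() as u8)
    }

    /// Scalar stand-in used on every other target.
    pub fn ctpop(v: [u8; 8]) -> [u8; 8] {
        v.map(|x| x.count_ones() as u8)
    }
}
```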

@gnzlbg
Contributor

gnzlbg commented Nov 22, 2018

Please ping me when CI is green :) I try to check on this every now and then, but if you don't hear from me, then just ping me :D

@TheIronBorn
Contributor Author

The remaining CI failures seem to be unrelated.

Contributor

@gnzlbg gnzlbg left a comment

Thank you, this looks good. I've restarted the failing build bots, but yeah, the failures look unrelated. Do you want to rebase this into a single commit?

@gnzlbg
Contributor

gnzlbg commented Nov 25, 2018

Also, could you open an issue here about optimizing the aarch64 intrinsics (and maybe include adding assert_instr tests for those in it)?

@TheIronBorn
Contributor Author

It seemed like every s390x intrinsic was failing, so I just disabled them entirely.

@gnzlbg
Contributor

gnzlbg commented Nov 26, 2018

@TheIronBorn You have contributed a lot to the crate, you know its structure, and you have good judgement. I think it would be awesome if you could be a maintainer of packed_simd. Are you interested?

@gnzlbg gnzlbg merged commit 22a4b4f into rust-lang:master Nov 26, 2018