Potential to add #[inline] attributes where possible #270

Open
shssoichiro opened this issue Oct 24, 2022 · 2 comments

shssoichiro commented Oct 24, 2022

I noticed that PR #210 removed most #[inline] annotations. However, while developing one of my crates, I found that the Rust stdlib's cbrtf was a significant bottleneck. I tried switching to this crate, but it provided minimal benefit because its functions are not inlined. cbrtf is called in a loop, and the loop cannot be autovectorized while every iteration has to go through a call instruction.

If I add the #[inline] attribute, the compiler is able to inline cbrtf into my function and autovectorize the loop, which resulted in a 25% performance improvement. I would therefore like to propose re-adding the #[inline] attribute where possible. (#[inline(always)] is not needed; plain #[inline] is enough to let the compiler consider it.)

Edit: I tried doing the same with powf for other places it's used in the crate, and in that case it had a very negative effect... so this turns out to not be a universal thing at all 🤔
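
For readers who want to try the pattern described above, here is a minimal sketch. `cbrt_sketch` and `cbrt_in_place` are hypothetical names, and the Newton-iteration body is only a stand-in: it is not this crate's cbrtf implementation, and it does not handle subnormal inputs accurately. The point is the `#[inline]` hint on a small math routine that is called once per element in a loop.

```rust
/// Hypothetical stand-in for a small math routine exported by a library crate.
/// `#[inline]` is a hint, not a guarantee: it lets the optimizer consider
/// inlining the body into callers, including callers in other crates.
#[inline]
pub fn cbrt_sketch(x: f32) -> f32 {
    // Zero, NaN and infinities are returned unchanged.
    if x == 0.0 || !x.is_finite() {
        return x;
    }
    let a = x.abs();
    // Rough initial guess: divide the exponent bits by three and add a bias.
    // (Illustrative only; subnormal inputs are not treated specially.)
    let mut t = f32::from_bits(a.to_bits() / 3 + 709_958_130);
    // Three Newton steps for t^3 = a.
    for _ in 0..3 {
        t = (2.0 * t + a / (t * t)) / 3.0;
    }
    // Restore the sign of the input.
    t.copysign(x)
}

/// The hot loop: once `cbrt_sketch` is inlined here, the loop body is plain
/// arithmetic with no call instruction, so the compiler is free to try to
/// vectorize it. Whether it actually does depends on the target and flags.
pub fn cbrt_in_place(values: &mut [f32]) {
    for v in values.iter_mut() {
        *v = cbrt_sketch(*v);
    }
}

fn main() {
    let mut data = [27.0_f32, -8.0, 0.0, 1.0];
    cbrt_in_place(&mut data);
    println!("{:?}", data);
}
```

As the edit above notes, inlining is not a universal win, so any change like this should be measured per function.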

Amanieu (Member) commented Oct 29, 2022

If you've found a specific case where the lack of #[inline] causes a performance problem for you then we are happy to accept a PR to add the #[inline] back.

nia-e commented Dec 10, 2022

I benchmarked every function with #[inline] vs. as-is, and there are a few considerable improvements. On my machine (x86_64 Arch Linux, Ryzen 7 5800H) I got the following significant results, though I'd appreciate someone on a different system or architecture running the benchmark as well:

  • atan: 3 -> 2 ns/iter
  • atan2: 6 -> 5 ns/iter
  • cosf: 4 -> 2 ns/iter
  • coshf: 4 -> 3 ns/iter
  • expm1f: 3 -> 2 ns/iter
  • hypot: 4 -> 2 ns/iter
  • hypotf: 2 -> 1 ns/iter
  • sinf: 5 -> 2 ns/iter
  • tan: 4 -> 3 ns/iter
  • tanf: 7 -> 2 ns/iter
  • tanh: 8 -> 6 ns/iter

I only included results where #[inline] helped by more than 10%. I will do more runs and average the results before opening a proper PR; for now I just want to put this out there.
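
A comparison like the one above could be sketched roughly as follows. This is not the benchmark harness that produced the numbers in this comment; the function names, the input data, and the use of `#[inline(never)]` to model a call that cannot be inlined are all assumptions made for illustration. It relies only on `std::time::Instant` and `std::hint::black_box`.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

// Hypothetical routine carrying the `#[inline]` hint.
#[inline]
fn hypot_inline(x: f32, y: f32) -> f32 {
    (x * x + y * y).sqrt()
}

// Same body, but forced to stay an out-of-line call. Within a single crate
// the compiler may inline an unannotated function anyway, so `inline(never)`
// is used here as an assumption to model the "not inlined" case.
#[inline(never)]
fn hypot_outlined(x: f32, y: f32) -> f32 {
    (x * x + y * y).sqrt()
}

/// Calls `f` on every input pair `runs` times and returns the accumulated
/// sum together with the elapsed wall-clock time.
fn time_sum(f: impl Fn(f32, f32) -> f32, inputs: &[(f32, f32)], runs: u32) -> (f32, Duration) {
    let start = Instant::now();
    let mut sum = 0.0_f32;
    for _ in 0..runs {
        // black_box keeps the optimizer from hoisting work out of the loop.
        for &(x, y) in black_box(inputs) {
            sum += f(x, y);
        }
    }
    (black_box(sum), start.elapsed())
}

fn main() {
    let inputs: Vec<(f32, f32)> = (0..1024).map(|i| (i as f32, 1.0)).collect();
    let runs = 10_000;
    let calls = runs as f64 * inputs.len() as f64;

    let (_, t_inline) = time_sum(hypot_inline, &inputs, runs);
    let (_, t_outlined) = time_sum(hypot_outlined, &inputs, runs);

    println!("inline:   {:.2} ns/iter", t_inline.as_nanos() as f64 / calls);
    println!("outlined: {:.2} ns/iter", t_outlined.as_nanos() as f64 / calls);
}
```

Build with `--release` for meaningful numbers, and repeat the runs and average them, since single measurements at this scale are noisy.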
