Tracking compiler inefficiencies #357

Open
mratsim opened this issue Feb 11, 2024 · 0 comments
mratsim commented Feb 11, 2024

As mentioned in https://github.com/mratsim/constantine/blob/661a481/README-PERFORMANCE.md#compiler-caveats, compilers have a hard time optimizing bigint operations, even ones as simple as an addition with carries.

This issue tracks how compilers evolve on this front and the quality of the code they generate from compiler builtins for the ISAs of interest.

Note that as of February 2024, we use:

  • _addcarry_u64 on x86-64
  • uint128 on other ISAs
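A minimal sketch of a multi-limb addition using those two strategies, assuming 64-bit limbs stored least-significant first. `limb_t` and `bigint_add` are hypothetical names for illustration, not Constantine's actual code, and `unsigned __int128` is a GCC/Clang extension.

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h> /* declares _addcarry_u64 */
#endif

/* Hypothetical 64-bit limb type. */
typedef unsigned long long limb_t;

/* r = a + b over n limbs (least-significant limb first).
   Returns the carry out of the most significant limb. */
static unsigned char bigint_add(limb_t *r, const limb_t *a, const limb_t *b, size_t n)
{
    unsigned char carry = 0;
    for (size_t i = 0; i < n; i++) {
#if defined(__x86_64__) || defined(_M_X64)
        /* x86-64: intrinsic that should ideally lower to an ADD/ADC chain */
        carry = _addcarry_u64(carry, a[i], b[i], &r[i]);
#else
        /* Other ISAs: widen to 128 bits, then split into low limb and carry */
        unsigned __int128 t = (unsigned __int128)a[i] + b[i] + carry;
        r[i] = (limb_t)t;
        carry = (unsigned char)(t >> 64);
#endif
    }
    return carry;
}
```

Both branches express the same carry chain; what this issue tracks is how close each compiler gets to a straight ADD/ADC sequence for them.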

2019, GCC 9.2 and Clang 9.0

The original problem:
https://gcc.godbolt.org/z/2h768y


Even with intrinsics, GCC compiles an operation as simple as addition-with-carry into poor code.
The GMP developers pointed this out 30 years ago: https://gmplib.org/manual/Assembly-Carry-Propagation.html
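For comparison, here is a sketch of a single add-with-carry step written in plain C, with no intrinsics and no 128-bit type: the carry is recovered with unsigned comparisons, and a compiler has to recognize the whole pattern to emit one ADD/ADC. `addc_portable` is a hypothetical helper name, and it assumes the incoming carry is 0 or 1.

```c
typedef unsigned long long limb_t;

/* Returns a + b + carry_in (mod 2^64) and stores the carry out (0 or 1).
   carry_in must be 0 or 1. */
static limb_t addc_portable(limb_t a, limb_t b, limb_t carry_in, limb_t *carry_out)
{
    limb_t s = a + b;
    limb_t c1 = s < a;       /* overflow of a + b */
    limb_t r = s + carry_in;
    limb_t c2 = r < s;       /* overflow when adding the incoming carry */
    /* If a + b overflowed, s <= 2^64 - 2, so adding carry_in cannot
       overflow again: at most one of c1, c2 is set. */
    *carry_out = c1 | c2;
    return r;
}
```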

2024, GCC 13.2 and Clang 17.0

https://gcc.godbolt.org/z/jdecvffaP


GCC fixed the x86 intrinsics, but unfortunately the portable builtin has terrible codegen, which makes it a poor fallback for ARM.
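A minimal sketch of calling the portable builtin, assuming a compiler that provides `__builtin_addcll` (Clang does). Availability is checked with `__has_builtin`, and an `unsigned __int128` fallback is used otherwise. `addc` and `HAVE_ADDCLL` are hypothetical names for illustration.

```c
typedef unsigned long long limb_t;

/* Detect the portable add-with-carry builtin when the compiler supports
   __has_builtin; the inner #if is skipped entirely otherwise. */
#if defined(__has_builtin)
#  if __has_builtin(__builtin_addcll)
#    define HAVE_ADDCLL 1
#  endif
#endif

/* Returns a + b + carry_in (mod 2^64) and stores the carry out. */
static limb_t addc(limb_t a, limb_t b, limb_t carry_in, limb_t *carry_out)
{
#ifdef HAVE_ADDCLL
    /* Portable builtin: quality of the generated code is compiler-dependent */
    return __builtin_addcll(a, b, carry_in, carry_out);
#else
    /* Fallback: widen to 128 bits and split */
    unsigned __int128 t = (unsigned __int128)a + b + carry_in;
    *carry_out = (limb_t)(t >> 64);
    return (limb_t)t;
#endif
}
```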

Current status

GCC's abysmal codegen for __builtin_addcll makes it a non-starter.
Clang has decent codegen.

Assembly is still very much needed.
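To illustrate what assembly buys, here is a minimal sketch assuming GCC/Clang extended inline assembly on x86-64: a two-limb addition where the carry from `addq` flows directly into `adcq` through the flags register, leaving nothing for the compiler to pattern-match. `add2` is a hypothetical helper, not Constantine's assembly generator; other targets fall back to `unsigned __int128`.

```c
typedef unsigned long long limb_t;

/* r = (a + b) mod 2^128 for 2-limb operands, least-significant limb first.
   The final carry out is discarded to keep the sketch short. */
static void add2(limb_t r[2], const limb_t a[2], const limb_t b[2])
{
#if defined(__x86_64__) && defined(__GNUC__)
    limb_t r0 = a[0], r1 = a[1];
    __asm__("addq %2, %0\n\t"   /* r0 += b[0], sets CF */
            "adcq %3, %1"       /* r1 += b[1] + CF */
            : "+r"(r0), "+r"(r1)
            : "rm"(b[0]), "rm"(b[1])
            : "cc");
    r[0] = r0;
    r[1] = r1;
#else
    unsigned __int128 lo = (unsigned __int128)a[0] + b[0];
    r[0] = (limb_t)lo;
    r[1] = a[1] + b[1] + (limb_t)(lo >> 64);
#endif
}
```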

This also explains the poor ARM performance on Apple M1, M2 and M3 reported by @agnxsh (#354 (comment)) and @bkomuves.
