op/aarch64: refactor SVE functions #12683
@bosilca this is a refactor of the SVE functions to make them look more like the SVE VLA examples provided by Arm. I was initially unable to compile the
and then
FWIW, if the source file contains only one subroutine, compilation works fine, but if it contains two or more subroutines, the compiler crashes. @bosilca @jeffhammond can one of you please report this internally to the compiler team?
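For readers unfamiliar with the "SVE VLA" (vector-length-agnostic) style mentioned above, here is a minimal sketch of such a loop. It is not taken from this PR: the function name `sum_int32_inout` and its signature are hypothetical, and the scalar fallback is only there so the sketch also builds on targets without SVE. The loop advances by the hardware vector length (`svcntw()`) and uses a `svwhilelt` predicate to mask off the tail, so no separate remainder loop is needed.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

/* Hypothetical helper, not the PR's code: computes inout[i] += in[i]. */
void sum_int32_inout(const int32_t *in, int32_t *inout, size_t count)
{
#if defined(__ARM_FEATURE_SVE)
    /* VLA style: step by the number of 32-bit lanes in one vector. */
    for (uint64_t i = 0; i < (uint64_t) count; i += svcntw()) {
        /* Predicate is true only for lanes with index < count (handles the tail). */
        svbool_t pg = svwhilelt_b32_u64(i, (uint64_t) count);
        svint32_t vin = svld1_s32(pg, &in[i]);
        svint32_t vinout = svld1_s32(pg, &inout[i]);
        svst1_s32(pg, &inout[i], svadd_s32_x(pg, vinout, vin));
    }
#else
    /* Scalar fallback for targets without SVE. */
    for (size_t i = 0; i < count; i++) {
        inout[i] += in[i];
    }
#endif
}
```

Because the predicate covers the final partial vector, the same source works unchanged for any SVE vector length.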
I reported on Slack. Will escalate if necessary.
Thanks @jeffhammond!
I don't see a fundamental difference from the original code, and I really don't care much whether it looks like the ARM documentation on SVE. But if restructuring the code has no performance impact and makes another compiler happy, I'm all for it.
@jeffhammond alerted me to this issue. I will open an internal bug report on it here today. Sorry for any inconvenience caused by this issue. We have already frozen development for the 24.7 release, but I'll get this into the pipeline to be fixed as soon as possible subsequent to that.
Refactor SVE functions and incidentally make NVIDIA compilers a happy panda again. Signed-off-by: Gilles Gouaillardet <gilles@rist.or.jp>
f4fc249 to ba59533
Performance is basically on par (with some "interesting" variations).
Just to update - our developer has identified the issue and we are testing a fix in our development branch. This fix seems to have resolved the above issue with SVE intrinsics. We believe it will land in the HPC SDK 24.9 release, but we will keep you posted. (Ping me if I forget to check in here after that.)