-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) #20098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
Changes from 1 commit
Commits
Show all changes
47 commits
Select commit
Hold shift + click to select a range
3dd72b8
8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)
galderz e43b390
Add IR test
galderz f910739
Refactor inline methods to unify their implementations
galderz ce71a0e
Add math vectorized JMH benchmark
galderz 8d66f7b
Rename benchmark class to MathLoopBench
galderz 605a78a
Fix multi long tests to use long arrays
galderz 1522e26
Implement cmovL as a jump+mov branch
galderz a64fcda
Switch movl to movq
galderz 13ed872
Fix format of assembly for the movl to movq switch
galderz da720c5
Distribute values targetting a branch percentage
galderz 0b71cb5
Fix min case to distribute numbers as per probability
galderz fe3aff4
Fix compilation error
galderz 0047a4b
Add an intermediate % that is more representative of real life
galderz f622852
Skip single array benchmarks
galderz 6fd8805
Add min/max benchmark that includes loops and reductions
galderz 93799d5
Renamed benchmark methods
galderz c06e869
Multiply array value in reduction for vectorization to kick in
galderz 28778c8
Remove previous benchmark effort
galderz bc648aa
Revert "Fix format of assembly for the movl to movq switch"
galderz 7a07aa8
Revert "Switch movl to movq"
galderz 16ae2a3
Revert "Implement cmovL as a jump+mov branch"
galderz 3f712e2
Merge branch 'master' into topic.intrinsify-max-min-long
galderz 6cc5484
Avoid creating result array in benchmark method
galderz c956012
Encapsulate benchmark state within an inner class
galderz 0b19789
Add clipping range benchmark that uses min/max
galderz e669893
Restore previous benchmark iterations and default param size
galderz dcf6b54
Make state class non-final
galderz b19fc81
Double/Float tests only when avx enabled
galderz f6f0244
Renamed benchmark class
galderz 0a8718e
Use same default size as in other vector reduction benchmarks
galderz aca0922
Merge branch 'master' into topic.intrinsify-max-min-long
galderz 65e2e48
Add empty line
galderz c964c26
Add max reduction test
galderz cfe0239
Fix style
galderz 7353a07
Adjust min/max identity IR test expectations after changes
galderz 130b475
Added comment around the assertions
galderz 4d4753f
Tests should also run on aarch64 asimd=true envs
galderz fb0f731
Fix license header
galderz c049198
Test can only run with 256 bit registers or bigger
galderz abbaf87
Make sure it runs with cpus with either avx512 or asimd
galderz 94397d3
Fix copyright years
galderz f83d886
Renaming methods and variables and add docu on algorithms
galderz 724a346
Fix typo
galderz a190ae6
Merge branch 'master' into topic.intrinsify-max-min-long
galderz d0e793a
Add simple reduction benchmarks on top of multiply ones
galderz 38537fc
Add assertion comments
galderz 1aa690d
Merge branch 'master' into topic.intrinsify-max-min-long
galderz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @galderz , may I ask if these long-reduction cases can't work even with
sve? It might be related with the limitation here. Somesvemachines have only 128 bits.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. Neoverse V2 is 4 pipes of 128 bits, V1 is 2 pipes of 256 bits.
That comment is "interesting". Maybe it should be tunable by the back end. Given that Neoverse V2 can issue 4 SVE operations per clock cycle, it might still be a win.
Galder, how about you disable that line and give it another try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I'm working on removing the line here.
The issue is that on some platforms 2-element vectors are somehow really slower, and we need a cost-model to give us a better heuristic, rather than the hard "no". See my draft #20964.
But yes: why don't you remove the line, and see if that makes it work. If so, then don't worry about this case for now, and maybe leave a comment in the test. We can then fix that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this limit limits reductions like this working on 128 bit registers:
I've tried today to remove that but then the profitable checks fail to pass. So, I'm not going down that route now.