-
Notifications
You must be signed in to change notification settings - Fork 6.1k
8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long) #20098
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 5 commits
3dd72b8
e43b390
f910739
ce71a0e
8d66f7b
605a78a
1522e26
a64fcda
13ed872
da720c5
0b71cb5
fe3aff4
0047a4b
f622852
6fd8805
93799d5
c06e869
28778c8
bc648aa
7a07aa8
16ae2a3
3f712e2
6cc5484
c956012
0b19789
e669893
dcf6b54
b19fc81
f6f0244
0a8718e
aca0922
65e2e48
c964c26
cfe0239
7353a07
130b475
4d4753f
fb0f731
c049198
abbaf87
94397d3
f83d886
724a346
a190ae6
d0e793a
38537fc
1aa690d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,155 @@ | ||||||
| /* | ||||||
| * Copyright (c) 2023, Oracle and/or its affiliates. All rights reserved. | ||||||
| * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER. | ||||||
| * | ||||||
| * This code is free software; you can redistribute it and/or modify it | ||||||
| * under the terms of the GNU General Public License version 2 only, as | ||||||
| * published by the Free Software Foundation. | ||||||
| * | ||||||
| * This code is distributed in the hope that it will be useful, but WITHOUT | ||||||
| * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or | ||||||
| * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License | ||||||
| * version 2 for more details (a copy is included in the LICENSE file that | ||||||
| * accompanied this code). | ||||||
| * | ||||||
| * You should have received a copy of the GNU General Public License version | ||||||
| * 2 along with this work; if not, write to the Free Software Foundation, | ||||||
| * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. | ||||||
| * | ||||||
| * Please contact Oracle, 500 Oracle Parkway, Redwood Shores, CA 94065 USA | ||||||
| * or visit www.oracle.com if you need additional information or have any | ||||||
| * questions. | ||||||
| */ | ||||||
|
|
||||||
| /** | ||||||
| * @test | ||||||
| * @bug 8307513 | ||||||
| * @summary [SuperWord] MaxReduction and MinReduction should vectorize for long | ||||||
| * @library /test/lib / | ||||||
| * @run driver compiler.loopopts.superword.MinMaxRed_Long | ||||||
| */ | ||||||
|
|
||||||
| package compiler.loopopts.superword; | ||||||
|
|
||||||
| import compiler.lib.ir_framework.*; | ||||||
| import jdk.test.lib.Utils; | ||||||
|
|
||||||
| import java.util.Arrays; | ||||||
| import java.util.Random; | ||||||
| import java.util.stream.LongStream; | ||||||
|
|
||||||
| public class MinMaxRed_Long { | ||||||
|
|
||||||
| private static final Random random = Utils.getRandomInstance(); | ||||||
|
|
||||||
| public static void main(String[] args) throws Exception { | ||||||
| TestFramework framework = new TestFramework(); | ||||||
| framework.addFlags("-XX:+IgnoreUnrecognizedVMOptions", | ||||||
| "-XX:LoopUnrollLimit=250", | ||||||
| "-XX:CompileThresholdScaling=0.1"); | ||||||
| framework.start(); | ||||||
| } | ||||||
|
|
||||||
| @Run(test = {"maxReductionImplement"}, | ||||||
| mode = RunMode.STANDALONE) | ||||||
| public void runMaxTest() { | ||||||
| runMaxTest(50); | ||||||
| runMaxTest(80); | ||||||
| runMaxTest(100); | ||||||
| } | ||||||
|
|
||||||
| private static void runMaxTest(int probability) { | ||||||
| long[] longs = new long[1024]; | ||||||
| ReductionInit(longs, probability); | ||||||
| long res = 0; | ||||||
| for (int j = 0; j < 2000; j++) { | ||||||
| res = maxReductionImplement(longs, res); | ||||||
| } | ||||||
| if (res == 11 * Arrays.stream(longs).max().getAsLong()) { | ||||||
| System.out.println("Success"); | ||||||
| } else { | ||||||
| throw new AssertionError("Failed"); | ||||||
| } | ||||||
| } | ||||||
|
|
||||||
| @Run(test = {"minReductionImplement"}, | ||||||
| mode = RunMode.STANDALONE) | ||||||
| public void runMinTest() { | ||||||
| runMinTest(50); | ||||||
| runMinTest(80); | ||||||
| runMinTest(100); | ||||||
| } | ||||||
|
|
||||||
| private static void runMinTest(int probability) { | ||||||
| long[] longs = new long[1024]; | ||||||
| ReductionInit(longs, probability); | ||||||
| // Negating the values generated for controlling max branching | ||||||
| // allows same logic to be used for min tests. | ||||||
| longs = negate(longs); | ||||||
| long res = 0; | ||||||
| for (int j = 0; j < 2000; j++) { | ||||||
| res = minReductionImplement(longs, res); | ||||||
| } | ||||||
| if (res == 11 * Arrays.stream(longs).min().getAsLong()) { | ||||||
| System.out.println("Success"); | ||||||
| } else { | ||||||
| throw new AssertionError("Failed"); | ||||||
| } | ||||||
| } | ||||||
|
|
||||||
| static long[] negate(long[] nums) { | ||||||
| return LongStream.of(nums).map(l -> -l).toArray(); | ||||||
| } | ||||||
|
|
||||||
| public static void ReductionInit(long[] longs, int probability) { | ||||||
|
||||||
| public static void ReductionInit(long[] longs, int probability) { | |
| public static void reductionInit(long[] longs, int probability) { |
This is a method name, not a class - so I think it should start lower-case, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the method might as well allocate the array too. But up to you.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you give a high-level definition / explanation what this does?
Also: what is the expected number of rounds you iterate here? I'm asking because I would like to be sure that a timeout is basically impossible because the probability is too low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I'll add. It's an approximation to make it run fast as sizes increase. In the worst case I've seen it take 15 rounds when size was 100, 50% probability and got 50 below max and 50 above. But with bigger array sizes, say 10'000, and 50% probability aim, it can take 1 or 2 rounds ending up with 5027 above max, 4973 below max.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I would call it diffToMax, because you are really just going to get a value below the max, and you are not decrementing the max. But up to you if you want to change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eme64 I've addressed all your comments except aarch64 testing.
asimdis not enough, you needsvefor this, but I'm yet to make it work even withsve, something's up and need to debug it further.
Hi @galderz , may I ask if these long-reduction cases can't work even with sve? It might be related with the limitation here. Some sve machines have only 128 bits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's right. Neoverse V2 is 4 pipes of 128 bits, V1 is 2 pipes of 256 bits.
That comment is "interesting". Maybe it should be tunable by the back end. Given that Neoverse V2 can issue 4 SVE operations per clock cycle, it might still be a win.
Galder, how about you disable that line and give it another try?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I'm working on removing the line here.
The issue is that on some platforms 2-element vectors are somehow really slower, and we need a cost-model to give us a better heuristic, rather than the hard "no". See my draft #20964.
But yes: why don't you remove the line, and see if that makes it work. If so, then don't worry about this case for now, and maybe leave a comment in the test. We can then fix that later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this limit limits reductions like this working on 128 bit registers:
// Length 2 reductions of INT/LONG do not offer performance benefits
if (((arith_type->basic_type() == T_INT) || (arith_type->basic_type() == T_LONG)) && (size == 2)) {
retValue = false;
I've tried today to remove that but then the profitable checks fail to pass. So, I'm not going down that route now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.