8376891: [VectorAlgorithms] add more if-conversion benchmarks and tests #29522

eme64 wants to merge 20 commits into openjdk:master
Conversation
👋 Welcome back epeter! A progress list of the required criteria for merging this PR into `master` will be added to the body of your pull request.
@eme64 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 24 new commits pushed to the `master` branch. As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the `master` branch, type /integrate in a new comment.
Result on AVX512 laptop:

Interesting: the scalar implementation can beat the vectorized one, if the branch probability is extreme enough. And when speculating on one branch being all-true, we even get:

If the branch probability is high enough, we get the combined benefit of branch prediction and vectorization!
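The "speculating on one branch all-true" idea can be modeled in plain scalar Java, with no Vector API required. This is only a sketch: `WIDTH` and all names here are hypothetical stand-ins, not the benchmark's actual code.

```java
public class SpeculateAllTrue {
    static final int WIDTH = 4; // hypothetical stand-in for a vector species length

    // Filter a into r; for each block, first check whether all WIDTH elements
    // pass (the common case when branch probability is high), and bulk-copy if so.
    static int filter(int[] a, int threshold, int[] r) {
        int j = 0, i = 0;
        int bound = a.length - (a.length % WIDTH);
        for (; i < bound; i += WIDTH) {
            boolean allTrue = true;
            for (int k = 0; k < WIDTH; k++) {
                if (a[i + k] < threshold) { allTrue = false; break; }
            }
            if (allTrue) {
                // fast path: whole block passes, no per-element branch
                System.arraycopy(a, i, r, j, WIDTH);
                j += WIDTH;
            } else {
                for (int k = 0; k < WIDTH; k++) {
                    if (a[i + k] >= threshold) { r[j++] = a[i + k]; }
                }
            }
        }
        for (; i < a.length; i++) { // scalar tail
            if (a[i] >= threshold) { r[j++] = a[i]; }
        }
        return j;
    }

    public static void main(String[] args) {
        int[] a = {5, 6, 7, 8, 1, 9, 2, 3, 7};
        int[] r = new int[a.length];
        System.out.println(filter(a, 4, r)); // 6
    }
}
```

When most blocks pass entirely, the bulk-copy fast path removes the per-element branch from the hot path, which is the same effect the vectorized `mask.allTrue()` speculation exploits.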
```java
int filterI_range = 1000_000;
aI_filterI = new int[size];
Arrays.setAll(aI, i -> random.nextInt(filterI_range));
```
```diff
- Arrays.setAll(aI, i -> random.nextInt(filterI_range));
+ Arrays.setAll(aI, i -> random.nextInt(aI_filterI));
```
Why are you suggesting this?
I think it is correct as is. The goal is to filter "in/out" an element with probability branchProbability. So we need to use the same range filterI_range for picking the elements and for eI_filterI.
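That coupling can be sanity-checked with a small scalar sketch (the names `branchProbability`, `filterI_range` and the helper are hypothetical; the real benchmark fields may differ): drawing elements uniformly from [0, filterI_range) and filtering against a threshold of branchProbability * filterI_range yields a pass rate of approximately branchProbability.

```java
import java.util.Random;

public class BranchProbabilitySketch {
    // Fraction of elements passing the filter, for a given probability knob.
    static double observedPassRate(double branchProbability, int range, int size, long seed) {
        // Threshold chosen so that a uniform value from [0, range)
        // falls below it with probability branchProbability.
        int threshold = (int) (branchProbability * range);
        Random random = new Random(seed);
        int hits = 0;
        for (int i = 0; i < size; i++) {
            // Same range for picking the elements and for the threshold.
            if (random.nextInt(range) < threshold) { hits++; }
        }
        return (double) hits / size;
    }

    public static void main(String[] args) {
        System.out.println(observedPassRate(0.3, 1_000_000, 1_000_000, 42));
    }
}
```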
I'll have to run it again later to get less noisy results. But it looks promising (AVX512 laptop): Thanks @PaulSandoz and @rgiulietti for bringing this one up offline :)
Here are some benchmark results, run with a command like this:

filterI

On my AVX512 laptop, the results are a bit noisy. And on 2 other x64 servers, the results are much cleaner. Comment:

Comment:

lowerCaseB

On an x64 machine (the other two machines look very similar): Comment:

pieceWise2FunctionF

On my 3 x64 machines it looks like this: Comment:

conditionalSumB

Comment:

Some More Comments / Observations
Webrevs
@XiaohongGong @PaulSandoz @iwanowww @jatin-bhateja This is a continuation of #28639, would any of you be up to reviewing this here as well?
```java
        }
        v.intoArray(r, i);
    }
    for (; i < a.length; i++) {
```
Can this piece of scalar processing be refactored into a method, so that it does not duplicate the one that processes the whole array above?
We could. But I don't really want to. I'd like to demonstrate that we need two loops here. It is a trade-off :)
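For illustration, the two-loop shape being defended here can be sketched in plain Java, with a hypothetical block width standing in for `SPECIES.length()` (this is not the benchmark's code):

```java
public class TwoLoopShape {
    static final int WIDTH = 4; // hypothetical stand-in for SPECIES.length()

    // Process a in whole blocks of WIDTH, then a scalar tail loop: the
    // structure a vectorized implementation needs, hence the two loops.
    static long sum(int[] a) {
        long s = 0;
        int i = 0;
        int bound = a.length - (a.length % WIDTH); // stand-in for loopBound
        for (; i < bound; i += WIDTH) {
            // main "vector" loop: whole blocks of WIDTH elements
            for (int k = 0; k < WIDTH; k++) { s += a[i + k]; }
        }
        for (; i < a.length; i++) {
            // scalar post-loop for the remaining tail elements
            s += a[i];
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3, 4, 5, 6, 7})); // 28
    }
}
```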
I think the implementation in the test and the micro are pretty similar; is it possible to have a common place that both can call?
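One possible shared-helper refactoring, sketched here with hypothetical names (not the PR's actual code): a range-based scalar method that the pure-scalar implementation calls for the whole array, and that a vectorized implementation could call for just the tail.

```java
public class FilterShared {
    // Scalar filter over [from, to): copies elements >= threshold into r
    // starting at index j, and returns the new fill count.
    static int filterRange(int[] a, int from, int to, int threshold, int[] r, int j) {
        for (int i = from; i < to; i++) {
            if (a[i] >= threshold) { r[j++] = a[i]; }
        }
        return j;
    }

    // Pure scalar version: just the helper over the whole array.
    // A vectorized version would call filterRange(a, i, a.length, ...) for the tail.
    static int filterScalar(int[] a, int threshold, int[] r) {
        return filterRange(a, 0, a.length, threshold, r, 0);
    }

    public static void main(String[] args) {
        int[] a = {5, 1, 9, 3, 7};
        int[] r = new int[a.length];
        System.out.println(filterScalar(a, 4, r)); // 3 (keeps 5, 9, 7)
    }
}
```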
```java
int filterI_range = 1000_000;
aI_filterI = new int[size];
```
This is not used, aI is used instead.
Oh wow, that is a great catch! I might have to rerun the experiments, since this could possibly affect the results :/
```java
public float[] bF;

// Input for piece-wise functions.
// Uniform [0..1[ with probability p and Uniform [1..2[ with probability (1-p)
```
Is this a typo: [0..1[ ? Should it be [0..1] instead?
No, [0..1[ is the interval that includes 0 but not 1 (a half-closed interval).
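A small generator sketch makes the notation concrete (the names here are hypothetical; what is certain is that `java.util.Random.nextFloat()` itself returns values in [0..1[, i.e. 0 inclusive, 1 exclusive):

```java
import java.util.Random;

public class PieceWiseInput {
    // With probability p draw from [0..1[, otherwise from [1..2[,
    // mirroring the comment in the benchmark above.
    static float[] generate(int size, float p, long seed) {
        Random random = new Random(seed);
        float[] a = new float[size];
        for (int i = 0; i < size; i++) {
            float u = random.nextFloat();          // uniform in [0..1[
            a[i] = (random.nextFloat() < p) ? u    // [0..1[ with probability p
                                            : u + 1.0f; // [1..2[ otherwise
        }
        return a;
    }

    public static void main(String[] args) {
        float[] a = generate(1_000, 0.5f, 42);
        for (float v : a) {
            // Every value lies in the half-open interval [0..2[.
            if (v < 0.0f || v >= 2.0f) { throw new AssertionError(); }
        }
        System.out.println("all in [0..2[");
    }
}
```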
```java
} else if (mask.anyTrue()) {
    int v0 = v.lane(0);
    int v1 = v.lane(1);
    if (v0 >= threshold) { r[j++] = v0; }
    if (v1 >= threshold) { r[j++] = v1; }
```
Can we just use mask.laneIsSet(0) here?
Good idea, it lets us simplify to a line like this:

```java
if (mask.laneIsSet(0)) { r[j++] = v.lane(0); }
```
```java
int i = 0;
for (; i < SPECIES_I256.loopBound(a.length); i += SPECIES_I256.length()) {
    IntVector v = IntVector.fromArray(SPECIES_I256, a, i);
    var mask = v.compare(VectorOperators.GE, thresholds);
```
We can use the compare-with-immediate API directly here.
```diff
- var mask = v.compare(VectorOperators.GE, thresholds);
+ var mask = v.compare(VectorOperators.GE, threshold);
```
```java
var vI1 = vB.castShape(SPECIES_I, 1);
var vI2 = vB.castShape(SPECIES_I, 2);
var vI3 = vB.castShape(SPECIES_I, 3);
accI = accI.add(vI0.add(vI1).add(vI2).add(vI3));
```
Will the following change get better parallelization performance?
```diff
- accI = accI.add(vI0.add(vI1).add(vI2).add(vI3));
+ accI = accI.add(vI0.add(vI1).add(vI2.add(vI3)));
```
It does not matter, because the dependency chain is at accI; as long as we add everything else together before adding to accI, it will be the same.
Yes, exactly. The critical dependency chain is accI. But feel free to investigate the performance difference in a follow-up RFE, and propose yet another implementation if it turns out to be better :)
Not a blocker for me. My point is that the dependence between vI0.add(vI1) and vI2.add(vI3) can be broken, and hence it may get better parallelization. Although the critical dependency chain is accI, performance might be better if its input can be calculated earlier.
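The two groupings can be compared in plain scalar Java; for int addition they are provably equal, and the only difference is the depth of the dependency tree (this is a sketch of the point being discussed, not the benchmark code):

```java
public class AddGrouping {
    // Left-leaning chain: depth 3, each add waits for the previous one.
    static int chained(int v0, int v1, int v2, int v3) {
        return v0 + (v1 + (v2 + v3));
    }

    // Balanced tree: (v0+v1) and (v2+v3) are independent, depth 2,
    // so a CPU can compute the two pairs in parallel.
    static int paired(int v0, int v1, int v2, int v3) {
        return (v0 + v1) + (v2 + v3);
    }

    public static void main(String[] args) {
        // Integer addition is associative, so the results are identical.
        System.out.println(chained(11, 22, 33, 44) == paired(11, 22, 33, 44)); // true
    }
}
```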
```java
float s2 = (float)Math.sqrt(ai);
float s4 = (float)Math.sqrt(s2);
float s8 = (float)Math.sqrt(s4);
```
```diff
- float s2 = (float)Math.sqrt(ai);
- float s4 = (float)Math.sqrt(s2);
- float s8 = (float)Math.sqrt(s4);
+ float s2 = (float) Math.sqrt(ai);
+ float s4 = (float) Math.sqrt(s2);
+ float s8 = (float) Math.sqrt(s4);
```
I don't see the point in this; is there some style guide that suggests this? Maybe it is just a matter of taste ;)
Yeah, it's not a blocker from me. Maybe just personal style. Sorry for the noise, please ignore.
@PaulSandoz @XiaohongGong Thanks for your suggestions. Feel free to keep reviewing in the meantime ;)
```java
for (; i < SPECIES_I128.loopBound(a.length); i += SPECIES_I128.length()) {
    IntVector v = IntVector.fromArray(SPECIES_I128, a, i);
    var mask = v.compare(VectorOperators.GE, threshold);
    if (mask.allTrue()) {
```
What you have here is similar to what we have in the fallback implementation of Vector.compress:
https://github.com/openjdk/jdk/blob/master/src/jdk.incubator.vector/share/classes/jdk/incubator/vector/IntVector.java#L528

If the idea here is only to make use of the Vector API to implement a scalar algorithm, without caring about the algorithm itself, then it's fine. Otherwise, I think it would be better to have a version (version 3, maybe in a follow-up PR) which uses a shuffle lookup table, where the index into the lookup table is computed using mask.toLong(). For I2, I4 and I8, the size of the lookup table will be less than or equal to 16 rows.

The shuffle lookup table contains indexes to rearrange the vector lanes corresponding to set mask bits; a single rearrange can then pack the lanes contiguously into the result 'r'.
@jatin-bhateja Good point. But we may at some point change how we deal with fall-back, so I think I want to keep the example as is.
About the shuffle lookup: I think that would be an interesting addition. We can do that in a later RFE, feel free to implement that if you like :)
The issue with lookup table: it can increase the memory pressure, and that can hurt some programs. I've heard that often lookup tables are fast for micro benchmarks where memory pressure is low, but can hurt real programs where memory pressure is already high. But I have not done that kind of experiment myself yet, would be interesting to do :)
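For reference, the lookup-table compress idea can be sketched in scalar Java (names here are hypothetical; in the Vector API, the table rows would become precomputed VectorShuffles applied via rearrange, with the row selected by mask.toLong()):

```java
public class CompressLookup {
    // For each 4-bit mask, the lane indices whose mask bit is set, in order.
    static final int[][] COMPRESS_TABLE = new int[16][];
    static {
        for (int mask = 0; mask < 16; mask++) {
            int[] idx = new int[Integer.bitCount(mask)];
            for (int lane = 0, k = 0; lane < 4; lane++) {
                if ((mask & (1 << lane)) != 0) { idx[k++] = lane; }
            }
            COMPRESS_TABLE[mask] = idx;
        }
    }

    // Pack the lanes of v whose mask bit is set into r starting at j;
    // a single table lookup gives the packing permutation. Returns new j.
    static int compress4(int[] v, int mask, int[] r, int j) {
        for (int lane : COMPRESS_TABLE[mask]) {
            r[j++] = v[lane];
        }
        return j;
    }

    public static void main(String[] args) {
        int[] v = {10, 20, 30, 40};
        int[] r = new int[4];
        int n = compress4(v, 0b1010, r, 0); // lanes 1 and 3 set
        System.out.println(n + " " + r[0] + " " + r[1]); // 2 20 40
    }
}
```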
@eme64 this pull request can not be integrated into `master` due to one or more merge conflicts. To resolve these merge conflicts and update this pull request, you can run the following commands in a local clone of your personal fork:

```shell
git checkout JDK-8376891-VectorAPI-if-conversion-benchmarks-and-tests
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
```
@PaulSandoz @merykitty Ok, I think I fixed all the issues. The benchmark results are still the same. If anybody else has suggestions, let them come ;)
@merykitty @PaulSandoz @XiaohongGong @jatin-bhateja Thank you very much for all the suggestions, catching bugs, and for the approvals :)

/integrate
Going to push as commit b2728d0.
Your commit was automatically rebased without conflicts.










Changes:

- BRANCH_PROBABILITY, so we can adjust the branch probability of benchmarks with branches that are sensitive to branch prediction.
- filterI is sensitive to branch prediction: give it data that depends on BRANCH_PROBABILITY.
- filterI: add some alternative implementations that speculate on all-true/all-false paths.
- lowerCaseB: adjust the percentage of upper/lower case characters based on BRANCH_PROBABILITY.
- pieceWise2FunctionF: piece-wise function, shows branching vs vector vs vector with branching.
- conditionalSumB: shows branching vs vector performance.

Builds on #28639
Please have a look at the results and discussion in a comment further down: #29522 (comment)

The filterI_VectorAPI_v2_l2 benchmark performs poorly on x64, so I filed this RFE: JDK-8378589 C2 VectorAPI x64: implement 2-element vector masks

We also see that some benchmarks are very slow, because we have not yet implemented "graceful degradation".
See also: https://bugs.openjdk.org/browse/JDK-8378373
Credits: the all-true/all-false path implementations (dynamic uniformity) for filterI are inspired by this paper:

"Combining Run-Time Checks and Compile-Time Analysis to Improve Control Flow Auto-Vectorization."
Bangtian Liu, Avery Laird, Wai Hung Tsang, Bardia Mahjour, and Maryam Mehri Dehnavi.
In PACT, 2022 [PDF]
Progress
Issue
Reviewers
Reviewing

Using git

Checkout this PR locally:

```shell
$ git fetch https://git.openjdk.org/jdk.git pull/29522/head:pull/29522
$ git checkout pull/29522
```

Update a local copy of the PR:

```shell
$ git checkout pull/29522
$ git pull https://git.openjdk.org/jdk.git pull/29522/head
```

Using Skara CLI tools

Checkout this PR locally:

```shell
$ git pr checkout 29522
```

View PR using the GUI difftool:

```shell
$ git pr show -t 29522
```

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/29522.diff

Using Webrev

Link to Webrev Comment