8340093: C2 SuperWord: implement cost model #27803
Conversation
👋 Welcome back epeter! A progress list of the required criteria for merging this PR into

@eme64 This change now passes all automated pre-integration checks. ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details. After integration, the commit message for the final commit will be: You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed. At the time when this comment was updated there had been 144 new commits pushed to the
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details. ➡️ To integrate this PR with the above commit message to the

Webrevs
SirYwell
left a comment
Nice work :)
```cpp
// For now, we use unit cost. We might refine that in the future.
// If needed, we could also use platform specific costs, if the
// default here is not accurate enough.
float VLoopAnalyzer::cost_for_vector_reduction(int opcode, int vlen, BasicType bt, bool requires_strict_order) const {
  // Each reduction is composed of multiple instructions, each estimated with a unit cost.
  // Linear: shuffle and reduce    Recursive: shuffle and reduce
  float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen);
```
"unit cost" sounds a bit too simple given that there is some kind of estimation going on already. Maybe it would make sense to add some discussion how strict order affects the shape of the resulting vectorized code?
I assume cases where the reduction can be moved after the loop are covered somewhere else?
Thanks for the comment :)
By "unit cost" I mean unit cost per hardware instruction. Reduction ops use multiple instructions, so we count the instructions, and return that count.
Yes, if we move reductions out of the loop, then the reduction node is not in the loop anymore, and instead we have vector accumulators. And then we count the cost of the vector accumulators.
That's why I need methods like VTransformGraph::mark_vtnodes_in_loop to know what nodes are in the loop (the new vector accumulators, and not the reductions if moved out of the loop).
I think I'll improve the comments a little to make that more clear :)
Ah, when referring to hardware instructions this makes perfect sense; somehow I assumed "unit cost of a node". Thanks for clarifying!
Co-authored-by: Hannes Greule <SirYwell@users.noreply.github.com>
```diff
-// NEON instructions support them. But the match rule support for them is profitable for
-// Vector API intrinsics.
+// NEON instructions support them. They use multiple instructions which is more
+// expensive in almost all cases where we would auto vectorize.
+// But the match rule support for them is profitable for Vector API intrinsics.
 if ((opcode == Op_VectorCastD2X && (bt == T_INT || bt == T_SHORT)) ||
     (opcode == Op_VectorCastL2X && bt == T_FLOAT) ||
     (opcode == Op_CountLeadingZerosV && bt == T_LONG) ||
     (opcode == Op_CountTrailingZerosV && bt == T_LONG) ||
+    opcode == Op_MulVL ||
     // The implementations of Op_AddReductionVD/F in Neon are for the Vector API only.
     // They are not suitable for auto-vectorization because the result would not conform
     // to the JLS, Section Evaluation Order.
     // Note: we could implement sequential reductions for these reduction operators, but
     //       this will still almost never lead to speedups, because the sequential
     //       reductions are latency limited along the reduction chain, and not
     //       throughput limited. This is unlike unordered reductions (associative op)
     //       and element-wise ops which are usually throughput limited.
     opcode == Op_AddReductionVD || opcode == Op_AddReductionVF ||
-    opcode == Op_MulReductionVD || opcode == Op_MulReductionVF ||
-    opcode == Op_MulVL) {
+    opcode == Op_MulReductionVD || opcode == Op_MulReductionVF) {
```
Note: no functional changes, only moving Op_MulVL up to the other cases that work the same as it. And improving some comments.
```diff
-_do_vector_loop(phase()->C->do_vector_loop()), // whether to do vectorization/simd style
-_num_work_vecs(0),                             // amount of vector work we have
-_num_reductions(0)                             // amount of reduction work we have
+_do_vector_loop(phase()->C->do_vector_loop())  // whether to do vectorization/simd style
```
Note: part of old reduction heuristic, no longer needed.
```diff
-if (!Matcher::match_rule_supported_vector(vopc, vlen, bt)) {
-  DEBUG_ONLY( this->print(); )
-  assert(false, "do not have normal vector op for this reduction");
-  return false; // not implemented
+if (!Matcher::match_rule_supported_auto_vectorization(vopc, vlen, bt)) {
+  // The element-wise vector operation needed for the vector accumulator
+  // is not implemented / supported.
+  return false;
```
I consider this a "performance bug", but it makes sense to fix it here.
match_rule_supported_vector returns true on aarch64 for MulVL, but match_rule_supported_auto_vectorization returns false. That is because MulVL has a "fake vector implementation" in the NEON backend, which just extracts the lanes to scalars, performs scalar multiplications, and packs the results again.
Since we are auto-vectorizing, we should trust the match_rule_supported_auto_vectorization here.
On aarch64, the MulReductionVL is allowed for vectorization. But if we move it out of the loop here, we end up introducing a MulVL, which is very much not profitable. Making this change avoids this issue, and is also consistent with the match_rule_supported_auto_vectorization use instead of match_rule_supported_vector elsewhere in SuperWord.
@SirYwell Thanks for the comments and suggestions :)
```cpp
if (_trace._info) {
  tty->print_cr("\nForced bailout of vectorization (AutoVectorizationOverrideProfitability=0).");
```
Side note. Consider separate RFE to change this to UL for such outputs.
Absolutely. The tricky part is that the current TraceAutoVectorization is a compile command that can be enabled with method name filtering. Is that already available via UL now?
Unfortunately no. I think this is what @anton-seoane worked on before.
Yes, I have taken up the task again, so sooner rather than later CompileCommand filtering for UL will be enabled for cases such as this.
Ok, that's what I thought. For now, I'll extend the tracing the way I've been doing, and once we have UL available with method-level filtering, then I can migrate it all in one single PR :)
vnkozlov
left a comment
This looks fine and not complex. I have only nitpicks.
```cpp
#endif

float sum = 0;
for (int j = 0; j < body().body().length(); j++) {
```
What does body().body() mean?
VLoopAnalyzer (this) has multiple analysis subcomponents. One of them is the VLoopBody, i.e. this->body() / this->_body. And it has access to a GrowableArray body(), which maps the nodes of the loop.
Maybe loopBody().nodes() would sound better here. If you prefer that, I'll file a separate renaming RFE.
Yes, would be nice if you move body().body() into separate method with comment explaining it. Thanks!
FYI, I filed: JDK-8371391 C2 SuperWord: rename body().body() to something more understandable
```cpp
}

// Compute the cost over all operations in the (scalar) loop.
float VLoopAnalyzer::cost() const {
```
consider renaming it to cost_for_scalar() and cost_for_scalar() to cost_for_scalar_node()
I'll do some renamings to make it explicit which are for nodes, and which for the loop.
@vnkozlov Thanks for reviewing and the suggestions. I renamed some cost functions, and I like it better this way now too :)
galderz
left a comment
JDK-8370671 C2 SuperWord [x86]: implement Long.max/min reduction for AVX2
This is familiar to me. I discovered this when I was intrinsifying MinL/MaxL for JDK-8307513 and one of my servers only had AVX2. Vectorization kicked in with AVX512 so I left it there.
Note: some of the min/max benchmarks are not very stable. That is due to random input data: in some cases the scalar performance is better because it uses branching.
Looking at the results, seems like most instability is with doubles? In any case, on the topic of instability of min/max and branching, #20098 (comment) has a good analysis on past observations with the JMH benchmark now called MinMaxVector. This benchmark shapes the data such that data in the arrays is laid out to achieve a certain % of branch taken. It might not be fully applicable to the instabilities you observe but might help direct attention.
WRT the code changes in this PR, I don't have anything else to say other than I'm glad basic cases like JDK-8345044 are getting solved.
@galderz Right, I did remember that you had a better benchmark, and that's why I understood more quickly that the issue here with the doubles is just noise :)
vnkozlov
left a comment
Good.
```cpp
float VLoopAnalyzer::cost_for_vector_reduction_node(int opcode, int vlen, BasicType bt, bool requires_strict_order) const {
  // Each reduction is composed of multiple instructions, each estimated with a unit cost.
  // Linear: shuffle and reduce    Recursive: shuffle and reduce
  float c = requires_strict_order ? 2 * vlen : 2 * exact_log2(vlen);
```
Can we ask for the cost of the element-wise opcode here, something like (1 + element_wise_cost) would be more accurate?
To be a little more precise, the strict one should be something like:

```cpp
vlen * (1 + Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, vlen)) +
    (vlen - 1) * (1 + Matcher::scalar_op_pre_select_sz_estimate(opcode, bt));
```

and the non-strict one would be:

```cpp
float c = Matcher::vector_op_pre_select_sz_estimate(Op_Extract, bt, 2) * 2 +
          Matcher::scalar_op_pre_select_sz_estimate(opcode) + 3;
for (int i = 4; i <= vlen; i *= 2) {
  c += 2 + Matcher::vector_op_pre_select_sz_estimate(Op_VectorRearrange, bt, i) +
           Matcher::vector_op_pre_select_sz_estimate(opcode, bt, i);
}
```
Maybe refactoring a little bit to make the Matcher::vector_op_pre_select_sz_estimate less awkward would be welcomed, too. Currently, it returns the estimated size - 1, which is unsettling.
@merykitty Can we do that in a follow-up RFE? For now, I'd like to keep it as simple as possible. Cost-models can become arbitrarily complex. There is a bit of a trade-off between simplicity and accuracy. And we can for sure improve things in the future, this PR just lays the foundation.
My goal here is to start as simple as possible, and then add complexity if there is a proven need for it.
So if you/we can find a benchmark where the cost model is not accurate enough yet, provable by -XX:AutoVectorizationOverrideProfitability=0/2, then we should make it more complex.
Would that be acceptable for you?
What exactly does Matcher::vector_op_pre_select_sz_estimate return? The number of instructions or some kind of throughput estimate?
Personally, I don't want to get too stuck on counting instructions, but rather want a throughput estimate. Counting instructions is an estimate for throughput, but I don't know yet if it is the best choice long-term.
I would like to wait a little more, and start depending on the cost model for more and more cases (extract, pack, shuffle, if-conversion, ...) and then we will run into issues along the way where the cost model is not yet accurate enough. And at that point we can think again what would produce the most accurate results.
What exactly does Matcher::vector_op_pre_select_sz_estimate return? The number of instructions or some kind of throughput estimate?
I believe it tries to estimate the number of instructions generated by a node.
I'm filing an RFE now
JDK-8371393
C2 SuperWord: improve cost model
```cpp
// If needed, we could also use platform specific costs, if the
// default here is not accurate enough.
float VLoopAnalyzer::cost_for_vector_node(int opcode, int vlen, BasicType bt) const {
  float c = 1;
```
We have Matcher::vector_op_pre_select_sz_estimate, could it be used here? The corresponding for scalar is Matcher::scalar_op_pre_select_sz_estimate
Same answer as above :)
```cpp
// For now, we use unit cost. We might refine that in the future.
// If needed, we could also use platform specific costs, if the
// default here is not accurate enough.
float VLoopAnalyzer::cost_for_scalar_node(int opcode) const {
```
You need a BasicType parameter for this method, some opcodes are used for multiple kinds of operands.
Will add it :)
Well, I actually tried it right now, and it would take a bit of engineering at the call sites. In quite a few cases the BasicType is not immediately available.
Is it ok if we ignore it for now, and only add it in once we really need it?
```cpp
//
// in_loop: vtn->_idx -> bool
void VTransformGraph::mark_vtnodes_in_loop(VectorSet& in_loop) const {
  assert(is_scheduled(), "must already be scheduled");
```
May I ask if this schedule has already moved unordered reductions like addition out of the loop yet?
optimize happens before schedule. But the unordered reduction is still in the VTransformGraph, and so it is also scheduled. But mark_vtnodes_in_loop will find that the unordered reduction is outside the loop :)
Does that answer your question?
merykitty
left a comment
Thanks for your replies. I think leaving my suggestions to future RFEs is reasonable.
@vnkozlov Thanks for reviewing and the approval! @merykitty Thanks a lot for reviewing as well, and the ideas about improving the cost model. There is actually a lot of literature out there about cost models, and various compilers employ various methods. There could be a lot of exciting work in this area, but let's take it step-by-step ;)
@merykitty @vnkozlov Thank you very much for the reviews! /integrate
Going to push as commit 72989e0.
Your commit was automatically rebased without conflicts.
Note: this looks like a large change, but only about 400-500 lines are VM changes. 2.5k comes from new tests.
Finally: after a long list of refactorings, we can implement the Cost-Model. The refactorings and this implementation were first PoC'd here: #20964
Main goal:
Why cost-model?
Usually, vectorization leads to speedups because we replace multiple scalar operations with a single vector operation. The scalar and vector operations have a very similar cost per instruction, so going from 4 scalar ops to a single vector op may yield a 4x speedup. This is a bit simplistic, but it is the general idea.
But: some vector ops are expensive. Sometimes, the vector op can be more expensive than the multiple scalar ops it replaces. This is the case with some reduction ops. Or we may introduce a vector op that does not have any corresponding scalar op (e.g. in the case of shuffle). This prevents simple heuristics that only focus on single operations.
Weighing the total cost of the scalar loop vs the vector loop allows us a more "holistic" approach. There may be expensive vector ops, but other cheaper vector ops may still make it profitable.
Implementation
Items:
- VTransform::is_profitable: checks cost-model and some other cost related checks.
- VLoopAnalyzer::cost: scalar loop cost
- VTransformGraph::cost: vector loop cost
- Removed _num_work_vecs and _num_reductions, used to check for "simple" reductions where the only "work" vector was the reduction itself. Reductions were not considered profitable if they were "simple". I was able to lift those restrictions.

Testing
Regular correctness testing, and performance testing. In addition to the JMH micro benchmarks below.
Some History
I have been bothered by "simple" reductions not vectorizing for a long time. It was also a part of my JVMLS2025 presentation.
During JDK9, reductions were first vectorized, but then restricted for "simple" and "2-element" reductions:
Integer/FP scalar reduction optimization
During JDK21, I further improved reductions:
Other reports:
And I've been mapping out the reduction performance with benchmarks: #25387
You can see that we already used to vectorize a lot of cases, but especially did not vectorize:
Future Work, discovered while writing the attached IR test:
Reduction Benchmarks
Results from the benchmark #25387 that is related to the attached IR test.
Legend:
- master: performance before this patch
- P1: default with this patch, i.e. -XX:AutoVectorizationOverrideProfitability=1, relying on the new cost-model.
- P0: patch, but auto vectorization disabled, i.e. -XX:AutoVectorizationOverrideProfitability=0.
- P2: patch, but auto vectorization forced, i.e. -XX:AutoVectorizationOverrideProfitability=2.

How to look at the results below:
- P1 vs master: Lower is better (marked green).
- P1 vs P0: gives you a view on how many cases already profit from auto vectorization in total.
- P1 vs P2: shows us how forced vectorization affects performance. There is basically no impact any more. See results from 8357530: C2 SuperWord: Diagnostic flag AutoVectorizationOverrideProfitability #25387 to see that we used to have a lot of cases where forcing vectorization led to speedups.

Note: some of the min/max benchmarks are not very stable. That is due to random input data: in some cases the scalar performance is better because it uses branching.
- linux_x64 (AVX512)
- windows_x64 (AVX2)
- macosx_x64_sandybridge
- linux_aarch64 (NEON)
- macosx_aarch64 (NEON)

Progress
Issue
Reviewers
Reviewing
Using git

Checkout this PR locally:

```
$ git fetch https://git.openjdk.org/jdk.git pull/27803/head:pull/27803
$ git checkout pull/27803
```

Update a local copy of the PR:

```
$ git checkout pull/27803
$ git pull https://git.openjdk.org/jdk.git pull/27803/head
```

Using Skara CLI tools

Checkout this PR locally:

```
$ git pr checkout 27803
```

View PR using the GUI difftool:

```
$ git pr show -t 27803
```

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27803.diff
Using Webrev
Link to Webrev Comment