Conversation

@archiecobbs
Contributor

@archiecobbs archiecobbs commented May 14, 2025

Please review this small performance tweak to ArrayDeque.

ArrayDeque has an invariant in which any unused elements in the array must be null. In a couple of places, the code is setting contiguous ranges of elements to null using for() loops. This can be both simplified and sped up by using Arrays.fill() instead.
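
As a rough illustration of the kind of change described above, here is a minimal sketch of nulling a possibly wrapped range of a circular array, once with explicit loops and once with Arrays.fill(). The class and method names (CircularClearSketch, clearWithLoop, clearWithFill) are hypothetical and this is not the actual ArrayDeque source.

```java
import java.util.Arrays;

/** Illustrative only: hypothetical helpers, not the real java.util.ArrayDeque code. */
final class CircularClearSketch {

    /** Nulls the circular range [from, to) with explicit for() loops. */
    static void clearWithLoop(Object[] es, int from, int to) {
        if (from <= to) {
            for (int i = from; i < to; i++) es[i] = null;
        } else {
            // The range wraps past the end of the array: clear the tail, then the head.
            for (int i = from; i < es.length; i++) es[i] = null;
            for (int i = 0; i < to; i++) es[i] = null;
        }
    }

    /** Same effect, expressed with Arrays.fill() on each contiguous piece. */
    static void clearWithFill(Object[] es, int from, int to) {
        if (from <= to) {
            Arrays.fill(es, from, to, null);
        } else {
            Arrays.fill(es, from, es.length, null);
            Arrays.fill(es, 0, to, null);
        }
    }

    public static void main(String[] args) {
        Object[] a = {"a", "b", "c", "d", "e", "f"};
        Object[] b = a.clone();
        clearWithLoop(a, 4, 2);   // wrapped range: slots 4, 5, 0, 1
        clearWithFill(b, 4, 2);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.equals(a, b));
    }
}
```

Both variants leave the array in the same state; whether one is faster is an empirical question.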


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8356993: ArrayDeque should use Arrays.fill() instead of for() loops (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/25237/head:pull/25237
$ git checkout pull/25237

Update a local copy of the PR:
$ git checkout pull/25237
$ git pull https://git.openjdk.org/jdk.git pull/25237/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 25237

View PR using the GUI difftool:
$ git pr show -t 25237

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/25237.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented May 14, 2025

👋 Welcome back acobbs! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented May 14, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label May 14, 2025
@openjdk

openjdk bot commented May 14, 2025

@archiecobbs The following label will be automatically applied to this pull request:

  • core-libs

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the core-libs core-libs-dev@openjdk.org label May 14, 2025
@mlbridge

mlbridge bot commented May 14, 2025

Webrevs

@AlanBateman
Contributor

Are you planning to add some JMH benchmarks to go with this?

@archiecobbs
Contributor Author

> Are you planning to add some JMH benchmarks to go with this?

I wasn't planning to, but I'm inferring from your question that you'd prefer to see one.

Which also makes me curious. I'd be shocked if this were slower, but even if not, I wonder how much faster it would be.

I will work on creating one.

@RogerRiggs
Contributor

I'm curious to know whether C2 turns the loop into a vectorized operation. The Arrays.fill might be more expressive, but not necessarily faster.

@archiecobbs
Contributor Author

I added a benchmark to the PR (hopefully I did that right).

It shows a decrease in performance. I have no idea why. I did this on my laptop so who knows, but if the effect is real then it kind of raises a lot of larger questions.

jdk-25+22-94-g0318e49500e (master):
Benchmark                                       Mode  Cnt   Score   Error  Units
ArrayDeque.ClearBenchmarkTestJMH.fillAndClear  thrpt   50  37.064 ± 0.225  ops/s

jdk-25+22-95-g84fb0903be0 (JDK-8356993):
Benchmark                                       Mode  Cnt   Score   Error  Units
ArrayDeque.ClearBenchmarkTestJMH.fillAndClear  thrpt   50  35.528 ± 0.180  ops/s

@Benchmark
@Measurement(iterations = 10)
@Warmup(iterations = 3)
public void fillAndClear() {
Member

I think you need to return the collection or send it to a BlackHole

Contributor Author

> I think you need to return the collection or send it to a BlackHole

I'm fairly new to the benchmark game so I would not be surprised if this is broken.

Previously I was adding them to a list but that caused OOMs.

Can you clarify what you mean? By 'return' do you just mean returning the deque from the method? Also I don't know what a BlackHole is.

Apologies for not knowing what I'm doing here. Thanks.

Contributor

Here are many examples on how to correctly use JMH.

A blackhole prevents the compiler from optimizing away your code.

Contributor Author

> Here are many examples on how to correctly use JMH.
>
> A blackhole prevents the compiler from optimizing away your code.

Thanks for the tip. FWIW after doing that the numbers came out about the same - which is not surprising given that Arrays.fill() is just the same for() loop...

#2 - After adding Blackhole

jdk-25+22-94-g0318e49500e (master):
Benchmark                                       Mode  Cnt   Score   Error  Units
ArrayDeque.ClearBenchmarkTestJMH.fillAndClear  thrpt   50  35.663 ± 0.163  ops/s

jdk-25+22-97-g9f0c5fe1f90 (JDK-8356993):
Benchmark                                       Mode  Cnt   Score   Error  Units
ArrayDeque.ClearBenchmarkTestJMH.fillAndClear  thrpt   50  35.112 ± 0.501  ops/s
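
To put the two sets of numbers side by side, the following sketch computes the relative change in throughput and checks whether the reported score intervals overlap. It assumes the Error column can be read as an interval half-width around the Score; the class and method names are hypothetical, and overlapping intervals are only a rough heuristic, not a significance test.

```java
/** Illustrative arithmetic on the scores quoted in this thread. */
final class ScoreComparisonSketch {

    /** Relative change of candidate versus baseline, in percent (negative means lower throughput). */
    static double relativeChangePercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    /** True if [s1 - e1, s1 + e1] and [s2 - e2, s2 + e2] share at least one point. */
    static boolean intervalsOverlap(double s1, double e1, double s2, double e2) {
        return s1 - e1 <= s2 + e2 && s2 - e2 <= s1 + e1;
    }

    public static void main(String[] args) {
        // First run (before adding the Blackhole): master vs. JDK-8356993.
        System.out.printf("run 1: %.1f%%, overlap=%b%n",
                relativeChangePercent(37.064, 35.528),
                intervalsOverlap(37.064, 0.225, 35.528, 0.180));
        // Second run (after adding the Blackhole): master vs. JDK-8356993.
        System.out.printf("run 2: %.1f%%, overlap=%b%n",
                relativeChangePercent(35.663, 35.112),
                intervalsOverlap(35.663, 0.163, 35.112, 0.501));
    }
}
```

Under that reading, the first run shows a drop of roughly 4% with non-overlapping intervals, while the second shows a drop of roughly 1.5% with overlapping intervals, which is consistent with "about the same".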

@ExE-Boss

Note that Arrays.fill(…) is simply a for(…) loop with an additional range check and is potentially subject to profile pollution due to JDK‑8015417:

public static void fill(Object[] a, int fromIndex, int toIndex, Object val) {
    rangeCheck(a.length, fromIndex, toIndex);
    for (int i = fromIndex; i < toIndex; i++)
        a[i] = val;
}
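
The observable effect of that extra range check can be sketched as follows. This is a small illustration with hypothetical names (FillRangeCheckSketch, loopFill, fillOutcome); it relies only on the documented exceptions of Arrays.fill(Object[], int, int, Object) and says nothing about performance.

```java
import java.util.Arrays;

/** Illustrative only: contrasts Arrays.fill() argument validation with a bare loop. */
final class FillRangeCheckSketch {

    /** Hand-written loop: no up-front validation of the range. */
    static void loopFill(Object[] a, int from, int to, Object val) {
        for (int i = from; i < to; i++) a[i] = val;
    }

    /** Returns the simple class name of the exception thrown by Arrays.fill, or "none". */
    static String fillOutcome(Object[] a, int from, int to) {
        try {
            Arrays.fill(a, from, to, null);
            return "none";
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        Object[] a = new Object[4];
        System.out.println(fillOutcome(a, 1, 3)); // valid range
        System.out.println(fillOutcome(a, 3, 1)); // fromIndex > toIndex
        System.out.println(fillOutcome(a, 0, 5)); // toIndex > a.length
        loopFill(a, 3, 1, "x");                   // inverted range: the loop body never runs
        System.out.println(Arrays.toString(a));
    }
}
```

An inverted range is rejected by Arrays.fill() with an IllegalArgumentException, whereas the bare loop silently does nothing; an out-of-bounds range is rejected before any element is written.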

@archiecobbs
Contributor Author

> Note that Arrays.fill(…) is simply a for(…) loop with an additional range check

Interesting... I was assuming that most of the "bulk" methods in Arrays were being hand-optimized with special hardware magic (e.g., vector instructions), and that the opportunity to do this was part of the motivation for adding them in the first place.

If C2 is already able to automatically optimize this into the maximum possible hardware performance, then great! But is that actually the case?

@archiecobbs
Contributor Author

I'm closing the PR because it's gone into low-level optimization details that are beyond me.

However I'm still unclear on whether bulk memory set operations are being fully optimized. By that I mean doing something like this on arm64 at least. Any insights from the experts would be appreciated.

Thanks for the interesting discussion.

Labels

core-libs core-libs-dev@openjdk.org rfr Pull request is ready for review

6 participants