Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

7950: Better DisjointBuilder performance for events not ordered by event end attribute #449

Conversation

richardstartin
Copy link
Contributor

@richardstartin richardstartin commented Oct 30, 2022

When events are not disjoint and sorted by the end quantity, a lot of time can be spent in DisjointBuilder.add. Profiling shows that this is because of the linear search for the first lane ending before the added event starts, and then in the sort afterwards.

This patch uses binary search to improve the handling of non-disjoint or non-sorted cases. It also adds checks whether the modification will disrupt descending order to avoid doing any work reordering the array.

With a parameterised benchmark exercising some pathological cases:

import org.openjdk.jmc.common.item.IMemberAccessor;
import org.openjdk.jmc.common.unit.IQuantity;
import org.openjdk.jmc.common.unit.UnitLookup;
import org.openjdk.jmc.flightrecorder.internal.util.DisjointBuilder;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

import java.util.concurrent.ThreadLocalRandom;

@State(Scope.Benchmark)
public class DisjointBuilderBenchmark {

    private static class RangeObject {
        private final IQuantity start;
        private final IQuantity end;

        RangeObject(IQuantity start, IQuantity end) {
            this.start = start;
            this.end = end;
        }

    }

    private final static IMemberAccessor<IQuantity, RangeObject> START = inObject -> inObject.start;

    private final static IMemberAccessor<IQuantity, RangeObject> END = inObject -> inObject.end;

    public enum Scenario {
        ASCENDING {
            @Override
            void fill(RangeObject[] rangeObjects) {
                long min = 0;
                long max = 1;
                for (int i = 0; i < rangeObjects.length; i++) {
                    rangeObjects[i] = new RangeObject(UnitLookup.NUMBER_UNITY.quantity(min),
                            UnitLookup.NUMBER_UNITY.quantity(max));
                    min++;
                    max++;
                }
            }
        },
        DESCENDING {
            @Override
            void fill(RangeObject[] rangeObjects) {
                long min = 2L * (rangeObjects.length - 1);
                long max = 2L * rangeObjects.length - 1;
                for (int i = 0; i < rangeObjects.length; i++) {
                    rangeObjects[i] = new RangeObject(UnitLookup.NUMBER_UNITY.quantity(min),
                            UnitLookup.NUMBER_UNITY.quantity(max));
                    min--;
                    max--;
                }
            }
        },
        SCRAMBLED {
            @Override
            void fill(RangeObject[] rangeObjects) {
                ASCENDING.fill(rangeObjects);
                shuffle(rangeObjects);
            }

            private void shuffle(RangeObject[] data) {
                for (int i = data.length; i > 1; i--) {
                    swap(data, i - 1, ThreadLocalRandom.current().nextInt(i));
                }
            }

            private void swap(RangeObject[] arr, int i, int j) {
                RangeObject tmp = arr[i];
                arr[i] = arr[j];
                arr[j] = tmp;
            }

        },
        OVERLAPS {
            @Override
            void fill(RangeObject[] rangeObjects) {
                long min = 0;
                long max = 2;
                for (int i = 0; i < rangeObjects.length; i++) {
                    rangeObjects[i] = new RangeObject(UnitLookup.NUMBER_UNITY.quantity(min),
                            UnitLookup.NUMBER_UNITY.quantity(max));
                    min++;
                    max++;
                }
            }
        },
        STACKED {
            @Override
            void fill(RangeObject[] rangeObjects) {
                long min = 0;
                long max = 2L * rangeObjects.length;
                for (int i = 0; i < rangeObjects.length; i++) {
                    rangeObjects[i] = new RangeObject(UnitLookup.NUMBER_UNITY.quantity(min),
                            UnitLookup.NUMBER_UNITY.quantity(max));
                    min++;
                    max--;
                }
            }
        },
        SHALLOW_STACKS {
            @Override
            void fill(RangeObject[] rangeObjects) {
                int stackDepth = 5;
                long min = 0;
                long max = 2 * stackDepth;
                for (int i = 0; i < rangeObjects.length; i++) {
                    rangeObjects[i] = new RangeObject(UnitLookup.NUMBER_UNITY.quantity(min),
                            UnitLookup.NUMBER_UNITY.quantity(max));
                    min++;
                    max--;
                    if (min == max) {
                        max += 2 * stackDepth;
                    }
                }
            }
        };

        abstract void fill(RangeObject[] rangeObjects);
    }

    @Param({"ASCENDING", "DESCENDING", "SCRAMBLED", "OVERLAPS", "STACKED", "SHALLOW_STACKS"})
    Scenario scenario;

    @Param({"10000"})
    int numEvents;
    private RangeObject[] input;

    @Setup(Level.Trial)
    public void setup() {
        input = new RangeObject[numEvents];
        scenario.fill(input);
    }

    @Benchmark
    public DisjointBuilder<RangeObject> build() {
        DisjointBuilder<RangeObject> builder = new DisjointBuilder<>(START, END);
        for (RangeObject rangeObject : input) {
            builder.add(rangeObject);
        }
        return builder;
    }
}

Good performance is maintained in the common ascending/disjoint case, but the degenerate cases perform much better:

Before:

Benchmark                       (numEvents)      (scenario)  Mode  Cnt       Score      Error  Units
DisjointBuilderBenchmark.build        10000       ASCENDING  avgt    5      96.050 ±    1.240  us/op
DisjointBuilderBenchmark.build        10000      DESCENDING  avgt    5  300246.685 ± 2101.505  us/op
DisjointBuilderBenchmark.build        10000       SCRAMBLED  avgt    5    3977.733 ±   29.612  us/op
DisjointBuilderBenchmark.build        10000        OVERLAPS  avgt    5     187.701 ±    6.161  us/op
DisjointBuilderBenchmark.build        10000         STACKED  avgt    5  268306.481 ±  980.223  us/op
DisjointBuilderBenchmark.build        10000  SHALLOW_STACKS  avgt    5     414.298 ±    5.719  us/op

After:

Benchmark                       (numEvents)      (scenario)  Mode  Cnt    Score    Error  Units
DisjointBuilderBenchmark.build        10000       ASCENDING  avgt    5   91.583 ±  1.070  us/op
DisjointBuilderBenchmark.build        10000      DESCENDING  avgt    5  608.281 ±  9.338  us/op
DisjointBuilderBenchmark.build        10000       SCRAMBLED  avgt    5  596.601 ± 18.243  us/op
DisjointBuilderBenchmark.build        10000        OVERLAPS  avgt    5  181.349 ±  1.012  us/op
DisjointBuilderBenchmark.build        10000         STACKED  avgt    5  555.167 ± 13.316  us/op
DisjointBuilderBenchmark.build        10000  SHALLOW_STACKS  avgt    5  353.700 ±  9.387  us/op

Progress

  • Commit message must refer to an issue
  • Change must be properly reviewed (1 review required, with at least 1 Committer)

Issue

  • JMC-7950: Better DisjointBuilder performance for events not ordered by event end attribute

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jmc pull/449/head:pull/449
$ git checkout pull/449

Update a local copy of the PR:
$ git checkout pull/449
$ git pull https://git.openjdk.org/jmc pull/449/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 449

View PR using the GUI difftool:
$ git pr show -t 449

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jmc/pull/449.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 30, 2022

👋 Welcome back richardstartin! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@richardstartin richardstartin marked this pull request as ready for review October 31, 2022 09:54
@richardstartin richardstartin changed the title better DisjointBuilder performance for overlapping or unordered events 7950: better DisjointBuilder performance for overlapping or unordered events Oct 31, 2022
@openjdk openjdk bot added the rfr label Oct 31, 2022
@mlbridge
Copy link

mlbridge bot commented Oct 31, 2022

Webrevs

@thegreystone thegreystone changed the title 7950: better DisjointBuilder performance for overlapping or unordered events 7950: Better DisjointBuilder performance for overlapping or unordered events Oct 31, 2022
@richardstartin richardstartin changed the title 7950: Better DisjointBuilder performance for overlapping or unordered events 7950: Better DisjointBuilder performance for events not ordered by event end attribute Oct 31, 2022
@egahlin
Copy link
Member

egahlin commented Nov 9, 2022

When I wrote the original code in 2010 collisions were rare. I think I saw a few cases of 2-3 lanes (for the same event type and thread), typically when doing recursive events. I wonder what has changed, if this is no longer a case? The JVM usually doesn't write overlapping events. Is it thread buffers coming out of order?

@richardstartin
Copy link
Contributor Author

richardstartin commented Nov 9, 2022

When I wrote the original code in 2010 collisions were rare. I think I saw a few cases of 2-3 lanes (for the same event type and thread), typically when doing recursive events. I wonder what has changed, if this is no longer a case? The JVM usually doesn't write overlapping events. Is it thread buffers coming out of order?

In our use case this method gets very hot sporadically throughout the day and it depends on the recordings. This tends to show up with custom JFR events which can be committed in any order, but I believe @Gunde has some pathological profiles with only built in events. Having spoken to @Gunde internally, the worst case behaviour of this method is a "known issue" but we're also aware that the one-lane optimisation is very effective for the majority of JFR profiles, so it's important not to regress on common cases.

@Gunde
Copy link
Collaborator

Gunde commented Nov 9, 2022

When I wrote the original code in 2010 collisions were rare. I think I saw a few cases of 2-3 lanes (for the same event type and thread), typically when doing recursive events. I wonder what has changed, if this is no longer a case? The JVM usually doesn't write overlapping events. Is it thread buffers coming out of order?

@egahlin As far as I've investigated the issue is prevalent in JVM applications running with Amazon Kinesis. It can produce tens of thousands of overlapping File Read events.

@egahlin
Copy link
Member

egahlin commented Nov 9, 2022

When I wrote the original code in 2010 collisions were rare. I think I saw a few cases of 2-3 lanes (for the same event type and thread), typically when doing recursive events. I wonder what has changed, if this is no longer a case? The JVM usually doesn't write overlapping events. Is it thread buffers coming out of order?

@egahlin As far as I've investigated the issue is prevalent in JVM applications running with Amazon Kinesis. It can produce tens of thousands of overlapping File Read events.

Interesting. I took a look at the FileRead events in the JDK I think I found a bug that could explain the issue:

    try {
        start = EventConfiguration.timestamp();
        write(b);
        bytesWritten = b.length;
    } finally {
        long duration = EventConfiguration.timestamp();
        if (eventConfiguration.shouldCommit(duration)) {
            FileWriteEvent.commit(start, duration, path, bytesWritten);
        }
    }

It should be:

long duration = EventConfiguration.timestamp() - start;

This regression was introduced in JDK 15.
https://hg.openjdk.java.net/jdk/jdk/rev/dd0caf00b05c

I will prepare a fix. Should be backported.

The fix to JMC still makes sense.

@openjdk
Copy link

openjdk bot commented Nov 19, 2022

@richardstartin This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

7950: Better DisjointBuilder performance for events not ordered by event end attribute

Reviewed-by: jbachorik, hdafgard

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 5 new commits pushed to the master branch:

  • 9ba238e: 7977: Maximum Young Gen Size is displaying the min in GC config Page
  • 761797e: 7944: Reduce allocations of Double during chunk loading
  • 3d80c40: 7946: core/common.test is not running any tests
  • a4ee558: 7945: Avoid throwing NoSuchFieldException in ValueReaders$ReflectiveReader
  • bd35870: 7861: Bump Agent build to Java 11 compliance and cleanup POM

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@Gunde) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready label Nov 19, 2022
@richardstartin
Copy link
Contributor Author

/integrate

@openjdk openjdk bot added the sponsor label Nov 19, 2022
@openjdk
Copy link

openjdk bot commented Nov 19, 2022

@richardstartin
Your change (at version fe81e99) is now ready to be sponsored by a Committer.

@Gunde
Copy link
Collaborator

Gunde commented Nov 19, 2022

/sponsor

@openjdk
Copy link

openjdk bot commented Nov 19, 2022

Going to push as commit 5c497ba.
Since your change was applied there have been 5 commits pushed to the master branch:

  • 9ba238e: 7977: Maximum Young Gen Size is displaying the min in GC config Page
  • 761797e: 7944: Reduce allocations of Double during chunk loading
  • 3d80c40: 7946: core/common.test is not running any tests
  • a4ee558: 7945: Avoid throwing NoSuchFieldException in ValueReaders$ReflectiveReader
  • bd35870: 7861: Bump Agent build to Java 11 compliance and cleanup POM

Your commit was automatically rebased without conflicts.

@openjdk
Copy link

openjdk bot commented Nov 19, 2022

@Gunde @richardstartin Pushed as commit 5c497ba.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

4 participants