Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype inline cursors for List #5

Closed
wants to merge 2 commits into from

Conversation

RogerRiggs
Copy link
Collaborator

@RogerRiggs RogerRiggs commented Mar 26, 2020

Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

XArrayListCursorTest.getViaArray            100000  avgt    5  507793.484  7086.038  ns/op
XArrayListCursorTest.getViaCursorForLoop    100000  avgt    5  656461.958  52488.547  ns/op
XArrayListCursorTest.getViaCursorWhileLoop  100000  avgt    5  641963.323  32219.409  ns/op
XArrayListCursorTest.getViaIterator         100000  avgt    5  558863.817  23539.256  ns/op
XArrayListCursorTest.getViaIteratorCurs     100000  avgt    5  733161.466  33721.881  ns/op

Progress

  • Change must not contain extraneous whitespace

Download

$ git fetch https://git.openjdk.java.net/valhalla pull/5/head:pull/5
$ git checkout pull/5

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Mar 26, 2020

👋 Welcome back rriggs! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request.

@openjdk
Copy link

@openjdk openjdk bot commented Mar 26, 2020

@RogerRiggs This change now passes all automated pre-integration checks, type /integrate in a new comment to proceed. After integration, the commit message will be:

Prototype inline cursors for List
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /solves command.

Since the source branch of this PR was last updated there have been 6 commits pushed to the lworld branch. Since there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge lworld into your branch, and then specify the current head hash when integrating, like this: /integrate d544758b986f909f355611fca3d9fc4034ef245d.

➡️ To integrate this PR with the above commit message, type /integrate in a new comment.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Webrevs

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Brian Goetz on valhalla-dev:

These don't look so good!? I suspect that the Iterator is performing so
well because it's getting good EA, but not sure why cursors aren't doing
better?? Guess we'll have to dig into the generated code....

On 3/26/2020 1:14 PM, Roger Riggs wrote:

Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

                                \(size\)  Mode  Cnt       Score       Error  Units

XArrayListCursorTest.getViaArray 100000 avgt 5 507793.484 7086.038 ns/op
XArrayListCursorTest.getViaCursorForLoop 100000 avgt 5 656461.958 52488.547 ns/op
XArrayListCursorTest.getViaCursorWhileLoop 100000 avgt 5 641963.323 32219.409 ns/op
XArrayListCursorTest.getViaIterator 100000 avgt 5 558863.817 23539.256 ns/op
XArrayListCursorTest.getViaIteratorCurs 100000 avgt 5 733161.466 33721.881 ns/op

-------------

Commit messages:
- Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
Fetch: git fetch https://git.openjdk.java.net/valhalla pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Paul Sandoz on valhalla-dev:

IMHO you need to look at the hotspots of the generated code to get more insights as to why the cursor is not on par with the EA?ed iterator.

Use dtraceasm on the Mac, or perfasm on Linux, it's great!

Paul.

On Mar 26, 2020, at 10:14 AM, Roger Riggs <rriggs at openjdk.java.net> wrote:

Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

                              \(size\)  Mode  Cnt       Score       Error  Units

XArrayListCursorTest.getViaArray 100000 avgt 5 507793.484 7086.038 ns/op
XArrayListCursorTest.getViaCursorForLoop 100000 avgt 5 656461.958 52488.547 ns/op
XArrayListCursorTest.getViaCursorWhileLoop 100000 avgt 5 641963.323 32219.409 ns/op
XArrayListCursorTest.getViaIterator 100000 avgt 5 558863.817 23539.256 ns/op
XArrayListCursorTest.getViaIteratorCurs 100000 avgt 5 733161.466 33721.881 ns/op

-------------

Commit messages:
- Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
Fetch: git fetch https://git.openjdk.java.net/valhalla pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Remi Forax on valhalla-dev:

Hi Roger,
I think implementing remove() on a cursor is a bad idea, since we have introduced Collection.removeIf() in 8 the main use case of remove() has disappeared and it's still implemented by Iterator if someone really want it.
In term of performance, it makes a huge difference because you can now implement a snapshot at the beginning semantics, you don't have to be aware if the XArrayList changes and you don't have to propagate back the structural change done by remove(), so basically you can avoid the check for ConcurrentModification.

With that, in the cursor, you can capture the array and the size when creating it instead of capturing a reference to the enclosing class (XArrayList.this), it should be a little more efficient.
I believe you can also remove the try/catch + rethrow and use Objects.checkIndex instead.

If everything goes right, it should be has efficient as doing a for loop on the array.

regards,
R?mi

----- Mail original -----

De: "Roger Riggs" <rriggs at openjdk.java.net>
?: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 18:14:10
Objet: [lworld] RFR: Prototype inline cursors for List

Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

                              \(size\)  Mode  Cnt       Score       Error  Units

XArrayListCursorTest.getViaArray 100000 avgt 5 507793.484
7086.038 ns/op
XArrayListCursorTest.getViaCursorForLoop 100000 avgt 5 656461.958
52488.547 ns/op
XArrayListCursorTest.getViaCursorWhileLoop 100000 avgt 5 641963.323
32219.409 ns/op
XArrayListCursorTest.getViaIterator 100000 avgt 5 558863.817
23539.256 ns/op
XArrayListCursorTest.getViaIteratorCurs 100000 avgt 5 733161.466
33721.881 ns/op

-------------

Commit messages:
- Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
Fetch: git fetch https://git.openjdk.java.net/valhalla pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Roger Riggs on valhalla-dev:

Hi Brian,

I've looked at the code and its not a lot different in the number of
fetches and conditional branches.
I'll be looking into it further.

Roger

On 3/26/20 1:45 PM, Brian Goetz wrote:

These don't look so good!? I suspect that the Iterator is performing
so well because it's getting good EA, but not sure why cursors aren't
doing better?? Guess we'll have to dig into the generated code....

On 3/26/2020 1:14 PM, Roger Riggs wrote:

Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

??????????????????????????????????? (size)? Mode? Cnt Score??????
Error? Units
XArrayListCursorTest.getViaArray??????????? 100000? avgt??? 5
507793.484? 7086.038? ns/op
XArrayListCursorTest.getViaCursorForLoop??? 100000? avgt??? 5
656461.958? 52488.547? ns/op
XArrayListCursorTest.getViaCursorWhileLoop? 100000? avgt??? 5
641963.323? 32219.409? ns/op
XArrayListCursorTest.getViaIterator???????? 100000? avgt??? 5
558863.817? 23539.256? ns/op
XArrayListCursorTest.getViaIteratorCurs???? 100000? avgt??? 5
733161.466? 33721.881? ns/op

-------------

Commit messages:
? - Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
? Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
?? Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
?? Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
?? Fetch: git fetch https://git.openjdk.java.net/valhalla
pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Roger Riggs on valhalla-dev:

Hi Remi,

On 3/26/20 1:48 PM, Remi Forax wrote:

Hi Roger,
I think implementing remove() on a cursor is a bad idea, since we have introduced Collection.removeIf() in 8 the main use case of remove() has disappeared and it's still implemented by Iterator if someone really want it.
Possibly, I was trying out what it takes for parity with Iterator.
In term of performance, it makes a huge difference because you can now implement a snapshot at the beginning semantics, you don't have to be aware if the XArrayList changes and you don't have to propagate back the structural change done by remove(), so basically you can avoid the check for ConcurrentModification.
Yes simplifying the semantics would help, but that would not be an
ArrayList but something else.

Part of the exercise was to see where inline classes can subsitute for
(assumed) more expensive identity classes.

With that, in the cursor, you can capture the array and the size when creating it instead of capturing a reference to the enclosing class (XArrayList.this), it should be a little more efficient.
I believe you can also remove the try/catch + rethrow and use Objects.checkIndex instead.
Unless the VM can eliminate the checks the implicit array range check
should be cheaper/free.
Or I misunderstand how try/catch is implemented by the vm.

If everything goes right, it should be has efficient as doing a for loop on the array.
Yep, that's the idea.

The cost avoidance of not doing allocation is hard to measure.

In some previous experiements, the extra data that had to be moved around
wiped out the savings by not having an identity object.

I've looked at the code with JMH and perfasm and
I'll be looking more closely at the performance and any help is appreciated.

Thanks, Roger

regards,
R?mi

----- Mail original -----

De: "Roger Riggs" <rriggs at openjdk.java.net>
?: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 18:14:10
Objet: [lworld] RFR: Prototype inline cursors for List
Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

                               \(size\)  Mode  Cnt       Score       Error  Units

XArrayListCursorTest.getViaArray 100000 avgt 5 507793.484
7086.038 ns/op
XArrayListCursorTest.getViaCursorForLoop 100000 avgt 5 656461.958
52488.547 ns/op
XArrayListCursorTest.getViaCursorWhileLoop 100000 avgt 5 641963.323
32219.409 ns/op
XArrayListCursorTest.getViaIterator 100000 avgt 5 558863.817
23539.256 ns/op
XArrayListCursorTest.getViaIteratorCurs 100000 avgt 5 733161.466
33721.881 ns/op

-------------

Commit messages:
- Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
Fetch: git fetch https://git.openjdk.java.net/valhalla pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Remi Forax on valhalla-dev:

----- Mail original -----

De: "Roger Riggs" <Roger.Riggs at oracle.com>
?: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 19:14:54
Objet: Re: [lworld] RFR: Prototype inline cursors for List

Hi Remi,

Hi Roger,

On 3/26/20 1:48 PM, Remi Forax wrote:

Hi Roger,
I think implementing remove() on a cursor is a bad idea, since we have
introduced Collection.removeIf() in 8 the main use case of remove() has
disappeared and it's still implemented by Iterator if someone really want it.
Possibly, I was trying out what it takes for parity with Iterator.
In term of performance, it makes a huge difference because you can now implement
a snapshot at the beginning semantics, you don't have to be aware if the
XArrayList changes and you don't have to propagate back the structural change
done by remove(), so basically you can avoid the check for
ConcurrentModification.
Yes simplifying the semantics would help, but that would not be an
ArrayList but something else.

Part of the exercise was to see where inline classes can subsitute for
(assumed) more expensive identity classes.

With that, in the cursor, you can capture the array and the size when creating
it instead of capturing a reference to the enclosing class (XArrayList.this),
it should be a little more efficient.
I believe you can also remove the try/catch + rethrow and use Objects.checkIndex
instead.
Unless the VM can eliminate the checks the implicit array range check
should be cheaper/free.
Or I misunderstand how try/catch is implemented by the vm.

yes, i hope the range check to be fully removed.

If everything goes right, it should be has efficient as doing a for loop on the
array.
Yep, that's the idea.

The cost avoidance of not doing allocation is hard to measure.

especially when the escape analysis works

In some previous experiements, the extra data that had to be moved around
wiped out the savings by not having an identity object.

yes, when the cost of copy in a field or in an array is greater than the cost of dereferencing a pointer but here everything should be on the stack, so no trade-of.

I've looked at the code with JMH and perfasm and
I'll be looking more closely at the performance and any help is appreciated.

Here my implementation
https://github.com/forax/valuetype-lworld/blob/master/src/main/java/fr.umlv.valuetype/fr/umlv/valuetype/xlist/XArrayList.java#L1130
BTW, i believe your implementation has a bug, the index can overflow in advance().

with mostly the same test, i use microseconds so the results are more readable and a 3 seconds duration to have more stable results
https://github.com/forax/valuetype-lworld/blob/master/src/test/java/fr.umlv.valuetype/fr/umlv/valuetype/perf/XArrayListCursorBenchMark.java#L21

Cursors are now faster than iterators and a loop + get().

Benchmark (size) Mode Cnt Score Error Units
XArrayListCursorBenchMark.getViaArray 100000 avgt 5 345.798 ? 3.048 us/op
XArrayListCursorBenchMark.getViaCursorForLoop 100000 avgt 5 278.712 ? 2.501 us/op
XArrayListCursorBenchMark.getViaCursorWhileLoop 100000 avgt 5 278.281 ? 3.816 us/op
XArrayListCursorBenchMark.getViaIterator 100000 avgt 5 323.989 ? 4.374 us/op
XArrayListCursorBenchMark.getViaIteratorCurs 100000 avgt 5 312.738 ? 1.848 us/op

Thanks, Roger

cheers,
R?mi

regards,
R?mi

----- Mail original -----

De: "Roger Riggs" <rriggs at openjdk.java.net>
?: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 18:14:10
Objet: [lworld] RFR: Prototype inline cursors for List
Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

                               \(size\)  Mode  Cnt       Score       Error  Units

XArrayListCursorTest.getViaArray 100000 avgt 5 507793.484
7086.038 ns/op
XArrayListCursorTest.getViaCursorForLoop 100000 avgt 5 656461.958
52488.547 ns/op
XArrayListCursorTest.getViaCursorWhileLoop 100000 avgt 5 641963.323
32219.409 ns/op
XArrayListCursorTest.getViaIterator 100000 avgt 5 558863.817
23539.256 ns/op
XArrayListCursorTest.getViaIteratorCurs 100000 avgt 5 733161.466
33721.881 ns/op

-------------

Commit messages:
- Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
Fetch: git fetch https://git.openjdk.java.net/valhalla pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Roger Riggs on valhalla-dev:

Hi Remi,

yes, with those modifications its faster, but its not a Cursor for ArrayList
since CME is possible and not checked.
That would be appropriate for CopyOnWriteArraylist for example.

We also have a chance to create new implementations of List, etc.

I do want to understand where the pitfalls are when attempting
to improve implementations by replacing with inline classes.

On 3/26/20 3:38 PM, Remi Forax wrote:

----- Mail original -----

De: "Roger Riggs" <Roger.Riggs at oracle.com>
?: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 19:14:54
Objet: Re: [lworld] RFR: Prototype inline cursors for List
Hi Remi,
Hi Roger,

On 3/26/20 1:48 PM, Remi Forax wrote:

Hi Roger,
I think implementing remove() on a cursor is a bad idea, since we have
introduced Collection.removeIf() in 8 the main use case of remove() has
disappeared and it's still implemented by Iterator if someone really want it.
Possibly, I was trying out what it takes for parity with Iterator.
In term of performance, it makes a huge difference because you can now implement
a snapshot at the beginning semantics, you don't have to be aware if the
XArrayList changes and you don't have to propagate back the structural change
done by remove(), so basically you can avoid the check for
ConcurrentModification.
Yes simplifying the semantics would help, but that would not be an
ArrayList but something else.

Part of the exercise was to see where inline classes can subsitute for
(assumed) more expensive identity classes.

With that, in the cursor, you can capture the array and the size when creating
it instead of capturing a reference to the enclosing class (XArrayList.this),
it should be a little more efficient.
I believe you can also remove the try/catch + rethrow and use Objects.checkIndex
instead.
Unless the VM can eliminate the checks the implicit array range check
should be cheaper/free.
Or I misunderstand how try/catch is implemented by the vm.
yes, i hope the range check to be fully removed.

If everything goes right, it should be has efficient as doing a for loop on the
array.
Yep, that's the idea.

The cost avoidance of not doing allocation is hard to measure.
especially when the escape analysis works

In some previous experiements, the extra data that had to be moved around
wiped out the savings by not having an identity object.
yes, when the cost of copy in a field or in an array is greater than the cost of dereferencing a pointer but here everything should be on the stack, so no trade-of.

I've looked at the code with JMH and perfasm and
I'll be looking more closely at the performance and any help is appreciated.

Here my implementation
https://urldefense.com/v3/__https://github.com/forax/valuetype-lworld/blob/master/src/main/java/fr.umlv.valuetype/fr/umlv/valuetype/xlist/XArrayList.java*L1130__;Iw!!GqivPVa7Brio!KDKclikzivMcFelPyi433ivEn2MsWszmcVnYc16HMeRowkPFjlf386atqiTjAFWU$
BTW, i believe your implementation has a bug, the index can overflow in advance().
That was intentional, since anything >= size is out of range anyway
and it has to be checked in get().
It wasn't worth an extra check.

with mostly the same test, i use microseconds so the results are more readable and a 3 seconds duration to have more stable results
https://urldefense.com/v3/__https://github.com/forax/valuetype-lworld/blob/master/src/test/java/fr.umlv.valuetype/fr/umlv/valuetype/perf/XArrayListCursorBenchMark.java*L21__;Iw!!GqivPVa7Brio!KDKclikzivMcFelPyi433ivEn2MsWszmcVnYc16HMeRowkPFjlf386atqpU5B1iE$

Cursors are now faster than iterators and a loop + get().

Benchmark (size) Mode Cnt Score Error Units
XArrayListCursorBenchMark.getViaArray 100000 avgt 5 345.798 ? 3.048 us/op
XArrayListCursorBenchMark.getViaCursorForLoop 100000 avgt 5 278.712 ? 2.501 us/op
XArrayListCursorBenchMark.getViaCursorWhileLoop 100000 avgt 5 278.281 ? 3.816 us/op
XArrayListCursorBenchMark.getViaIterator 100000 avgt 5 323.989 ? 4.374 us/op
XArrayListCursorBenchMark.getViaIteratorCurs 100000 avgt 5 312.738 ? 1.848 us/op

Good.

Thanks, Roger

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from Sergey Kuksenko on valhalla-dev:

It won't work for ArrayList.

I told about that on Valhalla meeting (half year ago) and showed it's in
my presentation about Valhalla performance
(http://cr.openjdk.java.net/~skuksenko/valhalla/talk/vp.pdf, slides 68-75).

There is no difference cost between memory access to stack and heap. You
won't get performance benefits on any range checks. Iterator/Cursor
memory allocation itself is very cheap.

The only place where it may have significant effect (and it has) is GC
write barriers.? Simple not-tuned Cursor for HashMap give +40% boost. If
we want to see performance difference for Cursor we need an Iterator
where Iterator's state contains references and these references should
be updated on each "next()" invocation. ArrayList needs only update int
field - no GC barriers.

My raw experiments were published in last August
http://cr.openjdk.java.net/~skuksenko/valhalla/playground/.

It's still not enough to observe performance improvement on
microbenchmarks. HotSpot is really good at
Iterators-inside-microbenchmarks scalarization. Iterator should be
forced DO NOT to scalarize the iterator. The simplest way to do it:

@CompilerControl(CompilerControl.Mode.DONT_INLINE)
private static Iterator<Integer> hide(Iterator<Integer> it) {
return it;
}

@benchmark
public int sumIteratorHidden() {
int s = 0;
for (Iterator<Integer> iterator = hide(map.keyIterator()); iterator.hasNext(); ) {
s += iterator.next();
}
return s;
}

Minor note: Don't run it with ZGC. ZGC has no write barriers. Default G1
is the best option due to highest cost of write barriers. ParallelGC
write barriers has much less cost than G1 and Cursor still shows
performance boost on ParallelGC, but much less than on G1.

On 3/26/20 10:45 AM, Brian Goetz wrote:

These don't look so good!? I suspect that the Iterator is performing
so well because it's getting good EA, but not sure why cursors aren't
doing better?? Guess we'll have to dig into the generated code....

On 3/26/2020 1:14 PM, Roger Riggs wrote:

Implementation of Cursors and jmh tests comparing
typical List traversal via direct index, iterator,
inline cursor, and an iterator implemented on top of cursor.

Sample results:

??????????????????????????????????? (size)? Mode? Cnt Score??????
Error? Units
XArrayListCursorTest.getViaArray??????????? 100000? avgt??? 5
507793.484? 7086.038? ns/op
XArrayListCursorTest.getViaCursorForLoop??? 100000? avgt??? 5
656461.958? 52488.547? ns/op
XArrayListCursorTest.getViaCursorWhileLoop? 100000? avgt??? 5
641963.323? 32219.409? ns/op
XArrayListCursorTest.getViaIterator???????? 100000? avgt??? 5
558863.817? 23539.256? ns/op
XArrayListCursorTest.getViaIteratorCurs???? 100000? avgt??? 5
733161.466? 33721.881? ns/op

-------------

Commit messages:
? - Prototype inline cursors for List

Changes: https://git.openjdk.java.net/valhalla/pull/5/files
? Webrev: https://webrevs.openjdk.java.net/valhalla/5/webrev.00
?? Stats: 2139 lines in 3 files changed: 2139 ins; 0 del; 0 mod
?? Patch: https://git.openjdk.java.net/valhalla/pull/5.diff
?? Fetch: git fetch https://git.openjdk.java.net/valhalla
pull/5/head:pull/5

PR: https://git.openjdk.java.net/valhalla/pull/5

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 26, 2020

Mailing list message from forax at univ-mlv.fr on valhalla-dev:

----- Mail original -----

De: "Roger Riggs" <Roger.Riggs at oracle.com>
?: "Remi Forax" <forax at univ-mlv.fr>
Cc: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 21:14:43
Objet: Re: [lworld] RFR: Prototype inline cursors for List

Hi Remi,

yes, with those modifications its faster, but its not a Cursor for ArrayList
since CME is possible and not checked.
That would be appropriate for CopyOnWriteArraylist for example.

Or document that you may seen null when you iterate using a Cursor.

Another possible more heavy lifting isto change the implementation of ArrayList to have all removes + add at the beginning (add(int, E)) to create a new array.
(having a flag that say if at least one iterator was spawn may alleviate a little bit the pain).
But i think it defeats the diminish the appeal of ArrayList as being the go to, general purpose list.

We also have a chance to create new implementations of List, etc.

I do want to understand where the pitfalls are when attempting
to improve implementations by replacing with inline classes.

the problem is that reading modCount is what's make the implementation slow, i.e. as slow as the iterator when EA works.

R?mi

On 3/26/20 3:38 PM, Remi Forax wrote:

----- Mail original -----

De: "Roger Riggs" <Roger.Riggs at oracle.com>
?: "valhalla-dev" <valhalla-dev at openjdk.java.net>
Envoy?: Jeudi 26 Mars 2020 19:14:54
Objet: Re: [lworld] RFR: Prototype inline cursors for List
Hi Remi,
Hi Roger,

On 3/26/20 1:48 PM, Remi Forax wrote:

Hi Roger,
I think implementing remove() on a cursor is a bad idea, since we have
introduced Collection.removeIf() in 8 the main use case of remove() has
disappeared and it's still implemented by Iterator if someone really want it.
Possibly, I was trying out what it takes for parity with Iterator.
In term of performance, it makes a huge difference because you can now implement
a snapshot at the beginning semantics, you don't have to be aware if the
XArrayList changes and you don't have to propagate back the structural change
done by remove(), so basically you can avoid the check for
ConcurrentModification.
Yes simplifying the semantics would help, but that would not be an
ArrayList but something else.

Part of the exercise was to see where inline classes can subsitute for
(assumed) more expensive identity classes.

With that, in the cursor, you can capture the array and the size when creating
it instead of capturing a reference to the enclosing class (XArrayList.this),
it should be a little more efficient.
I believe you can also remove the try/catch + rethrow and use Objects.checkIndex
instead.
Unless the VM can eliminate the checks the implicit array range check
should be cheaper/free.
Or I misunderstand how try/catch is implemented by the vm.
yes, i hope the range check to be fully removed.

If everything goes right, it should be has efficient as doing a for loop on the
array.
Yep, that's the idea.

The cost avoidance of not doing allocation is hard to measure.
especially when the escape analysis works

In some previous experiements, the extra data that had to be moved around
wiped out the savings by not having an identity object.
yes, when the cost of copy in a field or in an array is greater than the cost of
dereferencing a pointer but here everything should be on the stack, so no
trade-of.

I've looked at the code with JMH and perfasm and
I'll be looking more closely at the performance and any help is appreciated.

Here my implementation
https://urldefense.com/v3/__https://github.com/forax/valuetype-lworld/blob/master/src/main/java/fr.umlv.valuetype/fr/umlv/valuetype/xlist/XArrayList.java*L1130__;Iw!!GqivPVa7Brio!KDKclikzivMcFelPyi433ivEn2MsWszmcVnYc16HMeRowkPFjlf386atqiTjAFWU$
BTW, i believe your implementation has a bug, the index can overflow in
advance().
That was intentional, since anything >= size is out of range anyway
and it has to be checked in get().
It wasn't worth an extra check.

with mostly the same test, i use microseconds so the results are more readable
and a 3 seconds duration to have more stable results
https://urldefense.com/v3/__https://github.com/forax/valuetype-lworld/blob/master/src/test/java/fr.umlv.valuetype/fr/umlv/valuetype/perf/XArrayListCursorBenchMark.java*L21__;Iw!!GqivPVa7Brio!KDKclikzivMcFelPyi433ivEn2MsWszmcVnYc16HMeRowkPFjlf386atqpU5B1iE$

Cursors are now faster than iterators and a loop + get().

Benchmark (size) Mode Cnt Score
Error Units
XArrayListCursorBenchMark.getViaArray 100000 avgt 5 345.798 ?
3.048 us/op
XArrayListCursorBenchMark.getViaCursorForLoop 100000 avgt 5 278.712 ?
2.501 us/op
XArrayListCursorBenchMark.getViaCursorWhileLoop 100000 avgt 5 278.281 ?
3.816 us/op
XArrayListCursorBenchMark.getViaIterator 100000 avgt 5 323.989 ?
4.374 us/op
XArrayListCursorBenchMark.getViaIteratorCurs 100000 avgt 5 312.738 ?
1.848 us/op

Good.

Thanks, Roger

@RogerRiggs
Copy link
Collaborator Author

@RogerRiggs RogerRiggs commented Mar 31, 2020

/integrate

@openjdk openjdk bot closed this Mar 31, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Mar 31, 2020
@openjdk
Copy link

@openjdk openjdk bot commented Mar 31, 2020

@RogerRiggs The following commits have been pushed to lworld since your change was applied:

  • 0595ad0: 8235753: [lworld] Handle deoptimization when buffering scalarized inline type args in C1
  • ece2a2d: 8241918: [lworld] Build failures after JDK-8236522
  • 2a6ddbd: 8241910: [lworld] C2 crashes due to duplicate storestore barrier added by JDK-8236522
  • cab7a5b: 8236522: NonTearable marker interface for inline classes
  • 78b0ffd: 8241533: [lworld] PhaseMacroExpand::migrate_outs should be replaced b…

Your commit was automatically rebased without conflicts.

Pushed as commit d544758.

@mlbridge
Copy link

@mlbridge mlbridge bot commented Mar 31, 2020

Mailing list message from Roger Riggs on valhalla-dev:

Changeset: d544758
Author: Roger Riggs <rriggs at openjdk.org>
Date: 2020-03-31 13:55:51 +0000
URL: https://git.openjdk.java.net/valhalla/commit/d544758b

Prototype inline cursors for List

+ test/micro/org/openjdk/bench/valhalla/corelibs/InlineCursor.java
+ test/micro/org/openjdk/bench/valhalla/corelibs/XArrayList.java
+ test/micro/org/openjdk/bench/valhalla/corelibs/XArrayListCursorTest.java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
1 participant