8235914: [lworld] Profile acmp bytecode #185
@rwestrel This change now passes all automated pre-integration checks.
When the JIT speculates on an acmp profile, I suppose the independent type profiles will help to make some cases more profitable to test: If an inline type is common, you test for it first and inline the rest.
But I don't think a good result requires independent type profiles for the two operands. I would think that the relevant history would consist of (a) the number of acmp attempts, and (b) for each inline type that both operands shared as their common type, the frequency of encountering that type. That's really just a single klass profile (with counters).
This other form would be somewhat preferable because it would use less footprint and require less bookkeeping, and it would capture sharper information for the JIT, than two independent profiles. The weakness of independent profiles is you don't know how often the two operands end up with the same type.
Just a suggestion.
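To make the single-profile idea concrete, here is a minimal sketch in plain Java. It only models the bookkeeping described above: (a) a count of acmp attempts and (b) a per-klass counter that is bumped only when both operands share one klass and that klass is inline. The class name `SameKlassAcmpProfile`, its methods, and the `isInline` predicate are all hypothetical stand-ins; this is not HotSpot's profile layout.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of a single acmp profile; not HotSpot's MethodData layout.
final class SameKlassAcmpProfile {
    // Assumption: stand-in for the VM's "is this klass an inline klass?" check.
    private final Predicate<Class<?>> isInline;
    private final Map<Class<?>, Long> sameInlineKlassCounts = new HashMap<>();
    private long attempts; // (a) every acmp executed at this site

    SameKlassAcmpProfile(Predicate<Class<?>> isInline) {
        this.isInline = isInline;
    }

    // Called once per acmp with the two operands.
    void record(Object left, Object right) {
        attempts++;
        if (left == null || right == null) {
            return;
        }
        Class<?> k = left.getClass();
        // (b) count only when both operands share one klass and that klass is inline
        if (k == right.getClass() && isInline.test(k)) {
            sameInlineKlassCounts.merge(k, 1L, Long::sum);
        }
    }

    long attempts() {
        return attempts;
    }

    long sameInlineCount(Class<?> k) {
        return sameInlineKlassCounts.getOrDefault(k, 0L);
    }
}
```

With this shape, the ratio of `sameInlineCount(k)` to `attempts()` says directly how often the two operands ended up with the same inline type, which two independent per-operand profiles cannot tell you.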
@rose00 Thanks for the suggestion.
With this patch, my goal is to improve the performance of acmp when no inline types are involved but the compiler can't tell that from the static types of the acmp inputs, so that legacy code that makes no use of inline types is not affected by them. My understanding is that, first of all, we want the new acmp to not cause regressions in code that doesn't use inline types. How important is it to optimize acmp (or aaload/aastore) for cases where inline types hidden behind Object are compared (or flattened arrays hidden behind Object are accessed)?
Current logic for an acmp is:
Now if we have profiling for left or right that tells us one of them is always null, or one of them is never an inline type and never null, then we only need 2 comparisons, for instance:
which is a pattern that's a lot friendlier to the compiler.
The simple comparison
(Of course you need additional testing to verify the speculation. That's something like
My overall point here is that
To answer your question, I think (but don't know for sure) that inlines masked by
(Idea of the day: Make an
@rose00 Ideally, wouldn't we want to collect a set of:
(left class, right class, count)
Anyway, at this point, compiled code calls java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a substitutability test. It doesn't even take advantage of known inline types to dispatch to the right comparison method. So, profile data makes no difference and we would first need to optimize acmp for known inline types.
@rwestrel this pull request can not be integrated into
git checkout JDK-8235914
git fetch https://git.openjdk.java.net/valhalla lworld
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge lworld"
git push
I see, now, that you are hoping for a monomorphic (or nearly monomorphic) left or right argument. That would allow you to speculate an exact type for that particular argument, which then simplifies matters. I agree this is helpful. If you can correctly speculate the exact identity of a type, you don't need runtime tests for whether the type is inline or not.
That said, you only need a runtime test for inline types if the left and right types are identical. That's almost as fast as correctly verifying a single type.
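As an illustration of that ordering, here is a rough Java model of an acmp that checks klass equality before asking whether the klass is inline. It is only a sketch of the semantics under discussion, not the code the interpreter or the JIT emits. `AcmpModel`, the `isInline` predicate, and the `isSubstitutable` callback are hypothetical stand-ins for the VM's inline-klass check and the substitutability test.

```java
import java.util.function.BiPredicate;
import java.util.function.Predicate;

// Hypothetical model of acmp semantics; names are illustrative only.
final class AcmpModel {
    private AcmpModel() {}

    static boolean acmp(Object left, Object right,
                        Predicate<Class<?>> isInline,
                        BiPredicate<Object, Object> isSubstitutable) {
        if (left == right) {
            return true;                          // plain pointer comparison
        }
        if (left == null || right == null) {
            return false;                         // exactly one side is null
        }
        Class<?> k = left.getClass();
        if (k != right.getClass()) {
            return false;                         // different klasses are never equal
        }
        if (!isInline.test(k)) {
            return false;                         // same identity klass, distinct objects
        }
        return isSubstitutable.test(left, right); // only reached for equal inline klasses
    }
}
```

In this model the inline check is only reached once the two klasses are known to be equal, so a site whose operands rarely share a klass rarely pays for it.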
Cases 1..3 can be grouped as:
I guess my main contribution here is to suggest that the profile could be more helpful if it would predict whether the specific condition of inline type equality is frequent or infrequent. If frequent, then we compile in the S-test. (If frequent and monomorphic, we guard for the mono-type. This will eventually help further inline the S-test.) Otherwise, if infrequent, we guard for type equality (if we can't guard for a mono-type that's a reference) and uncommon-trap the S-test.
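The decision described in this paragraph can be sketched as a small function. This is only an illustration of the reasoning, assuming the profile already reports how often both operands had the same inline klass. The `AcmpStrategy` class, the `Plan` names, and the `frequentThreshold` parameter are all hypothetical; no such knob is implied to exist in C1 or C2.

```java
// Hypothetical decision model; not actual JIT code.
final class AcmpStrategy {
    private AcmpStrategy() {}

    enum Plan {
        INLINE_S_TEST_MONO_GUARD,         // frequent and monomorphic: guard for the mono-type
        INLINE_S_TEST,                    // frequent: compile in the S-test
        GUARD_REF_MONO_TYPE_TRAP_S_TEST,  // infrequent: guard a known reference type
        GUARD_TYPE_EQUALITY_TRAP_S_TEST   // infrequent: guard klass equality instead
    }

    static Plan choose(long attempts,
                       long sameInlineHits,
                       boolean inlineHitsMonomorphic,
                       boolean refMonoTypeKnown,
                       double frequentThreshold) {
        // Assumption: "frequent" means the hit ratio reaches an arbitrary threshold.
        boolean frequent = attempts > 0
                && (double) sameInlineHits / attempts >= frequentThreshold;
        if (frequent) {
            return inlineHitsMonomorphic
                    ? Plan.INLINE_S_TEST_MONO_GUARD
                    : Plan.INLINE_S_TEST;
        }
        // Infrequent: the S-test becomes an uncommon trap behind a cheaper guard.
        return refMonoTypeKnown
                ? Plan.GUARD_REF_MONO_TYPE_TRAP_S_TEST
                : Plan.GUARD_TYPE_EQUALITY_TRAP_S_TEST;
    }
}
```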
There are several independent factors that can allow us to avoid compiling the S-test (and use an uncommon trap instead):
A. We can guard on a known reference type, either left or right. (Monomorphic sites only.)
I think those speculations are in order of preference. To speculate A. we need a 1-element type profile on both sides of the test, plus statistics on how many outlier events occurred. That is handled by your design, and covers cases 1..3. Monomorphism on both sides covers part of case 4, but that's going to be rarer than polymorphism on either or both sides, I think.
Case 4 (guard equal types) is the same as B. To speculate B. we want a profile on how often types are equal. That's not something you have here yet. To collect it, you could profile klass equality, or (better IMO) profile klass equality only if the klass is also an inline (maybe testing equality first, to avoid a dependent load and cache pollution?). The interpreter would collect this extra information.
To speculate C. (which is not on your list, and maybe is in the noise) we would need statistics on how often a type (on either side) is inline. That's something you can derive from a multi-element profile, in your current design, but only if the profile doesn't overflow. I think it's easier to just collect the inline bit separately.
To tie these requirements back to the interpreter profile structure:
A. Requires logging of reference types on either side, enough to detect monomorphism. (Or maybe bi-morphism? diminishing returns there...)
This leads me to the following interpreter profile structure:
That's three type-lists instead of two, which seems to get unwieldy. It also suggests that I'm really talking about a follow-on RFE (which I'm sure has crossed your mind already).
We could simplify this structure and make it more robust as follows:
The idea is to merge the lists, but keep separate "long tail" miss frequencies. This makes sense for
My suggestion is to put most of this aside as an RFE. But you could consider, right now, merging the two lists, adding two type codes (for left and right). Then adding the third type code (same) would be simpler as an RFE.
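A minimal sketch of what the merged list could look like, written as plain Java rather than as an actual interpreter profile layout. One bounded list holds rows tagged with a type code, and each code keeps its own "long tail" miss counter for when the list is full. `MergedAcmpProfile`, `Code`, `Row`, and the capacity parameter are hypothetical names; the `SAME` code is included only to show where the follow-on RFE would plug in.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical merged acmp profile; not the layout used by the patch.
final class MergedAcmpProfile {
    enum Code { LEFT, RIGHT, SAME }

    private static final class Row {
        final Class<?> klass;
        final Code code;
        long count = 1; // a row is created by its first hit

        Row(Class<?> klass, Code code) {
            this.klass = klass;
            this.code = code;
        }
    }

    private final int capacity;
    private final List<Row> rows = new ArrayList<>();
    // One "long tail" miss counter per type code.
    private final long[] misses = new long[Code.values().length];

    MergedAcmpProfile(int capacity) {
        this.capacity = capacity;
    }

    void record(Class<?> klass, Code code) {
        for (Row r : rows) {
            if (r.klass == klass && r.code == code) {
                r.count++;
                return;
            }
        }
        if (rows.size() < capacity) {
            rows.add(new Row(klass, code));
        } else {
            misses[code.ordinal()]++; // list is full: remember which side overflowed
        }
    }

    long count(Class<?> klass, Code code) {
        for (Row r : rows) {
            if (r.klass == klass && r.code == code) {
                return r.count;
            }
        }
        return 0;
    }

    long misses(Code code) {
        return misses[code.ordinal()];
    }
}
```

Keeping the miss counters per code means an overflow caused by a polymorphic left operand does not hide the fact that the right operand is still monomorphic.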
That's a moving target. Clearly we want to inline the polymorphic S-test and then try to "de-virtualize" it when we can. The current code has fast paths for all the cases we have already discussed (A, B, C above), and then uses a
The profile information to (eventually) drive JDK-8238260 can come from either side, but it will be most efficiently collected if it is filtered down to cases where (a) the two sides have the same klass and (b) that klass is an inline. If that filtered profile is monomorphic (which doesn't require the profile as a whole to be monomorphic) then JDK-8238260 (or some other ad hoc devirtualization of the S-test) can get a serious win.
@rose00 Thanks for the discussion and suggestions. I propose to file 2 RFEs for this (one to improve handling of known inline types at acmp and another one to improve profile collection at acmp). I think the patch as it is today is a worthwhile improvement and I'd like to move forward with it as it is (even small improvements to profiling can turn out to be quite a bit of work, given that the interpreter, C1 and C2 all need to be adjusted).
Actually there's already an RFE for that:
@rwestrel Since your change was applied there has been 1 commit pushed to the
Your commit was automatically rebased without conflicts.
Pushed as commit 378279c.