Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8235914: [lworld] Profile acmp bytecode #185

Closed
wants to merge 4 commits into from

Conversation

rwestrel
Copy link
Collaborator

@rwestrel rwestrel commented Sep 11, 2020

This includes:

  • a new ProfileData structure to profile both inputs to acmp
  • profile collection at acmp in the interpreter on x86
  • profile collection at acmp in c1 generated code on x86
  • changes to c2's acmp implementation to leverage profiling (both existing profiling through type speculation and new profile data at acmp)
  • small tweaks to the assembly code generated for acmp
  • a change to the implementation of LIRGenerator::profile_null_free_array() so it doesn't use a branch (which is dangerous given the register allocator is not aware of branches added at the LIR level)
  • new tests

Profile collection happens unconditionally. Leveraging profiling at acmp is under UseACmpProfile which is false by default.


Progress

  • Change must not contain extraneous whitespace

Testing

Linux x64 Windows x64 macOS x64
Build (3/3 running) (2/2 running) (2/2 running)

Issue

Reviewers

Download

$ git fetch https://git.openjdk.java.net/valhalla pull/185/head:pull/185
$ git checkout pull/185

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 11, 2020

👋 Welcome back roland! A progress list of the required criteria for merging this PR into lworld will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@rwestrel
Copy link
Collaborator Author

rwestrel commented Sep 11, 2020

/test

@openjdk
Copy link

openjdk bot commented Sep 11, 2020

@rwestrel This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8235914: [lworld] Profile acmp bytecode

Reviewed-by: thartmann

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 1 new commit pushed to the lworld branch:

  • e815214: 8254326: [lworld] [lw3] Extend definition of empty inline types

Please see this link for an up-to-date comparison between the source branch of this pull request and the lworld branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the lworld branch, type /integrate in a new comment.

@openjdk openjdk bot added the ready label Sep 11, 2020
@openjdk
Copy link

openjdk bot commented Sep 11, 2020

@rwestrel you need to get approval to run the tests in tier1 for commits up until afd5cf0

@openjdk
Copy link

openjdk bot commented Sep 15, 2020

A test job has been started with id: github.com-182025808-185-690946782

@openjdk
Copy link

openjdk bot commented Sep 15, 2020

@rwestrel your test job with id github.com-182025808-185-690946782 for commits up until afd5cf0 has finished.

@rwestrel
Copy link
Collaborator Author

rwestrel commented Sep 15, 2020

/test

@openjdk
Copy link

openjdk bot commented Sep 15, 2020

@rwestrel you need to get approval to run the tests in tier1 for commits up until 0cc27a7

@openjdk openjdk bot added the test-request label Sep 15, 2020
@TobiHartmann
Copy link
Member

TobiHartmann commented Sep 15, 2020

/test approve

@openjdk openjdk bot removed the test-request label Sep 15, 2020
@openjdk
Copy link

openjdk bot commented Sep 15, 2020

A test job has been started with id: github.com-182025808-185-692643225

@openjdk
Copy link

openjdk bot commented Sep 15, 2020

@rwestrel your test job with id github.com-182025808-185-692643225 for commits up until 0cc27a7 has finished.

@rwestrel rwestrel marked this pull request as ready for review Sep 15, 2020
@openjdk openjdk bot added the rfr label Sep 15, 2020
@mlbridge
Copy link

mlbridge bot commented Sep 15, 2020

Webrevs

@rose00
Copy link
Collaborator

rose00 commented Sep 16, 2020

When the JIT speculates on an acmp profile, I suppose the independent type profiles will help to make some cases more profitable to test: If an inline type is common, you test for it first and inline the rest.

But I don't think a good result requires independent type profiles for the two operands. I would think that the relevant history would consist of (a) the number of acmp attempts, and (b) for each inline type for which the operands had that as their common type the frequency of encountering that type. That's really just a single klass profile (with counters).

This other form would be somewhat preferable because it would use less footprint and require less bookkeeping, and it would capture sharper information for the JIT, than two independent profiles. The weakness of independent profiles is you don't know how often the two operands end up with the same type.

Just a suggestion.

@rwestrel
Copy link
Collaborator Author

rwestrel commented Sep 16, 2020

@rose00 Thanks for the suggestion.

With this patch, my goal is to improve the performance of acmp when there's no inline types involved but the compiler can't tell from the static types of the acmp inputs so that legacy code that make no use of inline types is not affected by them. My understanding is that, first of all, we want the new acmp to not cause regressions in non inlined type code. How important is it to optimize acmp (or aaload/aastore) for cases where inline types hidden behind Object are compared (or flattened arrays hidden behind Object[])?

Current logic for an acmp is:

if (left == right) {
 // equal
} else {
  if (left == null) {
    // not equal
  } else  if (left not inline type) {
    // not equal
  } else if (right == null) {
    // not equal
  } else if (left.klass != right.klass) {
    // not equal
  } else {
    // substituability test
  } 
}

Now if we have profiling for left or right that tells one of them is always null or one them is never an inline type and never null then we only need 2 comparisons, for instance:

if (left == right) {
  // equal
} else {
  if (left.right.klass == java.lang.Integer) { // implicit null check
    trap;
  }
 // not equal
}

which is a pattern that's a lot friendlier to the compiler.

@rose00
Copy link
Collaborator

rose00 commented Sep 16, 2020

The simple comparison left == right is valid for acmp along all paths except when left and right are inlines, and even then the simple comparison is valid if the two inline types are distinct. Therefore, the only evidence that the JIT needs that the S-test was ever needed is if the two types are (a) inlines and (b) identical. Your profile detects (a) but not (b), which means the JIT can't speculate (b). If you speculate (b) you can use left == right because any inlines that appear are irrelevant.

(Of course you need additional testing to verify the speculation. That's something like left.klass == right.klass && left.klass.is_inline or the reverse, which detects (a) and (b). It can go to an uncommon trap if the profile claims it's rare, to avoid compiling in a potentially polymorphic S-test.)

My overall point here is that left == right is a good way to prove, in many cases, that two things are identical, and left != right is a good heuristic (not proof) that two things are not identical. The heuristic fails only if (a) and (b) are true. That heuristic seems to be a good target for speculation; perhaps you are aiming at other speculations here and I'm barking up a different tree. For this speculation, I think you would throw an uncommon trap you saw (a) and (b), so the complicated S-test path doesn't need to be compiled.

To answer your question, I think (but don't know for sure) that inlines masked by Object references will be an important source of acmp slow paths. If the profile indicates that either (a) inlines are not seen in the operands or (b) equal inline types are not seen, then the slow S-test can be removed and replaced (at most) by an uncommon trap.

(Idea of the day: Make an acmp entry point a hidden/injected virtual method on every object, with a distinct body for each inline object, and this==x on jl.Object. The body for any given inline class V is then this==x || x instanceof V && V.$MONOMORPHIC_S_TEST(this, (V)x). Then refer the whole problem to our portfolio of devirtualization optimizations. That gives us bimorphic calls, etc., for acmp. Maybe that's a better factoring than adding more and more ad hoc acmp logic. This idea pairs well with your profile proposal, since it basically treats acmp as a hidden virtual call.)

HTH

@rwestrel
Copy link
Collaborator Author

rwestrel commented Sep 21, 2020

@rose00 Ideally, wouldn't we want to collect a set of:

(left class, right class, count)

so if:

  1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or right is the profiled class
  2. left is a reference type and right an inline type, we compile acmp as a simple pointer comparison and guard that left is the profiled class
  3. left is an inline type and right a reference type, we compile acmp as a simple pointer comparison and guard that right is the profiled class
  4. left and right are inline types, we can guard that they are of different (resp same) types (or that each is of the corresponding profiled types) and compile a simple pointer comparison (resp a substitutability test)?

Anyway, at this point, compiled code calls java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a substituability test. It doesn't even take advantage of known inline types to dispatch to the right comparison method. So, profile data makes no difference and we would first need to optimize acmp for known inline types.

@openjdk
Copy link

openjdk bot commented Sep 21, 2020

@rwestrel this pull request can not be integrated into lworld due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8235914
git fetch https://git.openjdk.java.net/valhalla lworld
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge lworld"
git push

@rose00
Copy link
Collaborator

rose00 commented Sep 21, 2020

@rose00 Ideally, wouldn't we want to collect a set of:

(left class, right class, count)

so if:

  1. left and right are reference types, we compile acmp as a simple pointer comparison and guard that one of left or right is the profiled class

I see, now, that you are hoping for a monomorphic (or nearly monomorphic) left or right argument. That would allow you to speculate an exact type for that particular argument, which then simplifies matters. I agree this is helpful. If you can correctly speculate the exact identity of a type, you don't need runtime tests for whether the type is inline or not.

That said, you only need a runtime test for inline types if the left and right types are identical. That's almost as fast as correctly verifying a single type.

  1. left is a reference type and right an inline type, we compile acmp as a simple pointer comparison and guard that left is the profiled class
  2. left is an inline type and right a reference type, we compile acmp as a simple pointer comparison and guard that right is the profiled class

Cases 1..3 can be grouped as:

  • (1..3) Either of left type or right right can be speculated as a (constant) reference type R. Guard that type as R and finish with pointer comparison. This is faster than computing both types and comparing for equality. But it is more brittle: It requires one side to be monomorphic. If the monomorphism guard fails, comparing the two types for equality must be done, probably as the immediate next step.
  1. left and right are inline types, we can guard that they are of different (resp same) types (or that each is of the corresponding profiled types) and compile a simple pointer comparison (resp a substitutability test)?

I guess my main contribution here is to suggest that the profile could be more helpful if it would predict whether the specific condition of inline type equality is frequent or infrequent. If frequent, then we compile in the S-test. (If frequent and monomorphic, we guard for the mono-type. This will eventually help further inline the S-test.) Otherwise, if infrequent, we guard for type equality (if we can't guard for a mono-type that's a reference) and uncommon-trap the S-test.

There are several independent factors that can allow us to avoid compiling the S-test (and use an uncommon trap instead):

A. We can guard on a known reference type, either left or right. (Monomorphic sites only.)
B. We can guard on unequal types. (Requires loading both types. Doesn't require monomorphism. Failure of equality can quickly fall through to C to profit on equal reference types.)
C. We can guard on whether a type is an inline type or not. (Requires a dependent load and test from a klass.)

I think those speculations are in order of preference. To speculate A. we need a 1-element type profile on both sides of the test, plus statistics on how many outlier events occurred. That is handled by your design, and covers cases 1..3. Monomorphism on both sides covers part of case 4, but that's going to be rarer than polymorphism on either or both sides, I think.

Case 4 (guard equal types) is the same as B. To speculate B. we want a profile on how often types are equal. That's not something you have here yet. To collect it, you could profile klass equality, or (better IMO) profile klass equality only if the klass is also an inline (maybe testing equality first, to avoid a dependent load and cache pollution?). The interpreter would collect this extra information.

To speculate C. (which is not on your list, and maybe is in the noise) we would need statistics on how often a type (on either side) is inline. That's something you can derive from a multi-element profile, in your current design, but only if the profile doesn't overflow. I think it's easier to just collect the inline bit separately.

To tie these requirements back to the interpreter profile structure:

A. Requires logging of reference types on either side, enough to detect monomorphism. (Or maybe bi-morphism? diminishing returns there...)
B. Requires logging of inequality of inline types, enough to warrant an uncommon trap if inline-equality occurs. (I don't think unqualified inequality is an interesting thing to profile: Too rare in practice; it means a mostly-useless acmp test.)
C. Requires logging of inline types. If inline types are very rare, then we can speculate against them.

This leads me to the following interpreter profile structure:

(left null seen)
{ (left type, frequency), … }, (miss frequency)  // A left
(right null seen)
{ (right type, frequency), … }, (miss frequency)  // A right
{ (same inline type, frequency) … }, (miss frequency)  // B/C left==right && left.inline
(left inline seen)
(right inline seen)

That's three type-lists instead of two, which seems to get unwieldy. It also suggests that I'm really talking about a follow-on RFE (which I'm sure has crossed your mind already).

We could simplify this structure and make it more robust as follows:

(left null seen), (right null seen), (same inline seen), (left inline seen), (right inline seen)
{ (type, code, frequency) … }
// enum code { left, right, same }; in low 2 bits of profile record count
(left miss frequency)
(right miss frequency)
(same inline miss frequency)

The idea is to merge the lists, but keep separate "long tail" miss frequencies. This makes sense for acmp because it is a symmetric operation, and you only need monomorphism on one side. Arguably, you just need one array of types (at least 3) to prove monomorphism for either side, and then you get a slightly smaller profile for the other side. The equal-types case (significant only for acmp) is a natural extension.

My suggestion is to put most of this aside as an RFE. But you could consider, right now, merging the two lists, adding two type codes (for left and right). Then adding the third type code (same) would be simpler as an RFE.

Anyway, at this point, compiled code calls java.lang.invoke.ValueBootstrapMethods::isSubstitutable() for a substituability test. It doesn't even take advantage of known inline types to dispatch to the right comparison method. So, profile data makes no difference and we would first need to optimize acmp for known inline types.

That's a moving target. Clearly we want to inline the polymorphic S-test and then try to "de-virtualize" it when we can. The current code has fast paths for all the cases we have already discussed (A, B, C above), and then uses a jl.ClassValue to dispatch to a handler. Devirtualizing that dispatch will require (a) type information about the inline type shared by the two operands, and (b) a way to inline through a constant ClassValue::get call (please see https://bugs.openjdk.java.net/browse/JDK-8238260).

The profile information to (eventually) drive JDK-8238260 will be either side, but it will be most efficiently collected if it is filtered down to cases where (a) the two sides have the same klass and (b) that klass is an inline. If that filtered profile is monomorphic (which doesn't require the profile as a whole to be monomorphic) then JDK-8238260 (or some other ad hoc devirtualization of the S-test) can get a serious win.

@rwestrel
Copy link
Collaborator Author

rwestrel commented Sep 25, 2020

@rose00 Thanks for the discussion and suggestions. I propose I file 2 RFEs for this (one to improve handling of known inline types at acmp and another one to improve profile collection at acmp). I think the patch as it is today is a worthwhile improvement and I'd like to move forward with it as it is (even small improvements to profiling can turn out to be quite a bit of work given the interpreter, c1 and c2 all need to be adjusted).

@rwestrel
Copy link
Collaborator Author

rwestrel commented Oct 9, 2020

@rose00 Thanks for the discussion and suggestions. I propose I file 2 RFEs for this (one to improve handling of known inline types at acmp and another one to improve profile collection at acmp). I think the patch as it is today is a worthwhile improvement and I'd like to move forward with it as it is (even small improvements to profiling can turn out to be quite a bit of work given the interpreter, c1 and c2 all need to be adjusted).

Actually there's already an RFE for that:
https://bugs.openjdk.java.net/browse/JDK-8228361

@@ -232,6 +236,7 @@
protected static final String CHECKCAST_ARRAYCOPY = "(.*call_leaf_nofp,runtime checkcast_arraycopy.*" + END;
protected static final String JLONG_ARRAYCOPY = "(.*call_leaf_nofp,runtime jlong_disjoint_arraycopy.*" + END;
protected static final String FIELD_ACCESS = "(.*Field: *" + END;
protected static final String SUBSTITUTABLITY_TEST = START + "CallStaticJava" + MID + "java.lang.invoke.ValueBootstrapMethods::isSubstitutable" + END;
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUBSTITUTABLITY_TEST -> SUBSTITUTABILITY_TEST

@@ -289,6 +297,7 @@ public int getNumScenarios() {
"-XX:+InlineTypeReturnedAsFields",
"-XX:+StressInlineTypeReturnedAsFields"};
case 3: return new String[] {
"-XX:-UseACmpProfile",
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can enable profiling when IR verification is disabled -DVerifyIR=false.

@@ -297,6 +306,7 @@ public int getNumScenarios() {
"-XX:+InlineTypePassFieldsAsArgs",
"-XX:+InlineTypeReturnedAsFields"};
case 4: return new String[] {
"-XX:-UseACmpProfile",
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can enable profiling when IR verification is disabled -DVerifyIR=false.

@@ -3384,4 +3384,30 @@ public void test123_verifier(boolean warmup) {
// Expected
}
}

// acmp doesn't need substitutablity test when one input is known
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"substitutablity" -> "substitutability"

test124(42, testValue1);
}

// acmp doesn't need substitutablity test when one input null
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"substitutablity" -> "substitutability"

@@ -392,6 +394,7 @@ class ProfileData : public ResourceObj {
virtual bool is_ParametersTypeData() const { return false; }
virtual bool is_SpeculativeTrapData()const { return false; }
virtual bool is_ArrayLoadStoreData() const { return false; }
virtual bool is_ACmpData() const { return false; }
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is wrong.

@@ -780,6 +780,9 @@
product(bool, UseArrayLoadStoreProfile, true, \
"Take advantage of profiling at array load/store") \
\
product(bool, UseACmpProfile, false, \
"Take advantage of profiling at acmp") \
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not enable it by default?

}
}

void Parse::do_acmp(BoolTest::mask btest, Node* right, Node* left) {
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node* left, Node* right seems more natural. Also in the code below, where right is processed first.

return;
}
if (!left_inline_type) {
// Comparison with an object of known not to be an inline type
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "object of known"

return;
}
if (!right_inline_type) {
// Comparison with an object of known not to be an inline type
Copy link
Member

@TobiHartmann TobiHartmann Oct 12, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be "object of known"

@rwestrel
Copy link
Collaborator Author

rwestrel commented Oct 12, 2020

@TobiHartmann thanks for the review. I fixed the things you spotted and made this on by default.

Copy link
Member

@TobiHartmann TobiHartmann left a comment

Thanks for making these changes. Looks good to me.

@rwestrel
Copy link
Collaborator Author

rwestrel commented Oct 12, 2020

Thanks for making these changes. Looks good to me.

Thanks for the review.

@rwestrel
Copy link
Collaborator Author

rwestrel commented Oct 12, 2020

/integrate

@openjdk openjdk bot closed this Oct 12, 2020
@openjdk openjdk bot added integrated and removed ready rfr labels Oct 12, 2020
@openjdk
Copy link

openjdk bot commented Oct 12, 2020

@rwestrel Since your change was applied there has been 1 commit pushed to the lworld branch:

  • e815214: 8254326: [lworld] [lw3] Extend definition of empty inline types

Your commit was automatically rebased without conflicts.

Pushed as commit 378279c.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
3 participants