Skip to content

8310308: IR Framework: check for type and size of vector nodes #14539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 81 commits into from

Conversation

eme64
Copy link
Contributor

@eme64 eme64 commented Jun 19, 2023

For some changes to SuperWord, and maybe auto-vectorization in general, I want to strengthen the IR Framework.

Motivation
I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and MaxVectorSize). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths as before - leading to performance loss.

How to use it

All IRNodes in test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java that are created with vectorNode are now all matched with their type and size. The regex might now look something like this:

"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z]\[8\]:\{int\})"
which would match with IR nodes dumped like that:
1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...

The goal was to keep it simple and straight forward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (failOn or comparison < or <= or =0) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using IRNode.VECTOR_SIZE.

Some examples:

  1. @IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "}) -> search for a LoadVector node with type int, and maximal size possible on the machine (limited by CPU features and MaxVectorSize). This is the most common use case.
  2. @IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR }) -> fail if there is a LoadVector with type long, of any size.
  3. @IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "}) -> find at least one XorV node with type int and exactly 4 elements. Useful for VectorAPI when the vector species is fixed.
  4. @IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " }) -> search for a LoadVector node with type double, and size exactly equals to min(4, max_double) (so 4 elements, or if the hardware allows fewer doubles, then that number).
  5. @IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" }) -> find at least one AbsV nodes with type float, and the size exactly equals to the smaller of LoopMaxUnroll or the maximal size allowed for floats (useful for tests where the LoopMaxUnroll is artificially lowered, which sometimes prevents the maximal filling of vectors).
  6. @IR(counts = {IRNode.VECTOR_CAST_I2F, IRNode.VECTOR_SIZE + "min(max_int, max_float)", ">0"}) -> find at least one VectorCastI2X node that casts to type float, and where the size is exactly equals to the smaller maximal size for ints and floats. This is helpful when there are multiple types in the loop, and the number of elements is limited by the sizes of multiple types.

I had to change lots of occurrences, hence you can find many more examples in the tests.

Details

Vector nodes that should be tested for type and size now are to be created with VECTOR_PREFIX and vectorNode, see IRNode.java.

When specifying such a vectorNode in an IR rule, one first uses the irNodePlaceholder (eg Load_VECTOR_I), and following it one can optionally add a IRNode.VECTOR_SIZE specifier, which is then parsed by parseVectorNodeSize. This allows either naming a concrete size (eg IRNode.VECTOR_SIZE_8), a tag (IRNode.VECTOR_SIZE + "<tag>") where the the tag can be one of the tags listed in parseVectorNodeSizeTag, or a min(...) clause which computes the minimum value of a comma separated list of tags. As a last resort one can match for any size (IRNode.VECTOR_SIZE_ANY).

The maximal vector size for any type is computed in getMaxElementsForType, under consideration of the CPU features and the MaxVectorSize.

Changes to tests

Unfortunately, I had to change a lot of IR rules, though not substantially. Most changes are because we usually had nodes like MAX_V or LOAD_VECTOR which matched for any type, and I had to create one node per type now (eg MAX_VF, MAX_VD, or LOAD_VECTOR_I, LOAD_VECTOR_L, LOAD_VECTOR_F, ...). While this was a lot of work, it is still good to know that we are generating the nodes with the correct types.

In the VectorAPI tests there were many which required concrete sizes due to the concrete size of the vector species. This is nice to test, since it guarantees that the vector species indeed generate the expected vector sizes.

A few tests required more attention, where I had to use patterns like IRNode.VECTOR_SIZE + "min(...)". These are especially interesting, as they test cases like mixed types (eg casting between types).

A few tests had loop iteration counts that were too small (maybe 512), such that the loops were not sufficiently unrolled to reach the maximal vector width. This happened especially with byte cases, which require an unrolling factor of 64 to fill 512bit registers. I increased the loop iteration counts, such that we can also properly test the largest vector widths. This improves our test coverage.

Future Work

There are a few nodes that I did not yet handle with vectorNode (eg VECTOR_REINTERPRET, OR_V_MASK, MACRO_LOGIC_V, LOAD_VECTOR_GATHER(_MASKED)). Some of these only have very few tests and are all from the Vector API which was not my priority here. They can easily be converted should the need arise in the future.

While looking at lots of IR tests I also came up with these RFE's:
JDK-8310891 C2 SuperWord tests: move platform requirements to IR rules
JDK-8310523 Add IR tests for nodes that have too few IR tests yet
JDK-8310533 [IR Framework] Add possibility to automatically verify that a test method always returns the same result

Testing

tier1-tier6 and stress-testing.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8310308: IR Framework: check for type and size of vector nodes (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539
$ git checkout pull/14539

Update a local copy of the PR:
$ git checkout pull/14539
$ git pull https://git.openjdk.org/jdk.git pull/14539/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 14539

View PR using the GUI difftool:
$ git pr show -t 14539

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14539.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 19, 2023

👋 Welcome back epeter! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jun 19, 2023

@eme64 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Jun 19, 2023
@openjdk
Copy link

openjdk bot commented Jun 21, 2023

@eme64 this pull request can not be integrated into master due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout JDK-8310308
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push

@openjdk openjdk bot added the merge-conflict Pull request has merge conflict with target branch label Jun 21, 2023
@openjdk openjdk bot removed the merge-conflict Pull request has merge conflict with target branch label Jun 21, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 11, 2023
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 11, 2023
Copy link
Member

@TobiHartmann TobiHartmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

TestFormat.checkNoReport(!vmInfo.canTrustVectorSize(), "sanity");
// If we have a size specified but cannot trust the size, and must check an upper
// bound, this can be impossible to count correctly - if we have an incorrect size
// we may count either too many nodes. We just create a impossible regex which will
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// we may count either too many nodes. We just create a impossible regex which will
// we may count either too many nodes. We just create an impossible regex which will

@@ -87,7 +87,7 @@ static float[] test() {
}
```

However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, the default size is `IRNode.VECTOR_SIZE_MAX`, except when using `failOn` or `counts` with comparisons `<`, `<=` or `=0`, where we have a default of `IRNode.VECTOR_SIZE_ANY`.
However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (cascade lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (cascade lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java).
However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (Cascade Lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java).

Same for other occurrences.

}
case "=" -> {
// if 0, we expect none -> expect to not find any with any size
return comparison.getGivenValue() > 0;
// "=0" is same as setting upper bound - just like for failOn. But i we compare equals a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// "=0" is same as setting upper bound - just like for failOn. But i we compare equals a
// "=0" is same as setting upper bound - just like for failOn. But if we compare equals a

// if 0, we expect none -> expect to not find any with any size
return comparison.getGivenValue() > 0;
// "=0" is same as setting upper bound - just like for failOn. But i we compare equals a
// strictly positive number it is like setting both and upper and lower bound (equal).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// strictly positive number it is like setting both and upper and lower bound (equal).
// strictly positive number it is like setting both upper and lower bound (equal).

Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have only some minor comments left, otherwise, the update looks good to me!

@@ -105,6 +105,8 @@ public class IRNode {
private static final String STORE_OF_CLASS_POSTFIX = "(:|\\+)\\S* \\*" + END;
private static final String LOAD_OF_CLASS_POSTFIX = "(:|\\+)\\S* \\*" + END;

public static final String IMPOSSIBLE_NODE_REGEX = "impossible_node_regex";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add additional # to be on the safe side to never accidentally match it:

Suggested change
public static final String IMPOSSIBLE_NODE_REGEX = "impossible_node_regex";
public static final String IMPOSSIBLE_NODE_REGEX = "#impossible_node_regex#";

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line is now removed, not required any more.

Comment on lines 56 to 67
System.err.println("--- VMInfo from Test VM ---");
System.err.println("cpuFeatures: " + getStringValue("cpuFeatures"));
System.err.println("MaxVectorSize: " + getLongValue("MaxVectorSize"));
System.err.println("MaxVectorSizeIsDefault: " + getLongValue("MaxVectorSizeIsDefault"));
System.err.println("LoopMaxUnroll: " + getLongValue("LoopMaxUnroll"));
System.err.println("UseAVX: " + getLongValue("UseAVX"));
System.err.println("UseAVXIsDefault: " + getLongValue("UseAVXIsDefault"));
if (isDefaultCascadeLake()) {
System.err.println(" -> You are on default Cascade Lake");
System.err.println(" -> SuperWord expected to run with 32 byte, not 64 byte, VectorAPI expected to use 64 byte");
System.err.println(" -> \"canTrustVectorSize == false\", some vector node IR rules are made weaker.");
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a leftover from debugging? If you want to print this information for debugging purposes, I suggest to move this code to VMInfoParser and additionally guard it with VERBOSE || PRINT_IR_ENCODING. The name PRINT_IR_ENCODING is not completely correct here but we might want to clean this up separately at some other point in time.

You can keep the verification of calling the get*() methods here, though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can just remove it. It is not necessary any more I think.

Comment on lines 107 to 108
* make use of the full MaxVectorSize. For Cascade Lake we by default only use
* 32 bytes for SuperWord even though MaxVectorSize is 64. But the VectorAPI still
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: For Cascade Lake, we only use 32 bytes for SuperWord by default even though MaxVectorSize is 64.

eme64 and others added 2 commits August 14, 2023 15:59
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated labels Aug 15, 2023
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Aug 15, 2023
Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates, looks good!

@eme64
Copy link
Contributor Author

eme64 commented Aug 15, 2023

Thanks @TobiHartmann @chhagedorn for all the discussions, help and reviews!
/integrate

@openjdk
Copy link

openjdk bot commented Aug 15, 2023

Going to push as commit a02d65e.
Since your change was applied there have been 3 commits pushed to the master branch:

  • dff99f7: 8313782: Add user-facing warning if THPs are enabled but cannot be used
  • f4e72c5: 8313949: Missing word in GPLv2 license text in StackMapTableAttribute.java
  • 6338927: 8314197: AttachListener::pd_find_operation always returning nullptr

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Aug 15, 2023
@openjdk openjdk bot closed this Aug 15, 2023
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Aug 15, 2023
@openjdk
Copy link

openjdk bot commented Aug 15, 2023

@eme64 Pushed as commit a02d65e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated
Development

Successfully merging this pull request may close these issues.

3 participants