8310308: IR Framework: check for type and size of vector nodes#14539
eme64 wants to merge 81 commits into openjdk:master
@eme64 this pull request can not be integrated into master due to merge conflicts. To resolve them and update this pull request:
git checkout JDK-8310308
git fetch https://git.openjdk.org/jdk.git master
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge master"
git push
Suggested change:
- // we may count either too many nodes. We just create a impossible regex which will
+ // we may count either too many nodes. We just create an impossible regex which will
Suggested change:
- However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (cascade lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java).
+ However, the size does not have to be specified. In most cases, one either wants to have vectorization at the maximal possible vector width, or no vectorization at all. Hence, for lower bound counts ('>' or '>=') the default size is `IRNode.VECTOR_SIZE_MAX`, and for upper bound counts ('<' or '<=' or '=0' or failOn) the default is `IRNode.VECTOR_SIZE_ANY`. Equal count comparisons with a strictly positive count (e.g. '=2') are not allowed for vector nodes. On machines with 'canTrustVectorSize == false' (Cascade Lake) the maximal vector width is not predictable currently. Hence, on such a machine we have to automatically weaken the IR rules. All lower bound counts are performed checking with `IRNode.VECTOR_SIZE_ANY`. Upper bound counts with no user specified size are performed with `IRNode.VECTOR_SIZE_ANY` but upper bound counts with a user specified size are not checked at all. Details and reasoning can be found in [RawIRNode](./driver/irmatching/irrule/checkattribute/parsing/RawIRNode.java).
Same for other occurrences.
Suggested change:
- // "=0" is same as setting upper bound - just like for failOn. But i we compare equals a
+ // "=0" is same as setting upper bound - just like for failOn. But if we compare equals a
Suggested change:
- // strictly positive number it is like setting both and upper and lower bound (equal).
+ // strictly positive number it is like setting both upper and lower bound (equal).
chhagedorn left a comment
I have only some minor comments left, otherwise, the update looks good to me!
Maybe add additional # to be on the safe side to never accidentally match it:
- public static final String IMPOSSIBLE_NODE_REGEX = "impossible_node_regex";
+ public static final String IMPOSSIBLE_NODE_REGEX = "#impossible_node_regex#";
Line is now removed, not required any more.
Is this a leftover from debugging? If you want to print this information for debugging purposes, I suggest to move this code to VMInfoParser and additionally guard it with VERBOSE || PRINT_IR_ENCODING. The name PRINT_IR_ENCODING is not completely correct here but we might want to clean this up separately at some other point in time.
You can keep the verification of calling the get*() methods here, though.
I can just remove it. It is not necessary any more I think.
Suggestion: For Cascade Lake, we only use 32 bytes for SuperWord by default even though MaxVectorSize is 64.
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
chhagedorn left a comment
Thanks for the updates, looks good!
Thanks @TobiHartmann @chhagedorn for all the discussions, help and reviews!
Going to push as commit a02d65e.
Your commit was automatically rebased without conflicts.
For some changes to `SuperWord`, and maybe auto-vectorization in general, I want to strengthen the IR Framework.
Motivation
I want to not just find the relevant IR nodes, but also assert that they have the maximal length that they could have on the respective platform (given the CPU features and `MaxVectorSize`). Without this verification it is possible that a future change leads to a regression where we still vectorize but at shorter vector widths than before, leading to performance loss.
How to use it
All `IRNode`s in `test/hotspot/jtreg/compiler/lib/ir_framework/IRNode.java` that are created with `vectorNode` are now all matched with their `type` and `size`. The regex might now look something like this:
`"(\d+(\s){2}(VectorCastF2X.*)+(\s){2}===.*vector[A-Za-z]\[8\]:\{int\})"`
which would match with IR nodes dumped like that:
`1150 VectorCastF2X === _ 1151 [[ 1146 ]] #vectory[8]:{int} ...`
The goal was to keep it simple and straightforward. In most cases, you can just use the nodes as before, and implicitly we now check for maximal size automatically. However, in some cases we want to ensure there is no or only a limited number of nodes (`failOn` or comparison `<` or `<=` or `=0`) - in those cases we usually want to make sure there is not any node of any size, so we match with any size by default. The size can also explicitly be constrained using `IRNode.VECTOR_SIZE`.
Some examples:
- `@IR(counts = {IRNode.LOAD_VECTOR_I, " >0 "})` -> search for a `LoadVector` node with `type` `int`, and maximal `size` possible on the machine (limited by CPU features and `MaxVectorSize`). This is the most common use case.
- `@IR(failOn = { IRNode.LOAD_VECTOR_L, IRNode.STORE_VECTOR })` -> fail if there is a `LoadVector` with type `long`, of `any` size.
- `@IR(counts = { IRNode.XOR_VI, IRNode.VECTOR_SIZE_4, " > 0 "})` -> find at least one `XorV` node with type `int` and exactly `4` elements. Useful for VectorAPI when the vector species is fixed.
- `@IR(counts = { IRNode.LOAD_VECTOR_D, IRNode.VECTOR_SIZE + "min(4, max_double)", " >0 " })` -> search for a `LoadVector` node with `type` `double`, and `size` exactly equal to `min(4, max_double)` (so 4 elements, or if the hardware allows fewer `doubles`, then that number).
- `@IR(counts = { IRNode.ABS_VF, IRNode.VECTOR_SIZE + "min(LoopMaxUnroll, max_float)", ">= 1" })` -> find at least one `AbsV` node with type `float`, and the `size` exactly equal to the smaller of `LoopMaxUnroll` or the maximal size allowed for `floats` (useful for tests where the `LoopMaxUnroll` is artificially lowered, which sometimes prevents the maximal filling of vectors).
- `@IR(counts = {IRNode.VECTOR_CAST_I2F, IRNode.VECTOR_SIZE + "min(max_int, max_float)", ">0"})` -> find at least one `VectorCastI2X` node that casts to type `float`, and where the size is exactly equal to the smaller of the maximal sizes for `ints` and `floats`. This is helpful when there are multiple types in the loop, and the number of elements is limited by the sizes of multiple types.
I had to change lots of occurrences, hence you can find many more examples in the tests.
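To make the matching concrete, here is a minimal, self-contained Java sketch that applies the regex quoted above to dumped node lines. It does not use the IR Framework itself. The class name `VectorRegexSketch` and the sample lines are made up for illustration, and the sketch assumes that the dump separates the node index, the node name and `===` by two whitespace characters, as the `(\s){2}` groups require.

```java
import java.util.regex.Pattern;

// Illustrative only: not part of the IR Framework.
class VectorRegexSketch {
    // Regex quoted in the description: a VectorCastF2X node with type int and size 8.
    static final Pattern CAST_F2X_INT_8 = Pattern.compile(
        "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z]\\[8\\]:\\{int\\})");

    // Returns true if the dumped IR line contains a matching vector node.
    static boolean matches(String irLine) {
        return CAST_F2X_INT_8.matcher(irLine).find();
    }

    public static void main(String[] args) {
        // Hypothetical dump lines; only the vector type suffix differs.
        String size8 = " 1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectory[8]:{int}";
        String size4 = " 1150  VectorCastF2X  === _ 1151  [[ 1146 ]]  #vectorx[4]:{int}";
        System.out.println(matches(size8)); // true: type int, 8 elements
        System.out.println(matches(size4)); // false: only 4 elements
    }
}
```

A node that is still vectorized but at a shorter width no longer matches, which is the regression the stricter rules are meant to catch.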
Details
Vector nodes that should be tested for `type` and `size` now are to be created with `VECTOR_PREFIX` and `vectorNode`, see `IRNode.java`.
When specifying such a `vectorNode` in an IR rule, one first uses the `irNodePlaceholder` (eg `LOAD_VECTOR_I`), and following it one can optionally add an `IRNode.VECTOR_SIZE` specifier, which is then parsed by `parseVectorNodeSize`. This allows either naming a concrete size (eg `IRNode.VECTOR_SIZE_8`), a tag (`IRNode.VECTOR_SIZE + "<tag>"`) where the tag can be one of the tags listed in `parseVectorNodeSizeTag`, or a `min(...)` clause which computes the minimum value of a comma separated list of tags. As a last resort one can match for any size (`IRNode.VECTOR_SIZE_ANY`).
The maximal vector size for any type is computed in `getMaxElementsForType`, under consideration of the CPU features and the `MaxVectorSize`.
Changes to tests
Unfortunately, I had to change a lot of IR rules, though not substantially. Most changes are because we usually had nodes like `MAX_V` or `LOAD_VECTOR` which matched for any type, and I had to create one node per type now (eg `MAX_VF`, `MAX_VD`, or `LOAD_VECTOR_I`, `LOAD_VECTOR_L`, `LOAD_VECTOR_F`, ...). While this was a lot of work, it is still good to know that we are generating the nodes with the correct types.
In the VectorAPI tests there were many which required concrete sizes due to the concrete size of the vector species. This is nice to test, since it guarantees that the vector species indeed generate the expected vector sizes.
A few tests required more attention, where I had to use patterns like `IRNode.VECTOR_SIZE + "min(...)"`. These are especially interesting, as they test cases like mixed types (eg casting between types).
A few tests had loop iteration counts that were too small (maybe 512), such that the loops were not sufficiently unrolled to reach the maximal vector width. This happened especially with byte cases, which require an unrolling factor of 64 to fill 512bit registers. I increased the loop iteration counts, such that we can also properly test the largest vector widths. This improves our test coverage.
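The size specifiers described under Details can be sketched in a few lines of Java. This is a simplified stand-in, not the framework's `parseVectorNodeSize` or `getMaxElementsForType`. The class `VectorSizeSketch` and its methods are hypothetical, the tag name `max_byte` is assumed by analogy with `max_int`, `max_float` and `max_double`, and CPU features are ignored: the maximal element count is assumed to be simply `MaxVectorSize` in bytes divided by the element size.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; the real logic lives in the IR Framework and also
// takes CPU features into account.
class VectorSizeSketch {
    // Builds the tag table for a given MaxVectorSize (in bytes) and LoopMaxUnroll.
    static Map<String, Integer> tags(int maxVectorSizeBytes, int loopMaxUnroll) {
        Map<String, Integer> t = new HashMap<>();
        t.put("max_byte", maxVectorSizeBytes / Byte.BYTES);
        t.put("max_int", maxVectorSizeBytes / Integer.BYTES);
        t.put("max_float", maxVectorSizeBytes / Float.BYTES);
        t.put("max_double", maxVectorSizeBytes / Double.BYTES);
        t.put("LoopMaxUnroll", loopMaxUnroll);
        return t;
    }

    // Resolves one operand: a tag such as "max_double" or a literal such as "4".
    static int operand(String s, Map<String, Integer> tags) {
        String trimmed = s.trim();
        Integer tagValue = tags.get(trimmed);
        return tagValue != null ? tagValue : Integer.parseInt(trimmed);
    }

    // Resolves "8", "max_int" or "min(4, max_double)" to a number of elements.
    static int resolve(String spec, Map<String, Integer> tags) {
        String trimmed = spec.trim();
        if (trimmed.startsWith("min(") && trimmed.endsWith(")")) {
            int result = Integer.MAX_VALUE;
            String inner = trimmed.substring(4, trimmed.length() - 1);
            for (String part : inner.split(",")) {
                result = Math.min(result, operand(part, tags));
            }
            return result;
        }
        return operand(trimmed, tags);
    }

    public static void main(String[] args) {
        Map<String, Integer> wide = tags(64, 16);   // 512-bit vectors
        Map<String, Integer> narrow = tags(16, 16); // 128-bit vectors
        System.out.println(resolve("max_byte", wide));                  // 64
        System.out.println(resolve("min(4, max_double)", wide));        // 4
        System.out.println(resolve("min(4, max_double)", narrow));      // 2
        System.out.println(resolve("min(max_int, max_float)", narrow)); // 4
    }
}
```

Under these assumptions, 64-byte vectors give 64 `byte` elements per vector, which is why the byte tests needed an unrolling factor of 64 and larger loop iteration counts.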
Future Work
There are a few nodes that I did not yet handle with `vectorNode` (eg `VECTOR_REINTERPRET`, `OR_V_MASK`, `MACRO_LOGIC_V`, `LOAD_VECTOR_GATHER(_MASKED)`). Some of these only have very few tests and are all from the Vector API which was not my priority here. They can easily be converted should the need arise in the future.
While looking at lots of IR tests I also came up with these RFE's:
JDK-8310891 C2 SuperWord tests: move platform requirements to IR rules
JDK-8310523 Add IR tests for nodes that have too few IR tests yet
JDK-8310533 [IR Framework] Add possibility to automatically verify that a test method always returns the same result
Testing
tier1-tier6 and stress-testing.
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/14539/head:pull/14539
$ git checkout pull/14539
Update a local copy of the PR:
$ git checkout pull/14539
$ git pull https://git.openjdk.org/jdk.git pull/14539/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 14539
View PR using the GUI difftool:
$ git pr show -t 14539
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/14539.diff