8325155: C2 SuperWord: remove alignment boundaries#18822
eme64 wants to merge 31 commits into openjdk:master
  }
#endif
  return true;
}

Note: compatibility with def used to be checked via alignment, but now we need to check via is_velt_basic_type_compatible_use_def. For reductions, we only check the "second" input.
  _vloop_analyzer(vloop_analyzer),
  _vloop(vloop_analyzer.vloop()),
  _arena(mtCompiler),
  _node_info(arena(), _vloop.estimated_body_length(), 0, SWNodeInfo::initial), // info needed per node

Note: _node_info held the "alignment" info; all other fields were already removed in previous refactorings.
src/hotspot/share/opto/superword.cpp
Outdated
  _arena(mtCompiler),
  _node_info(arena(), _vloop.estimated_body_length(), 0, SWNodeInfo::initial), // info needed per node
  _clone_map(phase()->C->clone_map()), // map of nodes created in cloning
  _align_to_ref(nullptr), // memory reference to align vectors to

Note: renamed it to _mem_ref_for_main_loop_alignment
  }
  if (longer_type_for_conversion(s) != T_ILLEGAL ||
      longer_type_for_conversion(t) != T_ILLEGAL) {
    align = align / data_size(s) * data_size(t);

Note: this check was there to ensure the type size of use/def nodes matches. This is now done by is_velt_basic_type_compatible_use_def.
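For readers unfamiliar with the old scheme, the scaling in the removed line can be illustrated with a small standalone sketch. The helper names scale_alignment and scale_alignments are hypothetical and not part of HotSpot; only the arithmetic align / data_size(s) * data_size(t) is taken from the diff above.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not HotSpot code): rescale a byte alignment when a value
// with from_size-byte elements is converted to to_size-byte elements.
// Mirrors the removed expression: align / data_size(s) * data_size(t).
int scale_alignment(int align, int from_size, int to_size) {
  assert(from_size > 0 && to_size > 0);
  return align / from_size * to_size;
}

// Apply the rescaling to a whole sequence of alignments.
std::vector<int> scale_alignments(const std::vector<int>& aligns, int from_size, int to_size) {
  std::vector<int> result;
  result.reserve(aligns.size());
  for (int align : aligns) {
    result.push_back(scale_alignment(align, from_size, to_size));
  }
  return result;
}
```

Under these assumptions, scale_alignments({0, 4, 8, 12}, 4, 8) gives 0, 8, 16, 24, which is the int-to-long case. A use whose expected alignments did not line up with the rescaled values would not have been packed.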
      return true;
    }
  }
  return true;

Note: we still check are_adjacent_refs, and non-memops don't need any alignment.
chhagedorn
left a comment
Great cleanup! I have some comments but otherwise, looks good.
src/hotspot/share/opto/superword.cpp
Outdated
  GrowableArray<const VPointer*> vpointers;

  // Collect all valid VPointers.
  for_each_mem([&] (const MemNode* mem, int bb_idx) {

The different parts of this method could be nicely put into separate methods, which would reduce the size of find_adjacent_memop_pairs():
GrowableArray<const VPointer*> vpointers;
collect_valid_vpointers(vpointers);
vpointers.sort();
// trace code
find_adjacent_memops(vpointers);
// trace code
The entire "find adjacent memop pairs" code could also be put into a separate class, but I leave it up to you to decide whether it's worth it.
A class would be nice, but I think I would have to pass around too much for that.
find_adjacent_memop_pairs_in_one_group requires some things like:
_do_vector_loop
same_origin_idx
can_pack_into_pair // especially this one
I'm not sure it is worth creating a separate class; I think it would become more complicated that way.
src/hotspot/share/opto/superword.cpp
Outdated
  });

  // Sort the VPointers. This does 2 things:
  // - Separate the VPointer into groups (e.g. all LoadI of the same base and invar). We only need to find adjacent memops inside

Maybe you should state here what a "group" is. It is only explained at cmp_for_sort_by_group().
src/hotspot/share/opto/superword.cpp
Outdated
    while (group_end < vpointers.length() &&
           VPointer::cmp_for_sort_by_group(
             vpointers.adr_at(group_start),
             vpointers.adr_at(group_end)
           ) == 0) {
      group_end++;

This is somewhat hard to read. How about putting this into a separate method? I.e.
int group_start = 0;
while (group_start < vpointers.length()) {
int group_end = find_group_end(vpointers, group_start); // <---- EXTRACTED to new method
find_adjacent_memop_pairs_in_group(vpointers, group_start, group_end);
group_start = group_end;
}
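To make the overall shape concrete, here is a minimal standalone sketch of the sort, split-into-groups, and linear-scan idea discussed in this thread. It is not the HotSpot implementation: MemRef, its fields, and find_adjacent_pairs are hypothetical stand-ins for VPointer and the real methods, and the "group" is reduced to a single integer id instead of a comparison over base, invar, and so on.

```cpp
#include <algorithm>
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical stand-in for a VPointer: all fields are assumptions for illustration.
struct MemRef {
  int id;      // identifies the memop
  int group;   // memops with equal group are comparable with each other
  int offset;  // byte offset within the group
  int size;    // element size in bytes
};

// Return the index one past the last element of the group starting at group_start.
std::size_t find_group_end(const std::vector<MemRef>& refs, std::size_t group_start) {
  std::size_t group_end = group_start + 1;
  while (group_end < refs.size() && refs[group_end].group == refs[group_start].group) {
    group_end++;
  }
  return group_end;
}

// Sort by (group, offset), then scan each group for refs whose offset is exactly
// one element size after another ref. Returns (first id, second id) pairs.
std::vector<std::pair<int, int>> find_adjacent_pairs(std::vector<MemRef> refs) {
  std::sort(refs.begin(), refs.end(), [](const MemRef& a, const MemRef& b) {
    return std::tie(a.group, a.offset, a.id) < std::tie(b.group, b.offset, b.id);
  });
  std::vector<std::pair<int, int>> pairs;
  std::size_t group_start = 0;
  while (group_start < refs.size()) {
    std::size_t group_end = find_group_end(refs, group_start);
    for (std::size_t i = group_start; i < group_end; i++) {
      int wanted = refs[i].offset + refs[i].size;
      // Offsets ascend inside a group, so stop as soon as we pass the wanted offset.
      for (std::size_t j = i + 1; j < group_end && refs[j].offset <= wanted; j++) {
        if (refs[j].offset == wanted) {
          pairs.emplace_back(refs[i].id, refs[j].id);
        }
      }
    }
    group_start = group_end;
  }
  return pairs;
}
```

Because the offsets inside a group are ascending, the inner loop only looks at a short window after each element, so the scan stays close to linear unless many memops share the same offset.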
  }
}

// Find adjacent memops for a single group, e.g. for all LoadI of the same base, invar, etc.

You should mention here that this method finally adds a new pair to the _pairset. On a separate note, find_adjacent_memop_pairs_in_group() suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name?
src/hotspot/share/opto/superword.cpp
Outdated
  return true;
}

bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, int idx) const {

Maybe add a comment here to quickly explain that compatible means "output size of the def node matches the input size of the use node".
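As a rough illustration of that definition of "compatible", the following standalone sketch compares the output size of a def with the input size of a use. ElemType, Op, and is_size_compatible_use_def are hypothetical names, not HotSpot types, and the sketch ignores any special cases the real method may handle (for example, reductions).

```cpp
// Hypothetical element types, a small subset for illustration only.
enum class ElemType { Byte, Short, Int, Long };

// Size of one element in bytes.
int size_in_bytes(ElemType t) {
  switch (t) {
    case ElemType::Byte:  return 1;
    case ElemType::Short: return 2;
    case ElemType::Int:   return 4;
    case ElemType::Long:  return 8;
  }
  return 0;  // unreachable for valid enum values
}

// Hypothetical operation: consumes elements of type input, produces elements of type output.
struct Op {
  ElemType input;
  ElemType output;
};

// Compatible means: the output size of the def matches the input size of the use.
bool is_size_compatible_use_def(const Op& use, const Op& def) {
  return size_in_bytes(def.output) == size_in_bytes(use.input);
}
```

For example, an int-to-long conversion modeled as Op{Int, Long} can consume an int add Op{Int, Int} and can feed a long add Op{Long, Long}, but a long add cannot directly consume the int add.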
src/hotspot/share/opto/superword.cpp
Outdated
    return true;
  }

  if (!is_velt_basic_type_compatible_use_def(use, u_idx)) {

Might be easier to directly put in def as defined on L2764 instead of passing the index to the def.
Good point. I think I used to require idx in a previous iteration, but not any more!
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
Thanks @chhagedorn for the review! I think I addressed all your points. Except for this:
I have not yet found a better name. I think |
FYI: I ran performance benchmarking, and there was no significant difference.

Thank you for running performance testing.

@vnkozlov thanks for the review! I will integrate as soon as the JDK24 fork happens ;)

Thanks @chhagedorn @vnkozlov for the reviews!

Going to push as commit 944aeb8.
Your commit was automatically rebased without conflicts.

I have tried for a very long time to get rid of all the alignment(n) code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.

I was able to remove lots of VM code, about 300 lines. And the removed code is, I think, much more complicated than the new code.

This is what I did in this PR:
- _node_info: used to have many fields, which I refactored out to the VLoopAnalyzer modules. alignment is the last component, which I now remove.
- [...] SuperWord::find_adjacent_refs, now SuperWord::find_adjacent_memop_pairs, completely:
  - [...] memops repeatedly, try to find some mem_ref and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
  - [...] memops into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and the sorting by offset also allows me to find adjacent pairs much more efficiently. In most cases this reduces the cost to O(n log n) for the sort, and a linear scan for finding adjacent memops.
  - [...] SuperWord::memory_alignment by int off_rem = offset % vw;.
  - ([...] 31, 32, which are adjacent in theory, but if we have a vw = 32, then the modulo-offsets are 31, 0, and they are not detected as adjacent).
- alignment used to have another important task: ensuring compatibility of the input-size of a use node with the output-size of the def-node.
  - [...] alignment, even the non-memop nodes. This alignment was then scaled up and down at type casts (e.g. int 0, 4, 8, 12 -> long 0, 8, 16, 24). If the output-size of the def-node did not match the input-size of the use-node, then the alignment would not match up, and we would not pack.
  - [...] alignment(s1) + data_size(s1) == alignment(s2) and s2_align == align + data_size(s1), and why we did set_alignment(s2, align + data_size(s1)); inside SuperWord::set_alignment(Node* s1, Node* s2, int align).
  - [...] SuperWord::profitable (bad name, it has always been more about checking consistency than profitability, but I will rename that in a future RFE). The relevant code is in SuperWord::is_velt_basic_type_compatible_use_def.
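The boundary problem with modulo offsets (31 and 32 with vw = 32) can be reproduced with a small standalone sketch. Both functions are hypothetical paraphrases, not the removed or the new HotSpot code: adjacent_by_modulo imitates an alignment(s1) + data_size(s1) == alignment(s2) style check on offset % vw, and adjacent_by_offset compares the raw offsets. An element size of 1 byte is assumed for the example.

```cpp
#include <cassert>

// Hypothetical paraphrase of the old check: compare offsets modulo the vector width vw.
bool adjacent_by_modulo(int off1, int off2, int size, int vw) {
  assert(vw > 0);
  return off1 % vw + size == off2 % vw;
}

// Hypothetical paraphrase of the new idea: compare the raw offsets directly.
bool adjacent_by_offset(int off1, int off2, int size) {
  return off1 + size == off2;
}
```

With off1 = 31, off2 = 32, size = 1 and vw = 32, the modulo offsets are 31 and 0, so adjacent_by_modulo returns false while adjacent_by_offset returns true. Pairs that do not straddle a multiple of vw, such as 30 and 31, are accepted by both.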
Issue
Reviewers
Reviewing
Using git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822
$ git checkout pull/18822
Update a local copy of the PR:
$ git checkout pull/18822
$ git pull https://git.openjdk.org/jdk.git pull/18822/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18822
View PR using the GUI difftool:
$ git pr show -t 18822
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18822.diff