Skip to content

8325155: C2 SuperWord: remove alignment boundaries#18822

Closed
eme64 wants to merge 31 commits intoopenjdk:masterfrom
eme64:JDK-8325155-rm-alignment-boundaries
Closed

8325155: C2 SuperWord: remove alignment boundaries#18822
eme64 wants to merge 31 commits intoopenjdk:masterfrom
eme64:JDK-8325155-rm-alignment-boundaries

Conversation

@eme64
Copy link
Contributor

@eme64 eme64 commented Apr 17, 2024

I have tried for a very long time to get rid of all the alignment(n) code that is all over the SuperWord code. With lots of previous work, I am now finally ready to remove it.

I was able to remove lots of VM code, about 300 lines. And the removed code is I think much more complicated than the new code.

This is what I did in this PR:

  • Removal of _node_info: used to have many fields, which I refactored out to the VLoopAnalyzer modules. alignment is the last component, which I now remove.
  • Changed the implementation of SuperWord::find_adjacent_refs, now SuperWord::find_adjacent_memop_pairs, completely:
    • It used to be an algorithm that would scan over all memops repeatedly, try to find some mem_ref and see which other memops were comparable, and then pack pairs for all of those, by comparing all-vs-all memops. This algorithm is at least quadratic, if not much worse.
    • I now add all memops into a single array, sort them by groups (those that are comparable with each other and could be packed into vectors), and inside the groups by ascending offset. This allows me to split off the groups much more efficiently, and also the sorting by offset allows me finding adjacent pairs much more efficiently. In the most cases this reduces the cost to O(n log n) for sort, and a linear scan for finding adjacent memops.
  • I removed the "alignment boundaries" created in SuperWord::memory_alignment by int off_rem = offset % vw;.
    • This used to have the effect that all offsets were computed modulo the vector width. Hence, pairs could not be packed across this boundary (e.g. we have nodes with offsets 31, 32, which are adjacent in theory, but if we have a vw = 32, then the modulo-offsets are 31, 0, and they are not detected as adjacent).
    • These "alignment boundaries" used to be required for correctness about a year ago, before I fixed and relaxed much of the alignment code.
  • The alignment used to have another important task: Ensuring compatibility of the input-size of a use node, with the output-size of the def-node.
    • This was done by giving all nodes an alignment, even the non-memop nodes. This alignment was then scaled up and down at type casts (e.g. int 0, 4, 8, 12 -> long 0, 8, 16, 24). If the output-size of the def-node did not match the input-size of the use-node, then the alignment would not match up, and we would not pack.
    • This is why we used to have checks like alignment(s1) + data_size(s1) == alignment(s2) and s2_align == align + data_size(s1), and why we did set_alignment(s2, align + data_size(s1)); inside SuperWord::set_alignment(Node* s1, Node* s2, int align).
    • I decided to NOT check if use/def type sizes match during packing, but only much later in SuperWord::profitable (bad name, it has always been more about checking consistency than profitability, but I will rename that in a Future RFE). The relevant code is in SuperWord::is_velt_basic_type_compatible_use_def.

Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8325155: C2 SuperWord: remove alignment boundaries (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18822/head:pull/18822
$ git checkout pull/18822

Update a local copy of the PR:
$ git checkout pull/18822
$ git pull https://git.openjdk.org/jdk.git pull/18822/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 18822

View PR using the GUI difftool:
$ git pr show -t 18822

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18822.diff

Webrev

Link to Webrev Comment

@bridgekeeper
Copy link

bridgekeeper bot commented Apr 17, 2024

👋 Welcome back epeter! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Apr 17, 2024

@eme64 This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8325155: C2 SuperWord: remove alignment boundaries

Reviewed-by: chagedorn, kvn

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 165 new commits pushed to the master branch:

  • 2a37764: 8333743: Change .jcheck/conf branches property to match valid branches
  • 75dc2f8: 8330182: Start of release updates for JDK 24
  • 054362a: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
  • 9b436d0: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
  • 487c477: 8333647: C2 SuperWord: some additional PopulateIndex tests
  • d02cb74: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
  • 02f2404: 8333560: -Xlint:restricted does not work with --release
  • 606df44: 8332670: C1 clone intrinsic needs memory barriers
  • 33fd6ae: 8333622: ubsan: relocInfo_x86.cpp:101:56: runtime error: pointer index expression with base (-1) overflowed
  • 8de5d20: 8332865: ubsan: os::attempt_reserve_memory_between reports overflow
  • ... and 155 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

➡️ To integrate this PR with the above commit message to the master branch, type /integrate in a new comment.

@openjdk openjdk bot changed the title 8325155 8325155: C2 SuperWord: remove alignment boundaries Apr 17, 2024
@openjdk
Copy link

openjdk bot commented Apr 17, 2024

@eme64 The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-compiler hotspot-compiler-dev@openjdk.org label Apr 17, 2024
}
#endif
return true;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: compatibility with def used to be checked via alignment, but now we need to check via is_velt_basic_type_compatible_use_def. For reductions, we only check the "second" input.

_vloop_analyzer(vloop_analyzer),
_vloop(vloop_analyzer.vloop()),
_arena(mtCompiler),
_node_info(arena(), _vloop.estimated_body_length(), 0, SWNodeInfo::initial), // info needed per node
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: held the "alignment" info, all other fields were already removed in previous refactorings.

_arena(mtCompiler),
_node_info(arena(), _vloop.estimated_body_length(), 0, SWNodeInfo::initial), // info needed per node
_clone_map(phase()->C->clone_map()), // map of nodes created in cloning
_align_to_ref(nullptr), // memory reference to align vectors to
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: renamed it to _mem_ref_for_main_loop_alignment

}
if (longer_type_for_conversion(s) != T_ILLEGAL ||
longer_type_for_conversion(t) != T_ILLEGAL) {
align = align / data_size(s) * data_size(t);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: this check was there to ensure the type size of use/def nodes matches. This is now done by is_velt_basic_type_compatible_use_def.

return true;
}
}
return true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: we still check are_adjacent_refs, and non-memops don't need any alignment.

@eme64 eme64 marked this pull request as ready for review May 13, 2024 07:59
@openjdk openjdk bot added the rfr Pull request is ready for review label May 13, 2024
@mlbridge
Copy link

mlbridge bot commented May 13, 2024

Webrevs

Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great cleanup! I have some comments but otherwise, looks good.

GrowableArray<const VPointer*> vpointers;

// Collect all valid VPointers.
for_each_mem([&] (const MemNode* mem, int bb_idx) {
Copy link
Member

@chhagedorn chhagedorn May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The different parts of this method could be nicely put into separate methods which reduces the size of find_adjacent_memop_pairs().

GrowableArray<const VPointer*> vpointers;
collect_valid_vpointers(vpointers);
vpointers.sort();
// trace code
find_adjacent_memops(vpointers);
// trace code

The entire "find adjacent memop pairs" code could also be put into a separate class but I leave it up to you to decide if it's worth or not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A class would be nice, but I think I would have to pass around too much for that.
find_adjacent_memop_pairs_in_one_group requires some things like:

_do_vector_loop
same_origin_idx
can_pack_into_pair // especially this one

Not sure it is worth creating a separate class, I think it would become more complicated that way.

});

// Sort the VPointers. This does 2 things:
// - Separate the VPointer into groups (e.g. all LoadI of the same base and invar). We only need to find adjacent memops inside
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should state here what a "group" is. It is only explained at cmp_for_sort_by_group().

Comment on lines 520 to 525
while (group_end < vpointers.length() &&
VPointer::cmp_for_sort_by_group(
vpointers.adr_at(group_start),
vpointers.adr_at(group_end)
) == 0) {
group_end++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is somewhat hard to read. How about putting this into a separate method? I.e.

  int group_start = 0;
  while (group_start < vpointers.length()) {
    int group_end = find_group_end(vpointers, group_start); // <---- EXTRACTED to new method
    find_adjacent_memop_pairs_in_group(vpointers, group_start, group_end);
    group_start = group_end;
  }

}
}

// Find adjacent memops for a single group, e.g. for all LoadI of the same base, invar, etc.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should mention here that this method finally adds a new pair to the _pairset. On a separate note, find_adjacent_memop_pairs_in_group() suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name?

return true;
}

bool SuperWord::is_velt_basic_type_compatible_use_def(Node* use, int idx) const {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment here to quickly explain that compatible means "output size of the def node matches the input size of the use node".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

return true;
}

if (!is_velt_basic_type_compatible_use_def(use, u_idx)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to directly put in def as defined on L2764 instead of passing the index to the def.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think I used to require idx in a previous iteration, but not any more!

eme64 and others added 3 commits May 28, 2024 18:47
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
@openjdk
Copy link

openjdk bot commented May 28, 2024

⚠️ @eme64 This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@eme64
Copy link
Contributor Author

eme64 commented May 28, 2024

Thanks @chhagedorn for the review! I think I addressed all your points. Except for this:

You should mention here that this method finally adds a new pair to the _pairset. On a separate note, find_adjacent_memop_pairs_in_group() suggests that we find something but we actually "find and add" something without returning anything from the method. Should we make this more clear in the method name?

I have not yet found a better name. I think find is ok here. I guess we could have it be find_and_add but this seems unnecessarily long to me.

Copy link
Member

@chhagedorn chhagedorn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the updates, looks good now! find -> create is a good solution :-)

@openjdk openjdk bot added the ready Pull request is ready to be integrated label May 29, 2024
Co-authored-by: Christian Hagedorn <christian.hagedorn@oracle.com>
@eme64
Copy link
Contributor Author

eme64 commented May 30, 2024

FYI: I ran performance benchmarking, and there was no significant difference.

Copy link
Contributor

@vnkozlov vnkozlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved.

@vnkozlov
Copy link
Contributor

vnkozlov commented Jun 4, 2024

Thank you for running performance testing.

@eme64
Copy link
Contributor Author

eme64 commented Jun 4, 2024

@vnkozlov thanks for the review! I will integrate as soon as the JDK24 fork happens ;)

@eme64
Copy link
Contributor Author

eme64 commented Jun 7, 2024

Thanks @chhagedorn @vnkozlov for the reviews!
/integrate

@openjdk
Copy link

openjdk bot commented Jun 7, 2024

Going to push as commit 944aeb8.
Since your change was applied there have been 167 commits pushed to the master branch:

  • d8af589: 8026127: Deflater/Inflater documentation incomplete/misleading
  • 6238bc8: 8333456: CompactNumberFormat integer parsing fails when string has no suffix
  • 2a37764: 8333743: Change .jcheck/conf branches property to match valid branches
  • 75dc2f8: 8330182: Start of release updates for JDK 24
  • 054362a: 8332550: [macos] Voice Over: java.awt.IllegalComponentStateException: component must be showing on the screen to determine its location
  • 9b436d0: 8333674: Disable CollectorPolicy.young_min_ergo_vm for PPC64
  • 487c477: 8333647: C2 SuperWord: some additional PopulateIndex tests
  • d02cb74: 8333270: HandlersOnComplexResetUpdate and HandlersOnComplexUpdate tests fail with "Unexpected reference" if timeoutFactor is less than 1/3
  • 02f2404: 8333560: -Xlint:restricted does not work with --release
  • 606df44: 8332670: C1 clone intrinsic needs memory barriers
  • ... and 157 more: https://git.openjdk.org/jdk/compare/da6aa2a86c86ba5fce747b36dcb2d6001cfcc44e...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot added the integrated Pull request has been integrated label Jun 7, 2024
@openjdk openjdk bot closed this Jun 7, 2024
@openjdk openjdk bot removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jun 7, 2024
@openjdk
Copy link

openjdk bot commented Jun 7, 2024

@eme64 Pushed as commit 944aeb8.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot-compiler hotspot-compiler-dev@openjdk.org integrated Pull request has been integrated

Development

Successfully merging this pull request may close these issues.

3 participants