JDK-8277178: Reduce the priority of data dependent nodes when OptoScheduling enabled #6407
I'm not too familiar with this code, but I gave it a quick run through our performance testing and all results look good except for the MonteCarlo benchmark from SPECjvm2008 with G1, which shows a 1% regression. It could just be run-to-run variance, but it is better to double-check. Also, I think it would be good to add a JMH benchmark for this.
@sunny868 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
I'm very sorry. I have not been receiving mailing list messages recently, so I did not see your feedback in time.
I found that after the ScheduleAndBundle() phase, the instruction sequence (which is good after lcm.cpp) is adjusted again, and the adjusted sequence is data dependent. Maybe it would be more reasonable to stop using ScheduleAndBundle(). I'm not sure about this.
In addition to aarch64, C2's scheduler is used on some x86 platforms, so you cannot just remove part of it without proving that it helps on them too. I would suggest to investigate
I have added a new benchmark test file (InstructionScheduling.java). I tested it on the LoongArch and MIPS64 architectures and observed a significant performance improvement (about 30%). However, the improvement on aarch64 is not obvious, and the test jitter is very large. I don't have a Silvermont or Centerton x86 machine, so could you run the performance testing again for this patch? Thank you very much.
@@ -0,0 +1,44 @@
/*
 * Copyright (c) 2022, Loongson Technology. All rights reserved.
Hi @sunny868 ,
We already have the copyright header for Loongson like the following.
./hotspot/jtreg/compiler/print/PrintCompileQueue.java: * Copyright (c) 2019, Loongson Technology Co. Ltd. All rights reserved.
./hotspot/jtreg/compiler/compilercontrol/CompilationModeHighOnlyTest.java: * Copyright (c) 2019, Loongson Technology Co. Ltd. All rights reserved.
./hotspot/jtreg/compiler/profiling/TestProfileCounterOverflow.java: * Copyright (c) 2019, Loongson Technology Co. Ltd. All rights reserved.
I'm not sure whether it had been changed.
typo
From many test results for InstructionScheduling.java on LoongArch and aarch64,
        for (int i=0; i<N; i++) {
            D[i] += D[i] * fval;
            D[i] += D[i] / fval;
            I[i] += I[i]*2;
Why is there no space before and after `*`?
    @Benchmark
    public void testMethod(){
        for (int i=0; i<N; i++) {
I would suggest adding spaces around `=` and `<`.
Thanks.
test/micro/org/openjdk/bench/vm/compiler/InstructionScheduling.java
        for (int i = 0; i < N; i++) {
            D[i] += D[i] * fval;
            D[i] += D[i] / fval;
            I[i] += I[i] * 2;
Should we define a variable for `2`, like the one for `2.00`?
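As a rough illustration of this suggestion, the loop under review can be written as a plain Java method with both multipliers named. This is only a sketch and not the contents of InstructionScheduling.java: the class name, the `kernel` method, the constant `IVAL`, and the initial array values are hypothetical, and the JMH annotations are left out so that the snippet runs on its own.

```java
import java.util.Arrays;

public class SchedulingKernelSketch {
    static final double FVAL = 2.00; // floating-point factor, as discussed in the review
    static final int IVAL = 2;       // hypothetical named constant replacing the literal 2

    // Same three statements as the loop body under review: a dependent
    // multiply/divide chain on the double array next to an independent
    // integer update.
    static void kernel(double[] d, int[] ints) {
        for (int i = 0; i < d.length; i++) {
            d[i] += d[i] * FVAL;
            d[i] += d[i] / FVAL;
            ints[i] += ints[i] * IVAL;
        }
    }

    public static void main(String[] args) {
        double[] d = new double[1000]; // array length is an assumption
        int[] ints = new int[1000];
        Arrays.fill(d, 1.0);
        Arrays.fill(ints, 1);
        kernel(d, ints);
        System.out.println(d[0] + " " + ints[0]);
    }
}
```

In a real JMH benchmark the arrays and factors would normally live in `@State` fields so that the compiler cannot constant-fold them.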
My tier1-3 testing passed clean (x86 and Aarch64).
Also please check
In general, inserting instructions between the load of a value and its use can lead to spilling to the stack if register pressure is high. We have seen such cases before. We should be careful here because your current changes affect all platforms.
Thanks, I will further confirm whether this patch has an impact on register pressure.
After a rerun I see less variation on x86 (or different benchmarks were affected).
regression?
I also ran a test for InstructionScheduling.java on Linux/aarch64 with args
Maybe the pipeline configuration in the aarch64.ad file is unreasonable.
I asked our performance experts, and they had already observed the variations I saw on AArch64 before. It seems they are not caused by your changes. You did not answer my suggestion about added
Yes, you are right. MachNode latency in lcm will be 0 when OptoScheduling is false, so add
@sunny868 This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!
@sunny868 This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the
When doing gcm/lcm, we should consider not only the height (latency) of nodes but also whether there is a data dependency between them. When there is a data dependency between two nodes and the latency of the first node is large, another node without a data dependency can be inserted between the two. For example:
for Java code
when using
-XX:+OptoScheduling
on aarch64, the sequence is
Then a more efficient sequence should be:
This problem also exists on the MIPS architecture. This patch fixes the problem. Please help review it.
Thanks
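The effect described above can be sketched with a toy cost model. The following is not C2's scheduler and does not reproduce the elided sequences: it assumes a single-issue, in-order pipeline, and the instruction names and latencies are invented purely for illustration. It only counts how many cycles a given instruction order needs, so two orders of the same instructions can be compared.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LatencyHidingSketch {
    // Hypothetical instruction: a name, a result latency in cycles,
    // and the names of the instructions whose results it consumes.
    static final class Insn {
        final String name;
        final int latency;
        final List<String> deps;

        Insn(String name, int latency, String... deps) {
            this.name = name;
            this.latency = latency;
            this.deps = Arrays.asList(deps);
        }
    }

    // Assumed model: one instruction issues per cycle, in the given order,
    // and an instruction stalls until all of its inputs are ready.
    static int cycles(List<Insn> order) {
        Map<String, Integer> ready = new HashMap<>();
        int nextIssue = 0;
        int finish = 0;
        for (Insn insn : order) {
            int start = nextIssue;
            for (String dep : insn.deps) {
                start = Math.max(start, ready.getOrDefault(dep, 0));
            }
            int done = start + insn.latency;
            ready.put(insn.name, done);
            nextIssue = start + 1;
            finish = Math.max(finish, done);
        }
        return finish;
    }

    public static void main(String[] args) {
        // Made-up latencies: loads and the multiply take 4 cycles, the add takes 1.
        Insn loadD = new Insn("loadD", 4);
        Insn fmul = new Insn("fmul", 4, "loadD");
        Insn loadI = new Insn("loadI", 4);
        Insn add = new Insn("add", 1, "loadI");

        // Each consumer directly follows its producer.
        List<Insn> dependent = Arrays.asList(loadD, fmul, loadI, add);
        // The independent load is moved between loadD and its consumer.
        List<Insn> interleaved = Arrays.asList(loadD, loadI, fmul, add);

        System.out.println("dependent order:   " + cycles(dependent) + " cycles");
        System.out.println("interleaved order: " + cycles(interleaved) + " cycles");
    }
}
```

Under these assumptions the interleaved order finishes sooner because the second load issues while the first one is still in flight. The model ignores register pressure, which the review discussion raises as a risk of this kind of reordering.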
Progress
Issue
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6407/head:pull/6407
$ git checkout pull/6407
Update a local copy of the PR:
$ git checkout pull/6407
$ git pull https://git.openjdk.java.net/jdk pull/6407/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 6407
View PR using the GUI difftool:
$ git pr show -t 6407
Using diff file
Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6407.diff