Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8252842: Extend jmap to support parallel heap dump #2261

Closed
wants to merge 52 commits into from

Conversation

linzang
Copy link
Contributor

@linzang linzang commented Jan 27, 2021


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8252842: Extend JMap to support parallel heap dump

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/2261/head:pull/2261
$ git checkout pull/2261

Update a local copy of the PR:
$ git checkout pull/2261
$ git pull https://git.openjdk.java.net/jdk pull/2261/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 2261

View PR using the GUI difftool:
$ git pr show -t 2261

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/2261.diff

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 27, 2021

👋 Welcome back lzang! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Jan 27, 2021

@linzang The following labels will be automatically applied to this pull request:

  • core-libs
  • hotspot-runtime
  • serviceability

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added serviceability serviceability-dev@openjdk.org hotspot-runtime hotspot-runtime-dev@openjdk.org core-libs core-libs-dev@openjdk.org labels Jan 27, 2021
@linzang
Copy link
Contributor Author

linzang commented Jan 27, 2021

/csr needed

@openjdk openjdk bot added the csr Pull request needs approved CSR before integration label Jan 27, 2021
@openjdk
Copy link

openjdk bot commented Jan 27, 2021

@linzang this pull request will not be integrated until the CSR request JDK-8260424 for issue JDK-8252842 has been approved.

@openjdk openjdk bot added the rfr Pull request is ready for review label Jan 27, 2021
@AlanBateman
Copy link
Contributor

/label remove core-libs

@openjdk openjdk bot removed the core-libs core-libs-dev@openjdk.org label Jan 27, 2021
@openjdk
Copy link

openjdk bot commented Jan 27, 2021

@AlanBateman
The core-libs label was successfully removed.

// Not that making all arguments as a whole string like jcmd did does not guarantee the
// compatiabilty when the new jmap is uses on old version of JDK.
// See AttachOperation::arg_count_max in attachListener.hpp for argument count limitation.
String more_args = compress_level + "," + parallel;
// dumpHeap is not the same as jcmd GC.heap_dump
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To-discuss:
Compatibility issue that new version of jmap work on old JDK may wrongly processing the following case:
jmap -dump:file=dump.hprof,gz=1,parallel=2
this will cause the gz option not recognized correctly.

Can not simply change arg_count_max to 4, because it could cause JDK hang on waiting for arguments when old jmap is used.

@linzang
Copy link
Contributor Author

linzang commented Jan 27, 2021

Dear All,
I have made a draft patch on parallel heap dump of jmap.
This patch is still WIP with test case under development and code refining is in process.

But I want to firstly ask your help to discuss a potential compatibility issue that may need to solve:

This patch introduces a new "parallel" option to jmap, so after this patch, there can be at most 4 options for jmap that can be passed to JDK.

The issue is that there is a limitation in old version of jmap that at most 3 arguments can be accepted. As I commented in File JMap.java. and I can not simply enlarge the limitation to 4 because that will cause old version of jmap hang when working on new JDK.

So I use the 3rd argument of jmap to be a combined string and parse it to JDK, and the attachListener.cpp is modified to add the logic for parsing it to get gzlevel and parallel thread number.

The current status is that, with current change in this patch, old jmap could work normally on new JDK, and new jmap without "parallel" could work correctly with old JDK.

But the problem comes when new jmap use "gz=1, parallel=2" options together to communicate to an old JDK, the "gz" option takes no effect. The root cause is that old JDK's attachListener.cpp does not have the parsing logic, so it treat "gz=1, parallel=2" as a whole argument, and hence can not recognize it.

May I ask your suggestion to see whether this is behavior is acceptable, and if not, any advice for fixing the compatibility issue?

Thanks!
Lin

@plummercj
Copy link
Contributor

This patch introduces a new "parallel" option to jmap, so after this patch, there can be at most 4 options for jmap that can be passed to JDK.

The issue is that there is a limitation in old version of jmap that at most 3 arguments can be accepted. As I commented in File JMap.java. and I can not simply enlarge the limitation to 4 because that will cause old version of jmap hang when working on new JDK.

A year or so ago we ran into this same issue. I'll see if I can find the CR for it.

@plummercj
Copy link
Contributor

plummercj commented Jan 27, 2021

It looks like JDK-8215622 introduced a 4th argument to the attach API, and this caused hangs when and older JVMs tried to attach. That was fixed by JDK-8219721, and seemed to revert the arg count back to 3, although it's not too clear to me how this was accomplished. The webrev is here:

http://cr.openjdk.java.net/~xiaofeya/8219721/webrev.01/

Also at the same time the following was filed, which has not yet been addressed:

JDK-8219896 Parameter size mismatch between client and VM sides of the Attach API

@linzang JDK-8215622 and JDK-8219721 were your CRs, so maybe you can elaborate on them a bit more.

@linzang linzang marked this pull request as draft January 28, 2021 02:27
@openjdk openjdk bot removed the rfr Pull request is ready for review label Jan 28, 2021
@linzang
Copy link
Contributor Author

linzang commented Jan 28, 2021

Dear @plummercj
You are right, the same issue has been encountered when I was developing JDK-8215622. At that time the argument number does not actully exceed the limitation (3), and so I made a quick fix to change back arg_count_max to 3. And the reason that I originally change the arg_count_max is I was trying to add more options to jmap, and finally only "file" and "parallel" were accepted, so the arg_count_max never exceed the limitation.

Then it comes to this patch, with "parallel" added to jmap -dump, there can be 4 arguments and we need to handle the arg count limitation. Here are the experiments I have been tried recently (and also previously when handling JDK-8219721):

  • Enlarge arg_count_max from 3 to 4

    This cause the same issue as describe by JDK-8219721, when using an old jcmd tools to communicate with new JDK, it hangs, which is a huge compatibitily issue, so we need to fix it.

  • Let jmap pass all arguments as a whole one, just like what jcmd does, and modify attachlistener.cpp to parse arguments from op->arg(0), and also distinguish old/new jmap by testing whether op->arg(1) is null.

    This works well until the new jmap is used to communicate with old JDK. In this case, the old JDK doesn't parse op->arg(0), which is passed by jmap as a string containing all arguments. So all parameters are treated as dump filename incorrectly.

  • Let jmap still passing three arguents, and combine the "gz" and "parallel" together as a string and pass it to JDK, and modify attachlistener.cpp to parse the 3rd arguments from op->arg(2).

    This works well for old/new jmap targeting on old/new JDK when "parallel" and "gz" options are not used together. However, it has the issue when new jmap targeting to old JDK and both "parallel" and "gz" are used. because old JDK
    can not parse op->arg(3), so the "gz" option will not work. Although I don't think it is reasonable to use new jmap with "parallel" option on an old JDK, but it is possible, and the issue is that old JDK will not prompt any error message in this case, it just dump the heap by ignoring the "gz" value.

  • I also worked out a patch using timeout mechanism in socket.

This solution seems work, so the arg_count_max could be enlarged to 4, and when old jmap is targeting on new JDK, the socket can be timeout on waiting for the 4th arguments while old jmap only provide 3. And then it could work as expected. But this seems need to change the socket for different platform, and I just verified it on Linux, I am not sure whether it is acceptable.

BTW, Do you think we need to solve this issue seperately? Maybe have a patch for JDK-8219896? And when it is fixed, appling the parallel dump patch on top of that?

Thanks,
Lin

@sspitsyn
Copy link
Contributor

Hi Lin,
It is also in my memory that you actually did not have 4 arguments.
The real incompatibility issue was that the order of arguments was swapped.
It is why it was relatively easy to fall back and just update the constants with 3 instead of 4 and swap the order of arguments back.
Thanks,
Serguei

@linzang
Copy link
Contributor Author

linzang commented Jan 28, 2021

Dear Serguei,
Yes, you are right!
However, for this PR it seems I have to deal with four arguments for jmap -d : file, live, gz=, parallal=.

BRs,
Lin

@linzang
Copy link
Contributor Author

linzang commented Jan 28, 2021

Dear All,
I have another idea. How about I don't introduce parallel parameter to jmap , but instead just enable parallel heap dump by default as long as there is more than one active workers?
In this way there is no need to have an parallel option for jmap, and this PR don't even bother with a CSR.
The down side is that user can not control how much threads they want to use for parallel dump. (P.S This may not even guaranteed with "parallel" option because in JDK the actually number of thread used is decided by number of compression backends and active workers).
what's your opinion?

BRs,
Lin

@plummercj
Copy link
Contributor

It is also in my memory that you actually did not have 4 arguments.
The real incompatibility issue was that the order of arguments was swapped.

Ah, now the fix makes a lot more sense. I was always wondering how changing the order allowed reducing the max args to 3, but apparently the two changes are not related. Also, it seems them that JDK-8219721 does not have a correct analysis of the issue. It says:

JDK-8215622 increased number of arguments of Attach API [1].
Thus AttachListener will be waiting the command from client infinitely.

Perhaps it should be updated to avoid future confusion.

@plummercj
Copy link
Contributor

I think if you can avoid the 4th argument by just enabling parallel by default sounds like a good idea. Is there any reason not to use parallel heap dump? Also, I'm not familiar with the terms "compression backends" and "active workers". Can you explain them.

@linzang
Copy link
Contributor Author

linzang commented Jan 29, 2021

I think if you can avoid the 4th argument by just enabling parallel by default sounds like a good idea. Is there any reason not to use parallel heap dump? Also, I'm not familiar with the terms "compression backends" and "active workers". Can you explain them.

For parallel heap dump, I think expose the "parallel" option is useful when user want to control the number of threads that works on parallel dumping. One reason is to control the CPU load to not affect too much on other processes on system.

The "active workers" is the value that returned by gang->active_workers(). And I think "active workers" is the number of available threads that could be used for parallally working on something. my understanding is that this value could be dynamically changed at runtime, but it is limited by the JVM option "-XX:ParallelGCThreads".

The "compression backend" is introduced by "gz=" option, it is actually a "backend" that write heap dump data to file, and if there is compression level specified, the "backend" will do compression first and then write to file. The whole "backend" works parallelly, before this PR, the backend thread number is dicided by "active workers".

With this PR, the relation of parallel heap iteration threads (parallel dumper), the file writing threads (compression backend) and the "active_works" is like following:
num_of_dumper_threads + num_of_compression_backend = active_workers.

BRs,
Lin

@plummercj
Copy link
Contributor

Seems the #5410 has made some change that will affect this one. I am wondering should I start to rebase this PR based on that ?

It looks like at most there would be one conflict, and it is trivial. You'll eventually have to merge before integrating anyway.

Copy link
Contributor

@schmelter-sap schmelter-sap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change seems OK so far, but I've found a few issues.

src/hotspot/share/services/attachListener.cpp Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
src/hotspot/share/services/heapDumper.cpp Outdated Show resolved Hide resolved
@linzang
Copy link
Contributor Author

linzang commented Sep 14, 2021

Hi Ralf @schmelter-sap ,
Thanks for your review and comments! The change is conflict with latest master, so I will rebase it first and then fix the issue you mentioned.

BRs,
Lin

@linzang
Copy link
Contributor Author

linzang commented Sep 14, 2021

rebased with latest master, and made some change to fix the conflict issue. I will handle Ralf's comments asap. Thanks.

@openjdk
Copy link

openjdk bot commented Sep 14, 2021

⚠️ @linzang This pull request contains merges that bring in commits not present in the target repository. Since this is not a "merge style" pull request, these changes will be squashed when this pull request in integrated. If this is your intention, then please ignore this message. If you want to preserve the commit structure, you must change the title of this pull request to Merge <project>:<branch> where <project> is the name of another project in the OpenJDK organization (for example Merge jdk:master).

@linzang
Copy link
Contributor Author

linzang commented Sep 16, 2021

Hi Ralf @schmelter-sap,
I have updated the PR, would you like to help review it again? Thanks!

BRs,
Lin

Copy link
Contributor

@schmelter-sap schmelter-sap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@linzang
Copy link
Contributor Author

linzang commented Sep 21, 2021

Thanks @schmelter-sap and @plummercj for your help review and approve this PR!
Since this is a non-trivial change, I would like to wait for 2 days before integrate it in case more comments from others.

@linzang
Copy link
Contributor Author

linzang commented Sep 23, 2021

/integrate

@openjdk
Copy link

openjdk bot commented Sep 23, 2021

Going to push as commit a74c099.
Since your change was applied there have been 118 commits pushed to the master branch:

  • 2166ed1: 8273894: ConcurrentModificationException raised every time ReferralsCache drops referral
  • 1c6fa11: 8273979: move some os time related functions to os_posix for POSIX platforms
  • 45adc92: 8273578: javax/swing/JMenu/4515762/bug4515762.java fails on macOS 12
  • 0fbbe4c: 8274033: Some tier-4 CDS EpsilonGC tests throw OOM
  • 9d3379b: 8267356: AArch64: Vector API SVE codegen support
  • 6031388: 8273714: jdk/jfr/api/consumer/TestRecordedFrame.java still times out after JDK-8273047
  • 8821b00: 8205137: Remove Applet support from SwingSet2
  • 57fe11c: 8274120: [JVMCI] CompileBroker should resolve parameter types for JVMCI compiles
  • 81d4164: 8272759: (fc) java/nio/channels/FileChannel/Transfer2GPlus.java failed in timeout
  • da38ced: 8271602: Add Math.ceilDiv() family parallel to Math.floorDiv() family
  • ... and 108 more: https://git.openjdk.java.net/jdk/compare/fe89dd3b0d47807c7dbfe24d17f6aca152fc8908...master

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Sep 23, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 23, 2021
@openjdk
Copy link

openjdk bot commented Sep 23, 2021

@linzang Pushed as commit a74c099.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

@linzang linzang deleted the par-dump branch September 23, 2021 07:05
@dholmes-ora
Copy link
Member

dholmes-ora commented Sep 23, 2021

@linzang this change is causing test crashes on Windows with ZGC and EpsilonGC - please see JDK-8274196

@linzang
Copy link
Contributor Author

linzang commented Sep 23, 2021

Thanks David @dholmes-ora,
I will take look at the issue.

BRs,
Lin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-runtime hotspot-runtime-dev@openjdk.org integrated Pull request has been integrated serviceability serviceability-dev@openjdk.org
7 participants