Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8278282: G1: Log basic statistics of evacuation failure #6763

Closed
wants to merge 5 commits into from

Conversation

Hamlin-Li
Copy link

@Hamlin-Li Hamlin-Li commented Dec 8, 2021

This is to get information about the pause time distribution (prepare(copy, sorting, …) , process (iterate) and cleanup) and region/objects/size statistics when processing evacuation failure objects in “Remove Self Forwards”, this information will be helpful when optimize the evacuation failure processing subsequently, and will also be helpful for users to analyze and troubleshoot in the future.

[10.917s][debug][gc,phases] GC(0)       Restore Retained Regions (ms): ...
[10.917s][debug][gc,phases] GC(0)         Retained Regions:              ...
[10.917s][trace][gc,phases] GC(0)         Prepare Retained Object Refs (ms): ...
[10.917s][trace][gc,phases] GC(0)         Reformat Retained Regions (ms): ...
[10.917s][trace][gc,phases] GC(0)           Retained Objects:              ...
[10.917s][trace][gc,phases] GC(0)           Retained Bytes:                ...
[10.917s][trace][gc,phases] GC(0)         Reclaim Memory (ms):           ...
[10.917s][trace][gc,phases] GC(0)           Used [Native] Memory:          ...

Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issues

  • JDK-8278282: G1: Log basic statistics of evacuation failure
  • JDK-8278283: G1: Log further detailed statistics of evacuation failure

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/6763/head:pull/6763
$ git checkout pull/6763

Update a local copy of the PR:
$ git checkout pull/6763
$ git pull https://git.openjdk.java.net/jdk pull/6763/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 6763

View PR using the GUI difftool:
$ git pr show -t 6763

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/6763.diff

@bridgekeeper
Copy link

@bridgekeeper bridgekeeper bot commented Dec 8, 2021

👋 Welcome back mli! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr label Dec 8, 2021
@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 8, 2021

/issue add JDK-8278283

@openjdk
Copy link

@openjdk openjdk bot commented Dec 8, 2021

@Hamlin-Li
Adding additional issue to issue list: 8278283: G1: Log further detailed statistics of evacuation failure.

@openjdk
Copy link

@openjdk openjdk bot commented Dec 8, 2021

@Hamlin-Li The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot label Dec 8, 2021
@mlbridge
Copy link

@mlbridge mlbridge bot commented Dec 8, 2021

Webrevs

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 10, 2021

Seems the jdk mail service was done, ping ~

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 10, 2021

/label hotspot-gc

@openjdk openjdk bot added the hotspot-gc label Dec 10, 2021
@openjdk
Copy link

@openjdk openjdk bot commented Dec 10, 2021

@Hamlin-Li
The hotspot-gc label was successfully added.

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 10, 2021

ping again ~ test mail service

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Dec 10, 2021

Haven't looked in detail at the patch yet, but I have a comment on the output.

[10.917s][debug][gc,phases] GC(0)       Restore Retained Regions (ms): ...
[10.917s][debug][gc,phases] GC(0)         Retained Regions:              ...
[10.917s][trace][gc,phases] GC(0)         Prepare Retained Object Refs (ms): ...
[10.917s][trace][gc,phases] GC(0)         Reformat Retained Regions (ms): ...
[10.917s][trace][gc,phases] GC(0)           Retained Objects:              ...
[10.917s][trace][gc,phases] GC(0)           Retained Bytes:                ...
[10.917s][trace][gc,phases] GC(0)         Reclaim Memory (ms):           ...
[10.917s][trace][gc,phases] GC(0)           Used [Native] Memory:          ...

I thought we agreed on having more things on debug level and using "Evacuation Failure Regions" for now (until we have more than one type of retained regions), like this:

[10.917s][debug][gc,phases] GC(0)       Restore Retained Regions (ms): ...
[10.917s][debug][gc,phases] GC(0)         Evacuation Failed Regions:              ...
[10.917s][debug][gc,phases] GC(0)         Prepare Retained Object Refs (ms): ...
[10.917s][debug][gc,phases] GC(0)         Reformat Retained Regions (ms): ...
[10.917s][debug][gc,phases] GC(0)           Retained Objects:              ...
[10.917s][trace][gc,phases] GC(0)           Retained Bytes:                ...
[10.917s][debug][gc,phases] GC(0)         Reclaim Memory (ms):           ...
[10.917s][trace][gc,phases] GC(0)           Used [Native] Memory:          ...

Possibly also put the regions count last, to keep the timings at the top.

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 10, 2021

I thought we agreed on having more things on debug level and using "Evacuation Failure Regions" for now (until we have more than one type of retained regions), like this:

Thanks Stefan. I misunderstood, will update it as "Evacuation Failure Regions".

[10.917s][debug][gc,phases] GC(0)       Restore Retained Regions (ms): ...
[10.917s][debug][gc,phases] GC(0)         Evacuation Failed Regions:              ...
[10.917s][debug][gc,phases] GC(0)         Prepare Retained Object Refs (ms): ...
[10.917s][debug][gc,phases] GC(0)         Reformat Retained Regions (ms): ...
[10.917s][debug][gc,phases] GC(0)           Retained Objects:              ...
[10.917s][trace][gc,phases] GC(0)           Retained Bytes:                ...
[10.917s][debug][gc,phases] GC(0)         Reclaim Memory (ms):           ...
[10.917s][trace][gc,phases] GC(0)           Used [Native] Memory:          ...

Currently, I implement it by add following phases:

    RestoreRetainedRegions,
    RestoreRetainedRegionsPrepare,
    RestoreRetainedRegionsReformat,
    RestoreRetainedRegionsReclaim,

and, add RestoreRetainedRegionsObjects and RestoreRetainedRegionsBytes as work item of RestoreRetainedRegionsReformat phase, add RestoreRetainedRegionsReclaimUsedMemory as work item of RestoreRetainedRegionsReclaim, add RestoreRetainedRegionsNum as work item of RestoreRetainedRegions phase.

I tried to put a phase and work items under it in different log level, but seems the current phase time framework only support to log phase and its work items at the same log level. Maybe I missed something in the code, let me do some further investigation.

Possibly also put the regions count last, to keep the timings at the top.

I tried this too. But seems the current phase time framework only support to put the WorkItems of a phase close to its phase, rather than separate them with other phases, I will investigate it further.

@kstefanj
Copy link
Contributor

@kstefanj kstefanj commented Dec 10, 2021

I tried to put a phase and work items under it in different log level, but seems the current phase time framework only support to log phase and its work items at the same log level. Maybe I missed something in the code, let me do some further investigation.

I think you are correct, maybe Thomas know any trick. Otherwise I guess we could separate the "counts" from the timing, but that is not optimal either. If not solvable in a good way, maybe we should just drop some of the trace counts.

@tschatzl
Copy link
Contributor

@tschatzl tschatzl commented Dec 10, 2021

I tried to put a phase and work items under it in different log level, but seems the current phase time framework only support to log phase and its work items at the same log level. Maybe I missed something in the code, let me do some further investigation.

I think you are correct, maybe Thomas know any trick. Otherwise I guess we could separate the "counts" from the timing, but that is not optimal either. If not solvable in a good way, maybe we should just drop some of the trace counts.

There is no way to have different levels for the work items than for the timing it is attached to. I overlooked that when proposing this layout. I would like to just keep things the way they are though then, except moving the Evacuation Failure Regions to the bottom, as imho the timing is more interesting typically (or in other words, I'm typically only looking at the timings and if they are not as expected, I am digging into more details with the counts).

Making the work items having a different log level than the master item could be done later.

One remark: that [Native] in Used [Native] Memory has been meant as optional. :) I would just remove the [Native]... (did not look at the code yet).

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 10, 2021

There is no way to have different levels for the work items than for the timing it is attached to. I overlooked that when proposing this layout. I would like to just keep things the way they are though then, except moving the Evacuation Failure Regions to the bottom, as imho the timing is more interesting typically (or in other words, I'm typically only looking at the timings and if they are not as expected, I am digging into more details with the counts).

Not sure if I missed something in the code, but seems if "Evacuation Failed Regions" is attached to "Restore Retained Regions (ms)", then there is no way to put other phases between them.

Making the work items having a different log level than the master item could be done later.
Agree.

One remark: that [Native] in Used [Native] Memory has been meant as optional. :) I would just remove the [Native]... (did not look at the code yet).

Sure, I will update it as "Used Memory". :)

@Hamlin-Li
Copy link
Author

@Hamlin-Li Hamlin-Li commented Dec 16, 2021

close this one, new pr is at #6860

@Hamlin-Li Hamlin-Li closed this Dec 16, 2021
@Hamlin-Li Hamlin-Li deleted the log-evac-failure-basic branch Jan 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot hotspot-gc rfr
3 participants