Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8244505: G1 pause time ratio calculation does not consider Remark/Cleanup pauses #183

Closed
wants to merge 5 commits into from

Conversation

walulyai
Copy link
Member

@walulyai walulyai commented Sep 15, 2020

The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup) recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files.
Testing: Tier 1 - 7
Performance testing did not show any significant changes in performance


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8244505: G1 pause time ratio calculation does not consider Remark/Cleanup pauses

Reviewers

Download

$ git fetch https://git.openjdk.java.net/jdk pull/183/head:pull/183
$ git checkout pull/183

@bridgekeeper
Copy link

bridgekeeper bot commented Sep 15, 2020

👋 Welcome back iwalulya! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Sep 15, 2020
@openjdk
Copy link

openjdk bot commented Sep 15, 2020

@walulyai The following label will be automatically applied to this pull request: hotspot-gc.

When this pull request is ready to be reviewed, an RFR email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label (add|remove) "label" command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Sep 15, 2020
@mlbridge
Copy link

mlbridge bot commented Sep 15, 2020

Webrevs

@walulyai walulyai changed the title 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup 8244505: G1 pause time ratio calculation does not consider Remark/Cleanup pauses Sep 15, 2020
@mlbridge
Copy link

mlbridge bot commented Sep 15, 2020

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi Ivan,

On 15.09.20 15:57, Ivan Walulya wrote:

The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates
to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup)
recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files.

Initial comments:

- please remove the trivial cleanups from this change and do them in a
separate "trivial cleanups" change either before or after: they really
detract from the fix for the actual problem as they are too many, and
they, while they are located in the g1Policy file, are completely
unrelated to this patch.

This includes at least the hunks in
- G1Policy::revise_young_list_target_length_if_necessary (why is
110/100 better than 1100/1000?)
- G1Policy::record_concurrent_mark_init_end
- G1Policy::calc_min_old_cset_length
- the first hunk in G1Policy::record_collection_pause_end

None of these locations have anything to do with actual pause time ratio
calculation. Not that we do not do bundle cleanups elsewhere, but the
cleanup to actual related change ratio is quite high in this change.

- I would also almost prefer if the pre-existing bug in
G1Analytics::compute_pause_time_ratios() were also moved out of this
change to highlight it. It does not seem to be connected to that pause
time calculation did not consider Remark/Cleanup pause before. (Not
completely sure why this has not been noticed in the recent young gen
sizing changes).

- Please rename G1Policy::update_pause_time_stats() to something more
related to gc time ratio, i.e. update_actual_gc_time_ratio() or so to
make it a bit more specific. (This is probably not the perfect name for it).

Will keep looking.

Thanks,
Thomas

@walulyai
Copy link
Member Author

Thanks @tschatzl for the initial comments, I will split the changes.

@mlbridge
Copy link

mlbridge bot commented Sep 16, 2020

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi Ivan,

On 15.09.20 19:29, Ivan Walulya wrote:

The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates
to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup)
recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7
Performance testing did not show any significant changes in performance

Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:

Removed all changes unrelated to the main problem solved by the patch

Thanks.

- I'm worried that the assignment of "update_stats" in
G1Policy::record_collection_pause_end() and G1Policy::record_pause()
would get out of sync. It would be nice if the change either passed it
to G1Policy::record_pause(), or create a new member function that
returns that and is used in both places.

- update_gc_pause_time_ratios() does not need to be public afaict

- I'm okay that now full gcs are considered in the pause time ratio
calculation.
This may cause (with JDK-8238687) that the heap will be kept larger
after the full gc, which okay with me for naturally occurring full gcs.

One concern I have are full gcs caused by System.gc() calls - one of
their purposes is to compact the heap, but now they will have impact
beyond that one. Maybe try to exclude those from the gc time ratio
calculation?

I think the previous (undocumented) exclusion of all full gcs has been
to not overcomplicate the code assuming that most if any full gcs with
G1 should be System.gcs() for heap compaction purposes.

Ivan, what do you think?

- I think the change in G1Analytics::compute_pause_time_ratios() is

double long_interval_ms = (end_time_sec -
oldest_known_gc_end_time_sec()) * 1000.0;
- _long_term_pause_time_ratio = _recent_gc_times_ms->sum() /
long_interval_ms;
+ _long_term_pause_time_ratio = (_recent_gc_times_ms->sum() +
pause_time_ms) / long_interval_ms;

wrong, or even worse than the current calculation. Sorry for not being
able to pinpoint it yesterday evening.

Let's assume that we have the following pauses on the timeline in
G1Analytics, where "X" means pause (the number at the end of an "X"
block is just a running number), "-" means mutator time.

XXX1-----XX2---XXX3---------XXXX4---XXXXXX5--------XX6

^--------------long_interval_ms-------------------^

(optimized for fixed font view)

Currently oldest_known_gc_end_time_sec() is at the end of pause 1, and
_recent_gc_times_ms contains all gc pause times from 1 to 5. We did not
call update_recent_gc_times() yet.

Now the (existing) calculation calculates "long_interval_ms" from the
end of pause 1 to the end of pause 6 (current), and divides by the sum
of pauses 1 to 5 - which is wrong as pause 1 can be different to pause 6
in length. However, there is the the code permeating assumption (not
documented, sorry :)) that pauses are fairly regular. So this somewhat
works out.

With that change, the dividend is too large, i.e. in addition to taking
the wrong long_interval_ms, the code now also contains additional pauses.

Maybe this could be fixed somehow? Unless it's a very small fix (or I'm
wrong about the above), I would prefer to fix this separately.

Thanks,
Thomas

@walulyai
Copy link
Member Author

walulyai commented Sep 16, 2020

  • I'm worried that the assignment of "update_stats" in
    G1Policy::record_collection_pause_end() and G1Policy::record_pause()
    would get out of sync. It would be nice if the change either passed it
    to G1Policy::record_pause(), or create a new member function that
    returns that and is used in both places.

Will create a helper function to sync usage of 'update_stats'

  • update_gc_pause_time_ratios() does not need to be public afaict

Right, this will be change

  • I'm okay that now full gcs are considered in the pause time ratio
    calculation.
    This may cause (with JDK-8238687) that the heap will be kept larger
    after the full gc, which okay with me for naturally occurring full gcs.

One concern I have are full gcs caused by System.gc() calls - one of
their purposes is to compact the heap, but now they will have impact
beyond that one. Maybe try to exclude those from the gc time ratio
calculation?

I think the previous (undocumented) exclusion of all full gcs has been
to not overcomplicate the code assuming that most if any full gcs with
G1 should be System.gcs() for heap compaction purposes.

Ivan, what do you think?

Right, Full-gcs were excluded from the compute_pause_time_ratios immediately after the full-gc, but not from the successive computations. Bug/Enhancement will be filed.

  • I think the change in G1Analytics::compute_pause_time_ratios() is

double long_interval_ms = (end_time_sec -
oldest_known_gc_end_time_sec()) * 1000.0;

  • _long_term_pause_time_ratio = _recent_gc_times_ms->sum() /
    long_interval_ms;
  • _long_term_pause_time_ratio = (_recent_gc_times_ms->sum() +
    pause_time_ms) / long_interval_ms;

wrong, or even worse than the current calculation. Sorry for not being
able to pinpoint it yesterday evening.

Let's assume that we have the following pauses on the timeline in
G1Analytics, where "X" means pause (the number at the end of an "X"
block is just a running number), "-" means mutator time.

XXX1-----XX2---XXX3---------XXXX4---XXXXXX5--------XX6

^--------------long_interval_ms-------------------^

(optimized for fixed font view)

Currently oldest_known_gc_end_time_sec() is at the end of pause 1, and
_recent_gc_times_ms contains all gc pause times from 1 to 5. We did not
call update_recent_gc_times() yet.

Now the (existing) calculation calculates "long_interval_ms" from the
end of pause 1 to the end of pause 6 (current), and divides by the sum
of pauses 1 to 5 - which is wrong as pause 1 can be different to pause 6
in length. However, there is the the code permeating assumption (not
documented, sorry :)) that pauses are fairly regular. So this somewhat
works out.

With that change, the dividend is too large, i.e. in addition to taking
the wrong long_interval_ms, the code now also contains additional pauses.

Maybe this could be fixed somehow? Unless it's a very small fix (or I'm
wrong about the above), I would prefer to fix this separately.

Right, I will revert to the previous version, and file a bug to fix this separately.

Thanks,
Thomas

@openjdk
Copy link

openjdk bot commented Sep 16, 2020

@walulyai This change now passes all automated pre-integration checks. In addition to the automated checks, the change must also fulfill all project specific requirements

After integration, the commit message will be:

8244505: G1 pause time ratio calculation does not consider Remark/Cleanup  pauses

Reviewed-by: tschatzl, sjohanss
  • If you would like to add a summary, use the /summary command.
  • To credit additional contributors, use the /contributor command.
  • To add additional solved issues, use the /issue command.

Since the source branch of this PR was last updated there have been 53 commits pushed to the master branch:

  • 11c4ea9: 8253278: Refactor/cleanup oopDesc::*_klass_addr
  • 73c9088: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions.
  • 3ef2efb: 8252721: Nested classes in Swing APIs rely on default constructors
  • fd380d7: 8253274: The CycleDMImagetest brokes the system
  • 2c3a37c: 8253314: precompiled.hpp missing from vmIntrinsics.cpp
  • d4269fd: 8253028: SA core file tests still time out on OSX with "java.io.IOException: App waiting timeout"
  • 6c3e483: 8253313: xmlstream.hpp missing from vmIntrinsics.cpp
  • 0a1dc50: 8253271: vm_version_x86.hpp should not include globals_extension.hpp
  • 1e39649: 8243066: Move VM_INTRINSICS_DO into its own vmIntrinsics.hpp file
  • 12dfe1c: 8253262: Allocate in DumpRegion is not thread safe
  • ... and 43 more: https://git.openjdk.java.net/jdk/compare/7eb4d4aa018e37052cbf37cd854380006c309645...master

As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid automatic rebasing, please merge master into your branch, and then specify the current head hash when integrating, like this: /integrate 11c4ea963c3d6dbfcecf725409ce285ac7a8ac1a.

As you do not have Committer status in this projectan existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@tschatzl, @kstefanj) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 16, 2020
@mlbridge
Copy link

mlbridge bot commented Sep 16, 2020

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi,

On 16.09.20 15:17, Ivan Walulya wrote:

The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates
to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup)
recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7
Performance testing did not show any significant changes in performance

Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:

Changes made to address reviews from Thomas.

-------------

The comment at the declaration of G1Policy::should_update_gc_stats() is
better placed at the implementation as it is imho.

You could move it there if you want.

Also thanks for catching the issue with "*1000*0", I forgot to mention
it earlier.

Lgtm. Thanks.

Thomas

@walulyai
Copy link
Member Author

Mailing list message from Thomas Schatzl on hotspot-gc-dev:

Hi,

On 16.09.20 15:17, Ivan Walulya wrote:

The computation for long and short term pause time ratios does not include Remark/Cleanup pause times. Isolate updates
to _prev_collection_pause_end_ms from update_recent_gc_times(), then have all gc pauses (including Remark/Cleanup)
recorded by update_recent_gc_times(). Patch also includes trivial clean ups to g1Policy.* files. Testing: Tier 1 - 7
Performance testing did not show any significant changes in performance

Ivan Walulya has updated the pull request incrementally with one additional commit since the last revision:
Changes made to address reviews from Thomas.

The comment at the declaration of G1Policy::should_update_gc_stats() is
better placed at the implementation as it is imho.

You could move it there if you want.

I agree, will move it.

Also thanks for catching the issue with "10000", I forgot to mention
it earlier.

Lgtm. Thanks.

Thomas

Thanks Thomas,

@walulyai
Copy link
Member Author

/reviewers 2

@openjdk
Copy link

openjdk bot commented Sep 17, 2020

@walulyai
The number of required reviews for this PR is now set to 2 (with at least 1 of role reviewers).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Sep 17, 2020
Copy link
Contributor

@kstefanj kstefanj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Sep 17, 2020
@walulyai walulyai requested review from tschatzl and kstefanj Sep 18, 2020
@walulyai
Copy link
Member Author

Thanks @kstefanj and @tschatzl

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Sep 18, 2020
@openjdk
Copy link

openjdk bot commented Sep 18, 2020

@walulyai
Your change (at version e1f963e) is now ready to be sponsored by a Committer.

@tschatzl
Copy link
Contributor

/integrate

@openjdk
Copy link

openjdk bot commented Sep 18, 2020

@tschatzl Only the author (@walulyai) is allowed to issue the integrate command. As this PR is ready to be sponsored, and you are an eligible sponsor, did you mean to issue the /sponsor command?

@tschatzl
Copy link
Contributor

/sponsor

@openjdk openjdk bot closed this Sep 18, 2020
@openjdk openjdk bot added integrated Pull request has been integrated and removed sponsor Pull request is ready to be sponsored ready Pull request is ready to be integrated rfr Pull request is ready for review labels Sep 18, 2020
@openjdk
Copy link

openjdk bot commented Sep 18, 2020

@tschatzl @walulyai Since your change was applied there have been 53 commits pushed to the master branch:

  • 11c4ea9: 8253278: Refactor/cleanup oopDesc::*_klass_addr
  • 73c9088: 8249451: Unconditional exceptions clearing logic in compiler code should honor Async Exceptions.
  • 3ef2efb: 8252721: Nested classes in Swing APIs rely on default constructors
  • fd380d7: 8253274: The CycleDMImagetest brokes the system
  • 2c3a37c: 8253314: precompiled.hpp missing from vmIntrinsics.cpp
  • d4269fd: 8253028: SA core file tests still time out on OSX with "java.io.IOException: App waiting timeout"
  • 6c3e483: 8253313: xmlstream.hpp missing from vmIntrinsics.cpp
  • 0a1dc50: 8253271: vm_version_x86.hpp should not include globals_extension.hpp
  • 1e39649: 8243066: Move VM_INTRINSICS_DO into its own vmIntrinsics.hpp file
  • 12dfe1c: 8253262: Allocate in DumpRegion is not thread safe
  • ... and 43 more: https://git.openjdk.java.net/jdk/compare/7eb4d4aa018e37052cbf37cd854380006c309645...master

Your commit was automatically rebased without conflicts.

Pushed as commit 86a1640.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
3 participants