
8236073: G1: Use SoftMaxHeapSize to guide GC heuristics #24211

Closed
caoman wants to merge 9 commits into openjdk:master from caoman:JDK-8236073-softmaxheap

Conversation

@caoman
Contributor

@caoman caoman commented Mar 24, 2025

Hi all,

I have implemented SoftMaxHeapSize for G1 as attached. It is completely reworked compared to the previous PR, and excludes the code for CurrentMaxHeapSize. I believe I have addressed all direct concerns from the previous email thread, such as:

  • it did not respect MinHeapSize;
  • it was too "blunt" and did not respect other G1 heuristics and flags for resizing, such as MinHeapFreeRatio and MaxHeapFreeRatio;
  • it did not affect the heuristics that trigger a concurrent cycle.

This recent thread also has some context.


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8236073: G1: Use SoftMaxHeapSize to guide GC heuristics (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/24211/head:pull/24211
$ git checkout pull/24211

Update a local copy of the PR:
$ git checkout pull/24211
$ git pull https://git.openjdk.org/jdk.git pull/24211/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 24211

View PR using the GUI difftool:
$ git pr show -t 24211

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/24211.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Mar 24, 2025

👋 Welcome back manc! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Mar 24, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot added the rfr Pull request is ready for review label Mar 24, 2025
@openjdk

openjdk bot commented Mar 24, 2025

@caoman The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Mar 24, 2025
@mlbridge

mlbridge bot commented Mar 24, 2025

@caoman
Contributor Author

caoman commented Mar 24, 2025

@mo-beck Here is my implementation for SoftMaxHeapSize for G1. Let me know if you have any feedback or concerns.

@caoman
Contributor Author

caoman commented Mar 24, 2025

This probably requires fixing https://bugs.openjdk.org/browse/JDK-8352765 before users try to use SoftMaxHeapSize. Otherwise, setting a small SoftMaxHeapSize could trigger premature OutOfMemoryError.

@caoman
Contributor Author

caoman commented Apr 1, 2025

This PR is ready for review. The included tests cover the important functionality of SoftMaxHeapSize.

Contributor

@tschatzl tschatzl left a comment


Initial comments.

@walulyai
Member

walulyai commented Apr 1, 2025

With the changes to young_collection_expansion_amount(), once we reach the SoftMaxHeapSize, we cannot expand the heap except during GC where expansion can happen without regard for SoftMaxHeapSize. Thus, after exceeding SoftMaxHeapSize we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the SoftMaxHeapSize as implemented by this patch?

@caoman
Contributor Author

caoman commented Apr 1, 2025

With the changes to young_collection_expansion_amount(), once we reach the SoftMaxHeapSize, we cannot expand the heap except during GC where expansion can happen without regard for SoftMaxHeapSize. Thus, after exceeding SoftMaxHeapSize we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the SoftMaxHeapSize as implemented by this patch?

Yes. This is the expected behavior if the user sets SoftMaxHeapSize too small. G1 will try its best to respect SoftMaxHeapSize, which could cause GC thrashing. However, it won't cause OutOfMemoryError. This problem is due to the user's misconfiguration of SoftMaxHeapSize, similar to misconfiguring Xmx to be too small.

@tschatzl
Contributor

tschatzl commented Apr 2, 2025

With the changes to young_collection_expansion_amount(), once we reach the SoftMaxHeapSize, we cannot expand the heap except during GC where expansion can happen without regard for SoftMaxHeapSize. Thus, after exceeding SoftMaxHeapSize we go into a phase of repeated GCs where we expand the heap almost one region at a time. Is this the expected effect of the SoftMaxHeapSize as implemented by this patch?

Yes. This is the expected behavior if the user sets SoftMaxHeapSize too small. G1 will try its best to respect SoftMaxHeapSize, which could cause GC thrashing. However, it won't cause OutOfMemoryError. This problem is due to the user's misconfiguration of SoftMaxHeapSize, similar to misconfiguring Xmx to be too small.

The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change?

(Iirc, in tests a long time ago, that original patch, combined with adapting Min/MaxHeapFreeRatio, did result in the desired effect of G1/SoftMaxHeapSize decreasing the heap appropriately. Without it, the heap will almost never change, but that is how Min/MaxHeapFreeRatio are expected to operate.)

So, similar to @walulyai, I would strongly prefer that SoftMaxHeapSize not interfere that much with the application's performance. To me, this behavior is not "soft", and there seems to be general consensus internally against allowing unbounded CPU usage for GC. Afaiu, in ZGC, if the heap grows beyond SoftMaxHeapSize, GC activity can grow up to 25% of CPU usage (basically maxing out concurrent threads). That could be reasonable guidance here as well.

GC thrashing will also prevent progress with marking, and actually cause more marking because of objects not having enough time to die. This just makes the situation worse until the heap gets scaled back to SoftMaxHeapSize.

However at the moment, changing the GC activity threshold internally will not automatically shrink the heap as you would expect, since currently shrinking is controlled by marking using the Min/MaxHeapFreeRatio flags.

That gets us back to JDK-8238687 and JDK-8248324...

@walulyai is currently working on the former issue again, testing it, maybe you two could work together on that to see whether basing this work on what @walulyai is cooking up is a better way forward, if needed modifying gctimeratio if we are above SoftMaxHeapSize?

Otherwise, if there really is need to get this functionality asap, even only making it a guide for the marking should at least give some effect (but I think without changing Min/MaxHeapFreeRatio at the same time there is not much effect anyway). But that is a fairly coarse and indirect way of getting the necessary effect to shrink the heap. We should not limit ourselves to what mainline provides at the moment.

@tschatzl
Contributor

tschatzl commented Apr 2, 2025

There also seems to be a concurrency issue with reading the SoftMaxHeapSize variable: since the flag is manageable, the variable can be written to at any time, at least outside of safepoints (afaict jcmd is blocked by safepoints, but I'll ask).

So e.g. the assignment of G1IHOPControl::get_conc_mark_start_threshold to marking_initiating_used_threshold in that call can be inlined in G1Policy::need_to_start_conc_mark (called by the mutator in G1CollectedHeap::attempt_allocation_humongous) in multiple places, and so SoftMaxHeapSize re-read with multiple different values in that method.

Probably an Atomic::load(&SoftMaxHeapSize) in the getter is sufficient for that.

The other multiple re-readings of the soft_max_capacity() in the safepoint seem okay - I do not think there is a way to update the value within a safepoint externally.

@caoman
Contributor Author

caoman commented Apr 3, 2025

Re Thomas' comment:

The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change?

Because without changing heap sizing directly, setting SoftMaxHeapSize alone is ineffective to shrink the heap in most cases. E.g., the included test test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java will fail.

For other concerns, I think one fundamental issue is the precedence of heap sizing flags: should the JVM respect SoftMaxHeapSize over GCTimeRatio/MinHeapFreeRatio/MaxHeapFreeRatio? My preference is yes, that SoftMaxHeapSize should have higher precedence, for the following reasons:

  1. Users that set SoftMaxHeapSize expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As JDK-8222181 mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so." We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate MinHeapFreeRatio/MaxHeapFreeRatio/GCTimeRatio if it does not grow the heap".

  2. Having a single flag that makes G1 shrink the heap more aggressively is much more user-friendly than requiring users to tune 3 or more flags to achieve the same effect. As you mentioned, if SoftMaxHeapSize only guides marking, the user also has to tune MinHeapFreeRatio/MaxHeapFreeRatio to make G1 shrink more aggressively. It is difficult to figure out a proper value for each flag. Moreover, if the user wants to make G1 shrink to a specific heap size, it is a lot harder to achieve that through tuning MinHeapFreeRatio/MaxHeapFreeRatio.

  3. Issues with expansion after young collections from GCTimeRatio. MinHeapFreeRatio/MaxHeapFreeRatio have no effect on how much G1 expands the heap after young collections. Users need to tune GCTimeRatio if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of SoftMaxHeapSize. However, GCTimeRatio is not a manageable flag, so it cannot be changed at run time. If SoftMaxHeapSize has precedence, we don't need to bother making GCTimeRatio manageable and asking users to tune it at run time. (This is somewhat related to JDK-8349978 and email thread. )

So similar to @walulyai I would strongly prefer for SoftMaxHeapSize not interfere that much with the application's performance.

If the user sets too small a SoftMaxHeapSize and causes a performance regression or GC thrashing, it is really the user's misconfiguration, and they should take measures to adjust SoftMaxHeapSize based on the workload. Misconfiguring GCTimeRatio/MinHeapFreeRatio/MaxHeapFreeRatio could cause similar regressions (think of -XX:GCTimeRatio=1 -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=1).

However, I can see that SoftMaxHeapSize may be easier to misconfigure than the other 3 flags, because it does not adapt to a changing live size by itself. I wonder if we could try reaching a middle ground (perhaps this is also what you suggest with ZGC's example of growing up to 25% of CPU usage?):

  • SoftMaxHeapSize still takes higher precedence over GCTimeRatio/MinHeapFreeRatio/MaxHeapFreeRatio.
  • G1 could have an internal mechanism to detect GC thrashing, and expands heap above SoftMaxHeapSize if thrashing happens.

That gets us back to JDK-8238687 and JDK-8248324...

Yes, fixing these two issues would be great regardless of SoftMaxHeapSize. However, they do not address the 3 issues above about flag precedence.

@caoman
Contributor Author

caoman commented Apr 3, 2025

Re: concurrency issue with reading SoftMaxHeapSize

I updated to Atomic::load(), but not sure if I understand the concern correctly.

So e.g. the assignment of G1IHOPControl::get_conc_mark_start_threshold to marking_initiating_used_threshold in that call can be inlined in G1Policy::need_to_start_conc_mark (called by the mutator in G1CollectedHeap::attempt_allocation_humongous) in multiple places, and so SoftMaxHeapSize re-read with multiple different values in that method.

I don't see where the re-read is. I think in any code path from G1IHOPControl::get_conc_mark_start_threshold, G1CollectedHeap::heap()->soft_max_capacity() is called only once. G1CollectedHeap::attempt_allocation_humongous also appears to call G1Policy::need_to_start_conc_mark only once, which calls G1IHOPControl::get_conc_mark_start_threshold only once.

I agree it is a data race if soft_max_capacity() runs outside of a safepoint, so Atomic::load() makes sense regardless.

@walulyai
Member

walulyai commented Apr 3, 2025

1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so."  We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap".

In the current approach, it is not that we are respecting the user's request; we are violating the request, just that we do so only during GCs. So eventually you have back-to-back GCs that will expand the heap to whatever heap size the application requires. My interpretation of SoftMaxHeapSize is that we meet this limit where possible, but can also exceed it if required. So I propose we take the same approach as used in other GCs, where SoftMaxHeapSize is used as a parameter for setting GC pressure but not as a limit on allocations.

3. Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). )

Agreed, these ratios are problematic, and we should find a solution that removes them. We also need to agree on the purpose of SoftMaxHeapSize. My understanding is that SoftMaxHeapSize is meant to let the application handle spikes in allocations and quickly release the memory once it is no longer required. If SoftMaxHeapSize has precedence over GCTimeRatio, then G1 is changing its objective from balancing latency and throughput to optimizing for memory usage.

@tschatzl
Contributor

tschatzl commented Apr 3, 2025

Re: concurrency issue with reading SoftMaxHeapSize

I updated to Atomic::load(), but not sure if I understand the concern correctly.

So e.g. the assignment of G1IHOPControl::get_conc_mark_start_threshold to marking_initiating_used_threshold in that call can be inlined in G1Policy::need_to_start_conc_mark (called by the mutator in G1CollectedHeap::attempt_allocation_humongous) in multiple places, and so SoftMaxHeapSize re-read with multiple different values in that method.

I don't see where the re-read is. I think in any code path from G1IHOPControl::get_conc_mark_start_threshold, G1CollectedHeap::heap()->soft_max_capacity() is called only once. G1CollectedHeap::attempt_allocation_humongous also appears to call G1Policy::need_to_start_conc_mark only once, which calls G1IHOPControl::get_conc_mark_start_threshold only once.

I agree it is a data race if soft_max_capacity() runs outside of a safepoint, so Atomic::load() makes sense regardless.

The compiler could be(*) free to call get_conc_mark_start_threshold() again at any of the uses of the local variable, since nothing tells it that one of the value's components may change between re-reads.

(*) Probably not, after looking again, given that it's not marked const (not sure why), is a virtual method, and is fairly large.

The situation would be much worse if SoftMaxHeapSize could somehow be changed within a safepoint.

@tschatzl
Contributor

tschatzl commented Apr 3, 2025

Re Thomas' comment:

The original patch on the CR only set the guidance for the marking. It did not interact with heap sizing directly at all like the change does. What is the reason for this change?

Because without changing heap sizing directly, setting SoftMaxHeapSize alone is ineffective to shrink the heap in most cases. E.g., the included test test/hotspot/jtreg/gc/g1/TestSoftMaxHeapSize.java will fail.

For other concerns, I think one fundamental issue is the precedence of heap sizing flags: should the JVM respect SoftMaxHeapSize over GCTimeRatio/MinHeapFreeRatio/MaxHeapFreeRatio? My preference is yes, that SoftMaxHeapSize should have higher precedence, for the following reasons:

1. Users that set `SoftMaxHeapSize` expect it to be effective to limit heap size. The JVM should do its best to respect user's request. As [JDK-8222181](https://bugs.openjdk.org/browse/JDK-8222181) mentions: "When -XX:SoftMaxHeapSize is set, the GC should strive to not grow heap size beyond the specified size, unless the GC decides it's necessary to do so."  We might interpret "GC decides it's necessary" differently. I think the real necessary case is "the JVM will throw OutOfMemoryError if it does not grow the heap", instead of "the JVM will violate `MinHeapFreeRatio`/`MaxHeapFreeRatio`/`GCTimeRatio` if it does not grow the heap".

2. Having a single flag that makes G1 shrink the heap more aggressively is much more user-friendly than requiring users to tune 3 or more flags to achieve the same effect. As you mentioned, if `SoftMaxHeapSize` only guides marking, the user also has to tune `MinHeapFreeRatio`/`MaxHeapFreeRatio` to make G1 shrink more aggressively. It is difficult to figure out a proper value for each flag. Moreover, if the user wants to make G1 shrink to a specific heap size, it is a lot harder to achieve that through tuning `MinHeapFreeRatio`/`MaxHeapFreeRatio`.

3. Issues with expansion after young collections from `GCTimeRatio`. `MinHeapFreeRatio`/`MaxHeapFreeRatio` have no effect on how much G1 expands the heap after young collections. Users need to tune `GCTimeRatio` if they want to make G1 expand less aggressively, otherwise aggressive expansion would defeat the purpose of `SoftMaxHeapSize`. However, `GCTimeRatio` is not a manageable flag, so it cannot be changed at run time. If `SoftMaxHeapSize` has precedence, we don't need to bother making `GCTimeRatio` manageable and asking users to tune it at run time. (This is somewhat related to [JDK-8349978](https://bugs.openjdk.org/browse/JDK-8349978) and [email thread](https://mail.openjdk.org/pipermail/hotspot-gc-dev/2025-February/051004.html). )

So similar to @walulyai I would strongly prefer for SoftMaxHeapSize not interfere that much with the application's performance.

If the user sets too small a SoftMaxHeapSize and causes a performance regression or GC thrashing, it is really the user's misconfiguration, and they should take measures to adjust SoftMaxHeapSize based on the workload. Misconfiguring GCTimeRatio/MinHeapFreeRatio/MaxHeapFreeRatio could cause similar regressions (think of -XX:GCTimeRatio=1 -XX:MinHeapFreeRatio=1 -XX:MaxHeapFreeRatio=1).

However, I can see that SoftMaxHeapSize may be easier to misconfigure than the other 3 flags, because it does not adapt to a changing live size by itself. I wonder if we could try reaching a middle ground (perhaps this is also what you suggest with ZGC's example of growing up to 25% of CPU usage?):

Exactly.

* `SoftMaxHeapSize` still takes higher precedence over `GCTimeRatio`/`MinHeapFreeRatio`/`MaxHeapFreeRatio`.

* G1 could have an internal mechanism to detect GC thrashing, and expands heap above `SoftMaxHeapSize` if thrashing happens.

That gets us back to JDK-8238687 and JDK-8248324...

Yes, fixing these two issues would be great regardless of SoftMaxHeapSize. However, they do not address the 3 issues above about flag precedence.

  • JDK-8248324 effectively removes the use of Min/MaxHeapFreeRatio (apart from full GC, where they obviously also need to be handled in some way that fits into the system).
  • JDK-8238687 makes GCTimeRatio shrink the heap too, obviating the need for Min/MaxHeapFreeRatio, which are currently the knobs that limit excessive memory usage.

With no flags (no Min/MaxHeapFreeRatio) to interfere with each other, there is no need to consider their precedence.

As you mention, there is a need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it in which direction, and to what degree, to resize the heap.

Incidentally, the way GCTimeRatio (or actually the internal GC CPU usage target as an intermediate) is already used fits these requirements: from some actual value you can calculate a difference to the desired one, with some smoothing applied, which gives you both the direction and the degree of the change in heap size (applying some magic factors/constants).

So it seems fairly straightforward to have any outside "memory pressure" affect this intermediate control value, instead of everyone overriding each other in multiple places in the code.

Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected.

(One can see Min/MaxHeapFreeRatio as an old attempt to limit heap size growth by changing memory pressure without affecting performance too much. However, they are hard to use, and they are completely dissociated from the rest of the heap sizing mechanism. SoftMaxHeapSize is easier to handle.)

@caoman
Contributor Author

caoman commented Apr 4, 2025

Thank you both for the quick and detailed responses!

  • JDK-8248324 effectively removes the use of Min/MaxHeapFreeRatio (apart from full GC, where they obviously also need to be handled in some way that fits into the system).
  • JDK-8238687 makes GCTimeRatio shrink the heap too, obviating the need for Min/MaxHeapFreeRatio, which are currently the knobs that limit excessive memory usage.

With no flags (no Min/MaxHeapFreeRatio) to interfere with each other, there is no need to consider their precedence.

As you mention, there is a need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it in which direction, and to what degree, to resize the heap.

Incidentally, the way GCTimeRatio (or actually the internal GC CPU usage target as an intermediate) is already used fits these requirements: from some actual value you can calculate a difference to the desired one, with some smoothing applied, which gives you both the direction and the degree of the change in heap size (applying some magic factors/constants).

Until now, I was unaware that G1 plans to stop using Min/MaxHeapFreeRatio. It looks like JDK-8238686 has a more relevant description. It sounds good to solve all the above-mentioned issues and converge on a single flag such as GCTimeRatio, and to ensure both incremental and full GCs respect this flag. (We should also fix JDK-8349978 to converge on GCTimeRatio.) It would be nice to have a doc or a master bug that describes the overall plan.

In comparison, this PR's high-precedence, "harder" SoftMaxHeapSize is an easier and more expedient way to improve heap resizing without solving all the other issues. However, it requires users to carefully maintain and dynamically adjust SoftMaxHeapSize to prevent GC thrashing. I think if all the other issues are resolved, our existing internal use cases, which use a separate algorithm to dynamically calculate and set the high-precedence SoftMaxHeapSize (or ProposedHeapSize), could probably migrate to the GCTimeRatio approach and stop using SoftMaxHeapSize.

I'll need some discussion with my team about what we would do next. Meanwhile, @mo-beck do you guys have preference on how SoftMaxHeapSize should work?

Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected.

Somewhat related to the above: our internal algorithm that adjusts SoftMaxHeapSize based on GC CPU overhead has encountered cases where it behaves poorly. The problem is that some workloads have large variance in the mutator's CPU usage (e.g. peak hours vs. off-peak hours), but smaller variance in GC CPU usage. It then does not make much sense to maintain a constant % for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating SoftMaxHeapSize, which is similar to how Min/MaxHeapFreeRatio work.

I'm not sure if GCTimeRatio using wall time and pause time could run into similar issues. I'm happy to experiment when we make progress on JDK-8238687/JDK-8248324/JDK-8349978.

@tschatzl
Contributor

tschatzl commented Apr 4, 2025

Thank you both for the quick and detailed responses!

  • JDK-8248324 effectively removes the use of Min/MaxHeapFreeRatio (apart from full GC, where they obviously also need to be handled in some way that fits into the system).
  • JDK-8238687 makes GCTimeRatio shrink the heap too, obviating the need for Min/MaxHeapFreeRatio, which are currently the knobs that limit excessive memory usage.

With no flags (no Min/MaxHeapFreeRatio) to interfere with each other, there is no need to consider their precedence.
As you mention, there is a need for some strategy to reconcile divergent goals - ultimately G1 needs a single value that tells it in which direction, and to what degree, to resize the heap.
Incidentally, the way GCTimeRatio (or actually the internal GC CPU usage target as an intermediate) is already used fits these requirements: from some actual value you can calculate a difference to the desired one, with some smoothing applied, which gives you both the direction and the degree of the change in heap size (applying some magic factors/constants).

Until now, I was unaware that G1 plans to stop using Min/MaxHeapFreeRatio. It looks like JDK-8238686 has a more relevant description. It sounds good to solve all the above-mentioned issues and converge on a single flag such as GCTimeRatio, and to ensure both incremental and full GCs respect this flag. (We should also fix JDK-8349978 to converge on GCTimeRatio.) It would be nice to have a doc or a master bug that describes the overall plan.

The last time this was mentioned on the hotspot-gc-dev list was here. I remember giving multiple outlines to everyone involved earlier, each mentioning that Min/MaxHeapFreeRatio need to go away because they are in the way, so I was/am a bit surprised by this response.

I will look through the existing bugs and see whether there is a need for a(nother) master bug.

In comparison, this PR's high-precedence, "harder" SoftMaxHeapSize is an easier and more expedient way to improve heap resizing without solving all the other issues. However, it requires users to carefully maintain and dynamically adjust SoftMaxHeapSize to prevent GC thrashing. I think if all the other issues are resolved, our existing internal use cases, which use a separate algorithm to dynamically calculate and set the high-precedence SoftMaxHeapSize (or ProposedHeapSize), could probably migrate to the GCTimeRatio approach and stop using SoftMaxHeapSize.

I'll need some discussion with my team about what we would do next. Meanwhile, @mo-beck do you guys have preference on how SoftMaxHeapSize should work?

Now there is some question about the weights of these factors: we (in the gc team) prefer to keep G1's balancing between throughput and latency, particularly if the input this time is some value explicitly containing "soft" in its name. Using the 25% from ZGC as a max limit for gc cpu usage if we are (way) beyond what the user desires seems good enough for an initial guess. Not too high, guaranteeing some application progress in the worst case (for this factor!), not too low, guaranteeing that the intent of the user setting this value is respected.

Somewhat related to the above: our internal algorithm that adjusts SoftMaxHeapSize based on GC CPU overhead has encountered cases where it behaves poorly. The problem is that some workloads have large variance in the mutator's CPU usage (e.g. peak hours vs. off-peak hours), but smaller variance in GC CPU usage. It then does not make much sense to maintain a constant % for GC CPU overhead, which could cause excessive heap expansion when mutator CPU usage is low. The workaround is to take live size into consideration when calculating SoftMaxHeapSize, which is similar to how Min/MaxHeapFreeRatio work.

I'm not sure if GCTimeRatio using wall time and pause time could run into similar issues. I'm happy to experiment when we make progress on JDK-8238687/JDK-8248324/JDK-8349978.

Obviously there are issues to sort out. :)

@tschatzl
Contributor

tschatzl commented Apr 7, 2025

Filed JDK-8353716.

@tschatzl
Contributor

tschatzl commented Apr 7, 2025

Also collected thoughts and existing documents with some additional rough explanations.

@caoman
Contributor Author

caoman commented Apr 9, 2025

Thank you for creating JDK-8353716!

The last time this was mentioned on the hotspot-gc-dev list was here. I remember giving multiple outlines to everyone involved earlier, each mentioning that Min/MaxHeapFreeRatio need to go away because they are in the way, so I was/am a bit surprised by this response.

Apologies for overlooking the previous mentions of Min/MaxHeapFreeRatio. They were mostly inside responses to complicated issues, and I have hardly had time to follow hotspot-gc-dev closely. To be honest, we didn't pay much attention to Min/MaxHeapFreeRatio before I started working on this PR.

I guess this is a good example of how a one-pager doc/umbrella bug provides cleaner communication and additional value over email discussions, especially when one party already has a fairly detailed plan for how things should be done.

@tschatzl
Copy link
Contributor

tschatzl commented Apr 9, 2025

The last time this was mentioned on the hotspot-gc-dev list was here. I remember giving multiple outlines to everyone involved earlier, each mentioning that Min/MaxHeapFreeRatio needs to go away because it's in the way, so I was/am a bit surprised by this response.

Apologies for overlooking the previous mentions of Min/MaxHeapFreeRatio. They were mostly buried inside responses to complicated issues, and I have hardly had the time to follow hotspot-gc-dev closely. To be honest, we didn't pay much attention to Min/MaxHeapFreeRatio before I started working on this PR.

I guess this is a good example of how a one-pager doc/umbrella bug provides cleaner communication and additional value over email discussion, especially when one party already has a fairly detailed plan for how things should be done.

Don't worry, I should have been better with following up with that summary about thoughts/plans communicated so far somewhere publicly.

Let's go forward with that CR summarizing the respective (current) general direction.

@mo-beck
Copy link
Contributor

mo-beck commented Apr 11, 2025

Meanwhile, @mo-beck, do you have a preference on how SoftMaxHeapSize should work?

Thanks for the thoughtful work here — this PR is a solid step toward strengthening G1’s memory footprint management, and I support it.

This patch adds support for SoftMaxHeapSize in both the expansion and shrinkage paths, as well as in the IHOP calculation, ensuring it is part of the regular heap-policy logic. As I outlined in my original note and follow-up on AHS integration, my intent has been to use SoftMaxHeapSize as a guiding input — a soft signal — within a broader dynamic heap-sizing controller that considers GC overhead, mutator behavior, and memory availability. This patch lays the groundwork for that direction.

The behavior when the live set exceeds the soft target has come up in the discussion. My view remains that the heap should be influenced by the value, not strictly bound to it. That’s the balance I’ve been aiming for in describing how it integrates into the control loop — SoftMax helps inform decisions, but doesn’t unconditionally restrict them.

I agree that we’ll want to follow up with logic that can respond to GC pressure and workload needs, to avoid any unintended performance issues. I’ll update JDK-8353716 to reflect this, and I’ll continue the thread on the mailing list to coordinate the next phase.

@bridgekeeper
Copy link

bridgekeeper bot commented May 9, 2025

@caoman This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@tschatzl
Copy link
Contributor

/touch

@openjdk
Copy link

openjdk bot commented May 21, 2025

@tschatzl The pull request is being re-evaluated and the inactivity timeout has been reset.

@bridgekeeper
Copy link

bridgekeeper bot commented Jun 18, 2025

@caoman This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Jul 16, 2025

@caoman This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Jul 16, 2025

Labels

hotspot-gc hotspot-gc-dev@openjdk.org rfr Pull request is ready for review
