Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: GC pacing exhibits strange behavior with a low GOGC #37927

Closed
mknyszek opened this issue Mar 18, 2020 · 10 comments
Closed

runtime: GC pacing exhibits strange behavior with a low GOGC #37927

mknyszek opened this issue Mar 18, 2020 · 10 comments
Assignees
Labels
Milestone

Comments

@mknyszek
Copy link
Contributor

@mknyszek mknyszek commented Mar 18, 2020

As of Go 1.14, GC pacing changed a bit to alleviate issues with an increased allocation rate, related to #35112.

Unfortunately, the pacing change that was made was made in error. Currently, the code for capping the trigger ratio in gcSetTriggerRatio looks like:

    const minTriggerRatio = 0.6                                                                                                                                                                             
                                                                                                                                                                                                            
    // Set the trigger ratio, capped to reasonable bounds.                                                                                                                                                  
    if triggerRatio < minTriggerRatio {                                                                                                                                                                     
        // This can happen if the mutator is allocating very                                                                                                                                                
        // quickly or the GC is scanning very slowly.                                                                                                                                                       
        triggerRatio = minTriggerRatio                                                                                                                                                                      
    } else if gcpercent >= 0 {                                                                                                                                                                              
        // Ensure there's always a little margin so that the                                                                                                                                                
        // mutator assist ratio isn't infinity.                                                                                                                                                             
        maxTriggerRatio := 0.95 * float64(gcpercent) / 100                                                                                                                                                  
        if triggerRatio > maxTriggerRatio {                                                                                                                                                                 
            triggerRatio = maxTriggerRatio                                                                                                                                                                  
        }                                                                                                                                                                                                   
    }                                                                                                                                                                                                       
    memstats.triggerRatio = triggerRatio

Consider the case where gcpercent == 3 as in the Go 1.14 log found here. If the input trigger ratio is < 0.6, then we'll set it at 0.6... but if it's greater, then we'll always set it to 0.95*3/100, or 0.0285. Given that gcpercent is 3, the latter is the correct behavior. In fact, when we see a 0.6 in a gcpacertrace, we also see a spike in heap use compared to Go 1.13. This is not good, since we're not respecting the trade-off.

I think the fix is to scale the minTriggerRatio just like the maxTriggerRatio, though it probably stops making sense to have a minimum at some point (it gets so close to 0 anyway).

We should fix this and probably backport it to Go 1.14. This breaks the promise of GOGC in some cases, which I think counts as a correctness bug?

CC @aclements @randall77

@mknyszek mknyszek added the NeedsFix label Mar 18, 2020
@mknyszek mknyszek added this to the Go1.15 milestone Mar 18, 2020
@mknyszek mknyszek self-assigned this Mar 18, 2020
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Mar 18, 2020

Change https://golang.org/cl/223937 mentions this issue: runtime: ensure minTriggerRatio never exceeds maxTriggerRatio

@mknyszek

This comment has been minimized.

Copy link
Contributor Author

@mknyszek mknyszek commented Mar 18, 2020

@pijng Would you be willing to try out https://golang.org/cl/223937 and to see if it helps?

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 18, 2020

@gopherbot Please open a backport to 1.14. The comment above suggests that we should do this to fix GOGC in 1.14.

@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Mar 18, 2020

Backport issue(s) opened: #37928 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@ianlancetaylor

This comment has been minimized.

Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Mar 18, 2020

@mknyszek Should this be for 1.14.1 or can it wait until 1.14.2?

@pijng

This comment has been minimized.

Copy link

@pijng pijng commented Mar 18, 2020

@mknyszek I did the same tests as before.
It definitely helped and now I don't see any significant difference between 1.13.3 and 1.14 with https://golang.org/cl/223937.

Attaching log just in case.

If additional actions/logs are needed – I'm always ready.

@y3llowcake

This comment has been minimized.

Copy link

@y3llowcake y3llowcake commented Mar 19, 2020

I believe we hit this issue in a recent upgrade of a high QPS production service to go 1.14. On startup, this service makes a few very large heap allocations and then dials down GOGC. During normal request processing operation it makes a high rate of smaller allocations. We found it is now OOMing where previously it did not.

I can write a constrained example program to demonstrate this behavior if it would help?

@mknyszek

This comment has been minimized.

Copy link
Contributor Author

@mknyszek mknyszek commented Mar 19, 2020

@y3llowcake that would be great! Let me know if you need any help; I think I've nailed down the conditions for reproducing this, but it's a little tricky to construct the right benchmark.

@ianlancetaylor After talking to some folks (and you), it looks like the 1.14.1 release is very close and we shouldn't delay it more. We'll get this in 1.14.2.

@y3llowcake

This comment has been minimized.

Copy link

@y3llowcake y3llowcake commented Mar 20, 2020

I didn't get a chance to write up that program today, but figured I would at least share the snippet below. It's taken directly from the service in question and is the function we are using to adjust GOGC.

// This function attempts to set GOGC to a value that will maintain a maximum
// **additional** heap size of maxBytes. Call this after making a very large
// heap allocation.
func AdjustGCPercent(maxBytes uint64) (int, error) {
	if maxBytes < 1 {
		return 0, fmt.Errorf("adjust gcpercent: max bytes must be > 0")
	}
	// Force a GC to run so that memstats are up to date.
	runtime.GC()
	s := runtime.MemStats{}
	var diff uint64
	p := 100
	for ; p > 0; p-- {
		debug.SetGCPercent(p)
		runtime.ReadMemStats(&s)
		// https://golang.org/pkg/runtime/#MemStats
		// NextGC is the target heap size of the next GC cycle.
		//
		// The garbage collector's goal is to keep HeapAlloc ≤ NextGC.
		// At the end of each GC cycle, the target for the next cycle
		// is computed based on the amount of reachable data and the
		// value of GOGC.
		diff = s.NextGC - s.HeapAlloc
		if diff < maxBytes {
			break
		}
	}
	if diff < maxBytes {
		return p, nil
	}
	return p, fmt.Errorf("GC percent adjustment failed; last diff %d max %d", diff, maxBytes)
}
@gopherbot gopherbot closed this in d1ecfcc Mar 26, 2020
@gopherbot

This comment has been minimized.

Copy link

@gopherbot gopherbot commented Mar 26, 2020

Change https://golang.org/cl/225637 mentions this issue: [release-branch.go1.14] runtime: ensure minTriggerRatio never exceeds maxTriggerRatio

gopherbot pushed a commit that referenced this issue Mar 27, 2020
… maxTriggerRatio

Currently, the capping logic for the GC trigger ratio is such that if
gcpercent is low, we may end up setting the trigger ratio far too high,
breaking the promise of SetGCPercent and GOGC has a trade-off knob (we
won't start a GC early enough, and we will use more memory).

This change modifies the capping logic for the trigger ratio by scaling
the minTriggerRatio with gcpercent the same way we scale
maxTriggerRatio.

For #37927.
Fixes #37928.

Change-Id: I2a048c1808fb67186333d3d5a6bee328be2f35da
Reviewed-on: https://go-review.googlesource.com/c/go/+/223937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
(cherry picked from commit d1ecfcc)
Reviewed-on: https://go-review.googlesource.com/c/go/+/225637
Reviewed-by: David Chase <drchase@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
5 participants
You can’t perform that action at this time.