runtime: mechanism for monitoring heap size #16843
Comments
This was referenced Aug 22, 2016
rgooch
commented
Aug 23, 2016
Can you please expand on what you have in mind?
bradfitz
Aug 23, 2016
Member
I have nothing specific in mind. This bug was filed as part of a triage meeting with a bunch of us. One bug (#5049) was ancient with no activity and one bug (#14162) proposed a solution instead of discussing the problem.
This bug is recognition that there is a problem, and we've heard problem statements and potential solutions (and sometimes both) from a number of people.
The reality is that there are always memory limits, and it'd be nice for the Go runtime to help applications stay within them, through perhaps some combination of limiting itself, and/or helping the application apply backpressure when resources are getting tight. That might involve new runtime API surface to help applications know when things are getting tight.
/cc @nictuku also.
bradfitz
Aug 23, 2016
Member
Btw, there was lots of good conversation at #14162 and it wasn't our intention to kill it or devalue it. It just didn't fit the proposal process, and we also didn't want to decline it, nor close it as a dup of #5049.
Changing the language is out of scope, so discussions of things like catching memory allocation failures, or language additions like "trymake" or "tryappend", are not going to happen.
But we can add runtime APIs to help out. That's what this bug is tracking.
/cc @matloob @aclements
rgooch
Aug 23, 2016
Agreed. "try*" isn't practical. It would require changing too many call sites and even then would not catch all allocations. Adding runtime.SetSoftMemoryLimit() still seems like the best approach.
nictuku
Aug 23, 2016
Contributor
It would be nice to have the ability to set a limit to the memory usage.
After a limit is set, perhaps the runtime could provide a clear indication that we're under memory pressure and that the application should avoid creating new allocations. Example new runtime APIs that would help:
- func InMemoryPushback() bool; or
- func RegisterPushbackFunc(func(inPushback bool))
That would provide a clear signal to the application. How exactly that's decided should be an internal implementation decision and not part of the API. An example implementation, to illustrate: if we limit ourselves to the heap size specified by the user, we could trigger GC whenever the used heap is close to the limit. Then we could enter pushback whenever the GC performance (latency or CPU overhead) is outside certain bounds. Apply smoothing as needed.
The approach suggested by this API has limitations.
For example, it's still possible for an application that is behaving well to do one monstrous allocation after it has checked for the pushback state. This would be common for HTTP and RPC servers that do admittance control at the beginning of the request processing. If the monstrous allocation would bring the memory heap above the limit, Go should probably panic. Since we don't want to change the language to add memory allocation error checks, I think this is fine. And we have no other option :).
Another problem is that deciding the right time to push back can be hard. Whatever the runtime implements, some folks may find it too aggressive (pushing back too much, leading to poor resource utilization) or too conservative (pushing back too late, leading to high latency due to excessive GC). I guess the Go team could provide a knob similar to GOGC to control the pushbackiness of the runtime, if folks are really paranoid about it.
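For illustration, a minimal sketch of the admission-control pattern such a signal would enable; InMemoryPushback is the proposed (hypothetical) API, stubbed out here as a placeholder:

```go
package pushback

import "net/http"

// inMemoryPushback stands in for the proposed runtime.InMemoryPushback API,
// which does not exist today; an application would wire it to whatever
// pressure signal the runtime eventually exposes.
var inMemoryPushback = func() bool { return false }

// withAdmission sheds new requests while the runtime reports memory pressure,
// so in-flight work can finish and the heap can shrink.
func withAdmission(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if inMemoryPushback() {
			http.Error(w, "temporarily over memory budget", http.StatusServiceUnavailable)
			return
		}
		next.ServeHTTP(w, r)
	})
}
```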
RLH
Aug 23, 2016
Contributor
The runtime could set up a channel and send a message whenever it completes a GC. The application could have a heap monitor goroutine (HMG) watching that channel. Whenever the HMG gets a message it inspects the state of the heap. To determine the size of the heap the HMG would look at the live heap size and GOGC. If need be it could adjust GOGC so that the total heap does not exceed whatever limit the application finds appropriate. If things are going badly for the application the HMG can start applying back pressure to whatever part of the application is causing the increase in heap size. The HMG would be part of the application, so a wide variety of application-specific strategies could be implemented.
Trying to pick up the pieces after a failure does not seem doable. Likewise, deciding what is "close to a failure" is very application specific, and a global metric potentially involves external OS issues such as co-tenancy as well as other issues well beyond the scope of the Go runtime. Decisions and actions need to be made well ahead if one expects them to reliably prevent an OOM.
I believe this is where we were headed in #14162, and this is a recap of some of that discussion.
I would be interested in what useful policy could not be implemented using the HMG mechanism and current runtime mechanisms.
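For concreteness, a rough sketch of such a heap monitor goroutine built only from APIs that exist today (runtime.ReadMemStats and runtime/debug.SetGCPercent); it polls on a ticker because the GC-completion channel described above does not exist, and the interval, GOGC floor, and 10% back-pressure margin are arbitrary assumptions:

```go
package heapmonitor

import (
	"runtime"
	"runtime/debug"
	"time"
)

// Monitor adjusts GOGC so the next GC goal stays under limitBytes and asks
// the application to apply back pressure when the goal gets close to it.
func Monitor(limitBytes uint64, backpressure func(on bool)) {
	t := time.NewTicker(time.Second)
	defer t.Stop()
	for range t.C {
		var ms runtime.MemStats
		runtime.ReadMemStats(&ms)
		live := ms.HeapAlloc // approximation: includes garbage allocated since the last GC
		gogc := 100
		if live > 0 {
			// The next GC triggers near live*(1+GOGC/100); solve for GOGC.
			if g := int(limitBytes*100/live) - 100; g < gogc {
				gogc = g
			}
		}
		if gogc < 10 {
			gogc = 10 // keep some headroom so GC doesn't run continuously
		}
		debug.SetGCPercent(gogc)
		// Signal the application when the GC goal is within 10% of the limit.
		backpressure(ms.NextGC > limitBytes-limitBytes/10)
	}
}
```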
rgooch
Aug 23, 2016
I previously gave the reasoning why using a channel or a callback to receive memory exceeded events won't work: #14162
That same reasoning applies to a channel whenever a GC run is completed.
To robustly handle exceeding a memory limit, the check for the limit has to be part of the allocator, not done after a GC run. This is because you can't afford to wait. If you wait for the next GC run, it may be too late. Consider a single large slice allocation that would put you over the soft limit and would exceed the hard memory limit: you'll get an OOM panic. The same applies to a callback function.
You need to immediately stop the code which is doing the heavy allocating. To do that you need a check in the allocator and you need to send a panic(). It's up to the application to set the soft memory limit at which these optional, catchable panics are sent.
Please, before rehashing old suggestions or coming up with new variants, read through #14162 where I gave the reasoning why a panic and a check in the allocator is needed. Otherwise we keep covering the same old ground.
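For concreteness, a sketch of what the caller side of such an opt-in soft-limit panic might look like; runtime.SetSoftMemoryLimit and the panic it would raise are hypothetical (they do not exist in Go), only the recover pattern and the Request placeholder type are real code:

```go
package softlimit

import (
	"encoding/gob"
	"fmt"
	"net"
)

// Request is a placeholder for an application message type.
type Request struct{ Payload []byte }

// decodeRequest shows the opt-in pattern: a hypothetical
// runtime.SetSoftMemoryLimit(8 << 30) would be called once at startup, and
// the allocator would raise a catchable panic when the soft limit is crossed.
// Note this recover catches any panic from Decode, not only the soft-limit one.
func decodeRequest(conn net.Conn) (req *Request, err error) {
	defer func() {
		if r := recover(); r != nil {
			// The decoder's allocations pushed the heap over the soft limit;
			// abandon this request so its garbage becomes collectable.
			err = fmt.Errorf("request aborted under memory pressure: %v", r)
		}
	}()
	req = new(Request)
	err = gob.NewDecoder(conn).Decode(req)
	return req, err
}
```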
quentinmit
Aug 23, 2016
Contributor
@rgooch If you are allocating giant arrays, you probably know exactly where in your code that is happening, and you can add code there to first check if there is enough memory available. You can even do that using the GC information we're discussing passing down a channel.
I do think there is a race here, but in the opposite case - if code is sitting in a tight loop making many small allocations, your channel read/callback might not run in time to actually trigger a new GC soon enough without OOMing.
rgooch
Aug 23, 2016
I discussed all this in #14162: you can be reading GOB-encoded data from a network connection. No way to know ahead of time how big it's going to be. Or it can be some other library you don't control where a lot of data are allocated, whether a single huge slice or a lot of small allocations. The point is, you don't know how much will be allocated before you enter the library code and you've got no way to reach in there and stop things if you hit some pre-defined limit. And, as you say, if you're in a loop watching allocations, even if you could stop things, you may not get there in time. Spinning in a loop watching the memory level is grossly expensive. This needs to be tied to the allocator.
RLH
Aug 23, 2016
Contributor
This does not propose a callback or channel for delivering a memory exceeded message or a memory almost exceeded message. At that point it is already too late. This proposes a mechanism for providing the application timely information that it can use to avoid the OOM. The application knows how best to predict memory usage and, if need be, throttle its memory usage.
One suggestion was
func runtime.ReserveOOMBuffer(size uint64)
The application's heap monitor goroutine, HMG, could initially allocate a large object of the required size and retain a single reference to it. If the HMG, using information provided by the runtime, determines that the current GOGC and live heap size will not support the application's predicted allocations, then it can release that single reference, confident that the next GC will recover those spans and make them available. If the HMG wants the GC to happen sooner than currently scheduled then it can lower GOGC using SetGCPercent.
If ReserveOOMBuffer is the API that some Go application needs then this provides it. The intent of this proposal is to provide the application with the information it needs to create the abstractions that best fit its needs while minimizing Go's runtime API surface.
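A sketch of how an application could emulate the suggested ReserveOOMBuffer today, without any new runtime API; the reserve size, the locking, and the forced GC on release are illustrative choices, not part of any proposal:

```go
package oomreserve

import (
	"runtime"
	"sync"
)

// reserve holds the only reference to a pre-allocated buffer, as in the
// ReserveOOMBuffer idea above: while held it occupies heap spans, and
// dropping it lets the next GC make those spans available for reuse.
var (
	mu      sync.Mutex
	reserve []byte
)

// Reserve allocates the emergency buffer; call once at startup.
func Reserve(size int) {
	mu.Lock()
	defer mu.Unlock()
	reserve = make([]byte, size)
}

// Release drops the only reference and forces a collection so the spans
// become reusable immediately rather than at the next scheduled GC.
func Release() {
	mu.Lock()
	reserve = nil
	mu.Unlock()
	runtime.GC()
}
```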
dr2chase
Aug 25, 2016
Contributor
As I read this, #14162 describes a workload where (analogy follows) sometimes the python attempts to swallow a rhino, and if the attempt is not halted ASAP it is guaranteed to end badly. Is it in fact the case that the rhino will never be successfully swallowed? (I can imagine DOS attacks on servers where this might be the case.)
I think that the periodic notification scheme is intended to deal with a python diet of a large number of smaller prey; if an application has the two constraints of m=memory < M and l=latency < L, and if m is affine in workload W (reasonable assumption) and l is also affine in workload W (semi-reasonable), then simply comparing observed m with limit M and observed l with limit L tells you how much more work can be admitted (W' = W * min(M/m, L/l)), with the usual handwaving around unlucky variations in the input and lag in the measurement. It's possible to adjust GOGC up or down if M/m and L/l are substantially different, so as to maximize the workload within constraints -- this however also requires knowledge of the actual GC overhead imposed on the actual application (supposed to be 25% during GC, but high allocation rates change this). One characteristic of this approach is that a newly started application might not snap online immediately at full load, but would increase its intake as it figured out what load it could handle.
But this is no help for intermittent rhino-swallowing.
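For concreteness, the admission arithmetic above as a small function; smoothing and measurement lag are left out, and all inputs are assumed to be positive:

```go
package admission

import "math"

// AdmissibleWork scales the currently observed workload w by the tighter of
// the memory headroom (M/m) and latency headroom (L/l) ratios:
// W' = W * min(M/m, L/l).
func AdmissibleWork(w, m, M, l, L float64) float64 {
	return w * math.Min(M/m, L/l)
}
```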
quentinmit added this to the Go1.8Maybe milestone (Sep 6, 2016)
bradfitz
Oct 5, 2016
Member
As long as the proposal isn't to "make it possible to catch failed memory allocations", which I'm pretty sure everybody agrees isn't going to happen.
But any proposal should address or at least consider the whole range of related issues in this space. (back pressure, runtime & applications being aware of limits & usage levels)
jessfraz
Oct 5, 2016
Contributor
I was thinking a couple additions to the runtime package to expose information that might be useful for applications like you said in #16843 (comment)
quentinmit added the NeedsDecision label (Oct 11, 2016)
rsc modified the milestones: Go1.9, Go1.8Maybe (Oct 21, 2016)
juliandroid
Feb 20, 2017
Is there any decision about how this would be properly implemented?
Perl documents a notorious $^M global variable that user code could initialize to some lengthy string, which in case of an out-of-memory error could be used as an emergency memory pool after die()ing. However, I couldn't find a working example and it seems that feature was never implemented.
Still, it seems like a logical approach. Since you are most probably in a multi-tenant environment, sharing memory with other Go/non-Go programs, the only buffer you can rely on is the emergency one you allocated yourself. Having the Go runtime use that memory when memory runs low, and immediately notifying the subscribed process that it is running out of memory, seems like a good measure to prevent pure Go programs from panicking.
nictuku
Feb 20, 2017
Contributor
My proposal is here: https://docs.google.com/document/d/1zn4f3-XWmoHNj702mCCNvHqaS7p9rzqQGa74uOwOBKM/edit
I hope to have an implementation open sourced soon. I don't know if it could be included in the standard libraries.
I would like to make it as robust as possible, so if you'd like to test it, please drop me an email (see my github profile) and I'll contact you later. Thanks!
brian-brazil referenced this issue (Mar 7, 2017): Implement strategies to limit memory usage. #455 (Closed)
rgooch
Mar 8, 2017
This proposal looks interesting. I made a couple of comments in the document:
- Support the pattern of pre-allocating at startup (up to a percentage of the VM/container memory) and never giving that memory back to the OS.
- Have a hard memory limit and push back + GC harder as you get closer to the limit.
CAFxX
Mar 8, 2017
Contributor
Added feedback to optionally trigger orderly application shutdown when GC pacing fails to keep memory below the set maximum.
tve
May 11, 2017
I'm dealing with an app that runs out of memory (on a 16GB box), which eventually led me here. Some of the notes I took along the way are below; apologies if these fall into a "yeah, we know" category.
- On 64-bit Linux, I hit the out-of-memory panic in sysMap in mem_linux.go:216, but when I look up the call stack I see it passing through grow in mheap.go:774, and the code leads me to believe that if sysMap had returned an error instead of just panicking then grow could have tried a smaller allocation.
fatal error: runtime: out of memory
runtime stack:
runtime.throw(0x8a2de5, 0x16)
/usr/local/go/src/runtime/panic.go:596 +0x95
runtime.sysMap(0xc437a10000, 0x5800000, 0xc420394800, 0xaebef8)
/usr/local/go/src/runtime/mem_linux.go:216 +0x1d0
runtime.(*mheap).sysAlloc(0xad31a0, 0x5800000, 0x421b81)
/usr/local/go/src/runtime/malloc.go:428 +0x374
runtime.(*mheap).grow(0xad31a0, 0x2c00, 0x0)
/usr/local/go/src/runtime/mheap.go:774 +0x62
runtime.(*mheap).allocSpanLocked(0xad31a0, 0x2c00, 0xaceb30)
/usr/local/go/src/runtime/mheap.go:678 +0x44f
- I'm running in a container env where the container has a max memory set and I'm trying to understand what fraction of that can realistically be "in_use". It appears that I have to account for anywhere from 25% to 50% overhead. E.g., if the cgroup has memory=16GB then the actual in-use heap data structures may be in the 8GB..12GB range before I hit the out-of-memory panic. On the one hand, with GC that's perhaps in the reasonable ballpark; on the other hand, this does represent $$.
- The amount of "unused heap overhead" seems to be tunable using the GOGC env variable, but I didn't see a way to modify this at run-time. For example, while the process is far from its limit, using 100% reduces GC overhead, but when it reaches perhaps 60% of its limit I may want to change it to 20% to trade memory vs CPU. In my app I see it going from 1% to 6% of CPU overhead.
- I'm very interested in being able to capture control when the process runs out of memory or is about to. I understand that in the absolute this is a difficult problem, but I'm looking at it from a troubleshooting perspective. I would first use it to output a memory profile or similar information so I can understand how much memory is allocated where, plus some info about GC (e.g. allocated but unused space). It would be OK for this to trigger before absolutely-out-of-memory occurs, e.g. the first time the runtime gets back-pressure from the OS (see first bullet point).
- I do believe that many services can adjust their memory consumption by, broadly speaking, adjusting the concurrency. For example, an HTTP server can adjust the number of requests that are concurrently processed. I believe the runtime.MemStats info is sufficient for this purpose, but it could be enhanced by having some callback mechanism when a threshold is exceeded. E.g., a web server could block processing of new requests when 80% of available memory is used and only resume when it drops below 75% (a sketch follows below).
Overall I concur with the sentiment that most apps that run out of memory will run out of memory regardless of how fancy a mechanism is added to the current situation. For this reason if I had a vote I would vote for adding some additional simple hooks so one can do some tuning and foremost troubleshoot when an app does run out of memory.
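A sketch of the concurrency throttle described in the last bullet above, using only runtime.MemStats; the water marks and the per-admission ReadMemStats call are illustrative simplifications (real code would sample memory on a ticker and add the callback the bullet asks for):

```go
package throttle

import (
	"runtime"
	"sync"
)

// Limiter bounds concurrent requests and sheds load between a high and a low
// heap-usage water mark (hysteresis, e.g. 80% / 75% of the memory budget).
type Limiter struct {
	slots     chan struct{}
	highWater uint64
	lowWater  uint64

	mu       sync.Mutex
	shedding bool
}

func New(maxInFlight int, highWater, lowWater uint64) *Limiter {
	return &Limiter{
		slots:     make(chan struct{}, maxInFlight),
		highWater: highWater,
		lowWater:  lowWater,
	}
}

// Admit reports whether a new request may start; callers must call Done when
// an admitted request finishes.
func (l *Limiter) Admit() bool {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms) // not free; sample on a ticker in real code
	l.mu.Lock()
	if ms.HeapInuse > l.highWater {
		l.shedding = true
	} else if ms.HeapInuse < l.lowWater {
		l.shedding = false
	}
	shed := l.shedding
	l.mu.Unlock()
	if shed {
		return false
	}
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false // too many requests already in flight
	}
}

// Done releases the slot taken by Admit.
func (l *Limiter) Done() { <-l.slots }
```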
aclements
May 11, 2017
Member
> On 64-bit linux, I hit the out-of-memory panic in sysMap in mem_linux.go:216, but when I look up the call stack I see it passing through grow in mheap.go:774 and the code leads me to believe that if sysMap had returned an error instead of just panicking then grow could have tried a smaller allocation.
I'm not sure what you're suggesting, exactly. grow can reduce its request by at most 64 KB, which probably isn't going to help when a multi-gigabyte heap is running out of room.
> I'm running in a container env where the container has a max memory set and I'm trying to understand what fraction of that can realistically be "in_use". It appears that I have to count for anywhere from 25% to 50% overhead.
Assuming you mean runtime.MemStats.HeapInUse (and friends), note that this can vary depending on where you are in a GC cycle. Perhaps more interesting is MemStats.NextGC, which tells you what heap size this GC cycle is trying to keep you below. This changes only once per GC cycle.
> The amount of "unused heap overhead" seems to be tunable using the GOGC env variable, I didn't see a way to modify this at run-time.
runtime/debug.SetGCPercent lets you change this. Right now this triggers a full STW GC, but in Go 1.9 this operation will let you change GOGC on the fly without triggering a GC (unless you set it low enough that you have to immediately start a GC, of course :)
tve
May 12, 2017
> I'm not sure what you're suggesting, exactly. grow can reduce its request by at most 64 KB, which probably isn't going to help when a multi-gigabyte heap is running out of room.
Ah, I couldn't tell that, you're right then.
tve
May 12, 2017
> My proposal is here: https://docs.google.com/document/d/1zn4f3-XWmoHNj702mCCNvHqaS7p9rzqQGa74uOwOBKM/edit
Nice long proposal write-up :-). I'm trying to understand the tl;dr ...
The proposal seems to come down to "periodically measure live data size and set GCPercent such that GC is triggered before the desired total heap size is reached". As mentioned in the proposal, this can be done/approximated today in the app itself using runtime.MemStats and debug.SetGCPercent.
As far as I can tell the following changes to the runtime would be desirable to improve this:
- ensure that the calls necessary are efficient (some (all?) optimizations are in Go1.9 already)
- provide a hook so GCPercent can be adjusted after each GC instead of relying on a periodic timer? (see the sketch below)
As a user I'm still left wondering a bit what a reasonable goal in all of this is. I'm imagining something like "for the vast majority of Go apps the tuning of GCPercent allows 80% of memory to be used for live data with moderate GC overhead and 90% with high to very high GC overhead". Maybe someone in the Go community has informed intuition about specific numbers.
The answer to requests to have some callback or rescue option when memory allocation fails would be that instead GC overhead exceeding N% or GCPercent falling below M% should be used to trigger said rescue action.
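On the second bullet, a hook that fires after each GC can be approximated today with a finalizer on a sentinel allocation; this is a known workaround rather than a supported notification API, and the callback runs on the runtime's finalizer goroutine so it must not block:

```go
package gchook

import "runtime"

// OnEachGC arranges for f to be called shortly after every GC cycle by
// re-arming a finalizer on a fresh sentinel allocation each time it fires.
// f runs on the runtime's finalizer goroutine and should return quickly;
// a typical f would read MemStats and call debug.SetGCPercent.
func OnEachGC(f func()) {
	sentinel := new(byte)
	runtime.SetFinalizer(sentinel, func(_ *byte) {
		f()
		OnEachGC(f) // re-arm for the next cycle
	})
}
```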
tve
May 12, 2017
I did an experiment to use GCPercent to constrain heap size, and while the principle works as expected, it does not look sufficient to me. I'm working on an app that digests some giant CSVs where memory consumption is an issue. I'm running with GCPercent=25 to try and contain the memory overhead. I'm running with gctrace=1 and the highest heap size number I see is 797MB:
gc 389 @209.888s 6%: 0.013+888+0.10 ms clock, 0.055+164/183/1068+0.40 ms cpu, 796->797->613 MB, 797 MB goal, 4 P
A little later after some memory has been freed I grab MemStats and get the following HeapXxx stats which show 1.2GB of heap (all gctrace outputs since the above were lower):
Heap stats: sys=1205MB inuse=488MB alloc=438, idle=717, released=0
Data grabbed from top at about that time seems to agree with the heap stats (code/stack size are not significant):
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17746 tve 20 0 1272812 983920 6952 S 226.2 24.9 7:44.84 csv-digest
I was trying to keep the memory used by my process to 613MB*1.25=767MB using GCPercent but clearly that's not really working.
The point here is that tuning GCPercent is not sufficient if there is some hard limit one wants to stay under.
(I understand that my 25% goal may very well be unrealistic but I don't think this invalidates the point.)
twotwotwo
Nov 15, 2017
@nictuku Very helpful; I'd looked at that doc but didn't catch the details of when it was tried for real. Reading your list of problems, it looks like they might have even partly motivated 1.9's changes to make SetGCPercent/FreeOSMemory not STW, make SetGCPercent not always start a concurrent GC, and get MemStats under 100µs on big heaps. Makes me interested in trying to reimplement your approach and see how it works today. (Still agree the runtime will always be able to do a better job of this than app code.)
Also curious about the status of the SetMaxHeap experiment by @aclements in https://go-review.googlesource.com/c/go/+/46751. I see it got +2'd by @RLH but not sure that means anything for next steps. Is some API like that planned to go in eventually or at least still under consideration? If so, is there anywhere else we should look for discussion about it?
nictuku
Nov 21, 2017
Contributor
@twotwotwo I tried my experiment with at least one of the changes from @aclements backported into 1.8 - the one speeding up getMemStats, IIRC. It didn't really help that much :-(. The naive implementation has too many problems and getMemStats was just one of them.
https://golang.org/cl/46751 looks promising.
rgooch
Nov 21, 2017
So, looking at https://golang.org/cl/46751 there is still the problem of how to induce a panic for goroutines which have opted-in. If you have code which is stuck in some library code performing huge allocations, you need a way to induce a panic so that the allocations are stopped. @aclements: will your solution be including that feature as well?
RLH
Nov 21, 2017
Contributor
The idea is to use contexts for this type of cancellation functionality.
https://golang.org/pkg/context/
beorn7
Nov 21, 2017
Just for the record, as Prometheus was mentioned early in the whole story as a signature use case: If you look at the current Prometheus code (v2.x), you'll find that that's not the case anymore, as most of the memory used is in mmap'd files. The RSS of Prometheus 2.x is tiny compared to Prometheus 1.x.
However, I do believe that using mmap in Go programs (and the subsequent management of raw data blocks of memory) will and should be limited to very specific scenarios and is certainly not a viable work-around in general. So please do keep up the good work here!
If you are interested in how the problem was “solved” in the later 1.x Prometheus versions (calling ReadMemStats once per second), here are the relevant code references (including the sometimes desperate comments of the poor coder):
- https://github.com/prometheus/prometheus/blob/2fad91d25a957d4329140889ed501957c72c3651/storage/local/storage.go#L46-L50
- https://github.com/prometheus/prometheus/blob/2fad91d25a957d4329140889ed501957c72c3651/storage/local/storage.go#L1130-L1134
- https://github.com/prometheus/prometheus/blob/2fad91d25a957d4329140889ed501957c72c3651/storage/local/storage.go#L1167-L1169
- https://github.com/prometheus/prometheus/blob/2fad91d25a957d4329140889ed501957c72c3651/storage/local/storage.go#L1208-L1318
As you can see, this grew into something fairly involved, which is, however, still not able to make absolutely sure we won't let the heap grow too much. On the other hand, the RSS is anyway not closely correlated to the heap size (and, for some reason I don't know, increased slightly for the same heap sizes with Go1.9). In practice, this worked quite nicely. On our fairly large number of Prometheus servers at SoundCloud (~70 servers), we never had an OOM-kill again, until we compiled Prometheus with Go1.9 but kept the settings the same (and the ratio between RSS and heap size went up).
rgooch
Nov 21, 2017
That requires that all my transitive dependencies support contexts. That does not seem likely to happen, or will take a loooooong time.
aclements
Nov 21, 2017
Member
> Also curious about the status of the SetMaxHeap experiment by @aclements in https://go-review.googlesource.com/c/go/+/46751. I see it got +2'd by @RLH but not sure that means anything for next steps. Is some API like that planned to go in eventually or at least still under consideration? If so, is there anywhere else we should look for discussion about it?
The intent is to get some experience with that API and make sure it actually solves problems (and doesn't create new ones :). We're planning to roll it out as an experimental API within Google and I was also hoping to get some adventurous open source users to try it out (I should email golang-dev), but neither of these has happened yet.
> So, looking at https://golang.org/cl/46751 there is still the problem of how to induce a panic for goroutines which have opted-in. ... @aclements: will your solution be including that feature as well?
Sorry, but no, that isn't part of my solution. As @RLH said, context cancellation is the "right" answer to this, though I understand that context isn't everywhere. I'm afraid I still don't really know what a panic-based solution would look like. What actually triggers the panics? Which goroutines actually get hit by the panic? A large part of the point of CL 46751 is that the back-pressure is gradual, graceful, and application-level, so the application can respond as a whole before things go too terribly wrong. And it's application level because the heap is application level. We don't have the ability to say "Goroutine X is using Y MB of memory" because it's not well-formed in general (how do you count things reachable from multiple goroutines?). "Y MB could be freed if goroutine X exited" is well-formed and could theoretically be useful for this, but I'm pretty sure it's very expensive to figure out the answer to that.
> (and the ratio between RSS and heap size went up).
@beorn7, out of curiosity, what sort of ratio are you seeing in practice? In general it's hard to bound this, so I expect you'll always have to do some testing to establish this and it will change a bit between releases.
rgooch
Nov 21, 2017
To use an enlightened quote: "the perfect is the enemy of the good". Whether contexts are the "right" solution is unclear, but it is clear that it will be a long time before they can help solve the problem generally. In the meantime, people have to deal with OOM panics.
Here is an approach that may work while preserving the solution you've implemented: add an API that allows a goroutine to receive an externally induced panic. That would then allow me to catch the event from your memory pressure channel and start sending events to goroutines to initiate panics. Ideally, if the screams from the garbage collector get louder, I'd start inducing panics to more and more goroutines, to bring the situation under control. This follows the basic "opt-in" philosophy that I've been advocating. I know which goroutines are vulnerable to triggering an OOM, so those are the ones I opt-in to being killed.
Suggested API:
func MakePanicChannel() chan<- error
When an error is sent on the channel, the calling goroutine will panic, with the provided error.
beorn7
Nov 21, 2017
> @beorn7, out of curiosity, what sort of ratio are you seeing in practice? In general it's hard to bound this, so I expect you'll always have to do some testing to establish this and it will change a bit between releases.
Yes, totally aware of that. I didn't want to imply a need to clamp RSS (which would be close to impossible) but merely underline that clamping the heap size doesn't have to be perfect rocket science to have the effects desired in many scenarios.
To answer your question: Our rule of thumb for a reasonably safe heap size setting was 67% of available physical memory with Go1.8 compiled Prometheus 1.7, and 60% for Go1.9 compiled Prometheus 1.8. (Note the beautiful version number dance…)
rsc modified the milestones: Go1.10, Go1.11 (Nov 22, 2017)
RLH
Nov 22, 2017
Contributor
An API that allows a goroutine to externally induce a panic in another goroutine means the condemned goroutine must reason about whether an asynchronously induced panic will leave the system in an inconsistent state. Even a locked critical section would need to reason about consistency in the face of an asynchronous panic. Debugging asynchronous issues and writing test cases would be a real challenge. The experience with Java was similar, and Thread.Stop ended up being deprecated for these and other reasons. Doesn't MakePanicChannel have the same set of problems?
rgooch
Nov 22, 2017
Firstly, this is an opt-in mechanism. The goroutine must call MakePanicChannel and it must register that channel with whomever it wishes to give panic powers. Secondly, people should be using defer to manage their locks, which mitigates a lot of the problems with panic-as-abort. For the class of problems I've discussed up-thread, this approach will work well.
RLH
Nov 22, 2017
Contributor
My concern isn't about releasing a held lock using a defer. Asynchronous, externally induced control flow between any statement in the critical section and the defer logic is a foot gun. Critical sections exist to provide consistency and isolation, the C and I in ACID. Providing and testing these properties in the face of such control flow would be a challenge and would likely be error prone.
rgooch
Nov 27, 2017
How else would you recover from a too-large memory allocation that is buried deep in the call stack, and unwind back out of it?
RLH
Nov 28, 2017
Contributor
Neither the literature, languages similar to Go, nor this thread provides a satisfactory answer for how to recover from an OOM. This issue and the proposed solution provide a mechanism for monitoring heap size and the tools needed to avoid an OOM.
cespare referenced this issue on Dec 8, 2017: proposal: runtime: add a mechanism for specifying a minimum target heap size #23044 (open)
chebizarro referenced this issue on Apr 3, 2018: Fuzzing wkb encoding causes out of memory errors #384 (closed)
ianlancetaylor modified the milestones: Go1.11, Go1.12 on Jul 10, 2018
graphaelli referenced this issue on Jul 16, 2018: Make concurrent requests limit based on memory #1057 (open)
gdey referenced this issue on Sep 5, 2018: Fuzzing wkb encoding causes out of memory errors #384 #21 (open)
robaho
Sep 7, 2018
In the meantime, something like Java's 'dump heap on OOM' would be very helpful, as long as there is a heap analyzer tool - but I assume it could just dump it in the memprof format and that should suffice.
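Along these lines, a stopgap that works today (a sketch, not a built-in facility; the 4 GiB threshold and file name are made up) is a watchdog goroutine that writes a pprof-format heap profile once the live heap crosses a limit, so there is at least something for `go tool pprof` to analyze before the OOM killer arrives.

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
	"time"
)

// watchHeap writes one heap profile to path once HeapAlloc exceeds limit bytes.
func watchHeap(limit uint64, path string) {
	var ms runtime.MemStats
	for range time.Tick(5 * time.Second) {
		runtime.ReadMemStats(&ms)
		if ms.HeapAlloc < limit {
			continue
		}
		f, err := os.Create(path)
		if err != nil {
			log.Printf("heap watchdog: %v", err)
			return
		}
		// debug=0 writes the standard compressed pprof format,
		// which `go tool pprof` understands.
		if err := pprof.Lookup("heap").WriteTo(f, 0); err != nil {
			log.Printf("heap watchdog: %v", err)
		}
		f.Close()
		return
	}
}

func main() {
	go watchHeap(4<<30, "heap.pprof") // assumed 4 GiB threshold
	// ... the rest of the application ...
	select {}
}
```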
aclements
Sep 7, 2018
Member
The intent is to get some experience with that API and make sure it actually solves problems (and doesn't create new ones :). We're planning to roll it out as an experimental API within Google and I was also hoping to get some adventurous open source users to try it out (I should email golang-dev), but neither of these has happened yet.
Just a quick update on this. We've had several projects trying out SetMaxHeap. It helps, but this experimentation has uncovered a few rough interactions with other parts of the runtime. The biggest problem is that large object fragmentation can cause the heap's RSS to grow significantly larger than the allocated size of the heap (#14045). As a result, for systems that suffer from large object fragmentation, there needs to be a large (and somewhat unpredictable) buffer between the reserved memory and the max heap. The other known problem is that stacks and globals don't currently count toward the GC trigger, but do count toward the RSS, and since stacks change dynamically it's hard to account for this overhead when setting the heap limit (#19839 and #23044). Both of these issues also cause problems for other reasons (wasting memory and failing to amortize GC costs). I've prioritized fixing both for Go 1.12.
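For applications that cannot pick up the experimental API, a coarse stand-in (a sketch with an assumed 6 GiB soft limit and HTTP shape, not a substitute for runtime support) is to poll MemStats and shed load above a soft limit. Because it tracks only HeapAlloc, it inherits exactly the fragmentation and stack/global accounting gaps described above.

```go
package main

import (
	"log"
	"net/http"
	"runtime"
	"sync/atomic"
	"time"
)

var heapPressure int32 // 1 while HeapAlloc is above the soft limit

// monitor polls the heap once per second and flips the pressure flag.
func monitor(softLimit uint64) {
	var ms runtime.MemStats
	for range time.Tick(time.Second) {
		runtime.ReadMemStats(&ms)
		if ms.HeapAlloc > softLimit {
			atomic.StoreInt32(&heapPressure, 1)
		} else {
			atomic.StoreInt32(&heapPressure, 0)
		}
	}
}

// handler sheds load while the process is over its soft heap budget.
func handler(w http.ResponseWriter, r *http.Request) {
	if atomic.LoadInt32(&heapPressure) == 1 {
		http.Error(w, "over memory budget, retry later", http.StatusServiceUnavailable)
		return
	}
	w.Write([]byte("ok\n"))
}

func main() {
	go monitor(6 << 30) // assumed 6 GiB soft limit
	http.HandleFunc("/", handler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```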
andreimatei
Sep 7, 2018
@aclements thank you for your interest in this area. For what it's worth, all this is a pretty big deal for CockroachDB, which is trying to do memory accounting and would like to be able to trust the runtime to stay within given limits.
vitalyisaev2
Sep 10, 2018
I have an urgent need for a tool that helps me understand what's actually going on in a Go process's address space, why RSS keeps growing despite memory limitations, and so on. I also need to use cgo, which makes the problem even more complicated. Currently I have to use a set of tools like pprof, valgrind --tool=massif, viewcore (introduced a few weeks ago), and some self-developed tools. But it feels like I can only see different aspects of the problem, not the problem as a whole.
For example, I see that the process has 5GB RSS. The Go runtime says that it takes 2GB (though only 5% of that is used, and the other 95% is idle). The cgo library says that it uses < 500MB in its caches and other internal data structures. And no one knows what consumed the remaining 2.5GB. I can only speculate whether it's Go runtime overhead (for instance, because I have the profiler enabled as well) or a memory leak due to cgo (malloc without free).
aclements
Sep 10, 2018
Member
@vitalyisaev2, please open a new issue or send an email to golang-nuts@googlegroups.com. In it, please elaborate on what you mean by "Go runtime says that it takes 2GB". The runtime exports many different statistics, and it's important to know which one you're talking about. I would start by looking closely at all of the runtime.MemStats statistics, since that should tell you if it's on the Go side or the C side. If it's on the Go side, viewcore is probably the right tool to find the problem. Please also describe the time-scale of the problem, since some things (like returning memory to the OS) happen on the scale of minutes.
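For what it's worth, here is a minimal sketch of that kind of side-by-side snapshot, putting a few of the main MemStats numbers next to the kernel's VmRSS (Linux only; the field selection is illustrative, not a prescribed checklist). Resident memory well beyond what the Go runtime accounts for is a strong hint that the growth is on the cgo/malloc side.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// vmRSS returns the VmRSS line from /proc/self/status (Linux only).
func vmRSS() string {
	f, err := os.Open("/proc/self/status")
	if err != nil {
		return "unknown"
	}
	defer f.Close()
	s := bufio.NewScanner(f)
	for s.Scan() {
		if strings.HasPrefix(s.Text(), "VmRSS:") {
			return strings.TrimSpace(strings.TrimPrefix(s.Text(), "VmRSS:"))
		}
	}
	return "unknown"
}

func main() {
	var ms runtime.MemStats
	runtime.ReadMemStats(&ms)
	fmt.Printf("HeapAlloc:    %d\n", ms.HeapAlloc)    // live + not-yet-swept heap objects
	fmt.Printf("HeapInuse:    %d\n", ms.HeapInuse)    // heap spans with at least one object
	fmt.Printf("HeapSys:      %d\n", ms.HeapSys)      // heap memory obtained from the OS
	fmt.Printf("HeapReleased: %d\n", ms.HeapReleased) // physical pages returned to the OS
	fmt.Printf("StackSys:     %d\n", ms.StackSys)     // goroutine stack memory
	fmt.Printf("Sys:          %d\n", ms.Sys)          // everything the Go runtime obtained from the OS
	fmt.Printf("VmRSS:        %s\n", vmRSS())         // what the kernel actually has resident
	// Resident memory well beyond Sys (minus HeapReleased) lives outside the
	// Go runtime's accounting, e.g. C allocations made through cgo.
}
```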
bradfitz commented Aug 22, 2016
Tracking bug for some way for applications to monitor memory usage, apply backpressure, stay within limits, etc.
Related previous issues: #5049, #14162