proposal: reflect: add mechanisms to allocate large slices only if possible #33060

Open
ianlancetaylor opened this issue Jul 11, 2019 · 12 comments

Comments

@ianlancetaylor (Contributor) commented Jul 11, 2019

We've seen several discussions about how to handle running out of memory in a Go program. In a quick look I found #5049 (closed), #14162 (closed), #16843 (open).

One way to think about memory allocations in a program is to observe that there are fixed-size allocations (new(int), or simply letting the address of a variable escape) and there are variable-size allocations (make([]int, N), append(a, b...)). If your program does too many fixed-size allocations, then you have a garbage collection problem; how to handle GC backpressure is discussed in the issues cited above. However, many server programs need to read variable-length data from a client, and they don't know how much space they will need when a request arrives. It seems useful to permit those programs to have some way of checking, as they allocate memory, whether that memory is available.

Any approach that relies on asking the garbage collector how much memory is available is prone to race conditions in a server, and relies on details of garbage collection. I propose a simpler mechanism:

package reflect

// TryMakeSlice creates a new zero-initialized slice value
// for the specified slice type, length, and capacity.
// If the memory is not available, it returns a zero Value.
func TryMakeSlice(typ Type, len, cap int) Value

// TryAppend appends the values x to a slice s and returns the resulting slice.
// As in Go, each x's value must be assignable to the slice's element type.
// If the memory is not available, it returns a zero Value.
func TryAppend(s Value, x ...Value) Value

(I note in passing that if we had generics, we could write these functions in a more natural way and put them in the runtime package. But the underlying idea remains the same.)

These functions in the reflect package are not the easiest to use, but they would permit servers to handle certain common cases of memory overload without simply crashing. We would need to define the criteria under which they fail. For example, perhaps they should fail if, after the allocation, less than 5% of available memory would remain unallocated.

We could then use these functions in packages like encoding/json and encoding/gob, so that those packages could reliably fail with an out-of-memory error when decoding a large incoming stream.
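
For concreteness, here is a sketch of how a decoder might grow a client-controlled buffer with the proposed API. TryMakeSlice is the function proposed above (not yet in the standard library), and growBuffer and its error text are purely illustrative.

package main

import (
	"errors"
	"reflect"
)

// growBuffer grows buf to hold need bytes using the proposed
// reflect.TryMakeSlice (hypothetical), so that an oversized,
// client-controlled request yields an error instead of an OOM crash.
func growBuffer(buf []byte, need int) ([]byte, error) {
	if need <= cap(buf) {
		return buf[:need], nil
	}
	v := reflect.TryMakeSlice(reflect.TypeOf(buf), need, need)
	if !v.IsValid() {
		// A zero Value means the memory was not available.
		return nil, errors.New("cannot allocate buffer: out of memory")
	}
	newBuf := v.Interface().([]byte)
	copy(newBuf, buf)
	return newBuf, nil
}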

@gopherbot gopherbot added this to the Proposal milestone Jul 11, 2019

@gopherbot gopherbot added the Proposal label Jul 11, 2019

@networkimprov commented Jul 11, 2019

Can anything be done to help existing programs fail gracefully?

I imagine the proposed API would alter best practices for long-running programs to: "Always use reflect.Try*() instead of make() and append()."

Since the goal is to keep allocated memory < N% of available memory, have you considered a runtime API that optionally makes normal allocations return nil, or panic with a recoverable "almost-OOM" error, when an allocation would exceed the limit?
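
Purely to illustrate this suggestion, here is a sketch of how a server might recover such a panic at a request boundary. The errAlmostOOM sentinel and the handleRequest/decode helpers are hypothetical; no such runtime mode exists today.

package main

import (
	"errors"
	"fmt"
)

// errAlmostOOM is a hypothetical sentinel a future runtime mode might
// panic with instead of aborting the process; it is defined here only
// for illustration.
var errAlmostOOM = errors.New("almost out of memory")

// decode stands in for request decoding that may allocate
// client-controlled amounts of memory.
func decode(req []byte) ([]byte, error) {
	return append([]byte(nil), req...), nil
}

// handleRequest recovers the hypothetical panic at a request boundary
// and turns it into an error response instead of a process crash.
func handleRequest(req []byte) (result []byte, err error) {
	defer func() {
		if r := recover(); r != nil {
			if r == errAlmostOOM {
				err = fmt.Errorf("request rejected: near memory limit")
				return
			}
			panic(r) // unrelated panics propagate unchanged
		}
	}()
	return decode(req)
}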

@ianlancetaylor (Contributor, Author) commented Jul 11, 2019

If we change encoding/gob and encoding/json, that will automatically help a number of existing programs: they will see and report an error rather than running out of memory. A server will report an error back to the client, and continue serving other clients rather than crashing. There are likely a few other packages that could be changed similarly, such as, perhaps, archive/zip.

I don't think it's necessary or appropriate to always use the new functions. It's appropriate to use the new functions in a server serving multiple concurrent clients where the size of the allocated memory is controlled by the client. When memory usage is not controlled by an external client, other mechanisms to limit memory use are almost certainly more appropriate.

I don't think we can reasonably change existing allocations (by which I assume you mean the pre-declared make and append functions). Existing programs do not expect those functions to fail or panic. Changing the functions would likely cause the existing programs either to fail or to crash. Since the current behavior is to crash anyhow, nothing would be gained by the change.

@ianlancetaylor (Contributor, Author) commented Jul 11, 2019

Let me add something:

Since the goal is to keep allocated memory < N% of available mem

I want to clarify that although that may be a reasonable goal, it's not the goal of these proposed functions. The goal of these proposed functions is to permit a program to avoid some crash cases when a client makes a request that requires a lot of memory to serve. We don't currently have any way to ask whether an allocation can succeed. This proposal adds that capability.

@bcmills (Member) commented Jul 12, 2019

You suggest a 5% threshold of remaining free memory for failure, but it seems to me that the appropriate threshold in a real server depends very strongly on the implementation details of that server.

If the caller already knows that the allocation is large enough that it may fail, it seems simpler for them to apply a (weighted) semaphore before the allocation rather than a check after it — and that makes the overall memory fraction much easier to tailor to their specific server. What advantages does the post-hoc approach provide over the explicit semaphore?
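
For concreteness, here is a minimal sketch of the semaphore-before-allocation alternative using golang.org/x/sync/semaphore. The 1 GiB budget, the decodeWithBudget helper, and the payload-length weight are illustrative assumptions, not part of this proposal; the crude weight estimate also shows why this approach needs an upper bound up front.

package main

import (
	"context"
	"encoding/json"

	"golang.org/x/sync/semaphore"
)

// memBudget caps how many bytes of request-driven allocation may be in
// flight at once; the 1 GiB figure is an arbitrary example.
var memBudget = semaphore.NewWeighted(1 << 30)

// decodeWithBudget acquires an estimated weight before decoding, so the
// memory fraction is tailored by the server rather than checked after
// the allocation.
func decodeWithBudget(ctx context.Context, payload []byte, dst interface{}) error {
	weight := int64(len(payload)) // crude estimate of decoded size
	if err := memBudget.Acquire(ctx, weight); err != nil {
		return err
	}
	defer memBudget.Release(weight)
	return json.Unmarshal(payload, dst)
}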

@ianlancetaylor (Contributor, Author) commented Jul 12, 2019

The advantage of the post-hoc approach is that it can be added to existing packages like encoding/json without complicating their API. A server that takes JSON-encoded data from a client has no way of knowing how much memory will be required to represent that information.

Another approach would be to invent a new interchange format that is easier to control, but JSON is what we have today.

We can always add more knobs, either to encoding/json or to the entirely arbitrary 5% threshold, but I think it's worth thinking about what we can do without knobs. Especially since even if we had knobs most programs wouldn't use them.

@bcmills (Member) commented Jul 12, 2019

It seems to me that this would allow binary maintainers to paper over flow-control bugs under moderate load in much the same way that increasing channel buffers allows maintainers to paper over concurrency bugs, and under similar conditions: it's an easy hammer to make the symptom go away without actually understanding or addressing the underlying problem.

In particular, it's easy to use such a tool to “fix” flakiness in a simplified load-test without actually fixing the production risk that the test is intended to detect: it would produce recoverable errors for large overallocations (which are easy to provoke in a synthetic test), but still result in crashes for more modest ones (which are more likely to occur under organic growth).

@rsc (Contributor) commented Jul 16, 2019

Reflect seems pretty clunky; what about x, ok = make([]int, N) or x, ok = append(x, y...)?

@rsc (Contributor) commented Jul 16, 2019

(Also how do we know for sure whether a particular allocation should fail as out of memory? Overcommit means sometimes you think it's going to work and you get killed anyway.)

@RLH (Contributor) commented Jul 18, 2019

The amount of memory available for allocation is a global property, and this proposal tries to deal with it using local information. The proposal also introduces races between concurrent allocations: which one succeeds, which one fails via this proposal's functions, and which one fails with an OOM. Using these functions will increase technical debt, as applications solve localized coding problems instead of taking a more global approach such as the one envisioned by SetMaxHeap.

Pragmatically, this would make it difficult to create reliable reproducers, decrease the robustness of tests, and increase the maintenance load on the core runtime team responding to error reports. Reproducer creation is further complicated by the fact that the proposal is sensitive to co-tenancy problems, so "global" here is even broader (sorry) than heap reachability. Finally, any allocation not using this API can still result in an OOM, so it would not be possible for a high-level package API, say the JSON API, to claim that it will return an error instead of an OOM.

The SetMaxHeap alternative should be allowed to mature and be used for several releases to see if it can address some of the problems that motivated this proposal. If a client can force a server to allocate unbounded amounts of memory, then perhaps the server can be architected to use a series of allocations of bounded size, along with SetMaxHeap, to fail gracefully.
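
As a rough illustration of that architecture (independent of whether SetMaxHeap itself ships), a server can read client data in fixed-size chunks under an explicit per-request cap, so no single allocation is sized by the client. The chunkSize, maxRequest, and readBounded names below are invented for the sketch, and the limits are arbitrary.

package main

import (
	"errors"
	"io"
)

const (
	chunkSize  = 64 << 10 // read in bounded 64 KiB steps (arbitrary)
	maxRequest = 16 << 20 // reject requests larger than 16 MiB (arbitrary)
)

// readBounded accumulates a request body using only bounded-size
// allocations and fails gracefully once the explicit cap is exceeded,
// rather than letting a client-chosen length drive one huge allocation.
func readBounded(r io.Reader) ([]byte, error) {
	var body []byte
	buf := make([]byte, chunkSize)
	for {
		n, err := r.Read(buf)
		if n > 0 {
			if len(body)+n > maxRequest {
				return nil, errors.New("request exceeds memory budget")
			}
			body = append(body, buf[:n]...)
		}
		if err == io.EOF {
			return body, nil
		}
		if err != nil {
			return nil, err
		}
	}
}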

@ianlancetaylor (Contributor, Author) commented Aug 9, 2019

If a client can force a server to allocate unbounded amounts of memory, then perhaps the server can be architected to use a series of allocations of bounded size, along with SetMaxHeap, to fail gracefully.

I think this is the key point. Does SetMaxHeap permit such a server to fail gracefully?

@bcmills (Member) commented Aug 9, 2019

In the context of “encoding/json without adding knobs”, I don't see how SetMaxHeap would help: the API would need some way to allow the caller to cancel a call to Decode or Unmarshal in response to heap growth, and plumbing in any explicit cancellation mechanism would require an API change.

@RLH (Contributor) commented Aug 10, 2019
