Add EEP 42: setrlimit(2) analogue for Erlang

1 parent 177d678 commit 08928e4bbed704c1b517f63b9d806d7156bb8dc2 @RaimoNiskanen RaimoNiskanen committed Feb 7, 2013
+ Author: Richard A. O'Keefe <ok(at)cs(dot)otago(dot)ac(dot)nz>
+ Status: Draft
+ Type: Standards Track
+ Erlang-Version: R16A
+ Created: 07-Feb-2013
+ Post-History:
+EEP 42: setrlimit(2) analogue for Erlang
+A new function `erlang:set_process_info_limit/2` is added,
+allowing a process to set limits on its own memory use.
+    erlang:set_process_info_limit(Item, Limit :: integer()) ->
+        Old_Limit :: integer()
+An `Old_Limit` of 0 means that no limit has been set.
+A Limit of 0 means that no limit is to be set.
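The return-value contract above can be modelled in ordinary Erlang. The following is only a sketch of the bookkeeping, not the proposed BIF: the module name `limit_model` and the use of the process dictionary are assumptions made for illustration, nothing is enforced, and the `memory_words` item is left out to keep the model small.

```erlang
-module(limit_model).
-export([set_limit/2]).

%% Hypothetical model of the proposed contract: store the new limit for
%% the calling process and return the previous one, where 0 stands for
%% "no limit has been set". Nothing is checked or enforced here.
set_limit(Item, Limit)
  when (Item =:= memory orelse Item =:= message_queue_len),
       is_integer(Limit), Limit >= 0 ->
    case erlang:put({limit_model, Item}, Limit) of
        undefined -> 0;      % first call: no limit was set before
        Old -> Old           % later calls: hand back the old limit
    end.
```

Calling `limit_model:set_limit(memory, 1024)` in a fresh process returns 0, and a following `limit_model:set_limit(memory, 0)` returns 1024 and clears the limit again.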
+The Items that can be set are:
+- `memory`, the number of bytes that may be used for stack, heap, and
+internal structures.
+- `memory_words`, the same value as `memory`, but expressed in
+words to be consistent with `heap_size`, `total_heap_size`, and
+`stack_size` in `erlang:process_info/[1,2]`. Those functions should
+also be revised to accept this Item.
+- `message_queue_len`, the number of unprocessed messages.
+(Aside.) The documentation refers to `message_queue_len`
+but the system generates and recognises `message_queue_len`.
+(End aside.)
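The relation between `memory` (bytes) and `memory_words` (words) depends on the word size of the running emulator. As a small sketch, assuming a hypothetical helper module `limit_units`, the conversion can be written with the existing `erlang:system_info(wordsize)` and `erlang:process_info/2` calls:

```erlang
-module(limit_units).
-export([bytes_to_words/1, words_to_bytes/1, current_memory_words/0]).

%% Size of one machine word in bytes on this emulator (4 or 8).
word_size() ->
    erlang:system_info(wordsize).

%% Convert a byte count to whole words, rounding up.
bytes_to_words(Bytes) when is_integer(Bytes), Bytes >= 0 ->
    W = word_size(),
    (Bytes + W - 1) div W.

%% Convert a word count to bytes.
words_to_bytes(Words) when is_integer(Words), Words >= 0 ->
    Words * word_size().

%% What a `memory_words` item might report for the calling process:
%% the existing `memory` item (bytes) expressed in words.
current_memory_words() ->
    {memory, Bytes} = erlang:process_info(self(), memory),
    bytes_to_words(Bytes).
```

A limit written as `limit_units:bytes_to_words(128*1024)` then denotes the same amount of memory on 32-bit and 64-bit nodes, while a plain word count does not.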
+The values that are actually set may be less than Limit as
+long as any physically realisable system in the node as
+configured cannot exceed the value used.
+A non-zero memory limit is checked after each garbage
+collection or other memory restructuring; setting the memory
+limit to a non-zero value less than the current
+`process_info(self(), memory)` forces an immediate garbage
+collection.
+If the memory required by a process exceeds its limit, the
+process is exited with reason `memory`.
+A non-zero message queue length limit is checked whenever
+the message queue length is about to be incremented, or
+when such a limit is set.
+If the message queue length limit is exceeded, the process
+is exited with reason `message_queue_len`.
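The observable effect of a message queue limit can be approximated today from user code. The sketch below is not the proposed mechanism: it polls from a separate process instead of checking on every enqueue, so the queue can briefly overshoot the limit. The module name `queue_watchdog` and the polling interval are assumptions for illustration; it works only for local pids, and a target that traps exits receives the signal as a message instead of dying.

```erlang
-module(queue_watchdog).
-export([watch/3]).

%% Start a watchdog that polls the queue length of Pid every IntervalMs
%% milliseconds and exits Pid with reason `message_queue_len` once the
%% length is greater than Limit. Returns the pid of the watchdog.
watch(Pid, Limit, IntervalMs)
  when is_pid(Pid), is_integer(Limit), Limit > 0,
       is_integer(IntervalMs), IntervalMs > 0 ->
    spawn(fun() -> loop(Pid, Limit, IntervalMs) end).

loop(Pid, Limit, IntervalMs) ->
    case erlang:process_info(Pid, message_queue_len) of
        {message_queue_len, Len} when Len > Limit ->
            %% Limit exceeded: terminate the receiver, then stop watching.
            exit(Pid, message_queue_len);
        {message_queue_len, _} ->
            timer:sleep(IntervalMs),
            loop(Pid, Limit, IntervalMs);
        undefined ->
            %% The watched process is already gone.
            ok
    end.
```

A process monitoring the target sees a `'DOWN'` message whose reason is `message_queue_len`, which matches the exit reason described above.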
+It is currently possible for an Erlang process to grab
+memory without limit and eventually take down the whole
+node. This problem is frequently reported in the Erlang
+mailing list.
+It is also possible for the message queue of an Erlang
+process to grow without limit. This problem too is
+reported often enough to be recognised.
+This is an EEP rather than a library change request because
+it requires low level runtime system changes to support it.
+For example, the limits have to be _stored_ somewhere,
+suggesting a change in the data structure holding information
+about a process. The limits have to be _checked_, meaning
+that changes to the garbage collector and the message sending
+core are required. The limits have to be _enforced_,
+meaning that two new exit reasons have to be supported. And
+the new exit reasons and the new function have to be
+_documented_, requiring changes to more than one document.
+The need is long-standing, and at least the presence of this
+EEP may provoke discussion leading to _some_ resolution.
+The function that sets a limit should have a name based
+on the name of the function that reports current use.
+That function is `process_info`, hence the name I've
+chosen here is `set_process_info_limit`.
+The `erlang:process_info/[1,2]` functions can report on
+any Erlang process. But it is clearly dangerous to let
+a process set another process's limits. All we actually
+_need_ is for a process to be able to set its own limits
+in a startup phase. That could be done by adding
+`{memory,Size}` and `{message_queue_len,Count}` options
+to `spawn_opt/[2,3]`. However,
+    spawn_opt(Fun, [{memory,128*1024}])
+can be mimicked by
+    spawn(fun () ->
+        erlang:set_process_info_limit(memory, 128*1024),
+        Fun()
+    end)
+and `set_process_info_limit/2` allows a process to set
+different limits at different times. If you are trying
+to protect the system from _malice_, setting limits in
+`spawn_opt/[2,3]` is the way to go. If you are trying to
+protect the system from _accident_, letting a process
+set its own limits is the way to go.
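Neither `set_process_info_limit/2` nor a `{memory,Size}` spawn option exists in released Erlang/OTP, so the two styles cannot be run as written. As a stand-in, the sketch below uses the `max_heap_size` process flag, which is available from OTP 19 onwards and is a different mechanism from this proposal: its size is counted in words, and with `kill => true` the process dies with reason `killed` rather than `memory`. The module and function names are assumptions for illustration.

```erlang
-module(heap_limit_styles).
-export([spawn_limited_by_parent/2, spawn_self_limited/2]).

%% Style 1: the parent imposes the limit at spawn time, so the child
%% never runs without it (protection against malice).
spawn_limited_by_parent(Fun, Words) ->
    spawn_opt(Fun, [{max_heap_size, limit(Words)}]).

%% Style 2: the child sets its own limit as its first action and could
%% change it again later (protection against accident).
spawn_self_limited(Fun, Words) ->
    spawn(fun() ->
              process_flag(max_heap_size, limit(Words)),
              Fun()
          end).

%% Kill the process when its heap exceeds Words words; do not log.
limit(Words) when is_integer(Words), Words > 0 ->
    #{size => Words, kill => true, error_logger => false}.
```

In both styles a function that builds a list far larger than the limit is terminated at a garbage collection instead of growing until the node runs out of memory.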
+The names for the Items to be set clearly must be identical
+to the names used for the same thing elsewhere.
+The `memory_words` item is introduced because in a world where
+some of my programs run 32-bit and some run 64-bit it is just
+too confusing to count bytes, especially as stack size and so
+on are measured in words, not bytes.
+I considered allowing `total_heap_size` and `stack_size` and so on
+to be given their own limits, but on finding that `total_heap_size`
+"currently includes the stack of the process", decided that it
+was "currently" a bad idea.
+I would have liked to allow setting a limit on the number
+of reductions, so that a process that doesn't intend to
+live forever can ensure its own eventual death. However,
+the documentation warns that `erlang:bump_reductions/1`
+"might be removed", so presumably reduction counting
+_per se_ might well disappear.
+The point of letting the value actually set for a limit
+be smaller is to allow an implementation to use only
+values that will fit in a `size_t`, so that low level
+code does not need to deal with bignums. On a
+POSIX-compliant operating system, an Erlang implementation
+may use the `RLIMIT_DATA` value for the UNIX process
+if the memory limit is bigger, and may use `RLIMIT_DATA`
+divided by the minimum size of a message for the message
+queue length. Or it might use other appropriate system
+limits.
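The clamping rule can be stated as a one-line function. This is a sketch only: the module name `limit_clamp` is hypothetical, and `Cap` stands for whatever largest value the runtime chooses to store (for example the maximum `size_t`, or a value derived from `RLIMIT_DATA`); how that cap is obtained is outside the sketch.

```erlang
-module(limit_clamp).
-export([effective_limit/2]).

%% Return the limit the runtime would actually store for a requested
%% Limit, given the largest storable value Cap.
%% A Limit of 0 means "no limit" and is passed through unchanged.
effective_limit(0, _Cap) ->
    0;
effective_limit(Limit, Cap)
  when is_integer(Limit), Limit > 0, is_integer(Cap), Cap > 0 ->
    %% Requests above the cap, including bignums, are reduced to the cap.
    min(Limit, Cap).
```

With this rule the stored value always fits in a machine word, while a request such as `1 bsl 80` is still accepted rather than rejected.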
+The biggest question is "what happens if a limit is exceeded?"
+For memory, we could exit with `system_limit` as reason, but
+that wouldn't make clear _what_ limit had been exceeded. It
+seems advisable to introduce a new reason, and making the
+reason the same as the name of the limit seems the least
+confusing approach.
+For message queue length, I would prefer it if the process
+that _sends_ the message were the one to get a run time error.
+However, the Erlang documentation guarantees that sending a
+message to a pid always succeeds, whether the process is live
+or dead, and I don't want to change too much. A message queue
+might build up because some process(es) is(are) sending junk;
+we would prefer exiting junk senders to exiting junk receivers.
+However, if the junk receiver isn't cleaning out the junk, that
+junk is _never_ going away, so it's _always_ going to be costing
+time to skip over as well as memory to store. That argument
+applies to another option that I considered: just discarding
+messages that would make the queue too long. The Erlang Way is
+to let subsystems that get into trouble crash. So let it be.
+Backwards Compatibility
+The new function is not exported by default
+so it cannot be called accidentally.
+Reference Implementation
+None in this draft.
+This document has been placed in the public domain.
