
erts: Implement max_heap_size process flag #1032

Merged
merged 70 commits into from
May 12, 2016

Conversation

garazdawi
Contributor

@garazdawi garazdawi commented Apr 26, 2016

The max_heap_size process flag can be used to limit the growth of a process heap by killing it before it becomes too large to handle. It is possible to set the maximum using the erl +hmax option, system_flag(max_heap_size, ...), spawn_opt(Fun, [{max_heap_size, ...}]) and process_flag(max_heap_size, ...).

It is possible to configure the behaviour of the process when the maximum heap size is reached. The process may be sent an untrappable exit signal with reason kill, and/or an error_logger message with details on the process state may be emitted. A new trace event called gc_max_heap_size is also triggered for the garbage_collection trace flag when the heap grows larger than the configured size.

If kill and error_logger are disabled, it is still possible to see that the maximum has been reached by enabling garbage collection tracing on the process.

The heap size is defined as the sum of the heap memory that the process is currently using. This includes all generational heaps, the stack, any messages that are considered to be part of the heap and any extra memory the garbage collector may need during collection.

In the current implementation this means that when a process is set to use the on_heap message queue data mode, the messages in the internal message queue are counted towards this value. For off_heap, only matched messages count towards the size of the heap. For mixed, it depends on race conditions within the VM whether a message is part of the heap or not.
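The flag value can be either a bare size or, to control the actions described above, a map. A minimal sketch in Erlang (the size/kill/error_logger keys mirror the options discussed in this description; the limit of 2000 words is an arbitrary illustration):

```erlang
%% Spawn a process whose heap may not grow beyond 2000 words. When the
%% limit is hit, the process is sent an untrappable kill signal and an
%% error_logger report is emitted (both actions are enabled here).
Pid = spawn_opt(fun() -> receive stop -> ok end end,
                [{max_heap_size, #{size => 2000,
                                   kill => true,
                                   error_logger => true}}]).

%% The limit can also be adjusted from inside a running process:
%% process_flag(max_heap_size, #{size => 4000,
%%                               kill => false,
%%                               error_logger => true}).
```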

Below is an example run of the new behaviour:

Eshell V8.0  (abort with ^G)
1> f(P),P = spawn_opt(fun() -> receive ok -> ok end end, [{max_heap_size, 512}]).
<0.60.0>
2> erlang:trace(P, true, [garbage_collection, procs]).
1
3> [P ! lists:duplicate(M,M) || M <- lists:seq(1,15)],ok.
ok
4>
=ERROR REPORT==== 26-Apr-2016::16:25:10 ===
     Process:          <0.60.0>
     Context:          maximum heap size reached
     Max heap size:    512
     Total heap size:  723
     Kill:             true
     Error Logger:     true
     GC Info:          [{old_heap_block_size,0},
                        {heap_block_size,609},
                        {mbuf_size,145},
                        {recent_size,0},
                        {stack_size,9},
                        {old_heap_size,0},
                        {heap_size,211},
                        {bin_vheap_size,0},
                        {bin_vheap_block_size,46422},
                        {bin_old_vheap_size,0},
                        {bin_old_vheap_block_size,46422}]
flush().
Shell got {trace,<0.60.0>,gc_start,
                 [{old_heap_block_size,0},
                  {heap_block_size,233},
                  {mbuf_size,145},
                  {recent_size,0},
                  {stack_size,9},
                  {old_heap_size,0},
                  {heap_size,211},
                  {bin_vheap_size,0},
                  {bin_vheap_block_size,46422},
                  {bin_old_vheap_size,0},
                  {bin_old_vheap_block_size,46422}]}
Shell got {trace,<0.60.0>,gc_max_heap_size,
                 [{old_heap_block_size,0},
                  {heap_block_size,609},
                  {mbuf_size,145},
                  {recent_size,0},
                  {stack_size,9},
                  {old_heap_size,0},
                  {heap_size,211},
                  {bin_vheap_size,0},
                  {bin_vheap_block_size,46422},
                  {bin_old_vheap_size,0},
                  {bin_old_vheap_block_size,46422}]}
Shell got {trace,<0.60.0>,exit,killed}

Anders Svensson and others added 11 commits March 13, 2016 07:10
To let a callback module decide whether or not to receive another message
from the peer, so that backpressure can be applied when it's
inappropriate. This lets a callback protect against reading more
than can be processed, which is otherwise possible since diameter_tcp
always asks for more.

A callback is made after each message, and can answer to continue
reading or to ask again after a timeout. It's per message instead of
per packet partly for simplicity, but also because this should be
sufficiently fine-grained. Per packet would require some interaction
with the fragment timer that flushes partial messages that haven't been
completely received.
The callback is now applied to the atom 'false' when asking if another
message should be received on the socket, and to a received binary
message after reception. Throttling on received messages makes it
possible to distinguish between requests and answers.

There is no callback on outgoing messages since these don't have to go
through the transport process, even if they currently do.
In addition to returning ok or {timeout, Tmo}, let a throttling callback
for message reception return a pid(), which is then notified if the
message in question is either discarded or results in a request process.
Notification is by way of messages of the form

  {diameter, discard | {request, pid()}}

where the pid is that of a request process resulting from the received
message. This allows the notification process to keep track of the
maximum number of request processes a peer connection can have given
rise to.
This can be used as a simple form of overload protection, discarding the
message before it's passed into diameter to become one more request
process in a flood. However, replying with 3004 (DIAMETER_TOO_BUSY) would
be more appropriate when the request has been directed at a specific
server (the RFC's requirement), and possibly it should be possible for a
callback to do this as well.
As discussed in the parent commit. This is easier said than done in
practice, but there's no harm in allowing it.
TCP packets can contain more than one message, so only ask to receive
another message if it hasn't already been received.
In particular, let a callback decide when to receive the initial
message.
By sending {diameter, {answer, pid()}} when an incoming answer is sent
to the specified pid, instead of a discard message as previously. The
latter now literally means that the message has been discarded.
That is, don't assume that it's only diameter_tcp doing so: allow it to
be received when not throttling. This lets a callback module trigger a
new throttling callback itself, but it's not clear if this will be
useful in practice.
called 'literal_mmap' and 'exec_mmap'.

Also moved existing erts_mmap info from 'mseg_alloc'
to its own system_info({allocator, erts_mmap})

with "allocators" default_mmap, literal_mmap and exec_mmap.
@garazdawi garazdawi added team:VM Assigned to OTP team VM feature labels Apr 26, 2016
@garazdawi garazdawi self-assigned this Apr 26, 2016
@ghost

ghost commented Apr 26, 2016

Sounds useful. Any plans to use this in error_logger, where it would restart in a controlled fashion and then maybe be used in tandem with a watermark, or a more practical strategy?

@garazdawi
Contributor Author

garazdawi commented Apr 26, 2016

@Tuncer Not at the moment. Depending on how useful we find this feature to be, we may expand the configuration to allow a message to be sent to a user-defined process instead of just the error_logger.

<marker id="+hmax"/>
<tag><c><![CDATA[+hmax Size]]></c></tag>
<item>
<p>Sets the default maximum heap size of processes to the size
Contributor


You probably want to have something here saying that +hmax defaults to being disabled (being set to 0). That isn't clear from the other documentation.

Contributor Author


fixed

@okeuday
Contributor

okeuday commented Apr 26, 2016

@garazdawi Thank you for adding this!

@OTP-Maintainer

Patch has passed first testings and has been assigned to be reviewed


I am a script, I am not human


@DeadZen
Contributor

DeadZen commented Apr 27, 2016

@garazdawi Seconded, kudos for adding this!

@cmullaparthi

This is very useful, thank you! All we now need is an option for max_msg_queue_len and it will give us processes with bounded mailbox queues. If queue length is exceeded, the process should get killed and generate a crash report.

@garazdawi
Contributor Author

@cmullaparthi Bounded message queues will most likely never be introduced. It is too expensive to keep track of even in small SMP systems, and it gets much worse as the number of NUMA nodes grows. Some other mechanism should be used to limit the flow of incoming messages, for instance a windowing scheme similar to how {active, N} works for gen_tcp.
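A windowing scheme of the kind mentioned can be built in plain Erlang today. A hypothetical sketch (none of this is an OTP API; the credit/msg message formats are invented for illustration):

```erlang
%% Credit-based flow control between two processes, loosely analogous
%% to gen_tcp's {active, N}: the consumer grants the producer a window
%% of N message credits, and the producer only sends while credits remain.
consumer(Producer, Window) ->
    Producer ! {credit, self(), Window},
    consumer_loop(Producer, Window, Window).

consumer_loop(Producer, 0, Window) ->
    Producer ! {credit, self(), Window},   % window drained: renew it
    consumer_loop(Producer, Window, Window);
consumer_loop(Producer, Left, Window) ->
    receive
        {msg, _Payload} ->
            consumer_loop(Producer, Left - 1, Window)
    end.

producer(Consumer, Items) ->
    receive
        {credit, Consumer, N} ->
            {Now, Later} = split_at_most(N, Items),
            [Consumer ! {msg, I} || I <- Now],
            producer(Consumer, Later)
    end.

%% Take at most N elements from a list.
split_at_most(N, L) when length(L) =< N -> {L, []};
split_at_most(N, L) -> lists:split(N, L).
```

The producer's mailbox stays bounded by the window size regardless of how fast the consumer drains its queue, which is the property a bounded mailbox would otherwise have to enforce globally.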

@cmullaparthi

@garazdawi Isn't the message queue length already tracked? I can do process_info(pid(), message_queue_len) and get back the length of the message queue. I'm assuming BEAM already keeps track of the length here rather than counting it each time the above call is made?

@psyeugenic
Contributor

@cmullaparthi The inner message queue is tracked in the current implementation, and it is only the inner queue you get from process_info(pid(), message_queue_len), not the "total" message queue length. The tracking might change, though. If we find it necessary, or find a more scalable solution for the queues, we might change the tracking and instead calculate the queue length at process_info call time.

I say "total" in quotes because it is a bit vague when we say a message actually arrives at the receiving process. The outer queue(s) might just not be a part of the process. Think about how two processes, on different nodes, communicate with each other.

The thing to realize here is that we have one single queue only conceptually; that is not necessarily the case in the implementation. It could be a whole tree of isolated queues. We don't want to expose even more of those contention points where we believe we have only one queue. It's just bad.

The absolute worst case would be to let the sender examine the total queue length of the receiver and kill it if the queue length exceeds some limit. That would just be insane.

One thing that might be needed is for OTP to help developers with flow control, since that seems to be an issue: some standard way to do a sliding window protocol for processes.

@cmullaparthi

cmullaparthi commented Apr 27, 2016

@psyeugenic Thanks for the explanation. Perhaps there is some confusion here. When I referred to 'max_msg_queue_len', I had in mind a setting specific to a process, not the total number of message queues in the system. So the same way this max_heap_size option has been implemented, one could spawn a new process as:

spawn(Module, Function, Args, [{max_msg_queue_len, 1000}, ...])

And if the number of messages in the queue for this specific process exceeds 1000, it gets killed. As far as I understand, the scheduler obtains a lock on a process' message queue before depositing a message. Presumably, it is at this point that message_queue_len is incremented? Surely any messages which are en route are irrelevant. The check should only be performed at the point of insertion into the queue.

By introducing the max_heap_size option, haven't you indirectly supported this feature? The amount of memory occupied by the message queue is considered to be part of the process heap size? Which means if the process builds up a message queue, its heap size will increase and cause it to combust?

@cmullaparthi

Yes, some flow control for message passing would be great. gen_tcp already has it: I almost always write TCP handling code with {active, once}. Golang provides flow control in channels, which I think is quite powerful. Though it is a simpler use case, because channels are only valid within a single OS process, whereas Erlang's message passing spans nodes, so it is impossible to provide some of those features.

That said, perhaps some of the flow control features can be limited to message-passing between processes on the same node...

Apologies for hijacking this thread. Happy to move this conversation elsewhere; this max_heap_size option just triggered a long-term itch :-)

@psyeugenic
Contributor

@cmullaparthi Nope, I was not talking about a global queue. I was talking about per process message queue(s). Reread my comments above with that mindset. =)

An Erlang process has multiple message queues. In the current implementation it has two queues, an inner and an outer queue. A sending process may touch the outer queue but never the inner queue of the receiver. The receiving process never touches the outer queue except when the inner queue is empty. Scalability: don't touch it. But the internals are very much beside the point. The internals may change. We don't want to expose internals or give guarantees that would kill performance.

This desire for message queue monitors or load shedders keeps coming back. I'm not indifferent to the issue or the desire for a simple solution. I'm telling you there isn't one. 😞

I totally agree, this conversation is not in the scope of this Pull Request.


@garazdawi
Contributor Author

By introducing the max_heap_size option, haven't you indirectly supported this feature? The amount of memory occupied by the message queue is considered to be part of the process heap size? Which means if the process builds up a message queue, its heap size will increase and cause it to combust?

@cmullaparthi No, not all messages are part of the heap. Pre-19, or when using the new on_heap message queue data option in 19.0, messages that are known to be in the queue are part of the heap. So if you don't inspect the queue somehow, either by doing a selective receive that is known to have skipped the message or by calling process_info(Pid, messages|message_queue_len), there is no way to know whether the message is counted as part of the heap.

My example in the PR description was very poorly chosen: it is only because I've written a fun in the shell that the limit gets triggered. If the same code were written in a module, the process would never be killed.
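Since message queue placement affects what counts towards the limit, a process that expects a large queue can combine the limit with off_heap message queue data. A minimal sketch (flag names as discussed in this thread; the limit of 512 words is arbitrary):

```erlang
%% With off_heap message queue data (OTP 19+), messages sitting
%% unreceived in the queue are not part of the process heap, so they
%% do not count towards max_heap_size until matched by a receive.
Pid = spawn_opt(fun() -> receive stop -> ok end end,
                [{message_queue_data, off_heap},
                 {max_heap_size, 512}]).
```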

@priestjim

Great feature @garazdawi 👍 ! I think we'd all love some kind of knob for non-heap binaries as well :)

Ulf Wiger and others added 25 commits May 9, 2016 14:51
new protocol version to handle new schema fields
Should maybe be moved to mnesia.erl and inlined??
Or is it used elsewhere?
Add ext to table/system information
Add add_backend_type
Make ram_copies index always use ordered_set

And use index type as preferred type, not an implementation requirement;
the standard implementation will currently ignore the preferred type.
…orary processes

Tables or data containers should be owned and monitored by mnesia_monitor and
should thus be created by that process.

Always create_table before loading it

We need to create tables for ram_copies at least before loading
them as they are intermittent. It is also needed to get mnesia
monitor as the parent and supervisor of the data storage.
Minimal impact when talking to older nodes.
* dgud/mnesia/ext-backend/PR-858/OTP-13058:
  mnesia_ext: Add basic backend extension tests
  mnesia_ext: reuse snmp field for ext updates
  mnesia_ext: Create table/data containers from mnesia monitor not temporary processes
  mnesia_ext: Implement ext copies index
  mnesia_ext: Load table ext
  mnesia_ext: Dumper and schema changes
  mnesia_ext: Refactor mnesia_schema.erl
  mnesia_ext: Ext support in fragmented tables
  mnesia_ext: Backup handling
  mnesia_ext: Create schema functionality
  mnesia_ext: Add ext copies and db_fold to low level api
  mnesia_ext: Refactor record_validation code
  mnesia_ext: Add create_external and increase protocol version to monitor
  mnesia_ext: Add ext copies to records
  mnesia_ext: Add supervisor and behaviour modules
* anders/diameter/test/OTP-13438:
  Don't assume list comprehension evaluation order
* anders/diameter/overload/OTP-13330:
  Suppress dialyzer warning
  Remove dead case clause
  Let throttling callback send a throttle message
  Acknowledge answers to notification pids when throttling
  Throttle properly with TLS
  Don't ask throttling callback to receive more unless needed
  Let a throttling callback answer a received message
  Let a throttling callback discard a received message
  Let throttling callback return a notification pid
  Make throttling callbacks on message reception
  Add diameter_tcp option throttle_cb
* anders/diameter/info/OTP-13508:
  Add diameter:peer_find/1
  Add diameter:peer_info/1
@garazdawi garazdawi force-pushed the lukas/erts/max_heap_size/OTP-13174 branch from 90d2278 to dc30187 Compare May 10, 2016 08:33

@proxyles proxyles merged commit dc30187 into erlang:master May 12, 2016
@garazdawi garazdawi deleted the lukas/erts/max_heap_size/OTP-13174 branch February 25, 2017 10:35
@isaacsanders

Would a report style error be more appropriate here? Using a format style error seems a little limiting from the perspective of the upstream user.

I don't know much, and would like to understand a bit more.
