erts: Implement max_heap_size process flag #1032
Conversation
To let a callback module decide whether or not to receive another message from the peer, so that backpressure can be applied when it's inappropriate. This lets a callback protect against reading more than can be processed, which is otherwise possible since diameter_tcp always asks for more. A callback is made after each message, and can answer to continue reading or to ask again after a timeout. It's per message instead of per packet partly for simplicity, but also since this should be sufficiently fine-grained. Per packet would require some interaction with the fragment timer that flushes partial messages that haven't been completely received.
The callback is now applied to the atom 'false' when asking if another message should be received on the socket, and to a received binary message after reception. Throttling on received messages makes it possible to distinguish between requests and answers. There is no callback on outgoing messages since these don't have to go through the transport process, even if they currently do.
In addition to returning ok or {timeout, Tmo}, let a throttling callback for message reception return a pid(), which is then notified if the message in question is either discarded or results in a request process. Notification is by way of messages of the form {diameter, discard | {request, pid()}} where the pid is that of a request process resulting from the received message. This allows the notification process to keep track of the maximum number of request processes a peer connection can have given rise to.
This can be used as a simple form of overload protection, discarding the message before it's passed into diameter to become one more request process in a flood. Replying with 3004 would be more appropriate when the request has been directed at a specific server (the RFC's requirement), however, and possibly it should be possible for a callback to do this as well.
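The commits above define the contract for the diameter_tcp throttle_cb option: the callback is applied to the atom false before a read and to a received binary afterwards, and it can return ok, {timeout, Tmo}, or a pid() to be notified of the message's fate. A minimal sketch of such a callback (the module name, overloaded/0 check, and timeout value are hypothetical; only the argument and return shapes come from the commit messages above):

```erlang
-module(my_throttle).
-export([throttle/1]).

%% Applied to 'false' when diameter_tcp asks whether another message
%% should be read from the socket.
throttle(false) ->
    case overloaded() of
        true  -> {timeout, 1000};  %% ask again after a second
        false -> ok                %% go ahead and read
    end;
%% Applied to a received binary message after reception. Returning a
%% pid() asks to be told the message's fate, via messages of the form
%% {diameter, discard | {request, pid()} | {answer, pid()}}.
throttle(Bin) when is_binary(Bin) ->
    self().

%% Hypothetical load check standing in for a real one.
overloaded() ->
    false.
```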
As discussed in the parent commit. This is easier said than done in practice, but there's no harm in allowing it.
TCP packets can contain more than one message, so only ask to receive another message if it hasn't already been received.
In particular, let a callback decide when to receive the initial message.
By sending {diameter, {answer, pid()}} when an incoming answer is sent to the specified pid, instead of a discard message as previously. The latter now literally means that the message has been discarded.
That is, don't assume that it's only diameter_tcp doing so: allow it to be received when not throttling. This lets a callback module trigger a new throttling callback itself, but it's not clear if this will be useful in practice.
called 'literal_mmap' and 'exec_mmap'. Also moved existing erts_mmap info from 'mseg_alloc' to its own system_info({allocator, erts_mmap}) with "allocators" default_mmap, literal_mmap and exec_mmap.
Sounds useful. Any plans to use this in error_logger, where it would restart in a controlled fashion and then maybe be used in tandem with a watermark, or a more practical strategy?
@Tuncer not at the moment. Depending on how useful we find this feature, we may expand the configuration to allow a message to be sent to a user-defined process instead of just the error_logger.
<marker id="+hmax"/>
<tag><c><![CDATA[+hmax Size]]></c></tag>
<item>
<p>Sets the default maximum heap size of processes to the size
You probably want to have something here saying that +hmax defaults to being disabled (being set to 0). That isn't clear from the other documentation.
fixed
@garazdawi Thank you for adding this!
Patch has passed first testings and has been assigned to be reviewed I am a script, I am not human
@garazdawi Seconded, kudos for adding this!
This is very useful, thank you! All we need now is an option for max_msg_queue_len, and it will give us processes with bounded mailbox queues. If the queue length is exceeded, the process should get killed and generate a crash report.
@cmullaparthi bounded message queues will most likely never be introduced. It is too expensive to keep track of even in small SMP systems, and it gets much worse as the number of NUMA nodes grows. Some other mechanism should be used to limit the flow of incoming messages, such as a windowing scheme similar to how {active, N} works for gen_tcp.
@garazdawi Isn't the message queue length already tracked? I can do process_info(pid(), message_queue_len) and get back the length of the message queue. I'm assuming BEAM is already keeping track of the length here and not counting the length each time the above call is made?
@cmullaparthi The inner message queue is tracked in the current implementation, and it is only the inner queue you get from process_info/2. I say "total" in quotes because it is a bit vague when we say a message actually arrives at the receiving process. The outer queue(s) might just not be a part of the process. Think about how two processes, on different nodes, communicate with each other.

The thing to realize here is that only conceptually do we have one single queue; that is not necessarily the case in the implementation. It could be a whole tree of isolated queues. We don't want to expose even more of those contention points where we believe we have only one queue. It's just bad. The absolute worst case would be to let the sender examine the total queue length of the receiver and kill it if the queue length exceeds some limit. That would just be insane.

One thing that might be needed is for OTP to help developers with flow control, since that seems to be an issue. Some standard way to do a Sliding Window Protocol for processes.
@psyeugenic Thanks for the explanation. Perhaps there is some confusion here. When I referred to 'max_msg_queue_len', I had in mind a setting specific to a process, not the total number of message queues in the system. So, in the same way this max_heap_size option has been implemented, one could spawn a new process as:

spawn(Module, Function, Args, [{max_msg_queue_len, 1000}, ...])

And if the number of messages in the queue for this specific process exceeds 1000, it gets killed. As far as I understand, the scheduler obtains a lock on a process's message queue before depositing a message. Presumably it is at this point that it increments message_queue_len? Surely any messages which are en route are irrelevant. The check should only be performed at the point of insertion into the queue.

By introducing the max_heap_size option, haven't you indirectly supported this feature? The amount of memory occupied by the message queue is considered to be part of the process heap size, which means that if the process builds up a message queue, its heap size will increase and cause it to combust?
Yes, some flow control for message passing would be great. gen_tcp already has it - I almost always write TCP handling code with {active, once}. Golang provides flow control in channels, which I think is quite powerful, though it is a simpler use case: channels are only valid within a single process, whereas Erlang's message passing spans nodes, so it is impossible to provide some of those features. That said, perhaps some of the flow control features could be limited to message passing between processes on the same node... Apologies for hijacking this thread. Happy to move this conversation elsewhere - this max_heap_size option just triggered a long-term itch :-)
@cmullaparthi Nope, I was not talking about a global queue. I was talking about per-process message queue(s). Reread my comments above with that mindset. =)

An Erlang process has multiple message queues. In the current implementation it has two, an inner and an outer queue. A sending process may touch the outer queue but never the inner queue of the receiver. The receiving process never touches the outer queue except when the inner queue is empty. Scalability - don't touch it. But the internals are very much beside the point. The internals may change. We don't want to expose internals or give guarantees that would kill performance.

This desire for message queue monitors or load shedders keeps coming back. I'm not indifferent to the issue or the desire for a simple solution. I'm telling you there isn't one. 😞 I totally agree, this conversation is not in the scope of this Pull Request.
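The {active, once} pattern mentioned in this exchange is the standard way to get per-connection flow control from gen_tcp today: the socket delivers exactly one message and then reverts to passive mode until the process opts in again. A minimal sketch (socket setup elided; handle/1 is a hypothetical handler):

```erlang
-module(flow_loop).
-export([loop/1]).

%% Receive loop using {active, once}. We only re-enable delivery
%% after processing the previous message, which gives natural
%% backpressure against a fast sender.
loop(Sock) ->
    ok = inet:setopts(Sock, [{active, once}]),
    receive
        {tcp, Sock, Data} ->
            handle(Data),   %% process before asking for more
            loop(Sock);
        {tcp_closed, Sock} ->
            ok
    end.

%% Hypothetical handler for received data.
handle(_Data) ->
    ok.
```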
@cmullaparthi no, not all messages are part of the heap. Pre 19, or with the new message queue data mode set to off_heap, messages in the queue are not counted.

My example in the PR description was very poorly chosen: it is only because the fun was written in the shell that the limit gets triggered. If the same code were written in a module, the process would never be killed.
Great feature @garazdawi 👍 ! I think we'd all love some kind of knob for non-heap binaries as well :) |
new protocol version to handle new schema fields
Should maybe be moved to mnesia.erl and inlined?? Or is it used elsewhere?
Add ext to table/system information Add add_backend_type
Make ram_copies index always use ordered_set. And use index type as the preferred type, not an implementation requirement; the standard implementation will currently ignore the preferred type.
…orary processes. Tables or data containers should be owned and monitored by mnesia_monitor and should thus be created by that process. Always create_table before loading it: we need to create tables for ram_copies at least before loading them, as they are intermittent. It is also needed to get mnesia_monitor as the parent and supervisor of the data storage.
Minimal impact when talking to older nodes.
* dgud/mnesia/ext-backend/PR-858/OTP-13058:
  mnesia_ext: Add basic backend extension tests
  mnesia_ext: reuse snmp field for ext updates
  mnesia_ext: Create table/data containers from mnesia monitor not temporary processes
  mnesia_ext: Implement ext copies index
  mnesia_ext: Load table ext
  mnesia_ext: Dumper and schema changes
  mnesia_ext: Refactor mnesia_schema.erl
  mnesia_ext: Ext support in fragmented tables
  mnesia_ext: Backup handling
  mnesia_ext: Create schema functionality
  mnesia_ext: Add ext copies and db_fold to low level api
  mnesia_ext: Refactor record_validation code
  mnesia_ext: Add create_external and increase protocol version to monitor
  mnesia_ext: Add ext copies to records
  mnesia_ext: Add supervisor and behaviour modules
* anders/diameter/test/OTP-13438: Don't assume list comprehension evaluation order
* anders/diameter/overload/OTP-13330:
  Suppress dialyzer warning
  Remove dead case clause
  Let throttling callback send a throttle message
  Acknowledge answers to notification pids when throttling
  Throttle properly with TLS
  Don't ask throttling callback to receive more unless needed
  Let a throttling callback answer a received message
  Let a throttling callback discard a received message
  Let throttling callback return a notification pid
  Make throttling callbacks on message reception
  Add diameter_tcp option throttle_cb
* anders/diameter/info/OTP-13508:
  Add diameter:peer_find/1
  Add diameter:peer_info/1
The max_heap_size process flag can be used to limit the growth of a process heap by killing the process before it becomes too large to handle. The maximum can be set using the erl +hmax option, system_flag(max_heap_size, ...), spawn_opt(Fun, [{max_heap_size, ...}]) and process_flag(max_heap_size, ...).

The behaviour of the process when the maximum heap size is reached is configurable: the process may be sent an untrappable exit signal with reason kill, and/or an error_logger message with details on the process state may be emitted. A new trace event called gc_max_heap_size is also triggered for the garbage_collection trace flag when the heap grows larger than the configured size. If kill and error_logger are disabled, it is still possible to see that the maximum has been reached by doing garbage collection tracing on the process.

The heap size is defined as the sum of the heap memory that the process is currently using. This includes all generational heaps, the stack, any messages that are considered to be part of the heap, and any extra memory the garbage collector may need during collection. In the current implementation this means that when a process uses the on_heap message queue data mode, the messages in the internal message queue are counted towards this value. For off_heap, only matched messages count towards the size of the heap. For mixed, it depends on race conditions within the VM whether a message is part of the heap or not.

Below is an example run of the new behaviour:

Eshell V8.0  (abort with ^G)
1> f(P), P = spawn_opt(fun() -> receive ok -> ok end end, [{max_heap_size, 512}]).
<0.60.0>
2> erlang:trace(P, true, [garbage_collection, procs]).
1
3> [P ! lists:duplicate(M,M) || M <- lists:seq(1,15)], ok.
ok
4>
=ERROR REPORT==== 26-Apr-2016::16:25:10 ===
     Process:          <0.60.0>
     Context:          maximum heap size reached
     Max heap size:    512
     Total heap size:  723
     Kill:             true
     Error Logger:     true
     GC Info:          [{old_heap_block_size,0},
                        {heap_block_size,609},
                        {mbuf_size,145},
                        {recent_size,0},
                        {stack_size,9},
                        {old_heap_size,0},
                        {heap_size,211},
                        {bin_vheap_size,0},
                        {bin_vheap_block_size,46422},
                        {bin_old_vheap_size,0},
                        {bin_old_vheap_block_size,46422}]
4> flush().
Shell got {trace,<0.60.0>,gc_start,
           [{old_heap_block_size,0},
            {heap_block_size,233},
            {mbuf_size,145},
            {recent_size,0},
            {stack_size,9},
            {old_heap_size,0},
            {heap_size,211},
            {bin_vheap_size,0},
            {bin_vheap_block_size,46422},
            {bin_old_vheap_size,0},
            {bin_old_vheap_block_size,46422}]}
Shell got {trace,<0.60.0>,gc_max_heap_size,
           [{old_heap_block_size,0},
            {heap_block_size,609},
            {mbuf_size,145},
            {recent_size,0},
            {stack_size,9},
            {old_heap_size,0},
            {heap_size,211},
            {bin_vheap_size,0},
            {bin_vheap_block_size,46422},
            {bin_old_vheap_size,0},
            {bin_old_vheap_block_size,46422}]}
Shell got {trace,<0.60.0>,exit,killed}
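The kill and error_logger behaviours described in the text above can be chosen per process rather than globally. A small sketch, assuming the map form of the option documented for process_flag/2 (size, kill and error_logger keys); treat the exact values as illustrative:

```erlang
%% Log via error_logger when the limit is hit, but do not kill the
%% process. A bare integer instead of a map sets only the size and
%% leaves kill/error_logger at their system defaults.
process_flag(max_heap_size, #{size => 512,
                              kill => false,
                              error_logger => true}).
```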
Would a report style error be more appropriate here? Using a format style error seems a little limiting from the perspective of the upstream user. I don't know much, and would like to understand a bit more.