Optimize signal sending to processes with message_queue_data=off_heap enabled #5020
Erlang guarantees that signals (i.e., message signals and non-message signals) sent from a single process to another process are ordered in send order. However, there are no ordering guarantees for signals sent from different processes to a particular process. Therefore, several processes can send signals in parallel to a specific process without synchronizing with each other. Previously, though, such signal sending was always serialized, because the senders had to acquire the lock for the outer signal queue of the receiving process. This commit makes it possible for several processes to send signals in parallel, without interfering with each other, to a process that has the message_queue_data=off_heap setting (1) activated. This parallel signal sending optimization yields much better scalability for signal sending than was previously possible (2).
(1) Information about how to enable the message_queue_data=off_heap setting can be found in the documentation of the functions erlang:process_flag/2 and erlang:spawn_opt/4.
(2) http://winsh.me/bench/erlang_sig_q/sigq_bench_result.html
Implementation
The parallel signal sending optimization works only for processes with the message_queue_data=off_heap setting enabled. For such processes, the optimization is activated and deactivated on demand, based on heuristics, so that it adds little overhead when it is not needed. The optimization is activated when contention on the lock for the outer message queue is high. It is deactivated when the number of enqueued messages per fetch operation (an operation that moves messages from the outer message queue to the inner one) is low.
When the optimization is active, the outer message queue has an array of signal buffers into which sending processes enqueue signals. When the receiving process needs to fetch messages from the outer message queue, the contents of the non-empty buffers are appended to the outer message queue. Each sending process is assigned a particular slot in the buffer array by hashing its process ID. Because a given sender always uses the same slot, the system preserves the send order between signals coming from the same process.