Implement fragmentation of large distribution messages #2133

merged 13 commits into from Feb 22, 2019



This PR implements fragmentation of large Erlang Distribution signals in order to prevent head-of-line blocking. It introduces two changes to the distribution protocol:

  1. Move the exit reason of EXIT, EXIT2 and MONITOR_P_EXIT to after the control message.
  2. Introduce two new distribution headers that represent:
       • the start of a new sequence of fragments
       • a fragment in a sequence

The new distribution headers look like this:

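Start of a new sequence of fragments (tag 69):

  Bytes:  1    1   8      8       1                      NumberOfAtomCacheRefs/2+1 | 0  N | 0
  Field:  131  69  SeqId  FragNo  NumberOfAtomCacheRefs  Flags                          AtomCacheRefs

A fragment in a sequence (tag 70):

  Bytes:  1    1   8      8
  Field:  131  70  SeqId  FragNo
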
  • The atom cache is the same for the entire sequence.
  • The FragNo starts at the total number of fragments in the sequence and then decrements to 1, i.e. in a sequence of 2 fragments the start header has FragNo set to 2 and the following fragment has FragNo set to 1.
  • The old distribution header is still used for messages that do not need to be fragmented.
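
The header layout and the FragNo countdown described above can be sketched as follows. This is an illustration in Python only; the real implementation lives in the ERTS C code. The sketch assumes big-endian 8-byte SeqId and FragNo fields and an empty atom cache (NumberOfAtomCacheRefs is 0, so Flags and AtomCacheRefs are absent). The `fragment` helper and its `frag_size` parameter are hypothetical names that do not appear in this PR, and the sketch does not model the fallback to the old header for messages that fit in one piece.

```python
import struct

VERSION_MAGIC = 131   # first byte of both new headers
FRAG_START = 69       # header tag: start of a new sequence of fragments
FRAG_CONT = 70        # header tag: a later fragment in a sequence


def fragment(seq_id: int, payload: bytes, frag_size: int) -> list:
    """Split payload into frames whose FragNo counts down to 1.

    Assumptions (not from the PR): big-endian integers, no atom cache
    entries, and a caller-chosen frag_size.
    """
    if frag_size <= 0:
        raise ValueError("frag_size must be positive")
    # Always emit at least one fragment, even for an empty payload.
    chunks = [payload[i:i + frag_size]
              for i in range(0, len(payload), frag_size)] or [b""]
    total = len(chunks)
    frames = []
    for index, chunk in enumerate(chunks):
        frag_no = total - index          # total, total - 1, ..., 1
        if index == 0:
            # Start header: 131, 69, SeqId, FragNo, NumberOfAtomCacheRefs = 0.
            header = struct.pack(">BBQQB", VERSION_MAGIC, FRAG_START,
                                 seq_id, frag_no, 0)
        else:
            # Continuation header: 131, 70, SeqId, FragNo.
            header = struct.pack(">BBQQ", VERSION_MAGIC, FRAG_CONT,
                                 seq_id, frag_no)
        frames.append(header + chunk)
    return frames
```

For a 10-byte payload and `frag_size=4`, this produces three frames: a start frame with FragNo 3, followed by continuation frames with FragNo 2 and 1.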

The following restrictions apply when using message fragmentation:

  • Only the payload of the message may be fragmented. The control sequence may not span several fragments.
  • Only one sequence may be sent by one process at a time.
  • Fragments must arrive in the correct order, i.e. if a sequence consists of 4 fragments, then the fragments have to arrive as 4, 3, 2, 1.
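
A receiver-side view of the ordering rule can be sketched as below, in Python for illustration only. The `Reassembler` class, the `FragmentOrderError` exception and the `is_start` argument are hypothetical; keying open sequences on SeqId is an assumption about how a receiver tells sequences apart, not something stated in this PR.

```python
class FragmentOrderError(Exception):
    """Raised when a fragment violates the ordering rules."""


class Reassembler:
    """Collect fragment payloads per SeqId and return complete payloads."""

    def __init__(self):
        # seq_id -> (next expected FragNo, payload chunks received so far)
        self._open = {}

    def feed(self, seq_id, frag_no, is_start, chunk):
        """Accept one fragment; return the full payload once FragNo is 1."""
        if is_start:
            if seq_id in self._open:
                raise FragmentOrderError(
                    "sequence %d already in progress" % seq_id)
            if frag_no < 1:
                raise FragmentOrderError("FragNo must be at least 1")
            expected, chunks = frag_no, []
        else:
            if seq_id not in self._open:
                raise FragmentOrderError("fragment without a start header")
            expected, chunks = self._open[seq_id]
        if frag_no != expected:
            raise FragmentOrderError(
                "expected FragNo %d, got %d" % (expected, frag_no))
        chunks.append(chunk)
        if frag_no == 1:
            # Last fragment: the sequence is complete.
            self._open.pop(seq_id, None)
            return b"".join(chunks)
        self._open[seq_id] = (frag_no - 1, chunks)
        return None
```

Because state is kept per sequence, fragments of a small message can be fed between the fragments of a large one and complete first, which is the head-of-line-blocking case the PR targets.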

In addition to these changes to the Erlang Distribution protocol, this PR also fixes and optimizes many internal issues.

  • Yield during process exit when sending many exit/down messages.
  • Change the distribution inet driver to run in binary mode and fix dist.c to not copy the payload unnecessarily.
  • Trap when sending distributed exit/1, exit/2 and monitor down messages.

NOTE: The documentation for the new distribution headers is not done yet.

All of the Red-Black Tree _yielding functions have been
updated to work with reductions returned by the called
function instead of yielding on each element.
@garazdawi garazdawi added team:VM Assigned to OTP team VM feature labels Feb 5, 2019
@garazdawi garazdawi self-assigned this Feb 5, 2019

michaelklishin commented Feb 5, 2019

@garazdawi how will this work for mixed version clusters, e.g. when an OTP 22 node is connected to a 21.2 one?

Contributor Author

The feature is only available if both nodes present the distribution flag indicating that they support fragmented messages.
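
The negotiation rule in this reply can be sketched as a simple capability check, in Python for illustration only. The flag name `FLAG_FRAGMENTS` and its bit value are hypothetical placeholders, since this discussion does not state the real flag, and `choose_header` with its `frag_size` threshold is likewise an assumption.

```python
# Hypothetical flag bit; the real name and value are not given in this thread.
FLAG_FRAGMENTS = 1 << 0


def can_fragment(local_flags: int, remote_flags: int) -> bool:
    """Fragmentation is usable only when both nodes advertise the flag."""
    return bool(local_flags & remote_flags & FLAG_FRAGMENTS)


def choose_header(local_flags: int, remote_flags: int,
                  payload_len: int, frag_size: int) -> str:
    """Pick which header family a sender would use for one message."""
    if can_fragment(local_flags, remote_flags) and payload_len > frag_size:
        return "fragmented"
    # Either the peer lacks the flag or the message is small enough.
    return "old"
```

Under this sketch, a connection to a node that does not present the flag always falls back to the old distribution header, regardless of message size.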

Before this change the inet driver was in list mode and
thus the data from it had to be copied when received by
the dist entry. This change puts the TCP port in binary mode
so that any refc binary created by it is used all the way
to the process where it is decoded, thus eliminating one
copy of the entire message payload.
The dist messages EXIT, EXIT2 and MONITOR_DOWN have been
updated with new versions that send the reason term as
part of the payload of the message instead of as part
of the control message.

This allows the reason to be decoded by the receiving
process instead of the dist entry, which in turn makes
it possible for multiple decodes to be done in parallel.

This change is done in order to make it easier to fragment
the potentially large payload of EXIT, EXIT2 and MONITOR_DOWN
into multiple distribution messages.

@garazdawi garazdawi force-pushed the lukas/erts/fragment-dist-messages branch from ccdb9c0 to f4c121b Compare February 22, 2019 08:29
This commit removes the general send context (which was used
very little anyway) and uses only the distributed send context.
This makes the dist API easier to use at the cost of
a little more code for the local send.
The reason in EXIT and DOWN may be arbitrarily large,
so we yield and allow other processes to execute while
encoding and sending the signals over the distribution.
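
The idea of yielding while encoding a large reason can be sketched with a generator, in Python for illustration only. The per-term `budget`, the use of `repr` as a stand-in for term encoding, and the `run` driver are all assumptions; ERTS accounts for work in reductions and traps back to the scheduler rather than using generators.

```python
def encode_in_slices(terms, budget):
    """Encode terms one by one, pausing whenever the slice budget is spent.

    Yields None at each pause so other work can run, then yields the
    encoded bytes as the final value.
    """
    out = bytearray()
    used = 0
    for term in terms:
        out += repr(term).encode()   # stand-in for real term encoding
        used += 1
        if used >= budget:
            used = 0
            yield None               # give other work a chance to run
    yield bytes(out)


def run(gen):
    """Drive the generator to completion and count how often it paused."""
    pauses = 0
    for result in gen:
        if result is None:
            pauses += 1              # a scheduler would run other work here
        else:
            return result, pauses
```

With five terms and a budget of two, the encoder pauses twice before returning the full result, so no single encode step monopolizes the scheduler.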