
Optimisation for 'delegate' #3865

Merged (1 commit) on Dec 26, 2021

Conversation

tomyouyou (Contributor) commented Dec 7, 2021

This is copied from rabbitmq/rabbitmq-common#349

If a message is sent to only one queue (the most common case in practice), passing it through the 'delegate' process is unnecessary: it only increases message latency and the risk of 'delegate' congestion.
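
For context, a minimal Erlang sketch of the idea (hypothetical module and process names, not the actual delegate.erl diff; maps:groups_from_list/2 needs OTP 25+):

    -module(delegate_sketch).
    -export([invoke_no_result/2]).

    %% Hypothetical sketch only. Fun is applied to each destination pid,
    %% mirroring delegate's fun-based API.
    invoke_no_result(Pid, Fun) when is_pid(Pid) ->
        %% Single destination: skip the delegate hop and act on the pid
        %% directly, saving one process hop and one message copy.
        Fun(Pid),
        ok;
    invoke_no_result(Pids, Fun) when is_list(Pids) ->
        %% Fan-out: one message per node to that node's delegate process
        %% (registered name assumed here), which applies Fun to its
        %% local pids.
        ByNode = maps:groups_from_list(fun erlang:node/1, Pids),
        maps:foreach(fun(Node, NodePids) ->
                             {delegate_sketch, Node} ! {invoke, Fun, NodePids}
                     end, ByNode),
        ok.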

Here are some test data:

node1: Pentium(R) Dual-Core CPU E5300 @ 2.60GHz
node2: Pentium(R) Dual-Core CPU E5300 @ 2.60GHz
  • Join node1 and node2 into a cluster. Create 100 queues on node2 and start 100 consumers to receive messages from these queues.
  • Start 100 publishers on node1 to send messages to the queues on node2. Each publisher sends 10k messages at a rate of 100/s (10k/s in total, theoretically), for 1 million messages across all publishers.

Before the change:

{1,[{msg_time,812312(=<1ms),177922(=<5ms),9507(=<50ms),221(=<500ms),38(=<1000ms),0,0,0,0,1061,1069,0,0}]}

After the change:

{1,[{msg_time,902854(=<1ms),93993(=<5ms),3038(=<50ms),96(=<500ms),19(=<1000ms),0,0,0,0,1049,1060,0,0}]}

Additional information:

The time counted here is the time a message spends in the cluster, i.e. Time(leaving node2) - Time(reaching node1).
"812312(=<1ms)" is the number of messages whose time is less than or equal to 1ms.
Overall, the optimisation is effective: the share of messages delivered within 1ms rises from roughly 81% to roughly 90% of the 1 million messages, and every slower bucket shrinks.

lukebakken (Collaborator)

@tomyouyou thank you for the updated pull request. Would you mind sharing the code you use to benchmark these changes?

tomyouyou (Contributor, Author) commented Dec 8, 2021

> @tomyouyou thank you for the updated pull request. Would you mind sharing the code you use to benchmark these changes?

I modified the code as follows: when the reader receives a message, it timestamps it, and the elapsed time is calculated just before the writer sends the message out. This eliminates the impact of network delay between the clients and the broker.
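
A minimal Erlang sketch of that measurement (hypothetical module, not the actual patch; it assumes the two nodes' wall clocks are synchronised, e.g. via NTP):

    -module(msg_time_sketch).
    -export([stamp/1, elapsed_us/1]).

    %% Stamp the message when the reader on node1 accepts it.
    stamp(Msg) when is_map(Msg) ->
        Msg#{arrived_at => os:system_time(microsecond)}.

    %% Compute the elapsed time just before the writer on node2 sends
    %% the message to the consumer.
    elapsed_us(#{arrived_at := T0}) ->
        os:system_time(microsecond) - T0.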

The test commands are:

  1. Create 100 queues.
     python msg_test.py create_queue 10.228.103.136 'durable=false;count=100'
  2. Start a consumer for each queue.
     python msg_test.py consume 10.228.103.137 'auto_ack=false;thread_num=100'
  3. Publish messages.
     python msg_test.py publish 10.228.103.137 'exchange=;count=10000;thread_num=100;msg_send_time=true;rate=100'

The following is the client test code: gh-3865.py.txt

michaelklishin (Member)

The usage() part of the script is not particularly helpful. It looks like a homegrown PerfTest equivalent in Python.
I could run it with

    python3 benchmark.py publish localhost queue_name=server-3865

but it publishes just one message. While the script produces a nice histogram of the latency distribution, we would much rather compare using PerfTest.

lukebakken (Collaborator)

I'll see if I can figure out an equivalent PerfTest scenario.
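
For reference, a roughly equivalent PerfTest invocation might look like this (hypothetical host and queue names; the flags are worth double-checking against the PerfTest documentation):

    bin/runjava com.rabbitmq.perf.PerfTest -h amqp://node1 -x 100 -y 100 -r 100 -C 10000 --queue-pattern 'perf-test-%d' --queue-pattern-from 1 --queue-pattern-to 100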

michaelklishin (Member)

We have potentially identified subtle regressions that this optimization introduced. It is now being considered for reversion.

kjnilsson added a commit that referenced this pull request Oct 17, 2022
Done in PR #3865.

The changes in #3865 can cause message arrival ordering guarantees to
be lost, as a message sent to a single destination can overtake a prior
message between the same two processes that was part of a fan-out.
kjnilsson added a commit that referenced this pull request Oct 17, 2022
That was done in PR #3865.

The changes introduced in #3865 can cause message arrival ordering guarantees
between two logical Erlang processes (sending messages via delegate) to
be violated, as a message sent to a single destination can overtake a prior
message sent as part of a fan-out. This is because the fan-out takes a
different route, via the delegate process, than the direct delivery that
bypasses it.

This commit only reverts the optimisation for the `invoke_no_result/2|3` API
and leaves it in place for the synchronous `invoke/` API. This means the
message send ordering you expect between Erlang processes can still be
violated when mixing invoke and invoke_no_result invocations. As far as I can
see there are no places where the code relies on this, and there are uses of
invoke (mgmt db) that could well benefit from avoiding the additional copying.
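
To make the hazard concrete, here is a tiny self-contained Erlang demonstration (hypothetical; a sleep stands in for delegate queueing delay):

    -module(reorder_demo).
    -export([run/0]).

    %% M1 is relayed via an intermediary ("delegate") process while M2 is
    %% sent directly, so the receiver can observe M2 before M1 even though
    %% the sender sent M1 first. Erlang only guarantees ordering per
    %% sender/receiver pair, and the two messages use different pairs.
    run() ->
        Receiver = self(),
        Delegate = spawn(fun() ->
                             receive {relay, To, Msg} ->
                                 timer:sleep(10),  %% simulated delegate delay
                                 To ! Msg
                             end
                         end),
        Delegate ! {relay, Receiver, m1},  %% fan-out path: via the delegate
        Receiver ! m2,                     %% direct path: bypasses it
        First  = receive A -> A end,
        Second = receive B -> B end,
        {First, Second}.  %% typically {m2, m1}: m2 overtook m1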
mergify bot pushed commits that referenced this pull request Oct 17, 2022, with the same message
(cherry picked from commit 8804434)
(cherry picked from commit 6c613b8)