@ansd ansd commented Jun 2, 2025

Test case `tcp_back_pressure_rabbitmq_internal_flow_quorum_queue` succeeds
consistently locally on macOS and fails consistently in CI since 30 May
2025.

CI also shows failures of `tcp_back_pressure_rabbitmq_internal_flow_classic_queue`, albeit much more rarely.

This test case succeeds in CI when using ubuntu-22.04 but fails with ubuntu-24.04.
Even before 30 May 2025, ubuntu-24.04 was used. However, the GitHub runner image
version was updated from 20250511.1.0 to 20250527.1.0, which presumably
started causing this test to fail.
This hypothesis cannot be validated because the GitHub Actions workflow
YAML file doesn't provide a means to pin this version.

File `images/ubuntu/Ubuntu2404-Readme.md` in actions/runner-images@ubuntu24/20250511.1...ubuntu24/20250527.1 shows the diff.
The most notable changes are probably the kernel version change from 6.11.0-1013-azure to 6.11.0-1015-azure and some changes to file `images/ubuntu/scripts/build/configure-environment.sh`.

There seem to be no RabbitMQ-related changes causing this test to fail,
because the test also fails with an older RabbitMQ version on the new runner
version 20250527.1.0.

Neither `meck` nor `inet:setopts(Socket, [{active, once}])` causes the
test failure, because the test also fails with the former approach of
`erlang:suspend_process/1` and `erlang:resume_process/1`.
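
For context, here is a minimal sketch of the two ways the CT case can stall the client; the module name, function names, and the 30-second stall duration are illustrative assumptions, not the suite's actual code:

```erlang
-module(backpressure_sketch).
-export([pause_via_suspend/1, pause_via_setopts/1]).

%% Former approach: suspend the client's connection reader process so it
%% stops pulling data off the socket (debug-only BIFs).
pause_via_suspend(ReaderPid) ->
    true = erlang:suspend_process(ReaderPid),
    timer:sleep(30000),
    true = erlang:resume_process(ReaderPid).

%% Current approach: take the socket out of active mode so no further
%% reads happen and the client's TCP receive window fills up, then
%% resume with {active, once}.
pause_via_setopts(Socket) ->
    ok = inet:setopts(Socket, [{active, false}]),
    timer:sleep(30000),
    ok = inet:setopts(Socket, [{active, once}]).
```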

The test fails due to the following timeout in the writer process on the
server:
```
** Last message in was {'$gen_cast',
                           {send_command,<0.760.0>,0,
                               {'v1_0.transfer',
                                   {uint,3},
                                   {uint,2211},
                                   {binary,<<0,0,8,162>>},
                                   {uint,0},
                                   true,undefined,undefined,undefined,
                                   undefined,undefined,undefined},
                               <<"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx">>}}
** When Server state == #{pending => 3510,socket => #Port<0.49>,
                          reader => <0.755.0>,
                          monitored_sessions => [<0.760.0>],
                          pending_size => 3510}
** Reason for termination ==
** {{writer,send_failed,timeout},
    [{rabbit_amqp_writer,flush,1,
                         [{file,"src/rabbit_amqp_writer.erl"},{line,250}]},
     {rabbit_amqp_writer,handle_cast,2,
                         [{file,"src/rabbit_amqp_writer.erl"},{line,106}]},
     {gen_server,try_handle_cast,3,[{file,"gen_server.erl"},{line,2371}]},
     {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,2433}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,329}]}]}
```

Even after the CT test case resumes consumption, the server still times
out writing to the socket. The Wireshark capture validates that there is
no logic error on the server: the TCP window of the receiving client stays
full for 30 seconds, the client doesn't receive fast enough, and hence the
server times out writing to the socket and closes the connection.
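
Schematically, this is ordinary `gen_tcp` send-timeout behaviour. The sketch below illustrates that mechanism under an assumed `{send_timeout, 30000}` socket option; it is not the actual `rabbit_amqp_writer` code:

```erlang
%% Illustration only: how a full peer TCP window turns into
%% {writer,send_failed,timeout}. Assumes the socket was opened with
%% {send_timeout, 30000} (an assumption, not verified against the
%% server's configuration).
flush_sketch(Socket, PendingIoData) ->
    case gen_tcp:send(Socket, PendingIoData) of
        ok ->
            ok;
        {error, Reason} ->
            %% With send_timeout set, Reason is 'timeout' when the data
            %% cannot be handed off within the configured 30 seconds.
            exit({writer, send_failed, Reason})
    end.
```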

The most important test expectation kept in place is that the server
won't send all the messages if the client can't receive them fast
enough.
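
A hypothetical shape of that remaining assertion (`NumReceived` and `NumSent` are assumed bindings for illustration, not the suite's actual variables):

```erlang
-include_lib("stdlib/include/assert.hrl").

%% The server must not have delivered the full message set while the
%% client was stalled.
assert_backpressure(NumReceived, NumSent) ->
    ?assert(NumReceived < NumSent).
```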

@ansd ansd force-pushed the backpressure-flake branch from 39f7d26 to d09523c on June 2, 2025 17:34
@ansd ansd force-pushed the backpressure-flake branch from d09523c to 0c391a5 on June 2, 2025 17:35
@ansd ansd marked this pull request as ready for review June 2, 2025 18:30
@michaelklishin michaelklishin added this to the 4.2.0 milestone Jun 2, 2025
@michaelklishin michaelklishin merged commit 797e543 into main Jun 2, 2025
557 of 560 checks passed
@michaelklishin michaelklishin deleted the backpressure-flake branch June 2, 2025 19:00
michaelklishin added a commit that referenced this pull request Jun 2, 2025
Remove AMQP backpressure test expectation (backport #14007)