@mergify mergify bot commented Jun 2, 2025

This is an automatic backport of pull request #14007 done by Mergify.

Test case `tcp_back_pressure_rabbitmq_internal_flow_quorum_queue` succeeds
consistently locally on macOS but has been failing consistently in CI since
30 May 2025.

CI also showed a failure of `tcp_back_pressure_rabbitmq_internal_flow_classic_queue`, although that test case fails much more rarely.

This test case succeeds in CI when using ubuntu-22.04 but fails with ubuntu-24.04.
ubuntu-24.04 was already in use before 30 May 2025. However, the GitHub runner
image version was updated from `20250511.1.0` to `20250527.1.0`, which
presumably caused this test to start failing.
This hypothesis cannot be verified because the GitHub Actions workflow
definition YAML file doesn't provide a means to configure this version.

The diff of `images/ubuntu/Ubuntu2404-Readme.md` in actions/runner-images@ubuntu24/20250511.1...ubuntu24/20250527.1 lists the changes.
The most notable ones are probably the kernel version change from `6.11.0-1013-azure` to `6.11.0-1015-azure` and some changes to `images/ubuntu/scripts/build/configure-environment.sh`.

There seem to be no RabbitMQ-related changes causing this test to fail,
because this test also fails with an older RabbitMQ version on the new runner
version `20250527.1.0`.

Neither `meck` nor `inet:setopts(Socket, [{active, once}])` causes the
test failure, because the test also failed with the previous approach of using
`erlang:suspend_process/1` and `erlang:resume_process/1`.

The test fails due to the following timeout in the writer proc on the
server:
```
** Last message in was {'$gen_cast',
                           {send_command,<0.760.0>,0,
                               {'v1_0.transfer',
                                   {uint,3},
                                   {uint,2211},
                                   {binary,<<0,0,8,162>>},
                                   {uint,0},
                                   true,undefined,undefined,undefined,
                                   undefined,undefined,undefined},
                               <<"xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx">>}}
** When Server state == #{pending => 3510,socket => #Port<0.49>,
                          reader => <0.755.0>,
                          monitored_sessions => [<0.760.0>],
                          pending_size => 3510}
** Reason for termination ==
** {{writer,send_failed,timeout},
    [{rabbit_amqp_writer,flush,1,
                         [{file,"src/rabbit_amqp_writer.erl"},{line,250}]},
     {rabbit_amqp_writer,handle_cast,2,
                         [{file,"src/rabbit_amqp_writer.erl"},{line,106}]},
     {gen_server,try_handle_cast,3,[{file,"gen_server.erl"},{line,2371}]},
     {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,2433}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,329}]}]}
```

For unknown reasons, even after the CT test case resumes consumption,
the server still times out writing to the socket.

The most important test expectation, which this change keeps in place, is
that the server won't send all the messages if the client can't receive them
fast enough.
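The expectation above relies on TCP back pressure. As an illustration only, here is a minimal sketch of that mechanism in a hypothetical module `tcp_back_pressure_sketch` (not RabbitMQ code), assuming the default `inet` backend of `gen_tcp`: a peer that never reads lets the kernel socket buffers fill up, after which `gen_tcp:send/2` blocks and, with the `send_timeout` option set, returns `{error, timeout}`. That is the same kind of error that surfaces as `{writer,send_failed,timeout}` in the log above. The chunk size, timeout value, and byte cap are illustrative assumptions.

```erlang
%% Hypothetical sketch module; not part of RabbitMQ.
-module(tcp_back_pressure_sketch).
-export([run/0]).

-define(CHUNK_SIZE, 65536).              %% bytes per gen_tcp:send/2 call (assumption)
-define(SEND_TIMEOUT_MS, 1000).          %% how long a blocked send may wait (assumption)
-define(MAX_BYTES, 1024 * 1024 * 1024).  %% safety cap so the loop always terminates

%% Returns {BytesAccepted, Outcome}; Outcome is {error, timeout} once the
%% non-reading peer has caused all buffers on the send path to fill up.
run() ->
    {ok, LSock} = gen_tcp:listen(0, [binary, {active, false}]),
    {ok, Port} = inet:port(LSock),
    %% The "client" connects but never calls gen_tcp:recv/2, like a
    %% consumer whose socket reads have been paused.
    {ok, Client} = gen_tcp:connect({127, 0, 0, 1}, Port, [binary, {active, false}]),
    {ok, Server} = gen_tcp:accept(LSock),
    %% Make a blocked send return {error, timeout} instead of waiting forever.
    ok = inet:setopts(Server, [{send_timeout, ?SEND_TIMEOUT_MS}]),
    Chunk = binary:copy(<<"x">>, ?CHUNK_SIZE),
    Result = send_until_blocked(Server, Chunk, 0),
    %% Discard queued data on close rather than waiting for a reader that never comes.
    ok = inet:setopts(Server, [{linger, {true, 0}}]),
    ok = gen_tcp:close(Server),
    ok = gen_tcp:close(Client),
    ok = gen_tcp:close(LSock),
    Result.

send_until_blocked(_Sock, _Chunk, Sent) when Sent >= ?MAX_BYTES ->
    {Sent, not_blocked};
send_until_blocked(Sock, Chunk, Sent) ->
    case gen_tcp:send(Sock, Chunk) of
        ok ->
            send_until_blocked(Sock, Chunk, Sent + byte_size(Chunk));
        {error, Reason} ->
            %% With send_timeout set, a full send path yields {error, timeout}.
            {Sent, {error, Reason}}
    end.
```

Calling `tcp_back_pressure_sketch:run()` from an Erlang shell should return `{Sent, {error, timeout}}` after roughly one second, where `Sent` depends on the kernel's socket buffer sizes. Setting `{linger, {true, 0}}` before closing is a deliberate choice: without it, `gen_tcp:close/1` may wait for queued output that the peer will never read.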

(cherry picked from commit 0c391a5)
@mergify mergify bot assigned ansd Jun 2, 2025
@michaelklishin michaelklishin added this to the 4.1.1 milestone Jun 2, 2025
@michaelklishin michaelklishin merged commit a396533 into v4.1.x Jun 2, 2025
537 of 540 checks passed
@michaelklishin michaelklishin deleted the mergify/bp/v4.1.x/pr-14007 branch June 2, 2025 19:48