RabbitMQ is not able to recover/start when its not gracefully shutdown #261

Closed
lokesh411 opened this issue Feb 5, 2024 · 4 comments

Comments

@lokesh411

Describe the bug

RabbitMQ is unable to start after it has been shut down ungracefully, for example when it is killed for some reason (e.g. OOMKilled). When this plugin is disabled, RabbitMQ is able to start.
Here are the logs that we are getting:

Running boot step database_sync defined by app rabbit
2024-02-05 13:11:47.492139+00:00 [info] <0.221.0> Running boot step feature_flags defined by app rabbit
2024-02-05 13:11:47.492470+00:00 [info] <0.221.0> Running boot step codec_correctness_check defined by app rabbit
2024-02-05 13:11:47.492535+00:00 [info] <0.221.0> Running boot step external_infrastructure defined by app rabbit
2024-02-05 13:11:47.492580+00:00 [info] <0.221.0> Running boot step rabbit_delayed_message defined by app rabbitmq_delayed_message_exchange
2024-02-05 13:11:47.492914+00:00 [info] <0.221.0> Waiting for Mnesia tables for 30000 ms, 0 retries left
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0> Error in process <0.271.0> on node 'rabbit@rabbitmq-inbox-0' with exit value:
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0> {badarg,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>     [{ets,insert,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>          ['rabbit_delayed_messagerabbit@rabbitmq-inbox-0',
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>           [{delay_entry,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                {delay_key,1707152400070,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                    {exchange,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                        {resource,<<"/">>,exchange,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                            <<"direct-delay-agent-exchange">>},
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                        'x-delayed-message',true,false,false,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                        [{<<"x-delayed-type">>,longstr,<<"direct">>}],
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                        undefined,undefined,undefined,
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                        {[],[]},
2024-02-05 13:12:17.506034+00:00 [error] <0.271.0>                        #{user => <<"guest">>}}},

Erlang version: 24.3.4.2
RabbitMQ version: 3.10.6

Reproduction steps

  1. Make RabbitMQ crash with SIGKILL
  2. Start it again using rabbitmq-server
  3. It should crash again on startup
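
For anyone trying to reproduce this, the steps above could be scripted roughly as follows. This is only a sketch: the helper names `kill_ungracefully` and `start_node` are made up for illustration, the node's OS PID has to be supplied by you, and the default start command assumes `rabbitmq-server` is on the `PATH`, which may not match your installation.

```python
import os
import signal
import subprocess


def kill_ungracefully(pid: int) -> None:
    """Send SIGKILL so the process gets no chance to shut down cleanly.

    `pid` is the OS process ID of the running node (hypothetical input,
    to be looked up by whoever runs this).
    """
    os.kill(pid, signal.SIGKILL)


def start_node(command=("rabbitmq-server",)) -> subprocess.Popen:
    """Start the node again.

    The default command is an assumption about the installation and
    can be overridden.
    """
    return subprocess.Popen(list(command))
```

Calling `kill_ungracefully(node_pid)` followed by `start_node()` corresponds to steps 1 and 2. Whether the node then fails as in step 3 has to be checked in its log.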

Expected behavior

RabbitMQ should be able to recover after a crash, as it does when this plugin is not installed.

Additional context

No response

@lokesh411 lokesh411 added the bug label Feb 5, 2024
@michaelklishin michaelklishin removed the bug label Feb 5, 2024
@michaelklishin
Member

RabbitMQ 3.10 has reached end of life.

This plugin is very unlikely to receive any attention from the core team outside of #253 (#229).

But one thing that immediately stands out from this exception is

2024-02-05 13:11:47.492914+00:00 [info] <0.221.0> Waiting for Mnesia tables for 30000 ms, 0 retries left

which immediately makes me wonder if this node actually has booted by the time the plugin tried to use its schema tables. If not, then this plugin has very few options as to what it could do to avoid this exception.
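
As a quick check on the timing, the gap between the "Waiting for Mnesia tables" line and the first error line in the log above can be computed. This is a minimal sketch: the two timestamps and the 30000 ms figure are copied from the quoted log, while the function and constant names are illustrative only.

```python
from datetime import datetime

# Timestamps copied from the log lines quoted in this issue.
WAIT_STARTED = "2024-02-05 13:11:47.492914+00:00"  # "Waiting for Mnesia tables for 30000 ms, 0 retries left"
FIRST_ERROR = "2024-02-05 13:12:17.506034+00:00"   # first [error] line from <0.271.0>
WAIT_TIMEOUT_MS = 30000                            # timeout named in the log line


def elapsed_ms(start: str, end: str) -> float:
    """Return the time between two ISO 8601 timestamps in milliseconds."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() * 1000


gap_ms = elapsed_ms(WAIT_STARTED, FIRST_ERROR)
print(f"{gap_ms:.2f} ms between the wait message and the error")
print(f"{gap_ms - WAIT_TIMEOUT_MS:.2f} ms past the {WAIT_TIMEOUT_MS} ms wait")
```

The error is logged about 13 ms after the 30000 ms wait would have run out, which fits the suspicion that the tables were not available when the plugin tried to use them. It does not prove it.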

@lokesh411
Author

It seems the node has not booted. Since we are running it in standalone mode, it wouldn't be able to contact other nodes either, would it?
Is this problem solved in RabbitMQ v3.12?

@michaelklishin
Member

I'm afraid I don't know what this "standalone mode" is. Nodes contact their peers fairly early in the process, before plugins are enabled:

We do not guess in this community, so that's as much as I can say from a few log messages.

@michaelklishin
Member

@lokesh411 we don't understand what the problem is, so I'm not going to tell you if it's been solved or not. We do not guess in this community. 3.12.x is the only version with active community support.

Nothing in 3.12 has changed around how nodes form clusters or contact their peers on restart. Definitely nothing fundamental. The only change related to plugin activation that I recall has moved this step to the latest possible moment, right before definition import (the final step). If anything, this makes it less likely that a plugin that declares its own tables, like this one, would try to do so before all tables were synced.

Anyhow, these are just guesses upon guesses based on a few log lines.
