rabbitmqctl: "wait" could wait forever if rabbitmq-server fails to create a pidfile #463

Closed
dumbbell opened this issue Dec 2, 2015 · 12 comments
Comments

@dumbbell
Member

dumbbell commented Dec 2, 2015

The current implementation of rabbitmqctl wait can't distinguish a node that has not started yet from a node that failed to create its pidfile. In both cases, rabbitmqctl wait will spin forever. In the latter case (e.g. when used in an init script), one would expect it to return with a non-zero exit status so that the init script can report the startup failure.

@cesarmun

cesarmun commented Dec 3, 2015

Is there an approximate timeline to fix this?

@dumbbell
Member Author

dumbbell commented Dec 3, 2015

Not yet. We have more critical fixes to do first and we are busy with the 3.6.0 release at the moment.

@jmoney

jmoney commented Dec 22, 2015

Could this make it into the 3.6.0 release? I've hit this quite a bit and it's rather annoying to have to Ctrl-C and kill the process.

@dumbbell
Member Author

Hi @jmoney8080!

No, unfortunately, it won't make it into 3.6.0. It requires non-trivial changes and we are too far into the release cycle.

@michaelklishin
Member

It possibly can make it into 3.6.x, we will see how intrusive this ends up being.

@jmoney

jmoney commented Dec 22, 2015

Thanks! If you need an external source to test an RC with it, I can help. This issue causes a lot of annoyance when our automation runs (it doesn't happen every time, but when it does, it's annoying to fix).

@michaelklishin
Member

@jmoney8080 we'll keep you posted ;)

michaelklishin assigned Dzol and unassigned dumbbell Jan 5, 2016
@Dzol

Dzol commented Apr 21, 2016

To distinguish a node which hasn't completed start-up yet from a node which failed to create its PID file, we could consider the node to have failed after a number of retries or a timeout.

Specifying a timeout on the command line would be a friendly way to give the user control.

But the danger is that the broker may complete start-up after we consider it to have failed.

@Dzol

Dzol commented Apr 22, 2016

@dumbbell & @michaelklishin: To clarify what I meant above: if the script doesn't get a response from the server within a given time saying that it has started successfully, then the script could exit with status 1; otherwise it exits with 0.
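A minimal sketch of the timeout-based wait described in this comment, in Python. The `is_started` callable is a hypothetical stand-in for whatever query confirms that the node has booted (it is not a RabbitMQ API); the function only maps the outcome onto the proposed exit statuses: 0 on success, 1 when the deadline passes first.

```python
import time
from typing import Callable


def wait_until_started(is_started: Callable[[], bool],
                       timeout: float,
                       poll_interval: float = 0.5) -> int:
    """Poll `is_started` until it returns True or `timeout` seconds elapse.

    Returns an exit status: 0 if the node reported that it started,
    1 if the deadline passed first. A 1 does not prove the broker failed:
    it may still finish booting after we give up.
    """
    # monotonic() is immune to wall-clock adjustments during the wait.
    deadline = time.monotonic() + timeout
    while True:
        if is_started():
            return 0
        if time.monotonic() >= deadline:
            return 1
        time.sleep(poll_interval)
```

A caller would typically pass the result straight to `sys.exit`. Given the caveat about late start-up, a 1 here means "not confirmed within the deadline" rather than "definitely failed".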

@dumbbell
Member Author

Just a random thought while working on something unrelated: we could start an ephemeral Erlang node before starting RabbitMQ itself and use net_kernel:monitor_nodes/{1,2} to detect if the RabbitMQ node started but failed to boot.
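The underlying idea, watching the thing you launched instead of only waiting for a file, can be illustrated outside Erlang. The Python sketch below swaps in plain OS process monitoring (`subprocess.Popen.poll`) for the `net_kernel:monitor_nodes` approach, so it is an analogue, not the proposed design: while waiting for the pidfile, it also checks whether the launched process has died and returns immediately instead of spinning. The function name, command, pidfile path, and the 0/1/2 exit convention are all hypothetical.

```python
import os
import subprocess
import time
from typing import Sequence


def start_and_wait(cmd: Sequence[str], pidfile: str, timeout: float,
                   poll_interval: float = 0.1) -> int:
    """Launch `cmd` and wait for it to create `pidfile`.

    Exit statuses (hypothetical convention):
      0: the pidfile appeared
      1: the process exited without creating the pidfile (failed to boot)
      2: the deadline passed while the process was still running
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(pidfile):
            return 0
        if proc.poll() is not None:
            # The process is gone. Re-check once in case it wrote the
            # pidfile just before exiting, then report a boot failure.
            return 0 if os.path.exists(pidfile) else 1
        if time.monotonic() >= deadline:
            # Still running but not ready; the caller decides what to do.
            return 2
        time.sleep(poll_interval)
```

The distinction that matters for the issue is status 1: a process that died can be reported at once, while status 2 still carries the "may finish booting later" ambiguity raised earlier in the thread.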

michaelklishin assigned dumbbell and unassigned Dzol May 13, 2016
@michaelklishin
Member

We will investigate if there are reasonably safe ways to fix this in 3.6.x.

binarin pushed a commit to binarin/nixpkgs that referenced this issue Jan 25, 2018
- Use socket-activated epmd - that way there won't be any trouble when
  more than one erlang system is used within a single host.
- Use new automation-friendly configuration file format
- Use systemd notifications instead of buggy 'rabbitmqctl wait' for
  confirming successful server startup.
  'wait' bug: rabbitmq/rabbitmq-server#463
- Use 'rabbitmqctl shutdown' instead of 'stop', because it's not
  pid-file based
- Use sane systemd unit defaults from RabbitMQ repo:
  https://github.com/rabbitmq/rabbitmq-server/blob/master/docs/rabbitmq-server.service.example
- Support for external plugins
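For context on the systemd-notification approach: with `Type=notify`, systemd treats a service as started only once it sends `READY=1` as a datagram to the Unix socket named in the `NOTIFY_SOCKET` environment variable, and a main process that exits first is reported as a failed start. The commit message does not say how RabbitMQ sends this message; the Python sketch below only illustrates the protocol from the service's side, and the function name is hypothetical.

```python
import os
import socket


def notify_ready(state: str = "READY=1") -> bool:
    """Send a readiness message to systemd's notification socket.

    Returns False when NOTIFY_SOCKET is unset (not running under a
    Type=notify unit), True once the datagram has been sent.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # Linux abstract-namespace socket: '@' stands for a leading NUL byte.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state.encode(), addr)
    return True
```

Because systemd itself observes both the readiness message and the process exit, the "did it start or did it die?" question from this issue never reaches a pidfile poller.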
binarin pushed a commit to binarin/nixpkgs that referenced this issue Feb 13, 2018
binarin added a commit to binarin/nixpkgs that referenced this issue Jul 20, 2018
binarin pushed a commit to binarin/nixpkgs that referenced this issue Sep 24, 2018
binarin added a commit to binarin/nixpkgs that referenced this issue Sep 24, 2018
Profpatsch pushed a commit to NixOS/nixpkgs that referenced this issue Sep 25, 2018
@lourot

lourot commented Aug 13, 2021

IMHO this is fixed by 69454027 and can be closed, thanks!

EDIT: actually it seems like rabbitmqctl wait --timeout ... has been there for even longer
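As an illustration of calling that option from automation, here is a hedged Python wrapper. The helper names and the pidfile path in the check are hypothetical; it assumes `--timeout` takes a value in seconds and accepts the pidfile as a trailing argument (confirm both against `rabbitmqctl help` for your version), and it adds a local subprocess timeout as a second guard.

```python
import subprocess
from typing import List


def rabbitmqctl_wait_cmd(pidfile: str, timeout_s: int) -> List[str]:
    # `--timeout` is the option mentioned in this thread; seconds is an
    # assumed unit, and the argument order is an assumption as well.
    return ["rabbitmqctl", "wait", "--timeout", str(timeout_s), pidfile]


def wait_for_broker(pidfile: str, timeout_s: int, grace_s: int = 10) -> int:
    """Run `rabbitmqctl wait` and return its exit status (non-zero on failure)."""
    try:
        # Independent guard: subprocess.run kills the child if it outlives
        # timeout_s + grace_s, so a hung CLI cannot block the caller forever.
        result = subprocess.run(rabbitmqctl_wait_cmd(pidfile, timeout_s),
                                timeout=timeout_s + grace_s)
    except subprocess.TimeoutExpired:
        return 1
    return result.returncode
```

The grace period keeps the outer guard from firing before the CLI's own timeout has had a chance to produce a proper exit status.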

6 participants