Make shutdown command require the node to be local (unless --no-wait is provided) #309

hairyhum · 2019-02-01T16:50:22Z

Shutdown waits for the node to stop based on its OS pid.
If the node is not local, the pid file won't be local either and will never exist.
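
To illustrate why waiting on an OS pid only makes sense for a local node, here is a minimal sketch of pid-based waiting. It is not the rabbitmq-cli implementation; the function names `pid_is_alive` and `wait_for_exit` are hypothetical, and a POSIX host is assumed (signal 0 has a different meaning on Windows).

```python
import os
import time


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid exists on the local host (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid: int, timeout: float = 5.0, interval: float = 0.05) -> bool:
    """Poll until the local process is gone; return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not pid_is_alive(pid):
            return True
        time.sleep(interval)
    return not pid_is_alive(pid)
```

A pid reported by a node on another host refers to that host's process table, so a local check like this would either find nothing or find an unrelated process.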

[#142699795]

It's not that easy to autotest this feature, because it requires a node on a different host.

michaelklishin · 2019-02-03T15:39:08Z

I've moved the check to a validator.

michaelklishin · 2019-02-03T15:41:37Z

Let's double-check whether the BOSH release may be using this in a non-local scenario under any circumstances. Technically it does work (the node will stop), though not as expected: the command won't wait for shutdown since there will never be a local pid file.

michaelklishin · 2019-02-03T15:47:56Z

@gerhard FYI.

michaelklishin · 2019-02-03T17:42:59Z

I introduced a new flag, --[no-]wait, that lets the user shut down remote nodes without waiting for a confirmed termination if she opts in.
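
The flag semantics described above can be sketched roughly as follows. This is an illustration in Python, not the actual validator from this PR: `build_parser`, `validate`, the program name, and the error text are all hypothetical, and `argparse.BooleanOptionalAction` requires Python 3.9 or later.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="shutdown-sketch")
    # BooleanOptionalAction generates both --wait and --no-wait; waiting is the default.
    parser.add_argument("--wait", action=argparse.BooleanOptionalAction, default=True)
    parser.add_argument("--timeout", type=float, default=None)
    return parser


def validate(wait: bool, target_is_local: bool) -> tuple[bool, str]:
    """Reject the combination that can never succeed: waiting on a remote node."""
    if wait and not target_is_local:
        # Hypothetical message, not the one rabbitmqctl prints.
        return False, "target node is not local; pass --no-wait to skip waiting"
    return True, ""
```

With this shape, `--no-wait` is an explicit opt-in: a remote shutdown is still possible, but the caller acknowledges that termination will not be verified.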

Add --[no-]wait (enabled by default) for those who would prefer to use it to shut down remote nodes even though it wouldn't wait for a verified node termination. Propagate --timeout to calls. References #309. (cherry picked from commit 6d73256)

michaelklishin · 2019-02-03T18:10:19Z

Backported to v3.7.x.

gerhard · 2019-02-04T09:52:06Z

This makes sense, good catch.

I can see that cf-rabbitmq-release uses shutdown with remote nodes, so this change is likely to impact them. cc @rabbitmq/rabbitmq-pcf-tile-team

rabbitmq-server-boshrelease uses shutdown with a local node, so there is no impact. cc @rabbitmq/sme

gerhard · 2019-03-06T13:13:43Z

We've discovered with @nodo that in cf-rabbitmq-release inet_db:gethostname/0 returns localhost for all nodes. This results in the remote node check failing, since hostname == remote_hostname is always true. Because hostnames are resolved via erl_inetrc, the OS cannot resolve RabbitMQ node hostnames - including the local one - which is why the hostname defaults to localhost on all nodes.

Should we make the remote node check account for this edge case?

FWIW, the order in which inet_db data gets loaded in Erlang/OTP v20.3.8.20

lukebakken · 2019-03-06T14:20:48Z

If they are both localhost we could just print a warning that they're using shutdown in yolo mode 😄

michaelklishin · 2019-03-06T14:47:43Z

@gerhard if a hostname is localhost for all nodes, how can they be distinguished? I would strongly prefer to see this fixed in cf-rabbitmq-release over us adding more hacks.

lukebakken · 2019-03-06T15:00:48Z

FWIW this is probably pretty common. Here's what I see on my own workstation -

^C(21.2.6)lbakken@shostakovich ~/development/rabbitmq/umbrella (master=)
$ erl
Erlang/OTP 21 [erts-10.2.4] [source] [64-bit] [smp:8:8] [ds:8:8:10] [async-threads:1] [hipe]

Eshell V10.2.4  (abort with ^G)
1> inet_db:gethostname().
"localhost"

(21.2.6)lbakken@shostakovich ~/development/rabbitmq/umbrella (master=)
$ hostname
shostakovich

(21.2.6)lbakken@shostakovich ~/development/rabbitmq/umbrella (master=)
$ hostname -f
localhost.localdomain

The startup scripts use $HOSTNAME and/or the output of hostname -f to set node names, not inet_db, which is why this sort of thing doesn't normally affect RMQ operation.

gerhard · 2019-03-06T15:02:17Z

You might be thinking about the value returned by node(), and the node name that gets used for RPC calls. inet_db:gethostname() differs from that. If erl_inetrc is used to resolve node hostnames instead of the OS, inet_db:gethostname() will diverge from node(), hence the troubles.

michaelklishin · 2019-03-06T15:02:19Z

If inet_db doesn't take node name or ERL_INETRC into account, we should use a different mechanism.

gerhard · 2019-03-06T15:23:15Z

Let me reproduce this issue locally so that it's easier to fix.

nodo · 2019-03-06T15:38:54Z

@gerhard happy to help if you want to.

@gerhard

inet_db is not a very reliable source as it doesn't take node name CLI arguments and ERL_INETRC file settings into account. That can lead to false positives in environments where inet_db returns the same value (e.g. `localhost`) for every cluster member. Per discussion with @gerhard. Closes #327. References #309.

@gerhard

inet_db is not a very reliable source as it doesn't take node name CLI arguments and ERL_INETRC file settings into account. That can lead to false positives in environments where inet_db returns the same value (e.g. `localhost`) for every cluster member. Per discussion with @gerhard. Closes #327. References #309. (cherry picked from commit f83ea58)

@lukebakken

Without this -n has to be used when it previously wasn't required. Follow-up to #328, references #327, #309. Per discussion with @lukebakken. (cherry picked from commit 952ec9e)
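
The approach described in these commit messages (inferring hostnames from node names, and treating `@localhost` targets as local) could be sketched along these lines. This is a Python illustration under stated assumptions, not the code from #328: `host_part` and `is_local` are hypothetical names, and the case-insensitive comparison is an assumption of this sketch.

```python
def host_part(node_name: str) -> str:
    """Return the host portion of an Erlang-style node name such as 'rabbit@host1'."""
    _, sep, host = node_name.rpartition("@")
    if not sep or not host:
        raise ValueError(f"node name has no host part: {node_name!r}")
    return host.lower()  # assumption: compare hostnames case-insensitively


def is_local(target_node: str, local_node: str) -> bool:
    """Decide locality from node names alone, without asking a resolver."""
    target = host_part(target_node)
    if target == "localhost":
        # A target at 'localhost' is treated as local, so -n is not needed.
        return True
    return target == host_part(local_node)
```

Because both sides of the comparison come from node names, two nodes on different hosts are no longer mistaken for one another when the resolver reports `localhost` everywhere. The sketch does not try to reconcile short and fully qualified hostnames.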

Make shutdown command require a node to be local.

3ae19cf

Shutdown waits for the node to stop based on OS pid; if the node is not local, it will be the wrong pid. [#142699795]

hairyhum changed the title ~~Make shutdown command require a node to be local.~~ Make shutdown command require the node to be local. Feb 1, 2019

michaelklishin added 2 commits February 3, 2019 18:15

Merge branch 'master' into shutdown_require_local_host

72a8afa

Move the node locality to a validator

b58c6f6

michaelklishin approved these changes Feb 3, 2019

Further improvements to the shutdown command

6d73256

Add --[no-]wait (enabled by default) for those who would prefer to use it to shut down remote nodes even though it wouldn't wait for a verified node termination. Propagate --timeout to calls. References #309.

michaelklishin merged commit 69b1e57 into master Feb 3, 2019

michaelklishin deleted the shutdown_require_local_host branch February 3, 2019 17:43

michaelklishin added this to the 3.7.12 milestone Feb 3, 2019

michaelklishin added a commit that referenced this pull request Feb 3, 2019

shutdown: add a test for 8fa8492

ee74f46

References #309.

michaelklishin added a commit that referenced this pull request Feb 3, 2019

shutdown: add a test for 8fa8492

30ad185

References #309. (cherry picked from commit ee74f46)

michaelklishin changed the title ~~Make shutdown command require the node to be local.~~ Make shutdown command require the node to be local (unless --no-wait is provided) Feb 6, 2019

michaelklishin added a commit that referenced this pull request Feb 7, 2019

Improve error message produced by rabbitmqctl shutdown

72a98d8

References #309.

michaelklishin added a commit that referenced this pull request Feb 7, 2019

Improve error message produced by rabbitmqctl shutdown

7f57b82

References #309. (cherry picked from commit 72a98d8)

michaelklishin mentioned this pull request Mar 6, 2019

ctl shutdown: use erlang:node/0 when detecting if target node is remote #327

Closed

michaelklishin mentioned this pull request Mar 6, 2019

ctl shutdown: infer hostnames from node names #328

Merged

michaelklishin added a commit that referenced this pull request Mar 6, 2019

ctl shutdown: consider @localhost nodes to be local

952ec9e

Without this -n has to be used when it previously wasn't required. Follow-up to #328, references #327, #309. Per discussion with @lukebakken.
