Closed
Hi,
I am running into upgrade failures when migrating from 3.5.4 to 3.7.4 in a Windows environment.
I prepared Windows Docker containers to make the issue easier to reproduce. We see the same issue on Windows virtual machines.
How to reproduce
- set your machine name to: rmq (this is easier in a container than on a real Windows machine)
- start a RabbitMQ 3.5.4 (OTP 18.3) instance
- stop the broker
- start a RabbitMQ 3.7.4 (OTP 20.0) instance using the same RABBITMQ_BASE folder
- the migration fails with this message:
BOOT FAILED
===========
Error description:
init:do_boot/3 line 793
init:start_em/1 line 1085
rabbit:start_it/1 line 445
rabbit:'-boot/0-fun-0-'/0 line 296
rabbit_upgrade:run_mnesia_upgrades/2 line 155
rabbit_upgrade:die/2 line 209
io:format(<0.56.0>, "\n\n****\n\nCluster upgrade needed but other disc nodes shut down after this one.\nPlease first star...", [])
error:badarg
Log file(s) (may contain more information):
c:/rmq-data/log/RABBIT~1.LOG
c:/rmq-data/log/rabbit@rmq_upgrade.log
{"init terminating in do_boot",badarg}
init terminating in do_boot (badarg)
Crash dump is being written to: c:\rmq-data\log\erl_crash.dump...done
Investigations done
- It looks like cluster membership is case sensitive:
- when I start the 3.5.4 instance, the content of nodes_running_at_shutdown is: [rabbit@RMQ]. (note the uppercase)
- when I do the failing upgrade, the content of the same file is: [rabbit@RMQ,rabbit@rmq].
- From what I read in the code:
- in 3.5.4, in scripts/rabbitmq-env.bat: https://github.com/rabbitmq/rabbitmq-server/blame/v3.5.x/scripts/rabbitmq-env.bat
if "!RABBITMQ_NODENAME!"=="" (
    if "!NODENAME!"=="" (
        set RABBITMQ_NODENAME=rabbit@!COMPUTERNAME!
    ) else (
        set RABBITMQ_NODENAME=!NODENAME!
    )
)
In the default case, RabbitMQ generates rabbit@COMPUTERNAME, where the COMPUTERNAME value is all uppercase.
- in 3.7.4, same script: https://github.com/rabbitmq/rabbitmq-server/blob/v3.7.x/scripts/rabbitmq-env.bat
if "!RABBITMQ_NODENAME!"=="" (
    if "!NODENAME!"=="" (
        REM We use Erlang to query the local hostname because
        REM !COMPUTERNAME! and Erlang may return different results.
        REM Start erl with -sname to make sure epmd is started.
        call "%ERLANG_HOME%\bin\erl.exe" -A0 -noinput -boot start_clean -sname rabbit-prelaunch-epmd -eval "init:stop()." >nul 2>&1
        for /f "delims=" %%F in ('call "%ERLANG_HOME%\bin\erl.exe" -A0 -noinput -boot start_clean -eval "net_kernel:start([list_to_atom(""rabbit-gethostname-"" ++ os:getpid()), %NAMETYPE%]), [_, H] = string:tokens(atom_to_list(node()), ""@""), io:format(""~s~n"", [H]), init:stop()."') do @set HOSTNAME=%%F
        set RABBITMQ_NODENAME=rabbit@!HOSTNAME!
        set HOSTNAME=
    ) else (
        set RABBITMQ_NODENAME=!NODENAME!
    )
)
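Here RabbitMQ generates rabbit@hostname, where hostname has the same value as the output of the hostname command in cmd (lowercase in my case), so the node name no longer matches the one stored by 3.5.4.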
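To make the mismatch concrete, here is a minimal Python sketch of the two fallback rules quoted above. It only models the batch logic and is not RabbitMQ code. The function names, the `erlang_hostname` parameter, and the way "other nodes" are computed (an exact, case-sensitive comparison against the recorded list) are my assumptions for illustration.

```python
def nodename_3_5(env):
    """Model of the 3.5.x fallback: RABBITMQ_NODENAME, then NODENAME, then COMPUTERNAME."""
    if env.get("RABBITMQ_NODENAME"):
        return env["RABBITMQ_NODENAME"]
    if env.get("NODENAME"):
        return env["NODENAME"]
    return "rabbit@" + env["COMPUTERNAME"]


def nodename_3_7(env, erlang_hostname):
    """Model of the 3.7.x fallback: same order, but the host part is what Erlang reports.

    erlang_hostname stands in for the value the script obtains from net_kernel.
    """
    if env.get("RABBITMQ_NODENAME"):
        return env["RABBITMQ_NODENAME"]
    if env.get("NODENAME"):
        return env["NODENAME"]
    return "rabbit@" + erlang_hostname


def other_nodes(recorded_nodes, this_node):
    """Recorded nodes that differ from this node (assumed exact, case-sensitive match)."""
    return [n for n in recorded_nodes if n != this_node]


# Default case on a machine named "rmq": COMPUTERNAME is uppercase, hostname is lowercase.
env = {"COMPUTERNAME": "RMQ"}
old_name = nodename_3_5(env)
new_name = nodename_3_7(env, erlang_hostname="rmq")

# 3.5.4 recorded itself under old_name; 3.7.4 sees that entry as a different node.
leftover = other_nodes([old_name], new_name)

# With the name pinned explicitly, both versions agree and nothing is left over.
pinned = {"COMPUTERNAME": "RMQ", "RABBITMQ_NODENAME": "rabbit@RMQ"}
leftover_pinned = other_nodes(
    [nodename_3_5(pinned)], nodename_3_7(pinned, erlang_hostname="rmq")
)
```

In this model, `leftover` still contains the old uppercase entry. That would explain why the upgrade thinks another disc node shut down after this one.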
Workaround
- delete the db folder (not really possible)
- manually set the RABBITMQ_NODENAME environment variable to the node name used by the old version
- rename the machine using only capital letters
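For the second workaround, the value to pin can be taken from what the old version recorded. The Python sketch below is a hypothetical helper, not part of RabbitMQ. It assumes the nodes_running_at_shutdown content looks like the values shown above, and it assumes that the first recorded entry is the name the old broker used.

```python
import re

# Matches unquoted node names such as rabbit@RMQ or rabbit@host.example.com.
NODE_RE = re.compile(r"[A-Za-z0-9_\-]+@[A-Za-z0-9_\-]+(?:\.[A-Za-z0-9_\-]+)*")


def recorded_nodes(text):
    """Extract node names from text like '[rabbit@RMQ,rabbit@rmq].'."""
    return NODE_RE.findall(text)


def pin_nodename(env, text):
    """Return a copy of env with RABBITMQ_NODENAME set to the first recorded node."""
    nodes = recorded_nodes(text)
    if not nodes:
        raise ValueError("no node names found in recorded content")
    pinned = dict(env)
    pinned["RABBITMQ_NODENAME"] = nodes[0]
    return pinned


# Example with the content observed after the failed upgrade.
new_env = pin_nodename({"COMPUTERNAME": "RMQ"}, "[rabbit@RMQ,rabbit@rmq].")
```

The resulting environment can then be used when starting the 3.7.4 broker, so that it keeps the node name 3.5.4 used.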
How to reproduce with Windows Docker containers
DockerHub images are built from this repository: https://github.com/gsx-solutions/rmq-win
docker volume create rmq-data
docker run --rm -h rmq -v rmq-data:c:\rmq-data -ti gsxsolutions/rmq:3.5.4
docker run --rm -h rmq -v rmq-data:c:\rmq-data -ti gsxsolutions/rmq:3.7.4
If you use -h RMQ instead of -h rmq, the upgrade works.
Thank you for your work and support.