S01E01: How to upgrade from RabbitMQ 3.7 to 3.8? #3

Open: wants to merge 9 commits into master

@gerhard (Member) commented Jan 6, 2020

Proposed by @dlresende via #2

A new RabbitMQ version comes out. Exciting! Shiny new features, bug fixes, security patches, etc., so it's time to upgrade! But hang on a second... there are hundreds of applications using RabbitMQ in production. How should we go about upgrading?

This question comes up frequently in the RabbitMQ community, as part of what we call Day 2 Operations. Every company or team decides which upgrade strategy works best for them: blue-green deployment, rolling (one node at a time) upgrades, etc. But every strategy comes with its own advantages and trade-offs, and these are not well understood.

  • What happens to clients during a rolling upgrade?
  • What happens to particular types of queues?
  • What if an alarm gets triggered during an upgrade?
  • When should I expect downtime?
  • When is there a risk of data loss?
  • How do clusters re-form after an upgrade?
  • How should RabbitMQ and its clients be configured to be upgrade-resilient?

The above questions come up over and over again, and I believe that many would benefit from clear & concise guidance on how to tackle them. Bonus points if I can see how to do it right, which would go really well alongside the already excellent Upgrading RabbitMQ guide.

How to upgrade RabbitMQ 3.7 to 3.8 in prod?

Proposed by @dlresende via #2

[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
@gerhard changed the title "How to upgrade RabbitMQ 3.7 to 3.8 in prod? 2020-01-31" to "How to upgrade from RabbitMQ 3.7 to 3.8 in prod? 2020-01-31" on Jan 6, 2020
@gerhard changed the title "How to upgrade from RabbitMQ 3.7 to 3.8 in prod? 2020-01-31" to "How to upgrade from RabbitMQ 3.7 to 3.8 in prod?" on Jan 6, 2020
@gerhard changed the title "How to upgrade from RabbitMQ 3.7 to 3.8 in prod?" to "How to upgrade from RabbitMQ 3.7 to 3.8 in prod? - 2020-01-31" on Jan 6, 2020
@gerhard changed the title "How to upgrade from RabbitMQ 3.7 to 3.8 in prod? - 2020-01-31" to "How to upgrade from RabbitMQ 3.7 to 3.8? - 2020-01-31" on Jan 6, 2020
@gerhard changed the title "How to upgrade from RabbitMQ 3.7 to 3.8? - 2020-01-31" to "S01E01 How to upgrade from RabbitMQ 3.7 to 3.8?" on Jan 6, 2020
@gerhard changed the title "S01E01 How to upgrade from RabbitMQ 3.7 to 3.8?" to "S01E01: How to upgrade from RabbitMQ 3.7 to 3.8?" on Jan 17, 2020
gerhard added 8 commits Jan 17, 2020
[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
[#170545354]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
We want to start with a backlog & focus on upgrading the cluster while
consumers are making progress on draining the queues. Ideally, we would
like to see queues with very little message backlog by the time we
finish upgrading all nodes.

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
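
To judge whether the queues will be close to empty once the last node is upgraded, a rough model helps: the backlog only shrinks while the aggregate consume rate exceeds the aggregate publish rate. The Java sketch below is a hypothetical back-of-the-envelope calculation, not code from this repository. The class name `BacklogDrain` and the rates and upgrade duration in `main` are made-up illustrations, and the model assumes constant rates, ignoring the throughput dip and client reconnections while each node restarts.

```java
/** Hypothetical back-of-the-envelope model; not part of this repository. */
class BacklogDrain {

    /**
     * Messages still queued after upgradeSeconds, assuming constant aggregate
     * publish and consume rates (msg/s) for the whole upgrade.
     */
    static long remainingBacklog(long initialBacklog, double publishRate,
                                 double consumeRate, double upgradeSeconds) {
        double remaining = initialBacklog + (publishRate - consumeRate) * upgradeSeconds;
        // The backlog cannot drop below zero: consumers simply go idle.
        return Math.max(0L, Math.round(remaining));
    }

    /** Seconds until the backlog is empty, or +Infinity if consumers never catch up. */
    static double secondsToDrain(long backlog, double publishRate, double consumeRate) {
        double netDrainRate = consumeRate - publishRate;
        return netDrainRate > 0 ? backlog / netDrainRate : Double.POSITIVE_INFINITY;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 100,000 queued messages, 500 msg/s published,
        // 550 msg/s consumed, 3 nodes upgraded one at a time at ~5 minutes each.
        long backlog = 100_000;
        double upgradeSeconds = 3 * 5 * 60;
        System.out.println(remainingBacklog(backlog, 500, 550, upgradeSeconds)); // 55000
        System.out.println(secondsToDrain(backlog, 500, 550));                   // 2000.0
    }
}
```

With these made-up numbers the consumers gain only 50 msg/s on the publishers, so roughly 55,000 messages would still be queued after a 15-minute upgrade; ending with "very little message backlog" needs a comfortable margin between consume and publish rates.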
To reproduce, start a broker in one session by running:

    make 3-nodes-production-rmq1-server

And run the following command in a different session:

    make 3-nodes-production-rmq1-backlog

PerfTest crashes with the following exception within 10 seconds:

    /usr/local/bin/docker run --rm --interactive --tty \
      --hostname tgir-s01e01-rmq1-backlog \
      --name tgir-s01e01-rmq1-backlog \
      --network tgir-s01e01 \
      pivotalrabbitmq/perf-test:2.10.0-ubuntu \
      --auto-delete false \
      --confirm 100 \
      --confirm-timeout 10 \
      --consumers 0 \
      --flag persistent \
      --pmessages 1000 \
      --producers 1 \
      --queue-args 'x-max-length=1000' \
      --queue-pattern 'tgir-s01e01-rmq1-q%d' \
      --queue-pattern-from 1 \
      --queue-pattern-to 2000 \
      --servers-startup-timeout 60 \
      --size 1000 \
      --type 'fanout' \
      --uri "amqp://guest:guest@tgir-s01e01-rmq1:5672/%2f"
    id: test-102849-029, starting producer #0
    id: test-102849-029, starting producer #0, channel #0
    id: test-102849-029, time: 7.178s, sent: 0.14 msg/s, confirmed: 0 msg/s, nacked: 0 msg/s, min/median/75th/95th/99th confirm latency: 0/0/0/0/0 µs
    test stopped (Error in producer)
    id: test-102849-029, sending rate avg: 4.9 msg/s
    id: test-102849-029, receiving rate avg: 0 msg/s
    Exception in thread "AMQP Connection 172.26.0.2:5672" java.util.concurrent.RejectedExecutionException: Task com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable@394d1b4f rejected from java.util.concurrent.ThreadPoolExecutor@1314c64d[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
            at java.base/java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2055)
            at java.base/java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:825)
            at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1355)
            at java.base/java.util.concurrent.Executors$DelegatedExecutorService.execute(Executors.java:687)
            at com.rabbitmq.client.impl.ConsumerWorkService.addWork(ConsumerWorkService.java:81)
            at com.rabbitmq.client.impl.ConsumerDispatcher.execute(ConsumerDispatcher.java:214)
            at com.rabbitmq.client.impl.ConsumerDispatcher.handleShutdownSignal(ConsumerDispatcher.java:173)
            at com.rabbitmq.client.impl.ChannelN.broadcastShutdownSignal(ChannelN.java:283)
            at com.rabbitmq.client.impl.ChannelN.finishProcessShutdownSignal(ChannelN.java:301)
            at com.rabbitmq.client.impl.ChannelN.processShutdownSignal(ChannelN.java:317)
            at com.rabbitmq.client.impl.ChannelManager$1.run(ChannelManager.java:117)
            at com.rabbitmq.client.impl.ChannelManager.handleSignal(ChannelManager.java:121)
            at com.rabbitmq.client.impl.AMQConnection.finishShutdown(AMQConnection.java:982)
            at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:956)
            at com.rabbitmq.client.impl.AMQConnection.handleFailure(AMQConnection.java:759)
            at com.rabbitmq.client.impl.AMQConnection.access$400(AMQConnection.java:48)
            at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:654)
            at java.base/java.lang.Thread.run(Thread.java:834)
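
The flags in the command above bound how much backlog this run can build: `x-max-length=1000` caps each queue at 1,000 messages (by default the oldest messages are dropped from the head once the limit is reached), the queue pattern spans 2,000 queues, and `--size 1000` makes each message body 1,000 bytes. The hypothetical helper below (the class name is made up; it only does the arithmetic) assumes every queue fills up to its limit and counts message bodies only, not per-message or on-disk overhead.

```java
/** Hypothetical helper; it only does the arithmetic, it does not talk to RabbitMQ. */
class BacklogUpperBound {

    /** Most messages the queues can hold if every queue reaches x-max-length. */
    static long maxMessages(int queueCount, long maxLengthPerQueue) {
        return queueCount * maxLengthPerQueue;
    }

    /** Total size of the message bodies alone, excluding any per-message overhead. */
    static long maxBodyBytes(int queueCount, long maxLengthPerQueue, long bodySizeBytes) {
        return maxMessages(queueCount, maxLengthPerQueue) * bodySizeBytes;
    }

    public static void main(String[] args) {
        int queues = 2000 - 1 + 1; // --queue-pattern-from 1 --queue-pattern-to 2000
        long maxLength = 1000;     // --queue-args 'x-max-length=1000'
        long bodySize = 1000;      // --size 1000 (bytes)
        long messages = maxMessages(queues, maxLength);
        long bytes = maxBodyBytes(queues, maxLength, bodySize);
        System.out.println(messages + " messages"); // 2000000 messages
        System.out.printf("%.2f GiB of message bodies%n", bytes / (1024.0 * 1024 * 1024));
    }
}
```

Under that assumption the bound is 2,000,000 persistent messages, roughly 2 GB (about 1.86 GiB) of message bodies that the cluster has to carry through the upgrade.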

[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Disable publisher confirms, publish 5 msg/s/producer to backfill quicker

[#170544206]

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
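
The commit above disables publisher confirms and throttles each producer to 5 msg/s. As a rough check of how long the backfill takes, assume each producer still stops after `--pmessages 1000` as in the earlier command (the updated invocation is not shown in this conversation, so that value is an assumption): a producer publishing at 5 msg/s then needs 1000 / 5 = 200 seconds. The class name below is hypothetical.

```java
/** Hypothetical helper; not part of this repository. */
class BackfillDuration {

    /** Seconds for one producer to publish messageCount messages at a fixed rate (msg/s). */
    static double secondsPerProducer(long messageCount, double ratePerSecond) {
        if (ratePerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return messageCount / ratePerSecond;
    }

    public static void main(String[] args) {
        long messagesPerProducer = 1000; // assumed: --pmessages 1000, as in the earlier command
        double ratePerProducer = 5;      // "5 msg/s/producer" from the commit message
        System.out.println(secondsPerProducer(messagesPerProducer, ratePerProducer)); // 200.0
    }
}
```

Without confirms the producer no longer waits on the broker, so the publish rate is set by the throttle alone; the trade-off is that the producer gets no acknowledgement that a message actually reached its queues.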