Durable LVC exchange should persist the cached message (if persistent) across RabbitMQ restarts. #21

Closed
dcorbin opened this issue Apr 2, 2019 · 8 comments · Fixed by #22

Comments

@dcorbin

dcorbin commented Apr 2, 2019

  1. Create a durable LVC exchange
  2. Publish a persistent message to it.
  3. Confirm that binding a queue to the exchange causes that message to be delivered to the queue.
  4. Restart RabbitMQ.
  5. Bind a queue to the exchange. No message is delivered, although I expected the cached message to be delivered.
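
The expectation in the steps above can be modelled with a small, self-contained sketch. This is a toy in-memory model with a JSON file standing in for durable storage. It is not the plugin's implementation and does not talk to a real broker; `ToyLvcExchange`, its methods, and the file name are hypothetical names chosen for illustration.

```python
import json
import os
import tempfile


class ToyLvcExchange:
    """Toy model of a durable last-value-cache exchange (illustration only)."""

    def __init__(self, store_path, durable=True):
        self.store_path = store_path
        self.durable = durable
        self.cache = {}       # routing key -> last message body
        self.persisted = {}   # subset of the cache that survives a restart
        # On "startup", a durable exchange reloads whatever was written to disk.
        if durable and os.path.exists(store_path):
            with open(store_path) as f:
                self.persisted = json.load(f)
            self.cache = dict(self.persisted)

    def publish(self, routing_key, body, persistent=False):
        """Cache the latest message; write it to disk only if it is persistent."""
        self.cache[routing_key] = body
        if not self.durable:
            return
        if persistent:
            self.persisted[routing_key] = body
        else:
            # A transient message replaces the cached value but is not stored.
            self.persisted.pop(routing_key, None)
        with open(self.store_path, "w") as f:
            json.dump(self.persisted, f)

    def bind(self, routing_key):
        """Return the messages a newly bound queue would receive."""
        return [self.cache[routing_key]] if routing_key in self.cache else []


path = os.path.join(tempfile.mkdtemp(), "lvc.json")
exchange = ToyLvcExchange(path)                      # step 1
exchange.publish("rates", "1.25", persistent=True)   # step 2
before = exchange.bind("rates")                      # step 3
exchange = ToyLvcExchange(path)                      # step 4: simulated restart
after = exchange.bind("rates")                       # step 5
```

In this model the queue bound after the simulated restart still receives the cached message, which is the behavior requested here.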
@lukebakken lukebakken self-assigned this Apr 2, 2019
@lukebakken

If implemented, this probably won't take clustering into account.

@dcorbin
Author

dcorbin commented Apr 2, 2019

For my immediate uses, that's fine.

dcorbacho added a commit that referenced this issue Apr 24, 2019
rabbitmq-lvc-exchange #21
[#165034482]
lukebakken pushed a commit that referenced this issue Apr 24, 2019
rabbitmq-lvc-exchange #21
[#165034482]

Use argument to create_table to set disc_copies
@lukebakken lukebakken added this to the 3.6.16 milestone Apr 24, 2019
@michaelklishin michaelklishin modified the milestones: 3.6.16, 3.7.15 Apr 25, 2019
@bharath1718

We observed that after recycling the cluster nodes in a rolling fashion, the queue bound to an LVC exchange did not return the sequence number. Durable is set to true.
[[..]]
com.gs.futures.jetstream.messaging.services.infra.rabbitmq.LvcSequenceManagerRabbitmq - No Sequence Number received
[[..]]

@michaelklishin
Member

@bharath1718 in https://github.com/rabbitmq/rabbitmq-lvc-exchange/pull/22/files, the copy uses a single node replica for storage. We cannot suggest much with a single-sentence problem definition.

@bharath1718

Thanks Mike, below is the log that we observed during the rolling restart of the cluster.
Let me know if it helps.
[[..]]
2020-09-11 16:39:02.519 [error] <0.12922.482> CRASH REPORT Process <0.12922.482> with 0 neighbours exited with reason: {error,{no_exists,lvc}} in rabbit_misc:execute_mnesia_transaction/1 line 561 in gen_server2:terminate/3 line 1183
2020-09-11 16:39:02.519 [warning] <0.12846.482> Non-AMQP exit reason '{{error,{no_exists,lvc}},[{rabbit_misc,execute_mnesia_transaction,1,[{file,"src/rabbit_misc.erl"},{line,561}]},{rabbit_exchange_type_lvc,route,2,[{file,"src/rabbit_exchange_type_lvc.erl"},{line,30}]},{rabbit_exchange,route1,3,[{file,"src/rabbit_exchange.erl"},{line,397}]},{rabbit_exchange,route,2,[{file,"src/rabbit_exchange.erl"},{line,387}]},{rabbit_channel,handle_method,3,[{file,"src/rabbit_channel.erl"},{line,1161}]},{rabbit_channel,handle_cast,2,[{file,"src/rabbit_channel.erl"},{line,567}]},{gen_server2,handle_msg,2,[{file,"src/gen_server2.erl"},{line,1067}]},{proc_lib,wake_up,3,[{file,"proc_lib.erl"},{line,259}]}]}'
2020-09-11 16:39:02.519 [error] <0.12907.482> Supervisor {<0.12907.482>,rabbit_channel_sup} had child channel started with rabbit_channel:start_link(1, <0.12846.482>, <0.12918.482>, <0.12846.482>, <<"XX.XX.XX.XX:56308 -> XX.XX.XX.XX:29130">>, rabbit_framing_amqp_0_9_1, {user,<<"161690_nonprod.jetstream-oma">>,[monitoring],[{rabbit_auth_backend_internal,none}]}, <<"/">>, [{<<"exchange_exchange_bindings">>,bool,true},{<<"connection.blocked">>,bool,true},{<<"authentica...">>,...},...], <0.12881.482>, <0.12917.482>) at <0.12922.482> exit with reason {error,{no_exists,lvc}} in rabbit_misc:execute_mnesia_transaction/1 line 561 in context child_terminated
2020-09-11 16:44:10.411 [info] <0.287.0> Running boot step rabbit_lvc_plugin defined by app rabbitmq_lvc_exchange

  • rabbitmq_lvc_exchange
    [[..]]

@michaelklishin
Member

@bharath1718 our team does not use existing issues as a support forum. The exception says that when the exchange attempted to route a message, the lvc table did not exist. My best guess is that this is a variation of rabbitmq/rabbitmq-server#2384 where plugin activation happens "out of sync" with other operations, e.g. client connections and attempts to publish a message. This can cause the first few publishes from a fast-to-connect client to fail because this plugin is not really ready yet (its table creation transaction hasn't completed).

We don't have much relevant information (RabbitMQ version, plugin version, any logs, or any steps to reproduce), and will not use an existing issue's comments for discussions. That's what our public mailing list, community Slack and rabbitmq/discussions are for.
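
One way a client could tolerate the startup window described above is to retry early publishes with a short backoff. The following is a minimal sketch under stated assumptions, not a recommendation from the maintainers: `ExchangeNotReady` and `publish_with_retry` are hypothetical names, and a real client would have to map its library's channel-closed error to the exception and reopen the channel inside the `publish` callable.

```python
import time


class ExchangeNotReady(Exception):
    """Hypothetical error: a publish failed because the plugin was not ready."""


def publish_with_retry(publish, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call publish() until it succeeds, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return publish()
        except ExchangeNotReady:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the last error
            sleep(base_delay * 2 ** attempt)


# Usage: a stand-in publish function that fails twice, then succeeds.
calls = {"n": 0}


def flaky_publish():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ExchangeNotReady("lvc table does not exist yet")
    return "published"


delays = []  # record the delays instead of actually sleeping
result = publish_with_retry(flaky_publish, sleep=delays.append)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.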

@michaelklishin
Member

We now understand what may be going on here. The plugin does not retry syncing its tables from peers the same way RabbitMQ core would. And in general, this plugin is stateful, so a client can potentially begin using its exchange type before the plugin has started.

We can easily address the former (#28) but it's not obvious how to do the latter safely for RabbitMQ itself. We don't want a plugin to delay node startup, for example.
