Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

errors when restarting broker #121

Open
heinsaris opened this issue Jan 31, 2020 · 2 comments
Open

errors when restarting broker #121

heinsaris opened this issue Jan 31, 2020 · 2 comments

Comments

@heinsaris
Copy link

Hi,

I get some strange behaviour when I stop and restart my broker while there is an active Tortoise connection. I use Tortoise 0.9.4 on a MBP running Macos 10.15.3 (Catalina) and Elixir 1.10.0 on top of Erlang 22.2.4 (both installed via brew).

This is how the connection is set up (As a child of the app supervisor:

{Tortoise.Connection,
[
client_id: "hein-1",
server: {
Tortoise.Transport.Tcp,
host: "localhost", port: 1883
},
handler: {Tortoise.Handler.Logger, []}
]}

I run a producer and a consumer in separate processes. The producer adds messages to a queue (BlockingQueue) , and the consumer takes the messages from the queue and publishes them via publish_sync. I tried with publish and a receive..after, but I got even worse problems (some of my messages were lost).

Here is what happens:

When the broker is stopped, I immediately get a warning from the logger:

14:51:38.193 [warn] Connection has been dropped

I can keep publishing (for minutes even) , but as soon as I restart the broker, I get an error:

14:51:50.926 [error] GenStateMachine {Tortoise.Registry, {Tortoise.Connection.Inflight, "hein-1"}} terminating
** (exit) {:protocol_violation, :unknown_identifier}
(stdlib 3.11.1) gen_statem.erl:1298: :gen_statem.loop_state_callback_result/11
(stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3
State: {{:connected, {Tortoise.Transport.Tcp, #Port<0.16>}}, %Tortoise.Connection.Inflight{client_id: "hein-1", order: [], pending: %{}}}
Callback mode: :handle_event_function

After the timeout on publish_sync I get this:

14:52:50.914 [info] answer = {:error, :timeout}

And I try to publish the message again:

14:52:50.914 [info] publish message Message 4

But get another error:

14:52:50.922 [error] GenStateMachine {Tortoise.Registry, {Tortoise.Connection.Inflight, "hein-1"}} terminating
** (CaseClauseError) no case clause matching: {:error, :closed}
(tortoise 0.9.4) lib/tortoise/connection/inflight.ex:292: Tortoise.Connection.Inflight.handle_event/4
(stdlib 3.11.1) gen_statem.erl:1161: :gen_statem.loop_state_callback/11
(stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Then I get this warning:

14:52:55.912 [warn] Client has been terminated with reason: {:function_clause, [{Tortoise.Handler.Logger, :connection, [:terminated, %Tortoise.Handler.Logger{}], [file: 'lib/tortoise/handler/logger.ex', line: 16]}, {Tortoise.Handler, :execute, 2, [file: 'lib/tortoise/handler.ex', line: 314]}, {Tortoise.Connection.Controller, :handle_info, 2, [file: 'lib/tortoise/connection/controller.ex', line: 215]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}

And this error:

14:52:55.920 [error] GenServer {Tortoise.Registry, {Tortoise.Connection.Controller, "hein-1"}} terminating
** (FunctionClauseError) no function clause matching in Tortoise.Handler.Logger.connection/2
(tortoise 0.9.4) lib/tortoise/handler/logger.ex:16: Tortoise.Handler.Logger.connection(:terminated, %Tortoise.Handler.Logger{})
(tortoise 0.9.4) lib/tortoise/handler.ex:314: Tortoise.Handler.execute/2
(tortoise 0.9.4) lib/tortoise/connection/controller.ex:215: Tortoise.Connection.Controller.handle_info/2

Then comes a ping timeout:

14:52:55.922 [error] GenServer {Tortoise.Registry, {Tortoise.Connection, "hein-1"}} terminating
** (stop) :ping_timeout
Last message: :ping
State: %Tortoise.Connection{backoff: %Tortoise.Connection.Backoff{max_interval: 30000, min_interval: 100, value: nil}, client_id: "hein-1", connect: %Tortoise.Package.Connect{META: %Tortoise.Package.Meta{flags: 0, opcode: 1}, clean_session: false, client

After 4 errors, the Tortoise client is terminated (by or via the supervisor), and a new one is created, after which everything is fine again.

I have lost no messages, but the messages that was being published at the time the broker is restarted gets published more than once.

The question is whether this is a bug in Tortoise, or am I doing something wrong? Or maybe this is what is supposed to happen and is a good example of the let it fail philosophy?

@gausby
Copy link
Owner

gausby commented Jan 31, 2020

Looks like the server send a package with an identifier unknown to the tortoise instance…what broker are you using?

@heinsaris
Copy link
Author

heinsaris commented Jan 31, 2020 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants