Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can hang on shutdown without explaining what it's waiting for #183

Open
timbunce opened this issue Mar 27, 2020 · 6 comments
Open

Can hang on shutdown without explaining what it's waiting for #183

timbunce opened this issue Mar 27, 2020 · 6 comments

Comments

@timbunce
Copy link

Pixy hangs on shutdown after being used. In our case it's a single client connected to a single topic with a single partition, which produced and consumed a few messages.

Two example logs. First was SIGINT the second was SIGTERM, in case it matters.

^C2020-03-27 10:52:08.675328 Z info </service.0> "Shutting down"
2020-03-27 10:52:08.675378 Z info </service.0/default_pxy_stop.0> Started
2020-03-27 10:52:08.675427 Z info </default.0/adm_stop.0> Started
2020-03-27 10:52:08.675443 Z info </default.0/adm_stop.0> Stopped
2020-03-27 10:52:08.675446 Z info </default.0/prod_stop.0> Started
2020-03-27 10:52:08.675451 Z info </default.0/cons_stop.0> Started
2020-03-27 10:52:08.675523 Z info </default.0/cons.0/disp.0> "Shutting down"
2020-03-27 10:52:08.675491 Z info </default.0/prod_disp.0> "About to stop producer: pendingMsgCount=0"
2020-03-27 10:52:08.675570 Z info </default.0/prod_disp.0> "Stopping producer: pendingMsgCount=0"
2020-03-27 10:52:08.675542 Z info </default.0/cons.0/ussjc-bx-001.ts.example.com.0/disp.0> "Shutting down" kafka.group=ussjc-bx-001.ts.example.com
2020-03-27 10:52:08.675588 Z info "Producer shutting down." category=sarama
2020-03-27 10:52:08.675625 Z info </default.0/prod_merg.0> Stopped
2020-03-27 10:52:08.675639 Z info </default.0/prod_disp.0> Stopped
2020-03-27 10:52:08.675650 Z info "producer/broker/1001 input chan closed" category=sarama
2020-03-27 10:52:08.675665 Z info "producer/broker/1001 shut down" category=sarama
2020-03-27 10:52:08.675665 Z info </default.0/prod_stop.0> Stopped
2020-03-27 10:54:33.659139 Z info </service.0> "Shutting down"
2020-03-27 10:54:33.659175 Z info </service.0/default_pxy_stop.0> Started
2020-03-27 10:54:33.659194 Z info </default.0/prod_stop.0> Started
2020-03-27 10:54:33.659210 Z info </default.0/prod_disp.0> "About to stop producer: pendingMsgCount=0"
2020-03-27 10:54:33.659217 Z info </default.0/prod_disp.0> "Stopping producer: pendingMsgCount=0"
2020-03-27 10:54:33.659222 Z info </default.0/adm_stop.0> Started
2020-03-27 10:54:33.659226 Z info "Producer shutting down." category=sarama
2020-03-27 10:54:33.659240 Z info </default.0/adm_stop.0> Stopped
2020-03-27 10:54:33.659246 Z info </default.0/prod_merg.0> Stopped
2020-03-27 10:54:33.659251 Z info </default.0/prod_disp.0> Stopped
2020-03-27 10:54:33.659259 Z info "producer/broker/1001 input chan closed" category=sarama
2020-03-27 10:54:33.659269 Z info "producer/broker/1001 shut down" category=sarama
2020-03-27 10:54:33.659225 Z info </default.0/cons_stop.0> Started
2020-03-27 10:54:33.659278 Z info </default.0/cons.0/disp.0> "Shutting down"
2020-03-27 10:54:33.659285 Z info </default.0/cons.0/ussjc-bx-001.ts.example.com.0/disp.0> "Shutting down" kafka.group=ussjc-bx-001.ts.example.com
2020-03-27 10:54:33.659254 Z info </default.0/prod_stop.0> Stopped
@horkhe
Copy link
Member

horkhe commented Mar 27, 2020

What version are you using?

@horkhe
Copy link
Member

horkhe commented Mar 27, 2020

Have you acknowledged all consumed messages before stopping? Alternatively you could have used auto_ack=true when consuming. Before stopping kafka-pixy waits for acknowledgments of consumed messages for proxies.<proxy-name>.consumer.ack_timeout which is 5 minutes by default.

@timbunce
Copy link
Author

What version are you using?

v0.17.0

Before stopping kafka-pixy waits for acknowledgments of consumed messages for proxies..consumer.ack_timeout which is 5 minutes by default.

That seems like a long time for a process to wait after a SIGTERM (especially without logging an explanatory message).

I'd expect kafka-pixy to stop promptly, dropping the connection, and for the client's Ack to fail with a gRPC error so the client would know the Ack didn't succeed.

What does "before retrying" mean in this context:

      # Period of time that Kafka-Pixy should wait for an acknowledgement
      # before retrying.
      ack_timeout: 5m

@horkhe
Copy link
Member

horkhe commented Mar 30, 2020

That seems like a long time for a process to wait after a SIGTERM (especially without logging an explanatory message).

If kafka-pixy stops immediately that would result in unacknowledged messages to be offered to another client later, possibly by another kafka-pixy instance, hence it is advisable to set this parameter to be greater than the longest it takes to handle your messages.

What does "before retrying" mean in this context:

It means that if a message is not acknowledged in ack_timeout time, it will be offered by kafka-pixy again.

@timbunce timbunce changed the title Hangs on shutdown and has to be killed Can hang on shutdown without explaining what it's waiting for Mar 31, 2020
@timbunce
Copy link
Author

Thanks for the explanation. I agree it's working as designed.

My suggested changes are:

  • clarify the wording of the ack_timeout comments by explaining the effect, both on messages and on shutdown.
  • if a shutdown is blocked waiting on ack_timeout log a message to that effect.
  • add an config option to shutdown more aggressively and document the consequences.

@horkhe
Copy link
Member

horkhe commented Apr 1, 2020

I agree with the first two suggestions, however we will not implement the third, because a user have an option to auto ack consumed messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants