Pubsub messages sit in queue until GKE pod with subscriber gets reset #2640

Closed
ShahNewazKhan opened this issue Oct 1, 2017 · 19 comments
Labels
api: pubsub Issues related to the Pub/Sub API.

Comments

@ShahNewazKhan

ShahNewazKhan commented Oct 1, 2017

Environment details

  • OS: Debian GNU/Linux 8.9 (jessie) [K8s pod based on dockerfile gcr.io/google_appengine/base]
  • Node.js version: 6.11.3
  • npm version: 5.4.2
  • google-cloud/pubsub version: 0.14.2

Steps to reproduce

  1. Spin up nodejs pubsub publisher to topic1 in GKE pod 1
  2. Spin up nodejs pubsub subscriber to subscription to topic1 in GKE pod 2
  3. Publish messages to topic1

I am facing an intermittent issue where pubsub messages are sitting in the queue and not being delivered to the subscriber in GKE pod 2. Only when I delete the GKE pod 2 subscriber and restart the pod does the message get delivered.
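The two apps in the steps above can be sketched roughly as follows. This is a minimal sketch, not the reporter's actual code: it assumes the 0.14.x client API (`topic.publisher().publish(Buffer)` and `subscription.on('message', ...)`), takes the client as an injected `pubsub` argument, and uses hypothetical helper names (`publishMessage`, `startSubscriber`) and placeholder topic and subscription names.

```javascript
// `pubsub` is assumed to be a @google-cloud/pubsub 0.14.x client,
// e.g. the value returned by require('@google-cloud/pubsub')().

// Pod 1 (publisher app): publish one message to the topic.
function publishMessage(pubsub, topicName, text) {
  const publisher = pubsub.topic(topicName).publisher();
  // The payload must be a Buffer in this client version.
  return publisher.publish(Buffer.from(text));
}

// Pod 2 (subscriber app): listen on the subscription and ack each message.
function startSubscriber(pubsub, subscriptionName, onMessage) {
  const subscription = pubsub.subscription(subscriptionName);
  subscription.on('message', message => {
    onMessage(message);
    message.ack();
  });
  subscription.on('error', err => console.error('subscription error:', err));
  return subscription;
}
```

With this shape, the reported symptom would be that `onMessage` is never called for messages published after a quiet period, even though the messages are visible in the subscription backlog.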

@callmehiphop
Contributor

We've seen a number of reports of messages not being delivered in k8s. I believe this issue is being investigated internally, although I do not know the status. @lukesneeringer, have we heard any news regarding this?

callmehiphop added the "api: pubsub" label (Issues related to the Pub/Sub API.) on Oct 1, 2017
@eyalse

eyalse commented Oct 1, 2017

I'm suffering from the same issue at the moment :(

@ApeNox

ApeNox commented Oct 2, 2017

I'm suffering from the same issue too. Please provide a fix, as these tools are used in production.

@eyalse

eyalse commented Oct 2, 2017

@callmehiphop (@lukesneeringer) Hey, any update? As mentioned, these tools (k8s and Pub/Sub) are used in production.

@callmehiphop
Contributor

I don't have any official updates, but a new patch release was made this morning that might resolve the issues you're seeing.

@ShahNewazKhan
Author

@callmehiphop I have done some preliminary testing with the google-cloud/pubsub patch release 0.14.3 from this morning, and it looks promising so far.

I have not been able to reproduce the issue yet, but I will need to run full end-to-end tests to confirm.

@callmehiphop
Contributor

@ShahNewazKhan that's great, please keep us posted! 😃

@ShahNewazKhan
Author

@callmehiphop I have been able to replicate the issue with google-cloud/pubsub patch 0.14.3 in a slightly different use case.

Environment details

  • OS: Debian GNU/Linux 8.9 (jessie) [K8s pod based on dockerfile gcr.io/google_appengine/base]
  • Node.js version: 6.11.3
  • npm version: 5.4.2
  • google-cloud/pubsub version: 0.14.3

Steps to reproduce

  1. Spin up nodejs pubsub publisher to topic1 in GKE pod 1
  2. Spin up nodejs pubsub subscriber to subscription to topic1 in GKE pod 2
  3. Reset GKE pod 1 [pubsub publisher app]
  4. Publish messages to topic1

At this point the message remains stuck in the pubsub queue until I reset the GKE pod 2 [pubsub subscriber app]

@ShahNewazKhan
Author

Just checking in for updates on this issue.

@callmehiphop
Contributor

@ShahNewazKhan We believe this is a GKE issue, so I can't comment on whether it's being worked on or when it will be fixed. I'm really sorry for the inconvenience.

@ehacke

ehacke commented Oct 16, 2017

We may be having similar issues, not sure. @ShahNewazKhan what version of GKE are you on?

@ShahNewazKhan
Author

@ehacke

GKE: 1.6.10-gke.1
Kubernetes: 1.5.6

@kir-titievsky

Question for those who reported this: is there any chance that no messages were published or delivered for 10 minutes or longer before you started publishing and accumulating them in the backlog?

@ShahNewazKhan
Author

@kir-titievsky I can confirm that the published messages sit in the subscription queue only when the publisher has been inactive for longer than 10 minutes.

@kir-titievsky

Thanks, @ShahNewazKhan. My guess is this: by default, GCE suspends inactive connections after 10 minutes [1]. Since Pub/Sub relies on a persistent streamingPull connection, that connection would get suspended if no messages flow for 10 minutes. Pub/Sub did not properly detect this condition. This was fixed as of 2017-10-20 by shutting down affected streamingPull connections. The server-initiated shutdown should now trigger the client library to rebuild the connection.

Can those of you affected check if the issue persists?

[1] https://cloud.google.com/compute/docs/troubleshooting#communicatewithinternet
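The failure mode described above (a listener that silently stops receiving after a quiet period) can also be guarded against in application code. The following is only an illustrative sketch of an idle watchdog, not something the client library provides: `createIdleWatchdog`, `connect`, and `idleMs` are hypothetical names, and `connect(handler)` is assumed to start a listener and return a function that tears it down.

```javascript
// Hypothetical helper: rebuilds a message listener when nothing has arrived
// for at least `idleMs` milliseconds. `now` is injectable for testing.
function createIdleWatchdog(connect, onMessage, idleMs, now) {
  now = now || Date.now;
  let lastSeen = now();
  let rebuilds = 0;

  // Record the arrival time, then hand the message to the application.
  const handle = message => {
    lastSeen = now();
    onMessage(message);
  };
  let disconnect = connect(handle);

  return {
    // Call periodically, e.g. from setInterval. Returns true if it rebuilt.
    check() {
      if (now() - lastSeen < idleMs) return false;
      disconnect();
      disconnect = connect(handle);
      lastSeen = now();
      rebuilds += 1;
      return true;
    },
    rebuildCount() { return rebuilds; },
    stop() { disconnect(); }
  };
}
```

With the Pub/Sub client, `connect` could plausibly attach the handler with `subscription.on('message', handler)` and return a function that calls `subscription.removeListener('message', handler)`; whether that fully resets the underlying streamingPull connection in a given client version is an assumption to verify. Choosing `idleMs` below the 10-minute window means a quiet topic triggers periodic rebuilds, which trades some connection churn for not depending on the idle connection surviving.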

@ShahNewazKhan
Author

@kir-titievsky Can you clarify what you mean by 'server-initiated shutdown'? Does this mean that inactive Pub/Sub streamingPull connections are now being shut down instead of being suspended by GCE?

I have still noticed messages sitting in the queue intermittently. Do I have to update the Pub/Sub client to the latest version to handle the streamingPull connection rebuilds?

Thanks in advance!

stephenplusplus added the "status: blocked" label (Resolving the issue is dependent on other work.) on Nov 27, 2017
@stephenplusplus
Contributor

I'm marking this as blocked, since it sounds like GKE is the party responsible for any progress on this. @callmehiphop does this sound right?

@callmehiphop
Contributor

@stephenplusplus I believe it does!

@stephenplusplus
Contributor

This issue was moved to googleapis/nodejs-pubsub#11

stephenplusplus removed the "status: blocked" label (Resolving the issue is dependent on other work.) on Dec 13, 2017
7 participants