"Failed to connect before the deadline" errors being seen #818

Closed
npomfret opened this issue Nov 13, 2019 · 22 comments
Closed

"Failed to connect before the deadline" errors being seen #818

npomfret opened this issue Nov 13, 2019 · 22 comments
Labels
api: pubsub - Issues related to the googleapis/nodejs-pubsub API.
external - This issue is blocked on a bug with the actual product.
priority: p2 - Moderately-important priority. Fix may not be included in next release.
type: bug - Error or flaw in code with unintended results or allowing sub-optimal usage patterns.

Comments

@npomfret

Since upgrading to the latest pubsub ("@google-cloud/pubsub": "1.1.5"), my Node app is still getting occasional errors from the underlying gRPC library. My app is using grpc-js@0.6.9:

$ npm ls | grep grpc-js
│ │ ├─┬ @grpc/grpc-js@0.6.9

It's not as bad as it was. When I first upgraded to pubsub 1.x it simply didn't work at all: I'd get an error within a few seconds of the app starting, and from then on no messages would get delivered.

Now the problem is much less severe (my app is working), but I still see the following errors in my logs:

From my subscription.on('error', (err) => { ... }) handler I see this:

{"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}}

And in my logs I've also noticed these (excuse the formatting):

Error: Failed to add metadata entry @���: Wed, 13 Nov 2019 07:13:16 GMT. Metadata key "@���" contains illegal characters
    at validate (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:35:15)
    at Metadata.add (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:87:9)
    at values.split.forEach.v (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:226:63)
    at Array.forEach (<anonymous>)
    at Object.keys.forEach.key (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:226:43)
    at Array.forEach (<anonymous>)
    at Function.fromHttp2Headers (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/metadata.js:200:30)
    at ClientHttp2Stream.stream.on (/home/nodeapp/apps/crypto-trader-prototype/shared-code/node_modules/@grpc/grpc-js/build/src/call-stream.js:211:56)
    at ClientHttp2Stream.emit (events.js:193:13)
    at emit (internal/http2/core.js:244:8)
    at processTicksAndRejections (internal/process/task_queues.js:81:17)

... but the two don't seem to be related.

I will start running with the debug flags enabled and report back

@bcoe bcoe added the type: question, external, type: bug, and priority: p2 labels and removed the type: question label Nov 13, 2019
@bcoe
Contributor

bcoe commented Nov 13, 2019

As discussed here, this is a known, non-fatal warning coming out of @grpc/grpc-js; there's a hunch that it might actually be a problem with the HTTP/2 implementation in Node.js.

@bcoe bcoe changed the title still getting grpc issues grpc metadata warning in pubsub Nov 13, 2019
@npomfret
Author

@bcoe including the "Failed to connect before the deadline" error?

It happened again today with the debug flags on:

F 2019-11-13T12:40:23.450999975Z {"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}} 
E 2019-11-13T12:40:23.450999975Z 2019-11-13T12:40:23.451Z | subchannel | 172.217.203.95:443 IDLE -> TRANSIENT_FAILURE 

@npomfret npomfret changed the title grpc metadata warning in pubsub grpc metadata warning in pubsub & "Failed to connect before the deadline" errors being seen Nov 13, 2019
@bcoe
Contributor

bcoe commented Nov 13, 2019

@callmehiphop can you speak to the above ☝️? What situations can lead to this? (I believe you were helping a user with a similar issue last week.)

@callmehiphop
Contributor

@bcoe this is not the same issue; it looks like the one we were having with gRPC previously. That specific error occurs when a timeout is hit while waiting for the gRPC channel to be put into a ready state.
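In other words, the mechanism is roughly the following (a simplified sketch, not the actual pubsub or grpc-js source; grpcClient, timeoutMs, and subscription are placeholder names): the stream waits for the channel to become READY, and if the deadline passes first you get exactly that error.

// simplified sketch, not the real implementation
const deadline = Date.now() + timeoutMs; // the connect timeout
grpcClient.waitForReady(deadline, err => {
  if (err) {
    // err.message === 'Failed to connect before the deadline'
    subscription.emit('error', err);
  }
});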

@bcoe
Contributor

bcoe commented Nov 14, 2019

@callmehiphop @alexander-fenster I was wondering if there's a parameter that can be tuned to increase the connect timeout.

@npomfret would you be able to provide the detailed debug output when this connection error occurs?

@callmehiphop
Contributor

@bcoe there is, but the default value is 5 minutes, so I'm not sure increasing the timeout here is the right solution.
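For anyone who wants to experiment anyway, I believe the value is exposed through the subscription's streaming options (treat this as an assumption on my part; the 300000 ms default matches the 5 minutes above, and the numbers here are purely illustrative):

const {PubSub} = require('@google-cloud/pubsub');

const pubsub = new PubSub();
const subscription = pubsub.subscription('my-subscription', {
  streamingOptions: {
    timeout: 10 * 60 * 1000, // illustrative: raise the connect deadline from the default 5 minutes
  },
});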

@npomfret
Author

would you be able to provide the detailed debug output when this connection error occurs?

I have (above). I turned on GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG as instructed. The only output I saw in Stackdriver around the time the error occurred is:

E 2019-11-13T12:40:23.450999975Z 2019-11-13T12:40:23.451Z | subchannel | 172.217.203.95:443 IDLE -> TRANSIENT_FAILURE

All other logging at that time came from my app.
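For anyone else enabling these flags: they're plain environment variables read by @grpc/grpc-js, so exporting them in the shell before starting the process works; I believe setting them programmatically before the library is first loaded does too (untested sketch):

// must run before @grpc/grpc-js is first loaded
process.env.GRPC_TRACE = 'all';
process.env.GRPC_VERBOSITY = 'DEBUG';
const {PubSub} = require('@google-cloud/pubsub');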

@bcoe
Contributor

bcoe commented Nov 15, 2019

@npomfret it sounds like your logs have given @murgatroid99 a hunch as to what's happening. If you're open to it, we can try floating a patch to you early next week?

CC: @alexander-fenster, with whom I've been working on a process to facilitate releasing candidate versions to folks.

@npomfret
Author

@bcoe happy to help

@npomfret
Author

npomfret commented Nov 22, 2019

I've seen 4 of these in the past 24h:

{"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}}

I didn't see any logging of interest before or after this time (the TRANSIENT_FAILURE I reported above is not present). But each of them occurred shortly after (within a second or two of) this log entry:

{"name":"my-foo-service","time":"Fri Nov 22 2019 01:18:31 GMT+0000 (Coordinated Universal Time)","result":"pong","id":1574359707215,"v":0,"pid":20360,"error":null,"msg":"","hostname":"frontend-group-vxs9","level":20}

@flunderpero

Every time we see this error, the subscription stops working and no new messages are consumed until we restart the pod.
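As a stopgap we're looking at recycling the subscription from the error handler instead of restarting the whole pod, something like this (untested sketch; whether close()/open() is enough to recover here is an assumption on our part):

subscription.on('error', async err => {
  console.error('subscription error, recycling the stream', err);
  await subscription.close(); // tear down the broken streams
  subscription.open();        // start pulling messages again
});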

@bcoe
Contributor

bcoe commented Nov 22, 2019

@npomfret we've released @google-cloud/pubsub@next (1.1.6-beta.0), which pulls in @grpc/grpc-js@0.6.12; it includes a bug fix related to the error you reported.

If you have a chance, try it out, and we'll release more widely after the American holidays.

@npomfret
Author

I'll do a release today using @google-cloud/pubsub@next and run it for a few days

@npomfret
Author

npomfret commented Nov 24, 2019

I've had two of these in the past 24h with the new code:

{"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}}

My app is still running and receiving messages.

I accidentally turned off GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG, but will re-enable them today.

@npomfret
Author

npomfret commented Dec 2, 2019

[update] I've had zero errors since the 24th November.

@bcoe
Contributor

bcoe commented Dec 2, 2019

👍 @npomfret that's great to hear. I'll close as soon as we've ensured that all of our libraries are on @grpc/grpc-js@0.6.12.

@npomfret
Author

npomfret commented Dec 11, 2019

Sadly, I spoke too soon. I upgraded from "@google-cloud/pubsub": "^1.1.6-beta.0" to "@google-cloud/pubsub": "^1.1.6" on the 6th of December, and according to my logs the errors came back that same day. At the same time I also upgraded @google-cloud/firestore and @google-cloud/logging to the latest versions, so perhaps the problem lies in one of them? Currently I'm running with:

"@google-cloud/firestore": "^2.6.1",
"@google-cloud/logging": "6.0.0",
"@google-cloud/logging-bunyan": "^2.0.2",
"@google-cloud/pubsub": "^1.1.6",

So: no errors from the 24th of November to the 6th of December (on 1.1.6-beta.0), then the upgrade, and they came back.

There seem to be two types of error:

{"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}}}

and

{"ackIds":["blah..."],"code":4,"details":"Deadline exceeded","metadata":{"internalRepr":{},"options":{}}}

The first error, 'Failed to connect before the deadline' (the more common one), was preceded by the following in Stackdriver:

2019-12-08T06:46:15.341Z | subchannel | 172.217.193.95:443 IDLE -> TRANSIENT_FAILURE

The second error, 'Deadline exceeded', was preceded by the following in Stackdriver:

E 2019-12-08T17:50:14.042999982Z 2019-12-08T17:50:14.043Z | subchannel | [2607:f8b0:400c:c0f::5f]:443 IDLE -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:14.042999982Z 2019-12-08T17:50:14.043Z | subchannel | [2607:f8b0:400c:c0f::5f]:443 IDLE -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:14.042999982Z 2019-12-08T17:50:14.043Z | subchannel | [2607:f8b0:400c:c03::5f]:443 IDLE -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:14.042999982Z 2019-12-08T17:50:14.043Z | subchannel | [2607:f8b0:400c:c03::5f]:443 IDLE -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:13.961999893Z 2019-12-08T17:50:13.962Z | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> READY 
E 2019-12-08T17:50:13.961999893Z 2019-12-08T17:50:13.962Z | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> READY 
E 2019-12-08T17:50:13.961999893Z 2019-12-08T17:50:13.962Z | pick_first | CONNECTING -> READY 
E 2019-12-08T17:50:13.961999893Z 2019-12-08T17:50:13.962Z | pick_first | Pick subchannel with address 172.217.203.95:443 
E 2019-12-08T17:50:13.961999893Z 2019-12-08T17:50:13.962Z | subchannel | 172.217.203.95:443 CONNECTING -> READY 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | pick_first | CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | subchannel | [2607:f8b0:400c:c0f::5f]:443 CONNECTING -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953999996Z 2019-12-08T17:50:13.954Z | pick_first | CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | subchannel | [2607:f8b0:400c:c0f::5f]:443 IDLE -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | pick_first | CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | subchannel | 172.217.203.95:443 IDLE -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | connectivity_state | pubsub.googleapis.com:443 IDLE -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | resolving_load_balancer | pubsub.googleapis.com:443 IDLE -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | pick_first | IDLE -> CONNECTING 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | pick_first | Start connecting to subchannel with address 172.217.203.95:443 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | pick_first | Connect to address list 172.217.203.95:443,[2607:f8b0:400c:c0f::5f]:443 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> IDLE 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> IDLE 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | pick_first | IDLE -> IDLE 
E 2019-12-08T17:50:13.953000068Z 2019-12-08T17:50:13.953Z | dns_resolver | Resolved addresses for target pubsub.googleapis.com:443: 172.217.203.95:443,[2607:f8b0:400c:c0f::5f]:443 
E 2019-12-08T17:50:13.948999881Z 2019-12-08T17:50:13.949Z | connectivity_state | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.948999881Z 2019-12-08T17:50:13.949Z | resolving_load_balancer | pubsub.googleapis.com:443 CONNECTING -> CONNECTING 
E 2019-12-08T17:50:13.948999881Z 2019-12-08T17:50:13.949Z | connectivity_state | pubsub.googleapis.com:443 IDLE -> CONNECTING 
E 2019-12-08T17:50:13.948999881Z 2019-12-08T17:50:13.949Z | resolving_load_balancer | pubsub.googleapis.com:443 IDLE -> CONNECTING 
E 2019-12-08T17:50:13.948999881Z 2019-12-08T17:50:13.949Z | dns_resolver | Resolution update requested for target pubsub.googleapis.com:443 
E 2019-12-08T17:50:13.634999990Z 2019-12-08T17:50:13.635Z | connectivity_state | pubsub.googleapis.com:443 READY -> IDLE 
E 2019-12-08T17:50:13.634999990Z 2019-12-08T17:50:13.635Z | resolving_load_balancer | pubsub.googleapis.com:443 READY -> IDLE 
E 2019-12-08T17:50:13.634000062Z 2019-12-08T17:50:13.634Z | pick_first | READY -> IDLE 
E 2019-12-08T17:50:13.634000062Z 2019-12-08T17:50:13.634Z | subchannel | 173.194.218.95:443 READY -> IDLE 
E 2019-12-08T17:50:13.608999967Z 2019-12-08T17:50:13.609Z | subchannel | [2607:f8b0:400c:c0f::5f]:443 CONNECTING -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:13.608000040Z 2019-12-08T17:50:13.608Z | subchannel | [2607:f8b0:400c:c0f::5f]:443 TRANSIENT_FAILURE -> CONNECTING 
E 2019-12-08T17:50:13.506999969Z 2019-12-08T17:50:13.507Z | subchannel | [2607:f8b0:400c:c15::5f]:443 CONNECTING -> TRANSIENT_FAILURE 
E 2019-12-08T17:50:13.506000041Z 2019-12-08T17:50:13.506Z | subchannel | [2607:f8b0:400c:c15::5f]:443 TRANSIENT_FAILURE -> CONNECTING 

... all within about 1 second before the error.

@bcoe
Contributor

bcoe commented Dec 12, 2019

@murgatroid99 ☝️ does anything jump out at you in these logs? Is there anything additional that @npomfret can provide to help you debug?

@npomfret npomfret changed the title grpc metadata warning in pubsub & "Failed to connect before the deadline" errors being seen "Failed to connect before the deadline" errors being seen Jan 13, 2020
@google-cloud-label-sync google-cloud-label-sync bot added the api: pubsub label Jan 30, 2020
@npomfret
Author

npomfret commented Feb 4, 2020

I'm still getting these with the latest release, although it has to be said they do appear to be decreasing in frequency (2 in the past 4 days, whereas before it was more like 4 per day on my system).

Using...

"@google-cloud/firestore": "^3.4.1",
"@google-cloud/logging": "7.1.0",
"@google-cloud/logging-bunyan": "^2.0.3",
"@google-cloud/pubsub": "^1.4.1",

@npomfret
Author

I may be speaking too soon, but it looks like the latest release may have fixed this issue.

Recent releases seem to have steadily reduced the number of errors, but until recently I was getting one or two errors per day. I've had zero for five days now. The upgrade in question took my deps from:

"@google-cloud/firestore": "^3.4.1",
"@google-cloud/logging": "7.1.0",
"@google-cloud/logging-bunyan": "^2.0.3",
"@google-cloud/pubsub": "^1.4.1",

to:

"@google-cloud/firestore": "^3.5.0",
"@google-cloud/logging": "7.1.0",
"@google-cloud/logging-bunyan": "^2.0.3",
"@google-cloud/pubsub": "^1.5.0",

@npomfret
Author

Closing this as no errors have occurred in my system for 9 days now.

@ZHamburglar

{"code":4,"details":"Failed to connect before the deadline","metadata":{"internalRepr":{},"options":{}},"level":"error","message":"GCP subscription errorFailed to connect to channel. Reason: Failed to connect before the deadline","stack":"Error: Failed to connect to channel. Reason: Failed to connect before the deadline\n at MessageStream._waitForClientReady (/kw/node_modules/@google-cloud/pubsub/build/src/message-stream.js:318:19)\n at process._tickCallback (internal/process/next_tick.js:68:7)"}

Just wanted to jump in and say that I was having this issue previously as well. In my case it arose from the project ID used by the gRPC subscriber client not matching the project ID in the subscription path. Once the two were set to the same project, the issue cleared up. In case anyone Googling this issue finds it, hopefully this can help them.
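Something along these lines (the project and subscription names are made up): the point is that the projectId given to the client matches the project in the subscription path.

const {PubSub} = require('@google-cloud/pubsub');

// the project configured on the client...
const pubsub = new PubSub({projectId: 'my-project'});

// ...must match the project segment of the subscription path
const subscription = pubsub.subscription(
  'projects/my-project/subscriptions/my-subscription'
);

subscription.on('message', message => message.ack());
subscription.on('error', err => console.error(err));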
