Stream: Removed #892

Closed
crivera opened this issue Feb 20, 2020 · 30 comments

@crivera

crivera commented Feb 20, 2020

Environment details

  • OS: node slim via docker running in GKE
  • Node.js version: 13.3.0
  • npm version: 6.3
  • @google-cloud/pubsub version: 1.5.0
  • grpc version: 1.24.2
  • google gax version: 1.14.1

Steps to reproduce

Randomly we just got an error from our connection to pubsub

message: "Error: Stream removed
at MessageStream._onEnd (/app/node_modules/@google-cloud/pubsub/build/src/message-stream.js:234:26)
at ClientDuplexStream.<anonymous> (/app/node_modules/@google-cloud/pubsub/build/src/message-stream.js:274:43)
at Object.onceWrapper (events.js:411:28)
at ClientDuplexStream.emit (events.js:317:22)
at endReadableNT (_stream_readable.js:1215:12)
at processTicksAndRejections (internal/process/task_queues.js:84:21)"

this also happens locally with the simulator

please help

@product-auto-label product-auto-label bot added the api: pubsub Issues related to the googleapis/nodejs-pubsub API. label Feb 20, 2020
@yoshi-automation yoshi-automation added the triage me I really want to be triaged. label Feb 21, 2020
@feywind feywind added priority: p2 Moderately-important priority. Fix may not be included in next release. type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. and removed triage me I really want to be triaged. labels Feb 25, 2020
@crivera
Author

crivera commented Feb 26, 2020

Not sure if this helps, but locally, when using the simulator, I see:

09:58:46:160 ERROR Received RST_STREAM with error code 8 (node_modules/@taxfyle/backend-commons/lib/connections/pubsub.js:197)
→ Error: Received RST_STREAM with error code 8
→ at MessageStream._onEnd (/node_modules/@google-cloud/pubsub/src/message-stream.ts:299:20)
→ at ClientDuplexStream.<anonymous> (/node_modules/@google-cloud/pubsub/src/message-stream.ts:342:37)
→ at Object.onceWrapper (events.js:308:28)
→ at ClientDuplexStream.emit (events.js:224:7)
→ at endReadableNT (_stream_readable.js:1206:12)
→ at processTicksAndRejections (internal/process/task_queues.js:84:21)

@crivera
Author

crivera commented Feb 27, 2020

this has been happening now nightly... any update?

@feywind
Collaborator

feywind commented Feb 27, 2020

@crivera Sorry for the silence - I've been watching another issue to see if it bore fruit. Can you try the instructions in this comment and see if it helps? #890 (comment) We've been tracking some issues related to the streams reconnecting on subscriptions.

@crivera
Author

crivera commented Feb 27, 2020

I tried switching back over to grpc-js:latest, and now I get the same issue as in #890: my worker gets killed without any information.

@feywind
Collaborator

feywind commented Feb 28, 2020

The same issue in that there's a spike in CPU and then it's killed after a while without any log output?

Can you paste in some more details about how you're initializing the PubSub object, any changed parameters, etc...? I haven't seen the exact problem you're describing above, but maybe we can make a repro that way.

@crivera
Author

crivera commented Feb 28, 2020

Yes, we see a little spike in CPU (but that could also be from restarting), and yes, there is no log output. It seems it's blocking the whole Node thread, since the "health" endpoint becomes unavailable and k8s restarts the pod.
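
One way to check whether the event loop really is blocked, as the unresponsive health endpoint suggests, is to sample event-loop delay from inside the process. The following is a minimal sketch, not part of the Pub/Sub library: `watchEventLoop`, its option names, and the thresholds are hypothetical, and it relies on `perf_hooks.monitorEventLoopDelay`, which is available in Node 11.10 and later.

```javascript
const { monitorEventLoopDelay } = require('perf_hooks');

// Hypothetical helper: samples event-loop delay and reports when the
// largest delay seen in a window exceeds thresholdMs.
function watchEventLoop({ thresholdMs = 200, windowMs = 5000, onStall = console.warn } = {}) {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();

  const timer = setInterval(() => {
    const maxMs = histogram.max / 1e6; // histogram values are in nanoseconds
    if (maxMs > thresholdMs) {
      onStall(`event loop stalled for up to ${maxMs.toFixed(0)} ms`);
    }
    histogram.reset(); // start a fresh window
  }, windowMs);
  timer.unref(); // do not keep the process alive just for monitoring

  // Returns a function that stops the monitoring.
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Because the check itself runs on the event loop, a stall is only reported once the loop becomes responsive again; if the pod is killed first, nothing is logged.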

We do a very standard project setup:

const pubSub = new PubSub({
    projectId: env.GCLOUD_PROJECT
})

"@google-cloud/pubsub": "^1.5.0",
"@grpc/grpc-js": "^0.6.18",
"google-gax": "^1.14.1",

running on node 13.

Nothing has changed

The problem goes away when we go back to what we ran before the upgrade, pubsub v1.1.14.

@airburst

airburst commented Mar 11, 2020

I am seeing the same issue. It started around 5 days ago and now my alerting service, which subscribes to a couple of pubsub topics, dies within 24 hours with the error below:

error: uncaughtException: Stream removed
Error: Stream removed
at g._onEnd (/usr/app/dist/index.js:341:2283)
at ClientDuplexStream.<anonymous> (/usr/app/dist/index.js:341:2535)
at Object.onceWrapper (events.js:288:20)
at ClientDuplexStream.emit (events.js:205:15)
at endReadableNT (_stream_readable.js:1137:12)
at processTicksAndRejections (internal/process/task_queues.js:84:9) {"error":{"code":2,"details":"Stream removed","metadata":{"_internal_repr":{},"flags":0}},"stack":"Error: Stream removed\n at g._onEnd (/usr/app/dist/index.js:341:2283)\n at ClientDuplexStream. (/usr/app/dist/index.js:341:2535)\n at Object.onceWrapper (events.js:288:20)\n at ClientDuplexStream.emit (events.js:205:15)\n at endReadableNT (_stream_readable.js:1137:12)\n at processTicksAndRejections (internal/process/task_queues.js:84:9)","exception":true,"date":"Wed Mar 11 2020 05:10:20 GMT+0000 (Coordinated Universal Time)","process":{"pid":1,"uid":0,"gid":0,"cwd":"/usr/app","execPath":"/usr/bin/node","version":"v12.3.1","argv":["/usr/bin/node","/usr/app/dist/index.js"],"memoryUsage":{"rss":105455616,"heapTotal":36888576,"heapUsed":27265872,"external":2971343}},"os":{"loadavg":[0.04296875,0.0205078125,0.0009765625],"uptime":28389148},"trace":[{"column":2283,"file":"/usr/app/dist/index.js","function":"g._onEnd","line":341,"method":"_onEnd","native":false},{"column":2535,"file":"/usr/app/dist/index.js","function":null,"line":341,"method":null,"native":false},{"column":20,"file":"events.js","function":"Object.onceWrapper","line":288,"method":"onceWrapper","native":false},{"column":15,"file":"events.js","function":"ClientDuplexStream.emit","line":205,"method":"emit","native":false},{"column":12,"file":"_stream_readable.js","function":"endReadableNT","line":1137,"method":null,"native":false},{"column":9,"file":"internal/process/task_queues.js","function":"processTicksAndRejections","line":84,"method":null,"native":false}],"timestamp":"2020-03-11T05:10:20.993Z"}

I am using @google-cloud/pubsub at v1.6.0 and following #825 have included the standalone grpc lib. I then instantiate the connection like:

import { PubSub } from '@google-cloud/pubsub';
import grpc from 'grpc';

...
class CloudPubSub {
  constructor(projectId) {
    this.projectId = projectId;
    // this.pubsub = new PubSub();
    this.pubsub = new PubSub({ grpc }); // Suggested fix for grpc bug
  }
 ...

The actual error is thrown inside the @google-cloud/pubsub message-stream.js file (_onEnd function).

I have tried to add subscription.on("end", () => {}) handlers, but this event does not appear to bubble up to the subscriber.

Not sure how and where to catch it; I would like to re-subscribe on end.
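
The `uncaughtException` in the log above is consistent with standard Node.js EventEmitter behaviour: an 'error' event emitted with no 'error' listener attached is thrown instead. The sketch below illustrates this with a plain `EventEmitter` standing in for the subscription. It assumes the subscription object behaves like an EventEmitter for 'error' events; `emitStreamRemoved` is a hypothetical helper, and the error shape (`code: 2`, message "Stream removed") is copied from the log.

```javascript
const { EventEmitter } = require('events');

// Stand-in for a Pub/Sub subscription; the real object is assumed to
// behave like a Node.js EventEmitter for the 'error' event.
function emitStreamRemoved(emitter) {
  const err = Object.assign(new Error('Stream removed'), { code: 2 });
  emitter.emit('error', err);
}

// Without an 'error' listener, emit() throws the error itself.
const unguarded = new EventEmitter();
let thrown = null;
try {
  emitStreamRemoved(unguarded);
} catch (err) {
  thrown = err;
}

// With a listener, the same error is delivered to the handler instead.
const guarded = new EventEmitter();
let handled = null;
guarded.on('error', err => { handled = err; });
emitStreamRemoved(guarded);

console.log(thrown.message); // Stream removed
console.log(handled.code);   // 2
```

If the assumption holds, the place to catch the failure is an 'error' listener on the subscription rather than an 'end' listener.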

@Gusten

Gusten commented Mar 12, 2020

We've started to receive this in our production environment as well and it completely halts the services from receiving messages which is quite the annoyance.
We are using @google-cloud/pubsub 1.6.0 and we are also using the solution in #825 for injecting grpc manually.

We received this at the exact same time across multiple topics and subscriptions and server instances running a GKE cluster.

@jperasmus

jperasmus commented Mar 17, 2020

@airburst

We're seeing the same error. For now, we catch the error by listening to the error event on the subscription and then just re-subscribe everything. It seems to work for a bit until the error comes back.

Simplified example:

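const setupSubscription = () => {
  // pubsub and subscriptionName are assumed to be defined elsewhere
  const subscription = pubsub.subscription(subscriptionName);

  subscription.on('message', messageHandler);
  // On the first error, detach the handler and re-subscribe from scratch
  subscription.once('error', error => {
    subscription.off('message', messageHandler);
    setupSubscription();
  });
};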

Edit: we're also using the older C++ grpc client and not the default grpc-js version since we had problems with that.
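
A variant of the re-subscribe approach above that waits between attempts, so that a persistent failure does not turn into a tight reconnect loop. This is a sketch under stated assumptions, not library behaviour: `subscribeWithRetry`, `createSubscription`, and the option names are hypothetical, and the only thing assumed about the subscription is that it emits 'message' and 'error' events like an EventEmitter.

```javascript
// Hypothetical helper. createSubscription is any function that returns an
// EventEmitter-like object, e.g. () => pubsub.subscription(name).
function subscribeWithRetry(createSubscription, onMessage, options = {}) {
  const { baseDelayMs = 1000, maxDelayMs = 60000, schedule = setTimeout } = options;
  let attempt = 0;

  const connect = () => {
    const subscription = createSubscription();

    const handleMessage = message => {
      attempt = 0; // a delivered message means the stream is healthy again
      onMessage(message);
    };

    subscription.on('message', handleMessage);
    subscription.once('error', () => {
      subscription.removeListener('message', handleMessage);
      // Exponential backoff: base, 2x base, 4x base, ... capped at maxDelayMs
      const delayMs = Math.min(maxDelayMs, baseDelayMs * 2 ** attempt);
      attempt += 1;
      schedule(connect, delayMs);
    });
  };

  connect();
}
```

The `schedule` option exists only so the timing can be replaced in tests; by default it is `setTimeout`. As in the simplified example, a second 'error' emitted by an already-abandoned subscription would have no listener.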

@airburst

Thanks @jperasmus

I took the less valiant approach and moved the container onto a k8s cluster. Now the entire container will restart - and resubscribe - if it fails. Luckily it hasn't experienced the dropout since I moved host; quite possibly because everything is on GCP.

@jperasmus

Okay, cool. Glad it is mitigated for you. We're actually using Google App Engine Flexible Environment and still getting the issue, so being on GCP does not seem to prevent it.

@airburst

I actually plan to change the architecture. This is not the first time that I have had issues with a long subscription in a micro service.

I plan to take the subscription out, leaving an API. Then write a GCF to trigger on the topic and post messages to the API.

@carmelid

I plan to take the subscription out, leaving an API. Then write a GCF to trigger on the topic and post messages to the API.

Oh my... I was hoping to avoid that workaround.

We're manually injecting grpc@1.24.2 into @google-cloud/pubsub@1.6.0 and are also experiencing the issues described above. I hadn't seen the issue before bumping from @google-cloud/pubsub@1.1.5 on March 13th, but I don't know whether that is a coincidence.

@rvillane

rvillane commented Apr 1, 2020

My environment is exactly the same as the one reported by @carmelid, and I can confirm that we are also experiencing the issue described here.

@bduclaux

I get the same "Stream removed" error while using a connection between a C++ app inside a GKE pod and a Bigtable backend, using the Google Cloud C++ stack (which uses the underlying gRPC stack):

E0413 20:30:28.996674196       6 ssl_transport_security.cc:483] Corruption detected.
E0413 20:30:28.996708332       6 ssl_transport_security.cc:459] error:14187180:SSL routines:ssl_do_config:bad value
E0413 20:30:28.996714577       6 secure_endpoint.cc:208]     Decryption error: TSI_DATA_CORRUPTED

The error happens quite frequently (every hour). Using GKE 1.16.8-gke.8 and gRPC v1.26.0.
The "Stream removed" error is definitely not specific to NodeJS / pubsub.

@feywind
Collaborator

feywind commented Apr 13, 2020

Sorry this is still causing a problem for you all. Let me check with our gRPC guru, especially if it's happening across multiple libraries. @murgatroid99, any thoughts?

@murgatroid99

My recommendation is to stop using the C++-based grpc library. I believe that the previously referenced grpc-js issues have been fixed.

@rvillane

rvillane commented Apr 15, 2020

@murgatroid99 On recent @google-cloud/pubsub issues the advice was exactly the opposite: I actually had to start using the C++ gRPC binding in order to stabilize my production environment, which heavily depends on Google Pub/Sub. What is the level of confidence that grpc-js is now stable? Can it be considered production ready?

To me it is a bit worrisome that gax-nodejs is no longer pinning its grpc-js dependency:

https://github.com/googleapis/gax-nodejs/blob/master/package.json#L17

@murgatroid99

When there are bugs in grpc-js, switching to the C++-based grpc library can be an effective way to work around the bug, but that does not mean that doing so is necessary or valuable in the long term. I believe grpc-js is sufficiently stable, but as this issue demonstrates, even an older and more tested library like grpc has bugs occasionally.

Regarding the pinning, that was previously done in response to some bugs in grpc-js, but it makes it more difficult to get bug fixes out to users.

@jperasmus

What is more concerning is that both the PubSub and Firebase SDK libraries are using the GRPC JS version by default while the package's own readme says that it is still incomplete and experimental.

grpc-js

@murgatroid99

@jperasmus I have some good news for you: we published grpc-js version 1.0 today and that version is no longer considered "incomplete" or "experimental". The client libraries should be able to pick that version up soon.

@jperasmus

That is excellent news, thanks @murgatroid99

@feywind
Collaborator

feywind commented Apr 21, 2020

At this point we're basically held up by some versioning gotchas with the other libraries and some GCP services. :S I'm working on a way to work around that right now.

@feywind
Collaborator

feywind commented May 5, 2020

This versioning issue should be sorted out next week when pubsub 2.0 is expected to be released.

@feywind
Collaborator

feywind commented May 19, 2020

The Pub/Sub library 2.0 is still not released yet, due to a confluence of factors, but I'm crossing my fingers that we can do it this week.

@feywind
Collaborator

feywind commented May 21, 2020

@google-cloud/pubsub 2.0.0 is now released, so if you're on Node 10+, it might be worth trying that to see if the newer versions of gRPC/gax help anything.

@feywind
Collaborator

feywind commented Jul 21, 2020

The newer grpc-js has been pulled in for a while now, and this issue has gone quiet. Is anyone here still having the issue?

@AlexWellsHS

@feywind We just implemented the pubsub client recently and are running into this error:
Failed to "modifyAckDeadline" for 1 message(s). Reason: 13 INTERNAL: Received RST_STREAM with code 2 (Internal server error).
I came across this issue while looking for answers about what is happening.

@mnahkies

mnahkies commented Aug 6, 2020

@feywind we're seeing this intermittently as well

Version: @google-cloud/pubsub@v2.1.0

Error null.<anonymous>(@google-cloud.pubsub.build.src:message-queues)
error: Failed to "modifyAckDeadline" for 2 message(s). Reason: 13 INTERNAL: Received RST_STREAM with code 2

Occurred this morning at 2020-08-06T08:07:52.189844Z. One thing that might be interesting to you is that another independent service running in the same k8s cluster had the same error (different topics/subscriptions) at 2020-08-06T08:07:30.226722Z.
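
For anyone deciding which of these errors to retry or merely log, the failures reported in this thread carry gRPC status code 2 (UNKNOWN, "Stream removed") or 13 (INTERNAL, "Received RST_STREAM ..."). The following is a minimal sketch of a classifier: `isTransientStreamError` is a hypothetical helper, treating these codes as transient is a policy assumption rather than documented library behaviour, and code 14 (UNAVAILABLE) is included as an assumption since it does not appear in the thread.

```javascript
// Standard gRPC status code numbers. 2 and 13 appear in this thread;
// 14 is added here as an assumption.
const GRPC_STATUS = { UNKNOWN: 2, INTERNAL: 13, UNAVAILABLE: 14 };

// Hypothetical helper: returns true for errors that look like the
// stream-level failures reported in this thread.
function isTransientStreamError(err) {
  if (!err) return false;
  const transientCodes = [GRPC_STATUS.UNKNOWN, GRPC_STATUS.INTERNAL, GRPC_STATUS.UNAVAILABLE];
  if (transientCodes.includes(err.code)) return true;
  // Fall back to the message text when no numeric code is attached.
  return /Stream removed|RST_STREAM/.test(String(err.message || ''));
}

console.log(isTransientStreamError({ code: 2, details: 'Stream removed' })); // true
console.log(isTransientStreamError({ code: 5, message: 'not found' }));      // false
```

Matching on the numeric code first keeps the check independent of message wording, which differs between the C++-based grpc and grpc-js reports above.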

@yoshi-automation yoshi-automation added the 🚨 This issue needs some love. label Aug 18, 2020
@feywind
Collaborator

feywind commented Sep 3, 2020

I could've sworn I responded to this... There is now a 2.5.0 for @google-cloud/pubsub that pulls in a newer gax and grpc-js, and some others have had luck with those versions. Not a whole lot has changed between 2.1.0 and that one, so you might give that a try?
