Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}] #1907

Open
Arun-KumarH opened this issue Sep 19, 2021 · 33 comments
Open

ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}] #1907

Arun-KumarH opened this issue Sep 19, 2021 · 33 comments

Comments

@Arun-KumarH
Copy link

Arun-KumarH commented Sep 19, 2021

Problem description

We experience intermittent connection reset errors migrating to grpc-js version "1.3.2". We are creating the Grpc client instance every time when message is processed.
On the client side we see the stack trace ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}] and no log messages on the server side although GRPC_TRACE=all and GRPC_VERBOSITY=DEBUG on server side as well.
We have attached the debug logs seen on the client side. This issue is seen in our production k8s environment and it is intermittent.

Environment

  • Node version 16.9.0
  • Package name and version @grpc/grpc-js: 1.3.2 and @grpc/proto-loader: 0.6.2

Additional context

Client Logging:

2021-09-19T16:53:33.253Z | call_stream | [79] write() called with message of length 34
2021-09-19T16:53:33.255Z | call_stream | [79] end() called
2021-09-19T16:53:33.258Z | call_stream | [79] deferring writing data chunk of length 39
2021-09-19T16:53:33.263Z | dns_resolver | Resolved addresses for target dns:x-des:50051: [x.x.x.x:50051]
2021-09-19T16:53:33.264Z | pick_first | Connect to address list x.x.x.x:50051
2021-09-19T16:53:33.264Z | subchannel_refcount | x.x.x.x:50051 refcount 4 -> 5
2021-09-19T16:53:33.264Z | pick_first | Pick subchannel with address x.x.x.x:50051
2021-09-19T16:53:33.264Z | pick_first | IDLE -> READY
2021-09-19T16:53:33.264Z | resolving_load_balancer | dns:x-des:50051 CONNECTING -> READY
2021-09-19T16:53:33.264Z | channel | callRefTimer.unref | configSelectionQueue.length=1 pickQueue.length=0
2021-09-19T16:53:33.265Z | connectivity_state | dns:x-des:50051 CONNECTING -> READY
2021-09-19T16:53:33.265Z | subchannel_refcount | x.x.x.x:50051 refcount 5 -> 6
2021-09-19T16:53:33.265Z | subchannel_refcount | x.x.x.x:50051 refcount 6 -> 5
2021-09-19T16:53:33.265Z | channel | Pick result: COMPLETE subchannel: x.x.x.x:50051 status: undefined undefined
2021-09-19T16:53:33.266Z | call_stream | Starting stream on subchannel x.x.x.x:50051 with headers
                grpc-timeout: 99946m
                grpc-accept-encoding: identity,deflate,gzip
                accept-encoding: identity
                :authority: x-des:50051
                user-agent: grpc-node-js/1.3.7
                content-type: application/grpc
                :method: POST
                :path: /io.xingular.device.Service/Read
                te: trailers

2021-09-19T16:53:33.266Z | call_stream | [79] attachHttp2Stream from subchannel x.x.x.x:50051
2021-09-19T16:53:33.266Z | subchannel_refcount | x.x.x.x:50051 callRefcount 0 -> 1
2021-09-19T16:53:33.266Z | call_stream | [79] sending data chunk of length 39 (deferred)
2021-09-19T16:53:33.266Z | call_stream | [79] calling end() on HTTP/2 stream
2021-09-19T16:53:33.273Z | call_stream | [79] Node error event: message=read ECONNRESET code=ECONNRESET errno=Unknown system error -104 syscall=read
2021-09-19T16:53:33.273Z | subchannel | x.x.x.x:50051 connection closed with error read ECONNRESET
2021-09-19T16:53:33.274Z | subchannel | x.x.x.x:50051 connection closed
2021-09-19T16:53:33.274Z | subchannel | x.x.x.x:50051 READY -> IDLE
2021-09-19T16:53:33.274Z | subchannel_refcount | x.x.x.x:50051 refcount 5 -> 4
2021-09-19T16:53:33.274Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.275Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | subchannel_refcount | x.x.x.x:50051 refcount 4 -> 3
2021-09-19T16:53:33.275Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.275Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.275Z | subchannel_refcount | x.x.x.x:50051 refcount 3 -> 2
2021-09-19T16:53:33.276Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.276Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.276Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.276Z | subchannel_refcount | x.x.x.x:50051 refcount 2 -> 1
2021-09-19T16:53:33.276Z | pick_first | READY -> IDLE
2021-09-19T16:53:33.276Z | resolving_load_balancer | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.276Z | connectivity_state | dns:x-des:50051 READY -> IDLE
2021-09-19T16:53:33.277Z | call_stream | [79] HTTP/2 stream closed with code 2
2021-09-19T16:53:33.277Z | call_stream | [79] ended with status: code=14 details="read ECONNRESET"
2021-09-19T16:53:33.278Z | subchannel_refcount | x.x.x.x:50051 callRefcount 1 -> 0
^[[31merror^[[39m: 2021-09-19T16:53:33.278Z: Error serving unary request 14 UNAVAILABLE: read ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}]
@murgatroid99
Copy link
Member

Sometimes connections get dropped for reasons outside of the library's control. Just retry the call when that happens. The client will reconnect when you do.

In addition, you say "We are creating the Grpc client instance every time when message is processed." We recommend not doing that. You can just use one client for all of your requests. Using multiple clients isn't even accomplishing anything for you right now. They're all using the same underlying connection anyway.

@gillsoftab
Copy link

Try to downgrade to Node version 16.8.0 and see if that helps. We tested Node version 16.9.0 and 16.9.1 with no luck. Downgrading to Node version 16.8.0 works without any problems.

@vanthome
Copy link

@gillsoftab what exact issues did you have with recent node versions?

@gillsoftab
Copy link

gillsoftab commented Sep 21, 2021

Node version 16.9.0 and 16.9.1 same as described in nodejs/node#39683
we also detected issues with an app that is using template (node add-on in c++).

@Jaypov
Copy link

Jaypov commented Sep 24, 2021

Sometimes connections get dropped for reasons outside of the library's control. Just retry the call when that happens. The client will reconnect when you do.

In addition, you say "We are creating the Grpc client instance every time when message is processed." We recommend not doing that. You can just use one client for all of your requests. Using multiple clients isn't even accomplishing anything for you right now. They're all using the same underlying connection anyway.

I am having the same issue as this, i am in the process of downgrading my docker containers to use node 16.8.0 as suggested. However it is probably a good idea to have retires.

could anyone let me know if there is a way to do this inside @grpc/grpc-js config options. looking a GRPC .net library i can see there is a MaxAttempts = 5 param. does anyone know if there is something similar in @grpc/grpc-js library?

all i can see in this library is "grpc.initial_reconnect_backoff_ms" under channelOptions so my assumption would be it does retries by default, however it does not seem like this is the case

@murgatroid99
Copy link
Member

all i can see in this library is "grpc.initial_reconnect_backoff_ms" under channelOptions so my assumption would be it does retries by default, however it does not seem like this is the case

Connections are separate things for requests. This is one of the options that controls how the channel re-establishes connections after they are dropped. The @grpc/grpc-js library does not currently have any built-in support for doing requests automatically, so you will need to do that yourself. For example, you could use an interceptor, as described in the interceptor spec's examples section.

@Jaypov
Copy link

Jaypov commented Sep 26, 2021

Thanks for the info and the link. Just an FYI to the rest of thee thread, downgrading node to 16.8.0 did not resolve the ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}] issue for me. It is still happening sporadically. What i am finding odd is when developing locally, i never get this error (i develop locally with with node 16.8.0). When i deploy my services with docker and docker swarm, this is when i get the issue. I have 9 services and i only get this issue with 2 of them, i am yet to see it happen with any of the others

i am using@grpc/grpc-js: 1.3.7

@Jaypov
Copy link

Jaypov commented Sep 27, 2021

I am using node 16.8.0 and i have downgrade too

    "@grpc/grpc-js": "1.2.1",
    "@grpc/proto-loader": "0.5.1",

and now the ECONNRESET [{"message":"14 UNAVAILABLE: read ECONNRESET"}] has seemed to have disappear for me.

@murgatroid99
Copy link
Member

One of the 1.3.x releases changed how ECONNRESET errors were reported. In 1.2.x, they had the error code INTERNAL instead of UNAVAILABLE. Are you getting any INTERNAL errors with your new setup?

@Jaypov
Copy link

Jaypov commented Sep 28, 2021

Ah damn, yes i am, it has just happend sporadically again. do you think it could be an issue with NestJs (the framework i use). It uses grpc-js under the hood. https://docs.nestjs.com/microservices/grpc

I manage the implement a retry interceptor as shown HERE However it hangs. The if statement condition will resolve to true and retry will be called, but then it will just hang. no response from onReceiveMessage or onReceiveStatus

          if (response.code !== status.OK) {
            console.log('CONNECTION FAILED: retry... ' + retries);
            retry(savedSendMessage, savedMetadata);
          } else {
            console.log('ALL GOOD');
            savedMessageNext(savedReceiveMessage);
            next(response);
          } 

At this point, there seems to be no quick fix solution. not sure if it is a library issue or a much deeper issue.

@murgatroid99
Copy link
Member

ECONNRESET is a relatively low-level network error. It's unlikely that it's caused by the framework.

You might want to try this version of the retry interceptor, which was actually tested. Just remove the code that references the registry, because that's there for the test.

@Jaypov
Copy link

Jaypov commented Sep 28, 2021

That makes sense, thanks i'll give it a try. Thanks for all the help, appreciate it, been pulling my hair out.

@Jaypov
Copy link

Jaypov commented Sep 29, 2021

Hey, so i have been poking around a lot today and whilst i do not have an answer i have some findings.

I use docker-compose to bring up all the microservice inside docker containers. When I run these containers on my local machine all the microservices work fine (never get [{"message":"14 UNAVAILABLE: read ECONNRESET"}])`

However when running them on my development network (AWS EC2 instance with all the containers on the same 1 VM) I do sporadically get the [{"message":"14 UNAVAILABLE: read ECONNRESET"}])`. It seems that I get the error randomly, then when i make a call again it will work multiple times. Then if i wait some time I will get the error again.

I enabled grpc-js debugging on the development network and i noticed this SSL error 150] Node error event: message=read ECONNRESET code=ECONNRESET errno=Unknown system error -104 syscall=read

This lead me to ssh'ing into the docker containers on my development network. and using this command line tool to try and connect to one microservice from another. https://github.com/fullstorydev/grpcurl

I found that if i send a message using -plaintext it successfully connects to the microservice every single time. If I use TLS I get first record does not look like a TLS handshake (which is an error from the command line tool).

just out of curiosity do we know if grpc-js has some sort of fallback for if TLS fails?

@murgatroid99
Copy link
Member

just out of curiosity do we know if grpc-js has some sort of fallback for if TLS fails?

That's not a thing, in general. If you instruct the client to connect over TLS, it will connect over TLS, or not all.

If the problem is that the response you're getting is not a TLS handshake, are you sure that you are connecting to a TLS server? The port that is serving the plaintext service can't also be serving TLS; are you sure you are connecting to the correct TLS server port when trying to make a request using TLS?

@Jaypov
Copy link

Jaypov commented Sep 30, 2021

I am unsure, the instantiation of the grpc-js client happens under the hood in the nest-js framework. ill dig into the code and take a look.

on another note, i think this issue may possibly be cause by docker swarm, at this point i am just going through a process of elimination. I have booted a new EC2 instance and installed all of my containers with just docker-compose and the error has disappeared. I am just in the process of booting up another instance, installing with docker compose and then putting a load balancer in front of the 2 instances (this is my current setup but without swarm) to see if the issues comes back.

@murgatroid99
Copy link
Member

The questions I asked are not even necessarily about grpc-js. You said

This lead me to ssh'ing into the docker containers on my development network. and using this command line tool to try and connect to one microservice from another. https://github.com/fullstorydev/grpcurl

I found that if i send a message using -plaintext it successfully connects to the microservice every single time. If I use TLS I get first record does not look like a TLS handshake (which is an error from the command line tool).

What ports did you connect to for these two tests? On the server you connected to, what port is serving in plaintext and what port is serving in TLS?

@brunor2
Copy link

brunor2 commented Oct 27, 2021

Hey @Jaypov, I just want to know if you were able to fix it with that change in docker.

Thanks!

@Siddhesh-Swami
Copy link

Siddhesh-Swami commented Nov 21, 2021

We also face the same issue. Sparingly getting the "message":"14 UNAVAILABLE: read ECONNRESET".

some observations by me :
Client Env:
Nodejs version:14.16.1
@grpc/grpc-js: 1.3.8 and @grpc/proto-loader: 0.5.6

in some clients:
Nodejs version:14.16.1
@grpc/grpc-js: 1.4.4 and @grpc/proto-loader: 0.5.6

Server Env:
Nodejs version:14.16.1
@grpc/grpc-js: 1.4.4 and @grpc/proto-loader: 0.5.6

experience the "message":"14 UNAVAILABLE: read ECONNRESET". issue sparingly.


But in one microservice client:
the client we use nestjs :
Nodejs version:14.16.1
+-- @grpc/proto-loader@0.5.6
+-- grpc@1.24.6

Server:
Nodejs version:14.16.1
@grpc/grpc-js: 1.4.4 and @grpc/proto-loader: 0.5.6

never seen read ECONNRESET once in the logs.


@Siddhesh-Swami
Copy link

migrated all nodejs services to:
@grpc/grpc-js: 1.2.4 and @grpc/proto-loader: 0.5.6
but still face this issue

@murgatroid99
Copy link
Member

To new commenters, see my first response on this issue:

Sometimes connections get dropped for reasons outside of the library's control. Just retry the call when that happens. The client will reconnect when you do.

@tomasholicky
Copy link

tomasholicky commented Dec 8, 2021

We have same problem.

@murgatroid99 Do you have plan to implement interceptor (or something different) in next releases to grpc-js?

@murgatroid99
Copy link
Member

The client interceptor API has been available in grpc-js since before the 1.0 release.

@Siddhesh-Swami
Copy link

Hello, opened a new issue for read ECONNRESET. #1994

@Arun-KumarH
Copy link
Author

#2115 we are using the retry interceptor and it works for Unary requests, but for client streaming after the first chunk is sent the connection is closed and we receive back grpc response.
We followed this version of interceptor

@Arun-KumarH
Copy link
Author

#2115 This is a pressing issue for us, any updates how we can use the retry interceptor for streaming use cases ?

@bastienlemaitre
Copy link

Still no answer ?

1 similar comment
@railsonluna
Copy link

Still no answer ?

@felipe-huszar
Copy link

felipe-huszar commented Sep 28, 2022

Still no answer?
We are also having this issue, its quite frustrating because we cant replicate it

@rochecompaan
Copy link

I'm running into the same issue with streaming requests, but I have additional info that might help.
I'm specifically using the retry interceptor which closely matches the one from the tests in implementation.

I see the following traceback for streaming requests:

Trace: Request path: /object_attributes.object_attributes_grpc/ListAssortmentObjectAttributes; response status code: 0; not eligible for retry.
    at Object.onReceiveStatus (/workspace/atlas/common/lib/client/interceptors.ts:135:41)
    at InterceptingListenerImpl.onReceiveStatus (/workspace/atlas/node_modules/@grpc/grpc-js/src/call-stream.ts:206:19)
    at Object.onReceiveStatus (/workspace/atlas/node_modules/@grpc/grpc-js/src/client-interceptors.ts:424:48)
    at /workspace/atlas/node_modules/@grpc/grpc-js/src/call-stream.ts:334:24
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
savedMessageNext: undefined
TypeError: savedMessageNext is not a function
    at Object.onReceiveStatus (/workspace/atlas/common/lib/client/interceptors.ts:141:33)
    at InterceptingListenerImpl.onReceiveStatus (/workspace/atlas/node_modules/@grpc/grpc-js/src/call-stream.ts:206:19)
    at Object.onReceiveStatus (/workspace/atlas/node_modules/@grpc/grpc-js/src/client-interceptors.ts:424:48)
    at /workspace/atlas/node_modules/@grpc/grpc-js/src/call-stream.ts:334:24
    at processTicksAndRejections (node:internal/process/task_queues:78:11)
[ERROR] 10:16:28 TypeError: savedMessageNext is not a function

Looking at InterceptingListenerImpl, it seems that a next method is not passed to nextListener which would result in savedMessageNext being undefined.

I had to link to the 1.7 branch above. I was surprised to see call-stream.ts is not on master anymore, and then I noticed work on the retry implementation by @murgatroid99. I would gladly try this out on our stack once it is ready and thanks for the great work!

@HofmannZ
Copy link

HofmannZ commented Jan 23, 2023

This works for us:

const channelOptions: ChannelOptions = {
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

✌️

@ANANDD18
Copy link

This is my output when I try to install ESP32 library by Espressif systems. PLEASE HELP!


Downloading packages
esp32:xtensa-esp32-elf-gcc@esp-2021r2-patch5-8.4.0
esp32:xtensa-esp32s2-elf-gcc@esp-2021r2-patch5-8.4.0
esp32:xtensa-esp32s3-elf-gcc@esp-2021r2-patch5-8.4.0
esp32:riscv32-esp-elf-gcc@esp-2021r2-patch5-8.4.0
esp32:openocd-esp32@v0.11.0-esp32-20221026
esp32:esptool_py@4.5.1
esp32:mkspiffs@0.2.3
esp32:mklittlefs@3.0.0-gnu12-dc7f933
esp32:esp32@2.0.7
Installing esp32:xtensa-esp32-elf-gcc@esp-2021r2-patch5-8.4.0
Configuring tool.
esp32:xtensa-esp32-elf-gcc@esp-2021r2-patch5-8.4.0 installed
Installing esp32:xtensa-esp32s2-elf-gcc@esp-2021r2-patch5-8.4.0
Configuring tool.
esp32:xtensa-esp32s2-elf-gcc@esp-2021r2-patch5-8.4.0 installed
Installing esp32:xtensa-esp32s3-elf-gcc@esp-2021r2-patch5-8.4.0
Configuring tool.
esp32:xtensa-esp32s3-elf-gcc@esp-2021r2-patch5-8.4.0 installed
Installing esp32:riscv32-esp-elf-gcc@esp-2021r2-patch5-8.4.0
Configuring tool.
esp32:riscv32-esp-elf-gcc@esp-2021r2-patch5-8.4.0 installed
Installing esp32:openocd-esp32@v0.11.0-esp32-20221026
Configuring tool.
esp32:openocd-esp32@v0.11.0-esp32-20221026 installed
Installing esp32:esptool_py@4.5.1
Configuring tool.
esp32:esptool_py@4.5.1 installed
Installing esp32:mkspiffs@0.2.3
Configuring tool.
esp32:mkspiffs@0.2.3 installed
Installing esp32:mklittlefs@3.0.0-gnu12-dc7f933
Configuring tool.
esp32:mklittlefs@3.0.0-gnu12-dc7f933 installed
Installing platform esp32:esp32@2.0.7
Failed to install platform: esp32:esp32.
Error: 14 UNAVAILABLE: read ECONNRESET


@nixjke
Copy link

nixjke commented Jun 15, 2023

I set the url parameters to localhost:3000
And it works for me

export const grpcClientOptions: ClientOptions = {
  transport: Transport.GRPC,
  options: {
    package: 'hero',
    protoPath: join(__dirname, './hero/hero.proto'),
    url: 'localhost:3000',
  },
}

image

@oddcitizenape
Copy link

oddcitizenape commented May 2, 2024

This works for us:

const channelOptions: ChannelOptions = {
  // Send keepalive pings every 10 seconds, default is 2 hours.
  'grpc.keepalive_time_ms': 10 * 1000,
  // Keepalive ping timeout after 5 seconds, default is 20 seconds.
  'grpc.keepalive_timeout_ms': 5 * 1000,
  // Allow keepalive pings when there are no gRPC calls.
  'grpc.keepalive_permit_without_calls': 1,
};

✌️

apply this configuration changed the error meesage I receive to: Error: 14 UNAVAILABLE: Connection dropped, but didn't resolve it. I'm getting it on a client which is streaming values from server after approx. 25 min. Client is implemented in grpc-js, server is implemneted in golang

Client:
"@grpc/grpc-js": "^1.10.1",
"@grpc/proto-loader": "^0.7.10",
nodejs version: 20.5.0

Server:
go 1.21.1
google.golang.org/grpc v1.58.1
google.golang.org/protobuf v1.31.0

UPDATE: after settings the server keepalive policy and settings, everything seems to work fine:

kaPolicy := grpc.KeepaliveEnforcementPolicy(keepalive.EnforcementPolicy{
		MinTime:             10 * time.Second,
		PermitWithoutStream: true, // Allow ping without stream
})

kaParams := grpc.KeepaliveParams(keepalive.ServerParameters{
		Time: 10 * time.Second, // The default value is 2 hours.
		// After having pinged for keepalive check, the server waits for a duration
		// of Timeout and if no activity is seen even after that the connection is
		// closed.
		Timeout: 5 * time.Second, // The default value is 20 seconds.
})

UPDATE2: nope, didn't help: Error: 14 UNAVAILABLE: Connection dropped

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests