Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SocketClosedUnexpectedlyError while awaiting #initiator results in multiple connecting sockets / connections #2443

Closed
edkimmel opened this issue Mar 16, 2023 · 8 comments · Fixed by #2480
Labels

Comments

@edkimmel
Copy link

Description

Hello, we noticed in our logs that node-redis was able to get into a broken state during a ongoing networking issue causing socket closures. The last logs from some machines were the "Socket closed unexpectedly" error message, with no follow up reconnect event. We also noticed that some machines were firing multiple connecting / reconnecting events in parallel for the same client.

I believe there is an issue in error handling when a socket is closed during the #initiator promise in #connect.

  this.#socket = await this.#createSocket();
  this.#writableNeedDrain = false;
  this.emit('connect');

  try {
    await this.#initiator(); // <------ Socket closed here results in multiple #connect chains.
  } catch (err) {

On that line, the state of socket is:

  • isOpen - true
  • #socket - set as expected

Looking at createSocket:

.once('close', hadError => {
  if (!hadError && this.#isOpen && this.#socket === socket) {
    this.#onSocketError(new SocketClosedUnexpectedlyError());
  }
})
  • hasError can be falsey
  • isOpen is true - we were in the middle of #connect
  • #socket === socket is true, we had just set the instance variable

And finally looking at #onSocketError

    #onSocketError(err: Error): void {
        this.#isReady = false;
        this.emit('error', err);

        if (!this.#isOpen || typeof this.#shouldReconnect(0, err) !== 'number') return;
        
        this.emit('reconnecting');
        this.#connect().catch(() => {
            // the error was already emitted, silently ignore it
        });
    }
  • isOpen is true, we've established that
  • shouldReconnect will always be a number if no option is provided

So we see that the socket close event will cause this.#connect to be called again, even though we are in the middle of connecting to begin with.

A potential solution would be to check #isReady in onSocketError to know whether we are already in the process of connecting or not.

Node.js Version

lts/gallium

Redis Server Version

No response

Node Redis Version

4.3.1, but source taken from 4.6.5

Platform

linux

Logs

23:45:43.678 redis client error Socket closed unexpectedly
23:45:43.678 redis client is reconnecting
23:45:43.678 redis client is connecting
23:45:58.694 redis client error Socket closed unexpectedly
23:45:58.694 redis client error Socket closed unexpectedly
23:53:40.370 redis client error read ECONNRESET
23:53:40.370 redis client is reconnecting
23:53:40.370 redis client error read ECONNRESET
23:53:40.370 redis client is reconnecting
23:53:40.370 redis client is connecting
23:53:40.370 redis client is ready
23:53:41.371 redis client error Connection timeout
23:53:41.371 redis client is reconnecting
23:53:43.373 redis client error Connection timeout
23:53:45.375 redis client error Connection timeout
23:54:11.402 redis client error Socket closed unexpectedly
23:54:11.402 redis client is reconnecting
23:54:11.402 redis client is connecting
23:54:26.416 redis client error Socket closed unexpectedly
23:54:26.416 redis client is reconnecting
23:54:26.416 redis client error Socket closed unexpectedly
23:54:26.416 redis client is reconnecting
23:54:26.416 redis client is connecting
23:54:26.416 redis client is ready
23:54:26.416 redis client is connecting
23:54:26.416 redis client is ready
@edkimmel edkimmel added the Bug label Mar 16, 2023
@leibale
Copy link
Collaborator

leibale commented Mar 16, 2023

The best bug report I've seen in a long time! Great job! :)

I think that your solution (using #isReady in #onSocketError) should do the trick, I'll investigate more (and add a test) later. Will keep you posted.

@TrevorNodeJS
Copy link

@leibale Is there a plan for this fix?

@leibale
Copy link
Collaborator

leibale commented Apr 17, 2023

@TrevorNodeJS I'm on vacation, I'll fix it when I'm back...

leibale added a commit that referenced this issue Apr 26, 2023
* fix #2443 - fix multiple sockets on error in socket initiator

* handle `error` events in test
@niksy
Copy link

niksy commented May 19, 2023

@leibale is this something that is related to #2032 and #1993?

Because we’re on latest Redis client version (1.5.7) and we’re still getting SocketClosedUnexpectedlyError when Redis client is just idling or doesn’t receive any request.

@leibale
Copy link
Collaborator

leibale commented May 21, 2023

@niksy umm no.. the issue here is that an error in the client "initiator" (which is the function that runs between the "connect" and "ready" event and is in charge of setting all the connection level config like SELECT AUTH and so on) result in multiple open sockets for one client (which should never happen).

Regarding your situation - if the errors are only when the client is idling - usually it's the server that closes idle connections (I'm pretty sure that it's on by default on Azure and AWS). You can workaround that using the pingInterval feature (see here).

@niksy
Copy link

niksy commented May 22, 2023

Oh, okay, maybe that could help. Do you know if node-redis v3 had some built-in pinging ability with specified value?

@leibale
Copy link
Collaborator

leibale commented May 23, 2023

@niksy No, but you can always use setInterval...

setInterval(() => {
  client.ping(err => {
    if (err) {
      // ...
    }
  });
}, 1000 * 60 * 4);

@niksy
Copy link

niksy commented Jun 2, 2023

Regarding your situation - if the errors are only when the client is idling - usually it's the server that closes idle connections (I'm pretty sure that it's on by default on Azure and AWS). You can workaround that using the pingInterval feature (see here).

For anyone looking for solution to this error message, set pingInterval to a value which is lower than the value of idle connection error, e.g. if the error appears every 10 minutes when idling, set to 9 minutes and 50 seconds. This way connection will always be ready and error wont’t show.

More information: https://learn.microsoft.com/en-us/azure/azure-cache-for-redis/cache-best-practices-connection#idle-timeout

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants