ECONNRESET errors when idling #318

Open
mrapczynski opened this Issue Nov 3, 2015 · 26 comments

@mrapczynski

I'm trying to figure out the source of the error below, or how to catch it properly and mitigate it. I am using LDAPJS as part of an identity management pipeline to move user accounts from our ERP into Active Directory (connected over TLS with a self-signed cert). There are often idle periods in the database where no events are being dispatched, and LDAPJS raises these exceptions when, I assume, it has nothing to do.

12:20:54 bsis-0 Error: read ECONNRESET
    at exports._errnoException (util.js:812:11)
    at TLSWrap.onread (net.js:542:26)
2015-11-03 12:20:54: App name:bsis id:0 exited with code 1
12:20:54 PM2 App name:bsis id:0 exited with code 1
2015-11-03 12:21:54: Starting execution sequence in -fork mode- for app name:bsis id:0
12:21:54 PM2 Starting execution sequence in -fork mode- for app name:bsis id:0
2015-11-03 12:21:54: App name:bsis id:0 online
12:21:54 PM2 App name:bsis id:0 online
12:21:54 bsis-0 WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
12:21:54 bsis-0 WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
12:37:54 bsis-0 Error: read ECONNRESET
    at exports._errnoException (util.js:812:11)
    at TLSWrap.onread (net.js:542:26)
2015-11-03 12:37:54: App name:bsis id:0 exited with code 1
12:37:54 PM2 App name:bsis id:0 exited with code 1
2015-11-03 12:38:54: Starting execution sequence in -fork mode- for app name:bsis id:0
12:38:54 PM2 Starting execution sequence in -fork mode- for app name:bsis id:0
2015-11-03 12:38:54: App name:bsis id:0 online
12:38:54 PM2 App name:bsis id:0 online
12:38:54 bsis-0 WARNING: NODE_APP_INSTANCE value of '0' did not match any instance config file names.
12:38:54 bsis-0 WARNING: See https://github.com/lorenwest/node-config/wiki/Strict-Mode
12:54:55 bsis-0 Error: read ECONNRESET
    at exports._errnoException (util.js:812:11)
    at TLSWrap.onread (net.js:542:26)
2015-11-03 12:54:55: App name:bsis id:0 exited with code 1
12:54:55 PM2 App name:bsis id:0 exited with code 1

They only show up because of my process.on('uncaughtException') handler, which in turn logs the event and then auto-restarts the broker.

Ideas?

@mrapczynski

mrapczynski Nov 3, 2015

One other detail - don't ask me why I thought to do this - but since I continue to see these reset errors in the log, I quickly compared the amount of time between each event and it looks to be approx. 17 minutes:

1:29PM - 1:12 = 17
12:54PM - 12:37 = 17
12:37PM - 12:21 = 16

Maybe this is a setting somewhere?
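
The arithmetic can be checked against the PM2 log in the first post. The sketch below is only a worked calculation over the timestamps copied from that log; the variable and function names are made up for illustration. It computes the gap between consecutive errors and, separately, how long each restarted process lived before the next reset.

```javascript
// Timestamps copied from the PM2 log in the original report (HH:MM:SS, same day).
const errors = ['12:20:54', '12:37:54', '12:54:55'];
const starts = ['12:21:54', '12:38:54'];

// Convert HH:MM:SS into seconds since midnight.
function toSeconds(hhmmss) {
  const [h, m, s] = hhmmss.split(':').map(Number);
  return h * 3600 + m * 60 + s;
}

// Gap between consecutive ECONNRESET errors, in seconds.
const errorGaps = errors.slice(1).map((t, i) => toSeconds(t) - toSeconds(errors[i]));

// Lifetime of each restarted process: from "online" to the next error.
const lifetimes = starts.map((t, i) => toSeconds(errors[i + 1]) - toSeconds(t));

console.log('minutes between errors:', errorGaps.map((s) => (s / 60).toFixed(1)));
console.log('minutes from restart to reset:', lifetimes.map((s) => (s / 60).toFixed(1)));
```

Because PM2 waits one minute before restarting the app, the errors are about 17 minutes apart while each connection survives for about 16 minutes, which is consistent with a fixed timeout rather than random failures.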

@pfmooney

pfmooney Nov 4, 2015

Member

It's quite possible that the LDAP server you're connecting to will time out idle clients. Subscribing to the error event on the client will allow you to receive these events and take appropriate action.
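
To illustrate why the error listener matters: the ldapjs client is a Node EventEmitter, and an EventEmitter that emits 'error' with no listener attached throws the error instead. The sketch below uses a plain EventEmitter as a stand-in for the client, so it runs without an LDAP server; emitReset is a hypothetical helper that only simulates the reset.

```javascript
const { EventEmitter } = require('events');

// Stand-in for an ldapjs client; this helper simulates the server resetting the socket.
function emitReset(client) {
  const err = new Error('read ECONNRESET');
  err.code = 'ECONNRESET';
  client.emit('error', err);
}

// Without an 'error' listener, emit() throws the error itself.
const unguarded = new EventEmitter();
let thrown = null;
try {
  emitReset(unguarded);
} catch (err) {
  thrown = err; // in a real app this surfaces as an uncaught exception
}

// With a listener, the error is delivered to the handler and nothing is thrown.
const guarded = new EventEmitter();
const seen = [];
guarded.on('error', (err) => seen.push(err.code));
emitReset(guarded);

console.log(thrown.message, seen);
```

In the original report nothing was listening on the client, which would explain why the reset only showed up in the process-wide uncaughtException handler.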

@pfmooney pfmooney added the question label Nov 4, 2015

@mrapczynski

mrapczynski Nov 4, 2015

@pfmooney OK I'll check into that. In the background, does LDAPJS pick up these errors and try to reconnect, or does it bail completely until I restart my Node process?

@pfmooney

pfmooney Nov 4, 2015

Member

While ldapjs has client options to automatically reconnect in the face of socket errors, they have not yet been documented. Alternatively, you can perform such actions manually by instantiating a new connection when your client encounters an error.
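
A minimal sketch of the manual approach, under stated assumptions: keepConnected and createClient are hypothetical names, and createClient is a factory you supply (with ldapjs it would typically wrap ldapjs.createClient(options) and whatever bind you need). The demo at the bottom uses a fake EventEmitter client so the logic can be run without a server.

```javascript
const { EventEmitter } = require('events');

// Replace the client with a fresh one whenever the current client reports an error.
function keepConnected(createClient) {
  let current = null;

  function connect() {
    const client = createClient();
    client.on('error', () => {
      if (client !== current) return; // ignore errors from clients already replaced
      if (typeof client.destroy === 'function') client.destroy();
      connect();
    });
    current = client;
  }

  connect();
  return () => current; // always ask for the client instead of caching it
}

// Demo with a fake client (an assumption for illustration, not ldapjs itself).
let created = 0;
const fakeFactory = () => {
  const c = new EventEmitter();
  c.id = ++created;
  c.destroyed = false;
  c.destroy = () => { c.destroyed = true; };
  return c;
};

const getClient = keepConnected(fakeFactory);
const first = getClient();
first.emit('error', new Error('read ECONNRESET'));
const second = getClient();

console.log(first.destroyed, second.id);
```

A production version would want a delay or backoff before reconnecting, since a server that is down would otherwise be retried in a tight loop.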

@tapmodo

tapmodo Nov 13, 2015

As noted above, the documentation for the Client API doesn't mention auto-reconnecting or how to handle an eventual ECONNRESET from the server. I can't imagine an LDAP server that never closes or resets the connection, or never goes down. It appears to be the option reconnect: true?

The documentation also does not mention how to close a connection, which is necessary if you are writing a small script that should gracefully shut down after doing its work. I looked in the source and it appears this can be done with client.destroy().

Both of these points would be useful to mention in the documentation. Thanks!

@Megatronic79 Megatronic79 referenced this issue in RocketChat/Rocket.Chat Nov 30, 2015

Closed

LDAP authentication with active directory [$10] #1491

@randomsock

randomsock Dec 18, 2015

👍

Thanks very much for doing the digging @tapmodo. reconnect: true does indeed auto-reconnect after this failure. This strikes me as a really important resilience feature, so it is a bit surprising it's not documented. You do still need the client.on('error', ...) handler to stop the disconnection from bringing the process down first, as @pfmooney pointed out.

So an over-simplified solution looks like:

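var ldapjs = require('ldapjs');

var options = {
    url: 'ldap://myad.foo.net:3268',
    reconnect: true  // undocumented option: reconnect after a socket error
};
var client = ldapjs.createClient(options);
// Without an 'error' listener the ECONNRESET is thrown and takes the process down
client.on('error', function(err) {
    console.warn('LDAP connection failed, but fear not, it will reconnect OK', err);
});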

Job done - thanks guys!

@dustinsmith1024

dustinsmith1024 Dec 30, 2015

This doesn't seem to reconnect properly for me. Anyone else having issues? The program doesn't exit but it just hangs on any attempt to re-authenticate.

@dustinsmith1024 dustinsmith1024 referenced this issue in auth0/passport-windowsauth Dec 30, 2015

Merged

Updates ldapjs to 1.0.0 #30

@mjoyce91

mjoyce91 Jan 4, 2016

@dustinsmith1024 - I've been having ECONNRESET errors as well. I've tried adding the reconnect: true option, and calling client.destroy() one minute after every query. This only started happening after I migrated from an AD on an Azure server to one on an AWS server, so I think it has something to do with what @pfmooney mentioned - the server is timing out idle connections. I just can't get the error to be handled anywhere.

@saigop

saigop Jan 4, 2016

Where do I add the reconnect: true entry and client.destroy()?

Thanks in advance

@saigop

saigop Jan 4, 2016

I have changed the LDAP idle timeout in AD, post that it looks fine, will keep monitoring.

thanks.

@mjoyce91

mjoyce91 Jan 5, 2016

@saigop - I believe you put reconnect : true in your options object that you pass into ldap.createClient. I moved my client.unbind() and client.destroy() into my resp.on handlers and I think that did the trick.

CKarper added a commit to CKarper/promised-ldap that referenced this issue Feb 3, 2016

Wrap call to destroy
As per joyent/node-ldapjs#318 , sometimes you will need to call destroy to completely close a client connection so you don't idle error out after ~17 minutes.

@CKarper CKarper referenced this issue in stewartml/promised-ldap Feb 3, 2016

Merged

Wrap call to destroy #2

@the1mills

the1mills Feb 9, 2016

@saigop "I have changed the LDAP idle timeout in AD, post that it looks fine, will keep monitoring. " what is AD??

@mjoyce91

mjoyce91 Feb 10, 2016

@the1mills Active Directory

@ma-zal

ma-zal Feb 15, 2016

Without using TLS, same issue.

Error: read ECONNRESET
    at errnoException (net.js:905:11)
    at TCP.onread (net.js:559:19)

@jontowles

jontowles May 22, 2016

When I add the reconnect I get a:

Err:OperationsError: 000004DC: LdapErr: DSID-0C090748, comment: In order to perform this operation a successful bind must be completed on the connection

I think some insight into how we need to gracefully handle this in NodeJS would be helpful. Realistically, we need to ask: do we close the connection, do we let it close on its own, or should we be destroying the connection at intervals?
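
One possible way to avoid the "successful bind must be completed" error after a reconnect is to bind again every time a connection is established, since a new socket starts out unauthenticated. The sketch below rests on two assumptions that should be verified against the ldapjs version in use: that the client emits a 'connect' event after each (re)connect, and that it exposes bind(dn, password, callback). rebindOnConnect, the DN, and the password are hypothetical, and the demo runs against a fake client.

```javascript
const { EventEmitter } = require('events');

// Re-issue the bind every time the client reports a new connection.
// Assumes a 'connect' event per (re)connect and bind(dn, password, callback).
function rebindOnConnect(client, dn, password, onResult) {
  client.on('connect', () => {
    client.bind(dn, password, (err) => onResult(err || null));
  });
}

// Demo with a fake client that records bind calls (illustration only).
const fake = new EventEmitter();
const binds = [];
fake.bind = (dn, password, cb) => {
  binds.push(dn);
  cb(null);
};

const results = [];
rebindOnConnect(fake, 'cn=svc,dc=example,dc=com', 'secret', (err) => results.push(err));

fake.emit('connect'); // initial connection
fake.emit('connect'); // reconnect after a reset

console.log(binds.length, results);
```

Operations issued between the reconnect and the completed bind can still fail, so callers may need to wait for the bind callback before sending searches.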

@jontowles

jontowles May 24, 2016

I've been looking into this more deeply and I'd love to see some responses.

We added reconnect: true along with bind/unbind as we were seeing basically an LDAP break around 15m after starting the service.

We still continue to see everyone's favorite error: [2016-05-23 22:04:44.068] [ERROR] Server - Exception ->Error: read ECONNRESETerrStack: Error: read ECONNRESET
at exports._errnoException (util.js:870:11)
at TLSWrap.onread (net.js:552:26)

It's much more sporadic now, and the reconnect and unbind ensure that we no longer see any real issues, because the client automatically reconnects.

A packet capture shows us that even though we did an unbind, we see the RST from our LDAP server, which leads to an exchange of SYN and ACK packets. Once our NodeJS server and the LDAP server exchange two sets of ACK packets, another RST ensues, basically saying "send me something! no, you send me something! okay, we'll close now".

At this point, this just looks like noise to me as everything works fine outside of that.

@mrapczynski

mrapczynski May 24, 2016

I recently experimented with setting up my own LDAP connection pool using the npm package pool2, and so far it's going very well. After weeks of testing, we rolled the update to production with no incidents.

Basically I have the pool give me connections and then automatically unbind them if they age out thus completely avoiding the idle timeout issue that I experience with Active Directory.

I have never found that reconnect: true worked, and so prior to this I would just process.exit(0) if I got an idle timeout forcing my app worker to re-generate and pick up a fresh new connection.
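
The age-out idea can be sketched without a pool library. The code below is not the pool2 API; it is a single-client simplification in which recyclingClient, createClient, and the injected clock are all hypothetical names. With ldapjs, createClient would wrap ldapjs.createClient(...) plus a bind, and the fake client here only records unbind calls.

```javascript
// Hands out one cached client and replaces it once it is older than maxAgeMs.
function recyclingClient(createClient, maxAgeMs, now = Date.now) {
  let client = null;
  let bornAt = 0;

  return function getClient() {
    const t = now();
    if (client && t - bornAt >= maxAgeMs) {
      client.unbind(); // close the old connection before the server resets it
      client = null;
    }
    if (!client) {
      client = createClient();
      bornAt = t;
    }
    return client;
  };
}

// Demo with a fake client and a manual clock (illustration only).
let clock = 0;
let made = 0;
const unbound = [];
const getClient = recyclingClient(
  () => {
    const id = ++made;
    return { id, unbind: () => unbound.push(id) };
  },
  10 * 60 * 1000, // retire after 10 minutes, below the roughly 17 minute interval reported earlier
  () => clock
);

const a = getClient();
clock += 5 * 60 * 1000;
const b = getClient(); // 5 minutes old: same client
clock += 6 * 60 * 1000;
const c = getClient(); // 11 minutes old: replaced

console.log(a === b, c.id, unbound);
```

The maximum age only needs to stay below whatever idle limit the directory server enforces; the 10 minutes used here is an arbitrary choice for the demo.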

@jontowles

jontowles May 24, 2016

The thing that I learned is that it's not an idle session timeout. For some reason LDAPJS keeps trying to connect to LDAP even when no attempt to bind is occurring.

@Jackyjjc

Jackyjjc Jun 14, 2016

Any update on this issue? We are using the ldapauth-fork library, which builds on top of ldapjs. It is suffering from the same issue due to this bug in ldapjs; here is the discussion: vesse/node-ldapauth-fork#23

@PaulBernier

PaulBernier Jul 8, 2016

+1 What's the proper way to close a client? Proper way to reconnect? Those sound like essential points that need to be documented please. Thanks.

@blandman

+3

@Nepoxx

Nepoxx Dec 20, 2016

This issue was opened a year ago, has this changed?

edit:

My current workaround is simply this:

const ldapClient = ldapjs.createClient({
  url: config.get('ldap.url'),
  reconnect: true
})

ldapClient.on('error', err => {
  logger.error(err.message)
})

Once in a while I'll get a read ECONNRESET error, but the client is still usable and seems to re-create connections as they are needed.

@banzy

banzy May 26, 2017

I've solved the problem by moving the client definition out of the main Node loop: I build the client only when it's needed and destroy it after the call.

app.post('/item', function(req, res) {
    // Create a client for this request only
    const ldapClient = ldapjs.createClient({
        url: 'ldap://ldap.server.com:3880',
        reconnect: false
    });
    ...
    res.on('end', function(result) {
        console.log('status: ' + result.status);
        // Destroy the client once the call has finished
        ldapClient.destroy();
    });
});
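
The same create-use-destroy pattern can be wrapped in a small helper so the client is destroyed on every path, including failures. This is a sketch under assumptions: withClient and createClient are hypothetical names, createClient would wrap ldapjs.createClient(...) in real use, and the demo uses a fake client that only records the destroy call.

```javascript
// Runs work(client, callback) with a fresh client and destroys it exactly once,
// whether the work succeeds, reports an error, or throws synchronously.
function withClient(createClient, work, done) {
  const client = createClient();
  let finished = false;

  function finish(err, result) {
    if (finished) return;
    finished = true;
    client.destroy();
    done(err || null, result);
  }

  try {
    work(client, finish);
  } catch (err) {
    if (finished) throw err; // the error came from done(), not from the work itself
    finish(err);
  }
}

// Demo with a fake client (illustration only).
const log = [];
const fakeFactory = () => ({ destroy: () => log.push('destroyed') });

withClient(
  fakeFactory,
  (client, cb) => cb(null, 'found 1 entry'),
  (err, result) => log.push(result)
);

withClient(
  fakeFactory,
  () => { throw new Error('search failed'); },
  (err) => log.push(err.message)
);

console.log(log);
```

Creating a connection per request costs a TCP (and possibly TLS) handshake each time, so this trades some latency for never holding an idle connection long enough to be reset.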

@ORESoftware

ORESoftware Jun 9, 2017

For some reason

client.destroy()

is not documented, as far as I can tell.

Should we call unbind() before destroy()?

client.unbind(function(){
   client.destroy();
});

@dpatte

dpatte Jan 23, 2018

Seeing this:
Error: read ECONNRESET
at errnoException (util.js:873:11)
at TCP.onread (net.js:557:26)

The timing seems random. I'm on 1.4

@ms007

ms007 Jun 13, 2018

destroy calls unbind internally
