Cleanup sockets when creating new connections #46

Merged: 16 commits into kenperkins:master on Apr 29, 2016

Conversation

dmiddlecamp
Contributor

We noticed a bug where many thousands of simultaneous connections can be leaked. This patch closes those sockets and also prevents uncatchable exceptions from being thrown when many connection attempts are made in a short period of time.

@troy
Contributor

troy commented Oct 29, 2015

@kenperkins if we can help test and/or review this, I'm at your service

@kenperkins
Owner

@dmiddlecamp can you expand on what you're seeing? I'd like to better understand.

@dmiddlecamp
Contributor Author

Sure sure!

We found an installation running this module that had spawned many thousands of connections to Papertrail, which caused us to be rate limited. Since the TLS session is created independently of the TCP session, it's possible for the module to attempt hundreds (or thousands) of simultaneous connections while it's waiting on the TLS session to be established. These sockets aren't cleaned up, and the rate limiting wasn't enforced in this scenario. This patch guarantees that the module will never open more than one socket at any given moment, and will clean up the socket in the event that the TLS session doesn't succeed.

We were also seeing uncatchable crashes when many connection attempts were made quickly, so I've also added a fix for that and a unit test to confirm.

edit: When I say "an installation", I mean all copies of all our microservices in multiple cloud environments using this module. We had... a lot of connections open to Papertrail.

edit 2: I could certainly be wrong here if I misunderstood the cause, but some socket cleanup logic felt like a pretty safe, straightforward approach either way.

Thanks!
David
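
A minimal sketch of the approach described above, for readers following the thread: at most one socket is ever in flight, and a failed connection or TLS handshake is cleaned up with destroy() while an 'error' listener is attached so the failure cannot surface as an uncatchable exception. The names below (PapertrailLikeTransport, connect) are illustrative assumptions, not the PR's actual code.

var tls = require('tls');

// Illustrative sketch only; not the actual winston-papertrail implementation.
function PapertrailLikeTransport(host, port) {
  this.host = host;
  this.port = port;
  this.socket = null;      // the single live TLS socket, if any
  this.connecting = false; // true while a handshake is in flight
}

PapertrailLikeTransport.prototype.connect = function () {
  var self = this;

  // Guard: never start a second connection while one is open or in flight.
  if (self.socket || self.connecting) return;

  self.connecting = true;
  var socket = tls.connect({ host: self.host, port: self.port }, function () {
    self.connecting = false;
    self.socket = socket; // keep the socket only once TLS is established
  });

  // Without an 'error' listener, a failed connect or handshake becomes an
  // uncaught exception; destroy() tears the socket down without re-emitting.
  socket.on('error', function () {
    self.connecting = false;
    socket.destroy();
    self.socket = null;
  });
};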

// Reconstructed diff context (the try/if shape is inferred from the braces shown):
try {
  if (self.socket) {
    self.socket.destroy();
    self.socket = null;
  }
}
catch (e) { }
@kenperkins (Owner)

So we're just catching here and swallowing the error. I'm wondering if we should emit it?

@dmiddlecamp (Contributor Author)

So the destroy() here prevents the error from being thrown, as opposed to close() or end(). The try/catch does nothing to catch that socket error.

@dmiddlecamp (Contributor Author)

We could certainly emit an error event on the class object, and at least that would be catchable. But it wouldn't be the aforementioned uncatchable socket error from TLS.

@kenperkins (Owner)

That's what I was wondering; is there value in self.emit(e)? Not being super familiar with when socket.destroy() throws, I'm speculating.
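
For context on the tradeoff being discussed, here is a hypothetical sketch of the cleanup path with both options shown; SocketHolder and cleanupSocket are invented names, not the PR's code. Note that emit() takes an event name first, so the idea behind self.emit(e) would be written as self.emit('error', e), and an 'error' event with no listener attached is itself thrown by Node.

var util = require('util');
var EventEmitter = require('events').EventEmitter;

// Hypothetical illustration of the choice being debated; not the PR's code.
function SocketHolder() {
  EventEmitter.call(this);
  this.socket = null;
}
util.inherits(SocketHolder, EventEmitter);

SocketHolder.prototype.cleanupSocket = function () {
  try {
    if (this.socket) {
      this.socket.destroy();
      this.socket = null;
    }
  } catch (e) {
    // Option A (what the PR does): swallow; there is nothing left to recover.
    // Option B (floated above): surface it to any listener on the transport:
    //   this.emit('error', e);
    // Caveat: if nothing is listening for 'error', Node throws the error,
    // which would reintroduce the kind of crash this patch is trying to avoid.
  }
};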

@dmiddlecamp
Contributor Author

Also BTW, thank you for taking the time to review, much appreciated! :)

@kenperkins
Owner

Did we decide whether or not we should emit those two exceptions?

I've typically leaned toward emitting, so that if there's something weird going on, it bubbles up to a higher level where it might still be visible. I've learned that swallowing errors in your logging is generally bad :)

@dmiddlecamp
Contributor Author

I'm okay with emitting those if you feel strongly about it, but I generally only emit errors for things where something can or should be done. In a scorched-earth cleanup scenario, what recovery steps would we take out of band, beyond disposing of the sockets and re-establishing them?

@kenperkins
Owner

I think the only thing that could be done is logging something to the console for investigation purposes. And given that I don't even know what this will throw, let's not bother for now.

If we see evidence to the contrary, we can revisit.

@matteocontrini

I tend to agree with @dmiddlecamp; I wouldn't know what to do knowing that winston-papertrail had an internal error. Would I log that the logging library had an error? :P
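
For what an application can do with an emitted error: roughly following the usage shown in the winston-papertrail README of this era (winston 1.x API), an 'error' handler can be attached to the transport so that connection failures are logged locally, or deliberately ignored, instead of crashing the process. The host, port, and messages below are placeholders.

var winston = require('winston');
require('winston-papertrail').Papertrail;

var papertrail = new winston.transports.Papertrail({
  host: 'logs.papertrailapp.com', // placeholder destination
  port: 12345                     // placeholder port
});

// Handle (or deliberately ignore) transport errors locally so a Papertrail
// outage or rate limit never brings down the application itself.
papertrail.on('error', function (err) {
  console.error('papertrail transport error:', err.message);
});

var logger = new winston.Logger({
  transports: [new winston.transports.Console(), papertrail]
});

logger.info('this still reaches the console even if Papertrail is unreachable');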

@kenperkins
Owner

So be it :)

@dcollinsf5

Hi guys!
I believe this issue impacted our production app this past week during a Papertrail outage. Could you guys shed some light on this pull request? Does it resolve the many-connections issue, and if so, why isn't it merged?

@troy
Contributor

troy commented Apr 28, 2016

@dmiddlecamp could I trouble you to pull master and merge it? We're merging this and an unrelated PR into master, then letting folks use it for a few weeks before making a release.

@dmiddlecamp
Contributor Author

Sure thing, I'll take a look

@dmiddlecamp
Contributor Author

Looks like it was just a package.json conflict?

@troy troy self-assigned this Apr 29, 2016
@troy troy merged commit 1bc8933 into kenperkins:master Apr 29, 2016
@troy
Contributor

troy commented Apr 29, 2016

If you're reading this message, we'd love it if you could pull the current master of winston-papertrail and run it. We'll wait at least two weeks to see whether any issues come up from wider use of this and one other change. If you encounter problems with master that don't occur in the current release, please open a new GitHub Issue with the transcript and reference this PR.

@troy
Contributor

troy commented Aug 24, 2016

This is live on NPM in version 1.0.3.
