_stream_wrap: prevent use after free in TLS #1910

Closed
indutny wants to merge 12 commits

Member

@indutny indutny commented Jun 6, 2015

Queued write requests should be invoked on handle close; otherwise the
"consumer" might already be destroyed when the write callbacks of the
"consumed" handle are invoked.

Fix: #1696
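
To illustrate the idea in the description, here is a minimal sketch, not the patch itself, of a wrapper that remembers its outstanding write requests and completes them when it is closed, so that no callback fires after the owner is gone. `PendingWrites`, `add`, `finish`, and `closeAll` are hypothetical names, the status values are illustrative, and a `Set` is used only for brevity.

```javascript
// Hypothetical helper: tracks write requests that have not completed yet.
class PendingWrites {
  constructor() {
    this.pending = new Set();
  }

  // Register a request; `oncomplete` will be called exactly once with a status.
  add(oncomplete) {
    const req = { oncomplete, done: false };
    this.pending.add(req);
    return req;
  }

  // Normal completion path: remove the request, then run its callback.
  finish(req, status) {
    if (req.done) return;
    req.done = true;
    this.pending.delete(req);
    req.oncomplete(status);
  }

  // On close, complete every outstanding request with the given status
  // instead of leaving its callback to fire at some later point.
  closeAll(status) {
    for (const req of Array.from(this.pending))
      this.finish(req, status);
  }
}
```

Completing requests at close time trades a late callback for an early one carrying an error status, which the caller can handle while its own state still exists.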

cc @EricTheOne

@indutny
Member Author

indutny commented Jun 6, 2015

cc @nodejs/crypto

@brendanashworth brendanashworth added the tls Issues and PRs related to the tls subsystem. label Jun 6, 2015
@indutny
Member Author

indutny commented Jun 9, 2015

@bnoordhuis: let's review it anyway. Even without @EricTheOne's feedback, the fix it makes is still relevant.

@indutny
Member Author

indutny commented Jun 9, 2015

cc @EricTheOne

@@ -10,9 +10,11 @@ function StreamWrap(stream) {

this.stream = stream;

this.queue = null;
Contributor

Totally a noob question. Why are we not implementing this with an array instead of this complex enqueue and dequeue logic?

Member

Is queue supposed to be accessed directly from outside of the class?
Should it be prefixed or not?

Member Author

@thefourtheye because I'd like to avoid lookup cost when removing elements from it
@ChALkeR Should be prefixed, thanks.

Contributor

@indutny I am trying to understand this better. In theory a queue needs no lookup, since the first element in is the first one dequeued, right? If we are going to remove arbitrary elements, then why not a Map?

Member Author

@thefourtheye there is no FIFO guarantee here, and I rather like linked lists :)

Member Author

@ChALkeR fixed
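
The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the patch: an intrusive doubly linked list in which each request carries its own `_prev`/`_next` links (the field names match the `_dequeue` fragment in the diff), so removing an arbitrary request needs no search. `RequestList`, `enqueue`, `dequeue`, and `toArray` are hypothetical names.

```javascript
// Hypothetical intrusive list: the links live on the request objects themselves.
class RequestList {
  constructor() {
    this.head = null;
  }

  // Insert at the head in O(1). Ordering is not meaningful here (no FIFO promise).
  enqueue(req) {
    req._prev = null;
    req._next = this.head;
    if (this.head !== null) this.head._prev = req;
    this.head = req;
  }

  // Unlink `req` in O(1). Assumes `req` is currently in this list.
  dequeue(req) {
    const next = req._next;
    const prev = req._prev;
    if (prev !== null) prev._next = next;
    else this.head = next; // req was the head
    if (next !== null) next._prev = prev;
    req._next = null;
    req._prev = null;
  }

  // Helper for inspection only: walk the list from the head.
  toArray() {
    const out = [];
    for (let cur = this.head; cur !== null; cur = cur._next) out.push(cur);
    return out;
  }
}
```

With an array, removing a request from the middle would first require finding its index; here the request itself holds the position. A `Map` keyed by request would also avoid the search, at the cost of a separate container entry per request.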

@indutny
Member Author

indutny commented Jun 11, 2015

cc @trevnorris @shigeki @bnoordhuis @chrisdickinson maybe? Let's land this thing!

@EricTheOne

@indutny this really seems to fix the original issue - no more segfaults and no more runaway memory allocations, thank you very much!

I've been running my server for a while, and found a few errors which I haven't seen before:

TypeError: Cannot read property 'start' of null
    at TLSSocket._start (_tls_wrap.js:550:15)
    at TLSSocket.<anonymous> (_tls_wrap.js:542:12)
    at TLSSocket.g (events.js:260:16)
    at emitNone (events.js:72:20)
    at TLSSocket.emit (events.js:166:7)
    at Socket.<anonymous> (_tls_wrap.js:432:12)
    at Socket.g (events.js:260:16)
    at emitNone (events.js:67:13)
    at Socket.emit (events.js:166:7)
    at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1029:10)
TypeError: Cannot read property 'start' of null
    at TLSSocket._start (_tls_wrap.js:550:15)
    at TLSSocket.<anonymous> (_tls_wrap.js:542:12)
    at TLSSocket.g (events.js:260:16)
    at emitNone (events.js:72:20)
TypeError: immediate._onImmediate is not a function
    at processImmediate (timers.js:371:17)
    at doNTCallback0 (node.js:408:9)
    at process._tickCallback (node.js:337:13)
    at Socket.<anonymous> (_stream_wrap.js:41:20)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at readableAddChunk (_stream_readable.js:145:16)
    at Socket.Readable.push (_stream_readable.js:109:10)
    at TCP.onread (net.js:519:20)
TypeError: immediate._onImmediate is not a function
    at processImmediate (timers.js:371:17)
    at doNTCallback0 (node.js:408:9)
    at process._tickCallback (node.js:337:13)
Error: read EINVAL
    at exports._errnoException (util.js:838:11)
    at StreamWrap.Socket._read (net.js:391:21)
    at StreamWrap.Readable.read (_stream_readable.js:324:10)
    at StreamWrap.Socket.read (net.js:280:43)
    at StreamWrap.Socket (net.js:166:12)
    at new StreamWrap (_stream_wrap.js:51:10)
    at new TLSSocket (_tls_wrap.js:233:12)
    at Object.exports.connect (_tls_wrap.js:913:16)

Maybe it's in my code, but I don't see it in the stack traces. The patch was applied over master (0f68377)


StreamWrap.prototype._dequeue = function dequeue(req) {
var next = req._next;
var prev = req._prev;
Contributor

Can we declare these as `const`, since they are never reassigned?

Member Author

Sure.

Member Author

Fixed.

@trevnorris
Contributor

Patch looks good. I'd have to give it a closer look, but don't let that hold up merging it if more devs sign off before then.

@indutny
Member Author

indutny commented Jun 16, 2015

Fixed the first problem, please take a look @EricTheOne

@EricTheOne

@indutny thanks, the original onImmediate issue seems to have been caused by my code (I passed `function()`, the result of a call, to setImmediate instead of the function itself).
I'm now testing this pull request over v2.3.0, and have found an issue:

TypeError: Cannot read property 'finishShutdown' of null
    at Immediate._onImmediate (_stream_wrap.js:86:19)
    at processImmediate [as _immediateCallback] (timers.js:371:17)

Seems to be due to _handle being null in _stream_wrap.js:

StreamWrap.prototype.shutdown = function shutdown(req) {
  const self = this;

  this.stream.end(function() {
    // Ensure that write was dispatched
    setImmediate(function() {
      self._handle.finishShutdown(req, 0);
    });
  });
  return 0;
};
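
A minimal sketch of one way to guard the snippet above against a handle that was closed before the deferred callback runs. It is an illustration only and not necessarily the fix that later landed in this PR. `FakeWrap`, `close`, and the injectable `defer` parameter are hypothetical stand-ins so that the example runs on its own.

```javascript
// Hypothetical stand-in for StreamWrap, reduced to the shutdown path.
class FakeWrap {
  constructor(stream, handle, defer = setImmediate) {
    this.stream = stream;
    this._handle = handle;
    this._defer = defer; // injectable so the sketch can be driven synchronously
  }

  // After close, deferred callbacks must not touch the handle.
  close() {
    this._handle = null;
  }

  shutdown(req) {
    this.stream.end(() => {
      // Defer, as in the original snippet, so the write is dispatched first.
      this._defer(() => {
        // The wrap may have been closed between scheduling and running.
        if (this._handle === null) return;
        this._handle.finishShutdown(req, 0);
      });
    });
    return 0;
  }
}
```

Skipping the call avoids the `TypeError`, but it leaves `req` unfinished. The PR description argues for completing pending requests when the handle closes, which would cover this case without dropping the request.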

@EricTheOne

@indutny thanks for a4c5489 and 4cbf799, I ran the server with them but "TypeError: Cannot read property 'finishShutdown' of null" still happens.

@EricTheOne

@indutny two more issues found, in addition to finishShutdown error:

../deps/uv/src/unix/stream.c:1489: uv_read_start: Assertion `((stream)->io_watcher.fd) >= 0' failed.
Error: read EINVAL
    at exports._errnoException (util.js:839:11)
    at StreamWrap.Socket._read (net.js:392:21)
    at StreamWrap.Readable.read (_stream_readable.js:325:10)
    at StreamWrap.Socket.read (net.js:281:43)
    at StreamWrap.Socket (net.js:167:12)
    at new StreamWrap (_stream_wrap.js:52:10)
    at new TLSSocket (_tls_wrap.js:234:12)
    at Object.exports.connect (_tls_wrap.js:938:16)
    ...

By far the most frequent is the finishShutdown error, second to it is EINVAL, and last is the assertion.

@EricTheOne

@indutny is there any more info I can extract to help with the above three issues?

@indutny
Member Author

indutny commented Jun 28, 2015

Fixed finishShutdown error, looking at the rest.

@indutny
Member Author

indutny commented Jun 29, 2015

@EricTheOne hopefully fixed the last two, thank you for reporting them! Please give it a try ;)

@EricTheOne

@indutny this is excellent news! I applied the pull request over v2.3.1 and ran the server twice. The first thing to notice is that the server behaves much better:

  1. Memory and CPU are far more stable
  2. finishShutdown error is gone

We're very close to resolving this, however two issues still appear:

  1. read EINVAL and write EINVAL
  2. ../deps/uv/src/unix/stream.c:1489: uv_read_start: Assertion `((stream)->io_watcher.fd) >= 0' failed.

They happen at roughly the same time (less than 20 seconds apart) and then the server crashes. I use multiple processes so the errors may come from different child processes.

I may have a clue: from visually inspecting the logs, the two errors and a memory leak seem to appear on reconnects. In particular, I see many reconnects caused by timeouts. I generally dispose of the sockets and create new ones. There is no kernel socket leak.

@indutny
Member Author

indutny commented Jun 29, 2015

@EricTheOne may I ask you to provide a fresh stack trace for EINVAL? (Maybe both of read/write?)

@indutny
Member Author

indutny commented Jun 29, 2015

@EricTheOne pushed one more fix, hope it helps

@EricTheOne

@indutny with some more testing I found another issue (causing a crash). Not related to the latest commit (cb4a005):

../src/node_crypto.cc:2283: node::crypto::CheckResult node::crypto::CheckWhitelistedServerCert(X509_STORE_CTX*): Assertion `(root_cert) != (nullptr)' failed.

@shigeki
Contributor

shigeki commented Jun 30, 2015

@EricTheOne Could you try the latest HEAD of master? I believe it was fixed in #2064.

@EricTheOne

@shigeki thanks, checking now, so far seems stable

@EricTheOne

@shigeki @indutny #2064 indeed fixes the assertion, thanks.

@indutny I have reduced the rate of reconnects on the server, and neither of the two errors happens, even without cb4a005. It seems the cause was either a race condition or incorrect handling of edge cases.
I've since run the server with cb4a005 too (the entire pull request was applied over 05a73c0 (master)), and it looks fine, no new issues.

From my point of view this work solves the original issues and does not introduce new ones, hence I'd like to see it merged.

@indutny
Member Author

indutny commented Jun 30, 2015

cc @trevnorris please do one more pass over it. @shigeki may I ask you to take a look too?

debug('end');
if (self._handle)
self._handle.emitEOF();
});
Contributor

nit: throw in a comment on why the setImmediate() is necessary. future proofing for new devs. :)

Member Author

Actually, let's drop it, and bring it back only if there is any trouble. I no longer think it is reasonable. (@EricTheOne: I hope you don't mind)

@indutny I don't understand the code enough to comment on the code level, but I'll retest the final version and let you know if anything new comes up.
btw just to further my understanding, why would setImmediate be resolving possible troubles there?

@trevnorris
Contributor

Left comments about cosmetic stuff, but LGTM.

indutny added a commit that referenced this pull request Jul 1, 2015
Queued write requests should be invoked on handle close, otherwise the
"consumer" might be already destroyed when the write callbacks of the
"consumed" handle will be invoked. Same applies to the shutdown
requests.

Make sure to "move" away socket from server to not break the
`connections` counter in `net.js`. Otherwise it might not call `close`
callback, or call it too early.

Fix: #1696
PR-URL: #1910
Reviewed-By: Trevor Norris <trev.norris@gmail.com>
@indutny
Member Author

indutny commented Jul 1, 2015

Landed in 9180140, thank you! (I decided to squash everything into one commit, because I forgot which commits these hotfixes belonged to.)

@indutny indutny closed this Jul 1, 2015
@indutny indutny deleted the fix/tls-crashes branch July 1, 2015 03:10
@EricTheOne

@indutny thanks, I'll retest with master as soon as possible.

@rvagg rvagg mentioned this pull request Jul 2, 2015
mscdex pushed a commit to mscdex/io.js that referenced this pull request Jul 9, 2015
Labels
tls Issues and PRs related to the tls subsystem.

Successfully merging this pull request may close these issues.

segfault in TLS
7 participants