Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tls: Reduce memory copying and number of BIO buffer allocations #31499

Closed
wants to merge 3 commits into from

Conversation

@rustyconover
Copy link
Contributor

rustyconover commented Jan 24, 2020

Avoid copying buffers before passing to SSL_write if there
are zero length buffers involved. Only copy the data when
the buffer has a non zero length.

Send a memory allocation hint to the crypto BIO about how much
memory will likely be needed to be allocated by the next call
to SSL_write. This makes a single allocation rather than the BIO
allocating a buffer for each 16k TLS segment written. This
solves a problem with large buffers written over TLS triggering
V8's GC.

  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • commit message follows [commit guidelines]
src/node_crypto_bio.h Outdated Show resolved Hide resolved
src/tls_wrap.cc Outdated Show resolved Hide resolved
@rustyconover rustyconover force-pushed the rustyconover:fix-ssl-speed branch to 4b4ee85 Jan 24, 2020
@nodejs-github-bot

This comment has been minimized.

@rustyconover rustyconover requested a review from devnexen Jan 24, 2020
src/node_crypto_bio.h Outdated Show resolved Hide resolved
@rustyconover rustyconover requested a review from bnoordhuis Jan 25, 2020
src/node_crypto_bio.h Outdated Show resolved Hide resolved
src/node_crypto_bio.h Outdated Show resolved Hide resolved
src/tls_wrap.cc Outdated Show resolved Hide resolved
@rustyconover rustyconover requested a review from bnoordhuis Jan 26, 2020
Copy link
Member

bnoordhuis left a comment

Thanks, LGTM.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot

This comment has been minimized.

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Jan 26, 2020

How do I request this also get landed in v12.x?

@Trott
Trott approved these changes Jan 26, 2020
@Trott

This comment has been minimized.

Copy link
Member

Trott commented Jan 26, 2020

Benchmark results:

                                                  confidence improvement accuracy (*)   (**)  (***)
 tls/throughput.js size=1024 type='asc' dur=5                     1.53 %       ±2.08% ±2.79% ±3.68%
 tls/throughput.js size=1024 type='buf' dur=5                     1.84 %       ±2.18% ±2.93% ±3.88%
 tls/throughput.js size=1024 type='utf' dur=5                     1.11 %       ±2.09% ±2.78% ±3.62%
 tls/throughput.js size=1048576 type='asc' dur=5         ***      3.62 %       ±1.47% ±1.96% ±2.56%
 tls/throughput.js size=1048576 type='buf' dur=5         ***      3.95 %       ±1.39% ±1.85% ±2.40%
 tls/throughput.js size=1048576 type='utf' dur=5                  1.44 %       ±1.72% ±2.28% ±2.97%
 tls/throughput.js size=16777216 type='asc' dur=5          *      5.40 %       ±4.25% ±5.71% ±7.55%
 tls/throughput.js size=16777216 type='buf' dur=5        ***      9.64 %       ±1.23% ±1.64% ±2.14%
 tls/throughput.js size=16777216 type='utf' dur=5        ***      6.09 %       ±1.59% ±2.11% ±2.76%
 tls/throughput.js size=2 type='asc' dur=5                        1.23 %       ±2.28% ±3.05% ±3.98%
 tls/throughput.js size=2 type='buf' dur=5                        0.43 %       ±1.59% ±2.13% ±2.79%
 tls/throughput.js size=2 type='utf' dur=5                       -2.19 %       ±4.88% ±6.57% ±8.71%
 tls/throughput.js size=4194304 type='asc' dur=5         ***      6.62 %       ±1.26% ±1.68% ±2.20%
 tls/throughput.js size=4194304 type='buf' dur=5         ***      7.55 %       ±0.93% ±1.24% ±1.62%
 tls/throughput.js size=4194304 type='utf' dur=5         ***      4.21 %       ±1.65% ±2.20% ±2.86%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case there are 15 comparisons, you can thus
expect the following amount of false-positive results:
  0.75 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.15 false positives, when considering a   1% risk acceptance (**, ***),
  0.01 false positives, when considering a 0.1% risk acceptance (***)
@Trott

This comment has been minimized.

Copy link
Member

Trott commented Jan 26, 2020

Interestingly and completely believably, this seems to cause sequential/test-https-keep-alive-large-write to fail on Windows.

03:27:21 not ok 668 sequential/test-https-keep-alive-large-write
03:27:21   ---
03:27:21   duration_ms: 0.404
03:27:21   severity: fail
03:27:21   exitcode: 1
03:27:21   stack: |-
03:27:21     assert.js:142
03:27:21       throw err;
03:27:21       ^
03:27:21     
03:27:21     AssertionError [ERR_ASSERTION]: function should not have been called at C:\workspace\node-test-binary-windows-js-suites\node\test\sequential\test-https-keep-alive-large-write.js:33
03:27:21         at Server.mustNotCall (C:\workspace\node-test-binary-windows-js-suites\node\test\common\index.js:417:12)
03:27:21         at Server.emit (events.js:321:20)
03:27:21         at TLSSocket.socketOnTimeout (_http_server.js:516:37)
03:27:21         at TLSSocket.emit (events.js:321:20)
03:27:21         at TLSSocket.Socket._onTimeout (net.js:479:8)
03:27:21         at TLSSocket.<anonymous> (C:\workspace\node-test-binary-windows-js-suites\node\test\sequential\test-https-keep-alive-large-write.js:29:62)
03:27:21         at TLSSocket._onTimeout (C:\workspace\node-test-binary-windows-js-suites\node\test\common\index.js:359:15)
03:27:21         at IncomingMessage.<anonymous> (C:\workspace\node-test-binary-windows-js-suites\node\test\sequential\test-https-keep-alive-large-write.js:42:14)
03:27:21         at Object.onceWrapper (events.js:428:26)
03:27:21         at IncomingMessage.emit (events.js:321:20) {
03:27:21       generatedMessage: false,
03:27:21       code: 'ERR_ASSERTION',
03:27:21       actual: undefined,
03:27:21       expected: undefined,
03:27:21       operator: 'fail'
03:27:21     }
03:27:21   ...
Copy link
Member

Trott left a comment

Needs the Windows test fixed. (I didn't look to see if the problem is here or in the test or someplace else. But the possibility that the issue is in the test and/or an existing bug in Node.js core Windows code is very real.)

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Jan 26, 2020

How do I request this also get landed in v12.x?

Welcome @rustyconover and thanks for the great pull request!

Things that land on the master branch that are performance improvements (and not breaking changes) tend to get pulled into the LTS Active release (which is what 12.x is right now). So there's no need to do anything special.

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Jan 26, 2020

So I think I can explain that test failure.

TLSWrap::EncOut() will only try to write kSimultaneousBufferCount buffers at a time. My PR has changed the BIO buffer scheme such that a single BIO buffer could be the entire length of the write (due to the allocation hint). This means rather than doing 10 ~16k TLS segments per call, there could be just big call which may complete before the timeout will be issued.

So the test is trying to make a guarantee that that write could be interrupted and timeout signaled mid-buffer, which seems dubious to me.

@rustyconover rustyconover requested a review from Trott Jan 26, 2020
@Trott

This comment has been minimized.

Copy link
Member

Trott commented Jan 27, 2020

So I think I can explain that test failure.

TLSWrap::EncOut() will only try to write kSimultaneousBufferCount buffers at a time. My PR has changed the BIO buffer scheme such that a single BIO buffer could be the entire length of the write (due to the allocation hint). This means rather than doing 10 ~16k TLS segments per call, there could be just big call which may complete before the timeout will be issued.

So the test is trying to make a guarantee that that write could be interrupted and timeout signaled mid-buffer, which seems dubious to me.

@nodejs/http Thoughts on the validity of sequential/test-https-keep-alive-large-write?

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Jan 27, 2020

@nodejs/http Thoughts on the validity of sequential/test-https-keep-alive-large-write?

/ping @nodejs/testing too.

@bnoordhuis

This comment has been minimized.

Copy link
Member

bnoordhuis commented Jan 27, 2020

After looking at the test I agree it's based on a flawed assumption / inherently race-y.

It looks to have worked by accident until now, but throw a fast enough machine at it and I'm sure it'll start failing even without this PR. I'm okay with removing it.

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Jan 27, 2020

Should I amend the commit to remove the test?

@addaleax

This comment has been minimized.

Copy link
Member

addaleax commented Jan 27, 2020

Should I amend the commit to remove the test?

I think it’s fine to remove it in a separate commit (e.g. test: remove parallel/...), if you want.

@jasnell

This comment has been minimized.

Copy link
Member

jasnell commented Jan 27, 2020

For the removed test, it would be good to document what the flawed assumption is in the commit message.

@nodejs-github-bot

This comment has been minimized.

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Feb 26, 2020

@addaleax The changes to the test worked, now it seems there is some flakiness in an unrelated test on a Raspberry Pi in the CI. How can I try to run the CI tests again to see if it will resolve?

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Feb 26, 2020

This #31966 seems to be causing the problem in getting the CI tests to pass.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot

This comment has been minimized.

@nodejs-github-bot

This comment has been minimized.

@Trott

This comment has been minimized.

Copy link
Member

Trott commented Feb 27, 2020

This #31966 seems to be causing the problem in getting the CI tests to pass.

That fix landed a few minutes ago so it hopefully won't be a problem anymore. Did a full CI rebuild for this PR. Hopefully things will go easier/better this time....

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Feb 28, 2020

@Trott Thanks for your help triggering the new CI, it seems that there is an unrelated failure in HTTP2 now. I wasn't setting this error before:

https://ci.nodejs.org/job/node-test-binary-windows-js-suites/2044/RUN_SUBSET=3,nodes=win2012r2-COMPILED_BY-vs2019-x86/console

It doesn't seem to be directly related to the PR at hand. Can we try it one more time?

@nodejs-github-bot

This comment has been minimized.

@rustyconover

This comment has been minimized.

Copy link
Contributor Author

rustyconover commented Feb 28, 2020

@Trott seems like the CI passed . Thank you!

Could you complete your review with the new test change?

@Trott
Trott approved these changes Feb 29, 2020
@Trott

This comment has been minimized.

Copy link
Member

Trott commented Feb 29, 2020

Landed in 3d894d0...8b1efe0

@Trott Trott closed this Feb 29, 2020
Trott added a commit to Trott/io.js that referenced this pull request Feb 29, 2020
Remove a test that made a flawed assumption that a single
large buffer write can be interrupted by a timeout event.

PR-URL: nodejs#31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Trott added a commit to Trott/io.js that referenced this pull request Feb 29, 2020
Change the test to not be sensitive to the buffer size causing
TCP resets to be received by the client causing the test to fail.
The test now reads the entire expected buffer and then checks for
the expected event to fire.

PR-URL: nodejs#31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Trott added a commit to Trott/io.js that referenced this pull request Feb 29, 2020
Avoid copying buffers before passing to SSL_write if there
are zero length buffers involved.  Only copy the data when
the buffer has a non zero length.

Send a memory allocation hint to the crypto BIO about how much
memory will likely be needed to be allocated by the next call
to SSL_write.  This makes a single allocation rather than the BIO
allocating a buffer for each 16k TLS segment written.  This
solves a problem with large buffers written over TLS triggering
V8's GC.

PR-URL: nodejs#31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Feb 29, 2020
Remove a test that made a flawed assumption that a single
large buffer write can be interrupted by a timeout event.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Feb 29, 2020
Change the test to not be sensitive to the buffer size causing
TCP resets to be received by the client causing the test to fail.
The test now reads the entire expected buffer and then checks for
the expected event to fire.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Feb 29, 2020
Avoid copying buffers before passing to SSL_write if there
are zero length buffers involved.  Only copy the data when
the buffer has a non zero length.

Send a memory allocation hint to the crypto BIO about how much
memory will likely be needed to be allocated by the next call
to SSL_write.  This makes a single allocation rather than the BIO
allocating a buffer for each 16k TLS segment written.  This
solves a problem with large buffers written over TLS triggering
V8's GC.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@codebytere codebytere mentioned this pull request Feb 29, 2020
codebytere added a commit that referenced this pull request Mar 15, 2020
Remove a test that made a flawed assumption that a single
large buffer write can be interrupted by a timeout event.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 15, 2020
Change the test to not be sensitive to the buffer size causing
TCP resets to be received by the client causing the test to fail.
The test now reads the entire expected buffer and then checks for
the expected event to fire.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 15, 2020
Avoid copying buffers before passing to SSL_write if there
are zero length buffers involved.  Only copy the data when
the buffer has a non zero length.

Send a memory allocation hint to the crypto BIO about how much
memory will likely be needed to be allocated by the next call
to SSL_write.  This makes a single allocation rather than the BIO
allocating a buffer for each 16k TLS segment written.  This
solves a problem with large buffers written over TLS triggering
V8's GC.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 17, 2020
Remove a test that made a flawed assumption that a single
large buffer write can be interrupted by a timeout event.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 17, 2020
Change the test to not be sensitive to the buffer size causing
TCP resets to be received by the client causing the test to fail.
The test now reads the entire expected buffer and then checks for
the expected event to fire.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 17, 2020
Avoid copying buffers before passing to SSL_write if there
are zero length buffers involved.  Only copy the data when
the buffer has a non zero length.

Send a memory allocation hint to the crypto BIO about how much
memory will likely be needed to be allocated by the next call
to SSL_write.  This makes a single allocation rather than the BIO
allocating a buffer for each 16k TLS segment written.  This
solves a problem with large buffers written over TLS triggering
V8's GC.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 30, 2020
Remove a test that made a flawed assumption that a single
large buffer write can be interrupted by a timeout event.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 30, 2020
Change the test to not be sensitive to the buffer size causing
TCP resets to be received by the client causing the test to fail.
The test now reads the entire expected buffer and then checks for
the expected event to fire.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
codebytere added a commit that referenced this pull request Mar 30, 2020
Avoid copying buffers before passing to SSL_write if there
are zero length buffers involved.  Only copy the data when
the buffer has a non zero length.

Send a memory allocation hint to the crypto BIO about how much
memory will likely be needed to be allocated by the next call
to SSL_write.  This makes a single allocation rather than the BIO
allocating a buffer for each 16k TLS segment written.  This
solves a problem with large buffers written over TLS triggering
V8's GC.

PR-URL: #31499
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: David Carlier <devnexen@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

7 participants
You can’t perform that action at this time.