
Major performance degradation in crypto.createHash #5015

Open
3rd-Eden opened this Issue March 14, 2013 · 30 comments

Arnout Kazemier

A really simple snippet:

var crypto = require('crypto');

console.time('iterations');
for (var i = 0; i < 100000; i++) {
  crypto.createHash('md5').update(i + '').digest();
}
console.timeEnd('iterations');

Node.js 0.8.22: 329ms
Node.js 0.10.0: 2291ms

These results were generated on Mac OS X 10.8.2.

Ben Noordhuis

Okay, confirmed.

v0.10 is on average 800% slower than v0.8, and it's because we now spend roughly 10% of total running time in Buffer::New(). For comparison, in v0.8 that's only 0.1%.

On top of that, because we create lots of buffers, the GC has to do a lot more work: 15-20% is spent in the GC as opposed to ~3.5% in v0.8.

I think it's fair to say that the buffer-ication of the crypto API is not a resounding success. I move we revert it before v0.10.1.

Isaac Z. Schlueter
Collaborator

Reverting the stream API here would be pretty annoying, but losing the buffer-ization would be worse.

Let's explore this in more detail than just reverting the whole thing to v0.8. If we can have it use buffers, but not do the stream API, then it's easy enough to move that bit to userland while we figure out a clearer API. This is a long-standing wart, and it'd be really annoying to just go back to the awful mess of an API that it was.

Nathan Zadoks

@shtylman Crypto deals with bytes. It makes sense for it to deal with Buffers.

Simon Gaeremynck (simong) referenced this issue in oaeproject/Hilary on March 18, 2013:
Upgrade to Node 0.10 #415 (Closed)

Ron Korving

Is anything happening with this?

Douglas Christopher Wilson

The patch from @indutny just landed on the v0.10 branch, so I assume it'll be in the next v0.10 release.

Douglas Christopher Wilson

Actually, I just checked again and there is now a v0.10.1-release branch and the change log includes the patch, so there is a large speed up on v0.10.1.

Ron Korving

Thanks @dougwilson ! Looking forward to the v0.10.1 release.

Isaac Z. Schlueter
Collaborator

@ronkorving @3rd-Eden @dougwilson That patch from @indutny makes things much less bad, but it's still about half the speed of v0.8. There'll be another patch in v0.10.2 to get us the rest of the way.

Sorry for the inconvenience.

Isaac Z. Schlueter
Collaborator

To clarify, the problem isn't that we're using Buffers by default, but that we're converting strings to Buffers in JS-land instead of moving that code into C++-land.

Trevor Norris
Collaborator

EDIT: Note that my test 1 is incorrect. The hash should have been created before the loop, then timed how long it takes to update.

Let's break down the benchmark (leaving out the unnecessary parts):

var time = process.hrtime();
// test 1
for (var i = 0; i < 1e6; i++)
  crypto.createHash('md5').update(i + '');
// test 2
for (var i = 0; i < 1e6; i++)
  crypto.createHash('md5').update(i + '').digest();

Results (@isaacs, I know you'll love me for reporting them like this ;-):

           v0.8.22        v0.10
test 1:    1120 ns/op     3650 ns/op
test 2:    1590 ns/op     6450 ns/op

As a quick fix for test 1, I made it so that if strings were passed, they were simply appended to an internal property until digest() was run; only at that point did the actual update() happen. The speed improved by 3x, down to 900 ns/op (and, fwiw, it actually returned the correct digest). Interestingly, this is the same amount of time it takes if a Buffer is passed instead of a string. Remember, though, that the hack was tailored to this case and has obvious failure points (e.g. if two update() calls use different encodings).

Not surprisingly, that hack doesn't help much in the case of test 2, since only a single Buffer write is required.

The largest chunk of time in test 2 comes from Encode(md_value, md_len, BUFFER), which accounts for around 3000 ns. About 1/3 of the Encode time is spent creating new SlowBuffers.

I'm reporting this just as an fyi to everyone asking about the performance drop, and to say I'm confident we can get it into the sub-2000 ns/op range.

Brian White

Yikes! This is not good for us heavy crypto users :-(

Trevor Norris
Collaborator

Just want to give a quick update. Based on the work in #4964, here are the updated times:

v0.8.23: 1605ms
v0.10.4: 6540ms
master:  5529ms
#4964:   3607ms
Brian White

@trevnorris What test code are you using for these latest numbers?

Trevor Norris
Collaborator

@mscdex Using the test code above. Though I tweaked the number of iterations.

Brian White

@trevnorris Test 1 or test 2?

Trevor Norris
Collaborator

Oh. Actually the test posted by @3rd-Eden.

Trevor Norris
Collaborator

I'd like to clarify some of the original post. First, here are the tests that were run (each test was run individually, and the result is the average of several iterations):

var iter = 1e7;

// test 1
var hash = crypto.createHash('md5');
var time = process.hrtime();
for (var i = 0; i < iter; i++)
  hash.update(i + '');

// test 2
var time = process.hrtime();
for (var i = 0; i < iter; i++)
  crypto.createHash('md5').update(i + '').digest();

time = process.hrtime(time);

console.log((time[0] * 1e9 + time[1]) / iter);

Here are the times on my machine:

         test 1   test 2
v0.8.23:  390ns   1580ns
v0.10.5: 1950ns   6720ns
master:  1220ns   5380ns
#4964:   1050ns   3480ns

Though these are for strings. Let's look at updating with a 128 byte Buffer instead:

         test 1   test 2
v0.8.23:  410ns   1470ns
v0.10.5:  388ns   3880ns
master:   364ns   3440ns
#4964:    345ns   1930ns
Isaac Z. Schlueter
Collaborator

I've spent a good amount of time tracking this down yesterday and today.

The vast majority of this regression is due to the fact that HashDigest now returns a Buffer (of the SlowBuffer variety) instead of a binary string, and the Persistent/MakeWeak stuff is killing us. What's worse, if you do hash.update('binary') it's still making a Buffer and then turning the Buffer back into a string in JS, so that's also hurting.

There are a few changes that would make this much better.

  1. Do a similar write*String approach for crypto classes as we do for StreamWrap::Write. This yields about a 5% performance improvement on the "lots of small hashes" test, and around a 10-20% improvement on the "Hash stream throughput" test.
  2. Move the digest(encoding) stuff back into C++.

With these two changes, we out-perform v0.8 considerably.

As an approach, I'm thinking:

  1. Add an API to node.cc that does char* DataFromString<EncodingEnum encoding>(Handle<String> string, int* len);. To call it, you pass in an int pointer that receives the number of bytes written, and it returns a char* pointing at the raw data. It's your responsibility to delete[] the return value when you're done with it. (The API shape feels a bit odd to me. I'm not the expert at designing idiomatic C++ APIs, and have no strong opinions. @bnoordhuis, care to weigh in on this point?) This is not the main
  2. Have the StreamWrap::WriteStringImpl<encoding> use this function, as well as Hash::HashUpdate, Cipher::CipherUpdate etc.
  3. In addition to the decodeStrings flag on stream.Writable options, I think that Readable.push(chunk, encoding) should just accept the string as-is if encoding === this.options.encoding. That's a pretty simple optimization that would save another unnecessary Buffer creation, since we just immediately convert it back to the same kind of string anyway.
  4. In the (JavaScript) Hash.prototype.digest(encoding) function, it can pass the encoding arg to the binding object, and then the code in node_crypto.cc can return a string instead of a buffer. When using the Transform stream API, it'll then call this.push(result, encoding) and avoid the Buffer-ization there.
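Point 3 above can be sketched as a tiny helper (the function name and shape are hypothetical, not the actual stream internals):

```javascript
'use strict';
// If a pushed string chunk is already in the stream's configured
// encoding, pass it through untouched instead of round-tripping it
// through a Buffer and back to the same kind of string.
function normalizeChunk(chunk, encoding, streamEncoding) {
  if (typeof chunk !== 'string')
    return chunk;                        // Buffers pass through unchanged
  if (encoding === streamEncoding)
    return chunk;                        // fast path: no Buffer created
  return Buffer.from(chunk, encoding);   // mismatch: decode via Buffer
}
```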

Opinions?

Bottom line: The regressed crypto stuff will be a bit faster if you use a string encoding (especially 'binary' or 'ascii'), but if you want to use Buffers, then they'll be a bit slower until we can make Buffers as fast as strings (maybe 0.12, maybe 1.0).

Ben Noordhuis

The API looks fine but making it a template means you have to do this everywhere:

if (encoding == FOO)
  data = DataFromString<FOO>(string, &len);
else if (encoding == BAR)
  data = DataFromString<BAR>(string, &len);
else if (encoding == QUUX)
  /* and so on */

You probably want to make the encoding a function parameter rather than a template parameter.

Another suggestion is to change the prototype to something like this:

// Could also be done as a RAII class
void* DataFromString(Handle<String> string, enum EncodingEnum encoding, void* buf, size_t* len);

If buf != NULL, you try to write to it (and *len should be the length), else you new[] the required memory. If the input string is small, you use an on-stack buffer to avoid incurring the overhead of new[] and delete[]. Mixing heap and non-heap pointers is error prone so it would make sense to hide the implementation details in a small class.

Isaac Z. Schlueter
Collaborator
isaacs commented May 01, 2013

@bnoordhuis Thanks, that's a good idea. I only suggested using a template function since that's how StreamWrap did it, but I agree that a small class is probably a better way to go. I'll whip something up soon.

Isaac Z. Schlueter
Collaborator
isaacs commented May 02, 2013

Work progresses on my crypto-fix-perf branch. Current benchmarks: https://gist.github.com/isaacs/5506763

$ ../node-v0.8/node benchmark/crypto/hash-stream-creation.js api=legacy algo=md5 writes=100000 len=1000
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=hex len=1000 api=legacy: 1.0535
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=binary len=1000 api=legacy: 1.0947
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=buffer len=1000 api=legacy: 0.74339
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=hex len=1000 api=legacy: 0.68227
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=binary len=1000 api=legacy: 0.69946
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=buffer len=1000 api=legacy: 0.54533
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=hex len=1000 api=legacy: 1.6482
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=binary len=1000 api=legacy: 1.7380
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=buffer len=1000 api=legacy: 0.98841

$ ./node-v0.10 benchmark/crypto/hash-stream-creation.js api=legacy algo=md5 writes=100000 len=1000
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=hex len=1000 api=legacy: 0.60906
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=binary len=1000 api=legacy: 0.60367
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=buffer len=1000 api=legacy: 0.67015
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=hex len=1000 api=legacy: 0.48901
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=binary len=1000 api=legacy: 0.49979
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=buffer len=1000 api=legacy: 0.53076
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=hex len=1000 api=legacy: 0.93640
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=binary len=1000 api=legacy: 0.94936
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=buffer len=1000 api=legacy: 1.0725

$ ./node benchmark/crypto/hash-stream-creation.js api=legacy algo=md5 writes=100000 len=1000
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=hex len=1000 api=legacy: 1.3122
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=binary len=1000 api=legacy: 1.3359
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=buffer len=1000 api=legacy: 0.89541
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=hex len=1000 api=legacy: 1.1237
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=binary len=1000 api=legacy: 1.1371
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=buffer len=1000 api=legacy: 0.79360
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=hex len=1000 api=legacy: 1.5987
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=binary len=1000 api=legacy: 1.6357
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=buffer len=1000 api=legacy: 1.0219

Will finish up into a pull req for review tomorrow.

Brian White
mscdex commented May 02, 2013

:+1:

Trevor Norris
Collaborator

Update: running the OP's code, I get the following:

          iterations
-----------------------
v0.8      160ms
master    450ms
patched   300ms

patched is a combination of PRs #4964 and #5641. Not 100% of the way there yet, but almost.

Ron Korving

Is this still an ongoing effort? Are there still opportunities for improvement to explore?
