Skip to content
This repository has been archived by the owner. It is now read-only.

Major performance degradation in crypto.createHash #5015

Closed
3rd-Eden opened this issue Mar 14, 2013 · 43 comments

Comments

@3rd-Eden
Copy link

commented Mar 14, 2013

A really simple snippet:

var crypto = require('crypto');

console.time('iterations');
for (var i = 0; i < 100000; i++) {
  crypto.createHash('md5').update(i + '').digest();
}
console.timeEnd('iterations');

Node.js 0.8.22: 329ms
Node.js 0.10.0: 2291ms

These results were generated on Mac OSX 10.8.2

@indutny

This comment has been minimized.

Copy link
Member

commented Mar 15, 2013

@bnoordhuis

This comment has been minimized.

Copy link
Member

commented Mar 15, 2013

Okay, confirmed.

v0.10 is on average 800% slower than v0.8 and it's because we spend roughly 10% total running time in Buffer::New() now. For comparison, in v0.8 that's only 0.1%.

On top of that, because we create lots of buffers, the GC has to do a lot more work: 15-20% is spent in the GC as opposed to ~3.5% in v0.8.

I think it's fair to say that the buffer-ication of the crypto API is not a resounding success. I move we revert it before v0.10.1.

@isaacs

This comment has been minimized.

Copy link

commented Mar 15, 2013

Reverting the stream API here would be pretty annoying, but losing the buffer-ization would be worse.

Let's explore this in more detail than just reverting the whole thing to v0.8. If we can have it use buffers, but not do the stream API, then it's easy enough to move that bit to userland while we figure out a clearer API. This is a long-standing wart, and it'd be really annoying to just go back to the awful mess of an API that it was.

@defunctzombie

This comment has been minimized.

Copy link

commented Mar 15, 2013

Was there a reason to change the api in the first place? People that wanted
any sort of different api can just implement it on top of the existing
stuff. Personally, I think streams is too high of an abstraction for low
level core (and certainly crypto items). Streams is just syntactic sugar on
top of what should be very thing barebones apis. Much less testing surface
this way and you don't have to keep changing the api for whatever new
userspace syntactic exploration you want to do.

I am +1 on removing all the new streams stuff and leaving things as they
were. Just document what is going on and let people build what they want on
top of it. Streams do not have to be in core at all.

On Fri, Mar 15, 2013 at 10:45 AM, Isaac Z. Schlueter <
notifications@github.com> wrote:

Reverting the stream API here would be pretty annoying, but losing the
buffer-ization would be worse.

Let's explore this in more detail than just reverting the whole thing to
v0.8. If we can have it use buffers, but not do the stream API, then it's
easy enough to move that bit to userland while we figure out a clearer API.
This is a long-standing wart, and it'd be really annoying to just go back
to the awful mess of an API that it was.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5015#issuecomment-14964422
.

@defunctzombie

This comment has been minimized.

Copy link

commented Mar 15, 2013

I will add that I am also in favor of not using the buffer class here too
(if that wasn't clear from before) if it means making performance better.
To me, buffer is a thing I can always use to wrap the output if I want to
use it in some other way.

On Fri, Mar 15, 2013 at 12:04 PM, Roman Shtylman shtylman@gmail.com wrote:

Was there a reason to change the api in the first place? People that
wanted any sort of different api can just implement it on top of the
existing stuff. Personally, I think streams is too high of an abstraction
for low level core (and certainly crypto items). Streams is just syntactic
sugar on top of what should be very thing barebones apis. Much less testing
surface this way and you don't have to keep changing the api for whatever
new userspace syntactic exploration you want to do.

I am +1 on removing all the new streams stuff and leaving things as they
were. Just document what is going on and let people build what they want on
top of it. Streams do not have to be in core at all.

On Fri, Mar 15, 2013 at 10:45 AM, Isaac Z. Schlueter <
notifications@github.com> wrote:

Reverting the stream API here would be pretty annoying, but losing the
buffer-ization would be worse.

Let's explore this in more detail than just reverting the whole thing to
v0.8. If we can have it use buffers, but not do the stream API, then it's
easy enough to move that bit to userland while we figure out a clearer API.
This is a long-standing wart, and it'd be really annoying to just go back
to the awful mess of an API that it was.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5015#issuecomment-14964422
.

@edef1c

This comment has been minimized.

Copy link

commented Mar 15, 2013

@shtylman Crypto deals with bytes. It makes sense for it to deal with Buffers.

@defunctzombie

This comment has been minimized.

Copy link

commented Mar 15, 2013

If buffers can be just as fast, then OK. But I view the current perf loss
as much worse than some idea of buffers being the "pure" approach.
On Mar 15, 2013 12:26 PM, "Nathan Zadoks" notifications@github.com wrote:

@shtylman https://github.com/shtylman Crypto deals with bytes. It makes
sense for it to deal with Buffers.


Reply to this email directly or view it on GitHubhttps://github.com//issues/5015#issuecomment-14970423
.

@ronkorving

This comment has been minimized.

Copy link

commented Mar 21, 2013

Is anything happening with this?

@dougwilson

This comment has been minimized.

Copy link
Member

commented Mar 21, 2013

The patch from @indutny just landed on the v0.10 branch, so I assume it'll be in the next v0.10 release.

@dougwilson

This comment has been minimized.

Copy link
Member

commented Mar 21, 2013

Actually, I just checked again and there is now a v0.10.1-release branch and the change log includes the patch, so there is a large speed up on v0.10.1.

@ronkorving

This comment has been minimized.

Copy link

commented Mar 21, 2013

Thanks @dougwilson ! Looking forward to the v0.10.1 release.

@isaacs

This comment has been minimized.

Copy link

commented Mar 21, 2013

@ronkorving @3rd-Eden @dougwilson That patch from @indutny makes things much less-bad, but it's still about half the speed of v0.8. There'll be another patch in 0.10.2 to get us the rest of the way.

Sorry for the inconvenience.

@isaacs

This comment has been minimized.

Copy link

commented Mar 21, 2013

To clarify, the problem isn't that we're using Buffers by default, but that we're converting strings to Buffers in JS-land instead of moving that code into C++-land.

@defunctzombie

This comment has been minimized.

Copy link

commented Mar 24, 2013

Um, I don't think that code does what you think it does.

On Sun, Mar 24, 2013 at 5:19 PM, Amir Mamedov notifications@github.comwrote:

var crypto = require('crypto');
console.time('iterations');for (var i = 0; i < 100000; i++) {
process.nextTick(function(){
crypto.createHash('md5').update(i + '').digest();
});}console.timeEnd('iterations');

Node.js 0.8.22: 42ms
Node.js 0.10.0: 36ms


Reply to this email directly or view it on GitHubhttps://github.com//issues/5015#issuecomment-15369424
.

@trevnorris

This comment has been minimized.

Copy link

commented Mar 29, 2013

EDIT: Note that my test 1 is incorrect. The hash should have been created before the loop, then timed how long it takes to update.

Let's break down the benchmark (leaving out unecessaries):

var time = process.hrtime();
// test 1
for (var i = 0; i < 1e6; i++)
  crypto.createHash('md5').update(i + '');
// test 2
for (var i = 0; i < 1e6; i++)
  crypto.createHash('md5').update(i + '').digest();

Results (@isaacs, know you'll love me for reporting them like this ;-):

           v0.8.22        v0.10
test 1:    1120 ns/op     3650 ns/op
test 2:    1590 ns/op     6450 ns/op

As a quick fix for test 1, made it so if strings were being passed then it would just continue to append them to an internal property until digest() was run. At that time it ran the actual update(). The speed improved by 3x, down to 900 ns/op (and, fwiw, actually returned the correct digest). Interestingly this is the same amount of time it takes if a Buffer is passed instead of a string. Though remember the hack was tailored to this case, and has obvious fail points (e.g. if two update()'s use different encodings).

Not surprisingly, that hack doesn't help much in the case of test 2. Since only a single Buffer write is required.

The largest chunk of time in test 2 comes from Encode(md_value, md_len, BUFFER). Which accounts for around 3000 ns. About 1/3 of Encode time is spend creating new SlowBuffers.

I'm reporting this just as an fyi to everyone asking why the performance drop, and to say I'm confident we can get it in the sub 2000 ns/op range.

@mscdex

This comment has been minimized.

Copy link

commented Apr 13, 2013

Yikes! This is not good for us heavy crypto users :-(

@trevnorris

This comment has been minimized.

Copy link

commented Apr 20, 2013

Just want to give a quick update. Based on the work in #4964 here are the updated times:

v0.8.23: 1605ms
v0.10.4: 6540ms
master:  5529ms
#4964:   3607ms
@mscdex

This comment has been minimized.

Copy link

commented Apr 20, 2013

@trevnorris What test code are you using for these latest numbers?

@trevnorris

This comment has been minimized.

Copy link

commented Apr 20, 2013

@mscdex Using the test code above. Though I tweaked the number of iterations.

@mscdex

This comment has been minimized.

Copy link

commented Apr 20, 2013

@trevnorris Test 1 or test 2?

@trevnorris

This comment has been minimized.

Copy link

commented Apr 21, 2013

Oh. Actually the test posted by @3rd-Eden.

@trevnorris

This comment has been minimized.

Copy link

commented Apr 29, 2013

I'd like to clarify some of the original post. First, here are the tests ran (each test was run individually, and the result is the avg of several iterations):

var iter = 1e7;

// test 1
var hash = crypto.createHash('md5');
var time = process.hrtime();
for (var i = 0; i < iter; i++)
  hash.update(i + '');

// test 2
var time = process.hrtime();
for (var i = 0; i < iter; i++)
  crypto.createHash('md5').update(i + '').digest();

time = process.hrtime(time);

console.log((time[0] * 1e9 + time[1]) / iter);

Here are the times on my machine:

         test 1   test 2
v0.8.23:  390ns   1580ns
v0.10.5: 1950ns   6720ns
master:  1220ns   5380ns
#4964:   1050ns   3480ns

Though these are for strings. Let's look at updating with a 128 byte Buffer instead:

         test 1   test 2
v0.8.23:  410ns   1470ns
v0.10.5:  388ns   3880ns
master:   364ns   3440ns
#4964:    345ns   1930ns
@isaacs

This comment has been minimized.

Copy link

commented Apr 30, 2013

I've spent a good amount of time tracking this down yesterday and today.

The vast majority of this regression is due to the fact that HashDigest now returns a buffer (of the SlowBuffer variety), instead of a binary string, and the Persistent/MakeWeak stuff is killing us. What's worse, if you do hash.update('binary') then it's still making a Buffer, and then turning the Buffer into a string in JS, so that's also hurting.

There are a few changes that would make this much better.

  1. Do a similar write*String approach for crypto classes as we do for StreamWrap::Write. This yields about a 5% performance improvement on the "lots of small hashes" test, and around a 10-20% improvement on the "Hash stream throughput" test.
  2. Move the digest(encoding) stuff back into C++.

With these two changes, we out-perform v0.8 considerably.

As an approach, I'm thinking:

  1. Add an api to node.cc that does char* DataFromString<EncodingEnum encoding>(Handle<String> string, int* len);. To call it, you'd pass in an int pointer which will store the length of bytes written, and it would return the char* pointing at the raw data. It's your responsibility to delete[] the return value when you're done with it. (The API shape feels a bit odd to me. I'm not the expert at designing idiomatic C++ APIs, and have no strong opinions. @bnoordhuis care to weigh in on this point?) This is not the main
  2. Have the StreamWrap::WriteStringImpl<encoding> use this function, as well as Hash::HashUpdate, Cipher::CipherUpdate etc.
  3. In addition to the decodeStrings flag on stream.Writable options, I think that Readable.push(chunk, encoding) should just accept the string as-is if encoding === this.options.encoding. That's a pretty simple optimization that would save another unnecessary Buffer creation, since we just immediately convert it back to the same kind of string anyway.
  4. In the (javascript) Hash.prototype.digest(encoding) function, it can pass the encoding arg to the binding object, and then the code in node_crypto.cc can return a string instead of a buffer. When using the Transform Stream API, it'll then this.push(result, encoding), and avoid the Buffer-ization there.

Opinions?

Bottom line: The regressed crypto stuff will be a bit faster if you use a string encoding (especially 'binary' or 'ascii'), but if you want to use Buffers, then they'll be a bit slower until we can make Buffers as fast as strings (maybe 0.12, maybe 1.0).

@bnoordhuis

This comment has been minimized.

Copy link
Member

commented May 1, 2013

The API looks fine but making it a template means you have to do this everywhere:

if (encoding == FOO)
  data = DataFromString<FOO>(string, &len);
else if (encoding == BAR)
  data = DataFromString<BAR>(string, &len);
else if (encoding == QUUX)
  /* and so on */

You probably want to make the encoding a function parameter rather than a template parameter.

Another suggestion is to change the prototype to something like this:

// Could also be done as a RAII class
void* DataFromString(Handle<String> string, enum EncodingEnum encoding, void* buf, size_t* len);

If buf != NULL, you try to write to it (and *len should be the length), else you new[] the required memory. If the input string is small, you use an on-stack buffer to avoid incurring the overhead of new[] and delete[]. Mixing heap and non-heap pointers is error prone so it would make sense to hide the implementation details in a small class.

@isaacs

This comment has been minimized.

Copy link

commented May 1, 2013

@bnoordhuis Thanks, that's a good idea. I only suggested using a template function since that's how StreamWrap did it, but I agree that a small class is probably a better way to go. I'll whip something up soon.

@isaacs

This comment has been minimized.

Copy link

commented May 3, 2013

Work progresses on my crypto-fix-perf branch. Current benchmarks: https://gist.github.com/isaacs/5506763

$ ../node-v0.8/node benchmark/crypto/hash-stream-creation.js api=legacy algo=md5 writes=100000 len=1000
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=hex len=1000 api=legacy: 1.0535
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=binary len=1000 api=legacy: 1.0947
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=buffer len=1000 api=legacy: 0.74339
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=hex len=1000 api=legacy: 0.68227
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=binary len=1000 api=legacy: 0.69946
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=buffer len=1000 api=legacy: 0.54533
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=hex len=1000 api=legacy: 1.6482
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=binary len=1000 api=legacy: 1.7380
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=buffer len=1000 api=legacy: 0.98841

$ ./node-v0.10 benchmark/crypto/hash-stream-creation.js api=legacy algo=md5 writes=100000 len=1000
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=hex len=1000 api=legacy: 0.60906
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=binary len=1000 api=legacy: 0.60367
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=buffer len=1000 api=legacy: 0.67015
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=hex len=1000 api=legacy: 0.48901
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=binary len=1000 api=legacy: 0.49979
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=buffer len=1000 api=legacy: 0.53076
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=hex len=1000 api=legacy: 0.93640
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=binary len=1000 api=legacy: 0.94936
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=buffer len=1000 api=legacy: 1.0725

$ ./node benchmark/crypto/hash-stream-creation.js api=legacy algo=md5 writes=100000 len=1000
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=hex len=1000 api=legacy: 1.3122
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=binary len=1000 api=legacy: 1.3359
crypto/hash-stream-creation.js writes=100000 algo=md5 type=asc out=buffer len=1000 api=legacy: 0.89541
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=hex len=1000 api=legacy: 1.1237
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=binary len=1000 api=legacy: 1.1371
crypto/hash-stream-creation.js writes=100000 algo=md5 type=utf out=buffer len=1000 api=legacy: 0.79360
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=hex len=1000 api=legacy: 1.5987
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=binary len=1000 api=legacy: 1.6357
crypto/hash-stream-creation.js writes=100000 algo=md5 type=buf out=buffer len=1000 api=legacy: 1.0219

Will finish up into a pull req for review tomorrow.

@mscdex

This comment has been minimized.

Copy link

commented May 3, 2013

👍

@trevnorris

This comment has been minimized.

Copy link

commented Jun 10, 2013

Update. Running the OP's code, get the following:

          iterations
-----------------------
v0.8      160ms
master    450ms
patched   300ms

patched is a combination of PR's 4964 and 5641. Not 100% there yet, but almost.

@ronkorving

This comment has been minimized.

Copy link

commented Jul 14, 2013

Is this still an ongoing effort? Are there still opportunities for improvement to explore?

@trevnorris

This comment has been minimized.

Copy link

commented Jul 14, 2013

Yes and yes. There's still improvement that can be had, but since a large
part of the regression has been fixed it's fallen down the urgency poll.

@krnlde

This comment has been minimized.

Copy link

commented Jul 22, 2014

Any news on this? I recognized the node.js implementation of MD5 can't compete with the one from PHP. Only if I store var hash = crypto.createHash('md5') before the loop and only do hash.update(i+'') inside the loop it matches the time needed in PHP's md5(i.''), but obviously the results are falsy. As already mentioned before, I suspect it has to do with the crypto.createHash() which has a significant performance impact. Wouldn't it be good to reset the buffer of the already instantiated Hash object to make it reusable and not having to construct a new one?

@trevnorris

This comment has been minimized.

Copy link

commented Jul 24, 2014

@krnlde Instantiating a new Buffer object in 0.11 is now measurable in ns. So it has little to do w/ creating the Buffer itself. It's how the data is handled once ingested by crypto. It can still be improved. I'll assign this to myself and check it out on a rainy day.

@krnlde

This comment has been minimized.

Copy link

commented Jul 24, 2014

So it's the first ingest which takes a long time for each Hash object?

Anyways, awesome! And thank you for your effort.

@krnlde

This comment has been minimized.

Copy link

commented Sep 11, 2014

Hey @trevnorris Any news on this? I read that in 0.12 there will be many improvements in the crypto lib. reference here

@trevnorris

This comment has been minimized.

Copy link

commented Sep 16, 2014

@krnlde There's still improvements to be made. Though that's probably always the case. Additional performance improvements will be coming after the initial v0.12 release. Most of them will be safe to land in the v0.12 branch, so you can expect v0.12 to get progressively faster.

@jasnell

This comment has been minimized.

Copy link
Member

commented May 20, 2015

@trevnorris ... was this pursued further?

@jasnell

This comment has been minimized.

Copy link
Member

commented May 28, 2015

Running in v0.12 on osx...

> for (var i = 0; i < 100000; i++) {
...   crypto.createHash('md5').update(i + '').digest();
... }
<Buffer d3 eb 9a 92 33 e5 29 48 74 0d 7e b8 c3 06 2d 14>
> console.timeEnd('iterations');
iterations: 361ms
undefined
> 

Going to assume this is resolved. Closing. Can reopen if new information is presented.

@jasnell jasnell closed this May 28, 2015

@3rd-Eden

This comment has been minimized.

Copy link
Author

commented May 28, 2015

WTF. I don't even.. You run it on one version without comparing regression with the old node to see if its actually resolved. How does that logic even work?!

On May 28, 2015, at 6:27 PM, James M Snell notifications@github.com wrote:

Running in v0.12 on osx...

for (var i = 0; i < 100000; i++) {
... crypto.createHash('md5').update(i + '').digest();
... }
<Buffer d3 eb 9a 92 33 e5 29 48 74 0d 7e b8 c3 06 2d 14>
console.timeEnd('iterations');
iterations: 361ms
undefined

Going to assume this is resolved. Closing. Can reopen if new information is presented.


Reply to this email directly or view it on GitHub.

@indutny

This comment has been minimized.

Copy link
Member

commented May 28, 2015

$ ~/Code/voxer/voxnode/node -e "console.log(process.version);console.time('hash');crypto = require('crypto');for (var i = 0; i < 100000; i++) crypto.createHash('md5').update(i + '').digest();console.timeEnd('hash')"
v0.8.26
hash: 161ms

./iojs -e "console.log(process.version);console.time('hash');crypto = require('crypto');for (var i = 0; i < 100000; i++) crypto.createHash('md5').update(i + '').digest();console.timeEnd('hash')"
v2.0.3
hash: 180ms

~/.node/0.12/bin/node -e "console.log(process.version);console.time('hash');crypto = require('crypto');for (var i = 0; i < 100000; i++) crypto.createHash('md5').update(i + '').digest();console.timeEnd('hash')"
v0.12.0
hash: 386ms

So it is resolved in io.js and not resolved in node v0.12

@mscdex

This comment has been minimized.

Copy link

commented May 28, 2015

Additionally we should really be using process.hrtime() instead of console.time*()/Date.now().

@jasnell

This comment has been minimized.

Copy link
Member

commented May 28, 2015

@mscdex @indutny ... do you see reason to keep this particular issue open then? The original issue showed a degradation between v0.8 and v0.10. While there are still improvements that definitely can be made, the converged stream will bring along the io.js improvements. Unless you think there's something we need to backport into v0.12, I'm inclined to defer to the converged stream.

@indutny

This comment has been minimized.

Copy link
Member

commented May 28, 2015

I don't see any point in having it open.

@bnoordhuis

This comment has been minimized.

Copy link
Member

commented May 28, 2015

If there are at least two people willing to do the back-porting work (one to back-port, one to review), it might be reasonable to keep the issue open. I'm going to guess however that no one is volunteering.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
You can’t perform that action at this time.