Bring 'crypto' module into the modern age #3278

Closed
isaacs opened this Issue May 16, 2012 · 17 comments

Projects

None yet

10 participants

@isaacs
isaacs commented May 16, 2012

The crypto module relies on many outdated Node ideas. It cannot operate on Buffers directly, and doesn't present any kind of stream interface.

It's the only reason we cannot yet deprecate the 'binary' string encoding.

Its API should be cleaned up, and ideally move much more of its functionality into JavaScript, and make it more dry.

@isaacs isaacs added a commit that referenced this issue May 16, 2012
@isaacs isaacs Revert "Fix #3242 Actually deprecate 'binary' buffer encoding"
This reverts commit 5979f09.

Related:
- #3279
- #3278
a3753b4
@jorangreef

Was just looking for a way to get an SHA1 hash digest piped directly into a buffer at an offset in the buffer e.g. digest(buffer, offset) rather than receiving an intermediary buffer from digest and then copying this.

Also noticed that digest('base64') is twice as slow as digest('hex') or digest('binary'):

Benchmark(function() { crypto.createHash('sha1').update('abcdefg', 'utf8').digest('base64') }, 1000)
0.00823ms

Benchmark(function() { crypto.createHash('sha1').update('abcdefg', 'utf8').digest('base64') }, 1000)
0.0076ms

Benchmark(function() { crypto.createHash('sha1').update('abcdefg', 'utf8').digest('hex') }, 1000)
0.00497ms

Benchmark(function() { crypto.createHash('sha1').update('abcdefg', 'utf8').digest('hex') }, 1000)
0.00482ms

Benchmark(function() { crypto.createHash('sha1').update('abcdefg', 'utf8').digest('binary') }, 1000)
0.00446ms

Benchmark(function() { crypto.createHash('sha1').update('abcdefg', 'utf8').digest('binary') }, 1000)
0.00444ms

@ThisIsMissEm

@isaacs What's the overall status of this, and is there anything I could do to help?

@tanepiper

@miksago AFAIK this stuff will be 0.10 territory after speaking to Mikeal - everything should be moved to streams and be the last API-breaking changes

@jorangreef

Regarding API, it would be great if the interface for crypto.randomBytes(size, callback) could match that of window.crypto.getRandomValues(typedarray).

The problem with Node's randomBytes is that if you already have a buffer which you need to be filled with entropy then you have to copy memory twice. This incurs unnecessary copies in the kind of situation where you would often want to reuse a buffer.

window.crypto.getRandomValues is blocking but for the use case it is more helpful. Would prefer Node to make randomBytes synchronous (i.e. treat the entropy generation to be a blocking CPU activity rather than a non-blocking IO activity). If the user wants, they can call getRandomValues multiple times rather to avoid blocking. Consider also that the hash functions in Node are already synchronous in this same sense (one can update SHA1 multiple times to avoid blocking, but the update operation itself is synchronous).

@ThisIsMissEm

I think more what I'm interested in, is: should crypto methods be returning a Buffer instead of binary string or other string? (if they don't already)

@jorangreef

Yes they should be able to return a buffer, but they should also be able to write the digest into an existing buffer at a specified offset. If the getRandomValues interface could be improved while you're at it, then that would be much appreciated.

@ThisIsMissEm

hmm, I might have a look at this over the long weekend.

@aseemk
aseemk commented Jun 15, 2012

Having crypto methods return buffers might make sense, but just wanted to toss in that having them support returning strings (maybe not by default) is really convenient. E.g. hash.digest('hex').

Would love to see this for randomBytes() too (btw, any reason it couldn't be shortened to just random()? crypto.random() would be a nice parallel to Math.random()), e.g. crypto.random(16, 'hex').

@stephank
stephank commented Aug 8, 2012

Mentioning related issues that are still open: #3571, #1393, #2945.

Streams make a ton of sense here. For ciphers, though, that pretty much implies creating new buffers.

On the other hand, I didn't see any mention of ciphers here, but mostly of methods that don't return a lot of data. Stuff like hash output and randomBytes is usually limited to a handful. Isn't this plenty fast with the existing buffer pool in node? Is the small copy for this data significant?

I'm pretty much in favor of Hash, Hmac, Signer and Verify all as write streams, both Cipher and Decipher as read/write streams, and the remaining functions operating on and returning plain buffers. You'll find those functions deal with small amounts in practice.

@jorangreef

Those functions deal with small amounts in practice.

Yet they may be called many times a second in busy servers: e.g. deduplication, hashing keys for lookup etc.

It should be possible to eliminate the copy, if the interface is designed right, by having the hash write output into an existing buffer at a specified offset.

For the interface to impose a copy when it's not necessary and not desired by the user is wasteful.

That's how we've ended up with systems that do copies all the way down.

On 08 Aug 2012, at 10:24 AM, Joyent/Node reply@reply.github.com wrote:

Mentioning related issues that are still open: #3571, #1393, #2945.

Streams make a ton of sense here. For ciphers, though, that pretty much implies creating new buffers.

On the other hand, I didn't see any mention of ciphers here, but mostly of methods that don't return a lot of data. Stuff like hash output and randomBytes is usually limited to a handful. Isn't this plenty fast with the existing buffer pool in node? Is the small copy for this data significant?

I'm pretty much in favor of Hash, Hmac, Signer and Verify all as write streams, both Cipher and Decipher as read/write streams, and the remaining functions operating on and returning plain buffers. You'll find those functions deal with small amounts in practice.


Reply to this email directly or view it on GitHub.

@stephank
stephank commented Aug 9, 2012

My concern is that there may not be a perfect interface for this, when the trade-off is between flexibility and complexity. Assuming we're transitioning to streams, I count a total of 10 remaining methods that deal with raw data. Each of these would have to support some or all of the following means of output in their interface:

  • Return a buffer.
  • Return an encoded string.
  • Output to a buffer, optionally at an offset.
  • Output an encoded representation to a buffer, optionally at an offset. (for e.g. text-only protocols?)

I'm also not sure what a zero-copy cipher would look like, if we want to expose them as regular node streams.

@mscdex
mscdex commented Aug 9, 2012

+1 for writing to an existing buffer as an option. I think this would perform much better for the reasons @jorangreef listed and thus would be a good fit for the crypto use in the ssh2 module I'm writing.

@stephank
stephank commented Aug 9, 2012

@mscdex Well snap, guess what I'm writing. :)

@shlevy
shlevy commented Aug 28, 2012

Does this issue including making crypto streaming à la zlib, with the actual encryption calls being done in a background thread? Or should I open up another issue for that?

@indutny indutny added a commit that closed this issue Sep 18, 2012
@indutny indutny crypto: bring module into modern age
Introduce 'buffer' encoding, allow returning and giving buffers as
arguments of 'crypto' routines.

Fix #3278
63ff449
@indutny indutny closed this in 63ff449 Sep 18, 2012
@indutny
Member
indutny commented Sep 18, 2012

Fixed in 63ff449

@shlevy
shlevy commented Sep 18, 2012

@indutny Did 63ff449 add a stream interface for crypto?

@langpavel

@shlevy I don't think so, there is no stream interface yet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment