crypto: improve randomUUID performance #37243

rangoo94 · 2021-02-06T00:23:42Z

I really like the idea of including UUIDv4 in Node.js core. I think that maximizing its performance could lead to further standardization. The initial version still had room for improvement, so I put some effort into it.

Benchmark results

After each step (a separate commit), I ran the crypto/randomUUID.js benchmark to measure the performance difference.

Cache - entropy cache
Pre-allocated - memory reserved on crypto module initialization
Post-allocated - memory reserved after first crypto.randomUUID() call

Step	Description	With cache (ops/s)	% of step 0	Without cache (ops/s)	% of step 0	Pre-allocated	Post-allocated
0	Current implementation	3,630,047	100%	444,904	100%	144B (`kHexDigits`)	2048B (cache)
1	Decompose operations to separate functions	3,755,310	103%	448,765	100%	144B (`kHexDigits`)	2048B (cache)
2	Eliminate intermediate buffer (`slice` on entropy cache)	4,826,939	132%	442,647	99%	144B (`kHexDigits`)	2048B (cache)
3	Optimize `disableEntropyCache` access	5,327,442	146%	445,531	100%	144B (`kHexDigits`)	2048B (cache)
4	Serialize UUID per byte (cache `00`-`ff` strings)	18,173,445	500%	474,999	106%	2064B (`kHexBytes`)	2048B (cache)
5	Allocate `kHexBytes` on first `randomUUID()` call	16,416,650	452%	466,446	104%	0B	2048B (cache) + 2064B (`kHexBytes`)
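Step 4 is the big jump: each byte is serialized with a single lookup into a 256-entry table of two-character strings (`'00'`..`'ff'`), rather than going through the 16-entry `kHexDigits` table. A minimal sketch of that idea follows. It uses the public `Number.prototype.toString` and `String.prototype.padStart` instead of Node's internal primordials, and although the function name mirrors `serializeUUID` from the diff in this thread, it is an illustration, not the landed code.

```javascript
'use strict';
const { randomFillSync } = require('crypto');

// 256-entry lookup table built once: kHexBytes[0x0a] === '0a'.
const kHexBytes = [];
for (let i = 0; i < 256; i++) {
  kHexBytes.push(i.toString(16).padStart(2, '0'));
}

// Serialize the 16 bytes at buf[offset..offset + 15] as an 8-4-4-4-12
// UUID string, one table lookup per byte. Byte 6 gets the version
// nibble (4) and byte 8 gets the RFC 4122 variant bits (10xx).
function serializeUUID(buf, offset = 0) {
  return kHexBytes[buf[offset]] +
    kHexBytes[buf[offset + 1]] +
    kHexBytes[buf[offset + 2]] +
    kHexBytes[buf[offset + 3]] +
    '-' +
    kHexBytes[buf[offset + 4]] +
    kHexBytes[buf[offset + 5]] +
    '-' +
    kHexBytes[(buf[offset + 6] & 0x0f) | 0x40] +
    kHexBytes[buf[offset + 7]] +
    '-' +
    kHexBytes[(buf[offset + 8] & 0x3f) | 0x80] +
    kHexBytes[buf[offset + 9]] +
    '-' +
    kHexBytes[buf[offset + 10]] +
    kHexBytes[buf[offset + 11]] +
    kHexBytes[buf[offset + 12]] +
    kHexBytes[buf[offset + 13]] +
    kHexBytes[buf[offset + 14]] +
    kHexBytes[buf[offset + 15]];
}

// Example: fill 16 fresh random bytes and serialize them.
console.log(serializeUUID(randomFillSync(new Uint8Array(16))));
```

The `offset` parameter lets the same function read directly out of a larger entropy cache, which is what step 2 (no intermediate `slice`) relies on.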

Entropy cache size

The entropy cache size affects performance, so I benchmarked a matrix of cache sizes for each variant for comparison.

Step	1024B	%	2048B (current)	%	3072B	%	4096B	%	5120B	%
3	4,896,245	134%	5,327,442	146%	5,502,509	151%	5,557,077	153%	5,623,439	154%
4	13,795,762	380%	18,173,445	500%	19,831,019	546%	20,960,149	577%	21,807,676	600%
5	12,673,362	349%	16,416,650	452%	17,768,837	489%	18,796,783	517%	19,255,872	530%

Increasing the entropy cache could be considered for variants 4 and 5: the first additional 1KB adds roughly 8-9% throughput, with diminishing returns (about 2-6%) for each further 1KB.
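The cache size matters because every UUID consumes 16 bytes, so a bigger cache means fewer `randomFillSync()` refills: 2048B covers 128 UUIDs per refill, 5120B covers 320. A rough sketch of such a batched cache follows. `createUUIDGenerator(batchSize)` and its `refillCount()` helper are hypothetical names for illustration, not a Node.js API (core exposes only the `disableEntropyCache` option).

```javascript
'use strict';
const { randomFillSync } = require('crypto');

const kHexBytes = Array.from({ length: 256 }, (_, i) => i.toString(16).padStart(2, '0'));

// Compact serializer: 16 bytes -> 8-4-4-4-12 string with version/variant bits.
function serializeUUID(buf, offset) {
  let out = '';
  for (let i = 0; i < 16; i++) {
    let byte = buf[offset + i];
    if (i === 6) byte = (byte & 0x0f) | 0x40; // version 4
    if (i === 8) byte = (byte & 0x3f) | 0x80; // RFC 4122 variant
    if (i === 4 || i === 6 || i === 8 || i === 10) out += '-';
    out += kHexBytes[byte];
  }
  return out;
}

// Hypothetical factory: the cache holds `batchSize` UUIDs' worth of random
// bytes (16 each) and is refilled with one randomFillSync() call when empty.
// batchSize 64/128/192/256/320 corresponds to 1024B..5120B of cache.
function createUUIDGenerator(batchSize = 128) {
  const cache = new Uint8Array(16 * batchSize);
  let index = batchSize; // start "empty" so the first call fills the cache
  let refills = 0;
  function randomUUID() {
    if (index === batchSize) {
      randomFillSync(cache);
      refills++;
      index = 0;
    }
    return serializeUUID(cache, 16 * index++);
  }
  randomUUID.refillCount = () => refills;
  return randomUUID;
}

const uuid = createUUIDGenerator(128);
console.log(uuid(), uuid());
```

The trade-off is the one in the table: the per-call cost of the native entropy call is amortized over more UUIDs, at the price of a larger buffer that stays allocated.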

Summary

There are three ways to land the improvements, depending on what is preferred:

Minimal resources - variant 3 - simplifies the current algorithm with no downsides and gives a ~50% improvement
Maximized performance - variant 4 - far faster (~5x with the entropy cache), but loading crypto, even without ever calling randomUUID, costs ~2KB of memory
Good performance, opt-in - variant 5 - similar to 4 (slightly slower), but the additional ~2KB is allocated only after the script first calls randomUUID()

What are your thoughts about that?

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

devsnek · 2021-02-06T00:36:41Z

@rangoo94 could you please retarget this PR onto the master branch? (you can do it from the GitHub UI by clicking edit and then selecting a different branch)

rangoo94 · 2021-02-06T00:41:41Z

@devsnek, thanks, sorry, I overlooked that. I've now rebased it on top of the master branch.


jasnell · 2021-02-06T16:24:48Z

lib/internal/crypto/random.js

 let uuidBatch = 0;

+let hexBytesCache;
+function getHexBytes() {
+  if (hexBytesCache === undefined) {
+    hexBytesCache = new Array(256);
+    for (let i = 0; i < hexBytesCache.length; i++) {
+      const hex = NumberPrototypeToString(i, 16);
+      hexBytesCache[i] = StringPrototypePadStart(hex, 2, '0');
+    }
+  }
+  return hexBytesCache;
+}
+
+function serializeUUID(buf, offset = 0) {
+  const kHexBytes = getHexBytes();


To simplify things a bit further here, just generate the hex array on module load.

Suggested change

 let uuidBatch = 0;

-let hexBytesCache;
-function getHexBytes() {
-  if (hexBytesCache === undefined) {
-    hexBytesCache = new Array(256);
-    for (let i = 0; i < hexBytesCache.length; i++) {
-      const hex = NumberPrototypeToString(i, 16);
-      hexBytesCache[i] = StringPrototypePadStart(hex, 2, '0');
-    }
-  }
-  return hexBytesCache;
-}
+const kHexBytes = new Array(256);
+for (let i = 0; i < kHexBytes.length; i++) {
+  const hex = NumberPrototypeToString(i, 16);
+  kHexBytes[i] = StringPrototypePadStart(hex, 2, '0');
+}

 function serializeUUID(buf, offset = 0) {
-  const kHexBytes = getHexBytes();

Thanks! I like this simplification (variant 4), but it reserves 2KB of memory immediately after crypto is loaded. I can't think of a case where that would be a problem, but to be safe I introduced a lazy getter to avoid it.

Just to confirm: does this mean that the 2KB allocation is negligible? :)

On the other hand, when kHexBytes is initialized statically (as an array literal) instead of with the for loop, the results are ~8% better. The downside is that it also costs more code space.

Do you think that it's worth speeding it up this way instead?

const kHexBytes = [ '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '0a', '0b', '0c', '0d', '0e', '0f', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '1a', '1b', '1c', '1d', '1e', '1f', '20', '21', '22', '23', '24', '25', '26', '27', '28', '29', '2a', '2b', '2c', '2d', '2e', '2f', '30', '31', '32', '33', '34', '35', '36', '37', '38', '39', '3a', '3b', '3c', '3d', '3e', '3f', '40', '41', '42', '43', '44', '45', '46', '47', '48', '49', '4a', '4b', '4c', '4d', '4e', '4f', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '5a', '5b', '5c', '5d', '5e', '5f', '60', '61', '62', '63', '64', '65', '66', '67', '68', '69', '6a', '6b', '6c', '6d', '6e', '6f', '70', '71', '72', '73', '74', '75', '76', '77', '78', '79', '7a', '7b', '7c', '7d', '7e', '7f', '80', '81', '82', '83', '84', '85', '86', '87', '88', '89', '8a', '8b', '8c', '8d', '8e', '8f', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '9a', '9b', '9c', '9d', '9e', '9f', 'a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8', 'a9', 'aa', 'ab', 'ac', 'ad', 'ae', 'af', 'b0', 'b1', 'b2', 'b3', 'b4', 'b5', 'b6', 'b7', 'b8', 'b9', 'ba', 'bb', 'bc', 'bd', 'be', 'bf', 'c0', 'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'ca', 'cb', 'cc', 'cd', 'ce', 'cf', 'd0', 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'da', 'db', 'dc', 'dd', 'de', 'df', 'e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7', 'e8', 'e9', 'ea', 'eb', 'ec', 'ed', 'ee', 'ef', 'f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8', 'f9', 'fa', 'fb', 'fc', 'fd', 'fe', 'ff' ];
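If the static literal were preferred, it would not need to be maintained by hand. A small sketch of a hypothetical one-off generator script (not part of the PR) that prints the literal in the format above and checks that it matches the loop-built table; the ~8% speedup is the measurement quoted above, not something this script checks.

```javascript
'use strict';

// Build the table the same way the suggested change does: a loop at load time.
const fromLoop = [];
for (let i = 0; i < 256; i++) {
  fromLoop.push(i.toString(16).padStart(2, '0'));
}

// Emit the equivalent static array literal as source text.
const literal =
  `const kHexBytes = [ ${fromLoop.map((hex) => `'${hex}'`).join(', ')} ];`;

// Evaluate the generated source and confirm both tables are identical.
const fromLiteral = new Function(`${literal} return kHexBytes;`)();
if (fromLiteral.length !== 256 || fromLiteral.some((h, i) => h !== fromLoop[i])) {
  throw new Error('generated literal does not match the loop-built table');
}

console.log(literal);
```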


Trott · 2021-02-06T17:21:51Z

Benchmark: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/938/ (queued)

devsnek · 2021-02-07T04:31:26Z

                                                     confidence improvement accuracy (*)   (**)  (***)
crypto/randomUUID.js disableEntropyCache=0 n=10000000        ***    286.00 %       ±5.43% ±7.25% ±9.48%
crypto/randomUUID.js disableEntropyCache=1 n=10000000         **      2.95 %       ±1.83% ±2.44% ±3.19%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 2 comparisons, you can thus
expect the following amount of false-positive results:
  0.10 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.02 false positives, when considering a   1% risk acceptance (**, ***),
  0.00 false positives, when considering a 0.1% risk acceptance (***)
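The false-positive estimate in the benchmark output is the number of comparisons multiplied by the risk level (by linearity of expectation). A quick sketch reproducing the three figures for the two comparisons here; `expectedFalsePositives` is just an illustrative helper name.

```javascript
'use strict';

// Expected number of false positives across `comparisons` tests, each with
// false-positive probability `alpha`: comparisons * alpha.
function expectedFalsePositives(comparisons, alpha) {
  return comparisons * alpha;
}

// Two comparisons: disableEntropyCache=0 and disableEntropyCache=1.
for (const alpha of [0.05, 0.01, 0.001]) {
  console.log(`alpha=${alpha}: ${expectedFalsePositives(2, alpha).toFixed(2)}`);
}
```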

aduh95

LGTM, but I'd prefer to have approval from @nodejs/crypto before landing this.

nodejs-github-bot · 2021-02-19T11:43:39Z

CI: https://ci.nodejs.org/job/node-test-pull-request/36240/

aduh95 · 2021-02-22T11:20:46Z

@rangoo94 can you please rebase on top of master to solve the git conflict?

rangoo94 · 2021-02-22T18:41:27Z

@aduh95 sure, rebased :)

nodejs-github-bot · 2021-02-23T23:59:35Z

CI: https://ci.nodejs.org/job/node-test-pull-request/36329/

Co-authored-by: mscdex <mscdex@users.noreply.github.com>

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>

Co-authored-by: James M Snell <jasnell@gmail.com>

rangoo94 · 2021-02-24T18:20:27Z

Rebased once again on top of master. test-asan was failing earlier for all PRs, but I can see that it now succeeds on at least one.

nodejs-github-bot · 2021-02-26T10:14:15Z

CI: https://ci.nodejs.org/job/node-test-pull-request/36368/
Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/955/

Benchmark results:

                                                     confidence improvement accuracy (*)   (**)   (***)
crypto/randomUUID.js disableEntropyCache=0 n=10000000        ***    285.58 %       ±7.33% ±9.83% ±12.95%
crypto/randomUUID.js disableEntropyCache=1 n=10000000                -1.61 %       ±2.32% ±3.08%  ±4.01%

PR-URL: #37243
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>

jasnell · 2021-03-05T17:47:35Z

Landed in 5694f7f


lukeed · 2021-03-25T17:32:09Z

I arrived at similar conclusions (using the same approach) a year ago: @lukeed/uuid. I ran a matrix of cache-size tests back then too, and found that a "kBatchSize" of 256 was the best performer, even if it meant a bit more memory usage upfront.

The first implementation of my module actually advanced the cache window by only 1 byte per UUID (instead of 16), meaning that a shorter buffer could last longer. This was a suggestion from a developer who implemented the approach as a Crystal library (which eventually made its way into Crystal's stdlib). While it's not as secure, perhaps this could be added as an option to avoid regenerating buffers more often than necessary?
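A sketch of what such an option could look like, assuming a hypothetical `stride` parameter (how many bytes the read window advances per UUID). This is neither a Node.js API nor the code of @lukeed/uuid, just an illustration of the difference between a stride of 16 and a stride of 1.

```javascript
'use strict';
const { randomFillSync } = require('crypto');

const kHexBytes = Array.from({ length: 256 }, (_, i) => i.toString(16).padStart(2, '0'));

// Compact serializer: 16 bytes -> 8-4-4-4-12 string with version/variant bits.
function serializeUUID(buf, offset) {
  let out = '';
  for (let i = 0; i < 16; i++) {
    let byte = buf[offset + i];
    if (i === 6) byte = (byte & 0x0f) | 0x40; // version 4
    if (i === 8) byte = (byte & 0x3f) | 0x80; // RFC 4122 variant
    if (i === 4 || i === 6 || i === 8 || i === 10) out += '-';
    out += kHexBytes[byte];
  }
  return out;
}

// Hypothetical option: stride 16 gives every UUID 16 fresh bytes; stride 1
// reuses 15 of the previous UUID's bytes, so one refill lasts much longer.
function createWindowedGenerator(cacheSize = 2048, stride = 16) {
  const cache = new Uint8Array(cacheSize);
  let offset = cacheSize; // start exhausted so the first call fills the cache
  return function next() {
    if (offset + 16 > cacheSize) {
      randomFillSync(cache);
      offset = 0;
    }
    const uuid = serializeUUID(cache, offset);
    offset += stride;
    return uuid;
  };
}

// UUIDs per refill: floor((cacheSize - 16) / stride) + 1.
const perRefill = (cacheSize, stride) => Math.floor((cacheSize - 16) / stride) + 1;
console.log(perRefill(2048, 16), perRefill(2048, 1)); // 128 2033
```

With a 2048-byte cache, a stride of 1 yields 2033 UUIDs per refill instead of 128, but consecutive UUIDs share 15 of their 16 source bytes.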

rangoo94 · 2021-03-27T10:51:57Z

Hi @lukeed, thanks for the comment! Incrementing the offset by 1 looks interesting in terms of performance (~15% faster), but it may introduce security issues in two dimensions:

Collisions: I wasn't able to prove mathematically that it increases the chance of collisions, but I suspect it may cause issues, given that crypto.randomBytes provides pseudo-random numbers
Predictability: when you receive a UUID, you can easily infer the next or previous one (there are up to 256 options)

While collisions could be acceptable (though maybe not in the stdlib), predictability is both very dangerous and unsuitable for the standard.

For example, if an online shop generated links like http://example.shop.com/order/<uuid>, a person who receives one could reach personal data from both the previous and the next order:

I receive http://example.shop.com/order/27579150-9831-45f9-b3ba-9050aa37237d
Previous order is /[0-9a-f]{2}275791-5098-41e5-b9f3-ba9050aa3723/
Next order is /57915098-31e5-49f3-ba90-50aa37237d[0-9a-f]{2}/
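To make the inference concrete: with a 1-byte stride, the next UUID reuses the current UUID's raw bytes 1..15, so the only hidden parts are the bits that the current UUID overwrote with its version nibble (byte 6) and variant bits (byte 8), plus one brand-new trailing byte. A sketch with a hypothetical `nextUUIDPattern()` helper (lowercase input assumed) that builds a regex for the next UUID. Counting those overwritten bits, the candidate space is 16 × 4 × 256 = 16,384 rather than 256, which is still trivially enumerable.

```javascript
'use strict';

// Split a canonical lowercase UUID string into 16 two-digit hex strings.
function toHexBytes(uuid) {
  return uuid.replace(/-/g, '').match(/[0-9a-f]{2}/g);
}

// Hypothetical helper: given UUID k from a stride-1 window, return a regex
// source string that the UUID at window offset k + 1 must match.
function nextUUIDPattern(uuid) {
  const b = toHexBytes(uuid);
  const byteAt = (i) => parseInt(b[i], 16);

  // Raw byte 7 (fully visible) moves to slot 6 and gets the version nibble.
  const version = ((byteAt(7) & 0x0f) | 0x40).toString(16);
  // Raw byte 9 (fully visible) moves to slot 8 and gets the variant bits.
  const variant = ((byteAt(9) & 0x3f) | 0x80).toString(16);
  // Raw byte 6 moves to slot 5; its high nibble was hidden by the version.
  const slot5 = `[0-9a-f]${b[6][1]}`;
  // Raw byte 8 moves to slot 7; only its top two bits were hidden.
  const low2 = (byteAt(8) >> 4) & 0x3;
  const highs = [0x0, 0x4, 0x8, 0xc].map((h) => (h | low2).toString(16)).join('');
  const slot7 = `[${highs}]${b[8][1]}`;

  return `${b.slice(1, 5).join('')}-${b[5]}${slot5}-${version}${slot7}-` +
    `${variant}${b[10]}-${b.slice(11, 16).join('')}[0-9a-f]{2}`;
}

console.log(nextUUIDPattern('27579150-9831-45f9-b3ba-9050aa37237d'));
```

For the example UUID this yields `57915098-31[0-9a-f]5-49[37bf]3-ba90-50aa37237d[0-9a-f]{2}`; the next-order example above matches it, with `e` and `f` being two of the hidden digits.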

Basically, I think this idea is really great, but only in very specific circumstances, so it would be better done as a separate library.

lukeed · 2021-03-27T12:09:02Z

Right, it's less secure. That's why the suggestion came with a "behind an option" requirement :) It should definitely not be the default, but there may be use cases where the developer need not be concerned with an end-user guessing new variants.

nodejs-github-bot added the crypto and v15.x labels Feb 6, 2021

rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from a31a956 to d504e9a February 6, 2021 00:30

devsnek removed the v15.x label Feb 6, 2021

devsnek requested a review from jasnell February 6, 2021 00:35

rangoo94 changed the base branch from v15.x to master February 6, 2021 00:37

rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from d504e9a to 0f68e30 February 6, 2021 00:40

mscdex reviewed Feb 6, 2021
