Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

crypto: improve randomUUID performance #37243

Closed

Conversation

rangoo94
Copy link
Contributor

@rangoo94 rangoo94 commented Feb 6, 2021

I very like the idea of including UUIDv4 into node.js core. I think that maximizing its performance could lead to further standardization. The initial version still had the potential for improvements, so I took some effort into it.

Benchmark results

After each step (separate commit), I ran crypto/randomUUID.js benchmark to observe the performance difference.

  • Cache - entropy cache
  • Pre-allocated - memory reserved on crypto module initialization
  • Post-allocated - memory reserved after first crypto.randomUUID() call
Step Description With cache % Without cache % Pre-allocated Post-allocated
0 Current implementation 3,630,047 100% 444,904 100% 144B (kHexDigits) 2048B (cache)
1 Decompose operations to separate functions 3,755,310 103% 448,765 100% 144B (kHexDigits) 2048B (cache)
2 Eliminate intermediate buffer (slice on entropy cache) 4,826,939 132% 442,647 99% 144B (kHexDigits) 2048B (cache)
3 Optimize disableEntropyCache access 5,327,442 146% 445,531 100% 144B (kHexDigits) 2048B (cache)
4 Serialize UUID per byte (cache 00-ff strings) 18,173,445 500% 474,999 106% 2064B (kHexBytes) 2048B (cache)
5 Allocate kHexBytes on first randomUUID() call 16,416,650 452% 466,446 104% 0B 2048B (cache) + 2064B (kHexBytes)

Entropy cache size

Entropy cache size contributes to the performance, so I prepared a matrix of different sizes on different variants for comparison.

Step 1024B % 2048B (current) % 3072B % 4096B % 5120B %
3 4,896,245 134% 5,327,442 146% 5,502,509 151% 5,557,077 153% 5,623,439 154%
4 13,795,762 380% 18,173,445 500% 19,831,019 546% 20,960,149 577% 21,807,676 600%
5 12,673,362 349% 16,416,650 452% 17,768,837 489% 18,796,783 517% 19,255,872 530%

Increasing the entropy cache could be considered for variants 4 and 5, as it will improve ~10% per 1KB of additional cache.

Summary

There are 3 approaches to include the improvements, depending on what is expected:

  • Minimal resources - variant 3 - it simplifies the current algorithm without any downsides and ~50% improvement
  • Maximized performance - variant 4 - it's far faster, but using crypto, even without randomUUID, will take ~2KB of memory
  • Good performance, opt-in - variant 5 -similar to 4 (slightly slower), but these additional ~2KB will be allocated only after script will use randomUUID

What are your thoughts about that?

Checklist

  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines

@nodejs-github-bot nodejs-github-bot added crypto Issues and PRs related to the crypto subsystem. v15.x labels Feb 6, 2021
@rangoo94 rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from a31a956 to d504e9a Compare February 6, 2021 00:30
@devsnek devsnek removed the v15.x label Feb 6, 2021
@devsnek devsnek requested a review from jasnell February 6, 2021 00:35
@devsnek
Copy link
Member

devsnek commented Feb 6, 2021

@rangoo94 could you please retarget this PR onto the master branch? (you can do it from the gh ui by clicking edit and then selecting a different branch)

@rangoo94 rangoo94 changed the base branch from v15.x to master February 6, 2021 00:37
@rangoo94 rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from d504e9a to 0f68e30 Compare February 6, 2021 00:40
@rangoo94
Copy link
Contributor Author

rangoo94 commented Feb 6, 2021

@devsnek, thanks, sorry, I overlooked that. I rebased it now on top of the master branch.

@rangoo94 rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from a466943 to 4bbe9fe Compare February 6, 2021 11:09
Comment on lines 303 to 318
let uuidBatch = 0;

let hexBytesCache;
function getHexBytes() {
if (hexBytesCache === undefined) {
hexBytesCache = new Array(256);
for (let i = 0; i < hexBytesCache.length; i++) {
const hex = NumberPrototypeToString(i, 16);
hexBytesCache[i] = StringPrototypePadStart(hex, 2, '0');
}
}
return hexBytesCache;
}

function serializeUUID(buf, offset = 0) {
const kHexBytes = getHexBytes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To simplifify things a bit further here ... just generate the hex array on module load...

Suggested change
let uuidBatch = 0;
let hexBytesCache;
function getHexBytes() {
if (hexBytesCache === undefined) {
hexBytesCache = new Array(256);
for (let i = 0; i < hexBytesCache.length; i++) {
const hex = NumberPrototypeToString(i, 16);
hexBytesCache[i] = StringPrototypePadStart(hex, 2, '0');
}
}
return hexBytesCache;
}
function serializeUUID(buf, offset = 0) {
const kHexBytes = getHexBytes();
let uuidBatch = 0;
const kHexBytes = new Array(256);
for (let i = 0; i < kHexBytes.length; i++) {
const hex = NumberPrototypeToString(i, 16);
kHexBytes[i] = StringPrototypePadStart(hex, 2, '0');
}
function serializeUUID(buf, offset = 0) {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I like this simplification (4th variant), but it reserves 2KB of data immediately after loading crypto. I couldn't imagine a case where it could be a problem, but - for safety - I introduced a lazy getter to avoid that.

Just to confirm, does it mean that this 2KB allocation is negligible? :)

Copy link
Contributor Author

@rangoo94 rangoo94 Feb 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the other hand - when the kHexBytes is initialized statically without the for loop, the results are ~8% better. The downside is that it costs the code space too.

Do you think that it's worth speeding it up this way instead?

const kHexBytes = [
  '00', '01', '02', '03', '04', '05', '06', '07', '08', '09', '0a', '0b', '0c',
  '0d', '0e', '0f', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19',
  '1a', '1b', '1c', '1d', '1e', '1f', '20', '21', '22', '23', '24', '25', '26',
  '27', '28', '29', '2a', '2b', '2c', '2d', '2e', '2f', '30', '31', '32', '33',
  '34', '35', '36', '37', '38', '39', '3a', '3b', '3c', '3d', '3e', '3f', '40',
  '41', '42', '43', '44', '45', '46', '47', '48', '49', '4a', '4b', '4c', '4d',
  '4e', '4f', '50', '51', '52', '53', '54', '55', '56', '57', '58', '59', '5a',
  '5b', '5c', '5d', '5e', '5f', '60', '61', '62', '63', '64', '65', '66', '67',
  '68', '69', '6a', '6b', '6c', '6d', '6e', '6f', '70', '71', '72', '73', '74',
  '75', '76', '77', '78', '79', '7a', '7b', '7c', '7d', '7e', '7f', '80', '81',
  '82', '83', '84', '85', '86', '87', '88', '89', '8a', '8b', '8c', '8d', '8e',
  '8f', '90', '91', '92', '93', '94', '95', '96', '97', '98', '99', '9a', '9b',
  '9c', '9d', '9e', '9f', 'a0', 'a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'a8',
  'a9', 'aa', 'ab', 'ac', 'ad', 'ae', 'af', 'b0', 'b1', 'b2', 'b3', 'b4', 'b5',
  'b6', 'b7', 'b8', 'b9', 'ba', 'bb', 'bc', 'bd', 'be', 'bf', 'c0', 'c1', 'c2',
  'c3', 'c4', 'c5', 'c6', 'c7', 'c8', 'c9', 'ca', 'cb', 'cc', 'cd', 'ce', 'cf',
  'd0', 'd1', 'd2', 'd3', 'd4', 'd5', 'd6', 'd7', 'd8', 'd9', 'da', 'db', 'dc',
  'dd', 'de', 'df', 'e0', 'e1', 'e2', 'e3', 'e4', 'e5', 'e6', 'e7', 'e8', 'e9',
  'ea', 'eb', 'ec', 'ed', 'ee', 'ef', 'f0', 'f1', 'f2', 'f3', 'f4', 'f5', 'f6',
  'f7', 'f8', 'f9', 'fa', 'fb', 'fc', 'fd', 'fe', 'ff'
];

@Trott Trott added the needs-benchmark-ci PR that need a benchmark CI run. label Feb 6, 2021
@Trott
Copy link
Member

Trott commented Feb 6, 2021

@devsnek
Copy link
Member

devsnek commented Feb 7, 2021

                                                     confidence improvement accuracy (*)   (**)  (***)
crypto/randomUUID.jsdisableEntropyCache=0 n=10000000        ***    286.00 %       ±5.43% ±7.25% ±9.48%
crypto/randomUUID.jsdisableEntropyCache=1 n=10000000         **      2.95 %       ±1.83% ±2.44% ±3.19%

Be aware that when doing many comparisons the risk of a false-positive
result increases. In this case, there are 2 comparisons, you can thus
expect the following amount of false-positive results:
  0.10 false positives, when considering a   5% risk acceptance (*, **, ***),
  0.02 false positives, when considering a   1% risk acceptance (**, ***),
  0.00 false positives, when considering a 0.1% risk acceptance (***)

@aduh95 aduh95 added backport-requested-v14.x and removed needs-benchmark-ci PR that need a benchmark CI run. labels Feb 7, 2021
Copy link
Contributor

@aduh95 aduh95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but I'd prefer to have approval from @nodejs/crypto before landing this.

@aduh95 aduh95 added author ready PRs that have at least one approval, no pending requests for changes, and a CI started. review wanted PRs that need reviews. labels Feb 19, 2021
@nodejs-github-bot
Copy link
Collaborator

@aduh95
Copy link
Contributor

aduh95 commented Feb 22, 2021

@rangoo94 can you please rebase on top of master to solve the git conflict?

@aduh95 aduh95 removed the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 22, 2021
@rangoo94 rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from 1874e77 to 3eb8e89 Compare February 22, 2021 18:30
@rangoo94
Copy link
Contributor Author

@aduh95 sure, rebased :)

@nodejs-github-bot
Copy link
Collaborator

@aduh95 aduh95 added the author ready PRs that have at least one approval, no pending requests for changes, and a CI started. label Feb 24, 2021
@rangoo94 rangoo94 force-pushed the improve-crypto-randomuuid-performance branch from 3eb8e89 to 61f67cf Compare February 24, 2021 18:17
@rangoo94
Copy link
Contributor Author

Rebased once again on top of master - test-asan was failing earlier for all PRs, but I see that now there is one that succeed.

@nodejs-github-bot
Copy link
Collaborator

nodejs-github-bot commented Feb 26, 2021

CI: https://ci.nodejs.org/job/node-test-pull-request/36368/
Benchmark CI: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/955/

Benchmark results:

                                                     confidence improvement accuracy (*)   (**)   (***)
crypto/randomUUID.jsdisableEntropyCache=0 n=10000000        ***    285.58 %       ±7.33% ±9.83% ±12.95%
crypto/randomUUID.jsdisableEntropyCache=1 n=10000000                -1.61 %       ±2.32% ±3.08%  ±4.01%

jasnell pushed a commit that referenced this pull request Mar 5, 2021
PR-URL: #37243
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
@jasnell
Copy link
Member

jasnell commented Mar 5, 2021

Landed in 5694f7f

@jasnell jasnell closed this Mar 5, 2021
danielleadams pushed a commit that referenced this pull request Mar 16, 2021
PR-URL: #37243
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
@lukeed
Copy link

lukeed commented Mar 25, 2021

I arrived to similar conclusions (using the same approach) a year ago: @lukeed/uuid. I ran a matrix of cache-size tests then too, and found that a "kBatchSize" of 256 was the best performer, even if it meant a bit more memory usage upfront.

The first implementation of my module actually only incremented the cache-window's by 1 (instead of 16), meaning that a shorter buffer could last longer. This was actually a suggestion from a developer who implemented the approach as Crystal library (and eventually made its way into Crystal's stdlib). While it's not as secure, perhaps this could be added as an option to avoid regenerating buffers more than is necessary?

@rangoo94
Copy link
Contributor Author

rangoo94 commented Mar 27, 2021

Hi @lukeed, thanks for the comment! The solution with incrementing offset by 1 seems interesting in terms of performance (~15% faster), but it may introduce security issues within 2 dimensions:

  • Collisions: I wasn't able to prove mathematically that it's increasing the chance of collisions, but I think that it may cause some issues, given that crypto.randomBytes will provide pseudo-random numbers
  • Uniqueness: when you will receive UUID, you may easily infer the next or previous UUID (there are up to 256 options)

While collisions could be acceptable (maybe not in the stdlib anyway), the lack of uniqueness is both very dangerous and not applicable to the standard.

As an example, if the online shop would generate http://example.shop.com/order/<uuid> link, the person who will receive it can reach personal data from both previous and next order:

  • I receive http://example.shop.com/order/27579150-9831-45f9-b3ba-9050aa37237d
  • Previous order is /[0-9a-f]{2}275791-5098-41e5-b9f3-ba9050aa3723/
  • Next order is /57915098-31e5-49f3-ba90-50aa37237d[0-9a-f]{2}/

Basically, I think that this idea is really great, but only in very specific circumstances, though should be rather done as separate library.

@lukeed
Copy link

lukeed commented Mar 27, 2021

Right, it's less secure. That's why the suggestion came with a "behind an option" requirement :) It should definitely not be the default, but there may be use cases where the developer need not be concerned with an end-user guessing new variants.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. crypto Issues and PRs related to the crypto subsystem. review wanted PRs that need reviews.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants