
Reuse pool more times #4

Closed · KrishnaPG opened this issue Nov 26, 2018 · 7 comments

KrishnaPG commented Nov 26, 2018

This is good work.

Currently, with bufIdx += n, the uuid.BUFFER_SIZE pool of random bytes gets exhausted after uuid.BUFFER_SIZE / n draws (e.g. 512 / 16 = 32), which is not much.

In practice, this can be just bufIdx++, and the pool gets used uuid.BUFFER_SIZE - n times (e.g. 512 - 16 = 496).

There is no effect on uniqueness: unless all n + 1 bytes are equal in value, bufIdx++ is as random as bufIdx += n.

This improves the mileage.

Something like buf.slice(bufIdx, bufIdx + n), modifying the condition to be (++bufIdx + n > uuid.BUFFER_SIZE).
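
For concreteness, a rough sketch of the two schemes (assuming a module-level pool refilled from crypto.randomBytes; the names and refill logic here are illustrative, not this library's exact internals):

    const crypto = require('crypto');

    const BUFFER_SIZE = 512; // pool size, as in the example above
    const n = 16;            // bytes consumed per UUID
    let buf = crypto.randomBytes(BUFFER_SIZE);
    let bufIdx = 0;

    // current scheme: disjoint windows, pool lasts BUFFER_SIZE / n draws
    function nextBytesStepN() {
      if (bufIdx + n > BUFFER_SIZE) {
        buf = crypto.randomBytes(BUFFER_SIZE);
        bufIdx = 0;
      }
      const out = buf.slice(bufIdx, bufIdx + n);
      bufIdx += n;
      return out;
    }

    // proposed scheme: overlapping windows, pool lasts ~(BUFFER_SIZE - n) draws
    function nextBytesStepOne() {
      if (++bufIdx + n > BUFFER_SIZE) {
        buf = crypto.randomBytes(BUFFER_SIZE);
        bufIdx = 0;
      }
      return buf.slice(bufIdx, bufIdx + n);
    }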

jchook (Owner) commented Nov 26, 2018

Interesting suggestion. However, this would make sequential UUIDs quite predictable. Even though the spec says not to, I've seen people use UUIDs as security tokens in the wild.

KrishnaPG (Author) commented Nov 27, 2018

The sampling starting point in a random pool of bytes should not have any effect on the outcomes. If it is becoming predictable, then there is a bug in the underlying crypto, and you should report it.

jchook (Owner) commented Nov 27, 2018

Can you explain this a little more?

I thought the bytes in the buffer were used directly to construct the UUID, so I picture sequential generation in this case as:

  1. abcd
  2. bcde
  3. cdef
  4. defg

KrishnaPG (Author) commented

Yes, except the underlying buffer itself is not sequential; it is random bytes. The chance of the Crypto (here) generating an "abcd..." sequence in the buffer is vanishingly small.

If that were the case, then bufIdx += n would not be any safer either, since:

  1. abcd....mnopqrst
  2. pqrst...xyz...

By "random", what is meant is that there is no correlation between buffer[k] and buffer[k+1], or buffer[k+n] in general. If there is, then the algorithm filling the buffer (in this case, crypto.randomBytes() etc.) is broken.

For that matter, we can reuse this buffer many more times, for example (see the sketch after this list):

  • bufIdx += n
  • bufIdx += 1
  • bufIdx += 2
  • bufIdx += 3 ... and so on; then, when it reaches the end, come back in reverse:
  • bufIdx-- and slice the buffer from bufIdx - n to bufIdx
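
A toy sketch of that forward-then-reverse walk (interpreting the last bullet as reading each window backwards; illustrative only, not proposed library code):

    const { randomBytes } = require('crypto');

    // yield n-byte samples: one forward pass (bufIdx++), then a reverse pass
    function* windows(buf, n) {
      // forward pass: overlapping windows, bufIdx += 1
      for (let i = 0; i + n <= buf.length; i++)
        yield buf.subarray(i, i + n);
      // reverse pass: walk back down, reading each window backwards this time
      for (let i = buf.length; i - n >= 0; i--)
        yield Buffer.from(buf.subarray(i - n, i)).reverse();
    }

    let count = 0;
    for (const w of windows(randomBytes(64), 16)) count++;
    console.log(count); // 2 * (64 - 16 + 1) = 98 samples from one 64-byte pool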

KrishnaPG (Author) commented Nov 27, 2018

In other words, the buffer returned by the crypto randomBytes / getRandomValues function mathematically satisfies the condition below:

Assuming the buffer size is M bytes, any combination of n bytes sampled from the buffer (either sequentially or randomly) will generate a unique value. That is, the buffer is capable of generating C(M, n) = M! / (n! * (M - n)!) unique strings of length n, out of which bufIdx += n produces M / n strings, bufIdx += 1 produces M - n strings, bufIdx += 2 produces (M - n) / 2 strings, and so on, all of which combined should form the full sample space of C(M, n) unique strings.
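
To put numbers on that with the sizes from above (M = 512, n = 16): bufIdx += n yields 512 / 16 = 32 strings and bufIdx += 1 yields 512 - 16 = 496, while C(512, 16) is roughly 10^30, which is also why the exhaustive test below is only practical for much smaller M and n.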

Based on this, you can write an ultimate uniqueness test case, such as the one below, to test this theory:

    const { randomBytes } = require('crypto');

    const n = 16; // sample length
    const buf = randomBytes(uuid.BUFFER_SIZE);

    // generate all possible n-byte combinations
    // (factorial() and getCombination() are assumed helpers; getCombination(i, buf, n)
    // returns the i-th combination of n bytes from buf as a string -- see the
    // Sequence Indexing sidenote below)
    const maxSamples = factorial(uuid.BUFFER_SIZE) / (factorial(n) * factorial(uuid.BUFFER_SIZE - n));
    const allCombinations = [];
    for (let i = 0; i < maxSamples; i++)
      allCombinations.push(getCombination(i, buf, n));

    // ensure there are no duplicates among the generated combinations
    for (let i = 0; i < maxSamples; i++)
      for (let j = 0; j < i; j++)
        if (allCombinations[i] === allCombinations[j]) throw new Error("duplicate found");

This test will succeed for any moderately good random implementation, because to generate unique ids of length n, only (M - n + 1) bytes in the buffer need to be unique; the remaining n - 1 bytes can all be of the same value.

For example, a buffer holding the bytes "aaaabcdef" is sure to generate unique strings of length 5 no matter which combination of 5 letters you take from that buffer (despite having 4 a's). (However, runs of consecutive a's, or sequential bytes such as "bcdef...", are very improbable from Crypto, especially longer ones.)

In other words, you can simplify the above test code to just check whether there are (M - n + 1) unique bytes in the generated buffer, without having to enumerate all the combinations individually. Anything less than that number is bound to give you duplicates when sampling exhaustively from the buffer.
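
A minimal sketch of that simplified check, using the small example buffer from above (this just illustrates the stated criterion; it is not library code):

    // check the claimed criterion: at least M - n + 1 distinct byte values
    const buf = Buffer.from('aaaabcdef'); // M = 9 bytes
    const n = 5;                          // sample length
    const M = buf.length;

    const distinct = new Set(buf).size;   // 6 here: a, b, c, d, e, f

    console.log(distinct >= M - n + 1 ? 'pool ok' : 'duplicates possible');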

The bigger challenge here is not local uniqueness but global / universal uniqueness. That, however, depends purely on the random bytes generated by the Crypto (one of the reasons implementations try to take the MAC address etc. into consideration while generating these random bytes). If the generated buffer is guaranteed to be globally unique, then the sampling order, bufIdx += n or bufIdx++, does not matter (as long as there are (M - n + 1) unique bytes in the generated buffer).

(Sidenote: for those interested in how to generate all the combinations and permutations, or the i-th combination in general, from the buffer, refer to my Sequence Indexing.)

jchook (Owner) commented Nov 29, 2018

Hey, exquisite explanation, and thank you kindly for that.

I did try your suggestion, and I believe everything you say about the uniqueness is correct.

However, I still believe that it makes sequential UUIDs predictable. I know this isn't a requirement of the v4 spec, but I do find unpredictability to be an expectation of users.

See the sequential UUIDs generated below using your ++bufIdx suggestion:

4b64a1ac-722f-44a5-8a9d-8fb718f9398d
64a1ac72-2f44-458a-9d8f-b718f9398de5
a1ac722f-4445-4a9d-8fb7-18f9398de536
ac722f44-454a-4d8f-b718-f9398de5368e
722f4445-4a4d-4fb7-98f9-398de5368ec3
2f44454a-4d4f-4798-b939-8de5368ec3b6
44454a4d-4f47-48b9-b98d-e5368ec3b681

Notice how they follow a predictable pattern (roughly x << 8 | random_byte): each UUID is the previous one shifted left by one byte, with one new random byte appended. If an attacker knew the previous UUID, they could guess the next by testing only 2^8 possibilities.
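
A rough sketch of that guess (illustrative only; it assumes the stamped version/variant nibbles land back in the shared pool, which the sequence above suggests):

    // re-apply the v4 version nibble and a variant nibble at their fixed positions
    function stamp(hex) {
      const v = ((parseInt(hex[16], 16) & 0x3) | 0x8).toString(16);
      return hex.slice(0, 12) + '4' + hex.slice(13, 16) + v + hex.slice(17);
    }

    // given one observed UUID, enumerate all 256 candidates for the next one
    function candidates(prevUuid) {
      const prevHex = prevUuid.replace(/-/g, '');
      const out = [];
      for (let b = 0; b < 256; b++) {
        // shift left by one byte (two hex chars) and guess the new last byte
        const hex = stamp(prevHex.slice(2) + b.toString(16).padStart(2, '0'));
        out.push(hex.replace(/^(.{8})(.{4})(.{4})(.{4})/, '$1-$2-$3-$4-'));
      }
      return out;
    }

    // true: the second UUID from the list above is among the 256 guesses
    console.log(candidates('4b64a1ac-722f-44a5-8a9d-8fb718f9398d')
      .includes('64a1ac72-2f44-458a-9d8f-b718f9398de5'));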

Edit: I will add that your suggestion more than doubles the performance, so it may be worth adding a setting for folks who don't need unpredictability.

KrishnaPG (Author) commented Nov 30, 2018

I see your point regarding the predictability. You are right: someone needs to try just 256 possibilities, so this certainly cannot be used for crypto. The example IDs illustrate the point clearly; thank you for pointing them out.

I like your idea of a flag controlling the behavior for performance. In many cases, such as database records (where an _id is generated for each record), device ids in a manufacturing batch, voter IDs, etc., predictability is not a concern but performance can be critical, and there it would be useful.
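
Something like the hypothetical sketch below (the option name is invented here for illustration; it is not an existing API of this library):

    const uuid = require('uuid-random');

    uuid();                   // default: disjoint windows (bufIdx += n), unpredictable
    uuid({ fastPool: true }); // opt-in: overlapping windows (bufIdx++), faster but guessable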
