
Reuse pool more times #4

Closed · KrishnaPG opened this issue Nov 26, 2018 · 7 comments

KrishnaPG commented Nov 26, 2018

This is good work.

Currently, with bufIdx += n, the uuid.BUFFER_SIZE pool of random bytes gets exhausted after uuid.BUFFER_SIZE / n draws (e.g. 512 / 16 = 32), which is not much.

In practice, this can be just bufIdx++, and the pool gets used uuid.BUFFER_SIZE - n times (e.g. 512 - 16 = 496).

There is no effect on uniqueness: unless all n + 1 bytes are equal in value, bufIdx++ is as random as bufIdx += n.

This improves the mileage.

Something like buf.slice(bufIdx, bufIdx + n), modifying the condition to be (++bufIdx + n > uuid.BUFFER_SIZE).
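
For concreteness, a rough sketch of the two schemes (assuming a module-level pool refilled from crypto.randomBytes; the names and refill logic here are illustrative, not this library's exact internals):

    const crypto = require('crypto');

    const BUFFER_SIZE = 512; // pool size, as in the example above
    const n = 16;            // bytes consumed per UUID
    let buf = crypto.randomBytes(BUFFER_SIZE);
    let bufIdx = 0;

    // current scheme: disjoint windows, pool lasts BUFFER_SIZE / n draws
    function nextBytesStepN() {
      if (bufIdx + n > BUFFER_SIZE) {
        buf = crypto.randomBytes(BUFFER_SIZE);
        bufIdx = 0;
      }
      const out = buf.slice(bufIdx, bufIdx + n);
      bufIdx += n;
      return out;
    }

    // proposed scheme: overlapping windows, pool lasts ~(BUFFER_SIZE - n) draws
    function nextBytesStepOne() {
      if (++bufIdx + n > BUFFER_SIZE) {
        buf = crypto.randomBytes(BUFFER_SIZE);
        bufIdx = 0;
      }
      return buf.slice(bufIdx, bufIdx + n);
    }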

jchook (Owner) commented Nov 26, 2018

Interesting suggestion. However, this would make sequential UUIDs quite predictable. Even though the spec says not to, I've seen people use UUIDs as security tokens in the wild.

KrishnaPG (Author) commented Nov 27, 2018

The sampling starting point in a random pool of bytes should not have any effect on the outcomes. If it is becoming predictable, then there is a bug in the underlying crypto, and you should report it.

jchook (Owner) commented Nov 27, 2018

Can you explain this a little more?

I thought the bytes in the buffer were used directly to construct the UUID, so I picture sequential generation in this case as:

  1. abcd
  2. bcde
  3. cdef
  4. defg

KrishnaPG (Author) commented

Yes, except the underlying buffer itself is not sequential; it is random bytes. The chance of the Crypto (here) generating an "abcd..." sequence in the buffer is vanishingly small.

If that were the case, then bufIdx += n would not be any safer either, since:

  1. abcd....mnopqrst
  2. pqrst...xyz...

By "random", what is meant is that there is no correlation between buffer[k] and buffer[k+1], or buffer[k+n] in general. If there is, then the algorithm filling the buffer (in this case, crypto.randomBytes() etc.) is broken.

For that matter, we can reuse this buffer many more times, for example (see the sketch after this list):

  • bufIdx += n
  • bufIdx += 1
  • bufIdx += 2
  • bufIdx += 3 ... and so on; then, when it reaches the end, come back in reverse:
  • bufIdx-- and slice the buffer from bufIdx - n to bufIdx
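
A toy sketch of that forward-then-reverse walk (interpreting the last bullet as reading each window backwards; illustrative only, not proposed library code):

    const { randomBytes } = require('crypto');

    // yield n-byte samples: one forward pass (bufIdx++), then a reverse pass
    function* windows(buf, n) {
      // forward pass: overlapping windows, bufIdx += 1
      for (let i = 0; i + n <= buf.length; i++)
        yield buf.subarray(i, i + n);
      // reverse pass: walk back down, reading each window backwards this time
      for (let i = buf.length; i - n >= 0; i--)
        yield Buffer.from(buf.subarray(i - n, i)).reverse();
    }

    let count = 0;
    for (const w of windows(randomBytes(64), 16)) count++;
    console.log(count); // 2 * (64 - 16 + 1) = 98 samples from one 64-byte pool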

KrishnaPG (Author) commented Nov 27, 2018

In other words, the buffer returned by the crypto randomBytes / getRandomValues function mathematically satisfies the condition below:

Assuming the buffer size is M bytes, any combination of n bytes sampled from the buffer (either sequentially or randomly) will generate a unique value. That is, the buffer is capable of generating C(M, n) = M! / (n! * (M - n)!) unique strings of length n, out of which bufIdx += n produces M / n strings, bufIdx += 1 produces M - n strings, bufIdx += 2 produces (M - n) / 2 strings, and so on, all of which combined should form the full sample space of C(M, n) unique strings.
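
To put numbers on that with the sizes from above (M = 512, n = 16): bufIdx += n yields 512 / 16 = 32 strings and bufIdx += 1 yields 512 - 16 = 496, while C(512, 16) is roughly 10^30, which is also why the exhaustive test below is only practical for much smaller M and n.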

Based on this, you can write an ultimate uniqueness test case, such as the one below, to test this theory:

    const { randomBytes } = require('crypto');

    const n = 16; // sample length
    const buf = randomBytes(uuid.BUFFER_SIZE);

    // generate all possible n-byte combinations
    // (factorial() and getCombination() are assumed helpers; getCombination(i, buf, n)
    // returns the i-th combination of n bytes from buf as a string -- see the
    // Sequence Indexing sidenote below)
    const maxSamples = factorial(uuid.BUFFER_SIZE) / (factorial(n) * factorial(uuid.BUFFER_SIZE - n));
    const allCombinations = [];
    for (let i = 0; i < maxSamples; i++)
      allCombinations.push(getCombination(i, buf, n));

    // ensure there are no duplicates among the generated combinations
    for (let i = 0; i < maxSamples; i++)
      for (let j = 0; j < i; j++)
        if (allCombinations[i] === allCombinations[j]) throw new Error("duplicate found");

This test will succeed for any moderately good random implementation, because to generate unique ids of length n, only (M - n + 1) bytes in the buffer need to be unique; the remaining n - 1 bytes can all be of the same value.

For example, a buffer holding the bytes "aaaabcdef" is sure to generate unique strings of length 5 no matter which combination of 5 letters you take from that buffer (despite having 4 a's). (However, runs of consecutive a's, or sequential bytes such as "bcdef...", are very improbable from Crypto, especially longer ones.)

In other words, you can simplify the above test code to just check whether there are (M - n + 1) unique bytes in the generated buffer, without having to enumerate all the combinations individually. Anything less than that number is bound to give you duplicates when sampling exhaustively from the buffer.
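
A minimal sketch of that simplified check, using the small example buffer from above (this just illustrates the stated criterion; it is not library code):

    // check the claimed criterion: at least M - n + 1 distinct byte values
    const buf = Buffer.from('aaaabcdef'); // M = 9 bytes
    const n = 5;                          // sample length
    const M = buf.length;

    const distinct = new Set(buf).size;   // 6 here: a, b, c, d, e, f

    console.log(distinct >= M - n + 1 ? 'pool ok' : 'duplicates possible');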

The bigger challenge here is not local uniqueness but global / universal uniqueness. That, however, depends purely on the random bytes generated by the Crypto (one of the reasons implementations try to take the MAC address etc. into consideration while generating these random bytes). If the generated buffer is guaranteed to be globally unique, then the sampling order, bufIdx += n or bufIdx++, does not matter (as long as there are (M - n + 1) unique bytes in the generated buffer).

(Sidenote: for those interested in how to generate all the combinations and permutations, or the i-th combination in general, from the buffer, refer to my Sequence Indexing.)

jchook (Owner) commented Nov 29, 2018

Hey, exquisite explanation, and thank you kindly for that.

I did try your suggestion, and I believe everything you say about the uniqueness is correct.

However, I still believe that it makes sequential UUIDs predictable. I know this isn't a requirement of the v4 spec, but I do find unpredictability to be an expectation of users.

See the sequential UUIDs generated below using your ++bufIdx suggestion:

4b64a1ac-722f-44a5-8a9d-8fb718f9398d
64a1ac72-2f44-458a-9d8f-b718f9398de5
a1ac722f-4445-4a9d-8fb7-18f9398de536
ac722f44-454a-4d8f-b718-f9398de5368e
722f4445-4a4d-4fb7-98f9-398de5368ec3
2f44454a-4d4f-4798-b939-8de5368ec3b6
44454a4d-4f47-48b9-b98d-e5368ec3b681

Notice how they follow a predictable pattern (roughly x << 8 | random_byte): each UUID is the previous one shifted left by one byte, with one new random byte appended. If an attacker knew the previous UUID, they could guess the next by testing only 2^8 possibilities.
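
A rough sketch of that guess (illustrative only; it assumes the stamped version/variant nibbles land back in the shared pool, which the sequence above suggests):

    // re-apply the v4 version nibble and a variant nibble at their fixed positions
    function stamp(hex) {
      const v = ((parseInt(hex[16], 16) & 0x3) | 0x8).toString(16);
      return hex.slice(0, 12) + '4' + hex.slice(13, 16) + v + hex.slice(17);
    }

    // given one observed UUID, enumerate all 256 candidates for the next one
    function candidates(prevUuid) {
      const prevHex = prevUuid.replace(/-/g, '');
      const out = [];
      for (let b = 0; b < 256; b++) {
        // shift left by one byte (two hex chars) and guess the new last byte
        const hex = stamp(prevHex.slice(2) + b.toString(16).padStart(2, '0'));
        out.push(hex.replace(/^(.{8})(.{4})(.{4})(.{4})/, '$1-$2-$3-$4-'));
      }
      return out;
    }

    // true: the second UUID from the list above is among the 256 guesses
    console.log(candidates('4b64a1ac-722f-44a5-8a9d-8fb718f9398d')
      .includes('64a1ac72-2f44-458a-9d8f-b718f9398de5'));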

Edit: I will add that your suggestion more than doubles the performance, so it may be worth adding a setting for folks who don't need unpredictability.

KrishnaPG (Author) commented Nov 30, 2018

I see your point regarding the predictability. You are right: someone needs to try just 256 possibilities, so this certainly cannot be used for crypto. The example IDs illustrate the point clearly; thank you for pointing them out.

I like your idea of a flag controlling the behavior for performance. In many cases, such as database records (where an _id is generated for each record), device ids in a manufacturing batch, voter IDs, etc., predictability is not a concern but performance can be critical, and there it would be useful.
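
Something like the hypothetical sketch below (the option name is invented here for illustration; it is not an existing API of this library):

    const uuid = require('uuid-random');

    uuid();                   // default: disjoint windows (bufIdx += n), unpredictable
    uuid({ fastPool: true }); // opt-in: overlapping windows (bufIdx++), faster but guessable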
