[WIP] Easy User-land CSPRNG #1119

Closed
wants to merge 23 commits into
base: master
from

Conversation

@SammyK
Contributor

SammyK commented Feb 24, 2015

Submitting as a PR per @nikic's recommendation.

RFC can be found here.

@lt

lt commented on ext/standard/random.c in f8d7aec Feb 14, 2015

I wonder about the usefulness of this function. This can be easily achieved with random_bytes and bin2hex, so I feel like this is unnecessary duplication of functionality for the sake of one function call.

Owner

SammyK replied Feb 16, 2015

Yeah, I wouldn't put up too much of a fight on this one, but 100% of the times I've ever needed random bytes it has always been in the form of hex or int. I'm thinking most user-land kids are in the same boat. :)

I noticed that the hex table for bin2hex() is just a char array of 16, as opposed to the larger base64_encode() or session hex tables. Would the output be considered more random with a bigger table?

Owner

SammyK replied Feb 17, 2015

So I've been doing a bunch of research on binary to hex conversions and learned a lot. Since hex (base 16) is just a different representation of the same data in binary (base 2), it wouldn't necessarily add more randomness to convert the data to Base32 or Base64. But at the end of the day, the common user-land PHP developer won't know about these things in too much detail (as I didn't before tonight!). What the user-land dev is thinking is, "I need a random alpha-numeric string", so hex should suffice.

I've also been researching how other languages like Ruby, Go and Python do random. They all look for /dev/urandom, or CryptGenRandom on Windows.

BTW: I've seen most implementations of a unique ID in other languages make use of the CSPRNG like in Python. I think we should do the same with uniqid(). :)

lt replied Feb 17, 2015

Good job doing the research (I feel almost proud! :)).

You're absolutely right, hex or base64 doesn't add anything. In fact, byte for byte, it takes away entropy, and we're not talking about small amounts here.

So let's take hex for example. Conversion to hex literally takes an 8 bit representation of data, and turns it into a 4 bit representation of data (used as an index in a character table). So how much entropy do we really lose? Maths time (I'm thinking free-form here, and I'm not a mathematician, but hopefully this is along the right lines!)

The entropy in a base-encoded string compared to the raw binary is: 1 / 2^((8 - bits_per_byte) * nbytes). Just to test, plugging in the full 8 bits per byte we get 1 / 2^0 which is 1/1, full entropy.

So for example a 16 byte (128 bit) piece of keying material, keeping the input as hex rather than full binary, the key is 1 / 2^(4 * 16) as strong. 1 / 2^64.

2^64 is a reasonably familiar number, being the number of combinations in a 64-bit unsigned integer. I can't remember it in full, but I can remember it's roughly 18.5 quintillion. A hex encoded string, for 16 bytes, is 18.5 quintillion times weaker than the full binary string. Incidentally, that is the same number of key combinations there are (2^128 / 2^64 == 2^64).

It's still going to take a long time to crack, but I have no idea how this affects attacks on ciphers that break X of so many rounds, if you can guarantee specific weaknesses in the key.
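
To make the arithmetic above concrete, here is a minimal C sketch of the same formula. The function names are made up for illustration and are not part of the PR. The second helper returns the exponent rather than the full power so the result fits in a size_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bits of entropy in nbytes of key material when
 * each byte carries only bits_per_byte random bits (4 for a hex digit). */
static size_t key_entropy_bits(size_t nbytes, unsigned bits_per_byte)
{
    assert(bits_per_byte <= 8);
    return nbytes * bits_per_byte;
}

/* Hypothetical helper: the x for which the keyspace is 2^x times smaller
 * than fully random bytes, i.e. x = (8 - bits_per_byte) * nbytes. */
static size_t keyspace_loss_log2(size_t nbytes, unsigned bits_per_byte)
{
    assert(bits_per_byte <= 8);
    return (size_t)(8 - bits_per_byte) * nbytes;
}
```

For a 16-byte key made of hex digits, key_entropy_bits(16, 4) and keyspace_loss_log2(16, 4) both come out as 64, which matches the 2^128 / 2^64 == 2^64 figure.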

lt replied Feb 17, 2015

I think migrating other parts of PHP to use a CSPRNG should be a separate PR.

uniqid() and session_create_id() are obvious candidates.

Owner

SammyK replied Feb 17, 2015

Oh man! Math! I've got more research to do. :)

Yes, I agree with you on creating separate PRs for uniqid() and session_create_id(). Hopefully those will be easy to knock out after this one.

I should have a little bit more time on Thursday to convert some of this research into commits on this bad boy. :)

@lt

lt commented on ext/standard/random.c in f8d7aec Feb 14, 2015

Edit: - Hah, I see you already did this in your second commit :)

With your includes at the top:

#if PHP_WIN32
#include "win32/winutil.h"
#endif

Then use php_win32_get_random_bytes(unsigned char *buf, size_t len)
This already uses the CryptGenRandom windows API call, so we're all good here.

@lt

lt commented on ext/standard/random.c in 26e4ed2 Feb 14, 2015

This should go back to being zend_string and use bytes->val as your char *.
zend_string_alloc to create it and zend_string_release if you hit an error condition. Don't release and use RETURN_STR if you have success. This saves having to copy the string at the end of the function with RETVAL_STRINGL

@lt

lt commented on ext/standard/random.c in 26e4ed2 Feb 14, 2015

We need to be careful about uniformity in this function. If you use modulus to bring the number down to your upper bound and the divisor is not a power of 2, you can end up with some bias in the result which we obviously want to avoid with a CSPRNG.

I'll help with the implementation of fixing this.
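
A minimal sketch of one common fix, rejection sampling, assuming a 64-bit source of random words. This is not necessarily what the PR ends up doing. uniform_below, random_u64_fn and the replay source are hypothetical names, and the replay source exists only so the rejection path can be exercised deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical source of raw random words; a real build would wrap the
 * platform CSPRNG here. */
typedef uint64_t (*random_u64_fn)(void *state);

/* Returns a uniformly distributed value in [0, n) for n >= 1. */
static uint64_t uniform_below(uint64_t n, random_u64_fn next, void *state)
{
    uint64_t limit;
    uint64_t r;

    assert(n >= 1);
    r = next(state);

    /* Powers of two divide 2^64 evenly, so masking is already unbiased. */
    if ((n & (n - 1)) == 0) {
        return r & (n - 1);
    }

    /* Accept only 0..limit, a range whose size is a multiple of n. */
    limit = UINT64_MAX - (UINT64_MAX % n) - 1;
    while (r > limit) {
        r = next(state); /* discard the biased tail and redraw */
    }
    return r % n;
}

/* Demo source, not random: replays a fixed array of words. */
struct replay {
    const uint64_t *words;
    size_t pos;
};

static uint64_t replay_next(void *state)
{
    struct replay *s = state;
    return s->words[s->pos++];
}
```

With the replayed words UINT64_MAX and 7 and n = 3, the first word is above the limit and is discarded, and the second gives 7 % 3 = 1. With n = 8 the first word is masked directly and nothing is discarded.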

Owner

SammyK replied Feb 20, 2015

Should we remove the min option and just have max? None of the PRNGs in other languages I've reviewed have a min. Also, in Ruby the default functionality returns a float. Don't know if we want to explore that path as well.

lt replied Feb 20, 2015

Remove min 👍
Return float 👎

Owner

SammyK replied Feb 20, 2015

Sounds good!

@lt

lt commented on ext/standard/random.c in 26e4ed2 Feb 14, 2015

There are a few more options to investigate here. /dev/urandom is going to be "ok", but I also think it should be the last option. Where possible we should try and avoid opening file descriptors.

Off the top of my head, things to investigate (I can do all of this if you like):

  • What would be the security impact of having a user space arc4random implementation for Linux?
  • Where arc4random exists, and libc is shared amongst all processes, is the arc4random state shared or per process?
  • Attempt to use /dev/arandom if it exists and for some odd reason we don't have arc4random
  • Linux getrandom syscall (for kernel versions >= 3.17)
@lt

lt commented on ext/standard/random.c in 26e4ed2 Feb 14, 2015

To get a random number to work on do php_random_bytes(&number, sizeof(number));, so whatever source of random is picked as being the best for the platform is used.

@lt

lt commented on ext/standard/random.c in 7ef5754 Feb 24, 2015

@Rican7 - Documentation issue in the RFC. The value in the implementation here is correct. I've let Sammy know.

Owner

SammyK replied Feb 24, 2015

Fixed in the RFC.

Rican7 replied Feb 24, 2015

👍

@lt

lt commented on ext/standard/random.c in 7ef5754 Feb 24, 2015

@SammyK I derped in the error message, minimum definitely should not be greater than the maximum :)

lt replied Feb 24, 2015

Perhaps we should make it so both min and max need to be specified to be valid? Doesn't really feel sensible to allow the min without a max. random_int(0) => 123456 // wat!

if (ZEND_NUM_ARGS() == 1) {
    php_error_docref(NULL, E_WARNING, "Error message of your choosing");
    RETURN_FALSE;
}

Owner

SammyK replied Feb 24, 2015

Lol. Nice catch. :) I'll fix.

@lt

lt commented on ext/standard/random.c in 7ef5754 Feb 24, 2015

This needs min putting back in.

if (php_random_bytes(bytes->val, size) == FAILURE) {
    zend_string_release(bytes);
    return;

@nikic

nikic Feb 24, 2015

Member

Probably that should be RETURN_FALSE as well? Same in random_int.

@SammyK

SammyK Feb 24, 2015

Contributor

Nice catch! I'll fix.

static int php_random_bytes(void *bytes, size_t size)
{
    int n = 0;

@nikic

nikic Feb 24, 2015

Member

Better would be ssize_t.

@nikic

nikic Feb 24, 2015

Member

The declaration should also be moved into the while loop, otherwise this will likely generate a warning on BSD systems with arc4random.
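
For illustration, a read loop along these lines might look like the sketch below. It assumes a POSIX system with /dev/urandom and is not the code from the PR; fill_from_urandom is a hypothetical name. read() returns ssize_t, and n is declared at the top of the loop body, in the same statement that assigns it.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Fills buf with len bytes from /dev/urandom.
 * Returns 0 on success and -1 on failure. */
static int fill_from_urandom(void *buf, size_t len)
{
    unsigned char *out = buf;
    size_t have = 0;
    int fd;

    assert(buf != NULL || len == 0);

    fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) {
        return -1;
    }

    while (have < len) {
        /* ssize_t matches read()'s return type; int could truncate it. */
        ssize_t n = read(fd, out + have, len - have);

        if (n < 0 && errno == EINTR) {
            continue; /* interrupted by a signal: retry */
        }
        if (n <= 0) {
            close(fd); /* hard error or unexpected end of file */
            return -1;
        }
        have += (size_t)n; /* short reads are fine, keep going */
    }

    close(fd);
    return 0;
}
```

This sketch opens and closes the descriptor on every call to stay self-contained. The RANDOM_G(fd) assignment quoted later in this thread suggests the PR keeps the descriptor in a global instead.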

zend_ulong result;
if (ZEND_NUM_ARGS() == 1) {
    php_error_docref(NULL, E_WARNING, "A minimum and maximum value are expected, only minimum given");

@laruence

laruence Feb 25, 2015

Member

Since you already have LONG_MIN and LONG_MAX, if only the minimum value min is given, why not simply assume it means min to LONG_MAX?

@patrickallaert

patrickallaert Mar 19, 2015

Contributor

@laruence: I thought about the same thing but then I realized that reading:
random_int(10);
might be understood as 10 being the max.

@laruence

laruence Mar 19, 2015

Member

hmm, that could be a doc issue.. but it's not a big deal :)

@SammyK

SammyK Mar 20, 2015

Contributor

Oops - this is a dingleberry left over from when min and max were optional args. They are both required in the current spec that's being voted on. I'll remove this check. :)

}
if (min >= max) {
    php_error_docref(NULL, E_WARNING, "Minimum value must be less than the maximum value");

@laruence

laruence Feb 25, 2015

Member

Maybe something like "1st arg must be less than 2nd arg" is more obvious for the user?

@lt

Contributor

lt commented Mar 30, 2015

@sarciszewski This is something we have discussed, but decided to leave for future scope. getrandom (and getentropy on BSD) are typically used to seed RNGs, and are not designed for high throughput themselves.

If you compile against LibreSSL and have a Linux kernel version >= 3.17, you will be using getrandom today.

@ircmaxell

Contributor

ircmaxell commented Apr 9, 2015

The vote passed, but was never closed. This should be merged soon ;-)

RANDOM_G(fd) = fd;
}
ssize_t n = 0;

@nikic

nikic Apr 9, 2015

Member

This needs to be moved into the while loop (combined declaration and assignment) to conform with C89.

@SammyK

Contributor

SammyK commented Apr 9, 2015

Thanks for the push @ircmaxell! :) I'm coordinating with @lt to get this bad boy ready for merge. :)

Return an arbitrary pseudo-random integer */
PHP_FUNCTION(random_int)
{
    zend_long min = ZEND_LONG_MIN;

@lt

lt Apr 9, 2015

Contributor

@SammyK Since both parameters are required, we don't need these defaults any more.

@SammyK

Contributor

SammyK commented Apr 10, 2015

Hokay! @lt updated this PR based on feedback. I normalized the return values when errors happen, updated the tests, merged in the latest from master, and fixed conflicts. Phew!

The RFC officially passed unanimously and this PR can be merged into master now. W00t! Not sure if @nikic is the one to do that or who we need to ping.

Thanks again @ircmaxell, @lt, @rdlowrey, and all the other kids who helped me along on this one! 🐻 Let's do some more of these! :D

@nikic

nikic commented on f8a6d38 Apr 29, 2015

Zpp failures should always return null (what was implemented initially), unless there is some very strong reason to do otherwise. We usually only make exceptions to this for legacy functions.

@nikic nikic referenced this pull request May 9, 2015

Closed

Userland CSPRNG #1268

@nikic

Member

nikic commented May 9, 2015

Merged via #1268. Thanks everyone who worked on this!

@nikic nikic closed this May 9, 2015

@SammyK

Contributor

SammyK commented May 11, 2015

Yay! Thanks @nikic! :)

@SammyK SammyK deleted the SammyK:rand-bytes branch May 11, 2015

@CodesInChaos

CodesInChaos commented Jul 6, 2015

  1. The documentation for random_int could use some improvements:
    • Clarify the behaviour if min<=max is violated
    • Specify if the bounds are inclusive or exclusive (looking at rand and mt_rand it's probably inclusive)
    • It should mention that the distribution is uniform
  2. Why does it return FALSE, which can easily be treated as 0, instead of something you can't ignore, like an exception, when the RNG is unavailable?
  3. What about a secure polyfill?
@SammyK

Contributor

SammyK commented Jul 6, 2015

> Why does it return FALSE, which can easily be treated as 0, instead of something you can't ignore, like an exception, when the RNG is unavailable?

Good point. I think a RuntimeException should be thrown when a proper source of random cannot be detected. That way when errors are turned off we won't have people using false/0 as their CSPRNG. :) Thoughts @lt?

@SammyK

Contributor

SammyK commented Jul 6, 2015

@lt: I just pushed up a branch with this change, can you take a look? :)

@CodesInChaos

CodesInChaos commented Jul 7, 2015

@SammyK What should happen if min > max? Also an exception?

@lt

Contributor

lt commented Jul 7, 2015

I think when we wrote this, exceptions in the engine hadn't yet passed.

I'd be happy with InvalidArgumentException for min>max and len<=0

@CodesInChaos

CodesInChaos commented Jul 7, 2015

Should len === 0 be considered invalid? I'd consider it valid and only throw an exception if len < 0.

@tom--

tom-- commented Jul 7, 2015

When someone tries to generate an empty string using the CSPRNG, I think it's likely a programming error (if it were my project, I'd define it as a programming error). So I think it is safer if PHP treats this case as invalid.

And, from the other point of view, I think PHP can reasonably reject bug reports that you can't generate empty strings with the CSPRNG, especially if the conditions for these exceptions are documented.

@lt

Contributor

lt commented Jul 7, 2015

If it's not a programming error, it's a stupid thing to do. Invalid all the way.

@CodesInChaos

CodesInChaos commented Jul 7, 2015

At least min == max on random_int shouldn't be an error, since that can be useful in practice. For example when picking the last card in a shuffling algorithm.
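
The shuffling case can be sketched as a Fisher-Yates loop. This is an illustration only. shuffle_ints, draw_upto_fn and pick_first are hypothetical names, draw_upto_fn stands in for a uniform draw from [0, max_inclusive] (the role random_int(0, max) would play), and pick_first is a non-random stub used to trace the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical draw: uniform value in [0, max_inclusive]. */
typedef uint64_t (*draw_upto_fn)(uint64_t max_inclusive, void *state);

/* Fisher-Yates shuffle of items[0..count-1], in place. */
static void shuffle_ints(int *items, size_t count, draw_upto_fn draw, void *state)
{
    size_t i;

    assert(items != NULL || count == 0);
    for (i = count; i > 0; i--) {
        /* Choose among the i items not yet placed. On the final pass
         * i - 1 is 0, so the requested range is [0, 0]: min == max. */
        size_t j = (size_t)draw((uint64_t)(i - 1), state);
        int tmp = items[i - 1];

        items[i - 1] = items[j];
        items[j] = tmp;
    }
}

/* Demo stub, not random: records the bound it was asked for, returns 0. */
static uint64_t pick_first(uint64_t max_inclusive, void *state)
{
    *(uint64_t *)state = max_inclusive;
    return 0;
}
```

If a single-value range were an error, the loop would need a special case for its last iteration. Stopping at i > 1 avoids the call entirely, at the cost of making the caller think about it.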

@tom--

tom-- commented Jul 7, 2015

@CodesInChaos I agree.

@lt

Contributor

lt commented Jul 7, 2015

I'll disagree with "practically useful". But I'll agree that it should be allowed, in the "spirit of PHP".

If we're going to allow min == max it should be special cased, no point fetching from the rng for it.

    if (min >= max) {
        if (min == max) {
            RETURN_LONG(min);
        }

        zend_throw_exception(spl_ce_InvalidArgumentException, "Maximum value must not be less than the minimum value", 0);
        return;

    }

@SammyK perhaps do something like this for SPL exceptions:

#ifdef HAVE_SPL
So it can fall back to a less specific exception if SPL is not included.
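
Outside the Zend macros, the same range handling can be sketched in plain C. This is an assumption-laden illustration rather than the PR's code: random_in_range, draw_upto_fn and count_and_return_max are hypothetical names, errors are reported through a return code instead of an exception, and the stub is not random.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical draw: uniform value in [0, max_inclusive]. */
typedef uint64_t (*draw_upto_fn)(uint64_t max_inclusive, void *state);

/* Stores a value from [min, max] in *out and returns 0,
 * or returns -1 without touching *out when min > max. */
static int random_in_range(int64_t min, int64_t max,
                           draw_upto_fn draw, void *state, int64_t *out)
{
    uint64_t span;

    assert(out != NULL);
    if (min > max) {
        return -1; /* the case the thread wants reported as an error */
    }
    if (min == max) {
        *out = min; /* single-value range: never touches the RNG */
        return 0;
    }

    /* Unsigned subtraction gives max - min without signed overflow. */
    span = (uint64_t)max - (uint64_t)min;

    /* Unsigned addition wraps back into [min, max]; the final cast
     * assumes a two's complement int64_t. */
    *out = (int64_t)((uint64_t)min + draw(span, state));
    return 0;
}

/* Demo stub, not random: counts calls and returns the upper bound. */
static uint64_t count_and_return_max(uint64_t max_inclusive, void *state)
{
    ++*(int *)state;
    return max_inclusive;
}
```

Doing the subtraction in unsigned arithmetic matters for the widest range: with min = INT64_MIN and max = INT64_MAX the span is UINT64_MAX, which a signed max - min could not represent.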

@nikic

Member

nikic commented Jul 7, 2015

Please do the change to exceptions separately from changing any error conditions.

@SammyK

Contributor

SammyK commented Jul 7, 2015

Sounds good. I'll submit 2 separate PRs: one for throwing general exceptions and one for new error conditions. Then I'll update the docs when they are merged and post here for another once-over to make sure the docs cover all the bases. :)
