Replace C implementation of OS Random engine with Python one that just calls os.urandom #2073

glyph · 2015-06-27T05:11:03Z

This is an alternate fix for #2007 that moves file descriptor shenanigans (and C code) out of cryptography entirely. It is intended to supersede #2035.

apparently (?) ENGINE_by_id treats its ID as an opaque *pointer* key and not actually as a string, and while CPython's CFFI support seems to manage to preserve the pointer identity when using the same Python string, PyPy doesn't. Fix things to use a cffi-wrapped pointer again and tests pass on PyPy.

alex · 2015-06-27T12:25:56Z

src/_cffi_src/openssl/engine.py

+    void (*add)(const void *, int, double);
+    int (*pseudorand)(unsigned char *, int);
+    int (*status)();
+};


This can be written typedef struct { stuff } RAND_METHOD; so you don't need to know the name

alex · 2015-06-27T15:03:40Z

src/cryptography/hazmat/bindings/openssl/binding.py

+        @retain
+        @cls.ffi.callback("ENGINE_GEN_INT_FUNC_PTR", error=0)
+        def osrandom_finish(engine):
+            return 1


Actually I think finish and init can be totally dropped.

alex · 2015-06-27T23:10:22Z

src/cryptography/hazmat/bindings/openssl/binding.py

+        assert result == 1
+    looked_up_engine = lib.ENGINE_by_id(_osrandom_engine_id)
+    assert looked_up_engine != ffi.NULL
+    return 1


The return values aren't being used here.

There's a test which fails without them.

Well, we should update the test. It was written on the assumption of a C API. Now that this code is python we can do something sane, instead of C's bonkers.

"something sane" being "raise an exception on the second call"?

sounds reasonable

Any recommendations for what a good exception type would be?

RuntimeError? (Let's just get something down on paper; we can argue over that detail later.)

Works for me.

glyph · 2015-06-28T08:55:17Z

The last commit addresses @alex's last bit of feedback, so I think this is good to go now?

alex · 2015-06-28T18:56:51Z

src/cryptography/hazmat/bindings/openssl/binding.py

 import threading

 from cryptography.hazmat.bindings._openssl import ffi, lib


+@ffi.callback("int (*)(unsigned char *, int)", error=-1)


@reaperhulk do we need to call ERR_put_error on error here, or will stuff work ok without it?

Things will work fine without it, but I'm curious why we're returning -1 here in case of error? 0 is what the old code returned.

https://www.openssl.org/docs/crypto/RAND_bytes.html: -1 seems more correct than 0 (particularly for pseudobytes)

-1 is definitely the right response for pseudo (since that's a terrible response code system and we handled it that way in the past), but I'm not convinced it's the right one for RAND_bytes. It's probably not a big deal, but I do not know what OpenSSL does when you tell it the random engine doesn't support RAND_bytes as opposed to telling it that it's broken.

Every single caller of RAND_bytes in ssl/ uses <= 0 to check the status, so it seems to be immaterial.
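
A toy model of that point, assuming a caller that only tests `<= 0` (the helper name `caller_sees_failure` is made up for illustration and is not an OpenSSL function): both 0 and -1 land on the same failure path, so the choice between them does not change what such a caller does.

```python
def caller_sees_failure(rand_bytes_result):
    """Mimic a caller that checks `RAND_bytes(...) <= 0`."""
    return rand_bytes_result <= 0


# Map each possible return code to whether the caller treats it as failure.
outcomes = {code: caller_sees_failure(code) for code in (1, 0, -1)}
```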

alex · 2015-06-28T18:57:59Z

Besides that comment this LGTM, @reaperhulk what do you think?

Now that this is more easily worked with (because python), we should file some follow up issues to write tests that stub out os.urandom and verify that the results are used correctly.

reaperhulk · 2015-06-28T22:32:06Z

Since we do have the ability to do some monkeypatching in tests I'd like to see at least one test that verifies the data from os.urandom is being copied into OpenSSL-land properly. It can really just call RAND_bytes directly.
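
A rough sketch of what such a monkeypatched test could look like, using only the standard library. `fill_from_urandom` is a hypothetical stand-in for the engine callback, and a `bytearray` stands in for the OpenSSL-side buffer; a real test would call RAND_bytes through the binding and compare the cffi buffer instead.

```python
import os
from unittest import mock


def fill_from_urandom(buf, size):
    """Copy `size` bytes from os.urandom into `buf`; return 1 on success."""
    data = os.urandom(size)
    buf[:size] = data
    return 1


def check_urandom_is_copied():
    fake = b"\x01\x02\x03\x04"
    buf = bytearray(len(fake))
    # Replace os.urandom with a stub that returns known bytes.
    with mock.patch("os.urandom", return_value=fake) as stub:
        result = fill_from_urandom(buf, len(fake))
    # The stub must have been asked for exactly the requested size.
    stub.assert_called_once_with(len(fake))
    return result, bytes(buf)


result, copied = check_urandom_is_copied()
```

Because the stub returns fixed bytes, the assertion is on exact buffer contents rather than on anything statistical.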

Incidentally, I did a quick crappy test to see if there's any significant speed difference (as this method should result in at least one additional memory copy in addition to calling into Python) and a loop of 10000 fetches of 8192 bytes of randomness took about the same amount of time with both the old and new approach.

alex · 2015-06-29T02:44:52Z

@reaperhulk 8192 is a pretty big read for /dev/urandom, if you've got the script handy, could you try with 16 byte reads?

reaperhulk · 2015-06-29T13:26:01Z

100,000 16 byte reads takes ~0.6-0.7s with the C engine code, ~0.9-1.0s with this PR.

reaperhulk · 2015-06-29T13:34:45Z

Here's an even more minimal script that attempts to solely measure the cost of the RAND_bytes call (my previous code allocated a new buffer and also pulled out the data after each call)

from cryptography.hazmat.backends.openssl import backend

buf = backend._ffi.new("char[16]")
for _ in range(100000):
    res = backend._lib.RAND_bytes(buf, 16)
    assert res == 1

On my machine, running time python testrand.py (best run of 5):

This PR

real    0m0.848s
user    0m0.567s
sys 0m0.276s

Variance between 0.848s and 0.860s

Master

real    0m0.535s
user    0m0.326s
sys 0m0.205s

Variance between 0.535s and 0.554s

I don't necessarily think this is going to be a serious problem, just something to be aware of. It could conceivably negatively affect TLS handshake performance? PyOpenSSL won't be affected though because it doesn't activate this engine.
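
For scale, the per-call cost implied by the CPython timings above can be worked out directly. This is just arithmetic on the best "real" times quoted in this comment, assuming the whole difference is attributable to the 100,000 RAND_bytes calls.

```python
ITERATIONS = 100_000          # loop count in the script above
pr_seconds = 0.848            # best "real" time, this PR
master_seconds = 0.535        # best "real" time, master

# Extra wall-clock time per RAND_bytes call, in microseconds.
extra_per_call_us = (pr_seconds - master_seconds) / ITERATIONS * 1e6
# Relative slowdown of the whole loop.
slowdown_ratio = pr_seconds / master_seconds
print(f"{extra_per_call_us:.2f} µs extra per call, {slowdown_ratio:.2f}x")
```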

glyph · 2015-06-29T18:13:35Z

I cranked up the iteration count by another 10x, switched to xrange instead of range, and measured with PyPy.

best of 3:

This PR

real    0m2.193s
user    0m0.731s
sys 0m1.463s

Variance between 2.292 and 2.193

Master

real    0m1.857s
user    0m0.407s
sys 0m1.447s

Variance between 1.857 and 1.945

It looks to me like we're pretty clearly just measuring function call / buffer copy overhead for your Python runtime here, and pypy does commensurately better. How much random data does one TLS handshake require? My suspicion is that this amount of additional overhead (on pypy, approximately 3µs/call - and it is per call, not per byte, as the overhead all but disappears with fewer calls generating the same number of bytes) will verge on undetectable. Just ballpark-wise, http://www.semicomplete.com/blog/geekery/ssl-latency.html (from 2010) puts TLS handshake latency at 322000µs, so we'd have to be calling RAND_bytes 1000 times in the course of the handshake to cause a 1% degradation to performance.
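
The ballpark at the end of that paragraph, spelled out. The inputs are the figures quoted above (322000µs per handshake, roughly 3µs of extra overhead per call); nothing here is newly measured.

```python
handshake_us = 322_000     # handshake latency quoted above
overhead_per_call_us = 3   # approximate per-call overhead quoted above
budget_fraction = 0.01     # a 1% slowdown

# Number of RAND_bytes calls per handshake needed to add 1% latency.
calls_for_one_percent = handshake_us * budget_fraction / overhead_per_call_us
```

That works out to a little over a thousand calls per handshake, which is where the "1000 times" figure comes from.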

alex · 2015-06-29T22:52:19Z

Yeah, I think the performance implications of this are totally 100% fine :-)

glyph · 2015-06-30T00:25:30Z

That last commit should address @reaperhulk's feedback. Anything else?

codecov-io · 2015-06-30T00:29:10Z

Current coverage is `99.8%`

Merging #2073 into master will change coverage by 0% by cd8977b

Coverage Diff

@@            master   #2073   diff @@
======================================
  Files          112     112       
  Stmts        10632   10683    +51
  Branches      1223    1225     +2
  Methods          0       0       
======================================
+ Hit          10611   10662    +51
  Partial         21      21       
  Missed           0       0

reaperhulk · 2015-06-30T00:39:29Z

The patched test is failing on libressl.

glyph · 2015-06-30T00:44:48Z

The patched test is failing on libressl.

whyyyyyyyy

OK. Any tips on how to set up a libressl build environment that I can test with?

glyph · 2015-06-30T07:16:09Z

For those following along at home, I did a brew install libressl and then used this helpful shell function to switch OpenSSLs:

function kegger () {
    for kegtoadd in "$@"; do
        local kegpath="/usr/local/opt/$kegtoadd";
        export LDFLAGS="-L${kegpath}/lib $LDFLAGS";
        export CPPFLAGS="-I${kegpath}/include $CPPFLAGS";
        export PATH="${kegpath}/bin:$PATH";
    done;
}

glyph · 2015-06-30T07:40:04Z

OpenSSL:

int RAND_bytes(unsigned char *buf, int num)
{
    const RAND_METHOD *meth = RAND_get_rand_method();
    if (meth && meth->bytes)
        return meth->bytes(buf, num);
    return (-1);
}

LibreSSL:

/*
 * Hurray. You've made it to the good parts.
 */
int
RAND_bytes(unsigned char *buf, int num)
{
    if (num > 0)
        arc4random_buf(buf, num);
    return 1;
}

See also libressl/portable#17
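
Why the patched test fails on LibreSSL, as a toy Python model of the two C functions above. All names and byte values here are placeholders for illustration: `custom_method` stands in for the engine's bytes callback, and the fixed `\x99` bytes stand in for whatever arc4random_buf would produce.

```python
def custom_method(num):
    # Stand-in for the Python engine callback under test.
    return b"\x42" * num


def openssl_rand_bytes(method, num):
    """Model of OpenSSL: delegate to the installed RAND method."""
    if method is not None:
        return method(num)
    return None  # stands in for the C function's -1 return


def libressl_rand_bytes(method, num):
    """Model of LibreSSL: ignore the method, use the built-in generator."""
    return b"\x99" * num  # placeholder for arc4random_buf output


openssl_out = openssl_rand_bytes(custom_method, 4)
libressl_out = libressl_rand_bytes(custom_method, 4)
```

In the OpenSSL model the installed callback decides the output, so a stubbed os.urandom is observable; in the LibreSSL model it never is.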

glyph · 2015-06-30T09:30:52Z

Regarding the build failures:

the travis failure is a spurious issue with installing dependencies on the docs builder.
the codecov failure is a result of LibreSSL not being part of the test matrix; I added that code as a result of @reaperhulk's feedback, so I'm assuming LibreSSL should be in that matrix?

reaperhulk · 2015-06-30T13:58:32Z

Libre is in our jenkins cluster and historically we haven't been able to correlate coverage from that. codecov may give us that ability now, but that's going to be a separate investigation. For now I'd say you should make a new function that triggers a skip if you pass a string with "LibreSSL" in it, call that from your test, and then add one more function that tests the skip (using with pytest.raises(pytest.skip.Exception):)
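
A minimal sketch of that suggestion. To keep it runnable with only the standard library, it swaps in `unittest.SkipTest` and a plain try/except where the comment above calls for pytest's skip and `pytest.raises(pytest.skip.Exception)`. The function names and the version strings are illustrative assumptions, not taken from the PR.

```python
import unittest


def skip_if_libressl(openssl_version):
    """Raise a skip when the version string names LibreSSL."""
    if "LibreSSL" in openssl_version:
        raise unittest.SkipTest("LibreSSL overrides RAND_bytes itself")


def check_skip_is_triggered():
    """Test the skip helper itself, so it is covered without LibreSSL."""
    try:
        skip_if_libressl("LibreSSL 2.1.6")
    except unittest.SkipTest:
        return True
    return False


skipped = check_skip_is_triggered()
skip_if_libressl("OpenSSL 1.0.2c 12 Jun 2015")  # returns without raising
```

Taking the version string as a parameter is what lets the skip path be exercised on a build that links against OpenSSL.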

reaperhulk · 2015-07-01T00:18:55Z

LGTM, awesome work @glyph. We should (probably as a separate PR) update our docs around the OS random engine to note explicitly that it uses the Python os.urandom implementation (and what that entails) along with a note that LibreSSL patches random itself.

alex · 2015-07-01T01:08:18Z

Merge?

glyph · 2015-07-01T01:16:49Z

YES PLEASE

glyph · 2015-07-01T02:06:36Z

🎉 🎈 ⭐ 🌟 🌟 ⭐ ✨ 💖 👍 🆗 🎆

glyph added 9 commits June 26, 2015 21:59

deopaque a couple of things

332936d

a place for a couple of new constants to live

e55898a

compare contents and not pointers

eaed951

use new constant

73541ea

python implementation

b3d37a5

remove remaining vestiges, make adding twice work

b51d246

lint

79b291d

comply with C coding standard, for which there is no linter

add79c0

glyph mentioned this pull request Jun 27, 2015

Add Glyph to AUTHORS.rst. #2074

Merged

alex reviewed Jun 27, 2015

alex added backend cleanup security-hardening bindings labels Jun 27, 2015

alex added this to the Tenth Release milestone Jun 27, 2015

alex reviewed Jun 27, 2015

glyph added 5 commits June 27, 2015 15:13

don't need the intermediary 'struct' declaration.

28e7d80

move everything to module scope; much simpler that way

e03e9aa

also retain method with a global reference

dd53a5b

bind ERR_clear_error

885d688

clear the error queue

c1d0446

make sure we're not in an error state when we start, because then all bets are off and we might consume an error we didn't cause. then clear the error queue, which restores the behavior of the way the C module was previously checking for existence of its engine.

alex reviewed Jun 27, 2015

the assertier the merrier

b7c6aaf

alex reviewed Jun 28, 2015

the output of RAND_bytes is os.urandom's result

7c3e7a8

glyph added 2 commits June 30, 2015 01:46

Detect and ignore LibreSSL.

14e67ac

pep8

fa40f9f

test libressl when there is no libressl

b18fc39

reaperhulk added a commit that referenced this pull request Jul 1, 2015

Merge pull request #2073 from glyph/no-c-random

0a4c9cc

Replace C implementation of OS Random engine with Python one that just calls os.urandom

reaperhulk merged commit 0a4c9cc into pyca:master Jul 1, 2015

reaperhulk mentioned this pull request Jul 1, 2015

osrandom_engine should validate the fd's st_dev and st_ino #2007

Closed

github-actions bot locked as resolved and limited conversation to collaborators Sep 9, 2020
