Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Improved threading API #451

Closed
wants to merge 7 commits into from

Conversation

@ghedo
Copy link
Contributor

commented Oct 26, 2015

See mailing list discussion.

This is still work-in-progress: the Windows support is currently missing, many things still need to be converted from the old API, documentation and tests are missing, the build configurations need to be updated to enable pthreads where appropriate, etc...

To build you need to manually pass -pthread or no-threads to ./config.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 26, 2015

@levitte I pushed a new version with these changes:

  • Moved all the new functions back to crypto.h and removed threads.h
  • Fixed the license and file name headers
  • Switched the BN_MONT_CTX_set_locked() calls to use the locks in the DH and DSA objects instead of using new locks (I was already doing this for RSA)
  • Re-added the SSL_CTX locking for the generate_session_id callback (removing the lock wasn't on purpose)
  • Converted a few more locking calls

I have a question though. Looking at rsa_get_blinding(), I was wondering what the best way to convert that to CRYPTO_MUTEX (thus removing the CRYPTO_THREADID calls) would be. For example BoringSSL uses an array of blinding objects instead of making the distinction between local and non-local blindings. Otherwise we could always assume that the blinding is shared and always lock.

Regarding engines/e_chil.c and engines/e_ubsec.c, they were disabled in 29e7a56 and currently fail to build. Since the other disabled engines were removed, should these be removed as well?

@levitte

This comment has been minimized.

Copy link
Member

commented Oct 26, 2015

Regarding engines/e_chil.c and engines/e_ubsec.c, they were disabled in 29e7a56 and currently fail to build. Since the other disabled engines were removed, should these be removed as well?

I hadn't noticed they got disabled. That explains why you didn't get any errors on them. Yeah, I guess they should be removed before 1.1 is released, although I think I'd like to transfer them to another spot and see that at least e_chil builds as an independent engine.

(It might seem silly, but I'm a bit fond of e_chil, it was our first actual implementation back in 2000, and it's also the only one we have demonstrating hw-stored keys, so it does work as a reference for that)

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Oct 26, 2015

Before keeping chil, look at https://rt.openssl.org/Ticket/Display.html?id=1736 and https://rt.openssl.org/Ticket/Display.html?id=2616 1736 is a killer that prevents real thread improvements, although I recall that Sander had a fix for it.

@levitte

This comment has been minimized.

Copy link
Member

commented Oct 26, 2015

There is a fix at the end of 1736...

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 28, 2015

I just pushed another commit that converts the ERR_STATE lhash to thread-local storage (but as mentioned in #450 (comment) the built-in memleak detector conflicts with TLS).

Another thing I'd like to try is to convert the whole ERR_STRING_DATA thing to use statically initialized arrays like BoringSSL does. This would not only simplify err.c a great deal, but would also remove the need for CRYPTO_LOCK_ERR.

@levitte

This comment has been minimized.

Copy link
Member

commented Oct 28, 2015

Another thing I'd like to try is to convert the whole ERR_STRING_DATA thing to use statically initialized arrays like BoringSSL does. This would not only simplify err.c a great deal, but would also remove the need for CRYPTO_LOCK_ERR.

Not sure I'm so keen on that idea... With the current construction, dynamically loaded engine implementations can load their strings dynamically, and thereby have part in the error reported back. BoringSSL got rid of ENGINE since they don't need it for their purposes, that's a main reason they could do away with the whole ERR_STRING_DATA thing.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 28, 2015

Pushed another commit making RAND_SSLeay() use TLS as well. Basically I put all the global variables in md_rand.c into a struct and updated the functions that need them to retrieve a thread-specific instance of the struct and use that instead of the global variables. This means that all threads now seed the RNG independently.

initialized = 0;
MD_RAND *r = ssleay_rand_get();

OPENSSL_cleanse(r->state, sizeof(r->state));

This comment has been minimized.

Copy link
@kaduk

kaduk Oct 28, 2015

Contributor

So now RAND_cleanup only affects the current thread? That doesn't seem like the intended behavior.

This comment has been minimized.

Copy link
@richsalz

richsalz Oct 28, 2015

Contributor

i don't think making the rand stuf per-thread is the right way to go.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 28, 2015

Ok, so now RAND_SSLeay() uses a global statically initialized CRYPTO_MUTEX. However note that on Windows static initialization for the mutex is not possible, so a work-around is needed and it will probably make the code slower and uglier. Windows also doesn't release thread-local storage on thread exit like pthreads do, so that will need to be addressed as well.

This is starting to look more complicated than I originally thought. Damn Windows :/

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 28, 2015

Also, I don't actually have a Windows machine so I can't test any of this.

@levitte

This comment has been minimized.

Copy link
Member

commented Oct 28, 2015

I have a VirtualBox with Windows and could do some tests. I might be able to do it while traveling tomorrow...

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2015

@levitte I've been working on getting #400 into shape for merge, see #456. This would provide some automated testing for WIndows like we have now with Travis. However AppVeyor supports only Window Server 2012 so earlier versions like XP are not covered.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 29, 2015

Also, I haven't actually implemented Windows threading support yet, so there's not much to test right now.

@kaduk

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2015

Can we please let Windows XP die? It tends to require a lot of contortions to support, where even Vista is much simpler to deal with.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2015

I am in favor of dropping XP for master.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2015

Note that there are more XP desktop users out there than Linux 😄.

Are there any major software using openssl on Windows?

@kaduk

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2015

Are there any major software using openssl on Windows?

That's not the right question; we should be asking whether there are any major software using openssl on Windows and still shipping new versions for XP.

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Oct 29, 2015

Agreed.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Oct 31, 2015

So, anyway, I rebased and pushed a new version now. The RAND patch was broken because it caused a deadlock (rand_bytes() called rand_add() without unlocking first). I'm not sure if the current version is correct though, but at least it doesn't make the test suite hang.

I also implemented the CRYPTO_ONCE thing. This simplifies the md_rand.c code a bit.

@mattcaswell mattcaswell referenced this pull request Nov 20, 2015
@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Nov 23, 2015

I rebased the patches on master and the comments from @richsalz disappeared... I'm going to answer them below, let me know if I missed something:

I think it's better to have NO API if threads support is not available.

Wouldn't that require to wrap all threading calls with #ifdef's?

Our strategy these days is to xxx_new and xxx_free routines that work with opaque pointers

The difference is that CRYPTO_MUTEX_init() doesn't allocate memory, it just initializes the mutex. Should _new be used in this case as well?

I don't think this is the right approach. I think API's that "get index" (key_create) and that then get/set specific are more natural and portable, and also mimic the existing "exdata" API. It means that crypto_thread_tls_storage_index, or some less cumbersome name, must be an abstract type.

Do you mean that each API user would have a different TLS "key"? The complication would be that each user would need to initialized the key independently, but I suppose it's possible.

I'd put the lock parameter first

Will do.

and it would be nice to be able to use gcc intrinsics here if possible.

I think the main problem would be actually detecting support for that. New-ish GCC versions introduced the __atomic_* API (corresponding to the C11 API) which needs linking to libatomic on some platforms, while older versions have the __sync_* API instead.

perhaps flip these and return the OLD value. seems more common to do that.

returning the new value is what CRYPTO_add() did, and some code depends on this. I agree that the other way seems more common, but I'd rather try to reduce the number of logic changes for now, to make the patches smaller and possibly less buggy.

I think this whole approach is wrong. Add a mutex for EVERY SINGLE OBJECT seems like a waste. I'd rather have one mutext for all crypto_add operations (which then lets us use x86 interlocked add, compiler intrinsics). I would want to see proof that the contention around that single lock justifies anything finer-grain.

Someone on the mailing list also mentioned that having one lock per object should improve performance, but I haven't actually tested this yet. In any case it shouldn't really prevent us from using the atomic add (in that case the lock would just be ignored).

why is that cast necessary?

IIRC it was because the code failed to build on OS X. I don't remember the specific error though, but I can try to reproduce it.

I am not convinced about making the random state be per-thread

It's not anymore. I restored the "global" rand lock, so it should be more or less like before. My main concern about the RAND_ code is that it locks recursively in some places (e.g. rand_bytes() calls rand_add()), which is not really supported by my implementation (instead I unlock just before the function calls that may try to lock).

@kaduk

This comment has been minimized.

Copy link
Contributor

commented Nov 23, 2015

On 11/23/2015 04:57 PM, Alessandro Ghedini wrote:

I rebased the patches on master and the comments from @richsalz
https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_richsalz&d=CwMCaQ&c=96ZbZZcaMF4w0F4jpN6LZg&r=sssDLkeEEBWNIXmTsdpw8TZ3tAJx-Job4p1unc7rOhM&m=QL7hOPpzBJvrhtlNDayxkHLNWi1OjZEjHM2xvbOH4fU&s=y8UaKSa5zA1cGY_qMscd2jZMgQOllynCwLB_imun8Tk&e=
disappeared... I'm going

Comments on individual commits are tied to the individual commits; when
the pull request is updated to refer to different (rebased) commits,
those comments are orphaned and not navigable to in any reasonable fashion.

Comments on the full diff for the PR (i.e.,
https://github.com/openssl/openssl/pull/451/files) are retained in the
conversation log even if the commits involved are updated; they appear
as "kaduk commented on an outdated diff".

-Ben

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Nov 23, 2015

they appear as "kaduk commented on an outdated diff".

Yeah, but for whatever reason I don't get them in this PR...

@Manishearth

This comment has been minimized.

Copy link
Contributor

commented Nov 24, 2015

Note that rebasing doesn't necessarily make the comments disappear -- making changes to the diff chunk the comment is on does. Basically, if you change the line the comment was on, GH considers it to be "addressed".

@andersk

This comment has been minimized.

Copy link

commented Nov 24, 2015

@ghedo, they are in this PR, right here. Only comments that were made on the “Files changed” tab of the PR (not comments on the individual commits) are preserved in this way. This is why you should always make comments on the “Files changed” tab.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Nov 25, 2015

Anyway, I just pushed the initial Windows (Vista and above) implementation (I haven't really tested it though). To do that I had to drop the mutex static initializer because WIndows doesn't have that.This required some changes to the RAND patch that made me realize that md_rand.c is way more fragile than I originally thought and may not work in some cases with my patch applied. Some refactoring of that code is long overdue as per the recent discussion on the mailing list.

@nicowilliams

This comment has been minimized.

Copy link

commented Dec 3, 2015

My commentary, as posted to the openssl-dev list:

  • If support for systems older than Vista is needed, then you'll need
    an implementation of InitOnceExecuteOnce(). I can point at suitable
    implementations if need be.

    E.g.,
    https://github.com/heimdal/heimdal/blob/master/lib/base/heimbase.c#L406

    (Should be easy to massage into exactly InitOnceExecuteOnce().)

  • The thread-local API is problematic.

    The implementation uses locks only because it allows the caller of
    the setter to also pass in a destructor.

    Instead the API should be much more like the POSIX
    pthread_key_create(), pthread_getspecific(), pthread_setspecific(),
    and pthread_key_destroy().

    On Windows you should use DllMain() to process DLL_THREAD_DETACH to
    call destructors for the thread's keys.

    Then you'll be able to say "look ma', no locks here".

    For me this is critical and must be fixed as described.

  • What is the status of the old lock callback getters/setters after
    this? I don't see any changes to that code.

    Dead code should be removed, thought you should leave the old
    functions just for backwards compatibility.

    But then again, if the app provides these callbacks in a non-threaded
    configuration of OpenSSL, then they ought to be used!

  • I don't feel confident that using rw locks is a good idea for the
    first iteration.

    When I wrote the commits in the thread_safety branch of my clone
    (https://github.com/nicowilliams/openssl) I didn't feel at all
    confident that OpenSSL's use of read/write locks was correct. The
    reason is that the underlying locks are generally exclusive locks
    (when callbacks are provided), so the use of read vs write locks may
    be incorrect.

    Use of rw vs exclusive locks should be configurable for a while, and
    thourough testing should be done to ensure that rw locks will work.

    Also, exclusive locks are easier to use on Windows (as we can see in
    this PR, that's what's used on Windows).

    You might want to copy my once-initializer for the pthread case from
    the thread_safety branch of my clone. This will prove useful for
    other things even if you don't retain support for using lock
    callbacks provided by the app (which is why I needed a once-
    initializer: so I could set callbacks once and only once).

    (It was probably silly for me to try to use lock callbacks if
    provided, in retrospect I see little value for that, except in the
    non-threaded configurations of OpenSSL.)

  • I don't understand why use different function names for the locking
    functions. Can you explain?

  • I don't think critical sections are the right thing to do on Windows.

    A mutex on Windows would be better (but one needs a once-initializer
    to set those up correctly since there's no static initializer for
    mutexes on Windows).

  • I wonder if OpenSSL in non-threaded configuration could detect and
    adjust to transitions to threaded mode. This would be heroic on the
    part of OpenSSL, so I won't insist on it :)

    It would be nice, but it can wait too.

    For the dynamically-linked case you can detect threaded-ness by using
    weak symbols for pthreads, where weak symbols are supported anyways,
    and there's no need to use dlopen(). Although the switch from
    not-threaded to threaded will require care (a once initializer), and
    in particular locks that were "owned" in the not-threaded case should
    become real locks and acquired so that they stay owned if the caller
    is the main thread, otherwise if the caller is not the main thread
    and the main thread owns OpenSSL locks, then you must return an
    error.

    I'm not sure how one would do this in the statically linked case.

@nicowilliams

This comment has been minimized.

Copy link

commented Dec 4, 2015

I would say that heroics to support run-time upgrade from dynamically-linked single-threaded to multi-threaded has some value in case someone builds OpenSSL w/o pthread. But it'd be easier and, IMO, better to require pthread if building dynamically ("sorry, must have thread support").

For static link archives I think there's no easy way to provide those heroics, and that's just fine: if you're building statically-linked apps you should be building OpenSSL from scratch and you should know what you're doing.


void CRYPTO_ONCE_run(CRYPTO_ONCE *once, void (*init)(void))
{
init();

This comment has been minimized.

Copy link
@mattcaswell

mattcaswell Jan 30, 2016

Member

That doesn't seem to implement the "once" semantics I would expect?

@mattcaswell

This comment has been minimized.

Copy link
Member

commented Jan 30, 2016

Where are we with this? I'm working on some code that needs the "once" stuff...it would be good to build it off of this.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Jan 30, 2016

We're kinda almost there. I think if you want to take this on and work with @nicowilliams and @ghedo to finish it off, then I'm willing to review and plus-one that. Docs are missing and I could help with that. (It might be better to have an offline email discussion among the four of us.)

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 1, 2016

RT tickets to look at: 1743, 1736, 1878, 2216, 2591, 2824

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Feb 2, 2016

The main problem I have right now is the non-deterministic test failures (test_ecdh and test_ecdsa in particular) so some help debugging them would be helpful.

Additionally there are a few places that I'm not sure how to convert, these are:

  • crypto/rand/md_rand.c: while I did convert it, I'm not sure if the conversion is correct
  • crypto/rsa/rsa_ossl.c: the rsa_get_blinding function uses the THREADID API, so to keep the same semantics I'd need to implement something equivalent to THREADID on top of pthread/winthread, but maybe there's a better way?
  • crypto/mem_dbg.c: it uses two different locks and THREADID, but I don't quite understand the logic behind that.

Other TODOs off the top of my head:

  • Finish conversion to CRYPTO_MUTEX
  • Remove remaining THREADID uses
  • Implement thread-local storage cleanup on Windows
  • Properly implement thread-local storage for no-threads
  • Test
  • ...
@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Feb 2, 2016

So, I just added a new "thread ID" API and converted BN_BLINDING to use it. I'm going to switch back RAND_OpenSSL to use that too.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 2, 2016

There's a discussion going on about openssl rand and thread issues, so don't do a great deal of work on RAND without Matt chiming in.

@mattcaswell

This comment has been minimized.

Copy link
Member

commented Feb 2, 2016

I wouldn't want this work to be held up by the RAND stuff, so if its relatively minor tweaks then go ahead. Just don't do a ton of work in the RAND code.

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Feb 2, 2016

My previous RAND commit was quite a bit more than minor tweaks, but I removed that now because I think it was broken anyway. I'll make another attempt, and this time it should be easier since I added the new THREAD_ID API so the conversion should be more "natural".

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 2, 2016

see also RT tickets 3806, 2856,2891

@hyc

This comment has been minimized.

Copy link

commented Feb 2, 2016

If you need help with thread-local storage cleanup on Windows, here's a working model
https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#L4525

@mattcaswell

This comment has been minimized.

Copy link
Member

commented Feb 3, 2016

Just a reminder of the timeline needed for this to get accepted into 1.1.0: alpha3 is scheduled for 11th February. beta1 is scheduled for 10th March. beta1 is when we declare feature freeze, so no new features (which would include this PR) will be accepted after 10th March.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2016

do the new locks have read/read-write distinction? see https://rt.openssl.org/Ticket/Display.html?id=3388

@ghedo

This comment has been minimized.

Copy link
Contributor Author

commented Feb 6, 2016

@richsalz only the pthread implementation (which uses pthread_rwlock_t). Windows doesn't support that AFAICT.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 6, 2016

That's fine. The internal api will expose them and windows code will just have one type, I assume.

@mattcaswell

This comment has been minimized.

Copy link
Member

commented Feb 10, 2016

Can we get the first commit in this series reviewed and pushed soon? Perhaps splitting it out into a separate PR? The new init code uses pthread_once directly for now, but I'd like to move it over to using the proper API asap.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 10, 2016

What's the timetable for that? This imminent alpha release?

@mattcaswell

This comment has been minimized.

Copy link
Member

commented Feb 10, 2016

No!!!

sess = ssl->session;
CRYPTO_MUTEX_lock_write(&sess->lock);

This comment has been minimized.

Copy link
@ajmohan

ajmohan Feb 10, 2016

Contributor

The removed comment is valid. Why is the lock moved to after sess is assigned, it can be thread unsafe now.

This comment has been minimized.

Copy link
@ghedo

ghedo Feb 10, 2016

Author Contributor

Why is the lock moved to after sess is assigned

Because sess is an invalid pointer until it gets initialized..

it can be thread unsafe now

The problem really is that sess can be NULL, so that new code is wrong...

This comment has been minimized.

Copy link
@ghedo

ghedo Feb 10, 2016

Author Contributor

The proper way to do this is to lock ssl->lock, and then use CRYPTO_atomic_add on sess->references.

This comment has been minimized.

Copy link
@ajmohan

ajmohan Feb 10, 2016

Contributor

@ghedo, thanks for clarifying.

yes, as you said accessing ssl->session may be protected with ssl->lock. In SSL_set_session() also, it need to be serialised ssl->session access similarly.

@richsalz

This comment has been minimized.

Copy link
Contributor

commented Feb 14, 2016

THERE ARE MANY RT TICKETS THAT POINT HERE. SORRY FOLKS, PLEASE SEE #679 NOW :)

@ghedo ghedo closed this Feb 17, 2016
@ghedo ghedo deleted the ghedo:thread branch Feb 29, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
10 participants
You can’t perform that action at this time.