Address concurrency #11

Closed
3noch opened this Issue Sep 2, 2014 · 42 comments

Projects

None yet

4 participants

@3noch
3noch commented Sep 2, 2014

It's unclear from the documentation whether these bindings support fully concurrent usage of the libmysqlclient APIs. Some limited testing (and the unsafe designation on many of the FFIs) is leading me to believe that these bindings do not actually support multiple, simultaneous connections/queries. Either way, I think the docs need to reflect the capabilities. If concurrency is not supported, then let's get it up and running!

@3noch
3noch commented Sep 2, 2014

BTW, I am highly interested in supporting concurrency with MySQL, so if necessary, I'm willing to help out in whatever way I can (although I'm not experienced with FFI). It appears that the libmysqlclient API requires each OS thread to be "registered" via an init call. This seems problematic for Haskell since we do not have direct control over thread creation. I can think of some last-resort mechanisms (keep a thread registration checklist to be checked for each API call or the introduction of a new API e.g. withMySqlThread) that would allow support albeit not ideally.

@taladar
taladar commented Dec 30, 2014

The library (at least via mysql-simple) seems to work fine with multiple connections. There is only one rather serious issue I have seen in practice due to the unsafe foreign calls.

If you have several connections it can happen that you are inside a MySQL transaction t1 which holds some locks but its thread is currently executing Haskell code (between statements). If other Haskell threads now start their own transactions which block inside an unsafe call because they are waiting for the locks held by transaction t1 those blocking foreign calls can block either all capabilities in the Haskell RTS or all tasks (OS threads) where the Haskell thread processing t1 can run (e.g. if it is the main thread it is bound to its OS thread and can run only in that one). Then we have a distributed deadlock, half in MySQL (threads waiting for locks held by t1), half in Haskell (the thread can't run to finish the transaction t1).

My understanding is that this situation would be avoided if all foreign calls which could block were marked 'safe'.

@3noch
3noch commented Dec 30, 2014

You're right that Haskell code using this library can easily have multiple connections. The problem arises when you want two different threads (which means you're running with -threaded) to both read/write from MySQL simultaneously. Because this library uses unsafe FFI calls and because it links against the single-threaded libmysqlclient, you can't actually pull that off. This same scenario must be what also leads to the deadlock you're describing. It's because we're writing code that appears multithreaded, but it really is not.

@taladar
taladar commented Dec 30, 2014

It has nothing to do with libmysqlclient being single-threaded, multi-threading works just fine on the C level as far as I can tell (when using -threaded -with-rtsopts=-N). The problem arises from unsafe calls not allowing the Haskell RTS to reuse the capability (think of it as a virtual CPU running Haskell code) in another OS thread (the RTS calls them tasks) because unsafe foreign calls do not save and restore state (they are meant for short, non-blocking calls).

This becomes a problem if the foreign call will block until another Haskell thread does something but that Haskell thread is never scheduled again because there are no capabilities left to schedule it on.

This also leads to blocking of other Haskell threads completely unrelated to MySQL. In particular bound threads (like the main thread or any thread created with forkOS) are prone to be affected since they can only run in one particular OS thread. This can then show symptoms of e.g. our web server not dispatching any new requests to worker threads because the dispatch loop ran in the main thread which was bound to the OS thread which was blocked by MySQL.

You do not actually need to even use two connections in parallel, only one foreign call has to be in progress to produce this effect if it is active on either the only capability (with -threaded -with-rtsopts=-N1 which happens to be the default for -threaded) or the capability the Haskell thread is bound to.

@3noch
3noch commented Dec 30, 2014

I think we're agreeing, but possibly talking about two different use-cases. The use of unsafe means that MySQL queries actually force our Haskell code to run sequentially (regardless of what other threads are trying to do). That's pretty terrible as is. I think I'm talking about another, additional problem: since this links against the single-threaded C library, even if you were to solve the unsafe issue, you would not be able to have two capabilities executing MySQL queries simultaneously. I did some testing a while ago with an application that makes heavy use of concurrent MySQL queries (it's a data mover between servers). Even making the FFI calls safe did not improve my performance because the C library squeezed all my concurrent requests onto a single thread. I very well might be wrong in this assessment. Thoughts?

I'm "eacameron" on #haskell IRC if you want to work together on fixing this.

@feuerbach
Contributor

Here's a simple test case:

{-# LANGUAGE OverloadedStrings #-}
import Database.MySQL.Simple
import Control.Concurrent
import Control.Concurrent.STM
import Control.Applicative
import Control.Monad
import Control.Exception

main = do
  tv <- atomically $ newTVar 0
  withConn $ \conn -> do
    mapM_ (execute_ conn)
      [ "drop table if exists test"
      , "create table test (x int)"
      , "insert into test values (0)"
      ]

  forM_ [1..2] $ \n -> forkIO $ withConn $ \conn -> (do
    execute_ conn "begin"
    putStrLn $ show n ++ " updating"
    execute_ conn "update test set x = 42"
    putStrLn $ show n ++ " waiting"
    threadDelay (10^6)
    execute_ conn "commit"
    putStrLn $ show n ++ " committed"
    ) `finally`
    (atomically $ modifyTVar tv (+1))

  atomically $ check =<< (>=2) <$> readTVar tv
  where
    withConn = bracket (connect defaultConnectInfo) close

The deadlock is avoided iff:

  1. Unsafe ffi calls in mysql-0.1.1.8 are replaced by safe
  2. The program is linked with the threaded runtime

Otherwise, you'll see

1 updating
1 waiting
2 updating
1 committed
test: ConnectionError {errFunction = "query", errNumber = 1205, errMessage = "Lock wait timeout exceeded; try restarting transaction"}

@bos I think this needs to be fixed

@3noch
3noch commented Apr 14, 2015

@feuerbach Am I mistaken that using the single-threaded C library also forces all MySQL calls to run in a single thread? How does a non-thread-safe C-library safely support multithreaded usage? I would not be surprised if using safe risks SegFaulting due to the C-library clobbering itself.

@3noch
3noch commented Apr 14, 2015

Perhaps the threaded version of libmysqlclient merely allows the same connection to be used across threads? This could explain why it would work fine with safe.

@feuerbach
Contributor

@3noch: what do you mean exactly by "single-threaded C library"?

I can interpret it in 3 ways:

  1. That mysql-client itself doesn't spawn any threads.
  2. That mysql-client is not thread-safe, ie. doesn't guard access to shared resources.
  3. That mysql-client has thread-local state.

The first point is irrelevant; the latter two are valid.

Here's the page that discusses multithreaded usage of mysql-client: https://dev.mysql.com/doc/refman/5.6/en/c-api-threaded-clients.html

From my cursory reading of the docs, it seems like:

  1. At the beginning, mysql_library_init has to be called once (it is not thread-safe);
  2. The threads using mysql have to be bound;
  3. Either
    1. Each thread establishes its own connection, or
    2. If a connection pool is used, every thread which uses mysql has to call mysql_thread_init prior to making any other calls.
  4. If threads are short-lived, call mysql_thread_end before finishing the thread
@3noch
3noch commented Apr 15, 2015

@feuerbach Good questions. My own questions arose from that very page you provided.

  1. That mysql-client itself doesn't spawn any threads.

Right. No issues.

  1. That mysql-client is not thread-safe, ie. doesn't guard access to shared resources.
  2. That mysql-client has thread-local state.

I'm unsure about this. There are clearly two versions of libmysqlclient: a single-threaded one and a multi-threaded one (libmysqlclient_r). But the docs don't seem to mention the distinction much so it's hard to tell if that is merely a historical artifact or if it still has bearing.

At least for the multi-threaded one, this is clearly not a problem provided the user properly initializes threads. However my analysis of mysql is that

  • It does not use libmysqlclient_r (again, whether this is necessary, I'm not sure).
  • It does not do any explicit handling (mysql_thread_init and mysql_thread_end) for each capability (but I'm unsure if it's strictly necessary as @bos has written it since certain other C API calls will cover this case for you).

Nowhere in mysql's docs does it mention the need for bound threads or the limitations of simultaneous, concurrent MySQL calls.

@bos's use of unsafe everywhere seems improper, but that might be his way of dealing with the concurrency problem. Still, even if we can guarantee that no resources are improperly shared across threads, we still might end up forcing calls to run in a single-threaded way. This, while behaviorally correct, is quite disappointing, especially to the Haskell user, since one of Haskell's primary advantages is it's excellence in the concurrency domain (i.e. if I wanted single-threaded MySQL code, I could get just as much performance with Python or Ruby!). Albeit, correctness is most important in every case.

@feuerbach
Contributor

Thanks for pointing out the two versions, I wasn't aware of that.

It does not do any explicit handling (mysql_thread_init and mysql_thread_end) for each capability (but I'm unsure if it's strictly necessary as @bos has written it since certain other C API calls will cover this case for you).

It is true that Haskell's connect will initialize the thread (or are you referring to other calls as well?). But if you use a resource pool, connect may happen on a different OS thread from the thread that'll run the query. The latter thread may be uninitialized, which, according to the docs, may lead to a segfault.

Nowhere in mysql-simple's docs does it mention the need for bound threads or the limitations of simultaneous, concurrent MySQL calls.

I'm not sure if you're pointing out the need to improve the docs or appealing to them. We already established that mysql-simple doesn't work well with concurrency, so I wouldn't trust the docs much either.

@bos's use of unsafe everywhere seems improper, but that might be his way of dealing with the concurrency problem

Not only does it seem improper, it is improper as my test case above shows. A deadlock is not considered behaviorally correct.

@feuerbach
Contributor

To clarify: the need for bound threads only arises once you use safe calls (and, hopefully, do proper thread initialization).

With unsafe calls they make no differece IIUC.

@3noch
3noch commented Apr 15, 2015

The latter thread may be uninitialized, which, according to the docs, may lead to a segfault.

Right, that is one case I'm thinking of. Also, using fold, the Haskell handler callback could block causing the runtime to move it to a different [potentially uninitialized] capability (I might be wrong here, I'm not expert in GHC's runtime).

To clarify: the need for bound threads only arises once you use safe calls.

This is why I was unsure if @bos intentionally used unsafe everywhere or if it was due to a [common] misunderstanding of the difference between safe and unsafe (I wouldn't expect such a misunderstanding from him!).

So the next question is, if @bos is too preoccupied to fix this himself (as it seems to be the case from his involvement so far), can we fix it ourselves? I'm happy to help out, but with rather limited time (otherwise I would have fixed it by now!).

@3noch
3noch commented Apr 15, 2015

Nowhere in mysql-simple's docs does it mention the need for bound threads or the limitations of simultaneous, concurrent MySQL calls.

I'm not sure if you're pointing out the need to improve the docs or appealing to them. We already established that mysql-simple doesn't work well with concurrency, so I wouldn't trust the docs much either.

Sorry. I was merely pointing out that it doesn't seem @bos made these decisions with extreme intentionality (regarding which library to link against and the use of unsafe).

@feuerbach
Contributor

So the next question is, if @bos is too preoccupied to fix this himself (as it seems to be the case from his involvement so far), can we fix it ourselves? I'm happy to help out, but with rather limited time (otherwise I would have fixed it by now!).

Sure, that's what I intend to do within a week.

@3noch
3noch commented Apr 15, 2015

@feuerbach Excellent! Let me know if I can help in any way. I'll gladly do testing. I have a tool that never made it to production because its interaction with MySQL was abysmally slow (highly concurrent design).

@feuerbach
Contributor

@3noch great, I'd appreciate review and testing. Also, it'd be helpful if you could figure out the situation with the two libraries (if not, I'll get to it sooner or later).

@taladar
taladar commented Apr 15, 2015

Since MySQL 5.5 both are the thread-safe version of the library, the libmysqlclient_r.so is just a symlink to the other one. (see the last paragraph here: http://dev.mysql.com/doc/refman/5.5/en/c-api-implementations.html ). That said at least the mysql_config Tool shipped with MariaDB currently gets things wrong (see commit on my fork) as far as other libraries (pthread,...) to add to the linker commandline goes. Simultaneously their libmysqlclient package on Debian contains broken symlinks from libmysqclient_r.so to libmysqlclient.so.

In my experience mysql-simple works just fine concurrently as far as libmysqlclient is concerned, the problem are the unsafe foreign calls that can lead to deadlocks when transactions in Haskell threads that can not run (e.g. because they are bound threads and their task (OS thread) is blocked by another unsafe foreign call) hold locks in MySQL. Of course you need to use at least one separate connection per Haskell thread, otherwise you run into the problems described above by feuerbach as well as general confusion about interleaved statements.

@3noch
3noch commented Apr 15, 2015

@feuerbach @taladar So it sounds like it's safe to assume that libmysqlclient exports the thread-related functions.

@feuerbach
Contributor

I've summarized my research here: https://ro-che.info/articles/2015-04-17-safe-concurrent-mysql-haskell

As I said, I'd appreciate review, feedback, and testing.

@taladar
taladar commented Apr 17, 2015

Your issue 1 can also happen in the presence of bound threads if the thread holding the lock on the Mysql side is a bound thread and its task (OS thread) is currently blocked by an unsafe call, even if plenty of free capabilities are left. It can also block other bound threads, most notably the main thread (which in e.g. many web applications does the work dispatching and so no new requests are processed at all while it is prevented from running).

As for issue 2, we do have a fair number of highly concurrent applications using mysql in production and we never ran into any crashes related to thread local state so I would say your assumption that it is currently not in use on the client is correct. I am not sure if forcing users to create a bound thread for this reason alone makes sense. It should probably get a prominent warning in the documentation though.

Issue 3 could maybe be solved by some kind of MVar mechanism that ensures that the implicit calls don't happen concurrently? I believe someone recently posted on the /r/haskell subreddit about doing something to make the (similar) network initialization calls on Windows implicit?

I also wonder if it would be feasible to store some kind of information about the initialization state of the OS threads in such a way that we could access it without too much performance overhead and just initialize any OS thread which is not initialized yet implicitly.

@feuerbach
Contributor

@taladar I don't know if that's what you describe, but there appears to be a big problem with bound threads indeed.

Consider my test program above. One thread starts a transaction, another does too, then blocks.

If both threads are bound to the same OS thread, then the second thread would again prevent the first one from executing. Bummer.

@feuerbach
Contributor

Although from studying the rts I'm getting the impression that an OS thread is never shared by two bound threads. Which makes perfect sense — otherwise they would overwrite each other's thread-local state.

@feuerbach
Contributor

Yep, that's right. forkOS calls forkOS_createThread (in rts/posix/OSThreads.c), which always creates a new pthread. @taladar does that address your concern?

@3noch
3noch commented Apr 17, 2015

@feuerbach

OS thread is never shared by two bound threads

That would be good news indeed.

@taladar

also wonder if it would be feasible to store some kind of information about the initialization state of the OS threads in such a way that we could access it without too much performance overhead and just initialize any OS thread which is not initialized yet implicitly.

That's exactly what I was thinking too! See my first comments above:

I can think of some last-resort mechanisms (keep a thread registration checklist to be checked for each API call or the introduction of a new API e.g. withMySqlThread) that would allow support albeit not ideally.

@taladar
taladar commented Apr 18, 2015

@feuerbach Changing all the foreign calls to be safe calls does address the concern. I just meant that the issue 1 in your post can occur even long before all capabilities are used up as long as we use unsafe foreign calls because the unsafe foreign calls can potentially block the task (OS thread) of any random bound thread and thus block it from ever running again until the unsafe foreign call returns. Really unsafe foreign calls should never be used for anything that has a chance to block or run for a long time for this reason.

@feuerbach
Contributor

the unsafe foreign calls can potentially block the task (OS thread) of any random bound thread

No, a bound task will only ever execute its own bound thread.

@taladar
taladar commented Apr 19, 2015

While that is true the reverse is not. The unsafe foreign call in an unbound Haskell thread might very well execute in the same OS thread that is bound to some other Haskell thread.

There is also the case of calls into Haskell where several bound Haskell threads end up sharing an OS thread but I do not think we need to worry about that one here since all but one of those Haskell threads are in foreign calls themselves.

All the details should be in these links
https://www.vex.net/~trebla/haskell/ghc-conc-ffi.xhtml
http://blog.ezyang.com/2014/12/unintended-consequences-bound-threads-and-unsafe-ffi-calls/
http://blog.melding-monads.com/2011/10/24/concurrency-and-foreign-functions-in-the-glasgow-haskell-compiler/
https://ghc.haskell.org/trac/ghc/wiki/Commentary/Rts/Scheduler#TheScheduler

@feuerbach
Contributor

While that is true the reverse is not. The unsafe foreign call in an unbound Haskell thread might very well execute in the same OS thread that is bound to some other Haskell thread.

No, and your statement is not a reverse of mine. Recall that Task simply means an OS thread in the RTS. What I'm saying is (I realize my wording was somewhat ambiguous, so let me rephrase): task may run thread (meaning a Haskell thread, or TSO) iff thread is bound to task. What you are saying is that task may also run some other, unbound thread. But that's in direct contradiction with my claim, because that other thread is not bound to task; in fact, it is not bound to any thread, because it's unbound. So it'll never run by any bound thread.

Here is the relevant scheduler code that handles the case when a bound task attempts to run an unbound thread.

Now, of those many links, I assume you're alluding to Edward Yang's post. As Simon Marlow explains in the comments, the phenomenon has nothing to do with bound threads; it happens because the unsafe call grabs a capability.

So let me amend your earlier comment:

the issue 1 in your post can occur even long before all capabilities are used up as long as we use unsafe foreign calls because the unsafe foreign calls can potentially block the task (OS thread) capability of any random bound thread and thus block it from ever running again until the unsafe foreign call returns

I should also clarify that a task (even a bound one) is not, in general, stuck to a capability; but in that specific case, since the capability is blocked by an unsafe call, it doesn't get a chance to run the scheduler and push that task out to other capabilities.

@taladar
taladar commented Apr 19, 2015

What does task->incall->tso actually represent in that piece of code? I am just wondering if that case only applies to Tasks created by calls into Haskell. I do know there is special handling for that case to always create a Haskell thread bound to the OS thread the incall originates from (so if the Haskell code is come kind of callback handler it can do calls into the C library in the same thread).

@feuerbach
Contributor

What does task->incall->tso actually represent in that piece of code?

typedef struct InCall_ {
    StgTSO *   tso;             // the bound TSO (or NULL for a worker)

I am just wondering if that case only applies to Tasks created by calls into Haskell.

Yes, but that's the only way to create a bound task/thread. forkOS works by foreign-calling an IO action.

@taladar
taladar commented Apr 19, 2015

Okay...so that means you are correct, each task only runs its associated bound thread if it has one. In any case it shouldn't matter once all foreign calls are marked safe since a foreign call won't be able to block a task holding a capability anymore then.

@3noch
3noch commented Apr 28, 2015

@feuerbach How is it progressing?

@feuerbach
Contributor

@3noch my job is done here: I've made and incorporated the necessary patches on my fork and documented how to use it properly.

It would be nice if Bryan pulled the changes and made a release, but anyone who's using mysql in their software should be able to use the fixed version.

@3noch
3noch commented Apr 28, 2015

@feuerbach Great work! I'd like to integrate your work into mysql-simple but of course I cannot since this package is still broken. I believe this issue seriously cripples Haskell's involvement in the MySQL world. Is there any way to push harder to get this merged in, or get new ownership on Hackage?

@feuerbach
Contributor

Using a forked package isn't as hard as you may think; I wrote about it here. You have plenty of options there.

@3noch
3noch commented Apr 28, 2015

@feuerbach That may be true, but still the vast majority of Haskellers (especially newcomers) won't know to use your fork instead of this package. Have you considered putting yours up on hackage? If you did, we could try to get this one deprecated.

@feuerbach
Contributor

Forking would require changing (or forking again) the whole chain: mysql-simple, persistent-mysql etc.

You could start by trying to reach out to Bryan via some alternative media (email, twitter etc.)

If that fails, and if you want to overtake the package, you should follow the procedure. Or if you want to pursue the fork option, feel free to upload it to hackage yourself.

I'm not really interested in taking this any further myself.

@3noch
3noch commented May 2, 2015

@feuerbach Would you be willing to submit a PR with your changes to this repo? That would make @bos's job very easy. If not, I could do it.

@feuerbach
Contributor

I don't believe this makes any difference, but if you do, go ahead.

@saurabhnanda saurabhnanda referenced this issue in lpsmith/postgresql-libpq Aug 25, 2016
Closed

Just confirming: Concurrency issues? #33

@paul-rouse
Owner

Closed via #15

@paul-rouse paul-rouse closed this Oct 24, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment