lgamma is not thread safe #1250
Comments
Are you suggesting using the Boost version of |
It's whatever the doc string for lgamma says: http://mc-stan.org/math/d4/d84/namespacestan_1_1math.html#aead76f03bdbc60484ad760fc31bad40f |
I'm confused. The doc string for lgamma doesn't say whether to use Boost or not to implement it.
|
Or do you mean which errors it throws, which it is also silent about.
Are you saying that our API is defined by these incomplete doc strings?
… On May 23, 2019, at 2:37 PM, Bob Carpenter ***@***.***> wrote:
I'm confused. The doc string for lgamma doesn't say whether to use Boost or not to implement it.
|
Yeah, what we promise in the doxygen comments is what we're holding ourselves to in theory. Since it doesn't mention exceptions it seems like we have made no promises there. Would it be hard to match whatever behavior we currently have? |
Yes, I am suggesting to revert to commit hash I am quoting above where we replace boost with the std version. However, another area for potential trouble is performance... from my quick tests the boost version is 3x slower than the std version. While the boost version has a well defined precision according to a policy, the std version does not let us control this in any way. Thus, we may have to balance performance vs accuracy, but we don't know where we stand right now. I can start on a PR where I revert the above commit. |
We should make a note of how the behavior changes just to be polite and add it to the release notes one day (maybe put a section on that in the top of the PR or issue, something like "for release notes: lgamma now throws exceptions..." or whatever. I think that seems reasonable. Another weird thing we could do would be to use std::lgamma if we're not in threading mode, but I think that would be too confusing. |
I was hoping instead of laying down a letter of the law based on our spotty code documentation, that we'd be moving to consistent treatment of edge cases (+inf, -inf, poles, NaN, and out-of-domain inputs). I also hoped we'd be adding documentation to clarify behavior rather than avoiding it to allow us wiggle room w.r.t. backward compatibility. |
That sounds good, why don't you write up a proposal for how you want to do that as a design doc? |
Just linking previous discussion |
@wds15: does I think the issue itself could use more work. What is the ideal outcome? Will using
@seantalts: I'm not sure how a policy discussion would fall into a design doc (for code)? Maybe my imagination isn't as good as yours. |
Yeah, design doc (for code) is not the right kind - but some kind of Request For Comments draft would probably be a good way to get traction on that issue. |
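@syclik std::lgamma is not guaranteed to be thread-safe as per its documentation. In fact, you get whatever implementation a given platform provides, and its behaviour with respect to threading is not defined. On my macOS system the implementation evidently uses a global mutex internally to guard access to a global variable. I am not sure how I can write a test for such an internal mechanism of the function (maybe by using negative and positive arguments, but I do not know that). Writing a test for this is very hard - right now I could only detect it using performance benchmarks which do not scale with multiple threads, but that isn't a good unit test, right?
Using std::lgamma has the disadvantages of
• having undefined behaviour across platforms with respect to threading. On macOS performance is seriously hampered when used in parallel due to an internal mutex.
• the output accuracy is uncontrolled - at least I cannot find any notes on the accuracy of that function
However, when I drop-in replaced std::lgamma with the boost version, the boost version was ~3x slower than std.
The best outcome is to use the boost version of lgamma for all our applications, and hopefully, when the boost version is asked for accuracy similar to the std version, the speed of both functions is the same.
And yes, this problem probably affects a few more gamma functions and maybe even more functions which we do not know about yet. |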
It's not clear what we're talking about for thread safety here. You mean the ability to concurrently call a function?
The container classes (std::vector, Eigen::Matrix) are not thread safe, for example, but I don't think our library ever accesses a container concurrently. The log density accumulator isn't thread safe.
None of our autodiff classes are thread safe. Our autodiff functionals are not thread safe.
Static constant initialization is guaranteed to be thread safe. But nothing with a mutable static is going to be thread safe.
Is the Boost version of lgamma thread safe? Does it allow accuracy control? I don't think Stan has any requirements for accuracy.
Here's what the spec says for lgamma:
The POSIX version of lgamma is not thread-safe: each execution of the function stores the sign of the gamma function of arg in the static external variable signgam. Some implementations provide lgamma_r, which takes a pointer to user-provided storage for signgam as the second parameter, and is thread-safe.
Neither tgamma nor digamma have similar qualifications about thread safety.
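To make the spec text above concrete, here is a minimal sketch of calling the reentrant variant. It assumes the platform's math.h declares lgamma_r (glibc and macOS do, though a feature-test macro such as _REENTRANT may be needed on some systems); lgamma_r is not standard C++. The struct and helper names are hypothetical and not part of Stan Math.

```cpp
#include <cassert>
#include <cmath>
#include <math.h>  // lgamma_r lives here on platforms that provide it (assumption)

// Hypothetical result type: log|Gamma(x)| plus the sign of Gamma(x).
struct LgammaResult {
  double log_abs;  // log(|Gamma(x)|)
  int sign;        // sign of Gamma(x), written by lgamma_r
};

// Hypothetical helper: the sign goes into caller-owned storage instead of
// the global signgam, so concurrent callers do not share any state.
inline LgammaResult lgamma_with_sign(double x) {
  LgammaResult r{0.0, 1};
  r.log_abs = ::lgamma_r(x, &r.sign);
  return r;
}
```

For example, lgamma_with_sign(-0.5).sign is -1 because Gamma(-0.5) is negative, and no global variable is touched to report that.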
… On May 27, 2019, at 4:00 AM, wds15 ***@***.***> wrote:
@syclik std::lgamma is not guaranteed to be thread-safe as per documentation. In fact you get some implementation on a given platform which will behave in a non-defined behaviour wrt to threading. On my macOS system the implementation obviously uses internally a global mutex to access a global variable. I am not sure how I can write a test for such an internal mechanism of the function (maybe by using negative and positive arguments, but I do not know that). Writing a test for this is very hard - right now I could only detect it using performance benchmarks which do not scale with multiple threads, but that isn't a good unit test, right?
Using std::lgamma has the disadvantages of
• having undefined behaviour across platforms wrt to threading. On macOS performance is seriously hampered when used in parallel due to an internal mutex.
• the output accuracy is uncontrolled - at least I cannot find any notes on the accuracy of that function
However, when I drop-in replaced std::lgamma with the boost version, then the boost version was ~3x slower than std.
The best outcome is to use the boost version of lgamma for all our applications and hopefully when the boost version is asked to output similar accuracy as the std version then the speed is the same of both functions.
And yes, this problem is with probably a few more gamma functions and maybe even with more functions which we do not know about yet.
|
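That's right - all of stan-math is not thread safe unless you run it with STAN_THREADS turned on. In that case we can run things concurrently with map_rect. In this setting we have a problem with std::lgamma for these reasons:
• There is no thread safety guarantee. So concurrent calls to std::lgamma from concurrent map_rect contexts will be problematic.
• It turns out that on macOS the behavior of std::lgamma whenever threads are used is such that a global mutex is used to synchronize the calls (likely to deal with the sign of the gamma function). This synchronization makes the implementation on macOS per se thread safe, but it kills off any performance gains through parallelization.
• The worst thing is that the thread behavior of that function is implementation dependent (OS, compiler, libstdc++, whatever)
Does that explain? |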
I still have the question of whether Boost's lgamma is thread safe. Is the proposal just to use that instead of std::lgamma?
STAN_THREADS only makes the memory access thread safe. It doesn't make things like containers (Eigen::Matrix or std::vector) or our stan::math::vari instances thread safe for mutability.
… On May 28, 2019, at 1:37 PM, wds15 ***@***.***> wrote:
That's right - all of stan-math is not thread safe unless you run it with STAN_THREADS turned on. In that case we can run things concurrently with map_rect. In this setting we have a problem with std::lgamma for these reasons:
• There is no thread safety guarantee. So concurrent calls to std::lgamma from concurrent map_rect context's will be problematic.
• It turns out that on macOS the behavior of std::lgamma whenever threads are used is such that a global mutex is used to synchronize the calls (likely to deal with the sign of the gamma function). This synchronization makes the implementation on macOS per se thread safe, but it kills off any performance gains through parallelization.
• The worst thing is that the thread behavior of that function is implementation dependent (OS, compiler, libstdc++, whatever)
Does that explain?
|
That's why we only support threading via map_rect :D |
On May 28, 2019, at 2:00 PM, seantalts ***@***.***> wrote:
STAN_THREADS only makes the memory access thread safe. It doesn't make things like containers (Eigen::Matrix or std::vector) or our stan::math::vari instances thread safe for mutability.
That's why we only support threading via map_rect :D
My understanding is that std::lgamma's lack of thread safety becomes a problem precisely because map_rect might call it from multiple threads. So I believe map_rect imposes thread-safety requirements on every single Stan function that might be evaluated in a Stan program. Generally, we haven't allowed side effects other than the autodiff stack, so that's not a problem. But this is partly why proposals to do things like statically store state to restart solvers is going to be tricky. And why we better make sure every static local variable is const.
As things stand, the output of print(...) will get garbled as the underlying I/O is thread safe but not atomic at the message level.
Aren't other issues/PRs in the works to relax the restriction of threading to map_rect? Such as allowing parallel for loops?
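As a rough illustration of the print(...) point above (thread-safe I/O that is not atomic per message), one common pattern is to build the whole message in a local buffer and write it under a lock. This is only a sketch; print_mutex and print_line are hypothetical names and do not describe how Stan implements print.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Hypothetical: one process-wide mutex guarding message output.
// Initialization of a function-local static is thread safe since C++11.
inline std::mutex& print_mutex() {
  static std::mutex m;
  return m;
}

// Hypothetical helper: format all arguments into a private buffer first,
// then emit the finished line in a single locked write so that lines from
// different threads cannot interleave.
template <typename... Args>
std::string print_line(std::ostream& out, const Args&... args) {
  std::ostringstream msg;  // local to this call, not shared
  using expand = int[];
  (void)expand{0, ((msg << args), 0)...};  // stream each argument in order
  msg << '\n';
  const std::string text = msg.str();
  std::lock_guard<std::mutex> lock(print_mutex());
  out << text;
  return text;
}
```

The lock is held only for the final write, so formatting work still runs in parallel.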
|
Yep. Just highlighting that the only sense of "thread safety" we need to worry about is via map_rect. If the design doc for parallel for goes through, that will need to use the same mechanisms, which I believe is the plan. |
Things need to be safe if concurrently executed in different local contexts....and static variables can be a source of problems if these are not const, yes. |
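A small sketch of the distinction drawn above between const and mutable statics. All function names here are made up for illustration; none of them come from Stan Math.

```cpp
#include <cassert>
#include <cmath>

// Fine under threading: a const local static is initialized exactly once
// (guaranteed thread safe since C++11) and is only read afterwards.
inline double log_sqrt_two_pi() {
  static const double value = 0.5 * std::log(2.0 * std::acos(-1.0));
  return value;
}

// Not safe under threading: every thread shares last_x and last_result,
// so concurrent calls race on both variables.
inline double cached_square_unsafe(double x) {
  static double last_x = 0.0;
  static double last_result = 0.0;
  if (x != last_x) {
    last_x = x;
    last_result = x * x;
  }
  return last_result;
}

// One possible alternative: thread_local gives each thread its own copy,
// so the cache is never shared between concurrent contexts.
inline double cached_square_per_thread(double x) {
  thread_local double last_x = 0.0;
  thread_local double last_result = 0.0;
  if (x != last_x) {
    last_x = x;
    last_result = x * x;
  }
  return last_result;
}
```

The unsafe variant gives correct answers when called from one thread, which is exactly why this kind of bug tends to go unnoticed until code runs concurrently.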
@bob-carpenter is right. Fortunately, I think We don't have this problem with MPI because it uses separate processes. Part of me thinks we should reevaluate including threading for now while we sort out these problems; we essentially need to search through all library functions we use to see if they're not thread-safe. But we can pick that discussion up on discourse. |
For now, I think it's reasonable to describe the qualities of
@wds15: we need to write a test that flexes what we think could be an issue when running the function with threading. Yes, it might not be easy, but this is a necessary condition. How else will you prevent someone in the future from looking at boost::lgamma and thinking... "oh, I'll switch that to std::lgamma"? Re: nomenclature. I don't care if it's not called a unit test, but it's what we need to test. |
The link says
which I have always thought was a non-issue for us because nothing in Stan Math is or should be writing or reading |
The title of this issue shouldn't be "lgamma is not thread safe"; rather it should be "signgam is not thread safe". So, if it is true that no one is using For statistics in general and HMC in particular, I have a hard time thinking of an example where someone would want to access We could introduce a |
@bob-carpenter re: boost::math and thread safety: https://www.boost.org/doc/libs/1_70_0/libs/math/doc/html/math_toolkit/threads.html |
Tests are great if feasible. It can be really hard to test these threading bugs---I've seen cases that take 48 hours on 72 concurrent threads to arise.
I don't know how to stop people from calling |
That's convenient. It just says:
|
What other standard functions have the side effect of setting a global variable like signgam? |
How else will you prevent someone in the future from looking at boost::lgamma and thinking... "oh, I'll switch that to std::lgamma"?
By imposing thread safety requirements and assuming people know the language. It's how we do everything. We can't guard against generic programmer error. We have exactly the same problem with every one of our functions, though they don't come with such a tempting replacement.
If you can formulate a test that's reliable, that's great, but it's a notoriously hard problem in a multithreaded context to trigger bugs. I've seen threading bugs that took on the order of days to trigger given dozens of concurrent threads.
|
None that I can find searching for thread safety. None of the other gamma functions have the same qualification about threading in their doc. I have no idea why lgamma is different than the other lib functions.
… On May 30, 2019, at 2:17 PM, bgoodri ***@***.***> wrote:
lgamma isn't special---we have the same requirements on all of our functions. The only difference is that it has a tempting replacement in std::lgamma.
What other standard functions have the side effect of setting a global variable like signgam?
|
@bgoodri: it looks like the functions that are related are (I'm seeing the same list on different sites and can find a source if you want it): gamma(), lgamma(), lgammaf(), lgammal(). I don't know if there are other functions in Eigen or elsewhere that have these behaviors.
More importantly, this means that any Math function implementation can't really use static variables to compute things. This is a new requirement. That would have been bad practice, but we haven't explicitly checked for it or forbidden it. The only place where I could think we might even attempt to do that is in optimized code. There's a lot of cleverness there with separate memory management and we should probably comb through that very carefully.
Yup. What's the best way to impose thread safety requirements?
@wds15 was able to demonstrate that std::lgamma performs differently from boost::lgamma with a few threads. That's what generated this issue. That should be one of the tests, if not the test. |
@bgoodri: it looks like functions that are related (I'm seeing the same list on different sites and can find a source if you want it):
gamma(), lgamma(),lgammaf(),lgammal()
gamma() isn't in the standard library in C++.
lgammaf() and lgammal() are just the float and long form argument forms of lgamma().
By imposing thread safety requirements and assuming people know the language. It's how we do everything. We can't guard against generic programmer error. We have exactly the same problem with every one of our functions, though they don't come with such a tempting replacement.
Yup. What's the best way to impose thread safety requirements?
My response was above and I'll stick by it. We can't test this programmatically in a proactive way.
If you can formulate a test that's reliable, that's great, but it's a notoriously hard problem in a multithreaded context to trigger bugs. I've seen threading bugs that took on the order of days to trigger given dozens of concurrent threads.
@wds15 was able to demonstrate that std::lgamma performs differently from boost::lgamma with a few threads. That's what generated this issue. That should be one of the tests, if not the test.
Great---that's what I meant by being able to formulate a reliable test.
|
A test for this will make everything brittle. I would need to launch 4 threads and check that the run is 4x as fast; a doubling was not enough. This implies I need 4 idle cores for the test to run reliably. On top of that, this is the behavior on macOS. I don't know how things are under Linux or Windows; it could be different there. |
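For reference, a scaling check of the kind described above could be sketched roughly as follows. The function names, the argument pattern, and any threshold one might apply to the ratio are assumptions; as noted in the thread, the timings depend on idle cores and on the platform, so this is a diagnostic rather than a reliable unit test.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <thread>
#include <vector>

// Hypothetical helper: wall-clock seconds for n_threads threads that each
// make calls_per_thread calls to std::lgamma.
inline double time_lgamma_calls(int n_threads, long calls_per_thread) {
  assert(n_threads > 0 && calls_per_thread > 0);
  std::vector<double> sums(n_threads, 0.0);  // one slot per thread
  std::vector<std::thread> workers;
  const auto start = std::chrono::steady_clock::now();
  for (int t = 0; t < n_threads; ++t) {
    workers.emplace_back([&sums, t, calls_per_thread] {
      double acc = 0.0;
      for (long i = 1; i <= calls_per_thread; ++i) {
        acc += std::lgamma(0.5 + static_cast<double>(i % 1000));
      }
      sums[t] = acc;  // store the result so the work is kept
    });
  }
  for (auto& w : workers) {
    w.join();
  }
  const auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical helper: time with n_threads divided by time with 1 thread,
// where each thread does the same amount of work. A ratio near 1 suggests
// the calls ran in parallel; a ratio near n_threads suggests they were
// serialized, for example by an internal lock.
inline double scaling_ratio(int n_threads, long calls_per_thread) {
  const double one = time_lgamma_calls(1, calls_per_thread);
  const double many = time_lgamma_calls(n_threads, calls_per_thread);
  return many / one;
}
```

With 4 threads on 4 idle cores, a ratio well above 1 would match the macOS behavior reported in this issue, but a loaded machine can produce the same number for unrelated reasons.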
Description
We moved away from using Boost functions for most special math functions in commit 01bb664. This affected the log gamma function in particular. Instead we now use the std::lgamma functions, which turn out to cause problems in threading contexts. The behaviour is generally undefined with respect to thread safety, see here. On macOS Mojave the lgamma implementation of clang locks a global mutex, as can be seen from the terrible scaling of multi-threaded benchmarks, see here.
This is a large problem for any threading based applications involving Stan (parallel chains or httpstan).
Example
See the discourse thread linked above for an example.
Expected Output
We should use a thread-safe version of the lgamma - and ideally one which does not lock a global mutex which drastically limits parallelisation performance under threading. At the worst we should distinguish between threading and non-threading.
Right now the behaviour of lgamma when using threads is implementation specific and generally undefined to my understanding.
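The fallback of distinguishing between threading and non-threading builds could look roughly like the sketch below. STAN_THREADS is the macro discussed in this issue; the wrapper name is hypothetical, and lgamma_r is used here only as a dependency-free stand-in for a thread-safe implementation (the thread's preferred candidate is the Boost version). It assumes the platform's math.h declares lgamma_r.

```cpp
#include <cassert>
#include <cmath>
#include <math.h>  // lgamma_r, where the platform provides it (assumption)

// Hypothetical wrapper: choose the implementation at compile time.
inline double lgamma_dispatch(double x) {
#if defined(STAN_THREADS)
  int sign = 1;  // caller-owned storage, so the global signgam is not written
  return ::lgamma_r(x, &sign);
#else
  return std::lgamma(x);  // single-threaded build: the standard function
#endif
}
```

Both branches return log|Gamma(x)|, so callers see the same values either way; only the threading behavior differs.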
Current Version:
v2.19.1