Threading Support

ossoso edited this page Nov 16, 2018 · 15 revisions

By default the stan-math library is not thread safe, because the autodiff stack uses a global autodiff tape that records all operations of the functions being evaluated. Starting with version 2.18, threading support in stan-math can be switched on with the compile-time switch STAN_THREADS. This variable can be defined at compile time by adding the following line to make/local:

    CXXFLAGS += -DSTAN_THREADS

Once this is set, stan-math uses the C++11 thread_local facility so that the autodiff stack is maintained per thread rather than globally. This makes it possible to use autodiff in a threaded application: the derivatives of a function can be calculated inside a thread, although the function itself must not use threads. Evaluating the derivatives of a single function in a threaded way is only possible if that function is composed of independent tasks. An example of such a threaded gradient evaluation is the map_rect function in stan-math.
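The idea can be illustrated with a minimal C++11 sketch. This is not stan-math's implementation: the hypothetical ToyTape, declared thread_local, stands in for the autodiff stack, and the hypothetical functions run_task and run_in_parallel evaluate independent tasks on separate threads. Each task records onto its own tape before the results are combined, loosely in the spirit of map_rect.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical toy "tape": it only stores one value per recorded operation.
// stan-math's real autodiff stack holds far more information than this.
struct ToyTape {
  std::vector<double> values;
  void record(double v) { values.push_back(v); }
  void clear() { values.clear(); }
};

// thread_local gives every thread its own tape instance. As a plain global,
// all threads would push onto one shared vector, which is a data race.
thread_local ToyTape tape;

struct TaskResult {
  double value;              // result of the task
  std::size_t ops_recorded;  // operations recorded on this thread's tape
};

// An independent task: computes sum_{i<n} x*i, recording every step on the
// tape of whichever thread runs it.
TaskResult run_task(double x, int n) {
  tape.clear();
  double acc = 0.0;
  for (int i = 0; i < n; ++i) {
    acc += x * i;
    tape.record(acc);
  }
  return {acc, tape.values.size()};
}

// Runs one task per input on its own thread and collects the results.
std::vector<TaskResult> run_in_parallel(const std::vector<double>& xs, int n) {
  std::vector<std::future<TaskResult>> futures;
  for (double x : xs) {
    futures.push_back(std::async(std::launch::async, run_task, x, n));
  }
  std::vector<TaskResult> results;
  for (auto& f : futures) results.push_back(f.get());
  assert(results.size() == xs.size());
  return results;
}
```

Because tape is thread_local, concurrent record calls from different workers never touch the same vector, and the calling thread's own tape stays empty. With GNU g++, compile with -std=c++11 -pthread.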

In addition to making stan-math thread safe, STAN_THREADS also turns on parallel execution in the map_rect function. Currently, the maximal number of threads used by map_rect is controlled at runtime by the environment variable STAN_NUM_THREADS:

  • a positive integer sets the maximal number of threads;
  • the special value -1 requests as many threads as there are physical cores;
  • if the variable is not set, a single thread is used.

In version 2.18 any illegal value (a non-integer, zero, or a negative value other than -1) turns off the use of multiple threads. In future versions illegal values will cause an exception to be thrown.
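The runtime rules above can be sketched as follows. The helper names threads_from_value and threads_from_environment are hypothetical and not part of stan-math's API. For -1 the sketch assumes std::thread::hardware_concurrency() as a stand-in for the number of physical cores, although that function counts hardware threads and may therefore report more.

```cpp
#include <cassert>
#include <cerrno>
#include <climits>
#include <cstdlib>
#include <thread>

// Hypothetical helper: maps a STAN_NUM_THREADS value to a thread count
// following the 2.18 rules. A null pointer means the variable is not set.
int threads_from_value(const char* value) {
  if (value == nullptr) return 1;  // not set: single thread
  errno = 0;
  char* end = nullptr;
  long n = std::strtol(value, &end, 10);
  // Illegal: nothing parsed, trailing characters (e.g. "2.5"), or overflow.
  bool is_integer = end != value && *end == '\0' && errno == 0;
  if (!is_integer) return 1;
  if (n == -1) {
    // Assumption: hardware threads approximate physical cores here.
    // hardware_concurrency() may return 0 if the value is unknown.
    unsigned hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : 1;
  }
  if (n > 0 && n <= INT_MAX) return static_cast<int>(n);
  return 1;  // zero or another negative value: illegal, use one thread
}

// Reads the actual environment variable.
int threads_from_environment() {
  int n = threads_from_value(std::getenv("STAN_NUM_THREADS"));
  assert(n >= 1);
  return n;
}
```

In 2.18 every illegal value falls back to a single thread, which is why the sketch returns 1 instead of signalling an error; a future version that throws would replace those fallbacks with exceptions.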


Threading support requires a fully C++11-compliant compiler with a working thread_local implementation. Below, each operating system lists what is known to work; "known to work" refers to configurations that developers have run successfully.

The major open-source compilers support the C++11 thread_local keyword since these versions:

  • GNU g++ 4.8.1, see here; please also add -pthread to the CXXFLAGS variable
  • clang++ 3.3, see here
    Note: clang on Linux long had issues with thread_local, which should be fixed with clang >= 4.0
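A quick way to check whether a compiler's thread_local implementation behaves as expected is a small program like the following, a hypothetical sanity check that is not part of stan-math. It only verifies that each thread gets its own copy of a thread_local variable; passing it does not rule out problems such as the clang linking issues or the RTools behaviour described below.

```cpp
#include <cassert>
#include <thread>

// If thread_local works, every thread has its own copy of this counter.
thread_local int counter = 0;

// Hypothetical check: modifies the counter in the calling thread and in a
// worker thread, then reports whether the two copies stayed independent.
bool thread_local_is_per_thread() {
  counter = 0;
  counter += 1;  // calling thread's copy becomes 1
  int seen_in_worker = -1;
  std::thread worker([&seen_in_worker] {
    counter += 10;  // worker starts from its own fresh copy (0)
    seen_in_worker = counter;
  });
  worker.join();
  // The worker must not have changed the calling thread's copy.
  return counter == 1 && seen_in_worker == 10;
}
```

Call thread_local_is_per_thread() from a main function and compile with, for example, g++ -std=c++11 -pthread; it should return true.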

Mac OS X

Known to work:

  • macOS R toolchain, clang 4, see here
  • Apple's clang 9.1.0 (Xcode 9.1) on macOS High Sierra
  • g++ 6.4.0 from MacPorts

Should work:

  • Apple's clang compilers have supported the thread_local keyword since macOS Sierra (Xcode 8.0)

Linux

Known to work:

  • GNU g++ 4.9 (this configuration is also tested)

With clang on Linux there are issues during the linking step of programs on old distributions such as Ubuntu Trusty 14.04 (16.04 LTS is fine). A solution can be found here. It is likely that clang 4.0 works fine when used with libc++ 4.0, but developers have not yet confirmed this.

Windows

Known NOT to work:

  • RTools for Windows with GNU g++ 4.9.1: with threading turned on, all code of a Stan program compiles, but the generated program terminates without sampling.

Works for me:

  • RTools 4.0 for Windows with a port of GNU g++ 8.2, see here; this compiler can also be used with CmdStan