Intermittent segfaults with map_rect and TBB #1637

Closed
HAMdetector opened this issue Jan 22, 2020 · 6 comments

@HAMdetector

Description

Running a model with map_rect and multithreading enabled segfaults in roughly 5% of runs. The issue occurs only on v2.21.0, not on v2.20.0, and @wds15 was able to reproduce it.
When the segfault occurs, the results are still written to disk.

Example

To reproduce, compile the attached model (taken from the Stan user manual) and run it repeatedly with the attached data. The segfaults seem to occur independently of the model and the data.
Observed output (~5% of the runs):

Elapsed Time: 0.482551 seconds (Warm-up)
0.6037 seconds (Sampling)
1.08625 seconds (Total)

[2] 88862 segmentation fault /home/hbr/sync/test/map_rect_model sample data
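
The "run it repeatedly" step can be automated. The following is a minimal sketch, assuming a POSIX system; `count_segfaults` and `is_segfault_status` are hypothetical helpers written for illustration and are not part of CmdStan. It runs a shell command a given number of times and counts the runs that end in a segmentation fault.

```cpp
#include <cassert>
#include <csignal>
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Hypothetical helper: decide whether a wait status returned by
// std::system() indicates a segmentation fault. Two cases are covered:
// the shell process itself was killed by SIGSEGV, or the shell survived
// and reported the child's death as exit code 128 + SIGSEGV.
inline bool is_segfault_status(int status) {
  if (status == -1) return false;  // the shell could not be started
  if (WIFSIGNALED(status)) return WTERMSIG(status) == SIGSEGV;
  if (WIFEXITED(status)) return WEXITSTATUS(status) == 128 + SIGSEGV;
  return false;
}

// Hypothetical helper: run `command` `runs` times through the shell and
// return how many of those runs ended in a segmentation fault.
inline int count_segfaults(const std::string& command, int runs) {
  assert(runs >= 0);
  int crashes = 0;
  for (int i = 0; i < runs; ++i) {
    if (is_segfault_status(std::system(command.c_str()))) ++crashes;
  }
  return crashes;
}
```

For this report, `command` would be the compiled model's invocation, for example something like `./map_rect_model sample data file=data.Rdump > /dev/null` (the path and arguments here are assumptions), with `runs` large enough for a failure rate of about 5% to show up.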

Link to discourse thread: https://discourse.mc-stan.org/t/intermittent-segfaults-with-map-rect-and-tbb/12750

g++ -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-solus-linux/9/lto-wrapper
Target: x86_64-solus-linux
Configured with: …/configure --prefix=/usr --with-pkgversion=Solus --libdir=/usr/lib64 --libexecdir=/usr/lib64 --with-system-zlib --enable-shared --enable-threads=posix --enable-gnu-indirect-function --enable-__cxa_atexit --enable-plugin --enable-gold --enable-ld=default --enable-clocale=gnu --enable-multilib --with-multilib-list=m32,m64 --enable-lto --with-gcc-major-version-only --with-bugurl=https://dev.getsol.us/ --with-arch_32=i686 --enable-linker-build-id --with-linker-hash-style=gnu --with-gnu-ld --build=x86_64-solus-linux --target=x86_64-solus-linux --enable-languages=c,c++,fortran
Thread model: posix
gcc version 9.2.0 (Solus)

Expected Output

No intermittent segfaults

map_rect_model.stan.txt
data.Rdump.txt

@rok-cesnovar
Member

rok-cesnovar commented Jan 22, 2020

Does this mean any map_rect model is affected when threading is on?

@wds15
Contributor

wds15 commented Jan 22, 2020

See here:

stan-dev/cmdstan#802 (comment)

It's not a severe problem from my view. All inferences are still correct.

I can probably fix it by tomorrow (assuming my understanding is all correct, which I think it is).

@rok-cesnovar
Member

If you point me in the general direction I can take a stab at it today.

@wds15
Contributor

wds15 commented Jan 22, 2020

You should compile and link the TBB in debug mode (there is a flag for this in the TBB makefiles). Then it's a matter of ensuring that the TBB scheduler and the observer which we create are deallocated in the right order. At worst we have a static deallocation order problem, which requires a bit more thinking about how and where we instantiate our global TBB management objects.
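
One common way to pin down a deallocation order in C++ is to make both objects members of a single owner, because members are destroyed in reverse declaration order. The sketch below illustrates only that language rule. `Scheduler`, `Observer`, and `ThreadingContext` are hypothetical stand-ins, not TBB or Stan Math types, and destroying the observer first is an assumption made for illustration; the thread does not settle which order TBB needs.

```cpp
#include <cassert>
#include <string>
#include <vector>

using Log = std::vector<std::string>;

// Hypothetical stand-in for the scheduler; records its lifetime in a log.
struct Scheduler {
  Log& log;
  explicit Scheduler(Log& l) : log(l) { log.push_back("scheduler up"); }
  ~Scheduler() { log.push_back("scheduler down"); }
};

// Hypothetical stand-in for an observer attached to a scheduler.
struct Observer {
  Log& log;
  Observer(Scheduler&, Log& l) : log(l) { log.push_back("observer up"); }
  ~Observer() { log.push_back("observer down"); }
};

// Members are constructed in declaration order and destroyed in reverse,
// so `observer` is always torn down before `scheduler`.
struct ThreadingContext {
  Scheduler scheduler;
  Observer observer;
  explicit ThreadingContext(Log& log)
      : scheduler(log), observer(scheduler, log) {}
};

// Build and destroy one context, returning the recorded event order.
inline Log run_lifecycle() {
  Log log;
  {
    ThreadingContext context(log);
    assert(log.size() == 2);  // both members are up inside the scope
  }
  return log;
}
```

The same reverse-order rule applies to objects with static storage duration within one translation unit, which is why grouping related globals into one owner can remove the ambiguity.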

@rok-cesnovar
Member

rok-cesnovar commented Jan 22, 2020

This is probably stating the obvious, but commenting out this line https://github.com/stan-dev/math/blob/develop/stan/math/rev/core/init_chainablestack.hpp#L52 makes the segfaults go away. That is obviously not the solution, but it shows that the erase is the cause.

@wds15
Contributor

wds15 commented Jan 22, 2020

There is probably a circular dependency and we need to write a destructor to handle things in the right order.

The thing is that when the tbb scheduler is torn down it will run the exit functions of the observers...but the observers are also going out of scope...hmmm...we can try to write a destructor which uses the lock mutex to ensure proper ordering...which hopefully does not cause another mess
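
The mutex-guarded destructor idea can be sketched without TBB. In the sketch below, `HookTable`, `Scheduler`, and `Observer` are hypothetical stand-ins and not the TBB classes. The assumption is that both sides share a small table of exit hooks, and that whichever side is destroyed first runs the hook and removes it under the lock, so the hook runs exactly once in either teardown order.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <utility>

// Shared state that outlives whichever side is destroyed first.
struct HookTable {
  std::mutex mutex;
  std::map<int, std::function<void()>> hooks;
  int next_id = 0;
};

// Hypothetical scheduler: on teardown it runs every hook still registered.
// Hooks run under the lock, so they must not touch the table themselves.
class Scheduler {
 public:
  Scheduler() : table_(std::make_shared<HookTable>()) {}
  ~Scheduler() {
    std::lock_guard<std::mutex> lock(table_->mutex);
    for (auto& entry : table_->hooks) entry.second();
    table_->hooks.clear();
  }
  std::shared_ptr<HookTable> table() const { return table_; }

 private:
  std::shared_ptr<HookTable> table_;
};

// Hypothetical observer: registers an exit hook on construction. If the
// scheduler has not already run the hook, the destructor runs it and
// removes it from the table.
class Observer {
 public:
  Observer(const Scheduler& scheduler, std::function<void()> on_exit)
      : table_(scheduler.table()) {
    assert(table_ != nullptr);
    std::lock_guard<std::mutex> lock(table_->mutex);
    id_ = table_->next_id++;
    table_->hooks.emplace(id_, std::move(on_exit));
  }
  ~Observer() {
    std::lock_guard<std::mutex> lock(table_->mutex);
    auto it = table_->hooks.find(id_);
    if (it != table_->hooks.end()) {
      it->second();
      table_->hooks.erase(it);
    }
  }
  Observer(const Observer&) = delete;
  Observer& operator=(const Observer&) = delete;

 private:
  std::shared_ptr<HookTable> table_;
  int id_ = 0;
};

// Tear both objects down in the requested order and count hook calls.
inline int exit_calls(bool scheduler_first) {
  int calls = 0;
  auto scheduler = std::make_unique<Scheduler>();
  auto observer = std::make_unique<Observer>(*scheduler, [&calls] { ++calls; });
  if (scheduler_first) {
    scheduler.reset();
    observer.reset();
  } else {
    observer.reset();
    scheduler.reset();
  }
  return calls;
}
```

The shared pointer keeps the table alive until both objects are gone, which is what breaks the circular lifetime dependency in this toy version. Whether the same approach fits the actual TBB observer is not established in the thread.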

mcol added this to the 3.1.0 milestone on Jan 27, 2020