Segfault: Calling finufft and fftw in threaded loop #62
Comments
I just ran your example on a 5700U ryzen 2, 8-core (16 logical threads, which julia chooses), ubuntu 22.04, Julia 1.10.0, FINUFFT v3.2.0, finufft_jll v2.2.0+2, FFTW v1.8.0, FFTW_jll v3.3.10+0, and, yes, it crashes *only* when I uncomment the fft(x) line.
Thanks for the MWE.
The implication is that nthreads=1 julia nufft calls are not thread-safe.
Output:
```
julia> for i=1:100; issue(threaded=true); end
double free or corruption (fasttop)
[48264] signal (6.-6): Aborted
in expression starting at REPL[22]:1
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: 0x77568f689675)
unknown function (ip: 0x77568f6a0cfb)
unknown function (ip: 0x77568f6a29c9)
free at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
forget at /home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so (unknown line)
mkplan at /home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so (unknown line)
fftw_mkapiplan at /home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so (unknown line)
fftw_plan_guru64_dft at /home/alex/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/libfftw3.so (unknown line)
macro expansion at /home/alex/.julia/packages/FFTW/6nZei/src/fft.jl:652 [inlined]
cFFTWPlan at /home/alex/.julia/packages/FFTW/6nZei/src/FFTW.jl:49
#plan_fft#10 at /home/alex/.julia/packages/FFTW/6nZei/src/fft.jl:782 [inlined]
plan_fft at /home/alex/.julia/packages/FFTW/6nZei/src/fft.jl:772 [inlined]
fft at /home/alex/.julia/packages/AbstractFFTs/4iQz5/src/definitions.jl:67
fft at /home/alex/.julia/packages/AbstractFFTs/4iQz5/src/definitions.jl:214 [inlined]
fft at /home/alex/.julia/packages/AbstractFFTs/4iQz5/src/definitions.jl:66 [inlined]
_inner at ./REPL[20]:5
macro expansion at ./REPL[4]:4 [inlined]
#2#threadsfor_fun#2 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x77568e51c8b2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 13790939 (Pool: 12665074; Big: 1125865); GC: 273
Aborted (core dumped)
```
Of course, you could try the DUCC option in v2.3.0-rc1, but I don't know if Ludvig has that yet.
PR #354 is supposed to prevent rather than cause segfaults, of course, so this is rather upsetting - I hope Ludvig has some insight.
Thanks, Alex
…On Thu, Aug 22, 2024 at 7:51 AM Jonas Krimmer ***@***.***> wrote:
Unfortunately, I stumbled upon another segmentation fault caused by multithreading in julia: calling both finufft and fftw in a threaded loop yields a segmentation fault (Windows and Linux). Here, the order in which finufft and fftw are called does not matter.
MWE
Make sure to initialize julia with multiple threads, e.g., run julia --threads=auto.
```
using FFTW
using FINUFFT

function _inner(N)
    y = randn(ComplexF64, N)
    x = randn(N)
    nufft1d1(x, y, 1, 1e-8, N, nthreads=1)
    # fft(x)
end

function issue(iters=1000, N=1024; threaded=false)
    if threaded
        Threads.@threads for k in 1:iters
            _inner(N)
        end
    else
        for k in 1:iters
            _inner(N)
        end
    end
end

@info "single-threaded loop"
issue()
@info "multi-threaded loop"
issue(threaded=true)
```
General information:
julia v1.10.4
FINUFFT v3.2.0
finufft_jll v2.2.0+2
Potential but suboptimal solution
The only mitigation approaches I have found so far are going back to libfinufft_jll v2.1.0 (where flatironinstitute/finufft#354 has not been merged) or building a more recent version where fftw_make_planner_thread_safe is re-introduced, which is probably not desired.
Hi Alex, thank you for looking into this issue. Fortunately, I have already been able to test the DUCC option, which resolves the problem. Maybe there is some clash between the FFTW-planner calls originating from julia and the ones from within finufft? Thanks
Well, I guess the good news is that thanks to @jkrimmer we now have working crash examples for Windows, Linux, and macOS! Not sure if I have any insight, though; it seems to me like the fundamental issue is the FFTW/FINUFFT interplay with threading active, but then we should be able to reproduce this directly in C++ code.
Finally, I have been able to reproduce the issue in C++. It seems like the finufft-internal fftw-lock does not prevent other calls to fftw from building an FFTW plan. Please find an MWE below (on Ubuntu, this MWE builds with …):
```
#include <iostream>
#include <complex>
#include <vector>
#include <random>
#include <mutex>
#include <fftw3.h>
#include <finufft.h>
#include <omp.h>

using namespace std;

#define N 65384

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;
    // set lock
    recursive_mutex lck;

    finufft_opts *opts = new finufft_opts; // opts is pointer to struct
    finufft_default_opts(opts);
    opts->nthreads = 1;
    opts->debug = 0;

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
#pragma omp parallel for
    for (int j = 0; j < 100; ++j) {
        // allocate output array for FFTW...
        vector<complex<double>> F1(N);

        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();

        // FINUFFT plan
        // lck.lock();
        finufft_plan nufftplan;
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, opts);
        finufft_destroy(nufftplan);
        // lck.unlock();
    }
    return 0;
}
```
Similarly, we observe the same behavior in julia. Running the following code with multiple threads yields a segmentation fault on my machine:
```
using FFTW
using FINUFFT

tmp = randn(ComplexF64, 1024)

Threads.@threads for k = 1:1000
    plan_fft(tmp)
    finufft_makeplan(1, [1024], 1, 1, 1e-6)
end
```
However, adding an additional lock around the finufft planning avoids the segfault:
```
using FFTW
using FINUFFT

tmp = randn(ComplexF64, 1024)
lk = ReentrantLock()

Threads.@threads for k = 1:1000
    plan_fft(tmp)
    lock(lk)
    try
        finufft_makeplan(1, [1024], 1, 1, 1e-6)
    finally
        unlock(lk)
    end
end
```
Robert (@blackwer), since you are our expert on fftw global state and thread-safety, would you be able to take a look at the above C++ MWE, which apparently causes a crash? (I have only verified the julia-version crash.) Thanks, Alex
I don't think there is a … One proper solution is for the application to call fftw_make_planner_thread_safe itself. Another solution would be to supply a mechanism to acquire a pointer to the lock that finufft holds around its FFTW planner calls. Is it possible on the julia side to do either of these?
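For reference, a minimal julia-side sketch of the first option (a hedged illustration only, not something FFTW.jl or FINUFFT.jl currently does for you; it assumes finufft_jll and FFTW.jl share the libfftw3 shipped by FFTW_jll, and that this build exposes fftw_make_planner_thread_safe, available in FFTW >= 3.3.6):
```
# Sketch: make the shared FFTW planner thread-safe once, before any threaded
# planning from either FFTW.jl or finufft.
using FFTW_jll

ccall((:fftw_make_planner_thread_safe, FFTW_jll.libfftw3), Cvoid, ())    # double precision
ccall((:fftwf_make_planner_thread_safe, FFTW_jll.libfftw3f), Cvoid, ())  # single precision, if used
```
Whether this interacts well with FFTW.jl's own planner lock is untested here; it only illustrates where such a call would go.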
Thank you very much for the detailed explanation! I assume that making the call to … Lately, I have been considering another approach: the FFTW interface …
This definitely works, and can be done straightforwardly without having to extend the API (by having two …). MWE:
```
#include <complex>
#include <vector>
#include <mutex>
#include <fftw3.h>
#include <finufft.h>
#include <omp.h>

using namespace std;

#define N 65384

void locker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->lock(); }
void unlocker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->unlock(); }

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;
    recursive_mutex lck;

    finufft_opts opts;
    finufft_default_opts(&opts);
    opts.nthreads = 1;
    opts.debug = 0;
    opts.fftw_lock_fun = locker;
    opts.fftw_unlock_fun = unlocker;
    opts.fftw_lock_data = reinterpret_cast<void *>(&lck);

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
#pragma omp parallel for
    for (int j = 0; j < 100; ++j) {
        // allocate output array for FFTW...
        vector<complex<double>> F1(N);

        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();

        // FINUFFT plan
        finufft_plan nufftplan;
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, &opts);
        finufft_destroy(nufftplan);
    }
    return 0;
}
```
@ahbarnett thoughts?
This is interesting; having a global lock passed in C++ is not an issue. But what about having a Julia lock passed to finufft? Alternatively, one has to acquire the lock in Julia and then do the entire finufft_makeplan in a critical section, which feels wasteful, as the only non-thread-safe code is fftw_plan. As for 2.3.0, I would not add this to it. I would do a 2.3.1 or put it in 2.4.0 if we release this soon. 2.3.0 has been in the works for too long; we should release it asap :)
You should be able to pass one from any language that lets you make a standard C-style callback function, which julia definitely lets you do, even with closures. Python does too, if you could even make this be a problem there with its GIL-limited threading. https://julialang.org/blog/2013/05/callback/ has details on how to do this in julia.
I'm in no rush, and if Jonas is happy with the stop-gap fftw_make_planner_thread_safe option, then I am. But the provided solution does already work (I think I just fixed the last issue) and is about as language agnostic as possible with a very small code footprint.
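As a rough illustration of the julia-side callbacks mentioned above (a sketch only: the fftw_lock_fun / fftw_unlock_fun / fftw_lock_data names follow the C++ MWE in this thread, not a released FINUFFT.jl API):
```
# Sketch: wrap a julia ReentrantLock in C-callable lock/unlock callbacks with
# the void(void*) signature used by the opts fields in the C++ MWE above.
const planner_lock = ReentrantLock()

lock_cb(::Ptr{Cvoid})   = (lock(planner_lock); nothing)
unlock_cb(::Ptr{Cvoid}) = (unlock(planner_lock); nothing)

const c_lock   = @cfunction(lock_cb,   Cvoid, (Ptr{Cvoid},))
const c_unlock = @cfunction(unlock_cb, Cvoid, (Ptr{Cvoid},))

# The wrapper would then write c_lock, c_unlock (and a C_NULL lock_data) into
# the finufft_opts struct before calling finufft_makeplan.
```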
Hi Robert,
I think this is a great idea, and it should go in 2.3.1 or 2.4, since it changes the opts struct (which requires changing ctypes for py, the fortran module, docs, etc).
Thanks for the demonstration! It will only be used by power users, but there is no harm in having it, and it gives flexibility in the future. I opened it as a PR to FINUFFT master.
The jl wrapper could then provide the power jl user with the ability to pass in a lock from FFTW.jl.
It's amazing what a mess it is to have multiple packages call the same library through non-thread-safe functions. While DUCC gets rid of this problem, there will always be users who want FFTW because it is faster in certain cases.
Dear all,
Thanks a lot for your work on this issue! I have successfully tested the user-locking approach by @blackwer in julia over at https://github.com/jkrimmer/FINUFFT.jl/tree/user_fftw_lock. However, until the required library adaptations are merged, we can use …
Cheers
Unfortunately, I stumbled upon another segmentation fault caused by multithreading in julia: calling both finufft and fftw in a threaded loop yields a segmentation fault (Windows and Linux). Here, the order in which finufft and fftw are called does not matter.
MWE
Make sure to initialize julia with multiple threads, e.g., run julia --threads=auto. Note: commenting out the fft call or the nufft1d1 call makes the issue disappear.
General information:
julia v1.10.4
FINUFFT v3.2.0
finufft_jll v2.2.0+2
Potential but suboptimal solution
The only mitigation approaches I have found so far are going back to libfinufft_jll v2.1.0 (where flatironinstitute/finufft#354 has not been merged) or building a more recent version where fftw_make_planner_thread_safe is re-introduced, which is probably not desired. Alternatively, building finufft with DUCC instead of FFTW works as well.
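For completeness, a hedged sketch of the pin-back mitigation in julia (the version number follows the text above; compatibility with your installed FINUFFT.jl version is not checked here):
```
# Sketch: force the pre-#354 binary build mentioned above.  FINUFFT.jl's
# compat bounds may need to allow this version.
using Pkg
Pkg.add(name="finufft_jll", version="2.1.0")
Pkg.pin("finufft_jll")
```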