Segfault: Calling finufft and fftw in threaded loop #62

Closed
jkrimmer opened this issue Aug 22, 2024 · 12 comments

@jkrimmer
Contributor

jkrimmer commented Aug 22, 2024

Unfortunately, I have stumbled upon another segmentation fault caused by multithreading in julia: calling both finufft and fftw in a threaded loop yields a segmentation fault (on Windows and Linux). The order in which finufft and fftw are called does not matter.

MWE

Make sure to initialize julia with multiple threads, e.g., run julia --threads=auto.

using FFTW
using FINUFFT

function _inner(N)
    y = randn(ComplexF64, N)
    x = randn(N)
    nufft1d1(x, y, 1, 1e-8, N, nthreads=1)
    fft(x)
end

function issue(iters=1000, N=1024; threaded=false)
    if threaded
        Threads.@threads for k in 1:iters
            _inner(N)
        end
    else
        for k in 1:iters
            _inner(N)
        end
    end
end

@info "single-threaded loop" 
issue()
@info "multi-threaded loop"
issue(threaded=true)

Note: Commenting out the fft call or the nufft1d1 call makes the issue disappear.

General information:
julia v1.10.4
FINUFFT v3.2.0
finufft_jll v2.2.0+2

Potential but suboptimal solutions

The only mitigation approaches I have found so far are going back to finufft_jll v2.1.0 (where flatironinstitute/finufft#354 had not yet been merged) or building a more recent version in which fftw_make_planner_thread_safe is re-introduced, which is probably not desired. Alternatively, building finufft with DUCC instead of FFTW works as well.
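
For anyone needing the downgrade in the meantime, the older binary can be pinned from julia's package manager. A minimal sketch, assuming the registry still serves that build:

using Pkg
# request the last unaffected binary build and pin it so Pkg.update() keeps it
Pkg.add(PackageSpec(name="finufft_jll", version=v"2.1.0"))
Pkg.pin("finufft_jll")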

@ahbarnett
Collaborator

ahbarnett commented Aug 22, 2024 via email

@jkrimmer
Contributor Author

Hi Alex,

Thank you for looking into this issue. Fortunately, I have already been able to test the DUCC option which resolves this issue. Maybe there is some clash between the FFTW-planner calls originating from julia and the ones from within finufft?

Thanks
Jonas

@ludvigak
Owner

Well, I guess the good news is that thanks to @jkrimmer we now have working crash examples for Windows, Linux, and MacOS!

Not sure if I have any insight, though; it seems to me that the fundamental issue is the FFTW/FINUFFT interplay with threading active, but then we should be able to reproduce this directly in C++ code.

@jkrimmer
Contributor Author

Finally, I have been able to reproduce the issue in C++. It seems that the finufft-internal FFTW lock does not prevent other callers from building an FFTW plan concurrently. Please find an MWE below (on Ubuntu, it builds with g++ test_finufft_crash.cpp -o test_finufft_crash -I[include] -L[lib] -lfinufft -lfftw3_threads -lfftw3 -fopenmp -Wl,-rpath=[lib] -std=c++11). Re-enabling the commented-out lock around the finufft planner stage prevents the segfault.

MWE
#include <iostream>
#include <complex>
#include <vector>
#include <random>
#include <mutex>

#include <fftw3.h>
#include <finufft.h>
#include <omp.h>

using namespace std;

#define N 65384

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;

    // set lock
    recursive_mutex lck;

    finufft_opts *opts = new finufft_opts;         // opts is pointer to struct
    finufft_default_opts(opts);
    opts->nthreads = 1;
    opts->debug = 0;

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
    #pragma omp parallel for
    for (int j = 0; j < 100; ++j) {

        // allocate output array for FFTW...
        vector<complex<double>> F1(N);
        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();
        
        // FINUFFT plan; the lock around it is deliberately left commented out,
        // which reproduces the segfault (re-enabling it avoids the crash)
        // lck.lock();
        finufft_plan nufftplan; 
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, opts);
        finufft_destroy(nufftplan);
        // lck.unlock();
    }
    return 0;
}

Similarly, we observe the same behavior in julia: running the following code with multiple threads yields a segmentation fault on my machine.

using FFTW
using FINUFFT

tmp = randn(ComplexF64, 1024)

Threads.@threads for k = 1:1000
    plan_fft(tmp)
    finufft_makeplan(1, [1024], 1, 1, 1e-6)
end

However, adding an additional lock around the finufft planning avoids the segfault:

using FFTW
using FINUFFT

tmp = randn(ComplexF64, 1024)
lk = ReentrantLock()

Threads.@threads for k = 1:1000
    plan_fft(tmp)

    lock(lk)
    try
        finufft_makeplan(1, [1024], 1, 1, 1e-6)
    finally
        unlock(lk)
    end
end

@ahbarnett
Collaborator

Robert @blackwer, since you are our expert on FFTW global state and thread-safety, would you be able to take a look at the C++ MWE above, which apparently causes a crash? (I have only verified the julia-version crash.) Thanks, Alex

@blackwer

blackwer commented Sep 3, 2024

I don't think there is a finufft-side solution to this. There's no way for the finufft side to detect what the application side is doing. This is actually covered in the discussion at https://www.fftw.org/doc/Thread-safety.html, a document I have had to look at way too many times.

One proper solution is for the application, rather than finufft, to call fftw_make_planner_thread_safe(). finufft can do this too, and in practice it's usually sufficient, but there are cases where it could still crash even with the older version that did have this call in it. E.g. if you call finufft_makeplan before you make fftw_plan_dft_1d in your MWE, this solution works; it doesn't when you call fftw_plan_dft_1d directly first. It might work, but it's technically a race condition.

Another solution would be to supply a mechanism to acquire a pointer to the finufft lock on the application side, but I don't think this is adequate for the julia case, since that'd have to end up in the fftw code, which doesn't make much sense.

Is it possible on the julia side to have the FINUFFT import make the fftw_make_planner_thread_safe() call? Otherwise I think the only solution is a custom build either with an alternative FFT implementation or with a patch to the FFTW_INIT routine (though this does have the race condition).
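
For illustration, a minimal sketch of what that julia-side call could look like, assuming the symbol is exported by the libfftw3 binary that FFTW_jll ships (this is not FINUFFT.jl's actual init code):

using FFTW_jll  # provides the libfftw3 library handle

# make FFTW's planner serialize itself against concurrent planner calls;
# must run before any plan is created, e.g. from the package's __init__
ccall((:fftw_make_planner_thread_safe, libfftw3), Cvoid, ())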

@jkrimmer
Contributor Author

jkrimmer commented Sep 4, 2024

Thank you very much for the detailed explanation! I assume that making the call to fftw_make_planner_thread_safe during the FINUFFT initialization in julia should work and provide a quick fix.

Lately, I have been considering another approach: the FFTW interface FFTW.jl uses julia's ReentrantLock() to protect the FFTW planning stage from race conditions. If we could pass an application-defined lock/locking function to the finufft library, e.g. based on this ReentrantLock, the race conditions would, in my understanding, also be avoided. Do you think this approach could work as well?

@blackwer

blackwer commented Sep 4, 2024

Lately, I have been considering another approach: the FFTW interface FFTW.jl uses julia's ReentrantLock() to protect the FFTW planning stage from race conditions. If we could pass an application-defined lock/locking function to the finufft library, e.g. based on this ReentrantLock, the race conditions would, in my understanding, also be avoided. Do you think this approach could work as well?

This definitely works, and can be done straightforwardly without having to extend the API (by adding two void (*func)(void *) function pointers to the options struct and an optional void * for the lock data). This was very straightforward to implement on the finufft side, so it could feasibly make it into 2.3 (maybe?). I've written an example implementation at https://github.com/blackwer/finufft/tree/user_fftw_lock with new driver code:

MWE
#include <complex>
#include <vector>
#include <mutex>

#include <fftw3.h>
#include <finufft.h>
#include <omp.h>

using namespace std;

#define N 65384

void locker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->lock(); }
void unlocker(void *lck) { reinterpret_cast<recursive_mutex *>(lck)->unlock(); }

int main() {
    int64_t Ns[3]; // guru describes mode array by vector [N1,N2..]
    Ns[0] = N;
    recursive_mutex lck;

    finufft_opts opts;
    finufft_default_opts(&opts);
    opts.nthreads = 1;
    opts.debug = 0;
    opts.fftw_lock_fun = locker;
    opts.fftw_unlock_fun = unlocker;
    opts.fftw_lock_data = reinterpret_cast<void *>(&lck);

    // random nonuniform points (x) and complex strengths (c)...
    vector<complex<double>> c(N);

    // init FFTW threads
    fftw_init_threads();

    // FFTW and FINUFFT execution using OpenMP parallelization
    #pragma omp parallel for
    for (int j = 0; j < 100; ++j) {

        // allocate output array for FFTW...
        vector<complex<double>> F1(N);

        // FFTW plan
        lck.lock();
        fftw_plan_with_nthreads(1);
        fftw_plan plan = fftw_plan_dft_1d(N, reinterpret_cast<fftw_complex*>(c.data()), reinterpret_cast<fftw_complex*>(F1.data()), FFTW_FORWARD, FFTW_ESTIMATE);
        fftw_destroy_plan(plan);
        lck.unlock();

        // FINUFFT plan
        finufft_plan nufftplan;
        finufft_makeplan(1, 1, Ns, 1, 1, 1e-6, &nufftplan, &opts);
        finufft_destroy(nufftplan);
    }
    return 0;
}

@ahbarnett thoughts?

@DiamonDinoia

This is interesting: having a global lock passed in C++ is not an issue. But what about passing a Julia lock to finufft?

Alternatively, one has to acquire the lock in Julia and then run the entire finufft_makeplan in a critical section (as in the workaround above), which feels wasteful, since the only non-thread-safe code is the FFTW planning.

As for 2.3.0, I would not add this to it. I would do a 2.3.1, or put it in 2.4.0 if we release that soon. 2.3.0 has been in the works for too long; we should release it asap :)

@blackwer

blackwer commented Sep 4, 2024

This is interesting: having a global lock passed in C++ is not an issue. But what about passing a Julia lock to finufft?

You should be able to pass one from any language that lets you create a standard C-style callback function, which julia definitely lets you do, even with closures. Python does too, if you could even make this a problem there given its GIL-limited threading. https://julialang.org/blog/2013/05/callback/ has details on how to do this in julia.
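
As a concrete sketch of that julia route (the finufft_opts field names are the ones from the branch above; how the wrapper exposes them is assumed here):

using FFTW  # the real fix should reuse FFTW.jl's planner lock; see comment below

# In the actual wrapper this should be the very lock FFTW.jl's planner takes,
# which is internal to FFTW.jl; a fresh lock stands in for it in this sketch.
const planner_lock = ReentrantLock()

# C-compatible callbacks; the void* payload is unused since we close over the lock
lock_cb(::Ptr{Cvoid}) = (lock(planner_lock); nothing)
unlock_cb(::Ptr{Cvoid}) = (unlock(planner_lock); nothing)

# top-level @cfunction pointers stay valid for the whole session
const lock_ptr = @cfunction(lock_cb, Cvoid, (Ptr{Cvoid},))
const unlock_ptr = @cfunction(unlock_cb, Cvoid, (Ptr{Cvoid},))
# lock_ptr/unlock_ptr would then be stored in finufft_opts (fftw_lock_fun,
# fftw_unlock_fun), with fftw_lock_data left as C_NULL.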

As for 2.3.0, I would not add this to it. I would do a 2.3.1, or put it in 2.4.0 if we release that soon. 2.3.0 has been in the works for too long; we should release it asap :)

I'm in no rush, and if Jonas is happy with the stop-gap fftw_make_planner_thread_safe option, then I am too. But the provided solution does already work (I think I just fixed the last issue) and is about as language-agnostic as possible, with a very small code footprint.

@ahbarnett
Collaborator

ahbarnett commented Sep 4, 2024 via email

@jkrimmer
Contributor Author

jkrimmer commented Sep 9, 2024

Dear all,

Thanks a lot for your work on this issue! I have successfully tested the user-locking approach by @blackwer in julia over at https://github.com/jkrimmer/FINUFFT.jl/tree/user_fftw_lock. However, until the required library adaptations are merged, we can use fftw_make_planner_thread_safe (see #66) to resolve this issue.

Cheers

@jkrimmer jkrimmer closed this as completed Sep 9, 2024