cuFINUFFT interface #58

ludvigak · 2024-06-20T15:16:41Z

Interface to the (guru) cuFINUFFT library, using CUDA.jl for copying data to/from device.

- Works just like the current guru interfaces
- Can be called with both host and device arrays (copies to device if needed)
- Only for x86_64 at the moment, due to the artifact build system

To-do list:

- Wait for cufinufft_jll: [finufft] CUDA components of FINUFFT library JuliaPackaging/Yggdrasil#8928
- Make sure that nothing breaks on non-GPU devices
- Create new interfaces finufftDdN! ~~and cufinufftDdN~~
- Document new routines
- Update README
- Tag new minor version

Fixes #49

ludvigak · 2024-06-27T17:29:27Z

@ahbarnett I think the CUDA interface is pretty much in place now, let me know what you think before I merge it!
The cufinufft_* functions are now direct interfaces to the cuFINUFFT library, while the finufftDdN! functions are overloaded and dispatch to cuFINUFFT when called with CUDA arrays.
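The overloading described above relies on Julia's multiple dispatch. A minimal sketch of the idea follows, runnable without a GPU. The names `MockDeviceArray` and `demo_transform!` are hypothetical stand-ins (for `CuArray` and a routine such as `nufft1d1!`) and are not part of FINUFFT.jl; the arithmetic is a placeholder, not a NUFFT.

```julia
# Hypothetical stand-in for a device array type (CuArray plays this role in the PR).
struct MockDeviceArray{T}
    data::Vector{T}
end

# One public name, two methods: Julia selects the method from the argument types.
# Each method returns a symbol naming the backend that handled the call.
function demo_transform!(out::Array, x::Array)
    out .= 2 .* x            # placeholder for the CPU library call
    return :cpu
end

function demo_transform!(out::MockDeviceArray, x::MockDeviceArray)
    out.data .= 2 .* x.data  # placeholder for the GPU library call
    return :gpu
end

x = [1.0, 2.0, 3.0]
out = similar(x)
println(demo_transform!(out, x))      # host arrays take the CPU method

x_d = MockDeviceArray(x)
out_d = MockDeviceArray(zeros(3))
println(demo_transform!(out_d, x_d))  # wrapped arrays take the GPU method
```

The caller writes the same function call in both cases; only the array type decides which backend runs.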

ahbarnett

A100 GPU. Julia 1.9.3.

julia> ]add https://github.com/ludvigak/FINUFFT.jl#cufinufft
pkg> test FINUFFT
(wait several minutes for compilation of cuda etc...)

Test Summary: | Pass  Total  Time
FINUFFT       |   51     51  5.9s
┌ Warning: You are using a non-official build of Julia. This may cause issues with CUDA.jl.
│ Please consider using an official build from https://julialang.org/downloads/.
└ @ CUDA ~/.julia/packages/CUDA/75aiI/src/initialization.jl:180
Test Summary: | Pass  Total   Time
cuFINUFFT     |   34     34  24.0s
     Testing FINUFFT tests passed

Looks good. I seem not to have needed ]add CUDA

ahbarnett · 2024-07-03T18:56:42Z

I also played around with Float32 and benchmarking. The 1d1 transform is about 20x faster on an A100 than on 10 threads of a top-end Xeon.

You may want to include example benchmark code. Here's mine:

## Here we demo CUDA routines using the 1D type 1 transform
# single-precision speed comparison.

using FINUFFT
using LinearAlgebra
using BenchmarkTools

dtype = Float32 # Datatype for computations
tol   = 1e-5   # requested relative tolerance

# Setup problem
nj = Int(3e8)
x = pi*(1 .- 2*rand(dtype, nj)); # nonuniform points
c = rand(Complex{dtype}, nj);    # their strengths
ms = Int(1e6)                      # output size (number of Fourier modes)

# CPU computation with preallocated array
fk = Array{Complex{dtype}}(undef, ms);

@btime nufft1d1!(x, c, 1, tol, fk)       # 6 sec on 10 threads of xeon 8358
                                        # (0.16G NUpt/s)

##############################################
## Simple GPU interface for preallocated array
using CUDA # CUDA must be loaded for cuFINUFFT to be activated

# Copy input data to GPU, "_d" suffix indicates data on device (GPU)
x_d = CuArray(x);
c_d = CuArray(c);
# Allocate CUDA array
out_d = CuArray{Complex{dtype}}(undef, ms);
# Note: same interface as on the CPU, but with CUDA arrays on device

@btime nufft1d1!(x_d, c_d, 1, tol, out_d)       # 0.28 sec on A100 (3.6G NUpt/s)
# 20x the CPU speed, float32 or 64.

# Copy results back to host memory
gpu_results = Array(out_d);
magnitude = norm(fk, Inf)
@show norm(gpu_results-fk, Inf) / magnitude     # Should be on the order of tol

Thanks for the nice example code. Looks good to merge.
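For context on what the benchmark above computes, here is a rough sketch of a naive direct evaluation of the 1D type 1 sum, f_k = sum_j c_j exp(±i k x_j), with modes ordered from -ms/2 upward as in FINUFFT's default mode ordering. The helper name `direct_type1` is hypothetical and not part of FINUFFT.jl; it costs O(nj * ms) and is only meant for small sanity checks.

```julia
# Naive direct evaluation of the 1D type 1 nonuniform sum (hypothetical helper).
# x: nonuniform points, c: complex strengths, iflag: sign of the exponent, ms: number of modes.
function direct_type1(x::AbstractVector{<:Real}, c::AbstractVector{<:Complex},
                      iflag::Integer, ms::Integer)
    s = iflag >= 0 ? 1 : -1            # only the sign of iflag matters
    ks = -(ms ÷ 2):((ms - 1) ÷ 2)      # mode indices, lowest frequency first
    return [sum(c[j] * cis(s * k * x[j]) for j in eachindex(x)) for k in ks]
end

# Tiny example: a single unit-strength point at x = pi/2 with three modes k = -1, 0, 1.
fk_small = direct_type1([pi / 2], [1.0 + 0.0im], 1, 3)
println(fk_small)
```

Comparing such a direct sum against the library output on a small problem is one way to check accuracy independently of both the CPU and GPU code paths.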

Bump minor version

ludvigak added 11 commits June 20, 2024 16:41

cuFINUFFT guru interface in place (no docs)

214d8b9

update tests

fd74ba8

Merge branch 'master' into cufinufft

3a2444f

Add docstrings and harmonize with CPU guru interface

20cf607

Merge branch 'master' into cufinufft

c57138c

Restructure and make CUDA usage depend on Requires

5e3251e

Move exports

3ae44c5

Fix for unavailable cufinufft_jll

3392460

Fix warning testing

45121cc

Add simple CUDA routines

959c649

CUDA demo

35d7d93

ludvigak requested a review from ahbarnett June 27, 2024 17:29

ludvigak marked this pull request as ready for review June 27, 2024 17:29

Update README.md

d90d848

ahbarnett approved these changes Jul 3, 2024

ludvigak and others added 2 commits July 5, 2024 13:02

CUDA timing code

85e7576

Update Project.toml

714ce05

Bump minor version

ludvigak merged commit c863e3e into master Jul 5, 2024
13 checks passed

ludvigak deleted the cufinufft branch July 5, 2024 11:22
