Add GPU support #40
Open
hakkelt wants to merge 24 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
- Coverage   89.28%   87.13%   -2.16%
==========================================
  Files          45       51       +6
  Lines        3267     3716     +449
==========================================
+ Hits         2917     3238     +321
- Misses        350      478     +128
Add array_type keyword constructors and domain_storage_type/codomain_storage_type storage type traits to all operators.
- Add core GPU extension (GpuExt) with operator overrides for GetIndex, Variation, and ZeroPad.
- Add GPU extensions for DSPOperators, FFTWOperators, NFFTOperators, and WaveletOperators subpackages.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Update test infrastructure to support GPU (JLArray) testing.
- Add :jlarray tags to relevant testitems.
- Add gpu_utils.jl helper.
- Add GPU quality tests.
- Update operator testitems with proper tags (:linearoperator, :nonlinearoperator, etc.).
- Rename CpuWrapper tests to OperatorWrapper.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Updated tests for Eye, FiniteDiff, GetIndex, L-BFGS, LMatrixOp, MatrixOp, MyLinOp, Variation, Zeros, and nonlinear operators to utilize GPUEnv for backend management.
- Removed specific CUDA and AMDGPU checks, replacing them with a loop over available GPU backends.
- Simplified test setups by eliminating redundant code and ensuring compatibility with various GPU array types.
- Ensured all tests are now tagged appropriately for GPU execution without dependency on specific GPU libraries.
- Updated various test files to replace domain_storage_type and codomain_storage_type with domain_array_type and codomain_array_type for consistency and clarity.
- Removed unnecessary verbose print statements in tests to streamline output.
- Adjusted GPU-related tests to ensure proper handling of array types.
- Ensured that all tests maintain functionality while improving readability and maintainability.
…y backend-specific limitations
…n storage type parameters
… documentation CI job
Co-authored-by: Copilot <copilot@github.com>
Replace the FFT-based adjoint mul! with a tiled FIR direct convolution on
CPU paths (H <: Array{T}). The GPU fallback keeps the FFT-based approach.
Algorithm: y[j] = Σ_k h[k] * b[padlen+j-k], unrolled 8-wide so all
accumulators (a0..a7) live in registers. Reads b[base:base+7] consecutively
per inner k-iteration (cache-friendly), writes y only once.
Benchmark (n=32768, h length 21, Float64, 1 FFTW thread):
- Before (65536-pt FFT): ~484 μs
- After (tiled FIR): ~107 μs (~4.5× speedup)
Baseline on benchmark machine was ~418 μs, so this should close the
regression seen in PR kul-optec#40.
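The tiling described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function names `fir_reference!` and `fir_tiled!` are hypothetical, and the sketch assumes `padlen` and the length of `b` are chosen so that every index `padlen + j - k` is in bounds.

```julia
# Reference loop: y[j] = Σ_k h[k] * b[padlen + j - k] for j = 1:length(y).
function fir_reference!(y, h, b, padlen)
    for j in eachindex(y)
        acc = zero(eltype(y))
        for k in eachindex(h)
            acc += h[k] * b[padlen + j - k]
        end
        y[j] = acc
    end
    return y
end

# Same sum, computed for 8 consecutive outputs at a time (hypothetical sketch).
function fir_tiled!(y::Vector{T}, h::Vector{T}, b::Vector{T}, padlen::Int) where {T}
    n, m = length(y), length(h)
    # Assumption of this sketch: all indices padlen + j - k are valid.
    @assert padlen >= m && padlen + n - 1 <= length(b)
    j = 1
    @inbounds while j + 7 <= n
        # Eight scalar accumulators, one per output in the tile.
        a0 = a1 = a2 = a3 = a4 = a5 = a6 = a7 = zero(T)
        for k in 1:m
            hk = h[k]
            base = padlen + j - k
            # Reads b[base:base+7] consecutively for each k.
            a0 += hk * b[base]
            a1 += hk * b[base + 1]
            a2 += hk * b[base + 2]
            a3 += hk * b[base + 3]
            a4 += hk * b[base + 4]
            a5 += hk * b[base + 5]
            a6 += hk * b[base + 6]
            a7 += hk * b[base + 7]
        end
        # Each output element is written exactly once.
        y[j] = a0; y[j + 1] = a1; y[j + 2] = a2; y[j + 3] = a3
        y[j + 4] = a4; y[j + 5] = a5; y[j + 6] = a6; y[j + 7] = a7
        j += 8
    end
    # Remainder outputs that do not fill a whole tile.
    @inbounds while j <= n
        acc = zero(T)
        for k in 1:m
            acc += h[k] * b[padlen + j - k]
        end
        y[j] = acc
        j += 1
    end
    return y
end

# Usage: both versions should agree on a small deterministic input.
n, m = 37, 5
h = collect(1.0:m)
b = collect(1.0:(n + 2m))
@assert fir_reference!(zeros(n), h, b, m) ≈ fir_tiled!(zeros(n), h, b, m)
```

For a short filter such as the 21-tap case benchmarked above, the inner loop performs only n·m multiply-adds, which is why a direct loop can compete with a 65536-point FFT.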
- Add XcorrAdjFFT helper struct carrying the adjoint FFT buffers and plans
- Xcorr.adj_fft is Nothing for CPU arrays (H <: Array) — no adjoint FFT
buffers are allocated; the tiled FIR path (mul! on Xcorr{<:Array}) is used
- For GPU backends adj_fft is an XcorrAdjFFT; the FFT-based adjoint dispatches
on <:XcorrAdjFFT instead of the old top-level struct fields
- Add _xcorr_plan_kwargs helper: passes flags=FFTW.MEASURE for CPU Arrays only,
no flags for GPU backends — fixes FFTW.MEASURE incompatibility with cuFFT
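The dispatch pattern in this commit can be pictured with a small self-contained sketch. All names here (`MEASURE_STANDIN`, `plan_kwargs_sketch`, `AdjFFTSketch`, `XcorrSketch`, `adjoint_path`) are hypothetical stand-ins, not the PR's definitions, and a `SubArray` plays the role of a non-`Array` backend so the example runs without FFTW or a GPU.

```julia
# Stand-in for FFTW.MEASURE so the sketch runs without loading FFTW.
const MEASURE_STANDIN = UInt32(0)

# Hypothetical analogue of _xcorr_plan_kwargs: planner flags for CPU Arrays only.
plan_kwargs_sketch(::Type{<:Array}) = (; flags = MEASURE_STANDIN)
plan_kwargs_sketch(::Type{<:AbstractArray}) = (;)

# Hypothetical analogue of XcorrAdjFFT: holds the adjoint FFT work buffer.
struct AdjFFTSketch{B<:AbstractArray}
    buf::B
end

struct XcorrSketch{H<:AbstractArray,A<:Union{Nothing,AdjFFTSketch}}
    h::H
    adj_fft::A
end

# CPU Arrays allocate no adjoint FFT buffer; any other array type does.
XcorrSketch(h::Array) = XcorrSketch(h, nothing)
XcorrSketch(h::AbstractArray) = XcorrSketch(h, AdjFFTSketch(similar(h, 2 * length(h))))

# The adjoint algorithm is selected by the type of the adj_fft field.
adjoint_path(::XcorrSketch{<:Any,Nothing}) = :tiled_fir
adjoint_path(::XcorrSketch{<:Any,<:AdjFFTSketch}) = :fft

# Usage: a Vector takes the FIR path, a non-Array type takes the FFT path.
cpu_op = XcorrSketch(zeros(4))
other_op = XcorrSketch(view(zeros(8), 1:4))
@assert adjoint_path(cpu_op) == :tiled_fir
@assert adjoint_path(other_op) == :fft
```

Encoding the choice in a type parameter lets the compiler resolve the adjoint path at compile time, and CPU operators pay nothing for buffers they never use.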
Benchmark Results (Julia v1.12.6)
🚀 6 benchmarks improved in time · 🚀 1 benchmark uses less memory
Time benchmarks
Memory benchmarks
High-Level Overview
This PR adds GPU support for all operators except WaveletOperator.
- Filt, MIMOFilt, DFT, SignAlternation, NFFTOp, GetIndex, Variation, and ZeroPad.
- AcceleratedDCTs. I didn't want to add this package to the dependencies of the FFTWOperators subpackage, so I implemented an extension that activates in the presence of AcceleratedDCTs. It's an open question whether it's a good design:
- OperatorWrapper, that allows the combination of CPU operators with GPU operators:

API Changes
- array_type, that defaults to Array but allows specifying other computing backends (e.g., CuArray, ROCArray, MArray, etc.)
- domain_storage_type and codomain_storage_type are renamed to domain_array_type and codomain_array_type

List of Changes
- Added GPU support across the operator stack with backend-specific extensions.
  - GpuExt for the core package.
  - DSPOperators, FFTWOperators, and NFFTOperators.
  - AcceleratedDCTs integration so DCT/IDCT can run on GPU when that package is imported.
  - WaveletOperators CPU-only and documented that limitation explicitly.
- Refactored operator internals to reduce type instability and improve composition behavior.
  - Compose, HCAT, VCAT, DCAT, Ax_mul_Bx, Ax_mul_Bxt, and Axt_mul_Bx.
- Reworked FFT and NFFT operator implementations.
  - IRDFT, RDFT, DCT, Shift, and FFT combination rules.
  - array_type.
- Overhauled tests and test infrastructure.
  - domain_array_type and codomain_array_type.
- Added benchmarking support.
  - benchmark/gpu_crossover.jl.
- Expanded and corrected documentation.
  - docs/src/gpu.md.
  - README.md and subpackage READMEs.
- Updated package metadata.
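The array_type keyword and the renamed array-type trait listed under API Changes can be illustrated with a toy operator. This is only a sketch under stated assumptions: `ToyScale`, `toy_domain_array_type`, and `TagArray` are hypothetical names invented for the example, and `TagArray` is a plain wrapper standing in for a GPU array type so the code runs on any machine. The package's real constructors and traits may differ.

```julia
# Minimal wrapper array standing in for a non-default backend (hypothetical).
struct TagArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
end
TagArray{T}(::UndefInitializer, dims::Dims{N}) where {T,N} =
    TagArray{T,N}(Array{T,N}(undef, dims))
Base.size(a::TagArray) = size(a.data)
Base.IndexStyle(::Type{<:TagArray}) = IndexLinear()
Base.getindex(a::TagArray, i::Int) = a.data[i]
Base.setindex!(a::TagArray, v, i::Int) = (a.data[i] = v)

# Toy operator whose storage type is chosen by the array_type keyword.
struct ToyScale{T,A<:AbstractArray{T}}
    weights::A
end

function ToyScale(::Type{T}, dims::Dims; array_type = Array) where {T}
    w = array_type{T}(undef, dims)  # allocate on the requested backend
    fill!(w, 2)
    return ToyScale{T,typeof(w)}(w)
end

# Toy analogue of a domain array-type trait: report the storage type.
toy_domain_array_type(::ToyScale{T,A}) where {T,A} = A

# Usage: the default is Array; another backend is selected by keyword.
cpu_op = ToyScale(Float64, (4,))
tag_op = ToyScale(Float32, (3,); array_type = TagArray)
@assert toy_domain_array_type(cpu_op) == Vector{Float64}
@assert toy_domain_array_type(tag_op) == TagArray{Float32,1}
```

Defaulting the keyword to Array keeps existing CPU call sites unchanged, while the trait lets composition code check that two operators live on the same backend.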