Please tell us about your use of FINUFFT! #398
Replies: 5 comments 3 replies
-
As an example of a GPU use case with all of the parameters listed: #323 (reply in thread)
-
Hello (again)! We (the team doing Compressed Sensing for MRI at Neurospin, France) use both finufft and cufinufft in our pipeline for iterative reconstruction of non-Cartesian MRI. In practice, we work with data that is:
NB: A key aspect of the non-Cartesian trajectories is that they are composed of "shots", each of which passes through the center of k-space. Due to the number of coils and the potential use of extra interpolators (to correct for static field inhomogeneities [3]), the number of calls to the type 1 (adjoint) and type 2 (forward) NUFFT is in the hundreds, if not thousands. As there are many NUFFT implementations available, we have developed mri-nufft (in Python, with numpy, cupy, and torch backends):
In terms of parametrization:
finufft and cufinufft are the most stable implementations we could find. There are some performance challengers (well, just gpuNUFFT in some cases [4], but the recent merging of #330 calls for a new study). In terms of feature requests, the discussion in #306 and #308 is a first point: getting preconditioning weights (similar to what is done in astronomy) helps iterative reconstruction, and is essential for deep-learning-based approaches [5][6]. @chaithyagr, @philouc, @matthieutrs
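The preconditioning/density-compensation weights mentioned above can be estimated with a Pipe-Menon-style fixed-point iteration. A small self-contained NumPy sketch of the idea (toy 1D sample locations and a Gaussian stand-in for the gridding kernel; all names and values here are hypothetical, not mri-nufft's actual implementation):

```python
import numpy as np

# Toy 1D nonuniform sample locations (clustered near the origin, loosely
# mimicking center-out MRI trajectories) and a Gaussian stand-in for the
# gridding kernel.
x = np.array([0.00, 0.06, 0.12, 0.35, 0.55, 0.58, 0.85])
sigma = 0.07
K = np.exp(-((x[:, None] - x[None, :]) / sigma) ** 2)  # kernel Gram matrix

# Pipe-Menon-style fixed point: find w > 0 such that (K @ w) ~ 1 everywhere,
# i.e. the kernel-smoothed, weighted sampling density becomes flat.
w = np.ones_like(x)
for _ in range(200):
    w = w / (K @ w)

# Densely sampled regions end up with small weights, isolated samples
# with weights near 1.
print(np.abs(K @ w - 1.0).max())  # residual deviation from flat density
```

In an iterative reconstruction these weights would then be applied to the data before the adjoint NUFFT, which is exactly the W discussed in #306/#308.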
-
I missed this discussion post and therefore accidentally created an issue explaining how finufft is used in MRI! I also see @paquiteau has already answered quite thoroughly. Many of the things he describes also apply here, where I do my research in Compressed Sensing at Umeå University. We mainly work with 4D-flow MRI data (3D + time) rather than 4D functional MRI, but I imagine the library is used in much the same way. We also, due to multiple coils and multiple velocity/phase encodings, have to do potentially hundreds or thousands of NUFFTs every iteration. As I didn't take the time to rewrite my issue post for this discussion, I have simply pasted what I wrote there and added some extra information.

ISSUE POST

I created this issue as a discussion on what other extensions the finufft library could provide that are still within the grasp of what it tries to achieve. Mainly, I hope it can shed some light on the reply @ahbarnett provided in #306. As such, the listed points are written as if answering that comment, so it is probably best to read that comment first, and perhaps also the expertly written inverse NUFFT tutorial.

First, I see how exposing the spreading/interpolation operations is problematic. I agree that the NUFFT is the crucial feature of the library, and if other spreaders/interpolators were found that would make the NUFFT faster, they should be prioritized! I think exposing just the spreader and interpolator in C/C++ would still be useful, but users would have to be careful, or at least aware, that they may change between versions. If you ask me, by no means make this a priority. However, as mentioned in @ahbarnett's reply, the sinc^2 weights of Greengard, Inati, et al. would be very useful! So, if it would be feasible to implement an interface to sinc^2 quadrature weights, that would be great! If exposing the fast sinc^2 in the process seems reasonable, that would also be nice.

As pointed out, in MRI applications one often tries to minimize ||Ax-b||, where A represents a NUFFT; A may be a 2D or a 3D NUFFT depending on the particular problem. Iterative solvers have to repeatedly apply A^H A, the normal operator. Sometimes, to improve convergence, one instead solves ||W(Ax-b)||, where W is a diagonal matrix, so the iterations compute A^H W A instead, trading away the best (maximum-likelihood) estimate for convergence speed. Preconditioning might seem like a better alternative, and indeed sometimes it is, but sometimes not: preconditioning makes some of the proximal algorithms different and more difficult. The W used for convergence speed-up is also often the same as the "density compensation weights" mentioned. Sometimes the density compensation weights are also used to perform a "gridding reconstruction": one finds W such that, instead of solving ||Ax-b|| with an iterative solver, one just computes x = A^H(Wb). This is approximate, but fast!

As also mentioned in the inverse tutorial (and often exploited in MRI), the Toeplitz approach can greatly increase the speed of applying A^H A. But this approach, too, is sometimes not viable. For some problems, multiple different A^H A (with differing coordinates) have to be applied in each iteration, and storing the Toeplitz kernels for all of them takes too much memory, so A^H A has to be computed via NUFFTs instead. Other modifications to the minimization problem, such as accounting for off-resonance effects, also make the Toeplitz approach less feasible. These problems are in fact discussed in the Fessler article referenced in the inversion tutorial.

After all this text, I would like to list some features that would be really useful for MRI users and that hopefully are still within finufft's grasp. I list them in order of importance.
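To make the notation above concrete, here is a tiny self-contained NumPy sketch (1D, explicit matrices, hypothetical sizes; a stand-in for the real NUFFT operators) of the weighted normal equations A^H W A x = A^H W b and the gridding shortcut x = A^H(Wb):

```python
import numpy as np

def nudft_type2(x, N):
    """Explicit type-2 NUDFT matrix: (A f)_j = sum_k f_k exp(i k x_j)."""
    k = np.arange(-(N // 2), N // 2)
    return np.exp(1j * np.outer(x, k))

rng = np.random.default_rng(0)
N, M = 8, 32
f = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # ground-truth modes
x = np.sort(rng.uniform(0.0, 2 * np.pi, M))                # nonuniform samples
A = nudft_type2(x, N)
b = A @ f                                                  # simulated data

# Crude density-compensation weights: local sample spacing.
w = np.gradient(x)

# Weighted normal equations (A^H W A) x = A^H W b, solved directly here;
# an iterative solver (CG) applied to A^H W A converges to the same answer.
AhWA = A.conj().T @ (w[:, None] * A)
AhWb = A.conj().T @ (w * b)
f_iterative = np.linalg.solve(AhWA, AhWb)

# Gridding reconstruction: a single adjoint application, approximate but fast.
f_gridding = A.conj().T @ (w * b)
```

With noiseless, consistent data the weighted solve recovers f exactly regardless of W, while the gridding shortcut is only as good as the weights; that gap is exactly why good density-compensation weights matter.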
As I see it, these features are all within the scope of what finufft strives to achieve, especially the first one; at least if one considers the inverse NUFFT a goal. I also think that changes to the inner workings of how the NUFFT is performed could be made without this interface becoming unstable.

EXTRA INFO
Another feature I didn't list, but that would be appreciated, is the 1.25 upsampling factor mentioned in the cufinufft repo (issue 126).
-
The way we use it is quite different. Our entire 4D-MRI reconstruction is based on XD-GRASP [1]. We implemented everything in C++. Given the structure of the algorithm, we do some pre-processing and then split the reconstruction into independent conjugate-gradient problems along the Z-dimension. This has two advantages:
We further split the problem into multiple respiration phases, which allows us to do multiple 2D transforms. Since these transforms do not saturate the GPU, we queue several of them in parallel. At each reconstruction we do thousands of 2D reconstructions. To further minimize overhead we use only the guru interface with FFTW_MEASURE, and we create the plans once per thread, re-using them until the end of the program. Details:
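The split along Z relies on the acquisition being uniform in that direction: an inverse FFT along z turns one coupled 3D problem into Nz independent in-plane problems. A toy NumPy sketch of that separability (a 1D explicit transform stands in for the 2D NUFFT; all sizes hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
Nz, N, M = 6, 8, 20                      # slices, in-plane modes, samples/slice
u = rng.standard_normal((Nz, N)) + 1j * rng.standard_normal((Nz, N))  # image stack

# Same nonuniform in-plane sampling for every slice (explicit toy NUDFT matrix).
x = np.sort(rng.uniform(0.0, 2 * np.pi, M))
k = np.arange(-(N // 2), N // 2)
A = np.exp(1j * np.outer(x, k))

# Forward model: nonuniform transform in-plane, uniform FFT along z.
d = np.fft.fft(u @ A.T, axis=0)          # d[kz, j], shape (Nz, M)

# Reconstruction side: one inverse FFT along z decouples the slices...
y = np.fft.ifft(d, axis=0)               # y[z] = A @ u[z] for each slice

# ...so each slice can be solved independently (least squares here; in the
# real pipeline each slice is an independent conjugate-gradient problem).
u_rec = np.stack([np.linalg.lstsq(A, y[z], rcond=None)[0] for z in range(Nz)])
```

Since each slice is independent, the per-slice solves can run on separate threads, each re-using its own pre-built plan, which is the scheme described above.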
-
Our use case is CMB lensing, and we work on iterative lensing reconstruction [1]. The Wiener filtering is done with conjugate-gradient (CG) descent. For each CG iteration, we have to perform the lensing operation. To do the lensing operation using non-uniform FFTs, we:
This approach is tremendously faster than previous approaches, as discussed in our lenspyx paper [2], which implements a CPU code. We are currently exploring how much faster the lensing operation is on a GPU using cufinufft. All the enumerated steps above have to be put on the GPU. We use SHTns for the GPU SHT calculation. The spin-0 implementation of the lensing operation is almost done, and the speed-ups look quite good at the moment! The plan is to also include spin-2 (or spin-n, really), and eventually to integrate this into delensalot.
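The heart of the lensing step is evaluating the unlensed field at deflected positions, which in a flat 1D toy reduces to evaluating a Fourier series at nonuniform points, i.e. exactly a type-2 transform. A NumPy sketch under that toy assumption (direct sum instead of a fast NUFFT; the real pipeline uses SHTs plus cufinufft, and all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 16
a = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # toy "harmonic" coefficients

def synthesize(a, theta):
    """Evaluate m(theta) = sum_l a_l exp(i l theta) at arbitrary angles (direct sum)."""
    l = np.arange(len(a))
    return np.exp(1j * np.outer(theta, l)) @ a

# Undeflected case on the uniform grid: the direct sum must agree with a
# plain scaled inverse FFT.
theta = 2 * np.pi * np.arange(N) / N
m_uniform = synthesize(a, theta)

# Lensing: evaluate the *same* series at deflected angles theta + d(theta).
# These target points are nonuniform, which is why a type-2 NUFFT is the
# natural fast algorithm for this step.
d = 0.02 * np.sin(theta)                 # toy deflection field
m_lensed = synthesize(a, theta + d)
```

On the sphere the synthesis step is an SHT rather than an FFT, but the nonuniform-evaluation structure of the deflection is the same.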
-
This is a discussion thread for users to post how they use FINUFFT. This will help us optimize and extend the software.
The main questions to answer are:
what type (1,2,3) and dimension?
what number of modes N = (N1,N2,...), i.e. uniform grid shape? (if type 1 or 2)
what number of nonuniform pts M? And are these points clustered or quasi-uniform?
what tolerance? (and do you use single- or double-precision library?)
what ntransf (if you use vectorized interface)?
Do you use CPU, GPU, or both?
For CPU, how many threads do you use? Do you use single-threaded calls in parallel?
Do you use the guru or the simple interfaces?
What language wrappers do you use?
What is your application area? (possibly link to paper or your package)
What feature requests do you have?
If you want to give a link to the part of your code that uses FINUFFT, that might also be useful.
Thanks so much! Alex
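For anyone unsure how to answer the first question, a tiny NumPy illustration of what "type 1" means (a direct O(NM) sum, not the library's fast algorithm; in the special case of uniform points it must agree with a plain FFT under numpy's conventions):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M = 8, 50                     # modes, nonuniform points

def nufft_type1_direct(x, c, N):
    """Direct type-1 sum f_k = sum_j c_j exp(i k x_j), k = -N/2 .. N/2-1."""
    k = np.arange(-(N // 2), N // 2)
    return np.exp(1j * np.outer(k, x)) @ c

# General case: nonuniform points in [0, 2*pi), strengths c_j.
x = rng.uniform(0.0, 2 * np.pi, M)
c = rng.standard_normal(M) + 1j * rng.standard_normal(M)
f = nufft_type1_direct(x, c, N)

# Sanity check: with M = N uniform points the sum reduces to a scaled,
# shifted inverse FFT.
xu = 2 * np.pi * np.arange(N) / N
cu = rng.standard_normal(N) + 1j * rng.standard_normal(N)
fu = nufft_type1_direct(xu, cu, N)       # == fftshift(N * ifft(cu))
```

Type 2 is the transpose direction (uniform modes evaluated at nonuniform points), and type 3 is nonuniform-to-nonuniform; the questions about N, M, tolerance, and ntransf above are exactly the knobs of these transforms.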