Add supporting code for GPU-based ops #60

Open

ctrueden opened this issue Sep 23, 2014 · 18 comments

@ctrueden
Member

We want to make implementing GPU-based ops as easy as possible. The glue code to execute GPU-based processing from Java is usually much the same from op to op. The two main flavors to consider supporting are OpenCL and CUDA.

We can start by implementing a couple of GPU-based ops, and then factor out the common code into a shared type hierarchy. Due to the added dependencies for working with OpenCL and/or CUDA, we will likely need to create a new imagej-ops-gpu project (and/or imagej-ops-cuda and/or imagej-ops-opencl projects) that extends imagej-ops.
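
To give a flavor of the glue code involved, here is a minimal sketch of what such an op might look like. The class names, the "gpuops" library and the NativeGauss binding are hypothetical, purely for illustration; real ops would operate on ImgLib2 images rather than raw arrays.

```java
import net.imagej.ops.AbstractOp;
import net.imagej.ops.Op;
import org.scijava.ItemIO;
import org.scijava.plugin.Parameter;
import org.scijava.plugin.Plugin;

// Sketch of a GPU-backed op: the Java side is mostly the same for every such op.
// Hand the flattened input to a native (CUDA/OpenCL) routine and wrap the result.
@Plugin(type = Op.class, name = "filter.gpuGauss")
public class GpuGaussOp extends AbstractOp {

    @Parameter
    private float[] input; // flattened pixel data, simplified for the sketch

    @Parameter
    private double sigma;

    @Parameter(type = ItemIO.OUTPUT)
    private float[] output;

    @Override
    public void run() {
        output = NativeGauss.gauss(input, sigma);
    }
}

// Hypothetical JNI binding to a native GPU library (libgpuops.so / gpuops.dll).
class NativeGauss {
    static {
        System.loadLibrary("gpuops");
    }

    static native float[] gauss(float[] in, double sigma);
}
```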

@ctrueden ctrueden modified the milestones: low-priority, 1.1.0 Sep 23, 2014
@ctrueden
Member Author

See also the NAR plugin for Maven, as well as the SciJava native library loader, for general solutions for integrating native libraries with Java.
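
With the native library loader, the Java side stays small. A sketch (the "yacudecu" library name and the deconvolve signature are made up for illustration):

```java
import java.io.IOException;

import org.scijava.nativelib.NativeLoader;

// Sketch: load a native library packaged on the classpath (e.g. under natives/),
// then expose its entry points as native methods.
public class CudaBinding {

    static {
        try {
            // Extracts the platform-specific binary from the JAR and loads it.
            NativeLoader.loadLibrary("yacudecu");
        }
        catch (final IOException e) {
            throw new UnsatisfiedLinkError(e.getMessage());
        }
    }

    // Hypothetical entry point into the native library.
    public static native int deconvolve(float[] image, float[] psf, int iterations);
}
```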

@dscho
Contributor

dscho commented Sep 30, 2014

@ctrueden thanks for reminding me... @bnorthan I actually wanted to give you a short introduction to the NAR project, for the purpose of integrating native code into ImageJ plugins. Have a look at https://github.com/imagej/minimal-ij1-plugin/tree/native for an example... And feel free to bombard me with questions!

@bnorthan
Contributor

https://github.com/bobpepin/YacuDecu

@bobpepin wrote this and it is licensed under the LGPL. I ran some tests on it a while back and it seems to work pretty well. It would be a good starting point for a CUDA decon op. @bobpepin wrote wrappers for MATLAB and Imaris and said he'd be happy to see it in ImageJ eventually.

@ctrueden
Member Author

@StephanPreibisch As discussed at the hackathon, this issue may be of interest to you as well!

@StephanPreibisch
Member

I have started to write a simple infrastructure for calling native and CUDA code here: https://github.com/fiji/SPIM_Registration/tree/master/src/main/java/spim/process/cuda

Two example CUDA implementations, for separable and non-separable convolution, are here; both are very useful for deconvolution:
https://github.com/StephanPreibisch/FourierConvolutionCUDALib
https://github.com/StephanPreibisch/SeparableConvolutionCUDALib

I think it would be great to have some common infrastructure for calling this kind of code.
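
The basic pattern is to declare the exported functions of the compiled CUDA library as a JNA interface and let JNA bind them at runtime. A rough sketch (both the library name and the method signature are placeholders, not the actual ones from the repositories above):

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Sketch: a JNA interface describing the exported functions of a compiled CUDA
// library; JNA generates the binding at runtime, so no JNI glue code is needed.
public interface CudaConvolution extends Library {

    CudaConvolution INSTANCE = (CudaConvolution) Native.loadLibrary(
        "FourierConvolutionCUDALib", CudaConvolution.class);

    // Placeholder signature; the real function takes the image, the kernel,
    // their dimensions and the CUDA device index.
    int convolve3D(float[] image, float[] kernel, int w, int h, int d, int device);
}
```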

@bobpepin

Hi,
in case you were thinking of implementing the main deconvolution loop in Java, I wanted to note that in my implementation I was able to cut GPU memory usage by 40% by using CUDA streaming and transferring the data needed for the next step in parallel with the FFT. These kinds of tricks might be a bit harder to do in a Java inner loop, or at least would require a Java interface to a substantial part of the CUDA API.
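
For illustration, the overlap looks roughly like this when expressed through the JCuda/JCufft bindings (an untested sketch; the buffer sizes and the out-of-place transform are made up, just to show the two streams):

```java
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyHostToDevice;

import jcuda.Pointer;
import jcuda.jcufft.JCufft;
import jcuda.jcufft.cufftHandle;
import jcuda.jcufft.cufftType;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaStream_t;

public class OverlapSketch {

    public static void main(final String[] args) {
        final int nx = 256, ny = 256, nz = 64;
        final long bytes = (long) nx * ny * nz * 4;

        // Two streams: one runs the FFT, the other uploads the next block.
        final cudaStream_t fftStream = new cudaStream_t();
        final cudaStream_t copyStream = new cudaStream_t();
        JCuda.cudaStreamCreate(fftStream);
        JCuda.cudaStreamCreate(copyStream);

        final Pointer devCurrent = new Pointer();
        final Pointer devFFT = new Pointer();
        final Pointer devNext = new Pointer();
        JCuda.cudaMalloc(devCurrent, bytes);
        JCuda.cudaMalloc(devFFT, bytes * 2); // complex output of the R2C transform
        JCuda.cudaMalloc(devNext, bytes);

        final float[] nextBlock = new float[nx * ny * nz]; // data for the next step

        final cufftHandle plan = new cufftHandle();
        JCufft.cufftPlan3d(plan, nx, ny, nz, cufftType.CUFFT_R2C);
        JCufft.cufftSetStream(plan, fftStream);

        // The FFT of the current block and the upload of the next block overlap.
        JCufft.cufftExecR2C(plan, devCurrent, devFFT);
        JCuda.cudaMemcpyAsync(devNext, Pointer.to(nextBlock), bytes,
            cudaMemcpyHostToDevice, copyStream);

        JCuda.cudaDeviceSynchronize();

        JCufft.cufftDestroy(plan);
        JCuda.cudaFree(devCurrent);
        JCuda.cudaFree(devFFT);
        JCuda.cudaFree(devNext);
    }
}
```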

Cheers,
Bob

@dscho
Contributor

dscho commented Oct 15, 2014

> in my implementation

@bobpepin is it publicly visible? Remember: unpublished work never happened, for all practical purposes.

@bobpepin

https://github.com/bobpepin/YacuDecu

@bnorthan
Contributor

https://github.com/bobpepin/YacuDecu

@bobpepin wrote a GPU deconvolution and it is licensed under the LGPL. I ran some tests on it a while back and it seems to work pretty well. It would be a good starting point for a CUDA decon op. Bob wrote wrappers for MATLAB and Imaris and said he'd be happy to see it in ImageJ eventually.

@dscho
Contributor

dscho commented Oct 15, 2014

Could the license be changed to BSD? Otherwise no problem, but then it will be eternally just an add-on to ImageJ...

@bobpepin

The LGPL was meant to encourage improvements to the library to be contributed back into the main codebase, which is also used by the C/MATLAB/Imaris interfaces. What about shipping the DLL/.so or source in a separate subdirectory, keeping the interface code part of ImageJ under a BSD license, and contributing eventual changes to the CUDA code back to the main YacuDecu repository?

@bobpepin

Also, you might want to consider supporting OpenCL instead of, or in addition to, CUDA, since it supports NVIDIA, ATI and Intel cards. The biggest problem there, last time I looked, was that the publicly available FFT implementation had some limits on the input size, something like 2048 pixels in each dimension.

@haesleinhuepf haesleinhuepf self-assigned this Dec 24, 2018
@haesleinhuepf
Member

Just to let you all know: there are now quite a few OpenCL-based ops (proudly presented by @frauzufall - big thanks to Debo!):
https://github.com/clij/clij-ops

Based on
https://clij.github.io/

Documentation can be found here:
https://clij.github.io/clij-docs/clij_imagej_ops_java

Code examples can be found here:
https://github.com/clij/clij-ops/tree/master/src/test/java/net/haesleinhuepf/clij/ops/examples
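
For a quick impression, calling the underlying CLIJ API from Java looks roughly like this (a sketch from memory; please check the documentation above for the exact method names and overloads):

```java
import ij.IJ;
import ij.ImagePlus;
import net.haesleinhuepf.clij.CLIJ;
import net.haesleinhuepf.clij.clearcl.ClearCLBuffer;

public class ClijBlurSketch {

    public static void main(final String[] args) {
        final ImagePlus imp = IJ.openImage("blobs.tif");

        // Initialize the GPU and push the image into GPU memory.
        final CLIJ clij = CLIJ.getInstance();
        final ClearCLBuffer input = clij.push(imp);
        final ClearCLBuffer output = clij.create(input);

        // Run a Gaussian blur on the GPU, then pull the result back.
        clij.op().blur(input, output, 2f, 2f);
        final ImagePlus result = clij.pull(output);

        // GPU memory is not garbage-collected; release it explicitly.
        input.close();
        output.close();

        result.show();
    }
}
```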

Give them a try and let us know what you think!

Cheers,
Robert

@ctrueden
Member Author

Awesome! Thanks @frauzufall for working on this. If we have time, I'd like to show you the next iteration of the SciJava Ops framework while I am visiting.

Would it be feasible to name the ops so that they overload existing ops, rather than giving them new names? The idea would be to help people benefit from automatic performance improvements without needing to edit their scripts.

@haesleinhuepf
Member

Hey @ctrueden,

that sounds like a great idea. However, before automatically overloading ops with different implementations, we should dig a bit deeper and find out why some implementations deliver different results. I would also strongly vote for automatic tests ensuring that different implementations deliver similar results up to a given tolerance.
Just to get an idea of what I'm talking about:

This program suggests that the differences between Ops, CLIJ and ImageJ legacy span several orders of magnitude:
MSE (IJ ops vs legacy) = 0.001654734344482422
MSE (IJ legacy vs clij) = 1.72487557392742E-11
MSE (IJ ops vs clij) = 0.0016547824096679689
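
A tolerance test could be as simple as the following sketch (the result arrays are placeholders for the outputs of the two implementations, and the threshold would have to be agreed upon per op):

```java
// Sketch of an automated tolerance test comparing two implementations pixel-wise.
public class ToleranceCheck {

    static double mse(final float[] a, final float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return sum / a.length;
    }

    public static void main(final String[] args) {
        // Placeholders standing in for the results of the two implementations.
        final float[] opsResult = new float[] { 0, 1, 2 };
        final float[] clijResult = new float[] { 0, 1, 2 };

        final double tolerance = 1e-4; // acceptable MSE, to be agreed upon per op
        final double error = mse(opsResult, clijResult);
        if (error > tolerance) {
            throw new AssertionError("Implementations differ: MSE = " + error);
        }
    }
}
```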

Let's have a chat about it in Dresden :-)

Cheers,
Robert

@bnorthan
Contributor

bnorthan commented Jun 26, 2019

Hi Curtis

It would be great if you could show us the next iteration of ops.

Correct me if I am wrong, but it looks like these new Ops are typed on ClearCLBuffer and ClearCLImage. In fact, at least for the blur ops, there seem to be 3 ops using different combinations.

As an aside, what is the difference between ClearCLBuffer and ClearCLImage?

There are a few scenarios that I think we need to consider when overloading existing ops.

  1. Do we use the same names for the ops but use CLIJ-specific types? In this case the user would have to convert types, but could keep a lot of their code the same.

  2. Do we use the same names and types? In this case you could write an op that does the conversion, calls the underlying CLIJ op, and converts back (see the sketch after this list). Or better yet, just have converters.

  3. If inputs and outputs are converted automatically, scenario 2 would be problematic for a series of operations. Would there be some way to transfer the data to the GPU but only retrieve it lazily, when the next Java operation is performed?

  4. What about CUDA? I have converters to CUDA, and would like to polish them at some point. It would be nice to be able to overload ops with both CLIJ and CUDA implementations.

  5. What about data that is too large to fit on the GPU? I've spent some time playing with ImgLib2-cache as a means to retrieve data in chunks and send it to the GPU. Does CLIJ do any chunking?
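
Here is the sketch for scenario 2 mentioned above. Everything in it is a placeholder (plain arrays instead of RAIs, a fake GPU buffer); it is only meant to show where the conversions would sit:

```java
// Scenario 2 sketch: an op that keeps the usual (CPU-facing) name and types,
// converts to a GPU buffer, calls the CLIJ-style implementation, and converts back.
// Every class and method here is a placeholder, not existing imagej-ops or CLIJ API.
public class GaussViaGpuSketch {

    // Stand-in for ClearCLBuffer.
    static class GpuBuffer {
        final float[] data;

        GpuBuffer(final float[] data) {
            this.data = data;
        }
    }

    // Stand-ins for the push/pull converters.
    static GpuBuffer push(final float[] cpu) {
        return new GpuBuffer(cpu.clone());
    }

    static float[] pull(final GpuBuffer gpu) {
        return gpu.data.clone();
    }

    // Stand-in for the underlying CLIJ blur op (no-op here).
    static void gpuBlur(final GpuBuffer in, final GpuBuffer out, final double sigma) {
        System.arraycopy(in.data, 0, out.data, 0, in.data.length);
    }

    // The overloading op itself: same signature as the CPU version, but the
    // conversions and the GPU call happen behind the scenes.
    public static float[] gauss(final float[] input, final double sigma) {
        final GpuBuffer gpuIn = push(input);
        final GpuBuffer gpuOut = new GpuBuffer(new float[input.length]);
        gpuBlur(gpuIn, gpuOut, sigma);
        return pull(gpuOut);
    }
}
```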

@haesleinhuepf
Member

Hey @bnorthan,

  1. Buffers and images both come from the OpenCL standard. Depending on which operation is executed and on the hardware, buffer processing might be faster than image processing, or not. Just some insights: if you access pixel neighborhoods, images are beneficial; if your code runs on NVIDIA cards with poor OpenCL support (version < 1.1), 3D images don't work. Thus, CLIJ mainly uses buffers, for compatibility reasons.
    https://stackoverflow.com/questions/9903855/buffer-object-and-image-buffer-object-in-opencl
    https://software.intel.com/en-us/forums/opencl/topic/518474
    https://community.khronos.org/t/buffer-vs-image/4064/2

1.-3. If possible, I would like to avoid automatic back-and-forth conversion, because conversion takes time. GPU acceleration is only beneficial if long workflows are run on the GPU. That's why we initially thought automatic conversion shouldn't be enabled at all...
4. Conversion from OpenCL to CUDA should come for free, no?
5. CLIJ doesn't support any intelligent chunking. It's just a collection of OpenCL kernels wrapped in Java code.

Looking forward to discussing the details! :-)

@frauzufall
Member

frauzufall commented Jun 26, 2019

At the beginning I started to match the CLIJ Ops with existing imagej-ops (here is the code), but there were differences, and it is quite some work (at least for me) to find the counterparts, so as a first step we decided to write clearly marked CLIJ ops that return the same results as CLIJ does in other scenarios.

I also wrote converters. You can try removing the CLIJ_push and CLIJ_pull op calls in the examples (Jython, Java). It works in many cases, but sometimes fails to match the ClearCLBuffer to a RAI if the op has additional input parameters.

I stopped going into too much detail / fixing things because I don't want to waste time debugging something that is being rewritten anyway. But the CLIJ Ops are perfect for testing some core concepts of imagej-ops. Excited to hear about the next iteration!
