Add supporting code for GPU-based ops #60

Open

ctrueden opened this issue Sep 23, 2014 · 18 comments

@ctrueden
Member

We want to make implementing GPU-based ops as easy as possible. The glue code to execute GPU-based processing from Java is usually much the same from op to op. The two main flavors to consider supporting are OpenCL and CUDA.

We can start by implementing a couple of GPU-based ops, and then factor out the common code into a shared type hierarchy. Due to the added dependencies for working with OpenCL and/or CUDA, we will likely need to create a new imagej-ops-gpu project (and/or imagej-ops-cuda and/or imagej-ops-opencl projects) that extends imagej-ops.
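
To give a flavor of the glue code involved, here is a minimal sketch of what such an op might look like. The class names, the "gpuops" library and the NativeGauss binding are hypothetical, purely for illustration; real ops would operate on ImgLib2 images rather than raw arrays.

```java
import net.imagej.ops.AbstractOp;
import net.imagej.ops.Op;
import org.scijava.ItemIO;
import org.scijava.plugin.Parameter;
import org.scijava.plugin.Plugin;

// Sketch of a GPU-backed op: the Java side is mostly the same for every such op.
// Hand the flattened input to a native (CUDA/OpenCL) routine and wrap the result.
@Plugin(type = Op.class, name = "filter.gpuGauss")
public class GpuGaussOp extends AbstractOp {

    @Parameter
    private float[] input; // flattened pixel data, simplified for the sketch

    @Parameter
    private double sigma;

    @Parameter(type = ItemIO.OUTPUT)
    private float[] output;

    @Override
    public void run() {
        output = NativeGauss.gauss(input, sigma);
    }
}

// Hypothetical JNI binding to a native GPU library (libgpuops.so / gpuops.dll).
class NativeGauss {
    static {
        System.loadLibrary("gpuops");
    }

    static native float[] gauss(float[] in, double sigma);
}
```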

@ctrueden ctrueden modified the milestones: low-priority, 1.1.0 Sep 23, 2014
@ctrueden
Member Author

See also the NAR plugin for Maven, as well as the SciJava native library loader, for general solutions for integrating native libraries with Java.
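
With the native library loader, the Java side stays small. A sketch (the "yacudecu" library name and the deconvolve signature are made up for illustration):

```java
import java.io.IOException;

import org.scijava.nativelib.NativeLoader;

// Sketch: load a native library packaged on the classpath (e.g. under natives/),
// then expose its entry points as native methods.
public class CudaBinding {

    static {
        try {
            // Extracts the platform-specific binary from the JAR and loads it.
            NativeLoader.loadLibrary("yacudecu");
        }
        catch (final IOException e) {
            throw new UnsatisfiedLinkError(e.getMessage());
        }
    }

    // Hypothetical entry point into the native library.
    public static native int deconvolve(float[] image, float[] psf, int iterations);
}
```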

@dscho
Contributor

dscho commented Sep 30, 2014

@ctrueden thanks for reminding me... @bnorthan I actually wanted to give you a short introduction to the NAR project, for the purpose of integrating native code into ImageJ plugins. Have a look at https://github.com/imagej/minimal-ij1-plugin/tree/native for an example... And feel free to bombard me with questions!

@bnorthan
Contributor

https://github.com/bobpepin/YacuDecu

@bobpepin wrote this and it is licensed under the LGPL. I ran some tests on it a while back and it seems to work pretty well. It would be a good starting point for a CUDA decon op. @bobpepin wrote wrappers for MATLAB and Imaris and said he'd be happy to see it in ImageJ eventually.

@ctrueden
Member Author

@StephanPreibisch As discussed at the hackathon, this issue may be of interest to you as well!

@StephanPreibisch
Member

I have started to write a simple infrastructure for calling native and CUDA code here: https://github.com/fiji/SPIM_Registration/tree/master/src/main/java/spim/process/cuda

Two example CUDA implementations, for separable and non-separable convolution, are here; both are very useful for deconvolution:
https://github.com/StephanPreibisch/FourierConvolutionCUDALib
https://github.com/StephanPreibisch/SeparableConvolutionCUDALib

I think it would be great to have some common infrastructure for calling this kind of code.
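
The basic pattern is to declare the exported functions of the compiled CUDA library as a JNA interface and let JNA bind them at runtime. A rough sketch (both the library name and the method signature are placeholders, not the actual ones from the repositories above):

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Sketch: a JNA interface describing the exported functions of a compiled CUDA
// library; JNA generates the binding at runtime, so no JNI glue code is needed.
public interface CudaConvolution extends Library {

    CudaConvolution INSTANCE = (CudaConvolution) Native.loadLibrary(
        "FourierConvolutionCUDALib", CudaConvolution.class);

    // Placeholder signature; the real function takes the image, the kernel,
    // their dimensions and the CUDA device index.
    int convolve3D(float[] image, float[] kernel, int w, int h, int d, int device);
}
```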

@bobpepin

Hi,
in case you were thinking of implementing the main deconvolution loop in Java, I wanted to note that in my implementation I was able to cut GPU memory usage by 40% by using CUDA streaming and transferring the data needed for the next step in parallel with the FFT. These kinds of tricks might be a bit harder to do in a Java inner loop, or at least would require a Java interface to a substantial part of the CUDA API.
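
For illustration, the overlap looks roughly like this when expressed through the JCuda/JCufft bindings (an untested sketch; the buffer sizes and the out-of-place transform are made up, just to show the two streams):

```java
import static jcuda.runtime.cudaMemcpyKind.cudaMemcpyHostToDevice;

import jcuda.Pointer;
import jcuda.jcufft.JCufft;
import jcuda.jcufft.cufftHandle;
import jcuda.jcufft.cufftType;
import jcuda.runtime.JCuda;
import jcuda.runtime.cudaStream_t;

public class OverlapSketch {

    public static void main(final String[] args) {
        final int nx = 256, ny = 256, nz = 64;
        final long bytes = (long) nx * ny * nz * 4;

        // Two streams: one runs the FFT, the other uploads the next block.
        final cudaStream_t fftStream = new cudaStream_t();
        final cudaStream_t copyStream = new cudaStream_t();
        JCuda.cudaStreamCreate(fftStream);
        JCuda.cudaStreamCreate(copyStream);

        final Pointer devCurrent = new Pointer();
        final Pointer devFFT = new Pointer();
        final Pointer devNext = new Pointer();
        JCuda.cudaMalloc(devCurrent, bytes);
        JCuda.cudaMalloc(devFFT, bytes * 2); // complex output of the R2C transform
        JCuda.cudaMalloc(devNext, bytes);

        final float[] nextBlock = new float[nx * ny * nz]; // data for the next step

        final cufftHandle plan = new cufftHandle();
        JCufft.cufftPlan3d(plan, nx, ny, nz, cufftType.CUFFT_R2C);
        JCufft.cufftSetStream(plan, fftStream);

        // The FFT of the current block and the upload of the next block overlap.
        JCufft.cufftExecR2C(plan, devCurrent, devFFT);
        JCuda.cudaMemcpyAsync(devNext, Pointer.to(nextBlock), bytes,
            cudaMemcpyHostToDevice, copyStream);

        JCuda.cudaDeviceSynchronize();

        JCufft.cufftDestroy(plan);
        JCuda.cudaFree(devCurrent);
        JCuda.cudaFree(devFFT);
        JCuda.cudaFree(devNext);
    }
}
```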

Cheers,
Bob

@dscho
Contributor

dscho commented Oct 15, 2014

> in my implementation

@bobpepin is it publicly visible? Remember: unpublished work never happened, for all practical purposes.

@bobpepin

https://github.com/bobpepin/YacuDecu

@bnorthan
Contributor

https://github.com/bobpepin/YacuDecu

@bobpepin wrote a GPU deconvolution and it is licensed under the LGPL. I ran some tests on it a while back and it seems to work pretty well. It would be a good starting point for a CUDA decon op. Bob wrote wrappers for MATLAB and Imaris and said he'd be happy to see it in ImageJ eventually.

@dscho
Contributor

dscho commented Oct 15, 2014

Could the license be changed to BSD? Otherwise no problem, but then it will be eternally just an add-on to ImageJ...

@bobpepin

The LGPL was meant to encourage improvements to the library to be contributed back into the main codebase, which is also used by the C/MATLAB/Imaris interfaces. What about shipping the DLL/.so or source in a separate subdirectory, keeping the interface code part of ImageJ under a BSD license, and contributing eventual changes to the CUDA code back to the main YacuDecu repository?

@bobpepin

Also, you might want to consider supporting OpenCL instead of, or in addition to, CUDA, since it supports NVIDIA, ATI and Intel cards. The biggest problem there, last time I looked, was that the publicly available FFT implementation had some limits on the input size, something like 2048 pixels in each dimension.

@haesleinhuepf haesleinhuepf self-assigned this Dec 24, 2018
@haesleinhuepf
Member

Just to let you all know: there are now quite a few OpenCL-based ops (proudly presented by @frauzufall - big thanks to Debo!):
https://github.com/clij/clij-ops

Based on
https://clij.github.io/

Documentation can be found here:
https://clij.github.io/clij-docs/clij_imagej_ops_java

Code examples can be found here:
https://github.com/clij/clij-ops/tree/master/src/test/java/net/haesleinhuepf/clij/ops/examples
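
For a quick impression, calling the underlying CLIJ API from Java looks roughly like this (a sketch from memory; please check the documentation above for the exact method names and overloads):

```java
import ij.IJ;
import ij.ImagePlus;
import net.haesleinhuepf.clij.CLIJ;
import net.haesleinhuepf.clij.clearcl.ClearCLBuffer;

public class ClijBlurSketch {

    public static void main(final String[] args) {
        final ImagePlus imp = IJ.openImage("blobs.tif");

        // Initialize the GPU and push the image into GPU memory.
        final CLIJ clij = CLIJ.getInstance();
        final ClearCLBuffer input = clij.push(imp);
        final ClearCLBuffer output = clij.create(input);

        // Run a Gaussian blur on the GPU, then pull the result back.
        clij.op().blur(input, output, 2f, 2f);
        final ImagePlus result = clij.pull(output);

        // GPU memory is not garbage-collected; release it explicitly.
        input.close();
        output.close();

        result.show();
    }
}
```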

Give them a try and let us know what you think!

Cheers,
Robert

@ctrueden
Member Author

Awesome! Thanks @frauzufall for working on this. If we have time, I'd like to show you the next iteration of the SciJava Ops framework while I am visiting.

Would it be feasible to name the ops so that they overload existing ops, rather than giving them new names? The idea would be to help people benefit from automatic performance improvements without needing to edit their scripts.

@haesleinhuepf
Member

Hey @ctrueden,

that sounds like a great idea. However, before automatically overloading ops with different implementations, we should dig a bit deeper and find out why some implementations deliver different results. I would also strongly vote for automatic tests ensuring that different implementations deliver similar results up to a given tolerance.
Just to get an idea of what I'm talking about:

This program suggests that the differences between Ops, CLIJ and ImageJ legacy span several orders of magnitude:
MSE (IJ ops vs legacy) = 0.001654734344482422
MSE (IJ legacy vs clij) = 1.72487557392742E-11
MSE (IJ ops vs clij) = 0.0016547824096679689
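
A tolerance test could be as simple as the following sketch (the result arrays are placeholders for the outputs of the two implementations, and the threshold would have to be agreed upon per op):

```java
// Sketch of an automated tolerance test comparing two implementations pixel-wise.
public class ToleranceCheck {

    static double mse(final float[] a, final float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            final double d = a[i] - b[i];
            sum += d * d;
        }
        return sum / a.length;
    }

    public static void main(final String[] args) {
        // Placeholders standing in for the results of the two implementations.
        final float[] opsResult = new float[] { 0, 1, 2 };
        final float[] clijResult = new float[] { 0, 1, 2 };

        final double tolerance = 1e-4; // acceptable MSE, to be agreed upon per op
        final double error = mse(opsResult, clijResult);
        if (error > tolerance) {
            throw new AssertionError("Implementations differ: MSE = " + error);
        }
    }
}
```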

Let's have a chat about it in Dresden :-)

Cheers,
Robert

@bnorthan
Contributor

bnorthan commented Jun 26, 2019

Hi Curtis

It would be great if you could show us the next iteration of ops.

Correct me if I am wrong, but it looks like these new Ops are typed on ClearCLBuffer and ClearCLImage. In fact, at least for the blur ops, there seem to be 3 ops using different combinations.

As an aside, what is the difference between ClearCLBuffer and ClearCLImage?

There are a few scenarios that I think we need to consider when overloading existing ops.

  1. Do we use the same names for the ops but use CLIJ-specific types? In this case the user would have to convert types, but could keep a lot of their code the same.

  2. Do we use the same names and types? In this case you could write an op that does the conversion, calls the underlying CLIJ op, and converts back (see the sketch after this list). Or better yet, just have converters.

  3. If inputs and outputs are converted automatically, scenario 2 would be problematic for a series of operations. Would there be some way to transfer the data to the GPU but only retrieve it lazily, when the next Java operation is performed?

  4. What about CUDA? I have converters to CUDA, and would like to polish them at some point. It would be nice to be able to overload ops with both CLIJ and CUDA implementations.

  5. What about data that is too large to fit on the GPU? I've spent some time playing with ImgLib2-cache as a means to retrieve data in chunks and send it to the GPU. Does CLIJ do any chunking?
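
Here is the sketch for scenario 2 mentioned above. Everything in it is a placeholder (plain arrays instead of RAIs, a fake GPU buffer); it is only meant to show where the conversions would sit:

```java
// Scenario 2 sketch: an op that keeps the usual (CPU-facing) name and types,
// converts to a GPU buffer, calls the CLIJ-style implementation, and converts back.
// Every class and method here is a placeholder, not existing imagej-ops or CLIJ API.
public class GaussViaGpuSketch {

    // Stand-in for ClearCLBuffer.
    static class GpuBuffer {
        final float[] data;

        GpuBuffer(final float[] data) {
            this.data = data;
        }
    }

    // Stand-ins for the push/pull converters.
    static GpuBuffer push(final float[] cpu) {
        return new GpuBuffer(cpu.clone());
    }

    static float[] pull(final GpuBuffer gpu) {
        return gpu.data.clone();
    }

    // Stand-in for the underlying CLIJ blur op (no-op here).
    static void gpuBlur(final GpuBuffer in, final GpuBuffer out, final double sigma) {
        System.arraycopy(in.data, 0, out.data, 0, in.data.length);
    }

    // The overloading op itself: same signature as the CPU version, but the
    // conversions and the GPU call happen behind the scenes.
    public static float[] gauss(final float[] input, final double sigma) {
        final GpuBuffer gpuIn = push(input);
        final GpuBuffer gpuOut = new GpuBuffer(new float[input.length]);
        gpuBlur(gpuIn, gpuOut, sigma);
        return pull(gpuOut);
    }
}
```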

@haesleinhuepf
Member

Hey @bnorthan,

  1. Buffers and images both come from the OpenCL standard. Depending on which operation is executed and on the hardware, buffer processing might be faster than image processing, or not. Just some insights: if you access pixel neighborhoods, images are beneficial; if your code runs on NVIDIA cards with poor OpenCL support (version < 1.1), 3D images don't work. Thus, CLIJ mainly uses buffers, for compatibility reasons.
    https://stackoverflow.com/questions/9903855/buffer-object-and-image-buffer-object-in-opencl
    https://software.intel.com/en-us/forums/opencl/topic/518474
    https://community.khronos.org/t/buffer-vs-image/4064/2

1.-3. If possible, I would like to avoid automatic back-and-forth conversion, because conversion takes time. GPU acceleration is only beneficial if long workflows are run on the GPU. That's why we initially thought automatic conversion shouldn't be enabled at all...
4. Conversion from OpenCL to CUDA should come for free, no?
5. CLIJ doesn't support any intelligent chunking. It's just a collection of OpenCL kernels wrapped in Java code.

Looking forward to discussing the details! :-)

@frauzufall
Member

frauzufall commented Jun 26, 2019

At the beginning I started to match the CLIJ Ops with existing imagej-ops (here is the code), but there were differences, and it is quite some work (at least for me) to find the counterparts, so as a first step we decided to write clearly marked CLIJ ops that return the same results as CLIJ does in other scenarios.

I also wrote converters. You can try removing the CLIJ_push and CLIJ_pull op calls in the examples (Jython, Java). It works in many cases, but sometimes fails to match the ClearCLBuffer to a RAI if the op has additional input parameters.

I stopped going into too much detail / fixing things because I don't want to waste time debugging something that is being rewritten anyway. But the CLIJ Ops are perfect for testing some core concepts of imagej-ops. Excited to hear about the next iteration!
