Skip to content

etotheipi/MorphCUDA

Repository files navigation

Binary Mask Morphology in C++ using CUDA

Ridiculously fast morphology and convolutions using an NVIDIA GPU!


-----
Orig Author:  Alan Reiner
Date:         04 September, 2010
Email:        etotheipi@gmail.com


-----
This a BETA version of a complete morphological image processing library
using CUDA.  CUDA is an NVIDIA programming interfact for harnessing your
NVIDIA graphics card to do computations (instead of just graphics).  For
problems that are highly parallelizable (like convolution or morphology)
you can get speedups of several orders of magnitude.  

For instance, to dilate a 4096x4096 mask with a 15x15 structuring elt,
the CPU would take anywhere from 5 to 20 seconds.  Using this library 
with an NVIDIA GTX 460, the operation takes less than 0.1 seconds!  You
can see nearly identical speed improvements with regular convolutions of
ints or floats (this code is easily adapted to do regular convolutions).

To use this library, you will need the CUDA 3.1 toolkit, and you might
have to move a few libraries into your linker path.  And of course, you
need a CUDA-enabled graphics card (Fermi recommended, others will prob
work;  more on that below).


This library is designed to be used for performing a sequence of morph
operations on a single image/mask.  This is standard for morphological
processing, but also keeping the memory on the device saves a lot of 
time.  The saying at my workplace is, "Copying data in and out of the
device is expensive; Computation is basically free!"

This isn't to say that the library can't be used for isolated morph
operations (copy in, morph, copy out)... in fact, it shouldn't have
any computational disadvantages compared to other libraries (besides
kernel optimization design).  However, this library uses objects we
call "Morphological Workbenches" which may be non-intuitive at first,
but it actually simplifies the process of applying dozens, hundreds or
thousands of operations to a single image/mask.  

I envision this library could be useful for GIMP plugins, or ATR
systems.  Just remember, CUDA is NOT open-source, and the licensing
will not be favorable to OSS projects.



-----
Supported Hardware:

   This code was designed for NVIDIA devices of Compute Capability 2.0+
   which is any NVIDIA GTX 4XX series card (Fermi).  The code *should*
   compile and run on 1.X devices, but the code is not optimized for them
   so there may be a huge performance hit.  Maybe not.  (see note below
   about compiling for 1.X GPUs)

   I believe NVIDIA 8800 GT is the earliest NVIDIA card that supports
   CUDA, and it would be Compute Capability 1.0.

   CUDA was created by NVIDIA, for NVIDIA.  It will not work ATI cards.
   If you want to take up GPU-programming on ATI, the only two options
   I know are OpenGL and OpenCL.  However, the maturity of those
   programming interfaces are unclear (but at least such programs can 
   be run on both NVIDIA and ATI)


-----
Installing and running:

   This directory should contain everything you need to compile
   the image convolution code, besides the CUDA 3.1 toolkit
   which can be downloaded from the following website:

      http://developer.nvidia.com/object/cuda_3_1_downloads.html

   In addition to installing the toolkit, you might need to add 
   libcutil*.a and libshrutil*.a to your LD path.  In linux,
   it is sufficient to copy them to your /usr/local/cuda/lib[64] 
   directory.

   I personally work with space-separated files for images, because 
   they are trivial to read and write from C++.  Additionally, I 
   use MATLAB to read/write these files too, which makes it easy
   create input for the CUDA code, and verify output.

   There is no reason this code won't work on Windows, though I've
   never tried it.

   I strongly urge anyone trying to learn CUDA to download the 
   CUDA SDK.  It contains endless examples of every CUDA feature
   in existence.  Once you know what to do, you can get the
   syntax from there, which more often than not is very ugly.


-----
To work with older NVIDIA cards (8800+ GT, 9XXX GT, GTX 2XX):

   Open commom/common.mk and around line 150, uncomment the GENCODE_SM10
   line, which enables dual-compilation for CUDA Compute Capability 1.X
   devices.  The reason I commented this out is that 1.X devices don't
   support printf(), which is my main method for debugging.  Therefore,
   leaving this line in the Makefile prevents the code from compiling 
   when I'm attempting to debug.


-----
Planned Updates

   - Need to modify the kernel functions (or invocations) to handle masks
     that have arbitrary dimensions.  Currently, it is assumed that the
     input has dimensions that are multiples of 32.


About

GPU-accelerated image morphology written in C++/CUDA (100x faster than CPU!)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published