
cupy as alternative to numpy on critical sections #11

Closed
NilsKrause opened this issue Mar 27, 2020 · 2 comments
Comments

@NilsKrause
Contributor

So, I looked around online a bit and stumbled across cupy, a library that basically wraps numpy functionality in an API that runs on the GPU to perform highly parallel calculations faster.

I tinkered around a bit but never really got to a state where I could test it effectively, mainly because I am an absolute Python scrub and also have no clue about image computation whatsoever. But I am hoping that someone else can implement it into the code, just to see if it gives any performance upgrade on larger images.

Currently it's not that trivial to set up an environment for it, but I got it running on my Arch Linux machine with a GeForce 1050 Ti.
cupy GitHub page
cupy installation instructions
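
For anyone who hasn't used it: the basic idea is that you copy an array to the GPU, run the same numpy-style calls on it there, and copy the result back. A toy example for illustration (not pyxelate code):

import numpy as np
import cupy as cp

x_cpu = np.random.rand(2048, 2048).astype(np.float32)

x_gpu = cp.asarray(x_cpu)                    # copy the array to the GPU
y_gpu = cp.abs(cp.sum(x_gpu * 2.0, axis=1))  # same calls as numpy, executed on the GPU
y_cpu = cp.asnumpy(y_gpu)                    # copy the result back to the host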

@NilsKrause
Contributor Author

I got an implementation with cupy working, but it turned out to be slower than the actual numpy implementation, and I am in the dark as to why exactly that is.

So basically I rewrote the pyxelate._reduce function to work with cupy as follows:

# module-level imports assumed elsewhere in pyxelate.py
import cupy as cp
from skimage.filters import median
from skimage.morphology import square
from skimage.util import view_as_blocks
from skimage.color.adapt_rgb import adapt_rgb, each_channel

def _reduce(self, image):
    """Apply convolutions on image ITER times and generate a smaller image
    based on the highest magnitude of gradients"""

    # self is visible to decorated function
    @adapt_rgb(each_channel)
    def _wrapper(dim):
        # apply median filter for noise reduction
        dim = median(dim, square(4))
        for _ in range(self.ITER):
            h, w = dim.shape
            h, w = h // 2, w // 2
            new_image = cp.zeros((h * w)).astype("int")
            # view_as_blocks runs on the CPU, so dim has to be a numpy array here
            view = view_as_blocks(dim, (2, 2))

            # move the 2x2 blocks to the GPU
            flatten = cp.asarray(view).reshape(-1, 2, 2)

            # bottleneck: a Python loop over every block, each iteration
            # launching several small cupy kernels
            for i, f in enumerate(flatten):
                conv = cp.abs(cp.sum(cp.multiply(self.CONVOLUTIONS, f.reshape(-1, 2, 2)).reshape(-1, 4), axis=1))
                new_image[i] = cp.mean(f[self.SOLUTIONS[cp.argmax(conv)]])

            new_image = new_image.reshape((h, w))
            # copy the result back to the host for the next iteration
            dim = cp.asnumpy(new_image.copy())

        return new_image

    return _wrapper(image)

I also had to define the CONVOLUTIONS and SOLUTIONS arrays as cupy.ndarray.
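
Moving those class-level arrays over could look something like this (just a sketch, assuming they start out as regular numpy arrays):

self.CONVOLUTIONS = cp.asarray(self.CONVOLUTIONS)  # gradient kernels, now on the GPU
self.SOLUTIONS = cp.asarray(self.SOLUTIONS)        # on the GPU too, so it can be indexed with the cupy argmax result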

My guess as to why it's slower is that either there weren't enough operations to actually make up for the time it took to copy the array to the GPU, or the operations need to be optimized in some way to actually make use of the GPU.
(I tested the speed with multiple smaller images and one very large image; in both scenarios the cupy variant was significantly slower.)
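
One rough way to separate those two effects would be to time the host-to-device copy and a small GPU operation independently (the timed helper and the array size below are made up for illustration):

import time
import numpy as np
import cupy as cp

def timed(fn, *args):
    cp.cuda.Device().synchronize()   # finish any pending GPU work first
    start = time.perf_counter()
    out = fn(*args)
    cp.cuda.Device().synchronize()   # GPU calls are asynchronous, so wait before stopping the clock
    return out, time.perf_counter() - start

blocks = np.random.rand(100_000, 2, 2).astype(np.float32)

_, t_copy = timed(cp.asarray, blocks)    # host -> device transfer
gpu_blocks = cp.asarray(blocks)
_, t_sum = timed(cp.sum, gpu_blocks)     # a single small reduction on the GPU
print(f"copy: {t_copy * 1e3:.2f} ms, sum: {t_sum * 1e3:.2f} ms")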

Currently I am trying to wrap my head around what the function actually does so that I could rewrite it to use the GPU more effectively. But it's probably going to take a while until I understand everything, as it's my first time working with convolutional filters and with sklearn/skimage for that matter.

Any help and info on the matter is appreciated.

@sedthh
Owner

sedthh commented Mar 28, 2020

Thank you for your help! Let me know if you figure out anything.

If there was a way to bring
for i, f in enumerate(flatten):
to cupy, there would be a huge speed boost. Right now it still has to go back to Python, do an iteration, and create overhead through the function call to either NumPy or CuPy.

Still, this is great news, I will look into the library as well!
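
For what it's worth, if SOLUTIONS really is a stack of boolean 2x2 masks, the whole per-block loop could in principle be replaced by array operations that stay on the GPU. A rough sketch (the function name, shapes, and the mask assumption are illustrative, not pyxelate's actual code):

import cupy as cp

def reduce_blocks_gpu(flatten, convolutions, solutions):
    """Vectorized stand-in for the per-block Python loop (sketch).

    Assumes flatten is (N, 2, 2), convolutions is (K, 2, 2) and
    solutions is a (K, 2, 2) stack of boolean masks.
    """
    # response of every kernel on every block at once: shape (N, K)
    resp = cp.abs(cp.einsum('kij,nij->nk', convolutions, flatten))
    # index of the strongest-responding kernel per block: shape (N,)
    best = cp.argmax(resp, axis=1)
    # matching mask for every block: shape (N, 2, 2)
    masks = solutions[best].astype(flatten.dtype)
    # mean of the selected pixels in each block, with no Python loop
    n = flatten.shape[0]
    sums = (flatten * masks).reshape(n, -1).sum(axis=1)
    counts = masks.reshape(n, -1).sum(axis=1)
    return sums / counts

The resulting (N,) array could then be reshaped to (h, w) the same way new_image is in the snippet above.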

@sedthh sedthh closed this as completed Apr 6, 2021