
cupy as alternative to numpy on critical sections #11

Closed
NilsKrause opened this issue Mar 27, 2020 · 2 comments
Comments

@NilsKrause
Contributor

So, I looked around online a bit and stumbled across cupy, a library that basically wraps numpy functionality in an API that runs on the GPU to perform highly parallel calculations faster.

I tinkered around a bit but never really got to a state where I could test it effectively, mainly because I am an absolute Python scrub and also have no clue about image computation whatsoever. But I am hoping that someone else can implement it into the code, just to see if it gives any performance upgrade on larger images.

Currently it's not that trivial to set up an environment for it, but I got it running on my Arch Linux machine with a GeForce 1050 Ti.
cupy GitHub page
cupy installation instructions
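
For anyone who hasn't used it: the basic idea is that you copy an array to the GPU, run the same numpy-style calls on it there, and copy the result back. A toy example for illustration (not pyxelate code):

import numpy as np
import cupy as cp

x_cpu = np.random.rand(2048, 2048).astype(np.float32)

x_gpu = cp.asarray(x_cpu)                    # copy the array to the GPU
y_gpu = cp.abs(cp.sum(x_gpu * 2.0, axis=1))  # same calls as numpy, executed on the GPU
y_cpu = cp.asnumpy(y_gpu)                    # copy the result back to the host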

@NilsKrause
Contributor Author

I got an implementation with cupy working, but it turned out to be slower than the actual numpy implementation, and I am in the dark as to why exactly that is.

So basically I rewrote the pyxelate._reduce function to work with cupy as follows:

# module-level imports assumed elsewhere in pyxelate.py
import cupy as cp
from skimage.filters import median
from skimage.morphology import square
from skimage.util import view_as_blocks
from skimage.color.adapt_rgb import adapt_rgb, each_channel

def _reduce(self, image):
    """Apply convolutions on image ITER times and generate a smaller image
    based on the highest magnitude of gradients"""

    # self is visible to decorated function
    @adapt_rgb(each_channel)
    def _wrapper(dim):
        # apply median filter for noise reduction
        dim = median(dim, square(4))
        for _ in range(self.ITER):
            h, w = dim.shape
            h, w = h // 2, w // 2
            new_image = cp.zeros((h * w)).astype("int")
            # view_as_blocks runs on the CPU, so dim has to be a numpy array here
            view = view_as_blocks(dim, (2, 2))

            # move the 2x2 blocks to the GPU
            flatten = cp.asarray(view).reshape(-1, 2, 2)

            # bottleneck: a Python loop over every block, each iteration
            # launching several small cupy kernels
            for i, f in enumerate(flatten):
                conv = cp.abs(cp.sum(cp.multiply(self.CONVOLUTIONS, f.reshape(-1, 2, 2)).reshape(-1, 4), axis=1))
                new_image[i] = cp.mean(f[self.SOLUTIONS[cp.argmax(conv)]])

            new_image = new_image.reshape((h, w))
            # copy the result back to the host for the next iteration
            dim = cp.asnumpy(new_image.copy())

        return new_image

    return _wrapper(image)

I also had to define the CONVOLUTIONS and SOLUTIONS arrays as cupy.ndarray.
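
Moving those class-level arrays over could look something like this (just a sketch, assuming they start out as regular numpy arrays):

self.CONVOLUTIONS = cp.asarray(self.CONVOLUTIONS)  # gradient kernels, now on the GPU
self.SOLUTIONS = cp.asarray(self.SOLUTIONS)        # on the GPU too, so it can be indexed with the cupy argmax result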

My guess as to why it's slower is that either there weren't enough operations to actually make up for the time it took to copy the array to the GPU, or the operations need to be optimized in some way to actually make use of the GPU.
(I tested the speed with multiple smaller images and one very large image; in both scenarios the cupy variant was significantly slower.)
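
One rough way to separate those two effects would be to time the host-to-device copy and a small GPU operation independently (the timed helper and the array size below are made up for illustration):

import time
import numpy as np
import cupy as cp

def timed(fn, *args):
    cp.cuda.Device().synchronize()   # finish any pending GPU work first
    start = time.perf_counter()
    out = fn(*args)
    cp.cuda.Device().synchronize()   # GPU calls are asynchronous, so wait before stopping the clock
    return out, time.perf_counter() - start

blocks = np.random.rand(100_000, 2, 2).astype(np.float32)

_, t_copy = timed(cp.asarray, blocks)    # host -> device transfer
gpu_blocks = cp.asarray(blocks)
_, t_sum = timed(cp.sum, gpu_blocks)     # a single small reduction on the GPU
print(f"copy: {t_copy * 1e3:.2f} ms, sum: {t_sum * 1e3:.2f} ms")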

Currently I am trying to wrap my head around what the function actually does so that I could rewrite it to use the GPU more effectively. But it's probably going to take a while until I understand everything, as it's my first time working with convolutional filters and with sklearn/skimage for that matter.

Any help and info on the matter is appreciated.

@sedthh
Owner

sedthh commented Mar 28, 2020

Thank you for your help! Let me know if you figure out anything.

If there was a way to bring
for i, f in enumerate(flatten):
to cupy, there would be a huge speed boost. Right now it still has to go back to Python, do an iteration, and create overhead through the function call to either NumPy or CuPy.

Still, this is great news, I will look into the library as well!
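
For what it's worth, if SOLUTIONS really is a stack of boolean 2x2 masks, the whole per-block loop could in principle be replaced by array operations that stay on the GPU. A rough sketch (the function name, shapes, and the mask assumption are illustrative, not pyxelate's actual code):

import cupy as cp

def reduce_blocks_gpu(flatten, convolutions, solutions):
    """Vectorized stand-in for the per-block Python loop (sketch).

    Assumes flatten is (N, 2, 2), convolutions is (K, 2, 2) and
    solutions is a (K, 2, 2) stack of boolean masks.
    """
    # response of every kernel on every block at once: shape (N, K)
    resp = cp.abs(cp.einsum('kij,nij->nk', convolutions, flatten))
    # index of the strongest-responding kernel per block: shape (N,)
    best = cp.argmax(resp, axis=1)
    # matching mask for every block: shape (N, 2, 2)
    masks = solutions[best].astype(flatten.dtype)
    # mean of the selected pixels in each block, with no Python loop
    n = flatten.shape[0]
    sums = (flatten * masks).reshape(n, -1).sum(axis=1)
    counts = masks.reshape(n, -1).sum(axis=1)
    return sums / counts

The resulting (N,) array could then be reshaped to (h, w) the same way new_image is in the snippet above.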

@sedthh sedthh closed this as completed Apr 6, 2021