
Speedup warp for 3D images #4410

Open
wants to merge 2 commits into main
Conversation

Mostafa-Alaa
Contributor

Description

This PR speeds up warping for 3D images (3D or multichannel 2D) by up to 5x by adding _warp_fast_batch, which handles a batch of 2D images directly.

The original approach had several performance problems:

  • Overhead from slicing and repacking the image
  • Forced conversion of image slices to contiguous arrays, even when not needed
  • Recomputing the transform at each iteration, even though it only depends on the coordinates in the 2D plane

There was also an API limitation: the input had to have its channel axis placed last, and working around that required transposing the image before and after the computation.

The proposed code works on images in [plane, row, column] or [row, column, channel] format, selected via the channel_axis parameter, and controls whether iteration over the channel axis is the innermost or the outermost loop, which mainly affects caching behavior; see the sketch below.
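
As a rough usage sketch (channel_axis and channel_innermost_loop are this PR's proposed keywords, not part of the released warp API, so the exact signature is subject to review and change):

import numpy as np
from skimage.transform import AffineTransform, warp

tform = AffineTransform(rotation=0.2)

# Multichannel 2D image, channels last: [row, column, channel]
rgb = np.random.rand(256, 256, 3)
out = warp(rgb, tform, channel_axis=-1)

# 3D image as a stack of planes: [plane, row, column]
stack = np.random.rand(32, 256, 256)
out = warp(stack, tform, channel_axis=0,
           channel_innermost_loop=False)  # plane loop outermost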

I've run some benchmarks; the results are shown in the figures below.
[Benchmark figures: CF32, CF64, CL32, CL64]
Each figure has 3 rows (one per interpolation order) and 5 columns (one per n×n image size) of subplots, each comparing number of channels against speedup.

A bunch of TODOs:

  • Settle on the logic for choosing channel loop placement
  • Documentation
  • Benchmark code

Reviews and change requests are most welcome.

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

Speedup more noticeable in smaller images
  - Switch from Numpy Python API to C API
  - Avoid np.asarray call
  - Minor optimizing changes
import numpy as np
cimport numpy as cnp
cimport numpy as np
np.import_array()
Member

what does this do?

Member

I don't think this is necessary, as cimport should take care of it. I think..... otherwise, we have a real issue in the library ^_^

Contributor Author

Well then it seems we do have a library issue, check this.
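
(For context, a minimal sketch of the failure mode in question: with the Cython versions of the time, cimport only provides the declarations, while np.import_array() fills in NumPy's C-API function table at module init; newer Cython calls it automatically when numpy is cimported.)

# bad_module.pyx -- minimal sketch of the failure mode
cimport numpy as np
# np.import_array() deliberately omitted

def get_contig(np.ndarray a):
    # Compiles fine, but segfaults when called under Cython versions that
    # do not call import_array() implicitly: the C-API function table is
    # still a NULL pointer.
    return np.PyArray_GETCONTIGUOUS(a)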

Member

Ah, we seldom use the C API directly; you seem to be using it.

@hmaarrfk
Member

Is using the C API meaningful? We typically use memoryviews and the Python API.

Could you share a benchmark with us, with code, and add the benchmark in the appropriate folder?


cdef np_floats[:, ::1] img = np.ascontiguousarray(image)
cdef np_floats[:, ::1] M = np.ascontiguousarray(H)
cdef np_floats[:, ::1] img = np.PyArray_GETCONTIGUOUS(image)
Member

Why are you using the C API?

Why not simply leave this as is? I don't think you are avoiding copies here:

https://github.com/numpy/numpy/blob/v1.17.0/numpy/core/_asarray.py#L179
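
(For reference, the copy behavior of the two calls is the same: PyArray_GETCONTIGUOUS increfs and returns its input when it is already C-contiguous, and otherwise returns a contiguous copy, which is what np.ascontiguousarray does at the Python level. The difference is call overhead, not copies. A minimal sketch, using a concrete double dtype instead of the PR's np_floats fused type and assuming a float64 input:)

# contig_sketch.pyx
import numpy as np
cimport numpy as np
np.import_array()

def setup(np.ndarray image):
    # Python-API route: module/attribute lookup plus an interpreter call.
    cdef double[:, ::1] via_python = np.ascontiguousarray(image)
    # C-API route: one direct call through NumPy's C-API table.
    cdef double[:, ::1] via_capi = np.PyArray_GETCONTIGUOUS(image)
    # Neither line copies when `image` is already C-contiguous float64.
    return np.asarray(via_capi)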


@Mostafa-Alaa
Contributor Author

Could you share a benchmark with us, with code, and add the benchmark in the appropriate folder?

Sure, here is how I'm benchmarking the code.

You just need to update warp in _warp.py to accept channel_axis and channel_innermost_loop. I didn't want to include them for the time being, as they are subject to review and change.

@Mostafa-Alaa
Contributor Author

@hmaarrfk I just checked multiple Cython files across the library and noticed that NumPy's Python API is the one being used, while the C API is used only for simple things like data type checks.

I see that as a problem: since Cython is basically C, it would seem more natural to use the C API rather than the Python one. Otherwise you go from C to Python and back to C, so why not just use C code all over?

I've actually benchmarked the C vs. Python API in _warp_fast without running the actual warp code (the part where the GIL is released and where most of the time is spent) and got about a 10x speedup.

I now think opening an issue and having a discussion about the use of the C API would be better than doing it in this PR. What do you think, should I revert to the Python API for now?
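
(A rough sketch of the kind of micro-benchmark meant here, assuming _warp_fast is importable from skimage.transform._warps_cy as in the tree of the time; with order=0 and a tiny image the warp loop is nearly free, so the timing mostly reflects per-call setup:)

import timeit

import numpy as np
from skimage.transform._warps_cy import _warp_fast

image = np.zeros((8, 8), dtype=np.float64)  # tiny image: setup dominates
H = np.eye(3, dtype=np.float64)             # identity homography

t = timeit.timeit(lambda: _warp_fast(image, H, order=0), number=10_000)
print(f"{t / 10_000 * 1e6:.2f} us per call")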

@hmaarrfk
Member

I can see how we are using C datatypes from NumPy, but most of the code is using Cython memoryviews, isn't it?

@Mostafa-Alaa
Contributor Author

Doing the computations on the memoryviews is definitely the right call. The problem here is with the setup code (like making sure the NumPy array is contiguous, or creating an array), where using NumPy's C API avoids some unnecessary overhead.

@hmaarrfk
Member

Is that seriously a large cost? It seems to me that this is simply the cost of using NumPy and Python.

I would be happy to work with you to isolate your two changes: the computation-order changes and the usage of the NumPy C API.

I'm still not totally convinced that using the C API is the right call, but I'm interested in being shown otherwise.

@Mostafa-Alaa
Contributor Author

Is that seriously a large cost? It seems to me that this is simply the cost of using NumPy and Python.

The cost is really workload-dependent: for a cheap computation (small image size, nearest-neighbour interpolation) reducing the overhead helps, but when the computation is expensive (large image size, bicubic interpolation) it doesn't really matter, as most of the time is spent on the computation itself.

I'm still not totally convinced that using the C API is the right call, but I'm interested in being shown otherwise.

I know it might be inconvenient to use the C API, as you're probably more used to the Python API (who isn't), but the C API is definitely the right call here: for a Python-API call Cython generates extra code (fetching the numpy module, fetching and invoking the function, extra checks), while with the C API it just calls the function directly.
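
(Roughly, the difference described looks like this; the comments paraphrase the generated C, whose exact helper names vary by Cython version:)

import numpy as np
cimport numpy as np
np.import_array()

def via_python_api(np.ndarray a):
    # Cython emits: look up the module global 'np', fetch the
    # 'ascontiguousarray' attribute, pack the argument, call it through
    # the interpreter, then check the result.
    return np.ascontiguousarray(a)

def via_c_api(np.ndarray a):
    # Cython emits a single call through NumPy's C-API function table.
    return np.PyArray_GETCONTIGUOUS(a)

Compiling with cython -a makes this visible in the annotated HTML: the Python-API line shows far heavier interpreter interaction.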

I would be happy to work with you to isolate your two changes: the computation-order changes and the usage of the NumPy C API.

That's great; I think separating the two changes would be better.

Base automatically changed from master to main February 18, 2021 18:23