
Allow access to CUDA pointers for interoperability with other libraries #16513

Merged
merged 1 commit into from Mar 5, 2020

Conversation

pwuertz
Contributor

@pwuertz pwuertz commented Feb 5, 2020

This is a proposal for adding CV_WRAP-compatible cudaPtr() getter methods to GpuMat and Stream, required for enabling interoperability between OpenCV and other CUDA-supporting Python libraries such as Numba, CuPy, PyTorch, etc.

Here is an example for sharing a GpuMat with CuPy:

import cv2 as cv
import cupy as cp
import numpy as np

# Create GPU array with OpenCV
data_gpu_cv = cv.cuda_GpuMat()
data_gpu_cv.upload(np.eye(64, dtype=np.float32))

# Modify the same GPU array with CuPy
data_gpu_cp = cp.asarray(CudaArrayInterface(data_gpu_cv))
data_gpu_cp *= 42.0

# Download and verify
assert np.allclose(data_gpu_cp.get(), np.eye(64) * 42.0)

In this example, CudaArrayInterface is an (incomplete) adapter class that implements the CUDA Array Interface used by other frameworks:

class CudaArrayInterface:
    def __init__(self, gpu_mat):
        w, h = gpu_mat.size()
        # Maps OpenCV depth codes to numpy-style typestr values.
        # Note: only single-channel matrices are handled here, since
        # gpu_mat.type() also encodes the channel count.
        type_map = {
            cv.CV_8U: "u1", cv.CV_8S: "i1",
            cv.CV_16U: "u2", cv.CV_16S: "i2",
            cv.CV_32S: "i4",
            cv.CV_32F: "f4", cv.CV_64F: "f8",
        }
        self.__cuda_array_interface__ = {
            "version": 2,
            "shape": (h, w),
            "data": (gpu_mat.cudaPtr(), False),
            "typestr": type_map[gpu_mat.type()],
            "strides": (gpu_mat.step, gpu_mat.elemSize()),
        }

If possible, I'd like to implement __cuda_array_interface__ within the GpuMat python binding in a future PR (not sure how to define a python property using the wrapper generator though).
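As a side note on the dictionary layout: the version-2 interface is just a plain dict, so its shape can be illustrated without any GPU library at all. A minimal, hypothetical helper (the function name and defaults are mine, not OpenCV's or the spec's):

```python
def cuda_array_interface_v2(ptr, shape, typestr, strides=None, read_only=False):
    """Build a version-2 __cuda_array_interface__ dict from raw values.

    ptr is the device pointer as an integer; strides are in bytes,
    with None meaning a C-contiguous layout.
    """
    return {
        "version": 2,
        "shape": tuple(shape),
        "typestr": typestr,
        "data": (int(ptr), bool(read_only)),
        "strides": tuple(strides) if strides is not None else None,
    }
```

For the 64x64 float32 GpuMat above, this would be called as cuda_array_interface_v2(data_gpu_cv.cudaPtr(), (64, 64), "f4", (data_gpu_cv.step, 4)).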


force_builders=Custom
buildworker:Custom=linux-4
build_image:Custom=ubuntu-cuda:18.04

@leofang

leofang commented Feb 7, 2020

Hi @pwuertz, thanks for joining the discussion on Numba. I don't have comments on your effort here (yet), but I'd like to kindly ask that when this PR (and any subsequent ones) is merged, it would be nice if you could follow numba/numba#5104 and add OpenCV to the list 🙂 Thank you.

@alalek
Member

alalek commented Feb 10, 2020

CudaArrayInterface

I believe it should own the upstream gpu_mat object in its fields (extending the lifetime of data_gpu_cv in the example).
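One way to address this, sketched generically below (the OwningInterface name and structure are hypothetical, not part of the PR), is for the adapter to hold a strong reference to the producing object so the device memory outlives any consumer view:

```python
import weakref

class OwningInterface:
    """Adapter that keeps its producer (e.g. a GpuMat) alive by
    holding a strong reference alongside the interface dict."""
    def __init__(self, producer, interface):
        self._producer = producer  # extends the producer's lifetime
        self.__cuda_array_interface__ = interface

class FakeProducer:
    """Stand-in for a GpuMat in this dependency-free sketch."""

producer = FakeProducer()
alive = weakref.ref(producer)
wrapper = OwningInterface(producer, {"version": 2})
del producer
assert alive() is not None  # the wrapper still owns a reference
```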

@pwuertz
Contributor Author

pwuertz commented Feb 10, 2020

@alalek Please note that the class CudaArrayInterface is not part of this PR. The PR is meant to provide the minimum requirement for any kind of interoperability in Python, which is access to the CUDA pointers.

For out-of-the-box interoperability with other libraries like Numba and CuPy, I was planning to implement the CUDA Array Interface (CAI) in a follow-up PR. As described earlier, though, I'm having trouble figuring out some details of the OpenCV Python binding generator.

Also note that within the current CAI specification (version 2), the responsibility for lifetime and synchronization resides with the user (similar to using cv::Mat constructors with data pointers). If this changes in some future version I'd be willing to update the CAI version on the OpenCV side of course.

@@ -681,6 +684,9 @@ class CV_EXPORTS_W Stream
//! returns true if stream object is not default (!= 0)
operator bool_type() const;

//! return Pointer to CUDA stream
CV_WRAP void* cudaPtr() const;
Contributor


There is StreamAccessor to do it: https://docs.opencv.org/master/d6/df1/structcv_1_1cuda_1_1StreamAccessor.html. I think it's better to wrap the accessor rather than expose private fields.

Member


StreamAccessor

Not sure that this can help reach the final goal:

to implement __cuda_array_interface__ within the GpuMat python binding

inline
void* GpuMat::cudaPtr() const
{
    return data;
}
Contributor


There is the CV_PROP_RW macro that allows exposing object properties to Python and other languages. You do not need your own method for it.

Member


I believe the current approach is fine.
We should not allow changing this pointer through CV_PROP_RW.

@pwuertz
Contributor Author

pwuertz commented Feb 14, 2020

@asmorkalov

  • I tried adding uchar* data as CV_PROP, but it doesn't compile. CV_PROP apparently doesn't support pointer types.
  • Like @alalek, I figured that changing the data pointer via the public interface shouldn't be allowed (neither from C++ nor from Python), so having/promoting a getter method for it seems like a good thing.
  • The GpuMat::data pointer looks identical to Mat::data (name, uint8_t*, doc), yet it represents something completely different. I think void* cudaPtr() is a good name for what this pointer represents: an opaque handle to the CUDA array, not a data pointer for reading/writing uint8_t.
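The second point, exposing the pointer read-only through a getter rather than a writable property, can be illustrated in plain Python (the GpuMatLike class is a hypothetical stand-in, not OpenCV API):

```python
class GpuMatLike:
    """Hypothetical stand-in for GpuMat: the raw CUDA pointer is
    readable via a getter but has no public setter."""
    def __init__(self, ptr):
        self._ptr = ptr

    def cudaPtr(self):
        # opaque handle to device memory; read-only by design
        return self._ptr
```

Consumers can read the handle via m.cudaPtr(), but nothing in the public surface invites reassigning it, which is exactly the concern raised about CV_PROP_RW.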

@pwuertz
Contributor Author

pwuertz commented Feb 14, 2020

@alalek It's true that __cuda_array_interface__ currently does not specify stream handling, but stream interoperability really is low-hanging fruit. With access to the CUDA pointers, I did the following:

# allocated dataX_cpu, dataX_gpu_cv, stream_cv
# (proof of principle numba views dataX_gpu_nb, stream_nb)
t1 = time.time()
data1_gpu_cv.upload(arr=data1_cpu, stream=stream_cv)         # OpenCV upload
kernel_nb(data1_gpu_nb, out=data2_gpu_nb, stream=stream_nb)  # Numba operation
data2_gpu_cv.download(dst=data2_cpu, stream=stream_cv)       # OpenCV download
t2 = time.time()  # CPU time: 0.001 s
stream_nb.synchronize()
t3 = time.time()  # GPU time: 0.042 s
# verified data2_cpu

So you can freely mix OpenCV and Numba operations on a single, fully async stream.

@asmorkalov Oh sorry, I hadn't noticed the StreamAccessor class before. I'll try adding wrapper definitions to it. Does this mean that cudaStream_t needs some kind of globally defined conversion rule too?

@pwuertz
Contributor Author

pwuertz commented Feb 14, 2020

@asmorkalov StreamAccessor uses the cudaStream_t typedef, and thus has a hard dependency on the CUDA SDK. I assume this is the reason for keeping it separate from Stream, which provides a public interface without a CUDA dependency.

Even with HAVE_CUDA defined, neither the misc/python headers nor cv2.cpp are able to include <cuda_runtime.h>.

How to proceed? Using void* for transporting CUDA stream pointers appears to be the least intrusive solution.

Member

@alalek alalek left a comment


Looks good to me 👍
