CuPy: Add and use a new GPU array backend with NumPy-compatible interface #266

Merged: 305 commits merged into master from the cupy branch on Aug 20, 2015

Conversation

beam2d (Member) commented Jul 27, 2015

This is a large PR that aims to replace the CUDA array backend, moving from PyCUDA/scikit-cuda to a new one named CuPy. It includes the implementation of CuPy and the corresponding updates to Chainer.

Background: PyCUDA is a great CUDA wrapper that enables us to write our own kernels and call them from Python. However, its GPUArray offers little functionality, so we have to write custom kernels for almost every Function implementation. We want to make it easier to write user-defined Functions that run on GPU, and that requires a more powerful GPU-array implementation.

We want to use a GPU array backend with the following features:

  1. It should enable us to write common code that runs on both CPU and GPU. It should be possible to write most Functions this way (see the sketch below).
  2. It should enable us to write our own elementwise kernels when performance demands it.

Some GPU-array implementations already exist (e.g. CUDAMat, gnumpy, CUDArray), but none of them satisfies both requirements.
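
A minimal sketch of requirement 1, assuming a dispatch helper get_array_module(x) that returns numpy or cupy depending on the array type (such a helper exists in CuPy today; its exact name or location in this branch may differ, e.g. it may live under chainer.cuda):

```python
import numpy
import cupy

def softmax(x):
    # Pick the module (numpy or cupy) matching the input array, then
    # use only the shared interface so one body runs on CPU and GPU.
    xp = cupy.get_array_module(x)
    x = x - x.max(axis=1, keepdims=True)  # subtract row max for stability
    e = xp.exp(x)
    return e / e.sum(axis=1, keepdims=True)

y_cpu = softmax(numpy.random.randn(4, 3).astype(numpy.float32))  # CPU path
y_gpu = softmax(cupy.random.randn(4, 3).astype(numpy.float32))   # GPU path
```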

About CuPy: CuPy implements a subset of the NumPy interface.

  • Within this subset, we can write a single piece of code that runs on both NumPy and CuPy. See __init__.py for the list of supported functions (cupy.random is also provided).
  • It supports PyCUDA-style user-defined elementwise kernels (see the sketch after this list).
    Like PyCUDA, CuPy compiles kernels at runtime and caches the results to files (this applies to all kernels predefined in CuPy as well). The first use may therefore be slow, but later uses hit the on-disk cache.
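
A minimal sketch of such a kernel through the cupy.ElementwiseKernel interface (signature as in CuPy; details in this branch may differ slightly):

```python
import cupy

# The operation body is plain CUDA C acting on one element; CuPy
# generates the indexing and launch code, compiles the kernel on the
# first call, and caches the compiled binary to a file for later runs.
squared_diff = cupy.ElementwiseKernel(
    'float32 x, float32 y',   # input parameters
    'float32 z',              # output parameter
    'z = (x - y) * (x - y)',  # per-element operation
    'squared_diff')           # kernel name

x = cupy.arange(6, dtype=cupy.float32)
y = cupy.ones(6, dtype=cupy.float32) * 2
z = squared_diff(x, y)  # first call compiles; later calls hit the cache
```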

We are aiming to merge this PR by Sep. 2 for the v1.3.0 release. If you want to make a PR that adds new Functions, we recommend implementing them on top of this branch so they can be merged for v1.3.0 or later. Of course, feature PRs written against the current Chainer can still be merged for v1.2.0.

TODO:

  • CuPy implementation
  • Replace PyCUDA/scikit-cuda with CuPy
  • Pass tests
  • Run the examples on the new implementation
  • Fix the CUDA-related parts of the Chainer documentation
  • Write CuPy documentation

beam2d added the cat:feature label (Implementation that introduces new interfaces.) on Jul 27, 2015
beam2d added this to the v1.3.0 milestone on Jul 27, 2015
mblondel commented

Do you plan to make CuPy a separate project in the long term?

beam2d (Member, Author) commented Jul 30, 2015

We have no long-term plan yet. For now I want to keep it inside the Chainer project to maintain development speed (managing two projects doubles the maintenance cost). That is a short-term plan, though; we may make it a separate project in the future.

beam2d (Member, Author) commented Aug 20, 2015

Following an internal discussion of the core developer team, we decided to merge this branch now for the next release, v1.3.0. If you want to try it out before the official release, see the documentation and start from the updated examples.

Note that we have switched the default documentation shown at http://docs.chainer.org to the stable version instead of the latest master. To read the CuPy documentation before this change is released, see http://docs.chainer.org/en/latest.

Another note: the new CuPy-based implementation has one important known issue: it can be slower than the PyCUDA-based one (e.g. the MNIST example), since many index manipulations are done in pure Python. Code that is not GPU-intensive is affected the most; GPU-intensive code is not (e.g. the ImageNet example runs as fast as the PyCUDA version). We will open an issue to track this regression.

beam2d changed the title from "[WIP] CuPy: Add and use a new GPU array backend with NumPy-compatible interface" to "CuPy: Add and use a new GPU array backend with NumPy-compatible interface" on Aug 20, 2015
beam2d added a commit referencing this pull request on Aug 20, 2015: "CuPy: Add and use a new GPU array backend with NumPy-compatible interface"
beam2d merged commit fc79c3c into master on Aug 20, 2015
okuta deleted the cupy branch on Aug 20, 2015 at 06:22
bordingj commented

Do you have any plans to support something similar to PyCUDA's SourceModule in CuPy?

beam2d (Member, Author) commented Aug 20, 2015

We do not currently support anything equivalent to SourceModule, though you could use cupy.cuda.compile_with_cache as an alternative. It compiles plain CUDA code into a cupy.cuda.Module object. You can pass a pointer to an array's contents to the resulting function, though we are not testing such a use case yet.

You can also use cupy.carray.compile_with_cache, as most CuPy kernels do. A cupy.ndarray object can be passed to the resulting kernel, where it is converted to a value of type CArray<T, ndim> defined in cupy/carray.cuh. Sorry that both functions are currently undocumented.
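
A rough, untested sketch of the cupy.cuda.compile_with_cache route, assuming the launch convention func(grid, block, args) and that ndarray arguments are passed by their device pointers:

```python
import numpy
import cupy

# Plain CUDA C source, compiled and cached like CuPy's own kernels.
source = '''
extern "C" __global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
'''
module = cupy.cuda.compile_with_cache(source)  # returns a cupy.cuda.Module
kernel = module.get_function('scale')

x = cupy.arange(16, dtype=cupy.float32)
# Launch as (grid, block, args); note the caveat above that passing an
# array's contents to a raw function is not covered by tests yet.
kernel((1,), (16,), (x, numpy.float32(2.0), numpy.int32(x.size)))
```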

bordingj commented

Okay.
CuPy seems like a very promising project :)

Btw, it would be cool if you added an "add_dot" function to CuPy, similar to the skcuda.linalg.add_dot function (a BLAS GEMM routine).
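
For reference, skcuda.linalg.add_dot performs the fused GEMM update c <- alpha * a . b + beta * c in one BLAS call. A hypothetical, unfused CuPy equivalent built only on the NumPy-compatible subset might look like:

```python
import cupy

def add_dot(a, b, c, alpha=1.0, beta=1.0):
    # GEMM-style update: c <- alpha * a.dot(b) + beta * c, applied in
    # place on c. Unlike a fused gemm routine, this allocates a
    # temporary array for the matrix product before accumulating.
    c *= beta
    c += alpha * a.dot(b)
    return c
```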
