CuPy: Add and use a new GPU array backend with NumPy-compatible interface #266

Merged: 305 commits merged into master from the cupy branch on Aug 20, 2015

Conversation

beam2d (Member) commented Jul 27, 2015

This is a large PR that aims to replace the CUDA array backend, moving from PyCUDA/scikit-cuda to a new one named CuPy. It includes the implementation of CuPy and the corresponding updates to Chainer.

Background: PyCUDA is a great CUDA wrapper that enables us to write our own kernels and call them from Python. However, its GPUArray offers little functionality, so we have to write custom kernels for almost every Function implementation. We want to make it easier to write user-defined Functions that run on GPU, and that requires a more powerful GPU-array implementation.

We want to use a GPU array backend with the following features:

  1. It should enable us to write common code that runs on both CPU and GPU. It should be possible to write most Functions this way (see the sketch below).
  2. It should enable us to write our own elementwise kernels when performance demands it.

Some GPU-array implementations already exist (e.g. CUDAMat, gnumpy, CUDArray), but none of them satisfies both requirements.
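
A minimal sketch of requirement 1, assuming a dispatch helper get_array_module(x) that returns numpy or cupy depending on the array type (such a helper exists in CuPy today; its exact name or location in this branch may differ, e.g. it may live under chainer.cuda):

```python
import numpy
import cupy

def softmax(x):
    # Pick the module (numpy or cupy) matching the input array, then
    # use only the shared interface so one body runs on CPU and GPU.
    xp = cupy.get_array_module(x)
    x = x - x.max(axis=1, keepdims=True)  # subtract row max for stability
    e = xp.exp(x)
    return e / e.sum(axis=1, keepdims=True)

y_cpu = softmax(numpy.random.randn(4, 3).astype(numpy.float32))  # CPU path
y_gpu = softmax(cupy.random.randn(4, 3).astype(numpy.float32))   # GPU path
```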

About CuPy: CuPy implements a subset of the NumPy interface.

  • Within this subset, we can write a single piece of code that runs on both NumPy and CuPy. See __init__.py for the list of supported functions (cupy.random is also provided).
  • It supports PyCUDA-style user-defined elementwise kernels (see the sketch after this list).
    Like PyCUDA, CuPy compiles kernels at runtime and caches the results to files (this applies to all kernels predefined in CuPy as well). The first use may therefore be slow, but later uses hit the on-disk cache.
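
A minimal sketch of such a kernel through the cupy.ElementwiseKernel interface (signature as in CuPy; details in this branch may differ slightly):

```python
import cupy

# The operation body is plain CUDA C acting on one element; CuPy
# generates the indexing and launch code, compiles the kernel on the
# first call, and caches the compiled binary to a file for later runs.
squared_diff = cupy.ElementwiseKernel(
    'float32 x, float32 y',   # input parameters
    'float32 z',              # output parameter
    'z = (x - y) * (x - y)',  # per-element operation
    'squared_diff')           # kernel name

x = cupy.arange(6, dtype=cupy.float32)
y = cupy.ones(6, dtype=cupy.float32) * 2
z = squared_diff(x, y)  # first call compiles; later calls hit the cache
```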

We are aiming to merge this PR by Sep. 2 for the v1.3.0 release. If you want to make a PR that adds new Functions, we recommend implementing them on top of this branch so they can be merged for v1.3.0 or later. Of course, feature PRs written against the current Chainer can still be merged for v1.2.0.

TODO:

  • CuPy implementation
  • Replace PyCUDA/scikit-cuda with CuPy
  • Pass tests
  • Run the examples on the new implementation
  • Fix the CUDA-related parts of the Chainer documentation
  • Write CuPy documentation

beam2d added the cat:feature label (Implementation that introduces new interfaces.) on Jul 27, 2015
beam2d added this to the v1.3.0 milestone on Jul 27, 2015
mblondel commented

Do you plan to make CuPy a separate project in the long term?

beam2d (Member, Author) commented Jul 30, 2015

We have no long-term plan yet. For now I want to keep it inside the Chainer project to maintain development speed (managing two projects doubles the maintenance cost). That is a short-term plan, though; we may make it a separate project in the future.

beam2d (Member, Author) commented Aug 20, 2015

Following an internal discussion of the core developer team, we decided to merge this branch now for the next release, v1.3.0. If you want to try it out before the official release, see the documentation and start from the updated examples.

Note that we have switched the default documentation shown at http://docs.chainer.org to the stable version instead of the latest master. To read the CuPy documentation before this change is released, see http://docs.chainer.org/en/latest.

Another note: the new CuPy-based implementation has one important known issue: it can be slower than the PyCUDA-based one (e.g. the MNIST example), since many index manipulations are done in pure Python. Code that is not GPU-intensive is affected the most; GPU-intensive code is not (e.g. the ImageNet example runs as fast as the PyCUDA version). We will open an issue to track this regression.

beam2d changed the title from "[WIP] CuPy: Add and use a new GPU array backend with NumPy-compatible interface" to "CuPy: Add and use a new GPU array backend with NumPy-compatible interface" on Aug 20, 2015
beam2d added a commit referencing this pull request on Aug 20, 2015: "CuPy: Add and use a new GPU array backend with NumPy-compatible interface"
beam2d merged commit fc79c3c into master on Aug 20, 2015
okuta deleted the cupy branch on Aug 20, 2015 at 06:22
bordingj commented

Do you have any plans to support something similar to PyCUDA's SourceModule in CuPy?

beam2d (Member, Author) commented Aug 20, 2015

We do not currently support anything equivalent to SourceModule, though you could use cupy.cuda.compile_with_cache as an alternative. It compiles plain CUDA code into a cupy.cuda.Module object. You can pass a pointer to an array's contents to the resulting function, though we are not testing such a use case yet.

You can also use cupy.carray.compile_with_cache, as most CuPy kernels do. A cupy.ndarray object can be passed to the resulting kernel, where it is converted to a value of type CArray<T, ndim> defined in cupy/carray.cuh. Sorry that both functions are currently undocumented.
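
A rough, untested sketch of the cupy.cuda.compile_with_cache route, assuming the launch convention func(grid, block, args) and that ndarray arguments are passed by their device pointers:

```python
import numpy
import cupy

# Plain CUDA C source, compiled and cached like CuPy's own kernels.
source = '''
extern "C" __global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}
'''
module = cupy.cuda.compile_with_cache(source)  # returns a cupy.cuda.Module
kernel = module.get_function('scale')

x = cupy.arange(16, dtype=cupy.float32)
# Launch as (grid, block, args); note the caveat above that passing an
# array's contents to a raw function is not covered by tests yet.
kernel((1,), (16,), (x, numpy.float32(2.0), numpy.int32(x.size)))
```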

bordingj commented

Okay.
CuPy seems like a very promising project :)

Btw, it would be cool if you added an "add_dot" function to CuPy, similar to the skcuda.linalg.add_dot function (a BLAS GEMM routine).
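
For reference, skcuda.linalg.add_dot performs the fused GEMM update c <- alpha * a . b + beta * c in one BLAS call. A hypothetical, unfused CuPy equivalent built only on the NumPy-compatible subset might look like:

```python
import cupy

def add_dot(a, b, c, alpha=1.0, beta=1.0):
    # GEMM-style update: c <- alpha * a.dot(b) + beta * c, applied in
    # place on c. Unlike a fused gemm routine, this allocates a
    # temporary array for the matrix product before accumulating.
    c *= beta
    c += alpha * a.dot(b)
    return c
```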
