cupy
CuPy is an implementation of a NumPy-compatible multi-dimensional array on CUDA. CuPy consists of the core multi-dimensional array class, cupy.ndarray, and many functions on it. It supports a subset of the numpy.ndarray interface that is enough for Chainer.
The following is a brief overview of the supported subset of the NumPy interface:
- Basic indexing (indexing by ints, slices, newaxes, and Ellipsis)
- Element types (dtypes): bool_, (u)int{8, 16, 32, 64}, float{16, 32, 64}
- Most of the array creation routines
- Reshaping and transposition
- All operators with broadcasting
- All universal functions (a.k.a. ufuncs) for elementwise operations, except those for complex numbers
- Dot product functions (except einsum) using cuBLAS
- Reduction along axes (sum, max, argmax, etc.)
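The items above can be sketched with a few array operations. This example is written against NumPy so that it runs without a GPU; it assumes that, on a machine with CUDA and CuPy installed, replacing the import with `import cupy as xp` gives the same results for these particular operations.

```python
# Assumption: swap this for `import cupy as xp` to run the same code on a GPU.
import numpy as xp

# Array creation and reshaping with an explicit dtype.
a = xp.arange(12, dtype=xp.float32).reshape(3, 4)

# Basic indexing with ints, slices, newaxis, and Ellipsis.
first_row = a[0]
second_column = a[:, 1]
expanded = a[xp.newaxis, ..., 0]   # first column with a new leading axis

# Transposition.
transposed = a.T

# Operator with broadcasting: the length-4 mask is applied to every row.
mask = xp.array([1, 0, 1, 0], dtype=xp.float32)
scaled = a * mask

# Reductions along an axis.
row_sums = scaled.sum(axis=1)
row_argmax = a.argmax(axis=1)

# Dot product (CuPy is described as using cuBLAS for this).
gram = xp.dot(a, a.T)

print(expanded.shape, transposed.shape, gram.shape)
print(row_sums, row_argmax)
```

Keeping the module under a single alias such as `xp` is one way to write code that works with either library.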
CuPy also includes the following features for performance:
- Customizable memory allocator, and a simple memory pool as an example
- User-defined elementwise kernels
- User-defined reduction kernels
- cuDNN utilities
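As an illustration of a user-defined elementwise kernel, the sketch below defines a squared-difference kernel with `cupy.ElementwiseKernel`. The helper `squared_diff` and the `HAVE_GPU` flag are hypothetical names introduced here, and the NumPy fallback exists only so that the example still runs on a machine without CuPy or a CUDA device.

```python
import numpy as np

# Detect a usable CuPy installation; fall back to NumPy otherwise.
try:
    import cupy as cp
    HAVE_GPU = cp.cuda.runtime.getDeviceCount() > 0
except Exception:
    cp = None
    HAVE_GPU = False


def squared_diff(x, y):
    """Return (x - y)**2 elementwise for float32 inputs (hypothetical helper)."""
    if not HAVE_GPU:
        # CPU fallback computing the same elementwise expression.
        return (x - y) ** 2
    # Input parameters, output parameters, the per-element CUDA body,
    # and a name for the generated kernel.
    kernel = cp.ElementwiseKernel(
        'float32 x, float32 y',
        'float32 z',
        'z = (x - y) * (x - y)',
        'squared_diff')
    # Move the inputs to the GPU, run the kernel, and copy the result back.
    return cp.asnumpy(kernel(cp.asarray(x), cp.asarray(y)))


x = np.array([1, 2, 3], dtype=np.float32)
y = np.array([0, 0, 1], dtype=np.float32)
print(squared_diff(x, y))
```

In real code the kernel object would normally be created once and reused rather than rebuilt on every call.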
CuPy uses on-the-fly kernel synthesis: when a kernel call is required, it compiles kernel code optimized for the shapes and dtypes of the given arguments, sends it to the GPU device, and executes the kernel. The compiled code is cached in the $(HOME)/.cupy/kernel_cache directory (this cache path can be overridden by setting the CUPY_CACHE_DIR environment variable). Compilation makes the first call of a kernel slower, but from the second execution onward the cached code is reused and the slowdown disappears. CuPy also caches the kernel code sent to the GPU device within the process, which reduces the kernel transfer time on further calls.
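The cache location rule described above can be sketched as a small lookup. The function `kernel_cache_dir` is a hypothetical helper written for illustration and is not CuPy's own code; it only mirrors the stated behavior, in which CUPY_CACHE_DIR takes precedence over the default directory under the home directory.

```python
import os


def kernel_cache_dir(environ=None):
    """Return the directory where compiled kernels would be cached.

    Hypothetical helper: mirrors the documented rule, not CuPy's implementation.
    """
    environ = os.environ if environ is None else environ
    default = os.path.join(os.path.expanduser('~'), '.cupy', 'kernel_cache')
    # The environment variable, when set, overrides the default path.
    return environ.get('CUPY_CACHE_DIR', default)


print(kernel_cache_dir())
print(kernel_cache_dir({'CUPY_CACHE_DIR': '/tmp/cupy_cache'}))
```

Pointing CUPY_CACHE_DIR at a fast local disk may help when the home directory is on a slow or shared filesystem.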
~ndarray.base
~ndarray.ctypes
~ndarray.itemsize
~ndarray.flags
~ndarray.nbytes
~ndarray.shape
~ndarray.size
~ndarray.strides
~ndarray.dtype
~ndarray.T
~ndarray.tolist
~ndarray.tofile
~ndarray.dump
~ndarray.dumps
~ndarray.astype
~ndarray.copy
~ndarray.view
~ndarray.fill
~ndarray.reshape
~ndarray.transpose
~ndarray.swapaxes
~ndarray.ravel
~ndarray.squeeze
~ndarray.take
~ndarray.diagonal
~ndarray.max
~ndarray.argmax
~ndarray.min
~ndarray.argmin
~ndarray.clip
~ndarray.trace
~ndarray.sum
~ndarray.mean
~ndarray.var
~ndarray.std
~ndarray.prod
~ndarray.dot
~ndarray.__lt__
~ndarray.__le__
~ndarray.__gt__
~ndarray.__ge__
~ndarray.__eq__
~ndarray.__ne__
~ndarray.__nonzero__
~ndarray.__neg__
~ndarray.__pos__
~ndarray.__abs__
~ndarray.__invert__
~ndarray.__add__
~ndarray.__sub__
~ndarray.__mul__
~ndarray.__div__
~ndarray.__truediv__
~ndarray.__floordiv__
~ndarray.__mod__
~ndarray.__divmod__
~ndarray.__pow__
~ndarray.__lshift__
~ndarray.__rshift__
~ndarray.__and__
~ndarray.__or__
~ndarray.__xor__
~ndarray.__iadd__
~ndarray.__isub__
~ndarray.__imul__
~ndarray.__idiv__
~ndarray.__itruediv__
~ndarray.__ifloordiv__
~ndarray.__imod__
~ndarray.__ipow__
~ndarray.__ilshift__
~ndarray.__irshift__
~ndarray.__iand__
~ndarray.__ior__
~ndarray.__ixor__
~ndarray.__copy__
~ndarray.__deepcopy__
~ndarray.__reduce__
~ndarray.__array__
~ndarray.__len__
~ndarray.__getitem__
~ndarray.__setitem__
~ndarray.__int__
~ndarray.__long__
~ndarray.__float__
~ndarray.__oct__
~ndarray.__hex__
~ndarray.__repr__
~ndarray.__str__
~ndarray.get
~ndarray.set
empty
empty_like
eye
identity
ones
ones_like
zeros
zeros_like
full
full_like
array
asarray
ascontiguousarray
copy
arange
linspace
diag
diagflat
copyto
reshape
ravel
rollaxis
swapaxes
transpose
atleast_1d
atleast_2d
atleast_3d
broadcast
broadcast_arrays
broadcast_to
expand_dims
squeeze
column_stack
concatenate
dstack
hstack
vstack
array_split
dsplit
hsplit
split
vsplit
roll
bitwise_and
bitwise_or
bitwise_xor
invert
left_shift
right_shift
take
diagonal
load
save
savez
savez_compressed
array_repr
array_str
dot
vdot
inner
outer
tensordot
trace
isfinite
isinf
isnan
logical_and
logical_or
logical_not
logical_xor
greater
greater_equal
less
less_equal
equal
not_equal
sin
cos
tan
arcsin
arccos
arctan
hypot
arctan2
deg2rad
rad2deg
degrees
radians
sinh
cosh
tanh
arcsinh
arccosh
arctanh
rint
floor
ceil
trunc
sum
prod
exp
expm1
exp2
log
log10
log2
log1p
logaddexp
logaddexp2
signbit
copysign
ldexp
frexp
nextafter
add
reciprocal
negative
multiply
divide
power
subtract
true_divide
floor_divide
fmod
mod
modf
remainder
clip
sqrt
square
absolute
sign
maximum
minimum
fmax
fmin
argmax
argmin
count_nonzero
nonzero
flatnonzero
where
amin
amax
mean
var
std
bincount
pad
scatter_add
asnumpy