
ivpopov/cudnn.torch

cudnn.torch

Torch7 FFI bindings for NVIDIA cuDNN (R3) kernels!

Modules are API-compatible with their nn equivalents and are fully unit-tested against the nn implementations. Conversion between nn and cudnn is available through the cudnn.convert function.

Installation

Modules

-- All inputs must be 3D, or 4D in batch mode, except for ReLU, Tanh and Sigmoid
cudnn.SpatialConvolution(nInputPlane, nOutputPlane, kW, kH, [dW = 1], [dH = 1], [padW = 0], [padH = padW], [groups = 1])
cudnn.SpatialMaxPooling(kW, kH, dW, dH, padW, padH)
cudnn.SpatialAveragePooling(kW, kH, dW, dH, padW, padH)

-- The pointwise functions take an optional inplace argument. If inplace=true, they operate in place without allocating any extra memory for themselves.
cudnn.ReLU(inplace[=false])
cudnn.Tanh(inplace[=false])
cudnn.Sigmoid(inplace[=false])

-- SoftMax can be run in fast mode or accurate mode. Default is accurate mode.
cudnn.SoftMax(fastMode [= false])          -- SoftMax across each image (just like nn.SoftMax)
cudnn.LogSoftMax()                         -- LogSoftMax across each image (just like nn.LogSoftMax)
cudnn.SpatialSoftMax(fastMode [= false])   -- SoftMax across feature-maps (per spatial location)
cudnn.SpatialLogSoftMax()                  -- LogSoftMax across feature-maps (per spatial location)

cudnn.SpatialCrossEntropyCriterion()       -- A spatial version of LogSoftMax + ClassNLLCriterion in one shot

-- Volumetric inputs (4D, or 5D in batch mode)
cudnn.VolumetricConvolution(nInputPlane, nOutputPlane, kT, kW, kH, dT, dW, dH, padT, padW, padH)
cudnn.VolumetricMaxPooling(kT, kW, kH, dT, dW, dH, padT, padW, padH)
cudnn.VolumetricAveragePooling(kT, kW, kH, dT, dW, dH, padT, padW, padH)
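As a rough sketch of how these modules fit together, the block below builds a small batch-mode network from the 2D modules listed above and runs one forward pass. It assumes a CUDA-capable GPU with cuDNN R3 and the cunn package (for `:cuda()` and `torch.CudaTensor`); the layer sizes and the 4x3x32x32 batch are arbitrary illustrative choices.

```lua
require 'nn'
require 'cunn'
require 'cudnn'

-- A small batch-mode network built only from cudnn modules
local net = nn.Sequential()
net:add(cudnn.SpatialConvolution(3, 16, 3, 3, 1, 1, 1, 1)) -- 3 -> 16 planes, 3x3 kernel, stride 1, pad 1 (keeps 32x32)
net:add(cudnn.ReLU(true))                                -- in-place ReLU, no extra output buffer
net:add(cudnn.SpatialMaxPooling(2, 2, 2, 2, 0, 0))      -- 2x2 window, stride 2: 32x32 -> 16x16
net:add(cudnn.SpatialSoftMax())                          -- softmax across the 16 feature maps at each location
net:cuda()                                               -- cudnn modules operate on CUDA tensors

-- Batch of 4 random 3-channel 32x32 "images"
local input = torch.CudaTensor(4, 3, 32, 32):uniform()
local output = net:forward(input)
print(output:size())
```

Because SpatialSoftMax normalizes across the feature maps at every pixel, summing the output over dimension 2 should give values very close to 1.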

Modes

There are two global modes that are useful for tuning performance, plus a flag for debugging:

require 'cudnn'
cudnn.benchmark = true -- uses cuDNN's built-in auto-tuner to find the fastest convolution algorithms.
                       -- If false, built-in heuristics are used instead, which might not always pick the fastest one.

By default, cudnn.benchmark is set to false.

cudnn.fastest = true -- like the :fastest() mode of the convolution modules:
                     -- simply picks the fastest convolution algorithm rather than tuning for workspace size

By default, cudnn.fastest is set to false.

cudnn.verbose = true -- prints additional information useful for debugging

By default, cudnn.verbose is set to false.
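A minimal sketch of where these flags go in practice, assuming cunn is installed and a GPU is available. The flags are set once, before the first forward pass. Our understanding (an assumption, not stated above) is that the benchmark search runs when a convolution first sees a given input shape, so the first timed iteration includes the tuning cost and keeping the batch shape fixed avoids repeated tuning. `cutorch.synchronize()` is called before reading the timer because GPU kernels are launched asynchronously; the layer and batch sizes are illustrative only.

```lua
require 'nn'
require 'cunn'
require 'cudnn'

-- Set the global modes before any forward pass.
cudnn.benchmark = true  -- let cuDNN time the candidate algorithms and keep the fastest
cudnn.fastest = false   -- default: keep workspace-aware algorithm selection
cudnn.verbose = true    -- optional: extra debugging output

-- 3 -> 64 planes, 5x5 kernel, stride 1, pad 2 (keeps 64x64 spatial size)
local conv = cudnn.SpatialConvolution(3, 64, 5, 5, 1, 1, 2, 2):cuda()
local input = torch.CudaTensor(32, 3, 64, 64):normal()

local timer = torch.Timer()
for i = 1, 10 do
  -- Same input shape every iteration; assumption: tuning only happens on the first call.
  conv:forward(input)
end
cutorch.synchronize()  -- wait for queued GPU work before reading the timer
local elapsed = timer:time().real
print(string.format('10 forward passes: %.3f s', elapsed))
```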

Conversion between cudnn and nn

Conversion is done by the cudnn.convert function, which takes a network and a backend as arguments and recursively walks the network's modules, substituting their equivalents from the target backend. No memory is copied; only the metatables are swapped.

require 'cudnn'

local net = nn.Sequential()
net:add(nn.SpatialConvolution(3,96,11,11,3,3))
net:add(nn.ReLU())
cudnn.convert(net, cudnn)
print(net)

will result in:

nn.Sequential {
  [input -> (1) -> (2) -> output]
  (1): cudnn.SpatialConvolution(3 -> 96, 11x11, 3,3)
  (2): cudnn.ReLU
}
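Because cudnn.convert takes the backend as an argument, the conversion should also work in the opposite direction. The sketch below assumes that cudnn-to-nn conversion modifies the network in place, just like the nn-to-cudnn case shown above. It converts a cudnn network to nn and back, and checks that the first layer's weight tensor is the same object after conversion, consistent with the statement that no memory is copied.

```lua
require 'nn'
require 'cudnn'

-- Start from a network built directly with cudnn modules.
local net = nn.Sequential()
net:add(cudnn.SpatialConvolution(3, 96, 11, 11, 3, 3))
net:add(cudnn.ReLU())

-- Keep a handle on the original weight tensor of the convolution.
local weight = net:get(1).weight

-- cudnn -> nn (assumed to work in place, like the nn -> cudnn example)
cudnn.convert(net, nn)
local nnType = torch.type(net:get(1))
local sameWeight = torch.pointer(net:get(1).weight) == torch.pointer(weight)
print(net)

-- nn -> cudnn again
cudnn.convert(net, cudnn)
print(net)
```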

Older versions

For cuDNN R1, check out the R1 branch. For cuDNN R2, check out the R2 branch.
