CuDNN v6, new layers, lots of bugfixes

@soumith released this 31 Mar 16:27

Minor API Changes

  • In optim.Adamax, the default learning rate and epsilon have been made
    consistent with Lasagne, Keras and TensorFlow (see the sketch after this list).
    • Previous: (lr=1e-2, eps=1e-38)
    • Current : (lr=2e-3, eps=1e-8)
  • Make random_ range exclusive (it used to be exclusive when only the upper bound was specified, and inclusive when both were given).
  • torch.cat now disallows concatenating along nonexistent dimensions
    (to make it consistent with numpy and Variable cat)
  • torch.utils.clip_grad_norm now returns the total norm (say, for logging purposes).
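
As a minimal sketch of the Adamax change (the model below is only a placeholder), the new defaults apply automatically, and the previous behavior can be reproduced by passing the old values explicitly:

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Linear(10, 2)  # placeholder model

    # New defaults (lr=2e-3, eps=1e-8) are used when nothing is specified:
    opt_new = optim.Adamax(model.parameters())

    # Pass the old values explicitly to reproduce the previous behavior:
    opt_old = optim.Adamax(model.parameters(), lr=1e-2, eps=1e-38)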

Performance Improvements

  • Reduce DataParallel overhead on >4 GPUs
    • Improve broadcast/reduce performance by coalescing tensors
  • nn.Embedding's backward performance increased for batch sizes > 1024

New Features

torch

  • Batch triangular factorization and solves have been interfaced (CPU and GPU) and
    are available under torch.btrifact and torch.btrisolve. See the documentation
    for usage, and the sketch after this list.
  • All RNG functions now accept a generator via a keyword argument
  • torch.mode is now supported on the GPU via a high-performance kernel.
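
A minimal sketch of the new torch-level features, assuming torch.btrifact(A) returns (LU, pivots), torch.btrisolve takes (b, LU, pivots), and torch.randn accepts the new generator keyword; see the documentation for the exact signatures:

    import torch

    # Batched triangular (LU) factorization and solve:
    A = torch.randn(4, 3, 3)              # a batch of 4 square systems
    b = torch.randn(4, 3)                 # one right-hand side per system
    A_LU, pivots = torch.btrifact(A)      # assumed to return (LU, pivots)
    x = torch.btrisolve(b, A_LU, pivots)  # solves A[i] x[i] = b[i] for each i

    # RNG functions accept a generator via a keyword argument (assumed usage):
    gen = torch.Generator()
    gen.manual_seed(0)
    sample = torch.randn(2, 3, generator=gen)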

autograd, nn and optim

  • CuDNN v6 integrated:
    • Faster Dilated Convolutions (which also use less memory)
    • 1D FFT-based Convolutions
    • Significant performance improvement for Softmax layers
    • Speedups across many functions
    • Improved CuDNN error messages
    • We will integrate persistent RNNs in the next release
  • torch.trace, torch.cumsum, torch.cross are now implemented in autograd
  • nll_loss now supports spatial inputs (i.e. 4d inputs BCHW) and computes
    channel-wise cross-entropy (see the combined sketch after this list).
  • nn.PReLU now supports Tensors of any dimensionality, not just 1d and 2d.
  • Added nn.PairwiseDistance and F.pairwise_distance, which compute the batchwise
    pairwise distance between two sets of vectors.
  • Adaptive Max and Average Pooling added for 1d, 2d inputs via
    nn.AdaptiveMaxPool1d, nn.AdaptiveAvgPool2d, etc.
  • RMSProp now has momentum and a centered option. If centered is True,
    the gradient is normalized by an estimate of its variance. (Graves 2013)
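
A combined minimal sketch of the new nn and optim features above; the shapes, hyperparameters and placeholder model are illustrative, and exact call signatures may differ slightly between versions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    # Spatial nll_loss: channel-wise cross-entropy per pixel on BCHW inputs.
    log_probs = Variable(torch.randn(4, 5, 8, 8))               # pretend log-probabilities over 5 classes
    target = Variable(torch.LongTensor(4, 8, 8).random_(0, 5))  # per-pixel class indices
    loss = F.nll_loss(log_probs, target)

    # Batchwise pairwise distance between two sets of vectors (one distance per row).
    a = Variable(torch.randn(16, 128))
    b = Variable(torch.randn(16, 128))
    dist = F.pairwise_distance(a, b)

    # Adaptive pooling: fixed output size regardless of the input's spatial size.
    pool = nn.AdaptiveAvgPool2d(7)
    feats = pool(Variable(torch.randn(4, 3, 32, 48)))           # -> 4 x 3 x 7 x 7

    # RMSprop with momentum and the centered option (Graves 2013).
    model = nn.Linear(10, 2)  # placeholder model
    opt = optim.RMSprop(model.parameters(), lr=1e-3, momentum=0.9, centered=True)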

utils

  • WeightedRandomSampler has been added as a custom sampler for the DataLoader.
    It samples elements from [0,..,len(weights)-1] with the given probabilities
    and is useful for sampling from unbalanced datasets where some classes have
    many more samples than others. See the docs for more details, and the sketch
    after this list.
  • DataLoader now allows returning numpy arrays
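
A minimal sketch of pairing WeightedRandomSampler with a DataLoader; the dataset, labels and weights below are placeholders. Examples with a larger weight are drawn more often, which helps rebalance rare classes:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.sampler import WeightedRandomSampler

    data = torch.randn(6, 3)
    labels = torch.LongTensor([0, 0, 0, 0, 1, 1])  # class 1 is under-represented

    # Give rare-class examples a larger sampling weight.
    weights = [1.0 if int(l) == 0 else 2.0 for l in labels]
    sampler = WeightedRandomSampler(weights, num_samples=len(weights))

    loader = DataLoader(TensorDataset(data, labels), batch_size=2, sampler=sampler)
    for batch_data, batch_labels in loader:
        pass  # class-1 examples now show up roughly twice as often per epoch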

Bug Fixes

torch

  • When loading GPU checkpoints from disk with storage location remapping,
    an import of torch.cuda was still attempted. This is now fixed, and you can
    load GPU checkpoints on machines with no GPUs or CUDA (see the sketch after
    this list).
  • Work around an OSX fread bug where loading checkpoints in which any single
    Tensor was larger than 1GB would give an error.
  • Fixed a bug in torch.cat: it no longer accepts reversed iterators, because
    a reversed object is not a PySequence.
    For example:
     l = [Variable(torch.ones(1,3)*i) for i in range(3)]
     torch.cat(reversed(l), 0) # errors now
    
  • Fix a memory leak in torch.from_numpy
  • GPU svd returned a larger matrix than expected in the some=True mode.
    This is now fixed to match CPU behavior.
  • Fix a bug in CPU max that was introduced in the previous release.
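
A minimal sketch of loading a GPU-saved checkpoint on a CPU-only machine (the filename is a placeholder), using torch.load's map_location hook to remap CUDA storages to CPU:

    import torch

    # Remap every storage to CPU so that importing torch.cuda is never needed.
    state = torch.load('checkpoint_gpu.pth', map_location=lambda storage, loc: storage)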

autograd, nn and optim

  • Reassigning attributes in modules now works correctly.
    In this example, l.a used to always remain None; now it is updated
    as one would expect:
     l = nn.Linear(10, 20)
     l.a = None
     l.a = nn.Parameter(torch.randn(2))
     # l.a is correctly updated
  • Fix bug where adding a hook could replace an existing hook
  • Fix nn.Embedding and nn.CosineEmbeddingLoss to work without
    error on non-float CUDA tensors (half, double)
  • Fix a bug in nn.Embedding when the max_norm option was used. Some of the
    indices were not respecting max_norm and this is fixed.
  • Fix a corner case in Variable's setitem where the gradient was of incorrect
    shape. In the example below, x.grad used to be of shape (20,) because y[1]
    is of shape (20,); it now correctly matches x's shape, (1, 20):
     x = Variable(torch.randn(1, 20), requires_grad=True)
     y = Variable(torch.zeros(10, 20))
     y[1] = x
    
  • Fix a segfault in Conv1d when input doesn't require grad.
  • Added assertions in pack_padded_sequence to check that sequences have length > 0
  • torch.prod's autograd formula was incorrect if the Tensor contained a zero.
    The formula has been fixed.
  • Variable expand and expand_as had incorrect dimension inference when using
    broadcasting semantics. The formula has been fixed in these cases.
  • Fix a size mismatch in CosineEmbeddingLoss. See this issue for more details.
  • Fixed a bug in LBFGS that caused it to use uninitialized locals. See issue
  • Add assertions for negative padding in nn.Conv* functions.
  • Fix the stddev gradient formula for the stochastic function normal.

other

  • Fix issue when returning strings from the DataLoader when pin_memory=True
  • Binaries no longer depend on libcudart.so at runtime.