CuDNN v6, new layers, lots of bugfixes

@soumith released this 31 Mar 16:27

Minor API Changes

  • In optim.Adamax, the default learning rate and epsilon have been made
    consistent with Lasagne, Keras and TensorFlow (see the sketch after this list).
    • Previous: (lr=1e-2, eps=1e-38)
    • Current : (lr=2e-3, eps=1e-8)
  • Make random_ range exclusive (it used to be exclusive when only the upper bound was specified, and inclusive when both were given).
  • torch.cat now disallows concatenating along nonexistent dimensions
    (to make it consistent with numpy and Variable cat)
  • torch.utils.clip_grad_norm now returns the total norm (say, for logging purposes).
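
As a minimal sketch of the Adamax change (the model below is only a placeholder), the new defaults apply automatically, and the previous behavior can be reproduced by passing the old values explicitly:

    import torch.nn as nn
    import torch.optim as optim

    model = nn.Linear(10, 2)  # placeholder model

    # New defaults (lr=2e-3, eps=1e-8) are used when nothing is specified:
    opt_new = optim.Adamax(model.parameters())

    # Pass the old values explicitly to reproduce the previous behavior:
    opt_old = optim.Adamax(model.parameters(), lr=1e-2, eps=1e-38)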

Performance Improvements

  • Reduce DataParallel overhead on >4 GPUs
    • Improve broadcast/reduce performance by coalescing tensors
  • nn.Embedding's backward performance increased for batch sizes > 1024

New Features

torch

  • Batch triangular factorization and solves have been interfaced (CPU and GPU) and
    are available under torch.btrifact and torch.btrisolve. See the documentation
    for usage, and the sketch after this list.
  • All RNG functions now accept a generator via a keyword argument
  • torch.mode is now supported on the GPU via a high-performance kernel.
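
A minimal sketch of the new torch-level features, assuming torch.btrifact(A) returns (LU, pivots), torch.btrisolve takes (b, LU, pivots), and torch.randn accepts the new generator keyword; see the documentation for the exact signatures:

    import torch

    # Batched triangular (LU) factorization and solve:
    A = torch.randn(4, 3, 3)              # a batch of 4 square systems
    b = torch.randn(4, 3)                 # one right-hand side per system
    A_LU, pivots = torch.btrifact(A)      # assumed to return (LU, pivots)
    x = torch.btrisolve(b, A_LU, pivots)  # solves A[i] x[i] = b[i] for each i

    # RNG functions accept a generator via a keyword argument (assumed usage):
    gen = torch.Generator()
    gen.manual_seed(0)
    sample = torch.randn(2, 3, generator=gen)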

autograd, nn and optim

  • CuDNN v6 integrated:
    • Faster Dilated Convolutions (which also use less memory)
    • 1D FFT-based Convolutions
    • Significant performance improvement for Softmax layers
    • Speedups across many functions
    • Improved CuDNN error messages
    • We will integrate persistent RNNs in the next release
  • torch.trace, torch.cumsum, torch.cross are now implemented in autograd
  • nll_loss now supports spatial inputs (i.e. 4d inputs BCHW) and computes
    channel-wise cross-entropy (see the combined sketch after this list).
  • nn.PReLU now supports Tensors of any dimensionality, not just 1d and 2d.
  • Added nn.PairwiseDistance and F.pairwise_distance, which compute the batchwise
    pairwise distance between two sets of vectors.
  • Adaptive Max and Average Pooling added for 1d, 2d inputs via
    nn.AdaptiveMaxPool1d, nn.AdaptiveAvgPool2d, etc.
  • RMSProp now has momentum and a centered option. If centered is True,
    the gradient is normalized by an estimate of its variance. (Graves 2013)
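
A combined minimal sketch of the new nn and optim features above; the shapes, hyperparameters and placeholder model are illustrative, and exact call signatures may differ slightly between versions:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import torch.optim as optim
    from torch.autograd import Variable

    # Spatial nll_loss: channel-wise cross-entropy per pixel on BCHW inputs.
    log_probs = Variable(torch.randn(4, 5, 8, 8))               # pretend log-probabilities over 5 classes
    target = Variable(torch.LongTensor(4, 8, 8).random_(0, 5))  # per-pixel class indices
    loss = F.nll_loss(log_probs, target)

    # Batchwise pairwise distance between two sets of vectors (one distance per row).
    a = Variable(torch.randn(16, 128))
    b = Variable(torch.randn(16, 128))
    dist = F.pairwise_distance(a, b)

    # Adaptive pooling: fixed output size regardless of the input's spatial size.
    pool = nn.AdaptiveAvgPool2d(7)
    feats = pool(Variable(torch.randn(4, 3, 32, 48)))           # -> 4 x 3 x 7 x 7

    # RMSprop with momentum and the centered option (Graves 2013).
    model = nn.Linear(10, 2)  # placeholder model
    opt = optim.RMSprop(model.parameters(), lr=1e-3, momentum=0.9, centered=True)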

utils

  • WeightedRandomSampler has been added as a custom sampler for the DataLoader.
    It samples elements from [0,..,len(weights)-1] with the given probabilities
    and is useful for sampling from unbalanced datasets where some classes have
    many more samples than others. See the docs for more details, and the sketch
    after this list.
  • DataLoader now allows returning numpy arrays
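
A minimal sketch of pairing WeightedRandomSampler with a DataLoader; the dataset, labels and weights below are placeholders. Examples with a larger weight are drawn more often, which helps rebalance rare classes:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from torch.utils.data.sampler import WeightedRandomSampler

    data = torch.randn(6, 3)
    labels = torch.LongTensor([0, 0, 0, 0, 1, 1])  # class 1 is under-represented

    # Give rare-class examples a larger sampling weight.
    weights = [1.0 if int(l) == 0 else 2.0 for l in labels]
    sampler = WeightedRandomSampler(weights, num_samples=len(weights))

    loader = DataLoader(TensorDataset(data, labels), batch_size=2, sampler=sampler)
    for batch_data, batch_labels in loader:
        pass  # class-1 examples now show up roughly twice as often per epoch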

Bug Fixes

torch

  • When loading GPU checkpoints from disk with storage location remapping,
    an import of torch.cuda was still attempted. This is now fixed, and you can
    load GPU checkpoints on machines with no GPUs or CUDA (see the sketch after
    this list).
  • Work around an OSX fread bug where loading checkpoints in which any single
    Tensor was larger than 1GB would give an error.
  • Fixed a bug in torch.cat: it no longer accepts reversed iterators, because
    a reversed object is not a PySequence.
    For example:
     l = [Variable(torch.ones(1,3)*i) for i in range(3)]
     torch.cat(reversed(l), 0) # errors now
    
  • Fix a memory leak in torch.from_numpy
  • GPU svd returned a larger matrix than expected in the some=True mode.
    This is now fixed to match CPU behavior.
  • Fix a bug in CPU max that was introduced in the previous release.
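
A minimal sketch of loading a GPU-saved checkpoint on a CPU-only machine (the filename is a placeholder), using torch.load's map_location hook to remap CUDA storages to CPU:

    import torch

    # Remap every storage to CPU so that importing torch.cuda is never needed.
    state = torch.load('checkpoint_gpu.pth', map_location=lambda storage, loc: storage)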

autograd, nn and optim

  • Reassigning attributes in modules now works correctly.
    In this example, l.a used to always remain None; now it is updated
    as one would expect:
     l = nn.Linear(10, 20)
     l.a = None
     l.a = nn.Parameter(torch.randn(2))
     # l.a is correctly updated
  • Fix bug where adding a hook could replace an existing hook
  • Fix nn.Embedding and nn.CosineEmbeddingLoss to work without
    error on non-float CUDA tensors (half, double)
  • Fix a bug in nn.Embedding when the max_norm option was used. Some of the
    indices were not respecting max_norm and this is fixed.
  • Fix a corner case in Variable's setitem where the gradient was of incorrect
    shape. In the example below, x.grad used to be of shape (20,) because y[1]
    is of shape (20,); it now correctly matches x's shape, (1, 20):
     x = Variable(torch.randn(1, 20), requires_grad=True)
     y = Variable(torch.zeros(10, 20))
     y[1] = x
    
  • Fix a segfault in Conv1d when input doesn't require grad.
  • Added assertions in pack_padded_sequence to check that sequences have length > 0
  • torch.prod's autograd formula was incorrect if the Tensor contained a zero.
    The formula has been fixed.
  • Variable expand and expand_as had incorrect dimension inference when using
    broadcasting semantics. The formula has been fixed in these cases.
  • Fix a size mismatch in CosineEmbeddingLoss. See this issue for more details.
  • Fixed a bug in LBFGS that caused it to use uninitialized locals. See issue
  • Add assertions for negative padding in nn.Conv* functions.
  • Fix the stddev gradient formula for the stochastic function normal.

other

  • Fix issue when returning strings from the DataLoader when pin_memory=True
  • Binaries no longer depend on libcudart.so at runtime.