CuDNN v6, new layers, lots of bugfixes
## Minor API Changes
- In `optim.Adamax`, the default learning rate and epsilon have been made consistent with Lasagne, Keras and TF.
  - Previous: `(lr=1e-2, eps=1e-38)`
  - Current: `(lr=2e-3, eps=1e-8)`
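  A minimal sketch of the change (the keyword values below equal the new defaults, so they could be omitted entirely):

  ```python
  import torch.nn as nn
  import torch.optim as optim

  model = nn.Linear(10, 2)
  # lr=2e-3 and eps=1e-8 are now the defaults, matching Lasagne/Keras/TF.
  optimizer = optim.Adamax(model.parameters(), lr=2e-3, eps=1e-8)
  ```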
- Make `random_`'s range exclusive (it used to be exclusive when only the upper bound was specified, and inclusive when both were given); see the example after this list.
- `torch.cat` now disallows concatenating along nonexistent dimensions (to make it consistent with numpy and Variable cat).
- `torch.nn.utils.clip_grad_norm` now returns the total norm (say, for logging purposes); see the sketch after this list.
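The `random_` change, as a sketch (the single-argument form is interpreted as the upper bound):

```python
import torch

t = torch.LongTensor(1000)
t.random_(5)     # values in [0, 5): the upper bound was already exclusive here
t.random_(0, 5)  # values in [0, 5): the upper bound is now exclusive here as well
```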
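And a minimal sketch of logging the returned total norm (assuming the usual `clip_grad_norm(parameters, max_norm)` call signature):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch.nn.utils import clip_grad_norm

model = nn.Linear(10, 2)
model(Variable(torch.randn(4, 10))).sum().backward()

# clip_grad_norm clips gradients in place and now also returns the
# total norm of the gradients, which is convenient for logging.
total_norm = clip_grad_norm(model.parameters(), max_norm=1.0)
print('grad norm:', total_norm)
```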
## Performance Improvements
- Reduce DataParallel overhead on >4 GPUs
- Improve broadcast/reduce performance by coalescing tensors
- `nn.Embedding`'s backward performance increased for batch sizes > 1024
## New Features
### torch
- Batch triangular factorization and solves have been interfaced (CPU and GPU) and are available under `torch.btrifact` and `torch.btrisolve`. See the documentation for usage, and the sketch after this list.
- All RNG functions now have a `generator` specifiable via a keyword argument (example after this list).
- `torch.mode` is now supported on the GPU via a high-performance kernel.
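A minimal sketch of the batched factorize-then-solve workflow; the `(batch, n, n)` and `(batch, n)` shapes are assumptions for illustration:

```python
import torch

A = torch.randn(4, 3, 3)
b = torch.randn(4, 3)

A_LU, pivots = torch.btrifact(A)      # batched LU factorization
x = torch.btrisolve(b, A_LU, pivots)  # solves A[i] x[i] = b[i] for each i in the batch
```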
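And a sketch of passing an explicit generator, assuming `torch.Generator()` is constructed directly and the in-place RNG methods accept the new keyword:

```python
import torch

g = torch.Generator()
g.manual_seed(1234)

# Sampling through `g` leaves the global RNG state untouched, which is
# useful for reproducible sampling in one part of a program.
x = torch.Tensor(1000).normal_(0, 1, generator=g)
```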
### autograd, nn and optim
- CuDNN v6 integrated:
  - Faster dilated convolutions (and less memory hungry)
  - 1D FFT-based convolutions
  - Significant performance improvement for Softmax layers
  - Speedups across many functions
  - Improved CuDNN error messages
  - We will integrate persistent RNNs in the next release
- `torch.trace`, `torch.cumsum` and `torch.cross` are now implemented in autograd (see the example after this list).
- `nll_loss` now supports spatial inputs (i.e. 4d inputs BCHW) and computes channel-wise cross-entropy.
- `nn.PReLU` now supports Tensors of all dimensions, not just 1d and 2d.
- Add `nn.PairwiseDistance` and `F.pairwise_distance`, which compute the batchwise pairwise distance between two batches of vectors (sketch after this list).
- Adaptive Max and Average Pooling added for 1d and 2d inputs via `nn.AdaptiveMaxPool1d`, `nn.AdaptiveAvgPool2d`, etc. (example after this list).
- RMSProp now has `momentum` and a `centered` option. If `centered` is True, the gradient is normalized by an estimate of its variance (Graves 2013); see the sketch after this list.
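For instance, `trace` can now be backpropagated through (a minimal sketch; the gradient of the trace is the identity matrix):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(3, 3), requires_grad=True)
y = x.trace()   # now differentiable, like cumsum and cross
y.backward()
print(x.grad)   # the 3x3 identity matrix
```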
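A minimal sketch of the new pairwise-distance API, in both module and functional form:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

x1 = Variable(torch.randn(8, 128))  # a batch of 8 vectors
x2 = Variable(torch.randn(8, 128))

pdist = nn.PairwiseDistance(2)          # p=2: Euclidean distance
d = pdist(x1, x2)                       # one distance per row pair

d2 = F.pairwise_distance(x1, x2, p=2)   # equivalent functional form
```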
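Adaptive pooling fixes the *output* size rather than the kernel size, so the same module works for any input resolution (a sketch; the shapes are illustrative):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

pool = nn.AdaptiveAvgPool2d((5, 7))       # desired output size, not kernel size
x = Variable(torch.randn(1, 64, 10, 14))
y = pool(x)                               # y has size (1, 64, 5, 7)
```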
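And a sketch of the new RMSprop options (all other arguments left at their defaults):

```python
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)
# `momentum` and `centered` are the newly added options; with
# centered=True the gradient is normalized by an estimate of its
# variance, as in Graves (2013).
optimizer = optim.RMSprop(model.parameters(), lr=1e-2, momentum=0.9, centered=True)
```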
### utils
- `WeightedRandomSampler` has been added as a custom sampler for the DataLoader. It samples elements from `[0,..,len(weights)-1]` with the given probabilities and is useful for sampling from unbalanced datasets where some classes have many more samples than others. See the docs for more details, and the sketch after this list.
- DataLoader now allows returning of numpy arrays.
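A minimal sketch of oversampling a rare class with `WeightedRandomSampler`; the dataset and weights here are purely illustrative:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.utils.data.sampler import WeightedRandomSampler

# Hypothetical unbalanced dataset: class 0 is rare, class 1 is common.
data = torch.randn(10, 3)
labels = torch.LongTensor([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
dataset = TensorDataset(data, labels)

# One weight per sample; giving rare-class samples a larger weight
# makes them more likely to be drawn.
weights = [4.0 if l == 0 else 1.0 for l in labels]
sampler = WeightedRandomSampler(weights, num_samples=len(weights))

loader = DataLoader(dataset, batch_size=5, sampler=sampler)
```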
## Bug Fixes
### torch
- When loading GPU checkpoints from disk with storage location remapping, `torch.cuda` was still being imported. This is now fixed, and you can load GPU checkpoints on machines with no GPUs or CUDA.
- Work around an OSX `fread` bug where loading checkpoints containing Tensors larger than 1GB would give an error.
- Fixed a bug in `torch.cat`: it no longer accepts `reversed` objects (a `reversed` iterator is not a `PySequence`). For example:

  ```python
  l = [Variable(torch.ones(1, 3) * i) for i in range(3)]
  torch.cat(reversed(l), 0)  # errors now
  ```
- Fix a memory leak in `torch.from_numpy`.
- GPU `svd` returned a larger matrix than expected in the `some` mode. This is now fixed to match CPU behavior.
- Fix a bug in CPU `max` that was introduced in the previous release.
### autograd, nn and optim
- Reassigning attributes in modules now works correctly. This example used to not work (`l.a` always remained `None`); now it behaves as one would expect:

  ```python
  l = nn.Linear(10, 20)
  l.a = None
  l.a = nn.Parameter(torch.randn(2))
  # l.a is correctly updated
  ```
- Fix a bug where adding a hook could replace an existing hook.
- Fix `nn.Embedding` and `nn.CosineEmbeddingLoss` to work without error on non-float CUDA types (half, double).
- Fix a bug in `nn.Embedding` when the `max_norm` option was used: some of the indices were not respecting `max_norm`; this is fixed.
- Fix a corner case in `Variable`'s SetItem where the gradient was of incorrect shape:

  ```python
  x = Variable(torch.randn(1, 20), requires_grad=True)
  y = Variable(torch.zeros(10, 20))
  y[1] = x
  ```

  `x.grad` used to be of shape 20, because `y[1]` was of shape 20; it now matches the shape of `x`.
- Fix a segfault in Conv1d when input doesn't require grad.
- Add assertions in `pack_padded_sequence` to check that the sequence is of length > 0.
- `torch.prod`'s autograd formula was incorrect if the Tensor contained a 0. The formula has been fixed.
- Variable `expand` and `expand_as` had incorrect dimension inference when using broadcasting semantics. The formula has been fixed in these cases.
- Fix a size mismatch in `CosineEmbeddingLoss`. See this issue for more details.
- Fixed a bug in LBFGS that caused it to use uninitialized locals. See issue.
- Add assertions for negative padding in `nn.Conv*` functions.
- Fix the stddev gradient formula for the stochastic function `normal`.
### other
- Fix an issue when returning strings from the DataLoader when `pin_memory=True`.
- Binaries no longer depend on `libcudart.so` being present at runtime.