Releases: pytorch/pytorch
PyTorch 2.0.1 Release, bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
- Fix _canonical_mask throws warning when bool masks passed as input to TransformerEncoder/TransformerDecoder (#96009, #96286)
- Fix Embedding bag max_norm=-1 causes leaf Variable that requires grad is being used in an in-place operation #95980
- Fix type hint for torch.Tensor.grad_fn, which can be a torch.autograd.graph.Node or None. #96804
- Can’t convert float to int when the input is a scalar np.ndarray. #97696
- Revisit torch._six.string_classes removal #97863
- Fix module backward pre-hooks to actually update gradient #97983
- Fix load_sharded_optimizer_state_dict error on multi node #98063
- Warn once for TypedStorage deprecation #98777
- cuDNN V8 API, Fix incorrect use of emplace in the benchmark cache #97838
Torch.compile:
- Add support for Modules with custom getitem method to torch.compile #97932
- Fix improper guards on list variables #97862
- Fix Sequential nn module with duplicated submodule #98880
Distributed:
- Fix distributed_c10d's handling of custom backends #95072
- Fix MPI backend not properly initialized #98545
NN frontend:
- Update Multi-Head Attention's doc string #97046
- Fix incorrect behavior of is_causal parameter for torch.nn.TransformerEncoderLayer.forward #97214
- Fix error for SDPA on sm86 and sm89 hardware #99105
- Fix nn.MultiheadAttention mask handling #98375
DataLoader:
- Fix regression for pin_memory recursion when operating on bytes #97737
- Fix collation logic #97789
- Fix potentially backwards-incompatible change with DataLoader and is_shardable Datapipes #97287
MPS:
- Fix LayerNorm crash when input is in float16 #96208
- Add support for cumsum on int64 input #96733
- Fix issue with setting BatchNorm to non-trainable #98794
Functorch:
- Fix Segmentation Fault for vmaped function accessing BatchedTensor.data #97237
- Fix index_select support when dim is negative #97916
- Improve docs for autograd.Function support #98020
- Fix Exception thrown when running Migration guide example for jacrev #97746
Releng:
- Fix Convolutions for CUDA-11.8 wheel builds #99451
- Fix Import torchaudio + torch.compile crashes on exit #96231
- Linux aarch64 wheels are missing the mkldnn+acl backend support - pytorch/builder@54931c2
- Linux aarch64 torchtext 0.15.1 wheels are missing for aarch64_linux platform - pytorch/builder#1375
- Enable ROCm 5.4.2 manywheel and python 3.11 builds #99552
- PyTorch cannot be installed at the same time as numpy in a conda env on osx-64 / Python 3.11 #97031
- Illegal instruction (core dumped) on Raspberry Pi 4.0 8gb - pytorch/builder#1370
Torch.optim:
- Fix fused AdamW causes NaN loss #95847
- Fix Fused AdamW has worse loss than Apex and unfused AdamW for fp16/AMP #98620
The release tracker should contain all relevant pull requests related to this release as well as links to related issues
PyTorch 2.0: Our next generation release that is faster, more Pythonic and Dynamic as ever
PyTorch 2.0 Release notes
- Highlights
- Backwards Incompatible Changes
- Deprecations
- New Features
- Improvements
- Bug fixes
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch® 2.0 (release note) which we highlighted during the PyTorch Conference on 12/2/22! PyTorch 2.0 offers the same eager-mode development and user experience, while fundamentally changing and supercharging how PyTorch operates at the compiler level under the hood, with faster performance and support for Dynamic Shapes and Distributed.
This next-generation release includes a Stable version of Accelerated Transformers (formerly called Better Transformers); Beta includes torch.compile as the main API for PyTorch 2.0, the scaled_dot_product_attention function as part of torch.nn.functional, the MPS backend, and functorch APIs in the torch.func module; and other Beta/Prototype improvements across inference, performance, and training optimization features on GPUs and CPUs. For a comprehensive introduction and technical overview of torch.compile, please visit the 2.0 Get Started page.
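A minimal, hedged sketch of the new torch.compile entry point (the module below is an arbitrary illustrative example, not one from the release notes):

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
compiled_model = torch.compile(model)  # wraps the model; unsupported ops fall back to eager

x = torch.randn(8, 16)
out = compiled_model(x)  # the first call triggers compilation; later calls reuse the compiled code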
Along with 2.0, we are also releasing a series of beta updates to the PyTorch domain libraries, including those that are in-tree, and separate libraries including TorchAudio, TorchVision, and TorchText. An update for TorchX is also being released as it moves to community supported mode. More details can be found in this library blog.
This release is composed of over 4,541 commits and 428 contributors since 1.13.1. We want to sincerely thank our dedicated community for your contributions. As always, we encourage you to try these out and report any issues as we improve 2.0 and the overall 2-series this year.
Summary:
- torch.compile is the main API for PyTorch 2.0, which wraps your model and returns a compiled model. It is a fully additive (and optional) feature and hence 2.0 is 100% backward compatible by definition.
- As an underpinning technology of torch.compile, TorchInductor with Nvidia and AMD GPUs relies on the OpenAI Triton deep learning compiler to generate performant code and hide low-level hardware details. OpenAI Triton-generated kernels achieve performance that's on par with hand-written kernels and specialized CUDA libraries such as cuBLAS.
- Accelerated Transformers introduce high-performance support for training and inference using a custom kernel architecture for scaled dot product attention (SDPA). The API is integrated with torch.compile(), and model developers may also use the scaled dot product attention kernels directly by calling the new scaled_dot_product_attention() operator (a minimal sketch follows this list).
- The Metal Performance Shaders (MPS) backend provides GPU-accelerated PyTorch training on Mac platforms, with added support for the top 60 most-used ops, bringing coverage to over 300 operators.
- Amazon AWS optimized PyTorch CPU inference on AWS Graviton3-based C7g instances. PyTorch 2.0 improves inference performance on Graviton compared to previous releases, including improvements for ResNet-50 and BERT.
- New prototype features and technologies across TensorParallel, DTensor, 2D parallel, TorchDynamo, AOTAutograd, PrimTorch and TorchInductor.
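As referenced in the Accelerated Transformers bullet above, a minimal sketch of calling the new operator directly (tensor sizes are illustrative):

import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dim)
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)

# dispatches to a fused kernel (e.g. FlashAttention) when the inputs allow it,
# otherwise falls back to the math implementation
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 4, 128, 64])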
Stable | Beta | Prototype | Platform Changes
---|---|---|---
Accelerated PT 2 Transformers | torch.compile | DTensor | CUDA support for 11.7 & 11.8 (deprecating CUDA 11.6)
 | PyTorch MPS Backend | TensorParallel | Python 3.8 (deprecating Python 3.7)
 | Scaled dot product attention | 2D Parallel | AWS Graviton3
 | Functorch | torch.compile (dynamic=True) | 
 | Dispatchable Collectives | | 
 | torch.set_default_device and torch.device as context manager | | 
 | X86 quantization backend | | 
 | GNN inference and training performance | | 
*To see a full list of public 2.0, 1.13 and 1.12 feature submissions click here
Backwards Incompatible Changes
Drop support for Python versions <= 3.7 (#93155)
Previously the minimum supported version of Python for PyTorch was 3.7. This PR updates the minimum version to require 3.8 in order to install PyTorch. See Hardware / Software Support for more information.
Drop support for CUDA 10 (#89582)
This PR updates the minimum CUDA version to 11.0. See the getting-started for installation or building from source for more information.
Gradients are now set to None instead of zeros by default in torch.optim.*.zero_grad() and torch.nn.Module.zero_grad() (#92731)
This changes the default behavior of zero_grad() to set the grads to None instead of zero tensors. In other words, the set_to_none kwarg is now True by default instead of False. Setting grads to None reduces peak memory usage and increases performance. This will break code that directly accesses data or does computation on the grads after calling zero_grad(), as they will now be None. To revert to the old behavior, pass in zero_grad(set_to_none=False).
1.13
>>> import torch
>>> from torch import nn
>>> module = nn.Linear(2, 2)
>>> i = torch.randn(2, 2, requires_grad=True)
>>> module(i).sum().backward()
>>> module.zero_grad()
>>> module.weight.grad == None
False
>>> module.weight.grad.data
tensor([[0., 0.],
        [0., 0.]])
>>> module.weight.grad + 1.0
tensor([[1., 1.],
        [1., 1.]])
2.0
>>> import torch
>>> from torch import nn
>>> module = nn.Linear(5, 5)
>>> i = torch.randn(2, 5, requires_grad=True)
>>> module(i).sum().backward()
>>> module.zero_grad()
>>> module.weight.grad == None
True
>>> module.weight.grad.data
AttributeError: 'NoneType' object has no attribute 'data'
>>> module.weight.grad + 1.0
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
Update torch.Tensor and nn.Parameter to serialize all their attributes (#88913)
Any attribute stored on torch.Tensor and torch.nn.Parameter will now be serialized. This aligns the serialization behavior of torch.nn.Parameter, torch.Tensor, and other tensor subclasses.
1.13
>>> import io
>>> import torch
>>> from torch import nn
# torch.Tensor behavior
>>> a = torch.Tensor()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
AttributeError: 'Tensor' object has no attribute 'foo'
# torch.nn.Parameter behavior
>>> a = nn.Parameter()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
AttributeError: 'Parameter' object has no attribute 'foo'
# torch.Tensor subclass behavior
>>> class MyTensor(torch.Tensor):
...     pass
>>> a = MyTensor()
>>> a.foo = 'hey'
>>> print(a.foo)
hey
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(b.foo)
hey
2.0
>>> import io
>>> import torch
>>> from torch import nn
# torch.Tensor behavior
>>> a = torch.Tensor()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
hey
# torch.nn.Parameter behavior
>>> a = nn.Parameter()
>>> a.foo = 'hey'
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(a.foo)
hey
>>> print(b.foo)
hey
# torch.Tensor subclass behavior
>>> class MyTensor(torch.Tensor):
...     pass
>>> a = MyTensor()
>>> a.foo = 'hey'
>>> print(a.foo)
hey
>>> buffer = io.BytesIO()
>>> torch.save(a, buffer)
>>> buffer.seek(0)
>>> b = torch.load(buffer)
>>> print(b.foo)
hey
If you have an attribute that you don't want to be serialized, you should not store it as an attribute on the Tensor or Parameter; instead, it is recommended to use torch.utils.weak.WeakTensorKeyDictionary:
>>> from torch.utils import weak
>>> foo_dict = weak.WeakTensorKeyDictionary()
>>> foo_dict[a] = 'hey'
>>> print(foo_dict[a])
hey
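A short follow-up sketch (our own example, not from the release notes) showing that the dictionary does not keep the tensor alive, which is the point of using it over a plain attribute:
>>> import torch
>>> from torch.utils.weak import WeakTensorKeyDictionary
>>> metadata = WeakTensorKeyDictionary()
>>> t = torch.randn(2, 2)
>>> metadata[t] = 'hey'
>>> len(metadata)
1
>>> del t   # once the tensor has been garbage collected, its entry disappears
>>> len(metadata)
0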
Algorithms {Adadelta, Adagrad, Adam, Adamax, AdamW, ASGD, NAdam, RAdam, RMSProp, RProp, SGD} default to the faster foreach implementation when on CUDA + differentiable=False
When applicable, this changes the default behavior of step() and anything that ca...
PyTorch 1.13.1 Release, small bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
- RuntimeError by torch.nn.modules.activation.MultiheadAttention with bias=False and batch_first=True #88669
- Installation via pip on Amazon Linux 2, regression #88869
- Installation using poetry on Mac M1, failure #88049
- Missing masked tensor documentation #89734
- torch.jit.annotations.parse_type_line is not safe (command injection) #88868
- Use the Python frame safely in _pythonCallstack #88993
- Double-backward with full_backward_hook causes RuntimeError #88312
- Fix logical error in get_default_qat_qconfig #88876
- Fix cuda/cpu check on NoneType and unit test #88854 and #88970
- Onnx ATen Fallback for BUILD_CAFFE2=0 for ONNX-only ops #88504
- Onnx operator_export_type on the new registry #87735
- torchrun AttributeError caused by file_based_local_timer on Windows #85427
The release tracker should contain all relevant pull requests related to this release as well as links to related issues
PyTorch 1.13: beta versions of functorch and improved support for Apple’s new M1 chips are now available
Pytorch 1.13 Release Notes
- Highlights
- Backwards Incompatible Changes
- New Features
- Improvements
- Performance
- Documentation
- Developers
Highlights
We are excited to announce the release of PyTorch 1.13! This includes stable versions of BetterTransformer. We deprecated CUDA 10.2 and 11.3 and completed migration of CUDA 11.6 and 11.7. Beta includes improved support for Apple M1 chips and functorch, a library that offers composable vmap (vectorization) and autodiff transforms, being included in-tree with the PyTorch release. This release is composed of over 3,749 commits and 467 contributors since 1.12.1. We want to sincerely thank our dedicated community for your contributions.
Summary:
- The BetterTransformer feature set supports fastpath execution for common Transformer models during inference out-of-the-box, without the need to modify the model. Additional improvements include accelerated add+matmul linear algebra kernels for sizes commonly used in Transformer models, and Nested Tensors are now enabled by default.
- Timely deprecating older CUDA versions allows us to proceed with introducing the latest CUDA versions as they are introduced by Nvidia®, and hence allows support for C++17 in PyTorch and new NVIDIA Open GPU Kernel Modules.
- Previously, functorch was released out-of-tree in a separate package. After installing PyTorch, a user will be able to import functorch and use functorch without needing to install another package (a minimal sketch follows this list).
- PyTorch is offering native builds for Apple® silicon machines that use Apple's new M1 chip as a beta feature, providing improved support across PyTorch's APIs.
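A minimal, illustrative sketch of the in-tree functorch transforms (the function and shapes are arbitrary examples, not from the release notes):

import torch
import functorch

def dot(a, b):
    return (a * b).sum()

x = torch.randn(8, 3)
y = torch.randn(8, 3)

# vmap vectorizes the scalar function over the leading batch dimension
print(functorch.vmap(dot)(x, y).shape)  # torch.Size([8])

# grad returns a function that computes the gradient of a scalar-valued function
print(functorch.grad(dot)(x[0], y[0]))  # equals y[0]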
You can check the blogpost that shows the new features here.
Backwards Incompatible changes
Python API
uint8 and all integer dtype masks are no longer allowed in Transformer (#87106)
Prior to 1.13, key_padding_mask could be set to uint8 or other integer dtypes in TransformerEncoder and MultiheadAttention, which might generate unexpected results. In this release, these dtypes are not allowed for the mask anymore. Please convert them to torch.bool before using.
1.12.1
>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
# works before 1.13
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
1.13
>>> layer = nn.TransformerEncoderLayer(2, 4, 2)
>>> encoder = nn.TransformerEncoder(layer, 2)
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.bool)
>>> inputs = torch.cat([torch.randn(1, 2, 2), torch.zeros(1, 2, 2)], dim=1)
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask)
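If you already have integer masks, a minimal conversion (reusing encoder and inputs from the snippet above) is:
>>> pad_mask = torch.tensor([[1, 1, 0, 0]], dtype=torch.uint8)
>>> outputs = encoder(inputs, src_key_padding_mask=pad_mask.to(torch.bool))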
Updated torch.floor_divide to perform floor division (#78411)
Prior to 1.13, torch.floor_divide erroneously performed truncation division (i.e. it truncated the quotients). In this release, it has been fixed to perform floor division. To replicate the old behavior, use torch.div with rounding_mode='trunc'.
1.12.1
>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -1.])
1.13
>>> a = torch.tensor([4.0, -3.0])
>>> b = torch.tensor([2.0, 2.0])
>>> torch.floor_divide(a, b)
tensor([ 2., -2.])
# Old behavior can be replicated using torch.div with rounding_mode='trunc'
>>> torch.div(a, b, rounding_mode='trunc')
tensor([ 2., -1.])
Fixed torch.index_select on CPU to error when an index is out of bounds and the source tensor is empty (#77881)
Prior to 1.13, torch.index_select would return an appropriately sized tensor filled with random values on CPU if the source tensor was empty. In this release, we have fixed this bug so that it errors out. A consequence of this is that torch.nn.Embedding, which utilizes index_select, will error out rather than returning an empty tensor when embedding_dim=0 and input contains out-of-bounds indices. The old behavior cannot be reproduced with torch.nn.Embedding; however, since an Embedding layer with embedding_dim=0 is a corner case, this behavior is unlikely to be relied upon.
1.12.1
>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
tensor([], size=(1, 0), grad_fn=<EmbeddingBackward0>)
1.13
>>> t = torch.tensor([4], dtype=torch.long)
>>> embedding = torch.nn.Embedding(3, 0)
>>> embedding(t)
RuntimeError: INDICES element is out of DATA bounds, id=4 axis_dim=3
Disallow overflows when tensors are constructed from scalars (#82329)
Prior to this PR, overflows during tensor construction from scalars would not throw an error. In 1.13, such cases will error.
1.12.1
>>> torch.tensor(1000, dtype=torch.int8)
tensor(-24, dtype=torch.int8)
1.13
>>> torch.tensor(1000, dtype=torch.int8)
RuntimeError: value cannot be converted to type int8 without overflow
Error on indexing a cpu tensor with non-cpu indices (#69607)
Prior to 1.13, cpu_tensor[cuda_indices]
was a valid program that would return a cpu tensor. The original use case for mixed device indexing was for non_cpu_tensor[cpu_indices]
, and allowing the opposite was unintentional (cpu_tensor[non_cpu_indices]
). This behavior appears to be rarely used, and a refactor of our indexing kernels made it difficult to represent an op that takes in (cpu_tensor, non_cpu_tensor) and returns another cpu_tensor, so it is now an error.
To replicate the old behavior for base[indices]
, you can ensure that either indices
lives on the CPU device, or base
and indices
both live on the same device.
1.12.1
>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
tensor([1., 3.])
1.13
>>> a = torch.tensor([1.0, 2.0, 3.0])
>>> b = torch.tensor([0, 2], device='cuda')
>>> a[b]
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
# Old behavior can be replicated by moving b to CPU, or a to CUDA
>>> a[b.cpu()]
tensor([1., 3.])
>>> a.cuda()[b]
tensor([1., 3.], device='cuda:0')
Remove deprecated torch.eig, torch.matrix_rank, torch.lstsq (#70982, #70981, #70980)
The deprecation cycle for the above functions has been completed and they have been removed in the 1.13 release.
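A hedged migration sketch using the torch.linalg counterparts (note that return conventions and argument orders differ from the removed functions; e.g. torch.linalg.eig always returns complex eigenvalues):

import torch

A = torch.randn(4, 4)
B = torch.randn(4, 2)

eigenvalues, eigenvectors = torch.linalg.eig(A)   # replaces torch.eig
rank = torch.linalg.matrix_rank(A)                # replaces torch.matrix_rank
solution = torch.linalg.lstsq(A, B).solution      # replaces torch.lstsq (argument order differs)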
torch.nn
Enforce that the bias has the same dtype as input and weight for convolutions on CPU (#83686)
To align with the implementation on other devices, the CPU implementation for convolutions was updated to enforce that the dtype of the bias matches the dtype of the input and weight.
1.12.1
# input and weight are dtype torch.int64
# bias is torch.float32
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)
1.13
# input and weight are dtype torch.int64
# bias is torch.float32
>>> with assertRaisesError():
>>> out = torch.nn.functional.conv2d(input, weight, bias, ...)
# Updated code to avoid the error
>>> out = torch.nn.functional.conv2d(input, weight, bias.to(input.dtype), ...)
Autograd
Disallow setting the .data of a tensor that requires_grad=True with an integer tensor (#78436)
Setting the .data of a tensor that requires_grad with an integer tensor now raises an error.
1.12.1
>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
>>> x
tensor([0, 0], requires_grad=True)
1.13
>>> x = torch.randn(2, requires_grad=True)
>>> x.data = torch.randint(1, (2,))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: data set to a tensor that requires gradients must be floating point or complex dtype
Added variable_list support to ExtractVariables struct (#84583)
Prior to this change, C++ custom autograd Functions considered tensors passed in a TensorList not to be tensors for the purposes of recording the backward graph. After this change, custom Functions that receive a TensorList must modify their backward functions to also compute gradients for these additional tensor inputs. Note that this behavior now differs from that of custom autograd Functions in Python.
1.12.1
struct MyFunction : public Function<MyFunction> {
static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
return 2 * tensors[0] + 3 * t;
}
static variable_list backward(
AutogradContext* ctx,
variable_list grad_output) {
return {3 * grad_output[0]};
}
};
1.13
struct MyFunction : public Function<MyFunction> {
static Variable forward(AutogradContext* ctx, at::Tensor t, at::TensorList tensors) {
return 2 * tensors[0] + 3 * t;
}
static variable_list backward(
AutogradContext* ctx,
variable_list grad_output) {
return {3 * grad_output[0], 2 * grad_output[0]};
}
};
Don't detach when making views; force kernel to detach (#84893)
View operations registered as CompositeExplicitAutograd kernels are no longer allowed to return input tensors as-is. You must explic...
PyTorch 1.12.1 Release, small bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
Optim
- Remove overly restrictive assert in adam #80222
Autograd
- Convolution forward over reverse internal asserts in specific case #81111
- 25% Performance regression from v0.1.1 to 0.2.0 when calculating hessian #82504
Distributed
- Fix distributed store to use add for the counter of DL shared seed #80348
- Raise proper timeout when sharing the distributed shared seed #81666
NN
- Allow register float16 weight_norm on cpu and speed up test #80600
- Fix weight norm backward bug on CPU when OMP_NUM_THREADS <= 2 #80930
- Weight_norm is not working with float16 #80599
- New release breaks torch.nn.weight_norm backwards pass and breaks all Wav2Vec2 implementations #80569
- Disable src mask for transformer and multiheadattention fastpath #81277
- Make nn.stateless correctly reset parameters if the forward pass fails #81262
- torchvision.transforms.functional.rgb_to_grayscale() + torch.nn.Conv2d() don't work on 1080 GPU #81106
- Transformer and CPU path with src_mask raises error with torch 1.12 #81129
Data Loader
- Locking lower ranks seed recipients #81071
CUDA
- os.environ["CUDA_VISIBLE_DEVICES"] has no effect #80876
- share_memory() on CUDA tensors no longer no-ops and instead crashes #80733
- [Prims] Unbreak CUDA lazy init #80899
- PyTorch 1.12 cu113 wheels cudnn discoverability issue #80637
- Remove overly restrictive checks for cudagraph #80881
ONNX
- ONNX cherry picks #82435
MPS
- MPS cherry picks #80898
Other
- Don't error if _warned_capturable_if_run_uncaptured not set #80345
- Initializing libiomp5.dylib, but found libomp.dylib already initialized. #78490
- Assertion error - _dl_shared_seed_recv_cnt - pt 1.12 - multi node #80845
- Add 3.10 stdlib to torch.package #81261
- CPU-only c++ extension libraries (functorch, torchtext) built against PyTorch wheels are not fully compatible with PyTorch wheels #80489
PyTorch 1.12: TorchArrow, Functional API for Modules and nvFuser, are now available
PyTorch 1.12 Release Notes
- Highlights
- Backwards Incompatible Change
- New Features
- Improvements
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.12! This release is composed of over 3,124 commits made by 433 contributors. Along with 1.12, we are releasing beta versions of AWS S3 Integration, PyTorch Vision Models on Channels Last on CPU, Empowering PyTorch on Intel® Xeon® Scalable processors with Bfloat16, and the FSDP API. We want to sincerely thank our dedicated community for your contributions.
Summary:
- Functional Module API to functionally apply module computation with a given set of parameters
- Complex32 and Complex Convolutions in PyTorch
- DataPipes from TorchData fully backward compatible with DataLoader
- Functorch with improved coverage for APIs
- nvFuser a deep learning compiler for PyTorch
- Changes to float32 matrix multiplication precision on Ampere and later CUDA hardware
- TorchArrow, a new beta library for machine learning preprocessing over batch data
Backwards Incompatible changes
Python API
Updated type promotion for torch.clamp (#77035)
In 1.11, the ‘min’ and ‘max’ arguments in torch.clamp did not participate in type promotion, which made it inconsistent with the minimum and maximum operations. In 1.12, the ‘min’ and ‘max’ arguments participate in type promotion.
1.11
>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float32
1.12
>>> import torch
>>> a = torch.tensor([1., 2., 3., 4.], dtype=torch.float32)
>>> b = torch.tensor([2., 2., 2., 2.], dtype=torch.float64)
>>> c = torch.tensor([3., 3., 3., 3.], dtype=torch.float64)
>>> torch.clamp(a, b, c).dtype
torch.float64
Complex Numbers
Fix complex type promotion (#77524)
Updates the type promotion rule such that, given a complex scalar and a real tensor, the value type of the real tensor is preserved.
1.11
>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex128
1.12
>>> a = torch.randn((2, 2), dtype=torch.float)
>>> b = torch.tensor(1, dtype=torch.cdouble)
>>> (a + b).dtype
torch.complex64
LinAlg
Disable TF32 for matmul by default and add high-level control of fp32 matmul precision (#76509)
PyTorch 1.12 makes the default math mode for fp32 matrix multiplications more precise and consistent across hardware. This may affect users on Ampere or later CUDA devices and TPUs. See the PyTorch blog for more details.
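A small sketch of the controls introduced around this change (values shown are illustrative; "highest" keeps full fp32 precision, which is the new default):

import torch

# "high" or "medium" allow TF32 or similar lower-precision math where supported
torch.set_float32_matmul_precision("high")

# the pre-1.12 TF32 behavior on Ampere GPUs can also be restored via the backend flag
torch.backends.cuda.matmul.allow_tf32 = True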
Sparse
Use ScatterGatherKernel for scatter_reduce (CPU-only) (#74226, #74608)
In 1.11.0, unlike scatter (which takes a reduce kwarg) or scatter_add, scatter_reduce was not an in-place function. That is, it did not allow the user to pass an output tensor containing data that is reduced together with the scattered data. Instead, the scatter reduction took place on an output tensor initialized under the hood. Indices of the output that were not scattered to were filled with reduction inits (or 0 for the options ‘amin’ and ‘amax’).
In 1.12.0, scatter_reduce (which is in beta) is in-place to align with the API of the related existing functions scatter/scatter_add. For this reason, the argument input in 1.11.0 has been renamed src in 1.12.0, and the new self argument now takes a destination tensor to be scattered onto. Since the destination tensor is no longer initialized under the hood, the output_size kwarg in 1.11.0 that allowed users to specify the size of the output at dimension dim has been removed. Further, in 1.12.0 we introduce an include_self kwarg which determines whether values in the self (destination) tensor are included in the reduction. Setting include_self=True could, for example, allow users to provide special reduction inits for the scatter_reduce operation. Otherwise, if include_self=False, indices scattered to are treated as if they were filled with reduction inits.
In the snippet below, we illustrate how the behavior of scatter_reduce in 1.11.0 can be achieved with the function released in 1.12.0.
Example:
>>> src = torch.arange(6, dtype=torch.float).reshape(3, 2)
>>> index = torch.tensor([[0, 2], [1, 1], [0, 0]])
>>> dim = 1
>>> output_size = 4
>>> reduce = "prod"
1.11
>>> torch.scatter_reduce(src, dim, index, reduce, output_size=output_size)
tensor([[ 0.,  1.,  1.,  1.],
        [ 1.,  6.,  1.,  1.],
        [20.,  1.,  1.,  1.]])
1.12
>>> output_shape = list(src.shape)
>>> output_shape[dim] = output_size
# reduction init for prod is 1
# filling the output with 1 is only necessary if the user wants to preserve the behavior in 1.11
# where indices not scattered to are filled with reduction inits
>>> output = src.new_empty(output_shape).fill_(1)
>>> output.scatter_reduce_(dim, index, src, reduce)
tensor([[ 0.,  1.,  1.,  1.],
        [ 1.,  6.,  1.,  1.],
        [20.,  1.,  1.,  1.]])
torch.nn
nn.GroupNorm: Report an error if num_channels is not divisible by num_groups (#74293)
Previously, nn.GroupNorm would error out during the forward pass if num_channels is not divisible by num_groups. Now, the error is thrown for this case during module construction instead.
1.11
m = torch.nn.GroupNorm(3, 7)
m(...) # errors during forward pass
1.12
m = torch.nn.GroupNorm(3, 7) # errors during construction
nn.Dropout2d: Return to 1.10 behavior: perform 1D channel-wise dropout for 3D inputs
In PyTorch 1.10 and older, passing a 3D input to nn.Dropout2d resulted in 1D channel-wise dropout behavior; i.e. such inputs were interpreted as having shape (N, C, L), with N = batch size and C = number of channels, and channel-wise dropout was performed along the second dimension.
1.10
x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x) # input is assumed to be shape (N, C, L); dropout along the second dim.
With the introduction of no-batch-dim input support in 1.11, 3D inputs were reinterpreted as having shape (C, H, W)
; i.e. an input without a batch dimension, and dropout behavior was changed to drop along the first dimension. This was a silent breaking change.
1.11
x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x) # input is assumed to be shape (C, H, W); dropout along the first dim.
The breaking change in 1.11 resulted in a lack of support for 1D channel-wise dropout behavior, so Dropout2d
in PyTorch 1.12 returns to 1.10 behavior with a warning to give some time to adapt before the no-batch-dim interpretation goes back into effect.
1.12
x = torch.randn(2, 3, 4)
m = nn.Dropout2d(p=0.5)
out = m(x) # input is assumed to be shape (N, C, L); dropout along the second dim.
# throws a warning suggesting nn.Dropout1d for 1D channel-wise dropout.
If you want 1D channel-wise dropout behavior, please switch to use of the newly-added nn.Dropout1d
module instead of nn.Dropout2d
. If you want no-batch-dim input behavior, please note that while this is not supported in 1.12, a future release will reinstate the interpretation of 3D inputs to nn.Dropout2d
as those without a batch dimension.
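A small sketch of the recommended migration for 1D channel-wise dropout using the newly added module (shapes are illustrative):

import torch
import torch.nn as nn

x = torch.randn(2, 3, 4)    # interpreted as (N, C, L)
m = nn.Dropout1d(p=0.5)     # performs channel-wise dropout along the second dimension
out = m(x)
print(out.shape)            # torch.Size([2, 3, 4])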
F.cosine_similarity: Improve numerical stability (#31378)
Previously, we first computed the inner product, then normalized. After this change, we first normalize, then compute the inner product. This should be more numerically stable because it avoids losing precision in the inner product for inputs with large norms. Because of this change, outputs may be different in some cases.
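A small illustrative sketch contrasting the two orderings (the variable names are ours, not from the release notes):

import torch
import torch.nn.functional as F

x = torch.randn(3, 8)
y = torch.randn(3, 8)

# pre-change ordering: inner product first, then divide by the product of norms
dot_first = (x * y).sum(dim=1) / (x.norm(dim=1) * y.norm(dim=1))

# post-change ordering: normalize each input first, then take the inner product,
# avoiding a potentially huge intermediate inner product for large-norm inputs
normalize_first = (F.normalize(x, dim=1) * F.normalize(y, dim=1)).sum(dim=1)

print(torch.allclose(dot_first, normalize_first, atol=1e-6))  # True for well-scaled inputs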
Composability
Functions in torch.ops.aten.{foo} no longer accept self as a kwarg
torch.ops.aten.{foo} objects are now instances of OpOverloadPacket (instead of functions) whose __call__ method is implemented in Python, which means that you cannot pass self as a kwarg. You can pass it normally as a positional argument instead.
1.11
>>> torch.ops.aten.sin(self=torch.ones(2))
tensor([0.8415, 0.8415])
1.12
# this now fails
>>> torch.ops.aten.sin(self=torch.ones(2))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: __call__() got multiple values for argument 'self'
# this works
>>> torch.ops.aten.sin(torch.ones(2))
tensor([0.8415, 0.8415])
__torch_dispatch__ now traces individual op overloads instead of op overload packets (#72673)
torch.ops.aten.add actually corresponds to a bundle of functions from C++, covering all of the overloads of the add operator (specifically, add.Tensor, add.Scalar and add.out). Now, __torch_dispatch__ will directly take in an overload corresponding to a single aten function.
1.11
class MyTensor(torch.Tensor):
....
def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
# Before, func refers to a "packet" of all overloads
# for a given operator, e.g. "add"
assert func == torch.ops.aten.add
1.12
class MyTensor(torch.Tensor):
....
def __torch_dispatch__(cls, func, types, args=(), kwargs=No...
PyTorch 1.11, TorchData, and functorch are now available
PyTorch 1.11 Release Notes
- Highlights
- Backwards Incompatible Change
- New Features
- Improvements
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.11. This release is composed of over 3,300 commits since 1.10, made by 434 contributors. Along with 1.11, we are releasing beta versions of TorchData and functorch. We want to sincerely thank our community for continuously improving PyTorch.
- TorchData is a new library for common modular data loading primitives for easily constructing flexible and performant data pipelines. View it on GitHub.
- functorch, a library that adds composable function transforms to PyTorch, is now available in beta. View it on GitHub.
- Distributed Data Parallel (DDP) static graph optimizations available in stable.
You can check the blogpost that shows the new features here.
Backwards Incompatible changes
Python API
Fixed python deepcopy to correctly copy all attributes on Tensor objects (#65584)
This change ensures that the deepcopy operation on Tensor properly copies all the attributes (and not just the plain Tensor properties).
1.10.2
a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# Raises AttributeError: 'Tensor' object has no attribute 'foo'
1.11.0
a = torch.rand(2)
a.foo = 3
torch.save(a, "bar")
b = torch.load("bar")
print(b.foo)
# 3
steps argument is no longer optional in torch.linspace and torch.logspace
This argument used to default to 100 in PyTorch 1.10.2, but was deprecated (previously you would see a deprecation warning if you didn't explicitly pass in steps). In PyTorch 1.11, it is no longer optional.
1.10.2
# Works, but raises a deprecation warning
# steps defaults to 100
a = torch.linspace(1, 10)
# UserWarning: Not providing a value for linspace's steps is deprecated
# and will throw a runtime error in a future release.
# This warning will appear only once per process.
# (Triggered internally at ../aten/src/ATen/native/RangeFactories.cpp:19)
1.11.0
# In 1.11, you must specify steps
a = torch.linspace(1, 10, steps=100)
Remove torch.hub.import_module function that was mistakenly public (#67990)
This function is not intended for public use. If you have existing code that relies on it, you can find an equivalent function at torch.hub._import_module.
C++ API
We’ve cleaned up many of the headers in the C++ frontend to only include the subset of aten operators that they actually use (#68247, #68687, #68688, #68714, #68689, #68690, #68697, #68691, #68692, #68693, #69840)
When you #include a header from the C++ frontend, you can no longer assume that every aten operator is transitively included. You can work around this by directly adding #include <ATen/ATen.h> in your file, which will maintain the old behavior of including every aten operator.
Custom implementations of the c10::List and c10::Dict move constructors have been removed (#69370)
The semantics have changed from "make the moved-from List/Dict empty" to "keep the moved-from List/Dict unchanged".
1.10.2
c10::List<std::string> list1({"3", "4"});
c10::List<std::string> list2(std::move(list1));
std::cout << list1.size(); // 0
1.11.0
c10::List<std::string> list1({"3", "4"});
c10::List<std::string> list2(std::move(list1)); // calls the copy constructor
std::cout << list1.size(); // 2
CUDA
Removed THCeilDiv function and corresponding THC/THCDeviceUtils.cuh header (#65472)
As part of cleaning up TH from the codebase, the THCeilDiv function has been removed. Instead, please use at::ceil_div and include the corresponding ATen/ceil_div.h header.
Removed THCudaCheck (#66391)
You can replace it with C10_CUDA_CHECK, which has been available since at least PyTorch 1.4, so just replacing it is enough even if you support older versions.
Removed THCudaMalloc(), THCudaFree(), THCThrustAllocator.cuh (#65492)
If your extension is using THCThrustAllocator.cuh, please replace it with ATen/cuda/ThrustAllocator.h and the corresponding APIs (see examples in this PR). This PR also removes the THCudaMalloc/THCudaFree calls. Please use c10::cuda::CUDACachingAllocator::raw_alloc(size)/raw_delete(ptr), or, preferably, switch to c10::cuda::CUDACachingAllocator::allocate, which manages deallocation. Caching allocator APIs are available since PyTorch 1.2, so just replacing them is enough even if you support older versions of PyTorch.
Build
Stopped building shared library for AOT Compiler, libaot_compiler.so (#66227)
Building aot_compiler.cpp as a separate library is not necessary, as it’s already included in libtorch.so. You can update your build system to only dynamically link libtorch.so.
Mobile
Make typing.Union type unsupported for mobile builds (#65556)
typing.Union support was added for TorchScript in 1.10. It was removed specifically for mobile due to its lack of use and the increase in binary size of PyTorch for Mobile builds.
Distributed
torch.distributed.rpc: Final removal of the ProcessGroup RPC backend (#67363)
The ProcessGroup RPC backend is deprecated. In 1.10, it threw an error to help users update their code, and, in 1.11, it is removed completely.
The backend type “PROCESS_GROUP” is now deprecated, e.g.
torch.distributed.rpc.init_rpc("worker0", backend="PROCESS_GROUP", rank=0, world_size=1)
and should be replaced with:
torch.distributed.rpc.init_rpc("worker0", backend="TENSORPIPE", rank=0, world_size=1)
Quantization
Disabled the support for getitem in FX Graph Mode Quantization (#66647)
getitem used to be quantized in FX Graph Mode Quantization, and it is no longer quantized. This won’t break any models but could result in a slight difference in numerics.
1.10.2
import torch
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)

    def forward(self, x):
        x = self.linear(x)
        y = torch.stack([x], 0)
        return y[0]

m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
#   (linear): QuantizedLinear(in_features=5, out_features=5,
#     scale=1.0, zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
#     linear_input_scale_0 = self.linear_input_scale_0
#     linear_input_zero_point_0 = self.linear_input_zero_point_0
#     quantize_per_tensor = torch.quantize_per_tensor(x,
#         linear_input_scale_0, linear_input_zero_point_0, torch.quint8)
#     x = linear_input_scale_0 = linear_input_zero_point_0 = None
#     linear = self.linear(quantize_per_tensor)
#     quantize_per_tensor = None
#     stack = torch.stack([linear], 0); linear = None
#     getitem = stack[0]; stack = None
#     dequantize_2 = getitem.dequantize(); getitem = None
#     return getitem
1.11.0
import torch
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(5, 5)

    def forward(self, x):
        x = self.linear(x)
        y = torch.stack([x], 0)
        return y[0]

m = M().eval()
m = prepare_fx(m, {"": torch.ao.quantization.default_qconfig})
m = convert_fx(m)
print(m)
# prints
# GraphModule(
#   (linear): QuantizedLinear(in_features=5, out_features=5, scale=1.0,
#     zero_point=0, qscheme=torch.per_tensor_affine)
# )
# def forward(self, x):
#     linear_input_scale_0 = self.linear_input_scale_0
#     linear_input_zero_point_0 = self.linear_input_zero_point_0
#     quantize_per_tensor = tor...
PyTorch 1.10.2 Release, small bug fix release
PyTorch 1.10.1 Release, small bug fix release
This release is meant to fix the following issues (regressions / silent correctness):
- torch.nn.cross_entropy silently incorrect in PyTorch 1.10 on CUDA on non-contiguous inputs #67167
- channels_last significantly degrades accuracy #67239
- Potential strict aliasing rule violation in bitwise_binary_op (on ARM/NEON) #66119
- torch.get_autocast_cpu_dtype() returns a new dtype #65786
- Conv2d grad bias gets wrong value for bfloat16 case #68048
The release tracker should contain all relevant pull requests related to this release as well as links to related issues
PyTorch 1.10 Release, including CUDA Graphs APIs, Frontend and compiler improvements
1.10.0 Release Notes
- Highlights
- Backwards Incompatible Change
- New Features
- Improvements
- Performance
- Documentation
Highlights
We are excited to announce the release of PyTorch 1.10. This release is composed of over 3,400 commits since 1.9, made by 426 contributors. We want to sincerely thank our community for continuously improving PyTorch.
PyTorch 1.10 updates are focused on improving training and performance of PyTorch, and developer usability. Highlights include:
- CUDA Graphs APIs are integrated to reduce CPU overheads for CUDA workloads.
- Several frontend APIs such as FX,
torch.special
, andnn.Module
Parametrization, have moved from beta to stable. - Support for automatic fusion in JIT Compiler expands to CPUs in addition to GPUs.
- Android NNAPI support is now available in beta.
You can check the blogpost that shows the new features here.
Backwards Incompatible changes
Python API
torch.any/torch.all behavior changed slightly to be more consistent for zero-dimension, uint8 tensors (#64642)
These two functions match the behavior of NumPy, returning an output dtype of bool for all supported dtypes, except for uint8 (in which case they return a 1 or a 0, but with uint8 dtype). In some cases with 0-dim tensor inputs, the returned uint8 value could mistakenly take on a value > 1. This has now been fixed.
1.9.1
>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(42, dtype=torch.uint8) # wrong, old behavior
1.10.0
>>> torch.all(torch.tensor(42, dtype=torch.uint8))
tensor(1, dtype=torch.uint8)
>>> torch.all(torch.tensor(42, dtype=torch.uint8), dim=0)
tensor(1, dtype=torch.uint8) # new, corrected and consistent behavior
Remove deprecated torch.{is,set}_deterministic (#62158)
This is the end of the deprecation cycle for both of these functions. You should be using torch.use_deterministic_algorithms and torch.are_deterministic_algorithms_enabled instead.
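A minimal migration sketch for the replacement APIs:

import torch

# replaces the removed torch.set_deterministic(True)
torch.use_deterministic_algorithms(True)

# replaces the removed torch.is_deterministic()
print(torch.are_deterministic_algorithms_enabled())  # True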
Complex Numbers
Conjugate View: tensor.conj() now returns a view tensor that aliases the same memory and has the conjugate bit set (#54987, #60522, #66082, #63602).
This means that .conj() is now an O(1) operation and returns a tensor that views the same memory as tensor and has the conjugate bit set. This notion of a conjugate bit enables fusion of operations with conjugation, which gives a lot of performance benefit for operations like matrix multiplication. All out-of-place operations will have the same behavior as before, but an in-place operation on a conjugated tensor will additionally modify the input tensor.
1.9.1
>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([1.+2.j])
1.10.0
>>> import torch
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> y.add_(2)
>>> print(x)
tensor([3.+2.j])
Note: You can verify if the conj bit is set by calling tensor.is_conj(). The conjugation can be resolved, i.e., you can obtain a new tensor that doesn’t share storage with the input tensor, at any time by calling conjugated_tensor.clone() or conjugated_tensor.resolve_conj().
Note that these conjugated tensors behave differently from the corresponding numpy arrays obtained from np.conj() when an in-place operation is performed on them (similar to the example shown above).
Negative View: tensor.conj().neg() returns a view tensor that aliases the same memory as both tensor and tensor.conj() and has a negative bit set (#56058).
conjugated_tensor.neg() continues to be an O(1) operation, but the returned tensor shares memory with both tensor and conjugated_tensor.
1.9.1
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> z.add_(2)
>>> print(x)
tensor([1.+2.j])
1.10.0
>>> x = torch.tensor([1+2j])
>>> y = x.conj()
>>> z = y.imag
>>> print(z.is_neg())
True
>>> z.add_(2)
>>> print(x)
tensor([1.-0.j])
tensor.numpy() now throws RuntimeError when called on a tensor with the conjugate or negative bit set (#61925).
Because the notion of a conjugate bit and negative bit doesn’t exist outside of PyTorch, calling operations that return a Python object viewing the same memory as the input, like .numpy(), no longer works for tensors with the conjugate or negative bit set.
1.9.1
>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
[2.]
1.10.0
>>> x = torch.tensor([1+2j])
>>> y = x.conj().imag
>>> print(y.numpy())
RuntimeError: Can't call numpy() on Tensor that has negative
bit set. Use tensor.resolve_neg().numpy() instead.
Autograd
Raise TypeError instead of RuntimeError when assigning to a Tensor’s grad field with the wrong type (#64876)
Setting the .grad field with a non-None, non-Tensor object used to raise a RuntimeError, but it now properly raises a TypeError. If your code was catching this error, you should simply update it to catch a TypeError instead of a RuntimeError.
1.9.1
try:
    # Assigning an int to a Tensor's grad field
    a.grad = 0
except RuntimeError as e:
    pass
1.10.0
try:
    a.grad = 0
except TypeError as e:
    pass
Raise error when inputs to autograd.grad are empty (#52016)
Calling autograd.grad with an empty list of inputs used to do the same thing as backward. To reduce confusion, it now raises the expected error. If you were relying on this, you can simply update your code as follows:
1.9.1
grad = autograd.grad(out, tuple())
assert grad == tuple()
1.10.0
out.backward()
Optional arguments to autograd.gradcheck and autograd.gradgradcheck are now kwarg-only (#65290)
These two functions now have a significant number of optional arguments controlling what they do (i.e., eps, atol, rtol, raise_exception, etc.). To improve readability, we made these arguments kwarg-only. If you are passing these arguments to autograd.gradcheck or autograd.gradgradcheck as positional arguments, you can update your code as follows:
1.9.1
torch.autograd.gradcheck(fn, x, 1e-6)
1.10.0
torch.autograd.gradcheck(fn, x, eps=1e-6)
In-place detach (detach_) now errors for views that return multiple outputs (#58285)
This change finishes the deprecation cycle for the inplace-over-view logic. In particular, a few things that previously warned are updated:
* `detach_` will now raise an error when invoked on any view created by `split`, `split_with_sizes`, or `chunk`. You should use the non-inplace `detach` instead.
* The error message for when an in-place operation (that is not detach) is performed on a view created by `split`, `split_with_sizes`, or `chunk` has been changed from "This view is an output of a function..." to "This view is the output of a function...".
1.9.1
b = a.split(1)[0]
b.detach_()
1.10.0
b = a.split(1)[0]
c = b.detach()
Fix saved variable unpacking version counter (#60195)
In-place operations on the unpacked SavedVariables used to be ignored. They are now properly detected, which can lead to errors saying that a variable needed for backward was modified in-place.
This is a valid error and the ...