2.8.0/final.md: 67 changes (25 additions, 42 deletions)
@@ -24,6 +24,24 @@
<td>TODO
</td>
</tr>
<tr>
<td>
</td>
<td>torch::stable::Tensor
</td>
</tr>
<tr>
<td>
</td>
<td>Hierarchical compilation with torch.compile
</td>
</tr>
<tr>
<td>
</td>
<td>Support for Intel GPU distributed backend (XCCL)
</td>
</tr>
</table>

For more details about these highlighted features, see the release blog post.
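Of the highlighted features above, hierarchical compilation is the easiest to sketch. A minimal, hedged example follows; it assumes the feature is exposed as `torch.compiler.nested_compile_region`, which marks a region that is compiled once and reused at each call site instead of being re-traced:

```python
# Hedged sketch of hierarchical compilation with torch.compile.
# Assumption: the entry point is torch.compiler.nested_compile_region.
import torch

@torch.compiler.nested_compile_region
def repeated_block(x):
    # Compiled once; reused at every call site inside the outer compile.
    return torch.nn.functional.gelu(x) * x

@torch.compile
def model(x):
    for _ in range(4):
        x = repeated_block(x)
    return x

print(model(torch.randn(32)).shape)  # torch.Size([32])
```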
@@ -369,7 +387,7 @@ Note that PT2E quantization has been migrated to `torchao` (https://github.com/p
- Added config variable `aot_inductor.model_name_for_generated_files` for specifying model name ([#154129](https://github.com/pytorch/pytorch/pull/154129))
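A hedged sketch of how this config variable can be passed through `inductor_configs` when packaging with AOTInductor; the module and the `"toy_model"` name are illustrative, not from the notes:

```python
# Hedged sketch: setting aot_inductor.model_name_for_generated_files
# via aoti_compile_and_package. Module and model name are illustrative.
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

ep = torch.export.export(Toy(), (torch.randn(8),))
pkg = torch._inductor.aoti_compile_and_package(
    ep,
    inductor_configs={"aot_inductor.model_name_for_generated_files": "toy_model"},
)
print(pkg)  # path to a .pt2 package whose generated files carry the model name
```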

## MPS
- `MPSInductor`: `torch.compile` for Apple GPUs ([#150121](https://github.com/pytorch/pytorch/issues/150121))
- `MPSInductor`: `torch.compile` for Apple GPUs ([#150121](https://github.com/pytorch/pytorch/issues/150121), [#149342](https://github.com/pytorch/pytorch/pull/149342), [#151449](https://github.com/pytorch/pytorch/pull/151449), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149180](https://github.com/pytorch/pytorch/pull/149180), [#149221](https://github.com/pytorch/pytorch/pull/149221), [#153598](https://github.com/pytorch/pytorch/pull/153598), [#152788](https://github.com/pytorch/pytorch/pull/152788), [#153787](https://github.com/pytorch/pytorch/pull/153787), [#152214](https://github.com/pytorch/pytorch/pull/152214), [#151152](https://github.com/pytorch/pytorch/pull/151152), [#155891](https://github.com/pytorch/pytorch/pull/155891), [#154578](https://github.com/pytorch/pytorch/pull/154578), [#151272](https://github.com/pytorch/pytorch/pull/151272), [#151288](https://github.com/pytorch/pytorch/pull/151288), [#153997](https://github.com/pytorch/pytorch/pull/153997), [#151871](https://github.com/pytorch/pytorch/pull/151871), [#153362](https://github.com/pytorch/pytorch/pull/153362), [#156566](https://github.com/pytorch/pytorch/pull/156566), [#150661](https://github.com/pytorch/pytorch/pull/150661), [#153582](https://github.com/pytorch/pytorch/pull/153582))
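A minimal sketch of the workflow (assumes a Mac with the MPS backend available):

```python
# Minimal sketch: torch.compile on Apple GPUs via MPSInductor.
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

if torch.backends.mps.is_available():
    compiled = torch.compile(f)  # Inductor emits Metal kernels for MPS inputs
    x = torch.randn(1024, device="mps")
    print(compiled(x).mean().item())  # ~1.0
```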

## Python Frontend
- Added Generalized Pareto Distribution (GPD) ([#135968](https://github.com/pytorch/pytorch/pull/135968))
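A hedged sketch of the new distribution; the class name and parameter set (`loc`, `scale`, `concentration`) are assumptions based on the PR, not confirmed here:

```python
# Hedged sketch: sampling from the Generalized Pareto Distribution.
# Assumption: exposed as torch.distributions.GeneralizedPareto(loc, scale, concentration).
import torch

gpd = torch.distributions.GeneralizedPareto(
    loc=torch.tensor(0.0),
    scale=torch.tensor(1.0),
    concentration=torch.tensor(0.1),
)
samples = gpd.sample((1000,))
print(samples.mean().item(), gpd.log_prob(samples[:3]))
```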
@@ -537,15 +555,11 @@ Note that PT2E quantization has been migrated to `torchao` (https://github.com/p
- Add tensor overlap check for `cross` ([#154999](https://github.com/pytorch/pytorch/pull/154999))

## MPS
- Added support for operations: `i0e`, `i1e`, `torch.special.bessel_[jy][01]`, `modified_bessel_i1`, `bicubic2d_aa`, `modified_bessel_k0`, `modified_bessel_k1`, `scaled_modified_bessel_k0`, `scaled_modified_bessel_k1`, `nanmedian`, `hermite_polynomial_h`, `hermite_polynomial_he`, `rsub`, `index_copy`, `hardshrink`, `upsample_trilinear`, `erfc`, `isin_Scalar_Tensor`, `isin_Tensor_Scalar`, `chebyshev_polynomial_t`, `col2im`, `nearest_3d`, `chebyshev_polynomial_[uvw]` ([#149174](https://github.com/pytorch/pytorch/pull/149174), [#149203](https://github.com/pytorch/pytorch/pull/149203), [#149123](https://github.com/pytorch/pytorch/pull/149123), [#149368](https://github.com/pytorch/pytorch/pull/149368), [#149378](https://github.com/pytorch/pytorch/pull/149378), [#149563](https://github.com/pytorch/pytorch/pull/149563), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149705](https://github.com/pytorch/pytorch/pull/149705), [#149783](https://github.com/pytorch/pytorch/pull/149783), [#149407](https://github.com/pytorch/pytorch/pull/149407)/[#149680](https://github.com/pytorch/pytorch/pull/149680), [#150279](https://github.com/pytorch/pytorch/pull/150279), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#153786](https://github.com/pytorch/pytorch/pull/153786), [#154326](https://github.com/pytorch/pytorch/pull/154326), [#155304](https://github.com/pytorch/pytorch/pull/155304), [#156263](https://github.com/pytorch/pytorch/pull/156263), [#155382](https://github.com/pytorch/pytorch/pull/155382), [#154010](https://github.com/pytorch/pytorch/pull/154010), [#149816](https://github.com/pytorch/pytorch/pull/149816), [#152282](https://github.com/pytorch/pytorch/pull/152282), [#156090](https://github.com/pytorch/pytorch/pull/156090), [#150060](https://github.com/pytorch/pytorch/pull/150060))
- Added `MPSInductor` support for: `modified_bessel_i0`, `pow`, `log2`, `floorToInt`, `hermite_polynomial_he`, `modified_bessel_k1`, `i0e`, `i1e`, and numpy scalar handling ([#149342](https://github.com/pytorch/pytorch/pull/149342), [#151449](https://github.com/pytorch/pytorch/pull/151449), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149180](https://github.com/pytorch/pytorch/pull/149180), [#149221](https://github.com/pytorch/pytorch/pull/149221), [#153598](https://github.com/pytorch/pytorch/pull/153598))
- Added support for a number of `torch.special` operations as well as `index_copy`, `hardshrink`, `rsub`, `col2im`, and `isin` ([#149174](https://github.com/pytorch/pytorch/pull/149174), [#149203](https://github.com/pytorch/pytorch/pull/149203), [#149123](https://github.com/pytorch/pytorch/pull/149123), [#149368](https://github.com/pytorch/pytorch/pull/149368), [#149378](https://github.com/pytorch/pytorch/pull/149378), [#149563](https://github.com/pytorch/pytorch/pull/149563), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149705](https://github.com/pytorch/pytorch/pull/149705), [#149783](https://github.com/pytorch/pytorch/pull/149783), [#149407](https://github.com/pytorch/pytorch/pull/149407)/[#149680](https://github.com/pytorch/pytorch/pull/149680), [#150279](https://github.com/pytorch/pytorch/pull/150279), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#153786](https://github.com/pytorch/pytorch/pull/153786), [#154326](https://github.com/pytorch/pytorch/pull/154326), [#155304](https://github.com/pytorch/pytorch/pull/155304), [#156263](https://github.com/pytorch/pytorch/pull/156263), [#155382](https://github.com/pytorch/pytorch/pull/155382), [#154010](https://github.com/pytorch/pytorch/pull/154010), [#149816](https://github.com/pytorch/pytorch/pull/149816), [#152282](https://github.com/pytorch/pytorch/pull/152282), [#156090](https://github.com/pytorch/pytorch/pull/156090), [#150060](https://github.com/pytorch/pytorch/pull/150060), [#151600](https://github.com/pytorch/pytorch/pull/151600), [#155002](https://github.com/pytorch/pytorch/pull/155002), [#154671](https://github.com/pytorch/pytorch/pull/154671)); see the sketch after this list
- Extended dtype support for:
* `index_put` with half precision floats ([#151869](https://github.com/pytorch/pytorch/pull/151869))
* `ConvTranspose3D` with FP32 and complex ([#154696](https://github.com/pytorch/pytorch/pull/154696))
* `index_copy` with complex dtypes ([#154671](https://github.com/pytorch/pytorch/pull/154671))
* `torch.special.*` with integer dtypes ([#155002](https://github.com/pytorch/pytorch/pull/155002))
* `log1p` and `sigmoid` with int64 ([#151791](https://github.com/pytorch/pytorch/pull/151791))
* `isin` with mixed types ([#151600](https://github.com/pytorch/pytorch/pull/151600))
- Compute activation kernels at float precision ([#155735](https://github.com/pytorch/pytorch/pull/155735))
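
A hedged sketch exercising a few of the ops listed above on the MPS backend (requires an Apple-silicon Mac):

```python
# Hedged sketch: a few of the newly supported MPS ops.
import torch
import torch.nn.functional as F

if torch.backends.mps.is_available():
    x = torch.linspace(-2.0, 2.0, 9, device="mps")
    print(torch.special.i0e(x))        # exponentially scaled Bessel I0
    print(F.hardshrink(x, lambd=0.5))  # hardshrink, newly supported on MPS
    print(torch.isin(x, 2.0))          # the isin_Tensor_Scalar overload
```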

## Nested Tensor (NJT)
@@ -723,36 +737,7 @@ Note that PT2E quantization has been migrated to `torchao` (https://github.com/p
- Fixed 32-bit indexing overflows in `ReducedPrecisionGemV` ([#150949](https://github.com/pytorch/pytorch/pull/150949))

## MPS
- Fixed `lerp` for complex numbers ([#152479](https://github.com/pytorch/pytorch/pull/152479))
- Fixed unary/binary ops for `2**32`+ elem tensors ([#155183](https://github.com/pytorch/pytorch/pull/155183))
- Fixed type promotion for `torch.floor_divide` ([#149233](https://github.com/pytorch/pytorch/pull/149233))
- Fixed `where` ([#151176](https://github.com/pytorch/pytorch/pull/151176))
- Fixed logit output for half/bfloat ([#151282](https://github.com/pytorch/pytorch/pull/151282))
- Fixed `index_kernel` for large tensors ([#158239](https://github.com/pytorch/pytorch/pull/158239))
- Fixed memory leak in SDPA for float32 ([#152371](https://github.com/pytorch/pytorch/pull/152371))
- Fixed metal ops with different dtypes ([#149974](https://github.com/pytorch/pytorch/pull/149974))
- Switched Cholesky decomposition to column-wise ([#158237](https://github.com/pytorch/pytorch/pull/158237))
- Fixed bug in 3D coords calculation affecting interpolation ([#156375](https://github.com/pytorch/pytorch/pull/156375))
- Fixed float64 scalar tensor handling ([#153582](https://github.com/pytorch/pytorch/pull/153582))
- Fixed crash when inverting matrix with `N>1024` ([#146754](https://github.com/pytorch/pytorch/pull/146754))
- Made fused `rms_norm` traceable ([#150661](https://github.com/pytorch/pytorch/pull/150661))
- Reimplemented `tri[ul]` as Metal shaders ([#158867](https://github.com/pytorch/pytorch/pull/158867))
- Fixed complex scalar binding to Metal tensors ([#155184](https://github.com/pytorch/pytorch/pull/155184))
- Fixed ICE for `special.entr` bool instantiation on M1/M2 ([#152204](https://github.com/pytorch/pytorch/pull/152204))

#### MPSInductor
- Fixed `truncdiv` implementation ([#152788](https://github.com/pytorch/pytorch/pull/152788))
- Fixed `conv_transpose` with `channels_last` ([#153787](https://github.com/pytorch/pytorch/pull/153787))
- Fixed the approximation of `polygamma` for `n == 0` ([#152214](https://github.com/pytorch/pytorch/pull/152214))
- Fixed larger-than-threadgroup Welford reductions ([#151152](https://github.com/pytorch/pytorch/pull/151152))
- Fixed remainder implementation for int types ([#155891](https://github.com/pytorch/pytorch/pull/155891))
- Fixed codegen for nested multistage reductions in `MPSInductor` ([#154578](https://github.com/pytorch/pytorch/pull/154578))
- Fixed silent correctness in bitcast ([#151272](https://github.com/pytorch/pytorch/pull/151272))
- Adjusted convolution memory format detection ([#151288](https://github.com/pytorch/pytorch/pull/151288))
- Fixed `MPSInductor` indexing calculation ([#153997](https://github.com/pytorch/pytorch/pull/153997))
- Implemented `atomic_add` store mode ([#151871](https://github.com/pytorch/pytorch/pull/151871))
- Fixed multistage reduction suffixes ([#153362](https://github.com/pytorch/pytorch/pull/153362))
- Fixed nested loop var elimination ([#156566](https://github.com/pytorch/pytorch/pull/156566))
- Fixed various op support issues: unary/binary ops with `2**32`+ element inputs, binary ops with inputs with different dtypes, ops with complex scalar inputs, `cholesky` decomp, `floor_divide` type promotion, `index_kernel` with large inputs, `lerp` with complex inputs, `logit` with half/bfloat16 inputs, SDPA memory leak, `torch.special.entr`, `tri[ul]`, matrix inversion with `N>1024`, and `where` with non-contiguous `cond` ([#152479](https://github.com/pytorch/pytorch/pull/152479), [#155183](https://github.com/pytorch/pytorch/pull/155183), [#149233](https://github.com/pytorch/pytorch/pull/149233), [#151176](https://github.com/pytorch/pytorch/pull/151176), [#151282](https://github.com/pytorch/pytorch/pull/151282), [#158239](https://github.com/pytorch/pytorch/pull/158239), [#152371](https://github.com/pytorch/pytorch/pull/152371), [#149974](https://github.com/pytorch/pytorch/pull/149974), [#158237](https://github.com/pytorch/pytorch/pull/158237), [#146754](https://github.com/pytorch/pytorch/pull/146754), [#158867](https://github.com/pytorch/pytorch/pull/158867), [#155184](https://github.com/pytorch/pytorch/pull/155184), [#152204](https://github.com/pytorch/pytorch/pull/152204))

## torch.nn
- Fixed `load_state_dict` behavior for `nn.LazyLinear` ([#147599](https://github.com/pytorch/pytorch/pull/147599))
@@ -846,22 +831,21 @@ binary kernels, SDPA, `linear`, and `cumsum` / `cumprod` ([#152010](https://gith
inputs, max pooling, multi-dimensional reductions, and non-vectorized elementwise kernels ([#149076](https://github.com/pytorch/pytorch/pull/149076), [#149779](https://github.com/pytorch/pytorch/pull/149779), [#149548](https://github.com/pytorch/pytorch/pull/149548), [#151230](https://github.com/pytorch/pytorch/pull/151230), [#152267](https://github.com/pytorch/pytorch/pull/152267), [#154522](https://github.com/pytorch/pytorch/pull/154522), [#154619](https://github.com/pytorch/pytorch/pull/154619), [#155806](https://github.com/pytorch/pytorch/pull/155806), [#153184](https://github.com/pytorch/pytorch/pull/153184))
- Improved scatter add performance on MI250X ([#151724](https://github.com/pytorch/pytorch/pull/151724))
- Extended vectorized elementwise kernel to more heterogeneous tensor types ([#149738](https://github.com/pytorch/pytorch/pull/149738))
- Use `HipSparseLT` to further accelerate semi-structured (e.g. 2:4) sparsity ([#150578](https://github.com/pytorch/pytorch/pull/150578))

## Sparse Frontend
- Use HipSparseLT to further accelerate semi-structured (e.g. 2:4) sparsity on ROCm (AMD) ([#150578](https://github.com/pytorch/pytorch/pull/150578)); see the sketch after this list
- Skip sparse tensor invariant validation when loading sparse Tensors from external storage ([#154610](https://github.com/pytorch/pytorch/pull/154610), [#154759](https://github.com/pytorch/pytorch/pull/154759), [#154638](https://github.com/pytorch/pytorch/pull/154638))
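
A hedged sketch of the 2:4 semi-structured sparsity workflow that this release accelerates on ROCm; the same front-end API is used as on CUDA, and `torch.cuda.is_available()` also reports `True` on ROCm builds:

```python
# Hedged sketch: 2:4 semi-structured sparsity (requires a supported GPU).
import torch
from torch.sparse import to_sparse_semi_structured

if torch.cuda.is_available():
    # A pattern with 2 zeros in every group of 4 satisfies the 2:4 constraint.
    a = torch.tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()  # (128, 128)
    a_sparse = to_sparse_semi_structured(a)
    b = torch.randn(128, 128, device="cuda", dtype=torch.float16)
    print(torch.mm(a_sparse, b).shape)  # dense result of sparse @ dense
```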

## XPU
- Enabled post-op fusion for oneDNN Conv on Intel GPU ([#150287](https://github.com/pytorch/pytorch/pull/150287))
- Enabled post-op fusion for oneDNN convolution on Intel GPU ([#150287](https://github.com/pytorch/pytorch/pull/150287))
- Reduced host overhead for Intel GPU by eliminating unnecessary API calls ([#151111](https://github.com/pytorch/pytorch/pull/151111))
- Improved INT4 WOQ GEMM for Intel GPU by introducing a cache mechanism that further reduces oneDNN integration overhead ([#147693](https://github.com/pytorch/pytorch/pull/147693))
- Improved scalar tensor case handling in `addmm` and `baddbmm` to reduce oneDNN integration overhead on Intel GPU ([#153051](https://github.com/pytorch/pytorch/pull/153051))

# Documentation
## Autograd
- Updated docs of `torch.autograd.graph.saved_tensors_hooks` to avoid ref cycle ([#153049](https://github.com/pytorch/pytorch/pull/153049))
- Mentioned in `torch.utils.checkpoint.checkpoint` error messages that `debug=True` can be set for more detailed reporting ([#155593](https://github.com/pytorch/pytorch/pull/155593)); see the sketch after this list
- Added more details on why `ctx.save_for_backward` is important in extending autograd note ([#153005](https://github.com/pytorch/pytorch/pull/153005))
- Added more details on why `ctx.save_for_backward` is important in note about extending autograd ([#153005](https://github.com/pytorch/pytorch/pull/153005))
- Updated docs of `torch.autograd.graph.saved_tensors_hooks` to avoid refcycle ([#153049](https://github.com/pytorch/pytorch/pull/153049))
- Updated gradient behavior note in `torch.amin` and `torch.amax` ([#155071](https://github.com/pytorch/pytorch/pull/155071))
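
For the checkpoint item above, a minimal sketch of what the docs now point to:

```python
# Minimal sketch: requesting verbose checkpoint error reporting.
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.sin(x) * x

x = torch.randn(8, requires_grad=True)
# debug=True makes divergence errors include operator traces from both
# the original forward pass and the recomputation.
y = checkpoint(block, x, use_reentrant=False, debug=True)
y.sum().backward()
```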

## CUDA
@@ -975,7 +959,6 @@ inputs, max pooling, multi-dimensional reductions, and non-vectorized elementwis

## FX
- Gracefully exit minimizer when there is no discrepancy in block mode ([#154076](https://github.com/pytorch/pytorch/pull/154076))
- Add `__main__` guards to FX tests ([#154715](https://github.com/pytorch/pytorch/pull/154715))

## Optimizer
- Improve decorator typing for Optimizer subclasses ([#153374](https://github.com/pytorch/pytorch/pull/153374))