2.8.0/final.md: 67 changes (25 additions, 42 deletions)
@@ -24,6 +24,24 @@
<td>TODO
</td>
</tr>
<tr>
<td>
</td>
<td>torch::stable::Tensor
</td>
</tr>
<tr>
<td>
</td>
<td>Hierarchical compilation with torch.compile
</td>
</tr>
<tr>
<td>
</td>
<td>Support for Intel GPU distributed backend (XCCL)
</td>
</tr>
</table>

For more details about these highlighted features, see the release blog post.
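Of the highlighted features above, hierarchical compilation is the easiest to sketch. A minimal, hedged example follows; it assumes the feature is exposed as `torch.compiler.nested_compile_region`, which marks a region that is compiled once and reused at each call site instead of being re-traced:

```python
# Hedged sketch of hierarchical compilation with torch.compile.
# Assumption: the entry point is torch.compiler.nested_compile_region.
import torch

@torch.compiler.nested_compile_region
def repeated_block(x):
    # Compiled once; reused at every call site inside the outer compile.
    return torch.nn.functional.gelu(x) * x

@torch.compile
def model(x):
    for _ in range(4):
        x = repeated_block(x)
    return x

print(model(torch.randn(32)).shape)  # torch.Size([32])
```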
@@ -369,7 +387,7 @@ Note that PT2E quantization has been migrated to `torchao` (https://github.com/p
- Added config variable `aot_inductor.model_name_for_generated_files` for specifying model name ([#154129](https://github.com/pytorch/pytorch/pull/154129))
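A hedged sketch of how this config variable can be passed through `inductor_configs` when packaging with AOTInductor; the module and the `"toy_model"` name are illustrative, not from the notes:

```python
# Hedged sketch: setting aot_inductor.model_name_for_generated_files
# via aoti_compile_and_package. Module and model name are illustrative.
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

ep = torch.export.export(Toy(), (torch.randn(8),))
pkg = torch._inductor.aoti_compile_and_package(
    ep,
    inductor_configs={"aot_inductor.model_name_for_generated_files": "toy_model"},
)
print(pkg)  # path to a .pt2 package whose generated files carry the model name
```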

## MPS
- `MPSInductor`: `torch.compile` for Apple GPUs ([#150121](https://github.com/pytorch/pytorch/issues/150121))
- `MPSInductor`: `torch.compile` for Apple GPUs ([#150121](https://github.com/pytorch/pytorch/issues/150121), [#149342](https://github.com/pytorch/pytorch/pull/149342), [#151449](https://github.com/pytorch/pytorch/pull/151449), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149180](https://github.com/pytorch/pytorch/pull/149180), [#149221](https://github.com/pytorch/pytorch/pull/149221), [#153598](https://github.com/pytorch/pytorch/pull/153598), [#152788](https://github.com/pytorch/pytorch/pull/152788), [#153787](https://github.com/pytorch/pytorch/pull/153787), [#152214](https://github.com/pytorch/pytorch/pull/152214), [#151152](https://github.com/pytorch/pytorch/pull/151152), [#155891](https://github.com/pytorch/pytorch/pull/155891), [#154578](https://github.com/pytorch/pytorch/pull/154578), [#151272](https://github.com/pytorch/pytorch/pull/151272), [#151288](https://github.com/pytorch/pytorch/pull/151288), [#153997](https://github.com/pytorch/pytorch/pull/153997), [#151871](https://github.com/pytorch/pytorch/pull/151871), [#153362](https://github.com/pytorch/pytorch/pull/153362), [#156566](https://github.com/pytorch/pytorch/pull/156566), [#150661](https://github.com/pytorch/pytorch/pull/150661), [#153582](https://github.com/pytorch/pytorch/pull/153582))
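A minimal sketch of the workflow (assumes a Mac with the MPS backend available):

```python
# Minimal sketch: torch.compile on Apple GPUs via MPSInductor.
import torch

def f(x):
    return torch.sin(x) ** 2 + torch.cos(x) ** 2

if torch.backends.mps.is_available():
    compiled = torch.compile(f)  # Inductor emits Metal kernels for MPS inputs
    x = torch.randn(1024, device="mps")
    print(compiled(x).mean().item())  # ~1.0
```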

## Python Frontend
- Added Generalized Pareto Distribution (GPD) ([#135968](https://github.com/pytorch/pytorch/pull/135968))
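A hedged sketch of the new distribution; the class name and parameter set (`loc`, `scale`, `concentration`) are assumptions based on the PR, not confirmed here:

```python
# Hedged sketch: sampling from the Generalized Pareto Distribution.
# Assumption: exposed as torch.distributions.GeneralizedPareto(loc, scale, concentration).
import torch

gpd = torch.distributions.GeneralizedPareto(
    loc=torch.tensor(0.0),
    scale=torch.tensor(1.0),
    concentration=torch.tensor(0.1),
)
samples = gpd.sample((1000,))
print(samples.mean().item(), gpd.log_prob(samples[:3]))
```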
@@ -537,15 +555,11 @@ Note that PT2E quantization has been migrated to `torchao` (https://github.com/p
- Add tensor overlap check for `cross` ([#154999](https://github.com/pytorch/pytorch/pull/154999))

## MPS
- Added support for operations: `i0e`, `i1e`, `torch.special.bessel_[jy][01]`, `modified_bessel_i1`, `bicubic2d_aa`, `modified_bessel_k0`, `modified_bessel_k1`, `scaled_modified_bessel_k0`, `scaled_modified_bessel_k1`, `nanmedian`, `hermite_polynomial_h`, `hermite_polynomial_he`, `rsub`, `index_copy`, `hardshrink`, `upsample_trilinear`, `erfc`, `isin_Scalar_Tensor`, `isin_Tensor_Scalar`, `chebyshev_polynomial_t`, `col2im`, `nearest_3d`, `chebyshev_polynomial_[uvw]` ([#149174](https://github.com/pytorch/pytorch/pull/149174), [#149203](https://github.com/pytorch/pytorch/pull/149203), [#149123](https://github.com/pytorch/pytorch/pull/149123), [#149368](https://github.com/pytorch/pytorch/pull/149368), [#149378](https://github.com/pytorch/pytorch/pull/149378), [#149563](https://github.com/pytorch/pytorch/pull/149563), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149705](https://github.com/pytorch/pytorch/pull/149705), [#149783](https://github.com/pytorch/pytorch/pull/149783), [#149407](https://github.com/pytorch/pytorch/pull/149407)/[#149680](https://github.com/pytorch/pytorch/pull/149680), [#150279](https://github.com/pytorch/pytorch/pull/150279), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#153786](https://github.com/pytorch/pytorch/pull/153786), [#154326](https://github.com/pytorch/pytorch/pull/154326), [#155304](https://github.com/pytorch/pytorch/pull/155304), [#156263](https://github.com/pytorch/pytorch/pull/156263), [#155382](https://github.com/pytorch/pytorch/pull/155382), [#154010](https://github.com/pytorch/pytorch/pull/154010), [#149816](https://github.com/pytorch/pytorch/pull/149816), [#152282](https://github.com/pytorch/pytorch/pull/152282), [#156090](https://github.com/pytorch/pytorch/pull/156090), [#150060](https://github.com/pytorch/pytorch/pull/150060))
- Added `MPSInductor` support for: `modified_bessel_i0`, `pow`, `log2`, `floorToInt`, `hermite_polynomial_he`, `modified_bessel_k1`, `i0e`, `i1e`, and numpy scalar handling ([#149342](https://github.com/pytorch/pytorch/pull/149342), [#151449](https://github.com/pytorch/pytorch/pull/151449), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149180](https://github.com/pytorch/pytorch/pull/149180), [#149221](https://github.com/pytorch/pytorch/pull/149221), [#153598](https://github.com/pytorch/pytorch/pull/153598))
- Added support for a number of `torch.special` operations as well as `index_copy`, `hardshrink`, `rsub`, `col2im`, and `isin` ([#149174](https://github.com/pytorch/pytorch/pull/149174), [#149203](https://github.com/pytorch/pytorch/pull/149203), [#149123](https://github.com/pytorch/pytorch/pull/149123), [#149368](https://github.com/pytorch/pytorch/pull/149368), [#149378](https://github.com/pytorch/pytorch/pull/149378), [#149563](https://github.com/pytorch/pytorch/pull/149563), [#149687](https://github.com/pytorch/pytorch/pull/149687), [#149705](https://github.com/pytorch/pytorch/pull/149705), [#149783](https://github.com/pytorch/pytorch/pull/149783), [#149407](https://github.com/pytorch/pytorch/pull/149407)/[#149680](https://github.com/pytorch/pytorch/pull/149680), [#150279](https://github.com/pytorch/pytorch/pull/150279), [#151754](https://github.com/pytorch/pytorch/pull/151754), [#153786](https://github.com/pytorch/pytorch/pull/153786), [#154326](https://github.com/pytorch/pytorch/pull/154326), [#155304](https://github.com/pytorch/pytorch/pull/155304), [#156263](https://github.com/pytorch/pytorch/pull/156263), [#155382](https://github.com/pytorch/pytorch/pull/155382), [#154010](https://github.com/pytorch/pytorch/pull/154010), [#149816](https://github.com/pytorch/pytorch/pull/149816), [#152282](https://github.com/pytorch/pytorch/pull/152282), [#156090](https://github.com/pytorch/pytorch/pull/156090), [#150060](https://github.com/pytorch/pytorch/pull/150060), [#151600](https://github.com/pytorch/pytorch/pull/151600), [#155002](https://github.com/pytorch/pytorch/pull/155002), [#154671](https://github.com/pytorch/pytorch/pull/154671)); see the sketch after this list
- Extended dtype support for:
* `index_put` with half precision floats ([#151869](https://github.com/pytorch/pytorch/pull/151869))
* `ConvTranspose3D` with FP32 and complex ([#154696](https://github.com/pytorch/pytorch/pull/154696))
* `index_copy` with complex dtypes ([#154671](https://github.com/pytorch/pytorch/pull/154671))
* `torch.special.*` with integer dtypes ([#155002](https://github.com/pytorch/pytorch/pull/155002))
* `log1p` and `sigmoid` with int64 ([#151791](https://github.com/pytorch/pytorch/pull/151791))
* `isin` with mixed types ([#151600](https://github.com/pytorch/pytorch/pull/151600))
- Compute activation kernels at float precision ([#155735](https://github.com/pytorch/pytorch/pull/155735))
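
A hedged sketch exercising a few of the ops listed above on the MPS backend (requires an Apple-silicon Mac):

```python
# Hedged sketch: a few of the newly supported MPS ops.
import torch
import torch.nn.functional as F

if torch.backends.mps.is_available():
    x = torch.linspace(-2.0, 2.0, 9, device="mps")
    print(torch.special.i0e(x))        # exponentially scaled Bessel I0
    print(F.hardshrink(x, lambd=0.5))  # hardshrink, newly supported on MPS
    print(torch.isin(x, 2.0))          # the isin_Tensor_Scalar overload
```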

## Nested Tensor (NJT)
@@ -723,36 +737,7 @@ Note that PT2E quantization has been migrated to `torchao` (https://github.com/p
- Fixed 32-bit indexing overflows in `ReducedPrecisionGemV` ([#150949](https://github.com/pytorch/pytorch/pull/150949))

## MPS
- Fixed `lerp` for complex numbers ([#152479](https://github.com/pytorch/pytorch/pull/152479))
- Fixed unary/binary ops for `2**32`+ elem tensors ([#155183](https://github.com/pytorch/pytorch/pull/155183))
- Fixed type promotion for `torch.floor_divide` ([#149233](https://github.com/pytorch/pytorch/pull/149233))
- Fixed `where` ([#151176](https://github.com/pytorch/pytorch/pull/151176))
- Fixed logit output for half/bfloat ([#151282](https://github.com/pytorch/pytorch/pull/151282))
- Fixed `index_kernel` for large tensors ([#158239](https://github.com/pytorch/pytorch/pull/158239))
- Fixed memory leak in SDPA for float32 ([#152371](https://github.com/pytorch/pytorch/pull/152371))
- Fixed metal ops with different dtypes ([#149974](https://github.com/pytorch/pytorch/pull/149974))
- Switched Cholesky decomposition to column-wise ([#158237](https://github.com/pytorch/pytorch/pull/158237))
- Fixed bug in 3D coords calculation affecting interpolation ([#156375](https://github.com/pytorch/pytorch/pull/156375))
- Fixed float64 scalar tensor handling ([#153582](https://github.com/pytorch/pytorch/pull/153582))
- Fixed crash when inverting matrix with `N>1024` ([#146754](https://github.com/pytorch/pytorch/pull/146754))
- Made fused `rms_norm` traceable ([#150661](https://github.com/pytorch/pytorch/pull/150661))
- Reimplemented `tri[ul]` as Metal shaders ([#158867](https://github.com/pytorch/pytorch/pull/158867))
- Fixed complex scalar binding to Metal tensors ([#155184](https://github.com/pytorch/pytorch/pull/155184))
- Fixed ICE for `special.entr` bool instantiation on M1/M2 ([#152204](https://github.com/pytorch/pytorch/pull/152204))

#### MPSInductor
- Fixed `truncdiv` implementation ([#152788](https://github.com/pytorch/pytorch/pull/152788))
- Fixed `conv_transpose` with `channels_last` ([#153787](https://github.com/pytorch/pytorch/pull/153787))
- Fixed the approximation of `polygamma` for `n == 0` ([#152214](https://github.com/pytorch/pytorch/pull/152214))
- Fixed larger-than-threadgroup Welford reductions ([#151152](https://github.com/pytorch/pytorch/pull/151152))
- Fixed remainder implementation for int types ([#155891](https://github.com/pytorch/pytorch/pull/155891))
- Fixed codegen for nested multistage reductions in `MPSInductor` ([#154578](https://github.com/pytorch/pytorch/pull/154578))
- Fixed silent correctness in bitcast ([#151272](https://github.com/pytorch/pytorch/pull/151272))
- Adjusted convolution memory format detection ([#151288](https://github.com/pytorch/pytorch/pull/151288))
- Fixed `MPSInductor` indexing calculation ([#153997](https://github.com/pytorch/pytorch/pull/153997))
- Implemented `atomic_add` store mode ([#151871](https://github.com/pytorch/pytorch/pull/151871))
- Fixed multistage reduction suffixes ([#153362](https://github.com/pytorch/pytorch/pull/153362))
- Fixed nested loop var elimination ([#156566](https://github.com/pytorch/pytorch/pull/156566))
- Fixed various op support issues: unary/binary ops with `2**32`+ element inputs, binary ops with inputs with different dtypes, ops with complex scalar inputs, `cholesky` decomp, `floor_divide` type promotion, `index_kernel` with large inputs, `lerp` with complex inputs, `logit` with half/bfloat16 inputs, SDPA memory leak, `torch.special.entr`, `tri[ul]`, matrix inversion with `N>1024`, and `where` with non-contiguous `cond` ([#152479](https://github.com/pytorch/pytorch/pull/152479), [#155183](https://github.com/pytorch/pytorch/pull/155183), [#149233](https://github.com/pytorch/pytorch/pull/149233), [#151176](https://github.com/pytorch/pytorch/pull/151176), [#151282](https://github.com/pytorch/pytorch/pull/151282), [#158239](https://github.com/pytorch/pytorch/pull/158239), [#152371](https://github.com/pytorch/pytorch/pull/152371), [#149974](https://github.com/pytorch/pytorch/pull/149974), [#158237](https://github.com/pytorch/pytorch/pull/158237), [#146754](https://github.com/pytorch/pytorch/pull/146754), [#158867](https://github.com/pytorch/pytorch/pull/158867), [#155184](https://github.com/pytorch/pytorch/pull/155184), [#152204](https://github.com/pytorch/pytorch/pull/152204))

## torch.nn
- Fixed `load_state_dict` behavior for `nn.LazyLinear` ([#147599](https://github.com/pytorch/pytorch/pull/147599))
@@ -846,22 +831,21 @@ binary kernels, SDPA, `linear`, and `cumsum` / `cumprod` ([#152010](https://gith
inputs, max pooling, multi-dimensional reductions, and non-vectorized elementwise kernels ([#149076](https://github.com/pytorch/pytorch/pull/149076), [#149779](https://github.com/pytorch/pytorch/pull/149779), [#149548](https://github.com/pytorch/pytorch/pull/149548), [#151230](https://github.com/pytorch/pytorch/pull/151230), [#152267](https://github.com/pytorch/pytorch/pull/152267), [#154522](https://github.com/pytorch/pytorch/pull/154522), [#154619](https://github.com/pytorch/pytorch/pull/154619), [#155806](https://github.com/pytorch/pytorch/pull/155806), [#153184](https://github.com/pytorch/pytorch/pull/153184))
- Improved scatter add performance on MI250X ([#151724](https://github.com/pytorch/pytorch/pull/151724))
- Extended vectorized elementwise kernel to more heterogeneous tensor types ([#149738](https://github.com/pytorch/pytorch/pull/149738))
- Use `HipSparseLT` to further accelerate semi-structured (e.g. 2:4) sparsity ([#150578](https://github.com/pytorch/pytorch/pull/150578))

## Sparse Frontend
- Use HipSparseLT to further accelerate semi-structured (e.g. 2:4) sparsity on ROCm (AMD) ([#150578](https://github.com/pytorch/pytorch/pull/150578)); see the sketch after this list
- Skip sparse tensor invariant validation when loading sparse Tensors from external storage ([#154610](https://github.com/pytorch/pytorch/pull/154610), [#154759](https://github.com/pytorch/pytorch/pull/154759), [#154638](https://github.com/pytorch/pytorch/pull/154638))
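
A hedged sketch of the 2:4 semi-structured sparsity workflow that this release accelerates on ROCm; the same front-end API is used as on CUDA, and `torch.cuda.is_available()` also reports `True` on ROCm builds:

```python
# Hedged sketch: 2:4 semi-structured sparsity (requires a supported GPU).
import torch
from torch.sparse import to_sparse_semi_structured

if torch.cuda.is_available():
    # A pattern with 2 zeros in every group of 4 satisfies the 2:4 constraint.
    a = torch.tensor([0, 0, 1, 1]).tile((128, 32)).half().cuda()  # (128, 128)
    a_sparse = to_sparse_semi_structured(a)
    b = torch.randn(128, 128, device="cuda", dtype=torch.float16)
    print(torch.mm(a_sparse, b).shape)  # dense result of sparse @ dense
```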

## XPU
- Enabled post-op fusion for oneDNN Conv on Intel GPU ([#150287](https://github.com/pytorch/pytorch/pull/150287))
- Enabled post-op fusion for oneDNN convolution on Intel GPU ([#150287](https://github.com/pytorch/pytorch/pull/150287))
- Reduced host overhead for Intel GPU by eliminating unnecessary API calls ([#151111](https://github.com/pytorch/pytorch/pull/151111))
- Improved INT4 WOQ GEMM for Intel GPU by introducing a cache mechanism that further reduces oneDNN integration overhead ([#147693](https://github.com/pytorch/pytorch/pull/147693))
- Improved scalar tensor case handling in `addmm` and `baddbmm` to reduce oneDNN integration overhead on Intel GPU ([#153051](https://github.com/pytorch/pytorch/pull/153051))

# Documentation
## Autograd
- Updated docs of `torch.autograd.graph.saved_tensors_hooks` to avoid ref cycle ([#153049](https://github.com/pytorch/pytorch/pull/153049))
- Mentioned in `torch.utils.checkpoint.checkpoint` error messages that `debug=True` can be set for more detailed reporting ([#155593](https://github.com/pytorch/pytorch/pull/155593)); see the sketch after this list
- Added more details on why `ctx.save_for_backward` is important in extending autograd note ([#153005](https://github.com/pytorch/pytorch/pull/153005))
- Added more details on why `ctx.save_for_backward` is important in note about extending autograd ([#153005](https://github.com/pytorch/pytorch/pull/153005))
- Updated docs of `torch.autograd.graph.saved_tensors_hooks` to avoid refcycle ([#153049](https://github.com/pytorch/pytorch/pull/153049))
- Updated gradient behavior note in `torch.amin` and `torch.amax` ([#155071](https://github.com/pytorch/pytorch/pull/155071))
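
For the checkpoint item above, a minimal sketch of what the docs now point to:

```python
# Minimal sketch: requesting verbose checkpoint error reporting.
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.sin(x) * x

x = torch.randn(8, requires_grad=True)
# debug=True makes divergence errors include operator traces from both
# the original forward pass and the recomputation.
y = checkpoint(block, x, use_reentrant=False, debug=True)
y.sum().backward()
```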

## CUDA
@@ -975,7 +959,6 @@ inputs, max pooling, multi-dimensional reductions, and non-vectorized elementwis

## FX
- Gracefully exit minimizer when there is no discrepancy in block mode ([#154076](https://github.com/pytorch/pytorch/pull/154076))
- Add `__main__` guards to FX tests ([#154715](https://github.com/pytorch/pytorch/pull/154715))

## Optimizer
- Improve decorator typing for Optimizer subclasses ([#153374](https://github.com/pytorch/pytorch/pull/153374))