
Add __torch_function__ for methods #37091

Closed
hameerabbasi wants to merge 41 commits from the method-torch-function branch

Conversation

hameerabbasi (Collaborator) commented Apr 22, 2020

According to pytorch/rfcs#3

From the goals in the RFC:

  1. Support subclassing torch.Tensor in Python (done here)
  2. Preserve torch.Tensor subclasses when calling torch functions on them (done here)
  3. Use the PyTorch API with torch.Tensor-like objects that are not torch.Tensor
    subclasses (done in #30730, "Add the __torch_function__ API override mechanism")
  4. Preserve torch.Tensor subclasses when calling torch.Tensor methods (done here)
  5. Propagate subclass instances correctly with operators and when using
    views/slices/indexing/etc. (done here)
  6. Preserve subclass attributes when using methods or views/slices/indexing (done here)
  7. Provide a way to insert code that operates on both functions and methods uniformly,
    so we can write a single function that overrides all operators (done here)
  8. Give external libraries a way to also define functions/methods that follow
    the __torch_function__ protocol (will be addressed in a separate PR)

This PR makes the following changes:

  1. Adds the self argument to the arg parser.
  2. Dispatches on self as well when self is not nullptr.
  3. Adds a torch._C.DisableTorchFunction context manager to disable __torch_function__.
  4. Adds torch::torch_function_enabled() and torch._C._torch_function_enabled() to check the state of __torch_function__.
  5. Dispatches all torch._C.TensorBase and torch.Tensor methods via __torch_function__ (see the sketch below).
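
To make the new dispatch behavior concrete, here is a minimal sketch. It is not part of this PR's diff; it assumes a recent PyTorch build that includes this change (where `__torch_function__` is used as a classmethod) and follows the standard subclass override pattern:

```python
import torch

class MyTensor(torch.Tensor):
    """Minimal torch.Tensor subclass that logs every intercepted call."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f"intercepted: {getattr(func, '__name__', func)}")
        # Delegate to the default implementation, which runs the real op and
        # preserves the subclass type of the result.
        return super().__torch_function__(func, types, args, kwargs)

t = torch.randn(3).as_subclass(MyTensor)

r1 = torch.add(t, t)  # torch-level function: dispatched (pre-existing behavior)
r2 = t.sum()          # Tensor method: dispatched (added by this PR)
r3 = t[1:]            # operators/indexing/views: dispatched as well

print(type(r1), type(r2), type(r3))  # all MyTensor
```

Inside an override, the torch._C.DisableTorchFunction context manager added in this PR can be used to call back into PyTorch without re-triggering __torch_function__.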

TODO:

  • Sequence Methods
  • Docs
  • Tests

Closes #28361

Benchmarks in #37091 (comment)

@hameerabbasi hameerabbasi changed the title WIP Add __torch_function__ for methods Apr 22, 2020
dr-ci bot commented Apr 22, 2020

💊 CI failures summary and remediations

As of commit e715d90 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@mruberry mruberry requested a review from bhosmer April 22, 2020 18:26
@mruberry mruberry added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 22, 2020
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 5 times, most recently from 15151c1 to 62bcf35 on April 24, 2020 15:06
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 4 times, most recently from fa1aa96 to 067f892 on April 27, 2020 18:14
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 4 times, most recently from de8ab35 to b4285ff on April 29, 2020 14:24
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 7 times, most recently from 7e76252 to 95d4441 on May 1, 2020 15:59
ezyang (Contributor) commented Aug 3, 2020

Urrgh, there was a force push to the branch, this is going to make keeping the internal FB version in sync substantially more complicated (as I cannot just cherry pick the latest changes from open source)

hameerabbasi (Collaborator, author) replied:

> Urrgh, there was a force push to the branch, this is going to make keeping the internal FB version in sync substantially more complicated (as I cannot just cherry pick the latest changes from open source)

I apologize, I only force-pushed the last commit: 25f37f3

ezyang (Contributor) commented Aug 3, 2020

> @ezyang should we update the torch_function entry for https://pytorch.org/blog/pytorch-feature-classification-changes/? What would you say the classification is, Beta?

I need to submit it to the internal classification process. But I would agree it's somewhere between prototype and beta; I think the original __torch_function__ behavior clearly hits the bar for beta, but the new method functionality is much more prototype-y.

ezyang (Contributor) commented Aug 3, 2020


Aug 03 11:59:59 + python -c 'import torch; print(torch.__config__.show())'
Aug 03 11:59:59 Traceback (most recent call last):
Aug 03 11:59:59   File "<string>", line 1, in <module>
Aug 03 11:59:59   File "/opt/conda/lib/python3.8/site-packages/torch/__init__.py", line 526
Aug 03 11:59:59     quantized_gru = torch.ops.aten.quantized_gru

test failures

facebook-github-bot (Contributor) left a comment:

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor) commented Aug 4, 2020

mypy failing now

hameerabbasi (Collaborator, author) commented:
Any further action needed in this PR?

facebook-github-bot (Contributor) left a comment:

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor) commented Aug 5, 2020

just tryin to land it 👍

rgommers (Collaborator) commented Aug 6, 2020

It landed! <does a little dance and crosses fingers>

Thanks @hameerabbasi and @ezyang!

hameerabbasi added a commit to hameerabbasi/pytorch that referenced this pull request Aug 10, 2020
facebook-github-bot pushed a commit that referenced this pull request Aug 12, 2020
Summary:
This is a follow-up PR for #37091, fixing some quirks of that PR, which was landed early to avoid merge conflicts.

This PR addresses the following action items:

- [x] Use error-handling macros instead of a `try`-`catch`.
- [x] Renamed and added comments to clarify the use of `HANDLED_FUNCTIONS_WRAPPERS` in tests. `HANDLED_FUNCTIONS_NAMESPACES` was already removed in the last PR as we had a way to test for methods.

This PR does NOT address the following action item, as it proved to be difficult:

- [ ] Define `__module__`  for whole API.

Single-line reproducer showing why this is hard:

```python
>>> torch.Tensor.grad.__get__.__module__ = "torch.Tensor.grad"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'method-wrapper' object has no attribute '__module__'
```

Explanation: methods defined in C and properties don't always have a `__dict__` attribute or a mutable `__module__` slot for us to modify.
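
As a hedged illustration of that asymmetry (the module name assigned below is just for demonstration): ordinary Python functions expose a writable `__module__`, while C-implemented method-wrappers do not.

```python
import torch

def plain_python_function():
    pass

# Python-level functions expose a writable __module__ attribute.
plain_python_function.__module__ = "torch.example"  # works

# C-implemented descriptors/method-wrappers generally do not, which is why
# blanket assignment of __module__ across the whole API fails.
try:
    torch.Tensor.grad.__get__.__module__ = "torch.Tensor.grad"
except AttributeError as exc:
    print(exc)  # 'method-wrapper' object has no attribute '__module__'
```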

The documentation action items were addressed in the following commit, with the additional future task of adding the rendered RFCs to the documentation: pytorch/rfcs@552ba37

Pull Request resolved: #42806

Reviewed By: smessmer

Differential Revision: D23031501

Pulled By: ezyang

fbshipit-source-id: b781c97f7840b8838ede50a0017b4327f96bc98a
clrpackages pushed a commit to clearlinux-pkgs/pytorch that referenced this pull request Feb 25, 2021
….7.1

AWSNB (1):
      Fix bugs in vec256_float_neon.h (#43321)

Aadesh (1):
      grammatical error fix (#43697)

Aayush Naik (1):
      Implement gcd, lcm (#40651)

Abdelrauf (1):
      Vec256 Test cases (#42685)

Abhinav Garlapati (1):
      Add SNPE deps for caffe2 benchmark android binary

Adam Simpkins (1):
      [caffe2] add type annotations for caffe2.distributed.python

Adam Teichert (1):
      fix issue #31759 (allow valid ASCII python identifiers as dimnames) (#40871)

Adam Thompson (1):
      Add complex tensor dtypes for the __cuda_array_interface__ spec (#42918)

Ailing (1):
      Keep manual_kernel_registration only effective in aten codegen. (#42386)

Ailing Zhang (22):
      Add documentation about storage sharing is preserved and serialized f… (#40412)
      Move install_torchvision to common.sh so that it can be sourced. (#40828)
      Check statstical diff rather than exact match for test_dropout_cuda. (#40883)
      Make resize_ use normal device dispatch (#42240)
      Remove redundant kernels calling TypeDefault in VariableType codegen. (#42031)
      Rename XLAPreAutograd to AutogradXLA. (#43047)
      Fix torch.hub for new zipfile format. (#42333)
      Revert D23335106: [quant][graphmode][fix] Fix insert quant dequant for observers without qparams
      Move Autograd to an alias dispatch key (#43070)
      Add tests against autograd precedence and multiple dispatch. (#44037)
      Expose alias key info in dumpState and update test_dispatch. (#44081)
      Resolve Autograd key for disable_variable_dispatch flag. (#44268)
      Check commutativity for computed dispatch table and add a test to check entries. (#44088)
      Update fallback kernel for Autograd keys. (#44349)
      Revert D23583017: move rebuild buckets from end of first iteration to beginning of second iteration
      Use iterator of DispatchKeySet. (#44682)
      Add alias dispatch key Math. (#44354)
      Support Math keyword in native_functions.yaml. (#44556)
      Align casing in test_dispatch with dispatch keys. (#44933)
      Update true_divide_out to use at::. (#45079)
      Resolve comments in #44354. (#45150)
      Move xla codegen to aten. (#45241)

Akash Patel (1):
      find rccl properly (#42072)

Akihiro Nitta (2):
      Fix exception chaining in `torch/` (#43836)
      Fix exception chaining in `test/` (#44193)

Akshit Khurana (1):
      Add mobile_optimized tag to optimized model. (#45479)

Alban Desmaison (6):
      Revert D22552377: [pytorch][PR] Reland split unsafe version
      fix backward compat (#41810)
      Revert D22790718: [pytorch][PR] Enables torch.full bool and integer type inference
      Revert D23242101: [pytorch][PR] Implement first draft of autograd benchmark.
      Revert D23385090: [quant][graphmode][fx] Add support for weight prepack folding
      Revert D23385091: [quant][graphmode][fx] Add top level APIs

Alex (1):
      fix scripts (#44464)

Alex Borcan (1):
      [BUILD] Guard '#pragma unroll' with COMPILING_FOR_MIN_SIZE

Alex Suhan (18):
      [TensorExpr] Simplify conditional select (#43350)
      [TensorExpr] Add aten::sum lowering to the kernel (#43585)
      [TensorExpr] Make sum available from Python (#43730)
      [TensorExpr] Make KernelSumMultipleAxes much faster (#43905)
      [TensorExpr] Check statements in test_kernel.cpp (#43911)
      [TensorExpr] Remove unused functions in kernel.cpp (#43966)
      Check for index-rank consistency in FunctionInliner (#44561)
      [TensorExpr] Support boolean in simplifier (#44659)
      [TensorExpr] Add log1p support to the LLVM backend (#44839)
      [TensorExpr] Fix order comparisons for unsigned types (#44857)
      [TensorExpr] Add Mod support to the LLVM backend (#44823)
      [TensorExpr] Fix operator order in combineMultilane (#45157)
      [TensorExpr] Remove unused EvalConstExpr function (#45180)
      [TensorExpr] Disallow arithmetic binary operations on Bool (#44677)
      [TensorExpr] When lanes differ, insert Broadcast instead of Cast (#45179)
      [TensorExpr] Fix min and max for integral inputs in CUDA backend (#44984)
      [TensorExpr] Move inner loops vectorization logic to its own method (#45287)
      [TensorExpr] Always inline and DCE in the LLVM backend (#45445)

Alex Şuhan (1):
      Support boolean key in dictionary (#42833)

Alexander (2):
      Fix examples Adaptive avg pooling typo (#40217)
      Sparse softmax support (CUDA) (#42307)

Alexander Golynski (1):
      Add warning on ProcessGroup and ProcessGroup::Work APIs (#46366)

Alexander Grund (9):
      Don't add NCCL dependency to gloo if system NCCL is used (#41180)
      Define PSIMD_SOURCE_DIR when including FP16 (#41233)
      Fix flaky test_stream_event_nogil due to missing event sync (#41398)
      Remove needless test duplication (#41583)
      Replace if(NOT ${var}) by if(NOT var) (#41924)
      Don't run tests with custom arguments with pytest (#41397)
      Remove pybind11 from required submodules (#44278)
      Remove Python version upper boundary check (#46315) (#46388)
      Workaround for bug in DistributedDataParallel (#46385)

Alexandru Suhan (1):
      [NNC] Add loop unroll transformation (#42465)

Aliaksandr Ivanou (1):
      Use python 3.8 in pytorch docker image (#45466)

Alphons Jaimon (1):
      Grammar patch 1 (.md) (#41599)

Alvaro (3):
      Add Unflatten Module (#41564)
      Fix docstring in Unflatten (#41835)
      Amend docstring and add test for Flatten module (#42084)

Alyssa Wang (1):
      Export logic op to pytorch

Andres Suarez (1):
      [fbs][2/n] Remove .python3 markers

Andrew Gallagher (1):
      [caffe2/aten] Fix clang build (#44934)

Andrew Jones (1):
      Improves type-checking guards. (#43339)

Ann Shan (20):
      check for unsupported instructions when exporting mobile models (#40791)
      list workaround for CREATE_OBJECT failure (#41129)
      Add operators for smart keyboard to lite interpreter (#41539)
      add named parameters to mobile module (#41376)
      implement lite parameter serializer (#41403)
      Refactor lite serializer dependencies from full jit (#42127)
      refactor save_data as non member function (#42045)
      Implement a light SGD optimizer (#42137)
      Fix lite trainer unit test submodule registration (#42714)
      add training mode to mobile::Module (#42880)
      refactor _save_parameters to _save_data (#43162)
      add _save_parameters to serialize map (#43163)
      [pytorch] add flag for autograd ops to mobile builds (#43154)
      Add lite SequentialSampler to torch mobile (#43299)
      [pytorch] add option to include autograd for code analyzer (#43155)
      [pytorch] Make mobile find_method return an optional (#43965)
      [pytorch] remove code analyzer build folder between builds (#44148)
      [pytorch] Add logging to mobile Method run (#44234)
      [pytorch] Replace mobile run_method with get_method and operator() (#44202)
      [pytorch] Remove mobile nonvariadic run_method (#44235)

Anthony Scopatz (4):
      Nightly checkout tool (#42635)
      Nightly Pull (#43294)
      Fix ToC Link (#43427)
      nightly robustness fixes for linking across devices (#43771)

Anthony Shoumikhin (1):
      [papaya][aten] Fix compiler error: loop variable 'tensor' is always a copy because the range of type 'c10::List<at::Tensor>' does not return a reference. (#40599)

Antonio Cuni (2):
      Fix a broken link in CONTRIBUTING.md (#44701)
      Missing tests about torch.xxx(out=...) (#44465)

Anurag Gupta (1):
      Op to create quant scheme blob (#40760)

Anush Elangovan (1):
      [cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387)

Ashish Farmer (1):
      Performance fix for torch.cat operator on ROCm (#46097) (#46323)

Ashish Shenoy (1):
      [dper3] replace LengthsGather lowlevel module's PT implemetnatio to use caffe2 op

Ashkan Aliabadi (27):
      Unify PyTorch mobile's threadpool usage. (#37243)
      Enable XNNPACK ops on iOS and macOS.
      Update psimd to psimd:072586a71b55b7f8c584153d223e95687148a900. (#40522)
      Update FXdiv to FXdiv:b408327ac2a15ec3e43352421954f5b1967701d1. (#40520)
      Update cpuinfo to cpuinfo:63b254577ed77a8004a9be6ac707f3dccc4e1fd9. (#40516)
      Update FP16 to FP16:4dfe081cf6bcd15db339cf2680b9281b8451eeb3. (#40526)
      Respect user set thread count. (#40707)
      Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524)
      Fix memory leak in XNNPACK/MaxPool2D. (#41874)
      Disable validation layers in non-debug builds. (#42122)
      Add Vulkan Test to ATen Mobile Tests. (#42123)
      Add missing header guards. (#42272)
      Const-correctness, variable initialization, and error checking. (#42124)
      Fix ASAN error in QNNPACK's integration of qlinear_dynamic. (#41967)
      Search on system path for Vulkan headers and libraries as a last resort. (#43301)
      Refactor Vulkan context into its own files. Use RAII. (#42273)
      Revert "Revert D23252335: Refactor Vulkan context into its own files. Use RAII." (#43628)
      Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503)
      Generic Vulkan object cache. (#42394)
      Vulkan (source and binary) shader and shader layout cache. (#42325)
      Vulkan memory allocator. (#42786)
      Vulkan pipeline and pipeline layout cache. (#42395)
      Vulkan descriptor and descriptor layout cache. (#42642)
      Vulkan resource cache. (#42709)
      Vulkan command buffer and pool. (#42930)
      Minor touchups. (#44317)
      Add architectural support for multi-GPU. (#44059)

Basil Hosmer (13):
      Improved coverage for unboxed->boxed kernel wrappers (#38999)
      add Dimname support to IValue (#42054)
      handle multiple returns properly in boxing wrappers (#42437)
      add Quantizer support to IValue (#42438)
      suppress all Autograd keys in AutoNonVariableTypeMode (#42610)
      update DispatchKey::toString() (#42619)
      Include/ExcludeDispatchKeySetGuard API (#42658)
      format for readability (#42851)
      avoid redundant isCustomClassRegistered() checks (#42852)
      add support for optional int list with scalar fill (#43262)
      centralize autograd dispatch key set (#43387)
      pull empty() out of use_c10_dispatcher: full (#43572)
      [wip] fast typeMeta/ScalarType conversion approach 2 (#44965)

Bert Maher (25):
      [tensorexpr][trivial] Remove debug printing from test (#41806)
      Environment variable for controlling type verbosity in debug output (#41906)
      Add documentation for PYTORCH_JIT_TYPE_VERBOSITY (#42241)
      Print TE CUDA kernel (#42692)
      Speed up CUDA kernel launch when block/thread extents are statically known (#42899)
      Fix TE microbenchmark harness to use appropriate fuser/executor (#42900)
      Add a microbenchmark for LSTM elementwise portion (#42901)
      Add executor and fuser options to the fastrnn test fixture (#42946)
      [tensorexpr] Fix promotion of booleans (#43097)
      Fix NaN propagation in fuser's min/max implementation (#43590)
      Remove unnamed namespace in headers (#43689)
      Fix NaN propagation in TE fuser's min/max implementation (#43609)
      Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
      Test TE fuser unary ops and fix sigmoid(half) (#44094)
      [te] Disable reductions by default (#44122)
      Dump optimized graph when logging in already-optimized PE (#44315)
      [te] Fix casting of unsigned char, and abs(int) (#44157)
      Prevent the TE fuser from getting datatypes it can't handle (#44160)
      Fix frac in CUDA fuser (#44152)
      Fix bug simplifying if-then-else when it can be removed (#44462)
      [te] Disable flaky test CudaSharedMemReduce_1 (#44862)
      [pytorch][tensorexpr] Make gtest-style macros in tests match actual gtest signatures (#44861)
      Failing test demonstrating problems with mixed output shapes (#44455)
      Add env variable to bypass CUDACachingAllocator for debugging (#45294)
      Tensor-expression fuser bugfixes for 1.7.1 (#48137)

Bowen Bao (1):
      [ONNX] Add dim_param support in export with onnx shape inference (#44920) (#45755)

BowenBao (12):
      [ONNX] Export torch.eye to ONNX::EyeLike (#41357)
      [ONNX] Export static as_strided (#41569)
      [ONNX] Refactor ONNX fixup for Loop and If (#40943)
      [ONNX] Enable lower_tuple pass for custom layer (#41548)
      [ONNX] Add preprocess pass for onnx export (#41832)
      [ONNX] Fix scalar type cast for comparison ops (#37787)
      [ONNX] Add support for operator `add` between tensor list (#41888)
      [ONNX] Export split_to_sequence as slice when output number is static (#42744)
      [ONNX] Utilize ONNX shape inference for ONNX exporter (#40628)
      [ONNX] Update ONNX shape inference (#43929)
      [ONNX] Enable true_divide scripting export with ONNX shape inference (#43991)
      [ONNX] Update div export to perform true divide (#44831)

Bradley Davis (2):
      update tests to run back-compat check using new binary (#41949)
      Remove expensive call to PyObject_GetAttrString in PyTorch_LookupSpecial (#44684)

Bram Wasti (12):
      [jit] Scaffold a static runtime (#42753)
      [tensorexpr] Autograd for testing (#42548)
      [jit][static runtime] Simplify the graph and add operator whitelist (#43024)
      [Static Runtime] Add OSS build for static runtime benchmarks (#43881)
      Allow no-bias MKLDNN Linear call (#43703)
      [tensorexpr] Alias analysis tests (#44110)
      [static runtime] Swap to out-variant compatible nodes (#44127)
      [tensorexpr] Add flag to fuse with unknown shapes (#44401)
      [static runtime] Add _out variants and reuse memory (#44128)
      Add Deep and wide to test and flatten/tranpose for good measure (#44129)
      [static runtime] Remove ops in static from backwards compatibility checks (#45354)
      [static runtime] Split out graph preparation from runtime (#44131)

Brandon Lin (5):
      [gloo] change ProcessGroupGlooAsyncTest to use gtest (#42313)
      [dper3] Export Caffe2 operator LearningRate to PyTorch
      [dper3] Export PackSegments and UnpackSegments to Pytorch
      [dper3] Create dper LearningRate low-level module
      [dper3] Create dper LearningRate low-level module (#44639)

Brian Hirsh (5):
      renaming TestDdpCommHook class so it doesn't get picked up as a test by pytest (#44905)
      adding a test for ddp save()/load() (#44906)
      Byte-for-byte compatibility fixes in codegen (#44879)
      adding a beta parameter to the smooth_l1 loss fn (#44433)
      Cherrypick smooth l1 loss fixes (#45759)

Brian Johnson (2):
      Update index.rst (#46324)
      Brianjo release feature status (#46892)

Brian Vaughan (3):
      Revert D22396896: [pytorch][PR] run single-threaded gradgradcheck in test_nn
      Revert D22418731: [JIT] Add out-of-source-tree to_backend tests
      Revert D22418716: [JIT] Add support for backend-lowered submodules

Bugra Akyildiz (3):
      Remove Incorrect Comment in tools/build_libtorch and remove Python2 support in the module import (#44888)
      Directly use work.result() to retrieve tensor rather than passing as a separate argument (#44914)
      Remove __future__ imports for legacy Python2 supports (#45033)

Caleb Thomas (1):
      Add iterator like functionality for DispatchKeySet (#44066)

Changji Shi (1):
      Port /test/cpp_extensions/rng_extension.cpp to new operator registration API (#39459)

Cheng Chang (2):
      [NNC] Make it able to normalize loop with variable start (#44133)
      [NNC] Add loop slicing transforms (#43854)

Chris Huynh (1):
      To fix extra memory allocation when using circular padding (#39273)

Christian Puhrsch (1):
      tuple_map / tuple_concat (#42326)

Christian Sarofeen (2):
      [nvFuser] Working towards reductions, codegen improvements (#40864)
      [NVFuser] Enable E2E BCast-PWise-Reduction fusions (#43129)

Christopher Whelan (2):
      [PyFI] Update hypothesis and switch from tp2 (#41645)
      [hypothesis] Deadline followup (#42842)

Chunli Fu (4):
      [Shape Inference] Fix InferFC
      [blob reorder] Seperate user embeddings and ad embeddings in large model loading script
      [DPER3] Separate user embeddings and ad embeddings in blob reorder
      [DPER3] AOT integration

Cloud Han (2):
      [jit] Fix jit not round to even if const is folded (#40897)
      update CONTRIBUTING.md for ccache (#41619)

Colin L Reliability Rice (4):
      Create lazy_dyndeps to avoid caffe2 import costs. (#39488)
      Create lazy_dyndeps to avoid caffe2 import costs. (#41343)
      Modify lazy_dyndep loading to trigger inside workspace. (#41687)
      Partly fix cuda builds of dper broken by caffe2 c++

Daiki Katsuragawa (1):
      Document formatting (#42065)

Daily, Jeff (1):
      install ATen/native/cuda and hip headers (#45097)

Daiming Yang (2):
      RandomSampler generates samples one at a time when replacement=True (#40026)
      Patch for #40026 RandomSampler generates samples one at a time when replacement=True (#41682)

Daniel van Strien (1):
      Update cuda init docstring to improve clarity (#42923)

Danning XIE (1):
      fix `torch.jit.trace_module` documentation (#40248)

Danny Huang (5):
      [caffe2] exposes Net cancellation through pybind state (#44043)
      [caffe2] adds Cancel to OperatorBase and NetBase (#44145)
      [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#44495)
      [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#45177)
      [caffe2] adds hypothesis test for queue ops cancel (#45178)

Danqi Huang (1):
      log message at per-test level for`perfpipe_pytorch_test_times` (#43752)

Darius Tan (3):
      [quant] Quantized Average Pool Refactoring (#42009)
      BAND, BOR and BXOR for NCCL (all_)reduce should throw runtime errors (#42669)
      Check if input is ChannelsLast or ChannelsLast3d for quantized AdaptivePool3d. (#42780)

David Reiss (17):
      Re-apply PyTorch pthreadpool changes
      Use CPU Allocator for reading from zip container
      Add channels-last support to bundled_inputs (#36764)
      Add a utility function for bundling large input tensors (#37055)
      Fix and reenable threaded QNNPACK linear (#40587)
      Fix batch size zero for QNNPACK linear_dynamic (#40588)
      In interpolate, use if instead of elif (#37171)
      In interpolate, move exceptional cases to the bottom (#37172)
      In interpolate, inline the call to _interp_output_size (#37173)
      Add support for int[]? arguments in native_functions.yaml (#37174)
      Add support for float[]? arguments in native_functions.yaml (#37175)
      Add interpolate-style overloads to aten::upsample* ops (#37176)
      Trim trailing whitespace
      Remove proprietary notices
      Update quantize_jit to handle new upsample overloads (#43407)
      Add nondeterministic check to new upsample overloads
      Update interpolate to use new upsample overloads (#43025)

Daya Khudia (3):
      [fbgemm] manual submodule update (#44082)
      [caffe2] Replace embedding conversion ops with fbgemm functions (#44843)
      [aten] Call fbgemm functions for embedding prepack/unpack (#44845)

Deepak Velmurugan (1):
      Black to Block for various files (#42913)

DeepakVelmurugan (3):
      Easier english updated tech docs (#42016)
      BlackList to BlockList (#42279)
      Blacklist to Blocklist in onnxifi_transformer (#42590)

Dhruv Matani (1):
      [RFC] Remove per-op-registration related code in caffe2/tools/codegen/gen.py (#45134)

Dianshi Li (2):
      [PT Model Split] Support 2 operators in PT by C2 conversion (#45231)
      Resend diff D23858329 (#45315)

Diego M. Rodriguez (1):
      Add __all__ to torch/_C/_VariableFunctions.pyi (#40499)

Dinesh Govindaraj (1):
      Shape inference for SparseToDense in ExpertCombiner

Dmytro Dzhulgakov (7):
      [easy] Use torch.typename in JIT error messages (#41024)
      [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)
      [jit] PyTorchStreamReader::getAllRecord should omit archive name prefix (#43317)
      [serialize] Expose zip file alignment calculation functions (#43531)
      [torch.fx] Pass placeholders through delegate too (#43432)
      Make ExtraFilesMap return bytes instead of str (#43241)
      [jit] Speed up saving in case of many classes (#44589)

Dongxin Liu (2):
      Mish Activation Function (#40856)
      Make Mish support large inputs. (#43037)

Donny Greenberg (1):
      Fix Broken Link in CONTRIBUTING.md (#41066)

Edgar Andrés Margffoy Tuay (1):
      Add regression test for ONNX exports of modules that embed an Embedding layer inside a Sequential (#32598)

Edmund Williams Jr (3):
      cross_layer_equalization (#41685)
      Added Prehook option to prepare method (#41863)
      Bias Correction Implementation (#41845)

Edson Romero (3):
      Exposing Percentile Caffe2 Operator in PyTorch
      Export BatchBucketOneHot Caffe2 Operator to PyTorch
      Export MergeIdLists Caffe2 Operator to PyTorch

Edward Leardi (2):
      Fix HTTP links in documentation to HTTPS (#40878)
      Fix several quantization documentation typos (#40567)

Edward Yang (27):
      Add some syntax sugar for when backends use the same function. (#40182)
      Delete requires_tensor (#40184)
      Generalize Python dispatcher testing API; disallow overwriting fallback (#40469)
      Precompute entries in dispatch tables (#40512)
      Pin torchvision version for doc_push (#40802)
      Fix bug where explicitly providing a namespace never worked. (#40830)
      If ninja is being used, force build_ext to run. (#40837)
      Revert D22418756: [pytorch][PR] Migrate addmm, addbmm and THBlas_gemm to ATen
      Fix a number of deprecation warnings (#40179)
      Upgrade cpp docs Sphinx/breathe/exhale to latest version (#41312)
      Add reference documentation for torch/library.h (#41470)
      Remove dead named_tensors_unsupported_error definitions. (#42171)
      Fix minor typo in comment (#42184)
      Revert D22812445: Update TensorPipe submodule
      Add missing space after -> for topk.values (#42321)
      Add strict mypy type checking and update code_template.py (#42322)
      Delete dead is_named_tensor_only (#42672)
      Fix some mistakes in native_functions.yaml (#43156)
      Add dataclasses to base Docker images. (#43217)
      Make _compute_linear_combination.out a true out function (#43272)
      Update hardcoded pytorch_android_gradle_custom_build_single hash (#43340)
      Reimplement per-operator selective build (#39401)
      Rewrite of ATen code generator (#42629)
      Don't register a fallback for private use to let extensions do it themselves (#44149)
      Add TORCH_SELECTIVE_NAME to AMP definitions (#44711)
      Vectorize complex copy. (#44722)
      Make cudaHostRegister actually useful on cudart. (#45159)

Ehsan K. Ardestani (2):
      NVMified NE Eval
      Remove excessive logging in plan_executor (#42888)

Eileen Pan (3):
      [1/n] Allow dense NaN value in dper raw input processor output
      [2/n][Compute Meta] support analysis for null flag features
      [2/n][Compute Meta] support analysis for null flag features

Eli Uriegas (48):
      Fix backup solution (#40515)
      Bump nightlies to 1.7.0 (#40519)
      .circleci: Remove executor from windows uploads (#40742)
      .circleci: Build docker images as part of CI workflow (#40827)
      .circleci: Output binary sizes, store binaries (#41074)
      bump docker version to more recent tag (#41105)
      .circleci: Fix job-specs-custom docker tag (#41111)
      .circleci: Remove pynightly jobs
      Update ShipIt sync
      test: Add option to continue testing through error (#41136)
      .cirlceci: Setup nvidia runtime for cu as well (#41268)
      .circleci: Explicitly remove nvidia apt repos (#41367)
      .circleci: Re-split postnightly into its own thing (#41354)
      .circleci: Prefix docker jobs with docker- (#41689)
      .circleci: Remove docker_hub_index_job, wasn't used (#41800)
      .circleci: Separate out docs build from push (#41871)
      .circleci: Make sure to install expect for docs push (#41964)
      .circleci: Prefer netrc for docs push (#42136)
      Revert "Conda build (#38796)" (#42472)
      ecr_gc: Iterate through all tags, reduce prints (#42492)
      Revert "Revert D22360735: .circleci: Build docker images as part of C… (#40950)
      .circleci: Have python docs always push to site (#42552)
      test: Disable test_strided_grad_layout on ROCM (#42561)
      .circleci: Hardcode rocm image to previous tag (#42603)
      .circleci: Only do comparisons when available (#42816)
      .circleci: Copy LLVM from pre-built image (#43038)
      .circleci: Simplify binary upload process (#43159)
      .circleci: Don't quote glob for conda upload (#43297)
      .circleci: Remove manual docker installation (#43277)
      .circleci: Use dynamic docker image for android (#43356)
      .circleci: Prefer using env-file for docker run (#43293)
      .jenkins: Remove openssh installs (#43597)
      .circleci: Add CUDA 11 to nightly binary builds (#43366)
      .circleci: Add slash to end of s3 cp (#43792)
      .circleci: Remove un-needed steps from binary builds (#43974)
      ci: Add anaconda pruning to CI pipeline (#44651)
      .circleci: Switch to dynamic MAX_JOBS (#44729)
      .circleci: Upgrade all xcode 9 workers to xcode 11 (#45153)
      docker: Add torchelastic to docker image (#45438)
      Update target determinator to point to release/1.7
      [release/1.7] .circleci: Reintroduce torchvision to docs builds (#46882)
      [release/1.7] .jenkins: Bump torchvision commit (#46933)
      [v1.7.1] Add Python 3.9 support (linux / macOS) (#48133)
      [v1.7.1] Enable Python 3.9 for Windows builds (#48218)
      [v1.7.1] Various setup.py fixes (#48220)
      [v1.7.1] third_party: Update pybind to point to fork (#48312)
      [1.7.1] torch: Stop using _nt_quote_args from distutils (#48618) (#48768)
      [v.1.7.x] Use local env for building CUDA extensions on Windows (#47150) (#48937)

Elias Ellison (45):
      Fork/Join Inline Docs (relanding) (#40438)
      [JIT] freeze doc (#40409)
      [JIT] script if tracing fix (#40468)
      [JIT] fix unfold shape analysis (#40749)
      shape analysis fix for default dtype' (#40938)
      fix grad thrashing of shape analysis (#40939)
      [JIT][Easy]move remove mutation to own file (#41137)
      [JIT] make fastrnns runnable on cpu (#41483)
      [JIT] move remove mutation to its own test file (#41502)
      [JIT] handle specially mapped ops (#41503)
      [JIT] dont count constants in subgraph size (#41436)
      [JIT] optimize autodiff subgraph slicing (#41437)
      [JIT] Don't re run CSE on every block (#41479)
      [JIT] Dont include view ops in autodiff graphs (#42027)
      import freeze (#42319)
      refactor canonical ordering to also be able to do isAfter checks (#42140)
      [JIT] Make create autodiff subgraphs do in place updates to aliasDb (#42141)
      [JIT] Represent profiled types as a node attribute (#43035)
      Add API for unexecuted op (#43629)
      Refactor pass to class (#43630)
      refactor tests (#43631)
      Add undefined specializations in backward (#43632)
      Specialize optionals for grad_sum_to_size (#43633)
      [JIT] Disable broken tests (#43750)
      Update requires grad property (#43634)
      Use prim::TensorExprGroup interned symbol (#43635)
      Add passes to profiling executor pipeline (#43636)
      use types in the IR instead of vmap (#43742)
      Update aliasing in tensorexpr fuser (#43743)
      [JIT] Always map node output in vmap (#43988)
      [JIT] Fuser match on schemas not node kind (#44083)
      [TensorExpr fuser] Guard nodes that have tensor output properties determined by non-tensor inputs (#44137)
      [JIT] Remove references to no longer generated _tanh_backward and _sigmoid_backward (#44138)
      fix lint (#44346)
      Revert D23568330: [pytorch][PR] Moves some of TestTorchMathOps to OpInfos
      Improving ModuleList indexing error msg (#43361)
      [JIT] Erase shapes before fallback graph (#44434)
      [JIT] Remove profiling nodes in autodiff forward graph (#44420)
      [JIT] dont optimize device dtype on inline (#43363)
      [JIT] Dont optimize shape info in batch_mm (#44565)
      [JIT] Fix torch.tensor for empty multidimensional-typed lists (#44652)
      Fix fallback graph in specialize autogradzero (#44654)
      [JIT] improve alias analysis for list constructs (#39111)
      Refactor subgraph merging (#44238)
      [JIT] Regularize tensorexpr fuser strategy with other fusers (#44972)

Emilio Castillo (1):
      Reset `DataLoader` workers instead of creating new ones (#35795)

Eric Cotner (1):
      fix typo "normal" -> "Cauchy" (#40334)

Facebook Community Bot (13):
      Automated submodule update: FBGEMM (#40332)
      Automated submodule update: FBGEMM (#41814)
      Automated submodule update: FBGEMM (#42205)
      Automated submodule update: FBGEMM (#42302)
      Automated submodule update: FBGEMM (#42496)
      Automated submodule update: FBGEMM (#42584)
      Automated submodule update: FBGEMM (#42713)
      Automated submodule update: FBGEMM (#42781)
      Automated submodule update: FBGEMM (#42834)
      Automated submodule update: FBGEMM (#43251)
      Automated submodule update: FBGEMM (#44177)
      Automated submodule update: FBGEMM (#44581)
      Automated submodule update: FBGEMM (#44647)

Fang Zhang (1):
      change self.generator to generator (#44461)

Gang Shen (1):
      Expose the interface of nesterov of SGD Optimizer from caffe2 to dper

Gao, Xiang (20):
      Add CUDA11 build and test (#40452)
      [JIT] Fix typing.Final for python 3.8 (#39568)
      Skip SVD tests when no lapack (#43566)
      Add amax/amin (#43092)
      Document the beta=0 behavior of BLAS functions (#43823)
      #include <string> in loopnest.h (#43835)
      addmm/addmv should accept complex alpha and beta (#43827)
      Enable TF32 support for cuDNN (#40737)
      Remove useless py2 compatibility import __future__, part 1 (#43808)
      Delete THCStream.cpp (#43733)
      Fix THPVariable_float_scalar (#43842)
      Further expand coverage of addmm/addmv, fix 0 stride (#43980)
      Cleanup workarounds for compiler bug of ROCm (#44579)
      CUDA BFloat activations 1 (#44834)
      Enable bfloat16 random kernels on Windows (#44918)
      CUDA BFloat16 addmm, addmv (#44986)
      CUDA BFloat16 losses (#45011)
      Adjust TF32 tests (#44240)
      CUDA BFloat16 neg (#45240)
      Workaround for cublas bug for 45724 (#46001) (#46042)

Garret Catron (1):
      Create experimental FX graph manipulation library (#44775)

Gaurav Subedi (1):
      change 2 instances of blacklist to blocklist in tools/pyi/gen_pyi.py (#41979)

George Guanheng Zhang (2):
      Revert D23299452: [pytorch][PR] fix typo in test_dataloader test_multiprocessing_contexts
      Revert D23379383: Land `code_coverage_tool` to `caffe2/tools` folder

Giuseppe Ottaviano (1):
      [caffe2] Speed up compilation of aten-op.cc (#44440)

Gregory Chanan (27):
      Revert "port masked_select from TH to ATen and optimize perf on CPU (#33269)" (#41828)
      Delete accidentally committed file errors.txt. (#43164)
      Kill unused _pointwise_loss function. (#43523)
      Properly check that reduction strings are valid for l1_loss, smoothl1_loss, and mse_loss. (#43527)
      Add reduction string test for ctc_loss. (#43884)
      Use NewCriterionTest in test_cpp_api_parity.py. (#43954)
      Kill dead code in common_nn as part of merging Criterion and NewCriterionTests. (#43956)
      Actually run backward criterion tests. (#44030)
      Allow criterion backwards test on modules requiring extra args (i.e. CTCLoss). (#44050)
      Merge CriterionTest into NewCriterionTest. (#44055)
      Rename NewCriterionTest to CriterionTest. (#44056)
      For CriterionTests, have check_gradgrad actually only affect gradgrad checks. (#44060)
      Stop ignoring NotImplementedErrors in cuda CriterionTests. (#44381)
      Combine criterion and new criterion tests in test_jit. (#43958)
      Merge criterion_tests and new_criterion_tests. (#44398)
      Fix MSELoss when target.requires_grad is True. (#44437)
      Fix L1Loss when target.requires_grad is True. (#44471)
      Fix SmoothL1Loss when target.requires_grad is True. (#44486)
      Simplify target handling in nn gradcheck. (#44507)
      Always use NewModuleTest instead of ModuleTest. (#44745)
      Stop ignoring errors in cuda nn module tests. (#44783)
      Stop using check_criterion_jacobian. (#44786)
      Turn on gradgrad check for BCELoss Criterion Tests. (#44894)
      Remove convert_target from NN tests. (#45291)
      Remove CriterionTest.test_cuda code for dtype None. (#45316)
      Stop running clang-tidy on torch/csrc/generic/*.cpp. (#46335)
      [v1.7] Fix backward compatibility test by moving dates forward.

Guilherme Leobas (7):
      Add typing annotations to hub.py and _jit_internal.py (#42252)
      Add typing annotations for torch.nn.quantized.dynamic.modules.rnn (#43186)
      add typing annotations for a few torch.utils.* modules (#43806)
      Add type annotations for torch.nn.utils.* (#43080)
      Add typing annotations for torch.utils.data.* modules (#44136)
      Annotate torch.utils.(tensorboard/show_pickle/hypify) (#44216)
      Enable type-checking of torch.nn.quantized.* modules (#43110)

HC Zhu (1):
      [caffe2] Fix spatial_batch_norm_op dividision-by-zero crash (#40806)

Haixin Liu (5):
      Remove print (#40475)
      [PyTorch Numeric Suite] Remove unnecessary Logger in input arguments (#40890)
      Remove unused Logger in get_matching_activations (#41023)
      Move qconfig removal into convert() (#41930)
      Add operator to compute the equalization scale (#45096)

Hameer Abbasi (8):
      Add __torch_function__ for methods (#37091)
      Follow-up for pytorch/pytorch#37091. (#42806)
      Add alias torch.fix for torch.trunc to be compatible with NumPy. (#43326)
      Add alias torch.negative to torch.neg. (#43400)
      Allow Tensor-likes in torch.autograd.gradcheck (#43877)
      Fix documentation to point to torch.overrides instead of _overrides. (#47843)
      Fix incorrect signatures in get_testing_overrides for 1.7 release (#47736)
      Fix output type of torch.max for Tensor subclasses. (#47735)

Hao Lu (13):
      [caffe2] Reimplement RemoveOpsByType with SSA (#40649)
      [TVM] Fix build and sync with caffe2/caffe2/python/dlpack.h (#40888)
      [caffe2] Revert D22220798 (#41302)
      [caffe2][redo] Reimplement RemoveOpsByType with SSA (#41606)
      [BugFix] Fix bug in onnx::SsaRewrite (#42148)
      [caffe2] Special handling of If/AsyncIf op in RemoveOpsByType (#42286)
      [jit] DeepAndWide benchmark (#43096)
      [jit][static] Replace deepcopy with copy (#43182)
      [jit][static] Basic executor (#43647)
      [TVM] Support slice op (#43969)
      [TVM] Support fp16 weights in c2_frontend (#44070)
      [TVM] Support Fused8BitRowwiseQuantizedToFloat op (#44098)
      [caffe2] Support data types in shape hints (#45110)

Hao Wu (1):
      onnx export of fake quantize functions (#39738)

Haoran Li (1):
      Back out "Make grad point to bucket buffer in DDP to save memory usage" (#43557)

Hector Yuen (25):
      fix range of results for pairwise operations (#40728)
      add first implementation of swish (#41085)
      match int8 quantization of nnpi (#41094)
      make Int8 FC bias quantization use round flush to infinity
      fix the range of the random weights used in the int8fc test (#41303)
      reduce logging for layernorm (#41305)
      update operators in the mapping to fp16 emulation
      fix include file path in unary ops
      remove template arguments of layernorm
      vectorize rounding ops (#41439)
      resurrect single quantization op test (#41476)
      fix quantization mechanism to match nnpi (#41494)
      fix dequantization to match nnpi (#41505)
      integrate int8 swish with net transformer
      add net transforms for fusion (#42763)
      remove deadline enforcement for hypothesis (#42871)
      fix int8 FC (#42691)
      make deadline=None for all numerics tests (#43014)
      add fake fp16 fusions to net transforms (#42927)
      default ice-ref to c-step (#4812)
      match batchmatmul on 1.0.0.6 (#43559)
      add tanh + quantize unit test (#44076)
      handle the case of -0.0 on tanh quantization (#44406)
      fuse layernorm + quantize (#44232)
      adjust shape inference in sls tests (#44936)

Heitor Schueroff de Souza (15):
      Added SiLU activation function (#41034)
      Initial implementation of quantile operator (#39417)
      retain undefined tensors in backward pass (#41490)
      Revert D22525217: [pytorch][PR] Initial implementation of quantile operator
      Don't materialize output grads (#41821)
      Added torch::cuda::manual_seed(_all) to mirror torch.cuda.manual_seed(_all) (#42638)
      Initial quantile operator implementation (#42755)
      Implemented non-named version of unflatten (#42563)
      Implemented torch::nn::Unflatten in libtorch (#42613)
      MaxPool1d without indices optimization (#43745)
      Fix lerp.cu bug when given discontiguous out tensor (#44559)
      Update median doc to note return value of even-sized input (#44562)
      Fixed quantile nan propagation and implemented nanquantile (#44393)
      Fixed handling of nan for evenly_distribute_backward (#45280)
      Reorganized Sorting.cpp method order (#45083)

Himangshu (3):
      Change from self to self.class() in _DecoratorManager to ensure a new object is every time a function is called recursively (#44633)
      added check for NumberType (#44375)
      Add check for Complex Type to allow non integral alpha. (#45200)

Ho Young Jhoo (1):
      Change function parameter `self` to `input` in torch.__init__.pyi (#40235)

Hong Xu (36):
      Report error when ATEN_THEADING is OMP and USE_OPENMP is turned off. (#40146)
      Let exp support complex types on CUDA and enable device/dtype in complex tests (#39087)
      Skip some error-producing exp tests that cannot be reliably reproduced (#40824)
      Remove more error-exposing tests in exp that cannot be reliably reproduced (#40825)
      Restore the contiguity preprocessing of linspace (#41286)
      Remove two "return"s that return "void" (#41811)
      Clarify Python 3.5 is the minimum supported version in the installation section. (#41937)
      Make fmod work with zero divisors consistently (#41948)
      Remove unused variable "schema" (#42245)
      Let bfloat16 support promotion with other types (#41698)
      Remove 4 unused variables in lp_pool_op.cc (#42329)
      torch.gcd: Do not use std::abs() because it does not have an unsigned integer overload (#42254)
      Let TensorIterator::nullary_op support check_mem_overlap option (#38693)
      Vectorize arange (#38697)
      Correct the type of some floating point literals in calc_digamma (#42846)
      Test the type promotion between every two dtypes thoroughly (#42585)
      Remove unused variable vecVecStartIdx (#42257)
      Replace all AT_ASSERTM under ATen CPU kernels. (#41876)
      Replace all AT_ASSERTM under ATen CUDA kernels. (#42989)
      Remove erroneous trailing backslashes (#43318)
      Do not define the macro "isnan" (#43242)
      Don't proceed into setup.py too far if Python version is unsupported (#42870)
      Let linspace support bfloat16 and complex dtypes (#43578)
      Remained changes of #43578 (#43921)
      Update torch.range warning message regarding the removal version number (#43569)
      is_numpy_scalar should also consider bool and complex types (#43644)
      Remove THC max and min, which are longer used (#43903)
      Remove many unused THC pointwise math operators (#44230)
      Let logspace support bfloat16 on both CPU and CUDA (#44675)
      For logical tests, use the dtypes decorator (#42483)
      Vectorize int8_t on CPU (#44759)
      Support BFloat16 for binary logical operators on CUDA (#42485)
      Add complex number support for binary logical operators (#43174)
      Vectorize bitwise_not (#45103)
      Support bfloat16 and complex dtypes for logical_not (#43537)
      Remove unnecessary __at_align32__ in int_elementwise_binary_256 (#45470)

Hongfei XU (1):
      Support AMP in nn.parallel (#43102)

Hongyi Jia (6):
      [Gloo] update gloo submodule for PyTorch (#41462)
      [Gloo] alltoall to ProcessGroupGloo (#41424)
      GLOO process group GPU alltoall (#41690)
      [c10d] Template computeLengthsAndOffsets() (#42706)
      [GLOO] handle empty split size (#43256)
      [PyTorch/NCCL] Fix async error handling (#45456)

Hongzheng Shi (1):
      [GradualGating] support better k value change (#41557)

Huamin Li (2):
      check in intel nnpi 1007 into fbcode/tp2
      skip test_tanhquantize for now (#44312)

Igor Sugak (1):
      [caffe2] fix clang build

Ilia Cherniavskii (8):
      Adjust CUDA memory leak test (#40504)
      [rfc] Reduce number of coin flips in RecordFunction (#40758)
      Benchmark RecordFunction overhead on some models (#40952)
      RecordFunction in Dispatcher (#37587)
      Remove ProfiledType (#42570)
      Fix sequence numbers in profiler output (#42565)
      Coalesce TLS accesses in RecordFunction constructor (#44970)
      Source code level attribution in profiler (#43898)

Iurii Zdebskyi (10):
      Add _foreach_add_(TensorList tensors, Scalar scalar) API (#42531)
      Add _foreach_add(TensorList tl1, TensorList tl2) and _foreach_add_(TensorList tl1, TensorList tl2) APIs (#42533)
      Add binary ops for _foreach APIs (#42536)
      Add unary ops: exp and sqrt (#42537)
      Added alpha overloads for add/sub ops with lists (#43413)
      Enable binary ops with Scalar Lists with for foreach APIs (#45298)
      Added optimizers based on multi tensor apply (#45299)
      [RELAND] Added optimizers based on multi tensor apply (#45408)
      Add more tests for mt optimizers (#45475)
      Disable multi tensor tesnor tests on rocm (#45535)

Ivan Kobzarev (29):
      [android][ci] Fix CI packaging headers to aar (#40442)
      [android][readme] Aar native linking add fbjni (#40578)
      [vulkan] Shaders caching (#39384)
      [vulkan] adaptive_avg_pool2d (#41220)
      [vulkan] mm op through addmm (#41221)
      [vulkan] support add for dim < 4 (#41222)
      [vulkan] reshape op (#41223)
      [vulkan][asan] Fix Invalid Memory ops (#41224)
      [vulkan] max_pool2d (#41379)
      [vulkan] VulkanTensor lazy buffer allocation (#42569)
      [vulkan] Ops registration to TORCH_LIBRARY_IMPL (#42194)
      [vulkan] Fix warnings: static_cast, remove unused (#42195)
      [vulkan] inplace add_, relu_ (#41380)
      [vulkan] cat op (concatenate) (#41434)
      [pytorch] BUCK build for Vulkan backend
      [vulkan] fix invalid memory op and tests (#43312)
      [vulkan][ci] Vulkan tests running on linux build via swiftshader (added to docker) (#42614)
      [android][jni] Support Tensor MemoryFormat in java wrappers (#40785)
      [vulkan][op] add.Scalar, mul.Scalar (#42674)
      [vulkan][op] avg_pool2d (#42675)
      [vulkan] glsl shaders relaxed precision mode to cmake option (#43076)
      [pytorch][vulkan][jni] LiteModuleLoader load argument to use vulkan device
      [pytorch][vulkan] Fix downcast warnings-errors, aten_vulkan buck target
      [vulkan][py] torch.utils.optimize_for_vulkan (#44903)
      [vulkan] Remove duplication of op registration and clean unused vars (#44932)
      [vulkan] reshape op to use infer_size to expand -1 (#45104)
      [vulkan] support dimensions negative indexing (#45068)
      [android][vulkan] Module load argument to specify device cpu/vulkan (#44896)
      [vulkan][android][test_app] Add test_app variant that runs module on Vulkan (#44897)

Ivan Yashchuk (4):
      Fix the bug in THCTensor_(baddbmm) and ATen's addmm_cuda for strided views input (#42425)
      Fix error code checks for triangular_solve (CPU) (#44720)
      Added support for complex input for Cholesky decomposition (#44895)
      Updated `cholesky_backward` for complex inputs (#45267)

Jade Nie (1):
      Wrap Caffe2's SparseLengthsSum into a PyTorch op (#39596)

Jae Lee (1):
      Back out "Selective meta programming preparation for prim ops"

James Gilbert (1):
      Remove use of term "blacklist" from tools/autograd/gen_python_functions.py (#42047)

James Reed (28):
      Fix zip serialization for file > 2GiB (#40722)
      Support Pathlike for zipfile serialization (#40723)
      Fix delegating to jit.load from torch.load (#40937)
      s/torch::jit::class_/torch::class_/ (#40795)
      Introduce experimental FX library (#42741)
      [FX] fix lint (#42866)
      [FX] Add interface to reject nodes (#42865)
      [FX] Add in resnet + quantization tests (#43157)
      [FX] Native callables in FX lowering (#43426)
      [FX] Support tensor-valued constants (#43666)
      [FX] Pickle serialization of GraphModule via forward source (#43674)
      [FX] Better error when unpacking Proxy (#43740)
      [FX] Only copy over forward() from exec (#44006)
      [FX] __str__ for GraphModule and Graph (#44166)
      [FX] Fix forward merge conflict breakage (#44221)
      [FX] Only copy over training attr if it\'s there (#44314)
      [FX] Minor fixups in Graph printout (#44214)
      [FX][EZ] Allow constructing GraphModule with dict for root (#44679)
      [FX] Further sanitize generated names (#44808)
      [FX] Fix GraphModule copy methods not regenerating forward (#44806)
      [FX] Pass module's qualname to is_leaf_module (#44966)
      [FX] s/get_param/get_attr/ (#45000)
      Revert D23798016: [FX] s/get_param/get_attr/
      [FX] Make Graphs immutable and make GraphModule recompile after assigning graph (#44830)
      [resubmit][FX] s/get_param/get_attr/ (#45147)
      [FX][EZ] Fix bug where copying node made non-unique name (#45311)
      [FX] Lint pass for Graphs (#44973)
      [1.7] Hide FX (#45631)

Jan Schlüter (1):
      20000x faster audio conversion for SummaryWriter (#44201)

Jane (Yuan) Xu (1):
      Enable typechecking for torch.testing._internal.common_quantized.* (#44805)

Jane Xu (1):
      minor style edits to torch/testing/_internal/common_quantized.py (#44807)

Jannik Bamberger (1):
      Fix arg type annotations in jit.trace and onnx.export (#41093)

Jasmine Liu (3):
      [PyTorch Error Logging][1/N] Adding Error Logging for Run_Method (#40535)
      [PyTorch Error Logging][2/N] Adding Error Logging for Loading Model (#40537)
      [PyTorch Operator] [2/n] Adding python test

Jeff Daily (16):
      skip_if_rocm test_rnn in test_c10d_spawn.py (#40577)
      [ROCm] restore jit tests (#40447)
      ROCm 3.5.1 image (#40385)
      [ROCm] update hip library name (#41813)
      restore at::Half support for caffe2 SumOp (#41952)
      pin numpy version to 1.18.5 (#42670)
      generalize circleci docker build.sh and add centos support (#41255)
      update path in CI script to access ninja (#43236)
      remove thunk fix now that ROCm CI images are >= ROCm 3.5 (#43226)
      [ROCm] skip test_rpc in .jenkins/pytorch/test.sh (#43305)
      [ROCm] allow .jenkins/pytorch/test.sh to run on centos (#42197)
      fix typo in test_dataloader test_multiprocessing_contexts (#43343)
      Enable complex blas for ROCm. (#43744)
      [ROCm] fix cub hipify mappings (#44431)
      [ROCm] remove thrust workaround in ScanKernels (#44553)
      add rocm 3.8 to nightly builds (#45222)

Jeffrey Wan (1):
      Convert num_kernels to int64 before calling into CUDA GET_BLOCKS (#44688)

Jeong Ukjae (3):
      Fix wrong link in docs/source/notes/ddp.rst (#40484)
      replace blacklist in caffe2/python/onnx/frontend.py (#41777)
      Fix typing error of torch/optim/lr_scheduler.pyi (#41775)

Jeremy Lilley (1):
      [torch] Minor: Avoid ostreamstring in Operator's canonicalSchemaString() (#44442)

Jeremy Reizenstein (1):
      Document default dim for cross being None (#41850)

Jerry Zhang (70):
      [jit] Remove unnecessary clone APIs for script::Module and RecursiveScriptModule (#40297)
      [quant][graphmode] Enable inplace option for top level API (#40414)
      [quant] Fix fuse linear pass (#40549)
      [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)
      [quant][graphmode][fix] cloning schema in insert_observers (#40624)
      [quant] aten::repeat work for quantized tensor (#40644)
      [quant][graphmode][fix] remove unsupported ops in the list (#40653)
      [quant][graphmode] Support quantization for `aten::apend` (#40743)
      [quant][graphmode][fix] Fold conv bn (#40865)
      [quant][graphmode][fix] Print the node in error message (#40889)
      [quant][graphmode][fix] filter for list append change (#41020)
      [quant][refactor] test_only_eval_fn (#41078)
      [quant] dequantize support list and tuple of tensors (#41079)
      [quant][graphmode] use RemoveMutation to remove append (#41161)
      [quant][graphmode][fix] Make it work with CallMethod on non-Module objects (#41576)
      [quant][graphmode][fix] Remove assert for uses == 1 in remove dequantize pass (#41859)
      [quant][graphmode][fix] Remove useQuantizable check for dynamic quant (#41892)
      [quant][graphmode] Support stack (#42187)
      [quant] Expose register activation post process hook function to user (#42342)
      [quant] Reduce number of variants of add/mul (#42769)
      [quant] Attach qconfig to all modules (#42576)
      [quant][fix] Remove activation_post_process in qat modules (#42343)
      [quant][doc] Print more info for fake quantize module (#43031)
      [reland][quant][fix] Remove activation_post_process in qat modules (#42343) (#43015)
      [quant][graphmode][fx] Add graph mode quantization on fx (#43175)
      [quant][graphmode][fx][test] Add per op test for graph mode quant on fx (#43229)
      [quant][graphmode][fx] Add support for conv module (#43285)
      [quant] Make OP_LIST_TO_FUSER_METHOD public (#43286)
      [quant][graphmode][fx] Add support for conv module + relu (#43287)
      [quant][graphmode][fx] Add support for add (#43331)
      [quant][graphmode][fx] Add support for add relu (#43332)
      [quant][graphmode][fx] Add support for cat (#43333)
      [quant][graphmode][fx] Add support for batchnorm (#43334)
      [quant][graphmode][fx] Add support for batchnorm relu (#43335)
      [quant][graphmode][fx][test][refactor] Refactor quantized add test (#43372)
      [quant][graphmode][fx] Add support for mul and mul relu (#43373)
      [quant][graphmode][fx] Add support for hardswish (#43374)
      [quant][graphmode][fx] Add support for elu (#43375)
      [quant][graphmode][fx] Add support for layer_norm (#43376)
      [quant][graphmode][fx] Add support for instance_norm (#43377)
      [quant][graphmode][fx] Add support for clamp (#43437)
      [quant][graphmode][fx] Add support for general shape ops (#43438)
      [quant][graphmode][fx] Add support for general value ops (#43439)
      [quant][graphmode][fx][test][refactor] Refactor tests for graph mode quantization on fx (#43445)
      [quant][graphmode][fx] Testing torchvision (#43526)
      [reland][quant][graphmode][fx] Add e2e test on torchvision (#43587)
      [quant][graphmode][fx][fix] enable per channel quantization for functional ops (#43534)
      [quant][graphmode][fx] Add top level APIs (#43581)
      [quant][graphmode][fx] Add support for weight prepack folding (#43728)
      [reland][quant][graphmode][fx] Add top level APIs (#43581) (#43901)
      [quant][graphmode][fix] Fix insert quant dequant for observers without qparams (#43606)
      [reland][quant][graphmode][fx] Add support for weight prepack folding (#43728) (#43902)
      [quant][graphmode][fx][refactor] Move patterns to separate files (#43891)
      [quant][graphmode][fx] Support dynamic quantization without calibration (#43892)
      [quant][graphmode][fx] Support dynamic quantization without calibration (#43952)
      [quant][graphmode][fx] Support quantize per channel in all cases (#44042)
      [quant][graphmode][fx] Support inplace option (#43983)
      [quant][graphmode][fx][api] Call fuse in prepare (#43984)
      [quant][eagermode][refactor] Add set/get method for quantization and fusion mappings (#43990)
      [quant][graphmode][fx][fix] Support dictionary output (#44508)
      [quant][graphmode][fx][fix] Support None qconfig in convert (#44524)
      [quant][graphmode][fx][fix] Remove qconfig in convert (#44526)
      [quant][graphmode][fx] Support fp16 dynamic quantization for linear (#44582)
      [quant] Support clone for per channel affine quantized tensor (#44573)
      [quant][graphmode][jit] Try to support append (#44641)
      [quant][graphmode][fx] Custom module support (#44766)
      [quant][graphmode][jit][api] Expose preserved_attrs from finalize to convert_jit (#44490)
      [quant][graphmode][fx] qconfig_dict support more types of configurations (#44856)
      [quant][eagermode] Custom module support (#44835)
      [quant] Remove unused qconfig argument in qat linear module (#45307)
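
The FX graph mode quantization entries above converge on the top-level APIs from #43581. A rough sketch of the prototype-era flow, assuming the `prepare_fx`/`convert_fx` entry points and a single default `qconfig_dict`; the API was still marked prototype at this point, so treat this as illustrative rather than definitive:

```python
import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx  # prototype API

float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}  # apply one qconfig to the whole model

prepared = prepare_fx(float_model, qconfig_dict)     # trace with FX and insert observers
with torch.no_grad():
    for _ in range(4):                               # toy calibration loop
        prepared(torch.randn(1, 3, 32, 32))
quantized = convert_fx(prepared)                     # fold observers into quantized ops
```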

Jessica Lin (2):
      Remove table of contents at the top of rpc.rst (#40205)
      Update docs feature classifications (#39966)

Jiakai Liu (23):
      [pytorch][ci] run mobile code analysis on PR (#40247)
      [pytorch] factor out binary size upload command (#40188)
      [pytorch][ci] add custom selective build flow for android build (#40199)
      [pytorch] add manual registration for trace type (#40903)
      [pytorch] deprecate PYTORCH_DISABLE_TRACING macro (#41004)
      [pytorch] disable per-op profiling for internal mobile build (#41825)
      [pytorch] bump up variable version regardless of differentiability (#41269)
      [pytorch] fix code analyzer for LLVM 9 & 10 (#42135)
      [pytorch][ci] install nightly instead of stable libtorch for mobile CIs (#42220)
      [pytorch] include all overloads for OSS custom build
      Back out "change pt_defs.bzl to python file"
      [pytorch] check in default generated op dependency graph (#43570)
      [pytorch] deprecate static dispatch (#43564)
      [pytorch][bot] update mobile op deps (#43871)
      [pytorch][bot] update mobile op deps (#43937)
      [pytorch][bot] update mobile op deps (#44018)
      [pytorch][bot] update mobile op deps (#44100)
      [pytorch][bot] update mobile op deps (#44700)
      [pytorch][bot] update mobile op deps (#44854)
      [pytorch] clean up normalized_dynamic_type() hack (#44889)
      [pytorch] refine dispatch keys in native_functions.yaml (1/N) (#45010)
      [reland][pytorch] refine dispatch keys in native_functions.yaml (1/N) (#45137)
      [pytorch] refine dispatch keys in native_functions.yaml (2/N) (#45284)

Jianyu Huang (6):
      [caffe2] Add the dedup implementation of fused RowWiseAdagrad op on GPUs (#40282)
      [caffe2] Fix the issues when using CUB RadixSort (#41299)
      [pt] Add include_last_offset option to EmbeddingBag mean and max (#42215)
      [caffe2] Fix a performance bug in Dedup SparseAdagrad op (#42287)
      [caffe2] Fix the timeout (stuck) issues of dedup SparseAdagrad C2 kernel
      [caffe2] Extend dedup SparseAdagrad fusion with stochastic rounding FP16 (#43124)
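
For reference on #42215: with `include_last_offset=True`, `offsets` carries one extra trailing entry equal to the total number of indices (a CSR-style layout), and the PR extends that option to the `mean` and `max` pooling modes. A small sketch with made-up sizes:

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4,
                      mode="mean", include_last_offset=True)

indices = torch.tensor([1, 2, 4, 5, 4, 3])
# Three bags -- [1, 2], [4, 5], [4, 3] -- and a final offset equal to len(indices).
offsets = torch.tensor([0, 2, 4, 6])

out = bag(indices, offsets)   # shape (3, 4): one pooled row per bag
```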

Jiatong Zhou (1):
      move __range_length and __derive_index to lite interpreter (#40533)

Jiayu Liu (1):
      [nit] fix some typo within documentation (#40692)

Jimmy Yao (1):
      delete the space for the docs rendering (#44740)

Jing Ma (1):
      [Dper3] Implementation of squeezed input to DC++

Jithun Nair (3):
      Insert parentheses around kernel name argument to hipLaunchKernelGGL (#41022)
      Add bfloat16 support for nccl path (#38515)
      Fix hipify script for pytorch extensions (#43528)

Jiyan Yang (1):
      Log the net if blob doesn't exist when setting output record (#41971)

Jiyuan Qian (2):
      Add Cost Inference for AdaGrad and RowWiseSparseAdagrad
      Fix potential divide by zero for CostInferenceForRowWiseSparseAdagrad

Jongsoo Park (4):
      Back out "[NCCL] DDP communication hook: getFuture()" (#42152)
      [fbgemm] use new more general depthwise 3d conv interface (#42697)
      [caffe2] fix wrong comment (#42735)
      [caffe2] add cost inference for FusedFakeQuantFC and FusedFakeQuantFCGradient (#44840)

Jordan Fix (4):
      Add use_glow_aot, and include ONNX again as a backend for onnxifiGlow (#4787)
      [caffe2.proto] Add AOTConfig (#44020)
      Add API for onnxifi with AOT Glow ONNX (#44021)
      Add GlowLoadAOTModel flag (#45189)

Joseph Spisak (2):
      Add MSFT Owners to the Windows Maintainership (#42280)
      Update persons_of_interest.rst (#44031)

Justin Huber (1):
      torch.isreal (#41298)
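
A quick example of the `torch.isreal` op added in #41298: it returns a boolean tensor that is True wherever an element has a zero imaginary component, and is trivially all True for real dtypes.

```python
import torch

z = torch.tensor([1 + 0j, 2 + 1j, 3 + 0j])
print(torch.isreal(z))              # tensor([ True, False,  True])
print(torch.isreal(torch.ones(2)))  # tensor([True, True])
```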

Karel Ha (1):
      Fix link to PyTorch organization (from Governance) (#40984)

Kate Mormysh (1):
      Revert D21232894: Unify PyTorch mobile's threadpool usage.

Kaushik Ram Sadagopan (1):
      Enabled torch.testing._internal.jit_utils.* typechecking. (#44985)

Keigo Kawamura (2):
      Add missing type annotation for Tensor.ndim (#42909)
      Remove `itruediv` because it's already defined in torch/tensor.py (#42962)

Kenichi Maehashi (1):
      Fix return value of PyErr_WarnEx ignored (SystemError) (#44371)

Kenso Trabing (1):
      Fix typo. in error message (#39958)

Kent Gauen (1):
      lr_schedule.py redundant code (#44613)

Kevin Stephano (1):
      [WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)

Khalid Almufti (2):
      Replace whitelist with allowlist (#42067)
      Replaced whitelist reference with allowlist (#42071)

Kimish Patel (22):
      Add option to preserve certain methods during optimize_for_mobile. (#40629)
      Add benchmark for add op. (#40059)
      [Vec256][neon] Add neon backend for vec256 (#39341)
      Add fused add_relu op. (#39342)
      JIT pass for add relu fusion. (#39343)
      Add add_relu fusion pass to optimize_for_mobile. (#40252)
      Implicit casting resulting in internal build failure. (#41272)
      Support aarch32 neon backend for Vec256 (#41267)
      Calculate inverse of output scale first. (#41342)
      Revert D22939119: [TensorExpr] Fix a way we were creating np arrays in tests.
      Fix freeze_module pass for sharedtype (#42457)
      Fix freeze_module pass for sharedtype (#42457)
      Simple caching allocator for CPU. (#42006)
      Refactor qconv to reduce allocations. (#42007)
      Call qnnpack's conv setup only if input pointer has changed. (#42008)
      Change quantizer to account for input tensor's memory format. (#42178)
      Enable input pointer caching in XNNPACK integration. (#42840)
      Fix bug in caching allocator. (#43719)
      Fix transposed conv2d rewrite pattern to account for convolution api (#44035)
      Fix replaceAtenConvolution for BC. (#44036)
      Implement better caching allocator for segmentation usecase. (#44618)
      Move mobile specific CPUCachingAllocator to c10/mobile folder. (#45364)
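
As context for #40629 above: `optimize_for_mobile` only guarantees that `forward` survives its freezing and fusion passes, and the PR adds a way to name additional methods to keep. A hedged sketch; the module and its `preprocess` method are hypothetical:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class Net(torch.nn.Module):
    def forward(self, x):
        return x.relu()

    @torch.jit.export
    def preprocess(self, x):   # an extra entry point we want to survive optimization
        return x * 0.5

scripted = torch.jit.script(Net())
optimized = optimize_for_mobile(scripted, preserved_methods=["preprocess"])
torch.jit.save(optimized, "net_mobile.pt")
```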

Kiran Kumar Matam (1):
      Allocating warp to an input index in compute_cuda_kernel (#43354)

Koki Nishihara (1):
      [quant] Rename from quantized... to ...quantized_cpu in the native_functions.yaml (#41071)

Ksenija Stanojevic (11):
      [ONNX] Add eliminate_unused_items pass (#38812)
      [ONNX]Fix export of full_like (#40063)
      [ONNX]Fix export of flatten (#40418)
      [ONNX]Add tests for ConvTranspose 1D and 3D (#40703)
      [ONNX] Add pass that fuses Conv and BatchNormalization (#40547)
      [Resending] [ONNX] Add eliminate_unused_items pass (#42743)
      [ONNX] Floordiv (#43022)
      [ONNX] Update slice symbolic function (#42935)
      [ONNX] Move tests to test_pytorch_onnx_onnxruntime (#42684)
      [ONNX] Update len symbolic (#43824)
      [ONNX] add jit pass for lists (#43820)
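
Most of the ONNX entries above change what `torch.onnx.export` can trace and emit (slicing, `len`, list handling, Conv+BatchNorm fusion, and so on). A generic export sketch for orientation; the model, shapes, and opset below are placeholders rather than values taken from these PRs:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
dummy_input = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model, dummy_input, "model.onnx",
    opset_version=11,
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # keep the batch dimension symbolic
)
```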

Kurt Mohler (17):
      Change BCELoss size mismatch warning into an error (#41426)
      Add non-deterministic alert to CUDA operations that use `atomicAdd()` (#40056)
      Reland Add non-deterministic alert to CUDA operations that use `atomicAdd()` (#41538)
      Improve `torch.norm` functionality, errors, and tests (#41956)
      Throw error if `torch.set_deterministic(True)` is called with nondeterministic CuBLAS config (#41377)
      Create CuBLAS PointerModeGuard (#42639)
      Raise error if `at::native::embedding` is given 0-D weight (#42550)
      Fix orgqr input size conditions (#42825)
      Fix manual seed to unpack unsigned long (#42206)
      Fix coding style and safety issues in CuBLAS nondeterministic unit test (#42627)
      Add `torch.linalg.norm` (#42749)
      Update determinism documentation (#41692)
      Add support for integer dim arg in `torch.linalg.norm` (#43907)
      Deprecate torch.norm and torch.functional.norm (#44321)
      Add note comments to enforce nondeterministic alert documentation (#44140)
      Make nuclear and frobenius norm non-out depend on out variants (#44095)
      Clarify that 5-D 'bilinear' grid_sample is actually trilinear (#45090)
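
To situate #42749 and #43907 above: `torch.linalg.norm` is the NumPy-style counterpart to the older `torch.norm` (which #44321 deprecates), and #43907 allows `dim` to be a plain integer rather than only a tuple. A short sketch:

```python
import torch

A = torch.randn(4, 5)

torch.linalg.norm(A)                           # Frobenius norm of the matrix
torch.linalg.norm(A, ord="nuc")                # nuclear norm
torch.linalg.norm(A, ord=2, dim=1)             # per-row vector 2-norms (integer dim)
torch.linalg.norm(A, ord=float("inf"), dim=0)  # per-column infinity norms
```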

Kyle Chen (3):
      added rocm 3.7 docker image (#43576)
      Deleted docker images for rocm 3.3 and rocm 3.5 (#44672)
      added rocm 3.8 docker image (#45205)

Kyle Johnson (1):
      Add operators for LiteLMLSTM to Lite Interpreter (#41270)

Leon Gao (1):
      simplify profile text output by displaying only top-level ops statistics (#42262)

Lillian Johnson (3):
      Error printing extension support for multiline errors (#43807)
      Adjust level of verbosity of debug dumps in graph executor T74227880 (#43682)
      [JIT] Support partially specified sizes/strides in IRParser (#44113)

Lin.Sung (1):
      Change typo 'momemtum' to 'momentum' (#45045)

Linbin Yu (14):
      add eq.str, ne.str, and add.str ops (#40958)
      add "aten::add.str" op and remove two duplicated ops
      clean up duplicated op names (#41092)
      add null check for c2 tensor conversion (#41096)
      add check for duplicated op registration in JIT (#41214)
      Revert D22467871: add check for duplicated op registration in JIT
      [PT] add overload name for int prim ops (#41578)
      [PT] add check for duplicated op names in JIT (#41549)
      Revert D22533824: [PT] add check for duplicated op names in JIT
      [PT] enforce duplicate op name check on mobile
      change pt_defs.bzl to python file (#42725)
      Improve save_for_mobile cxx binary (#43721)
      update build flags for benchmark binaries
      log metadata when model loading failed (#44430)

Lingyi Liu (7):
      Perf improvement of Conv2d and Conv3d (#40324)
      Disable the mkldnn for conv2d in some special cases (#40610)
      Add a new op for converting the dense feature to sparse representation
      Add the sls tensor train op (#33525)
      Back out "Revert D19987020: [pytorch][PR] Add the sls tensor train op" (#43938)
      Optimize Scale function (#44913)
      [hpc]optimize the torch.cat cuda kernel (#44833)

Linyuan Gong (1):
      Allow np.memmap objects (numpy arrays based on files) to be processed… (#39847)

Liu (1):
      Fix module dict key ordering (#40905)

Louis Feng (4):
      DPP Async Tracing (#44252)
      Refactor CallbackManager as a nested class of RecordFunction. (#44645)
      Back out "Revert D23323486: DPP Async Tracing" plus windows build fix. (#44702)
      Back out "Revert D23494065: Refactor CallbackManager as a friend class of RecordFunction." (#44699)

Lu Fang (2):
      Rename capacity to nbytes in ShareExternalPointer to avoid confusion in future (#41461)
      [torch.fx] Add support for custom op (#43248)

Luca Wehrstedt (28):
      Update TensorPipe submodule (#40614)
      [RPC tests] Fix @_skip_if_tensorpipe always skipping for all agents (#40860)
      [RPC tests] Remove world_size and init_method from TensorPipe fixture (#40814)
      [RPC tests] Align ddp_under_dist_autograd test with others (#40815)
      [RPC tests] Fix file descriptor leak (#40913)
      [RPC docs] Remove mention of TensorPipe's SHM and CMA backends as they're not built (#41200)
      Fix torch.cuda.check_error type errors (#41330)
      [RPC tests] Fix test_init_(rpc|pg)_then_(rpc|pg) not shutting down RPC (#41558)
      Update TensorPipe submodule (#42225)
      [RPC tests] Merge TensorPipe tests into single entry point (#40816)
      [RPC tests] Merge tests for faulty agent into single script (#40817)
      [RPC tests] Merge process group tests into single entry point (#40818)
      [RPC tests] Avoid decorators to skip tests (#40819)
      [RPC tests] Make generic fixture an abstract base class (#40820)
      [RPC tests] Move some functions to methods of fixture (#40821)
      [RPC tests] Remove global TEST_CONFIG (#40822)
      [RPC tests] Enroll TensorPipe in missing test suites (#40823)
      [RPC tests] Generate test classes automatically (#42527)
      [RPC tests] Run DdpUnderDistAutogradTest and DdpComparisonTest with fork too (#42528)
      Don't reference TensorPipe headers in our headers (#42521)
      Update TensorPipe submodule (#42522)
      Fix TensorPipe submodule (#42789)
      Remove Python dependency from TensorPipe RPC agent (#42678)
      Enroll TensorPipe agent in C++-only E2E test (#42680)
      Guard TensorPipe agent by USE_TENSORPIPE (#42682)
      Revert D23803951: [pytorch] refine dispatch keys in native_functions.yaml (1/N)
      [RPC] Infer backend type if only options are given (#45065)
      Update TensorPipe submodule (#45433)
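
On #45065 in the block above: `init_rpc` can now infer the backend from the type of the options object, so passing only `rpc_backend_options` selects the TensorPipe agent without naming it explicitly. A hedged single-process sketch; the worker name, port, and thread count are arbitrary choices, not values from the PR:

```python
import torch
import torch.distributed.rpc as rpc

opts = rpc.TensorPipeRpcBackendOptions(
    num_worker_threads=8,
    init_method="tcp://localhost:29500",  # arbitrary local rendezvous address
)

# No explicit `backend=` argument: TensorPipe is inferred from the options type.
rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=opts)
print(rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), 3)))
rpc.shutdown()
```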

Lucas Hosseini (2):
      Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803)
      Make Ch…

Labels: module: bc-breaking, open source, triaged