
Add __torch_function__ for methods #37091

Closed
hameerabbasi wants to merge 41 commits from the method-torch-function branch

Conversation

hameerabbasi (Collaborator) commented Apr 22, 2020

According to pytorch/rfcs#3

From the goals in the RFC:

  1. Support subclassing torch.Tensor in Python (done here)
  2. Preserve torch.Tensor subclasses when calling torch functions on them (done here)
  3. Use the PyTorch API with torch.Tensor-like objects that are not torch.Tensor
    subclasses (done in #30730, "Add the __torch_function__ API override mechanism")
  4. Preserve torch.Tensor subclasses when calling torch.Tensor methods (done here)
  5. Propagate subclass instances correctly with operators and when using
    views/slices/indexing/etc. (done here)
  6. Preserve subclass attributes when using methods or views/slices/indexing (done here)
  7. Provide a way to insert code that operates on both functions and methods uniformly,
    so we can write a single function that overrides all operators (done here)
  8. Give external libraries a way to also define functions/methods that follow
    the __torch_function__ protocol (will be addressed in a separate PR)

This PR makes the following changes:

  1. Adds the self argument to the arg parser.
  2. Dispatches on self as well when self is not nullptr.
  3. Adds a torch._C.DisableTorchFunction context manager to disable __torch_function__.
  4. Adds torch::torch_function_enabled() and torch._C._torch_function_enabled() to check the state of __torch_function__.
  5. Dispatches all torch._C.TensorBase and torch.Tensor methods via __torch_function__ (see the sketch below).
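
To make the new dispatch behavior concrete, here is a minimal sketch. It is not part of this PR's diff; it assumes a recent PyTorch build that includes this change (where `__torch_function__` is used as a classmethod) and follows the standard subclass override pattern:

```python
import torch

class MyTensor(torch.Tensor):
    """Minimal torch.Tensor subclass that logs every intercepted call."""

    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if kwargs is None:
            kwargs = {}
        print(f"intercepted: {getattr(func, '__name__', func)}")
        # Delegate to the default implementation, which runs the real op and
        # preserves the subclass type of the result.
        return super().__torch_function__(func, types, args, kwargs)

t = torch.randn(3).as_subclass(MyTensor)

r1 = torch.add(t, t)  # torch-level function: dispatched (pre-existing behavior)
r2 = t.sum()          # Tensor method: dispatched (added by this PR)
r3 = t[1:]            # operators/indexing/views: dispatched as well

print(type(r1), type(r2), type(r3))  # all MyTensor
```

Inside an override, the torch._C.DisableTorchFunction context manager added in this PR can be used to call back into PyTorch without re-triggering __torch_function__.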

TODO:

  • Sequence Methods
  • Docs
  • Tests

Closes #28361

Benchmarks in #37091 (comment)

@hameerabbasi hameerabbasi changed the title WIP Add __torch_function__ for methods Apr 22, 2020
dr-ci bot commented Apr 22, 2020

💊 CI failures summary and remediations

As of commit e715d90 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@mruberry mruberry requested a review from bhosmer April 22, 2020 18:26
@mruberry mruberry added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Apr 22, 2020
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 5 times, most recently from 15151c1 to 62bcf35 on April 24, 2020 15:06
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 4 times, most recently from fa1aa96 to 067f892 on April 27, 2020 18:14
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 4 times, most recently from de8ab35 to b4285ff on April 29, 2020 14:24
@hameerabbasi hameerabbasi force-pushed the method-torch-function branch 7 times, most recently from 7e76252 to 95d4441 on May 1, 2020 15:59
ezyang (Contributor) commented Aug 3, 2020

Urrgh, there was a force push to the branch, this is going to make keeping the internal FB version in sync substantially more complicated (as I cannot just cherry pick the latest changes from open source)

hameerabbasi (Collaborator, author) replied:

> Urrgh, there was a force push to the branch, this is going to make keeping the internal FB version in sync substantially more complicated (as I cannot just cherry pick the latest changes from open source)

I apologize, I only force-pushed the last commit: 25f37f3

ezyang (Contributor) commented Aug 3, 2020

> @ezyang should we update the torch_function entry for https://pytorch.org/blog/pytorch-feature-classification-changes/? What would you say the classification is, Beta?

I need to submit it to the internal classification process. But I would agree it's somewhere between prototype and beta; I think the original __torch_function__ behavior clearly hits the bar for beta, but the new method functionality is much more prototype-y.

ezyang (Contributor) commented Aug 3, 2020


Aug 03 11:59:59 + python -c 'import torch; print(torch.__config__.show())'
Aug 03 11:59:59 Traceback (most recent call last):
Aug 03 11:59:59   File "<string>", line 1, in <module>
Aug 03 11:59:59   File "/opt/conda/lib/python3.8/site-packages/torch/__init__.py", line 526
Aug 03 11:59:59     quantized_gru = torch.ops.aten.quantized_gru

test failures

facebook-github-bot (Contributor) left a comment:

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor) commented Aug 4, 2020

mypy failing now

hameerabbasi (Collaborator, author) commented:
Any further action needed in this PR?

facebook-github-bot (Contributor) left a comment:

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

ezyang (Contributor) commented Aug 5, 2020

just tryin to land it 👍

rgommers (Collaborator) commented Aug 6, 2020

It landed! <does a little dance and crosses fingers>

Thanks @hameerabbasi and @ezyang!

hameerabbasi added a commit to hameerabbasi/pytorch that referenced this pull request Aug 10, 2020
facebook-github-bot pushed a commit that referenced this pull request Aug 12, 2020
Summary:
This is a follow-up PR for #37091, fixing some quirks of that PR, which was landed early to avoid merge conflicts.

This PR addresses the following action items:

- [x] Use error-handling macros instead of a `try`-`catch`.
- [x] Renamed and added comments to clarify the use of `HANDLED_FUNCTIONS_WRAPPERS` in tests. `HANDLED_FUNCTIONS_NAMESPACES` was already removed in the last PR as we had a way to test for methods.

This PR does NOT address the following action item, as it proved to be difficult:

- [ ] Define `__module__`  for whole API.

Single-line reproducer showing why this is hard:

```python
>>> torch.Tensor.grad.__get__.__module__ = "torch.Tensor.grad"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'method-wrapper' object has no attribute '__module__'
```

Explanation: methods defined in C and properties don't always have a `__dict__` attribute or a mutable `__module__` slot for us to modify.
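
As a hedged illustration of that asymmetry (the module name assigned below is just for demonstration): ordinary Python functions expose a writable `__module__`, while C-implemented method-wrappers do not.

```python
import torch

def plain_python_function():
    pass

# Python-level functions expose a writable __module__ attribute.
plain_python_function.__module__ = "torch.example"  # works

# C-implemented descriptors/method-wrappers generally do not, which is why
# blanket assignment of __module__ across the whole API fails.
try:
    torch.Tensor.grad.__get__.__module__ = "torch.Tensor.grad"
except AttributeError as exc:
    print(exc)  # 'method-wrapper' object has no attribute '__module__'
```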

The documentation action items were addressed in the following commit, with the additional future task of adding the rendered RFCs to the documentation: pytorch/rfcs@552ba37

Pull Request resolved: #42806

Reviewed By: smessmer

Differential Revision: D23031501

Pulled By: ezyang

fbshipit-source-id: b781c97f7840b8838ede50a0017b4327f96bc98a
clrpackages pushed a commit to clearlinux-pkgs/pytorch that referenced this pull request Feb 25, 2021
….7.1

AWSNB (1):
      Fix bugs in vec256_float_neon.h (#43321)

Aadesh (1):
      grammatical error fix (#43697)

Aayush Naik (1):
      Implement gcd, lcm (#40651)

Abdelrauf (1):
      Vec256 Test cases (#42685)

Abhinav Garlapati (1):
      Add SNPE deps for caffe2 benchmark android binary

Adam Simpkins (1):
      [caffe2] add type annotations for caffe2.distributed.python

Adam Teichert (1):
      fix issue #31759 (allow valid ASCII python identifiers as dimnames) (#40871)

Adam Thompson (1):
      Add complex tensor dtypes for the __cuda_array_interface__ spec (#42918)

Ailing (1):
      Keep manual_kernel_registration only effective in aten codegen. (#42386)

Ailing Zhang (22):
      Add documentation about storage sharing is preserved and serialized f… (#40412)
      Move install_torchvision to common.sh so that it can be sourced. (#40828)
      Check statstical diff rather than exact match for test_dropout_cuda. (#40883)
      Make resize_ use normal device dispatch (#42240)
      Remove redundant kernels calling TypeDefault in VariableType codegen. (#42031)
      Rename XLAPreAutograd to AutogradXLA. (#43047)
      Fix torch.hub for new zipfile format. (#42333)
      Revert D23335106: [quant][graphmode][fix] Fix insert quant dequant for observers without qparams
      Move Autograd to an alias dispatch key (#43070)
      Add tests against autograd precedence and multiple dispatch. (#44037)
      Expose alias key info in dumpState and update test_dispatch. (#44081)
      Resolve Autograd key for disable_variable_dispatch flag. (#44268)
      Check commutativity for computed dispatch table and add a test to check entries. (#44088)
      Update fallback kernel for Autograd keys. (#44349)
      Revert D23583017: move rebuild buckets from end of first iteration to beginning of second iteration
      Use iterator of DispatchKeySet. (#44682)
      Add alias dispatch key Math. (#44354)
      Support Math keyword in native_functions.yaml. (#44556)
      Align casing in test_dispatch with dispatch keys. (#44933)
      Update true_divide_out to use at::. (#45079)
      Resolve comments in #44354. (#45150)
      Move xla codegen to aten. (#45241)

Akash Patel (1):
      find rccl properly (#42072)

Akihiro Nitta (2):
      Fix exception chaining in `torch/` (#43836)
      Fix exception chaining in `test/` (#44193)

Akshit Khurana (1):
      Add mobile_optimized tag to optimized model. (#45479)

Alban Desmaison (6):
      Revert D22552377: [pytorch][PR] Reland split unsafe version
      fix backward compat (#41810)
      Revert D22790718: [pytorch][PR] Enables torch.full bool and integer type inference
      Revert D23242101: [pytorch][PR] Implement first draft of autograd benchmark.
      Revert D23385090: [quant][graphmode][fx] Add support for weight prepack folding
      Revert D23385091: [quant][graphmode][fx] Add top level APIs

Alex (1):
      fix scripts (#44464)

Alex Borcan (1):
      [BUILD] Guard '#pragma unroll' with COMPILING_FOR_MIN_SIZE

Alex Suhan (18):
      [TensorExpr] Simplify conditional select (#43350)
      [TensorExpr] Add aten::sum lowering to the kernel (#43585)
      [TensorExpr] Make sum available from Python (#43730)
      [TensorExpr] Make KernelSumMultipleAxes much faster (#43905)
      [TensorExpr] Check statements in test_kernel.cpp (#43911)
      [TensorExpr] Remove unused functions in kernel.cpp (#43966)
      Check for index-rank consistency in FunctionInliner (#44561)
      [TensorExpr] Support boolean in simplifier (#44659)
      [TensorExpr] Add log1p support to the LLVM backend (#44839)
      [TensorExpr] Fix order comparisons for unsigned types (#44857)
      [TensorExpr] Add Mod support to the LLVM backend (#44823)
      [TensorExpr] Fix operator order in combineMultilane (#45157)
      [TensorExpr] Remove unused EvalConstExpr function (#45180)
      [TensorExpr] Disallow arithmetic binary operations on Bool (#44677)
      [TensorExpr] When lanes differ, insert Broadcast instead of Cast (#45179)
      [TensorExpr] Fix min and max for integral inputs in CUDA backend (#44984)
      [TensorExpr] Move inner loops vectorization logic to its own method (#45287)
      [TensorExpr] Always inline and DCE in the LLVM backend (#45445)

Alex Şuhan (1):
      Support boolean key in dictionary (#42833)

Alexander (2):
      Fix examples Adaptive avg pooling typo (#40217)
      Sparse softmax support (CUDA) (#42307)

Alexander Golynski (1):
      Add warning on ProcessGroup and ProcessGroup::Work APIs (#46366)

Alexander Grund (9):
      Don't add NCCL dependency to gloo if system NCCL is used (#41180)
      Define PSIMD_SOURCE_DIR when including FP16 (#41233)
      Fix flaky test_stream_event_nogil due to missing event sync (#41398)
      Remove needless test duplication (#41583)
      Replace if(NOT ${var}) by if(NOT var) (#41924)
      Don't run tests with custom arguments with pytest (#41397)
      Remove pybind11 from required submodules (#44278)
      Remove Python version upper boundary check (#46315) (#46388)
      Workaround for bug in DistributedDataParallel (#46385)

Alexandru Suhan (1):
      [NNC] Add loop unroll transformation (#42465)

Aliaksandr Ivanou (1):
      Use python 3.8 in pytorch docker image (#45466)

Alphons Jaimon (1):
      Grammar patch 1 (.md) (#41599)

Alvaro (3):
      Add Unflatten Module (#41564)
      Fix docstring in Unflatten (#41835)
      Amend docstring and add test for Flatten module (#42084)

Alyssa Wang (1):
      Export logic op to pytorch

Andres Suarez (1):
      [fbs][2/n] Remove .python3 markers

Andrew Gallagher (1):
      [caffe2/aten] Fix clang build (#44934)

Andrew Jones (1):
      Improves type-checking guards. (#43339)

Ann Shan (20):
      check for unsupported instructions when exporting mobile models (#40791)
      list workaround for CREATE_OBJECT failure (#41129)
      Add operators for smart keyboard to lite interpreter (#41539)
      add named parameters to mobile module (#41376)
      implement lite parameter serializer (#41403)
      Refactor lite serializer dependencies from full jit (#42127)
      refactor save_data as non member function (#42045)
      Implement a light SGD optimizer (#42137)
      Fix lite trainer unit test submodule registration (#42714)
      add training mode to mobile::Module (#42880)
      refactor _save_parameters to _save_data (#43162)
      add _save_parameters to serialize map (#43163)
      [pytorch] add flag for autograd ops to mobile builds (#43154)
      Add lite SequentialSampler to torch mobile (#43299)
      [pytorch] add option to include autograd for code analyzer (#43155)
      [pytorch] Make mobile find_method return an optional (#43965)
      [pytorch] remove code analyzer build folder between builds (#44148)
      [pytorch] Add logging to mobile Method run (#44234)
      [pytorch] Replace mobile run_method with get_method and operator() (#44202)
      [pytorch] Remove mobile nonvariadic run_method (#44235)

Anthony Scopatz (4):
      Nightly checkout tool (#42635)
      Nightly Pull (#43294)
      Fix ToC Link (#43427)
      nightly robustness fixes for linking across devices (#43771)

Anthony Shoumikhin (1):
      [papaya][aten] Fix compiler error: loop variable 'tensor' is always a copy because the range of type 'c10::List<at::Tensor>' does not return a reference. (#40599)

Antonio Cuni (2):
      Fix a broken link in CONTRIBUTING.md (#44701)
      Missing tests about torch.xxx(out=...) (#44465)

Anurag Gupta (1):
      Op to create quant scheme blob (#40760)

Anush Elangovan (1):
      [cmake] Use PROJECT_SOURCE_DIR instead of CMAKE_* (#41387)

Ashish Farmer (1):
      Performance fix for torch.cat operator on ROCm (#46097) (#46323)

Ashish Shenoy (1):
      [dper3] replace LengthsGather lowlevel module's PT implemetnatio to use caffe2 op

Ashkan Aliabadi (27):
      Unify PyTorch mobile's threadpool usage. (#37243)
      Enable XNNPACK ops on iOS and macOS.
      Update psimd to psimd:072586a71b55b7f8c584153d223e95687148a900. (#40522)
      Update FXdiv to FXdiv:b408327ac2a15ec3e43352421954f5b1967701d1. (#40520)
      Update cpuinfo to cpuinfo:63b254577ed77a8004a9be6ac707f3dccc4e1fd9. (#40516)
      Update FP16 to FP16:4dfe081cf6bcd15db339cf2680b9281b8451eeb3. (#40526)
      Respect user set thread count. (#40707)
      Update pthreadpool to pthreadpool:029c88620802e1361ccf41d1970bd5b07fd6b7bb. (#40524)
      Fix memory leak in XNNPACK/MaxPool2D. (#41874)
      Disable validation layers in non-debug builds. (#42122)
      Add Vulkan Test to ATen Mobile Tests. (#42123)
      Add missing header guards. (#42272)
      Const-correctness, variable initialization, and error checking. (#42124)
      Fix ASAN error in QNNPACK's integration of qlinear_dynamic. (#41967)
      Search on system path for Vulkan headers and libraries as a last resort. (#43301)
      Refactor Vulkan context into its own files. Use RAII. (#42273)
      Revert "Revert D23252335: Refactor Vulkan context into its own files. Use RAII." (#43628)
      Move torch/csrc/utils/hash.h to c10/util/hash.h. (#42503)
      Generic Vulkan object cache. (#42394)
      Vulkan (source and binary) shader and shader layout cache. (#42325)
      Vulkan memory allocator. (#42786)
      Vulkan pipeline and pipeline layout cache. (#42395)
      Vulkan descriptor and descriptor layout cache. (#42642)
      Vulkan resource cache. (#42709)
      Vulkan command buffer and pool. (#42930)
      Minor touchups. (#44317)
      Add architectural support for multi-GPU. (#44059)

Basil Hosmer (13):
      Improved coverage for unboxed->boxed kernel wrappers (#38999)
      add Dimname support to IValue (#42054)
      handle multiple returns properly in boxing wrappers (#42437)
      add Quantizer support to IValue (#42438)
      suppress all Autograd keys in AutoNonVariableTypeMode (#42610)
      update DispatchKey::toString() (#42619)
      Include/ExcludeDispatchKeySetGuard API (#42658)
      format for readability (#42851)
      avoid redundant isCustomClassRegistered() checks (#42852)
      add support for optional int list with scalar fill (#43262)
      centralize autograd dispatch key set (#43387)
      pull empty() out of use_c10_dispatcher: full (#43572)
      [wip] fast typeMeta/ScalarType conversion approach 2 (#44965)

Bert Maher (25):
      [tensorexpr][trivial] Remove debug printing from test (#41806)
      Environment variable for controlling type verbosity in debug output (#41906)
      Add documentation for PYTORCH_JIT_TYPE_VERBOSITY (#42241)
      Print TE CUDA kernel (#42692)
      Speed up CUDA kernel launch when block/thread extents are statically known (#42899)
      Fix TE microbenchmark harness to use appropriate fuser/executor (#42900)
      Add a microbenchmark for LSTM elementwise portion (#42901)
      Add executor and fuser options to the fastrnn test fixture (#42946)
      [tensorexpr] Fix promotion of booleans (#43097)
      Fix NaN propagation in fuser's min/max implementation (#43590)
      Remove unnamed namespace in headers (#43689)
      Fix NaN propagation in TE fuser's min/max implementation (#43609)
      Respect canFuseOn{CPU,GPU} in TE fuser (#43967)
      Test TE fuser unary ops and fix sigmoid(half) (#44094)
      [te] Disable reductions by default (#44122)
      Dump optimized graph when logging in already-optimized PE (#44315)
      [te] Fix casting of unsigned char, and abs(int) (#44157)
      Prevent the TE fuser from getting datatypes it can't handle (#44160)
      Fix frac in CUDA fuser (#44152)
      Fix bug simplifying if-then-else when it can be removed (#44462)
      [te] Disable flaky test CudaSharedMemReduce_1 (#44862)
      [pytorch][tensorexpr] Make gtest-style macros in tests match actual gtest signatures (#44861)
      Failing test demonstrating problems with mixed output shapes (#44455)
      Add env variable to bypass CUDACachingAllocator for debugging (#45294)
      Tensor-expression fuser bugfixes for 1.7.1 (#48137)

Bowen Bao (1):
      [ONNX] Add dim_param support in export with onnx shape inference (#44920) (#45755)

BowenBao (12):
      [ONNX] Export torch.eye to ONNX::EyeLike (#41357)
      [ONNX] Export static as_strided (#41569)
      [ONNX] Refactor ONNX fixup for Loop and If (#40943)
      [ONNX] Enable lower_tuple pass for custom layer (#41548)
      [ONNX] Add preprocess pass for onnx export (#41832)
      [ONNX] Fix scalar type cast for comparison ops (#37787)
      [ONNX] Add support for operator `add` between tensor list (#41888)
      [ONNX] Export split_to_sequence as slice when output number is static (#42744)
      [ONNX] Utilize ONNX shape inference for ONNX exporter (#40628)
      [ONNX] Update ONNX shape inference (#43929)
      [ONNX] Enable true_divide scripting export with ONNX shape inference (#43991)
      [ONNX] Update div export to perform true divide (#44831)

Bradley Davis (2):
      update tests to run back-compat check using new binary (#41949)
      Remove expensive call to PyObject_GetAttrString in PyTorch_LookupSpecial (#44684)

Bram Wasti (12):
      [jit] Scaffold a static runtime (#42753)
      [tensorexpr] Autograd for testing (#42548)
      [jit][static runtime] Simplify the graph and add operator whitelist (#43024)
      [Static Runtime] Add OSS build for static runtime benchmarks (#43881)
      Allow no-bias MKLDNN Linear call (#43703)
      [tensorexpr] Alias analysis tests (#44110)
      [static runtime] Swap to out-variant compatible nodes (#44127)
      [tensorexpr] Add flag to fuse with unknown shapes (#44401)
      [static runtime] Add _out variants and reuse memory (#44128)
      Add Deep and wide to test and flatten/tranpose for good measure (#44129)
      [static runtime] Remove ops in static from backwards compatibility checks (#45354)
      [static runtime] Split out graph preparation from runtime (#44131)

Brandon Lin (5):
      [gloo] change ProcessGroupGlooAsyncTest to use gtest (#42313)
      [dper3] Export Caffe2 operator LearningRate to PyTorch
      [dper3] Export PackSegments and UnpackSegments to Pytorch
      [dper3] Create dper LearningRate low-level module
      [dper3] Create dper LearningRate low-level module (#44639)

Brian Hirsh (5):
      renaming TestDdpCommHook class so it doesn't get picked up as a test by pytest (#44905)
      adding a test for ddp save()/load() (#44906)
      Byte-for-byte compatibility fixes in codegen (#44879)
      adding a beta parameter to the smooth_l1 loss fn (#44433)
      Cherrypick smooth l1 loss fixes (#45759)

Brian Johnson (2):
      Update index.rst (#46324)
      Brianjo release feature status (#46892)

Brian Vaughan (3):
      Revert D22396896: [pytorch][PR] run single-threaded gradgradcheck in test_nn
      Revert D22418731: [JIT] Add out-of-source-tree to_backend tests
      Revert D22418716: [JIT] Add support for backend-lowered submodules

Bugra Akyildiz (3):
      Remove Incorrect Comment in tools/build_libtorch and remove Python2 support in the module import (#44888)
      Directly use work.result() to retrieve tensor rather than passing as a separate argument (#44914)
      Remove __future__ imports for legacy Python2 supports (#45033)

Caleb Thomas (1):
      Add iterator like functionality for DispatchKeySet (#44066)

Changji Shi (1):
      Port /test/cpp_extensions/rng_extension.cpp to new operator registration API (#39459)

Cheng Chang (2):
      [NNC] Make it able to normalize loop with variable start (#44133)
      [NNC] Add loop slicing transforms (#43854)

Chris Huynh (1):
      To fix extra memory allocation when using circular padding (#39273)

Christian Puhrsch (1):
      tuple_map / tuple_concat (#42326)

Christian Sarofeen (2):
      [nvFuser] Working towards reductions, codegen improvements (#40864)
      [NVFuser] Enable E2E BCast-PWise-Reduction fusions (#43129)

Christopher Whelan (2):
      [PyFI] Update hypothesis and switch from tp2 (#41645)
      [hypothesis] Deadline followup (#42842)

Chunli Fu (4):
      [Shape Inference] Fix InferFC
      [blob reorder] Seperate user embeddings and ad embeddings in large model loading script
      [DPER3] Separate user embeddings and ad embeddings in blob reorder
      [DPER3] AOT integration

Cloud Han (2):
      [jit] Fix jit not round to even if const is folded (#40897)
      update CONTRIBUTING.md for ccache (#41619)

Colin L Reliability Rice (4):
      Create lazy_dyndeps to avoid caffe2 import costs. (#39488)
      Create lazy_dyndeps to avoid caffe2 import costs. (#41343)
      Modify lazy_dyndep loading to trigger inside workspace. (#41687)
      Partly fix cuda builds of dper broken by caffe2 c++

Daiki Katsuragawa (1):
      Document formatting (#42065)

Daily, Jeff (1):
      install ATen/native/cuda and hip headers (#45097)

Daiming Yang (2):
      RandomSampler generates samples one at a time when replacement=True (#40026)
      Patch for #40026 RandomSampler generates samples one at a time when replacement=True (#41682)

Daniel van Strien (1):
      Update cuda init docstring to improve clarity (#42923)

Danning XIE (1):
      fix `torch.jit.trace_module` documentation (#40248)

Danny Huang (5):
      [caffe2] exposes Net cancellation through pybind state (#44043)
      [caffe2] adds Cancel to OperatorBase and NetBase (#44145)
      [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#44495)
      [caffe2] adds Cancel to SafeDequeueBlobsOp and SafeEnqueueBlobsOp (#45177)
      [caffe2] adds hypothesis test for queue ops cancel (#45178)

Danqi Huang (1):
      log message at per-test level for`perfpipe_pytorch_test_times` (#43752)

Darius Tan (3):
      [quant] Quantized Average Pool Refactoring (#42009)
      BAND, BOR and BXOR for NCCL (all_)reduce should throw runtime errors (#42669)
      Check if input is ChannelsLast or ChannelsLast3d for quantized AdaptivePool3d. (#42780)

David Reiss (17):
      Re-apply PyTorch pthreadpool changes
      Use CPU Allocator for reading from zip container
      Add channels-last support to bundled_inputs (#36764)
      Add a utility function for bundling large input tensors (#37055)
      Fix and reenable threaded QNNPACK linear (#40587)
      Fix batch size zero for QNNPACK linear_dynamic (#40588)
      In interpolate, use if instead of elif (#37171)
      In interpolate, move exceptional cases to the bottom (#37172)
      In interpolate, inline the call to _interp_output_size (#37173)
      Add support for int[]? arguments in native_functions.yaml (#37174)
      Add support for float[]? arguments in native_functions.yaml (#37175)
      Add interpolate-style overloads to aten::upsample* ops (#37176)
      Trim trailing whitespace
      Remove proprietary notices
      Update quantize_jit to handle new upsample overloads (#43407)
      Add nondeterministic check to new upsample overloads
      Update interpolate to use new upsample overloads (#43025)

Daya Khudia (3):
      [fbgemm] manual submodule update (#44082)
      [caffe2] Replace embedding conversion ops with fbgemm functions (#44843)
      [aten] Call fbgemm functions for embedding prepack/unpack (#44845)

Deepak Velmurugan (1):
      Black to Block for various files (#42913)

DeepakVelmurugan (3):
      Easier english updated tech docs (#42016)
      BlackList to BlockList (#42279)
      Blacklist to Blocklist in onnxifi_transformer (#42590)

Dhruv Matani (1):
      [RFC] Remove per-op-registration related code in caffe2/tools/codegen/gen.py (#45134)

Dianshi Li (2):
      [PT Model Split] Support 2 operators in PT by C2 conversion (#45231)
      Resend diff D23858329 (#45315)

Diego M. Rodriguez (1):
      Add __all__ to torch/_C/_VariableFunctions.pyi (#40499)

Dinesh Govindaraj (1):
      Shape inference for SparseToDense in ExpertCombiner

Dmytro Dzhulgakov (7):
      [easy] Use torch.typename in JIT error messages (#41024)
      [c10/cuda] Reorganize device_count() and robustly surface ASAN warnings (#42249)
      [jit] PyTorchStreamReader::getAllRecord should omit archive name prefix (#43317)
      [serialize] Expose zip file alignment calculation functions (#43531)
      [torch.fx] Pass placeholders through delegate too (#43432)
      Make ExtraFilesMap return bytes instead of str (#43241)
      [jit] Speed up saving in case of many classes (#44589)

Dongxin Liu (2):
      Mish Activation Function (#40856)
      Make Mish support large inputs. (#43037)

Donny Greenberg (1):
      Fix Broken Link in CONTRIBUTING.md (#41066)

Edgar Andrés Margffoy Tuay (1):
      Add regression test for ONNX exports of modules that embed an Embedding layer inside a Sequential (#32598)

Edmund Williams Jr (3):
      cross_layer_equalization (#41685)
      Added Prehook option to prepare method (#41863)
      Bias Correction Implementation (#41845)

Edson Romero (3):
      Exposing Percentile Caffe2 Operator in PyTorch
      Export BatchBucketOneHot Caffe2 Operator to PyTorch
      Export MergeIdLists Caffe2 Operator to PyTorch

Edward Leardi (2):
      Fix HTTP links in documentation to HTTPS (#40878)
      Fix several quantization documentation typos (#40567)

Edward Yang (27):
      Add some syntax sugar for when backends use the same function. (#40182)
      Delete requires_tensor (#40184)
      Generalize Python dispatcher testing API; disallow overwriting fallback (#40469)
      Precompute entries in dispatch tables (#40512)
      Pin torchvision version for doc_push (#40802)
      Fix bug where explicitly providing a namespace never worked. (#40830)
      If ninja is being used, force build_ext to run. (#40837)
      Revert D22418756: [pytorch][PR] Migrate addmm, addbmm and THBlas_gemm to ATen
      Fix a number of deprecation warnings (#40179)
      Upgrade cpp docs Sphinx/breathe/exhale to latest version (#41312)
      Add reference documentation for torch/library.h (#41470)
      Remove dead named_tensors_unsupported_error definitions. (#42171)
      Fix minor typo in comment (#42184)
      Revert D22812445: Update TensorPipe submodule
      Add missing space after -> for topk.values (#42321)
      Add strict mypy type checking and update code_template.py (#42322)
      Delete dead is_named_tensor_only (#42672)
      Fix some mistakes in native_functions.yaml (#43156)
      Add dataclasses to base Docker images. (#43217)
      Make _compute_linear_combination.out a true out function (#43272)
      Update hardcoded pytorch_android_gradle_custom_build_single hash (#43340)
      Reimplement per-operator selective build (#39401)
      Rewrite of ATen code generator (#42629)
      Don't register a fallback for private use to let extensions do it themselves (#44149)
      Add TORCH_SELECTIVE_NAME to AMP definitions (#44711)
      Vectorize complex copy. (#44722)
      Make cudaHostRegister actually useful on cudart. (#45159)

Ehsan K. Ardestani (2):
      NVMified NE Eval
      Remove excessive logging in plan_executor (#42888)

Eileen Pan (3):
      [1/n] Allow dense NaN value in dper raw input processor output
      [2/n][Compute Meta] support analysis for null flag features
      [2/n][Compute Meta] support analysis for null flag features

Eli Uriegas (48):
      Fix backup solution (#40515)
      Bump nightlies to 1.7.0 (#40519)
      .circleci: Remove executor from windows uploads (#40742)
      .circleci: Build docker images as part of CI workflow (#40827)
      .circleci: Output binary sizes, store binaries (#41074)
      bump docker version to more recent tag (#41105)
      .circleci: Fix job-specs-custom docker tag (#41111)
      .circleci: Remove pynightly jobs
      Update ShipIt sync
      test: Add option to continue testing through error (#41136)
      .cirlceci: Setup nvidia runtime for cu as well (#41268)
      .circleci: Explicitly remove nvidia apt repos (#41367)
      .circleci: Re-split postnightly into its own thing (#41354)
      .circleci: Prefix docker jobs with docker- (#41689)
      .circleci: Remove docker_hub_index_job, wasn't used (#41800)
      .circleci: Separate out docs build from push (#41871)
      .circleci: Make sure to install expect for docs push (#41964)
      .circleci: Prefer netrc for docs push (#42136)
      Revert "Conda build (#38796)" (#42472)
      ecr_gc: Iterate through all tags, reduce prints (#42492)
      Revert "Revert D22360735: .circleci: Build docker images as part of C… (#40950)
      .circleci: Have python docs always push to site (#42552)
      test: Disable test_strided_grad_layout on ROCM (#42561)
      .circleci: Hardcode rocm image to previous tag (#42603)
      .circleci: Only do comparisons when available (#42816)
      .circleci: Copy LLVM from pre-built image (#43038)
      .circleci: Simplify binary upload process (#43159)
      .circleci: Don't quote glob for conda upload (#43297)
      .circleci: Remove manual docker installation (#43277)
      .circleci: Use dynamic docker image for android (#43356)
      .circleci: Prefer using env-file for docker run (#43293)
      .jenkins: Remove openssh installs (#43597)
      .circleci: Add CUDA 11 to nightly binary builds (#43366)
      .circleci: Add slash to end of s3 cp (#43792)
      .circleci: Remove un-needed steps from binary builds (#43974)
      ci: Add anaconda pruning to CI pipeline (#44651)
      .circleci: Switch to dynamic MAX_JOBS (#44729)
      .circleci: Upgrade all xcode 9 workers to xcode 11 (#45153)
      docker: Add torchelastic to docker image (#45438)
      Update target determinator to point to release/1.7
      [release/1.7] .circleci: Reintroduce torchvision to docs builds (#46882)
      [release/1.7] .jenkins: Bump torchvision commit (#46933)
      [v1.7.1] Add Python 3.9 support (linux / macOS) (#48133)
      [v1.7.1] Enable Python 3.9 for Windows builds (#48218)
      [v1.7.1] Various setup.py fixes (#48220)
      [v1.7.1] third_party: Update pybind to point to fork (#48312)
      [1.7.1] torch: Stop using _nt_quote_args from distutils (#48618) (#48768)
      [v.1.7.x] Use local env for building CUDA extensions on Windows (#47150) (#48937)

Elias Ellison (45):
      Fork/Join Inline Docs (relanding) (#40438)
      [JIT] freeze doc (#40409)
      [JIT] script if tracing fix (#40468)
      [JIT] fix unfold shape analysis (#40749)
      shape analysis fix for default dtype' (#40938)
      fix grad thrashing of shape analysis (#40939)
      [JIT][Easy]move remove mutation to own file (#41137)
      [JIT] make fastrnns runnable on cpu (#41483)
      [JIT] move remove mutation to its own test file (#41502)
      [JIT] handle specially mapped ops (#41503)
      [JIT] dont count constants in subgraph size (#41436)
      [JIT] optimize autodiff subgraph slicing (#41437)
      [JIT] Don't re run CSE on every block (#41479)
      [JIT] Dont include view ops in autodiff graphs (#42027)
      import freeze (#42319)
      refactor canonical ordering to also be able to do isAfter checks (#42140)
      [JIT] Make create autodiff subgraphs do in place updates to aliasDb (#42141)
      [JIT] Represent profiled types as a node attribute (#43035)
      Add API for unexecuted op (#43629)
      Refactor pass to class (#43630)
      refactor tests (#43631)
      Add undefined specializations in backward (#43632)
      Specialize optionals for grad_sum_to_size (#43633)
      [JIT] Disable broken tests (#43750)
      Update requires grad property (#43634)
      Use prim::TensorExprGroup interned symbol (#43635)
      Add passes to profiling executor pipeline (#43636)
      use types in the IR instead of vmap (#43742)
      Update aliasing in tensorexpr fuser (#43743)
      [JIT] Always map node output in vmap (#43988)
      [JIT] Fuser match on schemas not node kind (#44083)
      [TensorExpr fuser] Guard nodes that have tensor output properties determined by non-tensor inputs (#44137)
      [JIT] Remove references to no longer generated _tanh_backward and _sigmoid_backward (#44138)
      fix lint (#44346)
      Revert D23568330: [pytorch][PR] Moves some of TestTorchMathOps to OpInfos
      Improving ModuleList indexing error msg (#43361)
      [JIT] Erase shapes before fallback graph (#44434)
      [JIT] Remove profiling nodes in autodiff forward graph (#44420)
      [JIT] dont optimize device dtype on inline (#43363)
      [JIT] Dont optimize shape info in batch_mm (#44565)
      [JIT] Fix torch.tensor for empty multidimensional-typed lists (#44652)
      Fix fallback graph in specialize autogradzero (#44654)
      [JIT] improve alias analysis for list constructs (#39111)
      Refactor subgraph merging (#44238)
      [JIT] Regularize tensorexpr fuser strategy with other fusers (#44972)

Emilio Castillo (1):
      Reset `DataLoader` workers instead of creating new ones (#35795)

Eric Cotner (1):
      fix typo "normal" -> "Cauchy" (#40334)

Facebook Community Bot (13):
      Automated submodule update: FBGEMM (#40332)
      Automated submodule update: FBGEMM (#41814)
      Automated submodule update: FBGEMM (#42205)
      Automated submodule update: FBGEMM (#42302)
      Automated submodule update: FBGEMM (#42496)
      Automated submodule update: FBGEMM (#42584)
      Automated submodule update: FBGEMM (#42713)
      Automated submodule update: FBGEMM (#42781)
      Automated submodule update: FBGEMM (#42834)
      Automated submodule update: FBGEMM (#43251)
      Automated submodule update: FBGEMM (#44177)
      Automated submodule update: FBGEMM (#44581)
      Automated submodule update: FBGEMM (#44647)

Fang Zhang (1):
      change self.generator to generator (#44461)

Gang Shen (1):
      Expose the interface of nesterov of SGD Optimizer from caffe2 to dper

Gao, Xiang (20):
      Add CUDA11 build and test (#40452)
      [JIT] Fix typing.Final for python 3.8 (#39568)
      Skip SVD tests when no lapack (#43566)
      Add amax/amin (#43092)
      Document the beta=0 behavior of BLAS functions (#43823)
      #include <string> in loopnest.h (#43835)
      addmm/addmv should accept complex alpha and beta (#43827)
      Enable TF32 support for cuDNN (#40737)
      Remove useless py2 compatibility import __future__, part 1 (#43808)
      Delete THCStream.cpp (#43733)
      Fix THPVariable_float_scalar (#43842)
      Further expand coverage of addmm/addmv, fix 0 stride (#43980)
      Cleanup workarounds for compiler bug of ROCm (#44579)
      CUDA BFloat activations 1 (#44834)
      Enable bfloat16 random kernels on Windows (#44918)
      CUDA BFloat16 addmm, addmv (#44986)
      CUDA BFloat16 losses (#45011)
      Adjust TF32 tests (#44240)
      CUDA BFloat16 neg (#45240)
      Workaround for cublas bug for 45724 (#46001) (#46042)

Garret Catron (1):
      Create experimental FX graph manipulation library (#44775)

Gaurav Subedi (1):
      change 2 instances of blacklist to blocklist in tools/pyi/gen_pyi.py (#41979)

George Guanheng Zhang (2):
      Revert D23299452: [pytorch][PR] fix typo in test_dataloader test_multiprocessing_contexts
      Revert D23379383: Land `code_coverage_tool` to `caffe2/tools` folder

Giuseppe Ottaviano (1):
      [caffe2] Speed up compilation of aten-op.cc (#44440)

Gregory Chanan (27):
      Revert "port masked_select from TH to ATen and optimize perf on CPU (#33269)" (#41828)
      Delete accidentally committed file errors.txt. (#43164)
      Kill unused _pointwise_loss function. (#43523)
      Properly check that reduction strings are valid for l1_loss, smoothl1_loss, and mse_loss. (#43527)
      Add reduction string test for ctc_loss. (#43884)
      Use NewCriterionTest in test_cpp_api_parity.py. (#43954)
      Kill dead code in common_nn as part of merging Criterion and NewCriterionTests. (#43956)
      Actually run backward criterion tests. (#44030)
      Allow criterion backwards test on modules requiring extra args (i.e. CTCLoss). (#44050)
      Merge CriterionTest into NewCriterionTest. (#44055)
      Rename NewCriterionTest to CriterionTest. (#44056)
      For CriterionTests, have check_gradgrad actually only affect gradgrad checks. (#44060)
      Stop ignoring NotImplementedErrors in cuda CriterionTests. (#44381)
      Combine criterion and new criterion tests in test_jit. (#43958)
      Merge criterion_tests and new_criterion_tests. (#44398)
      Fix MSELoss when target.requires_grad is True. (#44437)
      Fix L1Loss when target.requires_grad is True. (#44471)
      Fix SmoothL1Loss when target.requires_grad is True. (#44486)
      Simplify target handling in nn gradcheck. (#44507)
      Always use NewModuleTest instead of ModuleTest. (#44745)
      Stop ignoring errors in cuda nn module tests. (#44783)
      Stop using check_criterion_jacobian. (#44786)
      Turn on gradgrad check for BCELoss Criterion Tests. (#44894)
      Remove convert_target from NN tests. (#45291)
      Remove CriterionTest.test_cuda code for dtype None. (#45316)
      Stop running clang-tidy on torch/csrc/generic/*.cpp. (#46335)
      [v1.7] Fix backward compatibility test by moving dates forward.

Guilherme Leobas (7):
      Add typing annotations to hub.py and _jit_internal.py (#42252)
      Add typing annotations for torch.nn.quantized.dynamic.modules.rnn (#43186)
      add typing annotations for a few torch.utils.* modules (#43806)
      Add type annotations for torch.nn.utils.* (#43080)
      Add typing annotations for torch.utils.data.* modules (#44136)
      Annotate torch.utils.(tensorboard/show_pickle/hypify) (#44216)
      Enable type-checking of torch.nn.quantized.* modules (#43110)

HC Zhu (1):
      [caffe2] Fix spatial_batch_norm_op dividision-by-zero crash (#40806)

Haixin Liu (5):
      Remove print (#40475)
      [PyTorch Numeric Suite] Remove unnecessary Logger in input arguments (#40890)
      Remove unused Logger in get_matching_activations (#41023)
      Move qconfig removal into convert() (#41930)
      Add operator to compute the equalization scale (#45096)

Hameer Abbasi (8):
      Add __torch_function__ for methods (#37091)
      Follow-up for pytorch/pytorch#37091. (#42806)
      Add alias torch.fix for torch.trunc to be compatible with NumPy. (#43326)
      Add alias torch.negative to torch.neg. (#43400)
      Allow Tensor-likes in torch.autograd.gradcheck (#43877)
      Fix documentation to point to torch.overrides instead of _overrides. (#47843)
      Fix incorrect signatures in get_testing_overrides for 1.7 release (#47736)
      Fix output type of torch.max for Tensor subclasses. (#47735)

Hao Lu (13):
      [caffe2] Reimplement RemoveOpsByType with SSA (#40649)
      [TVM] Fix build and sync with caffe2/caffe2/python/dlpack.h (#40888)
      [caffe2] Revert D22220798 (#41302)
      [caffe2][redo] Reimplement RemoveOpsByType with SSA (#41606)
      [BugFix] Fix bug in onnx::SsaRewrite (#42148)
      [caffe2] Special handling of If/AsyncIf op in RemoveOpsByType (#42286)
      [jit] DeepAndWide benchmark (#43096)
      [jit][static] Replace deepcopy with copy (#43182)
      [jit][static] Basic executor (#43647)
      [TVM] Support slice op (#43969)
      [TVM] Support fp16 weights in c2_frontend (#44070)
      [TVM] Support Fused8BitRowwiseQuantizedToFloat op (#44098)
      [caffe2] Support data types in shape hints (#45110)

Hao Wu (1):
      onnx export of fake quantize functions (#39738)

Haoran Li (1):
      Back out "Make grad point to bucket buffer in DDP to save memory usage" (#43557)

Hector Yuen (25):
      fix range of results for pairwise operations (#40728)
      add first implementation of swish (#41085)
      match int8 quantization of nnpi (#41094)
      make Int8 FC bias quantization use round flush to infinity
      fix the range of the random weights used in the int8fc test (#41303)
      reduce logging for layernorm (#41305)
      update operators in the mapping to fp16 emulation
      fix include file path in unary ops
      remove template arguments of layernorm
      vectorize rounding ops (#41439)
      resurrect single quantization op test (#41476)
      fix quantization mechanism to match nnpi (#41494)
      fix dequantization to match nnpi (#41505)
      integrate int8 swish with net transformer
      add net transforms for fusion (#42763)
      remove deadline enforcement for hypothesis (#42871)
      fix int8 FC (#42691)
      make deadline=None for all numerics tests (#43014)
      add fake fp16 fusions to net transforms (#42927)
      default ice-ref to c-step (#4812)
      match batchmatmul on 1.0.0.6 (#43559)
      add tanh + quantize unit test (#44076)
      handle the case of -0.0 on tanh quantization (#44406)
      fuse layernorm + quantize (#44232)
      adjust shape inference in sls tests (#44936)

Heitor Schueroff de Souza (15):
      Added SiLU activation function (#41034)
      Initial implementation of quantile operator (#39417)
      retain undefined tensors in backward pass (#41490)
      Revert D22525217: [pytorch][PR] Initial implementation of quantile operator
      Don't materialize output grads (#41821)
      Added torch::cuda::manual_seed(_all) to mirror torch.cuda.manual_seed(_all) (#42638)
      Initial quantile operator implementation (#42755)
      Implemented non-named version of unflatten (#42563)
      Implemented torch::nn::Unflatten in libtorch (#42613)
      MaxPool1d without indices optimization (#43745)
      Fix lerp.cu bug when given discontiguous out tensor (#44559)
      Update median doc to note return value of even-sized input (#44562)
      Fixed quantile nan propagation and implemented nanquantile (#44393)
      Fixed handling of nan for evenly_distribute_backward (#45280)
      Reorganized Sorting.cpp method order (#45083)

Himangshu (3):
      Change from self to self.class() in _DecoratorManager to ensure a new object is every time a function is called recursively (#44633)
      added check for NumberType (#44375)
      Add check for Complex Type to allow non integral alpha. (#45200)

Ho Young Jhoo (1):
      Change function parameter `self` to `input` in torch.__init__.pyi (#40235)

Hong Xu (36):
      Report error when ATEN_THEADING is OMP and USE_OPENMP is turned off. (#40146)
      Let exp support complex types on CUDA and enable device/dtype in complex tests (#39087)
      Skip some error-producing exp tests that cannot be reliably reproduced (#40824)
      Remove more error-exposing tests in exp that cannot be reliably reproduced (#40825)
      Restore the contiguity preprocessing of linspace (#41286)
      Remove two "return"s that return "void" (#41811)
      Clarify Python 3.5 is the minimum supported version in the installation section. (#41937)
      Make fmod work with zero divisors consistently (#41948)
      Remove unused variable "schema" (#42245)
      Let bfloat16 support promotion with other types (#41698)
      Remove 4 unused variables in lp_pool_op.cc (#42329)
      torch.gcd: Do not use std::abs() because it does not have an unsigned integer overload (#42254)
      Let TensorIterator::nullary_op support check_mem_overlap option (#38693)
      Vectorize arange (#38697)
      Correct the type of some floating point literals in calc_digamma (#42846)
      Test the type promotion between every two dtypes thoroughly (#42585)
      Remove unused variable vecVecStartIdx (#42257)
      Replace all AT_ASSERTM under ATen CPU kernels. (#41876)
      Replace all AT_ASSERTM under ATen CUDA kernels. (#42989)
      Remove erroneous trailing backslashes (#43318)
      Do not define the macro "isnan" (#43242)
      Don't proceed into setup.py too far if Python version is unsupported (#42870)
      Let linspace support bfloat16 and complex dtypes (#43578)
      Remained changes of #43578 (#43921)
      Update torch.range warning message regarding the removal version number (#43569)
      is_numpy_scalar should also consider bool and complex types (#43644)
      Remove THC max and min, which are longer used (#43903)
      Remove many unused THC pointwise math operators (#44230)
      Let logspace support bfloat16 on both CPU and CUDA (#44675)
      For logical tests, use the dtypes decorator (#42483)
      Vectorize int8_t on CPU (#44759)
      Support BFloat16 for binary logical operators on CUDA (#42485)
      Add complex number support for binary logical operators (#43174)
      Vectorize bitwise_not (#45103)
      Support bfloat16 and complex dtypes for logical_not (#43537)
      Remove unnecessary __at_align32__ in int_elementwise_binary_256 (#45470)

Hongfei XU (1):
      Support AMP in nn.parallel (#43102)

Hongyi Jia (6):
      [Gloo] update gloo submodule for PyTorch (#41462)
      [Gloo] alltoall to ProcessGroupGloo (#41424)
      GLOO process group GPU alltoall (#41690)
      [c10d] Template computeLengthsAndOffsets() (#42706)
      [GLOO] handle empty split size (#43256)
      [PyTorch/NCCL] Fix async error handling (#45456)

Hongzheng Shi (1):
      [GradualGating] support better k value change (#41557)

Huamin Li (2):
      check in intel nnpi 1007 into fbcode/tp2
      skip test_tanhquantize for now (#44312)

Igor Sugak (1):
      [caffe2] fix clang build

Ilia Cherniavskii (8):
      Adjust CUDA memory leak test (#40504)
      [rfc] Reduce number of coin flips in RecordFunction (#40758)
      Benchmark RecordFunction overhead on some models (#40952)
      RecordFunction in Dispatcher (#37587)
      Remove ProfiledType (#42570)
      Fix sequence numbers in profiler output (#42565)
      Coalesce TLS accesses in RecordFunction constructor (#44970)
      Source code level attribution in profiler (#43898)

Iurii Zdebskyi (10):
      Add _foreach_add_(TensorList tensors, Scalar scalar) API (#42531)
      Add _foreach_add(TensorList tl1, TensorList tl2) and _foreach_add_(TensorList tl1, TensorList tl2) APIs (#42533)
      Add binary ops for _foreach APIs (#42536)
      Add unary ops: exp and sqrt (#42537)
      Added alpha overloads for add/sub ops with lists (#43413)
      Enable binary ops with Scalar Lists with for foreach APIs (#45298)
      Added optimizers based on multi tensor apply (#45299)
      [RELAND] Added optimizers based on multi tensor apply (#45408)
      Add more tests for mt optimizers (#45475)
      Disable multi tensor tesnor tests on rocm (#45535)

Ivan Kobzarev (29):
      [android][ci] Fix CI packaging headers to aar (#40442)
      [android][readme] Aar native linking add fbjni (#40578)
      [vulkan] Shaders caching (#39384)
      [vulkan] adaptive_avg_pool2d (#41220)
      [vulkan] mm op through addmm (#41221)
      [vulkan] support add for dim < 4 (#41222)
      [vulkan] reshape op (#41223)
      [vulkan][asan] Fix Invalid Memory ops (#41224)
      [vulkan] max_pool2d (#41379)
      [vulkan] VulkanTensor lazy buffer allocation (#42569)
      [vulkan] Ops registration to TORCH_LIBRARY_IMPL (#42194)
      [vulkan] Fix warnings: static_cast, remove unused (#42195)
      [vulkan] inplace add_, relu_ (#41380)
      [vulkan] cat op (concatenate) (#41434)
      [pytorch] BUCK build for Vulkan backend
      [vulkan] fix invalid memory op and tests (#43312)
      [vulkan][ci] Vulkan tests running on linux build via swiftshader (added to docker) (#42614)
      [android][jni] Support Tensor MemoryFormat in java wrappers (#40785)
      [vulkan][op] add.Scalar, mul.Scalar (#42674)
      [vulkan][op] avg_pool2d (#42675)
      [vulkan] glsl shaders relaxed precision mode to cmake option (#43076)
      [pytorch][vulkan][jni] LiteModuleLoader load argument to use vulkan device
      [pytorch][vulkan] Fix downcast warnings-errors, aten_vulkan buck target
      [vulkan][py] torch.utils.optimize_for_vulkan (#44903)
      [vulkan] Remove duplication of op registration and clean unused vars (#44932)
      [vulkan] reshape op to use infer_size to expand -1 (#45104)
      [vulkan] support dimensions negative indexing (#45068)
      [android][vulkan] Module load argument to specify device cpu/vulkan (#44896)
      [vulkan][android][test_app] Add test_app variant that runs module on Vulkan (#44897)

Ivan Yashchuk (4):
      Fix the bug in THCTensor_(baddbmm) and ATen's addmm_cuda for strided views input (#42425)
      Fix error code checks for triangular_solve (CPU) (#44720)
      Added support for complex input for Cholesky decomposition (#44895)
      Updated `cholesky_backward` for complex inputs (#45267)

Jade Nie (1):
      Wrap Caffe2's SparseLengthsSum into a PyTorch op (#39596)

Jae Lee (1):
      Back out "Selective meta programming preparation for prim ops"

James Gilbert (1):
      Remove use of term "blacklist" from tools/autograd/gen_python_functions.py (#42047)

James Reed (28):
      Fix zip serialization for file > 2GiB (#40722)
      Support Pathlike for zipfile serialization (#40723)
      Fix delegating to jit.load from torch.load (#40937)
      s/torch::jit::class_/torch::class_/ (#40795)
      Introduce experimental FX library (#42741)
      [FX] fix lint (#42866)
      [FX] Add interface to reject nodes (#42865)
      [FX] Add in resnet + quantization tests (#43157)
      [FX] Native callables in FX lowering (#43426)
      [FX] Support tensor-valued constants (#43666)
      [FX] Pickle serialization of GraphModule via forward source (#43674)
      [FX] Better error when unpacking Proxy (#43740)
      [FX] Only copy over forward() from exec (#44006)
      [FX] __str__ for GraphModule and Graph (#44166)
      [FX] Fix forward merge conflict breakage (#44221)
      [FX] Only copy over training attr if it\'s there (#44314)
      [FX] Minor fixups in Graph printout (#44214)
      [FX][EZ] Allow constructing GraphModule with dict for root (#44679)
      [FX] Further sanitize generated names (#44808)
      [FX] Fix GraphModule copy methods not regenerating forward (#44806)
      [FX] Pass module's qualname to is_leaf_module (#44966)
      [FX] s/get_param/get_attr/ (#45000)
      Revert D23798016: [FX] s/get_param/get_attr/
      [FX] Make Graphs immutable and make GraphModule recompile after assigning graph (#44830)
      [resubmit][FX] s/get_param/get_attr/ (#45147)
      [FX][EZ] Fix bug where copying node made non-unique name (#45311)
      [FX] Lint pass for Graphs (#44973)
      [1.7] Hide FX (#45631)

Jan Schlüter (1):
      20000x faster audio conversion for SummaryWriter (#44201)

Jane (Yuan) Xu (1):
      Enable typechecking for torch.testing._internal.common_quantized.* (#44805)

Jane Xu (1):
      minor style edits to torch/testing/_internal/common_quantized.py (#44807)

Jannik Bamberger (1):
      Fix arg type annotations in jit.trace and onnx.export (#41093)

Jasmine Liu (3):
      [PyTorch Error Logging][1/N] Adding Error Logging for Run_Method (#40535)
      [PyTorch Error Logging][2/N] Adding Error Logging for Loading Model (#40537)
      [PyTorch Operator] [2/n] Adding python test

Jeff Daily (16):
      skip_if_rocm test_rnn in test_c10d_spawn.py (#40577)
      [ROCm] restore jit tests (#40447)
      ROCm 3.5.1 image (#40385)
      [ROCm] update hip library name (#41813)
      restore at::Half support for caffe2 SumOp (#41952)
      pin numpy version to 1.18.5 (#42670)
      generalize circleci docker build.sh and add centos support (#41255)
      update path in CI script to access ninja (#43236)
      remove thunk fix now that ROCm CI images are >= ROCm 3.5 (#43226)
      [ROCm] skip test_rpc in .jenkins/pytorch/test.sh (#43305)
      [ROCm] allow .jenkins/pytorch/test.sh to run on centos (#42197)
      fix typo in test_dataloader test_multiprocessing_contexts (#43343)
      Enable complex blas for ROCm. (#43744)
      [ROCm] fix cub hipify mappings (#44431)
      [ROCm] remove thrust workaround in ScanKernels (#44553)
      add rocm 3.8 to nightly builds (#45222)

Jeffrey Wan (1):
      Convert num_kernels to int64 before calling into CUDA GET_BLOCKS (#44688)

Jeong Ukjae (3):
      Fix wrong link in docs/source/notes/ddp.rst (#40484)
      replace blacklist in caffe2/python/onnx/frontend.py (#41777)
      Fix typing error of torch/optim/lr_scheduler.pyi (#41775)

Jeremy Lilley (1):
      [torch] Minor: Avoid ostreamstring in Operator's canonicalSchemaString() (#44442)

Jeremy Reizenstein (1):
      Document default dim for cross being None (#41850)

Jerry Zhang (70):
      [jit] Remove unnecessary clone APIs for script::Module and RecursiveScriptModule (#40297)
      [quant][graphmode] Enable inplace option for top level API (#40414)
      [quant] Fix fuse linear pass (#40549)
      [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)
      [quant][graphmode][fix] cloning schema in insert_observers (#40624)
      [quant] aten::repeat work for quantized tensor (#40644)
      [quant][graphmode][fix] remove unsupported ops in the list (#40653)
      [quant][graphmode] Support quantization for `aten::apend` (#40743)
      [quant][graphmode][fix] Fold conv bn (#40865)
      [quant][graphmode][fix] Print the node in error message (#40889)
      [quant][graphmode][fix] filter for list append change (#41020)
      [quant][refactor] test_only_eval_fn (#41078)
      [quant] dequantize support list and tuple of tensors (#41079)
      [quant][graphmode] use RemoveMutation to remove append (#41161)
      [quant][graphmode][fix] Make it work with CallMethod on non-Module objects (#41576)
      [quant][graphmode][fix] Remove assert for uses == 1 in remove dequantize pass (#41859)
      [quant][graphmode][fix] Remove useQuantizable check for dynamic quant (#41892)
      [quant][graphmode] Support stack (#42187)
      [quant] Expose register activation post process hook function to user (#42342)
      [quant] Reduce number of variants of add/mul (#42769)
      [quant] Attach qconfig to all modules (#42576)
      [quant][fix] Remove activation_post_process in qat modules (#42343)
      [quant][doc] Print more info for fake quantize module (#43031)
      [reland][quant][fix] Remove activation_post_process in qat modules (#42343) (#43015)
      [quant][graphmode][fx] Add graph mode quantization on fx (#43175)
      [quant][graphmode][fx][test] Add per op test for graph mode quant on fx (#43229)
      [quant][graphmode][fx] Add support for conv module (#43285)
      [quant] Make OP_LIST_TO_FUSER_METHOD public (#43286)
      [quant][graphmode][fx] Add support for conv module + relu (#43287)
      [quant][graphmode][fx] Add support for add (#43331)
      [quant][graphmode][fx] Add support for add relu (#43332)
      [quant][graphmode][fx] Add support for cat (#43333)
      [quant][graphmode][fx] Add support for batchnorm (#43334)
      [quant][graphmode][fx] Add support for batchnorm relu (#43335)
      [quant][graphmode][fx][test][refactor] Refactor quantized add test (#43372)
      [quant][graphmode][fx] Add support for mul and mul relu (#43373)
      [quant][graphmode][fx] Add support for hardswish (#43374)
      [quant][graphmode][fx] Add support for elu (#43375)
      [quant][graphmode][fx] Add support for layer_norm (#43376)
      [quant][graphmode][fx] Add support for instance_norm (#43377)
      [quant][graphmode][fx] Add support for clamp (#43437)
      [quant][graphmode][fx] Add support for general shape ops (#43438)
      [quant][graphmode][fx] Add support for general value ops (#43439)
      [quant][graphmode][fx][test][refactor] Refactor tests for graph mode quantization on fx (#43445)
      [quant][graphmode][fx] Testing torchvision (#43526)
      [reland][quant][graphmode][fx] Add e2e test on torchvision (#43587)
      [quant][graphmode][fx][fix] enable per channel quantization for functional ops (#43534)
      [quant][graphmode][fx] Add top level APIs (#43581)
      [quant][graphmode][fx] Add support for weight prepack folding (#43728)
      [reland][quant][graphmode][fx] Add top level APIs (#43581) (#43901)
      [quant][graphmode][fix] Fix insert quant dequant for observers without qparams (#43606)
      [reland][quant][graphmode][fx] Add support for weight prepack folding (#43728) (#43902)
      [quant][graphmode][fx][refactor] Move patterns to separate files (#43891)
      [quant][graphmode][fx] Support dynamic quantization without calibration (#43892)
      [quant][graphmode][fx] Support dynamic quantization without calibration (#43952)
      [quant][graphmode][fx] Support quantize per channel in all cases (#44042)
      [quant][graphmode][fx] Support inplace option (#43983)
      [quant][graphmode][fx][api] Call fuse in prepare (#43984)
      [quant][eagermode][refactor] Add set/get method for quantization and fusion mappings (#43990)
      [quant][graphmode][fx][fix] Support dictionary output (#44508)
      [quant][graphmode][fx][fix] Support None qconfig in convert (#44524)
      [quant][graphmode][fx][fix] Remove qconfig in convert (#44526)
      [quant][graphmode][fx] Support fp16 dynamic quantization for linear (#44582)
      [quant] Support clone for per channel affine quantized tensor (#44573)
      [quant][graphmode][jit] Try to support append (#44641)
      [quant][graphmode][fx] Custom module support (#44766)
      [quant][graphmode][jit][api] Expose preserved_attrs from finalize to convert_jit (#44490)
      [quant][graphmode][fx] qconfig_dict support more types of configurations (#44856)
      [quant][eagermode] Custom module support (#44835)
      [quant] Remove unused qconfig argument in qat linear module (#45307)
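
The FX graph mode quantization entries above converge on the top-level APIs from #43581. A rough sketch of the prototype-era flow, assuming the `prepare_fx`/`convert_fx` entry points and a single default `qconfig_dict`; the API was still marked prototype at this point, so treat this as illustrative rather than definitive:

```python
import torch
import torch.nn as nn
from torch.quantization import get_default_qconfig
from torch.quantization.quantize_fx import prepare_fx, convert_fx  # prototype API

float_model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU()).eval()
qconfig_dict = {"": get_default_qconfig("fbgemm")}  # apply one qconfig to the whole model

prepared = prepare_fx(float_model, qconfig_dict)     # trace with FX and insert observers
with torch.no_grad():
    for _ in range(4):                               # toy calibration loop
        prepared(torch.randn(1, 3, 32, 32))
quantized = convert_fx(prepared)                     # fold observers into quantized ops
```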

Jessica Lin (2):
      Remove table of contents at the top of rpc.rst (#40205)
      Update docs feature classifications (#39966)

Jiakai Liu (23):
      [pytorch][ci] run mobile code analysis on PR (#40247)
      [pytorch] factor out binary size upload command (#40188)
      [pytorch][ci] add custom selective build flow for android build (#40199)
      [pytorch] add manual registration for trace type (#40903)
      [pytorch] deprecate PYTORCH_DISABLE_TRACING macro (#41004)
      [pytorch] disable per-op profiling for internal mobile build (#41825)
      [pytorch] bump up variable version regardless of differentiability (#41269)
      [pytorch] fix code analyzer for LLVM 9 & 10 (#42135)
      [pytorch][ci] install nightly instead of stable libtorch for mobile CIs (#42220)
      [pytorch] include all overloads for OSS custom build
      Back out "change pt_defs.bzl to python file"
      [pytorch] check in default generated op dependency graph (#43570)
      [pytorch] deprecate static dispatch (#43564)
      [pytorch][bot] update mobile op deps (#43871)
      [pytorch][bot] update mobile op deps (#43937)
      [pytorch][bot] update mobile op deps (#44018)
      [pytorch][bot] update mobile op deps (#44100)
      [pytorch][bot] update mobile op deps (#44700)
      [pytorch][bot] update mobile op deps (#44854)
      [pytorch] clean up normalized_dynamic_type() hack (#44889)
      [pytorch] refine dispatch keys in native_functions.yaml (1/N) (#45010)
      [reland][pytorch] refine dispatch keys in native_functions.yaml (1/N) (#45137)
      [pytorch] refine dispatch keys in native_functions.yaml (2/N) (#45284)

Jianyu Huang (6):
      [caffe2] Add the dedup implementation of fused RowWiseAdagrad op on GPUs (#40282)
      [caffe2] Fix the issues when using CUB RadixSort (#41299)
      [pt] Add include_last_offset option to EmbeddingBag mean and max (#42215)
      [caffe2] Fix a performance bug in Dedup SparseAdagrad op (#42287)
      [caffe2] Fix the timeout (stuck) issues of dedup SparseAdagrad C2 kernel
      [caffe2] Extend dedup SparseAdagrad fusion with stochastic rounding FP16 (#43124)
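
For reference on #42215: with `include_last_offset=True`, `offsets` carries one extra trailing entry equal to the total number of indices (a CSR-style layout), and the PR extends that option to the `mean` and `max` pooling modes. A small sketch with made-up sizes:

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4,
                      mode="mean", include_last_offset=True)

indices = torch.tensor([1, 2, 4, 5, 4, 3])
# Three bags -- [1, 2], [4, 5], [4, 3] -- and a final offset equal to len(indices).
offsets = torch.tensor([0, 2, 4, 6])

out = bag(indices, offsets)   # shape (3, 4): one pooled row per bag
```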

Jiatong Zhou (1):
      move __range_length and __derive_index to lite interpreter (#40533)

Jiayu Liu (1):
      [nit] fix some typo within documentation (#40692)

Jimmy Yao (1):
      delete the space for the docs rendering (#44740)

Jing Ma (1):
      [Dper3] Implementation of squeezed input to DC++

Jithun Nair (3):
      Insert parentheses around kernel name argument to hipLaunchKernelGGL (#41022)
      Add bfloat16 support for nccl path (#38515)
      Fix hipify script for pytorch extensions (#43528)

Jiyan Yang (1):
      Log the net if blob doesn't exist when setting output record (#41971)

Jiyuan Qian (2):
      Add Cost Inference for AdaGrad and RowWiseSparseAdagrad
      Fix potential divide by zero for CostInferenceForRowWiseSparseAdagrad

Jongsoo Park (4):
      Back out "[NCCL] DDP communication hook: getFuture()" (#42152)
      [fbgemm] use new more general depthwise 3d conv interface (#42697)
      [caffe2] fix wrong comment (#42735)
      [caffe2] add cost inference for FusedFakeQuantFC and FusedFakeQuantFCGradient (#44840)

Jordan Fix (4):
      Add use_glow_aot, and include ONNX again as a backend for onnxifiGlow (#4787)
      [caffe2.proto] Add AOTConfig (#44020)
      Add API for onnxifi with AOT Glow ONNX (#44021)
      Add GlowLoadAOTModel flag (#45189)

Joseph Spisak (2):
      Add MSFT Owners to the Windows Maintainership (#42280)
      Update persons_of_interest.rst (#44031)

Justin Huber (1):
      torch.isreal (#41298)
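
A quick example of the `torch.isreal` op added in #41298: it returns a boolean tensor that is True wherever an element has a zero imaginary component, and is trivially all True for real dtypes.

```python
import torch

z = torch.tensor([1 + 0j, 2 + 1j, 3 + 0j])
print(torch.isreal(z))              # tensor([ True, False,  True])
print(torch.isreal(torch.ones(2)))  # tensor([True, True])
```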

Karel Ha (1):
      Fix link to PyTorch organization (from Governance) (#40984)

Kate Mormysh (1):
      Revert D21232894: Unify PyTorch mobile's threadpool usage.

Kaushik Ram Sadagopan (1):
      Enabled torch.testing._internal.jit_utils.* typechecking. (#44985)

Keigo Kawamura (2):
      Add missing type annotation for Tensor.ndim (#42909)
      Remove `itruediv` because it's already defined in torch/tensor.py (#42962)

Kenichi Maehashi (1):
      Fix return value of PyErr_WarnEx ignored (SystemError) (#44371)

Kenso Trabing (1):
      Fix typo. in error message (#39958)

Kent Gauen (1):
      lr_schedule.py redundant code (#44613)

Kevin Stephano (1):
      [WIP][JIT] Add benchmarking support of NV Fuser with FP16 dtype support (#44101)

Khalid Almufti (2):
      Replace whitelist with allowlist (#42067)
      Replaced whitelist reference with allowlist (#42071)

Kimish Patel (22):
      Add option to preserve certain methods during optimize_for_mobile. (#40629)
      Add benchmark for add op. (#40059)
      [Vec256][neon] Add neon backend for vec256 (#39341)
      Add fused add_relu op. (#39342)
      JIT pass for add relu fusion. (#39343)
      Add add_relu fusion pass to optimize_for_mobile. (#40252)
      Implicit casting resulting in internal build failure. (#41272)
      Support aarch32 neon backend for Vec256 (#41267)
      Calculate inverse of output scale first. (#41342)
      Revert D22939119: [TensorExpr] Fix a way we were creating np arrays in tests.
      Fix freeze_module pass for sharedtype (#42457)
      Fix freeze_module pass for sharedtype (#42457)
      Simple caching allocator for CPU. (#42006)
      Refactor qconv to reduce allocations. (#42007)
      Call qnnpack's conv setup only if input pointer has changed. (#42008)
      Change quantizer to account for input tensor's memory format. (#42178)
      Enable input pointer caching in XNNPACK integration. (#42840)
      Fix bug in caching allocator. (#43719)
      Fix transposed conv2d rewrite pattern to account for convolution api (#44035)
      Fix replaceAtenConvolution for BC. (#44036)
      Implement better caching allocator for segmentation usecase. (#44618)
      Move mobile specific CPUCachingAllocator to c10/mobile folder. (#45364)
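
As context for #40629 above: `optimize_for_mobile` only guarantees that `forward` survives its freezing and fusion passes, and the PR adds a way to name additional methods to keep. A hedged sketch; the module and its `preprocess` method are hypothetical:

```python
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

class Net(torch.nn.Module):
    def forward(self, x):
        return x.relu()

    @torch.jit.export
    def preprocess(self, x):   # an extra entry point we want to survive optimization
        return x * 0.5

scripted = torch.jit.script(Net())
optimized = optimize_for_mobile(scripted, preserved_methods=["preprocess"])
torch.jit.save(optimized, "net_mobile.pt")
```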

Kiran Kumar Matam (1):
      Allocating warp to an input index in compute_cuda_kernel (#43354)

Koki Nishihara (1):
      [quant] Rename from quantized... to ...quantized_cpu in the native_functions.yaml (#41071)

Ksenija Stanojevic (11):
      [ONNX] Add eliminate_unused_items pass (#38812)
      [ONNX]Fix export of full_like (#40063)
      [ONNX]Fix export of flatten (#40418)
      [ONNX]Add tests for ConvTranspose 1D and 3D (#40703)
      [ONNX] Add pass that fuses Conv and BatchNormalization (#40547)
      [Resending] [ONNX] Add eliminate_unused_items pass (#42743)
      [ONNX] Floordiv (#43022)
      [ONNX] Update slice symbolic function (#42935)
      [ONNX] Move tests to test_pytorch_onnx_onnxruntime (#42684)
      [ONNX] Update len symbolic (#43824)
      [ONNX] add jit pass for lists (#43820)
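
Most of the ONNX entries above change what `torch.onnx.export` can trace and emit (slicing, `len`, list handling, Conv+BatchNorm fusion, and so on). A generic export sketch for orientation; the model, shapes, and opset below are placeholders rather than values taken from these PRs:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).eval()
dummy_input = torch.randn(1, 3, 32, 32)

torch.onnx.export(
    model, dummy_input, "model.onnx",
    opset_version=11,
    input_names=["input"], output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}},  # keep the batch dimension symbolic
)
```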

Kurt Mohler (17):
      Change BCELoss size mismatch warning into an error (#41426)
      Add non-deterministic alert to CUDA operations that use `atomicAdd()` (#40056)
      Reland Add non-deterministic alert to CUDA operations that use `atomicAdd()` (#41538)
      Improve `torch.norm` functionality, errors, and tests (#41956)
      Throw error if `torch.set_deterministic(True)` is called with nondeterministic CuBLAS config (#41377)
      Create CuBLAS PointerModeGuard (#42639)
      Raise error if `at::native::embedding` is given 0-D weight (#42550)
      Fix orgqr input size conditions (#42825)
      Fix manual seed to unpack unsigned long (#42206)
      Fix coding style and safety issues in CuBLAS nondeterministic unit test (#42627)
      Add `torch.linalg.norm` (#42749)
      Update determinism documentation (#41692)
      Add support for integer dim arg in `torch.linalg.norm` (#43907)
      Deprecate torch.norm and torch.functional.norm (#44321)
      Add note comments to enforce nondeterministic alert documentation (#44140)
      Make nuclear and frobenius norm non-out depend on out variants (#44095)
      Clarify that 5-D 'bilinear' grid_sample is actually trilinear (#45090)
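
To situate #42749 and #43907 above: `torch.linalg.norm` is the NumPy-style counterpart to the older `torch.norm` (which #44321 deprecates), and #43907 allows `dim` to be a plain integer rather than only a tuple. A short sketch:

```python
import torch

A = torch.randn(4, 5)

torch.linalg.norm(A)                           # Frobenius norm of the matrix
torch.linalg.norm(A, ord="nuc")                # nuclear norm
torch.linalg.norm(A, ord=2, dim=1)             # per-row vector 2-norms (integer dim)
torch.linalg.norm(A, ord=float("inf"), dim=0)  # per-column infinity norms
```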

Kyle Chen (3):
      added rocm 3.7 docker image (#43576)
      Deleted docker images for rocm 3.3 and rocm 3.5 (#44672)
      added rocm 3.8 docker image (#45205)

Kyle Johnson (1):
      Add operators for LiteLMLSTM to Lite Interpreter (#41270)

Leon Gao (1):
      simplify profile text output by displaying only top-level ops statistics (#42262)

Lillian Johnson (3):
      Error printing extension support for multiline errors (#43807)
      Adjust level of verbosity of debug dumps in graph executor T74227880 (#43682)
      [JIT] Support partially specified sizes/strides in IRParser (#44113)

Lin.Sung (1):
      Change typo 'momemtum' to 'momentum' (#45045)

Linbin Yu (14):
      add eq.str, ne.str, and add.str ops (#40958)
      add "aten::add.str" op and remove two duplicated ops
      clean up duplicated op names (#41092)
      add null check for c2 tensor conversion (#41096)
      add check for duplicated op registration in JIT (#41214)
      Revert D22467871: add check for duplicated op registration in JIT
      [PT] add overload name for int prim ops (#41578)
      [PT] add check for duplicated op names in JIT (#41549)
      Revert D22533824: [PT] add check for duplicated op names in JIT
      [PT] enforce duplicate op name check on mobile
      change pt_defs.bzl to python file (#42725)
      Improve save_for_mobile cxx binary (#43721)
      update build flags for benchmark binaries
      log metadata when model loading failed (#44430)

Lingyi Liu (7):
      Perf improvement of Conv2d and Conv3d (#40324)
      Disable the mkldnn for conv2d in some special cases (#40610)
      Add a new op for converting the dense feature to sparse representation
      Add the sls tensor train op (#33525)
      Back out "Revert D19987020: [pytorch][PR] Add the sls tensor train op" (#43938)
      Optimize Scale function (#44913)
      [hpc]optimize the torch.cat cuda kernel (#44833)

Linyuan Gong (1):
      Allow np.memmap objects (numpy arrays based on files) to be processed… (#39847)

Liu (1):
      Fix module dict key ordering (#40905)

Louis Feng (4):
      DPP Async Tracing (#44252)
      Refactor CallbackManager as a nested class of RecordFunction. (#44645)
      Back out "Revert D23323486: DPP Async Tracing" plus windows build fix. (#44702)
      Back out "Revert D23494065: Refactor CallbackManager as a friend class of RecordFunction." (#44699)

Lu Fang (2):
      Rename capacity to nbytes in ShareExternalPointer to avoid confusion in future (#41461)
      [torch.fx] Add support for custom op (#43248)

Luca Wehrstedt (28):
      Update TensorPipe submodule (#40614)
      [RPC tests] Fix @_skip_if_tensorpipe always skipping for all agents (#40860)
      [RPC tests] Remove world_size and init_method from TensorPipe fixture (#40814)
      [RPC tests] Align ddp_under_dist_autograd test with others (#40815)
      [RPC tests] Fix file descriptor leak (#40913)
      [RPC docs] Remove mention of TensorPipe's SHM and CMA backends as they're not built (#41200)
      Fix torch.cuda.check_error type errors (#41330)
      [RPC tests] Fix test_init_(rpc|pg)_then_(rpc|pg) not shutting down RPC (#41558)
      Update TensorPipe submodule (#42225)
      [RPC tests] Merge TensorPipe tests into single entry point (#40816)
      [RPC tests] Merge tests for faulty agent into single script (#40817)
      [RPC tests] Merge process group tests into single entry point (#40818)
      [RPC tests] Avoid decorators to skip tests (#40819)
      [RPC tests] Make generic fixture an abstract base class (#40820)
      [RPC tests] Move some functions to methods of fixture (#40821)
      [RPC tests] Remove global TEST_CONFIG (#40822)
      [RPC tests] Enroll TensorPipe in missing test suites (#40823)
      [RPC tests] Generate test classes automatically (#42527)
      [RPC tests] Run DdpUnderDistAutogradTest and DdpComparisonTest with fork too (#42528)
      Don't reference TensorPipe headers in our headers (#42521)
      Update TensorPipe submodule (#42522)
      Fix TensorPipe submodule (#42789)
      Remove Python dependency from TensorPipe RPC agent (#42678)
      Enroll TensorPipe agent in C++-only E2E test (#42680)
      Guard TensorPipe agent by USE_TENSORPIPE (#42682)
      Revert D23803951: [pytorch] refine dispatch keys in native_functions.yaml (1/N)
      [RPC] Infer backend type if only options are given (#45065)
      Update TensorPipe submodule (#45433)
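
On #45065 in the block above: `init_rpc` can now infer the backend from the type of the options object, so passing only `rpc_backend_options` selects the TensorPipe agent without naming it explicitly. A hedged single-process sketch; the worker name, port, and thread count are arbitrary choices, not values from the PR:

```python
import torch
import torch.distributed.rpc as rpc

opts = rpc.TensorPipeRpcBackendOptions(
    num_worker_threads=8,
    init_method="tcp://localhost:29500",  # arbitrary local rendezvous address
)

# No explicit `backend=` argument: TensorPipe is inferred from the options type.
rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=opts)
print(rpc.rpc_sync("worker0", torch.add, args=(torch.ones(2), 3)))
rpc.shutdown()
```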

Lucas Hosseini (2):
      Extract rpc/tensorpipe_utils.{cpp,h} from rpc/utils.{cpp,h} (#44803)
      Make Ch…

Labels: module: bc-breaking, open source, triaged