[ROCm] Use MIOpen for transpose convolutions #26172

Closed

Conversation

iotamudelta
Contributor

Provides significant performance uplift where used.

Contributor

@bddppq left a comment


wow nice

any numbers on typical perf improvements?

@bddppq
Contributor

bddppq commented Sep 13, 2019

cc @xw285cornell

@iotamudelta
Contributor Author

@bddppq we've observed a 2.5x speedup

Contributor

@facebook-github-bot left a comment


@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 15, 2019
Summary:
Provides significant performance uplift where used.
Pull Request resolved: pytorch/pytorch#26172

Differential Revision: D17374862

Pulled By: bddppq

fbshipit-source-id: 85d2df3c67b8935bc54f3a81a912a25c0102743a
@facebook-github-bot
Contributor

@bddppq merged this pull request in e86d99a.

@iotamudelta iotamudelta deleted the transpose_convolutions_miopen branch September 16, 2019 15:15
rohithkrn added a commit to ROCm/pytorch that referenced this pull request Sep 21, 2019
* C++ Average Pool Module (#25800)

Summary:
This PR adds Average Pool module to C++ front-end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25800

Differential Revision: D17318094

Pulled By: yf225

fbshipit-source-id: c914c0e802bbe5f1d1f0a21a669c28bc956899db

* Better error messages in C2 ONNX backend (#25809)

Summary:
Just a tiny fix to make debugging easier (output errors to stderr and include in the exception message)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25809

Reviewed By: zrphercule

Differential Revision: D17329957

Pulled By: houseroad

fbshipit-source-id: 0d73dd9f62c735fbc5096e6a7c0e5f58e4cd90ae

* Add new API for Fully Connected and Convolution Operators in QNNPACK (#25862)

Summary:
This change adds a new prepack and run function for FC and Convolution operators in QNNPACK.
The new functions added are `PackBMatrix`, `qnnpackLinear`, `PrePackConvWeights` and `qnnpackConv`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25862

Test Plan:
QNNPACK unit tests
fully-connected-test
convolution-test

Differential Revision: D17299260

Pulled By: supriyar

fbshipit-source-id: fdc4e2d5f1232675acd153f3efb9d17ed8628a54

* Enable more mGPU tests (#26055)

Summary:
Enable mGPU tests that pass on ROCm as of 2.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26055

Differential Revision: D17331484

Pulled By: bddppq

fbshipit-source-id: 51f956a84a6c14a1a41473d322950994fa29c25c

* remove verbose in pytorch_ci hypothesis profile (#26075)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26075

att, remove the verbose argument to reduce noise in the logs

Test Plan:
ci

Imported from OSS

Differential Revision: D17335935

fbshipit-source-id: 2e4289e838bf4489dcad8d5533353eebcff0d481

* TorchScript Serialization for dynamic LSTM module

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25877

Test Plan: Imported from OSS

Reviewed By: jianyuh

Differential Revision: D17275746

Pulled By: jamesr66a

fbshipit-source-id: db2f38ddd99f02ccb4fb754fa1c1e6cad4425fa8

* Upgrade the naming for fbgemm quantized op (#26064)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26064

Just changing the names after https://github.com/pytorch/pytorch/pull/25678.
ghstack-source-id: 89944542

Test Plan: CI

Differential Revision: D17332068

fbshipit-source-id: 5e9febed7a2fcd10d44273e55643b277d33a3ad7

* Use BytesIO instead of tempfile (#25976)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25976

As recommended in https://github.com/pytorch/pytorch/pull/25877/files#r322956051:

> We should move more of these toward using BytesIO. Using files in tests is generally considered bad practice because it introduces syscalls and dependencies on the execution environment, and thus can cause test flakiness/instability.
ghstack-source-id: 89929947

Test Plan: CI

Differential Revision: D17310441

fbshipit-source-id: ba97cce4224225df45ff44062f1bc8ebefb25922
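
For illustration, a minimal sketch of the BytesIO pattern recommended above, using standard `io` and `torch.save`/`torch.load` (the tensor and dict here are made up for the example):

```python
import io
import torch

# Serialize to an in-memory buffer instead of a temporary file.
state = {'weight': torch.randn(3, 3)}

buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)  # rewind before reading the buffer back
loaded = torch.load(buffer)
assert torch.equal(state['weight'], loaded['weight'])
```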

* Revert "TorchScript Serialization for dynamic LSTM module" (#26079)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26079

This reverts commit e3039612d851d0fbd337546c8debc27ec7cfc4e4.

Test Plan: Imported from OSS

Differential Revision: D17337585

Pulled By: jamesr66a

fbshipit-source-id: 4b93a4c5ca2fe491d609da889a42d22be8e52889

* Add Runtime flag for quantized backend. (#25680)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680

Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.

The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643

Test Plan: Verified torch.backends.quantized.engine works

Differential Revision: D17198233

fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
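
A minimal sketch of how the runtime flag described above is meant to be used; the accepted values are an assumption here (the commit mentions `torch.fbgemm`/`torch.qnnpack`, while released versions take the strings `'fbgemm'`/`'qnnpack'`):

```python
import torch

# Pick the quantized kernel backend at runtime when PyTorch is compiled
# with both FBGEMM and QNNPACK support.
torch.backends.quantized.engine = 'qnnpack'   # or 'fbgemm'
print(torch.backends.quantized.engine)        # confirm the active backend
```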

* Dynamic registration of RPC backends (#25734)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25734

[pytorch] Dynamic registration of RPC backends
Allow non-process-group RPC backends to be plugged in.
ghstack-source-id: 89938296

Differential Revision: D17183789

fbshipit-source-id: 885fed12d80b82b60f9a125f78302a161e708089

* Make regular softmax warp size aware (#25956)

Summary:
Enable one unit test that passes now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25956

Differential Revision: D17298150

Pulled By: bddppq

fbshipit-source-id: 8763e71ad7ef80be915fe93a3471b29f27f3f0a4

* Move NamedTensorMetaInterface definitions to TensorImpl.h (#26030)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26030

Test Plan:
- [namedtensor ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26030

Differential Revision: D17322383

Pulled By: zou3519

fbshipit-source-id: d5b914d646b48a6f4e0104aceb435e694b72bd96

* Experimental warning for named tensors (#26050)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26050

Throws a warning once when someone attempts to attach names to a tensor.
This is guaranteed to happen at the callsite `set_named_tensor_meta`.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D17331634

Pulled By: zou3519

fbshipit-source-id: 44f5e5c95acd9c7ba543c1210a3b1314aab348f0

* print source code when a function is executed (#25868)

Summary:
While this isn't ideal, as it might print out the same source every time a function is run, it's still easier to go and tweak Python code to reduce loop counts than to insert `std::cout` and recompile C++ code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25868

Differential Revision: D17318386

Pulled By: Krovatkin

fbshipit-source-id: 928ba6543204042924ab41a724635594709630de

* Disable test_cuda.test_stream_event_nogil on ROCm (#26087)

Summary:
Was recently enabled in https://github.com/pytorch/pytorch/pull/26055, it's flaky on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37575
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37577
```
05:39:35 test_stream_event_nogil (__main__.TestCuda) ... Exception in thread Thread-3:
05:39:40 Traceback (most recent call last):
05:39:40   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
05:39:40     self.run()
05:39:40   File "/usr/lib/python2.7/threading.py", line 754, in run
05:39:40     self.__target(*self.__args, **self.__kwargs)
05:39:40   File "test_cuda.py", line 1894, in _test_stream_event_nogil
05:39:40     c2p.put(sync_func(self, TestCuda.FIFTY_MIL_CYCLES))
05:39:40   File "test_cuda.py", line 1882, in _event_wait
05:39:40     self.assertTrue(s1.query())
05:39:40   File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
05:39:40     raise self.failureException(msg)
05:39:40 AssertionError: False is not true
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26087

Differential Revision: D17340891

Pulled By: bddppq

fbshipit-source-id: b2b70beb1b068db53197a5f9f6a80cb046e66ebd

* TorchScript Serialization for dynamic LSTM

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26084

Test Plan: Imported from OSS

Differential Revision: D17339315

Pulled By: jamesr66a

fbshipit-source-id: 03a2674edcf779becfe3b8ec96f1bae23c74b11c

* Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (#25959)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25959

Previous import was 28ca699b69b5a31892619defca2391044a9a6052

Included changes:
- **[7988d836](https://github.com/onnx/onnx/commit/7988d836)**: Supporting negative axes for all existing onnx ops (#2281) <Negin Raoof>
- **[5ca0a09e](https://github.com/onnx/onnx/commit/5ca0a09e)**: Update managingexperimentalops.md (#1981) <Joseph Spisak>
- **[bc0495c1](https://github.com/onnx/onnx/commit/bc0495c1)**: Fix link to community docs in readme (#2261) <Prasanth Pulavarthi>
- **[2fdb3ef6](https://github.com/onnx/onnx/commit/2fdb3ef6)**: move map and sequence types to onnx domain, (#2244) <Ke Zhang>
- **[568b65aa](https://github.com/onnx/onnx/commit/568b65aa)**: Improve compatiblity with proto3 and enable reading attributes (#2288) <Dmitri Smirnov>
- **[1f350f2c](https://github.com/onnx/onnx/commit/1f350f2c)**: Remove type info for loop variadic input in Loop op used to compose the Range op (#2287) <Hariharan Seshadri>
- **[eb139446](https://github.com/onnx/onnx/commit/eb139446)**: Add Foundation WG to working-groups.md (#2276) <Ryan Loney>
- **[4eabc4b3](https://github.com/onnx/onnx/commit/4eabc4b3)**: Fix testdata model for CumSum. Add exclusive attribute. (#2271) <jignparm>
- **[1a62afdb](https://github.com/onnx/onnx/commit/1a62afdb)**: Support GatherND operator in ONNX (#2106) <Hariharan Seshadri>
- **[0e330e9d](https://github.com/onnx/onnx/commit/0e330e9d)**: Support ScatterND operator in ONNX (#2220) <Bowen Bao>
- **[733f7a6a](https://github.com/onnx/onnx/commit/733f7a6a)**: Add Det to ONNX (#2233) <Bowen Bao>
- **[52187738](https://github.com/onnx/onnx/commit/52187738)**: Update the description of nearest_mode of resize op (#2257) <daquexian>
- **[64b4b686](https://github.com/onnx/onnx/commit/64b4b686)**: Adding sparse tensor to ONNX (#2019) <G. Ramalingam>
- **[c8a8b7cc](https://github.com/onnx/onnx/commit/c8a8b7cc)**: Support Range operator in ONNX (#2242) <Hariharan Seshadri>
- **[44b0d6d5](https://github.com/onnx/onnx/commit/44b0d6d5)**: Update resize op (#2057) <daquexian>
- **[7d907964](https://github.com/onnx/onnx/commit/7d907964)**: Add function to fuse dynamic quantization graph into 1 node (#2187) <Ashwini Khade>
- **[36f8e6d9](https://github.com/onnx/onnx/commit/36f8e6d9)**: Update logo_request.md (#2231) <Prasanth Pulavarthi>
- **[4eb737c8](https://github.com/onnx/onnx/commit/4eb737c8)**: Update Clip in opset 11 to support min/max as inputs instead of attributes (#2096) <Bowen Bao>
- **[a25e1388](https://github.com/onnx/onnx/commit/a25e1388)**: Fix segfault in tile shape inference (#2221) <daquexian>
- **[2dc273c7](https://github.com/onnx/onnx/commit/2dc273c7)**: update onehot shape inference to reflect the spec for depth input (#2224) <Ashwini Khade>
- **[665211c1](https://github.com/onnx/onnx/commit/665211c1)**: Add GatherElements Op and Rename ScatterElements (#2143) <Lara Haidar>
- **[3ba2e31a](https://github.com/onnx/onnx/commit/3ba2e31a)**: Unique (#2141) <liqunfu>
- **[5a5588ad](https://github.com/onnx/onnx/commit/5a5588ad)**: Clarify dimension variable scoping (#2211) <G. Ramalingam>
- **[fabe39d5](https://github.com/onnx/onnx/commit/fabe39d5)**: Liqun/topk sort (#2126) <liqunfu>
- **[453aa644](https://github.com/onnx/onnx/commit/453aa644)**: Update document for NMS (#2193) <Hector Li>
- **[34e28ec2](https://github.com/onnx/onnx/commit/34e28ec2)**: Handle negative 'axis' value in Split type and shape inferencing (#2177) <Scott McKay>
- **[28ec4583](https://github.com/onnx/onnx/commit/28ec4583)**: depth to space shuffle order (#2163) <Negin Raoof>
- **[98f72629](https://github.com/onnx/onnx/commit/98f72629)**: minor updates to fix links in readme (#2189) <Prasanth Pulavarthi>
- **[321d1467](https://github.com/onnx/onnx/commit/321d1467)**: Add check to disallow squeezing input axes which are not 1 (#2204) <Ashwini Khade>
- **[573f0dc9](https://github.com/onnx/onnx/commit/573f0dc9)**: fix a bug in fun shape inference (#2188) <Tang, Cheng>
- **[36dc7110](https://github.com/onnx/onnx/commit/36dc7110)**: Clarify ambiguity in gather spec regarding indices expectation (#2202) <Ashwini Khade>
- **[a2449673](https://github.com/onnx/onnx/commit/a2449673)**: Fix some minor issues in IR.md and Versioning.md (#2108) <edgchen1>
- **[349aff69](https://github.com/onnx/onnx/commit/349aff69)**: Skip install typing package for python >=3.5 (#2199) <bddppq>

Test Plan: ci

Reviewed By: bddppq, benoitsteiner

Differential Revision: D17296390

fbshipit-source-id: 9f9f5ce85d9694128008d756c2ea393bd4e0cb71

* Skip test_triangular_solve_batched (#26108)

Summary:
cc: gchanan zou3519

I will look into why this is failing spuriously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26108

Differential Revision: D17348399

Pulled By: zou3519

fbshipit-source-id: aed4ccfc3f106692d4e32acc029740309570b0c3

* Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (#26080)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26080

Will be used in c2 ctr_mbl_feed model to PyTorch conversion

Test Plan: Unit test

Reviewed By: yinghai

Differential Revision: D17337604

fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753

* make sure all out stringstreams start out empty in jit_log.hpp

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25863

Differential Revision: D17347386

Pulled By: Krovatkin

fbshipit-source-id: a42cf56680a27bc3e50fd945ab372a409225b875

* tracing with an opt-in by file name (#25895)

Summary:
This basically works as a simple filter, as you suggested ZolotukhinM

`export PYTORCH_JIT_LOG_LEVEL=guard_elimination` will print all `GRAPH_DUMP` and `GRAPH_UPDATE` statements.
`export PYTORCH_JIT_LOG_LEVEL=>guard_elimination:>alias_analysis` will print all `GRAPH_DUMP`, `GRAPH_UPDATE` **and** `GRAPH_DEBUG` statements in `guard_elimination.cpp` **and** in `alias_analysis.cpp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25895

Differential Revision: D17309090

Pulled By: Krovatkin

fbshipit-source-id: 8fa9e67cc9af566b084d66cc15223633fda08444

* Stop re-ordering TH(C)Blas arguments. (#25606)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25606

This just complicates the codegen for no benefit.

Test Plan: Imported from OSS

Differential Revision: D17172498

Pulled By: gchanan

fbshipit-source-id: d2f50e45400ac0336792422518e03dbae3a1bedc

* Kill TH(C)Blas kwarg_only declarations. (#25607)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25607

Since we don't generate these as end-user bindings, and we no longer reorder based on this property, we can just get rid of the property.

Test Plan: Imported from OSS

Differential Revision: D17172500

Pulled By: gchanan

fbshipit-source-id: f84fd8bb2b13598501897f56871b21339585d844

* simplify build_android_gradle.sh (#25897)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897

It doesn't hurt to set all variables unconditionally.
Also, we can create a link to the lib directory instead of to specific files - this
way it's easier to switch between dynamic/static library names.

Test Plan:
- check android gradle CI;
- use stack diff to check all 4 architectures on PR;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897

Differential Revision: D17307240

Pulled By: ljk53

fbshipit-source-id: c975085ddda852ef7da1c29935c2f6a28d797e5a

* change gradle build to use static libtorch + gc-sections (#25984)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25984

Link static libtorch libraries into pytorch.so (API library for android)
with "-Wl,--gc-sections" flag to remove unused symbols in libtorch.

Test Plan:
- full gradle CI with stacked PR;
- will check final artifacts.tgz size change;

Differential Revision: D17312859

Pulled By: ljk53

fbshipit-source-id: 99584d15922867a7b3c3d661ba238a6f99f43db5

* remove "build_deps" arg from setup.py command in (#26113)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26113

After https://github.com/pytorch/pytorch/pull/16914, passing in an
argument such as "build_deps" (i.e. python setup.py build_deps develop) no
longer works, since it gets picked up as an invalid argument.
ghstack-source-id: 90003508

Test Plan:
Before, this script would execute "python setup.py build_deps
develop", which errored. Now it executes "python setup.py develop" without an
error. Verified by successfully running the script on devgpu. In setup.py,
there is already a `RUN_BUILD_DEPS = True` flag.

Differential Revision: D17350359

fbshipit-source-id: 91278c3e9d9f7c7ed8dea62380f18ba5887ab081

* Stop reordering TH random function arguments.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25608

Test Plan: Imported from OSS

Differential Revision: D17172494

Pulled By: gchanan

fbshipit-source-id: 5a46889cc040297231e2473ae5b2879b39f8d60a

* fix base_lr overridden in cyclic lr (#26105)

Summary:
base_lr parameter was being overridden by super `__init__`, see https://github.com/pytorch/pytorch/issues/21965.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26105

Reviewed By: yf225

Differential Revision: D17346724

Pulled By: vincentqb

fbshipit-source-id: 4b146bd64f4f385c0a9c4f4df8eb8991312fb15c
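
For context, a minimal sketch of the CyclicLR usage affected by the override (the model, optimizer, and hyperparameters are illustrative, not taken from the original issue):

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.01, step_size_up=10)

for _ in range(5):
    optimizer.step()
    scheduler.step()
    # With the fix, the cycle starts from base_lr instead of a value
    # clobbered by the parent class __init__.
    print(optimizer.param_groups[0]['lr'])
```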

* Skip inserting duplicate observers (#25504)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25504

Skip inserting duplicate observers for values observed
in forward method of a child module or other methods in
the current module.

Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'
python test/test_jit.py -- 'TestJit.insert_observers_child_qconfig'
python test/test_jit.py -- 'TestJit.insert_observers_skip_values'

Imported from OSS

Differential Revision: D17208888

fbshipit-source-id: e04f1c22ab1c4f410933a17a3ef31acf5f217323

* Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (#25970)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25970

ConstantThenLinearWarmupLRPolicy:
* first use a constant warm up
* then ramp up to the fixed learning rate linearly

CompositeCyclicalLRPolicy:
* first use a constant warm up
* then ramp up to the fixed learning rate linearly
* then use cyclical learning rates for the rest of time

Pull Request resolved: https://our.intern.facebook.com/intern/opensource/shipit/preview/D17302632/

Test Plan:
* buck test
 * https://our.intern.facebook.com/intern/testinfra/testconsole/testrun/5910974518377039/
 * https://our.intern.facebook.com/intern/testinfra/testrun/1407375027118303
* checked the consistency of learning rates w.r.t. iterations with offline simulations n143987

Reviewed By: swatirallapalli

Differential Revision: D17302632

fbshipit-source-id: 1098d4dd9109a48932b76e36d78239e49f8077a1

* Fix build warning in vec256_qint.h

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26121

Test Plan: Imported from OSS

Differential Revision: D17351960

Pulled By: jamesr66a

fbshipit-source-id: 12389729fe5fb8d863cf47288920ea375a3e74ab

* Kill kwarg_only declarations in Declarations.cwrap. (#25609)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25609

They don't do anything anymore.

Test Plan: Imported from OSS

Differential Revision: D17172497

Pulled By: gchanan

fbshipit-source-id: 5cf7fdcf7d2da0054ac1bd7d8d2b70a2264b8c93

* Support quantizing any methods called (#25505)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25505

Support for quantizing all the methods called by forward method, including
child module methods and other methods in the current module

It relies on module-level constant prop; we need to figure out a way to do constant prop
for these methods as well. We can either do constant prop at the module level or in the
quantization function, but this will need some discussion.

Test Plan:
python test/test_jit.py 'TestJit.insert_quant_dequant'
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17208887

fbshipit-source-id: 21749457b21b00a6edada290c26324e2fb210b10

* C++ unregister_module function for Module (#26088)

Summary:
This PR adds ```unregister_module``` to ```nn::Module``` and an ```erase``` function to ```OrderedDict```.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26088

Differential Revision: D17360058

Pulled By: yf225

fbshipit-source-id: f1f375b4751317da85b8da1458e092fe2405ceec

* Port fuse_linear from pytorch/tvm (#25623)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25623

Port over the fuse_linear pass from the pytorch/tvm project; we'll need this
in the backend-specific quantization pass to match aten::linear and swap
it with quantized linear.

Test Plan:
python test/test_jit.py 'TestJit.test_fuse_linear'

Imported from OSS

Differential Revision: D17208890

fbshipit-source-id: f4ff3889ae4525797d3b986f46ae37e50ea49116

* Add device check before accessing data_ptr in PackLayer (#26056)

Summary:
fixes https://github.com/pytorch/xla/issues/927
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26056

Differential Revision: D17331859

Pulled By: ailzhang

fbshipit-source-id: bdc334f03c8dcbb4ef4f5e059a63ef188a0b8b61

* Create TensorBoard test classes in all cases (#26005)

Summary:
To give better signal to the user, we will now always create the TensorBoard test classes and just disable the tests if TensorBoard is not installed.

cc lanpa sanekmelnikov natalialunova pietern
[test macos]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26005

Reviewed By: sanekmelnikov

Differential Revision: D17352430

Pulled By: orionr

fbshipit-source-id: 87a592064f4768ffded76a3d666a8e508a1ef164
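
A minimal sketch of the skip-instead-of-omit pattern described above (the class and test names are hypothetical, not the actual test_tensorboard.py code):

```python
import tempfile
import unittest

try:
    from torch.utils.tensorboard import SummaryWriter
    HAS_TENSORBOARD = True
except ImportError:
    HAS_TENSORBOARD = False

class TestTensorBoardExample(unittest.TestCase):
    # The class is always created, so a missing dependency shows up as a
    # skipped test in reports instead of the tests silently disappearing.
    @unittest.skipIf(not HAS_TENSORBOARD, "TensorBoard not installed")
    def test_add_scalar(self):
        with tempfile.TemporaryDirectory() as log_dir:
            with SummaryWriter(log_dir=log_dir) as writer:
                writer.add_scalar('loss', 0.5, global_step=0)

if __name__ == '__main__':
    unittest.main()
```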

* Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (#26137)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26137

Previous import was 7988d8360b11e6003560076e9b1d4aa426db3244

Included changes:
- **[95252c2a](https://github.com/onnx/onnx/commit/95252c2a)**: Fix shapeinference function (#2296) <jignparm>
- **[414285bb](https://github.com/onnx/onnx/commit/414285bb)**: fix the buffer overflow problem in shape inference logic of Squeeze op <Lu Fang>
- **[797cdd0f](https://github.com/onnx/onnx/commit/797cdd0f)**: Support for negative indices in 'Gather', 'GatherElements', 'ScatterElements', 'OneHot' (#2260) <Negin Raoof>
- **[7636978d](https://github.com/onnx/onnx/commit/7636978d)**: Fix collect_snippets warnings (#2277) <Lutz Roeder>
- **[fa70c33b](https://github.com/onnx/onnx/commit/fa70c33b)**: Update printable_graph in helper.py to output details of initializers that do not have matching graph inputs. (#2135) <Scott McKay>
- **[428d09b0](https://github.com/onnx/onnx/commit/428d09b0)**: test int64 input type for 'where' op (#2253) <Negin Raoof>

Test Plan: ci

Reviewed By: bddppq

Differential Revision: D17353795

fbshipit-source-id: 6d4f39754863a30f427f4512c7b228e45d3ce84f

* Add fusion for quantized linear (#25624)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25624

First fuse the split ops into aten::linear and then fuse
`dequant - aten::linear - quant` into the quantized linear op

Test Plan:
python test/test_jit.py 'TestJit.quant_fusion'

Imported from OSS

Differential Revision: D17208891

fbshipit-source-id: 864b19fabab2e8e6f8f8ad35eb3dbbf2d5fdb8c4

* Implement tensor.refine_names (#25842)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25842

`tensor.refine_names(*names)` takes `tensor` and attempts to name its
dimensions `names` out-of-place. If a dimension `i` already had a name,
then it cannot be changed (so tensor.names[i] must equal names[i]);
if the original dimension did not have a name, then the new name
(names[i]) can be anything.

`tensor.refine_names(*names)` also accepts a glob '*' that greedily selects
names from `tensor`. Here are some examples:

- `Tensor[None].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('D') -> Error!`
- `Tensor[N].refine_names(None) -> Error!`
- `Tensor[None, None].refine_names('*', D) -> Tensor[None, D]`

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17255548

Pulled By: zou3519

fbshipit-source-id: fdbdb3a12f24fbe37ce1e53ed09dc8a42589d928
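
A minimal sketch of `refine_names` as described above; named tensors were experimental at the time, so the exact syntax (e.g. the glob, which ended up as `...` rather than `*`) may differ between releases:

```python
import torch

x = torch.randn(2, 3)             # names are (None, None)
y = x.refine_names('N', 'C')      # unnamed dims may take any name
print(y.names)                    # ('N', 'C')

y.refine_names('N', 'C')          # existing names must match -> OK
try:
    y.refine_names('N', 'D')      # 'C' cannot be renamed to 'D'
except RuntimeError as err:
    print('error:', err)
```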

* Implement tensor.align_as(other), change tensor.align_to(names) (#25843)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25843

`tensor.align_to(*names)` permutes the dimensions of `tensor` and adds
additional 1-sized dimensions such that the output tensor has dimensions
in the same order as `names`. All dimensions of `tensor` must be
present in `names`; in addition, this function requires that all dims of
`tensor` be named.

`tensor.align_as(other)` is equivalent to
`tensor.align_to(*other.names)`.

I'm planning on changing `torch.align_tensors(*tensors)` to align closer
to these semantics because there didn't seem to be a clear use case for the old
semantics that preserve unnamed dimensions. That will come in a future
change.

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17255549

Pulled By: zou3519

fbshipit-source-id: 1e437ad81e9359b4d5bd0e7e64c3a1be441fc3e3
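
And a minimal sketch of `align_to`/`align_as`, again assuming the experimental named tensor API:

```python
import torch

x = torch.randn(2, 3).refine_names('N', 'C')

# Permute and insert size-1 dims so the output has dims in the given order.
y = x.align_to('N', 'C', 'H', 'W')
print(y.names, y.shape)           # ('N', 'C', 'H', 'W') torch.Size([2, 3, 1, 1])

# align_as(other) is shorthand for align_to(*other.names).
template = torch.randn(2, 3, 4, 5).refine_names('N', 'C', 'H', 'W')
z = x.align_as(template)
print(z.names, z.shape)
```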

* C++ API parity: at::Tensor::data

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26008

Test Plan: Imported from OSS

Differential Revision: D17343488

Pulled By: pbelevich

fbshipit-source-id: b9ba5e26cad621a428a14292446d7fb5a6e5535d

* Fix bug with named tensors and (no) tracer support (#26106)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26106

Previously, in the named tensors build, an operator is marked as
non-traceable if ANY of its overloads are named tensor overloads. This
breaks the tracer for things like torch.full (has a names= overload for
named tensor) and tensor.sum (has a Dimname overload for named tensor).

This PR fixes the problem by putting the "no tracer support" logic into
the location where the tracer attempts to construct a graph by adding a
Dimname/DimnameList argument to a node.

Test Plan:
- new test in test_jit.py to check if torch.full is traceable
- new test in test_namedtensor.py to check what happens when someone
tries to trace a function that uses named tensor APIs.
- [namedtensor ci]

Differential Revision: D17353452

Pulled By: zou3519

fbshipit-source-id: b0b843c8357ffe54baee6e8df86db914f0b1ece4

* Add data field to Tensor pyi. (#26093)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26093

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: vsiles

Differential Revision: D17366320

Pulled By: ezyang

fbshipit-source-id: 025f1c3d75d294fc1b51ddc540e542a05dc72b6a

* Change schedulers to chainable form (#24352)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Raising a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)), referring users to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter as it is the only one with a mechanic relying on fractional epoch
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.

# #20527

### Before

The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step only when the epoch number changes:
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

Test Plan: Imported from OSS

Differential Revision: D17349760

Pulled By: vincentqb

fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f

* Run PyTorch macOS CPU-only build/test on all PRs

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26096

Test Plan: Imported from OSS

Differential Revision: D17366419

Pulled By: pietern

fbshipit-source-id: 138659dae346aad3cde52d488cd1780614e7692f

* Use CircleCI commands for brew update/install (#26159)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26159

The snippets for working with Homebrew were duplicated across binary
builds, macOS builds, and iOS builds. In #25336, the CircleCI
configuration version was updated to version 2.1, which supports
parameterized commands. This means we no longer have to use YAML
tricks to duplicate stanzas and instead can natively define a series
of reusable steps.

Motivation for doing this is that the macOS binary builds were still
using the slow `brew update` instead of `git fetch` (see #25988).

[test macos]
[test wheel]

Test Plan: Imported from OSS

Differential Revision: D17366538

Pulled By: pietern

fbshipit-source-id: 194c0f37c1dc999705f3ba97fdabf4ff18728d93

* Turn should_run_job into command

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26160

Test Plan: Imported from OSS

Differential Revision: D17366539

Pulled By: pietern

fbshipit-source-id: a870d6da21925764986c6c748ad291440b78e6fd

* Turn setup_linux_system_environment into command

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26162

Test Plan: Imported from OSS

Differential Revision: D17366537

Pulled By: pietern

fbshipit-source-id: 98413daa344812f06578c3373d8516292d2f21f5

* Turn setup_ci_environment into command

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26163

Test Plan: Imported from OSS

Differential Revision: D17366536

Pulled By: pietern

fbshipit-source-id: 07181a77aaeba5457aa716ceac9cc404aacefe5f

* Kill most defaults in Declarations.cwrap. (#25610)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25610

They don't do anything anymore, since this isn't the end-user interface.

Test Plan: Imported from OSS

Differential Revision: D17172495

Pulled By: gchanan

fbshipit-source-id: a380d970f0836ed85eb9ac2aa42eb73655d775aa

* Get rid of more defaults in Declarations.cwrap.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25611

Test Plan: Imported from OSS

Differential Revision: D17172493

Pulled By: gchanan

fbshipit-source-id: 0f4319f8024ac4eca62576231214227b341f56c4

* Kill remaining defaults in Declarations.cwrap.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25612

Test Plan: Imported from OSS

Differential Revision: D17172499

Pulled By: gchanan

fbshipit-source-id: f99e813a4a90e8576541da317027e6f8ae76079b

* Remove requests as dependency (#26083)

Summary:
local build is slow... test in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26083

Differential Revision: D17346949

Pulled By: ailzhang

fbshipit-source-id: f552d1a4be55ad4e2bd915af7c5a2c1b6667c446

* Fix 'in' return true incorrectly (#24156)

Summary:
Because of 'return NotImplemented', __contains__ returns True when the element is not a number,
since bool(NotImplemented) == True.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24156

Differential Revision: D16829895

Pulled By: zou3519

fbshipit-source-id: 9d3d58025b2b78b33a26fdfcfa6029d0d049f11f
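
A pure-Python illustration of the bug described above (the class names here are made up): `in` truth-tests whatever `__contains__` returns, and `bool(NotImplemented)` is `True` (truth-testing `NotImplemented` is deprecated in newer Python versions):

```python
class Broken:
    def __contains__(self, item):
        return NotImplemented   # intended as "unsupported", but...

print('anything' in Broken())   # True (!), because bool(NotImplemented) is True

class Fixed:
    def __contains__(self, item):
        if not isinstance(item, (int, float)):
            return False        # or raise TypeError
        return True

print('anything' in Fixed())    # False
```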

* guard dyndep with a lock (#26153)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26153

I suspect that our multithreaded test system causes issues with dyndep if two places try to call InitOpsLibrary concurrently. So perhaps we just guard this with a lock. This is just a guess-fix, as it is impossible to repro.

Test Plan: sandcastle

Reviewed By: bddppq

Differential Revision: D17361310

fbshipit-source-id: 596634a2098b18881abbd26a5a727a5ba0d03b6e

* Add documentation to logging

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26175

Differential Revision: D17371085

Pulled By: Krovatkin

fbshipit-source-id: ea06f4e16fc320940a299e8e1d4f4d7c76f5950a

* Fold quantize op into module (#25625)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25625

We want to fold the quantize ops for weights/bias into the module to avoid quantizing weights on the fly.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D17208889

fbshipit-source-id: 1854b8953b065855d210bc1166533c08ca264354

* Revert D17349760: Change schedulers to chainable form

Test Plan: revert-hammer

Differential Revision:
D17349760

Original commit changeset: 0a6ac01e2a6b

fbshipit-source-id: 41c2c136215dabc26cad5098a08eff2a2a29b715

* Use torch::from_blob instead of shareExternalPointer, nits (#25973)

Summary:
The main part is to switch at::Tensor creation from `torch::empty(torch::IntArrayRef(...))->ShareExternalPointer(...)` to `torch::from_blob(...)`.
Removed the explicit setting of device CPU, since `at::TensorOptions` defaults to device CPU.
Also renamed local variables, removing the `input` prefix to make them shorter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25973

Differential Revision: D17356837

Pulled By: IvanKobzarev

fbshipit-source-id: 679e099b8aebd787dbf8ed422dae07a81243e18f

* Make schema part of RegisterOperators::Options (#26114)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26114

With this diff, the operator schema or name can be specified as part of the options objects:

```
static auto registry = torch::RegisterOperators()
  .op(torch::RegisterOperators::options().schema("my_op").kernel(&kernel))
  .op(...);
```

This does not break backwards compatibility, all old APIs are kept as shorthands.

This (a) makes the API more consistent, accumulating all options into the options objects and not treating schema special anymore, and (b) this is required for allowing the c10 dispatcher to forward registration calls to ATenDispatch for ops that are still on that dispatcher, see plan in https://github.com/pytorch/pytorch/issues/24132
ghstack-source-id: 90049402

Test Plan: unit tests

Differential Revision: D17350383

fbshipit-source-id: cbb8f33a52dccb2a4522753e7b5ac8ba35b908fd

* Allow overwriting catch-all kernels (#25947)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25947

Previously, the c10 dispatcher didn't allow having a catch-all kernel and backend-specific kernels at the same time.
That restriction is also the long-term goal. But to make the current XLA implementation work, we need to allow backend extensions to overwrite these ops with XLA variants.

This diff changes that so that ops can have both, catchall and backend specific kernels, and will call into the catchall kernel if there is no more specific kernel registered.
This is also the current behavior of globalATenDispatch.
ghstack-source-id: 90049398

Test Plan: unit tests

Differential Revision: D17293036

fbshipit-source-id: f2d5928e904c1dc9b6b89e9bb468debe48a4056c

* Register ATen ops with c10 (#26131)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely because an undefined tensor is sometimes represented as an undefined tensor and sometimes as None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/rocksdb/commit/83a6a614e9bf5f3f06abc265b736e868acee498b
https://github.com/pytorch/fbgemm/commit/c8cac64995d8d8af871e461affbf505ac7fce4d8

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1f5bc1e065fe13d89eeb42539f21a8ab0ab8b8a1

* Nightly build for for iOS (#26074)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26074

### Summary

This PR creates a nightly job for iOS builds. The job will generate a couple of static libraries that contain three architectures (x86, arm64, armv7s) and upload them to AWS S3.

### Note

The test phase in this job is missing right now, meaning if there is a linking error, we won't be able to know it. To add the test jobs, we have to put a dummy test App in the repo and manually link the libraries to the app after the build finishes. This will be done in the next following PRs

Test Plan: Imported from OSS

Differential Revision: D17363066

Pulled By: xta0

fbshipit-source-id: 5beeb4263af5722f0a852297023f37aaea9ba4b1

* Change the source link in podspec (#26089)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26089

### Summary

A couple of changes

1. Replace the source link with the newly nightly build address
2. Remove module support for Swift and Objective-C
3. Expose all static libraries instead of archiving them into one single library. This is because those static libraries might contain object files that have the same name, e.g. `init.c.o` in both `libcpuinfo.a` and `libqnnpack.a`. If we archive them into one using the `libtool -static` command, by default it only picks one object file and discards the others, which could result in undefined symbols when linking the executable. The change here is to expose all the static libraries and let the linker decide which one to use.

### Test Plan

- pod spec lint succeed
 - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS

Differential Revision: D17363037

Pulled By: xta0

fbshipit-source-id: ba77b0001b58e6e2353d8379d932db598166d37d

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/rocksdb/commit/97631357aa274d06a7ab09b3cde7b909262cc4dd
https://github.com/pytorch/fbgemm/commit/2f1477dfee9465c1e2dbdf21722970b3fa1baf86

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 33029d2e8c6a3664a35823829670f6ed9dfc3b44

* Tensor renaming to dtype, shape; support long, double (#26183)

Summary:
Applying dzhulgakov  review comments

org.pytorch.Tensor:
  - dims renamed to shape
  - typeCode to dtype
  - numElements to numel

newFloatTensor, newIntTensor... to newTensor(...)

Added support for dtype=long, double.
Reordered in code as byte, int, float, long, double.
For if conditions, the order is float, int, byte, long, double, as I expect that the float and int branches will be used more often.

Tensor.toString() does not have data, only numel (data buffer capacity)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26183

Differential Revision: D17374332

Pulled By: IvanKobzarev

fbshipit-source-id: ee93977d9c43c400b6c054b6286080321ccb81bc

* use whitelist for selecting observed values (#25974)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25974

Previously we observed all the Tensor values, but what we actually want is
to observe only the ones that can be quantized.

Test Plan:
python test/test_jit.py
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17348986

fbshipit-source-id: 55be0d73862a0e7eb1e7fd882d16e0d830618b63

* fix circle CI

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26225

Test Plan: Imported from OSS

Differential Revision: D17379899

Pulled By: xta0

fbshipit-source-id: 4077aa0149b23560f3a9e29531ca9bc612a2c09c

* Add histogram observer (#23959)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23959

Add histogram observer that records the running histogram of tensor values along with min/max values.
ghstack-source-id: 90076996

Test Plan:
Added a test test_histogram_observer
buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'

Differential Revision: D16692835

fbshipit-source-id: 0f047d3349cb9770fad4a2b6cb346c51d9e99cd4

* Add isBackwardCompatibleWith for Argument and FunctionSchema (#23409)

Summary:
we intend to be conservative, and will relax the checks in the future if necessary.
So far, we consider the following three conditions as backward compatible:
   1) the two schemas are equal;
   2) the two schemas have the same number of arguments, and this schema's
      arguments are backward compatible with the corresponding ones in the
      argument list of old_schema;
   3) this schema has m arguments, old_schema has n arguments, m > n, and
      the first n arguments of this schema are backward compatible with
      the corresponding arguments of old_schema. The remaining arguments
      must be either OptionalType or provide default values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23409
ghstack-source-id: 90111021

Test Plan: buck test //caffe2/test:function_schema

Reviewed By: hl475

Differential Revision: D16505203

fbshipit-source-id: e4099537776a60e8945e5c3cd57fa861f3598a9b

* Creates generic device type testing framework (#25967)

Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/24851 by...

1. lets device types easily register themselves for testing
2. lets tests be written to run on multiple devices and with multiple dtypes
3. provides a mechanism to instantiate those tests so they are discoverable and filterable by unittest and pytest

It refactors three tests from test_torch.py to demonstrate how to use it.

`test_diagonal` is the simplest example. Most tests just need to be modified to accept 'device' as an argument. The framework will then instantiate `test_diagonal_cpu` and `test_diagonal_cuda` (when CUDA is available) which call `test_diagonal` with the appropriate 'device' argument.

`test_neg` also has dtype variants. It accepts both 'device' and 'dtype' as arguments, and the dtypes it runs with are specified with the 'dtypes' decorator. Dtypes can be specified for all device types and particular device types. The framework instantiates tests like `test_neg_cpu_torch.float`.

`test_inverse` has device-specific dependencies. These dependencies are expressed with the sugary 'skipCUDAIfNoMagma' and 'skipCPUIfNoLapack' decorators. These decorators are device-specific so CPU testing is not skipped if Magma is not installed, and their conditions may be checked before or after the test case has been initialized. This means that skipCUDAIfNoMagma does not initialize CUDA. In fact, CUDA is only initialized if a CUDA test is run.

These instantiated tests may be run as usual and with pytest filtering it's easy to run one test on all device types, run all the tests for a particular device type, or run a device type and dtype combination.

See the note "Generic Device-Type Testing" for more detail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25967

Differential Revision: D17381987

Pulled By: mruberry

fbshipit-source-id: 4a639641130f0a59d22da0efe0951b24b5bc4bfb
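
A minimal sketch of a test written against this framework; the import path is an assumption (the helpers lived in test/common_device_type.py at the time and moved to torch.testing._internal.common_device_type in later releases), and the test names are made up:

```python
import unittest
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, dtypes)

class TestExample(unittest.TestCase):
    # Instantiated once per registered device type: test_add_cpu, test_add_cuda, ...
    def test_add(self, device):
        x = torch.ones(3, device=device)
        self.assertEqual((x + x).sum().item(), 6.0)

    # Additionally parameterized over dtypes via the decorator.
    @dtypes(torch.float, torch.double)
    def test_mul(self, device, dtype):
        x = torch.ones(3, device=device, dtype=dtype)
        self.assertTrue(torch.equal(x * 2, x + x))

# Generates TestExampleCPU / TestExampleCUDA classes that unittest and
# pytest can discover and filter; the generic class itself is removed.
instantiate_device_type_tests(TestExample, globals())

if __name__ == '__main__':
    unittest.main()
```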

* adds sync to flaky test_events_multi_gpu_query (#26231)

Summary:
This test can sometimes fail in CI.

I suspect this flakiness is because the test asks a CUDA stream to record an event, fails to synchronize the CPU with that stream, then checks if the event is recorded on the CPU. There is no guarantee this will have happened.

This one-line change preserves the intent of the test while ensuring the GPU has recorded the event before the CPU queries it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26231

Differential Revision: D17382110

Pulled By: mruberry

fbshipit-source-id: 35b701f87f41c24b208aafde48bf10e1a54de059
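
The synchronization pattern behind the one-line fix, as a minimal sketch (illustrative only, not the test's actual code; requires a CUDA device):

```python
import torch

if torch.cuda.is_available():
    stream = torch.cuda.Stream()
    event = torch.cuda.Event()
    with torch.cuda.stream(stream):
        x = torch.randn(1024, 1024, device='cuda')
        y = x @ x              # enqueue some work on the side stream
        event.record(stream)   # ask the stream to record the event
    # Without a host-side sync the CPU may query the event before the GPU
    # has actually recorded it, which is the race behind the flakiness.
    event.synchronize()
    assert event.query()
```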

* Added possible out of shared memory error message (#25730)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/5040
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25730

Differential Revision: D17226214

Pulled By: pbelevich

fbshipit-source-id: 92278272aab74e6690f14fc9597acfd1a98854b7

* Remove armv7s build from iOS (#26222)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26222

### Summary

The last generation of armv7s devices is the iPhone 5C. As discussed with David offline, we decided not to support iOS armv7s devices.

### Test plan

- CI finishes successfully
- Builds can be run only on X86_64 and arm64 devices

Test Plan: Imported from OSS

Differential Revision: D17385308

Pulled By: xta0

fbshipit-source-id: f883999aed18224ea3386b1f016964a33270fa34

* Back out "[quant][observer] Add histogram observer" (#26236)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26236

Original diff broke oss CI. Reverting.

Original commit changeset: 0f047d3349cb
ghstack-source-id: 90125990

Test Plan: testinprod

Reviewed By: hx89

Differential Revision: D17385490

fbshipit-source-id: 4258502bbc0e3a6dd6852c8ce01ed05eee618b1a

* Ports most of test_torch.py to generic device type framework (#26232)

Summary:
This PR moves many tests in test_torch.py to the generic device type framework. This means that many CUDA tests now run in test_torch.py and there is greater consistency in how tests for many device types are written.

One change is that all MAGMA tests are run on the default stream due to intermittent instability running MAGMA on the non-default stream. This is a known issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26232

Test Plan:
While this PR edits the tests itself, it was validated using two independent methods:

(1) The code was reviewed and it was verified that all deleted functions were actually moved.
(2) The output of the TestTorch CI was reviewed and test outputs were matched before and after this PR.

Differential Revision: D17386370

Pulled By: mruberry

fbshipit-source-id: 843d14911bbd52e8aac6861c0d9bc3d0d9418219

* Add type hint for cuda.set_rng_state (#26200)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/26199
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26200

Differential Revision: D17386885

Pulled By: soumith

fbshipit-source-id: 9da03aae29281b2ed691cbfdd7b85fde55e5b7ef

* Add a wrapper for inspect in JIT to produce better error message (#25415)

Summary:
If source code is not available due to packaging (e.g. sources are compiled to .pyc), TorchScript produces a very obscure error message. This tries to make it nicer and allows customizing the message by overriding _utils_internal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25415

Test Plan: Really hard to unittest properly. Did one off testing by compiling to .pyc and checking the message.

Differential Revision: D17118238

Pulled By: dzhulgakov

fbshipit-source-id: 3cbfee0abddc8613000680548bfe0b8ed52a36b0

* Use MIOpen for transpose convolutions (#26172)

Summary:
Provides significant performance uplift where used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26172

Differential Revision: D17374862

Pulled By: bddppq

fbshipit-source-id: 85d2df3c67b8935bc54f3a81a912a25c0102743a

* Call aten ops through c10 dispatcher (#23668)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668

- The eager mode frontend now calls operators that are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher and no longer through globalATenDispatch().
- These operators aren't registered with globalATenDispatch anymore, only with c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work; this function will forward the registration to the c10 dispatcher for them.

ghstack-source-id: 90130455

Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#

Differential Revision: D16603133

fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82

* Remove unboxedAutogradKernel from c10 (#26130)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26130

Since we now just use TensorTypeId::VariableTensorId, there's no need to treat autograd kernels any differently.
ghstack-source-id: 90130457

Test Plan: unit tests

Differential Revision: D17353873

fbshipit-source-id: d4468506a5366bc5e7429144b090b3e78af9de62

* Refines test_torch.py generic device testing (#26244)

Summary:
- Adds SkipCUDAIfRocm and skipCPUIfNoMkl decorators, ports corresponding tests
- Changes "SkipIf" input semantics for consistency
- Removes torchtest, which has been replaced with this new generic framework
- Refactors some common parts out of CUDA tests to TestTorchDeviceType
- Ensures all MAGMA tests run on default stream by putting the skipCUDANonDefaultStreamIf in the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26244

Differential Revision: D17389060

Pulled By: mruberry

fbshipit-source-id: 1375774f24c2266049e6d4b899e7300ddf32eac8

* Fix Windows build (#26246)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26246

Broken due to https://github.com/pytorch/pytorch/issues/12117. Try fixing it.
ghstack-source-id: 90137033

Test Plan: waitforsandcastle

Reviewed By: zou3519

Differential Revision: D17387317

fbshipit-source-id: 705998c0b1608668d510b47f4fe20cecf5057c5f

* Fix CI (#26250)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26250

Exclude some ops from the c10 dispatcher that don't work with it yet.
ghstack-source-id: 90138046

Test Plan: waitforsandcastle

Reviewed By: zou3519

Differential Revision: D17390117

fbshipit-source-id: a87fb3048aeba2c3293b95d610ddb8e94369f8fe

* Back out "[pytorch][PR] Refines test_torch.py generic device testing" (#26252)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26252

Original commit changeset: 1375774f24c2

Testing to see if this is somehow the source of hangs on ROCm builds.

Test Plan: Change is to tests themselves. This diff is for testing the ROCm hang, however.

Differential Revision: D17390575

fbshipit-source-id: a6ffd5eb1df3971b99b6d42271a8d3d501ac79c6

* Fix namedtensor ci (#26257)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26257

In native_functions.yaml, all overloads must have unique overload names.
This PR fixes `flatten` to have unique names for the overloads.

Test Plan: - tested locally, but also [namedtensor ci]

Differential Revision: D17391243

Pulled By: zou3519

fbshipit-source-id: aaef654953b4275c43b9d7bd949c46bd011f6c73

* Switch to the new profiler infrastructure (#26174)

Summary:
The ones supported going forward are rocprofiler and roctracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26174

Differential Revision: D17387538

Pulled By: bddppq

fbshipit-source-id: 19d9828d9d07b5073ab5fa288e24fd65a8b18b52

* Fix binary size of OpsAlreadyMovedToC10.cpp (#26237)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26237

Calling a lot of `std::string` constructors is horrible for binary size, see t53997334.

Using `const char*` instead should make the binary size much smaller.
ghstack-source-id: 90145501

Test Plan: size checks on the diff

Differential Revision: D17386002

fbshipit-source-id: c5420adf225e535396e806a0df92419a7e2ad3e8

* Fix no auto batching bugs: cannot bulk load; not work with namedtuple (#26065)

Summary:
see title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26065

Differential Revision: D17392851

Pulled By: soumith

fbshipit-source-id: 468cd41c8e03d689ff2e0261d948e28daad6bfaf

* Upgrade MKLDNN to v0.20.5 (#25757)

Summary:
1. Fix issues exposed by the posts below:
https://github.com/pytorch/pytorch/issues/25242
https://github.com/pytorch/pytorch/issues/25101
https://github.com/pytorch/pytorch/issues/23825
2. Fix an RNN support issue in mkldnn-bridge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25757

Differential Revision: D17367948

Pulled By: VitalyFedyunin

fbshipit-source-id: d8430d3909ecbf853afa0ce3d968735f86f1da31

* fix hypothesis timeout (#26280)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26280

ghstack-source-id: 90160270

Test Plan: testinprod

Differential Revision: D17396861

fbshipit-source-id: ee2348ffa7f6092e2c5647a42d0e17879dcfacd0

* Migrate away from using Variable( in test_nn.py (#26077)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26077

As per #26071, we would like to get rid of the calls to Variable(
where possible. This diff removes the calls in the test file test_nn.py. The
unit tests should all still pass as expected.
ghstack-source-id: 90086624

Test Plan: tests in `test_nn.py` should all pass.

Differential Revision: D17336484

fbshipit-source-id: 43fc7bd0b0be835ae89d06162ce1cbe4e0056d91

* Enabled conv methods for bfloat16

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26167

Differential Revision: D17367728

Pulled By: izdeby

fbshipit-source-id: 0a7bd9a6dbc15815af195d644c9372af2135e93a

* Move the CUDA implementation of round to ATen. (#25041)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25041

Fix #24617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25041

Test Plan: Imported from OSS

Differential Revision: D17114368

Pulled By: VitalyFedyunin

fbshipit-source-id: 6ec6ef99b4451acd7e93491fd4b44fca9ce1809d

* Whitelist and fusion support for quantized::linear - addmm (#26208)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26208

Supporting `addmm` -> `quantized::linear` quant fusion
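
For context, a minimal sketch (an illustration, not code from this PR) of the float patterns these fusion passes match: with a bias, a linear lowers to `addmm`; without one, to `matmul`.

```python
# Illustrative only: the float ops that quant fusion rewrites to
# quantized::linear once activations and weights are quantized.
import torch

x = torch.randn(3, 4)
w = torch.randn(2, 4)
b = torch.randn(2)

y_addmm = torch.addmm(b, x, w.t())   # linear with bias
y_matmul = torch.matmul(x, w.t())    # linear without bias
assert torch.allclose(y_addmm, torch.nn.functional.linear(x, w, b))
```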

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380074

fbshipit-source-id: fae88f118f85663d777648695768b0504ed7ccf9

* Whitelist and fusion support for quantized::linear - matmul (without bias) (#26209)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26209

Support quant fusion for `matmul` (without bias) -> `quantized::linear`

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380075

fbshipit-source-id: 290caee7f7bcf94d2731c0ee9bd40054f0fb9b07

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/mcrouter/commit/653434b898ea35810d7369d0911e3bdab9a1c3ac
https://github.com/facebook/proxygen/commit/b74fbefc1a69de78989f540d9d0d312945aeadeb
https://github.com/facebook/rocksdb/commit/9bd5fce6e89fcb294a1d193f32f3e4bb2e41d994
https://github.com/facebookincubator/mvfst/commit/6efcef720fac04011708840b89d1f174d3f290d0
https://github.com/facebookresearch/pytorch-biggraph/commit/cb7830b6b30d2d24b591178705eaf9e8209ecd09
https://github.com/pytorch/fbgemm/commit/53f0c0d175ae4283609a5b251052f9c6598b8aee

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 78d0e24f5601aa990391a2404ae9d23b325de93f

* Add ProcessGroupGloo::createDefaultDevice (#26166)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26166

There were two variants for creating a new device: one based on the
name of a network interface, and one based on a hostname or address.
In the latter, if no address was specified, it would look up the local
hostname and try to resolve that. If that failed, the process would
crash.

In this default path, we now try to look up and use the local hostname,
and if that fails we fall back to the loopback address.

If the local hostname doesn't resolve to an address that we can bind
to, it is very likely that this process won't join other processes
over the network, and that the user is trying to run a local test.

If this assumption is wrong, the user can override the default
interface selection by setting the environment variable
`GLOO_SOCKET_IFNAME` to the name of the external network interface.
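
As a minimal sketch of that override (the interface name `eth0`, address, and port are placeholders; assumes a single-process local run):

```python
# Force Gloo onto a specific NIC instead of the default
# hostname/loopback selection described above.
import os
import torch.distributed as dist

os.environ["GLOO_SOCKET_IFNAME"] = "eth0"  # placeholder interface name

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",  # loopback is fine for a local test
    rank=0,
    world_size=1,
)
```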

I tested this by changing the local hostname to a bogus name and
confirmed that default initialization works as expected.

Closes #26049.

Test Plan: Imported from OSS

Differential Revision: D17397898

Pulled By: pietern

fbshipit-source-id: 95a2467761d89df87b520d6e5837b92184b0dc12

* Disable broken unit tests (#26301)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26301

-
ghstack-source-id: 90176419

Test Plan: waitforsandcastle

Differential Revision: D17400971

fbshipit-source-id: b6f9cb27fe955b0200d62591300c70ba79a90e5f

* Kill defaults in nn.yaml. (#26282)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26282

Since this isn't the end-user API anymore, we shouldn't have defaults.

Test Plan: Imported from OSS

Differential Revision: D17397153

Pulled By: gchanan

fbshipit-source-id: d44040bec0ee9c70734a53ebcc10a96f12226a29

* Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26260

Differential Revision: D17391902

Pulled By: bddppq

fbshipit-source-id: 89ab3dedf05ba398acb7300fac95f03cfb31f0ba

* Whitelist and fusion support for quantized::linear - matmul (with bias) (#26204)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26204

Support quant fusion for `matmul` with bias to `quantized::linear`.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380073

fbshipit-source-id: 00014469a852cc5d5b66469fc4b8d05eafba1e3e

* Add __s390x__ compiler define for s390 builds. (#26233)

Summary:
PyTorch builds fail on the s390 architecture because the ifdef
macros in simd.h default to an x86 asm instruction. This patch adds
an ifdef __s390x__ so that the build succeeds on s390.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26233

Differential Revision: D17392714

Pulled By: soumith

fbshipit-source-id: 037672bfea64fc5e52da2390d93b973534137c12

* Clarified ambiguous docstring in NegativeBinomial

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25923

Differential Revision: D17392848

Pulled By: soumith

fbshipit-source-id: 2833e72fe449c74dfd8273a7b1eb46c05c63d999

* Dynamic quantization for bias. (#26057)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26057

Bias is now unquantized (i.e., kept in floating point) for qconv and qlinear; it is quantized dynamically by fbgemm.

TODO: Add some performance numbers.
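
As an illustration only (not the fbgemm code path itself), dynamically quantizing a float bias amounts to requantizing it to int32 with the product of the activation and weight scales:

```python
# Sketch: int32 bias with scale = activation_scale * weight_scale, zero point 0.
import torch

act_scale, w_scale = 0.05, 0.02
bias_fp32 = torch.tensor([0.10, -0.30, 0.25])

bias_scale = act_scale * w_scale
bias_int32 = torch.round(bias_fp32 / bias_scale).to(torch.int32)
print(bias_int32, bias_int32.float() * bias_scale)  # quantized / dequantized view
```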

Tests:

test:quantization
```
Summary (total time 8.41s):
  PASS: 24
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
More details at https://our.intern.facebook.com/intern/buck/build/74d5f6f7-55c9-4350-a618-2013042fffd8
```

test:quantized
```
Summary (total time 13.21s):
  PASS: 43
  FAIL: 0
  SKIP: 5
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
ghstack-source-id: 90166254

Test Plan:
buck test mode/dev caffe2/test:quantization

buck test mode/dev caffe2/test:quantized

Differential Revision: D17328028

fbshipit-source-id: d4a163d730d0f4a03e8e0faf7420710cf36eec09

* Use expected_wrapper only if CMAKE_C_COMPILER and/or CMAKE_CXX_COMPILER is not set by the user (#26306)

Summary:
This honors the user's preference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26306

Differential Revision: D17408030

Pulled By: soumith

fbshipit-source-id: 6841b805603d40cd7caf78dbb42405a0c931f052

* Add derivative of cholesky_solve (#26185)

Summary:
Changelog:
- Add derivative of cholesky_solve. The equations are derived analogously to the derivatives of the solve methods, using the technique detailed [here](https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26185

Test Plan:
- Added tests for cholesky_solve in test_autograd.py
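
A minimal sketch of this kind of check (shapes and values are arbitrary, not the exact test code; the factor is re-tril'ed so gradcheck only perturbs entries the solve actually reads):

```python
# gradcheck the cholesky_solve backward on a small SPD system.
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.t() + 3 * torch.eye(3, dtype=torch.float64)   # make A positive definite
L = torch.cholesky(A).requires_grad_(True)              # lower Cholesky factor
b = torch.randn(3, 1, dtype=torch.float64, requires_grad=True)

assert torch.autograd.gradcheck(
    lambda rhs, factor: torch.cholesky_solve(rhs, factor.tril()), (b, L))
```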

Closes half of https://github.com/pytorch/pytorch/issues/4669.

Differential Revision: D17408123

Pulled By: soumith

fbshipit-source-id: f9668c8d4d758c0dc658941a8b730a17683091aa

* Kill 'default_init', which isn't needed anymore.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26281

Test Plan: Imported from OSS

Differential Revision: D17397097

Pulled By: gchanan

fbshipit-source-id: fb53e90637a3dfb2300fca78f414abe2d82832f3

* Export round (#26126)

Summary:
Added round export in opset 11
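
A minimal usage sketch (the model and output file name are placeholders):

```python
# Export a model that calls torch.round with opset 11, where ONNX Round is available.
import torch

class RoundModel(torch.nn.Module):
    def forward(self, x):
        return torch.round(x)

torch.onnx.export(RoundModel(), torch.randn(2, 3), "round.onnx", opset_version=11)
```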
Pull Request resolved: https:…
Labels: Merged, module: rocm (AMD GPU support for Pytorch), open source