Expose a torch.result_type and simplify tensor iterator #26012
Conversation
ghstack-source-id: 0e625685f41bd76f8aa8a79bef3657545c511eba Pull Request resolved: #26012
ghstack-source-id: d7dd15ad18df72f72147acea0a251bba6f765c58 Pull Request resolved: #26012
ghstack-source-id: 2abffc139c3d2d384c2826ab25d2eefde5374c1b Pull Request resolved: #26012
ghstack-source-id: 3a42aa839bda5b9fe48e7d09eb8b8c4979392bc2 Pull Request resolved: #26012
- func: result_type(Tensor tensor, Tensor other) -> ScalarType
  variants: function

- func: result_type(Tensor tensor, Scalar other) -> ScalarType
so...it's not actually enforced that we have overload names? @dzhulgakov / @smessmer is your current story that we just don't need to worry about it?
IValue(ScalarType t) : tag(Tag::ScalarType) {
  payload.as_int = static_cast<std::underlying_type<ScalarType>::type>(t);
}
naively, I would also expect an isScalarType, and that toScalarType actually checks that the tag is a scalar type.
I'm admittedly not sure if we need to deal with BC constraints here, though. @suo do you know?
this was wrong, will fix
added is_intrusive_ptr and isScalarType, and added the tag check in toScalarType (sketched below)
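For reference, a minimal sketch of what those accessors might look like, assuming the payload.as_int storage from the constructor above; the assert macro and the tagKind() helper follow IValue's surrounding conventions, but this is illustrative, not the PR's exact code:

```cpp
// Illustrative sketch, not the PR's exact code. Assumes the enum value is
// stored in payload.as_int, as in the constructor shown earlier.
bool isScalarType() const {
  return tag == Tag::ScalarType;
}

at::ScalarType toScalarType() const {
  // Check the tag before reinterpreting the payload, rather than trusting
  // the caller the way toLayout/toMemoryFormat do.
  TORCH_INTERNAL_ASSERT(isScalarType(), "expected ScalarType, got ", tagKind());
  return static_cast<at::ScalarType>(payload.as_int);
}
```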
ok, this looks correct now, but I don't know about the BC concerns.
what is it that would potentially be backwards incompatible?
removed the isScalarType check. toLayout, toMemoryFormat, etc. don't have it, and it was causing test failures.
For correctness, adding a new tag requires adding a corresponding type in jit_type.h, and adjusting the schema/type parsers to understand that 'ScalarType' maps to this type. This PR seems to just add the tag and not add a type. If we need to keep the tag around, then we should also add the corresponding type and fix the type parser.
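For concreteness, a hypothetical sketch of the pattern zdevito describes, loosely modeled on the existing singleton types in jit_type.h; the base class, the TypeKind entry, and the parser hook are all assumptions for illustration, not code from this PR:

```cpp
// Hypothetical sketch following the singleton-type pattern in jit_type.h.
// TypeKind::ScalarTypeType and the exact Type interface are assumptions.
struct ScalarTypeType;
using ScalarTypeTypePtr = std::shared_ptr<ScalarTypeType>;

struct ScalarTypeType : public Type {
  static const TypeKind Kind = TypeKind::ScalarTypeType;
  bool operator==(const Type& rhs) const override {
    return rhs.kind() == kind();
  }
  std::string str() const override {
    return "ScalarType";
  }
  // Global singleton accessor, mirroring IntType::get() and friends.
  static ScalarTypeTypePtr get();

 private:
  ScalarTypeType() : Type(TypeKind::ScalarTypeType) {}
};

// The schema/type parser would additionally need to map the string
// "ScalarType" to ScalarTypeType::get() so schemas can name the type.
```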
@zdevito this PR does add a new type in jit_type.h, is it incorrect?
removed almost all jit changes
non-JIT parts look good -- I'd defer to @suo or others on the JIT parts.
@pytorchbot retest this please
related issue: #25472
Summary: Pull Request resolved: pytorch/pytorch#26012
Test Plan: Imported from OSS
Differential Revision: D17556197
Pulled By: nairbv
fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c
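For illustration only (not a test from this PR), the two overloads in the schema above can be exercised from C++ roughly as follows; the printed dtypes reflect the usual type-promotion rules, assuming at::result_type is the generated C++ entry point for the new schema:

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  // Tensor/Tensor overload: an integral tensor combined with a
  // floating-point tensor promotes to the floating-point dtype.
  auto a = torch::ones({2}, torch::kInt32);
  auto b = torch::ones({2}, torch::kFloat);
  std::cout << at::result_type(a, b) << "\n";  // Float

  // Tensor/Scalar overload: a bare scalar only promotes the category, not
  // the width, so a Half tensor with a double scalar stays Half.
  auto c = torch::ones({2}, torch::kHalf);
  std::cout << at::result_type(c, 2.0) << "\n";  // Half
}
```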
ghstack-source-id: e24e33e2259a4b643567ef32ccbab73e511a6bc3 Pull Request resolved: pytorch#26012
* Typo fix (#26417) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26417 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17548776 Pulled By: ezyang fbshipit-source-id: 8c79893ee4216780edb838671e701de5518c4cd0 * Don't generate named tensor functions to RegistrationFunctions.h (#26685) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26685 This prevents XLA from picking up on named tensor APIs. I ran into some problems while attempting to support dimname overloads in XLA; since we don't need the first iteration of named tensors to work with XLA this is OK. Test Plan: - run CI. Differential Revision: D17538893 Pulled By: zou3519 fbshipit-source-id: 93d579c93f5b1dc68541c07c4a3d61792859507d * Updating submodules Summary: GitHub commits: https://github.com/facebook/litho/commit/ff4a61094e9405310b39219a35c6ff8e44300573 https://github.com/facebookincubator/mvfst/commit/ad81c3823ec7910296f97d2050fde181be1d4ac4 https://github.com/pytorch/fbgemm/commit/518d8a1832cf1eb1dda2feace1a278e9e4f302ba Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: 2a9a47805569a43e05d044c5494b57f6a7996bc4 * Add tests for C++ functional cosine_similarity and pairwise_distance, and clean up functional test code (#26559) Summary: This ensures that `F::cosine_similarity` and `F::pairwise_distance` can be used simply by including `torch/torch.h` and set `namespace F = torch::nn::functional`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26559 Differential Revision: D17507421 Pulled By: yf225 fbshipit-source-id: f895dde3634d5c8ca66ee036903e327e5cdab6b1 * allow building docker without torchvision (#26168) Summary: There is an issue with the torchvision version not matching the pytorch version if one builds the docker from a tag, see issue https://github.com/pytorch/pytorch/issues/25917. The current solution requires one to re-init the submodules or manually change the version of torchvision. This PR allows one to build the docker image without torchvision, which not only fixes the above mentioned bug but also frees non-image pytorch users from the tyranny of torchvision :laughing:. In all seriousness, for NLP researchers especially torchvision isn't a necessity for pytorch and all non-essential items shouldn't be in the docker. This option removes one extra thing that can go wrong. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26168 Differential Revision: D17550001 Pulled By: soumith fbshipit-source-id: 48b8b9e22b75eef3afb392c618742215d3920e9d * Speed up an integer to the power of a positive integer on CPU (#26020) Summary: Current integer scalar exps are always cast to double. This commit avoids cast if the tensor is also integral and the scalar is positive to speed up. 
Benchmark (Debian Buster, g++ 8, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz 0 0:0 3300.00 MHz , Debug build, Turbo turned off): ```python import timeit for n, t in [(1000, 13000), (10_000, 1300)]: for e in (2, 3, 4): for dtype in ('torch.int16', 'torch.int32', 'torch.int64'): print(f'a.pow({e}) (a.numel() == {n}) for {t} times') print(f'dtype {dtype}, {t} times', end='\t\t') print(timeit.timeit(f'a.pow({e})', setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})', number=t)) ``` Before: ``` a.pow(2) (a.numel() == 1000) for 13000 times dtype torch.int16, 13000 times 1.6958350749996498 a.pow(2) (a.numel() == 1000) for 13000 times dtype torch.int32, 13000 times 0.7989626339999631 a.pow(2) (a.numel() == 1000) for 13000 times dtype torch.int64, 13000 times 0.7973162800003593 a.pow(3) (a.numel() == 1000) for 13000 times dtype torch.int16, 13000 times 1.8660746679997828 a.pow(3) (a.numel() == 1000) for 13000 times dtype torch.int32, 13000 times 0.8101709959996697 a.pow(3) (a.numel() == 1000) for 13000 times dtype torch.int64, 13000 times 0.8135280149999744 a.pow(4) (a.numel() == 1000) for 13000 times dtype torch.int16, 13000 times 5.010833072999958 a.pow(4) (a.numel() == 1000) for 13000 times dtype torch.int32, 13000 times 4.801007671999741 a.pow(4) (a.numel() == 1000) for 13000 times dtype torch.int64, 13000 times 3.963344578000033 a.pow(2) (a.numel() == 10000) for 1300 times dtype torch.int16, 1300 times 1.6216251330001796 a.pow(2) (a.numel() == 10000) for 1300 times dtype torch.int32, 1300 times 0.5672429639998882 a.pow(2) (a.numel() == 10000) for 1300 times dtype torch.int64, 1300 times 0.5544572270000572 a.pow(3) (a.numel() == 10000) for 1300 times dtype torch.int16, 1300 times 1.656308512999658 a.pow(3) (a.numel() == 10000) for 1300 times dtype torch.int32, 1300 times 1.502670819999821 a.pow(3) (a.numel() == 10000) for 1300 times dtype torch.int64, 1300 times 0.5757876879997639 a.pow(4) (a.numel() == 10000) for 1300 times dtype torch.int16, 1300 times 4.775718216999849 a.pow(4) (a.numel() == 10000) for 1300 times dtype torch.int32, 1300 times 4.754745475000163 a.pow(4) (a.numel() == 10000) for 1300 times dtype torch.int64, 1300 times 3.737249878000057 ``` After: ``` a.pow(2) (a.numel() == 1000) for 13000 times dtype torch.int16, 13000 times 1.1006453190002503 a.pow(2) (a.numel() == 1000) for 13000 times dtype torch.int32, 13000 times 1.0849009019998448 a.pow(2) (a.numel() == 1000) for 13000 times dtype torch.int64, 13000 times 1.093259106000005 a.pow(3) (a.numel() == 1000) for 13000 times dtype torch.int16, 13000 times 1.0859826279997833 a.pow(3) (a.numel() == 1000) for 13000 times dtype torch.int32, 13000 times 1.1076840900000207 a.pow(3) (a.numel() == 1000) for 13000 times dtype torch.int64, 13000 times 1.0755480369998622 a.pow(4) (a.numel() == 1000) for 13000 times dtype torch.int16, 13000 times 1.918211066999902 a.pow(4) (a.numel() == 1000) for 13000 times dtype torch.int32, 13000 times 1.9183043200000611 a.pow(4) (a.numel() == 1000) for 13000 times dtype torch.int64, 13000 times 1.930021430999659 a.pow(2) (a.numel() == 10000) for 1300 times dtype torch.int16, 1300 times 0.7271483560002707 a.pow(2) (a.numel() == 10000) for 1300 times dtype torch.int32, 1300 times 0.7289002070001516 a.pow(2) (a.numel() == 10000) for 1300 times dtype torch.int64, 1300 times 0.7267536800000016 a.pow(3) (a.numel() == 10000) for 1300 times dtype torch.int16, 1300 times 0.7301799359997858 a.pow(3) (a.numel() == 10000) for 1300 times dtype torch.int32, 1300 times 0.7289195180001116 a.pow(3) 
(a.numel() == 10000) for 1300 times dtype torch.int64, 1300 times 0.7270008230002531 a.pow(4) (a.numel() == 10000) for 1300 times dtype torch.int16, 1300 times 1.5354506029998447 a.pow(4) (a.numel() == 10000) for 1300 times dtype torch.int32, 1300 times 1.528263066999898 a.pow(4) (a.numel() == 10000) for 1300 times dtype torch.int64, 1300 times 1.5369428439998956 ``` --- Best viewed with whitespace changes turned off Pull Request resolved: https://github.com/pytorch/pytorch/pull/26020 Differential Revision: D17485400 Pulled By: VitalyFedyunin fbshipit-source-id: 3a16b074825a5aab0f7e7af3d8100f9e4b7011a3 * Use noop observer to pass dtype for dynamic quantization (#26709) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709 Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow. One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig. Test Plan: Imported from OSS Differential Revision: D17544103 Pulled By: dzhulgakov fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f * Remove duplicate calculation of output shape (#26684) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26684 Output heights and widths are already calculated by conv_p. Remove the duplicate calculation. ghstack-source-id: 90633432 Test Plan: buck test mode/dev caffe2/test:quantized ``` Summary (total time 18.69s): PASS: 45 FAIL: 0 SKIP: 10 caffe2/test:quantized - test_qadd_scalar_relu (test_quantized.TestQuantizedOps) caffe2/test:quantized - test_equal (test_quantized.TestQuantizedOps) caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps) caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQNNPackOps) caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQNNPackOps) caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps) caffe2/test:quantized - test_qconv_qnnpack (test_quantized.TestQNNPackOps) caffe2/test:quantized - test_qlinear_qnnpack (test_quantized.TestQNNPackOps) caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps) caffe2/test:quantized - test_qnnpack_maxpoolMore details at https://our.intern.facebook.com/intern/buck/build/3b394f1e-ab99-4e59-bdf5-2766f46e9869 2d (test_quantized.TestQNNPackOps) FATAL: 0 TIMEOUT: 0 OMIT: 0 ``` Differential Revision: D17538375 fbshipit-source-id: b4b60e93fdec4cc7bbf6aee7182381221dfac243 * Expands TestAutogradDeviceType (#26708) Summary: - Ports all CUDA tests to TestAutogradDeviceType except those using multiple devices Pull Request resolved: https://github.com/pytorch/pytorch/pull/26708 Differential Revision: D17549435 Pulled By: mruberry fbshipit-source-id: b564186444201d1351934b6a7d21f67bdfca6e3b * Add traces to specialize_autograd and lower_grad_of (2nd try) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22752 Differential Revision: D17543836 Pulled By: Krovatkin fbshipit-source-id: 5cbca220943a580169bf60ac09780b6e67075d2b * Setting automatic default selection for ONNX IR v4 semantics in ONNX export API (#26146) Summary: This is a follow-up PR for https://github.com/pytorch/pytorch/pull/23284. 
In that PR we had removed changing the default behavior for `keep_initializers_as_input` argument to the export API. With this PR we are enabling that change in that if `keep_initializers_as_input` is not specified then value/behavior for this argument is chosen automatically depending on whether the export type is ONNX or not. This was part of the earlier PR was removed for further review. The test points have also been updated. This change may fail some internal tests which may require explicitly setting `keep_initializers_as_input=True` to preserve old behavior. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26146 Reviewed By: hl475 Differential Revision: D17369677 Pulled By: houseroad fbshipit-source-id: 2aec2cff50d215714ee8769505ef24d2b7865a11 * Enable hub tests on MacOS (#26697) Summary: fix https://github.com/pytorch/pytorch/issues/26032. This was broken by a bad openssl release in conda. Should be fixed now. Testing... Pull Request resolved: https://github.com/pytorch/pytorch/pull/26697 Differential Revision: D17542095 Pulled By: ailzhang fbshipit-source-id: ba99f9b36ef2a7c793842cf91bd46fb2634ac1aa * Trivial quantized torch.mean implementation Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26253 Test Plan: Imported from OSS Differential Revision: D17529994 Pulled By: jamesr66a fbshipit-source-id: e3aff71da35b05ed61710cdb88d72b51c944168b * Remove _dequantize_per_channel in the pattern (#26680) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26680 This was introduced before under the assumption that we'll have a qconv_per_tensor_affine and a qconv_per_channel_affine, but turns out we don't have these, so we'll remove thse functions. Test Plan: python test/test_jit.py 'TestJit.test_quant_fusion' Imported from OSS Differential Revision: D17542607 fbshipit-source-id: b90ce5738170f0922bdc2eb1c4dbecd930f68a48 * Register values listed in __constants__ as attributes of the Module. (#26581) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26581 We're currently inlining immediate values of the constants directly into IR when we generate it providing no way to access these values by their names later. This change registers such values as atrtibutes of the module so that they are not lost after IR generation. Differential Revision: D17513451 Test Plan: Imported from OSS Pulled By: ZolotukhinM fbshipit-source-id: cf8f9b450e7178692211abd905ffd2d7ce5a6ce1 * Un-hardcode epsilon constant in FoldConvBatchNorm2d. 
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26584 Test Plan: Imported from OSS Differential Revision: D17514653 Pulled By: ZolotukhinM fbshipit-source-id: 7d9cc8f619b7dbe26fa58eac37cc131929c004d4 * Add doc building instructions Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26553 Differential Revision: D17551426 Pulled By: driazati fbshipit-source-id: 53ce05882091aca4617586bc53944ee4c8b3a622 * Make `is_optional` check more robust (#26312) Summary: If the `Union` contains a non-class type, `issubclass` would fail, this adds a check for that case ](https://our.intern.facebook.com/intern/diff/17505206/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/26312 Pulled By: driazati Differential Revision: D17505206 fbshipit-source-id: 1331e412f938e2f08ecb079972147f11e3ec77cd * Remove _dequantize_per_tensor (#26681) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26681 att Test Plan: ci Imported from OSS Differential Revision: D17542833 fbshipit-source-id: 653e906b0e146763609c69ef0de7f9cf38621586 * fix annotation regex for flake8 (#26694) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26694 Previously we would not properly populate `errorDesc` for: ``` ./torch/jit/__init__.py:13:1: F401 'torch.nn.ModuleList' imported but unused ``` because we wanted only letters and spaces. Be more permissive Test Plan: Imported from OSS Differential Revision: D17551999 Pulled By: suo fbshipit-source-id: b82567df1fa3c9729e7427dc3461bedfb40933dc * Add C++ nn::Identity (#26713) Summary: **Summary**: Adds `torch::nn::Identity` module support for the C++ API. **Issue**: https://github.com/pytorch/pytorch/issues/25883 **Reviewer**: yf225 Pull Request resolved: https://github.com/pytorch/pytorch/pull/26713 Differential Revision: D17550982 Pulled By: yf225 fbshipit-source-id: f24483846e82d5d276d77a1a0c50884f3bc05112 * add timeout parameter to connect function in TCPStore (#26554) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26554 Previously, in `TCPStore`'s constructor we did not pass in a timeout to the `connect` function, which thus used the default timeout (-1, so infinite). But the timeout variable in `TCPStore.cpp `is configurable by the user and set to be 300 seconds by default, so we should be passing this into the connect function. Test Plan: see above. Differential Revision: D17486779 fbshipit-source-id: 42d38a3b8d492d9e9ff09110990a8e4a3a1292b2 * Add threadpool in qlinear and qconv for mobile (#26728) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26728 Use Caffe2::mobile_threadpool() in linear and conv operators Perf Without threadpool - 76ms With threadpool - 41 ms Test Plan: python test/test_quantized.py TestQNNPackOps Imported from OSS Differential Revision: D17553510 fbshipit-source-id: dd5b06f526f65d87727ec7e3dad0a5fa74cba9f9 * Update ONNX Export for Interpolate in Opset 11 (#24805) Summary: - Add support for linear and cubic interpolate in opset 11. - Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8. - Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample). 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24805 Reviewed By: hl475 Differential Revision: D17330801 Pulled By: houseroad fbshipit-source-id: 1bdefff9e72f5e70c51f4721e1d7347478b7505b * Refactor android torchvision: not hardcoded mean/std (#26690) Summary: - Normalization mean and std specified as parameters instead of hardcode - imageYUV420CenterCropToFloat32Tensor before this change worked only with square tensors (width==height) - added generalization to support width != height with all rotations and scalings - javadocs Pull Request resolved: https://github.com/pytorch/pytorch/pull/26690 Differential Revision: D17556006 Pulled By: IvanKobzarev fbshipit-source-id: 63f3321ea2e6b46ba5c34f9e92c48d116f7dc5ce * Simplify operator `sign` using the helper. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25592 Test Plan: Imported from OSS Differential Revision: D17552470 Pulled By: VitalyFedyunin fbshipit-source-id: 6c8cc4f46dd390c231b2d0aac664ad2a6ac8876e * Revert D17514653: [quant] Un-hardcode epsilon constant in FoldConvBatchNorm2d. Test Plan: revert-hammer Differential Revision: D17514653 Original commit changeset: 7d9cc8f619b7 fbshipit-source-id: 2cf32082a46fe169a1db4926df78a9f3256616ad * Revert D17513451: Register values listed in __constants__ as attributes of the Module. Test Plan: revert-hammer Differential Revision: D17513451 Original commit changeset: cf8f9b450e71 fbshipit-source-id: 319ec9399173eb06556969dc6be365b319c1ab6c * Make ONNX_ATEN_FALLBACK also works for _export (#26738) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26738 someone may use torch._export directly. Here we change the onnx_export_type's default value to None, and if it's pytorch onnx caffe2 bundle, we set it to ONNX_ATEN_FALLBACK, otherwise, it's ONNX. Test Plan: ci Reviewed By: hl475 Differential Revision: D17546452 fbshipit-source-id: 38e53926e2b101484bbbce7b58ebcd6af8c42438 * Address review comments in https://github.com/pytorch/pytorch/pull/26272 (#26587) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26587 - ghstack-source-id: 90557226 Test Plan: unit tests Differential Revision: D17515048 fbshipit-source-id: 3459ee80efec29080060ec29d67642d789dd8749 * move more functions to InsertObserversHelper (#26696) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26696 att Test Plan: ci Imported from OSS Differential Revision: D17558701 fbshipit-source-id: 96ef87db74bd1a5d4ddc69867ae71d78c0df83fd * Added test case for reinit (#26506) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26506 [pytorch] [distributed] Made test forgiving to allow rpc agent to return one of the two errors. ghstack-source-id: 90667534 Test Plan: Made sure pg based UT works. Differential Revision: D17488899 fbshipit-source-id: 41f76cf4b4a0ca5e651a5403d6e67b639f0b9c4f * Switch our Android CI to Clang (#26656) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26656 Updating the NDK to r18 or newer triggers a path in our CI scripts so that we now build with clang instead of gcc. Google discontinued the gcc support for android quite a while ago, clang is the only way forward. 
ghstack-source-id: 90698985 Test Plan: CI Reviewed By: dreiss Differential Revision: D17533570 fbshipit-source-id: 5eef4d5a539d8bb1a6682f000d0b5d33b3752819 * quantized_tensor tests (#25429) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25429 Previously we are using empty to generate test tensors, this PR changes the test tensors to use randint so that we can test things properly Also added a set_sizes_and_strides and removed .contiguous() in int_repr function to preserve the original size and strides Test Plan: python test/test_quantized_tensor.py Imported from OSS Differential Revision: D17559660 fbshipit-source-id: d4ce81d577296c1137270fdaa6b1359fb703896f * Add a lot of dimname overloads (#26636) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26636 This PR defines a lot of dimname overloads so that when named tensor support is added for those operators, we will not have to modify the autogenerated TensorMethods.h, thereby avoiding potential merge conflicts in the future. Overloads were added for the following: - all - any - argmax - argmin - cumsum - cumprod - index_copy - kthvalue - mode - permute - squeeze - index_add - index_fill - scatter - scatter_add - index_select - gather - sort - argsort Test Plan: - [namedtensor ci] Differential Revision: D17522984 Pulled By: zou3519 fbshipit-source-id: eca6dea819ba4e4e43b71b700d5cf09176f00061 * Automatic update of fbcode/onnx to ab6b94203c595f74b1f126eb118eef22e4c05a57 (#26736) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26736 Previous import was 23bb6ea1a71f08e200114a153f48bd7adb66d486 Included changes: - **[ab6b9420](https://github.com/onnx/onnx/commit/ab6b9420)**: Relax IF's shape inference rule (#2345) <Wei-Sheng Chin> - **[c5af774a](https://github.com/onnx/onnx/commit/c5af774a)**: Clarify behavior in ConvTranspose (#2343) <Wei-Sheng Chin> - **[a20ba2f1](https://github.com/onnx/onnx/commit/a20ba2f1)**: Fix node test case model for Gemm scalar bias case (#2342) <Hariharan Seshadri> - **[1aa176e0](https://github.com/onnx/onnx/commit/1aa176e0)**: Update pybind (#2340) <Changming Sun> - **[7840504d](https://github.com/onnx/onnx/commit/7840504d)**: Update gen_doc script to validate proto3 files (#2122) <Raymond Yang> - **[bd35e623](https://github.com/onnx/onnx/commit/bd35e623)**: Fix some backend tests (#2335) <Hariharan Seshadri> Test Plan: ci Reviewed By: hl475 Differential Revision: D17552449 fbshipit-source-id: 424acb261b54fc98485f782f6922b11b28c836eb * Add whitelist for backward compatible checks for function schemas (#26740) Summary: Now, we skip all function schema contains quantize key word Pull Request resolved: https://github.com/pytorch/pytorch/pull/26740 Reviewed By: hl475 Differential Revision: D17561753 Pulled By: houseroad fbshipit-source-id: c5e47ada072e71bfa2341a0af8f1743e86ef733c * Revert D17558701: [refactor] move more functions to InsertObserversHelper Test Plan: revert-hammer Differential Revision: D17558701 Original commit changeset: 96ef87db74bd fbshipit-source-id: fc398d3b8bb1cd0bae573e3fdac5cfb883b31373 * Wrap dimensions during named inference (#26558) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26558 Previously, name inference gets called after dimensions are wrapped. This PR makes it so that name inference always wraps dimensions so that it can be called anywhere. Ideally we would only wrap dimensions once, but many of our operators wrap dimensions in weird places. 
Wrapping dimensions in name inference is pretty inexpensive and only happens for named tensors (name inference does not run on unnamed tensors.) Test Plan: - [namedtensor ci] Differential Revision: D17557049 Pulled By: zou3519 fbshipit-source-id: 68c5636489e233dbf2588ab6ad4e379a6fe4c8ba * Fix builtin lookup for Python functions Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26688 Pulled By: driazati Differential Revision: D17560634 fbshipit-source-id: e1c50d1ca24e0313c2b7d704c488a29ef6a47cad * Revert D17330801: [pytorch][PR] Update ONNX Export for Interpolate in Opset 11 Test Plan: revert-hammer Differential Revision: D17330801 Original commit changeset: 1bdefff9e72f fbshipit-source-id: dff07477403170c27260f736ab6e6010f0deca9f * Revert D17559660: [fix] quantized_tensor tests Test Plan: revert-hammer Differential Revision: D17559660 Original commit changeset: d4ce81d57729 fbshipit-source-id: b6c9dc31f08935d255fa9eb3a830bafc76a13799 * use new fbgemm PackedDepthWiseConvMatrix without template parameter (#26760) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26760 Follow-up of D17514003 . Change Caffe2 code to use the new PackedDepthWiseConvMatrix interface. Test Plan: CI Reviewed By: dskhudia Differential Revision: D17514350 fbshipit-source-id: 691d9f1fd35bdb7dd8ba152287f3a34359dc1f4c * Add comments for multidim tensor factory limitations, and rename ListInitTensor for better clarity (#26756) Summary: This PR includes the following improvements: 1. Add comments for limitations of the multidim tensor factory function `torch::tensor(...)`, noting the fact that `torch::tensor({})` and mixed data type such as `torch::tensor({{bool, 2.0}})` are not supported at the moment. (I will also update https://pytorch.org/cppdocs/notes/tensor_creation.html to include usage examples for the multidim tensor factory function `torch::tensor(...)`) 2. Rename `ListInitTensor` to `InitListTensor`, for better naming consistency. This addresses reviews in https://github.com/pytorch/pytorch/pull/26210. I will work on a separate PR to move the factory function to `at::`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26756 Differential Revision: D17560136 Pulled By: yf225 fbshipit-source-id: eb8b45226e999784da48f75cc8953a998582df99 * rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool Summary: Rename old mobile_threadpool() API, replace it with a new version that returns caffe2::ThreadPool instead of pthreadpool_t. Test Plan: - builds Differential Revision: D17543413 Pulled By: ljk53 fbshipit-source-id: a3effd24e8ce9d677a2a04ebe6b6e1582e6f0a65 * Improve error message in IR parser when accessing undefined variable. Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26771 Test Plan: Imported from OSS Differential Revision: D17562853 Pulled By: ZolotukhinM fbshipit-source-id: b4d4bc6001e3ea06f4d1b8691ad2a339a04c16ea * Handle DeQuantStub() for QAT (#26518) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26518 Skip Dequantize() modules for QAT alone. For fake quant insertion, DeQuantize() is a no-op and we should not be inserting fake-quant. 
ghstack-source-id: 90704220 Test Plan: buck test caffe2/test:quantization -- --print-passing-details Tests in test_quantization pass with changes: Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/281475121296989 Summary (total time 73.03s): PASS: 28 FAIL: 0 SKIP: 0 FATAL: 0 TIMEOUT: 0 OMIT: 0 Differential Revision: D17439333 fbshipit-source-id: f716c23500324ae08c8d104ee2c9587fa6926571 * Add <cinttypes> include to resolve PRIu32 macro (#26745) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26745 This file doesn't appear to be included by default on GCC 7.3 and causes compilation to fail. Adding this include fixes compilation. Test Plan: Imported from OSS Differential Revision: D17566444 Pulled By: pietern fbshipit-source-id: 9afb3d4596e424efc5a6ea6ab3b1cffdb2b41fbb * Fake quantization enhancements for QAT/PTQ support (#26420) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26420 Flags for enabling/disabling observer and fake quant independently. Improve repr for fake quant. ghstack-source-id: 90704254 Test Plan: buck test caffe2/test:fake_quant -- --print-passing-details buck test caffe2/test:quantization -- --print-passing-details Differential Revision: D17458232 fbshipit-source-id: f44380c60f1a10a8ea09bca8ab79ba5d1867ed62 * Revert D17458232: Fake quantization enhancements for QAT/PTQ support Test Plan: revert-hammer Differential Revision: D17458232 Original commit changeset: f44380c60f1a fbshipit-source-id: 64a244c720b61fa912bacbb23fcbf9faed0757c2 * Named tensor support for: atan2, output_nr, detach{_}, requires_grad_ (#26543) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26543 Also adds a test for logical_xor (it already had named tensor support but there was no test) Test Plan: - [namedtensor ci] Differential Revision: D17501403 Pulled By: zou3519 fbshipit-source-id: 49be15580be9fb520e25a8020164e5a599d22d40 * Update ONNX Export for Interpolate in Opset 11 (#26778) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26778 - Add support for linear and cubic interpolate in opset 11. - Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8. - Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample). Original PR resolved: https://github.com/pytorch/pytorch/pull/24805 Reviewed By: hl475 Differential Revision: D17564911 Pulled By: houseroad fbshipit-source-id: 591e1f5b361854ace322eca1590f8f84d29c1a5d * Support Negative Axis in Size in ONNX (#26436) Summary: Currently, we export invalid ONNX models when size() is used with a negative dim. This PR fixes the issue and allows exporting these models to ONNX (ex: input.size(-1)). Pull Request resolved: https://github.com/pytorch/pytorch/pull/26436 Reviewed By: hl475 Differential Revision: D17565905 Pulled By: houseroad fbshipit-source-id: 036bc384b25de77506ef9fbe24ceec0f7e3cff8b * Expose a torch.result_type and simplify tensor iterator Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26012 Test Plan: Imported from OSS Differential Revision: D17556197 Pulled By: nairbv fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c * Named tensor support for logsumexp, mode, kthvalue, median, min, max (#26563) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26563 This adds name inference rules for pre-existing logsumexp, mode, kthvalue, and median ops. Also adds overloads so that they can take `Dimname` dimensions. 
There are a lot of min/max overloads. This PR adds name inference to the following overloads for (both) min and max: - min(Tensor, int dim) - min(Tensor, Dimname dim) - min(Tensor) (full reduction) Test Plan: - new tests and [namedtensor ci] Differential Revision: D17557050 Pulled By: zou3519 fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171 * Delete backwards compatibility Backend overload for registerOp (#25914) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17284083 Pulled By: ezyang fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec * Implement multiple dispatch in boxed c10 dispatcher (#26118) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26118 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17404367 Pulled By: ezyang fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8 * Remove unnecessary include from TensorBody (#26360) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26360 This is not just for aesthetics: this include blocks the inclusion of headers like ivalue.h from ATenDispatch.h (as it causes an include cycle.) Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17429163 Pulled By: ezyang fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c * Add some missing constructors to IValue. (#26718) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26718 Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17549623 Pulled By: ezyang fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb * Updating submodules Summary: GitHub commits: https://github.com/facebook/litho/commit/6668c21398a9b71f12cff9574bb8c7d8ebf93463 https://github.com/pytorch/fbgemm/commit/189aebb34442a6e96bf88734a047eaae7b258195 Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: f2037290b58ac295eeb94626e172491a8526875d * Revert D17549623: Add some missing constructors to IValue. Test Plan: revert-hammer Differential Revision: D17549623 Original commit changeset: 8880c09d85a1 fbshipit-source-id: 002bb1173dbcf6a1d18e1c4b84b4365f145c38dd * Hub improvements (#26723) Summary: Resubmit of https://github.com/pytorch/pytorch/pull/25980. Our old serialization was in tar (like `resnet18-5c106cde.pth` was in this format) so let's only support automatically unzip if checkpoints are zipfiles. We can still manage to get it work with tarfile, but let's delay it when there's an ask. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26723 Differential Revision: D17551795 Pulled By: ailzhang fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e * Upgrade sleef to v3.4.0. (#26749) Summary: This reset the sleef submodule to upstream, since everything else except a small build sanity fix <https://github.com/zdevito/sleef/commit/191f655caa25526ae226cf88dd2529265176014a> has been merged to upstream. The new release includes an important fix for trigonometric functions on MacOS, which would unblock https://github.com/pytorch/pytorch/issues/26431. This should supersede https://github.com/pytorch/pytorch/issues/20536. Close https://github.com/pytorch/pytorch/issues/20536. 
cc colesbury resistor Pull Request resolved: https://github.com/pytorch/pytorch/pull/26749 Differential Revision: D17572783 Pulled By: ezyang fbshipit-source-id: dd7827e8c8500a0050e3e318d184134c792d3ecc * Updating submodules Summary: GitHub commits: https://github.com/facebook/litho/commit/5096b0ae1f5ef28bc0b948e260eb512626c6fea9 https://github.com/facebook/proxygen/commit/ecd6c10ea3df82cb0d221798150a0cf1f07315c3 https://github.com/facebookincubator/mvfst/commit/67abe5d0aaf42659358fa1d96a4159e5832f9c70 https://github.com/facebookincubator/profilo/commit/90580f7e064c25bac9c0a1f59afb4da55f46d3cd https://github.com/facebookresearch/pytorch-biggraph/commit/7f98961c7b70bda098c371a8b1395f0d6ff5434c https://github.com/pytorch/fbgemm/commit/f8da6e6e36b5970e95bf150521a1b3af844638be Test Plan: n/a Reviewed By: yns88 fbshipit-source-id: 60ce61531cf6d4ac8616b3986b40b423abc7de15 * move more functions to InsertObserversHelper (#26773) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26773 att Test Plan: ci Imported from OSS Differential Revision: D17563673 fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08 * autodiff changes to enable profiling Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25397 Differential Revision: D17565747 Pulled By: Krovatkin fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed * Lets generic tests use multiple devices (#26594) Summary: - Separates device type from default (test) device - Adds multidevice decorator - Updates generic tests to use multidevice decorator where applicable TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality. Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26594 Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'. Differential Revision: D17568910 Pulled By: mruberry fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128 * quantized_tensor tests (#26784) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26784 Previously we are using empty to generate test tensors, this PR changes the test tensors to use randint so that we can test things properly Also added a set_sizes_and_strides and removed .contiguous() in int_repr function to preserve the original size and strides Test Plan: python test/test_quantized_tensor.py Imported from OSS Differential Revision: D17566575 fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990 * Refactor checked_tensor_unwrap to take DeviceType instead of Backend (#26290) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26290 Fixes #26206 Happily, I also can delete the dead Dense***Tensor cases, since they are for the defunct THS backend. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17404368 Pulled By: ezyang fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8 * Use Caffe2's implementation of grouped depthwise 3x3 convolutions (#26556) Summary: Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26556 Test Plan: _Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch. _Performance_ - All measurements below on Pixel 2 **Before**: Multi-threaded: > adb shell "./speed_benchmark_torch \ > --model=./xraymobilev3.pt \ > --input_dims="1,3,224,224" \ > --input_type=float --warmup=5 \ > --iter=25" > > Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155 Single-threaded: > adb shell "./speed_benchmark_torch \ > --model=./xraymobilev3.pt \ > --input_dims="1,3,224,224" \ > --input_type=float --warmup=5 \ > --iter=25 > --caffe2_threadpool_force_inline=true" > > Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671 **After**: Multi-threaded: > adb shell "./speed_benchmark_torch \ > --model=./xraymobilev3.pt \ > --input_dims="1,3,224,224" \ > --input_type=float --warmup=5 \ > --iter=25 > > Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042 Single-threaded: > adb shell "./speed_benchmark_torch \ > --model=./xraymobilev3.pt \ > --input_dims="1,3,224,224" \ > --input_type=float --warmup=5 \ > --iter=25 > --caffe2_threadpool_force_inline=true" > Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425 > Differential Revision: D17533311 Pulled By: AshkanAliabadi fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d * Port CUDA implementation of expm1 to ATen (#26598) Summary: Closes https://github.com/pytorch/pytorch/issues/24562 Pull Request resolved: https://github.com/pytorch/pytorch/pull/26598 Differential Revision: D17531503 Pulled By: VitalyFedyunin fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48 * add function to get NCCL version for logging (#26583) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26583 Adds a function that uses the nccl api to get the version code. Converts it to a readable version. Will be used for logging NCCL version in exception messages. Test Plan: See above Differential Revision: D17473200 fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64 * Remove one unnecessary copy of the output during the type promotion. (#26816) Summary: Output tensors doesn't need to be copied during type promotion as we are not using any data from them. Simple allocation gives steady 10% performance gain. BEFORE ``` In [1]: x = torch.randn(64, 2048, 7,7) In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64) In [3]: timeit x.add_(y) 77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` AFTER ``` In [1]: x = torch.randn(64, 2048, 7,7) In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64) In [3]: timeit x.add_(y) 68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/26816 Differential Revision: D17573455 Pulled By: VitalyFedyunin fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2 * Prepare for Cocoapods 1.3 Release (#26751) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26751 ### Summary We're going to use the AWS s3 bucket - `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below: 1. Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading. 2. Verify the binary locally - Run tests on both arm64 and simulator 3. 
Publish the cocoapods officially ### Test plan - podspec lint command succeeds - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation` Test Plan: Imported from OSS Differential Revision: D17577131 Pulled By: xta0 fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac * Validate Docker version in CI. (#26496) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26496 It is a BAD BAD idea to deploy Docker versions which are not deployed (per ossci-job-dsl) because those versions will get GC'ed after two weeks. At the moment, there is no verification that your Docker version is deployed. This adds an Azure job to check this. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Differential Revision: D17575100 Pulled By: ezyang fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a * Fix CI docker builds (#26704) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26704 nccl 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for cuda 9.1 :( ghstack-source-id: 90714191 Test Plan: build docker images on Jenkins Differential Revision: D17543120 fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25 * Export baddbmm (#25738) Summary: Added ONNX export for baddbmm in opset9 Pull Request resolved: https://github.com/pytorch/pytorch/pull/25738 Reviewed By: hl475 Differential Revision: D17565828 Pulled By: houseroad fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5 * Fix Future default constructor missing for ParallelNative Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26739 Test Plan: Imported from OSS Differential Revision: D17577908 Pulled By: bwasti fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8 * Quantized Interpolate Kernel(upsample_bilinear2d) (#26631) Summary: We implement the quantized upsample_bilinear2d case for interpolate kernel in this PR. 
For nhwc performance improvement: import torch, time for dtype in [torch.qint8, torch.quint8, torch.qint32]: print('****', str(dtype), '*****') x = torch.rand(1, 56, 56, 256) q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype) q_x = q_x.permute([0, 3, 1, 2]) x = x.permute([0, 3, 1, 2]) NITER = 100 s = time.time() for i in range(NITER): float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True) time_per_iter_float = (time.time() - s) / NITER s = time.time() for i in range(NITER): quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True) time_per_iter_quant = (time.time() - s) / NITER ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype) # torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize()) print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t') print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t') bytes_float = (x.numel() + float_out.numel()) * x.element_size() bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size() float_bw_gbps = bytes_float / time_per_iter_float / 1e9 quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9 print('GB/s float', 'GB/s quant', sep='\t') print(float_bw_gbps, quant_bw_gbps, sep='\t') ===========without nhwc handling=========== **** torch.qint8 ***** time/iter ms (float) time/iter ms (quant) quant/float 1.999044418334961 2.5860953330993652 1.2936657681940702 GB/s float GB/s quant 1.6192056416115257 0.3129103516188541 **** torch.quint8 ***** time/iter ms (float) time/iter ms (quant) quant/float 2.02730655670166 2.6061582565307617 1.2855274639721328 GB/s float GB/s quant 1.596632728927902 0.3105014816242217 **** torch.qint32 ***** time/iter ms (float) time/iter ms (quant) quant/float 2.0180463790893555 2.4047350883483887 1.1916153728010588 GB/s float GB/s quant 1.603959172365819 1.3460376636426636 ===========with nhwc handling=========== **** torch.qint8 ***** time/iter ms (float) time/iter ms (quant) quant/float 2.0913314819335938 0.09696483612060547 0.04636512047863123 GB/s float GB/s quant 1.5477527249803915 8.345458337015 **** torch.quint8 ***** time/iter ms (float) time/iter ms (quant) quant/float 2.1065664291381836 0.09959936141967773 0.04728042754408879 GB/s float GB/s quant 1.5365591871338384 8.124710725706763 **** torch.qint32 ***** time/iter ms (float) time/iter ms (quant) quant/float 2.044203281402588 0.6003522872924805 0.29368521846837126 GB/s float GB/s quant 1.5834354779917448 5.391607675216635 Pull Request resolved: https://github.com/pytorch/pytorch/pull/26631 Differential Revision: D17521498 Pulled By: llyfacebook fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103 * Typevar matching fix + implicit conversions from Scalar to int/float (#26453) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26453 Previously, schema matching would incorrectly widen typevar bindings when later occurrences were supertypes of earlier ones. This allowed callsites like `floatlist.append(tensor.item())` to pass the typechecker, causing a runtime assert (issue #24856). An earlier, reverted fix (#25136) insisted on strict equality across all occurrences of a typevar, necessitating explicit casts around Scalar-typed arguments to int- or float-typed parameters, like `tensor.item()` above. 
This was per the original type system design, but turned out to break existing user code that relied on the de facto dynamic downcast. (The error required a specialized list representation.) The current fix includes the prevention of typevar widening, but adds logic to insert implicit conversions from Scalar to float or int as needed to satisfy a matched schema. Test Plan: Imported from OSS Differential Revision: D17470598 Pulled By: bhosmer fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659 * Improve C++ maxpool and avgpool (#26521) Summary: This PR makes the following improvements: 1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`) 2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically). 3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/26521 Differential Revision: D17507358 Pulled By: yf225 fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9 * Revert D17565828: [pytorch][PR] [ONNX] Export baddbmm Test Plan: revert-hammer Differential Revision: D17565828 Original commit changeset: 85f605a7b3fa fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37 * Cuda101 upgrade (#26823) Summary: test run: https://github.com/pytorch/pytorch/issues/26732 Pull Request resolved: https://github.com/pytorch/pytorch/pull/26823 Reviewed By: soumith Differential Revision: D17576095 Pulled By: mingbowan fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b * Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (#26592) Summary: function_ref is pulled over from LLVM. It is to callables what StringRef is to strings. This allows it to be substantially lighter weight, particularly in code size. That comes at the cost of not being usable in situations where the callable's lifetime is shorter than the function_ref. This means it is suitable for callback-like scenarios, but not for situations where the callable needs to be stored. In converting TensorIterator, I only encountered one situation that required refactoring to comply with function_ref's constraints. In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB. 
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26592 Differential Revision: D17516202 fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48 * Revert D17473200: [pytorch][distributed] add function to get NCCL version for logging Test Plan: revert-hammer Differential Revision: D17473200 Original commit changeset: 4881ed5221b3 fbshipit-source-id: c5635ce89de1644d2135b657427cbd0c3af83576 * Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (#26815) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26815 This PR adds named tensor support for: - any, all, `bitwise_not(_)`, cumprod, cumsum, `logical_not` In addition, it adds smoke tests for a variety of tensor attributes and fns: - is_shared, is_signed - retain_grad, register_hook Test Plan: - [namedtensor ci] Differential Revision: D17575905 Pulled By: zou3519 fbshipit-source-id: 37bfa327e68112c5bf0f6bf1f467a527f50fa1c4 * torch.load default encoding change to 'utf-8' (#26421) Summary: Default encoding when using torch.load to 'utf-8' This commit provides changes for cases where user tries to torch.load a pickled module with non-ASCII characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii' to 'utf-8'. Documentation for `torch.load` was updated and two tests (loading py2 unicode module with unicode in it; error throwing when user explicitly sets wrong encoding) were written. ~~This commit provides changes for better error handling in cases where user tries to `torch.load` a pickled module with non-ASCII characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~ Ping ezyang Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421 Differential Revision: D17581633 Pulled By: yf225 fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620 * fix to operate on cuda kernel with clang and libc++ (#25553) Summary: We find a bug about `std::tuple` with nvcc. In C++11, `std::tuple` constructor is constexpr in libstdc++, but is not constexpr in libc++. https://github.com/pytorch/pytorch/blob/c36b77fcdad3d54227cf0fd51693eb57035002c0/aten/src/ATen/native/cuda/Loops.cuh#L109-L111 The lines have occurred crashes in CUDA with a message `scan failed with synchronize`. It is a error message of cuda initialization. The purpose of this PR is fixed for loop in nvcc and libc++ by not using `std::tuple`. Pull Request resolved: https://github.com/pytorch/pytorch/pull/25553 Differential Revision: D17582118 Pulled By: yf225 fbshipit-source-id: d6f62ed46c2415b48eb49f8a051cf3c0e7cb23ce * Do not call cpuinfo_initialize() on other than x86 arch. (#26265) Summary: cpuinfo_initialize() was not implemented for s390 arch. cpuinfo calls are x86 specific to determine vector extensions AVX, AVX512 etc. Without this patch an unnecessary error log is printed in s390 arch: Error in cpuinfo: processor architecture is not supported in cpuinfo Pull Request resolved: https://github.com/pytorch/pytorch/pull/26265 Differential Revision: D17452301 Pulled By: izdeby fbshipit-source-id: 9ca485550385c26dec18aac5953c887f1ffbfb7a * support iterables, rangevalue in list comprehensions (#26768) Summary: Support IterableValue expressions and rangevalue in list comprehensions. Just as with supporting list comprehensions where the expression changes the input list types, we need to correctly type the list we create and it works. 
Fixes https://github.com/pytorch/pytorch/issues/26693 Fixes https://github.com/pytorch/pytorch/issues/22483 Pull Request resolved: https://github.com/pytorch/pytorch/pull/26768 Differential Revision: D17562762 Pulled By: eellison fbshipit-source-id: 7ce8bf8605758dfd99057bc0376b4b724c4f9251 * Fix CUDA named tensor `copy_` (#26829) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26829 The TensorIterator loop for `copy_` uses operations that are currently unsupported by named tensors. The solution is to wrap `copy_` in a function that does the name propagation and ignore names when running the implementation of `copy_`. There is no test case because I'm not sure how to trigger the incorrect behavior, but there is definitely code in CUDA copy that doesn't support named tensors (expand_as isn't supported): https://github.com/pytorch/pytorch/blob/aaf30cdf36839bc3f21b1622fb91ff3e2983e8ea/aten/src/ATen/native/cuda/Copy.cu#L141-L148 Test Plan: - [namedtensor ci] Differential Revision: D17577310 Pulled By: zou3519 fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2 * Highlighting in the doc that square root comes before adding epsilon Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26735 Test Plan: Imported from OSS Differential Revision: D17558505 Pulled By: vincentqb fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4 * Bytecode export flow (#25187) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187 The bytecode export flow: dump the bytecode format for the light weighted interpreter. * The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested). * Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false). * Both bytecode and module object are exported in pickle format. * The module object (in data.pkl) is the same as the original JIT model. * The serializer is dependent on pickle only (no protobuf or Json). * The major functionality is forked in ScriptModuleSerializer2::serialize(). * The test loader is test_bc_export.cpp. * Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants). * Since there's no dependency on graph/node, GetAttr is promoted from an operator to first-class instruction (https://github.com/pytorch/pytorch/pull/25151) . * Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148). The output layout looks like: * folders of methods. * In each method folder (for example, forward/): * bytecode.pkl: instructions and operators * constants{.pkl,/}: constant list in constants.pkl. If there are tensors in constants, the binary tensor files in constants/ folder. * data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript. Test Plan: Imported from OSS Differential Revision: D17076411 fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046 * Move the CUDA implementation of log to ATen. 
* Fix CUDA named tensor `copy_` (#26829)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26829
  The TensorIterator loop for `copy_` uses operations that are currently unsupported by named tensors. The solution is to wrap `copy_` in a function that does the name propagation and to ignore names when running the implementation of `copy_`. There is no test case because it is unclear how to trigger the incorrect behavior, but there is definitely code in the CUDA copy path that doesn't support named tensors (expand_as isn't supported):
  https://github.com/pytorch/pytorch/blob/aaf30cdf36839bc3f21b1622fb91ff3e2983e8ea/aten/src/ATen/native/cuda/Copy.cu#L141-L148
  Test Plan: [namedtensor ci]
  Differential Revision: D17577310
  Pulled By: zou3519
  fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2

* Highlighting in the doc that square root comes before adding epsilon

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26735
  Test Plan: Imported from OSS
  Differential Revision: D17558505
  Pulled By: vincentqb
  fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4

* Bytecode export flow (#25187)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187
  The bytecode export flow: dump the bytecode format for the lightweight interpreter.
  * The bytecode is generated without input-spec optimization. It is more generic (input independent) with no obvious performance degradation (to be tested).
  * Main API: `torch::jit::script::Module::save(filename, extra_files, bool bytecode_format = false)`.
  * Both the bytecode and the module object are exported in pickle format.
  * The module object (in data.pkl) is the same as in the original JIT model.
  * The serializer depends on pickle only (no protobuf or JSON).
  * The major functionality is forked into ScriptModuleSerializer2::serialize().
  * The test loader is test_bc_export.cpp.
  * Simple APIs are added to Code and its implementation to retrieve the necessary information (instructions, operators, and constants).
  * Since there is no dependency on graph/node, GetAttr is promoted from an operator to a first-class instruction (https://github.com/pytorch/pytorch/pull/25151).
  * Some definitions (instructions, writeArchive, etc.) shared by the full JIT and the bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).

  The output layout looks like:
  * folders of methods
  * in each method folder (for example, forward/):
    * bytecode.pkl: instructions and operators
    * constants{.pkl,/}: the constant list in constants.pkl; if there are tensors among the constants, their binary files go in the constants/ folder
    * data{.pkl,/}: the module object, with binary tensor files in the data/ folder (the same as in TorchScript)

  Test Plan: Imported from OSS
  Differential Revision: D17076411
  fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046

* Move the CUDA implementation of log to ATen. (#26494)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26494
  Close #24586
  Test Plan: Imported from OSS
  Differential Revision: D17572497
  Pulled By: VitalyFedyunin
  fbshipit-source-id: e1bcd33021464eaa4affd4c6d3283c8403069945

* enable double backward for non-cudnn LSTM and GRU (#26660)

  Summary: An attempt to enable double backward for non-cudnn LSTM and GRU (see https://github.com/pytorch/pytorch/issues/25315, https://github.com/pytorch/pytorch/issues/20449). RNN already works because it does not rely on fused kernels.
  This does not implement the double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be done automatically.
  The good: it seems to work, with no effect on performance in the usual case without double backward, because the fused LSTM backward is still used there.
  The bad: performance of backward and, especially, double backward is pretty poor. Scripting would still be the preferred way to get a performant solution. Performance and/or memory use can be slightly improved if in-place variants can be used for sigmoid_backward and tanh_backward to avoid the cat at the end, but I'm not yet sure that is possible, and in any case it is only a slight improvement.
  The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the sigmoid and tanh operations already applied, so that is probably another perf and memory hit.
  cc soumith, albanD. If you think this approach is viable, I can extend it to GRU and RNN. Thanks to mcarilli, whose approach to double backward in weight norm I copied.
  Pull Request resolved: https://github.com/pytorch/pytorch/pull/26660
  Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled.
  Differential Revision: D17581489
  Pulled By: ngimel
  fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa
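  A minimal sketch of the double backward this enables on the non-cudnn (here: CPU) path; shapes and sizes are illustrative:

  ```python
  import torch

  lstm = torch.nn.LSTM(input_size=3, hidden_size=4).double()
  x = torch.randn(5, 2, 3, dtype=torch.double, requires_grad=True)

  out, _ = lstm(x)
  # First backward, keeping the graph so it can be differentiated again.
  (gx,) = torch.autograd.grad(out.sum(), x, create_graph=True)
  # Second backward: previously unsupported for the fused non-cudnn kernels.
  (ggx,) = torch.autograd.grad(gx.sum(), x)
  print(ggx.shape)  # torch.Size([5, 2, 3])
  ```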
* Migrate multinomial from the TH to Aten (CUDA) (#26481)

  Summary: https://github.com/pytorch/pytorch/issues/24604
  Pull Request resolved: https://github.com/pytorch/pytorch/pull/26481
  Differential Revision: D17489859
  Pulled By: ifedan
  fbshipit-source-id: 0702044c7c0f78e5e30826e8a5a83da27156bdb3
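  The user-facing behavior is unchanged by the migration; a small sketch (requires a CUDA device):

  ```python
  import torch

  weights = torch.tensor([0.1, 0.3, 0.6], device="cuda")
  # Draw 4 category indices with replacement, with probabilities
  # proportional to `weights`; this now runs through the ATen CUDA path.
  samples = torch.multinomial(weights, num_samples=4, replacement=True)
  print(samples)  # e.g. tensor([2, 1, 2, 2], device='cuda:0')
  ```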
* QEngine::QNNPACK enabled, module.eval()

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26855
  Test Plan: Imported from OSS
  Differential Revision: D17589837
  Pulled By: IvanKobzarev
  fbshipit-source-id: 0084538e9b9d760a8728cdcd5723fc7fae5838c7

* Use optimized_graph in graph_executor.

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26705
  Test Plan: Imported from OSS
  Differential Revision: D17543281
  Pulled By: ZolotukhinM
  fbshipit-source-id: 91c40559aac6f2a1f77060fa28c33725a2b8e5f9

* Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place.

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26703
  Test Plan: Imported from OSS
  Differential Revision: D17543131
  Pulled By: ZolotukhinM
  fbshipit-source-id: c4a209c55ac76d8472e64af79f76e9a61fd2a941

* Throw if someone tries to torch.save() quantized modules (#26828)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26828
  Pickle serialization for quantized modules is currently broken by https://github.com/pytorch/pytorch/issues/24045, so let's be loud and fail if the user tries to do it.
  Test Plan: Imported from OSS
  Differential Revision: D17579127
  Pulled By: jamesr66a
  fbshipit-source-id: 3deccac7e4590c6f648f22bb79c57badf3bf0487

* Fix broken failure messages for OverloadedMethodValue

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26846
  Test Plan: Imported from OSS
  Differential Revision: D17587050
  Pulled By: jamesr66a
  fbshipit-source-id: e5f3ea05b496afae15994b539f018ed0499ca62b

* Re-write of tensor-scalar quantized add

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26766
  Test Plan: Imported from OSS
  Differential Revision: D17587105
  Pulled By: jamesr66a
  fbshipit-source-id: 4da6ea98a4c5cc36fd191d9845c1ef409efce464

* Try to disable annoying hypothesis warnings again (#26853)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26853
  This is the same as https://github.com/pytorch/pytorch/pull/25188, but with a version check for hypothesis versions that are too old.
  Test Plan: Imported from OSS
  Differential Revision: D17589086
  Pulled By: jamesr66a
  fbshipit-source-id: b968965719593ff989d612384e00dfb823cf0a73

* Remove three unused declarations. (#26699)

  Summary: `frac()` in `Vec256<int{16,32,64}_t>` is not overridden.
  Pull Request resolved: https://github.com/pytorch/pytorch/pull/26699
  Differential Revision: D17549502
  Pulled By: soumith
  fbshipit-source-id: 87c65286032bfc88c447ec4eef1e3ebc73da5d27

* Fix building with PARALLEL_BACKEND=NATIVE_TBB (#26742)

  Summary: Fixes https://github.com/pytorch/pytorch/issues/26721
  Pull Request resolved: https://github.com/pytorch/pytorch/pull/26742
  Test Plan:
  ```
  export USE_OPENMP=0
  export USE_TBB=1
  export BLAS=MKL
  export MKL_THREADING=TBB
  export MKLDNN_THREADING=TBB
  export PARALLEL_BACKEND=NATIVE_TBB
  export USE_CUDA=0
  python setup.py build
  ```
  Reviewed By: dskhudia
  Differential Revision: D17586233
  Pulled By: ilia-cher
  fbshipit-source-id: 8e8befa6aa776b8c2b27bb4b79a3bff33dbcba7e

* Remove unnecessary functions and cleanup code in quantization.cpp.

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26852
  Test Plan: Imported from OSS
  Differential Revision: D17587742
  Pulled By: ZolotukhinM
  fbshipit-source-id: f345ea4d524fde9741d6629dec1ea8ab870e49a5

* Updating submodules

  Summary: GitHub commits: https://github.com/pytorch/fbgemm/commit/f767351c4b85cb29f6ea07d1a3bc27d62cca5150
  Test Plan: n/a
  Reviewed By: yns88
  fbshipit-source-id: d0bfc9e5e62669ada8d56b853490a373eb8ba2f7

* Improvements to GuardElimination and InsertBailouts

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25430
  Differential Revision: D17584722
  Pulled By: Krovatkin
  fbshipit-source-id: 9db099b904d71572c1bf3aef5419d38435cecbb5

* add mobile friendly at::parallel_for backend

  Summary: This diff implements at::parallel_for()/parallel_reduce() and the other ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.
  caffe2::ThreadPool doesn't support submitting individual tasks separately and running them in parallel - all tasks need to be submitted in one batch, which locks the thread pool until all of them finish. As a result we didn't wrap caffe2::ThreadPool in the TaskThreadPoolBase interface and reuse the at::parallel_for() implementation in ParallelNative.h. Because of this constraint, intraop_launch() / intraop_launch_future() are not supported yet. This diff doesn't touch the inter-op pool - it is still the default native c10 thread pool. That will be worked on once it is widely used.
  Test Plan: This is an early draft to receive feedback; more thorough tests to follow.
  Differential Revision: D17543412
  Pulled By: ljk53
  fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30

* remove backward functions from jit-op-registry for mobile build (#26851)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851
  Add a codegen option to remove backward ops from the jit-op-registry, as they are unlikely to be used in an inference-only mobile build. Measured ARM-v7 AAR build-size change: 5,804,182 -> 5,331,219 bytes.
  Test Plan: build and integrate with the demo app.
  Differential Revision: D17587422
  Pulled By: ljk53
  fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a

* Enable batch_size = 0 support in DNNLOWP Concat operator (#26849)

  Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26849
  We were getting division-by-zero errors when one of the input tensor dimensions is 0. Examples: P111481720 and P111481374. This diff adds unit tests for empty input tensors and fixes the division-by-zero errors in the partition function.
  Test Plan: buck test caffe2/caffe2/quantization/server:concat_dnnlowp_op_test -- --stress-runs=100
  Reviewed By: jianyuh
  Differential Revision: D17574566
  fbs…
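  DNNLOWP Concat is a quantized Caffe2 operator, but the edge case is easy to picture with the eager PyTorch analogue; a sketch (shapes illustrative, not from the diff):

  ```python
  import torch

  empty = torch.empty(0, 4)  # batch_size = 0
  full = torch.ones(2, 4)

  # Concatenation must tolerate the empty input instead of hitting a
  # division by zero when partitioning work across rows.
  out = torch.cat([empty, full], dim=0)
  print(out.shape)  # torch.Size([2, 4])
  ```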
Stack from ghstack:
Differential Revision: D17556197