
Expose a torch.result_type and simplify tensor iterator #26012

Closed
wants to merge 18 commits into from

Conversation

nairbv
Collaborator

@nairbv nairbv commented Sep 11, 2019

Stack from ghstack:

Differential Revision: D17556197

@pytorchbot pytorchbot added the module: autograd, module: docs, module: internals, module: operators, and module: pybind labels Sep 11, 2019
nairbv added a commit that referenced this pull request Sep 11, 2019
ghstack-source-id: 0e625685f41bd76f8aa8a79bef3657545c511eba
Pull Request resolved: #26012
nairbv added a commit that referenced this pull request Sep 11, 2019
ghstack-source-id: d7dd15ad18df72f72147acea0a251bba6f765c58
Pull Request resolved: #26012
@nairbv nairbv requested a review from gchanan September 11, 2019 20:58
nairbv added a commit that referenced this pull request Sep 16, 2019
ghstack-source-id: 2abffc139c3d2d384c2826ab25d2eefde5374c1b
Pull Request resolved: #26012
nairbv added a commit that referenced this pull request Sep 17, 2019
ghstack-source-id: 3a42aa839bda5b9fe48e7d09eb8b8c4979392bc2
Pull Request resolved: #26012
aten/src/ATen/core/OpsAlreadyMovedToC10.cpp (outdated)
torch/csrc/autograd/utils/wrap_outputs.h
torch/_torch_docs.py (outdated)
torch/_torch_docs.py (outdated)
aten/src/ATen/native/native_functions.yaml (outdated)
- func: result_type(Tensor tensor, Tensor other) -> ScalarType
variants: function

- func: result_type(Tensor tensor, Scalar other) -> ScalarType
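(For context, a rough sketch of how these two overloads surface in Python; illustrative only, the exact results follow PyTorch's type-promotion rules.)

```python
import torch

# Tensor/Tensor and Tensor/Scalar overloads both return a torch.dtype.
print(torch.result_type(torch.ones(2, dtype=torch.int32),
                        torch.ones(2, dtype=torch.float32)))  # torch.float32
print(torch.result_type(torch.ones(2, dtype=torch.uint8), 1.5))  # torch.float32
```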
Contributor

so...it's not actually enforced that we have overload names? @dzhulgakov / @smessmer is your current story that we just don't need to worry about it?

aten/src/ATen/core/ivalue.h (outdated)
IValue(ScalarType t) : tag(Tag::ScalarType) {
payload.as_int = static_cast<std::underlying_type<ScalarType>::type>(t);
}

Contributor

naively, I would also expect an isScalarType and that toScalarType actually checks the tag is a scalar type.

I'm admittedly not sure if we need to deal with BC constraints here, though. @suo do you know?

Collaborator Author

this was wrong, will fix

Collaborator Author

added is_intrusive_ptr and isScalarType, and added the check in toScalarType

Contributor

ok, this looks correct now, but I don't know about the BC concerns.

Collaborator Author

what is it that would potentially be backwards incompatible?

Collaborator Author

removed the isScalarType check. toLayout, toMemoryFormat, etc. don't have it, and it was causing test failures.

Contributor

For correctness, adding a new tag requires adding a corresponding type in jit_type.h, and adjusting the schema/type parsers to understand that 'ScalarType' maps to this type. This PR seems to just add the tag and not add a type. If we need to keep the tag around, then we should also add the corresponding type and fix the type parser.

Collaborator Author

@zdevito this PR does add a new type in jit_type.h, is it incorrect?


Collaborator Author

removed almost all jit changes

torch/_torch_docs.py (outdated)
@pytorchbot pytorchbot added the oncall: jit label Sep 20, 2019
Contributor

@gchanan gchanan left a comment

non-JIT parts look good -- I'd defer to @suo or others on the JIT parts.

@nairbv
Collaborator Author

nairbv commented Sep 21, 2019

@pytorchbot retest this please

@nairbv nairbv requested a review from suo September 23, 2019 13:21
@nairbv nairbv added this to the 1.3 milestone Sep 24, 2019
@nairbv
Collaborator Author

nairbv commented Sep 25, 2019

related issue: #25472

@facebook-github-bot
Contributor

@nairbv merged this pull request in 002c250.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 25, 2019
Summary: Pull Request resolved: pytorch/pytorch#26012

Test Plan: Imported from OSS

Differential Revision: D17556197

Pulled By: nairbv

fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c
gchanan pushed a commit to gchanan/pytorch that referenced this pull request Sep 26, 2019
ghstack-source-id: e24e33e2259a4b643567ef32ccbab73e511a6bc3
Pull Request resolved: pytorch#26012
rohithkrn added a commit to ROCm/pytorch that referenced this pull request Oct 1, 2019
* Typo fix (#26417)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26417

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17548776

Pulled By: ezyang

fbshipit-source-id: 8c79893ee4216780edb838671e701de5518c4cd0

* Don't generate named tensor functions to RegistrationFunctions.h (#26685)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26685

This prevents XLA from picking up on named tensor APIs. I ran into some
problems while attempting to support dimname overloads in XLA; since we
don't need the first iteration of named tensors to work with XLA this is
OK.

Test Plan: - run CI.

Differential Revision: D17538893

Pulled By: zou3519

fbshipit-source-id: 93d579c93f5b1dc68541c07c4a3d61792859507d

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/litho/commit/ff4a61094e9405310b39219a35c6ff8e44300573
https://github.com/facebookincubator/mvfst/commit/ad81c3823ec7910296f97d2050fde181be1d4ac4
https://github.com/pytorch/fbgemm/commit/518d8a1832cf1eb1dda2feace1a278e9e4f302ba

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 2a9a47805569a43e05d044c5494b57f6a7996bc4

* Add tests for C++ functional cosine_similarity and pairwise_distance, and clean up functional test code (#26559)

Summary:
This ensures that `F::cosine_similarity` and `F::pairwise_distance` can be used simply by including `torch/torch.h` and setting `namespace F = torch::nn::functional`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26559

Differential Revision: D17507421

Pulled By: yf225

fbshipit-source-id: f895dde3634d5c8ca66ee036903e327e5cdab6b1

* allow building docker without torchvision (#26168)

Summary:
There is an issue with the torchvision version not matching the pytorch version if one builds the docker from a tag, see issue https://github.com/pytorch/pytorch/issues/25917.  The current solution requires one to re-init the submodules or manually change the version of torchvision.  This PR allows one to build the docker image without torchvision, which not only fixes the above mentioned bug but also frees non-image pytorch users from the tyranny of torchvision :laughing:.

In all seriousness, for NLP researchers especially torchvision isn't a necessity for pytorch and all non-essential items shouldn't be in the docker.  This option removes one extra thing that can go wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26168

Differential Revision: D17550001

Pulled By: soumith

fbshipit-source-id: 48b8b9e22b75eef3afb392c618742215d3920e9d

* Speed up an integer to the power of a positive integer on CPU (#26020)

Summary:
Currently, integer scalar exponents are always cast to double. This commit avoids the cast if the tensor is also
integral and the scalar is positive, to speed things up.

Benchmark (Debian Buster, g++ 8, Intel(R) Xeon(R) E-2136 CPU @ 3.30GHz, Debug build, Turbo turned off):

```python
import timeit

for n, t in [(1000, 13000),
            (10_000, 1300)]:
    for e in (2, 3, 4):
        for dtype in ('torch.int16', 'torch.int32', 'torch.int64'):
            print(f'a.pow({e}) (a.numel() == {n}) for {t} times')
            print(f'dtype {dtype}, {t} times', end='\t\t')
            print(timeit.timeit(f'a.pow({e})',
                                setup=f'import torch; a = torch.arange({n}, device="cpu", dtype={dtype})',
                                number=t))
```

Before:

```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.6958350749996498
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		0.7989626339999631
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		0.7973162800003593
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.8660746679997828
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		0.8101709959996697
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		0.8135280149999744
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		5.010833072999958
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		4.801007671999741
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		3.963344578000033
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		1.6216251330001796
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		0.5672429639998882
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.5544572270000572
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		1.656308512999658
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		1.502670819999821
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.5757876879997639
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		4.775718216999849
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		4.754745475000163
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		3.737249878000057
```

After:

```
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.1006453190002503
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		1.0849009019998448
a.pow(2) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		1.093259106000005
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.0859826279997833
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		1.1076840900000207
a.pow(3) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		1.0755480369998622
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int16, 13000 times		1.918211066999902
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int32, 13000 times		1.9183043200000611
a.pow(4) (a.numel() == 1000) for 13000 times
dtype torch.int64, 13000 times		1.930021430999659
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		0.7271483560002707
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		0.7289002070001516
a.pow(2) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.7267536800000016
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		0.7301799359997858
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		0.7289195180001116
a.pow(3) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		0.7270008230002531
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int16, 1300 times		1.5354506029998447
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int32, 1300 times		1.528263066999898
a.pow(4) (a.numel() == 10000) for 1300 times
dtype torch.int64, 1300 times		1.5369428439998956
```

 ---

Best viewed with whitespace changes turned off
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26020

Differential Revision: D17485400

Pulled By: VitalyFedyunin

fbshipit-source-id: 3a16b074825a5aab0f7e7af3d8100f9e4b7011a3

* Use noop observer to pass dtype for dynamic quantization (#26709)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26709

Polishes implementation from #25975. Primarily, we use NoopObserver to communicate that weights need to be quantized to float16. The very top-level API (quantize_dynamic) stays the same with `dtype` argument but the implementation follows the common flow.

One can argue that dynamic fp16 quantization doesn't really fit into the 'observer' mechanism. It's in fact not ideal, but it's better to have the same flow than branching on both dtype and qconfig.
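A minimal sketch of the unchanged top-level API, assuming a model with dynamically quantizable Linear modules:

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 4))
# dtype=torch.float16 is carried through the common qconfig flow via the
# NoopObserver rather than a separate code path.
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.float16)
print(qmodel)
```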

Test Plan: Imported from OSS

Differential Revision: D17544103

Pulled By: dzhulgakov

fbshipit-source-id: 6af3f18c35929a1a53ea734079c005f656e4925f

* Remove duplicate calculation of output shape (#26684)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26684

Output heights and widths are already calculated by conv_p. Remove the duplicate calculation.
ghstack-source-id: 90633432

Test Plan:
buck test mode/dev caffe2/test:quantized
```
Summary (total time 18.69s):
  PASS: 45
  FAIL: 0
  SKIP: 10
    caffe2/test:quantized - test_qadd_scalar_relu (test_quantized.TestQuantizedOps)
    caffe2/test:quantized - test_equal (test_quantized.TestQuantizedOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qconv_unpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qlinear_unpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
    caffe2/test:quantized - test_qconv_qnnpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qlinear_qnnpack (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
More details at https://our.intern.facebook.com/intern/buck/build/3b394f1e-ab99-4e59-bdf5-2766f46e9869
```

Differential Revision: D17538375

fbshipit-source-id: b4b60e93fdec4cc7bbf6aee7182381221dfac243

* Expands TestAutogradDeviceType (#26708)

Summary:
- Ports all CUDA tests to TestAutogradDeviceType except those using multiple devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26708

Differential Revision: D17549435

Pulled By: mruberry

fbshipit-source-id: b564186444201d1351934b6a7d21f67bdfca6e3b

* Add traces to specialize_autograd and lower_grad_of (2nd try)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/22752

Differential Revision: D17543836

Pulled By: Krovatkin

fbshipit-source-id: 5cbca220943a580169bf60ac09780b6e67075d2b

* Setting automatic default selection for ONNX IR v4 semantics in ONNX export API (#26146)

Summary:
This is a follow-up PR for https://github.com/pytorch/pytorch/pull/23284. In that PR we had removed the change to the default behavior for the `keep_initializers_as_input` argument to the export API. With this PR we are enabling that change: if `keep_initializers_as_input` is not specified, the value/behavior for this argument is chosen automatically depending on whether the export type is ONNX or not.

This was part of the earlier PR but was removed for further review. The test points have also been updated.

This change may fail some internal tests which may require explicitly setting `keep_initializers_as_input=True` to preserve old behavior.
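A hedged sketch of the exporter call this affects ("linear.onnx" is just an illustrative output path):

```python
import torch

model = torch.nn.Linear(3, 2)
dummy = torch.randn(1, 3)
# Leaving keep_initializers_as_input unset lets the exporter pick the value
# automatically based on the export type; passing True keeps the old behavior.
torch.onnx.export(model, dummy, "linear.onnx", keep_initializers_as_input=True)
```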
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26146

Reviewed By: hl475

Differential Revision: D17369677

Pulled By: houseroad

fbshipit-source-id: 2aec2cff50d215714ee8769505ef24d2b7865a11

* Enable hub tests on MacOS (#26697)

Summary:
fix https://github.com/pytorch/pytorch/issues/26032.
This was broken by a bad openssl release in conda. Should be fixed now. Testing...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26697

Differential Revision: D17542095

Pulled By: ailzhang

fbshipit-source-id: ba99f9b36ef2a7c793842cf91bd46fb2634ac1aa

* Trivial quantized torch.mean implementation

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26253

Test Plan: Imported from OSS

Differential Revision: D17529994

Pulled By: jamesr66a

fbshipit-source-id: e3aff71da35b05ed61710cdb88d72b51c944168b

* Remove _dequantize_per_channel in the pattern (#26680)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26680

This was introduced before under the assumption that we'd have a qconv_per_tensor_affine
and a qconv_per_channel_affine, but it turns out we don't have these, so we'll remove
these functions.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17542607

fbshipit-source-id: b90ce5738170f0922bdc2eb1c4dbecd930f68a48

* Register values listed in __constants__ as attributes of the Module. (#26581)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26581

We're currently inlining immediate values of the constants directly into
the IR when we generate it, providing no way to access these values by their
names later. This change registers such values as attributes of the
module so that they are not lost after IR generation.

Differential Revision: D17513451

Test Plan: Imported from OSS

Pulled By: ZolotukhinM

fbshipit-source-id: cf8f9b450e7178692211abd905ffd2d7ce5a6ce1

* Un-hardcode epsilon constant in FoldConvBatchNorm2d.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26584

Test Plan: Imported from OSS

Differential Revision: D17514653

Pulled By: ZolotukhinM

fbshipit-source-id: 7d9cc8f619b7dbe26fa58eac37cc131929c004d4

* Add doc building instructions

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26553

Differential Revision: D17551426

Pulled By: driazati

fbshipit-source-id: 53ce05882091aca4617586bc53944ee4c8b3a622

* Make `is_optional` check more robust (#26312)

Summary:
If the `Union` contains a non-class type, `issubclass` would fail; this adds a check for that case.
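An illustration (not the PR's code) of the underlying failure mode, assuming a subscripted generic inside the `Union`:

```python
from typing import List

# issubclass() raises TypeError when its first argument is not a plain class,
# e.g. a subscripted generic such as List[int].
try:
    issubclass(List[int], type(None))
except TypeError as e:
    print("issubclass failed:", e)
```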
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26312

Pulled By: driazati

Differential Revision: D17505206

fbshipit-source-id: 1331e412f938e2f08ecb079972147f11e3ec77cd

* Remove _dequantize_per_tensor (#26681)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26681

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17542833

fbshipit-source-id: 653e906b0e146763609c69ef0de7f9cf38621586

* fix annotation regex for flake8 (#26694)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26694

Previously we would not properly populate `errorDesc` for:
```
./torch/jit/__init__.py:13:1: F401 'torch.nn.ModuleList' imported but unused
```

because we wanted only letters and spaces. Be more permissive

Test Plan: Imported from OSS

Differential Revision: D17551999

Pulled By: suo

fbshipit-source-id: b82567df1fa3c9729e7427dc3461bedfb40933dc

* Add C++ nn::Identity (#26713)

Summary:
**Summary**:
Adds `torch::nn::Identity` module support for the C++ API.

**Issue**: https://github.com/pytorch/pytorch/issues/25883

**Reviewer**: yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26713

Differential Revision: D17550982

Pulled By: yf225

fbshipit-source-id: f24483846e82d5d276d77a1a0c50884f3bc05112

* add timeout parameter to connect function in TCPStore (#26554)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26554

Previously, in `TCPStore`'s constructor we did not pass in a timeout to
the `connect` function, which thus used the default timeout (-1, so infinite).
But the timeout variable in `TCPStore.cpp` is configurable by the user and set to
be 300 seconds by default, so we should be passing it into the connect function.

Test Plan: see above.

Differential Revision: D17486779

fbshipit-source-id: 42d38a3b8d492d9e9ff09110990a8e4a3a1292b2

* Add threadpool in qlinear and qconv for mobile (#26728)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26728

Use Caffe2::mobile_threadpool() in linear and conv operators

Perf
Without threadpool - 76ms
With threadpool - 41 ms

Test Plan:
python test/test_quantized.py TestQNNPackOps

Imported from OSS

Differential Revision: D17553510

fbshipit-source-id: dd5b06f526f65d87727ec7e3dad0a5fa74cba9f9

* Update ONNX Export for Interpolate in Opset 11 (#24805)

Summary:
- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24805

Reviewed By: hl475

Differential Revision: D17330801

Pulled By: houseroad

fbshipit-source-id: 1bdefff9e72f5e70c51f4721e1d7347478b7505b

* Refactor android torchvision: not hardcoded mean/std (#26690)

Summary:
- Normalization mean and std are specified as parameters instead of being hardcoded
 - imageYUV420CenterCropToFloat32Tensor before this change worked only with square tensors (width==height) - added generalization to support width != height with all rotations and scalings
- javadocs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26690

Differential Revision: D17556006

Pulled By: IvanKobzarev

fbshipit-source-id: 63f3321ea2e6b46ba5c34f9e92c48d116f7dc5ce

* Simplify operator `sign` using the helper.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25592

Test Plan: Imported from OSS

Differential Revision: D17552470

Pulled By: VitalyFedyunin

fbshipit-source-id: 6c8cc4f46dd390c231b2d0aac664ad2a6ac8876e

* Revert D17514653: [quant] Un-hardcode epsilon constant in FoldConvBatchNorm2d.

Test Plan: revert-hammer

Differential Revision:
D17514653

Original commit changeset: 7d9cc8f619b7

fbshipit-source-id: 2cf32082a46fe169a1db4926df78a9f3256616ad

* Revert D17513451: Register values listed in __constants__ as attributes of the Module.

Test Plan: revert-hammer

Differential Revision:
D17513451

Original commit changeset: cf8f9b450e71

fbshipit-source-id: 319ec9399173eb06556969dc6be365b319c1ab6c

* Make ONNX_ATEN_FALLBACK also works for _export (#26738)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26738

Someone may use torch._export directly. Here we change onnx_export_type's default value to None;
if it's the pytorch-onnx-caffe2 bundle, we set it to ONNX_ATEN_FALLBACK, otherwise it's ONNX.

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17546452

fbshipit-source-id: 38e53926e2b101484bbbce7b58ebcd6af8c42438

* Address review comments in https://github.com/pytorch/pytorch/pull/26272 (#26587)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26587

-
ghstack-source-id: 90557226

Test Plan: unit tests

Differential Revision: D17515048

fbshipit-source-id: 3459ee80efec29080060ec29d67642d789dd8749

* move more functions to InsertObserversHelper (#26696)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26696

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17558701

fbshipit-source-id: 96ef87db74bd1a5d4ddc69867ae71d78c0df83fd

* Added test case for reinit (#26506)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26506

[pytorch] [distributed] Made test forgiving to allow rpc agent to return one of the two errors.
ghstack-source-id: 90667534

Test Plan: Made sure pg based UT works.

Differential Revision: D17488899

fbshipit-source-id: 41f76cf4b4a0ca5e651a5403d6e67b639f0b9c4f

* Switch our Android CI to Clang (#26656)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26656

Updating the NDK to r18 or newer triggers a path in our CI scripts so that we now build with clang instead of gcc.
Google discontinued gcc support for Android quite a while ago; clang is the only way forward.
ghstack-source-id: 90698985

Test Plan: CI

Reviewed By: dreiss

Differential Revision: D17533570

fbshipit-source-id: 5eef4d5a539d8bb1a6682f000d0b5d33b3752819

* quantized_tensor tests (#25429)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25429

Previously we were using empty to generate test tensors; this PR changes the test tensors to use
randint so that we can test things properly.
Also added a set_sizes_and_strides and removed .contiguous() in the int_repr function to preserve the
original sizes and strides.
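A small sketch of the preserved behavior (illustrative shapes and scale):

```python
import torch

x = torch.rand(2, 3)
qx = torch.quantize_per_tensor(x, scale=0.1, zero_point=0, dtype=torch.quint8)
# int_repr() keeps the original sizes (and strides) instead of forcing a
# contiguous copy.
print(qx.int_repr().shape)  # torch.Size([2, 3])
```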

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D17559660

fbshipit-source-id: d4ce81d577296c1137270fdaa6b1359fb703896f

* Add a lot of dimname overloads (#26636)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26636

This PR defines a lot of dimname overloads so that when named tensor
support is added for those operators, we will not have to modify the
autogenerated TensorMethods.h, thereby avoiding potential merge
conflicts in the future.

Overloads were added for the following:
- all
- any
- argmax
- argmin
- cumsum
- cumprod
- index_copy
- kthvalue
- mode
- permute
- squeeze
- index_add
- index_fill
- scatter
- scatter_add
- index_select
- gather
- sort
- argsort
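A sketch of what the overloads listed above enable, assuming named tensor support is wired up for the op:

```python
import torch

t = torch.randn(2, 3, names=('N', 'C'))
# Pass a dimension name instead of a positional index.
print(t.cumsum('C'))
print(t.argmax('C'))
```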

Test Plan: - [namedtensor ci]

Differential Revision: D17522984

Pulled By: zou3519

fbshipit-source-id: eca6dea819ba4e4e43b71b700d5cf09176f00061

* Automatic update of fbcode/onnx to ab6b94203c595f74b1f126eb118eef22e4c05a57 (#26736)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26736

Previous import was 23bb6ea1a71f08e200114a153f48bd7adb66d486

Included changes:
- **[ab6b9420](https://github.com/onnx/onnx/commit/ab6b9420)**: Relax IF's shape inference rule (#2345) <Wei-Sheng Chin>
- **[c5af774a](https://github.com/onnx/onnx/commit/c5af774a)**: Clarify behavior in ConvTranspose (#2343) <Wei-Sheng Chin>
- **[a20ba2f1](https://github.com/onnx/onnx/commit/a20ba2f1)**: Fix node test case model for Gemm scalar bias case (#2342) <Hariharan Seshadri>
- **[1aa176e0](https://github.com/onnx/onnx/commit/1aa176e0)**: Update pybind (#2340) <Changming Sun>
- **[7840504d](https://github.com/onnx/onnx/commit/7840504d)**: Update gen_doc script to validate proto3 files (#2122) <Raymond Yang>
- **[bd35e623](https://github.com/onnx/onnx/commit/bd35e623)**: Fix some backend tests  (#2335) <Hariharan Seshadri>

Test Plan: ci

Reviewed By: hl475

Differential Revision: D17552449

fbshipit-source-id: 424acb261b54fc98485f782f6922b11b28c836eb

* Add whitelist for backward compatible checks for function schemas (#26740)

Summary:
Now, we skip all function schemas containing the quantize keyword.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26740

Reviewed By: hl475

Differential Revision: D17561753

Pulled By: houseroad

fbshipit-source-id: c5e47ada072e71bfa2341a0af8f1743e86ef733c

* Revert D17558701: [refactor] move more functions to InsertObserversHelper

Test Plan: revert-hammer

Differential Revision:
D17558701

Original commit changeset: 96ef87db74bd

fbshipit-source-id: fc398d3b8bb1cd0bae573e3fdac5cfb883b31373

* Wrap dimensions during named inference (#26558)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26558

Previously, name inference was called after dimensions were wrapped.
This PR makes it so that name inference always wraps dimensions, so that
it can be called anywhere. Ideally we would only wrap dimensions once,
but many of our operators wrap dimensions in weird places.

Wrapping dimensions in name inference is pretty inexpensive and only
happens for named tensors (name inference does not run on unnamed
tensors.)

Test Plan: - [namedtensor ci]

Differential Revision: D17557049

Pulled By: zou3519

fbshipit-source-id: 68c5636489e233dbf2588ab6ad4e379a6fe4c8ba

* Fix builtin lookup for Python functions

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26688

Pulled By: driazati

Differential Revision: D17560634

fbshipit-source-id: e1c50d1ca24e0313c2b7d704c488a29ef6a47cad

* Revert D17330801: [pytorch][PR] Update ONNX Export for Interpolate in Opset 11

Test Plan: revert-hammer

Differential Revision:
D17330801

Original commit changeset: 1bdefff9e72f

fbshipit-source-id: dff07477403170c27260f736ab6e6010f0deca9f

* Revert D17559660: [fix] quantized_tensor tests

Test Plan: revert-hammer

Differential Revision:
D17559660

Original commit changeset: d4ce81d57729

fbshipit-source-id: b6c9dc31f08935d255fa9eb3a830bafc76a13799

* use new fbgemm PackedDepthWiseConvMatrix without template parameter (#26760)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26760

Follow-up of D17514003. Change Caffe2 code to use the new PackedDepthWiseConvMatrix interface.

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D17514350

fbshipit-source-id: 691d9f1fd35bdb7dd8ba152287f3a34359dc1f4c

* Add comments for multidim tensor factory limitations, and rename ListInitTensor for better clarity (#26756)

Summary:
This PR includes the following improvements:
1. Add comments for limitations of the multidim tensor factory function `torch::tensor(...)`, noting the fact that `torch::tensor({})` and mixed data type such as `torch::tensor({{bool, 2.0}})` are not supported at the moment. (I will also update https://pytorch.org/cppdocs/notes/tensor_creation.html to include usage examples for the multidim tensor factory function `torch::tensor(...)`)
2. Rename `ListInitTensor` to `InitListTensor`, for better naming consistency.

This addresses reviews in https://github.com/pytorch/pytorch/pull/26210. I will work on a separate PR to move the factory function to `at::`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26756

Differential Revision: D17560136

Pulled By: yf225

fbshipit-source-id: eb8b45226e999784da48f75cc8953a998582df99

* rename caffe2::mobile_threadpool to caffe2::mobile_pthreadpool

Summary:
Rename old mobile_threadpool() API, replace it with a new version that
returns caffe2::ThreadPool instead of pthreadpool_t.

Test Plan: - builds

Differential Revision: D17543413

Pulled By: ljk53

fbshipit-source-id: a3effd24e8ce9d677a2a04ebe6b6e1582e6f0a65

* Improve error message in IR parser when accessing undefined variable.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26771

Test Plan: Imported from OSS

Differential Revision: D17562853

Pulled By: ZolotukhinM

fbshipit-source-id: b4d4bc6001e3ea06f4d1b8691ad2a339a04c16ea

* Handle DeQuantStub() for QAT (#26518)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26518

Skip Dequantize() modules for QAT alone. For fake quant insertion, DeQuantize() is a no-op and we should not be inserting fake-quant.
ghstack-source-id: 90704220

Test Plan:
buck test caffe2/test:quantization -- --print-passing-details

Tests in test_quantization pass with changes:
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/281475121296989
Summary (total time 73.03s):
  PASS: 28
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0

Differential Revision: D17439333

fbshipit-source-id: f716c23500324ae08c8d104ee2c9587fa6926571

* Add <cinttypes> include to resolve PRIu32 macro (#26745)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26745

This file doesn't appear to be included by default on GCC 7.3 and
causes compilation to fail. Adding this include fixes compilation.

Test Plan: Imported from OSS

Differential Revision: D17566444

Pulled By: pietern

fbshipit-source-id: 9afb3d4596e424efc5a6ea6ab3b1cffdb2b41fbb

* Fake quantization enhancements for QAT/PTQ support (#26420)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26420

Flags for enabling/disabling observer and fake quant independently. Improve repr for fake quant.
ghstack-source-id: 90704254

Test Plan:
buck test caffe2/test:fake_quant --  --print-passing-details
buck test caffe2/test:quantization -- --print-passing-details

Differential Revision: D17458232

fbshipit-source-id: f44380c60f1a10a8ea09bca8ab79ba5d1867ed62

* Revert D17458232: Fake quantization enhancements for QAT/PTQ support

Test Plan: revert-hammer

Differential Revision:
D17458232

Original commit changeset: f44380c60f1a

fbshipit-source-id: 64a244c720b61fa912bacbb23fcbf9faed0757c2

* Named tensor support for: atan2, output_nr, detach{_}, requires_grad_ (#26543)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26543

Also adds a test for logical_xor (it already had named tensor support
but there was no test)

Test Plan: - [namedtensor ci]

Differential Revision: D17501403

Pulled By: zou3519

fbshipit-source-id: 49be15580be9fb520e25a8020164e5a599d22d40

* Update ONNX Export for Interpolate in Opset 11 (#26778)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26778

- Add support for linear and cubic interpolate in opset 11.
- Add support for 1d and 3d interpolate in nearest mode for opset 7 and 8.
- Add tests for all cases of interpolate in ORT tests (nearest/linear/cubic, 1d/2d/3d, upsample/downsample).
Original PR resolved: https://github.com/pytorch/pytorch/pull/24805

Reviewed By: hl475

Differential Revision: D17564911

Pulled By: houseroad

fbshipit-source-id: 591e1f5b361854ace322eca1590f8f84d29c1a5d

* Support Negative Axis in Size in ONNX (#26436)

Summary:
Currently, we export invalid ONNX models when size() is used with a negative dim.
This PR fixes the issue and allows exporting these models to ONNX (ex: input.size(-1)).
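A hedged sketch of the kind of model this fixes ("flatten.onnx" is just an illustrative output path):

```python
import torch

class Flatten(torch.nn.Module):
    def forward(self, x):
        # size() with a negative dim (here the last one) now exports to valid ONNX.
        return x.view(-1, x.size(-1))

torch.onnx.export(Flatten(), torch.randn(2, 3, 4), "flatten.onnx")
```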
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26436

Reviewed By: hl475

Differential Revision: D17565905

Pulled By: houseroad

fbshipit-source-id: 036bc384b25de77506ef9fbe24ceec0f7e3cff8b

* Expose a torch.result_type and simplify tensor iterator

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26012

Test Plan: Imported from OSS

Differential Revision: D17556197

Pulled By: nairbv

fbshipit-source-id: c0be3ac9e99fecc26a181e301defc1942bc6708c

* Named tensor support for logsumexp, mode, kthvalue, median, min, max (#26563)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26563

This adds name inference rules for pre-existing logsumexp, mode,
kthvalue, and median ops. Also adds overloads so that they can take
`Dimname` dimensions.

There are a lot of min/max overloads. This PR adds name inference to
the following overloads for (both) min and max:
- min(Tensor, int dim)
- min(Tensor, Dimname dim)
- min(Tensor)  (full reduction)

Test Plan: - new tests and [namedtensor ci]

Differential Revision: D17557050

Pulled By: zou3519

fbshipit-source-id: a099a0ef04ad90d021a38a0668fc44902e1c7171

* Delete backwards compatibility Backend overload for registerOp (#25914)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25914

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17284083

Pulled By: ezyang

fbshipit-source-id: 430ac7ea2bd042b1f4bb874e53679d0fde326dec

* Implement multiple dispatch in boxed c10 dispatcher (#26118)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26118

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17404367

Pulled By: ezyang

fbshipit-source-id: 14a16baa4b59f97182725092531a54603f3d92b8

* Remove unnecessary include from TensorBody (#26360)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26360

This is not just for aesthetics: this include blocks the inclusion
of headers like ivalue.h from ATenDispatch.h (as it causes an
include cycle.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17429163

Pulled By: ezyang

fbshipit-source-id: 03feb210c12bc891d95bbb5a11ffd694ec05005c

* Add some missing constructors to IValue. (#26718)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26718

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17549623

Pulled By: ezyang

fbshipit-source-id: 8880c09d85a15b2a63dcf0c242ba6a2dd941decb

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/litho/commit/6668c21398a9b71f12cff9574bb8c7d8ebf93463
https://github.com/pytorch/fbgemm/commit/189aebb34442a6e96bf88734a047eaae7b258195

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: f2037290b58ac295eeb94626e172491a8526875d

* Revert D17549623: Add some missing constructors to IValue.

Test Plan: revert-hammer

Differential Revision:
D17549623

Original commit changeset: 8880c09d85a1

fbshipit-source-id: 002bb1173dbcf6a1d18e1c4b84b4365f145c38dd

* Hub improvements (#26723)

Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/25980.
Our old serialization was in tar (e.g. `resnet18-5c106cde.pth` was in this format), so let's only support automatic unzipping if checkpoints are zipfiles.
We can still get it to work with tarfile, but let's delay that until there's an ask.
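A sketch of the user-facing path, assuming the checkpoint is fetched through the hub download helper:

```python
import torch

# Zip-format checkpoints are unzipped automatically; legacy tar checkpoints
# such as resnet18-5c106cde.pth are simply loaded as before.
state = torch.hub.load_state_dict_from_url(
    "https://download.pytorch.org/models/resnet18-5c106cde.pth")
print(type(state))
```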
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26723

Differential Revision: D17551795

Pulled By: ailzhang

fbshipit-source-id: 00b4e7621f1e753ca9aa07b1fe356278c6693a1e

* Upgrade sleef to v3.4.0. (#26749)

Summary:
This resets the sleef submodule to upstream, since everything else except
a small build sanity fix
<https://github.com/zdevito/sleef/commit/191f655caa25526ae226cf88dd2529265176014a>
has been merged to upstream. The new release includes an important fix
for trigonometric functions on MacOS, which would unblock https://github.com/pytorch/pytorch/issues/26431.

This should supersede https://github.com/pytorch/pytorch/issues/20536.

Close https://github.com/pytorch/pytorch/issues/20536.

cc colesbury resistor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26749

Differential Revision: D17572783

Pulled By: ezyang

fbshipit-source-id: dd7827e8c8500a0050e3e318d184134c792d3ecc

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/litho/commit/5096b0ae1f5ef28bc0b948e260eb512626c6fea9
https://github.com/facebook/proxygen/commit/ecd6c10ea3df82cb0d221798150a0cf1f07315c3
https://github.com/facebookincubator/mvfst/commit/67abe5d0aaf42659358fa1d96a4159e5832f9c70
https://github.com/facebookincubator/profilo/commit/90580f7e064c25bac9c0a1f59afb4da55f46d3cd
https://github.com/facebookresearch/pytorch-biggraph/commit/7f98961c7b70bda098c371a8b1395f0d6ff5434c
https://github.com/pytorch/fbgemm/commit/f8da6e6e36b5970e95bf150521a1b3af844638be

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 60ce61531cf6d4ac8616b3986b40b423abc7de15

* move more functions to InsertObserversHelper (#26773)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26773

att

Test Plan:
ci

Imported from OSS

Differential Revision: D17563673

fbshipit-source-id: 5a6fb4238b6886695c2d25db11fec22ebe5d0c08

* autodiff changes to enable profiling

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25397

Differential Revision: D17565747

Pulled By: Krovatkin

fbshipit-source-id: b772437d9e02df99db6e662cb7d1227359959bed

* Lets generic tests use multiple devices (#26594)

Summary:
- Separates device type from default (test) device
- Adds multidevice decorator
- Updates generic tests to use multidevice decorator where applicable

TorchXLA wants to change the default test device based on the test environment. Separating the device type and the default (test) device enables that functionality.

Additionally, many existing tests only run on multiple devices and are required, as a consequence, to make CUDA-specific API calls. The multidevice decorator simplifies the existing code and limits the CUDA dependency. Eventually this should let us run multidevice tests on multiple device types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26594

Test Plan: tests were manually run with the CUDA test device set to 'cuda:1'.

Differential Revision: D17568910

Pulled By: mruberry

fbshipit-source-id: c442f748a31a970be8c21deb12a67c3b315c1128

* quantized_tensor tests (#26784)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26784

Previously we were using empty to generate test tensors; this PR changes the test tensors to use
randint so that we can test things properly.
Also added a set_sizes_and_strides and removed .contiguous() in the int_repr function to preserve the
original sizes and strides.

Test Plan:
python test/test_quantized_tensor.py

Imported from OSS

Differential Revision: D17566575

fbshipit-source-id: 89379fb09b500dd156118e6ee0709df59f169990

* Refactor checked_tensor_unwrap to take DeviceType instead of Backend (#26290)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26290

Fixes #26206

Happily, I also can delete the dead Dense***Tensor cases, since they
are for the defunct THS backend.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17404368

Pulled By: ezyang

fbshipit-source-id: 79d71ad40c4325c9f52d2825aceb65074d2e20e8

* Use Caffe2's implementation of grouped depthwise 3x3 convolutions (#26556)

Summary:
Use Caffe2's implementation of grouped depthwise 3x3 convolutions instead of NNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26556

Test Plan:
_Correctness_ - Manually check the results using the --print-output flag on speed_benchmark_torch.

_Performance_ - All measurements below on Pixel 2

**Before**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25"
>
> Main run finished. Milliseconds per iter: **876.002**. Iters per second: 1.14155

Single-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25
>  --caffe2_threadpool_force_inline=true"
>
> Main run finished. Milliseconds per iter: **459.409**. Iters per second: 2.17671

**After**:

Multi-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25
>
> Main run finished. Milliseconds per iter: **285.68**. Iters per second: 3.50042

Single-threaded:

> adb shell "./speed_benchmark_torch \
>  --model=./xraymobilev3.pt \
>  --input_dims="1,3,224,224" \
>  --input_type=float --warmup=5 \
>  --iter=25
>  --caffe2_threadpool_force_inline=true"
> Main run finished. Milliseconds per iter: **278.999**. Iters per second: 3.58425
>

Differential Revision: D17533311

Pulled By: AshkanAliabadi

fbshipit-source-id: 9ee8acf02b8e3e8da1922b188ed0a6459a90b67d

* Port CUDA implementation of expm1 to ATen (#26598)

Summary:
Closes https://github.com/pytorch/pytorch/issues/24562
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26598

Differential Revision: D17531503

Pulled By: VitalyFedyunin

fbshipit-source-id: 8119c796e142f073ad4e274dda1ad99344215c48

* add function to get NCCL version for logging (#26583)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26583

Adds a function that uses the nccl api to get the version code. Converts it to a readable version. Will be
used for logging NCCL version in exception messages.

Test Plan: See above

Differential Revision: D17473200

fbshipit-source-id: 4881ed5221b397f2f967262668c2b376b6bf3c64

* Remove one unnecessary copy of the output during the type promotion. (#26816)

Summary:
Output tensors don't need to be copied during type promotion, as we are not using any data from them. A simple allocation gives a steady 10% performance gain.

BEFORE

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
77.3 ms ± 257 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

AFTER

```
In [1]: x = torch.randn(64, 2048, 7,7)
In [2]: y = torch.randn(64, 2048, 7,7, dtype=torch.float64)
In [3]: timeit x.add_(y)
68.2 ms ± 713 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26816

Differential Revision: D17573455

Pulled By: VitalyFedyunin

fbshipit-source-id: 47286abce5e7e665eb61e46ae358c896e945bef2

* Prepare for Cocoapods 1.3 Release (#26751)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26751

### Summary

We're going to use the AWS s3 bucket - `s3://ossci-ios` to store the release binary. To release the cocoapods, we can follow the steps below:

1.  Open a fake PR to trigger the CI job that pulls the code from the 1.3.0 tag branch and does the building and uploading.
2. Verify the binary locally  - Run tests on both arm64 and simulator
3. Publish the cocoapods officially

### Test plan

- podspec lint command succeeds
    - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS

Differential Revision: D17577131

Pulled By: xta0

fbshipit-source-id: 55fee918ecc5c4e0b6d714488a12351b4370afac

* Validate Docker version in CI. (#26496)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26496

It is a BAD BAD idea to use Docker versions which are not deployed
(per ossci-job-dsl), because those versions will get GC'ed after two
weeks.  At the moment, there is no verification that your Docker version
is deployed.  This adds an Azure job to check this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Differential Revision: D17575100

Pulled By: ezyang

fbshipit-source-id: 8df2331c6e6899c585bc2917b55e8955908b0e4a

* Fix CI docker builds (#26704)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26704

nccl 2.1.15 isn't available for CUDA 10.1 and 2.4.8 isn't available for cuda 9.1 :(

ghstack-source-id: 90714191

Test Plan: build docker images on Jenkins

Differential Revision: D17543120

fbshipit-source-id: 882c5a005a9a3ef78f9209dea9dcec1782060b25

* Export baddbmm (#25738)

Summary:
Added ONNX export for baddbmm in opset9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25738

Reviewed By: hl475

Differential Revision: D17565828

Pulled By: houseroad

fbshipit-source-id: 85f605a7b3fa4783ef4f6ced86223133c85062d5

* Fix Future default constructor missing for ParallelNative

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26739

Test Plan: Imported from OSS

Differential Revision: D17577908

Pulled By: bwasti

fbshipit-source-id: a09cdbd8619a926e93418a692ce859d4157f2da8

* Quantized Interpolate Kernel(upsample_bilinear2d) (#26631)

Summary:
We implement the quantized upsample_bilinear2d case for the interpolate kernel in this PR.

For the NHWC performance improvement, the benchmark used was:

```python
import torch, time

for dtype in [torch.qint8, torch.quint8, torch.qint32]:
    print('****', str(dtype), '*****')
    x = torch.rand(1, 56, 56, 256)

    q_x = torch.quantize_per_tensor(x, 0.5, 1, dtype)
    q_x = q_x.permute([0, 3, 1, 2])

    x = x.permute([0, 3, 1, 2])

    NITER = 100

    s = time.time()
    for i in range(NITER):
        float_out = torch.nn.functional.interpolate(x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_float = (time.time() - s) / NITER

    s = time.time()
    for i in range(NITER):
        quant_out = torch.nn.quantized.functional.interpolate(q_x, size=5, scale_factor=None, mode="bilinear", align_corners=True)
    time_per_iter_quant = (time.time() - s) / NITER

    ref_quantized = torch.quantize_per_tensor(float_out, 0.5, 1, dtype)
    #  torch.testing.assert_allclose(ref_quantized.dequantize(), quant_out.dequantize())

    print('time/iter ms (float)', 'time/iter ms (quant)', 'quant/float', sep='\t')
    print(time_per_iter_float * 1000, time_per_iter_quant * 1000, time_per_iter_quant / time_per_iter_float, sep='\t')

    bytes_float = (x.numel() + float_out.numel()) * x.element_size()
    bytes_quant = (q_x.numel() + quant_out.numel()) * q_x.element_size()

    float_bw_gbps = bytes_float / time_per_iter_float / 1e9
    quant_bw_gbps = bytes_quant / time_per_iter_quant / 1e9

    print('GB/s float', 'GB/s quant', sep='\t')
    print(float_bw_gbps, quant_bw_gbps, sep='\t')
```

===========without nhwc handling===========
**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
1.999044418334961       2.5860953330993652      1.2936657681940702
GB/s float      GB/s quant
1.6192056416115257      0.3129103516188541
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.02730655670166        2.6061582565307617      1.2855274639721328
GB/s float      GB/s quant
1.596632728927902       0.3105014816242217
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0180463790893555      2.4047350883483887      1.1916153728010588
GB/s float      GB/s quant
1.603959172365819       1.3460376636426636

===========with nhwc handling===========

**** torch.qint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.0913314819335938      0.09696483612060547     0.04636512047863123
GB/s float      GB/s quant
1.5477527249803915      8.345458337015
**** torch.quint8 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.1065664291381836      0.09959936141967773     0.04728042754408879
GB/s float      GB/s quant
1.5365591871338384      8.124710725706763
**** torch.qint32 *****
time/iter ms (float)    time/iter ms (quant)    quant/float
2.044203281402588       0.6003522872924805      0.29368521846837126
GB/s float      GB/s quant
1.5834354779917448      5.391607675216635
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26631

Differential Revision: D17521498

Pulled By: llyfacebook

fbshipit-source-id: 385ae0f77777cd8bee385cafb80e492127b7d103

* Typevar matching fix + implicit conversions from Scalar to int/float (#26453)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26453

Previously, schema matching would incorrectly widen typevar bindings
when later occurrences were supertypes of earlier ones. This allowed
callsites like `floatlist.append(tensor.item())` to pass the typechecker,
causing a runtime assert (issue #24856).

An earlier, reverted fix (#25136) insisted on strict equality across all
occurrences of a typevar, necessitating explicit casts around Scalar-typed
arguments to int- or float-typed parameters, like `tensor.item()` above.
This was per the original type system design, but turned out to break
existing user code that relied on the de facto dynamic downcast. (The
error required a specialized list representation.)

The current fix includes the prevention of typevar widening, but
adds logic to insert implicit conversions from Scalar to float or int
as needed to satisfy a matched schema.
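A sketch of the callsite from issue #24856 as a scripted function (names are illustrative):

```python
import torch
from typing import List

@torch.jit.script
def collect(t):
    xs = torch.jit.annotate(List[float], [])
    # t.item() produces a Scalar; the matcher now inserts an implicit
    # conversion to float, so this append type-checks instead of asserting.
    xs.append(t.item())
    return xs

print(collect(torch.tensor(3.5)))
```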

Test Plan: Imported from OSS

Differential Revision: D17470598

Pulled By: bhosmer

fbshipit-source-id: d260dbf3cd78b9c2f2229bc61afc84e1910b5659

* Improve C++ maxpool and avgpool (#26521)

Summary:
This PR makes the following improvements:
1. Add `forward_with_indices` method to all C++ MaxPool modules, to return the max indices along with the outputs. (We can't make two `forward` methods that return different types based on input, because that will break the type deduction of `torch::detail::return_type_of_forward_t`)
2. Add `max_poolNd_with_indices` to `torch::nn::functional`, to be used when indices of the max values are needed. (We can't merge this with `torch::nn::functional::max_poolNd` because the return type of `max_poolNd` has to be defined statically).
3. Improve `pretty_print` of C++ MaxPoolNd and AvgPoolNd modules to match the Python `extra_repr`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26521

Differential Revision: D17507358

Pulled By: yf225

fbshipit-source-id: b6c0e2b27b38378cdc0c75f4bfc797b3c6b17cd9

* Revert D17565828: [pytorch][PR] [ONNX] Export baddbmm

Test Plan: revert-hammer

Differential Revision:
D17565828

Original commit changeset: 85f605a7b3fa

fbshipit-source-id: 7705325087d83362f71a717be880a13e9f575b37

* Cuda101 upgrade (#26823)

Summary:
test run: https://github.com/pytorch/pytorch/issues/26732
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26823

Reviewed By: soumith

Differential Revision: D17576095

Pulled By: mingbowan

fbshipit-source-id: 269cf443aea18b47bbee63996d035bc5bcd2726b

* Convert TensorIterator to use function_ref, a lightweight alternative to std::function. (#26592)

Summary:
function_ref is pulled over from LLVM.  It is to callables what StringRef is to strings.
This allows it to be substantially lighter weight, particularly in code size.  That comes
at the cost of not being usable in situations where the callable's lifetime is shorter
than the function_ref.  This means it is suitable for callback-like scenarios, but not
for situations where the callable needs to be stored.  In converting TensorIterator,
I only encountered one situation that required refactoring to comply with function_ref's
constraints.

In my local Release build, this reduces the size of libtorch by 4MB, from 70MB->66MB.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26592

Differential Revision: D17516202

fbshipit-source-id: 267476891f767f4827a4d38149f70e5035c56c48

* Revert D17473200: [pytorch][distributed] add function to get NCCL version for logging

Test Plan: revert-hammer

Differential Revision:
D17473200

Original commit changeset: 4881ed5221b3

fbshipit-source-id: c5635ce89de1644d2135b657427cbd0c3af83576

* Named tensor support for: all, any, bitwise_not, cumprod, cumsum, and more (#26815)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26815

This PR adds named tensor support for:
- any, all, `bitwise_not(_)`, cumprod, cumsum, `logical_not`

In addition, it adds smoke tests for a variety of tensor attributes and
fns:
- is_shared, is_signed
- retain_grad, register_hook

Test Plan: - [namedtensor ci]

Differential Revision: D17575905

Pulled By: zou3519

fbshipit-source-id: 37bfa327e68112c5bf0f6bf1f467a527f50fa1c4

* torch.load default encoding change to 'utf-8' (#26421)

Summary:
Default encoding when using torch.load to 'utf-8'

This commit provides changes for cases where user tries to torch.load
a pickled module with non-ASCII characters in the docstring as
discussed in https://github.com/pytorch/pytorch/issues/21743. The default encoding was changed from 'ascii'
to 'utf-8'. Documentation for `torch.load` was updated and two tests
(loading py2 unicode module with unicode in it; error throwing when
user explicitly sets wrong encoding) were written.
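A sketch of the loading calls this affects ("legacy_py2.pt" is just an illustrative path):

```python
import torch

# 'utf-8' is now the default; a different encoding can still be passed
# explicitly for old Python 2 pickles.
obj = torch.load("legacy_py2.pt", encoding="latin1")
```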

~~This commit provides changes for better error handling in cases
where user tries to `torch.load` a pickled module with non-ASCII
characters in the docstring as discussed in https://github.com/pytorch/pytorch/issues/21743.~~

Ping ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26421

Differential Revision: D17581633

Pulled By: yf225

fbshipit-source-id: f8e77dcf7907092771149aad8ede6cfb73c21620

* fix to operate on cuda kernel with clang and libc++ (#25553)

Summary:
We found a bug involving `std::tuple` under nvcc.

In C++11, the `std::tuple` constructor is constexpr in libstdc++, but not in libc++.

https://github.com/pytorch/pytorch/blob/c36b77fcdad3d54227cf0fd51693eb57035002c0/aten/src/ATen/native/cuda/Loops.cuh#L109-L111

These lines caused crashes in CUDA with the message `scan failed with synchronize`, which is a CUDA initialization error.

This PR fixes the for loop for nvcc with libc++ by not using `std::tuple`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25553

Differential Revision: D17582118

Pulled By: yf225

fbshipit-source-id: d6f62ed46c2415b48eb49f8a051cf3c0e7cb23ce

* Do not call cpuinfo_initialize() on other than x86 arch. (#26265)

Summary:
cpuinfo_initialize() is not implemented for the s390 arch.
The cpuinfo calls are x86-specific and are used to detect vector extensions such as AVX and AVX512.
Without this patch an unnecessary error log is printed on s390:
Error in cpuinfo: processor architecture is not supported in cpuinfo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26265

Differential Revision: D17452301

Pulled By: izdeby

fbshipit-source-id: 9ca485550385c26dec18aac5953c887f1ffbfb7a

* support iterables, rangevalue in list comprehensions (#26768)

Summary:
Support IterableValue expressions and RangeValue in list comprehensions. As with list comprehensions whose expression changes the element type of the input list, we need to correctly type the list we create; with that in place, this works.
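
A hedged sketch (written for this note, not lifted from the PR's tests) of a comprehension over a range value inside TorchScript:

```python
import torch

@torch.jit.script
def squares(n: int):
    # Iterating a range value in a comprehension; the result is typed as List[int].
    return [i * i for i in range(n)]

print(squares(5))  # [0, 1, 4, 9, 16]
```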

Fixes https://github.com/pytorch/pytorch/issues/26693
Fixes https://github.com/pytorch/pytorch/issues/22483
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26768

Differential Revision: D17562762

Pulled By: eellison

fbshipit-source-id: 7ce8bf8605758dfd99057bc0376b4b724c4f9251

* Fix CUDA named tensor `copy_` (#26829)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26829

The TensorIterator loop for `copy_` uses operations that are currently
unsupported by named tensors. The solution is to wrap `copy_` in a
function that does the name propagation and ignore names when running
the implementation of `copy_`. There is no test case because I'm not
sure how to trigger the incorrect behavior, but there is definitely code
in CUDA copy that doesn't support named tensors (expand_as isn't
supported):

https://github.com/pytorch/pytorch/blob/aaf30cdf36839bc3f21b1622fb91ff3e2983e8ea/aten/src/ATen/native/cuda/Copy.cu#L141-L148

Test Plan: - [namedtensor ci]

Differential Revision: D17577310

Pulled By: zou3519

fbshipit-source-id: e11c52243800e1331fad738084304badcfd51ae2

* Highlighting in the doc that square root comes before adding epsilon

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26735
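
As an illustration of the distinction being highlighted (Adam-style notation assumed here, not quoted from the patched docs), the update divides by the square root first and adds epsilon afterwards:

```latex
\theta_{t+1} = \theta_t - \frac{\alpha \, \hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\qquad\text{not}\qquad
\theta_{t+1} = \theta_t - \frac{\alpha \, \hat{m}_t}{\sqrt{\hat{v}_t + \epsilon}}
```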

Test Plan: Imported from OSS

Differential Revision: D17558505

Pulled By: vincentqb

fbshipit-source-id: 36449c501f3ab3bc7cadd1f580258904b39369d4

* Bytecode export flow (#25187)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25187

The bytecode export flow: dump the bytecode format for the lightweight interpreter.
* The bytecode is generated without input spec optimization. It would be more generic (input independent) with no obvious performance degradation (to be tested).
* Main API: torch::jit::script::Module::save(filename, extra_files, bool *bytecode_format* = false).
* Both bytecode and module object are exported in pickle format.
    * The module object (in data.pkl) is the same as the original JIT model.
    * The serializer is dependent on pickle only (no protobuf or Json).
    * The major functionality is forked in ScriptModuleSerializer2::serialize().
    * The test loader is test_bc_export.cpp.
* Simple APIs are added in Code and its implementation to get necessary information (instructions, operators and constants).
* Since there's no dependency on graph/node, GetAttr is promoted from an operator to a first-class instruction (https://github.com/pytorch/pytorch/pull/25151).
* Some definitions (instructions, writeArchive, etc) that are shared by full JIT and bytecode are pulled out of the local namespace (https://github.com/pytorch/pytorch/pull/25148).

The output layout looks like:

* folders of methods.
    * In each method folder (for example, forward/):
        * bytecode.pkl: instructions and operators
        * constants{.pkl,/}: constant list in constants.pkl. If there are tensors in constants, the binary tensor files in constants/ folder.
* data{.pkl,/}: the module object, with binary tensor files in data/ folder. The same as in torchscript.

Test Plan: Imported from OSS

Differential Revision: D17076411

fbshipit-source-id: 46eb298e7320d1e585b0101effc0fcfd09219046

* Move the CUDA implementation of log to ATen. (#26494)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26494

Close #24586

Test Plan: Imported from OSS

Differential Revision: D17572497

Pulled By: VitalyFedyunin

fbshipit-source-id: e1bcd33021464eaa4affd4c6d3283c8403069945

* enable double backward for non-cudnn LSTM and GRU (#26660)

Summary:
An attempt to enable double backward for non-cudnn LSTM and GRU (see https://github.com/pytorch/pytorch/issues/25315, https://github.com/pytorch/pytorch/issues/20449). RNN works already because it does not rely on fused kernels.
This does not implement double backward function itself, because that is pretty hard to spell out. Instead, it implements backward using differentiable operations, so that double backward can be done automatically.
The good: it seems to work, and there is no effect on performance in the usual case without double backward, because the fused LSTM backward is still used.
The bad: performance of backward and, especially, double backward is pretty bad. Scripting would still be the preferred way if we want a performant solution. Performance and/or memory use could be improved slightly if in-place variants of sigmoid_backward and tanh_backward could be used to avoid the cat at the end, but I'm not yet sure that's possible, and in any case it would be only a slight improvement.
The ugly: I could not figure out a way to reuse the workspace that contains the sum of the gates with the sigmoid and tanh already applied, so that's probably another perf and memory hit.
cc soumith, albanD. If you think this approach is viable, I can extend to GRU and RNN.
Thanks to mcarilli whose approach to double backward in weight norm I copied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26660

Test Plan: added tests to check gradgrad for GRU and LSTM with cudnn disabled.
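
A hedged sketch of the kind of check the test plan describes (sizes and seed are arbitrary; run on CPU, which does not use the cuDNN path):

```python
import torch

torch.manual_seed(0)
lstm = torch.nn.LSTM(input_size=3, hidden_size=5).double()
x = torch.randn(7, 2, 3, dtype=torch.double, requires_grad=True)

out, _ = lstm(x)
# First backward, keeping the graph so it can be differentiated again.
(grad_x,) = torch.autograd.grad(out.sum(), x, create_graph=True)
# Second backward: this is what the change enables for the non-cuDNN path.
(grad_grad_x,) = torch.autograd.grad(grad_x.sum(), x)
print(grad_grad_x.shape)  # torch.Size([7, 2, 3])
```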

Differential Revision: D17581489

Pulled By: ngimel

fbshipit-source-id: efd204289e9a0e94d94896a0b3bff5cf6246cafa

* Migrate multinomial from the TH to Aten (CUDA) (#26481)

Summary:
https://github.com/pytorch/pytorch/issues/24604
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26481

Differential Revision: D17489859

Pulled By: ifedan

fbshipit-source-id: 0702044c7c0f78e5e30826e8a5a83da27156bdb3

* QEngine::QNNPACK enabled, module.eval()

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26855

Test Plan: Imported from OSS

Differential Revision: D17589837

Pulled By: IvanKobzarev

fbshipit-source-id: 0084538e9b9d760a8728cdcd5723fc7fae5838c7

* Use optimized_graph in graph_executor.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26705

Test Plan: Imported from OSS

Differential Revision: D17543281

Pulled By: ZolotukhinM

fbshipit-source-id: 91c40559aac6f2a1f77060fa28c33725a2b8e5f9

* Remove convert_to_ssa argument from runCleanupPasses - it is only used in one place.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26703

Test Plan: Imported from OSS

Differential Revision: D17543131

Pulled By: ZolotukhinM

fbshipit-source-id: c4a209c55ac76d8472e64af79f76e9a61fd2a941

* Throw if someone tries to torch.save() quantized modules (#26828)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26828

Pickle serialization for quantized modules is currently broken by https://github.com/pytorch/pytorch/issues/24045, so let's be loud and fail if the user tries to do it
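
A hedged sketch of the guarded behaviour (module choice and file name are made up; dynamic quantization is used here only to obtain a quantized module):

```python
import torch
import torch.nn as nn

float_model = nn.Sequential(nn.Linear(4, 4))
quantized = torch.quantization.quantize_dynamic(
    float_model, {nn.Linear}, dtype=torch.qint8
)

try:
    torch.save(quantized, "quantized_model.pt")  # hypothetical path
except Exception as e:
    # After this change, pickling fails loudly instead of writing a broken checkpoint.
    print("torch.save refused to pickle the quantized module:", e)
```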

Test Plan: Imported from OSS

Differential Revision: D17579127

Pulled By: jamesr66a

fbshipit-source-id: 3deccac7e4590c6f648f22bb79c57badf3bf0487

* Fix broken failure messages for OverloadedMethodValue

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26846

Test Plan: Imported from OSS

Differential Revision: D17587050

Pulled By: jamesr66a

fbshipit-source-id: e5f3ea05b496afae15994b539f018ed0499ca62b

* Re-write of tensor-scalar quantized add

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26766

Test Plan: Imported from OSS

Differential Revision: D17587105

Pulled By: jamesr66a

fbshipit-source-id: 4da6ea98a4c5cc36fd191d9845c1ef409efce464

* Try to disable annoying hypothesis warnings again (#26853)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26853

This is the same as https://github.com/pytorch/pytorch/pull/25188, but we add a check in case the installed hypothesis version is too old.

Test Plan: Imported from OSS

Differential Revision: D17589086

Pulled By: jamesr66a

fbshipit-source-id: b968965719593ff989d612384e00dfb823cf0a73

* Remove three unused declarations. (#26699)

Summary:
`frac()` in `Vec256<int{16,32,64}_t>` is not overridden.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26699

Differential Revision: D17549502

Pulled By: soumith

fbshipit-source-id: 87c65286032bfc88c447ec4eef1e3ebc73da5d27

* Fix building with PARALLEL_BACKEND=NATIVE_TBB (#26742)

Summary:
Fixing https://github.com/pytorch/pytorch/issues/26721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26742

Test Plan:
```
export USE_OPENMP=0
export USE_TBB=1
export BLAS=MKL
export MKL_THREADING=TBB
export MKLDNN_THREADING=TBB
export PARALLEL_BACKEND=NATIVE_TBB
export USE_CUDA=0
python setup.py build
```

Reviewed By: dskhudia

Differential Revision: D17586233

Pulled By: ilia-cher

fbshipit-source-id: 8e8befa6aa776b8c2b27bb4b79a3bff33dbcba7e

* Remove unnecessary functions and cleanup code in quantization.cpp.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26852

Test Plan: Imported from OSS

Differential Revision: D17587742

Pulled By: ZolotukhinM

fbshipit-source-id: f345ea4d524fde9741d6629dec1ea8ab870e49a5

* Updating submodules

Summary:
GitHub commits:

https://github.com/pytorch/fbgemm/commit/f767351c4b85cb29f6ea07d1a3bc27d62cca5150

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: d0bfc9e5e62669ada8d56b853490a373eb8ba2f7

* Improvements to GuardElimination and InsertBailouts

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25430

Differential Revision: D17584722

Pulled By: Krovatkin

fbshipit-source-id: 9db099b904d71572c1bf3aef5419d38435cecbb5

* add mobile friendly at:parallel_for backend

Summary:
This diff implemented at::parallel_for()/parallel_reduce() and other
ATen/Parallel.h APIs for mobile using caffe2::ThreadPool.

caffe2::ThreadPool doesn't support submitting individual tasks
separately and running them in parallel - all tasks need to be submitted in
one batch, which locks the thread pool until all of them finish - as a
result we didn't wrap caffe2::ThreadPool with the TaskThreadPoolBase interface
and reuse the at::parallel_for() implementation in ParallelNative.h. Because
of this constraint, intraop_launch() / intraop_launch_future() are not
supported yet.

This diff doesn't touch the inter-op pool - it's still the default native c10
thread pool. Will work on it when it's more widely used.

Test Plan: - This is early draft to receive feedback. Will do more thorough tests.

Differential Revision: D17543412

Pulled By: ljk53

fbshipit-source-id: 53a3259409c7207d837b9135d87d8daa6ad15e30

* remove backward functions from jit-op-registry for mobile build (#26851)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26851

Add a codegen option to remove backward ops from the jit-op-registry, as they are
unlikely to be used in an inference-only mobile build.

Measured ARM-v7 AAR build size change: 5,804,182 -> 5,331,219.

Test Plan: - build and integrate with demo app;

Differential Revision: D17587422

Pulled By: ljk53

fbshipit-source-id: 08c0fc7a710698a0d4baaf16bbb73cb812b1126a

* Enable batch_size = 0 support in DNNLOWP Concat operator (#26849)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26849

We were seeing division-by-zero errors when one of the input tensor dimensions is 0. Examples: P111481720 and P111481374
This diff adds unit tests for empty input tensors and fixes division-by-zero errors in the partition function.

Test Plan: buck test caffe2/caffe2/quantization/server:concat_dnnlowp_op_test -- --stress-runs=100

Reviewed By: jianyuh

Differential Revision: D17574566

fbs…
@facebook-github-bot facebook-github-bot deleted the gh/nairbv/1/head branch October 28, 2019 22:17