[ROCm] Use MIOpen for transpose convolutions #26172

Closed

Conversation

iotamudelta
Contributor

Provides significant performance uplift where used.

Contributor

@bddppq left a comment


wow nice

any numbers on typical perf improvements?

@bddppq
Contributor

bddppq commented Sep 13, 2019

cc @xw285cornell

@iotamudelta
Contributor Author

@bddppq we've observed a 2.5x speedup

Contributor

@facebook-github-bot left a comment


@bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Sep 15, 2019
Summary:
Provides significant performance uplift where used.
Pull Request resolved: pytorch/pytorch#26172

Differential Revision: D17374862

Pulled By: bddppq

fbshipit-source-id: 85d2df3c67b8935bc54f3a81a912a25c0102743a
@facebook-github-bot
Contributor

@bddppq merged this pull request in e86d99a.

@iotamudelta iotamudelta deleted the transpose_convolutions_miopen branch September 16, 2019 15:15
rohithkrn added a commit to ROCm/pytorch that referenced this pull request Sep 21, 2019
* C++ Average Pool Module (#25800)

Summary:
This PR adds Average Pool module to C++ front-end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25800

Differential Revision: D17318094

Pulled By: yf225

fbshipit-source-id: c914c0e802bbe5f1d1f0a21a669c28bc956899db

* Better error messages in C2 ONNX backend (#25809)

Summary:
Just a tiny fix to make debugging easier (output errors to stderr and include in the exception message)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25809

Reviewed By: zrphercule

Differential Revision: D17329957

Pulled By: houseroad

fbshipit-source-id: 0d73dd9f62c735fbc5096e6a7c0e5f58e4cd90ae

* Add new API for Fully Connected and Convolution Operators in QNNPACK (#25862)

Summary:
This change adds a new prepack and run function for FC and Convolution operators in QNNPACK.
The new functions added are `PackBMatrix`, `qnnpackLinear`, `PrePackConvWeights` and `qnnpackConv`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25862

Test Plan:
QNNPACK unit tests
fully-connected-test
convolution-test

Differential Revision: D17299260

Pulled By: supriyar

fbshipit-source-id: fdc4e2d5f1232675acd153f3efb9d17ed8628a54

* Enable more mGPU tests (#26055)

Summary:
Enable mGPU tests that pass on ROCm as of 2.7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26055

Differential Revision: D17331484

Pulled By: bddppq

fbshipit-source-id: 51f956a84a6c14a1a41473d322950994fa29c25c

* remove verbose in pytorch_ci hypothesis profile (#26075)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26075

att, remove the verbose argument to reduce noise in the logs

Test Plan:
ci

Imported from OSS

Differential Revision: D17335935

fbshipit-source-id: 2e4289e838bf4489dcad8d5533353eebcff0d481

* TorchScript Serialization for dynamic LSTM module

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25877

Test Plan: Imported from OSS

Reviewed By: jianyuh

Differential Revision: D17275746

Pulled By: jamesr66a

fbshipit-source-id: db2f38ddd99f02ccb4fb754fa1c1e6cad4425fa8

* Upgrade the naming for fbgemm quantized op (#26064)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26064

Just changing the names after https://github.com/pytorch/pytorch/pull/25678.
ghstack-source-id: 89944542

Test Plan: CI

Differential Revision: D17332068

fbshipit-source-id: 5e9febed7a2fcd10d44273e55643b277d33a3ad7

* Use BytesIO instead of tempfile (#25976)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25976

As recommended in https://github.com/pytorch/pytorch/pull/25877/files#r322956051:

> We should move more of these toward using BytesIO. Using files in tests is generally considered bad practice because it introduces syscalls and dependencies on the execution environment, and thus can cause test flakiness/instability.
ghstack-source-id: 89929947

Test Plan: CI

Differential Revision: D17310441

fbshipit-source-id: ba97cce4224225df45ff44062f1bc8ebefb25922
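
For illustration, a minimal sketch of the BytesIO pattern recommended above, using standard `io` and `torch.save`/`torch.load` (the tensor and dict here are made up for the example):

```python
import io
import torch

# Serialize to an in-memory buffer instead of a temporary file.
state = {'weight': torch.randn(3, 3)}

buffer = io.BytesIO()
torch.save(state, buffer)
buffer.seek(0)  # rewind before reading the buffer back
loaded = torch.load(buffer)
assert torch.equal(state['weight'], loaded['weight'])
```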

* Revert "TorchScript Serialization for dynamic LSTM module" (#26079)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26079

This reverts commit e3039612d851d0fbd337546c8debc27ec7cfc4e4.

Test Plan: Imported from OSS

Differential Revision: D17337585

Pulled By: jamesr66a

fbshipit-source-id: 4b93a4c5ca2fe491d609da889a42d22be8e52889

* Add Runtime flag for quantized backend. (#25680)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25680

Add a runtime flag to choose between FBGEMM and QNNPACK when compiled with both.

The flag can be set by using torch.backends.quantized.engine = torch.fbgemm/torch.qnnpack or ctx::setPreferredQuantizedEngine(at::QEngine)
ghstack-source-id: 89935643

Test Plan: Verified torch.backends.quantized.engine works

Differential Revision: D17198233

fbshipit-source-id: e5449d06f4136385e0e6d18bd4237f8654a61672
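
A minimal sketch of how the runtime flag described above is meant to be used; the accepted values are an assumption here (the commit mentions `torch.fbgemm`/`torch.qnnpack`, while released versions take the strings `'fbgemm'`/`'qnnpack'`):

```python
import torch

# Pick the quantized kernel backend at runtime when PyTorch is compiled
# with both FBGEMM and QNNPACK support.
torch.backends.quantized.engine = 'qnnpack'   # or 'fbgemm'
print(torch.backends.quantized.engine)        # confirm the active backend
```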

* Dynamic registration of RPC backends (#25734)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25734

[pytorch] Dynamic registration of RPC backends
Allow non-process-group RPC backends to be plugged in.
ghstack-source-id: 89938296

Differential Revision: D17183789

fbshipit-source-id: 885fed12d80b82b60f9a125f78302a161e708089

* Make regular softmax warp size aware (#25956)

Summary:
Enable one unit test that passes now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25956

Differential Revision: D17298150

Pulled By: bddppq

fbshipit-source-id: 8763e71ad7ef80be915fe93a3471b29f27f3f0a4

* Move NamedTensorMetaInterface definitions to TensorImpl.h (#26030)

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26030

Test Plan:
- [namedtensor ci]

Pull Request resolved: https://github.com/pytorch/pytorch/pull/26030

Differential Revision: D17322383

Pulled By: zou3519

fbshipit-source-id: d5b914d646b48a6f4e0104aceb435e694b72bd96

* Experimental warning for named tensors (#26050)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26050

Throws a warning once when someone attempts to attach names to a tensor.
This is guaranteed to happen at the callsite `set_named_tensor_meta`.

Test Plan: - run tests [namedtensor ci]

Differential Revision: D17331634

Pulled By: zou3519

fbshipit-source-id: 44f5e5c95acd9c7ba543c1210a3b1314aab348f0

* print source code when a function is executed (#25868)

Summary:
While this isn't ideal, as it might print out the same source every time a function is run, it's still easier to go and tweak Python code to reduce loop counts than to insert `std::cout` and recompile C++ code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25868

Differential Revision: D17318386

Pulled By: Krovatkin

fbshipit-source-id: 928ba6543204042924ab41a724635594709630de

* Disable test_cuda.test_stream_event_nogil on ROCm (#26087)

Summary:
Was recently enabled in https://github.com/pytorch/pytorch/pull/26055, it's flaky on master:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37575
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/37577
```
05:39:35 test_stream_event_nogil (__main__.TestCuda) ... Exception in thread Thread-3:
05:39:40 Traceback (most recent call last):
05:39:40   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
05:39:40     self.run()
05:39:40   File "/usr/lib/python2.7/threading.py", line 754, in run
05:39:40     self.__target(*self.__args, **self.__kwargs)
05:39:40   File "test_cuda.py", line 1894, in _test_stream_event_nogil
05:39:40     c2p.put(sync_func(self, TestCuda.FIFTY_MIL_CYCLES))
05:39:40   File "test_cuda.py", line 1882, in _event_wait
05:39:40     self.assertTrue(s1.query())
05:39:40   File "/usr/lib/python2.7/unittest/case.py", line 422, in assertTrue
05:39:40     raise self.failureException(msg)
05:39:40 AssertionError: False is not true
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26087

Differential Revision: D17340891

Pulled By: bddppq

fbshipit-source-id: b2b70beb1b068db53197a5f9f6a80cb046e66ebd

* TorchScript Serialization for dynamic LSTM

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26084

Test Plan: Imported from OSS

Differential Revision: D17339315

Pulled By: jamesr66a

fbshipit-source-id: 03a2674edcf779becfe3b8ec96f1bae23c74b11c

* Automatic update of fbcode/onnx to 7988d8360b11e6003560076e9b1d4aa426db3244 (#25959)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25959

Previous import was 28ca699b69b5a31892619defca2391044a9a6052

Included changes:
- **[7988d836](https://github.com/onnx/onnx/commit/7988d836)**: Supporting negative axes for all existing onnx ops (#2281) <Negin Raoof>
- **[5ca0a09e](https://github.com/onnx/onnx/commit/5ca0a09e)**: Update managingexperimentalops.md (#1981) <Joseph Spisak>
- **[bc0495c1](https://github.com/onnx/onnx/commit/bc0495c1)**: Fix link to community docs in readme (#2261) <Prasanth Pulavarthi>
- **[2fdb3ef6](https://github.com/onnx/onnx/commit/2fdb3ef6)**: move map and sequence types to onnx domain, (#2244) <Ke Zhang>
- **[568b65aa](https://github.com/onnx/onnx/commit/568b65aa)**: Improve compatiblity with proto3 and enable reading attributes (#2288) <Dmitri Smirnov>
- **[1f350f2c](https://github.com/onnx/onnx/commit/1f350f2c)**: Remove type info for loop variadic input in Loop op used to compose the Range op (#2287) <Hariharan Seshadri>
- **[eb139446](https://github.com/onnx/onnx/commit/eb139446)**: Add Foundation WG to working-groups.md (#2276) <Ryan Loney>
- **[4eabc4b3](https://github.com/onnx/onnx/commit/4eabc4b3)**: Fix testdata model for CumSum. Add exclusive attribute. (#2271) <jignparm>
- **[1a62afdb](https://github.com/onnx/onnx/commit/1a62afdb)**: Support GatherND operator in ONNX (#2106) <Hariharan Seshadri>
- **[0e330e9d](https://github.com/onnx/onnx/commit/0e330e9d)**: Support ScatterND operator in ONNX (#2220) <Bowen Bao>
- **[733f7a6a](https://github.com/onnx/onnx/commit/733f7a6a)**: Add Det to ONNX (#2233) <Bowen Bao>
- **[52187738](https://github.com/onnx/onnx/commit/52187738)**: Update the description of nearest_mode of resize op (#2257) <daquexian>
- **[64b4b686](https://github.com/onnx/onnx/commit/64b4b686)**: Adding sparse tensor to ONNX (#2019) <G. Ramalingam>
- **[c8a8b7cc](https://github.com/onnx/onnx/commit/c8a8b7cc)**: Support Range operator in ONNX (#2242) <Hariharan Seshadri>
- **[44b0d6d5](https://github.com/onnx/onnx/commit/44b0d6d5)**: Update resize op (#2057) <daquexian>
- **[7d907964](https://github.com/onnx/onnx/commit/7d907964)**: Add function to fuse dynamic quantization graph into 1 node (#2187) <Ashwini Khade>
- **[36f8e6d9](https://github.com/onnx/onnx/commit/36f8e6d9)**: Update logo_request.md (#2231) <Prasanth Pulavarthi>
- **[4eb737c8](https://github.com/onnx/onnx/commit/4eb737c8)**: Update Clip in opset 11 to support min/max as inputs instead of attributes (#2096) <Bowen Bao>
- **[a25e1388](https://github.com/onnx/onnx/commit/a25e1388)**: Fix segfault in tile shape inference (#2221) <daquexian>
- **[2dc273c7](https://github.com/onnx/onnx/commit/2dc273c7)**: update onehot shape inference to reflect the spec for depth input (#2224) <Ashwini Khade>
- **[665211c1](https://github.com/onnx/onnx/commit/665211c1)**: Add GatherElements Op and Rename ScatterElements (#2143) <Lara Haidar>
- **[3ba2e31a](https://github.com/onnx/onnx/commit/3ba2e31a)**: Unique (#2141) <liqunfu>
- **[5a5588ad](https://github.com/onnx/onnx/commit/5a5588ad)**: Clarify dimension variable scoping (#2211) <G. Ramalingam>
- **[fabe39d5](https://github.com/onnx/onnx/commit/fabe39d5)**: Liqun/topk sort (#2126) <liqunfu>
- **[453aa644](https://github.com/onnx/onnx/commit/453aa644)**: Update document for NMS (#2193) <Hector Li>
- **[34e28ec2](https://github.com/onnx/onnx/commit/34e28ec2)**: Handle negative 'axis' value in Split type and shape inferencing (#2177) <Scott McKay>
- **[28ec4583](https://github.com/onnx/onnx/commit/28ec4583)**: depth to space shuffle order (#2163) <Negin Raoof>
- **[98f72629](https://github.com/onnx/onnx/commit/98f72629)**: minor updates to fix links in readme (#2189) <Prasanth Pulavarthi>
- **[321d1467](https://github.com/onnx/onnx/commit/321d1467)**: Add check to disallow squeezing input axes which are not 1 (#2204) <Ashwini Khade>
- **[573f0dc9](https://github.com/onnx/onnx/commit/573f0dc9)**: fix a bug in fun shape inference (#2188) <Tang, Cheng>
- **[36dc7110](https://github.com/onnx/onnx/commit/36dc7110)**: Clarify ambiguity in gather spec regarding indices expectation (#2202) <Ashwini Khade>
- **[a2449673](https://github.com/onnx/onnx/commit/a2449673)**: Fix some minor issues in IR.md and Versioning.md (#2108) <edgchen1>
- **[349aff69](https://github.com/onnx/onnx/commit/349aff69)**: Skip install typing package for python >=3.5 (#2199) <bddppq>

Test Plan: ci

Reviewed By: bddppq, benoitsteiner

Differential Revision: D17296390

fbshipit-source-id: 9f9f5ce85d9694128008d756c2ea393bd4e0cb71

* Skip test_triangular_solve_batched (#26108)

Summary:
cc: gchanan zou3519

I will look into why this is failing spuriously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26108

Differential Revision: D17348399

Pulled By: zou3519

fbshipit-source-id: aed4ccfc3f106692d4e32acc029740309570b0c3

* Exposing Fused8BitRowwiseQuantizedToFloat in PyTorch (#26080)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26080

Will be used in c2 ctr_mbl_feed model to PyTorch conversion

Test Plan: Unit test

Reviewed By: yinghai

Differential Revision: D17337604

fbshipit-source-id: a90d9f5dc38301608d1562c6f2418e7f4616e753

* make sure all out stringstreams start out empty in jit_log.hpp

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25863

Differential Revision: D17347386

Pulled By: Krovatkin

fbshipit-source-id: a42cf56680a27bc3e50fd945ab372a409225b875

* tracing with an opt-in by file name (#25895)

Summary:
This basically works as a simple filter, as you suggested ZolotukhinM

`export PYTORCH_JIT_LOG_LEVEL=guard_elimination` will print all `GRAPH_DUMP` and `GRAPH_UPDATE` statements.
`export PYTORCH_JIT_LOG_LEVEL=>guard_elimination:>alias_analysis` will print all `GRAPH_DUMP`, `GRAPH_UPDATE` **and** `GRAPH_DEBUG` statements in `guard_elimination.cpp` **and** in `alias_analysis.cpp`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25895

Differential Revision: D17309090

Pulled By: Krovatkin

fbshipit-source-id: 8fa9e67cc9af566b084d66cc15223633fda08444

* Stop re-ordering TH(C)Blas arguments. (#25606)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25606

This just complicates the codegen for no benefit.

Test Plan: Imported from OSS

Differential Revision: D17172498

Pulled By: gchanan

fbshipit-source-id: d2f50e45400ac0336792422518e03dbae3a1bedc

* Kill TH(C)Blas kwarg_only declarations. (#25607)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25607

Since we don't generate these as end-user bindings, and we no longer reorder based on this property, we can just get rid of the property.

Test Plan: Imported from OSS

Differential Revision: D17172500

Pulled By: gchanan

fbshipit-source-id: f84fd8bb2b13598501897f56871b21339585d844

* simplify build_android_gradle.sh (#25897)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897

It doesn't hurt to set all variables unconditionally.
Also, we can create a link to the lib directory instead of to specific files - this
way it's easier to switch between dynamic/static library names.

Test Plan:
- check android gradle CI;
- use stack diff to check all 4 architectures on PR;

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25897

Differential Revision: D17307240

Pulled By: ljk53

fbshipit-source-id: c975085ddda852ef7da1c29935c2f6a28d797e5a

* change gradle build to use static libtorch + gc-sections (#25984)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25984

Link static libtorch libraries into pytorch.so (API library for android)
with "-Wl,--gc-sections" flag to remove unused symbols in libtorch.

Test Plan:
- full gradle CI with stacked PR;
- will check final artifacts.tgz size change;

Differential Revision: D17312859

Pulled By: ljk53

fbshipit-source-id: 99584d15922867a7b3c3d661ba238a6f99f43db5

* remove "build_deps" arg from setup.py command in (#26113)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26113

After https://github.com/pytorch/pytorch/pull/16914, passing in an
argument such as "build_deps" (i.e. python setup.py build_deps develop) no
longer works, since it gets picked up as an invalid argument.
ghstack-source-id: 90003508

Test Plan:
Before, this script would execute "python setup.py build_deps
develop", which errored. Now it executes "python setup.py develop" without an
error. Verified by successfully running the script on devgpu. In setup.py,
there is already a `RUN_BUILD_DEPS = True` flag.

Differential Revision: D17350359

fbshipit-source-id: 91278c3e9d9f7c7ed8dea62380f18ba5887ab081

* Stop reordering TH random function arguments.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25608

Test Plan: Imported from OSS

Differential Revision: D17172494

Pulled By: gchanan

fbshipit-source-id: 5a46889cc040297231e2473ae5b2879b39f8d60a

* fix base_lr overridden in cyclic lr (#26105)

Summary:
base_lr parameter was being overridden by super `__init__`, see https://github.com/pytorch/pytorch/issues/21965.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26105

Reviewed By: yf225

Differential Revision: D17346724

Pulled By: vincentqb

fbshipit-source-id: 4b146bd64f4f385c0a9c4f4df8eb8991312fb15c
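
For context, a minimal sketch of the CyclicLR usage affected by the override (the model, optimizer, and hyperparameters are illustrative, not taken from the original issue):

```python
import torch
from torch import nn, optim

model = nn.Linear(4, 2)
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=0.001, max_lr=0.01, step_size_up=10)

for _ in range(5):
    optimizer.step()
    scheduler.step()
    # With the fix, the cycle starts from base_lr instead of a value
    # clobbered by the parent class __init__.
    print(optimizer.param_groups[0]['lr'])
```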

* Skip inserting duplicate observers (#25504)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25504

Skip inserting duplicate observers for values observed
in forward method of a child module or other methods in
the current module.

Test Plan:
python test/test_jit.py -- 'TestJit.insert_observers'
python test/test_jit.py -- 'TestJit.insert_observers_child_qconfig'
python test/test_jit.py -- 'TestJit.insert_observers_skip_values'

Imported from OSS

Differential Revision: D17208888

fbshipit-source-id: e04f1c22ab1c4f410933a17a3ef31acf5f217323

* Implementation of ConstantThenLinearWarmupLRPolicy and CompositeCyclicalLRPolicy (#25970)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25970

ConstantThenLinearWarmupLRPolicy:
* first use a constant warm up
* then ramp up to the fixed learning rate linearly

CompositeCyclicalLRPolicy:
* first use a constant warm up
* then ramp up to the fixed learning rate linearly
* then use cyclical learning rates for the rest of time

Pull Request resolved: https://our.intern.facebook.com/intern/opensource/shipit/preview/D17302632/

Test Plan:
* buck test
 * https://our.intern.facebook.com/intern/testinfra/testconsole/testrun/5910974518377039/
 * https://our.intern.facebook.com/intern/testinfra/testrun/1407375027118303
* checked the consistency of learning rates w.r.t. iterations with offline simulations n143987

Reviewed By: swatirallapalli

Differential Revision: D17302632

fbshipit-source-id: 1098d4dd9109a48932b76e36d78239e49f8077a1

* Fix build warning in vec256_qint.h

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26121

Test Plan: Imported from OSS

Differential Revision: D17351960

Pulled By: jamesr66a

fbshipit-source-id: 12389729fe5fb8d863cf47288920ea375a3e74ab

* Kill kwarg_only declarations in Declarations.cwrap. (#25609)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25609

They don't do anything anymore.

Test Plan: Imported from OSS

Differential Revision: D17172497

Pulled By: gchanan

fbshipit-source-id: 5cf7fdcf7d2da0054ac1bd7d8d2b70a2264b8c93

* Support quantizing any methods called (#25505)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25505

Support for quantizing all the methods called by forward method, including
child module methods and other methods in the current module

It relies on module-level constant prop; we need to figure out a way to do constant prop
for these methods as well. We can either do constant prop at the module level or in the
quantization function, but this will need some discussion.

Test Plan:
python test/test_jit.py 'TestJit.insert_quant_dequant'
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17208887

fbshipit-source-id: 21749457b21b00a6edada290c26324e2fb210b10

* C++ unregister_module function for Module (#26088)

Summary:
This PR adds ```unregister_module``` to ```nn::Module``` and an ```erase``` function to ```OrderedDict```.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26088

Differential Revision: D17360058

Pulled By: yf225

fbshipit-source-id: f1f375b4751317da85b8da1458e092fe2405ceec

* Port fuse_linear from pytorch/tvm (#25623)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25623

Port over the fuse_linear pass from the pytorch/tvm project; we'll need this
in the backend-specific quantization pass to match aten::linear and swap
it with quantized linear.

Test Plan:
python test/test_jit.py 'TestJit.test_fuse_linear'

Imported from OSS

Differential Revision: D17208890

fbshipit-source-id: f4ff3889ae4525797d3b986f46ae37e50ea49116

* Add device check before accessing data_ptr in PackLayer (#26056)

Summary:
fixes https://github.com/pytorch/xla/issues/927
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26056

Differential Revision: D17331859

Pulled By: ailzhang

fbshipit-source-id: bdc334f03c8dcbb4ef4f5e059a63ef188a0b8b61

* Create TensorBoard test classes in all cases (#26005)

Summary:
To give better signal to the user, we will now always create the TensorBoard test classes and just disable the tests if TensorBoard is not installed.

cc lanpa sanekmelnikov natalialunova pietern
[test macos]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26005

Reviewed By: sanekmelnikov

Differential Revision: D17352430

Pulled By: orionr

fbshipit-source-id: 87a592064f4768ffded76a3d666a8e508a1ef164
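
A minimal sketch of the skip-instead-of-omit pattern described above (the class and test names are hypothetical, not the actual test_tensorboard.py code):

```python
import tempfile
import unittest

try:
    from torch.utils.tensorboard import SummaryWriter
    HAS_TENSORBOARD = True
except ImportError:
    HAS_TENSORBOARD = False

class TestTensorBoardExample(unittest.TestCase):
    # The class is always created, so a missing dependency shows up as a
    # skipped test in reports instead of the tests silently disappearing.
    @unittest.skipIf(not HAS_TENSORBOARD, "TensorBoard not installed")
    def test_add_scalar(self):
        with tempfile.TemporaryDirectory() as log_dir:
            with SummaryWriter(log_dir=log_dir) as writer:
                writer.add_scalar('loss', 0.5, global_step=0)

if __name__ == '__main__':
    unittest.main()
```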

* Automatic update of fbcode/onnx to 95252c2adec185e305e34486c6756ece9aa8f57f (#26137)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26137

Previous import was 7988d8360b11e6003560076e9b1d4aa426db3244

Included changes:
- **[95252c2a](https://github.com/onnx/onnx/commit/95252c2a)**: Fix shapeinference function (#2296) <jignparm>
- **[414285bb](https://github.com/onnx/onnx/commit/414285bb)**: fix the buffer overflow problem in shape inference logic of Squeeze op <Lu Fang>
- **[797cdd0f](https://github.com/onnx/onnx/commit/797cdd0f)**: Support for negative indices in 'Gather', 'GatherElements', 'ScatterElements', 'OneHot' (#2260) <Negin Raoof>
- **[7636978d](https://github.com/onnx/onnx/commit/7636978d)**: Fix collect_snippets warnings (#2277) <Lutz Roeder>
- **[fa70c33b](https://github.com/onnx/onnx/commit/fa70c33b)**: Update printable_graph in helper.py to output details of initializers that do not have matching graph inputs. (#2135) <Scott McKay>
- **[428d09b0](https://github.com/onnx/onnx/commit/428d09b0)**: test int64 input type for 'where' op (#2253) <Negin Raoof>

Test Plan: ci

Reviewed By: bddppq

Differential Revision: D17353795

fbshipit-source-id: 6d4f39754863a30f427f4512c7b228e45d3ce84f

* Add fusion for quantized linear (#25624)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25624

First fuse the split ops into aten::linear and then fuse
`dequant - aten::linear - quant` into the quantized linear op

Test Plan:
python test/test_jit.py 'TestJit.quant_fusion'

Imported from OSS

Differential Revision: D17208891

fbshipit-source-id: 864b19fabab2e8e6f8f8ad35eb3dbbf2d5fdb8c4

* Implement tensor.refine_names (#25842)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25842

`tensor.refine_names(*names)` takes `tensor` and attempts to name its
dimensions `names` out-of-place. If a dimension `i` already had a name,
then it cannot be changed (so tensor.names[i] must equal names[i]);
if the original dimension did not have a name, then the new name
(names[i]) can be anything.

`tensor.refine_names(*names)` also accepts a glob '*' that greedily selects
names from `tensor`. Here are some examples:

- `Tensor[None].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('N') -> Tensor[N]`
- `Tensor[N].refine_names('D') -> Error!`
- `Tensor[N].refine_names(None) -> Error!`
- `Tensor[None, None].refine_names('*', D) -> Tensor[None, D]`

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17255548

Pulled By: zou3519

fbshipit-source-id: fdbdb3a12f24fbe37ce1e53ed09dc8a42589d928
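
A minimal sketch of `refine_names` as described above; named tensors were experimental at the time, so the exact syntax (e.g. the glob, which ended up as `...` rather than `*`) may differ between releases:

```python
import torch

x = torch.randn(2, 3)             # names are (None, None)
y = x.refine_names('N', 'C')      # unnamed dims may take any name
print(y.names)                    # ('N', 'C')

y.refine_names('N', 'C')          # existing names must match -> OK
try:
    y.refine_names('N', 'D')      # 'C' cannot be renamed to 'D'
except RuntimeError as err:
    print('error:', err)
```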

* Implement tensor.align_as(other), change tensor.align_to(names) (#25843)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25843

`tensor.align_to(*names)` permutes the dimensions of `tensor` and adds
additional 1-sized dimensions such that the output tensor has dimensions
in the same order as `names`. All dimensions of `tensor` must be
present in `names`; in addition, this function requires that all dims of
`tensor` be named.

`tensor.align_as(other)` is equivalent to
`tensor.align_to(*other.names)`.

I'm planning on changing `torch.align_tensors(*tensors)` to align closer
to these semantics because there didn't seem to be a clear use case for the old
semantics that preserve unnamed dimensions. That will come in a future
change.

Test Plan: - new tests [namedtensor ci]

Differential Revision: D17255549

Pulled By: zou3519

fbshipit-source-id: 1e437ad81e9359b4d5bd0e7e64c3a1be441fc3e3
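
And a minimal sketch of `align_to`/`align_as`, again assuming the experimental named tensor API:

```python
import torch

x = torch.randn(2, 3).refine_names('N', 'C')

# Permute and insert size-1 dims so the output has dims in the given order.
y = x.align_to('N', 'C', 'H', 'W')
print(y.names, y.shape)           # ('N', 'C', 'H', 'W') torch.Size([2, 3, 1, 1])

# align_as(other) is shorthand for align_to(*other.names).
template = torch.randn(2, 3, 4, 5).refine_names('N', 'C', 'H', 'W')
z = x.align_as(template)
print(z.names, z.shape)
```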

* C++ API parity: at::Tensor::data

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26008

Test Plan: Imported from OSS

Differential Revision: D17343488

Pulled By: pbelevich

fbshipit-source-id: b9ba5e26cad621a428a14292446d7fb5a6e5535d

* Fix bug with named tensors and (no) tracer support (#26106)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26106

Previously, in the named tensors build, an operator is marked as
non-traceable if ANY of its overloads are named tensor overloads. This
breaks the tracer for things like torch.full (has a names= overload for
named tensor) and tensor.sum (has a Dimname overload for named tensor).

This PR fixes the problem by putting the "no tracer support" logic into
the location where the tracer attempts to construct a graph by adding a
Dimname/DimnameList argument to a node.

Test Plan:
- new test in test_jit.py to check if torch.full is traceable
- new test in test_namedtensor.py to check what happens when someone
tries to trace a function that uses named tensor APIs.
- [namedtensor ci]

Differential Revision: D17353452

Pulled By: zou3519

fbshipit-source-id: b0b843c8357ffe54baee6e8df86db914f0b1ece4

* Add data field to Tensor pyi. (#26093)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26093

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: vsiles

Differential Revision: D17366320

Pulled By: ezyang

fbshipit-source-id: 025f1c3d75d294fc1b51ddc540e542a05dc72b6a

* Change schedulers to chainable form (#24352)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24352

Enable chainable schedulers as requested in #13022 by implementing the changes mentioned below from [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513370208).

* Changing the behavior of schedulers to the chainable formula when available
* Using the closed form whenever epoch is different from None until the next release with a deprecation warning
* Making `get_computed_values` the supported way of obtaining the last computed learning rate by the scheduler (see [comment](https://github.com/pytorch/pytorch/pull/21800#issuecomment-513940729) for new syntax)
* Raising a deprecation warning when invoking the undocumented get_lr function (see [comment](https://github.com/pytorch/pytorch/pull/21800#discussion_r294305485)), referring users to `get_computed_values`, and deprecating it in the next release.
* `CosineAnnealingWarmRestart` still takes an epoch parameter as it is the only one with a mechanic relying on fractional epoch
* `MultiplicativeLR` consumes a function providing the multiplicative factor at each epoch. It mimics `LambdaLR` in its syntax.

# #20527

### Before

The user calls scheduler with a constant epoch either across loops or in the same loop.
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

# Scheduler with sometimes-constant epoch number
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:
  lr_scheduler.step(epoch)
  print(optimizer.param_groups[0]['lr'])
```

### After

If the user wants to step only when the epoch number changes:
```
import torch.optim as optim
from torch import nn

conv = nn.Conv2d(3,3,3)
optimizer = optim.Adam(conv.parameters())
lr_scheduler = optim.lr_scheduler.StepLR(optimizer, 2)

last_epoch = -1
for epoch in [0, 0, 1, 1, 2, 2, 3, 3]:

  # Check if epoch number has changed manually
  if epoch-last_epoch > 0:
    lr_scheduler.step()
  last_epoch = epoch

  print(epoch, scheduler.get_computed_values())
```

# #22107

### Before

```
import torch
from torchvision.models import resnet18
net = resnet18()

optimizer = torch.optim.SGD(net.parameters(), 0.1)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
  # Scheduler computes and returns new learning rate, leading to unexpected behavior
  print(i, scheduler.get_lr())
  scheduler.step()
```

### After

```
import torch
from torchvision.models import resnet18

net = resnet18()
optimizer = torch.optim.SGD(net.parameters(), 0.1)
lr_scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[3, 6, 9], gamma=0.1)
lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 3, gamma=0.1)

for i in range(10):
    # Returns last computed learning rate by scheduler
    print(i, lr_scheduler.get_computed_values())
    lr_scheduler.step()
```

Test Plan: Imported from OSS

Differential Revision: D17349760

Pulled By: vincentqb

fbshipit-source-id: 0a6ac01e2a6b45000bc6f9df732033dd81f0d89f

* Run PyTorch macOS CPU-only build/test on all PRs

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26096

Test Plan: Imported from OSS

Differential Revision: D17366419

Pulled By: pietern

fbshipit-source-id: 138659dae346aad3cde52d488cd1780614e7692f

* Use CircleCI commands for brew update/install (#26159)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26159

The snippets for working with Homebrew were duplicated across binary
builds, macOS builds, and iOS builds. In #25336, the CircleCI
configuration version was updated to version 2.1, which supports
parameterized commands. This means we no longer have to use YAML
tricks to duplicate stanzas and instead can natively define a series
of reusable steps.

Motivation for doing this is that the macOS binary builds were still
using the slow `brew update` instead of `git fetch` (see #25988).

[test macos]
[test wheel]

Test Plan: Imported from OSS

Differential Revision: D17366538

Pulled By: pietern

fbshipit-source-id: 194c0f37c1dc999705f3ba97fdabf4ff18728d93

* Turn should_run_job into command

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26160

Test Plan: Imported from OSS

Differential Revision: D17366539

Pulled By: pietern

fbshipit-source-id: a870d6da21925764986c6c748ad291440b78e6fd

* Turn setup_linux_system_environment into command

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26162

Test Plan: Imported from OSS

Differential Revision: D17366537

Pulled By: pietern

fbshipit-source-id: 98413daa344812f06578c3373d8516292d2f21f5

* Turn setup_ci_environment into command

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26163

Test Plan: Imported from OSS

Differential Revision: D17366536

Pulled By: pietern

fbshipit-source-id: 07181a77aaeba5457aa716ceac9cc404aacefe5f

* Kill most defaults in Declarations.cwrap. (#25610)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25610

They don't do anything anymore, since this isn't the end-user interface.

Test Plan: Imported from OSS

Differential Revision: D17172495

Pulled By: gchanan

fbshipit-source-id: a380d970f0836ed85eb9ac2aa42eb73655d775aa

* Get rid of more defaults in Declarations.cwrap.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25611

Test Plan: Imported from OSS

Differential Revision: D17172493

Pulled By: gchanan

fbshipit-source-id: 0f4319f8024ac4eca62576231214227b341f56c4

* Kill remaining defaults in Declarations.cwrap.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25612

Test Plan: Imported from OSS

Differential Revision: D17172499

Pulled By: gchanan

fbshipit-source-id: f99e813a4a90e8576541da317027e6f8ae76079b

* Remove requests as dependency (#26083)

Summary:
local build is slow... test in CI...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26083

Differential Revision: D17346949

Pulled By: ailzhang

fbshipit-source-id: f552d1a4be55ad4e2bd915af7c5a2c1b6667c446

* Fix 'in' return true incorrectly (#24156)

Summary:
Because of 'return NotImplemented', __contains__ returns True when the element is not a number,
since bool(NotImplemented) == True.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24156

Differential Revision: D16829895

Pulled By: zou3519

fbshipit-source-id: 9d3d58025b2b78b33a26fdfcfa6029d0d049f11f
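
A pure-Python illustration of the bug described above (the class names here are made up): `in` truth-tests whatever `__contains__` returns, and `bool(NotImplemented)` is `True` (truth-testing `NotImplemented` is deprecated in newer Python versions):

```python
class Broken:
    def __contains__(self, item):
        return NotImplemented   # intended as "unsupported", but...

print('anything' in Broken())   # True (!), because bool(NotImplemented) is True

class Fixed:
    def __contains__(self, item):
        if not isinstance(item, (int, float)):
            return False        # or raise TypeError
        return True

print('anything' in Fixed())    # False
```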

* guard dyndep with a lock (#26153)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26153

I suspect that our multithreaded test system causes issues with dyndep if two places try to call InitOpsLibrary concurrently. So perhaps we just guard this with a lock. This is just a guess-fix, as it is impossible to repro.

Test Plan: sandcastle

Reviewed By: bddppq

Differential Revision: D17361310

fbshipit-source-id: 596634a2098b18881abbd26a5a727a5ba0d03b6e

* Add documentation to logging

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26175

Differential Revision: D17371085

Pulled By: Krovatkin

fbshipit-source-id: ea06f4e16fc320940a299e8e1d4f4d7c76f5950a

* Fold quantize op into module (#25625)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25625

We want to fold the quantize ops for weights/bias into the module to avoid quantizing weights on the fly.

Test Plan:
python test/test_jit.py

Imported from OSS

Differential Revision: D17208889

fbshipit-source-id: 1854b8953b065855d210bc1166533c08ca264354

* Revert D17349760: Change schedulers to chainable form

Test Plan: revert-hammer

Differential Revision:
D17349760

Original commit changeset: 0a6ac01e2a6b

fbshipit-source-id: 41c2c136215dabc26cad5098a08eff2a2a29b715

* Use torch::from_blob instead of shareExternalPointer, nits (#25973)

Summary:
The main part is to switch at::Tensor creation from `torch::empty(torch::IntArrayRef(...))->ShareExternalPointer(...)` to `torch::from_blob(...)`.
Removed the explicit setting of device CPU, since `at::TensorOptions` defaults to device CPU.
Also renamed local variables, removing the `input` prefix to make them shorter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25973

Differential Revision: D17356837

Pulled By: IvanKobzarev

fbshipit-source-id: 679e099b8aebd787dbf8ed422dae07a81243e18f

* Make schema part of RegisterOperators::Options (#26114)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26114

With this diff, the operator schema or name can be specified as part of the options objects:

```
static auto registry = torch::RegisterOperators()
  .op(torch::RegisterOperators::options().schema("my_op").kernel(&kernel))
  .op(...);
```

This does not break backwards compatibility, all old APIs are kept as shorthands.

This (a) makes the API more consistent, accumulating all options into the options objects and not treating schema special anymore, and (b) this is required for allowing the c10 dispatcher to forward registration calls to ATenDispatch for ops that are still on that dispatcher, see plan in https://github.com/pytorch/pytorch/issues/24132
ghstack-source-id: 90049402

Test Plan: unit tests

Differential Revision: D17350383

fbshipit-source-id: cbb8f33a52dccb2a4522753e7b5ac8ba35b908fd

* Allow overwriting catch-all kernels (#25947)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25947

Previously, the c10 dispatcher didn't allow having a catch-all kernel and backend-specific kernels at the same time.
That restriction is also the long-term goal. But to make the current XLA implementation work, we need to allow backend extensions to overwrite these ops with XLA variants.

This diff changes that so that ops can have both, catchall and backend specific kernels, and will call into the catchall kernel if there is no more specific kernel registered.
This is also the current behavior of globalATenDispatch.
ghstack-source-id: 90049398

Test Plan: unit tests

Differential Revision: D17293036

fbshipit-source-id: f2d5928e904c1dc9b6b89e9bb468debe48a4056c

* Register ATen ops with c10 (#26131)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26131

Changes in this PR:
- For each operator with use_c10_dispatcher: True, additionally generate a c10 registration line in TypeDefault.cpp, CPUType.cpp, and other backend files.
- This doesn't change globalATenDispatch yet, the c10 registration is purely additional and the operator calling path doesn't change. A diff further up the stack will change these things.
- Enable the use_c10_dispatcher: True flag for about ~70% of operators
- This also changes the c10->jit operator export because ATen ops are already exported to JIT directly and we don't want to export the registered c10 ops because they would clash
- For this, we need a way to recognize if a certain operator is already moved from ATen to c10, this is done by generating a OpsAlreadyMovedToC10.cpp file with the list. A diff further up in the stack will also need this file to make sure we don't break the backend extension API for these ops.

Reasons for some ops to be excluded (i.e. not have the `use_c10_dispatcher` flag set to true):
- `Tensor?(a!)` (i.e. optional tensor with annotations) not supported in c++ function schema parser yet
- `-> void` in native_functions.yaml vs `-> ()` expected by function schema parser
- out functions have different argument order in C++ as in the jit schema
- `Tensor?` (i.e. optional tensor) doesn't work nicely because an undefined tensor is sometimes represented as an undefined tensor and sometimes as None.
- fixed-size arrays like `int[3]` not supported in c10 yet

These will be fixed in separate diffs and then the exclusion tag will be removed.
ghstack-source-id: 90060748

Test Plan: a diff stacked on top uses these registrations to call these ops from ATen

Differential Revision: D16603131

fbshipit-source-id: 315eb83d0b567eb0cd49973060b44ee1d6d64bfb

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/rocksdb/commit/83a6a614e9bf5f3f06abc265b736e868acee498b
https://github.com/pytorch/fbgemm/commit/c8cac64995d8d8af871e461affbf505ac7fce4d8

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 1f5bc1e065fe13d89eeb42539f21a8ab0ab8b8a1

* Nightly build for for iOS (#26074)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26074

### Summary

This PR creates a nightly job for iOS builds. The job will generate a couple of static libraries that contain three architectures (x86, arm64, armv7s) and upload them to AWS S3.

### Note

The test phase in this job is missing right now, meaning if there is a linking error, we won't be able to know it. To add the test jobs, we have to put a dummy test App in the repo and manually link the libraries to the app after the build finishes. This will be done in the next following PRs

Test Plan: Imported from OSS

Differential Revision: D17363066

Pulled By: xta0

fbshipit-source-id: 5beeb4263af5722f0a852297023f37aaea9ba4b1

* Change the source link in podspec (#26089)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26089

### Summary

A couple of changes

1. Replace the source link with the newly nightly build address
2. Remove module support for Swift and Objective-C
3. Expose all static libraries instead of archiving them into one single library. This is because those static libraries might contain object files that have the same name, e.g. `init.c.o` in both `libcpuinfo.a` and `libqnnpack.a`. If we archive them into one using the `libtool -static` command, by default it only picks one object file and discards the others, which could result in undefined symbols when linking the executable. The change here is to expose all the static libraries and let the linker decide which one to use.

### Test Plan

- pod spec lint succeed
 - `pod spec lint --verbose --allow-warnings --no-clean --use-libraries --skip-import-validation`

Test Plan: Imported from OSS

Differential Revision: D17363037

Pulled By: xta0

fbshipit-source-id: ba77b0001b58e6e2353d8379d932db598166d37d

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/rocksdb/commit/97631357aa274d06a7ab09b3cde7b909262cc4dd
https://github.com/pytorch/fbgemm/commit/2f1477dfee9465c1e2dbdf21722970b3fa1baf86

Test Plan: n/a

Reviewed By: 2d2d2d2d2d

fbshipit-source-id: 33029d2e8c6a3664a35823829670f6ed9dfc3b44

* Tensor renaming to dtype, shape; support long, double (#26183)

Summary:
Applying dzhulgakov  review comments

org.pytorch.Tensor:
  - dims renamed to shape
  - typeCode to dtype
  - numElements to numel

newFloatTensor, newIntTensor... to newTensor(...)

Added support for dtype=long, double.
Reordered in code as byte, int, float, long, double.
For if conditions, the order is float, int, byte, long, double, as I expect that the float and int branches will be used more often.

Tensor.toString() does not have data, only numel (data buffer capacity)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26183

Differential Revision: D17374332

Pulled By: IvanKobzarev

fbshipit-source-id: ee93977d9c43c400b6c054b6286080321ccb81bc

* use whitelist for selecting observed values (#25974)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25974

Previously we observed all the Tensor values, but what we actually want is
to observe only the ones that can be quantized.

Test Plan:
python test/test_jit.py
python test/test_quantizer.py

Imported from OSS

Differential Revision: D17348986

fbshipit-source-id: 55be0d73862a0e7eb1e7fd882d16e0d830618b63

* fix circle CI

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26225

Test Plan: Imported from OSS

Differential Revision: D17379899

Pulled By: xta0

fbshipit-source-id: 4077aa0149b23560f3a9e29531ca9bc612a2c09c

* Add histogram observer (#23959)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23959

Add histogram observer that records the running histogram of tensor values along with min/max values.
ghstack-source-id: 90076996

Test Plan:
Added a test test_histogram_observer
buck test mode/dev caffe2/test:quantization -- 'test_histogram_observer'

buck test mode/dev caffe2/test:quantization -- 'test_observer_scriptable'

Differential Revision: D16692835

fbshipit-source-id: 0f047d3349cb9770fad4a2b6cb346c51d9e99cd4

* Add isBackwardCompatibleWith for Argument and FunctionSchema (#23409)

Summary:
we intend to be conservative, and will relax the checks in the future if necessary.
So far, we consider the following three conditions as backward compatible:
   1) the two schemas are equal;
   2) the two schemas have the same number of arguments, and this schema's
      arguments are backward compatible with the corresponding ones in the
      argument list of old_schema;
   3) this schema has m arguments, old_schema has n arguments, m > n, and
      the first n arguments of this schema are backward compatible with
      the corresponding arguments of old_schema. The remaining arguments
      must be either OptionalType or provide default values.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/23409
ghstack-source-id: 90111021

Test Plan: buck test //caffe2/test:function_schema

Reviewed By: hl475

Differential Revision: D16505203

fbshipit-source-id: e4099537776a60e8945e5c3cd57fa861f3598a9b

* Creates generic device type testing framework (#25967)

Summary:
This PR addresses https://github.com/pytorch/pytorch/issues/24851 by...

1. lets device types easily register themselves for testing
2. lets tests be written to run on multiple devices and with multiple dtypes
3. provides a mechanism to instantiate those tests so they are discoverable and filterable by unittest and pytest

It refactors three tests from test_torch.py to demonstrate how to use it.

`test_diagonal` is the simplest example. Most tests just need to be modified to accept 'device' as an argument. The framework will then instantiate `test_diagonal_cpu` and `test_diagonal_cuda` (when CUDA is available) which call `test_diagonal` with the appropriate 'device' argument.

`test_neg` also has dtype variants. It accepts both 'device' and 'dtype' as arguments, and the dtypes it runs with are specified with the 'dtypes' decorator. Dtypes can be specified for all device types and particular device types. The framework instantiates tests like `test_neg_cpu_torch.float`.

`test_inverse` has device-specific dependencies. These dependencies are expressed with the sugary 'skipCUDAIfNoMagma' and 'skipCPUIfNoLapack' decorators. These decorators are device-specific so CPU testing is not skipped if Magma is not installed, and their conditions may be checked before or after the test case has been initialized. This means that skipCUDAIfNoMagma does not initialize CUDA. In fact, CUDA is only initialized if a CUDA test is run.

These instantiated tests may be run as usual and with pytest filtering it's easy to run one test on all device types, run all the tests for a particular device type, or run a device type and dtype combination.

See the note "Generic Device-Type Testing" for more detail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25967

Differential Revision: D17381987

Pulled By: mruberry

fbshipit-source-id: 4a639641130f0a59d22da0efe0951b24b5bc4bfb
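
A minimal sketch of a test written against this framework; the import path is an assumption (the helpers lived in test/common_device_type.py at the time and moved to torch.testing._internal.common_device_type in later releases), and the test names are made up:

```python
import unittest
import torch
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, dtypes)

class TestExample(unittest.TestCase):
    # Instantiated once per registered device type: test_add_cpu, test_add_cuda, ...
    def test_add(self, device):
        x = torch.ones(3, device=device)
        self.assertEqual((x + x).sum().item(), 6.0)

    # Additionally parameterized over dtypes via the decorator.
    @dtypes(torch.float, torch.double)
    def test_mul(self, device, dtype):
        x = torch.ones(3, device=device, dtype=dtype)
        self.assertTrue(torch.equal(x * 2, x + x))

# Generates TestExampleCPU / TestExampleCUDA classes that unittest and
# pytest can discover and filter; the generic class itself is removed.
instantiate_device_type_tests(TestExample, globals())

if __name__ == '__main__':
    unittest.main()
```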

* adds sync to flaky test_events_multi_gpu_query (#26231)

Summary:
This test can sometimes fail in CI.

I suspect this flakiness is because the test asks a CUDA stream to record an event, fails to synchronize the CPU with that stream, then checks if the event is recorded on the CPU. There is no guarantee this will have happened.

This one-line change preserves the intent of the test while ensuring the GPU has recorded the event before the CPU queries it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26231

Differential Revision: D17382110

Pulled By: mruberry

fbshipit-source-id: 35b701f87f41c24b208aafde48bf10e1a54de059
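
The synchronization pattern behind the one-line fix, as a minimal sketch (illustrative only, not the test's actual code; requires a CUDA device):

```python
import torch

if torch.cuda.is_available():
    stream = torch.cuda.Stream()
    event = torch.cuda.Event()
    with torch.cuda.stream(stream):
        x = torch.randn(1024, 1024, device='cuda')
        y = x @ x              # enqueue some work on the side stream
        event.record(stream)   # ask the stream to record the event
    # Without a host-side sync the CPU may query the event before the GPU
    # has actually recorded it, which is the race behind the flakiness.
    event.synchronize()
    assert event.query()
```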

* Added possible out of shared memory error message (#25730)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/5040
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25730

Differential Revision: D17226214

Pulled By: pbelevich

fbshipit-source-id: 92278272aab74e6690f14fc9597acfd1a98854b7

* Remove armv7s build from iOS (#26222)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26222

### Summary

The last generation of armv7s devices is the iPhone 5C. As discussed with David offline, we decided not to support iOS armv7s devices.

### Test plan

- CI finishes successfully
- Builds can be run only on X86_64 and arm64 devices

Test Plan: Imported from OSS

Differential Revision: D17385308

Pulled By: xta0

fbshipit-source-id: f883999aed18224ea3386b1f016964a33270fa34

* Back out "[quant][observer] Add histogram observer" (#26236)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26236

Original diff broke oss CI. Reverting.

Original commit changeset: 0f047d3349cb
ghstack-source-id: 90125990

Test Plan: testinprod

Reviewed By: hx89

Differential Revision: D17385490

fbshipit-source-id: 4258502bbc0e3a6dd6852c8ce01ed05eee618b1a

* Ports most of test_torch.py to generic device type framework (#26232)

Summary:
This PR moves many tests in test_torch.py to the generic device type framework. This means that many CUDA tests now run in test_torch.py and there is greater consistency in how tests for many device types are written.

One change is that all MAGMA tests are run on the default stream due to intermittent instability running MAGMA on the non-default stream. This is a known issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26232

Test Plan:
While this PR edits the tests itself, it was validated using two independent methods:

(1) The code was reviewed and it was verified that all deleted functions were actually moved.
(2) The output of the TestTorch CI was reviewed and test outputs were matched before and after this PR.

Differential Revision: D17386370

Pulled By: mruberry

fbshipit-source-id: 843d14911bbd52e8aac6861c0d9bc3d0d9418219

* Add type hint for cuda.set_rng_state (#26200)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/26199
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26200

Differential Revision: D17386885

Pulled By: soumith

fbshipit-source-id: 9da03aae29281b2ed691cbfdd7b85fde55e5b7ef

* Add a wrapper for inspect in JIT to produce better error message (#25415)

Summary:
If source code is not available due to packaging (e.g. sources are compiled to .pyc), TorchScript produces a very obscure error message. This tries to make it nicer and allows customizing the message by overriding _utils_internal.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25415

Test Plan: Really hard to unittest properly. Did one off testing by compiling to .pyc and checking the message.

Differential Revision: D17118238

Pulled By: dzhulgakov

fbshipit-source-id: 3cbfee0abddc8613000680548bfe0b8ed52a36b0

* Use MIOpen for transpose convolutions (#26172)

Summary:
Provides significant performance uplift where used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26172

Differential Revision: D17374862

Pulled By: bddppq

fbshipit-source-id: 85d2df3c67b8935bc54f3a81a912a25c0102743a

* Call aten ops through c10 dispatcher (#23668)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23668

- The eager mode frontend now calls operators that are defined in native_functions.yaml with `use_c10_dispatcher: True` through the c10 dispatcher and no longer through globalATenDispatch().
- These operators aren't registered with globalATenDispatch anymore, only with c10 now.
- Backend extensions calling globalATenDispatch().registerOp() to add their own kernels still work; this function will forward the registration to the c10 dispatcher for them.

ghstack-source-id: 90130455

Test Plan: benchmarks at https://docs.google.com/document/d/1gpzKZcFf1JJameY1vKxF7Cloul9s6D8HKIK2_Pp1hFo/edit#

Differential Revision: D16603133

fbshipit-source-id: 991f17b355e9c78c5e86fee4fa381df7ab98ac82

* Remove unboxedAutogradKernel from c10 (#26130)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26130

Since we now just use TensorTypeId::VariableTensorId, there's no need to treat autograd kernels any differently.
ghstack-source-id: 90130457

Test Plan: unit tests

Differential Revision: D17353873

fbshipit-source-id: d4468506a5366bc5e7429144b090b3e78af9de62

* Refines test_torch.py generic device testing (#26244)

Summary:
- Adds SkipCUDAIfRocm and skipCPUIfNoMkl decorators, ports corresponding tests
- Changes "SkipIf" input semantics for consistency
- Removes torchtest, which has been replaced with this new generic framework
- Refactors some common parts out of CUDA tests to TestTorchDeviceType
- Ensures all MAGMA tests run on default stream by putting the skipCUDANonDefaultStreamIf in the skipCUDAIfNoMagma decorator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26244

Differential Revision: D17389060

Pulled By: mruberry

fbshipit-source-id: 1375774f24c2266049e6d4b899e7300ddf32eac8

* Fix Windows build (#26246)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26246

Broken due to https://github.com/pytorch/pytorch/issues/12117. Try fixing it.
ghstack-source-id: 90137033

Test Plan: waitforsandcastle

Reviewed By: zou3519

Differential Revision: D17387317

fbshipit-source-id: 705998c0b1608668d510b47f4fe20cecf5057c5f

* Fix CI (#26250)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26250

Exclude some ops from the c10 dispatcher that don't work with it yet.
ghstack-source-id: 90138046

Test Plan: waitforsandcastle

Reviewed By: zou3519

Differential Revision: D17390117

fbshipit-source-id: a87fb3048aeba2c3293b95d610ddb8e94369f8fe

* Back out "[pytorch][PR] Refines test_torch.py generic device testing" (#26252)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26252

Original commit changeset: 1375774f24c2

Testing to see if this is somehow the source of hangs on ROCm builds.

Test Plan: Change is to tests themselves. This diff is for testing the ROCm hang, however.

Differential Revision: D17390575

fbshipit-source-id: a6ffd5eb1df3971b99b6d42271a8d3d501ac79c6

* Fix namedtensor ci (#26257)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26257

In native_functions.yaml, all overloads must have unique overload names.
This PR fixes `flatten` to have unique names for the overloads.

Test Plan: - tested locally, but also [namedtensor ci]

Differential Revision: D17391243

Pulled By: zou3519

fbshipit-source-id: aaef654953b4275c43b9d7bd949c46bd011f6c73

* Switch to the new profiler infrastructure (#26174)

Summary:
The ones supported going forward are rocprofiler and roctracer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26174

Differential Revision: D17387538

Pulled By: bddppq

fbshipit-source-id: 19d9828d9d07b5073ab5fa288e24fd65a8b18b52

* Fix binary size of OpsAlreadyMovedToC10.cpp (#26237)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26237

Calling a lot of `std::string` constructors is horrible for binary size, see t53997334.

Using `const char*` instead should make the binary size much smaller.
ghstack-source-id: 90145501

Test Plan: size checks on the diff

Differential Revision: D17386002

fbshipit-source-id: c5420adf225e535396e806a0df92419a7e2ad3e8

* Fix no auto batching bugs: cannot bulk load; not work with namedtuple (#26065)

Summary:
see title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26065

Differential Revision: D17392851

Pulled By: soumith

fbshipit-source-id: 468cd41c8e03d689ff2e0261d948e28daad6bfaf

* Upgrade MKLDNN to v0.20.5 (#25757)

Summary:
1. Fix issues exposed by the posts below:
https://github.com/pytorch/pytorch/issues/25242
https://github.com/pytorch/pytorch/issues/25101
https://github.com/pytorch/pytorch/issues/23825
2. Fix an RNN support issue in mkldnn-bridge
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25757

Differential Revision: D17367948

Pulled By: VitalyFedyunin

fbshipit-source-id: d8430d3909ecbf853afa0ce3d968735f86f1da31

* fix hypothesis timeout (#26280)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26280

ghstack-source-id: 90160270

Test Plan: testinprod

Differential Revision: D17396861

fbshipit-source-id: ee2348ffa7f6092e2c5647a42d0e17879dcfacd0

* Migrate away from using Variable( in test_nn.py (#26077)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26077

As per #26071, we would like to get rid of the calls to Variable(
where possible. This diff removes the calls in the test file test_nn.py. The
unit tests should all still pass as expected.
ghstack-source-id: 90086624

Test Plan: tests in `test_nn.py` should all pass.

Differential Revision: D17336484

fbshipit-source-id: 43fc7bd0b0be835ae89d06162ce1cbe4e0056d91

* Enabled conv methods for bfloat16

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26167

Differential Revision: D17367728

Pulled By: izdeby

fbshipit-source-id: 0a7bd9a6dbc15815af195d644c9372af2135e93a

* Move the CUDA implementation of round to ATen. (#25041)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25041

Fix #24617

Pull Request resolved: https://github.com/pytorch/pytorch/pull/25041

Test Plan: Imported from OSS

Differential Revision: D17114368

Pulled By: VitalyFedyunin

fbshipit-source-id: 6ec6ef99b4451acd7e93491fd4b44fca9ce1809d

* Whitelist and fusion support for quantized::linear - addmm (#26208)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26208

Supporting `addmm` -> `quantized::linear` quant fusion
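
For context, a minimal sketch (an illustration, not code from this PR) of the float patterns these fusion passes match: with a bias, a linear lowers to `addmm`; without one, to `matmul`.

```python
# Illustrative only: the float ops that quant fusion rewrites to
# quantized::linear once activations and weights are quantized.
import torch

x = torch.randn(3, 4)
w = torch.randn(2, 4)
b = torch.randn(2)

y_addmm = torch.addmm(b, x, w.t())   # linear with bias
y_matmul = torch.matmul(x, w.t())    # linear without bias
assert torch.allclose(y_addmm, torch.nn.functional.linear(x, w, b))
```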

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380074

fbshipit-source-id: fae88f118f85663d777648695768b0504ed7ccf9

* Whitelist and fusion support for quantized::linear - matmul (without bias) (#26209)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26209

Support quant fusion for `matmul` (without bias) -> `quantized::linear`

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380075

fbshipit-source-id: 290caee7f7bcf94d2731c0ee9bd40054f0fb9b07

* Updating submodules

Summary:
GitHub commits:

https://github.com/facebook/mcrouter/commit/653434b898ea35810d7369d0911e3bdab9a1c3ac
https://github.com/facebook/proxygen/commit/b74fbefc1a69de78989f540d9d0d312945aeadeb
https://github.com/facebook/rocksdb/commit/9bd5fce6e89fcb294a1d193f32f3e4bb2e41d994
https://github.com/facebookincubator/mvfst/commit/6efcef720fac04011708840b89d1f174d3f290d0
https://github.com/facebookresearch/pytorch-biggraph/commit/cb7830b6b30d2d24b591178705eaf9e8209ecd09
https://github.com/pytorch/fbgemm/commit/53f0c0d175ae4283609a5b251052f9c6598b8aee

Test Plan: n/a

Reviewed By: yns88

fbshipit-source-id: 78d0e24f5601aa990391a2404ae9d23b325de93f

* Add ProcessGroupGloo::createDefaultDevice (#26166)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26166

There were two variants for creating a new device: one based on the
name of a network interface, and one based on a hostname or address.
In the latter, if no address was specified, it would look up the local
hostname and try to resolve that. If that failed, the process would
crash.

In this default path, we now try to look up and use the local hostname,
and if that fails we fall back to the loopback address.

If the local hostname doesn't resolve to an address that we can bind
to, it is very likely that this process won't join other processes
over the network, and that the user is trying to run a local test.

If this assumption is wrong, the user can override the default
interface selection by setting the environment variable
`GLOO_SOCKET_IFNAME` to the name of the external network interface.
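
As a minimal sketch of that override (the interface name `eth0`, address, and port are placeholders; assumes a single-process local run):

```python
# Force Gloo onto a specific NIC instead of the default
# hostname/loopback selection described above.
import os
import torch.distributed as dist

os.environ["GLOO_SOCKET_IFNAME"] = "eth0"  # placeholder interface name

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:29500",  # loopback is fine for a local test
    rank=0,
    world_size=1,
)
```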

I tested this by changing the local hostname to a bogus name and
confirmed that default initialization works as expected.

Closes #26049.

Test Plan: Imported from OSS

Differential Revision: D17397898

Pulled By: pietern

fbshipit-source-id: 95a2467761d89df87b520d6e5837b92184b0dc12

* Disable broken unit tests (#26301)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26301

-
ghstack-source-id: 90176419

Test Plan: waitforsandcastle

Differential Revision: D17400971

fbshipit-source-id: b6f9cb27fe955b0200d62591300c70ba79a90e5f

* Kill defaults in nn.yaml. (#26282)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26282

Since this isn't the end-user API anymore, we shouldn't have defaults.

Test Plan: Imported from OSS

Differential Revision: D17397153

Pulled By: gchanan

fbshipit-source-id: d44040bec0ee9c70734a53ebcc10a96f12226a29

* Upgrade Caffe2 docker images to 306 to include roctracer and rocprofiler

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26260

Differential Revision: D17391902

Pulled By: bddppq

fbshipit-source-id: 89ab3dedf05ba398acb7300fac95f03cfb31f0ba

* Whitelist and fusion support for quantized::linear - matmul (with bias) (#26204)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26204

Support quant fusion for `matmul` with bias to `quantized::linear`.

Test Plan:
python test/test_jit.py 'TestJit.test_quant_fusion'

Imported from OSS

Differential Revision: D17380073

fbshipit-source-id: 00014469a852cc5d5b66469fc4b8d05eafba1e3e

* Add __s390x__ compiler define for s390 builds. (#26233)

Summary:
PyTorch builds fail on the s390 architecture because the ifdef
macros in simd.h default to an x86 asm instruction. This patch adds
an ifdef __s390x__ so that the build succeeds on s390.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26233

Differential Revision: D17392714

Pulled By: soumith

fbshipit-source-id: 037672bfea64fc5e52da2390d93b973534137c12

* Clarified ambiguous docstring in NegativeBinomial

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/25923

Differential Revision: D17392848

Pulled By: soumith

fbshipit-source-id: 2833e72fe449c74dfd8273a7b1eb46c05c63d999

* Dynamic quantization for bias. (#26057)

Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26057

Bias is now unquantized (i.e., kept in floating point) for qconv and qlinear; it is quantized dynamically by fbgemm.

TODO: Add some performance numbers.
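
As an illustration only (not the fbgemm code path itself), dynamically quantizing a float bias amounts to requantizing it to int32 with the product of the activation and weight scales:

```python
# Sketch: int32 bias with scale = activation_scale * weight_scale, zero point 0.
import torch

act_scale, w_scale = 0.05, 0.02
bias_fp32 = torch.tensor([0.10, -0.30, 0.25])

bias_scale = act_scale * w_scale
bias_int32 = torch.round(bias_fp32 / bias_scale).to(torch.int32)
print(bias_int32, bias_int32.float() * bias_scale)  # quantized / dequantized view
```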

Tests:

test:quantization
```
Summary (total time 8.41s):
  PASS: 24
  FAIL: 0
  SKIP: 0
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
More details at https://our.intern.facebook.com/intern/buck/build/74d5f6f7-55c9-4350-a618-2013042fffd8
```

test:quantized
```
Summary (total time 13.21s):
  PASS: 43
  FAIL: 0
  SKIP: 5
    caffe2/test:quantized - test_qnnpack_maxpool2d (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_compare_tensor_scalar (test_quantized.TestComparatorOps)
    caffe2/test:quantized - test_qnnpack_linear (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_relu (test_quantized.TestQNNPackOps)
    caffe2/test:quantized - test_qnnpack_add (test_quantized.TestQNNPackOps)
  FATAL: 0
  TIMEOUT: 0
  OMIT: 0
```
ghstack-source-id: 90166254

Test Plan:
buck test mode/dev caffe2/test:quantization

buck test mode/dev caffe2/test:quantized

Differential Revision: D17328028

fbshipit-source-id: d4a163d730d0f4a03e8e0faf7420710cf36eec09

* Use expected_wrapper only if CMAKE_C_COMPILER and/or CMAKE_CXX_COMPILER is not set by the user (#26306)

Summary:
This honors the user's preference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26306

Differential Revision: D17408030

Pulled By: soumith

fbshipit-source-id: 6841b805603d40cd7caf78dbb42405a0c931f052

* Add derivative of cholesky_solve (#26185)

Summary:
Changelog:
- Add derivative of cholesky_solve. The equations are derived analogously to the derivatives of the solve methods, using the technique detailed [here](https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26185

Test Plan:
- Added tests for cholesky_solve in test_autograd.py
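
A minimal sketch of this kind of check (shapes and values are arbitrary, not the exact test code; the factor is re-tril'ed so gradcheck only perturbs entries the solve actually reads):

```python
# gradcheck the cholesky_solve backward on a small SPD system.
import torch

A = torch.randn(3, 3, dtype=torch.float64)
A = A @ A.t() + 3 * torch.eye(3, dtype=torch.float64)   # make A positive definite
L = torch.cholesky(A).requires_grad_(True)              # lower Cholesky factor
b = torch.randn(3, 1, dtype=torch.float64, requires_grad=True)

assert torch.autograd.gradcheck(
    lambda rhs, factor: torch.cholesky_solve(rhs, factor.tril()), (b, L))
```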

Closes half of https://github.com/pytorch/pytorch/issues/4669.

Differential Revision: D17408123

Pulled By: soumith

fbshipit-source-id: f9668c8d4d758c0dc658941a8b730a17683091aa

* Kill 'default_init', which isn't needed anymore.

Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/26281

Test Plan: Imported from OSS

Differential Revision: D17397097

Pulled By: gchanan

fbshipit-source-id: fb53e90637a3dfb2300fca78f414abe2d82832f3

* Export round (#26126)

Summary:
Added round export in opset 11
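
A minimal usage sketch (the model and output file name are placeholders):

```python
# Export a model that calls torch.round with opset 11, where ONNX Round is available.
import torch

class RoundModel(torch.nn.Module):
    def forward(self, x):
        return torch.round(x)

torch.onnx.export(RoundModel(), torch.randn(2, 3), "round.onnx", opset_version=11)
```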
Pull Request resolved: https:…
Labels: Merged, module: rocm (AMD GPU support for Pytorch), open source