Merge pytorch master into lazy_tensor_staging #72894
Merged
Conversation
Summary: Tests under `test/onnx/test_models_onnxruntime.py` complain `AttributeError: 'TestModels' object has no attribute 'onnx_shape_inference'`. This CI failure appeared suddenly, without any code changes to the related files, and is most likely due to a different test-case run order. The test code was written such that if the test class `TestModels_new_jit_API` runs first, it assigns `TestModels.onnx_shape_inference = True`, masking the problem; if `TestModels` runs first, the `AttributeError` is raised.

Fixes #72337

Pull Request resolved: #72350
Reviewed By: jbschlosser, seemethere, janeyx99
Differential Revision: D34010794
Pulled By: malfet
fbshipit-source-id: 816f7bee89ea0251bb5df8f482b68f8dc4823997
(cherry picked from commit b39b23b)
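A minimal sketch of the order dependence described above, with simplified class bodies (the test method here is hypothetical); the fix is to define the attribute on `TestModels` itself so it exists regardless of which class runs first:

```python
import unittest

class TestModels(unittest.TestCase):
    # Fix: declare the attribute on the class so it exists no matter
    # which test class happens to run first.
    onnx_shape_inference = False

    def test_uses_flag(self):  # hypothetical test method
        # Before the fix, this raised AttributeError whenever TestModels
        # ran before TestModels_new_jit_API.
        self.assertIn(self.onnx_shape_inference, (True, False))

class TestModels_new_jit_API(TestModels):
    @classmethod
    def setUpClass(cls):
        # The problematic pattern: mutating the *parent* class as a side
        # effect, so TestModels only worked if this class ran first.
        TestModels.onnx_shape_inference = True

if __name__ == "__main__":
    unittest.main()
```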
Summary: Update the branch location after https://github.com/pytorch/test-infra/pull/182/files. A step for pytorch/test-infra#175.

Pull Request resolved: #72233
Reviewed By: jbschlosser
Differential Revision: D34013014
Pulled By: kit1980
fbshipit-source-id: a05be1870608d5d2dc6cfd270222abd56b89e5fe
(cherry picked from commit 9806e7d)
Summary: Pull Request resolved: #72244

att

Test Plan:
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps

Imported from OSS
Reviewed By: albanD
Differential Revision: D33971544
fbshipit-source-id: a6e01d08ae9fdd479bba8abfa0941c555650f84e
(cherry picked from commit fe756cc)
Summary: Export these symbols so they are accessible from other libraries. Also make `REGISTER_ARCH_DISPATCH` export dispatches as `TORCH_API`, so that stubs can be called from libraries other than `torch_cpu`. To satisfy Windows builds, add the same `TORCH_API` to the static member declarations, although these are no-ops on Linux.

Pull Request resolved: #72340
Reviewed By: janeyx99
Differential Revision: D34007756
Pulled By: malfet
fbshipit-source-id: 6dcc4e350920c72f8b1762a5018082f7aeec98e9
(cherry picked from commit 9c1f44d)
Summary: Pull Request resolved: #72349

1. Methods invoked via interface calls need to be registered to the class. Previously all interface calls were inlined, so this problem did not arise.
2. `parseDoubleList` and `parseBoolList` were swapped during refactoring.

Test Plan:
1. Get ASR's test model:
```
mkdir ~/asr1 && cd ~/asr1
fbpkg fetch speech.tuna.milan.ondevice.en_us
```
2. Convert the model:
```
cd ~/fbsource
buck run //xplat/caffe2/fb/lite_predictor:convert_model -- --model=$HOME/asr1/pytorchmodel.pt --output_name=$HOME/asr1/pytorchmodel.ff
```
3. Run lite_predictor_flatbuffer:
```
buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_flatbuffer -- --model=$HOME/asr1/pytorchmodel.ff --method_to_call=encode_src --method_to_generate_input=get_all_bundled_inputs_for_encode_src
```
See the perf metric generated (meaning that loading and inference succeeded).

Reviewed By: gmagogsfm, zhxchen17
Differential Revision: D33959746
fbshipit-source-id: 24671e1189438119f477032eb6c29bd7736e74ca
(cherry picked from commit 5e18809)
Summary: D34018849

Test Plan: D34018849

Reviewed By: shoumikhin
Differential Revision: D34018840
fbshipit-source-id: a78e3ea5b8ac93e9e002e2583961fd3a545a0abd
(cherry picked from commit 57b7c51)
Summary: I think this diff stack broke all the related tasks below.

Test Plan: For our failing tests:
buck test //fbandroid/instrumentation_tests/com/facebook/pytorch/bi_xray:instrumentation_test -c test.external_runner=tpx -- --regex 'testBIXRayModel.*PyTorchBIXRayInstrumentationTest' --force-remote-execution --run-disabled

For the UBN: not really sure what to do; trying to build the app and see if I can use an effect.

Reviewed By: shoumikhin
Differential Revision: D34018849
fbshipit-source-id: 3571718cb6621931af931b494e0a70d6e0164e65
(cherry picked from commit 3cc63cb)
Summary: Pull Request resolved: #71783

Adds a quantized matmul op that naively performs dequantize -> matmul -> quantize. (To be optimized in the future.)

Test Plan: From fbcode:
```
buck test caffe2/test:quantization -- test_qmatmul
```

Reviewed By: kimishpatel
Differential Revision: D33443161
fbshipit-source-id: 7e0a8e45bed1a63f9cd68a70cadbc9a8e35b2faa
(cherry picked from commit 100d8b3)
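A minimal sketch of the naive pattern described above (not the actual kernel; the scales and zero points are illustrative):

```python
import torch

def naive_quantized_matmul(qa, qb, out_scale, out_zero_point):
    # Dequantize both operands, run a float matmul, then requantize the
    # result -- exactly the dequantize -> matmul -> quantize pattern.
    c = torch.matmul(qa.dequantize(), qb.dequantize())
    return torch.quantize_per_tensor(c, out_scale, out_zero_point, torch.quint8)

a, b = torch.randn(2, 3), torch.randn(3, 4)
qa = torch.quantize_per_tensor(a, scale=0.05, zero_point=128, dtype=torch.quint8)
qb = torch.quantize_per_tensor(b, scale=0.05, zero_point=128, dtype=torch.quint8)
qc = naive_quantized_matmul(qa, qb, out_scale=0.1, out_zero_point=128)
print(qc.dequantize())
```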
Summary: Pull Request resolved: #69854
ghstack-source-id: 148315147

Test Plan: Time reported to start up static runtime on the ctr_mobile_feed local_ro net is 8.8s instead of 9.5s.

Reviewed By: suo, d1jang
Differential Revision: D33039733
fbshipit-source-id: 218dc7ff9aa421a352b71952ec77757368095860
(cherry picked from commit 7586712)
Summary: Pull Request resolved: #71520

Specifically, this PR deals with adverse cases such as D33662192. It is possible for someone to export a package, change something in it, and then attempt to repackage it. In that case, dependencies of the package may no longer be interned, and it is not obvious where to look for them, so we throw an error.

Test Plan: Imported from OSS

Reviewed By: bradleyhd
Differential Revision: D33675557
Pulled By: PaliC
fbshipit-source-id: 807962bfb340d30d418617d6e78661a033828314
(cherry picked from commit 1b10c23)
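A hedged sketch of the repackaging scenario, using the public `torch.package` API (the file names and the packaged object are made up for illustration):

```python
import torch
from torch.package import PackageExporter, PackageImporter, sys_importer

# First export: save some state into a package; torch stays external.
with PackageExporter("pkg.pt") as exporter:
    exporter.extern("torch.**")
    exporter.save_pickle("data", "weights.pkl", {"w": torch.randn(3)})

# Re-import the package, then repackage its contents.
importer = PackageImporter("pkg.pt")
weights = importer.load_pickle("data", "weights.pkl")

# Passing the original importer lets the exporter resolve modules that
# live inside the package. If the package was modified so that its
# interned dependencies are gone, the export raises an error instead of
# guessing where the dependencies should come from.
with PackageExporter("repkg.pt", importer=(importer, sys_importer)) as exporter:
    exporter.extern("torch.**")
    exporter.save_pickle("data", "weights.pkl", weights)
```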
…d always 64 bug (#72375)

Summary: Pull Request resolved: #72375

The current MHA does not have `num_head` as a param; add it.

Test Plan: In the following diff.

Reviewed By: swolchok
Differential Revision: D33972168
fbshipit-source-id: 6b31bd6a516354d781e6dd5eea347a31d6cea272
(cherry picked from commit 3d07066)
Reviewed By: zertosh
Differential Revision: D34028007
fbshipit-source-id: 6bb39fab7232baae3769d68857964a2a4e581e7d
(cherry picked from commit 3a05821)
Summary: This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).

New submodule commit: pytorch/FBGEMM@49fe829

Pull Request resolved: #72158

Test Plan: Ensure that CI jobs succeed on GitHub before landing.

Reviewed By: jspark1105
Differential Revision: D33932905
Pulled By: jasonjk-park
fbshipit-source-id: 8b40ba08de2880f374e5b2dc09a7a451262385a7
(cherry picked from commit 67944c2)
Summary: Pull Request resolved: #71312

Renames `seen_op_info` to `seen_q_op_info` and `SeenOpInfo` to `SeenQOpInfo`, to make it clear that the op here is a quantizeable op. This is useful for a future PR in which we will start recording the DAG of non-quantizeable ops, which will be needed to properly support function fusion.

Test Plan: CI and mypy

Reviewed By: albanD
Differential Revision: D33584751
Pulled By: vkuzo
fbshipit-source-id: 0b659d4ecefc96d532c451abac410c638e457dcb
(cherry picked from commit 6d85745)
Summary: Pull Request resolved: #71324

In a future PR we need to start recording the DAG of non-quantizeable ops, in order to properly reason about function fusion. This PR refactors the prepare hooks so that the first_call versions are separate functions. This ensures that the logic to record the DAG (the first_call functions) is separate from the logic used at inference.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: albanD
Differential Revision: D33588558
Pulled By: vkuzo
fbshipit-source-id: 7b27ee4e5b64f26bfb082ca2bbf2c04894bd2a97
(cherry picked from commit 21fa635)
Summary: Pull Request resolved: #71551

This PR makes DBR quant record the DAG of non-quantizeable ops. Having this will enable us to analyze the entire traced graph of pytorch ops, regardless of whether they support quantization or not. That, in turn, will enable analysis of the uses of each op, allowing us to safely determine whether a subgraph of ops can be fused or not. In future PRs, this functionality will be used to implement function fusion.

Test Plan:
```
python test/test_quantization.py -k DBR
```

Reviewed By: jerryzh168
Differential Revision: D33684130
Pulled By: vkuzo
fbshipit-source-id: 497d9882f0670a36eef2a0900ea2517c82addf66
(cherry picked from commit b0d48a8)
Summary: Pull Request resolved: #71764

For DBR quant, adds the code for matching seen ops to function fusion patterns. Once we have the full DAG, we do a separate pass over it and add matched fusion patterns to the seen-op data structure. This is the first PR in the stack, implementing matching and recording the match results. Future PRs in this stack will use the match results to modify observer insertion and inference.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168
Differential Revision: D33775098
Pulled By: vkuzo
fbshipit-source-id: 488aac902bf568d41c863ee49248990411ed9c53
(cherry picked from commit 4ad1ca1)
Summary: Pull Request resolved: #71780

Adds support for matching operator.add -> torch.relu in FX graph mode quantization. It would be nice to support torch.relu better in general, but that is saved for a future PR to keep PRs small. This is useful for DBR quant because we have some test cases in DBR quant which use add-relu, and we'd like to match them to FX.

Test Plan:
```
python test/test_quantization.py TestQuantizeFxOps.test_add_relu
python test/test_quantization.py TestQuantizeFxOps.test_mul_relu
```

Reviewed By: jerryzh168
Differential Revision: D33775096
Pulled By: vkuzo
fbshipit-source-id: 889d9b41d3758ecbbb6d7eab67f64ce3d4892d24
(cherry picked from commit c1f9f38)
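A hedged sketch of exercising the add -> relu pattern through FX graph mode quantization (the exact entry points have shifted across PyTorch releases; this follows the `torch.ao.quantization.quantize_fx` variant that takes `example_inputs`):

```python
import operator
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import convert_fx, prepare_fx

class AddRelu(torch.nn.Module):
    def forward(self, x, y):
        # operator.add followed by torch.relu -- the pattern being matched.
        return torch.relu(operator.add(x, y))

model = AddRelu().eval()
example_inputs = (torch.randn(1, 4), torch.randn(1, 4))

prepared = prepare_fx(model, get_default_qconfig_mapping("fbgemm"), example_inputs)
prepared(*example_inputs)   # calibrate with representative data
quantized = convert_fx(prepared)
print(quantized.graph)      # add + relu should appear as one quantized pattern
```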
…ence (#71781)

Summary: Pull Request resolved: #71781

The previous PR added information about fusions found in the subgraphs. This PR uses that information for:
1. inserting observers at the end of fusions and not in the middle;
2. during inference, replacing the original op with the fused op. The way this is implemented is that the base op is replaced with the fused op, and all other ops are replaced with identity functions.

Test Plan:
```
python test/test_quantization.py TestQuantizeDBR.test_fusion_functions
```

Reviewed By: jerryzh168
Differential Revision: D33775097
Pulled By: vkuzo
fbshipit-source-id: 12249b85b2f7ba7545a54872aeb5f1ff2fc928cf
(cherry picked from commit 0db4324)
Summary: Pull Request resolved: #72344

ATen core is mostly compliant already, so we can just add the flag to the build system. The only exception is interned strings, which include symbols like `aten::add` generated for each operator.

Test Plan: Imported from OSS

Reviewed By: jbschlosser
Differential Revision: D34010820
Pulled By: albanD
fbshipit-source-id: ef1a625d96f30457b5e6beffc5e630516e54f9b4
(cherry picked from commit b90c262)
Summary: Pull Request resolved: #72284

This update adds the prod op to the fx2trt tool, which is used to create a TensorRT engine for a PyTorch model.

Test Plan: A new unit test was added to check that the op was added to the acc tracer. This test can be run using the following command:
buck test --debug //caffe2/test:test_fx_acc_tracer -- --exact 'caffe2/test:test_fx_acc_tracer - test_prod (fx_acc.test_acc_tracer.AccTracerTest)'

A new suite of unit tests was also added for the conversion to TensorRT and can be run using the following command:
buck test mode/dev-nosan //caffe2/test/fx2trt/converters:test_prod

Please note that, unlike other PyTorch reduce ops such as sum, the PyTorch prod function does not support reducing more than one dimension at a time (the dim arg cannot be a tuple; only a single int is acceptable for prod). Therefore prod cannot reuse all of the reduce_op code.

https://pxl.cl/1Xpn8
https://pxl.cl/1Xpn9

Reviewed By: 842974287
Differential Revision: D33875336
fbshipit-source-id: f9340db3685d681b1cf4ffc3b9fd25d16914e231
(cherry picked from commit cfe48d3)
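A small illustration of the limitation noted above: `torch.sum` accepts a tuple of dims, while `torch.prod` reduces a single dim per call, so a multi-dimension reduction must be chained:

```python
import torch

x = torch.randn(2, 3, 4)

# torch.sum can reduce several dimensions in one call.
print(torch.sum(x, dim=(0, 1)).shape)   # torch.Size([4])

# torch.prod only accepts a single int dim, so reducing two dimensions
# requires chaining calls -- as the fx2trt converter must also do.
print(torch.prod(torch.prod(x, dim=0), dim=0).shape)  # torch.Size([4])
```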
Pull Request resolved: #72372
Summary: Pull Request resolved: #71129

cc ezyang albanD zou3519 gqchen pearu nikitaved soulitzer Lezcano Varal7

Test Plan: Imported from OSS

Reviewed By: jbschlosser
Differential Revision: D34012577
Pulled By: anjali411
fbshipit-source-id: 02d2f2d761f7c9332e2f3cc529e8f1c6b60d7da2
(cherry picked from commit 87318a2)
Summary: This reverts the previous PR and adds some comments to make the intent clear. It also removes some extra static_asserts that are not needed (at least for the compilers I tried).

Pull Request resolved: #72336
Reviewed By: r-barnes
Differential Revision: D34006722
Pulled By: albanD
fbshipit-source-id: 290fb89a2d2c66a0d1c3651198b31d21216ec230
(cherry picked from commit 76f0aaa)
Summary: Pull Request resolved: #70030

`range_push` and `range_pop` do not support multi-threading: they only work when the push and the pop happen on the same thread. For process-level ranges, we should use `range_start` and `range_end`. This matters because PyTorch's forward pass runs on one thread while autograd runs on a different thread. See the NVIDIA implementation documentation: https://github.com/nvpro-samples/shared_external/blob/cab2dec7608ebc9d36fb086a07ce5112700b089d/NSight/nvToolsExt.h#L397-L407

Test Plan:
```
buck test caffe2/test:cuda
Started reporting to test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391483460
✓ ListingSuccess: caffe2/test:cuda - main (19.640)
Summary
ListingSuccess: 1
If you need help understanding your runs, please follow the wiki: https://fburl.com/posting_in_tpx_users
Finished test run: https://www.internalfb.com/intern/testinfra/testrun/8162774391483460
```

Reviewed By: malfet
Differential Revision: D33155244
fbshipit-source-id: c7d5143f6da9b6ef0e0811e2fcae03a3e76f24de
(cherry picked from commit 22134e9)
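A hedged sketch of the cross-thread pattern this enables, assuming `torch.cuda.nvtx` exposes the `range_start`/`range_end` pair described in the summary (requires a CUDA build; the range is visible in Nsight): the handle returned on the forward thread can be closed from a backward hook running on the autograd thread, which `range_push`/`range_pop` cannot express.

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
x = torch.randn(4, 8, device="cuda")

# range_start returns a handle that range_end may consume on a different
# thread -- here, the autograd engine's thread via a backward hook.
handle = torch.cuda.nvtx.range_start("forward+backward")

out = model(x).sum()
out.register_hook(lambda grad: torch.cuda.nvtx.range_end(handle))
out.backward()  # the hook fires on the autograd thread and ends the range
```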
💊 CI failures summary and remediations. As of commit 8aabaa5 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
| Job | Step | Action |
|---|---|---|
| | Assert that regenerating the workflows didn't change them | 🔁 rerun |
| | Run mypy | 🔁 rerun |
Krovatkin approved these changes on Feb 15, 2022.
Labels
- cla signed
- module: fx
- module: rocm (AMD GPU support for Pytorch)
- oncall: distributed (Add this issue/PR to distributed oncall triage queue)
- oncall: jit (Add this issue/PR to JIT oncall triage queue)
No description provided.