@eellison eellison commented May 16, 2022

Stack from ghstack (oldest at bottom):

Fix for AOT autograd, where amp has already been traced out and you don't want to re-invoke the amp pass.

cc @anijain2305, @Chillee


facebook-github-bot commented May 16, 2022


❌ 4 New Failures

As of commit 7b7a39e (more details on the Dr. CI page):

  • 4/4 failures introduced in this PR

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages

See GitHub Actions build pull / pytorch-xla-linux-bionic-py3.7-clang8 / test (xla, 1, 1, linux.2xlarge) (1/4)

Step: "Test"

2022-05-16T18:06:59.6397668Z + python setup.py install
2022-05-16T18:07:00.5805259Z No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
2022-05-16T18:07:00.5917278Z Building torch_xla version: 1.12
2022-05-16T18:07:00.5917745Z XLA Commit ID: 56a52d53359b0862d0ada8d49c4e3dc52ff75d81
2022-05-16T18:07:00.5918207Z PyTorch Commit ID: 7b7a39ee29e45d02a8683579400b5d8bee18146d
2022-05-16T18:07:00.5981688Z /var/lib/jenkins/workspace /var/lib/jenkins/workspace/xla
2022-05-16T18:07:01.7066969Z /var/lib/jenkins/workspace/xla
2022-05-16T18:07:01.7994289Z Traceback (most recent call last):
2022-05-16T18:07:01.7994938Z   File "/var/lib/jenkins/workspace/xla/scripts/gen_lazy_tensor.py", line 84, in <module>
2022-05-16T18:07:01.7995520Z     get_device_fn="torch_xla::bridge::GetXlaDevice")
2022-05-16T18:07:01.7996288Z TypeError: run_gen_lazy_tensor() got an unexpected keyword argument 'get_device_fn'
2022-05-16T18:07:01.8083009Z Failed to generate lazy files: ['python', '/var/lib/jenkins/workspace/xla/scripts/gen_lazy_tensor.py']
2022-05-16T18:07:01.9875003Z + cleanup
2022-05-16T18:07:01.9875242Z + retcode=1
2022-05-16T18:07:01.9875411Z + set +x
2022-05-16T18:07:01.9908093Z ##[error]Process completed with exit code 1.
2022-05-16T18:07:02.0001223Z ##[group]Run pytorch/pytorch/.github/actions/get-workflow-job-id@master
2022-05-16T18:07:02.0001467Z with:
2022-05-16T18:07:02.0001883Z   github-token: ***
2022-05-16T18:07:02.0002055Z env:
2022-05-16T18:07:02.0002198Z   IN_CI: 1

See GitHub Actions build pull / linux-xenial-py3.7-clang7-asan / test (default, 2, 4, linux.2xlarge) (2/4)

Step: "Upload test artifacts"

2022-05-16T18:19:10.8769242Z     #10 0x55c9c9da3c81 in run_mod /home/builder/tkoch/workspace/python_1648536129212/work/Python/pythonrun.c:1037
2022-05-16T18:19:10.8771499Z     #11 0x55c9c9daec69 in PyRun_StringFlags /home/builder/tkoch/workspace/python_1648536129212/work/Python/pythonrun.c:961
2022-05-16T18:19:10.8772154Z     #12 0x55c9c9daeccb in PyRun_SimpleStringFlags /home/builder/tkoch/workspace/python_1648536129212/work/Python/pythonrun.c:455
2022-05-16T18:19:10.8773225Z     #13 0x55c9c9daedc8 in pymain_run_command /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:420
2022-05-16T18:19:10.8774068Z     #14 0x55c9c9daedc8 in pymain_run_python /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:2907
2022-05-16T18:19:10.8774481Z     #15 0x55c9c9daedc8 in pymain_main /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:3460
2022-05-16T18:19:10.8775043Z     #16 0x55c9c9daf18b in _Py_UnixMain /home/builder/tkoch/workspace/python_1648536129212/work/Modules/main.c:3495
2022-05-16T18:19:10.9308821Z     #17 0x7f001eab883f in __libc_start_main /build/glibc-S7Ft5T/glibc-2.23/csu/../csu/libc-start.c:291
2022-05-16T18:19:10.9309200Z     #18 0x55c9c9d54039 in _start (/opt/conda/bin/python3.7+0x1d8039)
2022-05-16T18:19:10.9309374Z 
2022-05-16T18:19:10.9309708Z SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /var/lib/jenkins/workspace/aten/src/ATen/Utils.cpp:20:3 in 
2022-05-16T18:19:10.9534363Z + retcode=1
2022-05-16T18:19:10.9534690Z + set -e
2022-05-16T18:19:10.9534879Z + return 1
2022-05-16T18:19:10.9538778Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX-* ]]
2022-05-16T18:19:10.9539346Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X ]]
2022-05-16T18:19:10.9539972Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX2-* ]]
2022-05-16T18:19:10.9540455Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\2 ]]
2022-05-16T18:19:10.9541100Z + [[ linux-xenial-py3.7-clang7-asan-default == *-NO_AVX512-* ]]
2022-05-16T18:19:10.9541904Z + [[ default == \n\o\g\p\u\_\N\O\_\A\V\X\5\1\2 ]]
2022-05-16T18:19:10.9543246Z + [[ linux-xenial-py3.7-clang7-asan-default == *tbb* ]]

See GitHub Actions build pull / linux-xenial-py3.7-gcc5.4 / test (backwards_compat, 1, 1, linux.2xlarge) (3/4)

Step: "Test"

2022-05-16T18:01:28.3830907Z processing existing schema:  text(__torch__.torch.classes.profiling.SourceRef _0) -> (str _0)
2022-05-16T18:01:28.3832529Z processing existing schema:  count(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-05-16T18:01:28.3833830Z processing existing schema:  duration_ns(__torch__.torch.classes.profiling.InstructionStats _0) -> (int _0)
2022-05-16T18:01:28.3834800Z processing existing schema:  source(__torch__.torch.classes.profiling.SourceStats _0) -> (__torch__.torch.classes.profiling.SourceRef _0)
2022-05-16T18:01:28.3836869Z processing existing schema:  line_map(__torch__.torch.classes.profiling.SourceStats _0) -> (Dict(int, __torch__.torch.classes.profiling.InstructionStats) _0)
2022-05-16T18:01:28.3837924Z processing existing schema:  __init__(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-05-16T18:01:28.3839463Z processing existing schema:  enable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-05-16T18:01:28.3840449Z processing existing schema:  disable(__torch__.torch.classes.profiling._ScriptProfile _0) -> (NoneType _0)
2022-05-16T18:01:28.3842638Z processing existing schema:  _dump_stats(__torch__.torch.classes.profiling._ScriptProfile _0) -> (__torch__.torch.classes.profiling.SourceStats[] _0)
2022-05-16T18:01:28.3844214Z processing existing schema:  __init__(__torch__.torch.classes.dist_rpc.WorkerInfo _0, str _1, int _2) -> (NoneType _0)
2022-05-16T18:01:28.3844642Z The PR is introducing backward incompatible changes to the operator library. Please contact PyTorch team to confirm whether this change is wanted or not. 
2022-05-16T18:01:28.3844653Z 
2022-05-16T18:01:28.3845577Z Broken ops: [
2022-05-16T18:01:28.3845747Z 	aten::lift(Tensor self) -> (Tensor)
2022-05-16T18:01:28.3845933Z 	aten::ccol_indices(Tensor(a) self) -> (Tensor(a))
2022-05-16T18:01:28.3846100Z 	aten::ccol_indices_copy(Tensor self) -> (Tensor)
2022-05-16T18:01:28.3846391Z 	aten::index_reduce(Tensor self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True) -> (Tensor)
2022-05-16T18:01:28.3846723Z 	aten::index_reduce.out(Tensor self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True, Tensor(a!) out) -> (Tensor(a!))
2022-05-16T18:01:28.3847026Z 	aten::index_reduce_(Tensor(a!) self, int dim, Tensor index, Tensor source, str reduce, *, bool include_self=True) -> (Tensor(a!))
2022-05-16T18:01:28.3847223Z 	aten::glu_jvp(Tensor glu, Tensor x, Tensor dx, int dim) -> (Tensor)
2022-05-16T18:01:28.3847486Z 	aten::_sparse_addmm(Tensor self, Tensor mat1, Tensor mat2, *, Scalar beta=1, Scalar alpha=1) -> (Tensor)

See GitHub Actions build pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu) (4/4)

Step: "Upload test artifacts"

2022-05-16T20:03:15.5220528Z   test_data_parallel_model_device (__main__.TestDataParallel)
2022-05-16T20:03:15.5532726Z Test device[0] check at forward time. ... ok (0.036s)
2022-05-16T20:03:15.6018930Z   test_data_parallel_model_no_refcycles (__main__.TestDataParallel) ... ok (0.048s)
2022-05-16T20:03:15.6069413Z   test_data_parallel_module_zero_inputs (__main__.TestDataParallel) ... ok (0.005s)
2022-05-16T20:03:15.6134068Z   test_data_parallel_multiple_input (__main__.TestDataParallel) ... /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/comm.py:232: UserWarning: Using -1 to represent CPU tensor is deprecated. Please use a device object or string instead, e.g., "cpu".
2022-05-16T20:03:15.6134810Z   'Using -1 to represent CPU tensor is deprecated. Please use a '
2022-05-16T20:03:15.6307357Z ok (0.024s)
2022-05-16T20:03:15.6338980Z   test_data_parallel_nested_input (__main__.TestDataParallel) ... ok (0.003s)
2022-05-16T20:03:15.6406395Z   test_data_parallel_nested_output (__main__.TestDataParallel) ... ok (0.007s)
2022-05-16T20:03:15.6449246Z   test_data_parallel_no_grad (__main__.TestDataParallel) ... ok (0.004s)
2022-05-16T20:03:16.1687246Z   test_data_parallel_rnn (__main__.TestDataParallel) ... Could not load symbol cublasGetSmCountTarget from libcublas.so.11. Error: /usr/local/cuda/lib64/libcublas.so.11: undefined symbol: cublasGetSmCountTarget
2022-05-16T20:03:16.6736076Z ok (1.028s)
2022-05-16T20:03:16.6771315Z   test_data_parallel_small_back (__main__.TestDataParallel) ... ok (0.004s)
2022-05-16T20:03:16.6896305Z   test_data_parallel_sparse (__main__.TestDataParallel) ... ok (0.012s)
2022-05-16T20:03:16.7142864Z   test_gather_cpu (__main__.TestDataParallel) ... /opt/conda/lib/python3.7/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.
2022-05-16T20:03:16.7143680Z   warnings.warn('Was asked to gather along dimension 0, but all '
2022-05-16T20:03:16.7379797Z ok (0.048s)
2022-05-16T20:03:16.7392818Z   test_gather_different_len_dicts (__main__.TestDataParallel) ... ok (0.001s)
2022-05-16T20:03:16.7872922Z   test_gather_gpu (__main__.TestDataParallel) ... ok (0.048s)
2022-05-16T20:03:16.7930135Z   test_parallel_apply (__main__.TestDataParallel) ... ok (0.006s)
2022-05-16T20:03:16.7991024Z   test_parallel_apply_autocast (__main__.TestDataParallel) ... ok (0.006s)

This comment was automatically generated by Dr. CI.

eellison pushed a commit that referenced this pull request May 16, 2022
ghstack-source-id: 81bc2c7
Pull Request resolved: #77566
@facebook-github-bot facebook-github-bot added the oncall: jit label May 16, 2022
@eellison eellison requested a review from davidberard98 May 16, 2022 17:58
@davidberard98 davidberard98 left a comment

had some small comments, otherwise looks good


// if invoked on a graph that has already traced through amp
// don't invoke amp pass
mutable bool force_no_amp_ = false;
Contributor


does this need to be mutable?

Contributor Author


I think so; otherwise all the const stuff won't compile.

"name",
[](const StrongFunctionPtr& self) { return self.function_->name(); })
.def(
"_set_ignore_amp",
Contributor


do we need anything like this for modules?

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, since the use case here is just AOT autograd.

@eellison
Contributor Author

@pytorchbot merge this please

@github-actions
Contributor

Hey @eellison.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

facebook-github-bot pushed a commit that referenced this pull request May 20, 2022
Summary:
Pull Request resolved: #77566

Approved by: https://github.com/anijain2305, https://github.com/davidberard98

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/05ce0f9be63dd6fadd2fb40c29f8f867f267002b

Reviewed By: seemethere

Differential Revision: D36494147

Pulled By: seemethere

fbshipit-source-id: c09a25d1b606e54646e5d12a6c961f91f26b215e
@facebook-github-bot facebook-github-bot deleted the gh/eellison/289/head branch May 22, 2022 14:17

Labels: cla signed, Merged, oncall: jit