
DISABLED test_save_on_cpu_and_checkpoint (__main__.TestAutogradWithCompiledAutograd) #124706

Closed
pytorch-bot bot opened this issue Apr 23, 2024 · 6 comments
Assignees
Labels
module: flaky-tests Problem is a flaky test in CI module: inductor skipped Denotes a (flaky) test currently skipped in CI. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments


pytorch-bot bot commented Apr 23, 2024

Platforms: asan, linux, rocm, slow, mac, macos

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 3 hours, it has been determined flaky in 7 workflow(s) with 21 failures and 7 successes.

Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. We now shield flaky tests from developers, so CI will be green even while the test fails, and the failures will only be visible in the logs.
To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_save_on_cpu_and_checkpoint
  4. There should be several runs of the test (flaky tests are rerun in CI) whose logs you can study.
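As a sketch of steps 3–4: once the "Test" step log has been downloaded locally, the reruns can be located with an ordinary `grep -n test_save_on_cpu_and_checkpoint`, or equivalently in Python. The sample log text below is a stand-in, not real CI output:

```python
# Equivalent of `grep -n` over a downloaded CI log. The sample text is a
# stand-in; in practice, read the "Test" step log saved from the workflow run.
sample_log = (
    "collecting tests...\n"
    "FAILED test_save_on_cpu_and_checkpoint - CppCompileError\n"
)

def grep_n(text: str, needle: str) -> list[str]:
    # Mirror `grep -n` output: one "lineno:line" string per matching line.
    return [
        f"{i}:{line}"
        for i, line in enumerate(text.splitlines(), 1)
        if needle in line
    ]

print(grep_n(sample_log, "test_save_on_cpu_and_checkpoint"))
```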
Sample error message
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/test_autograd.py", line 9319, in test_save_on_cpu_and_checkpoint
    b.sum().backward()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_tensor.py", line 534, in backward
    torch.autograd.backward(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/autograd/__init__.py", line 267, in backward
    _engine_run_backward(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/autograd/graph.py", line 767, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/eval_frame.py", line 403, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/fx/graph_module.py", line 735, in call_wrapped
    return self._wrapped_call(self, *args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/fx/graph_module.py", line 315, in __call__
    raise e
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/fx/graph_module.py", line 302, in __call__
    return super(self.cls, obj).__call__(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 977, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 411, in _convert_frame_assert
    return _compile(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_utils_internal.py", line 70, in wrapper_function
    return function(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 700, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 568, in compile_inner
    out_code = transform_code_object(code, transform)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/bytecode_transformation.py", line 1116, in transform_code_object
    transformations(instructions, code_options)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 173, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/convert_frame.py", line 515, in transform
    tracer.run()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2237, in run
    super().run()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 875, in run
    while self.step():
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 790, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2394, in RETURN_VALUE
    self._return(inst)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/symbolic_convert.py", line 2379, in _return
    self.output.compile_subgraph(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1082, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1274, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1365, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/output_graph.py", line 1346, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/repro/after_dynamo.py", line 127, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/repro/after_dynamo.py", line 127, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/__init__.py", line 1781, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
  File "inductor/test_compiled_autograd.py", line 26, in inner_compiler
    return inductor.compile(gm_, example_inputs_)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/__init__.py", line 28, in compile
    return compile_fx(gm, example_inputs, config_patches=options)
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/compile_fx.py", line 1238, in compile_fx
    return flatten_graph_inputs(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 2665, in flatten_graph_inputs
    compiled_fn = compile_gm(GmWrapper(gm, spec), inputs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/compile_fx.py", line 1416, in compile_fx
    return aot_autograd(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/backends/common.py", line 65, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/aot_autograd.py", line 958, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/aot_autograd.py", line 685, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 470, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/_aot_autograd/runtime_wrappers.py", line 672, in aot_wrapper_synthetic_base
    return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_functorch/_aot_autograd/jit_compile_runtime_wrappers.py", line 149, in aot_dispatch_base
    compiled_fw = compiler(fw_module, updated_flat_args)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/compile_fx.py", line 1320, in fw_compiler_base
    return inner_compile(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/repro/after_aot.py", line 83, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/debug.py", line 304, in inner
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/compile_fx.py", line 477, in compile_fx_inner
    compiled_graph = fx_codegen_and_compile(
  File "/opt/conda/envs/py_3.8/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/compile_fx.py", line 752, in fx_codegen_and_compile
    compiled_fn = graph.compile_to_fn()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1541, in compile_to_fn
    return self.compile_to_module().call
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/graph.py", line 1488, in compile_to_module
    mod = PyCodeCache.load_by_key_path(
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 2405, in load_by_key_path
    exec(code, mod.__dict__, mod.__dict__)
  File "/tmp/torchinductor_jenkins/cq/ccqbo2v2fzggqm2mcork6qs4mubazscelegqext3q6j7igb5sfbn.py", line 64, in <module>
    async_compile.wait(globals())
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 3007, in wait
    scope[key] = result.result()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 2816, in result
    return self.result_fn()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 2285, in future
    result = get_result()
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 2122, in load_fn
    future.result()
  File "/opt/conda/envs/py_3.8/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/opt/conda/envs/py_3.8/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/opt/conda/envs/py_3.8/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 2147, in _worker_compile_cpp
    compile_file(input_path, output_path, shlex.split(cmd))
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_dynamo/utils.py", line 268, in time_wrapper
    r = func(*args, **kwargs)
  File "/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/_inductor/codecache.py", line 2018, in compile_file
    raise exc.CppCompileError(cmd, output) from e
torch._dynamo.exc.BackendCompilerFailed: backend='inner_compiler' raised:
CppCompileError: C++ compile error

Command:
g++ /tmp/torchinductor_jenkins/df/cdf7czvuqi3idj7frfalsuiok5mbky7rrnhudmjo3h4ugt23piae.cpp -shared -fPIC -Wall -std=c++17 -Wno-unused-variable -Wno-unknown-pragmas -D_GLIBCXX_USE_CXX11_ABI=1 -I/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/include -I/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/include/TH -I/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/include/THC -I/opt/conda/envs/py_3.8/include/python3.8 -L/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/lib -L/opt/conda/envs/py_3.8/lib -L/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/lib -ltorch -ltorch_cpu -lgomp -ltorch_python -lc10 -mavx2 -mfma -DCPU_CAPABILITY_AVX2 -O3 -DNDEBUG -ffast-math -fno-finite-math-only -fno-unsafe-math-optimizations -ffp-contract=off -march=native -fopenmp -D C10_USING_CUSTOM_GENERATED_MACROS -o /tmp/torchinductor_jenkins/df/cdf7czvuqi3idj7frfalsuiok5mbky7rrnhudmjo3h4ugt23piae.so

Output:
/tmp/torchinductor_jenkins/df/cdf7czvuqi3idj7frfalsuiok5mbky7rrnhudmjo3h4ugt23piae.cpp:2:10: fatal error: /tmp/tmpkxa9d9lq/z2/cz2uvkefmshwlhxxsbghzvp6zv66yqdenm36rax6nft66odb4erj.h: No such file or directory
    2 | #include "/tmp/tmpkxa9d9lq/z2/cz2uvkefmshwlhxxsbghzvp6zv66yqdenm36rax6nft66odb4erj.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.


Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True


To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_ROCM=1 python test/test_autograd.py -k test_save_on_cpu_and_checkpoint

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

Test file path: inductor/test_compiled_autograd.py

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang @clee2000 @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

@pytorch-bot pytorch-bot bot added module: flaky-tests Problem is a flaky test in CI module: inductor module: rocm AMD GPU support for Pytorch skipped Denotes a (flaky) test currently skipped in CI. triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module labels Apr 23, 2024

pytorch-bot bot commented Apr 23, 2024

Hello there! From the DISABLED prefix in this issue title, it looks like you are attempting to disable a test in PyTorch CI. The information I have parsed is below:
  • Test name: test_save_on_cpu_and_checkpoint (__main__.TestAutogradWithCompiledAutograd)
  • Platforms for which to skip the test: asan, linux, mac, macos, rocm, slow
  • Disabled by pytorch-bot[bot]

Within ~15 minutes, test_save_on_cpu_and_checkpoint (__main__.TestAutogradWithCompiledAutograd) will be disabled in PyTorch CI for these platforms: asan, linux, mac, macos, rocm, slow. Please verify that your test name looks correct, e.g., test_cuda_assert_async (__main__.TestCuda).

To modify the platforms list, please include a line in the issue body, like below. The default action will disable the test for all platforms if no platforms list is specified.

Platforms: case-insensitive, list, of, platforms

We currently support the following platforms: asan, dynamo, inductor, linux, mac, macos, rocm, slow, win, windows.
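The bot parses the platforms line case-insensitively and ignores entries it does not recognize. A minimal sketch of how such a line could be parsed (a hypothetical helper for illustration, not the bot's actual code):

```python
# Hypothetical parser for a "Platforms: ..." line in an issue body.
# Matching is case-insensitive; unsupported entries are dropped.
SUPPORTED = {
    "asan", "dynamo", "inductor", "linux", "mac",
    "macos", "rocm", "slow", "win", "windows",
}

def parse_platforms(line: str) -> list[str]:
    # Split on the first ":" and on commas, normalize to lowercase.
    _, _, rest = line.partition(":")
    names = [p.strip().lower() for p in rest.split(",") if p.strip()]
    return sorted(n for n in names if n in SUPPORTED)
```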


pytorch-bot bot commented Apr 23, 2024

Another case of trunk flakiness has been found here. The list of platforms [rocm] does not appear to contain all the recently affected platforms [asan, linux]. Adding [asan, linux].

@pytorch-bot pytorch-bot bot removed the module: rocm AMD GPU support for Pytorch label Apr 23, 2024

pytorch-bot bot commented Apr 23, 2024

Another case of trunk flakiness has been found here. The list of platforms [asan, linux, rocm] does not appear to contain all the recently affected platforms [asan, linux, slow]. Adding [slow].


pytorch-bot bot commented Apr 23, 2024

Another case of trunk flakiness has been found here. The list of platforms [asan, linux, rocm, slow] appears to contain all the recently affected platforms [asan, linux]. Either the change didn't propagate fast enough or the disable bot might be broken.


pytorch-bot bot commented May 10, 2024

Another case of trunk flakiness has been found here. The list of platforms [asan, linux, rocm, slow] does not appear to contain all the recently affected platforms [mac, macos]. Adding [mac, macos].

@xmfan xmfan self-assigned this May 17, 2024

xmfan commented May 17, 2024

#126146

@xmfan xmfan closed this as completed May 17, 2024