DISABLED test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16 (__main__.TestSelectAlgorithmCudaCUDA) #167814

@pytorch-bot

Description

Platforms: rocm

This test was disabled because it is failing in CI. See recent examples and the most recent trunk workflow logs.

Over the past 6 hours, it has been determined flaky in 3 workflow(s) with 3 failures and 3 successes.

Debugging instructions (after clicking on the recent samples link):
DO NOT ASSUME THINGS ARE OKAY IF THE CI IS GREEN. Flaky tests are now shielded from developers, so CI will be green even when the test fails; this also makes the relevant log snippets harder to find.
To find relevant log snippets:

  1. Click on the workflow logs linked above
  2. Click on the Test step of the job so that it is expanded. Otherwise, the grepping will not work.
  3. Grep for test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16
  4. There should be several runs of the test (flaky tests are rerun in CI) whose logs you can study.
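Step 3 above can be sketched as follows. This is an illustrative example only: the filename raw-log.txt and the log contents are made up here so the snippet is self-contained; in practice you would first download the raw job log from the workflow run linked above.

```shell
# Create a stand-in log file (illustrative contents, not real CI output).
printf '%s\n' \
  'Test step begins' \
  'FAILED test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16' \
  'RERUN passed' > raw-log.txt

# Grep the expanded Test-step log for the failing test name, with line numbers.
grep -n 'test_int8_woq_mm_cuda_batch_size_17' raw-log.txt
```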
Sample error message
Traceback (most recent call last):
  File "/var/lib/jenkins/pytorch/test/inductor/test_cuda_select_algorithm.py", line 129, in test_int8_woq_mm_cuda
    self.common(mod, (x, w_scales))
  File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 510, in check_model
    actual = run(*example_inputs, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 934, in compile_wrapper
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 920, in compile_wrapper
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 2083, in __call__
    result = self._torchdynamo_orig_backend(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 703, in __call__
    result = _compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1640, in _compile
    guarded_code, tracer_output = compile_inner(code, one_graph, hooks)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_utils_internal.py", line 97, in wrapper_function
    return function(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1323, in compile_inner
    return _compile_inner(code, one_graph, hooks)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1357, in _compile_inner
    dynamo_output = compile_frame(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1239, in compile_frame
    bytecode, tracer_output = transform_code_object(code, transform)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/bytecode_transformation.py", line 1600, in transform_code_object
    tracer_output = transformations(instructions, code_options)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 1211, in transform
    tracer_output = trace_frame(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 325, in _fn
    return fn(*args, **kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 834, in trace_frame
    run_tracer()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/convert_frame.py", line 815, in run_tracer
    tracer.run()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1637, in run
    while self.step():
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 1317, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 4655, in RETURN_VALUE
    self._return(inst)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/symbolic_convert.py", line 4637, in _return
    all_stack_locals_metadata = self.output.compile_subgraph(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 1726, in compile_subgraph
    self.compile_and_call_fx_graph(tx, pass2.graph_output_vars(), root)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 2247, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm, self.example_inputs())
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 2370, in call_user_compiler
    return self._call_user_compiler(gm, example_inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/output_graph.py", line 2405, in _call_user_compiler
    compiled_fn = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_dynamo.py", line 156, in __call__
    compiled_gm = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/__init__.py", line 2470, in __call__
    return self.compiler_fn(model_, inputs_, **self.kwargs)
  File "/var/lib/jenkins/pytorch/test/inductor/test_torchinductor.py", line 502, in compile_fx_wrapper
    return compile_fx(model_, example_inputs_)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2523, in compile_fx
    return _maybe_wrap_and_compile_fx_main(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2597, in _maybe_wrap_and_compile_fx_main
    return _compile_fx_main(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2806, in _compile_fx_main
    raise e.remove_dynamo_frames() from None  # see TORCHDYNAMO_VERBOSE=1
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2792, in _compile_fx_main
    return aot_autograd(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/backends/common.py", line 117, in __call__
    cg = aot_module_simplified(gm, example_inputs, **self.kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/aot_autograd.py", line 1119, in aot_module_simplified
    compiled_fn, _ = aot_stage2_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 350, in aot_stage2_compile
    return aot_stage2_inference(aot_state, aot_graph_capture)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 424, in aot_stage2_inference
    compiled_fw = _aot_stage2b_inference_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 393, in _aot_stage2b_inference_compile
    return _aot_stage2b_compile_forward_or_inference(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_functorch/_aot_autograd/graph_compile.py", line 2293, in _aot_stage2b_compile_forward_or_inference
    compiled_fw_func = compiler(fw_module, adjusted_flat_args)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 2093, in fw_compiler_freezing
    optimized_function = inner_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 803, in compile_fx_inner
    return wrap_compiler_debug(_compile_fx_inner, compiler_name="inductor")(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_dynamo/repro/after_aot.py", line 146, in debug_wrapper
    inner_compiled_fn = compiler_fn(gm, example_inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1016, in _compile_fx_inner
    raise InductorError(e, currentframe()).with_traceback(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1000, in _compile_fx_inner
    mb_compiled_graph = fx_codegen_and_compile(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1752, in fx_codegen_and_compile
    return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/compile_fx.py", line 1534, in codegen_and_compile
    compiled_module = graph.compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2378, in compile_to_module
    return self._compile_to_module()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2384, in _compile_to_module
    self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2319, in codegen
    self._update_scheduler()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/graph.py", line 2313, in _update_scheduler
    self.scheduler = Scheduler(self.operations)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2589, in __init__
    self._init(nodes)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 2682, in _init
    self.finalize_multi_template_buffers()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/scheduler.py", line 3466, in finalize_multi_template_buffers
    min_node_unfused, _ = multi_node.get_min_choice()
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ir.py", line 5345, in get_min_choice
    timings = self.choice_timings(hint_override=hint_override)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/ir.py", line 5317, in choice_timings
    self._choice_timings[hint_override] = self._choice_timings_fn(hint_override)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2763, in get_timings
    timings = self.do_autotuning(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 3024, in do_autotuning
    timings = self.lookup(
  File "/var/lib/jenkins/pytorch/test/inductor/test_cuda_select_algorithm.py", line 46, in skip_cache
    timings = benchmark(choices)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 3015, in track_has_autotuned
    return self.autotune(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2888, in autotune
    benchmark_results = self.benchmark(
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 2869, in benchmark
    return benchmark_fn(choices)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 3477, in benchmark_in_current_process
    return cls.benchmark_choices(choices, inputs)
  File "/opt/conda/envs/py_3.10/lib/python3.10/site-packages/torch/_inductor/select_algorithm.py", line 3446, in benchmark_choices
    raise AssertionError(  # noqa: B904
torch._inductor.exc.InductorError: AssertionError: Incorrect result from choice SubgraphCaller(contiguous_mm_0)

Tensor-likes are not close!

Mismatched elements: 1536 / 1536 (100.0%)
Greatest absolute difference: 1112.0 at index (14, 14) (up to 0.0001 allowed)
Greatest relative difference: 1.0 at index (0, 0) (up to 0.0001 allowed)
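The "Tensor-likes are not close!" report above comes from an elementwise tolerance comparison. A minimal pure-Python sketch of the usual |actual - expected| <= atol + rtol * |expected| rule, matching the 0.0001 tolerances printed above (the function name check_close is illustrative, not the PyTorch API):

```python
def check_close(actual, expected, rtol=1e-4, atol=1e-4):
    """Return (index, abs_diff) for every element outside tolerance."""
    mismatches = []
    for i, (a, e) in enumerate(zip(actual, expected)):
        # An element passes if its absolute error fits within atol plus a
        # relative allowance scaled by the expected value's magnitude.
        if abs(a - e) > atol + rtol * abs(e):
            mismatches.append((i, abs(a - e)))
    return mismatches

print(check_close([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # one mismatch at index 2
```

A 100% mismatch rate like the one reported above (1536 / 1536 elements) usually points at a genuinely wrong kernel result rather than borderline numerics.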


To execute this test, run the following from the base repo dir:
    PYTORCH_TEST_WITH_ROCM=1 python test/inductor/test_cuda_select_algorithm.py TestSelectAlgorithmCudaCUDA.test_int8_woq_mm_cuda_batch_size_17_mid_dim_1_in_features_1024_out_features_64_cuda_bfloat16

This message can be suppressed by setting PYTORCH_PRINT_REPRO_ON_FAILURE=0

Test file path: inductor/test_cuda_select_algorithm.py
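The failing test autotunes an int8 weight-only-quantized (WoQ) matmul: weights are stored as int8 with per-channel scales and dequantized at use. A minimal pure-Python sketch of that idea, under stated assumptions (helper names, shapes, and symmetric per-row quantization are illustrative; the actual test uses torch with bfloat16 activations):

```python
def quantize_per_channel(w):
    """Quantize each output-channel row of w to int8 with its own scale."""
    qw, scales = [], []
    for row in w:
        scale = max(abs(v) for v in row) / 127 or 1.0  # avoid zero scale
        qw.append([round(v / scale) for v in row])
        scales.append(scale)
    return qw, scales

def woq_mm(x, qw, scales):
    """y[i][j] = sum_k x[i][k] * qw[j][k], dequantized by scales[j] at use."""
    return [
        [sum(xi[k] * qw[j][k] for k in range(len(xi))) * scales[j]
         for j in range(len(qw))]
        for xi in x
    ]

w = [[0.5, -1.0], [2.0, 0.25]]   # 2 output channels, in_features=2
qw, scales = quantize_per_channel(w)
x = [[1.0, 1.0]]                 # batch of 1
print(woq_mm(x, qw, scales))     # close to [[-0.5, 2.25]], up to int8 rounding
```

The autotuner benchmarks several kernel choices for this pattern and compares each candidate's output against a reference; the AssertionError above is raised when the contiguous_mm_0 subgraph choice produces a result outside tolerance.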

For all disabled tests (by GitHub issue), see https://hud.pytorch.org/disabled.

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @clee2000 @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov @coconutruben @dllehr-amd

    Labels

    module: flaky-tests (Problem is a flaky test in CI)
    module: inductor
    module: rocm (AMD GPU support for Pytorch)
    oncall: pt2
    rocm-skipped-tests (tests skipped by rocm)
    skipped (Denotes a (flaky) test currently skipped in CI)
    triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
