[torch-mlir] bump to llvm/llvm-project@9b78ddf3b2abfb3e #3491

aartbik · 2024-06-22T01:49:23Z

This bump triggered an upstream assert. Includes a WAR for #3506.

Also includes several things I needed to do to repro:

When TORCH_MLIR_TEST_CONCURRENCY=1, test runs will be printed.
Added TORCH_MLIR_TEST_VERBOSE=1 handling to enable verbose mode (useful on CI).
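
For readers unfamiliar with the two variables, here is a minimal sketch of how a test driver might interpret them. Only the variable names come from this PR; the helper `read_test_env` and its return shape are hypothetical, and the actual handling in torch-mlir may differ.

```python
import os


def read_test_env(environ=None):
    """Parse the two test-related variables (hypothetical helper, not torch-mlir code).

    Returns (concurrency, verbose, print_runs).
    """
    env = os.environ if environ is None else environ

    raw = env.get("TORCH_MLIR_TEST_CONCURRENCY")
    # Leave concurrency unset (None) when the variable is absent or empty.
    concurrency = int(raw) if raw else None

    # Verbose mode is enabled only by the exact value "1".
    verbose = env.get("TORCH_MLIR_TEST_VERBOSE") == "1"

    # With a single worker, output is not interleaved, so print each test run.
    print_runs = concurrency == 1
    return concurrency, verbose, print_runs


if __name__ == "__main__":
    print(read_test_env())
```

Passing a plain dict instead of `os.environ` keeps the sketch easy to exercise without touching the real environment.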

Max191 · 2024-06-27T16:15:31Z

@aartbik @PeimingLiu is there any progress on this bump? Do you need someone to pick it up?

PeimingLiu · 2024-06-27T16:32:16Z

> @aartbik @PeimingLiu is there any progress on this bump? Do you need someone to pick it up?

Yes, please! I cannot reproduce the error locally.

stellaraccident · 2024-06-27 (review)

lgtm. retriggered CI. Looked like maybe infra flake.

stellaraccident · 2024-06-27T18:39:38Z

Not an infra issue but some crash in the TOSA test suite. Darn - will require triage.

stellaraccident · 2024-06-28T00:48:33Z

Narrowed down to ViewDynamicExpandCollapseWithParallelUnknownDimModule_basic emitting an error. A new assert was added in LLVM to ensure all errors were handled.

stellaraccident · 2024-06-28T02:11:32Z

Debugging instructions:

$ sudo apt install python3-dbg
$ gdb --args python -m e2e_testing.main --config tosa --filter ViewDynamicExpandCollapseWithParallelUnknownDimModule_basic

... assert ...
(gdb) bt
(gdb) py-bt

Native and Python stack:

#5  0x00007ffff7c2881b in __assert_fail_base (fmt=0x7ffff7dd01e8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=assertion@entry=0x7fff3386b029 "errors.empty() && \"unhandled captured errors\"",
    file=file@entry=0x7fff3386cbea "/home/stella/src/torch-mlir/externals/llvm-project/mlir/lib/Bindings/Python/IRModule.h", line=line@entry=434,
    function=function@entry=0x7fff3386155e "mlir::python::PyMlirContext::ErrorCapture::~ErrorCapture()") at ./assert/assert.c:94
#6  0x00007ffff7c3b507 in __assert_fail (assertion=0x7fff3386b029 "errors.empty() && \"unhandled captured errors\"",
    file=0x7fff3386cbea "/home/stella/src/torch-mlir/externals/llvm-project/mlir/lib/Bindings/Python/IRModule.h", line=434,
    function=0x7fff3386155e "mlir::python::PyMlirContext::ErrorCapture::~ErrorCapture()") at ./assert/assert.c:103

Traceback (most recent call first):
  <built-in method run of PyCapsule object at remote 0x7fff9c77e3a0>
  File "/home/stella/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir/compiler_utils.py", line 47, in run_pipeline_with_repro_report
    pm.run(module.operation)
  File "/home/stella/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/linalg_on_tensors_backends/refbackend.py", line 227, in compile
    run_pipeline_with_repro_report(
  File "/home/stella/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/tosa_backends/linalg_on_tensors.py", line 70, in compile
    return self.refbackend.compile(imported_module)
  File "/home/stella/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/configs/tosa_backend.py", line 42, in compile
    return self.backend.compile(module)
  File "/home/stella/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/framework.py", line 313, in compile_and_run_test
    compiled = config.compile(test.program_factory(), verbose=verbose)
  File "/home/stella/src/torch-mlir/build/tools/torch-mlir/python_packages/torch_mlir/torch_mlir_e2e_test/framework.py", line 390, in run_tests
    compile_and_run_test(test, config, verbose)
  File "/home/stella/src/torch-mlir/projects/pt1/e2e_testing/main.py", line 231, in main
    results = run_tests(tests, config, args.sequential, args.verbose)
  File "/home/stella/src/torch-mlir/projects/pt1/e2e_testing/main.py", line 258, in <module>
    main()

Isolated to faulty pass error-handling code. I think this is masking a legitimate bug, but it was causing a test that was supposed to just XFAIL to crash the test framework on a native assert. Added a Python-level WAR until I can land a proper fix upstream (which will just issue a warning when diagnostics are dropped).
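
To illustrate the behavior described for the upstream fix (warn when captured diagnostics are dropped, rather than assert), here is a toy sketch in plain Python. The `ErrorCapture` class below is a hypothetical stand-in written for this example; it is not the MLIR `PyMlirContext::ErrorCapture` class and does not reflect the actual workaround committed in this PR.

```python
import warnings


class ErrorCapture:
    """Toy diagnostic-capture scope (hypothetical; not the MLIR binding class)."""

    def __init__(self):
        self.errors = []

    def record(self, message):
        # Collect a diagnostic emitted while the scope is active.
        self.errors.append(message)

    def take(self):
        # Hand the captured diagnostics to the caller, marking them as handled.
        errors, self.errors = self.errors, []
        return errors

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leftover diagnostics were never handled: warn instead of asserting,
        # so an expected failure cannot take down the whole test process.
        if self.errors:
            warnings.warn(
                f"{len(self.errors)} captured diagnostic(s) were dropped: {self.errors}",
                RuntimeWarning,
            )
            self.errors = []
        return False  # never swallow exceptions raised inside the scope


if __name__ == "__main__":
    with ErrorCapture() as capture:
        capture.record("example pass failure")
```

The design point is that an unhandled diagnostic is a reporting problem, not a reason to abort: a test that is expected to fail can still be recorded as XFAIL while the dropped message remains visible as a warning.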

aartbik · 2024-06-28T17:23:05Z

Thanks for your help with this!

aartbik added 2 commits June 21, 2024 15:17

[torch-mlir] bump to llvm/llvm-project@9b78ddf3b2abfb3e

01abf87

fixes in torch-mlir required for the bump

dd6fcc2

aartbik requested a review from PeimingLiu June 22, 2024 01:49

aartbik added 2 commits June 22, 2024 14:11

Merge branch 'llvm:main' into bik

682ecd1

Merge branch 'llvm:main' into bik

058d930

Max191 mentioned this pull request Jun 27, 2024

Integrate llvm-project @34e34a03ac83b51e90f8788945f9668446e468f8 iree-org/iree#17754

Closed

stellaraccident self-requested a review June 27, 2024 18:09

stellaraccident approved these changes Jun 27, 2024

Work around upstream bug about pass error failures.

00552c2

stellaraccident added 2 commits June 27, 2024 19:14

Remove stray print

63926ef

Remove onnx bump

16b6005

stellaraccident merged commit 1f73895 into llvm:main Jun 28, 2024
3 checks passed

aartbik deleted the bik branch June 28, 2024 17:23
