
Integrate torch-mlir@ec6d7aa onnx.resize op #17358

Merged: 1 commit, May 14, 2024

Conversation

AmosLewis
Contributor

@AmosLewis AmosLewis commented May 12, 2024

Unsolved IREE issue:
ONNX "resize" op test failures #17345

One torch-mlir commit before:
Integrate both llvm-project@2083e97e (+1 ↩️, +1 🍒) and torch-mlir@bce800a3 #17330

Discord discussion:
https://discord.com/channels/973663919757492264/1238540944383541330

related torch-mlir onnx.resize patch: llvm/torch-mlir#3013, author: https://github.com/aldesilv

@ScottTodd
Member

Please follow https://iree.dev/developers/general/contributing/#obtaining-commit-access to get at least triage access to this repository so workflows can run without approval.


@ScottTodd
Member

Updating the XFAIL lists here is going to be a bit bumpy, since I've had to turn off the main runners (#17370) and there is a new CUDA hang.

Can you at least sync this PR to include the newly disabled jobs?

@ScottTodd
Member

Pushed a commit syncing this PR after a few of my CI fixes landed. Hopefully that will show the new test outcomes (passes/failures) and timeouts. We'll have to update the ROCm tests later, once the w7900 runner is back online and stable.

@ScottTodd
Member

Ok, the tests that hang can be spotted easily now.
Logs from this PR: https://github.com/iree-org/iree/actions/runs/9071387883/job/24925163523?pr=17358#step:9:3466

Note the `Failed: Timeout >30.0s` lines:

PASSED SHARK-TestSuite/iree_tests/onnx/node/generated/test_xor_bcast4v4d/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_half_pixel_symmetric/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_sizes_linear_pytorch_half_pixel/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_scales_linear_half_pixel_symmetric/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_upsample_sizes_nearest_floor_align_corners/model.mlir::gpu_cuda_t4_test
============ 8 failed, 581 passed, 643 xfailed in 292.09s (0:04:52) ============
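Timed-out tests can be separated from ordinary failures mechanically when triaging a log like the one above. A minimal sketch (the two log lines are copied from the output above; the parsing logic itself is an assumption, not part of the CI tooling):

```python
# Split CI FAILED lines into hang/timeout cases vs ordinary failures, so
# the timeouts can be routed to skip_run_tests instead of
# expected_run_failures. Sample lines taken from the pytest output above.
log = """\
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear/model.mlir::gpu_cuda_t4_test
FAILED SHARK-TestSuite/iree_tests/onnx/node/generated/test_resize_downsample_scales_linear_align_corners/model.mlir::gpu_cuda_t4_test - Failed: Timeout >30.0s
"""

timeouts, failures = [], []
for line in log.splitlines():
    if not line.startswith("FAILED "):
        continue
    test_id = line.split()[1]  # second token is the test path/id
    (timeouts if "Failed: Timeout" in line else failures).append(test_id)

print(len(timeouts), len(failures))  # 1 1
```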

So to update xfail lists:

  1. Download .json files from the summary page: https://github.com/iree-org/iree/actions/runs/9071387883?pr=17358
  2. Move those files to https://github.com/iree-org/iree/tree/main/build_tools/pkgci/external_test_suite
  3. **New:** edit those files manually (just the CUDA one in this case), putting any timed-out tests in skip_run_tests, not expected_run_failures
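Step 3 above can be sketched as a small script. The key names skip_run_tests and expected_run_failures come from the comment above; the exact schema of onnx_gpu_cuda.json and the test identifiers below are assumptions for illustration:

```python
import json

# Hypothetical test ids for illustration only.
TIMEOUT_TESTS = [
    "onnx/node/generated/test_resize_downsample_scales_linear_align_corners",
    "onnx/node/generated/test_resize_downsample_scales_linear_half_pixel_symmetric",
]

def route_timeouts(config: dict, timeouts: list) -> dict:
    """Move timed-out tests out of expected_run_failures and into
    skip_run_tests, so they are never run (a hang would otherwise stall CI)."""
    skip = set(config.get("skip_run_tests", []))
    xfail = [t for t in config.get("expected_run_failures", []) if t not in timeouts]
    skip.update(timeouts)
    config["skip_run_tests"] = sorted(skip)
    config["expected_run_failures"] = xfail
    return config

# Example config standing in for onnx_gpu_cuda.json.
cfg = {"skip_run_tests": [], "expected_run_failures": TIMEOUT_TESTS + ["some_other_test"]}
print(json.dumps(route_timeouts(cfg, TIMEOUT_TESTS), indent=2))
```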

 - Update skip_run_tests in onnx_gpu_cuda.json to skip the hanging CUDA resize tests
 - Mark most of the CPU resize tests XFAIL
@AmosLewis AmosLewis marked this pull request as ready for review May 14, 2024 20:04
@AmosLewis AmosLewis requested a review from ScottTodd as a code owner May 14, 2024 20:04