[llvm][tools] Improve llvm-gpu-loader checks #184791

Merged
adurang merged 2 commits into llvm:main from adurang:gpu-loader-checks
Mar 9, 2026

@adurang adurang commented Mar 5, 2026

When the file format is incorrect, or the platform or devices are not properly initialized, llvm-gpu-loader follows corrupt pointers, which results in hard-to-debug crashes.

This improves the checks to avoid such situations.
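The kind of up-front check described above can be sketched as follows. This is a minimal, self-contained illustration and not the code from this PR: `ImageKind` and `classifyImage` are hypothetical names, and the sketch only looks at the leading magic bytes (the ELF magic `\x7fELF` and the raw LLVM bitcode magic `BC\xC0\xDE`) after confirming the buffer is large enough to hold them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical classification of a device image; not an LLVM type.
enum class ImageKind { Elf, Bitcode, Unknown };

// Classify an image by its first four bytes. The size check comes first so
// that a truncated or empty file is rejected instead of being read past its end.
inline ImageKind classifyImage(const std::vector<std::uint8_t> &Image) {
  static constexpr std::uint8_t ElfMagic[4] = {0x7f, 'E', 'L', 'F'};
  static constexpr std::uint8_t BitcodeMagic[4] = {'B', 'C', 0xC0, 0xDE};

  if (Image.size() < 4)
    return ImageKind::Unknown;

  bool IsElf = true;
  bool IsBitcode = true;
  for (std::size_t I = 0; I < 4; ++I) {
    IsElf = IsElf && Image[I] == ElfMagic[I];
    IsBitcode = IsBitcode && Image[I] == BitcodeMagic[I];
  }

  if (IsElf)
    return ImageKind::Elf;
  if (IsBitcode)
    return ImageKind::Bitcode;
  return ImageKind::Unknown;
}
```

A caller would bail out with a diagnostic on `ImageKind::Unknown` before interpreting any header field of the image.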

@adurang adurang requested a review from jhuber6 March 5, 2026 13:17
}
InitArgs.NumPlatforms = 1;
InitArgs.Platforms = &Backend;
} else {
Contributor

This was an intentional fall-through. When we can't determine the backend from the ELF flags, we let olInit initialize every architecture.

Contributor Author

I'm not sure I see the benefit of doing that. The only other format we support right now is bitcode, and we could add a check for it here. I think it's better to error out here with a meaningful message than to fail later with a "no device found" error that was really caused by an unsupported format.
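A minimal sketch of the early-error approach argued for in this comment, under stated assumptions: `Format` and `validateFormat` are hypothetical names, the message text is illustrative, and this is not what the PR ended up doing.

```cpp
#include <optional>
#include <string>

// Hypothetical result of inspecting the input file; not an LLVM type.
enum class Format { ElfAmdgpu, ElfCuda, Bitcode, Unsupported };

// Returns an error message for formats the loader cannot handle, or
// std::nullopt when initialization can proceed.
inline std::optional<std::string> validateFormat(Format F) {
  if (F == Format::Unsupported)
    return std::string(
        "unsupported image format: expected a GPU ELF or a bitcode file");
  return std::nullopt;
}
```

The point of the design is that the user sees the format problem at the place it is detected, rather than a later "no device found" error.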

Contributor

This was an intentional fall-through. When we can't determine the backend from the ELF flags, we let olInit initialize every plugin. If it fails to find a compatible device, that will fail later.
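The fall-through described in this comment can be sketched as a selection function that returns "no specific backend" when the ELF machine is not recognized. `Backend` and `backendForMachine` are hypothetical names used only for illustration; the constants are the standard ELF `e_machine` values for `EM_CUDA` (190) and `EM_AMDGPU` (224).

```cpp
#include <cstdint>
#include <optional>

// Standard ELF e_machine values, named locally to keep the sketch standalone.
constexpr std::uint16_t MachineCuda = 190;   // EM_CUDA
constexpr std::uint16_t MachineAmdgpu = 224; // EM_AMDGPU

// Hypothetical backend identifier; not the liboffload type.
enum class Backend { Amdgpu, Cuda };

// Returns the single backend to initialize, or std::nullopt to signal that
// every available plugin should be initialized (the intentional fall-through).
inline std::optional<Backend> backendForMachine(std::uint16_t EMachine) {
  switch (EMachine) {
  case MachineAmdgpu:
    return Backend::Amdgpu;
  case MachineCuda:
    return Backend::Cuda;
  default:
    return std::nullopt;
  }
}
```

With this shape, an unrecognized machine is not an error at selection time; any failure surfaces later if no compatible device is found.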

Contributor Author

OK, but by the same token, shouldn't we let ELFs that are not EM_AMDGPU or EM_CUDA fall through instead of erroring out?

Comment thread llvm/tools/llvm-gpu-loader/llvm-gpu-loader.cpp Outdated
@adurang adurang merged commit 3e3c3ab into llvm:main Mar 9, 2026
10 checks passed

llvm-ci commented Mar 9, 2026

LLVM Buildbot has detected a new failure on builder ppc64le-mlir-rhel-clang running on ppc64le-mlir-rhel-test while building llvm at step 3 "clean-build-dir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/129/builds/40418

Here is the relevant piece of the build log for reference:
Step 3 (clean-build-dir) failure: Delete failed. (failure) (timed out)
Step 6 (test-build-check-mlir-build-only-check-mlir) failure: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
...
PASS: MLIR :: Pass/pipeline-options-parsing.mlir (3932 of 3942)
PASS: MLIR-Unit :: IR/./MLIRIRTests/0/142 (3933 of 3942)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/11/22 (3934 of 3942)
PASS: MLIR :: mlir-reduce/simple-test.mlir (3935 of 3942)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/13/22 (3936 of 3942)
PASS: MLIR-Unit :: IR/./MLIRIRTests/37/142 (3937 of 3942)
PASS: MLIR-Unit :: Interfaces/./MLIRInterfacesTests/12/22 (3938 of 3942)
PASS: MLIR-Unit :: Pass/./MLIRPassTests/11/14 (3939 of 3942)
PASS: MLIR-Unit :: IR/./MLIRIRTests/38/142 (3940 of 3942)
PASS: MLIR :: mlir-reduce/dce-test.mlir (3941 of 3942)
command timed out: 1200 seconds without output running [b'ninja', b'check-mlir'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=2342.169189

3 participants