Error running Singularity: fatal error involving no loop devices available #1824
Comments
Please upgrade to a currently supported release (3.11.4). The issue is likely related to a kernel change, backported to LTS and distro kernels this year, for which a fix has already been implemented. A workaround is mentioned in the original issue thread: If upgrading to 3.11.4 does not fix the issue, you're welcome to re-open. Thanks.
Hi @dtrudg, you can consider this issue closed, as it was not related to Singularity per se but to a regression in the particular Linux kernel I was using. After updating the boot parameters, I no longer encounter the issue. Cheers
It's concerning that the update to 3.11.4 did not address the issue, while changing the boot parameter did. The code path in 3.11.4 was changed and tested (including on SLES) to avoid the need to change the boot parameter. It is, of course, possible that the Cray-specific kernel sets a different default for the maximum number of loop devices, etc. I appreciate that you have a working solution, but if you can describe how you updated to 3.11.4 and tested it, that would help ensure it's not an issue for others. Thanks!
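When comparing node images to investigate the "default max loop devices" question above, a rough diagnostic like the one below may help. It is a minimal sketch, not part of Singularity: it lists the existing loopN device nodes and reads the loop driver's max_loop parameter. It assumes a Linux host where the loop driver exposes /sys/module/loop/parameters/max_loop. That file may be absent on some kernels, and the meaning of the reported value varies by kernel version. The function names are hypothetical.

```python
import os
import re
from typing import List, Optional

# Loop device nodes are named loop0, loop1, ...; the pattern deliberately
# excludes the loop-control node and any non-loop devices.
_LOOP_NAME = re.compile(r"^loop(\d+)$")


def list_loop_devices(dev_dir: str = "/dev") -> List[str]:
    """Return existing loopN device names in dev_dir, sorted numerically."""
    names = [n for n in os.listdir(dev_dir) if _LOOP_NAME.match(n)]
    return sorted(names, key=lambda n: int(_LOOP_NAME.match(n).group(1)))


def read_max_loop(
    param_path: str = "/sys/module/loop/parameters/max_loop",
) -> Optional[int]:
    """Return the loop driver's max_loop parameter, or None if unreadable."""
    try:
        with open(param_path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    devices = list_loop_devices()
    print(f"{len(devices)} loop device node(s): {', '.join(devices) or 'none'}")
    print(f"max_loop parameter: {read_max_loop()}")
```

Running it on an affected GPU node and on a node where jobs succeed would show whether the two images differ in how many loop device nodes exist up front.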
Version of Singularity
3.10.3
Describe the bug
The fatal error, reported when running with singularity --verbose, occurs between the default mount and the check for the template passwd file. The message is
This occurs on an HPE Cray EX system, specifically an AMD MI250X node (64-core Trento CPU, 4 AMD MI250X GPU cards), and only when running a job with more than one MPI process per node. That is, a job running
will fail. But if the request is for
will succeed.
To Reproduce
I know this error is not easily reproducible, since it would require access to an HPE Cray EX system, and specifically to the images running on our GPU nodes, so I will not provide reproduction details. I am posting here to get some guidance on how to interpret the error message.
Expected behavior
It should just run without generating this fatal error.
OS / Linux Distribution
Which Linux distribution are you using?
Installation Method
spack install singularityce@3.10.3
with a minor edit to disable conmon.