No loop devices available for multiple singularity shells in specific kernel #1499

Closed
LarsRhijns opened this issue Mar 31, 2023 · 10 comments · Fixed by #1504
@LarsRhijns

LarsRhijns commented Mar 31, 2023

Version of Singularity
3.10.5-focal

Describe the bug
For the past few days I have been unable to run more than one singularity shell at a time. The first launches as it always did; from the second one onwards I get the following error message:

FATAL:   container creation failed: mount /proc/self/fd/3->/var/lib/singularity/mnt/session/rootfs error: while mounting image /proc/self/fd/3: failed to find loop device: could not attach image file to loop device: no loop devices available

After a lot of searching online I found a workaround: running $ sudo losetup -f in a new terminal, which reports a free loop device that, to my knowledge, was already free. After running this command I can start a new parallel singularity shell, but only in that particular terminal. Launching yet another shell requires running the losetup command again.

After some more searching I noticed it was dependent on my kernel version: 5.15.0-69-generic. Rolling back to 5.15.0-46-generic while booting fixes the issue completely.

To Reproduce
I've tried many singularity container files, and as far as I know the choice of container does not matter. Simply running either singularity 3.10.2 or 3.10.5 (maybe other versions as well) on kernel version 5.15.0-69-generic and launching a second shell with $ singularity shell -p <any .sif/.simg> produces the error message above.

Expected behavior
The command $ singularity shell -p <any .sif/.simg> should start without errors in multiple terminals, so that several singularity shells can run in parallel.

OS / Linux Distribution

NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

(I also had the issue on a 22.04 partition, but already removed the partition in the meantime...)

Installation Method
I installed Singularity from here or here using the focal and jammy .deb downloads, and running dpkg to install followed by a $ sudo apt install -f.

Additional context
This issue seems to be kernel-version dependent, and other kernel versions will probably avoid it as well. For me the kernel version that causes this issue is 5.15.0-69-generic. Peers working with similar setups on 5.15.0-67-generic do not seem to have the issue, so downgrading all the way to 5.15.0-46-generic is probably not necessary; it was simply the only older kernel I still had installed.

@LarsRhijns LarsRhijns added the bug Something isn't working label Mar 31, 2023
@LarsRhijns LarsRhijns changed the title No loop devices available for multiple Singularity Shell in specific kernel No loop devices available for multiple singularity shells in specific kernel Mar 31, 2023
@lunjohnzhang

I am having exactly the same error with singularity-ce version 3.11.1-focal on Ubuntu 20.04. For me downgrading the kernel version to 5.15.0-67-generic resolves the issue.

@marioney

marioney commented Apr 2, 2023

This seems to be related to apptainer/apptainer#1258

The proposed workaround there is:

edited singularity.conf and allowed the sharing of loop devices.

@dtrudg
Member

dtrudg commented Apr 2, 2023

This has clearly been localized to the Ubuntu 20.04 linux-generic kernel updates corresponding to https://ubuntu.com/security/notices/USN-5982-1

However, it isn't clear to me how the updates listed there would be relevant to loop device calls. It's not trivial to see if other code that may be relevant, but not security-related, was modified, so this will take some time to investigate further.

If you observe the issue on any distribution other than Ubuntu 20.04, please give detail of the kernel version, distro version etc. Thank you.

@dtrudg dtrudg added needs investigation and removed bug Something isn't working labels Apr 2, 2023
@dtrudg dtrudg self-assigned this Apr 2, 2023
@dtrudg
Member

dtrudg commented Apr 3, 2023

Confirmed on a VM running 5.15.0-69-generic and not an issue on 5.15.0-67-generic.

This appears to be related to a non-security related patch brought in for this update:

  • loop: Fix the max_loop commandline argument treatment when it is set to 0

https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?h=hwe-5.15-next&id=8d5c96dd3c7ed64e9beaab0d47766e4c8a18d66a

It appears that the patch means that the kernel now sets max_loop to the kernel config value of CONFIG_BLK_DEV_LOOP_MIN_COUNT, unless max_loop is specified as an option. CONFIG_BLK_DEV_LOOP_MIN_COUNT is set to 8 on Ubuntu. By default Ubuntu will consume several loop devices for snaps, so a 2nd singularity container run will hit the limit. This appears as though it might be a bug, but perhaps it is intentional behaviour for the upstream kernel patch?

$ grep DEV_LOOP /boot/config-5.15.0-69-generic 
CONFIG_BLK_DEV_LOOP=y
CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
CONFIG_AUFS_BDEV_LOOP=y
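
To compare that compiled-in value with the number of loop devices already attached (for example by snaps), a rough check could look like the following. This is only a sketch, not part of Singularity: the helper names `loop_min_count` and `attached_loop_devices` are made up for illustration, and it assumes the kernel config is readable under /boot and that sysfs exposes `loop/backing_file` for attached devices.

```python
import re
from pathlib import Path


def loop_min_count(config_text):
    """Return CONFIG_BLK_DEV_LOOP_MIN_COUNT from kernel config text, or None."""
    match = re.search(
        r"^CONFIG_BLK_DEV_LOOP_MIN_COUNT=(\d+)$", config_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def attached_loop_devices(sys_block=Path("/sys/block")):
    """List loop devices that currently have a backing file attached."""
    attached = []
    for dev in sorted(sys_block.glob("loop[0-9]*")):
        # sysfs shows loop/backing_file only while an image is attached.
        if (dev / "loop" / "backing_file").exists():
            attached.append(dev.name)
    return attached


if __name__ == "__main__":
    import platform

    config = Path("/boot/config-" + platform.release())
    limit = loop_min_count(config.read_text()) if config.exists() else None
    in_use = attached_loop_devices()
    print(f"compiled-in minimum loop count: {limit}")
    print(f"attached loop devices: {len(in_use)}")
```

If the second number is at or near the first and max_loop is not set on the kernel command line, the next container start is likely to fail as described above.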

Solution

On this kernel, you will need to configure max_loop for the loop driver to a higher value. Suggest setting it to the same as max loop devices in singularity.conf, which is 256 by default.

You must do this with a kernel CMDLINE entry, because loop is compiled into the kernel on Ubuntu. It is not a module.

  1. Add max_loop=256 to the GRUB_CMDLINE_LINUX value in /etc/default/grub
  2. Run update-grub2 as root.
  3. Reboot the system.
  4. cat /proc/cmdline and verify max_loop=256 is present.

After this, Singularity should work as normal.
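
Step 4 can also be scripted. The following is a minimal sketch, with `max_loop_from_cmdline` being a hypothetical helper name; it only recognizes the plain `max_loop=` spelling used in the steps above and, if the option is repeated, reports the last occurrence.

```python
def max_loop_from_cmdline(cmdline):
    """Return the max_loop value on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "max_loop" and sep and raw.isdigit():
            value = int(raw)  # this sketch keeps the last occurrence it sees
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as handle:
        found = max_loop_from_cmdline(handle.read())
    print("max_loop not set" if found is None else f"max_loop={found}")
```

On a system where the GRUB change took effect, running the script after the reboot should print max_loop=256.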

@dtrudg
Member

dtrudg commented Apr 3, 2023

It appears the issue can probably be avoided by requesting loop devices through /dev/loop-control, rather than by using mknod and opening the new device nodes. Snaps don't hit the limit.

It still seems to be a bug, however, as the comment in the code still states:

Loop devices can be requested on-demand with the /dev/loop-control interface, or be instantiated by accessing a 'dead' device node

https://lore.kernel.org/lkml/20221208200605.756287-1-isaacmanjarres@google.com/T/
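
For readers unfamiliar with the /dev/loop-control interface, here is a minimal illustration of asking the kernel for a free loop device through it. It is a sketch only, not Singularity's implementation (which is written in Go): the function names are hypothetical, the ioctl numbers are taken from the kernel's <linux/loop.h> header, and opening /dev/loop-control usually requires root.

```python
import fcntl
import os

# ioctl request numbers from the Linux UAPI header <linux/loop.h>.
LOOP_CTL_ADD = 0x4C80       # create a loop device with a given index
LOOP_CTL_GET_FREE = 0x4C82  # return the index of an unused loop device


def first_free_loop_index(control_path="/dev/loop-control"):
    """Ask the kernel for an unused loop device index."""
    fd = os.open(control_path, os.O_RDWR)
    try:
        # The kernel allocates a new device if no existing one is free.
        return fcntl.ioctl(fd, LOOP_CTL_GET_FREE)
    finally:
        os.close(fd)


def loop_device_path(index):
    """Build the conventional device node path for a loop index."""
    return f"/dev/loop{index}"


if __name__ == "__main__":
    try:
        print(loop_device_path(first_free_loop_index()))
    except OSError as err:
        print(f"could not query /dev/loop-control: {err}")
```

Whether a request made this way is subject to the max_loop limit on the affected kernels is exactly the open question here, so treat the result as a diagnostic rather than a guarantee.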

@LarsRhijns
Author

Solution

On this kernel, you will need to configure max_loop for the loop driver to a higher value. Suggest setting it to the same as max loop devices in singularity.conf, which is 256 by default.

You must do this with a kernel CMDLINE entry, because loop is compiled into the kernel on Ubuntu. It is not a module.

1. Add `max_loop=256` to the `GRUB_CMDLINE_LINUX` value in `/etc/default/grub`

2. Run `update-grub2` as root.

3. Reboot the system.

4. `cat /proc/cmdline` and verify `max_loop=256` is present.

After this, Singularity should work as normal.

This indeed makes kernel version 5.15.0-69-generic work as expected again (on my end as well). Thanks!

If there is anything you need from my side to help solve the root of the issue, let me know!

@nevesLiliane

nevesLiliane commented Apr 3, 2023

This seems to be related to apptainer/apptainer#1258

The proposed workaround there is:

edited singularity.conf and allowed the sharing of loop devices.

Author of apptainer/apptainer#1258 here. This "solution" worked for a very brief time.

@dtrudg
Member

dtrudg commented Apr 3, 2023

Setting shared loop devices won't help much. It will just allow you to run more copies of the same container. You need to ensure max_loop is set for a proper fix here.

dtrudg added a commit to dtrudg/singularity that referenced this issue Apr 4, 2023
Perform loop device creation via the LOOP_CTL_ADD ioctl against
/dev/loop-control.

This avoids hitting an error in the latest kernels that include a
change in handling of loop device creation when max_loop is unset.

Our target distros now all provide /dev/loop-control, so we are
free to make this change.

Also contains minor fixes to correct an error/debug message, and add
an earlier continue in case of failure to open loop device in
attachLoop.

Fixes sylabs#1499
@dtrudg
Member

dtrudg commented Apr 12, 2023

The issue was raised with Ubuntu by others here: https://bugs.launchpad.net/ubuntu/+source/linux-hwe-5.15/+bug/2013086

@ptrebert

Just in case somebody else stumbles upon this, this solution is working on 5.15.0-70-generic:
#1499 (comment)

Updating Singularity to 3.11.x does not fix the issue; only the above solution works.
