NVIDIA Jetson Platform - singularity run --nv --nvccli can't map in GPU stack: nvidia-container-cli: mount error #1850

Closed
eugeneswalker opened this issue Jul 6, 2023 · 9 comments
Labels: bug, needs investigation

Comments

@eugeneswalker

eugeneswalker commented Jul 6, 2023

Version of Singularity

$> singularity --version
singularity-ce version 3.11.3

Describe the bug
I'm unable to expose GPU components from the host via --nv --nvccli on NVIDIA's Jetson Orin platform.

$> singularity run --nv --nvccli e4s-base-cuda-aarch64-23.05.sif
INFO:    Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.
INFO:    Setting --writable-tmpfs (required by nvidia-container-cli)
FATAL:   container creation failed: nvidia-container-cli failed with exit status 1: 
...
src: /usr/lib/aarch64-linux-gnu/tegra/libvulkansc.so, src_lnk: libvulkansc.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libvulkansc.so, dst_lnk: libvulkansc.so.1
nvidia-container-cli: mount error: file creation failed: /usr/local/var/singularity/mnt/session/final/dev/nvhost-as-gpu: invalid argument
...

To Reproduce

$> singularity run --nv --nvccli e4s-base-cuda-aarch64-23.05.sif
INFO:    Setting 'NVIDIA_VISIBLE_DEVICES=all' to emulate legacy GPU binding.
INFO:    Setting --writable-tmpfs (required by nvidia-container-cli)
FATAL:   container creation failed: nvidia-container-cli failed with exit status 1: src: /etc/vulkan/icd.d/nvidia_icd.json, src_lnk: /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json, dst: /usr/local/var/singularity/mnt/session/final/etc/vulkan/icd.d/nvidia_icd.json, dst_lnk: /usr/lib/aarch64-linux-gnu/tegra/nvidia_icd.json
src: /usr/lib/aarch64-linux-gnu/gbm/nvidia-drm_gbm.so, src_lnk: ../tegra/libnvidia-allocator.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/gbm/nvidia-drm_gbm.so, dst_lnk: ../tegra/libnvidia-allocator.so
src: /usr/lib/aarch64-linux-gnu/gbm/tegra_gbm.so, src_lnk: ../tegra/libnvidia-allocator.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/gbm/tegra_gbm.so, dst_lnk: ../tegra/libnvidia-allocator.so
src: /usr/lib/aarch64-linux-gnu/gbm/tegra-udrm_gbm.so, src_lnk: ../tegra/libnvidia-allocator.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/gbm/tegra-udrm_gbm.so, dst_lnk: ../tegra/libnvidia-allocator.so
src: /usr/lib/aarch64-linux-gnu/libcuda.so, src_lnk: tegra/libcuda.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libcuda.so, dst_lnk: tegra/libcuda.so
src: /usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0.1603.99999, src_lnk: tegra/libnvgstreamer-1.0.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libgstreamer-1.0.so.0.1603.99999, dst_lnk: tegra/libnvgstreamer-1.0.so
src: /usr/lib/aarch64-linux-gnu/libnvcucompat.so, src_lnk: tegra/libnvcucompat.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libnvcucompat.so, dst_lnk: tegra/libnvcucompat.so
src: /usr/lib/aarch64-linux-gnu/libv4l2.so.0.0.999999, src_lnk: tegra/libnvv4l2.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libv4l2.so.0.0.999999, dst_lnk: tegra/libnvv4l2.so
src: /usr/lib/aarch64-linux-gnu/libv4lconvert.so.0.0.999999, src_lnk: tegra/libnvv4lconvert.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libv4lconvert.so.0.0.999999, dst_lnk: tegra/libnvv4lconvert.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvargus.so, src_lnk: ../../../tegra/libv4l2_nvargus.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvargus.so, dst_lnk: ../../../tegra/libv4l2_nvargus.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvcuvidvideocodec.so, src_lnk: ../../../tegra/libv4l2_nvcuvidvideocodec.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvcuvidvideocodec.so, dst_lnk: ../../../tegra/libv4l2_nvcuvidvideocodec.so
src: /usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so, src_lnk: ../../../tegra/libv4l2_nvvideocodec.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libv4l/plugins/nv/libv4l2_nvvideocodec.so, dst_lnk: ../../../tegra/libv4l2_nvvideocodec.so
src: /usr/lib/aarch64-linux-gnu/libvulkan.so.1.3.204, src_lnk: tegra/libvulkan.so.1.3.204, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/libvulkan.so.1.3.204, dst_lnk: tegra/libvulkan.so.1.3.204
src: /usr/lib/aarch64-linux-gnu/tegra/libcuda.so, src_lnk: libcuda.so.1.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libcuda.so, dst_lnk: libcuda.so.1.1
src: /usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1, src_lnk: libcuda.so.1.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libcuda.so.1, dst_lnk: libcuda.so.1.1
src: /usr/lib/aarch64-linux-gnu/tegra/libgstnvdsseimeta.so, src_lnk: libgstnvdsseimeta.so.1.0.0, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libgstnvdsseimeta.so, dst_lnk: libgstnvdsseimeta.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libgstreamer-1.0.so.0, src_lnk: libnvgstreamer-1.0.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libgstreamer-1.0.so.0, dst_lnk: libnvgstreamer-1.0.so
src: /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so, src_lnk: libnvbufsurface.so.1.0.0, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurface.so, dst_lnk: libnvbufsurface.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so, src_lnk: libnvbufsurftransform.so.1.0.0, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvbufsurftransform.so, dst_lnk: libnvbufsurftransform.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so, src_lnk: libnvbuf_utils.so.1.0.0, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvbuf_utils.so, dst_lnk: libnvbuf_utils.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so, src_lnk: libnvdsbufferpool.so.1.0.0, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvdsbufferpool.so, dst_lnk: libnvdsbufferpool.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-allocator.so, src_lnk: libnvidia-allocator.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-allocator.so, dst_lnk: libnvidia-allocator.so.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-gbm.so.1, src_lnk: libnvidia-egl-gbm.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-gbm.so.1, dst_lnk: libnvidia-egl-gbm.so
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so.1, src_lnk: libnvidia-egl-wayland.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-egl-wayland.so.1, dst_lnk: libnvidia-egl-wayland.so
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-kms.so, src_lnk: libnvidia-kms.so.35.2.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-kms.so, dst_lnk: libnvidia-kms.so.35.2.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-nvvm.so.4, src_lnk: libnvidia-nvvm.so.35.2.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-nvvm.so.4, dst_lnk: libnvidia-nvvm.so.35.2.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1, src_lnk: libnvidia-ptxjitcompiler.so.35.2.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-ptxjitcompiler.so.1, dst_lnk: libnvidia-ptxjitcompiler.so.35.2.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvidia-vksc-core.so, src_lnk: libnvidia-vksc-core.so.35.2.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvidia-vksc-core.so, dst_lnk: libnvidia-vksc-core.so.35.2.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvid_mapper.so, src_lnk: libnvid_mapper.so.1.0.0, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvid_mapper.so, dst_lnk: libnvid_mapper.so.1.0.0
src: /usr/lib/aarch64-linux-gnu/tegra/libnvscibuf.so, src_lnk: libnvscibuf.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvscibuf.so, dst_lnk: libnvscibuf.so.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvscicommon.so, src_lnk: libnvscicommon.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvscicommon.so, dst_lnk: libnvscicommon.so.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvscistream.so, src_lnk: libnvscistream.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvscistream.so, dst_lnk: libnvscistream.so.1
src: /usr/lib/aarch64-linux-gnu/tegra/libnvscisync.so, src_lnk: libnvscisync.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libnvscisync.so, dst_lnk: libnvscisync.so.1
src: /usr/lib/aarch64-linux-gnu/tegra/libv4l2.so.0, src_lnk: libnvv4l2.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libv4l2.so.0, dst_lnk: libnvv4l2.so
src: /usr/lib/aarch64-linux-gnu/tegra/libv4lconvert.so.0, src_lnk: libnvv4lconvert.so, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libv4lconvert.so.0, dst_lnk: libnvv4lconvert.so
src: /usr/lib/aarch64-linux-gnu/tegra/libvulkansc.so, src_lnk: libvulkansc.so.1, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libvulkansc.so, dst_lnk: libvulkansc.so.1
nvidia-container-cli: mount error: file creation failed: /usr/local/var/singularity/mnt/session/final/dev/nvhost-as-gpu: invalid argument
src: /usr/lib/aarch64-linux-gnu/tegra/libvulkansc.so.1, src_lnk: libvulkansc.so.1.0.10, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libvulkansc.so.1, dst_lnk: libvulkansc.so.1.0.10
src: /usr/lib/aarch64-linux-gnu/tegra/libvulkan.so.1, src_lnk: libvulkan.so.1.3.204, dst: /usr/local/var/singularity/mnt/session/final/usr/lib/aarch64-linux-gnu/tegra/libvulkan.so.1, dst_lnk: libvulkan.so.1.3.204
src: /usr/share/glvnd/egl_vendor.d/10_nvidia.json, src_lnk: ../../../lib/aarch64-linux-gnu/tegra-egl/nvidia.json, dst: /usr/local/var/singularity/mnt/session/final/usr/share/glvnd/egl_vendor.d/10_nvidia.json, dst_lnk: ../../../lib/aarch64-linux-gnu/tegra-egl/nvidia.json

Expected behavior
I would expect singularity run --nv --nvccli ... to run without error, and to be able to access the GPU from within the container.

OS / Linux Distribution

$ cat /etc/nv_tegra_release
# R35 (release), REVISION: 2.1, GCID: 32413640, BOARD: t186ref, EABI: aarch64, DATE: Tue Jan 24 23:38:33 UTC 2023

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

Installation Method
Installed from source according to the instructions here.
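For reference, the standard SingularityCE from-source build boils down to the following sketch of the documented steps; the default prefix of /usr/local would match the /usr/local/var/singularity paths in the error output above:

$ ./mconfig
$ make -C ./builddir
$ sudo make -C ./builddir install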

Additional context
nvidia-container-toolkit version is 1.13.1
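(For anyone reproducing: the toolkit components report their own versions, e.g.:)

$ nvidia-container-cli --version
$ nvidia-ctk --version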

FYI @dtrudg

@eugeneswalker added the bug label Jul 6, 2023
@tri-adam
Member

In case it's helpful, adding some output from nvidia-container-cli on another Jetson Orin Nano:

$ nvidia-container-cli list 
$ nvidia-container-cli info
NVRM version:   (null)
CUDA version:   11.4

Device Index:   0
Device Minor:   0
Model:          Orin
Brand:          (null)
GPU UUID:       (null)
Bus Location:   (null)
Architecture:   8.7
$ cat /etc/nv_tegra_release 
# R35 (release), REVISION: 3.1, GCID: 32827747, BOARD: t186ref, EABI: aarch64, DATE: Sun Mar 19 15:19:21 UTC 2023

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

@dtrudg
Member

dtrudg commented Jul 17, 2023

Thanks @tri-adam - it's somewhat concerning here that nvidia-container-cli reports a 0 device index / minor, and null information.

There's not going to be a workaround here, other than setting aside a chunk of time with access to the Nano to see what is going on.

@eugeneswalker
Author

> Thanks @tri-adam - it's somewhat concerning here that nvidia-container-cli reports a 0 device index / minor, and null information.

Should I raise this on the GitHub issues page for nvidia-container-toolkit?

@dtrudg
Member

dtrudg commented Aug 10, 2023

@eugeneswalker you may wish to. Apologies, we haven't gotten to this yet. I anticipate having some time to spend on issues like this in 2 weeks, after the 4.0.0 release candidate is ready.

@eugeneswalker
Author

> Thanks @tri-adam - it's somewhat concerning here that nvidia-container-cli reports a 0 device index / minor, and null information.

Asked here:

@eugeneswalker
Author

Repeating what @elezar said on the linked issue:

> As of the NVIDIA Container Toolkit 1.10.0, the NVIDIA Container CLI is no longer used on Tegra-based systems. Instead the NVIDIA Container Runtime is used to make modifications to the incoming OCI Runtime Specifications directly on these systems.
>
> Since you are using Singularity, would using the upcoming --oci mode be relevant to you?

I have to defer to @dtrudg or another Singularity maintainer on the question of whether --oci mode would help!

@dtrudg
Member

dtrudg commented Jun 14, 2024

Given the deprecation of nvidia-container-cli for Tegra-based systems indicated above, we aren't going to be able to have this working with --nv --nvccli going forward.

I don't have access to Jetson hardware, but a report of whether a CDI config works with the --oci mode CDI support would be welcome.

Jetson support for native mode (without --oci) would depend on #1395 - so it'd be appropriate to add a comment there if it's important to you.

@dtrudg closed this as not planned Jun 14, 2024
@elezar
Contributor

elezar commented Jun 14, 2024

The nvidia-ctk cdi generate command should detect that a Tegra-based system is being used and generate an applicable CDI specification. If the mode detection fails for some reason, one should be able to run nvidia-ctk cdi generate --mode=csv.

Please see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html and let us know if any information is missing.
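To make that concrete, an end-to-end test might look something like the following (a sketch only - it assumes SingularityCE 4.x --oci mode with CDI support via --device, and the output path is illustrative):

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ # if Tegra mode detection fails, force CSV mode:
$ sudo nvidia-ctk cdi generate --mode=csv --output=/etc/cdi/nvidia.yaml
$ singularity run --oci --device nvidia.com/gpu=all e4s-base-cuda-aarch64-23.05.sif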

@eugeneswalker
Author

Thank you for your help @dtrudg @elezar !
