Integrate the NVIDIA container toolkit #52

Open
ehfd opened this issue Nov 18, 2023 · 28 comments · May be fixed by #84
Labels
help wanted (Extra attention is needed), wontfix (This will not be worked on)

Comments

@ehfd commented Nov 18, 2023

This is an issue that has been spun off from the Discord channel.

@Murazaki : It might be good to find a better workflow for providing drivers to Wolf.
On Debian, drivers are pretty old in the main stable repo; updated ones can be found in the CUDA repos, but they do not exactly match the manual-installer versions.

@ABeltramo : I guess I should go back to look into the Nvidia Docker Toolkit for people that would like to use that
I agree though, it's a bit of a pain point at the moment

@Murazaki : Cuda drivers repo :
https://developer.download.nvidia.com/compute/cuda/repos/

Linux manual installer :
https://download.nvidia.com/XFree86/Linux-x86_64/

Right now, the latest version in the CUDA packaged installs is 545.23.08.
It doesn't exist as a manual installer.
That breaks the Dockerfile and renders Wolf unusable.
I wanted to make a Docker image that installs from Debian packages, but that relies on apt-add-repository, which pulls in a bunch of supplementary packages.
Here it is for Debian Bookworm:
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian&target_version=12&target_type=deb_network

More thorough installation steps here :
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

@juliosueiras : There is one problem though: the NVIDIA container toolkit doesn't inject the driver into the container and still requires a driver installed in the container image itself.


And here, I'll start with the interventions I have made with NVIDIA over the last three years so that running Wayland inside the NVIDIA container toolkit does not require installing the NVIDIA drivers in the container.

What the NVIDIA container toolkit does is pretty simple: it injects (1) kernel device nodes and (2) userspace libraries into a container. Together, (1) and (2) make up a subset of the driver.

(1) kernel devices: /dev/nvidiaN, /dev/nvidiactl, /dev/nvidia-modeset, /dev/nvidia-uvm, and /dev/nvidia-uvm-tools. In addition, /dev/dri/cardX and /dev/dri/renderDY, where N, X, and Y depend on the GPU the container toolkit provisions. The /dev/dri devices were added with NVIDIA/libnvidia-container#118.
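
A quick way to verify the device side is to list the injected nodes from a throwaway container (a rough sketch; the ubuntu image is just an example, and --gpus all assumes the container toolkit is already configured as a Docker runtime):

# sh expands the globs inside the container, so this shows exactly which nodes were injected
docker run --rm --gpus all -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu sh -c 'ls -l /dev/nvidia* /dev/dri'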

(2) userspace libraries:
OpenGL libraries including EGL: '/usr/lib/libGL.so.1', '/usr/lib/libEGL.so.1', '/usr/lib/libGLESv1_CM.so.525.78.01', '/usr/lib/libGLESv2.so.525.78.01', '/usr/lib/libEGL_nvidia.so.0', '/usr/lib/libOpenGL.so.0', '/usr/lib/libGLX.so.0', and '/usr/lib/libGLdispatch.so.0', '/usr/lib/libnvidia-tls.so.525.78.01'

Vulkan libraries: '/usr/lib/libGLX_nvidia.so.0' and the configuration '/etc/vulkan/icd.d/nvidia_icd.json'

EGLStreams-Wayland and GBM-Wayland libraries: '/usr/lib/libnvidia-egl-wayland.so.1' and the config '/usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json' '/usr/lib/libnvidia-egl-gbm.so.1' and the config '/usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json'

NVENC libraries: /usr/lib/libnvidia-encode.so.525.78.01, which depends on /usr/lib/libnvcuvid.so.525.78.01, which depends on /usr/lib/x86_64-linux-gnu/libcuda.so.1

VDPAU libraries: /usr/lib/vdpau/libvdpau_nvidia.so.525.78.01
NVFBC libraries: /usr/lib/libnvidia-fbc.so.525.78.01
OPTIX libraries: /usr/lib/libnvoptix.so.1

Not very relevant but of note, perhaps for XWayland: NVIDIA X.Org driver: /usr/lib/xorg/modules/drivers/nvidia_drv.so, NVIDIA X.org GLX driver: /usr/lib/xorg/modules/extensions/libglxserver_nvidia.so.525.78.01

In many cases, things don't work because the configuration files below are absent inside the container. Without them, applications inside the container don't know which vendor library to load (what each file does is self-explanatory):

The contents of /usr/share/glvnd/egl_vendor.d/10_nvidia.json:

{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}

The contents of /etc/vulkan/icd.d/nvidia_icd.json (note that api_version is variable based on the Driver version):

{
    "file_format_version" : "1.0.0",
    "ICD": {
        "library_path": "libGLX_nvidia.so.0",
        "api_version" : "1.3.205"
    }
}

The contents of /etc/OpenCL/vendors/nvidia.icd:

libnvidia-opencl.so.1

The contents of /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json:

{
        "file_format_version" : "1.0.0",
        "ICD" : {
                "library_path" : "libnvidia-egl-gbm.so.1"
        }
}

The contents of /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json:

{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libnvidia-egl-wayland.so.1"
    }
}

I'm pretty sure that by now (this was different a few months ago) the newest NVIDIA container toolkit provisions all of the required libraries plus the JSON configurations for Wayland (not for X11, but that doesn't matter here).
If only the JSON configurations are absent, it's trivial to add the above templates manually, as in the sketch below.
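
For example, a minimal sketch to recreate the glvnd EGL vendor file when it is missing (same template as above; the target path is the common default and may differ per distribution):

# Recreate /usr/share/glvnd/egl_vendor.d/10_nvidia.json from the template above
mkdir -p /usr/share/glvnd/egl_vendor.d
cat > /usr/share/glvnd/egl_vendor.d/10_nvidia.json <<'EOF'
{
    "file_format_version" : "1.0.0",
    "ICD" : {
        "library_path" : "libEGL_nvidia.so.0"
    }
}
EOF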

About GStreamer:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/3108

Now, it is correct that NVENC does require CUDA. But that doesn't mean it requires the whole CUDA Toolkit (which is separate from the CUDA drivers). The CUDA drivers are the following four libraries installed with the display drivers, independent of the CUDA Toolkit: libcuda.so, libnvidia-ptxjitcompiler.so, libnvidia-nvvm.so, and libcudadebugger.so

These versions go with the display drivers, and are all injected into the container by the NVIDIA container toolkit.

GStreamer 1.22 and earlier in nvcodec requires just two files from the CUDA Toolkit: libnvrtc.so and libnvrtc-builtins.so. These can be installed from the network repository, as in the current approach, or extracted from a PyPI package:

# Extract NVRTC dependency, https://developer.download.nvidia.com/compute/cuda/redist/cuda_nvrtc/LICENSE.txt
cd /tmp && \
    curl -fsSL -o nvidia_cuda_nvrtc_linux_x86_64.whl "https://developer.download.nvidia.com/compute/redist/nvidia-cuda-nvrtc/nvidia_cuda_nvrtc-11.0.221-cp36-cp36m-linux_x86_64.whl" && \
    unzip -joq -d ./nvrtc nvidia_cuda_nvrtc_linux_x86_64.whl && cd nvrtc && \
    chmod 755 libnvrtc* && \
    find . -maxdepth 1 -type f -name "*libnvrtc.so.*" -exec sh -c 'ln -snf $(basename {}) libnvrtc.so' \; && \
    mv -f libnvrtc* /opt/gstreamer/lib/x86_64-linux-gnu/ && \
    cd /tmp && rm -rf /tmp/*

One thing to note here is that libnvrtc.so is not minor-version compatible with CUDA. Thus, it will error on any display driver older than the driver corresponding to its CUDA version. However, backwards compatibility always works, so it is a good idea to use the oldest libnvrtc.so version that suffices.

Display driver - CUDA version
545 - 12.3
535 - 12.2
530 - 12.1
525 - 12.0
520 - 11.8
515 - 11.7
(and so on...)

https://docs.nvidia.com/deploy/cuda-compatibility/

So, I have moderate to high confidence that if you guys try the newest NVIDIA container toolkit again, you won't need to install the drivers, assuming that you ensure the json files are present or written.

Environment variables that currently work for me:

RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf
# Expose NVIDIA libraries and paths
ENV PATH /usr/local/nvidia/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/lib/x86_64-linux-gnu:/usr/lib/i386-linux-gnu${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
# Make all NVIDIA GPUs visible by default
ENV NVIDIA_VISIBLE_DEVICES all
# All NVIDIA driver capabilities should preferably be used, check `NVIDIA_DRIVER_CAPABILITIES` inside the container if things do not work
ENV NVIDIA_DRIVER_CAPABILITIES all
# Disable VSYNC for NVIDIA GPUs
ENV __GL_SYNC_TO_VBLANK 0
@ABeltramo (Member) commented:

TL;DR: as of the latest NVIDIA Container Toolkit (1.14.3-1), this is unfortunately still not possible.

What's the issue?

With the latest versions I can run both Wolf and the GStreamer pipeline just by running the container with --gpus=all; unfortunately, for some apps this still misses some important libraries.
Gamescope seems to require the following additional libraries that aren't provided by the toolkit:

libnvidia-egl-gbm.so.1
libnvidia-egl-wayland.so.1

libnvidia-vulkan-producer.so

gbm/nvidia-drm_gbm.so

The last one seems to just be a symlink to libnvidia-allocator.so.1, which is already present, so that might be fine.

Now, this is running from an X11 host, and I can see that those additional libraries aren't present on my host system:

ls -la /usr/lib/x86_64-linux-gnu/libnv
libnvcuvid.so@                         libnvidia-container.so.1@              libnvidia-glvkspirv.so.530.30.02       libnvidia-opticalflow.so.1@
libnvcuvid.so.1@                       libnvidia-container.so.1.14.3*         libnvidia-ml.so.1@                     libnvidia-ptxjitcompiler.so.1@
libnvidia-allocator.so.1@              libnvidia-eglcore.so.530.30.02         libnvidia-ngx.so.1@                    libnvidia-rtcore.so.530.30.02
libnvidia-cfg.so.1@                    libnvidia-encode.so.1@                 libnvidia-ngx.so.530.30.02             libnvidia-tls.so.530.30.02
libnvidia-compiler.so.530.30.02        libnvidia-fbc.so.1@                    libnvidia-nvvm.so@                     libnvidia-wayland-client.so.530.30.02
libnvidia-container-go.so.1@           libnvidia-glcore.so.530.30.02          libnvidia-nvvm.so.4@                   libnvoptix.so.1@
libnvidia-container-go.so.1.14.3       libnvidia-glsi.so.530.30.02            libnvidia-opencl.so.1@

Can anyone confirm the output of

docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all  -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu ls /usr/lib/x86_64-linux-gnu/libnvidia*
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1	   /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.530.30.02     /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1		   /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.530.30.02       /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.530.30.02  /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.530.30.02  /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.530.30.02   /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1		       /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1		   /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.1		       /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.530.30.02
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.1		   /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.530.30.02        /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.530.30.02

on an NVIDIA Wayland host?

What can we do better?

I think we should keep manually downloading and linking the drivers like we are doing at the moment. We should probably add a proper check for a mismatch between the downloaded drivers and the host-installed drivers, either on startup of the containers (somewhere in the base-app) or on startup of Wolf.
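
A rough sketch of what such a mismatch check could look like (the bundled-driver path /usr/local/nvidia/lib64 is a hypothetical example and would need to match wherever the downloaded drivers are actually mounted):

# Compare the host driver version with the version of the bundled userspace libraries
HOST_VER=$(cat /sys/module/nvidia/version 2>/dev/null)
BUNDLED_VER=$(ls /usr/local/nvidia/lib64/libnvidia-eglcore.so.* 2>/dev/null | head -n1 | sed 's/.*\.so\.//')
if [ -n "$HOST_VER" ] && [ -n "$BUNDLED_VER" ] && [ "$HOST_VER" != "$BUNDLED_VER" ]; then
    echo "NVIDIA driver mismatch: host=$HOST_VER, bundled=$BUNDLED_VER" >&2
fi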

@ehfd (Author) commented Jan 9, 2024

@ABeltramo One comment I have here is that there isn't really a concept of an "Xorg host" and a "Wayland host". It depends on what the desktop environment and login manager use, and by default, all drivers bundle both sets of libraries.

We will discuss more in NVIDIA/libnvidia-container#118.

ABeltramo added the "help wanted" and "wontfix" labels on Jan 24, 2024
@ohayak commented Feb 12, 2024

Hi,
I've been working on a project using Unreal Engine Pixel Streaming to stream games at scale with Kubernetes. Packaging Docker images with the right NVIDIA drivers is a pain. I invite you to take a look at the work done by Adam. He used the nvidia/cuda image as a base to create a set of images with various configurations:
22.04-vulkan: Ubuntu 22.04 + OpenGL + Vulkan + PulseAudio Client + PulseAudio Server
22.04-cudagl11: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 11.8.0 + PulseAudio Client + PulseAudio Server
22.04-cudagl12: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 12.2.0 + PulseAudio Client + PulseAudio Server
22.04-vulkan-noaudio: Ubuntu 22.04 + OpenGL + Vulkan (no audio support)
22.04-cudagl11-noaudio: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 11.8.0 (no audio support)
22.04-cudagl12-noaudio: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 12.2.0 (no audio support)
22.04-vulkan-hostaudio: Ubuntu 22.04 + OpenGL + Vulkan + PulseAudio Client (uses host PulseAudio Server)
22.04-cudagl11-hostaudio: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 11.8.0 + PulseAudio Client (uses host PulseAudio Server)
22.04-cudagl12-hostaudio: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 12.2.0 + PulseAudio Client (uses host PulseAudio Server)
22.04-vulkan-x11: Ubuntu 22.04 + OpenGL + Vulkan + PulseAudio Client (uses host PulseAudio Server) + X11
22.04-cudagl11-x11: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 11.8.0 + PulseAudio Client (uses host PulseAudio Server) + X11
22.04-cudagl12-x11: Ubuntu 22.04 + OpenGL + Vulkan + CUDA 12.2.0 + PulseAudio Client (uses host PulseAudio Server) + X11
I think you should check the Dockerfiles.

@Murazaki (Contributor) commented:

Hi, I've been working on a project using Unreal Engine Pixel Streaming to stream games at scale with kubernetes. Packaging docker images with the right nvidia docker drivers is a pain. I invite you to take a look at the work done by Adam He used nvidia/cuda image as base to create a set of images that with various configurations
[...]
I think you should check the docker files

Could you provide a link to that/those Dockerfiles, maybe?

@Murazaki (Contributor) commented Feb 15, 2024

Oh sorry, I actually know what they're talking about: it's adamrehn/ue4-runtime. I think I wrote an answer and forgot to send it ^^

https://github.com/adamrehn/ue4-runtime

ABeltramo reopened this on Feb 15, 2024
@ABeltramo (Member) commented:

If those are the images that @ohayak was talking about, unfortunately there's nothing there that can help us.
As I explained in a comment above, the NVIDIA toolkit does not mount all the libraries that are needed in order to spawn our custom Wayland compositor. There are only a few possible solutions that I can think of:

  • Ask users to install the dependencies in a volume that will be mounted and shared with the apps (the current solution in Wolf)
  • Ask users to manually install the libraries if missing from the host and then fill out all the correct mounts
  • Try to file a patch upstream with the NVIDIA container toolkit project and hope that it gets merged

I'm very open to suggestions or alternative solutions!

@ehfd (Author) commented Feb 25, 2024

modprobe -r nvidia_drm ; modprobe nvidia_drm modeset=1

Our experience was related to the above kernel module.
Also, we have reinstalled the OS and installed everything from scratch, then things started to work again.
One more possibility is the lack of DKMS (required to build relevant NVIDIA kernel modules) or a kernel version upgrade without rebuilding the NVIDIA modules.
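
If modeset=1 turns out to be the fix, it can be made persistent on the host instead of re-running modprobe after every boot (a sketch for Debian/Ubuntu-style hosts):

echo "options nvidia-drm modeset=1" | sudo tee /etc/modprobe.d/nvidia-drm-modeset.conf
sudo update-initramfs -u   # rebuild the initramfs so the option is applied at boot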

@ehfd (Author) commented Mar 28, 2024

The core issue is whether the EGL Wayland library is installed or not, likely not the container toolkit.

This is available if you use the .run file.
https://download.nvidia.com/XFree86/Linux-x86_64/550.67/README/installedcomponents.html

I don't think the Debian/Ubuntu PPA repositories install the Wayland components automatically.
If that's the case, the following packages need to be installed. This is sufficient and covers all the missing Wayland files:

libnvidia-egl-gbm1
libnvidia-egl-wayland1
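
On a Debian/Ubuntu host that uses the packaged drivers, installing those two packages would look roughly like this (a sketch; package names as listed above):

sudo apt-get update
sudo apt-get install -y libnvidia-egl-wayland1 libnvidia-egl-gbm1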

But the simplest solution is to install the driver via the .run file.

https://community.kde.org/Plasma/Wayland/Nvidia

@tux-rampage commented:

Hi,

I am currently trying to get this flying with CRI-O and nvidia-ctk by using the runtime and CDI config. AFAIR this can be used in Docker as well.
So far all configs and libs are injected/mounted into the container as far as I can see.
I can double check for the reported libs here later.

Currently I'm facing the vblank resource unavailable issue. (Driver version 550.something; I can't look it up at the moment.)

@tux-rampage commented Apr 5, 2024

Here are some short steps for what I've done so far (Nvidia Driver version 550.54.14):

nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
nvidia-ctk runtime configure --runtime=crio

For Docker, I guess it's enough to use --runtime=docker

I used the nvidia runtime when starting the container(s) and set the following environment variables:

NVIDIA_DRIVER_CAPABILITIES=all
NVIDIA_VISIBLE_DEVICES="nvidia.com/gpu=all"

As documented, this enables the CDI integration which will mount the host libs and binaries.
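
For Docker, the rough equivalent (a sketch, assuming nvidia-ctk has already generated the CDI spec and configured the Docker runtime as above) would be:

docker run --rm -it --runtime=nvidia \
  -e NVIDIA_DRIVER_CAPABILITIES=all \
  -e NVIDIA_VISIBLE_DEVICES="nvidia.com/gpu=all" \
  ubuntu sh -c 'ls /usr/lib/x86_64-linux-gnu/libnvidia*'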

What is working so far:

  • vkgears (directly)
  • vkgears (gamescope)

What is not working:

  • glxgears (gamescope): Failed to load driver zink, blank surface, successful render output on STDOUT
  • steam (gamescope): Failed to load driver zink, vblank sync failures, shm failures (seem to be WebKit sandbox related), possibly NVIDIA 545.29.06 and Gamescope issue #60

@tux-rampage commented:

[...]

Can anyone confirm the output of

docker run --rm -it --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all  -e NVIDIA_DRIVER_CAPABILITIES=all ubuntu ls /usr/lib/x86_64-linux-gnu/libnvidia*

on an NVIDIA Wayland host?

ls -1 /usr/lib/x86_64-linux-gnu/libnvidia*
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.1
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-gpucomp.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.550.54.14
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.550.54.14

It seems that libnvidia-egl-wayland.so.1 and libnvidia-vulkan-producer.so are missing. Driver version is 550.54.14.

@tux-rampage commented Apr 9, 2024

libnvidia-egl-wayland seems to be provided by the package libnvidia-egl-wayland1.
libnvidia-vulkan-producer.so seems to have been dropped from the driver. I need to confirm by building the driver volume for the current release.

Edit: Is it possible that libnvidia-egl-wayland1 is not part of NVIDIA's driver but an OSS component?

Edit 2: yes, libnvidia-vulkan-producer was removed recently: https://www.nvidia.com/Download/driverResults.aspx/214102/en-us/

Removed libnvidia-vulkan-producer.so from the driver package. This helper library is no longer needed by the Wayland WSI.

@ehfd (Author) commented Apr 10, 2024

Those libraries are for the EGLStreams backend. I believe compositors have now stopped supporting them.

https://github.com/NVIDIA/egl-wayland

@tux-rampage commented:

Maybe. The latest NVIDIA driver comes with a GBM backend, so maybe that's something useful. I'll give GBM_BACKEND=nvidia-gbm a try. Maybe this will help: https://download.nvidia.com/XFree86/Linux-x86_64/510.39.01/README/gbm.html

From my attempt yesterday evening, glxgears is running without any error messages so far, but the output in Moonlight stays black.

@ehfd (Author) commented May 10, 2024

I think NVIDIA Container Toolkit 1.15.0 (released not long ago) fixes most of the problems.

Please check it out.

I am trying to fix the remaining issues with NVIDIA/nvidia-container-toolkit#490. Please give feedback.

I've written about the situation in more detail in NVIDIA/nvidia-container-toolkit#490 (comment).

Within the scope of Wolf, the libnvidia-egl-wayland1 APT package will install the EGLStreams interface (if it can be used instead of GBM), and libnvidia-egl-gbm is installed with the Graphics Drivers PPA. Both are installed by the .run installer, and the above PR will also inject libnvidia-egl-wayland.

@tux-rampage commented:

I've finally managed to run Steam and Cyberpunk with Wolf using the NVIDIA container toolkit instead of the drivers image.
As @ehfd mentioned, the symlink nvidia-drm_gbm.so is missing, so I had to create it manually before running gamescope/steam:

mkdir -p /usr/lib/x86_64-linux-gnu/gbm;
ln -sv ../libnvidia-allocator.so.1 /usr/lib/x86_64-linux-gnu/gbm/nvidia-drm_gbm.so;

After that, launching Steam and the game was successful. This all works without using libnvidia-egl-wayland.

@ABeltramo (Member) commented:

That sounds really good, thanks for reporting back!
We could easily add that to the images; my only concern is that this only works with a specific version of the toolkit. We should probably add a check on startup for the required libraries and print a proper error message if they are missing.
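
A rough sketch of what such a startup check could look like (library names taken from the list earlier in this thread; the path is the usual Debian/Ubuntu multiarch location):

for lib in libnvidia-egl-gbm.so.1 libnvidia-egl-wayland.so.1 gbm/nvidia-drm_gbm.so; do
    if [ ! -e "/usr/lib/x86_64-linux-gnu/$lib" ]; then
        echo "WARNING: $lib is missing; the NVIDIA container toolkit may be too old or misconfigured" >&2
    fi
done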

@ehfd (Author) commented Jun 2, 2024

If someone has knowledge of Go, could they contribute fixes for the unsolved aspects of the PR, NVIDIA/nvidia-container-toolkit#490?

I will give write access to https://github.com/ehfd/nvidia-container-toolkit/tree/main to anyone who asks, in order to keep it in one PR.

@tux-rampage commented:

If someone has knowledge of Go, could they contribute fixes for the unsolved aspects of the PR for NVIDIA/nvidia-container-toolkit#490?

I will give write access to https://github.com/ehfd/nvidia-container-toolkit/tree/main if they ask in order to keep it into one PR.

I can take a look at it later

@tux-rampage commented:

@ehfd Thanks for trusting me with access to your fork. I have addressed the pending issues on the code side and requested some feedback.

@ehfd (Author) commented Jun 5, 2024

Thanks for trusting me with access to your fork. I have addressed the pending issues on the code side and requested some feedback.

No sweat. You know Go better than I do, and it seems like you did a great job at it.

@kayakyakr commented:

Congrats on getting this merged! This is going to substantially simplify getting the drivers up and running.

@ABeltramo (Member) commented:

I've tried upgrading GStreamer to 1.24.5; unfortunately it's now failing to use CUDA with:

0:00:00.102290171   172 0x560d70ab1190 ERROR              cudanvrtc gstcudanvrtc.cpp:165:gst_cuda_nvrtc_load_library_once: Failed to load 'nvrtcGetCUBINSize', 'nvrtcGetCUBINSize': /usr/local/nvidia/lib/libnvrtc.so: undefined symbol: nvrtcGetCUBINSize

It looks like nvrtcGetCUBINSize was recently added, so I guess there's a mismatch between libnvrtc.so and what GStreamer expects. It seems this was added in this commit, which references @ehfd's issue.
I could successfully run this GStreamer version on my host, which has CUDA 12 installed. I guess this will break compatibility with older cards, so I'm going to revert GStreamer in Wolf for now.

ABeltramo added a commit that referenced this issue Jun 29, 2024
@ehfd (Author) commented Jun 29, 2024

@ABeltramo This should not be an issue as long as the NVRTC CUDA version is kept at around 11.3; yes, 11.0 will not work.

@ABeltramo (Member) commented:

Thanks for the very quick reply! What's the compatibility matrix for NVRTC? Would upgrading to 11.3 still work for older CUDA installations?

@ehfd (Author) commented Jun 29, 2024

Mostly the internal ABI of NVIDIA.
CUBIN was designed so that there can be forward compatibility for the NVRTC files, but since CUBIN itself didn't exist in older versions, there's a problem here.

You can probably fix the issue yourself in GStreamer and then backport it to GStreamer 1.24.6 with simple error handling in the C code. Probably an #ifdef can work.

@ABeltramo (Member) commented:

Thanks, I've got enough on my plate already. This is definitely lower priority compared to the rest; looks like we are going to stay on 1.22.7 for a bit longer.

ABeltramo linked a pull request on Jul 2, 2024 that will close this issue
@ehfd (Author) commented Jul 9, 2024

The pull request was technically merged today because it was reverted.

NVIDIA/nvidia-container-toolkit#548
