[BUG] Blender does not start with container #10

Closed
1 task done
alexleach opened this issue May 30, 2024 · 9 comments

Comments

@alexleach

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I just updated the container image to the latest tag (currently 4.1.1), and blender does not launch. It appears to be something to do with X trying to use MESA and zink?

When accessing the web server, KasmVNC loads fine, but I just get a black screen. Running ps in the container shows that blender is not running. However, if I launch blender manually within the container, it shows up in my web browser.
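
For reference, this is roughly how I checked (a sketch; it assumes the container is named blender, that the blender binary is on the image's PATH, and that KasmVNC's display inside the container is :1; running `ls /tmp/.X11-unix` inside the container shows which display is actually in use):

# is the blender process running at all?
docker exec blender ps aux | grep -i blender

# launch blender by hand against the in-container display; it then shows up in the browser
docker exec -e DISPLAY=:1 blender blender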

Expected Behavior

Blender should start with the container.

Steps To Reproduce

I have an NVIDIA GPU and am running with docker compose, using the nvidia-container-runtime. I also extend the base image by installing nvidia-cuda-toolkit. This has worked fine for several months, allowing me to render on my RTX 3070 graphics card.

However, since updating to the latest image (version 4.1.1), bringing up the container no longer brings up blender...

I found that I can fix the behaviour by first installing python3-xdg and then editing /defaults/startwm.sh to comment out the environment variables it sets. I then went looking for the commit, and the repository, that added these environment variables to startwm.sh...

So, it was in the docker-baseimage-kasmvnc repository where these environment variables were added... However, on the same day that a commit added these environment variables (linuxserver/docker-baseimage-kasmvnc@421ff46), they were removed in a later commit (linuxserver/docker-baseimage-kasmvnc@c8a520d).

So, perhaps the bug should be reported there, but the thing is, they've already fixed it... just maybe not in a release? I've not quite figured that part out, but either way, could you please update your latest release to include that commit?
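
For anyone hitting the same thing, the workaround boils down to something like this inside the container (a sketch only; it assumes the variables are set as plain `export` lines in /defaults/startwm.sh, so check with the grep first and adjust the sed if the script exports anything you want to keep):

# install the missing PyXDG module that openbox-xdg-autostart complains about
apt-get update && apt-get install --no-install-recommends -y python3-xdg

# list the environment variables the script sets, then comment them out
grep -n 'export' /defaults/startwm.sh
sed -i 's/^export /#export /' /defaults/startwm.sh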

Environment

- OS: Arch Linux, with `nvidia-open-dkms` drivers.
- How docker service was installed: pacman -S docker docker-compose

CPU architecture

x86-64

Docker creation

`docker compose up -d`

My compose.yaml:


services:                                                                                                                                                                                                             
  blender:                                                                                                                                                                                                            
    image: local/blender:latest                                                                                                                                                                                       
    build:                                                                                                                                                                                                            
      context: .                                                                                                                                                                                                      
      dockerfile_inline: |                                                                                                                                                                                            
        FROM linuxserver/blender:latest                                                                                                                                                                               
        RUN apt-get update && \                                                                                                                                                                                       
          apt-get install --no-install-recommends -y nvidia-cuda-toolkit python3-xdg && \                                                                                                                             
          rm -rf /var/lib/apt/lists/*                                                                                                                                                                                 
                                                                                                                                                                                                                      
      tags:                                                                                                                                                                                                           
        - local/blender:latest                                                                                                                                                                                        
                                                                                                                                                                                                                      
    restart: unless-stopped                                                                                                                                                                                           
    container_name: blender                                                                                                                                                                                           
    environment:                                                                                                                                                                                                      
      - NVIDIA_VISIBLE_DEVICES=all                                                                                                                                                                                    
      # Run as the host vglusers group (1003)                                                                                                                                                                         
      - PGID=1003                                                                                                                                                                                                     
                                                                                                                                                                                                                      
    runtime: nvidia                                                                                                                                                                                                   
                                                                                                                                                                                                                      
    volumes:                                                                                                                                                                                                          
      # Pass-through support for nvidia GPU                                                                                                                                                                           
      - "/dev/nvidia0:/dev/nvidia0"                                                                                                                                                                                   
      - "/dev/nvidiactl:/dev/nvidiactl"                                                                                                                                                                               
      - "/dev/nvidia-modeset:/dev/nvidia-modeset"                                                                                                                                                                     
      - "/dev/nvidia-uvm:/dev/nvidia-uvm"                                                                                                                                                                             
      - "/dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools"                                                                                                                                                                 
                                                                                                                                                                                                                      
      # For passing through host X11 server. Does it help?                                                                                                                                                            
      - /tmp/.X11-unix:/tmp/.X11-unix                                                                                                                                                                                 
                                                                                                                                                                                                                      
      - /media/data/blender-config:/config                                                                                                                                                                            
      - /media/data/blender-cache:/var/cache/blender                                                                                                                                                                  
                                                                                                                                                                                                                      
    # Add all the GPU capabilities                                                                                                                                                                                    
    deploy:                                                                                                                                                                                                           
      resources:                                                                                                                                                                                                      
        reservations:                                                                                                                                                                                                 
          devices:                                                                                                                                                                                                    
          - driver: nvidia                                                                                                                                                                                            
            count: all                                                                                                                                                                                                
            capabilities: [gpu, compute, utility, graphics]

Container logs

The container logs show the following:


blender  | ───────────────────────────────────────                                                                                                                                                                    
blender  |                                                                                                                                                                                                            
blender  |       ██╗     ███████╗██╗ ██████╗                                                                                                                                                                          
blender  |       ██║     ██╔════╝██║██╔═══██╗                                                                                                                                                                         
blender  |       ██║     ███████╗██║██║   ██║                                                                                                                                                                         
blender  |       ██║     ╚════██║██║██║   ██║                                                                                                                                                                         
blender  |       ███████╗███████║██║╚██████╔╝                                                                                                                                                                         
blender  |       ╚══════╝╚══════╝╚═╝ ╚═════╝                                                                                                                                                                          
blender  |                                                                                                                                                                                                            
blender  |    Brought to you by linuxserver.io                                                                                                                                                                        
blender  | ───────────────────────────────────────                                                                                                                                                                    
blender  |                                                                                                                                                                                                            
blender  | To support LSIO projects visit:                                                                                                                                                                            
blender  | https://www.linuxserver.io/donate/                                                                                                                                                                         
blender  |                                                                                                                                                                                                            
blender  | ───────────────────────────────────────                                                                                                                                                                    
blender  | GID/UID                                                                                                                                                                                                    
blender  | ───────────────────────────────────────                                                                                                                                                                    
blender  |                                                                                                                                                                                                            
blender  | User UID:    911                                                                                                                                                                                           
blender  | User GID:    1003                                                                                                                                                                                          
blender  | ───────────────────────────────────────                                                                                                                                                                    
blender  |                                                                                                                                                                                                            
blender  | **** permissions for /dev/dri/card1 are good ****                                                                                                                                                          
blender  | **** permissions for /dev/dri/renderD128 are good ****                                                                                                                                                     
blender  | [custom-init] No custom files found, skipping...
blender  | /usr/bin/nvidia-smi
blender  | /usr/bin/nvidia-smi
blender  | 
blender  | Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
blender  | Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
blender  | See http://kasmweb.com for information on KasmVNC.
blender  | Underlying X server release 12014000, The X.Org Foundation
blender  | 
blender  | [ls.io-init] done.
blender  | Obt-Message: Xinerama extension is not present on the server
blender  | MESA: error: zink: could not create swapchain
blender  | X Error of failed request:  GLXBadCurrentWindow
blender  |   Major opcode of failed request:  149 (GLX)
blender  |   Minor opcode of failed request:  11 (X_GLXSwapBuffers)
blender  |   Serial number of failed request:  175
blender  |   Current serial number in output stream:  175
blender  | Read prefs: "/config/.config/blender/4.1/config/userpref.blend"

Thanks for opening your first issue here! Be sure to follow the relevant issue templates, or risk having this issue marked as invalid.

@thelamer
Member

I can only confirm what I can test: I run Debian Bookworm and have tried on it and on Fedora 40.

Here is a test Run:

docker run --rm -it \
 --shm-size=1gb \
 --runtime nvidia \
 --gpus all -p 3000:3000 \
 linuxserver/blender bash

This is with a 3060 and I have full CUDA support and accelerated preview rendering running the 525.147.05 drivers.
[screenshot: cuda]

@alexleach
Author

Interesting, thanks for this. When I run your exact command on my machine, which has very similar hardware(!), I don't have CUDA or OptiX support...

[Screenshot 2024-05-30 at 13 03 54]

However, it does (evidently) launch blender, with just one seemingly benign error message shown in the console:

**** adding /dev/dri/card1 to video group root with id 0 ****
**** permissions for /dev/dri/renderD128 are good ****
[custom-init] No custom files found, skipping...
/usr/bin/nvidia-smi
/usr/bin/nvidia-smi
[ls.io-init] done.
_XSERVTransmkdir: ERROR: euid != 0,directory /tmp/.X11-unix will not be created.

Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12014000, The X.Org Foundation

root@f36e0429e7f1:/# Obt-Message: Xinerama extension is not present on the server
 2024-05-30 12:03:24,123 [INFO] websocket 0: got client connection from 127.0.0.1
 2024-05-30 12:03:24,130 [PRIO] Connections: accepted: @192.168.1.5_1717070604.123549::websocket

In fact, when using docker run [...], blender always starts, so I don't get the black screen or the error message about zink at all, even after adding pretty much every flag I know of that corresponds to my compose file:

docker run --rm -it \
  --shm-size=1gb \
  --runtime nvidia \
  --gpus all \
  -p 3000:3000 \
  -e NVIDIA_VISIBLE_DEVICES=all \
  -v /dev/nvidia0:/dev/nvidia0 \
  -v /dev/nvidiactl:/dev/nvidiactl \
  -v /dev/nvidia-modeset:/dev/nvidia-modeset \
  -v /dev/nvidia-uvm:/dev/nvidia-uvm \
  -v /dev/nvidia-uvm-tools:/dev/nvidia-uvm-tools \
  -v /media/data/blender-cache:/var/cache/blender \
  -v /tmp/.X11-unix:/tmp/.X11-unix \
  linuxserver/blender bash

I did note that when using docker run with -e PGID=1003 (where group 1003 is my host's vglusers group), blender segfaults when I open Edit > Preferences, with the console output showing:

**** adding /dev/dri/card1 to video group root with id 0 ****
**** permissions for /dev/dri/renderD128 are good ****
[custom-init] No custom files found, skipping...
/usr/bin/nvidia-smi
/usr/bin/nvidia-smi
[ls.io-init] done.

Xvnc KasmVNC 1.2.0 - built May 23 2024 00:22:09
Copyright (C) 1999-2018 KasmVNC Team and many others (see README.me)
See http://kasmweb.com for information on KasmVNC.
Underlying X server release 12014000, The X.Org Foundation

root@7f9776a6aa64:/# Obt-Message: Xinerama extension is not present on the server
 2024-05-30 13:36:21,210 [INFO] websocket 0: got client connection from 127.0.0.1
 2024-05-30 13:36:21,218 [PRIO] Connections: accepted: @192.168.1.5_1717076181.210856::websocket
Writing: /tmp/blender.crash.txt
Segmentation fault (core dumped)

ERROR: openbox-xdg-autostart requires PyXDG to be installed

I also reproduced this with a minimal compose.yaml file, producing the same segfault and console output:

services:
  blender:
    image: linuxserver/blender:latest

    restart: unless-stopped

    container_name: blender
    environment:
      - PGID=1003

    runtime: nvidia

    ports:
      - 0.0.0.0:3000:3000/tcp

    volumes:
      - /media/data/blender-config:/config
      - /media/data/blender-cache:/var/cache/blender

    # Add all the GPU capabilities
    deploy:
      resources:
        reservations:
          devices:
          - driver: nvidia
            count: all
            capabilities: [compute, gpu, graphics, utility, video, display]

    # Increase shared memory size
    shm_size: '1gb'

I've just been playing with a bunch of additional combinations, and have noted the following. These are additive to the compose.yaml, just above...

  • Install nvidia-container-runtime. CUDA not available...
  • Mount /tmp/.X11-unix:/tmp/.X11-unix. CUDA not available.
  • Set env. var PGID=1003 (host's vglusers). Blender does not launch, black screen, originally reported console error.
  • Mount a customised startwm.sh, one without those environment variables set. Everything works... (see the mount sketch after the screenshot below)
[Screenshot 2024-05-30 at 13 40 40]
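
For completeness, "mount a customised startwm.sh" just means copying the script out of the image, removing the injected environment variables from the copy, and bind-mounting it back over the original. A rough docker run equivalent (assuming the running container is named blender and the edited copy is ./startwm.sh):

docker cp blender:/defaults/startwm.sh ./startwm.sh
# edit ./startwm.sh to remove the injected environment variables, then:
docker run --rm -it --runtime nvidia --gpus all -p 3000:3000 \
  -e PGID=1003 \
  -v "$PWD/startwm.sh:/defaults/startwm.sh" \
  linuxserver/blender

In the compose file this is just one extra bind mount under volumes:.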

I've then worked backwards:

  • Remove nvidia-container-runtime. Ooh, everything works! 🙂
  • Don't mount /tmp/.X11-unix:/tmp/.X11-unix. Hmm, everything still works.
  • Unset PGID=1003. CUDA not available.
  • Okay, re-add PGID=1003, and don't overwrite startwm.sh. Black screen, blender doesn't start.

In summary, for some reason I need to set the group to my host's vglusers group, and remove those environment variables...

So, I'm a bit confused about this if I'm honest, especially as you're basically on the same hardware architecture, with the main difference being that you're running Debian instead of Arch...

Can I ask you to share your nvidia-container-runtime configuration file? Mine is at /etc/nvidia-container-runtime/config.toml, and its contents are below...

The other thing that crosses my mind is cgroups; maybe I should look into that again.

#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = true
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
user = "root:root"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false

[nvidia-ctk]
path = "nvidia-ctk"

@thelamer
Member

thelamer commented May 30, 2024

Sure. One quick thing: when you say "working", keep in mind that rendering will use CUDA etc., but the actual on-screen preview of the model is rendered in OpenGL, which is why we want to automatically inject the Zink override. Otherwise all your rotations and previews go through LLVMpipe, which is a GPU emulated on your CPU. It is night and day when you get it working.
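
For what it is worth, a quick way to see which path the viewport is actually taking is to query the OpenGL renderer inside the container (a sketch; glxinfo comes from the mesa-utils package and is not necessarily preinstalled in the image, and it assumes the KasmVNC display is :1):

apt-get update && apt-get install -y mesa-utils
DISPLAY=:1 glxinfo -B | grep -E 'OpenGL (vendor|renderer)'
# "llvmpipe" in the renderer string means software rendering; a zink/NVIDIA renderer means the GPU path is active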

Here is my runtime config:

#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
supported-driver-capabilities = "compat32,compute,display,graphics,ngx,utility,video"
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig"
load-kmods = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc", "crun"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.cdi]
annotation-prefixes = ["cdi.k8s.io/"]
default-kind = "nvidia.com/gpu"
spec-dirs = ["/etc/cdi", "/var/run/cdi"]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

[nvidia-container-runtime-hook]
path = "nvidia-container-runtime-hook"
skip-mode-detection = false

[nvidia-ctk]
path = "nvidia-ctk"

Also, I know Arch is more bleeding edge, but I run a backport kernel (6.6.13) and a current nvidia runtime:

nvidia-container-cli --version
cli-version: 1.15.0
lib-version: 1.15.0

Maybe it is the root:root perms for the device in your config?
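
One quick host-side check for that (a sketch; device paths and group names vary by distro):

# who owns the DRI render nodes and NVIDIA device nodes on the host?
ls -l /dev/dri/ /dev/nvidia*
# which GIDs do the relevant groups map to, and is your user in them?
getent group vglusers video render
id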

@alexleach
Author

alexleach commented May 30, 2024 via email

@thelamer
Member

No, we use s6 v3 init and require root for all our images. Everything runs in userspace as the abc user in the container, but we have hooks that run on init which require root in the container, like chowning the video device.

@thelamer
Member

Does this work? (given your error)

slint-ui/slint#4828 (comment)

@alexleach
Author

alexleach commented May 30, 2024 via email

@LinuxServer-CI
Collaborator

This issue has been automatically marked as stale because it has not had recent activity. This might be due to missing feedback from OP. It will be closed if no further activity occurs. Thank you for your contributions.

@LinuxServer-CI closed this as not planned (won't fix, can't repro, duplicate, stale) on Aug 29, 2024