Hardware acceleration doesn't work when the container's (hardcoded) render group's GID doesn't match the host's #2739

@melyux

1. What is not working as documented?

My video playback was extremely slow with non-HEVC videos, so I looked at the logs and saw a bunch of:

h264_qsv: failed transcoding

messages. I turned on debug logging and kept seeing

Error initializing an internal MFX session: unsupported (-3)

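Hardware transcoding was not working at all. I tried a lot of things that didn't work, but finally ran ls -l /dev/dri inside the container and saw that the group owner of the renderD128 device was shown as a bare numeric GID instead of the render group's name. In other words, the GID that owns the device on the host doesn't correspond to any group inside the container. This seems to be the problem.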
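The check described above can be sketched as a small shell helper that reports which GID owns the render node and whether the current user is in that group. This is only a sketch: the function names are hypothetical, `stat -c` assumes GNU or BusyBox stat, and root would be able to open the device regardless of group membership.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; `stat -c` assumes GNU or BusyBox stat.

# Print the numeric GID that owns a file or device node.
owner_gid() {
    stat -c '%g' "$1"
}

# Succeed if the current user's group list contains the given numeric GID.
has_gid() {
    for g in $(id -G); do
        if [ "$g" = "$1" ]; then
            return 0
        fi
    done
    return 1
}

# Report whether the current user is in the group that owns the device.
check_device() {
    dev="${1:-/dev/dri/renderD128}"
    if [ ! -e "$dev" ]; then
        echo "skip: $dev does not exist"
        return 0
    fi
    dev_gid="$(owner_gid "$dev")"
    if has_gid "$dev_gid"; then
        echo "ok: current user is in GID $dev_gid, which owns $dev"
    else
        echo "mismatch: $dev is owned by GID $dev_gid, current GIDs are: $(id -G)"
    fi
}
```

Running check_device once on the host and once inside the container (as the user that runs ffmpeg) should show whether the two sides disagree about the owning GID.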

So on the host, I ran chmod 777 /dev/dri/renderD128 and restarted the docker container. This time, no more errors and I could see in intel_gpu_top that the GPU was working!

This, however, is not sustainable because it resets on host reboot, and other containers work with hardware acceleration without doing host permission changes (like Plex).

2. How can we reproduce it?

Steps to reproduce the behavior:

  1. Use the :preview Docker image on an Intel machine that supports QSV and whose /dev/dri/renderD128 is owned by a render group with a GID other than 115, the value PhotoPrism seems to hardcode in create-users.sh.
  2. Enable FFmpeg's Intel encoder in the Docker options.
  3. Try to stream some video files that require transcoding.
  4. Check the log to see that transcoding with the hardware encoder fails.

3. What behavior do you expect?

The docker container should handle GIDs for the render group that aren't 115. I think.

4. What could be the cause of your problem?

The create-users.sh file hardcodes the render group's GID as 115, so there's a mismatch between the container and the host whenever the host's render group has a different GID. This stops the container from accessing the /dev/dri/renderD128 device. The logs don't clearly indicate this as the cause, so it requires lots of debugging.

Someone had a similar issue on the Plex Linuxserver container (linuxserver/docker-plex#207), and it was solved by Linuxserver changing their user/group addition logic to be more dynamic, it seems (https://github.com/linuxserver/docker-plex/blob/master/root/etc/cont-init.d/50-gid-video).
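For illustration, a more dynamic approach could look roughly like the dry-run sketch below. It is not the Linuxserver script and not PhotoPrism's create-users.sh. It reads the GID that owns the device, looks for an existing group with that GID, and prints the groupadd/usermod commands an entrypoint running as root might execute. The function names, the fallback group name render-host, and the user name photoprism in the example are all assumptions.

```shell
#!/bin/sh
# Dry-run sketch only: prints commands instead of running them.
# Function names and the "render-host" fallback group name are hypothetical.

# Look up the group name for a numeric GID in a group file (default /etc/group).
group_name_for_gid() {
    awk -F: -v gid="$1" '$3 == gid { print $1; exit }' "${2:-/etc/group}"
}

# Print the commands that would put $user into the group owning $dev.
plan_group_fix() {
    dev="$1"
    user="$2"
    group_file="${3:-/etc/group}"
    dev_gid="$(stat -c '%g' "$dev")"   # assumes GNU or BusyBox stat
    name="$(group_name_for_gid "$dev_gid" "$group_file")"
    if [ -z "$name" ]; then
        # No group in the container has this GID yet, so one would be created.
        name="render-host"
        echo "groupadd --gid $dev_gid $name"
    fi
    echo "usermod --append --groups $name $user"
}

# Example (user name is an assumption):
#   plan_group_fix /dev/dri/renderD128 photoprism
```

Keying the lookup on the device's actual GID rather than on a fixed number is what would make this work on hosts where the render group isn't 115.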

Giving everything in /dev/dri 777 permissions "fixes" the problem, pointing to a permissions issue.
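A less drastic workaround than 777 might be to pass the host's render GID to the container as a supplementary group using docker run's --group-add option. The helper below is a sketch with a hypothetical name that builds those arguments from whatever GIDs own the renderD* nodes. Whether the extra group is still in effect after the image's entrypoint switches users has not been verified here.

```shell
#!/bin/sh
# Sketch only: group_add_args is a hypothetical helper; `stat -c` assumes
# GNU or BusyBox stat.

# Print one "--group-add GID " pair per distinct GID owning a render node.
group_add_args() {
    dir="${1:-/dev/dri}"
    for node in "$dir"/renderD*; do
        # Skip the unexpanded glob when no render node exists.
        [ -e "$node" ] || continue
        stat -c '%g' "$node"
    done | sort -un | while read -r node_gid; do
        printf '%s %s ' --group-add "$node_gid"
    done
}

# Usage sketch on the host (image name and other options left out on purpose):
#   docker run --device /dev/dri:/dev/dri $(group_add_args) ...
```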

5. Can you provide us with example files for testing, error logs, or screenshots?

See above for the ffmpeg errors.

6. Which software versions do you use?

(a) PhotoPrism Architecture & Build Number: AMD64, 220919-cc8bab446

(b) Database Type & Version: MariaDB, latest

(c) Operating System Types & Versions: Linux

(d) Browser Types & Versions: Safari on Mac

(e) Ad Blockers, Browser Plugins, and/or Firewall Software? No

7. On what kind of device is PhotoPrism installed?

(a) Device / Processor Type: Intel Core i7-7700K

(b) Physical Memory & Swap Space in GB: 16GB + 8GB swap

(c) Storage Type: SSD + HDD

(d) Anything else that might be helpful to know?

I'm also using vGPU for a Windows VM on the same machine. Plex works with hw acceleration in another container.

8. Do you use a Reverse Proxy, Firewall, VPN, or CDN?

No

Metadata

Labels

bug (Something isn't working), invalid (Works as documented or cannot be reproduced)

Projects

Status

Release 🌈
