GPU hardware decode not actually engaged: '-hwaccel auto' alone is insufficient for AV1/VP9, falls back to software #276

@mescon

TL;DR

Healarr's "hardware acceleration" code path adds -hwaccel auto to ffmpeg invocations but does not select a hardware decoder. For codecs whose default ffmpeg decoder is a pure software library (notably AV1 → libdav1d and VP9 → libvpx-vp9), the GPU is never actually engaged: the CUDA context is set up and then ignored. The "Health check: ffmpeg hardware acceleration detected, enabling -hwaccel auto" log line is misleading: it tells you the build supports hwaccel, not that hwaccel is being used for this file.

Confirmed live on Sokaris (RTX 4070, NVIDIA Container Toolkit, custom ffmpeg 7.1.1 with all *_cuvid decoders compiled in). Verified with the Fast triage preset enabled on /media/Movies/HD/: every AV1 file decodes via libdav1d despite -hwaccel auto being passed.

Reproducing

Inside sokaris-healarr, with a known AV1 input:

$ ffmpeg -v info -hwaccel auto -i <av1file>.mkv -t 5 -f null -
…
[libdav1d @ 0x...] libdav1d 1.5.2
  Stream #0:0(eng): Video: av1 (libdav1d) …

vs. the explicit cuvid path:

$ ffmpeg -v info -hwaccel cuda -c:v av1_cuvid -i <av1file>.mkv -t 5 -f null -

nvidia-smi --query-compute-apps confirms: only the explicit-cuvid invocation shows up as a GPU process.

Why this happens

-hwaccel <name> configures the hardware context for filters and (some) decoders that have internal hwaccel hooks. For H.264 and HEVC the stock h264 / hevc decoders have those hooks, so -hwaccel cuda alone can engage NVDEC. For AV1, VP9, VP8, and a few others, ffmpeg's default decoder is a software library with no hwaccel hooks; the CUDA context is set up but never consumed. To actually decode AV1 on NVDEC you need -c:v av1_cuvid (and vp9_cuvid for VP9, etc.).

Healarr's resolveHwAccelArgs (internal/integration/health_checker.go:248) returns ["-hwaccel", "auto"] (or ["-hwaccel", <name>] for explicit settings) and stops there. The decoder is left to ffmpeg's default selection, which for AV1 means libdav1d.

Why it wasn't caught earlier

  • The "Fast triage" / 60-second decode preset still ran to completion regardless of decoder choice — software libdav1d is slow but not slow enough at 60 seconds of 1080p to make scans visibly hang.
  • The smarter probeHwAccelAvailable from #261 verifies that hwaccel is available, not that it's being used for any given file.
  • HEVC and H.264 files do actually use NVDEC with the current code (because of the internal-hwaccel hooks above), so the feature isn't completely broken — it's specifically broken for AV1/VP9/VP8.

This bug was masked by two things: #261 declared success based on a setup-time probe, and nothing confirms which decoder is actually selected at decode time.
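One way to add a decode-time check is to inspect ffmpeg's stderr after a run and flag files that were handled by a software library. The Go sketch below is an illustration only: the function name softwareDecoderInUse, the regular expression, and the decoder list are assumptions, and the log format is inferred from the sample output in this issue, so it may differ between ffmpeg versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// streamDecoderRe matches the parenthesised name that follows the codec in
// ffmpeg's stream summary, e.g. "Video: av1 (libdav1d)". The same position
// can also hold a profile such as "(High)", so matches are filtered below.
var streamDecoderRe = regexp.MustCompile(`Video: (\w+) \(([\w-]+)\)`)

// softwareDecoders lists the library decoders named in this issue.
// Hypothetical list; extend it for other builds.
var softwareDecoders = map[string]bool{
	"libdav1d":   true,
	"libvpx":     true,
	"libvpx-vp9": true,
}

// softwareDecoderInUse scans ffmpeg stderr and returns the codec and decoder
// of the first video stream that was decoded by a known software library.
func softwareDecoderInUse(stderr string) (codec, decoder string, found bool) {
	for _, m := range streamDecoderRe.FindAllStringSubmatch(stderr, -1) {
		if softwareDecoders[m[2]] {
			return m[1], m[2], true
		}
	}
	return "", "", false
}

func main() {
	log := "[libdav1d @ 0x...] libdav1d 1.5.2\n  Stream #0:0(eng): Video: av1 (libdav1d) ..."
	codec, dec, found := softwareDecoderInUse(log)
	fmt.Println(codec, dec, found) // prints: av1 libdav1d true
}
```

A check like this could downgrade the "hardware acceleration detected" message to a warning whenever hwaccel is enabled but a software decoder shows up in the stream summary.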

Affected codecs

Anything whose default ffmpeg decoder is a software library without internal hwaccel hooks. With the custom Alpine ffmpeg shipped in v1.3.4:

| Codec | Default decoder | NVDEC decoder | Currently engages GPU? |
|---|---|---|---|
| H.264 | h264 (has hwaccel hook) | h264_cuvid | yes (via -hwaccel alone) |
| HEVC / H.265 | hevc (has hwaccel hook) | hevc_cuvid | yes (via -hwaccel alone) |
| MPEG2 | mpeg2video (has hwaccel hook) | mpeg2_cuvid | yes (via -hwaccel alone) |
| AV1 | libdav1d (software) | av1_cuvid | no (bug) |
| VP9 | libvpx-vp9 (software) | vp9_cuvid | no (bug) |
| VP8 | libvpx (software) | vp8_cuvid | no (but matters less) |
| VC-1 | vc1 (has hwaccel hook) | vc1_cuvid | yes (via -hwaccel alone) |
| MJPEG | mjpeg | mjpeg_cuvid | mostly software |

AV1 and VP9 are the meaningful gap — those are the codecs the AV1-NVDEC work in #261 was specifically about.

Proposed fix

Codec-aware hardware decoder selection:

  1. Probe the input codec with a quick ffprobe -select_streams v:0 -show_entries stream=codec_name call before each thorough decode. Cheap (~30-50ms), negligible vs the thorough decode itself.
  2. Map codec → cuvid decoder for all the codec_name values ffprobe emits (av1 → av1_cuvid, vp9 → vp9_cuvid, etc.).
  3. Only apply the override when the resolved hwaccel is CUDA-based: -c:v av1_cuvid would fail on a VAAPI host. Easy heuristic: the hwaccel setting is cuda, or it is auto AND /dev/nvidiactl exists.
  4. Append -c:v <codec>_cuvid to the ffmpeg input args alongside the existing -hwaccel ....
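Steps 2 to 4 could look roughly like the Go sketch below. It is not the actual Healarr implementation: the signatures are assumptions (the issue only names cuvidDecoderForCodec and hwAccelInputArgs(ov, path)), cudaLikely and nvidiaDevicePresent are hypothetical helpers, and the codec map is copied from the table in this issue. The codec is passed in as a string here so the function stays pure and testable.

```go
package main

import (
	"fmt"
	"os"
)

// cuvidDecoders maps ffprobe codec_name values to NVDEC decoder names,
// taken from the "Affected codecs" table in this issue.
var cuvidDecoders = map[string]string{
	"h264":       "h264_cuvid",
	"hevc":       "hevc_cuvid",
	"mpeg2video": "mpeg2_cuvid",
	"av1":        "av1_cuvid",
	"vp9":        "vp9_cuvid",
	"vp8":        "vp8_cuvid",
	"vc1":        "vc1_cuvid",
	"mjpeg":      "mjpeg_cuvid",
}

// cuvidDecoderForCodec returns the cuvid decoder for a codec, if one is known.
func cuvidDecoderForCodec(codec string) (string, bool) {
	dec, ok := cuvidDecoders[codec]
	return dec, ok
}

// nvidiaDevicePresent reports whether /dev/nvidiactl exists on this host.
func nvidiaDevicePresent() bool {
	_, err := os.Stat("/dev/nvidiactl")
	return err == nil
}

// cudaLikely is the step 3 heuristic: an explicit "cuda" setting, or "auto"
// on a host that exposes the NVIDIA control device.
func cudaLikely(hwaccel string, nvidiaPresent bool) bool {
	return hwaccel == "cuda" || (hwaccel == "auto" && nvidiaPresent)
}

// hwAccelInputArgs builds the ffmpeg input args. It keeps the existing
// -hwaccel pair and appends -c:v <codec>_cuvid only when CUDA looks usable
// and the codec has a cuvid variant; otherwise behavior is unchanged.
func hwAccelInputArgs(hwaccel, codec string, nvidiaPresent bool) []string {
	if hwaccel == "" {
		return nil // hardware acceleration disabled
	}
	args := []string{"-hwaccel", hwaccel}
	if !cudaLikely(hwaccel, nvidiaPresent) {
		return args
	}
	if dec, ok := cuvidDecoderForCodec(codec); ok {
		args = append(args, "-c:v", dec)
	}
	return args
}

func main() {
	fmt.Println(hwAccelInputArgs("auto", "av1", true))  // prints: [-hwaccel auto -c:v av1_cuvid]
	fmt.Println(hwAccelInputArgs("auto", "av1", false)) // prints: [-hwaccel auto]
	fmt.Println(hwAccelInputArgs("vaapi", "vp9", true)) // prints: [-hwaccel vaapi]
}
```

In real use the third argument would come from nvidiaDevicePresent(), ideally evaluated once at startup rather than per file.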

Falls back cleanly:

  • No /dev/nvidiactl → no cuvid override → current VAAPI / -hwaccel auto behavior unchanged.
  • Codec has no _cuvid variant → no override → ffmpeg picks its default decoder.
  • Cuvid decode fails (e.g. malformed bitstream) → ffmpeg's -xerror reports the failure → that file gets flagged corrupt. This is acceptable: a file that genuinely can't decode on NVDEC is at minimum unhealthy from the GPU consumer's perspective.

Locations to change:

  • internal/integration/health_checker.go: add detectVideoCodec, cuvidDecoderForCodec, and a new hwAccelInputArgs(ov, path) wrapper.
  • internal/integration/health_checker_tools.go:37: thorough ffmpeg call uses the new wrapper.
  • internal/integration/health_checker.go::AnalyzeContent: same change.
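For the detectVideoCodec helper named above, a minimal Go sketch might shell out to ffprobe as follows. The signature, the ffprobeCodecArgs and parseCodecName helpers, and the error messages are assumptions made for illustration; only the -select_streams and -show_entries arguments come from the proposal, and the -v and -of options are added here to make the output a bare codec name.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"strings"
)

// ffprobeCodecArgs builds an ffprobe invocation that prints only the
// codec_name of the first video stream, one value per line.
func ffprobeCodecArgs(path string) []string {
	return []string{
		"-v", "error",
		"-select_streams", "v:0",
		"-show_entries", "stream=codec_name",
		"-of", "default=noprint_wrappers=1:nokey=1",
		path,
	}
}

// parseCodecName returns the first non-empty line of ffprobe output,
// trimmed and lower-cased, or "" if there is none.
func parseCodecName(out string) string {
	for _, line := range strings.Split(out, "\n") {
		if name := strings.TrimSpace(line); name != "" {
			return strings.ToLower(name)
		}
	}
	return ""
}

// detectVideoCodec runs ffprobe and returns the codec name of the first
// video stream. The context lets the caller cancel or time out the probe.
func detectVideoCodec(ctx context.Context, path string) (string, error) {
	out, err := exec.CommandContext(ctx, "ffprobe", ffprobeCodecArgs(path)...).Output()
	if err != nil {
		return "", fmt.Errorf("ffprobe %s: %w", path, err)
	}
	name := parseCodecName(string(out))
	if name == "" {
		return "", fmt.Errorf("ffprobe reported no video stream for %s", path)
	}
	return name, nil
}

func main() {
	fmt.Println(parseCodecName("av1\n")) // prints: av1
}
```

If the probe fails, the caller can simply skip the cuvid override and keep the current -hwaccel behavior, which matches the fallback rules listed earlier.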

Target

Fix in v1.3.5. Issue #274 (zombie cancelled scans) is also in that release.
