HW acceleration seems to function poorly on RTX 3060 Ti #9780

kjkent · 2024-05-26T23:01:21Z

The bug

If this is expected behavior, I apologize for the noise. It appears that ffmpeg may not be fully utilizing hardware acceleration on my machine, with an RTX 3060 Ti, as transcoding looks like it's hitting the CPU (Ryzen 3600X) far harder than the GPU. Running 6 concurrent transcoding jobs (to 1080p/HEVC/Opus), I'm seeing ~40-60% CPU use across 12 cores but only ~9% GPU utilization with 3.3G of 8G GPU memory used:

btop:

nvidia-smi:
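A note on reading these numbers: the "GPU-Util" figure in the default `nvidia-smi` view reflects how busy the compute cores (SMs) are. NVENC and NVDEC are separate fixed-function engines, so a low GPU percentage does not by itself show that the encoder is idle. `nvidia-smi dmon -s u` reports per-engine utilization instead. Below is a minimal sketch of a parser for that output. The `SAMPLE` text uses made-up illustrative numbers (not measurements from this machine), and the exact column set can vary between driver versions, which is why the parser reads column names from the header rather than hard-coding positions.

```python
import subprocess


def parse_dmon(text: str) -> list[dict[str, int]]:
    """Parse `nvidia-smi dmon -s u` output into one dict per sample row."""
    columns: list[str] = []
    rows: list[dict[str, int]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # The first comment line names the columns; later ones give units
            # or repeat the header, so they are skipped.
            if not columns:
                columns = line.lstrip("#").split()
            continue
        values = line.split()
        # dmon prints "-" for metrics the GPU does not support; skip those.
        rows.append({name: int(v) for name, v in zip(columns, values) if v != "-"})
    return rows


def sample_live() -> list[dict[str, int]]:
    """Take one live sample; requires nvidia-smi on PATH (not run here)."""
    out = subprocess.run(
        ["nvidia-smi", "dmon", "-s", "u", "-c", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_dmon(out)


# Illustrative values only: low SM utilization next to a busy encoder.
SAMPLE = """\
# gpu     sm    mem    enc    dec
# Idx      %      %      %      %
    0      9      3     72      0
"""

if __name__ == "__main__":
    for row in parse_dmon(SAMPLE):
        print(f"GPU {row['gpu']}: sm={row['sm']}% enc={row['enc']}% dec={row['dec']}%")
```

If `enc` is high while `sm` stays low, the encoder is doing its job and the CPU load is more likely coming from decoding and filtering.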

The OS that Immich Server is running on

Arch (6.9.2-arch1-1)

Version of Immich Server

v1.105.1

Version of Immich Mobile App

N/A

Platform with the issue

Server
Web
Mobile

Your docker-compose.yml content

x-immich-env: &immich-env
  env_file: ./.env
  restart: always
  user: ${UID}:${GID}
  networks:
    - immich

services:
  immich-server:
    <<: *immich-env
    command: [ "start.sh", "immich" ]
    container_name: immich-server
    image: ghcr.io/immich-app/immich-server:release
    labels:
      traefik.enable: "true"
      traefik.http.services.immich.loadbalancer.server.port: "3001"
      traefik.http.routers.immich.entrypoints: "websecure"
      traefik.http.routers.immich.rule: "Host(`immich.<redacted url>`)"
    networks:
      - traefik-proxy
      - immich
    volumes:
      - ./media:/usr/src/app/upload
      - ./import:/import:ro
    depends_on:
      - immich-redis
      - immich-postgres

  immich-microservices:
    <<: *immich-env
    container_name: immich-microservices
    command: [ "start.sh", "microservices" ]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
                - compute
                - video
    image: ghcr.io/immich-app/immich-server:release
    volumes:
      - ./media:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - ./reverse-geocoding-dump:/usr/src/app/.reverse-geocoding-dump
    depends_on:
      - immich-redis
      - immich-postgres

  immich-machine-learning:
    <<: *immich-env
    container_name: immich-machine-learning
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities:
                - gpu
    image: ghcr.io/immich-app/immich-machine-learning:release-cuda
    volumes:
      - ml_model-cache:/cache
      - ml_dot-cache:/.cache
      - ./ml_config:/.config

  immich-redis:
    <<: *immich-env
    container_name: immich-redis
    image: redis:6.2-alpine
    volumes:
      - ./redis:/data

  immich-postgres:
    <<: *immich-env
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    container_name: immich-postgres
    image: tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0 
    environment:
      POSTGRES_USER: "${DB_USERNAME}"
      POSTGRES_PASSWORD: "${DB_PASSWORD}"
      POSTGRES_DB: "${DB_DATABASE_NAME}"
      PGDATA: '/var/lib/postgresql/data/pgdata'
      POSTGRES_INITDB_ARGS: '--data-checksums'
      # SElinux fixes// :z == shared mount || :Z == private unshared mount
    volumes:
      - ./postgres:/var/lib/postgresql/data:Z

networks:
  immich:
    name: "immich"
  traefik-proxy:
    external: true

volumes:
  ml_model-cache:
  ml_dot-cache:

Your .env content

DB_USERNAME=kjkent
DB_PASSWORD=<redacted>
DB_DATABASE_NAME=immich
DB_HOSTNAME=immich-postgres
REDIS_HOSTNAME=immich-redis
PGDATA=/var/lib/postgresql/data/pgdata

API_TOKEN=<redacted> // this is read by a script to import images via the server's cli



Reproduction steps

1. Start a transcode of all videos via the web GUI with the following settings:

{
  "ffmpeg": {
    "crf": 28,
    "threads": 0,
    "preset": "slower",
    "targetVideoCodec": "hevc",
    "acceptedVideoCodecs": [
      "hevc"
    ],
    "targetAudioCodec": "libopus",
    "acceptedAudioCodecs": [
      "libopus"
    ],
    "targetResolution": "1080",
    "maxBitrate": "0",
    "bframes": -1,
    "refs": 0,
    "gopSize": 0,
    "npl": 0,
    "temporalAQ": true,
    "cqMode": "auto",
    "twoPass": true,
    "preferredHwDevice": "auto",
    "transcode": "optimal",
    "tonemap": "hable",
    "accel": "nvenc"
  },
  "job": {
    "thumbnailGeneration": {
      "concurrency": 6
    },
    "videoConversion": {
      "concurrency": 6
    }
  }
}


Relevant log output

When I set the job concurrency to ~15 instead of 6, the CPU maxed out while GPU utilization stayed low. I did, however, get a CUDA error from `immich-microservices` saying it failed to allocate memory.

Other than that, there is almost no log output from the microservices container apart from a websocket initialization notice.
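The allocation failure at ~15 jobs is roughly what a back-of-the-envelope estimate predicts from the figures above (3.3G of 8G used with 6 jobs). The sketch below makes two loud assumptions: that GPU memory scales linearly with the number of transcodes, and that all 3.3G belongs to transcoding. Neither is confirmed, since the machine learning container also reserves the same GPU, so treat the result as a rough ceiling rather than a limit to rely on.

```python
import math

# Figures taken from the report above.
GPU_TOTAL_GB = 8.0
USED_AT_OBSERVED_LOAD_GB = 3.3
OBSERVED_JOBS = 6


def max_concurrent_jobs(total_gb: float, used_gb: float, jobs: int) -> int:
    """Estimate how many jobs fit, assuming memory use is linear in job count."""
    per_job_gb = used_gb / jobs
    return math.floor(total_gb / per_job_gb)


if __name__ == "__main__":
    per_job = USED_AT_OBSERVED_LOAD_GB / OBSERVED_JOBS
    limit = max_concurrent_jobs(GPU_TOTAL_GB, USED_AT_OBSERVED_LOAD_GB, OBSERVED_JOBS)
    print(f"approx. {per_job:.2f} GB per job, so at most {limit} jobs fit in {GPU_TOTAL_GB:.0f} GB")
    print(f"15 jobs would need approx. {15 * per_job:.2f} GB")
```

Under those assumptions each job takes about 0.55 GB, so 15 jobs would need slightly more than the card has.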

Additional information

I'm hesitant to report this as it may just be my misaligned settings. I do have another bug to report where I think the server is responding with an inappropriate status code when videos are requested from Chrome, when Immich is behind a reverse proxy with gzip encoding enabled. I'm going to verify and get more info before filing that.

I noticed that 3dd3428 changes the settings for nvenc and introduces nvdec, so I can check whether bumping my container images to main makes any difference.

Thank you for developing Immich -- it's an incredible undertaking, implemented incredibly well! I hope one day to contribute.


mertalev · 2024-05-26T23:05:03Z

This has been dramatically improved with #9452.

kjkent · 2024-05-26T23:36:19Z

Amazing, thank you

kjkent · 2024-05-27T18:48:52Z

@mertalev I tested the latest containers from main as of ~12 hours ago. With HW decoding off, there's a small but (subjectively) noticeable improvement in GPU utilisation. With hardware decoding on, transcoding fails and falls back to full CPU usage:

built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-opencl --enable-libdrm --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libsvtav1 --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-vaapi --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
[h264 @ 0x327ac2e0580] Reinit context to 1088x1920, pix_fmt: yuv420p
Selecting decoder 'h264' because of requested hwaccel method cuda
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'upload/library/admin/2023/12/video.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    title           : 856115136307470
    encoder         : Lavf59.27.100
  Duration: 00:00:28.21, start: 0.000000, bitrate: 4018 kb/s
  Stream #0:0[0x1](und): Video: h264 (High), 1 reference frame (avc1 / 0x31637661), yuv420p(tv, bt709, progressive, left), 1080x1920 (1088x1920), 3974 kb/s, 30 fps, 30 tbr, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
      vendor_id       : [0][0][0][0]
      encoder         : Lavc59.37.100 h264_fbv
  Stream #0:1[0x2](und): Audio: aac (HE-AAC) (mp4a / 0x6134706D), 44100 Hz, stereo, fltp, 48 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
      vendor_id       : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> hevc (hevc_nvenc))
  Stream #0:1 -> #0:1 (aac (native) -> opus (libopus))
Press [q] to stop, [?] for help
[h264 @ 0x327ac2e1980] NVDEC capabilities:
[h264 @ 0x327ac2e1980] format supported: yes, max_mb_count: 65536
[h264 @ 0x327ac2e1980] min_width: 48, max_width: 4096
[h264 @ 0x327ac2e1980] min_height: 16, max_height: 4096
[h264 @ 0x327ac2e1980] Reinit context to 1088x1920, pix_fmt: cuda
[graph 0 input from stream 0:0 @ 0x327ac1a2280] w:1080 h:1920 pixfmt:cuda tb:1/15360 fr:30/1 sar:0/1
[auto_scale_0 @ 0x327ac1a2640] w:iw h:ih flags:'' interl:0
[Parsed_format_0 @ 0x327ac1a21c0] auto-inserting filter 'auto_scale_0' between the filter 'graph 0 input from stream 0:0' and the filter 'Parsed_format_0'
Impossible to convert between the formats supported by the filter 'graph 0 input from stream 0:0' and the filter 'auto_scale_0'
Error reinitializing filters!
Failed to inject frame into filter network: Function not implemented
Error while processing the decoded data for stream #0:0
[AVIOContext @ 0x327ac1d0680] Statistics: 0 bytes written, 0 seeks, 0 writeouts
Terminating demuxer thread 0
[AVIOContext @ 0x327ac1d02c0] Statistics: 179128 bytes read, 0 seeks
Conversion failed!

I see in the updated docs that HW decoding may not work for every video, but the above error repeated for all six videos in the job queue with 0% GPU usage and 100% CPU -- so I thought I'd mention here in case it's unexpected.

mertalev · 2024-05-27T19:19:27Z

I was initially going to say that I just tested on main and don't have this issue, but there was a video that had the same error. The fix was just merged into main, but it'll take a bit for the new image to be built. With your current image, you can try setting a different target resolution (like 720p) to make it work.

mertalev closed this as completed May 26, 2024
