This issue was moved to a discussion.

No CUDA-capable device is detected after 15 minutes of successful work #10554

Closed
1 of 3 tasks
maxwase opened this issue Jun 22, 2024 · 0 comments

maxwase commented Jun 22, 2024

The bug

I think I'm hitting the same issue: after 10-15 minutes of uploading photos and videos, I get the following error for every newly uploaded video.

The OS that Immich Server is running on

6.6.32-1-MANJARO

Version of Immich Server

v1.106.4

Version of Immich Mobile App

v1.106.3 build.143

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
        file: hwaccel.transcoding.yml
        service: nvenc # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-cuda
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
        file: hwaccel.ml.yml
        service: cuda # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:d6c2911ac51b289db208767581a5d154544f2b2fe4914ea5056443f62dc6e900
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

No special changes

Reproduction steps

Restarting the container helps, but only for about 10 minutes.

The problem also reproduces when I run ffmpeg manually inside the container:
1. Run `ffmpeg -i VID.mp4 -vcodec h264_nvenc output.mp4`
2. See the error
3. Restart the container
4. Run the command from step 1 again
5. See no error
6. Wait about 15 minutes while uploading multiple videos
7. Run the command from step 1 again
8. See the error
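
To pin down how long the GPU stays usable after a restart, the manual check above could be repeated on a timer. The following is a minimal Python sketch, not a tested tool. It assumes the container name `immich_server` from the compose file and that the placeholder `VID.mp4` is reachable inside the container. The `-y` flag (overwrite output without asking) is added so that repeated runs do not stop at a prompt. The names `PROBE_CMD`, `run_probe` and `seconds_until_failure` are hypothetical.

```python
import subprocess
import time
from typing import Callable, Optional, Sequence

# Assumption: container name taken from the compose file; VID.mp4 is a placeholder path.
PROBE_CMD = [
    "docker", "exec", "immich_server",
    "ffmpeg", "-y", "-i", "VID.mp4", "-vcodec", "h264_nvenc", "output.mp4",
]


def run_probe(cmd: Sequence[str]) -> bool:
    """Run the probe command once and return True if it exits with status 0."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0


def seconds_until_failure(
    probe: Callable[[], bool],
    interval_s: float = 60.0,
    max_checks: int = 60,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Call probe() every interval_s seconds.

    Returns the elapsed seconds at the first failing probe,
    or None if all max_checks probes succeeded.
    """
    start = clock()
    for _ in range(max_checks):
        if not probe():
            return clock() - start
        sleep(interval_s)
    return None


# Example use, right after a container restart:
#   elapsed = seconds_until_failure(lambda: run_probe(PROBE_CMD))
#   print("no failure seen" if elapsed is None else f"first failure after {elapsed:.0f}s")
```

Note that this only reports when the encode first fails. A non-zero exit code from ffmpeg can have other causes, so the ffmpeg output should still be checked for the CUDA error.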

Relevant log output

[Nest] 7  - 06/22/2024, 10:41:46 PM   ERROR [Microservices:MediaRepository] ffmpeg version 6.0.1-Jellyfin Copyright (c) 2000-2023 the FFmpeg developers
  built with gcc 12 (Debian 12.2.0-14)
  configuration: --prefix=/usr/lib/jellyfin-ffmpeg --target-os=linux --extra-version=Jellyfin --disable-doc --disable-ffplay --disable-ptx-compression --disable-static --disable-libxcb --disable-sdl2 --disable-xlib --enable-lto --enable-gpl --enable-version3 --enable-shared --enable-gmp --enable-gnutls --enable-chromaprint --enable-opencl --enable-libdrm --enable-libass --enable-libfreetype --enable-libfribidi --enable-libfontconfig --enable-libbluray --enable-libmp3lame --enable-libopus --enable-libtheora --enable-libvorbis --enable-libopenmpt --enable-libdav1d --enable-libsvtav1 --enable-libwebp --enable-libvpx --enable-libx264 --enable-libx265 --enable-libzvbi --enable-libzimg --enable-libfdk-aac --arch=amd64 --enable-libshaderc --enable-libplacebo --enable-vulkan --enable-vaapi --enable-amf --enable-libvpl --enable-ffnvcodec --enable-cuda --enable-cuda-llvm --enable-cuvid --enable-nvdec --enable-nvenc
  libavutil      58.  2.100 / 58.  2.100
  libavcodec     60.  3.100 / 60.  3.100
  libavformat    60.  3.100 / 60.  3.100
  libavdevice    60.  1.100 / 60.  1.100
  libavfilter     9.  3.100 /  9.  3.100
  libswscale      7.  1.100 /  7.  1.100
  libswresample   4. 10.100 /  4. 10.100
  libpostproc    57.  1.100 / 57.  1.100
Selecting decoder 'hevc' because of requested hwaccel method cuda
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'upload/library/admin/2021/06-Jun/22/VID_20210601_111439.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 0
    compatible_brands: isommp42
    creation_time   : 2021-06-01T08:15:19.000000Z
    com.android.version: 11
    com.android.capture.fps: 60.000000
  Duration: 00:00:38.74, start: 0.000000, bitrate: 15295 kb/s
  Stream #0:0[0x1](eng): Video: hevc (Main), 1 reference frame (hvc1 / 0x31637668), yuvj420p(pc, bt470bg/bt470bg/smpte170m, left), 1920x1080 (1920x1088), 15019 kb/s, SAR 1:1 DAR 16:9, 59.96 fps, 60 tbr, 90k tbn (default)
    Metadata:
      creation_time   : 2021-06-01T08:15:19.000000Z
      handler_name    : VideoHandle
      vendor_id       : [0][0][0][0]
  Stream #0:1[0x2](eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 192 kb/s (default)
    Metadata:
      creation_time   : 2021-06-01T08:15:19.000000Z
      handler_name    : SoundHandle
      vendor_id       : [0][0][0][0]
[AVHWDeviceContext @ 0x221c4030500] cu->cuInit(0) failed -> CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
Device creation failed: -542398533.
No device available for decoder: device type cuda needed for codec hevc.
Stream mapping:
  Stream #0:0 -> #0:0 (hevc (native) -> h264 (h264_nvenc))
  Stream #0:1 -> #0:1 (copy)
Device setup failed for decoder on input stream #0:0 : Generic error in an external library
[AVIOContext @ 0x221c42c0540] Statistics: 0 bytes written, 0 seeks, 0 writeouts
[AVIOContext @ 0x221c40f0180] Statistics: 148274 bytes read, 1 seeks

[Nest] 7  - 06/22/2024, 10:41:46 PM   ERROR [Microservices:MediaService] Error: ffmpeg exited with code 1: 
[Nest] 7  - 06/22/2024, 10:41:46 PM   ERROR [Microservices:MediaService] Error occurred during transcoding. Retrying with NVENC acceleration disabled.
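
To see when the device first disappears relative to a container restart, the server log can be scanned for the `CUDA_ERROR_NO_DEVICE` marker shown above. The following is a rough Python sketch that assumes the log header format seen in this report (`[Nest] <pid>  - MM/DD/YYYY, H:MM:SS AM/PM`) and attributes each marker line to the most recent header before it. The function name `cuda_device_loss_times` is hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Matches headers such as: "[Nest] 7  - 06/22/2024, 10:41:46 PM   ERROR ..."
NEST_HEADER = re.compile(
    r"^\[Nest\]\s+\d+\s+-\s+(\d{2}/\d{2}/\d{4}, \d{1,2}:\d{2}:\d{2} [AP]M)"
)
MARKER = "CUDA_ERROR_NO_DEVICE"


def cuda_device_loss_times(lines: Iterable[str]) -> List[datetime]:
    """Return the timestamp of the closest preceding [Nest] header
    for every line that contains the CUDA_ERROR_NO_DEVICE marker."""
    last_seen = None
    hits: List[datetime] = []
    for line in lines:
        match = NEST_HEADER.match(line)
        if match:
            last_seen = datetime.strptime(match.group(1), "%m/%d/%Y, %I:%M:%S %p")
        if MARKER in line and last_seen is not None:
            hits.append(last_seen)
    return hits
```

The input could be the saved output of `docker logs immich_server`, read line by line. Comparing the first returned timestamp with the container start time gives the time to failure.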

Additional information

No response

@immich-app immich-app locked and limited conversation to collaborators Jun 22, 2024
@mertalev mertalev converted this issue into discussion #10555 Jun 22, 2024
