
feat(ml)!: cuda and openvino acceleration #5619

Merged
merged 41 commits into main on Jan 21, 2024

Conversation

mertalev
Contributor

@mertalev mertalev commented Dec 10, 2023

Description

Potentially breaking change: hwaccel.yml is renamed to hwaccel.transcoding.yml, and the way it's used in docker-compose has changed. Existing docker-compose / hwaccel.yml setups will continue to work, but a user who relied on the hwaccel.yml file and updates their docker-compose.yml will need to switch to the new format (or keep the older extends section).
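For illustration, the new extends format might look roughly like this (hypothetical service name under hwaccel.transcoding.yml; check the released compose files for the real ones):

```yaml
services:
  immich-microservices:
    # ... existing service configuration ...
    extends:
      file: hwaccel.transcoding.yml
      service: nvenc  # pick the service matching your hardware, e.g. nvenc / qsv / vaapi
```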

This PR adds hardware acceleration support for Nvidia and Intel devices through CUDA and OpenVINO. It uses prebuilt onnxruntime packages for these APIs and updates the ML Dockerfile to conditionally target a device based on the DEVICE build arg. There is a check at runtime to detect the available execution providers and set them accordingly.
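The runtime provider check can be sketched roughly like this (a simplified illustration, not the PR's actual code; in practice the available list would come from `onnxruntime.get_available_providers()`):

```python
def pick_providers(
    available: list[str],
    preferred: tuple[str, ...] = ("CUDAExecutionProvider", "OpenVINOExecutionProvider"),
) -> list[str]:
    """Keep the preferred execution providers that are actually available,
    always falling back to CPU last."""
    providers = [p for p in preferred if p in available]
    providers.append("CPUExecutionProvider")  # CPU fallback is always valid
    return providers

# With onnxruntime installed, this would typically be called as:
#   pick_providers(onnxruntime.get_available_providers())
```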

Current limitations:

  • onnxruntime-openvino doesn't currently support Python 3.11, so the OpenVINO build targets Python 3.10 instead

  • The CUDA image is massive, but it can't be helped

Edit: I'm removing TensorRT support as it's slow to load and uses much more RAM than normal CUDA.

How has this been tested?

I ran the CPU, CUDA and OpenVINO variants, confirmed successful responses for each task when querying with Postman, and confirmed the CUDA and OpenVINO variants were running on GPU. For OpenVINO, I tested on Linux, but also included the WSL2 configuration recommended by Intel for the OpenVINO image it uses.

While testing, I also ended up increasing test coverage from 72% to 80%.

@mertalev mertalev changed the title feat(ml): tensorrt and openvino acceleration feat(ml): cuda and openvino acceleration Dec 11, 2023
@mertalev mertalev marked this pull request as ready for review December 11, 2023 02:52
@yodatak

yodatak commented Dec 11, 2023

Does this acceleration apply to thumbnail generation with OpenCL, like darktable? For large collections it could be awesome. Thanks a lot for this!


cloudflare-pages bot commented Dec 14, 2023

Deploying with Cloudflare Pages

Latest commit: b68f17e
Status: ✅  Deploy successful!
Preview URL: https://e27c0135.immich.pages.dev
Branch Preview URL: https://feat-ml-tensorrt.immich.pages.dev


@kkoshelev

Got some issues with this build.

[12/14/23 18:23:57] INFO     Starting gunicorn 21.2.0                           
[12/14/23 18:23:57] INFO     Listening at: http://0.0.0.0:3003 (9)              
[12/14/23 18:23:57] INFO     Using worker: uvicorn.workers.UvicornWorker        
[12/14/23 18:23:57] INFO     Booting worker with pid: 25                        
[12/14/23 18:24:10] INFO     Created in-memory cache with unloading after 300s  
                             of inactivity.                                     
[12/14/23 18:24:11] INFO     Initialized request thread pool with 16 threads.   
[12/14/23 18:26:37] INFO     Loading clip model 'ViT-B-32__openai'              
Exception in ASGI application
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/uvicorn/protocols/http/httptools_impl.py", line 435, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/venv/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/opt/venv/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/main.py", line 82, in predict
    model = await load(await app.state.model_cache.get(model_name, model_type, **kwargs))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/models/cache.py", line 55, in get
    model = from_model_type(model_type, model_name, **model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/models/__init__.py", line 21, in from_model_type
    return FaceRecognizer(model_name, **model_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/app/models/facial_recognition.py", line 27, in __init__
    super().__init__(clean_name(model_name), cache_dir, **model_kwargs)
TypeError: InferenceModel.__init__() got an unexpected keyword argument 'maxDistance'
[12/14/23 18:26:40] INFO     Loading image classification model                 
                             'microsoft/resnet-50'                              
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/opt/venv/lib/python3.11/site-packages/transformers/models/convnext/feature_extraction_convnext.py:28: FutureWarning: The class ConvNextFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ConvNextImageProcessor instead.
  warnings.warn(
[12/14/23 18:31:51] INFO     Shutting down due to inactivity.                   
[12/14/23 18:31:52] ERROR    Worker (pid:25) exited with code 1                 
[12/14/23 18:31:52] ERROR    Worker (pid:25) exited with code 1.                
[12/14/23 18:31:52] INFO     Booting worker with pid: 85        
[Nest] 7  - 12/14/2023, 6:26:37 PM     LOG [MediaService] Start encoding video dbbfbc1b-1b7d-424f-8bea-fc8957d7811e {"inputOptions":["-init_hw_device cuda=cuda:0","-filter_hw_device cuda"],"outputOptions":["-tune hq","-qmin 0","-rc-lookahead 20","-i_qfactor 0.75","-c:v h264_nvenc","-c:a aac","-movflags faststart","-fps_mode passthrough","-map 0:0","-map 0:1","-g 256","-v verbose","-vf format=nv12,hwupload_cuda","-preset p6","-cq:v 18"],"twoPass":false}
[Nest] 7  - 12/14/2023, 6:26:37 PM     LOG [MediaService] Successfully generated WEBP video thumbnail for asset dbbfbc1b-1b7d-424f-8bea-fc8957d7811e
[Nest] 7  - 12/14/2023, 6:26:39 PM     LOG [MediaService] Successfully generated JPEG image thumbnail for asset 01ea340d-cf30-4c11-b42f-0802e0e0ab81
[Nest] 7  - 12/14/2023, 6:26:39 PM     LOG [MediaService] Start encoding video dbbfbc1b-1b7d-424f-8bea-fc8957d7811e {"inputOptions":["-init_hw_device cuda=cuda:0","-filter_hw_device cuda"],"outputOptions":["-tune hq","-qmin 0","-rc-lookahead 20","-i_qfactor 0.75","-c:v h264_nvenc","-c:a aac","-movflags faststart","-fps_mode passthrough","-map 0:0","-map 0:1","-g 256","-v verbose","-vf format=nv12,hwupload_cuda","-preset p6","-cq:v 18"],"twoPass":false}
[Nest] 7  - 12/14/2023, 6:26:40 PM   ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): Error: Request for facial recognition failed with status 500: Internal Server Error
[Nest] 7  - 12/14/2023, 6:26:40 PM   ERROR [JobService] Error: Request for facial recognition failed with status 500: Internal Server Error
    at MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:18:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async PersonService.handleRecognizeFaces (/usr/src/app/dist/domain/person/person.service.js:247:23)
    at async /usr/src/app/dist/domain/job/job.service.js:112:37
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:387:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:574:24)
[Nest] 7  - 12/14/2023, 6:26:40 PM   ERROR [JobService] Object:
{
  "id": "dbbfbc1b-1b7d-424f-8bea-fc8957d7811e",
  "source": "upload"
}
[Nest] 7  - 12/14/2023, 6:26:40 PM   ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): Error: Request for facial recognition failed with status 500: Internal Server Error
[Nest] 7  - 12/14/2023, 6:26:40 PM   ERROR [JobService] Error: Request for facial recognition failed with status 500: Internal Server Error
    at MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:18:19)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async PersonService.handleRecognizeFaces (/usr/src/app/dist/domain/person/person.service.js:247:23)
    at async /usr/src/app/dist/domain/job/job.service.js:112:37
    at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:387:28)
    at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:574:24)
[Nest] 7  - 12/14/2023, 6:26:40 PM   ERROR [JobService] Object:
{
  "id": "01ea340d-cf30-4c11-b42f-0802e0e0ab81",
  "source": "upload"
}
[Nest] 7  - 12/14/2023, 6:26:41 PM     LOG [MediaService] Encoding success dbbfbc1b-1b7d-424f-8bea-fc8957d7811e
[Nest] 7  - 12/14/2023, 6:26:42 PM     LOG [MediaService] Encoding success dbbfbc1b-1b7d-424f-8bea-fc8957d7811e
[Nest] 7  - 12/14/2023, 6:26:42 PM     LOG [MediaService] Successfully generated WEBP image thumbnail for asset 01ea340d-cf30-4c11-b42f-0802e0e0ab81

@mertalev
Contributor Author

Thanks for testing! It looks like I forgot to change the maxDistance being sent for facial recognition requests.

Are you just running on CPU, or are you trying to use it with an acceleration device? Could you try disabling facial recognition for now, just to see whether the other jobs work?
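The failure mode here is a request-level option leaking into the model constructor. A minimal sketch of one defensive mitigation (hypothetical class and parameter names, not the actual Immich code) could look like:

```python
import inspect

class InferenceModel:
    def __init__(self, model_name: str, min_score: float = 0.7) -> None:
        self.model_name = model_name
        self.min_score = min_score

def from_model_type(model_name: str, **kwargs) -> InferenceModel:
    # Drop any kwargs the constructor doesn't accept, so a stray
    # request option like maxDistance no longer raises a TypeError.
    accepted = set(inspect.signature(InferenceModel.__init__).parameters) - {"self"}
    return InferenceModel(model_name, **{k: v for k, v in kwargs.items() if k in accepted})

# maxDistance is silently ignored instead of crashing the request:
model = from_model_type("buffalo_l", maxDistance=0.6, min_score=0.8)
```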

@kkoshelev

kkoshelev commented Dec 14, 2023

I'm using CUDA, see below.

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  GRID P4-1B4                    On  | 00000000:06:10.0 Off |                  N/A |
| N/A   N/A    P8              N/A /  N/A |      0MiB /  7488MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

I had to revert back to the release build (1.90.2).

@mertalev
Contributor Author

That issue should be fixed now.

@mertalev mertalev force-pushed the feat/ml-tensorrt branch 5 times, most recently from 0622173 to 7361e69 Compare December 21, 2023 23:58
@mertalev
Contributor Author

Rebased on #5667, so that one should be merged first.

@mertalev
Contributor Author

I did some testing with this and both CUDA and OpenVINO work correctly and actually use the GPU.

@mertalev mertalev changed the title feat(ml): cuda and openvino acceleration feat(ml)!: cuda and openvino acceleration Jan 14, 2024
@mertalev mertalev requested a review from fyfrey January 16, 2024 22:57
Contributor

@jrasm91 jrasm91 left a comment


LGTM. There are a few places where you still need to replace hwaccel.yml, notably in the prepare-release.yml file, as well as the links in the docs to the release artifact.

Contributor

@fyfrey fyfrey left a comment


LGTM! I really like the overhaul of the Docker hwaccel setup; it's much more consistent and streamlined now. BTW, thanks for adding the documentation for ARMNN :-)

@mertalev mertalev requested a review from bo0tzz January 18, 2024 23:28
@tbleiker

tbleiker commented Jan 21, 2024

I tested the feat/ml-tensorrt branch yesterday: I ran the ML container on my computer (Arch Linux, GTX 1060) and adjusted the URL for the machine learning server accordingly. @mertalev: Everything worked so far! The face detection step was way faster than on my server CPU (Xeon E3-1220v3). However, the facial recognition part still seemed to be running on the server. Is this not done by the ML container?

@mertalev
Contributor Author

I tested the feat/ml-tensorrt branch yesterday: I ran the ML container on my computer (Arch Linux, GTX 1060) and adjusted the URL for the machine learning server accordingly. @mertalev: Everything worked so far! The face detection step was way faster than on my server CPU (Xeon E3-1220v3). However, the facial recognition part still seemed to be running on the server. Is this not done by the ML container?

Really happy to hear that!

Yes, the clustering is all done on CPU. The face detection outputs are stored in Postgres and queried with a special vector search index. The ML service is designed to be very independent: it doesn't integrate with Postgres, Redis, etc., and has no knowledge of earlier model outputs. All of that is orchestrated by immich-microservices.
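As a rough illustration of the CPU-side matching step (an assumed cosine-distance sketch, not Immich's actual implementation; the real query runs inside Postgres via the vector index):

```python
import numpy as np

def closest_face(embedding: np.ndarray, known: np.ndarray, max_distance: float = 0.6):
    """Return (index, distance) of the nearest known embedding by cosine
    distance, or None if nothing is within max_distance."""
    emb = embedding / np.linalg.norm(embedding)
    kn = known / np.linalg.norm(known, axis=1, keepdims=True)
    dists = 1.0 - kn @ emb          # cosine distance to each known face
    i = int(np.argmin(dists))
    return (i, float(dists[i])) if dists[i] <= max_distance else None
```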

@alextran1502
Contributor

I think we can merge this after splitting the documentation into a separate PR that will get merged after the next release.

Please feel free to press the green button after doing so! Thank you so much

@mertalev mertalev merged commit 95cfe22 into main Jan 21, 2024
24 checks passed
@mertalev mertalev deleted the feat/ml-tensorrt branch January 21, 2024 23:22