
chore(deps): update onnxruntime-openvino #7854

Merged
mertalev merged 7 commits into main from chore/upgrade-onnxruntime-openvino on Mar 16, 2024

Conversation

mertalev
Contributor

Description

This should hopefully fix some of the issues around OpenVINO.

Can you test this? @agrawalsourav98


cloudflare-pages bot commented Mar 11, 2024

Deploying immich with Cloudflare Pages

Latest commit: 9c93286
Status: ✅  Deploy successful!
Preview URL: https://204d279c.immich.pages.dev
Branch Preview URL: https://chore-upgrade-onnxruntime-op.immich.pages.dev


@agrawalsourav98
Contributor

@mertalev Yep, let me have a look and see if this fixes the issue with model loading in different scenarios.

@agrawalsourav98
Contributor

> @mertalev Yep, let me have a look and see if this fixes the issue with model loading in different scenarios.

I have noticed an inordinate amount of memory usage when loading the model inside the app compared to outside it. Inside the app, ViT-B-32__openai takes almost 6 GB of RAM, whereas outside (in an interpreter) it takes 1.5 GB. If you can fix the Docker issue, I can run it inside a Docker environment to confirm this behavior. If it holds up, we should investigate before we mark this as a fix for the OpenVINO issues.
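
For reference, a measurement along these lines in a bare interpreter is what I mean by "outside" (a sketch only; the model path is illustrative, and `ru_maxrss` is reported in KB on Linux):

```python
import resource

import onnxruntime as ort

# Load the model with the OpenVINO EP, then report the process's peak RSS.
session = ort.InferenceSession(
    "ViT-B-32__openai/model.onnx",  # illustrative path to the exported model
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)

peak_kb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss  # KB on Linux
print(f"Peak RSS after model load: {peak_kb / 1024:.0f} MB")
```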

@mertalev
Contributor Author

Hmm, that's really weird. I wonder if it's using that fancy AUTO mode where it runs things on CPU until the GPU is ready.

@dvdblg
Contributor

dvdblg commented Mar 11, 2024

@mertalev maybe I'm missing something, but can't we use the 2023.3.0 OpenVINO Docker image? The release page for the Intel onnxruntime fork mentions support for 2023.3.

@mertalev
Contributor Author

Oh, you're right. I was looking at the normal 1.17 release, where it says "Added support for OpenVINO 2023.2".

@agrawalsourav98
Contributor

Yeah, I can debug. Even when using AUTO, there is no separate memory for the integrated GPU; it would still load everything into the computer's memory. Let me play with some options to see the best one we have.

@mertalev
Contributor Author

I looked at it some more and think the app vs. local difference you noticed might just be the fact that it continues compiling the model in the background. When you run inference the first time, there's a background thread that increases RAM usage as it runs.

I tried a few things in the provider options, like setting the cache_dir, disable_dynamic_shapes=True, num_streams=1, etc., but the memory usage barely changed. Their docs mention lowering compilation threads, but it already looks single-threaded to me.

But I also haven't compared with the 1.15 version. How much does it use now?
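
For context, those options are passed as provider options when the session is created. A sketch of the kind of thing I tried (the option names are from the onnxruntime-openvino docs; the paths and values here are illustrative):

```python
import onnxruntime as ort

# OpenVINO EP options; onnxruntime stringifies the values internally.
openvino_options = {
    "device_type": "GPU",              # target the Intel iGPU
    "cache_dir": "/cache/openvino",    # persist compiled model blobs
    "num_streams": "1",                # limit parallel inference streams
    "disable_dynamic_shapes": "True",  # compile with static shapes
}

session = ort.InferenceSession(
    "model.onnx",
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
    provider_options=[openvino_options, {}],  # one options dict per provider
)
```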

@agrawalsourav98
Contributor

agrawalsourav98 commented Mar 14, 2024

The silliest of things: we shouldn't use mimalloc. That is what's causing this issue. I don't think onnxruntime has good support for mimalloc on Linux. Without the LD_PRELOAD stuff in start.sh, memory consumption was much lower, and I was able to load ViT-H-14__laion2b-s32b-b79k and run it on the OpenVINO GPU. One thing we should add to the README is to use a larger WORKER_TIMEOUT, as large models take a long time to load on the GPU, and by the time they are loaded they sort of enter a race condition.
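
(For anyone who wants to double-check: an LD_PRELOADed allocator shows up in the process's memory maps, so a quick Linux-only diagnostic sketch from inside the worker would be:)

```python
# Linux-only: scan this process's memory maps for the mimalloc shared library,
# which will be mapped if LD_PRELOAD pulled it in.
with open("/proc/self/maps") as maps:
    preloaded = any("mimalloc" in line for line in maps)

print("mimalloc preloaded:", preloaded)
```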

@agrawalsourav98
Contributor

And we can remove the chdir stuff, as it's no longer required with this fix.

@mertalev
Contributor Author

Good catch! mimalloc is generally great with ONNX Runtime and recommended by them. We rely on it to avoid memory fragmentation and arenas being created for each session. I think this is specifically OpenVINO not working well with it.

@agrawalsourav98
Contributor

> Good catch! mimalloc is generally great with ONNX Runtime and recommended by them. We rely on it to avoid memory fragmentation and arenas being created for each session. I think this is specifically OpenVINO not working well with it.

That might be the case. I think we can just disable it for the OpenVINO Docker image by using an arg with the shell script or something.

@mertalev mertalev force-pushed the chore/upgrade-onnxruntime-openvino branch from c20f8d5 to 5ccdacb on March 15, 2024 22:15
@mertalev
Contributor Author

I tested the current version of this PR with both the default model and ViT-H-14-378-quickgelu__dfn5b on a 13700H and all is well. It seems like they've fixed things for the most part. The memory usage being so different with mimalloc and jemalloc is strange, though, so I think I'll make an issue for that.

@mertalev mertalev merged commit 3a045b3 into main Mar 16, 2024
24 checks passed
@mertalev mertalev deleted the chore/upgrade-onnxruntime-openvino branch March 16, 2024 04:04
@mertalev mertalev mentioned this pull request Mar 16, 2024
@dvdblg
Contributor

dvdblg commented Mar 16, 2024

I'm sorry, but with the latest changes in this PR I'm not able to get smart search to work.

With the current release, I'm able to finish the smart search job, but then the search function crashes because of the OpenVINO bugs that this PR should fix.

With this PR, tested with both immich-machine-learning:pr-7854-openvino and immich-machine-learning:main-openvino, I get this error:

[03/16/24 09:46:15] INFO     Starting gunicorn 21.2.0                           
[03/16/24 09:46:15] INFO     Listening at: http://0.0.0.0:3003 (9)              
[03/16/24 09:46:15] INFO     Using worker: app.config.CustomUvicornWorker       
[03/16/24 09:46:15] INFO     Booting worker with pid: 13                        
[03/16/24 09:46:20] WARNING  Matplotlib created a temporary cache directory at  
                             /tmp/matplotlib-bpxxgisk because the default path  
                             (/.config/matplotlib) is not a writable directory; 
                             it is highly recommended to set the MPLCONFIGDIR   
                             environment variable to a writable directory, in   
                             particular to speed up the import of Matplotlib and
                             to better support multiprocessing.                 
[03/16/24 09:46:22] INFO     Started server process [13]                        
[03/16/24 09:46:22] INFO     Waiting for application startup.                   
[03/16/24 09:46:22] INFO     Created in-memory cache with unloading after 300s  
                             of inactivity.                                     
[03/16/24 09:46:22] INFO     Initialized request thread pool with 4 threads.    
[03/16/24 09:46:22] INFO     Application startup complete.                      
[03/16/24 09:47:21] INFO     Setting 'ViT-B-32__openai' execution providers to  
                             ['OpenVINOExecutionProvider',                      
                             'CPUExecutionProvider'], in descending order of    
                             preference                                         
[03/16/24 09:47:21] INFO     Loading clip model 'ViT-B-32__openai' to memory    
[03/16/24 09:48:23] CRITICAL WORKER TIMEOUT (pid:13)                            
[03/16/24 09:48:23] ERROR    Worker (pid:13) was sent SIGABRT! 

The strange thing is that I'm also seeing very high CPU usage, and checking with intel_gpu_top it seems the GPU is not being used (whereas with the current release it is used during the smart search job).

This is using the default model and concurrency=1

@agrawalsourav98
Contributor

agrawalsourav98 commented Mar 16, 2024

> I'm sorry, but with the latest changes in this PR I'm not able to get smart search to work. […] I get this error:
>
> [03/16/24 09:48:23] CRITICAL WORKER TIMEOUT (pid:13)
> [03/16/24 09:48:23] ERROR    Worker (pid:13) was sent SIGABRT!

This might be due to a low worker timeout value. Can you try increasing it? Set it to 600 or something.

@dvdblg
Contributor

dvdblg commented Mar 16, 2024

> This might be due to a low worker timeout value. Can you try increasing it? Set it to 600 or something.

Thank you, I've increased the timeout to 600 and now I am able to run the smart search job and also to make searches!
To make search work, I also had to increase the memory limit to 10 GB; otherwise I got a SIGKILL from running out of memory.

I am using the XLM-Roberta-Large-Vit-B-32 model; do you think around 7-8 GB of memory usage is normal for it? Also, just to have a reference, what is your memory usage during search, and which model are you using?

@mertalev
Contributor Author

Do you have the request thread pool disabled? I don't see why this should block the server since it runs in a background thread. Or maybe it's just stressing the CPU enough that everything slows to a crawl.

The memory usage when running ViT-H-14-378-quickgelu__dfn5b was definitely much higher than for CUDA or CPU. I think the smart search job used about 8 GB. Based on the OpenVINO docs, this seems to be intended behavior since there's a separate memory-intensive compilation step.

Caching the compiled model would at least make this a one-time thing since it can be reused.

@dvdblg
Contributor

dvdblg commented Mar 16, 2024

> Do you have the request thread pool disabled? I don't see why this should block the server since it runs in a background thread. Or maybe it's just stressing the CPU enough that everything slows to a crawl.

Nope, I didn't set the MACHINE_LEARNING_REQUEST_THREADS variable in my .env file. The CPU is an Intel N100, so it is not a beast.

> The memory usage when running ViT-H-14-378-quickgelu__dfn5b was definitely much higher than for CUDA or CPU. I think the smart search job used about 8 GB. Based on the OpenVINO docs, this seems to be intended behavior since there's a separate memory-intensive compilation step.

The model I am using has about half the parameters of ViT-H-14-378-quickgelu__dfn5b, so the memory usage should be at least somewhat lower than 8 GB, right?

Anyway, I am happy to be able to use the smart search feature; thanks to both of you!

@dvdblg
Contributor

dvdblg commented Mar 16, 2024

@mertalev small update: I tried adding cache_dir to the OpenVINO EP as you suggested, and it indeed greatly reduced memory usage: just ~2.5 GB during smart search with the same model I mentioned above.

Should I make a small PR for this? Also, should it be an option in the form of an environment variable?

@mertalev
Contributor Author

A PR would be great! And no, we can just make the cache dir an openvino folder next to the normal model.onnx, etc.
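
For illustration, something along these lines (a hypothetical sketch; the helper name and layout are assumptions, not the actual immich code):

```python
from pathlib import Path

import onnxruntime as ort

def load_with_openvino_cache(model_path: str) -> ort.InferenceSession:
    """Hypothetical helper: keep compiled OpenVINO blobs next to model.onnx."""
    cache_dir = Path(model_path).parent / "openvino"
    cache_dir.mkdir(parents=True, exist_ok=True)

    return ort.InferenceSession(
        model_path,
        providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
        provider_options=[{"cache_dir": str(cache_dir)}, {}],
    )
```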
