
CPU High #17

Closed
xllusion-dong opened this issue Jul 5, 2024 · 12 comments

Comments

@xllusion-dong

When running the sample video d6, the CPU stays at 100% for a long time.

[screenshot: CPU usage]

Any improvement for that?

@Celtmant

Celtmant commented Jul 6, 2024

I agree, it heavily loads the processor, and the RAM as well. I'd also note another thing: if you load a smaller image and don't pick a very long video, it's somewhat easier; perhaps reduce the upscale.

@funwithforks

pip install onnxruntime-gpu
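Installing the GPU wheel isn't always enough; it's worth checking which execution providers onnxruntime actually exposes. A minimal sketch (`pick_providers` is a hypothetical helper, not part of onnxruntime):

```python
def pick_providers(available):
    # Prefer the CUDA provider (shipped with onnxruntime-gpu);
    # fall back to CPU if it's absent.
    preferred = ["CUDAExecutionProvider", "CPUExecutionProvider"]
    return [p for p in preferred if p in available]

# Usage (requires onnxruntime installed):
# import onnxruntime as ort
# print(ort.get_available_providers())  # no CUDAExecutionProvider -> the GPU wheel isn't active
# sess = ort.InferenceSession("model.onnx",
#                             providers=pick_providers(ort.get_available_providers()))
```

If `CUDAExecutionProvider` is missing from `get_available_providers()`, the CPU-only `onnxruntime` package is likely shadowing `onnxruntime-gpu`, which would explain inference silently falling back to the CPU.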

@xllusion-dong
Author

I already installed onnxruntime-gpu, but sometimes it seems to use the CPU and sometimes the GPU. I'll keep watching it to find out why.

@wandrzej

wandrzej commented Jul 7, 2024

Overall I wonder about the performance. The paper claims 12.8 ms per frame, but in my case it's far from that, and for 3/4 of the time it's not even utilizing the GPU or the CPU (they sit at about 20% and 10% respectively). So apart from the onnx issue, is there anything else that could be a bottleneck? It looks like a single-core process is running and blocking the whole thing.
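One way to confirm which phase is blocking is to accumulate wall-clock time per stage; a minimal sketch (the stage names below are made up, not from the repo):

```python
import time
from contextlib import contextmanager

totals = {}

@contextmanager
def timed(label):
    # Accumulate wall-clock time per pipeline stage so the
    # single-threaded bottleneck shows up in the totals.
    t0 = time.perf_counter()
    try:
        yield
    finally:
        totals[label] = totals.get(label, 0.0) + time.perf_counter() - t0

# Wrap each phase of the per-frame loop (hypothetical stage names):
with timed("preprocess"):
    time.sleep(0.01)  # stand-in for detection/cropping
with timed("inference"):
    time.sleep(0.01)  # stand-in for the model forward pass
print({k: round(v, 3) for k, v in totals.items()})
```

If one label dominates while GPU utilization stays low, that stage is the serial CPU section to parallelize or move off the hot path.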

@funwithforks

I don't have the issue, so I have no input beyond onnx, but after getting that going my 4090 sits at 55% steadily during the run, for reference. CPU on the process is 345%. Without the GPU, CPU usage was much higher.

@LubuLubu2

For me it uses 100% CPU + 100% GPU and around 2.5 GB VRAM for the entire generation with an 832x1152 image, but it generates pretty quickly. 3060 Ti.

@Celtmant

Celtmant commented Jul 10, 2024

> For me it uses 100% CPU + 100% GPU and around 2.5 GB VRAM for the entire generation with an 832x1152 image, but it generates pretty quickly. 3060 Ti.

And there's even more to it than that. Longer videos consumed a lot of CPU and RAM resources: my RAM filled up to almost all 29 GB available and the computer froze. I used "pip install onnxruntime-gpu" and it got only slightly easier, but with long videos the RAM still fills up completely. I have an RTX 3060 with 12 GB, and 32 GB of RAM.

@LubuLubu2

> For me it uses 100% CPU + 100% GPU and around 2.5 GB VRAM for the entire generation with an 832x1152 image, but it generates pretty quickly. 3060 Ti.

> And there's even more to it than that. Longer videos consumed a lot of CPU and RAM resources: my RAM filled up to almost all 29 GB available and the computer froze. I used "pip install onnxruntime-gpu" and it got only slightly easier, but with long videos the RAM still fills up completely. I have an RTX 3060 with 12 GB, and 32 GB of RAM.

Yep, a 35-second example video (I even tried 1 minute) can eat all the resources you have, and if you don't have enough, your PC will freeze for minutes :)) Mine was frozen for 15 minutes on a 1-minute video. The generation itself is fine, but at the end every single frame of, say, a 1-minute video at 24 fps has to be processed; that's more than a thousand images, and it eats all your RAM. 20 seconds or less is fine; for longer videos we have to cap the frame count, generate a couple of videos, and join them together later.
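The split-and-join workaround above can be sketched as a frame-range chunker (a hypothetical helper, not part of the repo; the chunks would be generated separately and concatenated afterwards, e.g. with ffmpeg):

```python
def chunk_frames(total_frames, chunk_size):
    # Split a long clip into [start, end) frame ranges so each chunk
    # stays under the frame cap and fits in RAM on its own.
    return [(s, min(s + chunk_size, total_frames))
            for s in range(0, total_frames, chunk_size)]

# A 1-minute clip at 24 fps, capped at 480 frames (20 s) per chunk:
print(chunk_frames(60 * 24, 480))  # [(0, 480), (480, 960), (960, 1440)]
```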

@kosmicdream

Same problem here. I've been trying to run the example video on an A40 instance on Runpod, and everything freezes.

@wandrzej

On my side, I don't think it's really a matter of some bottleneck: I have 128 GB RAM, so frame off-loading is not the problem, and the same goes for VRAM (24 GB). I do have onnxruntime-gpu installed, though it's 1.5 I believe; maybe there's a version mismatch, but even that wouldn't explain the low load on both CPU and GPU in the pre-processing phase.

Anyway, this could work much more efficiently. Given the low utilization numbers others reported, I think that with proper use of both CPU and GPU the claimed 12.8 ms per frame is possible, regardless of the video length. It could be an issue with Comfy itself: it needs to finish one 'block' from the pre-process node before moving on to generation.

@kijai
Owner

kijai commented Jul 11, 2024

> On my side, I don't think it's really a matter of some bottleneck: I have 128 GB RAM, so frame off-loading is not the problem, and the same goes for VRAM (24 GB). I do have onnxruntime-gpu installed, though it's 1.5 I believe; maybe there's a version mismatch, but even that wouldn't explain the low load on both CPU and GPU in the pre-processing phase.

> Anyway, this could work much more efficiently. Given the low utilization numbers others reported, I think that with proper use of both CPU and GPU the claimed 12.8 ms per frame is possible, regardless of the video length. It could be an issue with Comfy itself: it needs to finish one 'block' from the pre-process node before moving on to generation.

Their code has a lot of inefficiencies; I don't know if their speed claim covers the whole process or just part of it. For example, skipping the pasteback gives a ~30% speed boost.

For reference, the numbers I'm currently getting for video editing in the develop branch with a 4090, for the detection/cropping part, using CUDA for onnx: 33 it/s.

And the rest, which mostly uses the GPU but also has lots of CV2/numpy operations done on the CPU: 12 it/s on a Ryzen 7950X.

So something like ~14 fps without pasteback and ~11 fps with.

@kijai
Owner

kijai commented Jul 11, 2024

Oh, and about the memory issue... that's common in Comfy when the frame count gets really high. It's not really designed to handle that, as everything is kept in memory with no disk caching.
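Outside of Comfy's node model, the usual workaround is to flush each frame to disk as soon as it is produced instead of accumulating the whole clip in memory. A rough sketch (`stream_frames_to_disk` is a hypothetical helper; real pipelines would write image files rather than raw bytes):

```python
import os

def stream_frames_to_disk(frames, out_dir):
    # Write each frame out immediately so peak memory stays at roughly
    # one frame instead of the entire clip.
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, frame_bytes in enumerate(frames):
        path = os.path.join(out_dir, f"frame_{i:06d}.bin")
        with open(path, "wb") as f:
            f.write(frame_bytes)
        paths.append(path)
    return paths
```

The trade-off is disk I/O per frame, but for a thousand-plus frames that is usually far cheaper than swapping or freezing the machine.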

@kijai kijai closed this as completed Jul 24, 2024