
Processing on long video with high resolution #12

Closed
devidlatkin opened this issue May 26, 2021 · 4 comments

Comments

@devidlatkin

Hello!
Thank you for the amazing framework!

I have an issue when processing a long, high-resolution video: I run out of GPU memory.
As I understand it, MiVOS tries to upload all images directly to the GPU, so if the video is too long or the resolution too high, it can't handle the case.
Is there a way to fix this issue? Maybe by modifying the code to work with data chunks?

Thank you in advance!

@hkchengrex
Owner

hkchengrex commented May 26, 2021

That is correct. Perhaps you can try a different memory profile (e.g. 2, so that the images are not cached on the GPU):

parser.add_argument('--mem_profile', default=0, type=int, help='0 - Faster and more memory intensive; 2 - Slower and less memory intensive. Default: 0.')

I think it will still struggle if the video is really high-resolution and long... You can also increase mem_freq:

parser.add_argument('--mem_freq', default=5, type=int)
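
A rough sketch of how the two flags interact with device placement (this is illustrative, not the actual MiVOS implementation; the `image_device` helper and device strings are assumptions):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--mem_profile', default=0, type=int,
                    help='0 - faster, more memory intensive; '
                         '2 - slower, less memory intensive')
parser.add_argument('--mem_freq', default=5, type=int,
                    help='insert a new memory frame every mem_freq frames; '
                         'a larger value means fewer cached memory features')

def image_device(mem_profile: int) -> str:
    # Hypothetical helper: with profile 0 the decoded frames stay resident
    # on the GPU; with profile 2 they live in CPU RAM and are shipped to
    # the GPU one at a time during encoding/propagation.
    return 'cuda' if mem_profile == 0 else 'cpu'

args = parser.parse_args(['--mem_profile', '2', '--mem_freq', '10'])
print(image_device(args.mem_profile))  # cpu
```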

@devidlatkin
Author

Yeah, it works with CPU storage (--mem_profile 2) on my video, but propagation takes so long that annotating in real time is almost impossible =(
It works on the GPU only if I use small chunks of 30 frames, but annotating the whole video in such small chunks is really inconvenient...

I'm thinking about modifying interactive_gui.py to do the propagation in chunks inside the framework. Is it possible to implement such logic? Does it violate any method restrictions?

@hkchengrex
Owner

With mem_profile=2 the algorithm still runs on the GPU. There is just a lot more CPU-to-GPU communication (images are stored on the CPU, sent to the GPU for encoding and propagation, and the results are sent back to the CPU), which is still unavoidable even if you do it in chunks... You can perhaps try half precision.
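
To see why half precision helps, here is a back-of-envelope estimate of the memory needed to cache a video's frames as float tensors (the frame count and resolution are made up for illustration):

```python
def frame_cache_bytes(num_frames, height, width, channels=3, bytes_per_elem=4):
    # Memory to hold every frame as a dense float tensor:
    # frames * H * W * C * bytes-per-element.
    return num_frames * height * width * channels * bytes_per_elem

# 1000 frames of 1080p video: fp32 uses 4 bytes/element, fp16 uses 2.
fp32 = frame_cache_bytes(1000, 1080, 1920, bytes_per_elem=4)
fp16 = frame_cache_bytes(1000, 1080, 1920, bytes_per_elem=2)
print(f'fp32: {fp32 / 2**30:.1f} GiB, fp16: {fp16 / 2**30:.1f} GiB')
```

Halving the element size halves the cache, which can be the difference between fitting on the GPU and not, at some cost in numerical precision.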

Doing it in chunks does have the benefit of batching the CPU-GPU-CPU transfers, so it might still be worth doing. The performance would not be the same, but you could implement a behavior where the algorithm discards most frames (except maybe the last 5, kept for memory features) after every 30 frames, then loads the next 30 frames onto the GPU for propagation, and so on.
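
The chunking scheme above could be scheduled as follows (a minimal sketch, not MiVOS code; the 30/5 numbers come from the suggestion, and `chunk_schedule` is a hypothetical helper):

```python
def chunk_schedule(num_frames, chunk_size=30, memory_overlap=5):
    """Yield (start, end) frame-index ranges for chunked propagation.

    Consecutive chunks overlap by `memory_overlap` frames, so the last
    few frames of each chunk can serve as memory frames for the next.
    """
    start = 0
    while start < num_frames:
        end = min(start + chunk_size, num_frames)
        yield (start, end)
        if end == num_frames:
            break
        start = end - memory_overlap

# Propagate a 100-frame video in chunks of 30 with a 5-frame overlap.
for start, end in chunk_schedule(100):
    print(f'propagate frames {start}..{end - 1}')
```

Each chunk would be loaded to the GPU, propagated, and its results moved back to the CPU before the next chunk starts, trading some redundant re-encoding of the overlap frames for a bounded GPU footprint.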

@devidlatkin
Author

Ok, got it.
Thank you!
