
Limitation in processing number of video frames according to GPU memory? #36

Closed
instant-high opened this issue Jul 4, 2021 · 13 comments

Comments

@instant-high

Since I got it to work on my GeForce GTX 1050 / 2 GB, at least for videos no longer than ~16 frames before the GPU runs out of memory, I wonder whether there is also a limitation when using an 8 GB GPU?

I had the same problem using Wav2Lip, but it could be solved by setting the chunk size to 1.

Would it (theoretically) be possible to process videos in SimSwap in smaller parts or chunks by releasing GPU memory every 15 frames ?

@ExponentialML

ExponentialML commented Jul 4, 2021

There are multiple things you could do.

  1. Lower the size of your input videos.

  2. Split the video into separate chunk files, then loop over them or process them one by one (painful).

  3. Modify videoswap.py using the below as a starting point.

    frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))

  4. Use subprocess with ffmpeg to split the video into chunks, then run a for loop over each chunk with the Python script, and merge the results afterwards in a video editor or with ffmpeg (a rough splitting sketch follows the pseudo code below). For example:

~pseudo code~

import subprocess

for video_file in video_file_directory:
    # pass the rest of your usual test_video_swapsingle.py arguments here
    subprocess.run(["python", "test_video_swapsingle.py", video_file])
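
For the splitting step itself, a rough sketch using ffmpeg's segment muxer via subprocess (untested; input.mp4, the chunk names and the 1-second segment length are just placeholders):

import subprocess

# split input.mp4 into short chunks without re-encoding;
# with -c copy the cuts fall on keyframes, so chunk lengths are approximate
subprocess.run([
    "ffmpeg", "-i", "input.mp4",
    "-c", "copy", "-map", "0",
    "-f", "segment", "-segment_time", "1",
    "chunk_%03d.mp4",
])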

I would go with number 3, with pseudo code being something like this (this isn't tested, it's just to give you an idea):

current_frame = 0
max_frame = 14

for frame_index in tqdm(range(frame_count)):
    ret, frame = video.read()
    if ret:
        current_frame += 1
        if current_frame == max_frame:
            # do something to empty video memory here
            current_frame = 0

        detect_results = detect_model.get(frame, crop_size)

        if detect_results is not None:
            .....

Like I said, I haven't tested it, and it could be a bit of work to implement from scratch since I haven't looked into how the models are loaded into memory yet. The above should be enough to work out your own solution, though, without messing with torch.

@instant-high
Author

instant-high commented Jul 4, 2021

Ok.
Added at line #50 in util/videoswap.py:
torch.cuda.empty_cache()
This lets me process 99 frames before running out of memory...
I'll also try to free memory in the second for loop.

@ExponentialML

Ok.
Added at line #50 in util/videoswap.py:
torch.cuda.empty_cache()
This lets me process 99 frames before running out of memory...
I'll also try to free memory in the second for loop.

Great to hear. I would create a little wrapper function where you can tune your own parameters (max frame count) and plug it in at line 50, so that it executes torch.cuda.empty_cache() every nth frame.
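
A minimal sketch of such a wrapper (untested; the name maybe_empty_cache and the every_n parameter are just placeholders), meant to be called once per processed frame:

import torch

def maybe_empty_cache(frame_index, every_n=14):
    # clear the CUDA cache every nth processed frame
    if frame_index > 0 and frame_index % every_n == 0:
        torch.cuda.empty_cache()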

@instant-high
Author

Yes.
But why does it run out of memory after 99 frames even if I call "empty_cache" after each frame?
I cannot find anything that fills the cache; I've searched all the other scripts in SimSwap.
Btw.:
I don't know much about Python... just a beginner, after 30 years of coding in (Visual) Basic and a little C++.

@instant-high
Author

instant-high commented Jul 4, 2021

So I need a little bit of help.
I've inserted the following code:

for frame_index in tqdm(range(frame_count)):
    torch.cuda.empty_cache() 
    ret, frame = video.read()
    if frame_index == 98:
        print (frame_index)
        input("Press Enter to continue...")
        break

Then it begins to write video_file(1) containing the first 98 frames.

Is there a way to jump back into video_swap but continue with frame 99 for the next 98 frames?
Call video_swap again, or something like goto video_swap?
After the break it would just have to write video_file(2),
and so on...
I don't know if this would work, or how to do it...

EDIT:
Got it to work as written above, but when calling video_swap again (break after 10 frames) it runs out of memory immediately.
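
One possible way to resume from a given frame (not what this thread ends up using) is to seek the OpenCV capture before the loop; a minimal sketch, assuming video is the cv2.VideoCapture opened in videoswap.py:

import cv2

start_frame = 99
frame_count = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
video.set(cv2.CAP_PROP_POS_FRAMES, start_frame)  # jump the capture to frame 99
for frame_index in range(start_frame, frame_count):
    ret, frame = video.read()
    if not ret:
        break
    # ... process the frame as before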

@ExponentialML

The way torch.cuda.empty_cache() works is that it only frees the memory it's able to. Remember what I said about not being aware of how the models are loaded into memory in this project? This is what I was referring to. They may be instantiated in different parts of the script, so it may take a bit more work, but you can try what's below.

Also, you're running that torch call every frame, which isn't necessary and can lead to some issues. And you don't need to use user input to go to the next iteration; it probably runs out of memory because it's still executing in the background while waiting for your input. Try this instead (untested, as I'm away from my machine):

# Add these two lines above the for loop.
current_frame = 0
max_frame = 14

for frame_index in tqdm(range(frame_count)):
    ret, frame = video.read()
    if ret:
        # If ret returns True, increment the current_frame counter by 1.
        current_frame += 1
        # If the current frame count equals the max frame count, do something.
        if current_frame == max_frame:
            # Let's empty the cache.
            torch.cuda.empty_cache()
            # Reset the counter back to 0.
            current_frame = 0

@instant-high
Author

The user input is just for testing purposes after 98 frames.
Resetting the frame index to 0 would process the same part of the input video and overwrite the temporary image sequence...
I think I've found a way to process the whole input via a batch file and some additional parameters in test_video_swapsingle.py (start frame, end frame), without the need to split it into shorter parts.
But I have a daytime job now.....

@ExponentialML

ExponentialML commented Jul 5, 2021

Resetting the frame index to 0 would process the same part of the input video and overwrite the temporary image sequence...

Read my proposed code again, please. It's not about setting the frame index to 0; it's about creating a separate counter variable that is incremented inside the loop and, once it hits a certain limit (max_frame), is reset.

You said that your GPU runs out of memory every 15th frame or so. In theory, clearing your GPU cache via torch's methods (which may not work) or doing something else to relieve GPU resources every 15 frames would save you from all of the extra steps you've mentioned.

@instant-high
Author

instant-high commented Jul 5, 2021

Looks like I accidentally got it to work on 2 GB of GPU VRAM...

Not the way I initially planned... but it works.
No problem processing a test video of 23 sec / 1396 frames.

As soon as I have cleaned up the code I will post the changes I've made.
(test_video_swapsingle.py / videoswap.py / test_options.py)

EDIT:
I will write a simple GUI (VB6 :-)

@instant-high
Author

instant-high commented Jul 6, 2021

Here are the changes I made to run SimSwap on 2GB VRAM:

./options/test_options.py

self.parser.add_argument("--first_frame", dest="first_frame", type=int, default=0, help="Set frame to start from.")


./util/videoswap.py

...
from util.add_watermark import watermark_image
#frame_index = 0
first_frame = 0
...
def video_swap(first_frame, video_path, id_vetor, swap_model, detect_model, save_path, temp_results_dir='./temp_results', crop_size=224, no_simswaplogo=False):
...
for frame_index in tqdm(range(first_frame, frame_count)):
    torch.cuda.empty_cache()
    ret, frame = video.read()
    if frame_index == 1:
        break
...
video.release()
if frame_index > 1:
    image_filename_list = []
    path = os.path.join(temp_results_dir, '*.jpg')
    image_filenames = sorted(glob.glob(path))
    clips = ImageSequenceClip(image_filenames, fps=fps)


test_video_swapsingle.py

...
first_frame = 0
video_swap(first_frame, opt.video_path, latend_id, model, app,
           opt.output_path, temp_results_dir=opt.temp_path, no_simswaplogo=opt.no_simswaplogo)

first_frame = 2
video_swap(first_frame, opt.video_path, latend_id, model, app,
           opt.output_path, temp_results_dir=opt.temp_path, no_simswaplogo=opt.no_simswaplogo)


test_video_swapsingle.py calls video_swap in ./util/videoswap.py and processes the first 2 frames before the break;
then it calls video_swap again, starting at frame 2, and runs until the end of the input file. Tested so far on 2600 frames, but there seems to be no limit.
torch.cuda.empty_cache() clears the VRAM before processing every single frame...


Not perfect, but working for me...
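
A sketch of how those two hard-coded calls could be generalized into a chunked loop; this assumes a hypothetical extension of video_swap that also takes a last_frame argument (the "start frame, end frame" idea mentioned above), so it is untested and only meant as an illustration:

chunk_size = 98  # frames per video_swap call, tune this to your VRAM
frame_count = 1396  # total frames of the input video (hypothetical value)

for first_frame in range(0, frame_count, chunk_size):
    last_frame = min(first_frame + chunk_size, frame_count)
    # hypothetical signature: video_swap(first_frame, last_frame, ...)
    video_swap(first_frame, last_frame, opt.video_path, latend_id, model, app,
               opt.output_path, temp_results_dir=opt.temp_path,
               no_simswaplogo=opt.no_simswaplogo)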

@instant-high
Author

Just found a simpler solution for the "cuda out of memory" problem while running SimSwap on a 2GB GPU:

I only insert a with torch.no_grad(): statement in ../util/videoswap.py between lines 48 and 49
(and add 4 more spaces of indentation to every following line from 49 to 84)

and it works perfectly.
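
For illustration, the change amounts to wrapping the per-frame loop in an inference-only context so PyTorch keeps no gradients; a minimal sketch (the real loop body in videoswap.py is abbreviated here):

with torch.no_grad():
    for frame_index in tqdm(range(frame_count)):
        ret, frame = video.read()
        if ret:
            detect_results = detect_model.get(frame, crop_size)
            # ... rest of the existing swap code, indented one level deeper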

@NNNNAI
Collaborator

NNNNAI commented Jul 18, 2021

Just found a simpler solution for the "cuda out of memory" problem while running SimSwap on a 2GB GPU:

I only insert a with torch.no_grad(): statement in ../util/videoswap.py between lines 48 and 49
(and add 4 more spaces of indentation to every following line from 49 to 84)

and it works perfectly.

OMG, I forgot to add this. I will get it done in the next update.

@instant-high
Author

:-)
Came across it while making some more mods to first order motion model and co-part segmentation.
