Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make continuing video processing easier #433

Closed
wants to merge 7 commits into from

Conversation

bitplane
Copy link
Contributor

@bitplane bitplane commented Sep 10, 2022

  1. Increase the number of digits for frames so they sort easily
  2. Process frames in alphanumeric order, rather than arbitrary order decided by the filesystem.
  3. Add a restart resume option which won't download the video again, and will skip frames that have already been processed. It doesn't skip the copying or extraction steps though
  4. Put some log messages in so that you can see what step it's up to.

So if you're worried about running out of time on colab, you can tar up the dest images every couple of hours and move them over to drive, restart the session, copy them back in and continue.

Copy link
Owner

@jantic jantic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I definitely like most of what I see here and I appreciate pull request like this (rare!) that really seem to be well thought out. You're also addressing some things that quite frankly I screwed up on a bit (like repeating string processing).

There's a few things sticking out that need rethinking though:

First, VideoColorizerColab.ipynb won't work for more than one video if restart is not set to True (more on that later) because the file name is saving as video.mp4 by default. Hence if you change the url, it won't re-download the code due to the logic you've put in place to only download if the file by name isn't already there. Specifically this in line 401 of visualize.py which decides whether to download and update the source frame images:

if restart or not os.path.exists(video_path):

Now you may argue that the user should know to set the "restart" flag in the case of doing a new video but I think that this adds unnecessary complication. And added burden to the user. What I think may be less confounding would be to change the concept from "restart" to "continuation" and have it default to False instead of True. That is to say- by default do what it was doing before because that would be the normal (and backwards compatible) use case and less prone to error, and if the user is working on a particularly long video and wants to continue from a crashed session, they'll have the option of doing so by setting that flag explicitly. But this will require documentation too in the readme I think, and in the notebooks- something I'd need before I accept the pull request.

Second, I found at least one bug in parameter passing so double check things like this: colorize_from_file_name takes a restart parameter but doesn't actually pass it to the colorize_from_path function.

Third, the aforementioned functions in have inconsistent "restart" defaults- True in one and False in the other. This goes back to user experience oriented thinking in #1: It should be consistent to avoid confusion.

I didn't do an exhaustive review here but I think it should be enough to communicate intentions and point out what needs to be looked at. And please do develop/test with the multiple short videos in one session on Colab scenario in mind.

Hope that all helps!
Jason

@bitplane bitplane force-pushed the proper-frame-order branch 2 times, most recently from eee0785 to c5726b3 Compare September 13, 2022 23:03
@bitplane
Copy link
Contributor Author

bitplane commented Sep 13, 2022

Hi Jason, sounds great, thanks for the feedback.

I'll keep hacking away at it in another branch while I'm processing videos and merge here when it's ready for review again, rather than keep force-pushing here and bugging you with notifications! :D

The reason I used restart is because the most natural word to use would be "continue", but that's a reserved word in Python. The convention is to stick an underscore on the end of it like continue_, but that seems horrible - specially in the notebook world where parameter inspection isn't that great. Maybe calling it "resume" instead?

Now I know you're interested in changes I'll have a deeper think about how to make it a bit more beautiful. I'm thinking that since we've got a class I should probably put the flags on the instance rather than pass them as parameters, edit: I misread, scratch that... and add some proper docstrings, fix the unsafe os.system calls, use f-strings etc.

I've also had a little hack around with skipping frame extraction but I don't like the way I've implemented it at present, there's a few edge cases to do with restarting after deleting images.

The other things that are bugging me:

  1. I'm wasting a lot of time on Colab just building the video itself. I could tar the frames up and push them into drive while using the GPU to process another video, and do the rest on my CPU at home. So I might also make the actual video combination step optional. But it'd need some thought around properly managing the dependencies.
  2. There's too many files in a dir to be practical in Jupyter's web UI; it doesn't like more than a thousand and chokes when clicking around to look at the results. I think it'd make sense to break outputs into dirs like "012345000/012345678.jpg"
  3. It feels like the global interpreter lock / single threading is killing me locally. I'm getting about 70% GPU usage according to nvtop while I've got one CPU core running hot - I assume it's processing JPEGs or PIL creating images. If I get round to profiling it (I love performance engineering) I'll raise another PR with changes for that; there could be a +40% perf win there.

@jantic
Copy link
Owner

jantic commented Sep 17, 2022

@bitplane That all sounds fantastic. Yeah as far as performance goes- I've had to do a lot of optimization of our more advanced video stuff in our commercial work with my DeOldify startup. Both speed and memory actually- doing everything in memory is quite a challenge, for example. It's a pretty deep rabbit hole to get ffmpeg pipelines optimized and there's a lot of low hanging fruit. I haven't been in a rush to try optimizing too much for this stuff but please if you want to go ahead! I can say that the less compression you use on frames, the faster it is and that's potentially an easy win, but you can use up disk space very quickly. Especially with these Colabs. Another potential win is using grayscale formats when applicable.

Oh and "resume" sounds like great wording.

@bitplane
Copy link
Contributor Author

Hey sorry, just a quick update - I've neglected this for a bit while I'm working on something else, but I will get back to it sometime this week, hopefully tomorrow or thursday

Makes sure that video frames are processed in a reasonable order
fix the purge function so it uses glob and does less string manipulation
So we can see which step it's up to.
@bitplane
Copy link
Contributor Author

bitplane commented Nov 5, 2022

Sorry it took so long! I've started a new contract and it's been pretty intense over the first month. Finally got around to fixing the parameter changes though, and also added a few more logging things. I've not been using it recently so might have missed some things. I did give it a quick manual test, but I'm a feeble bag of flesh and water; it could do with some test automation!

I'll hopefully have time to do some more old movies soon and look at some perf improvements :)

@bitplane
Copy link
Contributor Author

bitplane commented Nov 6, 2022

Oops sorry I forgot the re-downloading check. Hold fire, I'll deal with that ASAP.

@jantic
Copy link
Owner

jantic commented Feb 11, 2023

Oops sorry I forgot the re-downloading check. Hold fire, I'll deal with that ASAP.

There hasn't been any activity after this. Want to go ahead with this or do you want me to close this out?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants