Design for preprocessor at scale / distributed #38

Open
johndpope opened this issue Jun 6, 2024 · 2 comments

@johndpope
Owner

BACKGROUND
Current thinking: I'm GPU poor, but there are community members with a spare GPU who want to help.
We need to preprocess some videos, and it is very time consuming.
We have 35,000 videos:
magnet:?xt=urn:btih:843b5adb0358124d388c4e9836654c246b988ff4&dn=CelebV-HQ&tr=https%3A%2F%2Facademictorrents.com%2Fannounce.php&tr=https%3A%2F%2Fipv6.academictorrents.com%2Fannounce.php

Each one takes 1-5+ minutes to process. This code is in the EmoDataset loader.
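
As a rough back-of-the-envelope check on why this needs to be spread across machines, the sketch below turns the figures above (35,000 videos, 1 to 5 minutes each) into days of wall-clock time. The worker count of 10 is purely an assumption for illustration, and since the upper figure is "5+" minutes, the high end is a floor rather than a ceiling.

```python
NUM_VIDEOS = 35_000  # figure from this issue


def total_days(minutes_per_video, workers=1):
    """Days of wall-clock time if `workers` machines split the videos evenly."""
    return NUM_VIDEOS * minutes_per_video / workers / 60 / 24


# workers=10 is a hypothetical number of volunteers, not a real head count.
for workers in (1, 10):
    low = total_days(1, workers)
    high = total_days(5, workers)
    print(f"{workers:>2} worker(s): {low:.1f} to {high:.1f} days")
```

On a single worker this comes out to roughly 24 to 122 days of continuous processing.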

The preprocessing does the following:

  • frame dumping via decord (needs redoing with VALI or torchaudio.StreamReader; see dmlc/decord#283, "Apparent locking issues when running across multiple GPUs")
  • warping / thin-plate spline, with a parameter for dialing the strength up or down
  • auto-cropping / sweet spot
  • background removal (option for greenscreen)
  • normalization
  • resizing to 512x512
  • caching the result of all of the above into a single npz file for subsequent training runs
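
The last step in the list, caching into npz, could look roughly like the sketch below. It is not the code in EmoDataset.py: `load_or_build`, `build_frames`, the `frames` key and the one-file-per-video layout are all assumptions made for illustration. Only `np.savez_compressed` and `np.load` are real NumPy calls.

```python
import os

import numpy as np


def load_or_build(video_path, cache_dir, build_frames):
    """Return preprocessed frames for video_path, building the cache on a miss.

    build_frames is a hypothetical callable standing in for the expensive
    pipeline (frame dump, warp, crop, background removal, resize).
    """
    os.makedirs(cache_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(video_path))[0]
    cache_path = os.path.join(cache_dir, stem + ".npz")

    if os.path.exists(cache_path):
        # Cache hit: skip the slow preprocessing entirely.
        with np.load(cache_path) as data:
            return data["frames"]

    frames = build_frames(video_path)

    # Write to a temporary name first so an interrupted run never leaves a
    # half-written file that later looks like a valid cache entry.
    tmp_path = cache_path + ".tmp.npz"
    np.savez_compressed(tmp_path, frames=frames)
    os.replace(tmp_path, cache_path)
    return frames
```

Writing to a temporary file and renaming it matters more once many workers produce these files, because a crash mid-write would otherwise leave a corrupt npz behind.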

SOLUTION

  1. Make a Hugging Face repo that contains just the npz files.
  2. Set up an organization that people can join:
    https://huggingface.co/organizations/covershot/share/jVGQxiItlPNKPUbwcSXfMEICDLJxLttyAB
  3. Allow users to upload / commit npz files to the repo.
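
How the 35,000 videos get divided between volunteers is not decided in this issue. One possible scheme, sketched here as an assumption only, is to hash each file name into a fixed number of shards so that every volunteer can compute their own work list without coordination. `shard_of`, `my_videos` and the shard count are hypothetical names and values.

```python
import hashlib


def shard_of(video_name, num_shards):
    """Map a file name to a stable shard index in [0, num_shards)."""
    digest = hashlib.sha256(video_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def my_videos(all_names, shard_index, num_shards):
    """Names that the volunteer holding shard_index should preprocess."""
    return [n for n in sorted(all_names) if shard_of(n, num_shards) == shard_index]


if __name__ == "__main__":
    names = [f"clip_{i:05d}.mp4" for i in range(20)]  # made-up file names
    for shard in range(4):
        print(shard, my_videos(names, shard, 4))
```

Because the mapping depends only on the file name, two volunteers never pick up the same video, and a shard can be handed to someone else if its owner drops out.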
@elihalpern

I'm willing to assist! Would love to see this implemented. I have a 3090 and a 4060 TI.

@johndpope
Owner Author

johndpope commented Jun 7, 2024

Looks like Pillow doesn't have GPU support:
python-pillow/Pillow#1546

I got multiprocessing to send work to multiple CPU cores, which does speed things up, but the run never completes and never reaches the next line. Something is not right.
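
The cause of the hang is not identified in this thread. One way to narrow it down, sketched below with hypothetical names (`process_video`, `run_all`), is to submit each video as its own future and report every completion, so the last lines printed show which videos never came back. This is an illustration using the standard `concurrent.futures` module, not the code on the branch.

```python
import concurrent.futures as cf


def process_video(path):
    """Hypothetical stand-in for the per-video preprocessing."""
    return path + ".npz"


def run_all(paths, worker=process_video, max_workers=4,
            executor_cls=cf.ProcessPoolExecutor):
    """Run worker over paths, collecting results and errors per video."""
    done, failed = {}, {}
    with executor_cls(max_workers=max_workers) as pool:
        futures = {pool.submit(worker, p): p for p in paths}
        for fut in cf.as_completed(futures):
            path = futures[fut]
            try:
                done[path] = fut.result()
            except Exception as exc:  # a worker error should not stop the run
                failed[path] = repr(exc)
            finished = len(done) + len(failed)
            print(f"{finished}/{len(futures)} finished: {path}", flush=True)
    return done, failed
```

When this is called with the default process pool from a script, the call should sit under `if __name__ == "__main__":` so that worker processes importing the module do not start pools of their own. Passing `executor_cls=cf.ThreadPoolExecutor` runs the same loop with threads, which helps tell a problem in the worker apart from a problem with process start-up.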
https://github.com/johndpope/MegaPortrait-hack/tree/feat/38-multicore

Screenshot from 2024-06-07 17-04-51

https://github.com/johndpope/MegaPortrait-hack/blob/feat/38-multicore/EmoDataset.py#L251

save the date
Screenshot from 2024-06-10 09-19-54
