Running out of CUDA/GPU spaces #14
Comments
Additionally, when training on 20 videos and text prompts, the model output is still just noise, which I think is the expected result given the lack of training, right?
@gmegh yea, training on video won't be a cakewalk. also, before that, i plan on making the network agnostic to image or video training, and start with images first. realistically, for this to be trained successfully outside of google, it would need to be pretrained on images
Yes, that makes sense. Let me know if I can help. Do you know when you are planning on having the agnostic feature ready? I did create some short functions to be able to use .mp4 files instead of just gifs, and to save the tensors to mp4 as well. Let me know if you would like me to add them in a PR
@gmegh so i have to add 3d continuous relative positional bias to the maskgit embedding to allow for generalization to different sizes. i think i should be able to get it done by tomorrow evening. re: mp4 - yes! that would be super helpful!
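For anyone following along, here is a minimal sketch of the idea (not the repository's actual implementation): a continuous relative positional bias runs a small MLP over relative (frame, height, width) offsets instead of indexing a fixed table, so the resulting attention bias generalizes to video sizes not seen in training. The class and parameter names below are placeholders.

```python
# Sketch of a continuous 3D relative positional bias (names are placeholders).
import torch
from torch import nn

class ContinuousPositionBias3D(nn.Module):
    def __init__(self, heads, hidden_dim=64):
        super().__init__()
        # MLP maps a relative (t, h, w) offset to one bias value per attention head
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden_dim),
            nn.SiLU(),
            nn.Linear(hidden_dim, heads),
        )

    def forward(self, frames, height, width, device=None):
        # build all (t, h, w) coordinates of the token grid
        grid = torch.stack(torch.meshgrid(
            torch.arange(frames, device=device),
            torch.arange(height, device=device),
            torch.arange(width, device=device),
            indexing='ij',
        ), dim=-1).reshape(-1, 3).float()            # (n, 3), n = frames * height * width

        rel = grid[:, None, :] - grid[None, :, :]    # (n, n, 3) pairwise offsets
        bias = self.mlp(rel)                         # (n, n, heads)
        return bias.permute(2, 0, 1)                 # (heads, n, n), added to attention logits
```

Because the bias is a function of the offsets rather than a lookup per position, the same weights can be evaluated for any number of frames or spatial resolution.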
Great! I will create a PR. Also, for reference, these folks are also working on implementing it: https://github.com/LAION-AI/phenaki. I think another nice to-do would be to allow saving the trained model and loading it back.
@gmegh yup, i've been chatting with Dominic. they are planning on straying a bit farther from the paper's implementation (for example, using all convolutions in the cvivit), but this is a joint effort; anything i develop here they are free to use
@gmegh yea, i'll definitely get to the training code soon, once i add a few more bells and whistles to the attention networks
Awesome! Happy to help if you want.
@gmegh yea definitely welcome any help! do you know of any good packages for processing and loading video data?
@lucidrains Yes! I think cv2 is a good package. I made some quick functions with it that I have added to the new PR. The crop_image() function should probably be edited further.
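As a rough illustration of what such helpers could look like (a sketch, not the actual functions from the PR), here is one way to read an .mp4 into a (channels, frames, height, width) tensor with cv2 and write such a tensor back out; the frame count and fps arguments are illustrative assumptions.

```python
# Sketch of mp4 <-> tensor helpers using cv2 (not the PR's implementation).
import cv2
import torch

def video_to_tensor(path, num_frames=None):
    cap = cv2.VideoCapture(path)
    frames = []
    while cap.isOpened() and (num_frames is None or len(frames) < num_frames):
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)      # cv2 reads frames as BGR
        frames.append(torch.from_numpy(frame))
    cap.release()
    video = torch.stack(frames).float() / 255.0             # (f, h, w, c) in [0, 1]
    return video.permute(3, 0, 1, 2)                         # (c, f, h, w)

def tensor_to_video(tensor, path, fps=24):
    # tensor expected as (c, f, h, w) in [0, 1]
    video = (tensor.permute(1, 2, 3, 0).clamp(0, 1) * 255).byte().numpy()  # (f, h, w, c)
    f, h, w, _ = video.shape
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    for frame in video:
        writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    writer.release()
```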
What is the status of the code right now? I think the checkboxes in the readme are outdated, right?
@gmegh the code will be in a very good place by the end of the week, and by end of next week, all the training code will be there
@gmegh usually there is some back and forth and whittling away at bugs for about a month or so after i remove the
@gmegh for training on my end, i plan to get it to a place where the framework can produce unconditional (or text conditioned) images by end of the week. that part i know very well from my other works
@gmegh feel free to experiment in the meantime!
Hi @lucidrains! Is the framework that can produce unconditional (or text conditioned) images ready? I am experimenting with the current version and I would need a way to train in batches, because using 500 videos at a time already fills up my CUDA memory. Any idea on how to go about this?
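One common way to keep memory bounded (a sketch under the assumption that the clips live as .mp4 files on disk and share a resolution and length; `video_to_tensor` is the hypothetical helper sketched above) is to wrap them in a Dataset so a DataLoader only moves one small batch onto the GPU per step, rather than all 500 videos at once:

```python
# Sketch of lazy, batched video loading; paths and sizes are illustrative.
from pathlib import Path
from torch.utils.data import Dataset, DataLoader

class VideoFolderDataset(Dataset):
    def __init__(self, folder, num_frames=16):
        self.paths = sorted(Path(folder).glob('*.mp4'))
        self.num_frames = num_frames

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # load one clip at a time, so only the current batch is ever materialized
        return video_to_tensor(str(self.paths[idx]), num_frames=self.num_frames)

loader = DataLoader(VideoFolderDataset('./videos'), batch_size=4, shuffle=True)
```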
@lucidrains I could take care of this. Any preferences as to whether you'd like to break down each video into frames, or sample from a video directly?
I have a GPU with 15GB and it seems to run out of space when I try to train the network with 50 videos at a time. Do you think it would be better to compute the training loss video by video, instead of on all the videos at once?
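Training video by video while keeping the effective batch size is essentially gradient accumulation. Here is a minimal sketch; `phenaki`, `videos`, and `texts` are placeholders for the model and already-loaded data, and the call signature returning a scalar loss is an assumption, not necessarily the repo's exact API:

```python
# Sketch of gradient accumulation over single videos (placeholder names).
import torch

accumulate_every = 8                        # number of videos per optimizer step
optimizer = torch.optim.Adam(phenaki.parameters(), lr=3e-4)

for step, (video, text) in enumerate(zip(videos, texts)):
    # forward/backward one video at a time, so peak memory is bounded by a
    # single clip instead of all 50 at once
    loss = phenaki(videos=video.unsqueeze(0).cuda(), texts=[text])
    (loss / accumulate_every).backward()    # scale so gradients match a full batch

    if (step + 1) % accumulate_every == 0:
        optimizer.step()
        optimizer.zero_grad()
```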