training on a large dataset #89
@yli1994 hey! so i think non-efficient is the more conservative route; until we close out issue #72, 128 dim with the non-efficient unet should be sufficient, but your guess is as good as mine 😆
@lucidrains thanks! I will try 128 non-efficient first, but the speed of 256-efficient is so tempting 😆😆
@yli1994 definitely chat with Aidan and Zion on the LAION discord, as they have been training the dalle2-pytorch unet with good results and probably have some training tips 😄
@yli1994 i'll try to get the lightning training code in place by week's end!
@yli1994 Francisco just reported that the color shifting issue seems to be absent in the latest version (caveat: at 10k steps), so maybe it is a cautious yellow light to proceed with the memory efficient unet
Here is our run at ~90k steps (memory-efficient unet). No color shifting as far as I can see (samples in the report): https://wandb.ai/elbo-ai/imagen/runs/1y5gc6d2?workspace=user-elbo
@srelbo Great report! Looks like you're starting to get good results. A few questions:
Anyway, I am currently training on a single RTX 3090 GPU on the CocoCaptions dataset. Here's my run so far.
Thanks @camlaedtke ! Looks like you are keeping your 3090 super busy 😄
We are doing the latter. This is just our first experiment, but doing one
It's a subset of the LAION Aesthetics dataset, and it's really large (~2.6 TB). We have not completed even a single epoch after 2 days.
I will keep an 👀 on it when we pick up the latest changes. Perhaps in the next few days.
Sounds great! Looking forward to it!
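For a rough sense of the timescales being discussed, here is a back-of-envelope helper for epoch wall-clock time. All numbers in it are hypothetical, chosen only for illustration; none come from the actual runs above.

```python
import math

def days_per_epoch(num_samples: int, batch_size: int, steps_per_sec: float) -> float:
    """Back-of-envelope estimate of wall-clock days for one full epoch."""
    steps = math.ceil(num_samples / batch_size)   # optimizer steps per epoch
    return steps / steps_per_sec / 86_400         # 86,400 seconds per day

# hypothetical numbers: 100M samples, batch size 64, 2 steps/sec
print(round(days_per_epoch(100_000_000, 64, 2.0), 1))  # ~9 days per epoch
```

Plugging in your own measured steps/sec gives a quick sanity check on whether a full epoch over a 100M-sample subset is feasible on a given rig.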
Training one unet at a time could be cheaper. You can see your generation results faster and tune your strategy, especially when using few GPUs.
@srelbo Good to know, thanks! Yeah, using a single GPU is really testing my patience. It looks like it'll take a couple of days to train 30,000 steps, but hopefully that will be good enough for somewhat decent results.
should be ready for that with the new accelerate integration |
Hi @lucidrains, I finally tried DistributedSampler.
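For anyone following along, the core sharding behavior of `torch.utils.data.DistributedSampler` (pad the index list so it divides evenly across ranks, then give each rank a strided slice) can be sketched in plain Python. This is an illustrative re-implementation of the partitioning logic, not the library code:

```python
import math
import random

def shard_indices(dataset_len, num_replicas, rank, shuffle=False, seed=0):
    """Sketch of DistributedSampler's partitioning: every rank gets the
    same number of indices, and together the ranks cover the dataset."""
    indices = list(range(dataset_len))
    if shuffle:
        random.Random(seed).shuffle(indices)  # same seed on every rank
    # pad by wrapping around so the list splits evenly across ranks
    total_size = math.ceil(dataset_len / num_replicas) * num_replicas
    indices += indices[: total_size - dataset_len]
    # rank r takes elements r, r + num_replicas, r + 2*num_replicas, ...
    return indices[rank:total_size:num_replicas]

# 10 samples over 4 ranks: each rank gets 3 indices, 2 are wrapped duplicates
print(shard_indices(10, 4, rank=0))  # [0, 4, 8]
```

In real use you would pass `DistributedSampler(dataset, num_replicas=world_size, rank=rank, shuffle=True)` to the `DataLoader`, and call `sampler.set_epoch(epoch)` at the start of each epoch so the shuffle order changes between epochs.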
Hi @lucidrains
Thank you for your great work!
I am going to train Imagen on a relatively large dataset (100M samples, a subset of LAION-5B) with 8x RTX 3090 GPUs.
For the text-to-image DDPM, I am considering either 128-dim non-efficient or 256-dim efficient. May I have your suggestion on which one you would prefer? (256-dim efficient is about 2x faster, at around 2 days/epoch.)
I learned a lot from your discussion in issue #72!
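For reference, the two options being weighed might look like this as imagen-pytorch configs. This is a sketch only: it assumes the `Unet` constructor exposes `dim`, `dim_mults`, and `memory_efficient` as in recent versions of the library, and the `dim_mults` values are illustrative, not a recommendation.

```python
from imagen_pytorch import Unet

# option A: 128-dim, standard (non-efficient) unet -- the conservative route
unet_conservative = Unet(
    dim = 128,
    dim_mults = (1, 2, 3, 4),      # illustrative; tune for your setup
    memory_efficient = False,
)

# option B: 256-dim with the memory-efficient unet -- reported ~2x faster
# per epoch in this thread, pending the issue #72 / color-shifting question
unet_efficient = Unet(
    dim = 256,
    dim_mults = (1, 2, 3, 4),
    memory_efficient = True,
)
```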