Hey,
Thanks for your great work, @ZENGXH.
I would like to ask how long it takes to train the VAE on all categories.
I have been training the VAE on all categories on 8 V100 (16 GB) GPUs for 15 days with batch size 12, but only about 4,000 epochs have been completed.
Is there any way to accelerate the training process (for example, by increasing the batch size)?
Another problem is that it is hard for me to judge whether the VAE is well trained. I don't think visualisation alone is a comprehensive way to assess VAE training. Since training takes so long, it is important to be able to verify that it is actually working.
Hi,
I think 15 days is probably enough (I only trained for 7 days on 4 A100s; I stopped early because of the paper deadline). For the 55-class setting, we don't need to run the same number of epochs as for the single-class data, since each epoch covers far more data.
In terms of acceleration, yes, increasing the batch size should help, especially for diffusion model training.
One thing you can try is to inspect the loss curves and see whether they have reached the flat region (the converged stage). For diffusion model training, you can evaluate the 1-NNA metric to check whether the model has fully converged.
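Both checks can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the repository's evaluation code: `has_plateaued`, `chamfer_distance`, and `one_nna` are hypothetical helper names, the window size and tolerance are arbitrary, and Chamfer distance (squared-distance convention) is picked here as one common choice of point-cloud distance. 1-NNA labels each shape by the set of its nearest neighbour in the union of generated and reference shapes; an accuracy close to 0.5 means the two sets are hard to tell apart, while values near 1.0 mean they are easy to separate.

```python
import numpy as np


def has_plateaued(losses, window=200, rel_tol=0.01):
    """Return True if the mean loss over the last `window` steps differs from
    the mean over the preceding `window` steps by at most `rel_tol` (relative)."""
    losses = np.asarray(losses, dtype=float)
    if losses.size < 2 * window:
        return False  # not enough history to compare two windows
    prev = losses[-2 * window:-window].mean()
    last = losses[-window:].mean()
    return bool(abs(prev - last) <= rel_tol * abs(prev))


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3),
    using squared Euclidean distances."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)  # shape (N, M)
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def one_nna(generated, reference, dist_fn=chamfer_distance):
    """1-nearest-neighbour accuracy between two lists of point clouds."""
    clouds = list(generated) + list(reference)
    labels = np.array([0] * len(generated) + [1] * len(reference))
    n = len(clouds)
    # Infinite diagonal so that a shape is never its own nearest neighbour.
    dist = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dist_fn(clouds[i], clouds[j])
    nearest = dist.argmin(axis=1)
    # Fraction of shapes whose nearest neighbour comes from the same set.
    return float((labels[nearest] == labels).mean())
```

In practice `losses` would be the logged training loss and `generated` / `reference` would be sampled shapes and held-out shapes. The pairwise loop is quadratic in the number of shapes, so a real evaluation would normally batch the distance computation on the GPU.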
For reference, these are my reconstruction results from when I stopped my VAE training:
This is my training curve:
2023-04-20 18:52:20.615 | INFO | trainers.base_trainer:train_epochs:256 - [R0] | E4112 iter[371/372] | [Loss] 8847.80 | [exp] ../exp/0405/all/1c389bh_hvae_lion_B12 | [step] 1530035 | [url] none | [time] 5.0m (~325h) |[best] 199 0.001x1e-2
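The numbers in this log line can be cross-checked with a little arithmetic. The sketch below parses the line and verifies that the global step equals `epoch * iters_per_epoch + iter`, then estimates the elapsed training time. It assumes that `[time] 5.0m` is the wall-clock time per epoch; the log format is not documented here, so treat that reading, and the dictionary keys, as hypothetical.

```python
import re

# Log line copied verbatim from the post.
LOG_LINE = (
    "2023-04-20 18:52:20.615 | INFO | trainers.base_trainer:train_epochs:256 - "
    "[R0] | E4112 iter[371/372] | [Loss] 8847.80 | "
    "[exp] ../exp/0405/all/1c389bh_hvae_lion_B12 | [step] 1530035 | "
    "[url] none | [time] 5.0m (~325h) |[best] 199 0.001x1e-2"
)

PATTERN = re.compile(
    r"E(?P<epoch>\d+) iter\[(?P<it>\d+)/(?P<iters_per_epoch>\d+)\]"
    r".*?\[Loss\] (?P<loss>[\d.]+)"
    r".*?\[step\] (?P<step>\d+)"
    r".*?\[time\] (?P<minutes>[\d.]+)m"
)


def parse_progress(line):
    """Extract epoch, iteration, loss, global step and the [time] field."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("line does not match the expected log format")
    info = {
        "epoch": int(m["epoch"]),
        "it": int(m["it"]),
        "iters_per_epoch": int(m["iters_per_epoch"]),
        "loss": float(m["loss"]),
        "step": int(m["step"]),
        "minutes": float(m["minutes"]),
    }
    # Consistency check: global step = completed epochs * iters per epoch + iter.
    info["step_consistent"] = (
        info["epoch"] * info["iters_per_epoch"] + info["it"] == info["step"]
    )
    # ASSUMPTION: [time] is minutes per epoch, so elapsed = epochs * minutes.
    info["elapsed_days"] = info["epoch"] * info["minutes"] / 60 / 24
    return info


info = parse_progress(LOG_LINE)
print(info)
```

Under that assumption, 4112 epochs at 5.0 minutes each come to a little over 14 days, which is consistent with the 15 days of training reported in the question.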