Hi! First of all, kudos on the great work!
So, I am experimenting on a custom dataset of about 70k images spanning 7 different classes. However, the model seems to collapse after 3-4 epochs of training. I have tried playing around with different embedding dimensions for the out_dim parameter and lower values for teacher_temp to increase sharpening, but in vain.
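For context, this is roughly the sharpening knob I mean (a rough sketch of a DINO-style teacher step, not the exact code from this codebase; the names here are mine):

```python
import torch
import torch.nn.functional as F

def teacher_probs(teacher_logits, center, teacher_temp):
    # DINO-style sharpening: a lower teacher_temp makes this softmax peakier,
    # while subtracting the running center pushes the output away from
    # collapsing onto a single dimension.
    return F.softmax((teacher_logits - center) / teacher_temp, dim=-1)

# toy check of how the temperature changes the teacher distribution
logits = torch.randn(8, 256)
center = logits.mean(dim=0, keepdim=True)  # stand-in for the EMA center
for temp in (0.07, 0.04, 0.03):
    p = teacher_probs(logits, center, temp)
    print(temp, p.max(dim=-1).values.mean().item())  # peak mass grows as temp drops
```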
Have you experimented with smaller datasets? Would you be able to provide any suggestions in this case?
Thanks!
Thanks for trying out the codebase. I have not run pre-training to completion and evaluated the performance on a smaller dataset, though I usually use a dataset of a similar size (e.g., ImageWoof) for debugging on a local machine.
Could you please post your hyper-parameters, dataset settings, and training logs here (e.g., what exactly does "the model seems to collapse after 3-4 epochs of training" look like)? That way we can assess the details and start the discussion.
Here are my training logs. In them you can also find the entropy (H) and KL divergence for each epoch, both with and without centering and/or sharpening, to check for collapse as suggested by the DINO authors in their paper. I also tried logging the cosine similarity of the output embeddings to see whether the model collapses to the same representation regardless of the input.
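Concretely, the quantities I log look roughly like this (a minimal sketch, not the exact code from my run; the helper name collapse_metrics is mine):

```python
import math
import torch
import torch.nn.functional as F

@torch.no_grad()
def collapse_metrics(teacher_logits, center, teacher_temp):
    """teacher_logits: (batch, out_dim) raw projection-head outputs."""
    # probabilities with centering + sharpening (as in the loss) and without
    p = F.softmax((teacher_logits - center) / teacher_temp, dim=-1)
    p_raw = F.softmax(teacher_logits, dim=-1)

    # entropy H: ~0 if the teacher always picks the same dimension,
    # ~ln(out_dim) (about 8.3 nats for out_dim = 4096) if it becomes uniform
    entropy = -(p * (p + 1e-8).log()).sum(-1).mean()
    entropy_raw = -(p_raw * (p_raw + 1e-8).log()).sum(-1).mean()

    # KL divergence to the uniform distribution: near 0 signals uniform collapse
    out_dim = teacher_logits.shape[-1]
    kl_uniform = (p * ((p + 1e-8).log() + math.log(out_dim))).sum(-1).mean()

    # mean pairwise cosine similarity of the embeddings: values near 1 mean the
    # model maps every input to (nearly) the same representation
    z = F.normalize(teacher_logits, dim=-1)
    sim = z @ z.t()
    off_diag = sim[~torch.eye(len(z), dtype=torch.bool, device=z.device)]
    return entropy, entropy_raw, kl_uniform, off_diag.mean()
```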
I use all the default settings except out_dim and teacher_temp, which for this run were set to 4096 and 0.03, respectively.
I will also try training the model on another similarly sized dataset and see if I can debug it.