Drop in accuracy after rewinding convnext training #495
Replies: 5 comments
-
Are you using a few hundred GPUs and a very large batch size?
-
Yes, we are using a global batch size of ~70k. We are not sure exactly how you are doing the rewinding. Are you starting the training from the last checkpoint, with a larger LR and batch size? Can you please provide more details? We checked the details at https://huggingface.co/laion/CLIP-convnext_xxlarge-laion2B-s34B-b82K-augreg-rewind, and it seems the rewind is basically retraining for 256 epochs with a larger batch size and LR? I am confused why it says "For the rewind of last 10%". Thanks!
-
@AlaaKhaddaj It is not retraining for all 256 checkpoint epochs ('virtual' epochs, as they are not full dataset passes), only a 10% rewind, which in this case means I resumed from checkpoint 230 and reran the last 26. I basically set the initial LR to 2e-3 instead of 1e-3, so the resumed LR picks up at the 230/256 point of the cosine schedule, slightly altered by the change in step count from the global batch size change. I'd say the bump in global batch size, from 82k to 95k, is a bit of a factor. Augmentation was also increased. EDIT: I should also point out that the original hparams weren't ideal; I started with too low an LR and only figured that out after I got results from the base/large models, which finished sooner (but were started later)...
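For intuition, here is a minimal sketch of the LR arithmetic this implies, assuming a plain cosine decay to zero with warmup ignored (the 230/256 resume point is well past it) and treating the 256 'virtual' epochs as the schedule units rather than optimizer steps; it is not the actual open_clip scheduler code:

```python
import math

# Minimal sketch of the cosine LR math described above -- NOT the actual
# open_clip scheduler code. Assumes a plain cosine decay from base_lr to 0
# over `total` schedule units, ignoring warmup.

def cosine_lr(base_lr: float, step: float, total: float) -> float:
    """LR at `step` of a cosine schedule decaying from base_lr to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total))

total = 256      # 'virtual' epochs used as schedule units
resume_at = 230  # checkpoint the rewind resumed from

print(cosine_lr(1e-3, resume_at, total))  # ~2.5e-5, original schedule
print(cosine_lr(2e-3, resume_at, total))  # ~5.0e-5, after raising the base LR
```

So doubling the base LR roughly doubles the LR over the rewound 10% segment while keeping the same decay-to-zero endpoint; in the real run the schedule is evaluated per optimizer step, so the 82k-to-95k batch size change also shifts where on the curve each step lands.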
-
@AlaaKhaddaj Moved to a discussion, as this is a fiddly hparams question that is likely useful as a reference rather than a bug.
-
Thank you for the clarification!
-
Hello,
I am trying to replicate the improvement you got in ConvNeXt using rewinding. However, I am noticing a big drop in accuracy. Can you please provide more details about that?
Thanks!