On the use of Apex AMP and hybrid stages #22

Open
DonkeyShot21 opened this issue Jan 8, 2022 · 6 comments

@DonkeyShot21

Is there a specific reason why you used Apex AMP instead of the native AMP provided by PyTorch? Have you tried native AMP?

I tried to train poolformer_s12 and poolformer_s24 with solo-learn; with native fp16 the loss goes to nan after a few epochs, while with fp32 it works fine. Did you experience similar behavior?

On a side note, can you provide the implementation and the hyperparameters for the hybrid stage [Pool, Pool, Attention, Attention]? It seems very interesting!

@yuweihao
Collaborator

Hi @DonkeyShot21, thanks for your attention.

We have only trained poolformer_s12 with Apex AMP, and it works well, so we use Apex AMP in the example of how to train on GPUs. We have not tested native AMP and thus have no experience with it.

We plan to release the implementation and more trained models with hybrid stages around March. As for the [Pool, Pool, Attention, Attention]-S12 (81.0% accuracy) shown in the paper, we trained it with LayerNorm, a batch size of 1024, and a learning rate of 1e-3. The remaining hyper-parameters are the same as for poolformer_s12. The implementation of the pooling token mixer is the same as in PoolFormer. After the first two stages, the position embedding is added. The attention token mixer is similar to the one in timm; the difference is that, since the default data format of our implementation is [B, C, H, W], the input of the attention token mixer is transformed into [B, N, C] and the output is transformed back into [B, C, H, W].
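
Roughly, such an attention token mixer wrapper could be sketched like this (an illustrative sketch that uses `nn.MultiheadAttention` as a stand-in for the timm-style attention; the class name and defaults are assumptions, not our released code):

```python
import torch.nn as nn


class AttentionTokenMixer(nn.Module):
    """Sketch of an attention token mixer for a channels-first MetaFormer block:
    the input arrives as [B, C, H, W], is flattened to [B, N, C] for multi-head
    self-attention, and reshaped back to [B, C, H, W]."""

    def __init__(self, dim, num_heads=8, qkv_bias=False):
        super().__init__()
        # dim must be divisible by num_heads for multi-head attention
        self.attn = nn.MultiheadAttention(dim, num_heads, bias=qkv_bias,
                                          batch_first=True)

    def forward(self, x):
        B, C, H, W = x.shape
        # [B, C, H, W] -> [B, N, C] with N = H * W
        tokens = x.flatten(2).transpose(1, 2)
        tokens, _ = self.attn(tokens, tokens, tokens, need_weights=False)
        # [B, N, C] -> [B, C, H, W]
        return tokens.transpose(1, 2).reshape(B, C, H, W)
```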

@DonkeyShot21
Author

Hi @yuweihao, thanks for the nice reply.

Apex can be hard to install without sudo, which is why I prefer native AMP. Actually, I have tried both (Apex and native) with solo-learn, and both lead to NaNs in the loss quite quickly. This also happens with Swin and ViT. I am trying your implementation now with native AMP and it seems to work nicely; the logs are similar to the ones you posted on Google Drive. So I guess my problem is related to the SSL methods or to the fact that solo-learn does not support mixup and cutmix. The only way I could stabilize training was with SGD + LARS and gradient accumulation (to simulate a large batch size), but the results are very bad, much worse than ResNet18. I guess SGD is not a good match for metaformers in general.
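
Concretely, the kind of native-AMP + gradient-accumulation loop I mean looks roughly like this (a minimal sketch; the function and argument names are placeholders, not solo-learn's actual API):

```python
import torch


def train_one_epoch(model, loader, optimizer, criterion, scaler, accum_steps=8):
    """Placeholder training loop: native AMP with gradient accumulation to
    simulate a batch size accum_steps times larger. The GradScaler is created
    once outside and reused across epochs."""
    optimizer.zero_grad()
    for step, (images, targets) in enumerate(loader):
        images, targets = images.cuda(), targets.cuda()
        with torch.cuda.amp.autocast():
            loss = criterion(model(images), targets) / accum_steps
        scaler.scale(loss).backward()        # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)           # unscales; skips step on inf/NaN grads
            scaler.update()
            optimizer.zero_grad()


# scaler = torch.cuda.amp.GradScaler()  # created once for the whole run
```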

Thanks for the details on the hybrid stage. I have also seen in other issues that you mentioned depthwise convs can be used instead of pooling with a slight increase in performance. Do you think this can be paired with the hybrid stages as well (e.g. depthwise conv in the first two stages and then attention in the last two)?

@yuweihao
Collaborator

Hi @DonkeyShot21, thanks for your wonderful work for the research community :)

Yes, [DWConv, DWConv, Attention, Attention] also works very well, and it is in our release plan for models with hybrid stages.
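
For illustration, a depthwise-convolution token mixer in the same [B, C, H, W] format could be sketched as follows (class name and kernel size are assumptions, not the released implementation):

```python
import torch.nn as nn


class DWConvTokenMixer(nn.Module):
    """Sketch of a depthwise-convolution token mixer operating on
    [B, C, H, W] feature maps."""

    def __init__(self, dim, kernel_size=3):
        super().__init__()
        # groups=dim makes the convolution depthwise: one filter per channel
        self.dwconv = nn.Conv2d(dim, dim, kernel_size,
                                padding=kernel_size // 2, groups=dim)

    def forward(self, x):
        return self.dwconv(x)
```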

@DonkeyShot21
Author

Thank you again! Looking forward to the release!

@DonkeyShot21
Author

Hey @yuweihao, sorry to bother you again. For the hybrid stage [Pool, Pool, Attention, Attention], did you use layer norm just for the attention blocks or for the pooling blocks as well? I am trying to reproduce it on ImageNet-100, but I didn't get better performance than vanilla PoolFormer. The params and FLOPs are the same as you reported, so I guess the implementation should be correct.

DonkeyShot21 reopened this Jan 13, 2022
@yuweihao
Collaborator

Hi @DonkeyShot21, I use layer norm for all blocks of [Pool, Pool, Attention, Attention]-S12. I guess the attention blocks may be prone to overfitting on small datasets, which results in worse performance than vanilla PoolFormer on ImageNet-100.
