Can you explain why YOLOS-Small has 30 million parameters while DeiT-S has 22 million parameters? #3
Comments
Hi @gaopengcuhk, thanks for your interest in our work, and good question! For the small- and base-sized models, the added parameters mainly come from the positional embeddings (PE): we add randomly initialized PE. We have added a detailed description in the Appendix, and we will submit it to … This issue won't be closed until we update our manuscript on …
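As a rough sanity check on where a gap of several million parameters can come from, here is a back-of-the-envelope count of what randomly initialized PE tables cost. The sequence length and layer count below are illustrative assumptions, not the exact YOLOS configuration:

```python
# Back-of-the-envelope count of the extra parameters that randomly
# initialized positional embeddings (PE) introduce.
# NOTE: seq_len and num_layers are illustrative assumptions, not the
# exact YOLOS-S detection config.

embed_dim = 384    # DeiT-S / YOLOS-S hidden size
num_layers = 12    # transformer blocks
seq_len = 1700     # tokens at detection resolution (assumed)

# PE added at every layer: one (seq_len, embed_dim) table per block.
per_layer_pe = num_layers * seq_len * embed_dim

# PE added only once, before the first layer.
first_layer_pe = seq_len * embed_dim

print(f"per-layer PE params:   {per_layer_pe / 1e6:.1f} M")
print(f"first-layer PE params: {first_layer_pe / 1e6:.2f} M")
```

With these assumed sizes, per-layer PE costs on the order of 8M parameters, while single-layer PE stays well under 1M, which is the right order of magnitude for the 30M vs. 22M difference.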
Another question: why add the prediction head only on the last layer? Have you tried adding prediction heads to the last several layers, as DETR does?
Thanks for your valuable question. Our guess is: for DETR, deep supervision works because the supervision is "deep enough", i.e., the decoders are stacked on top of at least a ResNet-50/101 backbone and 6 Transformer encoder layers, while YOLOS, with a much shallower network, cannot benefit from deep supervision.
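For readers unfamiliar with DETR's deep supervision, the idea the question refers to can be sketched as attaching the (shared) prediction head to the last few layers and summing the losses. All names here (`layers`, `head`, `loss_fn`) are illustrative placeholders, not the actual DETR or YOLOS code:

```python
# Sketch of DETR-style deep supervision: the prediction head is applied
# to the output of the last `num_aux` layers, and every intermediate
# prediction receives the same supervision signal.

def forward_with_aux(tokens, layers, head, num_aux=3):
    """Return head outputs from the last `num_aux` layers."""
    outputs = []
    for i, layer in enumerate(layers):
        tokens = layer(tokens)
        if i >= len(layers) - num_aux:  # only the last few layers
            outputs.append(head(tokens))
    return outputs

def deep_supervision_loss(outputs, target, loss_fn):
    # Sum the loss over all auxiliary predictions.
    return sum(loss_fn(o, target) for o in outputs)
```

The reply above argues that this extra supervision only helps when many layers sit below the supervised ones, which is not the case for YOLOS.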
Another question: it seems you add the position embedding to x at every layer, while in DeiT the position embedding is added only before the first layer. Is this important in YOLOS?
We have actually answered this in #3 (comment): YOLOS with PE added only at the first layer is better in terms of AP and parameter efficiency :)
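The two PE schemes being compared here can be sketched as a minimal forward pass. Plain `+` stands in for tensor addition, and all names are illustrative:

```python
# Two ways of injecting positional embeddings (PE) into a stack of
# transformer blocks, as discussed in this thread.

def forward_pe_every_layer(x, pos, layers):
    # PE re-added before every block (per-layer PE).
    for layer in layers:
        x = layer(x + pos)
    return x

def forward_pe_first_layer_only(x, pos, layers):
    # PE added once before the first block (DeiT-style).
    x = x + pos
    for layer in layers:
        x = layer(x)
    return x
```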
Thank you very much for your reply.
This issue won't be closed until we update our manuscript on …
We have updated our manuscript on …
As the title suggests.