Object Detection LB #1

Closed
jaideep11061982 opened this issue Jun 5, 2021 · 2 comments

Labels
question Further information is requested

Comments

jaideep11061982 commented Jun 5, 2021

❔Question

Congratulations on publishing such good work.
How does YOLOS perform relative to YOLOv5 and the rest of the YOLO series, and where does it stand on the object detection leaderboard?


jaideep11061982 added the question (Further information is requested) label on Jun 5, 2021
Yuxin-CV (Member) commented Jun 5, 2021

Hi @jaideep11061982, thanks for your interest in our work!

As mentioned in our paper, YOLOS is not designed to be a sophisticated, high-performance object detector. On the contrary, we purposely make as few modifications as possible to a given pre-trained ViT / DeiT, in order to precisely unveil the versatility and transferability of the Transformer from image recognition to object detection. In that sense, our paper is more about the Transformer than about object detection.
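
If you are curious what "as few modifications as possible" can look like in practice, here is a rough, unofficial sketch (not our actual code; the token count, head sizes, and the omission of positional-embedding handling are all simplifying assumptions): a pre-trained ViT encoder is reused as-is, a set of learnable detection tokens is appended to the patch tokens, and two small heads read a class and a box off each detection token.

```python
import torch
import torch.nn as nn

class MinimalViTDetector(nn.Module):
    """Rough sketch: re-purpose a pre-trained ViT for detection with minimal changes."""

    def __init__(self, vit_encoder, embed_dim=768, num_det_tokens=100, num_classes=91):
        super().__init__()
        self.encoder = vit_encoder  # pre-trained ViT / DeiT blocks, reused unchanged
        # Learnable detection tokens appended to the patch-token sequence
        self.det_tokens = nn.Parameter(torch.zeros(1, num_det_tokens, embed_dim))
        self.class_head = nn.Linear(embed_dim, num_classes + 1)  # +1 for "no object"
        self.box_head = nn.Sequential(                           # (cx, cy, w, h) in [0, 1]
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, 4), nn.Sigmoid(),
        )

    def forward(self, patch_tokens):
        # patch_tokens: [B, N, D] sequence from the ViT patch embedding
        # (positional embeddings and their interpolation are omitted for brevity)
        B = patch_tokens.shape[0]
        x = torch.cat([patch_tokens, self.det_tokens.expand(B, -1, -1)], dim=1)
        x = self.encoder(x)                         # plain seq2seq Transformer encoding
        det = x[:, -self.det_tokens.shape[1]:]      # predictions are read off the detection tokens
        return self.class_head(det), self.box_head(det)
```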

We argue that 2D object detection is quite a hard task for a naive Transformer, since ViT always does seq2seq modeling: it has to perceive a higher-dimensional visual signal from a lower-dimensional, sequence-based perspective. Nevertheless, we observe that ViT can accomplish this task.
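
To make that point concrete, here is a tiny illustration (the shapes are assumptions for a 224×224 image with 16×16 patches): the 2D image is flattened into a plain 1D token sequence, so all spatial structure has to be recovered from sequence order plus positional embeddings.

```python
import torch

img = torch.randn(1, 3, 224, 224)                  # [B, C, H, W] 2D image
patches = img.unfold(2, 16, 16).unfold(3, 16, 16)  # [B, C, 14, 14, 16, 16] non-overlapping 16x16 patches
tokens = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, 14 * 14, 3 * 16 * 16)
print(tokens.shape)  # torch.Size([1, 196, 768]) -- a 1D sequence of 196 tokens
```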

Transformers are known to benefit from very large models and very large-scale pre-training. In our paper, we only use the mid-sized ImageNet-1k dataset for pre-training, and the largest model we study has 128M parameters. Whether object detection results can benefit from the excellent scalability of the Transformer is an interesting open question.

Yuxin-CV (Member) commented

We believe this answers your question, so I'm closing this issue, but let us know if you have further questions.
