
paper related problem #7

Closed
yypurpose opened this issue Mar 22, 2021 · 4 comments

Comments

@yypurpose

yypurpose commented Mar 22, 2021

Hi! Thank you for such inspiring work and for sharing the code.

I am somewhat confused by Sec. 4.1 of the paper. As I understand it, when comparing MiMo with SiSo, using only C5 may cause problems in detecting small objects (a high-level, low-resolution feature is better suited to large objects). So, to bridge the gap between SiSo and MiMo, why should we focus on larger objects? Fig. 4 discusses receptive fields. However, the receptive fields of the low-level features are smaller than that of C5, so if SiSo has a receptive-field problem, MiMo should have it too!

So I don't really understand this part. I would be glad if you could point out what I am getting wrong.

Thanks in advance.

@chensnathan
Collaborator

Hi, thanks for your questions.

In MiMo encoders, there are five levels of features, from P3 to P7. The receptive fields of P6 and P7 are considerably larger than that of C5, so the receptive-field problem is not severe. However, SiSo encoders have only a single level of features (C5 or DC5), so we have to enlarge its receptive field to compensate for the missing P6 and P7.
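As a back-of-the-envelope illustration (not taken from the YOLOF code), the standard receptive-field recurrence can be evaluated for a few layer stacks placed on top of C5, counting in C5 grid cells and ignoring the backbone's own receptive field. The P6/P7 stack assumes RetinaNet-style 3x3 stride-2 convolutions, and the dilation rates 2, 4, 6, 8 are an assumed configuration chosen only for illustration.

```python
from typing import Iterable, Tuple

def receptive_field(layers: Iterable[Tuple[int, int, int]]) -> int:
    """Receptive field, in input cells, of a stack of conv layers.

    Each layer is (kernel_size, stride, dilation). Uses the usual recurrence:
    rf += (effective_kernel - 1) * jump, then jump *= stride.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        effective_kernel = dilation * (kernel - 1) + 1  # span of a dilated kernel
        rf += (effective_kernel - 1) * jump             # widen by the new kernel's span
        jump *= stride                                  # spacing between adjacent outputs
    return rf

# Counted in C5 cells (each C5 cell covers 32 input pixels for a stride-32 backbone).
plain_c5 = [(3, 1, 1)]                       # one ordinary 3x3 conv on C5
p6_p7 = [(3, 2, 1), (3, 2, 1)]               # assumed RetinaNet-style C5 -> P6 -> P7
dilated = [(3, 1, d) for d in (2, 4, 6, 8)]  # hypothetical stack of dilated 3x3 convs

for name, stack in [("3x3 on C5", plain_c5), ("P7", p6_p7), ("dilated stack", dilated)]:
    print(f"{name}: {receptive_field(stack)} C5 cells")
```

This prints 3, 7 and 41 C5 cells respectively: a few dilated 3x3 convolutions at stride 1 can cover a wider context than the extra P6/P7 levels, without lowering the resolution of the single output feature. In a real residual encoder the 1x1 convolutions add nothing to this count, and shortcuts also mix in the smaller receptive fields.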

As for small objects, it is true that C5 performs worse at detecting them (shown in Table 1: 19.1 mAP for YOLOF vs. 22.2 mAP for RetinaNet+ on small objects). However, it is hard to recover detailed information from a high-level feature, so we try to preserve the detailed information in the C5 feature by adopting a shortcut.
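To see why a shortcut helps, here is a toy 1-D sketch (an illustration under simplifying assumptions, not the paper's actual encoder): a dilated branch on its own responds only at distance `dilation` from an input impulse, while adding the input back through a residual shortcut also keeps the response at the original position. The kernel `[1, 0, 1]`, with its centre tap zeroed, is a hypothetical choice that makes the effect easy to see.

```python
from typing import List, Sequence

def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Zero-padded 1-D dilated convolution with 'same' output length.

    Implemented as cross-correlation, as deep-learning frameworks do;
    assumes an odd kernel size so that the kernel has a centre tap.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = 0.0
        for t, w in enumerate(kernel):
            j = i + (t - half) * dilation  # taps are spaced `dilation` cells apart
            if 0 <= j < len(x):            # zero padding outside the signal
                acc += w * x[j]
        out.append(acc)
    return out

def residual_dilated_block(x: Sequence[float], kernel: Sequence[float], dilation: int) -> List[float]:
    """Toy residual block: output = input + dilated branch (the shortcut keeps the input)."""
    branch = dilated_conv1d(x, kernel, dilation)
    return [a + b for a, b in zip(x, branch)]

x = [0.0] * 21
x[10] = 1.0                   # a single "small object" at position 10
kernel = [1.0, 0.0, 1.0]      # hypothetical kernel with the centre tap zeroed

branch_only = dilated_conv1d(x, kernel, dilation=4)
with_shortcut = residual_dilated_block(x, kernel, dilation=4)
print([i for i, v in enumerate(branch_only) if v])    # [6, 14]
print([i for i, v in enumerate(with_shortcut) if v])  # [6, 10, 14]
```

In a real encoder the kernels are learned and 2-D, so the effect is less clear-cut, but the idea is the same: the sum combines the original, small receptive field with the enlarged one.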

We also show a feasible way to improve detection performance on small objects in Table 9 of the paper: we replace the C5 feature with the dilated C5 (DC5) feature. The performance on small objects improves from 19.1 mAP (YOLOF-C5-1x) to 22.3 mAP (YOLOF-DC5-1x).
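For a sense of what switching from C5 to DC5 changes, here is a rough sketch of the feature-map arithmetic. It assumes the common DC5 construction (the last backbone stage keeps stride 1 and uses dilation 2 instead, so the output stride drops from 32 to 16) and padded convolutions that round up at each stride-2 step; the 800x1333 input is only an example.

```python
import math
from typing import Tuple

def feature_map_size(height: int, width: int, output_stride: int) -> Tuple[int, int]:
    """Approximate feature-map size for a backbone with the given total stride.

    Assumes padded convolutions, which round up (ceil) at each stride-2 step.
    """
    return math.ceil(height / output_stride), math.ceil(width / output_stride)

h, w = 800, 1333                  # example input size
c5 = feature_map_size(h, w, 32)   # plain C5: output stride 32
dc5 = feature_map_size(h, w, 16)  # DC5: assumed output stride 16 (dilation instead of stride)
print("C5:", c5, "positions:", c5[0] * c5[1])
print("DC5:", dc5, "positions:", dc5[0] * dc5[1])
```

Under these assumptions C5 is 25x42 (1050 positions) and DC5 is 50x84 (4200 positions). The 4x denser grid keeps finer spatial detail for small objects, at the cost of more computation in the last stage and the encoder/decoder that follow it.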

@yypurpose
Author

Thanks for your timely and detailed answer!

So it's about bridging the gap between SiSo and SiMo, not between SiSo and MiMo! My understanding was wrong. And the dilated convolutions make up for the missing P6 & P7. I DO UNDERSTAND THIS TIME.

And as you said in the paper, SiMo is comparable with MiMo (35.0 mAP vs. 35.9 mAP). Is the drop mainly caused by small objects, or is it uniform across scales? Could you provide the detailed results of the four models in Fig. 1? I did not find them in the paper, and I'm really curious about it.

THANKS A LOT!!

@chensnathan
Collaborator

Yes, the performance drop is mainly caused by small objects.

The results are listed below:

| Encoder | AP | AP50 | AP75 | APs | APm | APl |
|---|---|---|---|---|---|---|
| MiMo | 35.9 | 55.8 | 38.5 | 19.9 | 39.6 | 47.9 |
| SiMo | 35.0 | 54.8 | 37.1 | 17.4 | 39.3 | 47.8 |
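The statement that the drop comes mainly from small objects can be checked directly by subtracting the SiMo results from the MiMo results (numbers copied from the MiMo and SiMo results in this comment):

```python
metrics = ["AP", "AP50", "AP75", "APs", "APm", "APl"]
mimo = [35.9, 55.8, 38.5, 19.9, 39.6, 47.9]
simo = [35.0, 54.8, 37.1, 17.4, 39.3, 47.8]

# Gap per metric (MiMo minus SiMo), rounded to one decimal to hide float noise.
gap = {m: round(a - b, 1) for m, a, b in zip(metrics, mimo, simo)}
print(gap)
print("largest gap:", max(gap, key=gap.get))
```

The gap is 2.5 on APs, compared with 0.3 on APm and 0.1 on APl, so almost all of the 0.9 AP difference comes from small objects.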

@yypurpose
Author

Thanks a lot!
