paper related problem #7

yypurpose · 2021-03-22T12:42:03Z

Hi! Many thanks to the authors for such inspiring work and for sharing the code.

I am somewhat confused by Sec. 4.1 of the paper. As I understand it, comparing MiMo with SiSo using only C5 may cause problems in detecting small objects, since a high-level, low-resolution feature is better suited to large objects. To bridge the gap between SiSo and MiMo, why should we focus on larger objects? Fig. 4 discusses receptive fields. However, the receptive fields of the low-level features are smaller than that of C5, so if SiSo has a receptive field problem, MiMo should have it too!

So I don't really understand this part. I would be glad if you could point out where I am going wrong.

Thanks in advance.

chensnathan · 2021-03-23T04:32:10Z

Hi, thanks for your questions.

In MiMo encoders, there are five feature levels, from P3 to P7. The receptive fields of P6 and P7 are larger than that of C5, so the receptive field problem is not severe. In SiSo encoders, however, there is only a single feature level (C5 or DC5), so we have to enlarge the receptive field to compensate for the lack of P6 and P7.
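The receptive-field argument can be checked with the standard recurrence for stacked convolutions: each layer adds `(kernel - 1) * dilation * jump` input pixels, where `jump` is the cumulative stride of that layer's input. The sketch below is illustrative only and is not the authors' code. It assumes C5 has stride 32, that P6 and P7 are each a 3x3 stride-2 convolution stacked on top of C5 (as in RetinaNet), and that the single-level encoder stacks four 3x3 convolutions with dilations 2, 4, 6, 8. The dilation rates and the helper name `receptive_field_gain` are assumptions made for this example.

```python
def receptive_field_gain(layers, input_stride):
    """Return (extra receptive field in input pixels, output stride).

    layers: list of (kernel, stride, dilation) tuples, applied in order.
    input_stride: cumulative stride of the feature map fed to the first layer.
    """
    gain, jump = 0, input_stride
    for kernel, stride, dilation in layers:
        # Each layer widens the receptive field by (k - 1) * d steps of size `jump`.
        gain += (kernel - 1) * dilation * jump
        jump *= stride
    return gain, jump


C5_STRIDE = 32  # assumption: standard ResNet C5 stride

# Assumed MiMo extra levels: P6 and P7 as 3x3 stride-2 convs on top of C5.
p6_p7 = [(3, 2, 1), (3, 2, 1)]

# Assumed SiSo encoder: four 3x3 stride-1 convs with growing dilation.
dilated_stack = [(3, 1, d) for d in (2, 4, 6, 8)]

p7_gain, p7_stride = receptive_field_gain(p6_p7, C5_STRIDE)
siso_gain, siso_stride = receptive_field_gain(dilated_stack, C5_STRIDE)

print(p7_gain, p7_stride)      # 192 128
print(siso_gain, siso_stride)  # 1280 32
```

Under these assumptions, the dilated stack enlarges the receptive field beyond what P6 and P7 add, while the output stays at the C5 resolution.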

As for small objects, it is true that C5 performs worse on them (see Table 1: YOLOF at 19.1 mAP vs. RetinaNet+ at 22.2 mAP). However, it is hard to recover detailed information from a high-level feature, so we try to keep the detailed information in the C5 feature by adopting a shortcut.
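A toy 1-D example may help show what the shortcut preserves. It is a minimal sketch, not the YOLOF block: the real encoder uses learned 2-D convolutions with normalization and activation, whereas here the kernel weights `[0.5, 0.0, 0.5]` and the function names are hypothetical and chosen so that the dilated convolution alone discards the value at the centre position.

```python
def dilated_conv1d(signal, weights, dilation):
    """Zero-padded 1-D cross-correlation with a centred, odd-length kernel."""
    half = len(weights) // 2
    out = []
    for i in range(len(signal)):
        total = 0.0
        for k, w in enumerate(weights):
            j = i + (k - half) * dilation  # tap position for this kernel weight
            if 0 <= j < len(signal):
                total += w * signal[j]
        out.append(total)
    return out


def residual_block(signal, weights, dilation):
    """Dilated conv plus identity shortcut: out = x + conv(x)."""
    conv = dilated_conv1d(signal, weights, dilation)
    return [x + c for x, c in zip(signal, conv)]


signal = [0, 0, 0, 0, 1, 0, 0, 0, 0]  # one fine detail (impulse) at index 4
weights = [0.5, 0.0, 0.5]             # hypothetical kernel that ignores the centre

conv_only = dilated_conv1d(signal, weights, dilation=2)
with_shortcut = residual_block(signal, weights, dilation=2)

print(conv_only)      # impulse is spread to indices 2 and 6, lost at index 4
print(with_shortcut)  # index 4 keeps its original value alongside the spread
```

The convolution branch contributes the wider context, and the identity path carries the original fine-grained value through unchanged.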

We also show a feasible way to improve detection performance on small objects in Table 9 of the paper: replacing the C5 feature with the dilated C5 (DC5) feature. Small-object performance improves from 19.1 mAP (YOLOF-C5-1x) to 22.3 mAP (YOLOF-DC5-1x).
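One way to see why DC5 can help small objects is to compare feature-map sizes. The sketch below uses the usual convolution output-size formula and assumes that DC5 is obtained by replacing the stride-2 3x3 convolution of the last backbone stage with a stride-1, dilation-2 convolution, which is how DC5 backbones are commonly built. The input size of 50 (a stride-16 map of an 800-pixel side) and the helper name `conv_output_size` are illustrative assumptions.

```python
def conv_output_size(size, kernel, stride=1, dilation=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


c4_size = 50  # assumption: 800-pixel side at stride 16

# Plain C5: 3x3 conv, stride 2, padding 1 -> resolution is halved (stride 32).
c5_size = conv_output_size(c4_size, kernel=3, stride=2, dilation=1, padding=1)

# Assumed DC5: 3x3 conv, stride 1, dilation 2, padding 2 -> resolution is kept (stride 16).
dc5_size = conv_output_size(c4_size, kernel=3, stride=1, dilation=2, padding=2)

print(c5_size, dc5_size)  # 25 50
```

Under these assumptions, DC5 has twice the resolution of C5 along each side, so each small object is covered by more feature locations.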

yypurpose · 2021-03-23T06:03:03Z

Thanks for your timely and detailed answer!

So it's bridging the gap between SiSo and SiMo, not SiSo and MiMo! My understanding was wrong. And the dilated convolutions are there to make up for the missing P6 and P7. I DO UNDERSTAND THIS TIME.

As you said in the paper, SiMo is comparable with MiMo (35.0 mAP vs. 35.9 mAP). Is the drop caused mainly by small objects, or is it uniform? Could you provide the detailed results of the four models in Fig. 1? I did not find them in the paper. I'm really curious about it.

THANKS A LOT!!

chensnathan · 2021-03-24T02:36:14Z

Yes, the performance drop is mainly caused by small objects.

The results are listed below:

| Model | AP | AP50 | AP75 | APs | APm | APl |
| --- | --- | --- | --- | --- | --- | --- |
| MiMo | 35.9 | 55.8 | 38.5 | 19.9 | 39.6 | 47.9 |
| SiMo | 35.0 | 54.8 | 37.1 | 17.4 | 39.3 | 47.8 |
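
The per-metric gap can be read directly off the MiMo and SiMo numbers above. A small sketch of that arithmetic follows; the values are copied from this reply, while the variable names are only illustrative.

```python
mimo = {"AP": 35.9, "AP50": 55.8, "AP75": 38.5, "APs": 19.9, "APm": 39.6, "APl": 47.9}
simo = {"AP": 35.0, "AP50": 54.8, "AP75": 37.1, "APs": 17.4, "APm": 39.3, "APl": 47.8}

# Drop from MiMo to SiMo for each metric, rounded to one decimal place.
drops = {name: round(mimo[name] - simo[name], 1) for name in mimo}
largest = max(drops, key=drops.get)

print(drops)    # {'AP': 0.9, 'AP50': 1.0, 'AP75': 1.4, 'APs': 2.5, 'APm': 0.3, 'APl': 0.1}
print(largest)  # APs
```

The APs drop (2.5) is far larger than the APm (0.3) and APl (0.1) drops, which matches the statement that the gap comes mainly from small objects.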

yypurpose · 2021-03-24T02:49:56Z

Thanks a lot!

chensnathan closed this as completed Mar 24, 2021

zcl912 mentioned this issue Apr 8, 2021

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda [](int)->auto::operator()(int)->auto: block: [0,0,0], thread: [121,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed. #20
