Adapt the input size of SSDlite pre-trained model and assess its inference accuracy #3819

datumbox · 2021-05-12T16:35:27Z

🚀 Feature

The pre-trained ssdlite320_mobilenet_v3_large() model expects a fixed-size 320x320 input image. If a user passes a size parameter to the method, it is ignored with a warning:

vision/torchvision/models/detection/ssdlite.py

Line 202 in e35793a

size = (320, 320)

vision/torchvision/models/detection/ssdlite.py

Lines 184 to 185 in e35793a

    
if "size" in kwargs:
    warnings.warn("The size of the model is already fixed; ignoring the argument.")

In other models such as FasterRCNN it is possible to reuse the existing pre-trained models but use different input sizes. The results might vary in terms of accuracy but this allows users to reuse the models and adjust them according to their speed needs.

It is possible that the SSD method is more sensitive than FasterRCNN to the input size because of the assumptions that it makes concerning the input and the parameterisation of the anchor boxes:

vision/torchvision/models/detection/ssdlite.py

Line 203 in e35793a

    
anchor_generator = DefaultBoxGenerator([[2, 3] for _ in range(6)], min_ratio=0.2, max_ratio=0.95)
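
To make the scale parameterisation concrete, here is a minimal sketch, assuming the generator spaces its scales linearly between min_ratio and max_ratio across the 6 feature maps, as described in the SSD paper. The helper names (default_box_scales, scales_in_pixels, rescale_ratio) are hypothetical and not part of torchvision, and the aspect-ratio variants that the real DefaultBoxGenerator adds are omitted.

```python
def default_box_scales(min_ratio, max_ratio, num_outputs):
    """Relative box scales, one per feature map, spaced linearly.

    Assumption: linear interpolation between min_ratio and max_ratio,
    following the SSD paper; this is not torchvision's actual code.
    """
    step = (max_ratio - min_ratio) / (num_outputs - 1)
    return [min_ratio + step * k for k in range(num_outputs)]


def scales_in_pixels(scales, size):
    """Convert relative scales to box side lengths in pixels for a square input."""
    return [round(s * size, 1) for s in scales]


def rescale_ratio(ratio, old_size, new_size):
    """Ratio that keeps the same absolute box size when the input size changes."""
    return ratio * old_size / new_size


# Values taken from the ssdlite320_mobilenet_v3_large configuration quoted above.
scales = default_box_scales(min_ratio=0.2, max_ratio=0.95, num_outputs=6)

for size in (320, 512, 640):
    print(size, scales_in_pixels(scales, size))

# Ratios that would reproduce the 320x320 pixel sizes on a 512x512 input.
print(rescale_ratio(0.2, 320, 512), rescale_ratio(0.95, 320, 512))
```

Because the scales are fractions of the image side, the smallest default box grows from 0.2 × 320 = 64 pixels to 0.2 × 512 = 102.4 pixels when only the size changes, and keeping it at 64 pixels on a 512x512 input would need a min_ratio of 64 / 512 = 0.125. This is only arithmetic on the ratios and says nothing by itself about accuracy.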

The target of this ticket is to investigate:

What happens to mAP of the pre-trained model if we increase the size of the image to 512x512 and to 640x640?
Do we have to adapt the min_ratio and max_ratio or any other parameter of the DefaultBoxGenerator to achieve better results?

If the experiment yields positive results (reasonably higher mAP for the extra computation cost), we should follow up with a PR that allows overriding the size from outside the ssdlite320_mobilenet_v3_large method, and potentially the extra parameters of the DefaultBoxGenerator.

prabhat00155 · 2021-05-25T16:43:59Z

Here is the result table of my experiments:

IoU metric: bbox

| min_ratio | 0.2 | 0.42 | 0.2 | 0.1 | 0.04 | 0.04 | 0.04 | 0.2 |
|---|---|---|---|---|---|---|---|---|
| max_ratio | 0.95 | 0.95 | 0.95 | 0.95 | 0.95 | 0.74 | 1.06 | 0.95 |
| size | 320x320 | 512x512 | 512x512 | 512x512 | 512x512 | 512x512 | 512x512 | 640x640 |
| Average Precision (AP) @[ IoU=0.50:0.95 \| area=all \| maxDets=100 ] | 0.213 | 0.009 | 0.018 | 0.081 | 0.061 | 0.069 | 0.04 | 0.007 |
| Average Precision (AP) @[ IoU=0.50 \| area=all \| maxDets=100 ] | 0.343 | 0.031 | 0.078 | 0.282 | 0.173 | 0.169 | 0.13 | 0.036 |
| Average Precision (AP) @[ IoU=0.75 \| area=all \| maxDets=100 ] | 0.221 | 0.003 | 0.004 | 0.022 | 0.034 | 0.042 | 0.017 | 0.001 |
| Average Precision (AP) @[ IoU=0.50:0.95 \| area=small \| maxDets=100 ] | 0.011 | 0.002 | 0.006 | 0.03 | 0 | 0 | 0 | 0.004 |
| Average Precision (AP) @[ IoU=0.50:0.95 \| area=medium \| maxDets=100 ] | 0.202 | 0.005 | 0.022 | 0.155 | 0.087 | 0.058 | 0.067 | 0.008 |
| Average Precision (AP) @[ IoU=0.50:0.95 \| area=large \| maxDets=100 ] | 0.444 | 0.016 | 0.03 | 0.1 | 0.174 | 0.224 | 0.11 | 0.012 |
| Average Recall (AR) @[ IoU=0.50:0.95 \| area=all \| maxDets=1 ] | 0.208 | 0.027 | 0.038 | 0.101 | 0.089 | 0.094 | 0.068 | 0.021 |
| Average Recall (AR) @[ IoU=0.50:0.95 \| area=all \| maxDets=10 ] | 0.307 | 0.072 | 0.12 | 0.225 | 0.179 | 0.171 | 0.151 | 0.083 |
| Average Recall (AR) @[ IoU=0.50:0.95 \| area=all \| maxDets=100 ] | 0.334 | 0.098 | 0.165 | 0.268 | 0.215 | 0.204 | 0.186 | 0.125 |
| Average Recall (AR) @[ IoU=0.50:0.95 \| area=small \| maxDets=100 ] | 0.043 | 0.005 | 0.037 | 0.092 | 0.011 | 0.011 | 0.01 | 0.027 |
| Average Recall (AR) @[ IoU=0.50:0.95 \| area=medium \| maxDets=100 ] | 0.344 | 0.061 | 0.169 | 0.355 | 0.243 | 0.2 | 0.199 | 0.124 |
| Average Recall (AR) @[ IoU=0.50:0.95 \| area=large \| maxDets=100 ] | 0.643 | 0.236 | 0.289 | 0.371 | 0.434 | 0.45 | 0.379 | 0.22 |
| time(s) | 18.2 | 22.91 | 21.39 | 18.68 | 17.59 | 17.05 | 17.63 | 25.11 |
| Averaged stats: model_time | 0.0657 (0.0660) | 0.0682 (0.0707) | 0.0690 (0.0703) | 0.0691 (0.0711) | 0.0688 (0.0710) | 0.0695 (0.0716) | 0.0679 (0.0702) | 0.0732 (0.0752) |
| evaluator_time | 0.0171 (0.0224) | 0.0154 (0.0204) | 0.0163 (0.0218) | 0.0169 (0.0228) | 0.0175 (0.0238) | 0.0186 (0.0237) | 0.0169 (0.0233) | 0.0185 (0.0225) |

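Both average precision and average recall drop when the input size increases. At 512x512, decreasing min_ratio tends to improve the scores (AP of 0.009 at min_ratio 0.42, 0.018 at 0.2 and 0.081 at 0.1), although lowering it further to 0.04 gives 0.061, and no 512x512 or 640x640 configuration approaches the 0.213 AP of the 320x320 baseline.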
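
For a quick comparison against the baseline, the sketch below copies the AP @[ IoU=0.50:0.95 | area=all | maxDets=100 ] row from the table and expresses each run as a fraction of the 320x320 score. The tuple layout and the helper name relative_to_baseline are just illustrative choices, not part of the evaluation script used for the experiments.

```python
# (min_ratio, max_ratio, size, AP) copied from the AP @[IoU=0.50:0.95 | area=all] row.
results = [
    (0.2, 0.95, 320, 0.213),   # baseline: the configuration the weights were trained with
    (0.42, 0.95, 512, 0.009),
    (0.2, 0.95, 512, 0.018),
    (0.1, 0.95, 512, 0.081),
    (0.04, 0.95, 512, 0.061),
    (0.04, 0.74, 512, 0.069),
    (0.04, 1.06, 512, 0.04),
    (0.2, 0.95, 640, 0.007),
]

baseline_ap = results[0][3]


def relative_to_baseline(ap):
    """AP of a run as a fraction of the 320x320 baseline AP."""
    return ap / baseline_ap


# Best 512x512 configuration by overall AP.
best_512 = max((r for r in results if r[2] == 512), key=lambda r: r[3])

for min_ratio, max_ratio, size, ap in results:
    print(f"{size}x{size} min={min_ratio} max={max_ratio}: "
          f"AP={ap:.3f} ({relative_to_baseline(ap):.0%} of baseline)")

print("best 512x512 run:", best_512)
```

Even the best 512x512 run (min_ratio 0.1, max_ratio 0.95) stays well under half of the baseline AP.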

datumbox · 2021-05-25T17:24:04Z

Awesome work @prabhat00155, thanks for the investigation!

As we expected, SSD is very sensitive to the DefaultBox configuration, so I don't think the pre-trained weights can be reused with variable input sizes. I'll close the issue because I think the work is complete here.

datumbox added the enhancement, module: models, and topic: object detection labels on May 12, 2021

datumbox assigned prabhat00155 May 12, 2021

datumbox closed this as completed May 25, 2021

datumbox mentioned this issue May 26, 2021

Documentation enhancement: Specifying detailed shape requirements for pretrained models #3921

Open

datumbox mentioned this issue Nov 8, 2021

Adding multiweight support to SSD #4881

Merged

datumbox mentioned this issue Jun 15, 2022

Using any torchvision pretrained model as backbone for FasterRcnn #6172

Open
