Documentation confusing on whether SSD and RetinaNet count background as class object #4106

douglasrizzo · 2021-06-24T10:15:49Z

The documentation for the SSD class mentions that we should not count the background as an object class when passing the number of classes as a parameter to instantiate an SSD object.

vision/torchvision/models/detection/ssd.py

Line 144 in 9596668

    
                   num_classes (int): number of output classes of the model (excluding the background).

However, further down in the same file, an SSD object is instantiated inside a function whose docstring explicitly says that the background should be counted as an object class. The code does not account for this difference (i.e. I did not see num_classes being decremented by one when creating the SSD object).

vision/torchvision/models/detection/ssd.py

Line 589 in 9596668

model = SSD(backbone, anchor_generator, (300, 300), num_classes, **kwargs)

Here is the documentation for this function, which says we should include the background in the number of classes.

vision/torchvision/models/detection/ssd.py

Line 563 in 9596668

    
                   num_classes (int): number of output classes of the model (including the background)

This is confusing. Should we or should we not count the background as an object class when instantiating the SSD? In either case, how should object classes be ID'd during training?

As an example, with Faster RCNN, the background is counted as an object class (with ID 0 reserved for it) and actual object classes are identified during training starting from ID 1. What should be the procedure for SSD?

I have also opened a topic in the forums, since this is both a personal question of mine as well as a possible issue in the docs (or the code).


datumbox · 2021-06-28T14:18:40Z

@douglasrizzo Thanks for reporting.

Blame me for copy-pasting the doc string of RetinaNet which had exactly the same issue. The num_classes should include the background. I've issued a PR that fixes the problem.

douglasrizzo · 2021-06-28T14:47:49Z

@datumbox Thanks for the reply. I actually believe the RetinaNet documentation suffers from a similar problem.

The documentation for the RetinaNet class tells us to exclude the background as an object class:

vision/torchvision/models/detection/retinanet.py

Line 259 in 1962fdd

    
                   num_classes (int): number of output classes of the model (excluding the background).

The function retinanet_resnet50_fpn tells us to include the background as an object class:

vision/torchvision/models/detection/retinanet.py

Line 608 in 1962fdd

    
                   num_classes (int): number of output classes of the model (including the background)

But when a RetinaNet object is created inside of retinanet_resnet50_fpn, the number of classes passed to retinanet_resnet50_fpn (which counts the background) is not decremented by 1 before being used in RetinaNet:

vision/torchvision/models/detection/retinanet.py

Line 622 in 1962fdd

model = RetinaNet(backbone, num_classes, **kwargs)

Since num_classes is only used inside retinanet_resnet50_fpn to instantiate a RetinaNet object, my bet is that the documentation for retinanet_resnet50_fpn is wrong.

douglasrizzo · 2021-06-28T14:49:51Z

I just saw your PR fixes both documents.

WZMIAOMIAO · 2021-06-30T02:31:28Z

@datumbox
Hi, I think num_classes in RetinaNet should exclude the background, because the classification loss uses binary_cross_entropy_with_logits and so does not need to account for the background.
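
To illustrate the argument in this comment, here is a plain-Python sketch of per-class sigmoid targets. It is not torchvision's implementation; `sigmoid_target` and `bce_with_logits` are hypothetical names written for this example. Each class gets its own independent output, so an anchor that matches no object is labeled with an all-zero target vector and needs no dedicated background column. Class indices here start at 0 for the first real class, which is the convention this comment argues for, not necessarily the one torchvision uses.

```python
import math


def sigmoid_target(class_index, num_outputs):
    """Build a per-class 0/1 target; class_index=None means background."""
    target = [0.0] * num_outputs
    if class_index is not None:
        target[class_index] = 1.0
    return target


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over independent per-class logits."""
    total = 0.0
    for x, t in zip(logits, targets):
        # Numerically stable form: max(x, 0) - x * t + log(1 + exp(-|x|))
        total += max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x)))
    return total / len(logits)


# A background anchor has no positive entry in its target vector.
background = sigmoid_target(None, 3)
# An anchor matched to the second class (index 1 in this sketch).
foreground = sigmoid_target(1, 3)

# Strongly negative logits on every output give a near-zero loss for background.
confident_background_loss = bce_with_logits([-10.0, -10.0, -10.0], background)
```

With a softmax-based loss, by contrast, the outputs compete for a single probability mass, which is why such heads typically reserve an explicit background class.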

Xonxt · 2022-06-08T09:15:10Z

Sorry to reopen the issue, but the other question was not answered:
how should the labels be numbered during training (for custom datasets) - starting from 0 or from 1?
And then during inference, does the model return the labels starting from 0 or from 1?

datumbox · 2022-06-08T09:35:44Z

@Xonxt The documentation was improved to reflect the situation. To answer your question, num_classes should include the background, which is encoded with 0. During inference the model predicts labels starting from 1.
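
A minimal sketch of the labeling convention described in this answer, assuming a custom dataset with three made-up class names. The helpers `build_label_map` and `count_classes` are hypothetical and not part of torchvision. They reserve ID 0 for the background and number the real classes from 1, so the value passed as num_classes is the number of real classes plus one.

```python
# Hypothetical helpers, not part of torchvision: reserve ID 0 for the
# background and number the real object classes from 1.
BACKGROUND_ID = 0


def build_label_map(class_names):
    """Map each class name to an integer ID starting at 1."""
    return {name: idx for idx, name in enumerate(class_names, start=1)}


def count_classes(label_map):
    """Value to pass as num_classes: real classes plus the background."""
    return len(label_map) + 1


# Assumed example classes for a custom dataset.
label_map = build_label_map(["cat", "dog", "bird"])
num_classes = count_classes(label_map)

# Inverse map for turning predicted label IDs back into class names.
id_to_name = {idx: name for name, idx in label_map.items()}
```

Under this convention the training targets would carry the IDs from `label_map`, and predicted labels could be decoded with `id_to_name`; ID 0 never appears in either.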

datumbox mentioned this issue Jun 28, 2021

Fix documentation for SSD and RetinaNet #4132

Merged

datumbox added bug module: documentation labels Jun 28, 2021

douglasrizzo changed the title ~~Documentation confusing on whether SSD counts background as class object~~ Documentation confusing on whether SSD and RetinaNet count background as class object Jun 28, 2021

datumbox closed this as completed in #4132 Jun 28, 2021

datumbox mentioned this issue Aug 16, 2021

Add info on retinanet finetune to docs. #3442

Closed

easz mentioned this issue Jan 21, 2022

num_classes usage is not consistent with torhcvision's doc weecology/DeepForest#295

Closed

bw4sz mentioned this issue Dec 5, 2022

No background class weecology/DeepForest#370

Open

NicolasHug mentioned this issue May 24, 2024

retinanet num_classes includes the background #8432

Closed
