Bug fix: Handle empty instances in FCOS. #3851

kdexd · 2022-01-03T22:13:12Z

🐛 One line bug info: FCOS implementation in current master is unable to handle empty instances.

This bug went unnoticed because: (a) images with empty instances are usually filtered while loading COCO annotations, and (b) FCOS should not encounter empty instances with its default training hyper-parameters. However, if I switch to a more aggressive large-scale jitter (LSJ) cropping augmentation, the model may encounter an image crop without any boxes in it. This crashes training abruptly due to a size mismatch in the pairwise anchor matching matrix.

Bug fix

This PR lets FCOS handle empty instances by adding a dummy [0, 0, 0, 0] box labeled as background (ID = num_classes), similar to how it is handled in RetinaNet class. Training FCOS with LSJ augmentation does not crash anymore.

Additional refactor: While I was working on a fix, I noticed some inconsistencies in variable representations and naming conventions. For example, the pairwise_match variable was a (R: anchor points, M: GT boxes) matrix — this is a transposed representation of what a match_quality_matrix represents in RetinaNet and GeneralizedRCNN, (M: GT boxes, R: anchor points). I refactored the code to make it more uniform with D2 conventions and make FCOS logic flow similar to RetinaNet, given that RetinaNet was the primary baseline in FCOS paper. My changes include:

Refactoring pairwise_match to a match_quality_matrix with its representation consistent with the rest of meta architectures. Moreover, variables renaming like (matched_boxes –> matched_gt_boxes, label —> gt_labels, and gt_index —> matched_indices) make the code more consistent with the naming convention in rest of meta architectures.
Update: After @ppwwyyxx , I added this explanation as a comment instead of refactoring code: Use a Matcher instead of simply doing pairwise_match.max(). Original code was replacing indices of unmatched anchors as -1 and accessing GT labels/boxes by doing gt_index.clip(0), which felt non-ideal.
Change the internal method match_anchors to compute and return per-instance match_quality_matrix (similar to using pairwise_iou in R-CNN). This modifies the old behavior of returning matched_indices. Since this method is used internally in FCOS, I renamed the method to _match_anchors.

Any API changes?

The call signature of FCOS is unchanged. FCOS.match_anchors() is now FCOS._match_anchors() with a different return value, but it was only used internally by FCOS.label_anchors().

Verification

I verified my changes by one full training run of FCOS with ResNet-50-FPN on COCO detection, using the builtin config. The validation curves overlap very closely (orange: master branch, blue: with my changes).

I should have fixed the training seed in both runs... so additionally, I manually checked the equality of pairwise_match.tranpose(0, 1) and my match_quality_matrixfor first 20 iterations by fixing train.seed = 0 in config. Everything matches exactly.

xiaohu2015 · 2022-01-21T08:21:57Z

@kdexd hi，I also fix this bug #3910, but different with you.

ppwwyyxx

Thanks for fixing the bug and also bringing the style of FCOS closer to RetinaNet!

There are some unittests (test_empty_data) in test_model_e2e.py that check if model supports empty instances. Could you add FCOS there? Just

class FCOSE2ETest(InstanceModelE2ETest, unittest.TestCase):
    CONFIG_PATH = "COCO-Detection/fcos....py"

is probably sufficient.

detectron2/modeling/meta_arch/fcos.py

kdexd · 2022-01-22T04:06:45Z

I also decided to add the internal method as FCOS._match_anchors which is only called inside FCOS.label_anchors. I am extending FCOS to do instance segmentation in my current project, and I need to override FCOS.label_anchors to return matched masks as well — it results in a lot of code duplication for computing match_quality_matrix. It is worth having a callable that returns a match_quality_matrix, analogous to pairwise_iou in RetinaNet / R-CNN. Let me know what you think.

Since this will end up in master, I have started a training job to verify numbers (job same as that in my main comment). I will update this thread when it converges!

kdexd · 2022-01-22T04:13:13Z

My tests are failing because of two reasons —

Linting tests are failing on files that are outside this PR.
Newly added FCOS tests are failing because they do not directly support reading lazy configs (if I understood the logs correctly)?

@ppwwyyxx: what is your suggestion? I am brute-forcing as many solutions as I can:

Add YAML config for FCOS
Add Lazy config handling in tests in this PR.
Run FCOS with forced empty boxes / LSJ augmentation and post behavior here; work on tests in separate PR.
..? 🤔

ppwwyyxx · 2022-01-22T04:30:36Z

Yeah please feel free to ignore linter 😢

This should let it support .py configs:

--- i/detectron2/utils/testing.py
+++ w/detectron2/utils/testing.py
@@ -4,6 +4,7 @@ import numpy as np
 import torch

 from detectron2 import model_zoo
+from detectron2.config import CfgNode, instantiate
 from detectron2.data import DatasetCatalog
 from detectron2.data.detection_utils import read_image
 from detectron2.modeling import build_model
@@ -21,9 +22,12 @@ def get_model_no_weights(config_path):
     Like model_zoo.get, but do not load any weights (even pretrained)
     """
     cfg = model_zoo.get_config(config_path)
-    if not torch.cuda.is_available():
-        cfg.MODEL.DEVICE = "cpu"
-    return build_model(cfg)
+    if isinstance(cfg, CfgNode):
+        if not torch.cuda.is_available():
+            cfg.MODEL.DEVICE = "cpu"
+        return build_model(cfg)
+    else:
+        return instantiate(cfg.model)

ppwwyyxx · 2022-01-24T18:25:58Z

Thanks, the tests have passed. (macos + pytorch1.9 failure will be fixed in #3909)

detectron2/modeling/meta_arch/fcos.py

ppwwyyxx · 2022-01-24T18:28:50Z

detectron2/modeling/meta_arch/fcos.py


-            matched_indices.append(matched_idx)
-        return matched_indices
+        # Get matches and their labels using match quality matrix.


this line of comment seems unrelated

Oops, an artifact of prior commit. Thanks for spotting!

kdexd · 2022-01-24T19:33:52Z

After the recent refactor (removing anchor matcher and re-adding FCOS._match_anchors, I am getting 39.52 AP using built-in config (COCO detection with R-50-FPN, 1x schedule).

kdexd · 2022-01-24T19:41:41Z

I rechecked everything to verify correctness and I found that I added an incorrect type hint in compute_ctrness_targets — anchors is List[Boxes] and not List[torch.Tensor].

ppwwyyxx · 2022-02-03T03:05:22Z

@zhanghang1989 could you merge this bug fix? Thanks!

facebook-github-bot · 2022-02-03T06:22:32Z

@zhanghang1989 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Summary: ### 🐛 One line bug info: [FCOS implementation in current master](https://github.com/facebookresearch/detectron2/blob/31ec19b3132a3ac609600802dd37b2b40a76b5c9/detectron2/modeling/meta_arch/fcos.py) is unable to handle empty instances. This bug went unnoticed because: (a) images with empty instances are usually filtered while loading COCO annotations, and (b) FCOS should not encounter empty instances with its default training hyper-parameters. However, if I switch to a more aggressive large-scale jitter (LSJ) cropping augmentation, the model may encounter an image crop without any boxes in it. This crashes training abruptly due to a size mismatch in the pairwise anchor matching matrix. ### Bug fix This PR lets FCOS handle empty instances by adding a dummy `[0, 0, 0, 0]` box labeled as background (ID = `num_classes`), similar to how it is handled in `RetinaNet` class. Training FCOS with LSJ augmentation does not crash anymore. **Additional refactor:** While I was working on a fix, I noticed some inconsistencies in variable representations and naming conventions. For example, the `pairwise_match` variable was a `(R: anchor points, M: GT boxes)` matrix — this is a transposed representation of what a `match_quality_matrix` represents in `RetinaNet` and `GeneralizedRCNN`, `(M: GT boxes, R: anchor points)`. I refactored the code to make it more uniform with D2 (11528ce) conventions and make FCOS logic flow similar to RetinaNet, given that RetinaNet was the primary baseline in FCOS paper. My changes include: - Refactoring `pairwise_match` to a `match_quality_matrix` with its representation consistent with the rest of meta architectures. Moreover, variables renaming like (`matched_boxes` –> `matched_gt_boxes`, `label` —> `gt_labels`, and `gt_index` —> `matched_indices`) make the code more consistent with the naming convention in rest of meta architectures. - Update: After ppwwyyxx , I added this explanation as a comment instead of refactoring code: ~Use a `Matcher` instead of simply doing `pairwise_match.max()`. Original code was replacing indices of unmatched anchors as `-1` and accessing GT labels/boxes by doing `gt_index.clip(0)`, which felt non-ideal.~ - Change the internal method `match_anchors` to compute and return per-instance `match_quality_matrix` (similar to using `pairwise_iou` in R-CNN). This modifies the old behavior of returning `matched_indices`. Since this method is used internally in `FCOS`, I renamed the method to `_match_anchors`. ### Any API changes? The call signature of `FCOS` is unchanged. `FCOS.match_anchors()` is now `FCOS._match_anchors()` with a different return value, but it was only used internally by `FCOS.label_anchors()`. ### Verification I verified my changes by one full training run of FCOS with ResNet-50-FPN on COCO detection, using the builtin config. The validation curves overlap very closely (orange: `master` branch, blue: with my changes). ![image](https://user-images.githubusercontent.com/10494087/147985981-4d0eceb4-2103-468c-9a98-f351441303ae.png) I should have fixed the training seed in both runs... so additionally, I manually checked the equality of `pairwise_match.tranpose(0, 1)` and my `match_quality_matrix`for first 20 iterations by fixing `train.seed = 0` in config. Everything matches exactly. Pull Request resolved: #3851 Reviewed By: wat3rBro Differential Revision: D33971235 Pulled By: zhanghang1989 fbshipit-source-id: 9eca18ef79c2942588cf12ead218a4f89bc8a297

facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 3, 2022

kdexd changed the title ~~Handle empty instances in FCOS.~~ Bug fix: Handle empty instances in FCOS. Jan 3, 2022

xiaohu2015 mentioned this pull request Jan 21, 2022

improve fcos to support images without box annotations. #3910

Closed

ppwwyyxx reviewed Jan 21, 2022

View reviewed changes

detectron2/modeling/meta_arch/fcos.py Outdated Show resolved Hide resolved

xiaohu2015 mentioned this pull request Jan 24, 2022

fix bug when the target is empty in FCOS pytorch/vision#5267

Merged

ppwwyyxx approved these changes Jan 24, 2022

View reviewed changes

detectron2/modeling/meta_arch/fcos.py Show resolved Hide resolved

ppwwyyxx reviewed Jan 24, 2022

View reviewed changes

Handle empty instances in FCOS and add test.

81293e4

Fix type hint in compute_ctrness_targets.

e5f4ac8

zhanghang1989 requested a review from wat3rBro February 3, 2022 06:22

facebook-github-bot closed this Feb 7, 2022

kdexd deleted the fcos-patch branch February 8, 2022 04:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug fix: Handle empty instances in FCOS. #3851

Bug fix: Handle empty instances in FCOS. #3851

kdexd commented Jan 3, 2022 •

edited

xiaohu2015 commented Jan 21, 2022

ppwwyyxx left a comment

kdexd commented Jan 22, 2022 •

edited

kdexd commented Jan 22, 2022

ppwwyyxx commented Jan 22, 2022

ppwwyyxx commented Jan 24, 2022

ppwwyyxx Jan 24, 2022

kdexd Jan 24, 2022

kdexd commented Jan 24, 2022

kdexd commented Jan 24, 2022

ppwwyyxx commented Feb 3, 2022

facebook-github-bot commented Feb 3, 2022

Bug fix: Handle empty instances in FCOS. #3851

Bug fix: Handle empty instances in FCOS. #3851

Conversation

kdexd commented Jan 3, 2022 • edited

🐛 One line bug info: FCOS implementation in current master is unable to handle empty instances.

Bug fix

Any API changes?

Verification

xiaohu2015 commented Jan 21, 2022

ppwwyyxx left a comment

Choose a reason for hiding this comment

kdexd commented Jan 22, 2022 • edited

kdexd commented Jan 22, 2022

ppwwyyxx commented Jan 22, 2022

ppwwyyxx commented Jan 24, 2022

ppwwyyxx Jan 24, 2022

Choose a reason for hiding this comment

kdexd Jan 24, 2022

Choose a reason for hiding this comment

kdexd commented Jan 24, 2022

kdexd commented Jan 24, 2022

ppwwyyxx commented Feb 3, 2022

facebook-github-bot commented Feb 3, 2022

kdexd commented Jan 3, 2022 •

edited

kdexd commented Jan 22, 2022 •

edited