Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fix: Handle empty instances in FCOS. #3851

Closed
wants to merge 2 commits into from
Closed

Bug fix: Handle empty instances in FCOS. #3851

wants to merge 2 commits into from

Conversation

kdexd
Copy link
Contributor

@kdexd kdexd commented Jan 3, 2022

πŸ› One line bug info: FCOS implementation in current master is unable to handle empty instances.

This bug went unnoticed because: (a) images with empty instances are usually filtered while loading COCO annotations, and (b) FCOS should not encounter empty instances with its default training hyper-parameters. However, if I switch to a more aggressive large-scale jitter (LSJ) cropping augmentation, the model may encounter an image crop without any boxes in it. This crashes training abruptly due to a size mismatch in the pairwise anchor matching matrix.

Bug fix

This PR lets FCOS handle empty instances by adding a dummy [0, 0, 0, 0] box labeled as background (ID = num_classes), similar to how it is handled in RetinaNet class. Training FCOS with LSJ augmentation does not crash anymore.

Additional refactor: While I was working on a fix, I noticed some inconsistencies in variable representations and naming conventions. For example, the pairwise_match variable was a (R: anchor points, M: GT boxes) matrix β€” this is a transposed representation of what a match_quality_matrix represents in RetinaNet and GeneralizedRCNN, (M: GT boxes, R: anchor points). I refactored the code to make it more uniform with D2 conventions and make FCOS logic flow similar to RetinaNet, given that RetinaNet was the primary baseline in FCOS paper. My changes include:

  • Refactoring pairwise_match to a match_quality_matrix with its representation consistent with the rest of meta architectures. Moreover, variables renaming like (matched_boxes –> matched_gt_boxes, label β€”> gt_labels, and gt_index β€”> matched_indices) make the code more consistent with the naming convention in rest of meta architectures.
  • Update: After @ppwwyyxx , I added this explanation as a comment instead of refactoring code: Use a Matcher instead of simply doing pairwise_match.max(). Original code was replacing indices of unmatched anchors as -1 and accessing GT labels/boxes by doing gt_index.clip(0), which felt non-ideal.
  • Change the internal method match_anchors to compute and return per-instance match_quality_matrix (similar to using pairwise_iou in R-CNN). This modifies the old behavior of returning matched_indices. Since this method is used internally in FCOS, I renamed the method to _match_anchors.

Any API changes?

The call signature of FCOS is unchanged. FCOS.match_anchors() is now FCOS._match_anchors() with a different return value, but it was only used internally by FCOS.label_anchors().

Verification

I verified my changes by one full training run of FCOS with ResNet-50-FPN on COCO detection, using the builtin config. The validation curves overlap very closely (orange: master branch, blue: with my changes).

image

I should have fixed the training seed in both runs... so additionally, I manually checked the equality of pairwise_match.tranpose(0, 1) and my match_quality_matrixfor first 20 iterations by fixing train.seed = 0 in config. Everything matches exactly.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 3, 2022
@kdexd kdexd changed the title Handle empty instances in FCOS. Bug fix: Handle empty instances in FCOS. Jan 3, 2022
@xiaohu2015
Copy link
Contributor

@kdexd hi,I also fix this bug #3910, but different with you.

Copy link
Contributor

@ppwwyyxx ppwwyyxx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the bug and also bringing the style of FCOS closer to RetinaNet!

There are some unittests (test_empty_data) in test_model_e2e.py that check if model supports empty instances. Could you add FCOS there? Just

class FCOSE2ETest(InstanceModelE2ETest, unittest.TestCase):
    CONFIG_PATH = "COCO-Detection/fcos....py"

is probably sufficient.

detectron2/modeling/meta_arch/fcos.py Outdated Show resolved Hide resolved
@kdexd
Copy link
Contributor Author

kdexd commented Jan 22, 2022

I also decided to add the internal method as FCOS._match_anchors which is only called inside FCOS.label_anchors. I am extending FCOS to do instance segmentation in my current project, and I need to override FCOS.label_anchors to return matched masks as well β€” it results in a lot of code duplication for computing match_quality_matrix. It is worth having a callable that returns a match_quality_matrix, analogous to pairwise_iou in RetinaNet / R-CNN. Let me know what you think.

Since this will end up in master, I have started a training job to verify numbers (job same as that in my main comment). I will update this thread when it converges!

@kdexd
Copy link
Contributor Author

kdexd commented Jan 22, 2022

My tests are failing because of two reasons β€”

  1. Linting tests are failing on files that are outside this PR.
  2. Newly added FCOS tests are failing because they do not directly support reading lazy configs (if I understood the logs correctly)?

@ppwwyyxx: what is your suggestion? I am brute-forcing as many solutions as I can:

  • Add YAML config for FCOS
  • Add Lazy config handling in tests in this PR.
  • Run FCOS with forced empty boxes / LSJ augmentation and post behavior here; work on tests in separate PR.
  • ..? πŸ€”

@ppwwyyxx
Copy link
Contributor

Yeah please feel free to ignore linter 😒

This should let it support .py configs:

--- i/detectron2/utils/testing.py
+++ w/detectron2/utils/testing.py
@@ -4,6 +4,7 @@ import numpy as np
 import torch

 from detectron2 import model_zoo
+from detectron2.config import CfgNode, instantiate
 from detectron2.data import DatasetCatalog
 from detectron2.data.detection_utils import read_image
 from detectron2.modeling import build_model
@@ -21,9 +22,12 @@ def get_model_no_weights(config_path):
     Like model_zoo.get, but do not load any weights (even pretrained)
     """
     cfg = model_zoo.get_config(config_path)
-    if not torch.cuda.is_available():
-        cfg.MODEL.DEVICE = "cpu"
-    return build_model(cfg)
+    if isinstance(cfg, CfgNode):
+        if not torch.cuda.is_available():
+            cfg.MODEL.DEVICE = "cpu"
+        return build_model(cfg)
+    else:
+        return instantiate(cfg.model)

@ppwwyyxx
Copy link
Contributor

Thanks, the tests have passed. (macos + pytorch1.9 failure will be fixed in #3909)


matched_indices.append(matched_idx)
return matched_indices
# Get matches and their labels using match quality matrix.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line of comment seems unrelated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, an artifact of prior commit. Thanks for spotting!

@kdexd
Copy link
Contributor Author

kdexd commented Jan 24, 2022

After the recent refactor (removing anchor matcher and re-adding FCOS._match_anchors, I am getting 39.52 AP using built-in config (COCO detection with R-50-FPN, 1x schedule).

image

@kdexd
Copy link
Contributor Author

kdexd commented Jan 24, 2022

I rechecked everything to verify correctness and I found that I added an incorrect type hint in compute_ctrness_targets β€” anchors is List[Boxes] and not List[torch.Tensor].

@ppwwyyxx
Copy link
Contributor

ppwwyyxx commented Feb 3, 2022

@zhanghang1989 could you merge this bug fix? Thanks!

@facebook-github-bot
Copy link
Contributor

@zhanghang1989 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Feb 7, 2022
Summary:
### πŸ› One line bug info: [FCOS implementation in current master](https://github.com/facebookresearch/detectron2/blob/31ec19b3132a3ac609600802dd37b2b40a76b5c9/detectron2/modeling/meta_arch/fcos.py) is unable to handle empty instances.

This bug went unnoticed because: (a) images with empty instances are usually filtered while loading COCO annotations, and (b) FCOS should not encounter empty instances with its default training hyper-parameters. However, if I switch to a more aggressive large-scale jitter (LSJ) cropping augmentation, the model may encounter an image crop without any boxes in it. This crashes training abruptly due to a size mismatch in the pairwise anchor matching matrix.

### Bug fix

This PR lets FCOS handle empty instances by adding a dummy `[0, 0, 0, 0]` box labeled as background (ID = `num_classes`), similar to how it is handled in `RetinaNet` class. Training FCOS with LSJ augmentation does not crash anymore.

**Additional refactor:** While I was working on a fix, I noticed some inconsistencies in variable representations and naming conventions. For example, the `pairwise_match` variable was a `(R: anchor points, M: GT boxes)` matrix β€” this is a transposed representation of what a `match_quality_matrix` represents in `RetinaNet` and `GeneralizedRCNN`, `(M: GT boxes, R: anchor points)`. I refactored the code to make it more uniform with D2 (11528ce) conventions and make FCOS logic flow similar to RetinaNet, given that RetinaNet was the primary baseline in FCOS paper. My changes include:
  - Refactoring `pairwise_match` to a `match_quality_matrix` with its representation consistent with the rest of meta architectures. Moreover, variables renaming like (`matched_boxes` –> `matched_gt_boxes`, `label` β€”> `gt_labels`, and `gt_index` β€”> `matched_indices`) make the code more consistent with the naming convention in rest of meta architectures.
  - Update: After ppwwyyxx , I added this explanation as a comment instead of refactoring code: ~Use a `Matcher` instead of simply doing `pairwise_match.max()`. Original code was replacing indices of unmatched anchors as `-1` and accessing GT labels/boxes by doing `gt_index.clip(0)`, which felt non-ideal.~
  - Change the internal method `match_anchors` to compute and return per-instance `match_quality_matrix` (similar to using `pairwise_iou` in R-CNN). This modifies the old behavior of returning `matched_indices`. Since this method is used internally in `FCOS`, I renamed the method to `_match_anchors`.

### Any API changes?

The call signature of `FCOS` is unchanged. `FCOS.match_anchors()` is now `FCOS._match_anchors()` with a different return value, but it was only used internally by `FCOS.label_anchors()`.

### Verification

I verified my changes by one full training run of FCOS with ResNet-50-FPN on COCO detection, using the builtin config. The validation curves overlap very closely (orange: `master` branch, blue: with my changes).

![image](https://user-images.githubusercontent.com/10494087/147985981-4d0eceb4-2103-468c-9a98-f351441303ae.png)

I should have fixed the training seed in both runs... so additionally, I manually checked the equality of `pairwise_match.tranpose(0, 1)` and my `match_quality_matrix`for first 20 iterations by fixing `train.seed = 0` in config. Everything matches exactly.

Pull Request resolved: #3851

Reviewed By: wat3rBro

Differential Revision: D33971235

Pulled By: zhanghang1989

fbshipit-source-id: 9eca18ef79c2942588cf12ead218a4f89bc8a297
@kdexd kdexd deleted the fcos-patch branch February 8, 2022 04:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants