Bug fix: Handle empty instances in FCOS. #3851
Thanks for fixing the bug and also bringing the style of FCOS closer to RetinaNet!
There are some unit tests (`test_empty_data`) in `test_model_e2e.py` that check whether a model supports empty instances. Could you add FCOS there? Just

    class FCOSE2ETest(InstanceModelE2ETest, unittest.TestCase):
        CONFIG_PATH = "COCO-Detection/fcos....py"

is probably sufficient.
I also decided to add the internal method as Since this will end up in
My tests are failing because of two reasons:
@ppwwyyxx: what is your suggestion? I am brute-forcing as many solutions as I can:
Yeah, please feel free to ignore the linter. This should let it support .py configs:
Thanks, the tests have passed. (macos + pytorch1.9 failure will be fixed in #3909)
    matched_indices.append(matched_idx)
    return matched_indices
    # Get matches and their labels using match quality matrix.
this line of comment seems unrelated
Oops, an artifact of prior commit. Thanks for spotting!
I rechecked everything to verify correctness and I found that I added an incorrect type hint in
@zhanghang1989 could you merge this bug fix? Thanks!
@zhanghang1989 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Summary:

### One line bug info

[FCOS implementation in current master](https://github.com/facebookresearch/detectron2/blob/31ec19b3132a3ac609600802dd37b2b40a76b5c9/detectron2/modeling/meta_arch/fcos.py) is unable to handle empty instances.

This bug went unnoticed because: (a) images with empty instances are usually filtered while loading COCO annotations, and (b) FCOS should not encounter empty instances with its default training hyper-parameters. However, if I switch to a more aggressive large-scale jitter (LSJ) cropping augmentation, the model may encounter an image crop without any boxes in it. This crashes training abruptly due to a size mismatch in the pairwise anchor matching matrix.

### Bug fix

This PR lets FCOS handle empty instances by adding a dummy `[0, 0, 0, 0]` box labeled as background (ID = `num_classes`), similar to how the `RetinaNet` class handles it. Training FCOS with LSJ augmentation no longer crashes.

**Additional refactor:** While I was working on a fix, I noticed some inconsistencies in variable representations and naming conventions. For example, the `pairwise_match` variable was a `(R: anchor points, M: GT boxes)` matrix, which is the transpose of what a `match_quality_matrix` represents in `RetinaNet` and `GeneralizedRCNN`: `(M: GT boxes, R: anchor points)`. I refactored the code to make it more uniform with D2 conventions and to make the FCOS logic flow similar to RetinaNet, given that RetinaNet was the primary baseline in the FCOS paper. My changes include:

- Refactoring `pairwise_match` to a `match_quality_matrix` whose representation is consistent with the rest of the meta architectures. Moreover, variable renames (`matched_boxes` -> `matched_gt_boxes`, `label` -> `gt_labels`, and `gt_index` -> `matched_indices`) make the code more consistent with the naming convention in the rest of the meta architectures.
- Update: After ppwwyyxx's feedback, I added this explanation as a comment instead of refactoring code: ~Use a `Matcher` instead of simply doing `pairwise_match.max()`. Original code was replacing indices of unmatched anchors as `-1` and accessing GT labels/boxes by doing `gt_index.clip(0)`, which felt non-ideal.~
- Change the internal method `match_anchors` to compute and return a per-instance `match_quality_matrix` (similar to using `pairwise_iou` in R-CNN). This modifies the old behavior of returning `matched_indices`. Since this method is used internally in `FCOS`, I renamed the method to `_match_anchors`.

### Any API changes?

The call signature of `FCOS` is unchanged. `FCOS.match_anchors()` is now `FCOS._match_anchors()` with a different return value, but it was only used internally by `FCOS.label_anchors()`.

### Verification

I verified my changes by one full training run of FCOS with ResNet-50-FPN on COCO detection, using the builtin config. The validation curves overlap very closely (orange: `master` branch, blue: with my changes).

![image](https://user-images.githubusercontent.com/10494087/147985981-4d0eceb4-2103-468c-9a98-f351441303ae.png)

I should have fixed the training seed in both runs... so additionally, I manually checked the equality of `pairwise_match.transpose(0, 1)` and my `match_quality_matrix` for the first 20 iterations by fixing `train.seed = 0` in config. Everything matches exactly.

Pull Request resolved: #3851
Reviewed By: wat3rBro
Differential Revision: D33971235
Pulled By: zhanghang1989
fbshipit-source-id: 9eca18ef79c2942588cf12ead218a4f89bc8a297
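
The empty-instance handling described in the summary can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the detectron2 code: `label_points`, `match_quality_matrix` as a list of lists, and the simple "point lies inside the box" rule are hypothetical stand-ins (the real matching also involves per-level scale ranges). Only the dummy `[0, 0, 0, 0]` box, the background ID `num_classes`, and the `(M: GT boxes, R: anchor points)` layout are taken from the description; picking the smallest-area box for ambiguous points follows the rule stated in the FCOS paper.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Point = Tuple[float, float]              # (x, y) anchor point

DUMMY_BOX: Box = (0.0, 0.0, 0.0, 0.0)


def match_quality_matrix(gt_boxes: Sequence[Box], points: Sequence[Point]) -> List[List[bool]]:
    """Return an (M, R) matrix: entry [m][r] is True if point r lies inside GT box m.

    Hypothetical, simplified matching rule used only for this sketch.
    """
    return [
        [x1 < px < x2 and y1 < py < y2 for (px, py) in points]
        for (x1, y1, x2, y2) in gt_boxes
    ]


def box_area(box: Box) -> float:
    return (box[2] - box[0]) * (box[3] - box[1])


def label_points(gt_boxes, gt_classes, points, num_classes):
    """Assign a class label and a GT box to each of the R anchor points."""
    if len(gt_boxes) == 0:
        # Empty instances: every point is background with a dummy box,
        # so downstream code never indexes into an empty (0, R) matrix.
        return [num_classes] * len(points), [DUMMY_BOX] * len(points)

    quality = match_quality_matrix(gt_boxes, points)  # shape (M, R)
    gt_labels, matched_gt_boxes = [], []
    for r in range(len(points)):
        candidates = [m for m in range(len(gt_boxes)) if quality[m][r]]
        if not candidates:
            # Unmatched point: background label and dummy box.
            gt_labels.append(num_classes)
            matched_gt_boxes.append(DUMMY_BOX)
            continue
        # Ambiguous point: keep the smallest-area box.
        best = min(candidates, key=lambda m: box_area(gt_boxes[m]))
        gt_labels.append(gt_classes[best])
        matched_gt_boxes.append(gt_boxes[best])
    return gt_labels, matched_gt_boxes


# An image crop without any boxes no longer breaks labeling.
print(label_points([], [], [(1.0, 1.0), (2.0, 2.0)], num_classes=80))
```

With an empty box list, both returned lists still have one entry per anchor point, which is the property that keeps the loss computation shape-consistent.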