Add SimpleCopyPaste augmentation #5825
Conversation
@lezwon Thanks for kicking this off. Just a quick note: I would recommend moving this transform to detection instead of segmentation. MaskRCNN is actually run using the detection pipeline (I know it's confusing).
Sure, will do that 👍
hey @vadimkantorov @datumbox, I've moved the augmentation to the detection module. It is a basic functioning POC right now. It would be really nice to get some early feedback from y'all :) I have attached some samples for your reference.
@datumbox are tests/docs necessary for this PR?
@lezwon Apologies for the delayed response. I was OOO and am trying to catch up. Let me see if I can find someone who could support you on the remaining bit. No need for docs/tests etc. at this point. Especially since it's in the references, the implementation won't appear in main TorchVision (for now). Let's focus on completing the implementation, verifying it works as expected, reviewing the API and training a model with it.
@lezwon I was checking the paper, the official TF implementation and an unofficial PyTorch one. In our case, let's proceed in the following way. Originally, the copy-paste transform works at the dataset level and mixes the current image with a paste image taken randomly from the dataset. In order to avoid the mess we could get into by dealing with datasets, we can start by copy/pasting data at the batch level. For the detection recipe, images and targets are a tuple of tensors and a tuple of dicts (key -> Tensor). The CopyPaste transform should work at that level instead of on a single image/target pair. Thus, we have to add it to the DataLoader when collating data:

copypaste = CopyPaste()


def copypaste_collate_fn(batch):
    return copypaste(*utils_collate_fn(batch))


data_loader = torch.utils.data.DataLoader(
    dataset, batch_sampler=train_batch_sampler, num_workers=args.workers, collate_fn=copypaste_collate_fn
)

Here is an unfinished implementation that we could update in order to get this PR merged:

from torch import nn


def copy_paste(image, target, paste_image, paste_target):
    # implement copy/paste logic
    return paste_image, paste_target


class CopyPaste(nn.Module):
    def __init__(self, inplace=True):
        super().__init__()
        self.inplace = inplace

    def forward(self, images, targets=None):
        assert targets is not None
        # assert images is a tuple of Tensors
        # assert targets is a tuple of dicts of key -> Tensor
        if not self.inplace:
            # clone images and targets
            pass

        # images = [t1, t2, ..., tN]
        # Let's define paste_images as a shifted list of the input images
        # (in TF they mix data at the dataset level):
        # paste_images = [t2, t3, ..., tN, t1]
        images = list(images)
        targets = list(targets)
        shift = 1
        le = len(images)
        out_images = []
        out_targets = []
        for i in range(le):
            image, target = images[i], targets[i]
            paste_image, paste_target = images[(i + shift) % le], targets[(i + shift) % le]
            image, target = copy_paste(image, target, paste_image, paste_target)
            out_images.append(image)
            out_targets.append(target)
        return tuple(out_images), tuple(out_targets)

Let me know what you think.
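For readers following along, here is a minimal sketch of what the copy_paste stub above could do for a single image/paste pair. It assumes the targets follow the torchvision detection-reference convention with "masks", "boxes" and "labels" tensors and that the two images already share the same size; the helper name copy_paste_pair and the 50% instance selection are illustrative, the "area"/"iscrowd" keys and the Gaussian blending from the paper are omitted, and this is not the code that was merged.

import torch
from torchvision.ops import masks_to_boxes


def copy_paste_pair(image, target, paste_image, paste_target):
    # Randomly select a subset of instances from the paste sample
    # (illustrative 50% selection rate).
    num_paste = paste_target["masks"].shape[0]
    keep = torch.rand(num_paste) < 0.5
    if not keep.any():
        return image, target

    paste_masks = paste_target["masks"][keep].bool()    # (K, H, W)
    alpha = paste_masks.any(dim=0)                       # union of pasted instances, (H, W)

    # Composite the pasted pixels on top of the destination image.
    out_image = torch.where(alpha.unsqueeze(0), paste_image, image)

    # Occlude destination masks where pasted objects now cover them,
    # and drop instances that became fully occluded.
    masks = target["masks"].bool() & ~alpha
    visible = masks.flatten(1).sum(dim=1) > 0
    masks = torch.cat([masks[visible], paste_masks])
    boxes = masks_to_boxes(masks)                        # recompute boxes from the updated masks
    labels = torch.cat([target["labels"][visible], paste_target["labels"][keep]])

    return out_image, {"masks": masks, "boxes": boxes, "labels": labels}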
Hi @vfdev-5, I have a few doubts and questions:
My current implementation updates the targets individually first and then updates the entire image batch, as I felt it would be more efficient. Please do let me know if this is incorrect and needs improvement.
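As a purely illustrative sketch of that ordering (the helper name and shapes below are assumptions, not code from this PR): the per-instance bookkeeping can be done one target dict at a time, while the pixel compositing is deferred to a single vectorized operation over the stacked batch, which only works if all images in the batch share the same size.

import torch


def composite_batch(images, paste_images, paste_alphas):
    # images / paste_images: sequences of (C, H, W) tensors, all the same shape;
    # paste_alphas: sequence of (H, W) boolean masks marking the pasted regions.
    batch = torch.stack(list(images))                      # (N, C, H, W)
    paste = torch.stack(list(paste_images))
    alphas = torch.stack(list(paste_alphas)).unsqueeze(1)  # (N, 1, H, W)
    out = torch.where(alphas, paste, batch)                # one fused composite for the batch
    return list(out.unbind(0))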
Thanks @lezwon! Meanwhile, I changed the previous cropping strategy to resizing the data to paste when the sizes are different. This way the input image keeps the same size after pasting data into it. Previously, a common minimal size was found, which could cause a few problems with cropped targets, and the input image could also change its size.
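A rough, hypothetical illustration of that resize-instead-of-crop strategy (the function name is made up and the "masks"/"boxes" keys are assumed from the detection references; the "area" update is omitted): the paste image and its annotations are rescaled to the destination size before pasting, so the destination image never changes shape.

import torch
import torchvision.transforms.functional as F


def resize_paste_sample(paste_image, paste_target, size):
    # size = (H, W) of the destination image; resizing the paste data rather
    # than cropping to a common size keeps the destination image shape fixed.
    _, src_h, src_w = paste_image.shape
    dst_h, dst_w = size

    image = F.resize(paste_image, list(size))
    target = dict(paste_target)

    if "masks" in target and target["masks"].numel() > 0:
        # nearest-neighbour interpolation keeps the masks binary
        target["masks"] = F.resize(
            target["masks"], list(size), interpolation=F.InterpolationMode.NEAREST
        )
    if "boxes" in target:
        ratio_w, ratio_h = dst_w / src_w, dst_h / src_h
        target["boxes"] = target["boxes"] * torch.tensor([ratio_w, ratio_h, ratio_w, ratio_h])
    return image, target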
Right now, I encountered an issue in the CopyPaste code with some batches during training:
Something is not aligned, and it is rather complicated to reproduce due to randomness. Let's see if we can make it a bit more bulletproof.
Ah yes, I faced this issue. The sizes of iscrowd and area do not match those of masks and boxes for some images. Not sure about the reason for it in the COCO dataset.
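One hedged way to guard against this (illustrative helpers, not the fix that eventually landed, which the commit summary below only describes as a check on the iscrowd target): validate that all per-instance fields have matching lengths and skip the paste for misaligned samples.

def targets_are_aligned(target):
    # Some COCO images have iscrowd/area entries whose length does not match
    # boxes/masks; indexing them together then fails inside the transform.
    n = target["boxes"].shape[0]
    return all(
        target[key].shape[0] == n
        for key in ("labels", "masks", "area", "iscrowd")
        if key in target
    )


def safe_copy_paste(image, target, paste_image, paste_target, copy_paste_fn):
    # Fall back to the untouched sample whenever the annotations are misaligned.
    if not (targets_are_aligned(target) and targets_are_aligned(paste_target)):
        return image, target
    return copy_paste_fn(image, target, paste_image, paste_target)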
@vfdev-5 looks good. Just minor comments, let me know your thoughts.
Thanks a lot! Last set of nits, promise.
BTW the failing tests are unrelated and they are tracked separately on a different issue. Merging.
Summary:
* added simple POC
* added jitter and crop options
* added references
* moved simplecopypaste to detection module
* working POC for simple copy paste in detection
* added comments
* remove transforms from class, updated the labels, added gaussian blur
* removed loop for mask calculation
* replaced Gaussian blur with functional api
* added inplace operations
* added changes to accept tuples instead of tensors
* make copy paste functional; make only one copy of batch and target
* add inplace support within copy paste functional
* Updated code for copy-paste transform
* Fixed code formatting
* [skip ci] removed manual thresholding
* Replaced cropping by resizing data to paste
* Removed inplace arg (as useless) and put a check on iscrowd target
* code-formatting
* Updated copypaste op to make it torch scriptable; Added fallbacks to support LSJ
* Fixed flake8
* Updates according to the review

Differential Revision: D37212651
fbshipit-source-id: 467b670164150dd5cc424f4d616d436295ce818d
Co-authored-by: vfdev-5 <vfdev.5@gmail.com>
Co-authored-by: Vasilis Vryniotis <datumbox@users.noreply.github.com>
Due to the way the technique combines images within the batches, the mini-batch size is expected to affect the performance. I was expecting that a naive config of setting mini-batch-size=2 would have very detrimental effects, but actually it doesn't hurt much. I'm currently running larger batch sizes, where the method makes more sense, and I'll update the results above as I get them. I will also try the recipe described in the paper. Below I summarize the training results of using the specific augmentation.
@lezwon @vfdev-5 I've completed the training of the *RCNN models with SimpleCopyPaste (see the updated previous comment). There are some extremely small improvements, which could be the result of running the training for much longer. Given this, I don't think it's worth releasing updated weights for them. It might be worth rechecking once this implementation makes it to the new Transforms API, but for now I think we can conclude the work.
This PR is related to #3817. It implements the SimpleCopyPaste augmentation for segmentation tasks.
https://arxiv.org/abs/2012.07177