
what does samples.mask do? #37

Closed
YJHMITWEB opened this issue Jan 11, 2021 · 5 comments

Comments

@YJHMITWEB

```python
def forward(self, samples: NestedTensor):
    """The forward expects a NestedTensor, which consists of:
       - samples.tensor: batched images, of shape [batch_size x 3 x H x W]
       - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels

       It returns a dict with the following elements:
       - "pred_logits": the classification logits (including no-object) for all queries.
                        Shape = [batch_size x num_queries x (num_classes + 1)]
       - "pred_boxes": the normalized box coordinates for all queries, represented as
                       (center_x, center_y, height, width). These values are normalized in [0, 1],
                       relative to the size of each individual image (disregarding possible padding).
                       See PostProcess for information on how to retrieve the unnormalized bounding box.
       - "aux_outputs": optional, only returned when auxiliary losses are activated. It is a list of
                        dictionaries containing the two above keys for each decoder layer.
    """
```

As described in models/deformable_detr.py, what does `samples.mask` do here?

@linjing7

When the batch size is >= 2, the shapes of different frames in a batch may not be the same, so the dataset pads them to a common shape with zeros. The mask is then needed to tell which pixels are padded.
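
To make this concrete, here is a minimal sketch of what the collate step effectively does (the real implementation is `nested_tensor_from_tensor_list` in `util/misc.py`; the helper name and shapes below are just illustrative):

```python
import torch

def pad_to_batch(images):
    """Illustrative sketch: batch variable-size images by zero-padding
    to the largest H and W, recording padded pixels in a mask.
    (Compare util/misc.py::nested_tensor_from_tensor_list.)"""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    b, c = len(images), images[0].shape[0]

    batched = torch.zeros(b, c, max_h, max_w)             # padded pixels stay zero
    mask = torch.ones(b, max_h, max_w, dtype=torch.bool)  # True (1) = padded pixel
    for i, img in enumerate(images):
        _, h, w = img.shape
        batched[i, :, :h, :w] = img  # copy the real image into the top-left corner
        mask[i, :h, :w] = False      # mark the real pixels as not padded
    return batched, mask

# Two images of different sizes end up in one [2, 3, 600, 800] tensor;
# mask[i] is True exactly on the pixels that were filled in by padding.
imgs = [torch.rand(3, 600, 800), torch.rand(3, 480, 640)]
tensors, mask = pad_to_batch(imgs)
```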

@YJHMITWEB
Author

> When the batch size is >= 2, the shapes of different frames in a batch may not be the same, so the dataset pads them to a common shape with zeros. The mask is then needed to tell which pixels are padded.

I see. Thanks a lot!

@prismformore

prismformore commented Feb 2, 2021

@linjing-ai Could you please explain in more detail why we need to filter out the padded area? What happens if we don't do this in the position encoding? Thank you very much.

@linjing7

linjing7 commented Feb 5, 2021

@prismformore The resolution (H×W) of different frames in a batch may not be the same, so we need to pad the smaller ones with zeros to the same size as the largest one. If all the frames have the same resolution, I think `mask = torch.zeros((B, H, W))` is okay.
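
For what it's worth, here is roughly how the mask feeds into the sine positional encoding (compare `PositionEmbeddingSine` in `models/position_encoding.py`; the toy shapes below are made up):

```python
import torch

# Rough sketch of how the mask enters the sine positional encoding.
mask = torch.zeros(2, 4, 6, dtype=torch.bool)  # all-real pixels, as suggested above
mask[1, :, 4:] = True                          # pretend image 1 is narrower: last 2 columns are padding

not_mask = ~mask
# Row/column coordinates are accumulated over real pixels only, so the
# coordinate grid stops growing once it reaches the padded region.
y_embed = not_mask.cumsum(1, dtype=torch.float32)  # [B, H, W] row indices
x_embed = not_mask.cumsum(2, dtype=torch.float32)  # [B, H, W] column indices
```

With an all-zeros mask, `x_embed` and `y_embed` are just the plain coordinate grid, which is why the mask is only strictly needed when resolutions differ.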

@prismformore

prismformore commented Feb 8, 2021

@linjing-ai Sorry for not making my question clear. My real question is: why do we need to take the padded area into consideration (by filtering it out)? I am confused because the padded area should not influence the result. Maybe because there will be misalignment when resizing the feature map?
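
For context on that last point: the image-level mask is resized to each feature level by interpolation, which is exactly where such boundary misalignment could creep in. A rough sketch, with made-up shapes (compare the backbone forward in `models/backbone.py`):

```python
import torch
import torch.nn.functional as F

# Sketch of how the backbone carries the image-level mask down to
# feature-map resolution (compare models/backbone.py).
img_mask = torch.zeros(2, 600, 800, dtype=torch.bool)
img_mask[1, :, 640:] = True  # image 1 is padded on the right

feat = torch.rand(2, 256, 38, 50)  # e.g. a stride-16 feature map
feat_mask = F.interpolate(img_mask[None].float(),
                          size=feat.shape[-2:]).to(torch.bool)[0]
# feat_mask now marks (approximately) which feature positions originate
# from padding, so attention and the position encoding can ignore them.
```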
