
what does samples.mask do? #37

Closed
YJHMITWEB opened this issue Jan 11, 2021 · 5 comments

Comments

@YJHMITWEB

```python
def forward(self, samples: NestedTensor):
    """The forward expects a NestedTensor, which consists of:
       - samples.tensor: batched images, of shape [batch_size x 3 x H x W]
       - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels

       It returns a dict with the following elements:
       - "pred_logits": the classification logits (including no-object) for all queries.
                        Shape = [batch_size x num_queries x (num_classes + 1)]
       - "pred_boxes": the normalized box coordinates for all queries, represented as
                       (center_x, center_y, height, width). These values are normalized in [0, 1],
                       relative to the size of each individual image (disregarding possible padding).
                       See PostProcess for information on how to retrieve the unnormalized bounding box.
       - "aux_outputs": optional, only returned when auxiliary losses are activated. It is a list of
                        dictionaries containing the two above keys for each decoder layer.
    """
```

As described in models/deformable_detr.py, what does `samples.mask` do here?

@linjing7

When the batch size is >= 2, the shapes of different frames in a batch may not be the same, so the dataset pads them to a common shape with zeros. The mask is then needed to tell which pixels are padded.
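
To make this concrete, here is a minimal sketch of what the collate step effectively does (the real implementation is `nested_tensor_from_tensor_list` in `util/misc.py`; the helper name and shapes below are just illustrative):

```python
import torch

def pad_to_batch(images):
    """Illustrative sketch: batch variable-size images by zero-padding
    to the largest H and W, recording padded pixels in a mask.
    (Compare util/misc.py::nested_tensor_from_tensor_list.)"""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    b, c = len(images), images[0].shape[0]

    batched = torch.zeros(b, c, max_h, max_w)             # padded pixels stay zero
    mask = torch.ones(b, max_h, max_w, dtype=torch.bool)  # True (1) = padded pixel
    for i, img in enumerate(images):
        _, h, w = img.shape
        batched[i, :, :h, :w] = img  # copy the real image into the top-left corner
        mask[i, :h, :w] = False      # mark the real pixels as not padded
    return batched, mask

# Two images of different sizes end up in one [2, 3, 600, 800] tensor;
# mask[i] is True exactly on the pixels that were filled in by padding.
imgs = [torch.rand(3, 600, 800), torch.rand(3, 480, 640)]
tensors, mask = pad_to_batch(imgs)
```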

@YJHMITWEB
Author

> When the batch size is >= 2, the shapes of different frames in a batch may not be the same, so the dataset pads them to a common shape with zeros. The mask is then needed to tell which pixels are padded.

I see. Thanks a lot!

@prismformore

prismformore commented Feb 2, 2021

@linjing-ai Could you please explain in more detail why we need to filter out the padded area? What happens if we don't do this in the position encoding? Thank you very much.

@linjing7

linjing7 commented Feb 5, 2021

@prismformore The resolution (H×W) of different frames in a batch may not be the same, so we need to pad the smaller ones with zeros to the same size as the largest one. If all the frames have the same resolution, I think `mask = torch.zeros((B, H, W))` is okay.
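
For what it's worth, here is roughly how the mask feeds into the sine positional encoding (compare `PositionEmbeddingSine` in `models/position_encoding.py`; the toy shapes below are made up):

```python
import torch

# Rough sketch of how the mask enters the sine positional encoding.
mask = torch.zeros(2, 4, 6, dtype=torch.bool)  # all-real pixels, as suggested above
mask[1, :, 4:] = True                          # pretend image 1 is narrower: last 2 columns are padding

not_mask = ~mask
# Row/column coordinates are accumulated over real pixels only, so the
# coordinate grid stops growing once it reaches the padded region.
y_embed = not_mask.cumsum(1, dtype=torch.float32)  # [B, H, W] row indices
x_embed = not_mask.cumsum(2, dtype=torch.float32)  # [B, H, W] column indices
```

With an all-zeros mask, `x_embed` and `y_embed` are just the plain coordinate grid, which is why the mask is only strictly needed when resolutions differ.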

@prismformore

prismformore commented Feb 8, 2021

@linjing-ai Sorry for not making my question clear. My real question is: why do we need to take the padded area into consideration (by filtering it out)? I am confused because the padded area should not influence the result. Maybe because there will be misalignment when resizing the feature map?
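
For context on that last point: the image-level mask is resized to each feature level by interpolation, which is exactly where such boundary misalignment could creep in. A rough sketch, with made-up shapes (compare the backbone forward in `models/backbone.py`):

```python
import torch
import torch.nn.functional as F

# Sketch of how the backbone carries the image-level mask down to
# feature-map resolution (compare models/backbone.py).
img_mask = torch.zeros(2, 600, 800, dtype=torch.bool)
img_mask[1, :, 640:] = True  # image 1 is padded on the right

feat = torch.rand(2, 256, 38, 50)  # e.g. a stride-16 feature map
feat_mask = F.interpolate(img_mask[None].float(),
                          size=feat.shape[-2:]).to(torch.bool)[0]
# feat_mask now marks (approximately) which feature positions originate
# from padding, so attention and the position encoding can ignore them.
```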
