what does samples.mask do? #37
Comments
When the batch size is >= 2, the frames in a batch may have different shapes. The dataset pads them to a common shape with zeros, so the mask is needed to indicate which pixels are padding.

I see, thanks a lot!
@linjing-ai Could you please explain in more detail why we need to filter out the padded area? What happens if we don't do this in the position encoding? Thank you very much.
@prismformore The resolution (HxW) of different frames in a batch may not be the same, so we need to pad the smaller ones with zeros to the size of the largest one. If all the frames have the same resolution, I think `mask = torch.zeros((B, H, W))` is okay.
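To make the padding step concrete, here is a minimal sketch of how a batch and its mask can be built from variable-sized frames. The helper name `pad_and_mask` is hypothetical; the repo's actual implementation is `nested_tensor_from_tensor_list` in `util/misc.py`, and this sketch only mirrors the idea (1/True on padded pixels, 0/False on valid ones):

```python
import torch

def pad_and_mask(frames):
    """Pad a list of [3, H, W] frames to a common [B, 3, H_max, W_max]
    batch, and build a [B, H_max, W_max] boolean mask that is True on
    padded pixels. Hypothetical sketch of the padding logic."""
    h_max = max(f.shape[1] for f in frames)
    w_max = max(f.shape[2] for f in frames)
    b = len(frames)
    tensor = torch.zeros(b, 3, h_max, w_max)
    mask = torch.ones(b, h_max, w_max, dtype=torch.bool)  # start all-padded
    for img, pad_img, m in zip(frames, tensor, mask):
        h, w = img.shape[1], img.shape[2]
        pad_img[:, :h, :w].copy_(img)   # copy the real image top-left
        m[:h, :w] = False               # valid pixels are not padding
    return tensor, mask

# Two frames of different sizes -> batch of [2, 3, 5, 6]
frames = [torch.rand(3, 4, 6), torch.rand(3, 5, 3)]
batch, mask = pad_and_mask(frames)
```

The mask then travels with the batch (as `samples.mask`) so downstream modules can tell real pixels from zero padding.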
@linjing-ai Sorry for not making my question clear. My real question is why we need to take the padding area into consideration (by filtering it out). I am confused because the padding area should not influence the result. Maybe because there would be a misalignment when resizing the feature map?
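One place the mask genuinely matters is the sine position encoding: coordinates are built from cumulative sums over *valid* pixels and normalized by the last valid row/column, so padded rows/columns must be excluded or the coordinates of the real content would shrink depending on how much padding the batch happens to have. Below is a sketch of just the coordinate step used by DETR-style `PositionEmbeddingSine` (the full module then applies sin/cos at multiple frequencies, which is omitted here):

```python
import math
import torch

def sine_coords(mask, eps=1e-6):
    """Normalized per-pixel (y, x) coordinates in [0, 2*pi], computed
    the way DETR-style sine position embeddings do: cumulative sums
    over valid (non-padded) pixels only. Sketch of the coordinate
    step; mask is True on padded pixels."""
    not_mask = ~mask                                   # True on valid pixels
    y_embed = not_mask.cumsum(1, dtype=torch.float32)  # row index among valid rows
    x_embed = not_mask.cumsum(2, dtype=torch.float32)  # col index among valid cols
    # Normalize so the last *valid* row/column maps to ~2*pi; padded
    # rows/cols contribute 0 to the cumsum and so don't stretch the range.
    y_embed = y_embed / (y_embed[:, -1:, :] + eps) * 2 * math.pi
    x_embed = x_embed / (x_embed[:, :, -1:] + eps) * 2 * math.pi
    return y_embed, x_embed
```

If the padded pixels were treated as valid (mask all zeros), the normalization denominator would include the padding, so the same image content would get different position encodings depending on the other frames in its batch.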
As described in `models/deformable_detr.py`:

```python
def forward(self, samples: NestedTensor):
    """The forward expects a NestedTensor, which consists of:
       - samples.tensor: batched images, of shape [batch_size x 3 x H x W]
       - samples.mask: a binary mask of shape [batch_size x H x W], containing 1 on padded pixels
    """
```

What does `samples.mask` do here?