Question about the method of handling the multi-patch inputs #3

Closed
QiushiYang opened this issue Jan 8, 2021 · 3 comments

Comments

@QiushiYang

After reading your paper, I am confused about how the encoder handles the multi-patch (256) inputs. It seems that the encoder fuses the 256 patches and learns a single feature map of size (H/16, W/16, D) for the whole original image (rather than patch-wise features), and then decodes this feature map to generate the segmentation map. How are the 256 patches processed and fused in the encoder?

@lzrobots
Contributor

lzrobots commented Jan 8, 2021

The size of each patch is 16x16. If the size of the input image is HxW, then the sequence length is (H/16)*(W/16) = HW/256, not 256.

The size of the encoder's output feature is (HW/256, 1024), where HW/256 is the sequence length and 1024 is the embedding dimension. We then reshape it into a feature map of size (H/16, W/16, 1024) and pass it to the decoder. Please refer to Figure 1 in the main paper for more detail.
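
For readers who want to check the shapes, here is a minimal NumPy sketch of the sequence-length arithmetic and the reshape described above. It is not taken from the repository: the function name `tokens_to_feature_map`, the 480x480 input size, and the row-major ordering of patches are assumptions made for illustration.

```python
import numpy as np

def tokens_to_feature_map(tokens, height, width, patch=16):
    """Reshape encoder output (seq_len, dim) into (H/patch, W/patch, dim).

    Assumes the tokens are ordered row by row over the patch grid
    (an assumption of this sketch, not confirmed by the thread).
    """
    grid_h, grid_w = height // patch, width // patch
    seq_len, dim = tokens.shape
    if seq_len != grid_h * grid_w:
        raise ValueError(
            f"expected {grid_h * grid_w} tokens, got {seq_len}"
        )
    return tokens.reshape(grid_h, grid_w, dim)

# Hypothetical input size, chosen only for illustration.
H, W, D = 480, 480, 1024
seq_len = (H // 16) * (W // 16)  # equals H*W/256, here 900 rather than 256
tokens = np.zeros((seq_len, D), dtype=np.float32)  # stand-in for encoder output
feature_map = tokens_to_feature_map(tokens, H, W)
print(seq_len, feature_map.shape)
```

The sequence length is 256 only when the input happens to be 256x256; for any other size it scales with HW/256.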

@QiushiYang
Author

Thanks for your help. I think that in the encoder, all layers perform their computations (MSA and MLP) in an inter-patch way, without considering intra-patch information. Could this hurt the model's ability to capture small or local features?

@lzrobots
Contributor

lzrobots commented Jan 9, 2021

Agreed. The intra-patch information is handled only in the linear projection layer, which maps each 16x16x3 patch (3 RGB channels) to a 1x1x1024 embedding. After that, there is no opportunity to model intra-patch structure within the 1x1 token.
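
As a rough illustration of that projection, the sketch below flattens each 16x16x3 patch into a 768-dimensional vector and applies a single linear map to 1024 dimensions. It is only a sketch under stated assumptions: `patchify`, the random weights, and the 64x48 image are hypothetical and not the repository's code, and real implementations often express the same step as a strided convolution.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into (H/patch * W/patch, patch*patch*C) rows.

    Patches are emitted row by row over the grid (an assumption of this sketch).
    """
    h, w, c = image.shape
    grid_h, grid_w = h // patch, w // patch
    x = image[: grid_h * patch, : grid_w * patch]
    x = x.reshape(grid_h, patch, grid_w, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_h, grid_w, patch, patch, c)
    return x.reshape(grid_h * grid_w, patch * patch * c)

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 48, 3)).astype(np.float32)  # hypothetical input
patches = patchify(image)  # 4 * 3 = 12 patches, each 16*16*3 = 768 values

# Hypothetical projection parameters; in a trained model these are learned.
weight = (0.02 * rng.standard_normal((16 * 16 * 3, 1024))).astype(np.float32)
bias = np.zeros(1024, dtype=np.float32)

# All pixels of a patch are mixed here, once; afterwards each patch is one token.
tokens = patches @ weight + bias
print(patches.shape, tokens.shape)
```

Every pixel inside a patch contributes to its token only through this one matrix multiply, which is why later layers can relate tokens to each other but cannot revisit the spatial layout inside a patch.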
