After reading your paper, I have a question: how do you handle the multi-patch (256) inputs in the encoder? It seems that the encoder fuses the 256 patches and learns a single feature map (of size (H/16, W/16, D)) for the whole original image (rather than patch-wise feature maps), and this feature map is then decoded to generate the segmentation map. How are the 256 patches processed and fused in the encoder?
The size of each patch is 16×16. If the size of the input image is H×W, then the sequence length is (H/16)*(W/16) = HW/256, not 256.
The output feature of the encoder has size (HW/256, 1024), where HW/256 is the sequence length and 1024 is the embedding dimension. We then reshape it into a feature map of size (H/16, W/16, 1024) and connect it to the decoder. Please refer to Figure 1 in the main paper for more detail.
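Here is a minimal PyTorch sketch of the sequence-to-feature-map reshape described above, assuming patch size 16 and embedding dimension 1024; the variable names and shapes are illustrative, not taken from the repo.

```python
import torch

# Shapes follow the discussion above: patch size 16, embedding dimension 1024.
# B, H, W and encoder_tokens are illustrative placeholders.
B, H, W, D = 2, 512, 512, 1024          # batch, image height/width, embed dim
L = (H // 16) * (W // 16)               # sequence length = HW/256

encoder_tokens = torch.randn(B, L, D)   # encoder output: (B, HW/256, 1024)

# Reshape the token sequence back into a 2-D feature map for the decoder:
# (B, HW/256, 1024) -> (B, H/16, W/16, 1024) -> (B, 1024, H/16, W/16)
feature_map = encoder_tokens.reshape(B, H // 16, W // 16, D).permute(0, 3, 1, 2)
print(feature_map.shape)                # torch.Size([2, 1024, 32, 32])
```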
Thanks for your help. I think that in the encoder, all layers perform their computations (MSA & MLP) in an inter-patch way, which doesn't consider intra-patch information. Could this affect the ability to capture small or local features?
Agreed. The only intra-patch processing happens in the linear projection layer, which maps each 16×16×3 (RGB, 3-channel) patch to a 1×1×1024 token, and after that there is no opportunity for intra-patch interaction within the 1×1 token.
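For reference, here is a minimal sketch of that linear projection step, assuming the common ViT-style patch embedding where a strided convolution with kernel_size = stride = 16 is equivalent to flattening each 16×16×3 patch and applying one linear layer; the layer and variable names are illustrative, not from the repo.

```python
import torch
import torch.nn as nn

# Linear projection discussed above: each 16x16x3 patch -> one 1x1x1024 token.
patch_embed = nn.Conv2d(in_channels=3, out_channels=1024, kernel_size=16, stride=16)

x = torch.randn(1, 3, 512, 512)              # an RGB image
tokens = patch_embed(x)                      # (1, 1024, 32, 32): one 1024-d vector per patch
tokens = tokens.flatten(2).transpose(1, 2)   # (1, HW/256, 1024) token sequence

# After this step each patch is a single token, so the subsequent MSA/MLP layers
# only mix information *between* patches, not within a patch.
print(tokens.shape)
```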