
Using TinyVit_5m_224 as the backbone to train a segmentation task #117

Closed
haoxurt opened this issue Aug 15, 2022 · 6 comments

haoxurt commented Aug 15, 2022

Hi, thanks for sharing your excellent work. I want to try using TinyVit_5m_224 as the backbone to train a segmentation task whose input size is 512x512. Do I need to change the original weights because of the different size? How can I do it?


wkcn commented Aug 15, 2022

Thanks for your attention to our work!

TinyViT supports arbitrary input sizes, since the feature map is padded whenever its height or width is not a multiple of the attention window size. You do not need to change the original weights.

Padding the feature map:

# https://github.com/microsoft/Cream/blob/main/TinyViT/models/tiny_vit.py#L346
# (F is torch.nn.functional)
x = x.view(B, H, W, C)
pad_b = (self.window_size - H % self.window_size) % self.window_size
pad_r = (self.window_size - W % self.window_size) % self.window_size
padding = pad_b > 0 or pad_r > 0

if padding:
    x = F.pad(x, (0, 0, 0, pad_r, 0, pad_b))

However, the padding operation may affect the performance of dense prediction. For better performance, you can choose window sizes that avoid padding at 512x512 resolution.
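To see which window sizes avoid padding at 512x512, here is a small sketch that replays the pad_b/pad_r arithmetic above. The per-stage feature-map sizes (128/64/32/16, i.e. strides 4 to 32 from a 512x512 input) and the default 224-resolution window sizes [7, 7, 14, 7] are my assumptions for illustration and should be checked against the model config:

```python
def pad_amount(size, window_size):
    """Padding needed so `size` becomes a multiple of `window_size`
    (same formula as pad_b / pad_r in tiny_vit.py)."""
    return (window_size - size % window_size) % window_size

# Assumed stage feature-map sizes for a 512x512 input (strides 4, 8, 16, 32).
stage_sizes = [128, 64, 32, 16]

default_windows = [7, 7, 14, 7]       # assumed 224-resolution defaults
larger_windows = [16, 16, 32, 16]     # window sizes suggested for 512x512

print([pad_amount(s, w) for s, w in zip(stage_sizes, default_windows)])
# -> [5, 6, 10, 5]: every stage would be padded
print([pad_amount(s, w) for s, w in zip(stage_sizes, larger_windows)])
# -> [0, 0, 0, 0]: no padding at any stage
```

With the assumed defaults every stage needs padding, while [16, 16, 32, 16] divides every feature map evenly, so the padding branch is never taken.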

For example, change the window sizes to [16, 16, 32, 16] as in this config. The weight attention_biases will be resized automatically when calling the function utils.load_pretrained.

The weight attention_biases is resized like this:

# https://github.com/microsoft/Cream/blob/main/TinyViT/utils.py#L136
relative_position_bias_table_pretrained_resized = torch.nn.functional.interpolate(
    relative_position_bias_table_pretrained.view(1, nH1, S1, S1), size=(S2, S2),
    mode='bicubic')
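To illustrate what this interpolation does to the bias table, here is a dependency-free sketch that resizes a square 2D table. It uses hand-rolled bilinear interpolation on plain lists for readability; the repo itself uses torch's F.interpolate with mode='bicubic' on a (1, nH, S1, S1) tensor, so this is only a simplified stand-in for the idea:

```python
def resize_table(table, s2):
    """Resize a square 2D table (list of lists) to s2 x s2 with bilinear
    interpolation, mapping corners to corners (align_corners=True style)."""
    s1 = len(table)
    out = [[0.0] * s2 for _ in range(s2)]
    for i in range(s2):
        for j in range(s2):
            # Map output coordinates back into the source grid.
            y = i * (s1 - 1) / (s2 - 1) if s2 > 1 else 0.0
            x = j * (s1 - 1) / (s2 - 1) if s2 > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, s1 - 1), min(x0 + 1, s1 - 1)
            dy, dx = y - y0, x - x0
            # Blend the four surrounding source entries.
            out[i][j] = (table[y0][x0] * (1 - dy) * (1 - dx)
                         + table[y0][x1] * (1 - dy) * dx
                         + table[y1][x0] * dy * (1 - dx)
                         + table[y1][x1] * dy * dx)
    return out

# Grow a 2x2 table to 3x3: corners are preserved, the new center is the mean.
small = [[0.0, 1.0], [2.0, 3.0]]
big = resize_table(small, 3)
print(big[0][0], big[1][1], big[2][2])  # 0.0 1.5 3.0
```

The point is that each entry of the resized table is a smooth blend of nearby pretrained entries, so a model with a larger window size starts from a sensible interpolation of the pretrained biases rather than from scratch.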

wkcn added the TinyViT label Aug 15, 2022

haoxurt commented Aug 15, 2022

Thanks for your quick reply. So I only need to change the window sizes for better performance, and I don't need to change the original weights; the weight attention_biases will be resized automatically by the function utils.load_pretrained. Is that right?


wkcn commented Aug 15, 2022

> Thanks for your quick reply. So I only need to change the window sizes for better performance, and I don't need to change the original weights; the weight attention_biases will be resized automatically by the function utils.load_pretrained. Is that right?

Yes : )


haoxurt commented Aug 15, 2022

Thanks very much!

haoxurt closed this as completed Aug 15, 2022
HaoWuSR commented Oct 9, 2022

> Thanks very much!

Hi, could you please share the model for segmentation? I would be grateful if you could help me reproduce the network!


wkcn commented Oct 13, 2022

> Thanks very much!
>
> Hi, could you please share the model for segmentation? I would be grateful if you could help me reproduce the network!

Hi @HaoWuSR, sorry that we did not try the model on the segmentation task.
