
Using TinyVit_5m_224 as the backbone to train a segmentation task #117

Closed
haoxurt opened this issue Aug 15, 2022 · 6 comments

haoxurt commented Aug 15, 2022

Hi, thanks for sharing your excellent work. I want to try using TinyVit_5m_224 as the backbone to train a segmentation task whose input size is 512x512. Do I need to change the original weights because of the different size? How can I do it?


wkcn commented Aug 15, 2022

Thanks for your attention to our work!

TinyViT supports arbitrary input sizes, since the feature map is padded whenever its height or width is not a multiple of the attention window size. You do not need to change the original weights.

Padding the feature map:

# https://github.com/microsoft/Cream/blob/main/TinyViT/models/tiny_vit.py#L346
# (F is torch.nn.functional)
x = x.view(B, H, W, C)
pad_b = (self.window_size - H % self.window_size) % self.window_size
pad_r = (self.window_size - W % self.window_size) % self.window_size
padding = pad_b > 0 or pad_r > 0

if padding:
    x = F.pad(x, (0, 0, 0, pad_r, 0, pad_b))

However, the padding operation may affect the performance of dense prediction. For better performance, you can choose window sizes that avoid padding at 512x512 resolution.
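To see which window sizes avoid padding at 512x512, here is a small sketch that replays the pad_b/pad_r arithmetic above. The per-stage feature-map sizes (128/64/32/16, i.e. strides 4 to 32 from a 512x512 input) and the default 224-resolution window sizes [7, 7, 14, 7] are my assumptions for illustration and should be checked against the model config:

```python
def pad_amount(size, window_size):
    """Padding needed so `size` becomes a multiple of `window_size`
    (same formula as pad_b / pad_r in tiny_vit.py)."""
    return (window_size - size % window_size) % window_size

# Assumed stage feature-map sizes for a 512x512 input (strides 4, 8, 16, 32).
stage_sizes = [128, 64, 32, 16]

default_windows = [7, 7, 14, 7]       # assumed 224-resolution defaults
larger_windows = [16, 16, 32, 16]     # window sizes suggested for 512x512

print([pad_amount(s, w) for s, w in zip(stage_sizes, default_windows)])
# -> [5, 6, 10, 5]: every stage would be padded
print([pad_amount(s, w) for s, w in zip(stage_sizes, larger_windows)])
# -> [0, 0, 0, 0]: no padding at any stage
```

With the assumed defaults every stage needs padding, while [16, 16, 32, 16] divides every feature map evenly, so the padding branch is never taken.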

For example, change the window sizes to [16, 16, 32, 16] as in this config. The weight attention_biases will be resized automatically when calling the function utils.load_pretrained.

The weight attention_biases is resized like this:

# https://github.com/microsoft/Cream/blob/main/TinyViT/utils.py#L136
relative_position_bias_table_pretrained_resized = torch.nn.functional.interpolate(
    relative_position_bias_table_pretrained.view(1, nH1, S1, S1), size=(S2, S2),
    mode='bicubic')
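To illustrate what this interpolation does to the bias table, here is a dependency-free sketch that resizes a square 2D table. It uses hand-rolled bilinear interpolation on plain lists for readability; the repo itself uses torch's F.interpolate with mode='bicubic' on a (1, nH, S1, S1) tensor, so this is only a simplified stand-in for the idea:

```python
def resize_table(table, s2):
    """Resize a square 2D table (list of lists) to s2 x s2 with bilinear
    interpolation, mapping corners to corners (align_corners=True style)."""
    s1 = len(table)
    out = [[0.0] * s2 for _ in range(s2)]
    for i in range(s2):
        for j in range(s2):
            # Map output coordinates back into the source grid.
            y = i * (s1 - 1) / (s2 - 1) if s2 > 1 else 0.0
            x = j * (s1 - 1) / (s2 - 1) if s2 > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, s1 - 1), min(x0 + 1, s1 - 1)
            dy, dx = y - y0, x - x0
            # Blend the four surrounding source entries.
            out[i][j] = (table[y0][x0] * (1 - dy) * (1 - dx)
                         + table[y0][x1] * (1 - dy) * dx
                         + table[y1][x0] * dy * (1 - dx)
                         + table[y1][x1] * dy * dx)
    return out

# Grow a 2x2 table to 3x3: corners are preserved, the new center is the mean.
small = [[0.0, 1.0], [2.0, 3.0]]
big = resize_table(small, 3)
print(big[0][0], big[1][1], big[2][2])  # 0.0 1.5 3.0
```

The point is that each entry of the resized table is a smooth blend of nearby pretrained entries, so a model with a larger window size starts from a sensible interpolation of the pretrained biases rather than from scratch.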

wkcn added the TinyViT label Aug 15, 2022

haoxurt commented Aug 15, 2022

Thanks for your quick reply. So I only need to change the window sizes for better performance, and I don't need to change the original weights; the weight attention_biases will be resized automatically by the function utils.load_pretrained. Is that right?


wkcn commented Aug 15, 2022

> Thanks for your quick reply. So I only need to change the window sizes for better performance, and I don't need to change the original weights; the weight attention_biases will be resized automatically by the function utils.load_pretrained. Is that right?

Yes : )


haoxurt commented Aug 15, 2022

Thanks very much!

haoxurt closed this as completed Aug 15, 2022
HaoWuSR commented Oct 9, 2022

> Thanks very much!

Hi, could you please share the model for segmentation? I would be grateful if you could help me reproduce the network!


wkcn commented Oct 13, 2022

> Thanks very much!
>
> Hi, could you please share the model for segmentation? I would be grateful if you could help me reproduce the network!

Hi @HaoWuSR, sorry that we did not try the model on the segmentation task.
