
[Fix] fix patch_embed and pos_embed mismatch error #685

Merged: 13 commits merged into open-mmlab:master on Jul 19, 2021

Conversation

@xiexinch (Collaborator) commented on Jul 8, 2021

Thanks for your contribution; we appreciate it a lot. The following instructions will make your pull request healthier and make it easier to get feedback. If you do not understand some items, don't worry: just make the pull request and seek help from the maintainers.

Motivation

Fix the patch_embed and pos_embed mismatch error and remove the out_shape parameter.

Modification

  • In the resize_pos_embed() function, the interpolated position embedding now matches the padded input image.
  • Replace the out_shape parameter with output_cls_token, so the user may choose whether to append the cls_token to the output feature map (see the sketch after this list).
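As a rough illustration of the fixed behavior, here is a minimal sketch of resizing a position embedding to a padded input's patch grid. The name resize_pos_embed_sketch, its arguments, and the [1, 1 + h*w, C] layout with a leading cls_token are assumptions for illustration, not the actual mmseg API:

    import torch
    import torch.nn.functional as F

    def resize_pos_embed_sketch(pos_embed, src_shape, dst_shape, mode='bicubic'):
        # pos_embed: [1, 1 + src_h * src_w, C], cls_token slot first.
        # src_shape: (src_h, src_w) grid the weights were trained on.
        # dst_shape: (dst_h, dst_w) grid produced by patch_embed on the
        # new (possibly padded) image.
        cls_token, weight = pos_embed[:, :1], pos_embed[:, 1:]
        src_h, src_w = src_shape
        weight = weight.reshape(1, src_h, src_w, -1).permute(0, 3, 1, 2)
        weight = F.interpolate(
            weight, size=dst_shape, mode=mode, align_corners=False)
        weight = weight.flatten(2).transpose(1, 2)  # [1, dst_h * dst_w, C]
        return torch.cat((cls_token, weight), dim=1)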

Checklist

  1. Pre-commit or other linting tools are used to fix the potential lint issues.
  2. The modification is covered by complete unit tests. If not, please add more unit tests to ensure the correctness.
  3. If the modification has potential influence on downstream projects, this PR should be tested with downstream projects, like MMDet or MMDet3D.
  4. The documentation has been modified accordingly, like docstring or example tutorials.

@codecov bot commented on Jul 8, 2021

Codecov Report

Merging #685 (9a4e1e8) into master (5097d55) will not change coverage.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##           master     #685   +/-   ##
=======================================
  Coverage   85.78%   85.78%           
=======================================
  Files         105      105           
  Lines        5627     5627           
  Branches      915      916    +1     
=======================================
  Hits         4827     4827           
  Misses        621      621           
  Partials      179      179           
Flag Coverage Δ
unittests 85.76% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
mmseg/models/backbones/vit.py 84.84% <100.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5097d55...9a4e1e8.

Comment on lines 383 to 384
    x, H, W = self.patch_embed(
        inputs), self.patch_embed.DH, self.patch_embed.DW

We may use hw_shape instead, to be consistent with Swin Transformer.
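For context, a hedged sketch of the convention this comment refers to: in Swin Transformer, the patch embedding returns the flattened tokens together with their spatial shape. PatchEmbedSketch below is illustrative only, not the mmseg class:

    import torch
    import torch.nn as nn

    class PatchEmbedSketch(nn.Module):
        def __init__(self, in_channels=3, embed_dims=768, patch_size=16):
            super().__init__()
            self.proj = nn.Conv2d(
                in_channels, embed_dims, kernel_size=patch_size,
                stride=patch_size)

        def forward(self, x):
            x = self.proj(x)                      # [B, C, H/ps, W/ps]
            hw_shape = (x.shape[2], x.shape[3])   # the grid DH/DW encode
            x = x.flatten(2).transpose(1, 2)      # [B, L, C]
            return x, hw_shape

    tokens, hw_shape = PatchEmbedSketch()(torch.randn(1, 3, 224, 224))
    # tokens: [1, 196, 768], hw_shape: (14, 14)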

@@ -317,14 +316,13 @@ def init_weights(self):
                 constant_init(m.bias, 0)
                 constant_init(m.weight, 1.0)

-    def _pos_embeding(self, img, patched_img, pos_embed):
+    def _pos_embeding(self, downsampled_img_size, patched_img, pos_embed):

Suggested change:

-    def _pos_embeding(self, downsampled_img_size, patched_img, pos_embed):
+    def _pos_embeding(self, x, hw_shape, pos_embed):
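A minimal sketch of what the method does under the suggested (x, hw_shape, pos_embed) signature, reusing resize_pos_embed_sketch from the illustration above; the pos_shape argument, standing for the grid the embedding was trained on, is an assumption of this sketch:

    def pos_embeding_sketch(x, hw_shape, pos_embed, pos_shape):
        # x: [B, 1 + H*W, C] patch tokens with a leading cls_token.
        # hw_shape: (H, W) patch grid of the current input.
        # pos_embed: [1, 1 + pos_h * pos_w, C] learned position embedding.
        if x.shape[1] != pos_embed.shape[1]:
            # Grid sizes differ, e.g. the input was padded: resize to match.
            pos_embed = resize_pos_embed_sketch(pos_embed, pos_shape, hw_shape)
        return x + pos_embed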

        assert out_shape in ['NLC',
                             'NCHW'], 'output shape must be "NLC" or "NCHW".'
        if output_cls_token:
            assert with_cls_token is True, f'with_cls_token must be True if' \

It is better to add this description to the docstring.
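One way the constraint could be surfaced in the docstring; the wording below is an assumption, not the merged text:

    class VisionTransformerSketch:
        """Illustrative docstring fragment only.

        Args:
            with_cls_token (bool): Whether concatenating the cls_token to
                the patch tokens. Default: True.
            output_cls_token (bool): Whether the cls_token is returned
                alongside the output feature map. Requires with_cls_token
                to be True, since a cls_token that is never created cannot
                be returned. Default: False.
        """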

Comment on lines 357 to 360
            pos_embed (torch.Tensor): pos_embed weights.
            input_shpae (tuple): Tuple for (input_h, intput_w).
            pos_shape (tuple): Tuple for (pos_h, pos_w).
            patch_size (int): Patch size.
            mode (str): Algorithm used for upsampling.

It would be better to describe the Args in more detail, and we don't need abbreviations in the description; e.g., pos_embed should be position embedding.
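For example, the Args could be expanded along these lines; the wording is illustrative only, keeping the code's own parameter names, typos included:

    def resize_pos_embed(pos_embed, input_shpae, pos_shape, mode):
        """Illustrative docstring only.

        Args:
            pos_embed (torch.Tensor): Position embedding weights with
                shape [1, L, C].
            input_shpae (tuple): The (height, width) resolution of the
                downsampled input image's patch grid.
            pos_shape (tuple): The (height, width) resolution the position
                embedding was trained at.
            mode (str): Algorithm used for interpolation, e.g. 'bicubic'.
        """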

@@ -371,7 +370,7 @@ def resize_pos_embed(pos_embed, input_shpae, pos_shape, patch_size, mode):
             1, pos_h, pos_w, pos_embed.shape[2]).permute(0, 3, 1, 2)
         pos_embed_weight = F.interpolate(
             pos_embed_weight,
-            size=[input_h // patch_size, input_w // patch_size],
+            size=[input_h, input_w],
@Junjun2016 (Collaborator) commented on Jul 8, 2021

Suggested change:

-            size=[input_h, input_w],
+            size=input_shpae,

so input_h, input_w = input_shpae is redundant.
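Indeed, F.interpolate accepts any (h, w) sequence for size, so unpacking the tuple first buys nothing. A quick check with arbitrary shapes:

    import torch
    import torch.nn.functional as F

    input_shpae = (32, 48)  # keeping the code's own (misspelled) name
    weight = torch.randn(1, 768, 14, 14)
    out = F.interpolate(
        weight, size=input_shpae, mode='bicubic', align_corners=False)
    assert out.shape[-2:] == torch.Size(input_shpae)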

            patched_img (torch.Tensor): The patched image, it should be
                shape of [B, L1, C].
            hw_shape (tuple): The downsampled image resolution.

Maybe output_shape is better.

I am not sure. @xvjiarui

                 self.interpolate_mode)
         return self.drop_after_pos(patched_img + pos_embed)

     @staticmethod
-    def resize_pos_embed(pos_embed, input_shpae, pos_shape, patch_size, mode):
+    def resize_pos_embed(pos_embed, input_shpae, pos_shape, mode):
"""Resize pos_embed weights.

Resize pos_embed using bicubic interpolate method.
Args:
pos_embed (torch.Tensor): pos_embed weights.
input_shpae (tuple): Tuple for (input_h, intput_w).

Maybe output_shape is better.

I am not sure. @xvjiarui

@xvjiarui (Collaborator) commented:
Some configs still have out_shape.
Please kindly delete them.

@xiexinch (Collaborator, Author) replied:

I have checked all the ViT configs; out_shape is not found.

@xvjiarui merged commit dff7a96 into open-mmlab:master on Jul 19, 2021
bowenroom pushed a commit to bowenroom/mmsegmentation that referenced this pull request on Feb 25, 2022, with the commit message:
* fix patch_embed and pos_embed mismatch error

* add docstring

* update unittest

* use downsampled image shape

* use tuple

* remove unused parameters and add doc

* fix init weights function

* revise docstring

* Update vit.py

If -> Whether

* fix lint

Co-authored-by: Junjun2016 <hejunjun@sjtu.edu.cn>