Conversation

@FrancescoSaverioZuppichini (Contributor)

What does this PR do?

This PR adds padding to Swin, allowing it to support any input size (as long as it is divisible by 32).

Example:

from transformers import SwinConfig, SwinModel
import torch

# Even though the config uses image_size=384, the model now accepts a
# non-square 1024 x 640 input thanks to the added padding.
model = SwinModel(SwinConfig(image_size=384))
x = torch.randn((1, 3, 1024, 640))
out = model(x)

Moreover, it adds a new field to the outputs, hidden_states_spatial_dimensions, containing the spatial dimensions of all the stages' inputs.
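
A rough sketch of how the new field might be consumed (purely illustrative; it assumes the field is a tuple of (height, width) pairs, one per stage, and that it is populated on a plain forward pass as described above):

out = model(x)
for stage_index, (height, width) in enumerate(out.hidden_states_spatial_dimensions):
    print(f"stage {stage_index}: {height} x {width}")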

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@sgugger (Collaborator) left a comment


I had missed it in the MaskFormer PR, but hidden_states_spatial_dimensions in the model outputs is not a nested tensor, which will make distributed training fail for MaskFormer when using the Trainer.

Otherwise, it looks good to refactor this inside Swin.


attention_windows = attention_output.view(-1, self.window_size, self.window_size, channels)
- shifted_windows = window_reverse(attention_windows, self.window_size, height, width)  # B H' W' C
+ shifted_windows = window_reverse(attention_windows, self.window_size, height_pad, width_pad)  # B H' W' C
Collaborator:

The comment could go above and be more helpful. I have no idea what H' and W' are for instance.

Contributor Author:

I've removed the comment since it is clear from the code what is going on (IMHO).
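
As context for the B H' W' C notation discussed above: a minimal sketch of what a Swin-style window_reverse typically does (an illustrative paraphrase, not the exact code from this PR). H' and W' denote the possibly padded feature-map height and width after the windows are merged back:

def window_reverse(windows, window_size, height, width):
    # windows: (num_windows * batch_size, window_size, window_size, channels)
    # Merge the non-overlapping windows back into a full feature map of
    # shape (batch_size, height, width, channels), i.e. B H' W' C.
    num_channels = windows.shape[-1]
    windows = windows.view(-1, height // window_size, width // window_size, window_size, window_size, num_channels)
    windows = windows.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, height, width, num_channels)
    return windows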

config, inputs_dict = self.model_tester.prepare_config_and_inputs_for_common()

def set_nan_tensor_to_zero(t):

Collaborator:

Suggested change

shape `(batch_size, sequence_length, hidden_size)`.
Hidden-states of the model at the output of each layer plus the initial embedding outputs.
hidden_states_spatial_dimensions (`tuple(tuple(int, int))`, *optional*):
Contributor:

Not sure whether it makes sense to do this. If the model just returns hidden states of shape (B, C, H, W), then we get both hidden states and spatial dimensions for free.

Contributor Author:

+1 on this, very nice idea.

From a design point of view, if we say output_hidden_states, the user will expect to receive the hidden_states untouched. But I see why this is more convenient.

Pinging @sgugger for feedback.

Collaborator:

I have already suggested this on the MaskFormer PR but it was ignored, so I thought it wasn't possible. If it indeed is, it would kill two birds with one stone.

Contributor Author:

Apologies if it was ignored in MaskFormer. It is indeed possible but requires some refactoring; let me try to change this.

@FrancescoSaverioZuppichini (Contributor Author)

FrancescoSaverioZuppichini commented Mar 9, 2022

Swin now returns a list of reshaped hidden_states with shape (B, C, H, W). However, due to the view operation, the viewed tensor won't have .grad on it, so test_retain_grad_hidden_states_attentions fails. Not sure how to proceed.

from transformers import SwinConfig, SwinModel
import torch

model = SwinModel(SwinConfig(image_size=384))
x = torch.randn((1, 3, 1024, 640))
out = model(x, output_hidden_states=True)
[print(e.shape) for e in out.hidden_states]

torch.Size([1, 96, 256, 160])
torch.Size([1, 192, 128, 80])
torch.Size([1, 384, 64, 40])
torch.Size([1, 768, 32, 20])
torch.Size([1, 768, 32, 20])

Maybe I am missing something; kindly pinging @sgugger.
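
As an illustration of the issue (not part of the PR): only leaf tensors keep their .grad after backward; a tensor produced by view is a non-leaf and needs an explicit retain_grad() call, which is what the retain-grad test relies on:

import torch

x = torch.randn(2, 4, requires_grad=True)  # leaf tensor
y = x.view(2, 2, 2)                        # non-leaf: the result of an op on x
loss = (y * 2).sum()
loss.backward()

print(x.grad is None)  # False: leaf tensors accumulate .grad
print(y.grad is None)  # True: non-leaf tensors drop .grad unless y.retain_grad() was called before backward()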

Comment on lines +182 to +331
pixel_values = self.maybe_pad(pixel_values, height, width)
embeddings = self.projection(pixel_values)
_, _, height, width = embeddings.shape
output_dimensions = (height, width)
embeddings = embeddings.flatten(2).transpose(1, 2)

return embeddings, output_dimensions
@NielsRogge (Contributor) commented Mar 11, 2022

Can't you derive the output dimensions from the embeddings tensor here? Probably only for square sizes?

Contributor Author:

I think it is easier and more robust to just get the shape.
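
As an aside (illustrative only): after flatten(2).transpose(1, 2) the embeddings are (batch_size, height * width, channels), so for non-square inputs the individual height and width cannot be unambiguously recovered from the flattened tensor, which is why they are read from the shape before flattening:

import torch

embeddings = torch.randn(1, 96, 64, 40)        # (B, C, H, W) after the patch projection
_, _, height, width = embeddings.shape         # grab H and W while they are still explicit
flattened = embeddings.flatten(2).transpose(1, 2)
print(flattened.shape)                         # torch.Size([1, 2560, 96])
# 2560 == 64 * 40 == 40 * 64 == 80 * 32, so (H, W) has to be carried separately.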

@FrancescoSaverioZuppichini (Contributor Author)

Following @NielsRogge's suggestion, we now return the reshaped hidden states in reshaped_hidden_states in all four outputs (Encoder/Model/MaskedImage/ImageClassifier).
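
A rough sketch of how this might look from the user side (the field name comes from the commit messages below; the exact shapes follow the earlier example and are an assumption):

from transformers import SwinConfig, SwinModel
import torch

model = SwinModel(SwinConfig(image_size=384))
x = torch.randn((1, 3, 1024, 640))
out = model(x, output_hidden_states=True)
# hidden_states stay untouched as (B, sequence_length, C);
# reshaped_hidden_states carry the same activations as (B, C, H, W).
for hidden_state in out.reshaped_hidden_states:
    print(hidden_state.shape)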

Comment on lines 106 to 109
Last layer hidden-state of the first token of the sequence (classification token) after further processing
through the layers used for the auxiliary pretraining task. E.g. for BERT-family of models, this returns
the classification token after processing through a linear layer and a tanh activation function. The linear
layer weights are trained from the next sentence prediction (classification) objective during pretraining.
Contributor:

This should be updated; it comes from BERT.

Contributor Author:

Updated. I am wondering why we are not supporting cls token pooling.

@FrancescoSaverioZuppichini (Contributor Author)

Thanks to all the reviewers. I've resolved all the conversations and renamed some layers to match our convention of Stage and Layer.

@NielsRogge (Contributor) left a comment

LGTM! Some minor comments left.

Note that the tests of Swin can be cleaned up a bit: you can remove all the chunk_length and encoder_decoder stuff, since those don't apply to Swin.

@FrancescoSaverioZuppichini FrancescoSaverioZuppichini merged commit 667b823 into master Mar 16, 2022
@FrancescoSaverioZuppichini FrancescoSaverioZuppichini deleted the swin_padding branch March 16, 2022 17:38
FrancescoSaverioZuppichini added a commit that referenced this pull request Mar 21, 2022
* padding done

* correctly return one attention per layer

* almost correct, attentions are not flatten one tuple per stage

* tests green

* doc

* conversations

* reshaping hidden_states

* view in the test

* reshape_hidden_states in Encoder and Model

* new outputs with reshaped_hidden_states

* conversations

* doc

* Update docs/source/model_doc/swin.mdx

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* conversations

* fix tests

* minor changes

* resolved conversations

* attentions one per stage

* typo

* typos

* typos

* function signature

* CI

* clean up tests

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>