
Move logic into pixelshuffle layer#17899

Merged
amyeroberts merged 3 commits into huggingface:main from amyeroberts:move-logic-into-pixelshuffle-layer
Jun 28, 2022

Conversation

@amyeroberts
Contributor

What does this PR do?

Moves the logic relating to the PixelShuffle layer into the layer class. This provides usage consistent with the PyTorch pixel shuffle layer and makes sure all necessary logic is ported if any # Copied from statements are used.
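For context, PyTorch's nn.PixelShuffle rearranges a (N, C·r², H, W) tensor into (N, C, H·r, W·r). A minimal NumPy sketch of the same rearrangement in TF's channels-last layout (illustrative only; pixel_shuffle_nhwc is a hypothetical helper, not the code this PR moves):

```python
import numpy as np

def pixel_shuffle_nhwc(x, upscale_factor):
    """Torch-style pixel shuffle for channels-last tensors.

    Rearranges (N, H, W, C * r^2) -> (N, H * r, W * r, C), where the
    input channels are grouped torch-style as (C, r, r).
    """
    r = upscale_factor
    n, h, w, crr = x.shape
    c = crr // (r * r)
    x = x.reshape(n, h, w, c, r, r)    # split channels into (C, r, r)
    x = x.transpose(0, 1, 4, 2, 5, 3)  # interleave the r x r blocks spatially
    return x.reshape(n, h * r, w * r, c)

x = np.arange(16, dtype=np.float32).reshape(1, 2, 2, 4)  # N=1, H=W=2, C*r^2=4
y = pixel_shuffle_nhwc(x, upscale_factor=2)
print(y.shape)  # (1, 4, 4, 1)
```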

Also renamed the layer PixelShuffle -> TFSwinPixelShuffle to reflect naming conventions in the rest of the repo. The following script was run to make sure the models are still compatible with current weights:

from transformers import AutoFeatureExtractor, TFSwinForImageClassification

checkpoint = "microsoft/swin-tiny-patch4-window7-224"

# relative_position_index isn't updated during training. In TF set as instance param
print("\nTFSwinForImageClassification - from PyTorch checkpoint")
tf_model = TFSwinForImageClassification.from_pretrained(checkpoint, from_pt=True)
print("\nTFSwinForImageClassification - from TF checkpoint")
tf_model = TFSwinForImageClassification.from_pretrained(checkpoint)

This produced the following output. Note: relative_position_index isn't updated during training and is set as an instance parameter in the TF model.

TFSwinForImageClassification - from PyTorch checkpoint
Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFSwinForImageClassification: ['swin.encoder.layers.3.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.4.attention.self.relative_position_index', 'swin.encoder.layers.1.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.2.attention.self.relative_position_index', 'swin.encoder.layers.1.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.3.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.0.blocks.0.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.1.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.3.attention.self.relative_position_index', 'swin.encoder.layers.2.blocks.5.attention.self.relative_position_index']
- This IS expected if you are initializing TFSwinForImageClassification from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFSwinForImageClassification from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFSwinForImageClassification were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFSwinForImageClassification for predictions without further training.

TFSwinForImageClassification - from TF checkpoint
All model checkpoint layers were used when initializing TFSwinForImageClassification.

All the layers of TFSwinForImageClassification were initialized from the model checkpoint at microsoft/swin-tiny-patch4-window7-224.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFSwinForImageClassification for predictions without further training.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jun 27, 2022

The documentation is not available anymore as the PR was closed or merged.

@amyeroberts amyeroberts requested review from Rocketknight1 and sgugger and removed request for sgugger June 27, 2022 21:51
Collaborator

@sgugger sgugger left a comment


LGTM!

Member

@Rocketknight1 Rocketknight1 left a comment


LGTM as well!

Comment on lines +1245 to +1248
permutation = tf.constant(
    [[i + j * block_size_squared for i in range(block_size_squared) for j in range(output_depth)]]
)
hidden_states = tf.gather(params=hidden_states, indices=tf.tile(permutation, [batch_size, 1]), batch_dims=-1)
Member


Any appearance of gather in this kind of context is a 100% guarantee that someone is emulating the specific details of a weird Torch function.

Contributor Author


Too true 😭
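The gather in the snippet above reorders channels from torch's (C, r, r) grouping into the (r, r, C) grouping that tf.nn.depth_to_space (DCR mode) expects. A NumPy sketch of that equivalence, using the same index trick as the snippet under review (both helper functions here are illustrative, not the PR's code):

```python
import numpy as np

def pixel_shuffle_torch_order(x, r):
    # torch grouping: input channel index = c * r^2 + i * r + j
    n, h, w, crr = x.shape
    c = crr // (r * r)
    y = x.reshape(n, h, w, c, r, r).transpose(0, 1, 4, 2, 5, 3)
    return y.reshape(n, h * r, w * r, c)

def depth_to_space_dcr(x, r):
    # tf.nn.depth_to_space (DCR) grouping: input channel index = (i * r + j) * c + c_out
    n, h, w, crr = x.shape
    c = crr // (r * r)
    y = x.reshape(n, h, w, r, r, c).transpose(0, 1, 3, 2, 4, 5)
    return y.reshape(n, h * r, w * r, c)

r, c = 2, 3
x = np.arange(1 * 2 * 2 * c * r * r, dtype=np.float32).reshape(1, 2, 2, c * r * r)

# Same comprehension as the reviewed snippet: position i * c + j pulls
# old channel j * r^2 + i, turning (C, r, r) grouping into (r, r, C).
perm = [i + j * r * r for i in range(r * r) for j in range(c)]
assert np.array_equal(depth_to_space_dcr(x[..., perm], r),
                      pixel_shuffle_torch_order(x, r))
```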

@amyeroberts amyeroberts merged commit f71895a into huggingface:main Jun 28, 2022
@amyeroberts amyeroberts deleted the move-logic-into-pixelshuffle-layer branch June 28, 2022 12:04
viclzhu pushed a commit to viclzhu/transformers that referenced this pull request Jul 18, 2022
* Move all pixelshuffle logic into layer

* Rename layer

* Use correct input to function
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
