Add SegFormer #14019
PR is ready for review; the only thing left to add is padding.
I mostly left nits; this is very clean! Great new addition!
```python
def pad_images(self, images):
    """Pad images to ``self.crop_size``."""
    padded_images = nn.functional.pad(images, pad=self.crop_size, value=self.padding_value)
```
I think for this, it would be useful to have our own pad function that uses PyTorch if images is a torch Tensor and NumPy if images is a NumPy array. Wdyt?
Yes, if you could implement that, that would be great :)
Happy to, but I won't have time to do this this week (and I'm not even sure about next week).
There's numpy.pad (https://numpy.org/doc/stable/reference/generated/numpy.pad.html), which would make this implementation quite simple. Can you give it a try, @NielsRogge?
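The suggestion in this thread can be sketched as a small framework-agnostic helper. `pad_to_size` is hypothetical (it is not part of the PR or of `transformers`), and it assumes height and width are the last two axes and that padding goes on the bottom and right. One detail worth noting: `torch.nn.functional.pad` reads its `pad` argument as per-side padding amounts, last dimension first, not as a target size, while `numpy.pad` takes one `(before, after)` pair per axis, first axis first. Either way, the target size has to be converted into padding amounts.

```python
import numpy as np

try:  # PyTorch is optional: the NumPy path works without it
    import torch
    import torch.nn.functional as F
except ImportError:
    torch = None


def pad_to_size(images, target_size, value=0):
    """Pad the last two (height, width) axes on the bottom/right up to `target_size`.

    Hypothetical helper: accepts (H, W) maps, (C, H, W) images or (B, C, H, W) batches,
    as either NumPy arrays or (if installed) torch Tensors.
    """
    target_height, target_width = target_size
    height, width = images.shape[-2:]
    pad_bottom = target_height - height
    pad_right = target_width - width
    if pad_bottom < 0 or pad_right < 0:
        raise ValueError(f"Input of size {(height, width)} is larger than target size {target_size}")

    if torch is not None and isinstance(images, torch.Tensor):
        # F.pad expects (left, right, top, bottom), starting from the LAST dimension
        return F.pad(images, (0, pad_right, 0, pad_bottom), value=value)
    if isinstance(images, np.ndarray):
        # np.pad expects one (before, after) pair per axis, starting from the FIRST axis
        pad_spec = [(0, 0)] * (images.ndim - 2) + [(0, pad_bottom), (0, pad_right)]
        return np.pad(images, pad_spec, mode="constant", constant_values=value)
    raise TypeError(f"Unsupported type: {type(images)}")
```

Padding only on the bottom and right keeps every original pixel at its original coordinates, which makes it straightforward to crop predictions back afterwards; whether the PR pads this way or symmetrically is not stated here.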
Thank you for working on this, Niels! You'll have to rebase on the current master branch and re-run make fix-copies to ensure that the Korean README also gets updated.
This looks good to me; I've only left nits and one request regarding the padding method.
```python
def forward(self, hidden_states, height, width, output_attentions=False):
    self_attention_outputs = self.attention(
        self.layer_norm_1(hidden_states),  # in Segformer, layernorm is applied before self-attention
```
Can the layer norm be applied outside of the call, for readability?
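The requested refactor amounts to computing the normalized states on their own line before calling attention. The NumPy sketch below illustrates that pre-norm pattern; `layer_norm`, `toy_attention` and `pre_norm_block` are hypothetical stand-ins (no learnable parameters, no attention projections, no `height`/`width` arguments), not the PR's actual modules. The residual connection follows the standard pre-norm Transformer block.

```python
import numpy as np


def layer_norm(hidden_states, eps=1e-6):
    """Normalize over the last (channel) axis; learnable scale/shift omitted."""
    mean = hidden_states.mean(axis=-1, keepdims=True)
    var = hidden_states.var(axis=-1, keepdims=True)
    return (hidden_states - mean) / np.sqrt(var + eps)


def toy_attention(hidden_states):
    """Hypothetical parameter-free self-attention: softmax(X X^T / sqrt(d)) X."""
    d = hidden_states.shape[-1]
    scores = hidden_states @ hidden_states.swapaxes(-1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ hidden_states


def pre_norm_block(hidden_states, attention=toy_attention, norm=layer_norm):
    # Layer norm is applied *before* self-attention, on its own line for readability
    normed_hidden_states = norm(hidden_states)
    attention_output = attention(normed_hidden_states)
    # Residual connection adds the attention output to the un-normalized input
    return hidden_states + attention_output
```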
* First draft
* Make style & quality
* Improve conversion script
* Add print statement to see actual slice
* Make absolute tolerance smaller
* Fix image classification models
* Add post_process_semantic method
* Disable padding
* Improve conversion script
* Rename to ForSemanticSegmentation, add integration test, remove post_process methods
* Improve docs
* Fix code quality
* Fix feature extractor tests
* Fix tests for image classification model
* Delete file
* Add is_torch_available to feature extractor
* Improve documentation of feature extractor methods
* Apply suggestions from @sgugger's code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions of code review
* Rebase with master
* Fix rebase issues
* Make sure model only outputs hidden states when the user wants to
* Apply suggestions from code review
* Add pad method
* Support padding of 2d images
* Add print statement
* Add print statement
* Move padding method to SegformerFeatureExtractor
* Fix issue
* Add casting of segmentation maps
* Add test for padding
* Add small note about padding

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
What does this PR do?
This PR adds SegFormer, a new model by NVIDIA that is surprisingly simple yet very powerful for semantic segmentation of images. It uses a hierarchical Transformer as its backbone and an all-MLP decode head. I've implemented 3 models:
* SegformerModel (backbone only)
* SegformerForImageClassification (backbone + classifier head)
* SegformerForSemanticSegmentation (backbone + all-MLP semantic segmentation head)

Models are on the hub (with approval from the author): https://huggingface.co/models?other=segformer
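To illustrate what an all-MLP decode head computes, here is a minimal NumPy sketch of the design described for SegFormer: each of the four backbone stages is projected to a shared channel width by a per-pixel linear layer, upsampled to 1/4 of the input resolution, concatenated, fused by another per-pixel linear layer, and mapped to class logits. Assumptions: unbatched inputs, random weights, nearest-neighbour instead of bilinear upsampling, and batch norm and dropout omitted. The function names are hypothetical, not the PR's `SegformerDecodeHead`.

```python
import numpy as np


def upsample_nearest(features, out_height, out_width):
    """Nearest-neighbour upsampling of (C, h, w) features by integer factors (bilinear is the usual choice)."""
    _, h, w = features.shape
    return features.repeat(out_height // h, axis=1).repeat(out_width // w, axis=2)


def all_mlp_decode_head(stage_features, proj_weights, fuse_weight, cls_weight):
    """Sketch of an all-MLP decode head on unbatched (C_i, H_i, W_i) feature maps.

    stage_features: 4 maps at 1/4, 1/8, 1/16 and 1/32 of the input resolution.
    proj_weights:   per-stage (C, C_i) matrices projecting each stage to a shared width C.
    fuse_weight:    (C, 4 * C) matrix fusing the concatenated maps (a 1x1 conv is a per-pixel linear layer).
    cls_weight:     (num_labels, C) matrix producing per-pixel class logits.
    """
    _, out_h, out_w = stage_features[0].shape  # the 1/4-resolution stage sets the output size
    projected = []
    for feats, weight in zip(stage_features, proj_weights):
        # Per-pixel linear projection: (C, C_i) applied to (C_i, H_i, W_i) -> (C, H_i, W_i)
        proj = np.einsum("oc,chw->ohw", weight, feats)
        projected.append(upsample_nearest(proj, out_h, out_w))
    fused = np.einsum("oc,chw->ohw", fuse_weight, np.concatenate(projected, axis=0))
    fused = np.maximum(fused, 0.0)  # ReLU after fusion
    return np.einsum("oc,chw->ohw", cls_weight, fused)  # (num_labels, H/4, W/4)
```

Because every step is a per-pixel linear map or an upsampling, the head adds very little compute compared to the backbone, which is consistent with calling it a lightweight head rather than a decoder.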
Here's how to use the semantic segmentation model:
Quick inference notebook with visualization: https://colab.research.google.com/drive/1Kc1VLuFrWUPz0rZXA2E_rKQqdK7kV2iH?usp=sharing
To do/questions
* SegformerDecodeHead, rather than SegformerDecoder. It's more of a lightweight head than a decoder. Is this ok?
* image_utils.py), cc @sgugger. Currently, I rely on torch.nn.functional.pad, which makes the feature extractor depend on PyTorch. It could also make sense to do it in NumPy (this model, for example, pads after normalizing, so it would benefit from it, as the outputs after normalization are NumPy arrays).
* SequenceClassifierOutput; however, this will show the wrong logits shape in the docs. The logits are actually of shape (batch_size, num_labels, height/4, width/4).
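Because the logits come out at 1/4 of the input resolution, turning them into a per-pixel label map requires upsampling first. One common approach is to bilinearly interpolate the logits (for example with `torch.nn.functional.interpolate`) and then take an argmax over the label dimension. The hypothetical helper below does the same in NumPy with nearest-neighbour upsampling, assuming the image height and width are exact multiples of the logits' spatial size.

```python
import numpy as np


def logits_to_segmentation(logits, image_height, image_width):
    """Turn (batch_size, num_labels, height/4, width/4) logits into (batch_size, height, width) label maps."""
    _, _, h, w = logits.shape
    # Nearest-neighbour upsampling back to the input resolution (integer factors assumed)
    upsampled = logits.repeat(image_height // h, axis=2).repeat(image_width // w, axis=3)
    # Per-pixel argmax over the label dimension
    return upsampled.argmax(axis=1)
```

Upsampling the logits before the argmax, rather than upsampling the label map afterwards, keeps class boundaries smoother when bilinear interpolation is used.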