
Add SegFormer #14019

Merged (32 commits into huggingface:master, Oct 28, 2021)

Conversation

@NielsRogge (Contributor) commented on Oct 15, 2021:

What does this PR do?

This PR adds SegFormer, a new model by NVIDIA that is surprisingly simple yet very powerful for semantic segmentation of images. It uses a hierarchical Transformer as its backbone and an all-MLP decode head. I've implemented three model classes:

  • SegformerModel (backbone-only)
  • SegformerForImageClassification (backbone + classifier head)
  • SegformerForSemanticSegmentation (backbone + semantic segmentation all-MLP head)

Models are on the hub (with approval from the author): https://huggingface.co/models?other=segformer
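
For reference, here's a minimal sketch of the image-classification variant. The checkpoint name is an assumption for illustration only (any SegFormer encoder checkpoint with a classification head on the hub should work the same way):

from transformers import SegformerFeatureExtractor, SegformerForImageClassification
from PIL import Image

feature_extractor = SegformerFeatureExtractor(do_random_crop=False)
# "nvidia/mit-b0" is an assumed checkpoint name for this sketch
model = SegformerForImageClassification.from_pretrained("nvidia/mit-b0")

image = Image.open("...")

# prepare image for model
pixel_values = feature_extractor(image, return_tensors="pt").pixel_values

# forward pass; logits are of shape (batch_size, num_labels)
outputs = model(pixel_values)
predicted_class_idx = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_class_idx])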

Here's how to use the semantic segmentation model:

from transformers import SegformerFeatureExtractor, SegformerForSemanticSegmentation
from PIL import Image

feature_extractor = SegformerFeatureExtractor(do_random_crop=False)
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/segformer-b0-finetuned-ade-512-512")

image = Image.open("...")

# prepare image for model
pixel_values = feature_extractor(image, return_tensors="pt").pixel_values

# forward pass
outputs = model(pixel_values)

# logits are of shape (batch_size, num_labels, height/4, width/4)
logits = outputs.logits
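
Since the logits come out at 1/4 of the input resolution, here's a minimal sketch (plain PyTorch, not part of this PR's API) of how one could upsample them and take the argmax to get a per-pixel segmentation map:

import torch

# upsample logits to the original image size (PIL's image.size is (width, height))
upsampled_logits = torch.nn.functional.interpolate(
    logits, size=image.size[::-1], mode="bilinear", align_corners=False
)

# per-pixel class indices of shape (batch_size, height, width)
segmentation_map = upsampled_logits.argmax(dim=1)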

Quick inference notebook with visualization: https://colab.research.google.com/drive/1Kc1VLuFrWUPz0rZXA2E_rKQqdK7kV2iH?usp=sharing

To do/questions

  • Decide on the default values of the feature extractor (which are kind of arbitrary right now)
  • I've called the decode head SegformerDecodeHead rather than SegformerDecoder, since it's more of a lightweight head than a decoder. Is this OK?
  • Add padding of images + segmentation maps (probably a single function in image_utils.py), cc @sgugger. Currently I rely on torch.nn.functional.pad, which makes the feature extractor depend on PyTorch. It could also make sense to do it in NumPy (this model, for example, pads after normalizing, so it would benefit, as the outputs after normalization are NumPy arrays).
  • Make sure model doesn't return hidden states when the user doesn't want to
  • The model currently returns a SequenceClassifierOutput; this renders the wrong logit shapes in the docs. The logits are actually of shape (batch_size, num_labels, height/4, width/4); a dedicated output class could document this (see the sketch after this list).
  • Add model cards (author has joined the NVIDIA org on the hub and might create these)
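
As a rough illustration of the SequenceClassifierOutput point above, a dedicated output class could document the logit shape directly. The class name below is hypothetical and the import path is assumed; this is only a sketch, not what the PR necessarily ends up with:

from dataclasses import dataclass
from typing import Optional, Tuple

import torch
from transformers.file_utils import ModelOutput


@dataclass
class SegformerSemanticSegmentationOutput(ModelOutput):  # hypothetical name
    """
    Output of the semantic segmentation head; ``logits`` are of shape
    (batch_size, num_labels, height/4, width/4).
    """

    loss: Optional[torch.FloatTensor] = None
    logits: torch.FloatTensor = None
    hidden_states: Optional[Tuple[torch.FloatTensor]] = None
    attentions: Optional[Tuple[torch.FloatTensor]] = None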

@NielsRogge (Contributor, Author):

The PR is ready for review; the only thing left to add is padding.

@sgugger (Collaborator) left a comment:

Mostly left nits, this is very clean! Great new addition!

Resolved review threads: README.md, src/transformers/image_utils.py

def pad_images(self, images):
"""Pad images to ``self.crop_size``."""
padded_images = nn.functional.pad(images, pad=self.crop_size, value=self.padding_value)

Collaborator:

I think for this, it would be useful to have our own pad function that uses PyTorch if images is a torch Tensor and NumPy if images is a NumPy array. Wdyt?

Contributor (Author):

Yes, if you could implement that, that would be great :)

Collaborator:

Happy to, but I won't have time to do this this week (and I'm not sure about next week either).

Member:

There's numpy.pad (https://numpy.org/doc/stable/reference/generated/numpy.pad.html), which would make this implementation quite simple. Can you give it a try, @NielsRogge?
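
Something along these lines, perhaps; a minimal sketch only (the function name, the (C, H, W) layout, and the bottom/right padding convention are assumptions, not what image_utils.py will necessarily end up with):

import numpy as np


def pad(image, target_size, pad_value=0):
    """Pad a (C, H, W) image on the bottom/right up to target_size = (height, width)."""
    target_height, target_width = target_size
    pad_bottom = target_height - image.shape[-2]
    pad_right = target_width - image.shape[-1]
    if isinstance(image, np.ndarray):
        # NumPy path: pad_width is a (before, after) pair per dimension
        pad_width = [(0, 0)] * (image.ndim - 2) + [(0, pad_bottom), (0, pad_right)]
        return np.pad(image, pad_width, mode="constant", constant_values=pad_value)
    # otherwise assume a torch.Tensor; import lazily so the module doesn't hard-depend on PyTorch
    import torch

    # torch's pad spec covers the last dimensions first: (left, right, top, bottom)
    return torch.nn.functional.pad(image, (0, pad_right, 0, pad_bottom), value=pad_value)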

Resolved review threads: src/transformers/models/segformer/modeling_segformer.py (4 threads), tests/test_modeling_segformer.py

@LysandreJik (Member) left a comment:

Thank you for working on this Niels! You'll have to rebase on the current master branch and re-run make fix-copies to ensure that the Korean readme also gets updated.

This looks good to me, I have only left nits and one request regarding the padding method.


def forward(self, hidden_states, height, width, output_attentions=False):
self_attention_outputs = self.attention(
self.layer_norm_1(hidden_states), # in Segformer, layernorm is applied before self-attention

Member:

Can this be done outside of the call for readability?
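
i.e. something like this (just a sketch of the suggestion; the remaining call arguments are assumed to mirror the quoted snippet):

def forward(self, hidden_states, height, width, output_attentions=False):
    # in SegFormer, layernorm is applied before self-attention (pre-norm);
    # hoist it out of the attention call for readability
    normalized_hidden_states = self.layer_norm_1(hidden_states)
    self_attention_outputs = self.attention(
        normalized_hidden_states, height, width, output_attentions=output_attentions
    )
    ...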

Resolved review threads: src/transformers/models/segformer/modeling_segformer.py (2 threads)

@LysandreJik merged commit 1dc96a7 into huggingface:master on Oct 28, 2021
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request on Jan 27, 2022:
* First draft

* Make style & quality

* Improve conversion script

* Add print statement to see actual slice

* Make absolute tolerance smaller

* Fix image classification models

* Add post_process_semantic method

* Disable padding

* Improve conversion script

* Rename to ForSemanticSegmentation, add integration test, remove post_process methods

* Improve docs

* Fix code quality

* Fix feature extractor tests

* Fix tests for image classification model

* Delete file

* Add is_torch_available to feature extractor

* Improve documentation of feature extractor methods

* Apply suggestions from @sgugger's code review

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Apply some more suggestions of code review

* Rebase with master

* Fix rebase issues

* Make sure model only outputs hidden states when the user wants to

* Apply suggestions from code review

* Add pad method

* Support padding of 2d images

* Add print statement

* Add print statement

* Move padding method to SegformerFeatureExtractor

* Fix issue

* Add casting of segmentation maps

* Add test for padding

* Add small note about padding

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>