
Add Apple's MobileOne encoder #693

Merged
merged 10 commits into from Dec 15, 2022

Conversation

kevinpl07 (Contributor)

Hello,

I added support for Apple's MobileOne encoder.

Paper: Link

There were very few changes I had to make relative to their official GitHub repo: Link

It works with all decoders and shows impressive inference times for 256×256 images:

Encoder-decoder combination, inference time in vanilla torch (seconds):
mobileone_s1_pspnet_256 0.0313718318939209
mobileone_s0_pan_256 0.03421592712402344
mobileone_s2_pspnet_256 0.036206960678100586
mobileone_s3_pspnet_256 0.04711484909057617
mobileone_s1_pan_256 0.05329489707946777
mobileone_s0_linknet_256 0.05789995193481445
mobileone_s0_deeplabv3plus_256 0.058853864669799805
mobileone_s0_fpn_256 0.07664108276367188
mobileone_s4_pspnet_256 0.0768282413482666
mobileone_s1_deeplabv3plus_256 0.07886672019958496
mobileone_s2_pan_256 0.07946181297302246
mobileone_s3_pan_256 0.09101414680480957
mobileone_s1_fpn_256 0.09615683555603027
mobileone_s1_linknet_256 0.09956574440002441
mobileone_s2_fpn_256 0.11291790008544922
mobileone_s0_unet_256 0.11676502227783203
mobileone_s2_linknet_256 0.12518310546875
mobileone_s3_deeplabv3plus_256 0.12642478942871094
mobileone_s2_deeplabv3plus_256 0.1289658546447754
mobileone_s3_fpn_256 0.1370537281036377
mobileone_s4_pan_256 0.14015984535217285
mobileone_s1_unet_256 0.15249204635620117
mobileone_s3_linknet_256 0.15824413299560547
mobileone_s4_deeplabv3plus_256 0.16476082801818848
mobileone_s0_manet_256 0.17203474044799805
mobileone_s2_unet_256 0.17334604263305664
mobileone_s4_fpn_256 0.182358980178833
mobileone_s3_unet_256 0.20330286026000977
mobileone_s4_linknet_256 0.21462082862854004
mobileone_s0_deeplabv3_256 0.22992897033691406
mobileone_s4_unet_256 0.24337363243103027
mobileone_s0_unetplusplus_256 0.29451799392700195
mobileone_s1_deeplabv3_256 0.31217503547668457
mobileone_s1_manet_256 0.3140380382537842
mobileone_s1_unetplusplus_256 0.5090749263763428
mobileone_s2_deeplabv3_256 0.5372707843780518
mobileone_s3_deeplabv3_256 0.5489542484283447
mobileone_s2_unetplusplus_256 0.5728631019592285
mobileone_s4_deeplabv3_256 0.638185977935791
mobileone_s2_manet_256 0.6446411609649658
mobileone_s3_manet_256 0.6838269233703613
mobileone_s3_unetplusplus_256 0.6991360187530518
mobileone_s4_manet_256 0.748121976852417
mobileone_s4_unetplusplus_256 0.9898359775543213
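For reference, numbers like these can be produced with a simple wall-clock harness along the following lines (a minimal sketch; the actual benchmark script is not included in this PR, and `time_inference` is a hypothetical helper):

```python
import time

def time_inference(run_once, warmup=2, runs=10):
    """Average wall-clock seconds per call, after a few warmup calls."""
    for _ in range(warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(runs):
        run_once()
    return (time.perf_counter() - start) / runs
```

With torch, `run_once` could be something like `lambda: model(torch.rand(1, 3, 256, 256))` wrapped in `torch.no_grad()`; on GPU you would also need to synchronize the device before reading the clock.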

@qubvel (Owner)

qubvel commented Dec 8, 2022

Hi, thanks for your work and contribution!
Could you please correct the code formatting and add information about the encoder to the docs?

@kevinpl07 (Contributor, Author)

> Hi, thanks for your work and contribution! Could you please correct the code formatting and add information about the encoder to the docs?

Done :)

@JulienMaille (Contributor)

Thanks for your contribution! I tried it on my side and could not make it work when the input images have only one channel (greyscale images). Is that a known limitation?

@kevinpl07 (Contributor, Author)

I honestly didn't check that. Let me investigate.

@kevinpl07 (Contributor, Author)

> Thanks for your contribution! I tried it on my side and could not make it work when the input images have only one channel (greyscale images). Is that a known limitation?

I looked into the grayscale limitation and could not make it work without more drastic changes to Apple's code (beyond passing "in_channel" through all the init functions).
I added the limitation to the README file.
I or someone else can revisit this; I don't think the architecture itself limits the channels, it just needs more thought put into it.
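For context, the usual trick for adapting pretrained 3-channel first-conv weights to a single channel is to sum the kernel over the input-channel axis, which is roughly what smp's `patch_first_conv` does for 1-channel inputs. A dependency-free sketch of that reduction (`collapse_rgb_kernel` is a hypothetical name, and it operates on nested lists rather than tensors):

```python
def collapse_rgb_kernel(kernel):
    """Reduce a [out][3][k][k] kernel to [out][1][k][k] by summing
    the pretrained RGB weights into a single input channel."""
    out = []
    for filt in kernel:  # one [3][k][k] filter per output channel
        r, g, b = filt
        summed = [
            [r[i][j] + g[i][j] + b[i][j] for j in range(len(r[i]))]
            for i in range(len(r))
        ]
        out.append([summed])  # keep the (now size-1) input-channel axis
    return out
```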

@JulienMaille (Contributor)

I can try to give it a look; if you already have advice to share, it might help.

@JulienMaille (Contributor)

JulienMaille commented Dec 9, 2022

The model must be initialized with 3 channels so that the pretrained weights can be loaded.
Then utils.patch_first_conv is in charge of updating the first convolution to the desired number of channels (1 in my case).
It loops through blocks and patches the first Conv2d it finds; however, in our case it seems we at least need to patch the stage0 rbr_conv.

@kevinpl07 (Contributor, Author)

> The model must be initialized with 3 channels so that the pretrained weights can be loaded. Then utils.patch_first_conv is in charge of updating the first convolution to the desired number of channels (1 in my case). It loops through blocks and patches the first Conv2d it finds; however, in our case it seems we at least need to patch the stage0 rbr_conv.

Feel free to send me a snippet or create a PR if it works! 👍

@JulienMaille (Contributor)

@kevinpl07 what's the purpose of reparameterize?

@JulienMaille (Contributor)

JulienMaille commented Dec 9, 2022

Monkey patching like this seems to do the trick:

    from . import _utils as utils

    def set_in_channels(self, in_channels, pretrained=True):
        """Change first convolution channels."""
        if in_channels == 3:
            return  # weights were trained on 3 channels; nothing to patch

        self._in_channels = in_channels
        self._out_channels = tuple([in_channels] + list(self._out_channels)[1:])
        # Patch both stage0 branches, not just the first Conv2d found
        utils.patch_first_conv(model=self.stage0.rbr_conv, new_in_channels=in_channels, pretrained=pretrained)
        utils.patch_first_conv(model=self.stage0.rbr_scale, new_in_channels=in_channels, pretrained=pretrained)
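For completeness, the "monkey patching" part means assigning the replacement method onto the encoder class at runtime. A self-contained illustration with a dummy class (`DummyEncoder` is a stand-in for the real MobileOne encoder class in segmentation_models.pytorch):

```python
class DummyEncoder:
    """Stand-in for the real encoder class."""
    def set_in_channels(self, in_channels, pretrained=True):
        raise NotImplementedError("stock version patches the wrong conv")

def patched_set_in_channels(self, in_channels, pretrained=True):
    # The real replacement would also call patch_first_conv on both
    # stage0 branches, as in the snippet above.
    self._in_channels = in_channels

# Assign onto the class: existing and future instances use the new method.
DummyEncoder.set_in_channels = patched_set_in_channels
```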

@kevinpl07 (Contributor, Author)

> @kevinpl07 what's the purpose of reparameterize?

Essentially, the multi-branch structure is beneficial for training but has drawbacks during inference. The reparameterize function takes the model after training and converts it to a plain, CNN-like structure for inference. It can be called on the complete segmentation model because it checks whether individual components have a reparameterize function.
See Apple's official repo for more info.
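The dispatch described above can be sketched as follows (a simplified, torch-free stand-in; the actual branch-fusion math lives in Apple's ml-mobileone code and operates on torch modules):

```python
class MultiBranchBlock:
    """Stand-in for a MobileOne block: parallel branches at train time."""
    def __init__(self):
        self.inference_mode = False

    def reparameterize(self):
        # In the real model this folds the parallel conv/BN branches
        # into a single conv; here we just record the switch.
        self.inference_mode = True

class DecoderBlock:
    """Stand-in for a plain decoder component with no branches to fold."""

def reparameterize_model(modules):
    # Walk every component and reparameterize only those that support it,
    # so the pass is safe to call on the full segmentation model.
    for m in modules:
        if hasattr(m, "reparameterize"):
            m.reparameterize()
    return modules
```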

@JulienMaille (Contributor)

I'm surprised by the size of the model. I'm used to working with unet-resnet18 (depth 4), and unet-mobileone_s2 (depth 4) is still bigger (23 MB vs 14 MB).

@kevinpl07 (Contributor, Author)

kevinpl07 commented Dec 12, 2022

> I'm surprised by the size of the model. I'm used to working with unet-resnet18 (depth 4), and unet-mobileone_s2 (depth 4) is still bigger (23 MB vs 14 MB).

I know what you mean. The thing is that, at the end of the day, they only optimized for classification inference time on an iPhone 12.
From their paper:
[screenshot of a results table from the paper]
Further they state:

> [...] For example, MobileOne-S1 has 4.8M parameters and incurs a latency of 0.89ms, while MobileNet-V2 [2] has 3.4M (29.2% less than MobileOne-S1) parameters and incurs a latency of 0.98ms. At this operating point, MobileOne attains 3.9% better top-1 accuracy than MobileNet-V2.

So essentially:

  • It's optimized for inference speed over parameter count.
  • It might not even be a good backbone for segmentation (this PR lets people try it; no other repo offers MobileOne for segmentation).
  • We need experiments :)

Hope I could help a bit.

@JulienMaille (Contributor)

I get you, but on paper resnet18 (11M) has more parameters than mobileone_s0/1/2/3.
It seems to perform well on segmentation, but I've only scratched the surface.

@kevinpl07 (Contributor, Author)

@JulienMaille Can you check whether my last commit matches your suggestion?

@qubvel can you trigger the workflow again, once Julien approves?

@JulienMaille (Contributor)

Looks good to me

    mod_list.add_module("bn", nn.BatchNorm2d(num_features=self.out_channels))
    return mod_list

    def set_in_channels(self, in_channels, pretrained=True):
@qubvel (Owner)

probably we should move it to the MobileOne class?

@kevinpl07 (Contributor, Author)

Correct, my mistake -> Done.

@qubvel (Owner)

qubvel commented Dec 14, 2022

@kevinpl07 could you please also add information about the new encoders to the docs here:
https://github.com/qubvel/segmentation_models.pytorch/blob/master/docs/encoders.rst

@kevinpl07 (Contributor, Author)

> @kevinpl07 could you please also add information about the new encoders to the docs here: https://github.com/qubvel/segmentation_models.pytorch/blob/master/docs/encoders.rst

Done as well :)

@qubvel qubvel merged commit c2fce7b into qubvel:master Dec 15, 2022
@qubvel (Owner)

qubvel commented Dec 15, 2022

Thanks a lot, merged!

@JulienMaille (Contributor)

@kevinpl07 I gave it a try; IoU is great, but inference time on CUDA is not optimized (tried with OpenCV with the CUDA backend).
