
Bugfix for MNASNet #1224

Merged (44 commits) on Sep 23, 2019
Conversation

1e100 (Contributor) commented Aug 10, 2019

The original implementation I submitted contained a bug which affects all MNASNet variants other than 1.0. The bug is that the first few layers (the stem) also need to be scaled by the width multiplier, along with the rest of the network. This fixes the issue and brings the implementation fully in sync with Google's TPU reference code. I have compared the ONNX dump of this model against TFLite's hosted model and ensured that all layer configurations line up exactly.
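
To illustrate the fix, here is a minimal sketch of how the stem channel counts depend on the width multiplier (the rounding helper and the base channel counts of 32 and 16 are simplified stand-ins, not the exact torchvision code):

def _round_to_multiple_of(val, divisor=8):
    # Round channel counts to a multiple of 8, as the TF reference code does.
    return max(divisor, int(val + divisor / 2) // divisor * divisor)

def stem_channels(alpha):
    # v1 (buggy): the stem always used the base 32 and 16 channels.
    # v2 (fixed): the stem scales with the width multiplier like every other layer.
    return _round_to_multiple_of(32 * alpha), _round_to_multiple_of(16 * alpha)

# For alpha=0.5 the fixed stem uses 16 and 8 channels instead of 32 and 16.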

Because only the MNASNet 0.5 checkpoint was affected, I have also trained a slightly better checkpoint for it. I was unable to train this to the same accuracy with Torchvision's reference training code (and not for lack of trying), and had to use label smoothing and EMA to get this result. The final checkpoint is derived from EMA.

Even so, the accuracy is a bit lower than Google's result (67.83 for this model vs 68.03 for TFLite). The posted number for the TF TPU implementation is 68.9, but that model uses SE on some of its layers, which my implementation does not.

1e100 (Contributor, Author) commented Aug 12, 2019

Sure, we can make it backward compatible. Of the two proposed solutions I like solution 1 better (fewer moving parts), but I'll implement whatever we decide here.

Maybe I should take this opportunity and add squeeze and excitation as well. The authors use it in some of the larger variants of the model. Accuracy will go up a bit if I do that, and then we won't have to version it again in the future.

Is there a release schedule BTW? I wanted this fix to be ready before the release (models unfortunately take forever to train), and noticed the release had been cut a few days ago, so the PR didn't make it.

fmassa (Member) commented Aug 28, 2019

Hi @1e100 ,

I just got back from holidays, sorry for the delay in replying.

> Sure, we can make it backward compatible. Of the two proposed solutions I like solution 1 better (fewer moving parts), but I'll implement whatever we decide here.

I'm not yet clear on the best solution. I feel that this is something that needs to be carefully considered, because model versioning is going to be a big topic.

> Maybe I should take this opportunity and add squeeze and excitation as well.

I feel that this should be sent in a separate PR.

> Is there a release schedule BTW?

We will be cutting a new release of torchvision in the next 2-3 weeks, with minor fixes and improvements.

I'm also tagging @ailzhang for handling BC-breaking changes within hub, and @cpuhrsch @vincentqb and @zhangguanheng66 for torchaudio and torchtext model versioning in the future.

soumith (Member) commented Aug 28, 2019

Using a _version counter like BatchNorm does makes a lot of sense to me.

zhangguanheng66 (Contributor) commented Aug 28, 2019

I had to handle BC breakage for nn.MultiheadAttention. To extend the capability of the module, I added four extra attributes. We used the hasattr function to check for the existence of an attribute, and it seems fine; users don't have to do anything. A _version counter is another way, and more common, I believe.

For the second option, we have to instruct users to use _load_from_state_dict, which is not hard but needs to be made clear in the docs.
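
A minimal sketch of the hasattr approach (the attribute name and the two code paths are illustrative, not the actual nn.MultiheadAttention internals):

import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        # Attribute introduced in a newer release; module instances serialized
        # with an older release won't have it.
        self.uses_new_path = True

    def forward(self, x):
        # hasattr keeps old serialized modules working with no user action.
        if not hasattr(self, "uses_new_path") or not self.uses_new_path:
            return x        # legacy code path (placeholder)
        return x * 1.0      # new code path (placeholder)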

1e100 (Contributor, Author) commented Sep 7, 2019

OK, had some time today to look into this, aiming to get it done over the weekend.

Basically, it seems that the following simple logic would satisfy the backward compat requirements:

  1. Add a _version = 2 class attribute.
  2. When the model is loaded via _load_from_state_dict(), if the state dict's metadata contains version 1 (the default), modify the network definition so that the stem is configured as it was in the initial version, perhaps show a warning as nn.MultiheadAttention does, and then set MNASNet._version = 1, turning the model into a de facto previous-version model. If version 2 is found, do nothing and simply load the new checkpoint.

Seems like a pretty straightforward fix to me.
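
A minimal sketch of that logic (simplified; the _load_from_state_dict hook and the "version" metadata key are PyTorch's, but the stem-rebuilding helper is hypothetical and this is not the exact code that ended up in torchvision):

import warnings
import torch.nn as nn

class MNASNet(nn.Module):
    # Version 2 corresponds to the fixed, alpha-scaled stem.
    _version = 2

    def __init__(self, alpha):
        super().__init__()
        self.alpha = alpha
        # ... full architecture omitted in this sketch ...

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        version = local_metadata.get("version", None)
        if version == 1:
            # Reconfigure the stem exactly as the initial release defined it so
            # that v1 checkpoints still load; the instance then behaves (and is
            # re-saved) as a de facto v1 model.
            self._rebuild_v1_stem()  # hypothetical helper, not in torchvision
            self._version = 1
            warnings.warn(
                "Loaded a v1 MNASNet checkpoint into the current implementation; "
                "the model was reconfigured to match the old definition.",
                UserWarning)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)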

1e100 (Contributor, Author) commented Sep 7, 2019

OK, @fmassa, here's the first cut of the requested changes. CI seems to be failing on something CUDA-related. Let me know if this is what you had in mind!

fmassa (Member) left a comment

This is looking very good, I like it!

I have a few more comments, let me know what you think.

Also, I am thinking that we might still need to add an option somewhere (maybe in the mnasnet_0_5 function) that initially raises a warning if the user doesn't pass an argument, saying that the default behavior will change in a new version, so that we don't break BC right away for users.
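
A minimal sketch of what such a warning could look like (the wrapper and the scaled_stem parameter are illustrative assumptions, not what was ultimately merged):

import warnings
from torchvision.models.mnasnet import MNASNet

def mnasnet0_5(pretrained=False, scaled_stem=None, **kwargs):
    # Hypothetical keyword argument: if the caller does not choose explicitly,
    # warn that the default stem configuration will change in a future release.
    if scaled_stem is None:
        warnings.warn(
            "The default MNASNet 0.5 stem configuration will change in a "
            "future release to match the reference implementation; pass "
            "scaled_stem explicitly to silence this warning.", FutureWarning)
    model = MNASNet(0.5, **kwargs)
    # (pretrained-weight loading omitted from this sketch)
    return model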

torchvision/models/mnasnet.py (excerpt under review):

    self.layers[idx] = layer

    # The model is now identical to v1, and must be saved as such.
    MNASNet._version = 1
A Member commented:

This modifies all instances of MNASNet, not just the one being called. This could have some unexpected effects; maybe you meant to do something like self._version instead?

1e100 (Contributor, Author) replied on Sep 14, 2019:

D'oh! You're right. Changed, and verified it works with this code:

#!/usr/bin/env python3

import torch
import torchvision

# NOTE: v1 checkpoint
ckpt = torch.load("mnasnet0.5_top1_67.592-7c6cb539b9.pth")
m = torchvision.models.MNASNet(0.5)
m.load_state_dict(ckpt)
print("Loaded old")
torch.save(m.state_dict(), "resaved.pth")
print("Re-saved")
ckpt = torch.load("resaved.pth")
m = torchvision.models.MNASNet(0.5)
m.load_state_dict(ckpt)
print("Re-loaded")

@@ -139,16 +149,58 @@ def _initialize_weights(self):
                 nn.init.ones_(m.weight)
                 nn.init.zeros_(m.bias)
             elif isinstance(m, nn.Linear):
-                nn.init.normal_(m.weight, 0.01)
+                nn.init.kaiming_uniform_(m.weight, mode="fan_out",
A Member commented:
This changes the initialization scheme; does it yield better performance?

1e100 (Contributor, Author) replied:

It may have very slightly improved the top-1 of the MNASNet-B1 0.5 that I trained for this PR, but I'm not sure. The purpose of the change is that initialization is now identical to the reference TensorFlow code (which also uses a variance-scaling initializer, a.k.a. Kaiming uniform). It is certainly not worse than before.

Two further review threads on torchvision/models/mnasnet.py were resolved.
1e100 (Contributor, Author) commented Sep 14, 2019

@fmassa I've addressed your feedback, PTAL

fmassa (Member) commented Sep 19, 2019

Sorry for the delay in replying; I've made a few more comments.

Remove unused member var as per review.

1e100 (Contributor, Author) left a comment

I've addressed the feedback, PTAL

fmassa (Member) left a comment

Thanks a lot!

I'll upload the weights and update the PR

fmassa (Member) commented Sep 20, 2019

@1e100 I couldn't push the updated path without a force push on your master branch.

Can you update the link in the PR to

https://download.pytorch.org/models/mnasnet0.5_top1_67.823-3ffadce67e.pth

And let me know?

1e100 (Contributor, Author) commented Sep 20, 2019

@fmassa done!

fmassa merged commit 367e851 into pytorch:master on Sep 23, 2019

fmassa (Member) commented Sep 23, 2019

Thanks a lot!
