
Separating MBConvBlock, FusedMBConvBlock and Refactoring EfficientNetV2 #1146

Merged
20 commits merged into keras-team:master on Dec 21, 2022

Conversation

@DavidLandup0 (Contributor) commented Dec 15, 2022

What does this PR do?

There's a design issue that's not aligned with KCV's principles. MBConv blocks, as well as Fused MBConv blocks, are general-purpose blocks, introduced with MobileNets and then re-used in practically all mobile-oriented/efficiency-focused architectures since. Currently, we only have them as part of EfficientNets, where they exist only as internal methods.

If we don't separate them, we'll accumulate tech debt that will only get harder to recover from down the line.

Since MaxViT (currently WIP), MobileNets, and various other future architectures will use them (and we'd want to expose these to users as common building blocks), I propose that we separate them into standalone layers, which is what this PR does.

I've refactored EfficientNetV2 to use the new layers instead of methods, and the param counts are identical, as validation that they behave the same. One issue to figure out:

  • Original effnets list each layer within each MBConvBlock
  • New MBConvBlock layer encapsulates them within a single MBConvBlock

Because of this, the total number of layers differs (151 vs 26). We can:

  1. Expose all layers within the MBConvBlock and FusedMBConvBlock and add them to the EffNet while it's being built (so as to be able to load weights as they are right now)
  2. Keep layers encapsulated and port our own weights (or re-train effnets)

Here's the difference:

input_3 (InputLayer)        [(None, 224, 224, 3)]     0         
                                                                 
 rescaling_2 (Rescaling)     (None, 224, 224, 3)       0         
                                                                 
 stem_conv (Conv2D)          (None, 112, 112, 32)      864       
                                                                 
 stem_bn (BatchNormalization  (None, 112, 112, 32)     128       
 )                                                               
                                                                 
 stem_activation (Activation  (None, 112, 112, 32)     0         
 )                                                               
                                                                 
 block1a_ (FusedMBConvBlock)  (None, 112, 112, 16)     576       
                                                                 
 block2a_ (FusedMBConvBlock)  (None, 112, 112, 32)     3456      
...

VS:

input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 rescaling (Rescaling)          (None, 224, 224, 3)  0           ['input_1[0][0]']                
                                                                                                  
 stem_conv (Conv2D)             (None, 112, 112, 32  864         ['rescaling[0][0]']              
                                )                                                                 
                                                                                                  
 stem_bn (BatchNormalization)   (None, 112, 112, 32  128         ['stem_conv[0][0]']              
                                )                                                                 
                                                                                                  
 stem_activation (Activation)   (None, 112, 112, 32  0           ['stem_bn[0][0]']                
                                )                                                                 
# NOW PART OF FusedMBConvBlock                                      
 block1a_project_conv (Conv2D)  (None, 112, 112, 16  4608        ['stem_activation[0][0]']        
                                )                                                                 
                                                                                                  
 block1a_project_bn (BatchNorma  (None, 112, 112, 16  64         ['block1a_project_conv[0][0]']   
 lization)                      )                                                                 
                                                                                                  
 block1a_project_activation (Ac  (None, 112, 112, 16  0          ['block1a_project_bn[0][0]']    
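
To illustrate what exposing these as standalone layers could look like, here's a hypothetical usage sketch. The constructor arguments (input_filters, output_filters, expand_ratio, se_ratio) are assumptions based on the EfficientNetV2 block configuration, not necessarily the finalized public API:

# Hypothetical usage sketch of the standalone blocks once they're exposed as public layers.
import tensorflow as tf
from keras_cv.layers import FusedMBConvBlock, MBConvBlock

inputs = tf.keras.layers.Input(shape=(112, 112, 32))

# Fused block: the 1x1 expansion and depthwise conv are fused into one 3x3 conv.
x = FusedMBConvBlock(input_filters=32, output_filters=16)(inputs)

# Regular MBConv block: 1x1 expansion -> depthwise conv -> SE -> 1x1 projection.
x = MBConvBlock(input_filters=16, output_filters=24, expand_ratio=4, se_ratio=0.25)(x)

model = tf.keras.Model(inputs, x)
model.summary()  # each block appears as a single layer, as in the summary above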

@LukeWood @tanzhenyu @ianstenbit @bhack

@DavidLandup0 mentioned this pull request on Dec 15, 2022
@tanzhenyu (Contributor) left a comment

Thanks for the PR!
Now that they will become public API, please add:

  1. unit test files, regular case + corner case
  2. serialization test under serialization_test.py
  3. better documentation for each layer, i.e., what the paper reference is, what it does, what input/output shapes it requires, and what the Python example usage is

@DavidLandup0 (Contributor, Author)

Deal! Wanted to check whether this was an okay change to begin with :)
Docs and tests coming up tomorrow

@tanzhenyu (Contributor) commented Dec 15, 2022

We can still easily remap the layers' weights when the total number of layer weights changes, I guess?
i.e., new_model.new_layer.mb_conv_1.set_weights(original_model.mb_conv_1.get_weights())
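
For illustration, here's a rough sketch of what such a remapping helper could look like. The helper and the layer/block names are hypothetical, made up for this example; it also assumes the concatenated old per-layer weight order matches the order in which the new block builds its variables.

# Hypothetical remapping sketch: gather the weights of the individual old layers
# and load them into the single new block that now wraps them.
def remap_block_weights(old_model, new_model, old_layer_names, new_block_name):
    old_weights = []
    for name in old_layer_names:
        old_weights.extend(old_model.get_layer(name).get_weights())
    # Assumes the collected order matches the new block's variable order.
    new_model.get_layer(new_block_name).set_weights(old_weights)

# Example usage (names are illustrative, based on the summaries above):
# remap_block_weights(
#     old_effnet, new_effnet,
#     ["block1a_project_conv", "block1a_project_bn"],
#     "block1a_",
# )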

@DavidLandup0 (Contributor, Author) commented Dec 15, 2022

Yeah, I think that's the best approach (what I meant by porting our own weights). No need to re-train since it's identical, just wrapped. Has a nicer summary call now too :)

@DavidLandup0 (Contributor, Author)

Should I do the remapping or should someone here do it instead, since you can directly upload the new weights and verify on the same eval set as before?
I'd have to test on V2 and open a PR which unnecessarily slows the process down

@ianstenbit (Contributor)

Should I do the remapping or should someone here do it instead, since you can directly upload the new weights and verify on the same eval set as before? I'd have to test on V2 and open a PR which unnecessarily slows the process down

I will re-map the weights today and upload the new ones to GCS

@ianstenbit (Contributor) commented Dec 15, 2022

I've re-mapped the weights to the new implementation (with changes I made to your PR based on my comments here). You can see my updated version of your new MBConvBlock + FusedMBConvBlock layers here: https://github.com/ianstenbit/keras-cv/tree/mbconv

Feel free to copy those changes in. The only change I commented here and didn't update in my branch is the unused depthwise conv filter in the Fused block.

My script to convert the weights can be found here, and I've uploaded the weights to GCS. Once this PR is merged, I will update our pre-trained weights config to point to the converted weights.

ianstenbit added a commit to ianstenbit/keras-cv that referenced this pull request Dec 16, 2022
@ianstenbit (Contributor)

PR to update our pretrained weights after this is merged: #1151

@DavidLandup0 (Contributor, Author)

Thanks for the changes and porting, Ian!
My bad for not catching them myself. I didn't want to fiddle with the arguments since they come from the current EffNetV2s, so I assumed a priori that they were okay.

I'll copy the changes, attribute them to you, and update this PR. Excited to see these disentangled, making EffNets more modular! We should do a sweep like this from time to time to reassess whether architectures are modular enough. :)

@tanzhenyu (Contributor)

All tests pass locally and the PR doesn't change anything regarding CutMix and BaseAugmenter. Could you test this locally as well @tanzhenyu?

I don't think this PR breaks any CutMix-related test; is there any way you can re-sync and run it again?

@DavidLandup0 (Contributor, Author)

Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why 🤔

@tanzhenyu (Contributor)

Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why 🤔

Any luck with the new PR #1168?

@DavidLandup0 (Contributor, Author)

Nope, same issue...
Any ideas on why this might be happening?

@bhack (Contributor) commented Dec 20, 2022

Merged the latest changes, tried a new PR, ran tests locally again - the CI is still showing failed tests on some preprocessing layers. No clue why 🤔

@DavidLandup0 You can reproduce the same issue/failures locally with the full test sequence:

pytest keras_cv/ --ignore keras_cv/models --ignore keras_cv/datasets/waymo

@DavidLandup0 (Contributor, Author) commented Dec 20, 2022

@DavidLandup0 You can reproduce the same issue/failures locally with the full test sequence:

pytest keras_cv/ --ignore keras_cv/models --ignore keras_cv/datasets/waymo

Hmm, the individual test file passes:

keras_cv/layers/preprocessing/base_image_augmentation_layer_test.py .............s

But it fails when run with the rest of the tests. I'll create a fresh branch and make the changes there...

Edit: I identified which files cause it to fail. The presence of mbconv_test.py and fusedmbconv_test.py somehow makes other preprocessing tests fail when run as part of the entire sequence, but not individually (individually, they all pass). Looking further into it.

@DavidLandup0 (Contributor, Author) commented Dec 20, 2022

Found it. When calling:

output = layer(inputs)

It has to be:

output = layer(inputs, training=True)

Otherwise, the tests pass individually but not in the full suite, presumably because some state is reused between unit tests. When a single file is run, the state is localized and can't affect other files; when the whole suite is run, the changed state triggers failures in other tests.

If this is true, we might want to add a before clause in the unit tests to refresh whatever caused this? @LukeWood @bhack @tanzhenyu

P.S. Might be worth adding this in the contributor guidelines if it's unfixable for whatever reason, for contributors to know upfront, because it's a very silent failure across 150 tests caused by one keyword in a fully separate test. 🤔

@tanzhenyu (Contributor)

Found it. When calling:

output = layer(inputs)

It has to be:

output = layer(inputs, training=True)

Otherwise, the tests pass individually, but don't pass in the suite, presumably because some state might be reused between unit tests. When run for a single file, the state is fine because it's localized and can't affect other files. When run in the suite, the change in the state triggers a failure in other tests.

If this is true, we might want to add a before clause in the unit tests to refresh whatever caused this? @LukeWood @bhack @tanzhenyu

P.S. Might be worth adding this in the contributor guidelines if it's unfixable for whatever reason, for contributors to know upfront, because it's a very silent failure across 150 tests caused by one keyword in a fully separate test. 🤔

I suspect this might be caused by a bug. The preprocessing layers use training=True by default, whereas regular layers use training=None by default.
Other preprocessing layers don't use this argument in test: https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/preprocessing/cut_mix_test.py
Other regular layers don't use this argument in test: https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/object_detection/non_max_suppression_test.py

@DavidLandup0 (Contributor, Author)

When was training set to True by default for the KPLs?
A few months back, I remember having an issue where they'd be True for the first execution, but False for all subsequent ones, so when visualizing training batches the augmentations would be off, and I had to manually set training=True during batch visualization to avoid this.

It was tricky to reproduce, and only happened sometimes on Google Colab (but not locally), so I never opened an issue, and it eventually stopped. Might be tied to this somehow?

@tanzhenyu (Contributor)

When was training set to True by default for the KPLs? A few months back, I remember having an issue where they'd be True for the first execution, but False for all subsequent ones, so when visualizing training batches, the augmentations would be off during training, and I had to manually set training as True during batch visualization to avoid this.

It was tricky to reproduce, and only happened some times on Google Colab (but not locally) so I never opened an issue, and it eventually stopped. Might be tied to this somehow?

https://github.com/keras-team/keras-cv/blob/master/keras_cv/layers/preprocessing/base_image_augmentation_layer.py#L362

@tanzhenyu (Contributor) commented Dec 20, 2022

When was training set to True by default for the KPLs? A few months back, I remember having an issue where they'd be True for the first execution, but False for all subsequent ones, so when visualizing training batches, the augmentations would be off during training, and I had to manually set training as True during batch visualization to avoid this.

It was tricky to reproduce, and only happened some times on Google Colab (but not locally) so I never opened an issue, and it eventually stopped. Might be tied to this somehow?

Can you try something like this? (instead of training=True)
https://github.com/keras-team/keras-cv/blob/master/keras_cv/models/object_detection/retina_net/retina_net_test.py#L27

@bhack (Contributor) commented Dec 20, 2022

I think it is better to introduce a pytest fixture:

https://github.com/BenWhetton/keras-surgeon/blob/master/tests/test_surgeon.py#L31-L38
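
For reference, a minimal sketch of what such a fixture could look like, assuming TensorFlow/Keras (the exact fixture that landed in this PR may differ):

# Hypothetical pytest fixture sketch: clear Keras global state between tests so
# that state from one test file can't leak into tests from another.
import pytest
import tensorflow as tf


@pytest.fixture(autouse=True)
def clear_keras_session():
    tf.keras.backend.clear_session()
    yield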

@DavidLandup0 (Contributor, Author)

Added fixture to clean the session - good call 👍
@bhack @tanzhenyu

@tanzhenyu (Contributor)

/gcbrun

@tanzhenyu (Contributor)

Consider adding this command to the contribution guidelines :-)

@bhack (Contributor) commented Dec 21, 2022

@tanzhenyu Can we find a more general placement? Is this still used in Keras (keras-team/keras#11170)?

@bhack (Contributor) commented Dec 21, 2022

@tanzhenyu (Contributor)

The failing GCP test is due to a custom ops build issue. Manually merging.

@tanzhenyu merged commit 52a53d5 into keras-team:master on Dec 21, 2022
@tanzhenyu (Contributor)

@tanzhenyu Can we find a more general placement? Is this still used in keras keras-team/keras#11170?

Looks like a good thing to add to our codebase

ghost pushed a commit to y-vectorfield/keras-cv that referenced this pull request Nov 16, 2023
…V2 (keras-team#1146)

* initial port

* fusedmb and mb

* update

* refactored

* fixed args

* formatting

* removed import

* Fix kernel size + filter count issues

* strides

* docs for mbconv

* fixed SE block, updated docs

* serialization test

* added basic tests

* fixed test

* renamed test case

* fixed tests

* fixtures

Co-authored-by: ianjjohnson <3072903+ianstenbit@users.noreply.github.com>
ghost pushed a commit to y-vectorfield/keras-cv that referenced this pull request Nov 16, 2023