This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Selection of features staying in v1.0, please comment if you like a feature already in keras-contrib! #415

Closed
gabrieldemarmiesse opened this issue Feb 3, 2019 · 27 comments

Comments

@gabrieldemarmiesse
Contributor

gabrieldemarmiesse commented Feb 3, 2019

Given our manpower (spoiler alert, it's not much), we need to choose carefully which features will stay in the repo.

Given that this repo will host a lot of features (hopefully more than in keras!), the maintainers cannot do everything. That is why each layer/optimizer/callback/etc. should be assigned to a contributor or maintainer who commits to answering issues and reviewing pull requests about it. The assignee will also commit to keeping proper unit tests and docstrings for his/her feature.

The maintainers from the keras team will be in charge of the general repo health and coding standards.

About the things that I will commit to maintain myself:

  • The Swish activation
  • The TensorBoardGrouped callback
  • The GroupNormalization layer

If you care about one of the features which is in this repo and think that you can take care of issues, PRs, unit tests and docstrings related to it, then feel free to drop a message here.

All features without someone assigned to them will be removed in v1.0.

@gabrieldemarmiesse gabrieldemarmiesse changed the title Selection of features which are going to stay in v1.0 Selection of features staying in v1.0, please comment if you like a feature already in keras-contrib! Feb 3, 2019
@gabrieldemarmiesse gabrieldemarmiesse added this to the 1.0 milestone Feb 3, 2019
@gabrieldemarmiesse
Contributor Author

The Yogi optimizer will be owned by @MarcoAndreaBuchmann if you're still ok with it.
@MFreidank do you accept to be the owner of the Padam code?
@lzfelix do you accept to be the owner of the CRF code?

@gabrieldemarmiesse gabrieldemarmiesse pinned this issue Feb 3, 2019
@MarcoAndreaBuchmann
Contributor

@gabrieldemarmiesse: Yes, that's fine with me.

@gabrieldemarmiesse
Contributor Author

Thanks @MarcoAndreaBuchmann !

@titu1994
Contributor

titu1994 commented Feb 3, 2019

@gabrieldemarmiesse Most of the keras-contrib applications module (DenseNet, DenseNet-FCN, NASNet, Residual-of-Residual Network, and Wide ResNet) was added by me, so I could take charge of maintaining those models. Imo the Residual-of-Residual Network should be scrapped; it's almost never used.

ResNets, on the other hand, should be assigned to their original author.

In addition, my other contributions are:

  • callbacks/snapshot: SnapshotModelCheckpoint and SnapshotCallbackBuilder (for some reason it doesn't show my authorship, but it's a port of the implementation from my personal repositories)
  • layers/convolution: SubPixelUpscaling
  • layers/normalization: BatchRenormalization and GroupNormalization. However, I recommend BatchRenormalization be fixed or removed, as it has significant issues (it doesn't work with TF last I checked, and tf.keras.layers.BatchNormalization already supports renormalization). I would prefer if you took charge of GroupNormalization.

@gabrieldemarmiesse
Contributor Author

@titu1994, could you make PRs to remove what you think is no longer needed? In each PR description, you can explain why we're removing it and, if there is a workaround, what it is.

Thanks a lot for the help!

@titu1994
Contributor

titu1994 commented Feb 3, 2019

@gabrieldemarmiesse Sure, but the only one I recommend removing is the broken BatchRenormalization.

It was a known issue at the time of implementation, and it stands to this date: BatchRenormalization's internal counter is updated per batch, but the original paper states that it needs to be updated per epoch. There was no reasonable hook to obtain the epoch index inside a Layer, and many alternatives were discussed, but no fully correct one was found.

In the end, I decided to simply keep the counter at the batch level, as evaluations on CIFAR 10 showed no significant differences compared to the paper's own plots at small batch sizes.

Another issue is that it has been marked as incompatible with TF. I have not tested it on TF, but the code was written using the standard Keras backend, so I don't understand why it was marked as incompatible.

To address both issues, I personally think it is better to remove it in its entirety: right now a user can simply use the layer without being aware of the incorrectness of the implementation, and therefore obtain incorrect results after expending significant resources.
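For context, the missing epoch hook mentioned above can be approximated with a custom callback that writes the epoch index into a non-trainable layer weight. Here is a minimal sketch (the names EpochAwareScale and EpochCounter are invented for illustration; this is not the contrib implementation):

```python
import numpy as np
import tensorflow as tf

class EpochAwareScale(tf.keras.layers.Layer):
    """Toy layer holding a non-trainable epoch counter."""

    def build(self, input_shape):
        # Scalar counter that a callback updates once per epoch.
        self.epoch_var = self.add_weight(
            name="epoch", shape=(), trainable=False, initializer="zeros")

    def call(self, inputs):
        # A real renorm layer would use the counter to anneal its
        # rmax/dmax clipping schedule; here it is a no-op factor.
        return inputs * (1.0 + 0.0 * self.epoch_var)

class EpochCounter(tf.keras.callbacks.Callback):
    """Pushes the current epoch index into the layer before each epoch."""

    def __init__(self, layer):
        super().__init__()
        self.layer = layer

    def on_epoch_begin(self, epoch, logs=None):
        self.layer.epoch_var.assign(float(epoch))

scale = EpochAwareScale()
inputs = tf.keras.Input(shape=(4,))
model = tf.keras.Model(inputs, tf.keras.layers.Dense(1)(scale(inputs)))
model.compile(optimizer="adam", loss="mse")

x = np.random.rand(16, 4).astype("float32")
y = np.random.rand(16, 1).astype("float32")
model.fit(x, y, epochs=3, verbose=0, callbacks=[EpochCounter(scale)])
print(float(scale.epoch_var.numpy()))  # 2.0 (index of the last epoch)
```

Whether a callback-based hook like this would have been acceptable for BatchRenormalization is exactly the open question from the discussion above.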

@gabrieldemarmiesse
Contributor Author

@titu1994 do you think we should remove some models too? It seems several of them are already in keras.applications.

@titu1994
Contributor

titu1994 commented Feb 3, 2019

@gabrieldemarmiesse I don't think we should. The primary reason is that the Keras versions are built to be used for inference only: they are pre-built models which have weights available and need not be trained from scratch.

However, the ones here are built to be completely configurable as well as trainable from scratch - including regularization which is explicitly dropped in Keras Applications (either l2 regularization or Dropout in the case of VGG).

Edit: I'm not suggesting the Keras applications can't be used for fine-tuning, where one would normally keep the weights of the original model frozen in the first round of fine-tuning, and then unfreeze and train again with a very low learning rate.

However, if one tries to train a Keras Applications NASNet from scratch without proper regularization, it will probably overfit fast. I have personally seen it happen for NASNet and MobileNet frequently, so it is important to keep the contrib versions as well.
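For reference, the two-round fine-tuning recipe described in the edit above looks roughly like this (a tiny Dense base stands in for a real pre-trained application model such as NASNet; all sizes and names here are invented):

```python
import numpy as np
import tensorflow as tf

# Stand-in for a pre-trained base; in practice this would be e.g. a
# keras.applications model loaded with ImageNet weights.
base = tf.keras.Sequential([tf.keras.layers.Dense(16, activation="relu")])
head = tf.keras.layers.Dense(3, activation="softmax")
model = tf.keras.Sequential([base, head])

x = np.random.rand(32, 8).astype("float32")
y = np.random.randint(0, 3, size=(32,))

# Round 1: keep the base frozen and train only the new head.
base.trainable = False
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=1, verbose=0)

# Round 2: unfreeze everything and fine-tune with a very low learning
# rate (recompiling is needed for the trainable change to take effect).
base.trainable = True
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=1, verbose=0)

preds = model.predict(x, verbose=0)
print(preds.shape)  # (32, 3)
```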

@gabrieldemarmiesse
Contributor Author

gabrieldemarmiesse commented Feb 3, 2019

I see. So none of them come with pre-trained weights?

It's fine by me to keep them, but they should all have tests running on Travis at every commit.

@titu1994
Contributor

titu1994 commented Feb 3, 2019

Certain Contrib Application models do come with pre-trained weights.

  • DenseNet 121, 161, and 169 weights were ported for ImageNet

  • NASNet Large and NASNet Mobile weights were ported for ImageNet

  • ROR (which is being removed) has CIFAR 10 weights trained by me

  • WideResNet 28-8 has CIFAR 10 weights trained by me

  • ResNets: I don't recall offhand whether weights are available

As for tests, when these models were built, we had no tests for applications in general. It was after Keras 2, I believe, that we got solid shape tests for applications (or maybe before and I wasn't aware).

In any case, once I submit the PR for removing renorm and ROR, I'll get started on the remaining application tests.

@MFreidank
Contributor

@gabrieldemarmiesse Yes, sure, I am fine with being the owner of Padam.

@gabrieldemarmiesse
Contributor Author

Thanks @MFreidank !

@gabrieldemarmiesse
Contributor Author

> In any case, once I submit the PR for removing renorm and ROR, I'll get started on the remaining application tests.

@titu1994 Thanks a lot for that!

@lzfelix
Contributor

lzfelix commented Feb 3, 2019

@gabrieldemarmiesse, as we talked about previously, I did not write the code for the CRF layer, just a few changes to persist and restore models using it. IMHO the code is very convoluted and hard to read. What I can do is try to help people who want to use this layer and add/update docstrings where necessary.

@gabrieldemarmiesse
Contributor Author

@lzfelix sure, thanks for the help. We'll just have to find a solution for CRF, because it seems that it's popular, but we don't have anyone to maintain it.

@titu1994
Contributor

titu1994 commented Feb 3, 2019

@gabrieldemarmiesse Speaking of tests for applications, the keras_applications codebase tests are quite extensive and depend on the interaction between keras and keras_applications. I do not believe this should be replicated in contrib.

A scenario can occur where a user imports both keras and keras_contrib (which is what we want), and both try to inject keras_applications into themselves. This is highly redundant.

On top of that, keras_applications tests whether the classifier can accurately predict the class of a test image of an elephant. While we could do something similar for DenseNet and NASNet, ResNets and WideResNets do not have weights trained on ImageNet.

What would be a reasonable middle ground? Skip the elephant classification check for those two? Not have the test for any of the models?

@gabrieldemarmiesse
Contributor Author

We should test that we are able to train on a single batch. Keras will run shape checks automatically, which is what we want. Let's not test the weights; it's not feasible on Travis anyway (it takes too long).

@titu1994
Contributor

titu1994 commented Feb 3, 2019

@gabrieldemarmiesse Hmm, if shape tests are all we need, then there's no need to train on any data at all. Backprop doubles the amount of memory needed, and for enormous models like NASNet and DenseNet, it isn't feasible to train on Travis on CPU.

@gabrieldemarmiesse
Contributor Author

True. Let's just run a single prediction on random pixels.
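Such a smoke test could look roughly like this (a tiny conv net stands in for a contrib application model; all names here are hypothetical):

```python
import numpy as np
import tensorflow as tf

def build_model(input_shape=(32, 32, 3), classes=10):
    """Tiny conv net standing in for a contrib application model."""
    inputs = tf.keras.Input(shape=input_shape)
    h = tf.keras.layers.Conv2D(8, 3, activation="relu")(inputs)
    h = tf.keras.layers.GlobalAveragePooling2D()(h)
    outputs = tf.keras.layers.Dense(classes, activation="softmax")(h)
    return tf.keras.Model(inputs, outputs)

# One forward pass on random pixels; Keras' own shape checks do the rest.
model = build_model()
preds = model.predict(np.random.rand(1, 32, 32, 3).astype("float32"),
                      verbose=0)
print(preds.shape)  # (1, 10)
```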

@titu1994
Contributor

titu1994 commented Feb 3, 2019

Yes that should be fast enough (hopefully)

@lzfelix
Contributor

lzfelix commented Feb 3, 2019

This latter solution seems fine.
@gabrieldemarmiesse, regarding the CRF layer, hopefully we can bring onboard some future contributor who makes a reasonable PR to this layer.

@wilderrodrigues
Contributor

Hi,

I commit to maintaining and answering questions regarding the SineReLU activation function.

Cheers,
Wilder

@gabrieldemarmiesse
Contributor Author

Thanks @wilderrodrigues !

@gabrieldemarmiesse
Contributor Author

@titu1994 Could you add yourself to the codeowners when you have the time? Thanks!

@SriRangaTarun
Contributor

@gabrieldemarmiesse Just to confirm, I'll be in charge of the Capsule layer and squash activation.

@gabrieldemarmiesse
Contributor Author

@SriRangaTarun you're already in the CODEOWNERS so no problem here.

@gabrieldemarmiesse gabrieldemarmiesse unpinned this issue Sep 21, 2019
@gabrieldemarmiesse
Contributor Author

Hello everyone, to give you all a better experience overall, we'll move features that have a maintainer to tensorflow/addons; see the announcement here: #519

As such, I'll close this issue. We won't have a 1.0 of keras-contrib, but we'll be better off with tensorflow/addons (they have a PyPI package and all!).


7 participants