This repository has been archived by the owner on Nov 3, 2022. It is now read-only.

Add ResNet, ResNetV2, and ResNeXt variants #26

Merged
taehoonlee merged 11 commits into master from add_resnet_variants on Nov 18, 2018

Conversation

taehoonlee (Contributor)

This PR adds ResNet, ResNetV2, and ResNeXt variants (9 models and 18 weight files). I would appreciate it if you could give me some feedback or style guidance, @fchollet.

taehoonlee (Contributor Author)

@fchollet, due to the recent backend interfacing changes, especially keras_modules_injection, new applications must be added to keras-team/keras first before they can be added to keras-team/keras-applications.

daavoo commented Sep 3, 2018

@taehoonlee Any chance you could share the weights??

taehoonlee (Contributor Author)

@daavoo, Sure. You can refer to:

  • *.h5 files (Keras style) in this PR (L195-L204 in keras_applications/resnet_common.py),
  • *.npz files (TensorNets style) in TensorNets.

fchollet (Member) left a comment

The code/style looks good to me! I'm not sure we need to add arbitrarily many new models just because we can, though -- we should focus on a small set of models that are significantly differentiated and that will deliver unique value for users. ResNetV2, for instance, seems like a good idea. ResNeXt101c64, maybe not so much.

'c0ed64b8031c3730f411d2eb4eea35b5'),
'resnet152v2': ('a49b44d1979771252814e80f8ec446f9',
'ed17cf2e0169df9d443503ef94b23b33'),
'resnext50c32': ('67a5b30d522ed92f75a1f16eef299d1a',
fchollet (Member):

I'm skeptical that we need the c32 and c64 versions. Model choices that don't map to real value-add for users should be avoided. More choice is not always a good thing; it confuses users and bloats the API.

taehoonlee (Contributor Author):

OK. I agree.

@@ -0,0 +1,209 @@
"""ResNet, ResNetV2, and ResNeXt models for Keras.
fchollet (Member):

Presumably you could just merge this with resnet.py.

taehoonlee (Contributor Author):

Integrated all the variants into resnet_common.py.

rohit-gupta

@fchollet, in the spirit of your remark about significant differentiation, would you say that adding a model trained on Open Images instead of ImageNet meets this threshold? I have been considering working on training such a model or porting the ResNet provided by Google. I suspect there's quite a lot of value to be extracted in transfer-learning tasks using Open Images-trained models, given that ImageNet is so full of animal categories.

taehoonlee (Contributor Author)

@rohit-gupta, I think that adding reference models trained on Open Images would be worthwhile and a significant differentiation. Contributions welcome :) In my humble opinion:

  • Using different datasets without hyper-parameter changes: this is easy and worth adding, because we just need to replace weights='imagenet' with weights='openimages', set classes=5000, and add information on the weight files (see the sketch below).
  • Using different datasets with minor hyper-parameter changes (e.g., changing the number of blocks in ResNets): if factorization over the imagenet and openimages models is easy, it is still worth adding to keras-applications.
  • Using different datasets with significantly different hyper-parameters (e.g., changing the block configuration or adding some layers): these are better treated as community contributions and should live outside keras-applications.
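A minimal sketch of the first bullet, assuming a hypothetical 'openimages' weight set (the weight name and classes=5000 come from the comment above; no such weights are released):

from keras.applications.resnet50 import ResNet50

# Today, only ImageNet weights ship with keras.applications:
model = ResNet50(weights='imagenet', classes=1000)

# The first bullet would only swap the weight file and the class count
# (hypothetical; 'openimages' weights do not exist yet):
# model = ResNet50(weights='openimages', classes=5000)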

taehoonlee force-pushed the add_resnet_variants branch 2 times, most recently from 9da7b3e to 6292c70 on October 1, 2018 01:32
fchollet (Member) left a comment

Thank you for the CL. The new choice of models looks good to me.

# Returns
Preprocessed array.
"""
return imagenet_utils.preprocess_input(x, mode='tf', **kwargs)
fchollet (Member):

I feel like it is somewhat error-prone that ResNet50, ResNeXt50, and ResNet50V2 use 3 different preprocessing methods. Is there anything we could do about that?
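For context, a rough numpy sketch of the three conventions in question, assuming the usual mode mapping (ResNet50 -> 'caffe', ResNetV2 -> 'tf' as in the excerpt above, ResNeXt -> 'torch'); the constants are the standard ImageNet ones, not code from this PR:

import numpy as np

x = np.random.uniform(0, 255, (1, 224, 224, 3)).astype('float32')

# 'caffe' style (ResNet50): RGB -> BGR, subtract ImageNet channel means.
x_caffe = x[..., ::-1] - np.array([103.939, 116.779, 123.68], dtype='float32')

# 'tf' style (ResNetV2): scale to [-1, 1].
x_tf = x / 127.5 - 1.0

# 'torch' style (ResNeXt): scale to [0, 1], then normalize by mean/std.
x_torch = (x / 255.0 - [0.485, 0.456, 0.406]) / [0.229, 0.224, 0.225]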

taehoonlee (Contributor Author):

@fchollet, I agree that several users may be confused. However, it is not easy to unify them, because the weights originate from different researchers using different libraries. One possible way would be to insert the addition and division operations into the model definitions themselves, but that is not consistent with the existing model definition style.

gabrieldemarmiesse commented Oct 7, 2018:

I think that having the preprocessing in the model definition makes a lot of sense and is a good idea. Models in keras.applications cannot be used without the appropriate preprocessing when weights is not None, even for fine-tuning.

I would be in favor of adding the preprocessing to the model when weights='imagenet', but it would break the API and change the existing behavior. Can we maybe add a new argument when creating the model? preprocessing='imagenet' (or None)?

The current situation adds a lot of unnecessary thinking for users trying to use pre-trained models (the number of issues and Stack Overflow questions about preprocessing when using keras.applications is staggering). It also comes from the functions all being called preprocess_input, so users who don't know the details of the preprocessing of the different models assume that the same preprocess_input will cover all models.

There are also performance considerations, since the preprocessing would then happen on the GPU. But since it is close to an element-wise operation in most cases, this should be fine; GPUs are good at this.

Thoughts?
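A hypothetical sketch of the proposal: fold the model-specific preprocessing into the graph with a Lambda layer, so users feed raw [0, 255] RGB pixels. The function name and the idea of wiring this to a preprocessing argument are assumptions, not actual keras API:

import numpy as np
from keras import layers, models
from keras.applications.resnet50 import ResNet50

def resnet50_raw_pixels():
    inputs = layers.Input(shape=(224, 224, 3))
    # ResNet50's 'caffe'-style preprocessing done inside the model:
    # RGB -> BGR, then subtract the ImageNet channel means.
    mean_bgr = np.array([103.939, 116.779, 123.68], dtype='float32')
    x = layers.Lambda(lambda t: t[..., ::-1] - mean_bgr)(inputs)
    base = ResNet50(weights='imagenet', input_tensor=x)
    return models.Model(inputs, base.output)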

taehoonlee (Contributor Author):

@gabrieldemarmiesse, thank you for the comment. The preprocessing='imagenet' idea looks good to me. However, it is not easy to change the style, because users who follow the current one (model.predict(preprocess_input(x))) would be confused if preprocessing were included in the model definition. In that case, the current style would perform the preprocessing twice and produce wrong outputs. I think the style change is a matter of choice.

gabrieldemarmiesse:

I see what you mean. It's true that it can be a source of confusion for users. I guess we can let @fchollet decide if the API needs a modification.

taehoonlee (Contributor Author):

@gabrieldemarmiesse, yes, I agree with your basic principles. However, it would be perfect if there were a way to force users not to run the preprocessing twice, instead of just giving a warning message. I have no idea how to do that, so frankly speaking, I think the current form is better from the perspective of users who are familiar with it.

gabrieldemarmiesse:

> However, it would be perfect if there were a way to force users not to run the preprocessing twice, instead of just giving a warning message.

I totally agree. A global variable would work, but I don't really like global variables, and I'm not even sure it would play well with multithreading/multiprocessing.

Generally speaking, I would really like a way to avoid silent preprocessing errors (they are VERY common for many users); after that, which API we choose doesn't really matter to me.

rohit-gupta:

> Generally speaking, I would really like a way to avoid silent preprocessing errors (they are VERY common for many users); after that, which API we choose doesn't really matter to me.

Is it possible to give a warning in model.fit(), model.fit_generator(), and model.train_on_batch() based on the values of the input training data? I.e., check whether the pixel values look like raw [0, 255] images (something like max(X) > 1) and warn when they don't, since the data might already have been preprocessed?
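A hypothetical sketch of that heuristic (not actual keras behavior): raw uint8-style images should contain values well above 1, so an all-small batch suggests the data was already preprocessed:

import warnings
import numpy as np

def warn_if_preprocessed(x):
    # Fires when no pixel exceeds 1, i.e. the batch does not look
    # like raw [0, 255] images.
    if np.asarray(x).max() <= 1.0:
        warnings.warn('All input values are <= 1; the data may already '
                      'have been preprocessed.')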

taehoonlee (Contributor Author):

@rohit-gupta, I think that is overkill. It would result in performance degradation.

gabrieldemarmiesse commented Nov 16, 2018:

If the preprocessing is declared in the graph, a graph optimizer (say, XLA) can fold it into the first convolution, making the preprocessing time drop to zero. I may be wrong, but I think it's possible. Not that it would be significant compared to the execution time of the full graph, but speedups are always nice.
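A tiny numpy check of the folding argument, with a 1x1 convolution written as a matmul (the constants are illustrative): normalizing the input and then convolving equals convolving the raw input with rescaled weights and an adjusted bias, which is exactly the rewrite an optimizer could apply.

import numpy as np

np.random.seed(0)
x = np.random.uniform(0, 255, (4, 3))               # 4 pixels, 3 channels
mean = np.array([103.9, 116.8, 123.7])
std = np.array([58.4, 57.1, 57.4])
W, b = np.random.randn(3, 8), np.random.randn(8)    # 1x1 conv as a matmul

y1 = ((x - mean) / std) @ W + b                     # preprocess, then conv
W_folded = W / std[:, None]                         # fold std into weights
b_folded = b - (mean / std) @ W                     # fold mean into bias
y2 = x @ W_folded + b_folded                        # conv on raw input

assert np.allclose(y1, y2)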

@@ -16,12 +17,38 @@
from keras.applications import vgg16
from keras.applications import vgg19
from keras.applications import xception
from keras_applications import resnet
fchollet (Member):

Should we simply apply this new testing style to all applications? It would be easier to have all or nothing.

taehoonlee (Contributor Author):

@fchollet, The testing style has been revised.

fchollet (Member) commented Oct 1, 2018

> would you say that adding a model trained on Open Images instead of ImageNet meets this threshold?

Yes, I think that would be great and it would provide a lot of value to users, especially given that ImageNet is pretty biased.

fchollet (Member) left a comment

Approval for the addition of:

ResNet101
ResNet152
ResNet101V2
ResNet50V2
ResNet152V2
ResNeXt50
ResNeXt101

Thanks a lot for the PR, and sorry for the delayed approval (lots of things going on lately...)

taehoonlee (Contributor Author)

@fchollet, I always appreciate you giving me useful feedback and the opportunity to maintain the repository. Don't be sorry and thanks a lot!

taehoonlee merged commit d33bdd8 into master on Nov 18, 2018
taehoonlee deleted the add_resnet_variants branch on November 18, 2018 06:13