"Fine-tune InceptionV3/ResNet50 on a new set of classes" doesn't work, while VGG16 works (suspect BN) #9214

Open
ozabluda opened this issue Jan 28, 2018 · 35 comments

Comments

@ozabluda
Contributor

commented Jan 28, 2018

The following code works as expected with vgg16 (no BN) but not with resnet50 or inception_v3 (BN). My hypothesis is that it's due to BN. The code follows "Fine-tune InceptionV3 on a new set of classes" from https://keras.io/applications/#usage-examples-for-image-classification-models

from keras.preprocessing import image
from keras.applications import resnet50, inception_v3, vgg16
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Input
from keras.optimizers import Adam
import numpy as np

batch_size = 50
num_classes = 2

#base_model = resnet50.ResNet50
#base_model = inception_v3.InceptionV3
base_model = vgg16.VGG16

base_model = base_model(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(num_classes, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
for layer in base_model.layers:
    layer.trainable = False

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=Adam(lr=0.0001),
              metrics=['acc'])

x_train = np.random.normal(loc=127, scale=127, size=(50, 224,224,3))
y_train = np.array([0,1]*25)
x_train = resnet50.preprocess_input(x_train)

print(model.evaluate(x_train, y_train, batch_size=batch_size, verbose=0))
model.fit(x_train, y_train,
          epochs=100,
          batch_size=batch_size,
          shuffle=False,
          validation_data=(x_train, y_train))
@ozabluda

Contributor Author

commented Jan 28, 2018

VGG16 output (works as expected):

  1. both validation loss and acc before training (from evaluate()) are exactly equal to those in the first training iteration
  2. validation loss/acc correspond to training loss/acc
  3. training loss->0, acc->1, validation loss->0, acc->1.
[1.1005926132202148, 0.5]
Train on 50 samples, validate on 50 samples
Epoch 1/100
50/50 [==============================] - 1s 11ms/step - loss: 1.1006 - acc: 0.5000 - val_loss: 0.7771 - val_acc: 0.5800
Epoch 2/100
50/50 [==============================] - 0s 8ms/step - loss: 0.7771 - acc: 0.5800 - val_loss: 0.8947 - val_acc: 0.4600
Epoch 3/100
50/50 [==============================] - 0s 9ms/step - loss: 0.8947 - acc: 0.4600 - val_loss: 0.9511 - val_acc: 0.4800
Epoch 4/100
50/50 [==============================] - 0s 9ms/step - loss: 0.9511 - acc: 0.4800 - val_loss: 0.8385 - val_acc: 0.4600
Epoch 5/100
50/50 [==============================] - 0s 9ms/step - loss: 0.8385 - acc: 0.4600 - val_loss: 0.7341 - val_acc: 0.5400
Epoch 6/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7341 - acc: 0.5400 - val_loss: 0.7455 - val_acc: 0.5600
Epoch 7/100
50/50 [==============================] - 0s 8ms/step - loss: 0.7455 - acc: 0.5600 - val_loss: 0.7991 - val_acc: 0.6000
Epoch 8/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7991 - acc: 0.6000 - val_loss: 0.7902 - val_acc: 0.6000
Epoch 9/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7902 - acc: 0.6000 - val_loss: 0.7258 - val_acc: 0.5800
Epoch 10/100
50/50 [==============================] - 0s 9ms/step - loss: 0.7258 - acc: 0.5800 - val_loss: 0.6727 - val_acc: 0.6400
[...]
Epoch 98/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2272 - acc: 1.0000 - val_loss: 0.2246 - val_acc: 1.0000
Epoch 99/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2246 - acc: 1.0000 - val_loss: 0.2221 - val_acc: 1.0000
Epoch 100/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2221 - acc: 1.0000 - val_loss: 0.2196 - val_acc: 1.0000
@ozabluda

Contributor Author

commented Jan 28, 2018

resnet50 output (does not work as expected):

  1. validation loss before training (from evaluate()) is nowhere near that in the first training iteration (BN?)
  2. validation loss/acc does not correspond to training loss/acc at all (BN?)
  3. training loss->0, acc->1 very quickly (acc=1.0 starting from epoch 5), validation loss stays huge forever, acc=0.5 (random) forever.
[2.3405368328094482, 0.5]
Train on 50 samples, validate on 50 samples
Epoch 1/100
50/50 [==============================] - 1s 21ms/step - loss: 0.6806 - acc: 0.5400 - val_loss: 1.6767 - val_acc: 0.5000
Epoch 2/100
50/50 [==============================] - 0s 8ms/step - loss: 0.6061 - acc: 0.6400 - val_loss: 1.8632 - val_acc: 0.5000
Epoch 3/100
50/50 [==============================] - 0s 9ms/step - loss: 0.5088 - acc: 0.9000 - val_loss: 2.0533 - val_acc: 0.5000
Epoch 4/100
50/50 [==============================] - 0s 9ms/step - loss: 0.4437 - acc: 0.9200 - val_loss: 1.9083 - val_acc: 0.5000
Epoch 5/100
50/50 [==============================] - 0s 9ms/step - loss: 0.3799 - acc: 1.0000 - val_loss: 1.5847 - val_acc: 0.5000
Epoch 6/100
50/50 [==============================] - 0s 9ms/step - loss: 0.3222 - acc: 1.0000 - val_loss: 1.3209 - val_acc: 0.5000
Epoch 7/100
50/50 [==============================] - 0s 8ms/step - loss: 0.2816 - acc: 1.0000 - val_loss: 1.2207 - val_acc: 0.5000
Epoch 8/100
50/50 [==============================] - 0s 8ms/step - loss: 0.2439 - acc: 1.0000 - val_loss: 1.2348 - val_acc: 0.5000
Epoch 9/100
50/50 [==============================] - 0s 9ms/step - loss: 0.2089 - acc: 1.0000 - val_loss: 1.2679 - val_acc: 0.5000
Epoch 10/100
50/50 [==============================] - 0s 9ms/step - loss: 0.1824 - acc: 1.0000 - val_loss: 1.2359 - val_acc: 0.5000
[...]
Epoch 98/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0032 - acc: 1.0000 - val_loss: 2.2686 - val_acc: 0.5000
Epoch 99/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0032 - acc: 1.0000 - val_loss: 2.2791 - val_acc: 0.5000
Epoch 100/100
50/50 [==============================] - 0s 8ms/step - loss: 0.0031 - acc: 1.0000 - val_loss: 2.2894 - val_acc: 0.5000
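
The comparison between the two logs can be checked mechanically. The following is a minimal sketch, not part of the original report: it assumes the setup above (one full batch per epoch, shuffle=False, validation on the training data), in which the validation loss of epoch n and the training loss of epoch n+1 are computed with the same weights and should therefore match whenever the forward pass is identical in training and inference mode. The helper names `parse_losses` and `val_matches_next_train` are hypothetical; the log lines are the first three epochs copied from the VGG16 and resnet50 outputs above.

```python
import re

# First three epochs copied from the VGG16 and resnet50 logs in this thread.
VGG16_LOG = """
50/50 [==============================] - 1s 11ms/step - loss: 1.1006 - acc: 0.5000 - val_loss: 0.7771 - val_acc: 0.5800
50/50 [==============================] - 0s 8ms/step - loss: 0.7771 - acc: 0.5800 - val_loss: 0.8947 - val_acc: 0.4600
50/50 [==============================] - 0s 9ms/step - loss: 0.8947 - acc: 0.4600 - val_loss: 0.9511 - val_acc: 0.4800
"""
RESNET50_LOG = """
50/50 [==============================] - 1s 21ms/step - loss: 0.6806 - acc: 0.5400 - val_loss: 1.6767 - val_acc: 0.5000
50/50 [==============================] - 0s 8ms/step - loss: 0.6061 - acc: 0.6400 - val_loss: 1.8632 - val_acc: 0.5000
50/50 [==============================] - 0s 9ms/step - loss: 0.5088 - acc: 0.9000 - val_loss: 2.0533 - val_acc: 0.5000
"""

# Captures the training loss and the validation loss of one epoch line.
EPOCH_LINE = re.compile(r"- loss: ([\d.]+) - acc: [\d.]+ - val_loss: ([\d.]+)")


def parse_losses(log):
    """Return one (train_loss, val_loss) pair per epoch line in the log."""
    return [(float(m.group(1)), float(m.group(2))) for m in EPOCH_LINE.finditer(log)]


def val_matches_next_train(pairs, tol=1e-4):
    """True if every epoch's val_loss equals the next epoch's training loss."""
    return all(abs(pairs[i][1] - pairs[i + 1][0]) <= tol
               for i in range(len(pairs) - 1))


vgg_consistent = val_matches_next_train(parse_losses(VGG16_LOG))
resnet_consistent = val_matches_next_train(parse_losses(RESNET50_LOG))
print("VGG16 consistent:", vgg_consistent)
print("ResNet50 consistent:", resnet_consistent)
```

A mismatch here does not by itself prove that BN is the cause; it only shows that the frozen model computes different outputs during fit() and during evaluation on the same data.
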
@ozabluda

Contributor Author

commented Jan 28, 2018

The problem does not occur without

for layer in base_model.layers:
    layer.trainable = False

Note that this is repo master, i.e. with 24246ea already merged, see #8616 (comment)

@ozabluda

Contributor Author

commented Feb 13, 2018

Just checked with TF 1.5 and Keras master, and the behavior is unchanged. Also identical on CPU (which I didn't check before).

@fchollet, it appears to be a serious problem, because AFAIK I follow (simple) docs to the letter.

@GougeC

commented Mar 14, 2018

I am having the exact same issue. Inception does not work but VGG does fine. InceptionV3 picks the same class every time no matter what the test set is

@majiaji

commented Apr 2, 2018

same issue when I try ResNet50 in keras

@Zamirquito

commented Apr 6, 2018

I have tried different optimizers (Adam, SGD, SGD with momentum, and so on) when training ResNet50, both fine-tuning and freezing all the layers except the FC layer. My training loss decreases, but validation accuracy rises and then, just as you say, stalls at 50%~60%.

I have tried randomly sampling the data and used some data augmentation tricks, but those didn't work.

@NProdanova

commented Apr 6, 2018

I am having the same problem with Inception-v3, while VGG19 works. I can also confirm that when I remove

layer.trainable = False

validation accuracy starts improving.
I posted a question for a workaround on stackoverflow:
https://stackoverflow.com/questions/49689122/keras-inception-v3-fine-tuning-workaraound

Maybe somebody has a suggestion

@ciprianfocsaneanu

commented Apr 18, 2018

I am having the same problem with ResNet50. I am doing transfer learning and the same dataset/code works for InceptionV3 and DenseNet121, but ResNet seems to always predict one class

@datumbox

Contributor

commented Apr 23, 2018

For all of you who are affected by this, please have a look at PR #9965. This problem is caused by the way that the Batch Normalization layer is implemented in Keras.

To understand why this happens we need to understand how the BN works. When the network is in training mode, the mini-batch statistics of BN are used for training the network; when the network is in inference mode, we use the moving mean/var learned during the training. That's all good. The problem is how the layer behaves when it is frozen. Its side-effects are more profound when we use fine-tuning and Transfer Learning.

You see, when frozen and while in training mode, the BN continues to use the mini-batch statistics for scaling the training data. This causes the unfrozen/trainable layers to adapt to the scale of the data. Unfortunately, during inference mode (predictions) the network will switch to the moving mean/var. If the moving mean/var is different from the mini-batch statistics, the data are scaled differently, causing massive discrepancies in accuracy. If you want more info, have a look at the PR.
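
To make the discrepancy concrete, here is a minimal pure-Python sketch of the two normalization modes on a single feature. It is an illustration under stated assumptions, not Keras code: the function names, the example activations, the stored statistics (mean 0, variance 1) and the epsilon value are all made up for the example.

```python
from statistics import fmean, pvariance

EPS = 1e-3  # small constant for numerical stability (assumed value)


def bn_train_mode(batch):
    """Normalize with the statistics of the current mini-batch."""
    mean, var = fmean(batch), pvariance(batch)
    return [(x - mean) / (var + EPS) ** 0.5 for x in batch]


def bn_inference_mode(batch, moving_mean, moving_var):
    """Normalize with the stored moving statistics."""
    return [(x - moving_mean) / (moving_var + EPS) ** 0.5 for x in batch]


# Hypothetical activations on the new dataset; their scale differs from the
# stored statistics, which were learned on the original dataset.
new_data = [10.0, 12.0, 14.0, 16.0]

train_out = bn_train_mode(new_data)
infer_out = bn_inference_mode(new_data, moving_mean=0.0, moving_var=1.0)

print("training mode: ", [round(v, 3) for v in train_out])
print("inference mode:", [round(v, 3) for v in infer_out])
```

In training mode the outputs are centered around zero; in inference mode the same inputs land far from zero. Layers trained on top of the first distribution then see the second one at prediction time.
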

@jksmither

commented Apr 28, 2018

@ozabluda I'm facing the same problem on Keras 2.1.6 and TensorFlow 1.7. Did you find a solution?

@datumbox I tried installing your branch and it seems to work well. Unfortunately it is not synced with the latest version of Keras. Any plans to merge it with 2.1.6? If not I can do it.

@datumbox

Contributor

commented May 1, 2018

@jksmither Sorry for the late response. I just synced my branch with the latest master and provided a patched fork of 2.1.6. Honestly I would like to see this fixed on master, as maintaining a separate fork with the patch is not a viable solution in the long term. I'll probably keep syncing it for as long as we use Keras at work but I can't make any promises.

@shazamkash

commented Jun 5, 2018

Did anyone find a concrete solution to this problem? I am also affected by this problem and I am working on Keras 2.1.6 and TensorFlow 1.7 to train and test my data using InceptionV3 and Resnet50.

I am very new to deep learning and any help will be appreciated.

@izharikov

commented Jun 5, 2018

@shazamkash

This is a temporary fix:

pip install -U --force-reinstall --no-dependencies git+https://github.com/datumbox/keras@bugfix/trainable_bn 

This command reinstalls Keras with the fixes provided by @datumbox.
This works for me (Inception started training normally).

@shazamkash

commented Jun 6, 2018

@izharikov @datumbox
Thank you so much for providing this information and the patch. Are we expecting a permanent fix anytime soon, though?
Also, has anyone faced this problem using just TensorFlow? Any experience with that?

@datumbox

Contributor

commented Jun 6, 2018

Thanks for the input. I would advise using the fork 2.1.6 instead of the trainable_bn branch. This is because the fork is synced with the latest stable version of Keras, while trainable_bn, even though it is fresher, is not based on a finalized release.

Unfortunately there are no plans for a permanent fix at the moment. My PR #9965 was rejected (you can read the rationale on the link) because it modifies the semantics of trainable. It's not the first time that the BatchNormalization layer forces us to update the semantics of trainable (see version 2.1.3) but it can take a while until such a change gets enough momentum. So maybe in the future, if enough people complain about it, it will be reopened. Until then I'll do my best to maintain the fork for those of you who are brave enough to mess with custom implementations.

@drsxr

commented Jun 13, 2018

Wow. Was spinning my wheels for a while with ResNet50 training, trying to fine-tune, until I found these threads. Same problems. So batchnorm in Keras = no fine-tuning? Either paste the FC layer on top of trained weights (imagenet) or train from scratch. I'm working with smaller N's so training from scratch with augmentation is not a de facto solution. A shame too because Transfer Learning is looking more attractive lately - see: "Do Better ImageNet Models Transfer Better?"

Apart from @datumbox's patch, or moving over to another framework, any other workarounds?
@fchollet suggested this in the main discussion thread on Vasils' PR (#9965):

But even if you don't, you can still do your style of fine-tuning in this case:

  1. set learning phase to 0
  2. load model
  3. retrieve features you want to train on
  4. set learning phase to 1
  5. add new layers on top
  6. optionally load weights from initial model layers to corresponding new layers
  7. train

@joeyearsley

Contributor

commented Jun 18, 2018

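from keras import backend as K

for layer in base_model.layers:
    # Only BatchNormalization layers carry moving statistics.
    if hasattr(layer, 'moving_mean') and hasattr(layer, 'moving_variance'):
        layer.trainable = True
        # Reset the stored statistics so they can adapt to the new dataset.
        K.eval(K.update(layer.moving_mean, K.zeros_like(layer.moving_mean)))
        K.eval(K.update(layer.moving_variance, K.zeros_like(layer.moving_variance)))
    else:
        layer.trainable = False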

Reset the batch norm moving averages and allow them to update to the new dataset - you'll see it transfer.
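
As a rough illustration of how long "allow them to update" can take, here is a minimal sketch of the exponential moving average that BN layers use for their stored statistics. It assumes the update rule moving = momentum * moving + (1 - momentum) * batch_stat with a momentum of 0.99; the function names and the target value are hypothetical, and every batch is assumed to report the same statistic.

```python
def update_moving_stat(moving, batch_stat, momentum=0.99):
    """One exponential-moving-average update of a stored BN statistic."""
    return momentum * moving + (1.0 - momentum) * batch_stat


def steps_to_converge(start, target, momentum=0.99, rel_tol=0.05):
    """Count updates until the moving value is within rel_tol of the target.

    Assumes every mini-batch reports exactly `target` as its statistic.
    """
    moving, steps = start, 0
    while abs(moving - target) > rel_tol * abs(target):
        moving = update_moving_stat(moving, target, momentum)
        steps += 1
    return steps


# Moving mean reset to 0, new dataset with a (hypothetical) mean of 13.
slow = steps_to_converge(start=0.0, target=13.0, momentum=0.99)
fast = steps_to_converge(start=0.0, target=13.0, momentum=0.9)
print("updates needed with momentum 0.99:", slow)
print("updates needed with momentum 0.9: ", fast)
```

With a momentum this close to 1, the stored statistics need on the order of a few hundred batches to approach the new data, so a short fine-tuning run on a small dataset may finish before they have settled.
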

I'm writing a longer update on this matter and will open issues (easy PRs) so people can help contribute to fixing the documentation and the like.

@datumbox

Contributor

commented Jun 18, 2018

Unfreezing the BNs while keeping the subsequent Convolutions frozen can have negative effects on accuracy. I describe this in more detail here.

@AndreGuerra123

commented Aug 18, 2018

Just a simple question: why the value 1024 as units in the last dense layer?
x = Dense(1024, activation='relu')(x)

@ozabluda

Contributor Author

commented Aug 19, 2018

x = Dense(1024, activation='relu')(x) is because

The code follows "Fine-tune InceptionV3 on a new set of classes" from https://keras.io/applications/#usage-examples-for-image-classification-models

For the purpose of this Issue, I was following the docs literally for stronger effect to make my point.

@XupingZHENG

commented Aug 20, 2018

I fine-tuned a Keras pretrained model on my own dataset, freezing some of the early layers. I got decent validation accuracy with VGG, but bad validation accuracy with ResNet50.

VGG
epoch 1: train_acc 0.546514682730133, val_acc 0.6607583973804312
epoch16: train_acc 0.9279250402631126, val_acc 0.7440402661072892
There is overfitting.

ResNet50
epoch 1: train_acc 0.7301661501087283, val_acc 0.04389513340162522
Then I terminate the training.

I think this may be caused by BatchNormalization.

I once used keras-retinanet https://github.com/fizyr/keras-retinanet to train on my own dataset, which worked very well. So I want to find out the reason. RetinaNet uses ResNet as backbone, and BatchNormalization layers are frozen, see https://github.com/fizyr/keras-retinanet/blob/master/keras_retinanet/models/resnet.py#L98

The ResNet in that project is borrowed from another repo, keras-resnet https://github.com/broadinstitute/keras-resnet. In this ResNet implementation, the authors customize the BatchNormalization layer, see
https://github.com/broadinstitute/keras-resnet/blob/master/keras_resnet/layers/_batch_normalization.py

    def call(self, *args, **kwargs):
        # return super.call, but set training
        return super(BatchNormalization, self).call(training=(not self.freeze), *args, **kwargs)

It seems that this operation is what @fchollet recommends in @datumbox 's PR
#9965 (comment)

I think it would be better to use keras-resnet https://github.com/broadinstitute/keras-resnet for fine tuning. I have not tried yet.

Yesterday I tried fine-tuning InceptionV3 on the same dataset, with half of the layers set untrainable. Somewhat strangely, the validation accuracy is quite good.

InceptionV3
epoch 1: train_acc 0.7295633685318796, val_acc 0.7712759384936045
epoch 10: train_acc 0.9254250409199613, val_acc 0.7986968871106656

The code I use is something like this https://gist.github.com/XupingZHENG/1e20d54a70c8e04912c0b37fa7e7b931

@cesarorosco

commented Sep 18, 2018

I have the same problem with Resnet50.

This seems to work.

- Set the learning phase to 1
- In every batch normalization layer, set training=False

After that I get the correct accuracy.

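from keras import backend as K
from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

K.set_learning_phase(1)

base_model = ResNet50(weights='imagenet', include_top=False)

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(2, activation='softmax')(x)

model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False

    # Select the batch normalization layers by their name prefix.
    if layer.name.startswith('bn'):
        layer.call(layer.input, training=False)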
@datumbox

Contributor

commented Sep 28, 2018

@cesarorosco This means that your network always runs in training mode. Even when you make predictions, you use the mini-batch statistics. This is not great, as your predictions will change depending on which images you pass in the batch.
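
A minimal pure-Python sketch of that dependence, under assumed values: the same sample is normalized with mini-batch statistics inside two different batches and comes out differently. The function name, the sample and the batch contents are made up for the illustration.

```python
from statistics import fmean, pvariance


def normalize_with_batch_stats(batch, eps=1e-3):
    """Normalize one feature using only the statistics of the given batch."""
    mean, var = fmean(batch), pvariance(batch)
    return [(x - mean) / (var + eps) ** 0.5 for x in batch]


sample = 5.0
batch_a = [sample, 4.0, 6.0]    # the sample sits at the batch mean
batch_b = [sample, 50.0, 95.0]  # the sample is the smallest value in the batch

out_a = normalize_with_batch_stats(batch_a)[0]
out_b = normalize_with_batch_stats(batch_b)[0]

print("sample normalized within batch_a:", round(out_a, 4))
print("sample normalized within batch_b:", round(out_b, 4))
```

The input value is identical in both calls, yet the normalized value fed to the following layers differs, so the prediction for one image depends on which other images happen to share its batch.
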

@lpenet

commented Dec 2, 2018

If I proceed as @cesarorosco mentioned, things get even worse and I am stuck at ~0.5 acc on the training and validation sets...

It is a bit disappointing to have this kind of problem, especially when this kind of thing is presented as so simple with Keras...

It is such a great tool, anyway.

@chabir

commented Jan 8, 2019

All,
I don't know if this is completely related, but I have been referring to this issue for several days now. I used the patch mentioned by @datumbox and I observed the following:

Keras version: 2.2.4

from keras.applications.resnet50 import ResNet50
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

case 1: model converges:
base_model = ResNet50(input_shape=(img_size, img_size, 3), weights=None, classes=5004)

case 2: model never ever converges:
res50 = ResNet50(input_shape=(img_size, img_size, 3), weights=None, include_top=False)
x = GlobalAveragePooling2D()(res50.output)
output = Dense(5004)(x)
base_model = Model(inputs=[res50.input], outputs=[output])

any ideas ?

Edit: I forgot the activation layer softmax in case 2...

@Wazaki-Ou

commented Feb 20, 2019

I think I might be affected by a similar issue, except that I am using VGG16 instead. The training goes really well, to be honest a bit unexpectedly well, and it goes beyond 90%. I checked the model accuracy on a testing set and it gives good results, and the same thing happens when I check with a confusion matrix. The problem is that, when I try to use the model to predict_classes() on my testing set (the same one that gave good results on accuracy and confusion matrix), the predictions are awfully bad. One class seems to be preferred over the others and I get 0 accuracy in 2 of the 5 classes. I was asked to check this post and I am wondering if anyone could help. Thanks a lot !!

@rohan19250

commented Feb 21, 2019

I am having the same issue as well. Inception v3 gives low validation accuracy but high training accuracy. What would be the suggestion to fix this?

Epoch 1/30
185/185 [==============================] - 5478s 30s/step - loss: 0.1161 - acc: 0.9493 - val_loss: 2.4898 - val_acc: 0.5678
Epoch 2/30
185/185 [==============================] - 5453s 29s/step - loss: 0.0362 - acc: 0.9861 - val_loss: 1.1530 - val_acc: 0.7678
Epoch 3/30
185/185 [==============================] - 5457s 29s/step - loss: 0.0280 - acc: 0.9902 - val_loss: 5.4614 - val_acc: 0.4506
Epoch 4/30
185/185 [==============================] - 5458s 30s/step - loss: 0.0184 - acc: 0.9934 - val_loss: 5.2297 - val_acc: 0.5117
Epoch 5/30
185/185 [==============================] - 5474s 30s/step - loss: 0.0146 - acc: 0.9954 - val_loss: 4.2587 - val_acc: 0.5586
Epoch 6/30
185/185 [==============================] - 5463s 30s/step - loss: 0.0113 - acc: 0.9965 - val_loss: 4.5049 - val_acc: 0.6019
Epoch 7/30
185/185 [==============================] - 5467s 30s/step - loss: 0.0099 - acc: 0.9972 - val_loss: 6.9422 - val_acc: 0.3551
Epoch 8/30
185/185 [==============================] - 5467s 30s/step - loss: 0.0099 - acc: 0.9969 - val_loss: 5.8211 - val_acc: 0.4901
Epoch 9/30
185/185 [==============================] - 5466s 30s/step - loss: 0.0112 - acc: 0.9965 - val_loss: 5.2108 - val_acc: 0.5518
Epoch 10/30
185/185 [==============================] - 5471s 30s/step - loss: 0.0113 - acc: 0.9964 - val_loss: 6.1660 - val_acc: 0.5092
Epoch 11/30
104/185 [===============>..............] - ETA: 37:44 - loss: 0.0140 - acc: 0.9958

@pchris24

commented Mar 7, 2019

same issue when I try ResNet50 in keras

Same here too

I use ResNet50 for fine-tuning. I want to predict the results for two classes. In one class the validation and training accuracy is 45% and in the other it's 0%, although I unfreeze the final set of conv layers.

@AkshayRoy

commented Apr 2, 2019

I tried to run a ResNet50 model for classifying the colors of clothes. I used ImageNet weights and added a global average pooling layer, 2 dense layers with dropout, and a final output layer with sigmoid/softmax. I froze all the layers except the newly added ones, then started training. Training goes well for some time and I managed to get some accuracy, but when I tested my model, the predictions were all wrong. Can anybody help me solve this?

@xhm1014

commented Aug 12, 2019

I tried transfer learning on resnet50 in keras for two class classification problem. I only fine-tune the top fully-connected layer, while all other layers are frozen. I encounter the same problem: training accuracy is increasing as expected, but validation accuracy is only between 50-60%.

Has anyone found a good solution to this problem in Keras, please?

@geometrikal

commented Aug 25, 2019

@xhm1014 Pre-compute the vectors using resnet50, then train model with only the dense layers on the vectors. After training, join the dense layers to the resnet50 layers if you want to save the whole network.

@BraveDistribution

commented Aug 25, 2019

@geometrikal

Could you provide an example of how to do that? I know that this workaround was suggested by the author of Keras, but I couldn't find any way to do it.

It is a real shame that the official docs don't mention this. Any model other than VGG (without BN) is useless if you want to freeze any of the layers.

@geometrikal

commented Aug 25, 2019

@BraveDistribution Yea it is a bit strange.

This is how I do it, (taken from my repo here: https://github.com/microfossil/particle-classification/blob/master/miso/training/model_trainer.py )

Functions to make the head and tail:

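# Imports are assumed here; the original snippet omits them.
import tensorflow as tf
from keras.applications import resnet50
from keras.layers import Dense, Dropout, Input, Lambda
from keras.models import Model, Sequential


def resnet50_head(input_shape):
    inputs = Input(shape=input_shape)
    # Reverse the channel order (RGB -> BGR).
    x = Lambda(lambda y: tf.reverse(y, axis=[-1]))(inputs)
    # Rescale [0, 1] -> [0, 255] and subtract the ImageNet channel means.
    x = Lambda(lambda y: y * tf.constant(255.0)
                         - tf.reshape(tf.constant([103.939, 116.779, 128.68]),
                                      [1, 1, 1, 3]))(x)
    x = resnet50.ResNet50(include_top=False,
                          weights='imagenet',
                          pooling='avg')(x)
    model = Model(inputs=inputs, outputs=x)
    model.get_layer('resnet50').trainable = False
    return model


def tl_tail(nb_classes):
    model = Sequential()
    model.add(Dropout(0.05))
    model.add(Dense(512, activation='relu'))
    model.add(Dropout(0.15))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(nb_classes, activation='softmax'))
    return model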

Note the two lambda layers used to make the head. The pre-trained ResNet50 uses some pre-processing that subtracts the channel averages (of ImageNet). In my datasets I use images scaled to the range [0,1] by dividing by 255. So these two layers convert from the [0,1] range to the pre-processing used by the pre-trained network. You can remove them if you are using resnet50.preprocess_input() to create the images.

Now create the head and make the vectors:

model_head = resnet50_head(input_shape=(224,224,3))
train_vectors = model_head.predict(train_images)
test_vectors = model_head.predict(test_images)

Make the tail and train with these vectors:

model_tail = tl_tail(nb_classes=N)
model_tail.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# Train on the pre-computed vectors (train_vectors / test_vectors from above).
history = model_tail.fit(train_vectors,
                         train_onehots,
                         validation_data=(test_vectors, test_onehots),
                         epochs=max_epochs,
                         batch_size=batch_size,
                         shuffle=True,
                         verbose=0)

Then if you want to have a network that takes image as input, join them:

outputs = model_tail(model_head.outputs[0])
model = Model(model_head.inputs[0], outputs)

and then you can do

train_preds = model.predict(train_images) etc

Pre-calculating the vectors like this makes training very fast. You can even do it on a CPU.

By the way, my repo is designed for a project where we are enabling non-ML people to train networks. If you are interested see the docs read me here: https://github.com/microfossil/particle-classification-examples and especially the google colab tutorial (and use resnet50_tl as the cnn type)

@digital-thinking

commented Oct 6, 2019

I also noticed this while training the EfficientNet model, which includes BatchNormalization. I observed that freezing the BN layers leads to bad accuracy (wrong predictions) in the validation phase, while in the training phase everything looks fine. When unfreezing the BN, the test accuracy recovers.

Here is an example notebook and here is some additional information:
