
using pre trained VGG16 for another classification task #4465

Closed
vabatista opened this issue Nov 22, 2016 · 38 comments

Comments

@vabatista
How can I use the new keras.applications.VGG16 class to start my training from the weights in the H5 file, but for a new task with only 8 classes?

I didn't figure out how to pop the softmax layer and put another one with only 8 perceptrons in its place.

@JGuillaumin

JGuillaumin commented Nov 22, 2016

One way to do this is to not include the fully-connected layers at the top of the network, then add new fully-connected layers with random initialization and the correct number of outputs.

The convolutional layers will be initialized with weights from training on the ImageNet dataset. Generally, we can say that the convolutional layers work as feature extractors.

You then train the whole network for your 8 classes: this trains the new fully-connected layers from scratch and fine-tunes the convolutional layers. (You can also freeze the convolutional layers to keep the same feature extractors; see the sketch after the code below.)

from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.layers import Input, Flatten, Dense
from keras.models import Model
import numpy as np

#Get back the convolutional part of a VGG network trained on ImageNet
model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
model_vgg16_conv.summary()

#Create your own input format (here 3x200x200)
input = Input(shape=(3,200,200),name = 'image_input')

#Use the generated model 
output_vgg16_conv = model_vgg16_conv(input)

#Add the fully-connected layers 
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(8, activation='softmax', name='predictions')(x)

#Create your own model 
my_model = Model(input=input, output=x)

#In the summary, the layers from the VGG part appear as a single model layer, but their weights will still be updated during training
my_model.summary()


#Then training with your data ! 
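
For the freezing variant mentioned above, a minimal sketch (X_train / y_train are placeholders for your own data, and depending on your Keras version the argument may be nb_epoch instead of epochs):

# Freeze the convolutional part so only the new fully-connected layers are trained
for layer in model_vgg16_conv.layers:
    layer.trainable = False

# Changes to `trainable` take effect at compile time, so compile after freezing
my_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
my_model.fit(X_train, y_train, batch_size=32, epochs=10, validation_split=0.1)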

@JGuillaumin

JGuillaumin commented Nov 22, 2016

If you want to change only the last layer:

# Generate a model with all layers (with top)
vgg16 = VGG16(weights=None, include_top=True)

#Add a layer where input is the output of the  second last layer 
x = Dense(8, activation='softmax', name='predictions')(vgg16.layers[-2].output)

#Then create the corresponding model 
my_model = Model(input=vgg16.input, output=x)
my_model.summary()

@vabatista
Author

Thank you very much!

@UkiDLucas

Thank you JGuillaumin!

@parikshit95

For training do you just use the model.compile() and model.fit(data,labels) commands?

@howardya

howardya commented Feb 26, 2017

@JGuillaumin In your reply, when we only want to change the last layer, did you mean

vgg16 = VGG16(weights='imagenet', include_top=True)

instead of

vgg16 = VGG16(weights=None, include_top=True)

@JordanPeltier

@howardya, actually no.
He wants to use the VGG16 model pretrained on ImageNet, but just remove the last (softmax) layer and use another one for a different classification task.
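
A minimal sketch of that (assuming the pretrained ImageNet weights are what's wanted here, which is exactly the point under discussion):

from keras.applications.vgg16 import VGG16
from keras.layers import Dense
from keras.models import Model

# Full VGG16 with ImageNet weights; swap only the final 1000-way softmax for an 8-way one
vgg16 = VGG16(weights='imagenet', include_top=True)
x = Dense(8, activation='softmax', name='predictions')(vgg16.layers[-2].output)
my_model = Model(inputs=vgg16.input, outputs=x)
my_model.summary()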

@abdulsalama

Just a small correction: it should be shape=(200, 200, 3) in
input = Input(shape=(3,200,200),name = 'image_input')

Otherwise, you will get this error:
ValueError: number of input channels does not match corresponding dimension of filter, 100 != 3
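
A quick way to check which ordering your backend expects (a small sketch; in older Keras versions the call was K.image_dim_ordering() instead):

from keras import backend as K

# 'channels_last' (TensorFlow default) expects (200, 200, 3);
# 'channels_first' (old Theano default) expects (3, 200, 200)
print(K.image_data_format())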

@LindaSt

LindaSt commented Apr 8, 2017

@JGuillaumin
Thank you very much for your example! I have been trying to apply it to the MNIST dataset. However, the accuracy just keeps fluctuating around 10% during training and validation. I really do not understand what I did wrong. I added my code below.

Thank you very much!

import sys
import numpy as np
import cv2
import sklearn.metrics as sklm

from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras.datasets import mnist

from keras import backend as K
img_dim_ordering = 'tf'
K.set_image_dim_ordering(img_dim_ordering)

# the model
def pretrained_model(img_shape, num_classes, layer_type):
    model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
    #model_vgg16_conv.summary()
    
    #Create your own input format
    keras_input = Input(shape=img_shape, name = 'image_input')
    
    #Use the generated model 
    output_vgg16_conv = model_vgg16_conv(keras_input)
    
    #Add the fully-connected layers 
    x = Flatten(name='flatten')(output_vgg16_conv)
    x = Dense(4096, activation=layer_type, name='fc1')(x)
    x = Dense(4096, activation=layer_type, name='fc2')(x)
    x = Dense(num_classes, activation='softmax', name='predictions')(x)
    
    #Create your own model 
    pretrained_model = Model(inputs=keras_input, outputs=x)
    pretrained_model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    
    return pretrained_model

# loading the data
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# converting it to RGB
x_train = [cv2.cvtColor(cv2.resize(i, (32,32)), cv2.COLOR_GRAY2BGR) for i in x_train]
x_train = np.concatenate([arr[np.newaxis] for arr in x_train]).astype('float32')

x_test = [cv2.cvtColor(cv2.resize(i, (32,32)), cv2.COLOR_GRAY2BGR) for i in x_test]
x_test = np.concatenate([arr[np.newaxis] for arr in x_test]).astype('float32')

# training the model
model = pretrained_model(x_train.shape[1:], len(set(y_train)), 'relu')
hist = model.fit(x_train, y_train, epochs=2, validation_data=(x_test, y_test), verbose=1)

@KamalOthman

Hi @JGuillaumin and everyone,
I used the pre-trained VGG16 model without its top (fully-connected) layers and trained my own FC layers on the train & validation datasets, following the example in https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Everything is working well. Now I have the VGG16 model with its weights and my FC layers with their weights.
How can I merge both models in order to predict on new test data?

I really appreciate your help, guys.

Kamal

@chauhan-utk

@KamalOthman I am also trying to do the same thing. Update if you find any solution.

@KamalOthman

Hi @chauhan-utk
Yes, I solved it. Please find it here #6408

Cheers

@jinkos

jinkos commented May 23, 2017

Should the input images be normalized in some specific way to make the best use of the pretrained features?
I am normalizing each image by subtracting the mean and dividing by the standard deviation. It seems to work, but is there a better way?

@DanqingZ

@LindaSt Is your problem solved?

@LindaSt

LindaSt commented Jun 26, 2017

@DanqingZ Yes, I solved my problem by freezing the layers in the base model.

for layer in base_model.layers:
    layer.trainable = False

(Note: changes to layer.trainable only take effect the next time the model is compiled, so freeze the layers before calling model.compile().)

@aniket03

aniket03 commented Jul 4, 2017

@LindaSt @DanqingZ I am using the conv layers of VGG16 for feature extraction. I tried freezing the layers of VGG16, but accuracy on train and val still remains at 10%. Can you please help?

from __future__ import division

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model, Sequential
from keras.optimizers import SGD
from keras.utils import np_utils

from pandas import read_csv
import pickle

from sklearn.preprocessing import LabelEncoder

# CONSTANTS
DATAPOINTS_TO_CONSIDER = 10000
TRAINING_DATA_PERCENTAGE = 0.8

# load data
data_file = open("cifar10.pkl")
X = pickle.load(data_file)

X = X[:DATAPOINTS_TO_CONSIDER]
X = X.transpose(0, 3, 1, 2)  # move channels first; transpose (not reshape) preserves the pixel layout

# Split images into train - validation sets
n_points = X.shape[0]
tr_points = int(TRAINING_DATA_PERCENTAGE * n_points)
X_train = X[0:tr_points]
X_val = X[tr_points:]
del X
data_file.close()

# normalize inputs from 0-255 to 0.0-1.0
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_train = X_train/255.0
X_val = X_val/255.0

# Load labels
dataframe = read_csv('trainLabels.csv')
labels = dataframe.values
Y = labels[:, 1]
encoder = LabelEncoder()
encoder.fit(Y)
encoded_Y = encoder.transform(Y)
categorical_Y = np_utils.to_categorical(encoded_Y)

# Split labels into train and validation sets
y_train = categorical_Y[:tr_points]
y_val = categorical_Y[tr_points:DATAPOINTS_TO_CONSIDER]
del labels, Y

num_classes = 10  # Hard coded now will change

# Get back the convolutional part of a VGG network trained on ImageNet
model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
model_vgg16_conv.summary()

# Make vgg16 model layers as non trainable
for layer in model_vgg16_conv.layers:
    layer.trainable = False

# Create your own input format (here 3x32x32)
input = Input(shape=(3,32,32), name='image_input')

# Use the generated model
output_vgg16_conv = model_vgg16_conv(input)

# Add the flatten layer
x = Flatten(name='flatten')(output_vgg16_conv)

# Create your own model
my_model = Model(input=input, output=x)
my_model.summary()

# Collect features obtained from VGG
features_for_train_data = my_model.predict(X_train)
features_for_test_data = my_model.predict(X_val)

model = Sequential()
model.add(Dense(256, input_shape=features_for_train_data.shape[1:], activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))

sgd = SGD(lr=0.001, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd,
              loss='mse',
              metrics=['accuracy'])

model.fit(features_for_train_data, y_train,
          nb_epoch=80,
          batch_size=32,
          validation_data=(features_for_test_data, y_val))

@LindaSt @DanqingZ ^ sorted now.

@liketheflower

@LindaSt @DanqingZ @aniket03
Here are more hints.
I upsampled the CIFAR-10 dataset from 32*32 to 197*197 and the accuracy is always 0.10.
When I changed the size to 64*64, it ran well at the beginning and the accuracy reached 84.7%, but from epoch 46 on, the accuracy dropped back to 0.10.
Something must be wrong. Hope this gives you guys more hints; let's fix this problem together.

BTW, the applications ResNet50 model works well on the CIFAR-10 dataset when the images are upsampled to 197*197.

@liketheflower

liketheflower commented Jul 6, 2017

here is the output:
Epoch 44/200
1562/1562 [==============================] - 358s - loss: 0.4885 - acc: 0.8489 - val_loss: 0.5697 - val_acc: 0.8346
Epoch 45/200
1562/1562 [==============================] - 357s - loss: 0.4232 - acc: 0.8672 - val_loss: 1.2105 - val_acc: 0.6898
Epoch 46/200
1562/1562 [==============================] - 354s - loss: 12.1607 - acc: 0.2163 - val_loss: 14.5063 - val_acc: 0.1000
Epoch 47/200
1562/1562 [==============================] - 353s - loss: 14.5035 - acc: 0.1002 - val_loss: 14.5063 - val_acc: 0.1000
Epoch 48/200
1562/1562 [==============================] - 354s - loss: 14.5048 - acc: 0.1001 - val_loss: 14.5063 - val_acc: 0.1000
Epoch 49/200
1562/1562 [==============================] - 355s - loss: 14.5029 - acc: 0.1002 - val_loss: 14.5063 - val_acc: 0.1000
Epoch 50/200
1562/1562 [==============================] - 352s - loss: 14.5025 - acc: 0.1002 - val_loss: 14.5063 - val_acc: 0.1000
Epoch 51/200
1562/1562 [==============================] - 353s - loss: 14.5154 - acc: 0.0994 - val_loss: 14.5063 - val_acc: 0.1000
Epoch 52/200
1562/1562 [==============================] - 354s - loss: 14.5035 - acc: 0.1002 - val_loss: 14.5063 - val_acc: 0.1000

here is the code:

"""
Adapted from keras example cifar10_cnn.py
Train ResNet-18 on the CIFAR10 small images dataset.

GPU run command with Theano backend (with TensorFlow, the GPU is automatically used):
    THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python cifar10.py
"""
from __future__ import print_function
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import np_utils
from keras.callbacks import ReduceLROnPlateau, CSVLogger, EarlyStopping
from scipy.misc import imresize
import numpy as np
#import resnet
from keras.applications.vgg16 import VGG16
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input
from keras.layers import Input, Flatten, Dense
from keras.models import Model
from keras.callbacks import ModelCheckpoint


from keras import backend as K
#K.set_image_dim_ordering('th')
# fix random seed for reproducibility
seed = 7
np.random.seed(seed)


lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1), cooldown=0, patience=5, min_lr=0.5e-6)
early_stopper = EarlyStopping(min_delta=0.001, patience=20)
csv_logger = CSVLogger('./results/vgg16imagenetpretrained_upsampleimage_cifar10_data_augmentation.csv')

batch_size = 32
nb_classes = 10
nb_epoch = 200
data_augmentation = True

# input image dimensions
img_rows, img_cols = 197, 197
I_R = 64
# The CIFAR10 images are RGB.
img_channels = 3

# The data, shuffled and split between train and test sets:
(X_train_original, y_train), (X_test_original, y_test) = cifar10.load_data()

# Convert class vectors to binary class matrices.
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

X_train_original = X_train_original.astype('float32')
X_test_original = X_test_original.astype('float32')


# upsample it to size 64X64X3


X_train = np.zeros((X_train_original.shape[0],I_R,I_R,3))
for i in range(X_train_original.shape[0]):
    X_train[i] = imresize(X_train_original[i], (I_R,I_R,3), interp='bilinear', mode=None)


X_test = np.zeros((X_test_original.shape[0],I_R,I_R,3))
for i in range(X_test_original.shape[0]):
    X_test[i] = imresize(X_test_original[i], (I_R,I_R,3), interp='bilinear', mode=None)


# subtract mean and normalize
mean_image = np.mean(X_train, axis=0)
X_train -= mean_image
X_test -= mean_image
X_train /= 128.
X_test /= 128.


print(X_train.shape)




#model = resnet.ResnetBuilder.build_resnet_18((img_channels, img_rows, img_cols), nb_classes)
#model =get_vgg_pretrained_model()


# Get back the convolutional part of a VGG network trained on ImageNet
# (the original code passed pooling=max, i.e. the Python builtin, by mistake; None keeps the 4D output for Flatten)
model_vgg16_conv = VGG16(input_shape=(I_R,I_R,3), weights='imagenet', include_top=False, pooling=None)
model_vgg16_conv.summary()

# Create your own input format (here 64x64x3)
input = Input(shape=(I_R,I_R,3), name='image_input')

# Use the generated model
output_vgg16_conv = model_vgg16_conv(input)

# Add the fully-connected layers
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(512, activation='relu', name='fc1')(x)
x = Dense(128, activation='relu', name='fc2')(x)
x = Dense(10, activation='softmax', name='predictions')(x)

# Create your own model
my_model = Model(input=input, output=x)

# In the summary, the VGG part appears as a single layer, but its weights will still be updated during training
my_model.summary()



my_model.compile(loss='categorical_crossentropy',
              optimizer='adam',
              metrics=['accuracy'])

# serialize model to JSON
model_json = my_model.to_json()
with open("./results/model_data_argumentation.json", "w") as json_file:
    json_file.write(model_json)


print(my_model.summary())
if not data_augmentation:
    print('Not using data augmentation.')
    # checkpoint
   # filepath="./results/weights-improvement-{epoch:02d}-{val_acc:.2f}.hdf5"
   # checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')
   # callbacks_list = [checkpoint]
    # Fit the model
    #model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, callbacks=callbacks_list, verbose=0)

    my_model.fit(X_train, Y_train,
              batch_size=batch_size,
              nb_epoch=nb_epoch,
              validation_data=(X_test, Y_test),
              shuffle=True,
              callbacks=[lr_reducer, early_stopper, csv_logger])
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        featurewise_center=False,  # set input mean to 0 over the dataset
        samplewise_center=False,  # set each sample mean to 0
        featurewise_std_normalization=False,  # divide inputs by std of the dataset
        samplewise_std_normalization=False,  # divide each input by its std
        zca_whitening=False,  # apply ZCA whitening
        rotation_range=0,  # randomly rotate images in the range (degrees, 0 to 180)
        width_shift_range=0.1,  # randomly shift images horizontally (fraction of total width)
        height_shift_range=0.1,  # randomly shift images vertically (fraction of total height)
        horizontal_flip=True,  # randomly flip images
        vertical_flip=False)  # randomly flip images

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(X_train)

    # Fit the model on the batches generated by datagen.flow().
    my_model.fit_generator(datagen.flow(X_train, Y_train, batch_size=batch_size),
                        steps_per_epoch=X_train.shape[0] // batch_size,
                        validation_data=(X_test, Y_test),
                        epochs=nb_epoch, verbose=1, max_q_size=100,
                        callbacks=[lr_reducer, early_stopper, csv_logger])
my_model.save_weights("./results/vgg16_pretrained_upsample_model_data_augmentation.h5")
print("Saved model to disk")

@BogoK

BogoK commented Nov 23, 2017

How do you get all the known classes of the pretrained VGG16 model? Thanks

@nabsabraham

Hi everyone,
I am using the same model posted here by @JGuillaumin, however the model does not seem to be learning. The model summary shows that all the weights are trainable, so I don't know what I am doing wrong. Can someone point me in the right direction?

This is the output I am seeing (training-log screenshot not reproduced here).

Model Summary:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

This is the code I am using:

from keras.applications.vgg16 import VGG16
from keras.layers import Input, Flatten, Dense, Dropout
from keras.models import Model
from keras.optimizers import SGD

batch = 8
epochs = 20
#Get back the convolutional part of a VGG network trained on ImageNet
model_vgg16_conv = VGG16(weights='imagenet', include_top=False)
model_vgg16_conv.summary()

#Create your own input format (here 256x256x3)
input = Input(shape=(256,256,3), name='image_input')

#Use the generated model
output_vgg16_conv = model_vgg16_conv(input)

#Add the fully-connected layers
x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dropout(0.5)(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(1, activation='softmax', name='predictions')(x)

#Create your own model
my_model = Model(inputs=input, outputs=x)

#In the summary, weights and layers from the VGG part will be hidden, but they will be updated during training
my_model.summary()
my_model.compile(loss="binary_crossentropy",
                 optimizer=SGD(lr=1e-5, momentum=0.9),
                 metrics=["accuracy"])

# train_shuffled / labels_shuffled: your own data arrays
history = my_model.fit(train_shuffled, labels_shuffled, batch_size=batch, epochs=epochs,
                       verbose=1, validation_split=0.2, shuffle=True)

@JGuillaumin

JGuillaumin commented Jul 23, 2018

Hi,
Did you try a higher learning rate (1e-4, 1e-3, 1e-2)? Even with a pre-trained neural network, 1e-5 is very small.
You could also test with 2 output neurons and 'categorical_crossentropy' as the loss.

Your final layers are very big!
I recommend starting from a smaller network like this:

from keras.layers import GlobalAveragePooling2D

model_vgg16_conv = VGG16(weights='imagenet', include_top=False)

input = Input(shape=(256,256,3), name='image_input')

output_vgg16_conv = model_vgg16_conv(input)
# shape [?, 8, 8, 512]

x = GlobalAveragePooling2D()(output_vgg16_conv)
# shape [?, 512]

x = Dense(1, activation='softmax', name='predictions')(x)
# shape [?, 1]

@JGuillaumin

I think it does not train because you use softmax as the final activation function!
For binary cross-entropy you have to use sigmoid.
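
A minimal sketch of the corrected head, reusing x, input, and the SGD import from the code above:

# Single binary output: sigmoid activation paired with binary cross-entropy
x = Dense(1, activation='sigmoid', name='predictions')(x)

my_model = Model(inputs=input, outputs=x)
my_model.compile(loss='binary_crossentropy',
                 optimizer=SGD(lr=1e-4, momentum=0.9),  # higher LR, as suggested above
                 metrics=['accuracy'])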

@nabsabraham

nabsabraham commented Jul 24, 2018

Hi Julien,
The learning rates did not change anything, but you are right: changing it to sigmoid made a difference. It converges really quickly (2-3 epochs), but this might be a data problem, as I'm using biomedical images.
Do you know why it only works with sigmoid and not softmax? Any literature you can provide would be great!

@JGuillaumin

Because softmax is not an element-wise activation function. Here is the formula for a vector x of dimension K (sorry, there is no MathJax rendering in GitHub issues..):

$$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_{j=1}^{K} \exp(x_j)}$$

When you have K=1 (your case), whatever your network outputs, the softmax will be 1!

That's why your loss is constant!
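
You can verify this in a few lines of plain NumPy:

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([-5.0])))  # [1.] -- a one-element softmax is always 1
print(softmax(np.array([42.0])))  # [1.] -- regardless of what the network outputs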

@nabsabraham

awesome thank you for this! 👍

@dgtlmoon

dgtlmoon commented Nov 5, 2018

Sorry for the bump, but would retraining those layers as mentioned herein improve the results for my domain-specific feature comparison (aka similar-image finding)? I'm using the cosine distance between features (the 4096-element activations of the last fully-connected layer fc2 in VGG16) to compare image "similarity" across an image set.

I'm comparing a lot of images which are mostly related, but I suspect I could get better results by training on my own huge dataset (250,000 images, easily divided into 6 categories).

i.e.

feat_extractor = Model(inputs=model.input, outputs=model.get_layer("fc2").output)
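
For context, a minimal sketch of that fc2-based similarity setup (img1 and img2 are hypothetical batches of shape (1, 224, 224, 3), already run through preprocess_input):

import numpy as np
from keras.applications.vgg16 import VGG16
from keras.models import Model

model = VGG16(weights='imagenet', include_top=True)
feat_extractor = Model(inputs=model.input, outputs=model.get_layer("fc2").output)

def cosine_similarity(a, b):
    # cosine of the angle between two feature vectors; 1.0 means identical direction
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

f1 = feat_extractor.predict(img1)[0]  # img1, img2: hypothetical preprocessed inputs
f2 = feat_extractor.predict(img2)[0]
print(cosine_similarity(f1, f2))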

@gledsonmelotti

@JGuillaumin How do I train a model whose input images have 5 channels, using the pre-trained VGG16?

@gledsonmelotti

Just a small correction: it should be shape=(200, 200, 3) in
input = Input(shape=(3,200,200),name = 'image_input')

Otherwise, you will get this error:
ValueError: number of input channels does not match corresponding dimension of filter, 100 != 3

How do I train a model whose input images have 5 channels, using the pre-trained VGG16?

@xuzhengxiao

Just a small correction: it should be shape=(200, 200, 3) in
input = Input(shape=(3,200,200),name = 'image_input')
Otherwise, you will get this error:
ValueError: number of input channels does not match corresponding dimension of filter, 100 != 3

How do I train a model whose input images have 5 channels, using the pre-trained VGG16?

The key lies in the by_name parameter of the Keras load_weights API. If by_name is True, weights are loaded into layers only if they share the same name. This is useful for fine-tuning or transfer learning, where some of the layers have changed.

This is my model with 4 channels; you can do something like this.

import h5py
import numpy as np
from keras.applications.densenet import DenseNet121
from keras.layers import Input, GlobalAveragePooling2D, BatchNormalization, Dense
from keras.models import Model

input_shape = (256,256,4)
input_tensor = Input(input_shape)
base_model_i4 = DenseNet121(include_top=False, weights=None, input_tensor=input_tensor)
######### this trick !
# rename the first conv layer so load_weights(by_name=True) skips it
base_model_i4.layers[2].name = 'new_conv1/conv'
weights_path = 'densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5'
base_model_i4.load_weights(weights_path, by_name=True)

####### manually get the first conv layer weights (for 3 channels), and compute the 4th-channel weights
f = h5py.File(weights_path, 'r')
weight = f['conv1']['conv']['conv1']['conv']['kernel:0'][()]
weights = []
# initialize the 4th channel as the average of the 1st and 3rd channels
weights.append(np.concatenate([weight, 0.5 * (weight[:, :, :1, :] + weight[:, :, 2:, :])], axis=2))
base_model_i4.layers[2].set_weights(weights)

x = base_model_i4.output
x = GlobalAveragePooling2D()(x)
x = BatchNormalization(axis=-1)(x)
x = Dense(1024, activation='relu')(x)
x = Dense(28)(x)
model = Model(inputs=base_model_i4.input, outputs=x)

@gledsonmelotti

gledsonmelotti commented Dec 27, 2018

(quoting @xuzhengxiao's 4-channel DenseNet121 code above)

Hi @xuzhengxiao, how are you? I am new to the area of neural networks and Python. I'm studying alone, so I have some doubts. Your idea seems very interesting, but I did not understand some commands. Why did you use base_model_i4.layers[2].name = 'new_conv1/conv'? What does 'new_conv1/conv' mean? How did you know it was layers[2] whose name should be replaced? In the command f['conv1']['conv']['conv1']['conv']['kernel:0'].value, how did you discover this conv1/conv sequence? And what does ['kernel:0'] mean?

I thank you for your attention,
Gledson.

@xuzhengxiao

@gledsonmelotti

1. 'new_conv1/conv' is just a new layer name; you can also use other names. As I mentioned before, in Keras you can change a layer's name to decide which layer's weights are not loaded.

2. Which layer should be changed depends on your needs and the pre-trained model. In my case, I wanted to substitute the input layer with a 4-channel one. Use model.summary() to check the pre-trained model structure (see this link: https://user-images.githubusercontent.com/9840937/50503586-e84f4680-0aa2-11e9-8648-a54cce1406e2.png). In the pre-trained model the input layer has 3 channels, so the in-channel count of the first conv layer is 3, which doesn't meet my custom need (4 in-channels). So model.layers[2] has to be changed.

3. For f['conv1']['conv']['conv1']['conv']['kernel:0'], you actually need to inspect the h5 file to find the weights you need, as sketched below.
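
For point 3, a small sketch of how to inspect the h5 file (assuming the weights file has been downloaded locally):

import h5py

with h5py.File('densenet121_weights_tf_dim_ordering_tf_kernels_notop.h5', 'r') as f:
    # walk every group/dataset in the file, printing its path (and shape for datasets)
    f.visititems(lambda name, obj: print(name, getattr(obj, 'shape', '')))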

@pappuyadav

@nabsabraham
Try this modified section of code and see if it helps.

#Use the generated model (changed: take the layer output directly instead of calling the model on an input)
x = model_vgg16_conv.get_layer('block5_conv3').output

#Add the fully-connected layers
x = Flatten(name='flatten')(x)
x = Dense(4096, activation='relu', name='fc1')(x)
x = Dropout(0.5)(x)
x = Dense(4096, activation='relu', name='fc2')(x)
x = Dense(1, activation='sigmoid', name='predictions')(x)  # sigmoid for a single binary output, per the discussion above

@sumaira-hussain

Hi, I am also trying to create my model from the pretrained VGG16 model, but I am getting the following error:
ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 2 array(s), but instead got the following list of 1 arrays: [array([[[1., 0., 0., ..., 0., 0., 0.],

the code files are attached for reference
VGG.docx
generator.docx

@rushabhpatil

Just a small correction: it should be shape=(200, 200, 3) in
input = Input(shape=(3,200,200),name = 'image_input')

Otherwise, you will get this error:
ValueError: number of input channels does not match corresponding dimension of filter, 100 != 3

If my input uses "channels_first", what should the changes be?

@gledsonmelotti

(quoting @xuzhengxiao's explanation above)

Thank you very much, @xuzhengxiao. I really liked your explanation.

@felipe-chamas

felipe-chamas commented Jan 29, 2020

(quoting @xuzhengxiao's 4-channel DenseNet121 code above)

Worked very well. I used the same model as @xuzhengxiao and had the same need of adding a 4th channel. But if anyone tries this, remember to set the layer name back afterwards, so you can save/load the model without any problems.

model.layers[2]._name='conv1/conv'

@MEhtisham8217

@DanqingZ help me out please.
Use the same dataset (LFW) ==> http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz
Apply a pretrained convolutional neural network (VGG16) and replace the fully-connected layers with your own. Freeze the weights of the convolutional layers and only train the new FC layers.
Sample code for using pre-trained VGG16 for another classification task is available from:
#4465

@H-Ayoobi

(quoting @liketheflower's training log and CIFAR-10 VGG16 code from above)

I had the same issue with VGG16; it can be solved by adding BatchNormalization layers between the dense layers.
With a ResNet backbone I did not have the same issue, but you can still use the BatchNormalization layers.
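
A minimal sketch of that fix applied to the head from the quoted code (output_vgg16_conv comes from that code; where exactly to place BatchNormalization is one common choice, not the only one):

from keras.layers import Flatten, Dense, BatchNormalization

x = Flatten(name='flatten')(output_vgg16_conv)
x = Dense(512, activation='relu', name='fc1')(x)
x = BatchNormalization()(x)  # stabilizes the dense-layer activations
x = Dense(128, activation='relu', name='fc2')(x)
x = BatchNormalization()(x)
x = Dense(10, activation='softmax', name='predictions')(x)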
