Keras model always predicts same output class. #2975

Closed
deepanwayx opened this issue Jun 14, 2016 · 50 comments


@deepanwayx deepanwayx commented Jun 14, 2016

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).
@deepanwayx deepanwayx changed the title Keras model always predicts single output class. Keras model always predicts same output class. Jun 14, 2016
@deepanwayx deepanwayx commented Jun 14, 2016

I have a binary classification problem where the positive and negative classes are almost evenly distributed across the train and test examples. I get >80% validation and test accuracy when I use random forest, SGD, or SVM classifiers.

But the Keras model almost always predicts the same class for all validation and test examples, and the accuracy is stuck at ~50%. I have tried many different hidden layer sizes, activation functions, loss functions, and optimizers, but none of it helped.

Here's my code; params1, params2, etc. are weights I got from a stacked denoising autoencoder.

np.random.seed(1)
model = Sequential()
model.add(Dense(input_dim=X_train.shape[1], output_dim=1000, weights=params1, activation='tanh'))
model.add(Dense(output_dim=600, weights=params2, activation='tanh'))
model.add(Dense(output_dim=300, weights=params3, activation='tanh'))
model.add(Dense(output_dim=100, weights=params4, activation='tanh'))
model.add(Dense(output_dim=10, weights=params5, activation='tanh'))
model.add(Dense(input_dim=10, output_dim=y_train_ohe.shape[1], init='uniform', activation='softmax'))

adagrad = keras.optimizers.Adagrad(lr=0.001, epsilon=1e-08)
model.compile(loss='binary_crossentropy', optimizer=adagrad)
model.fit(X_train, y_train_ohe, nb_epoch=10, batch_size=100, verbose=1, validation_split=0.5, show_accuracy=True)

The output:

Train on 12500 samples, validate on 12500 samples
Epoch 1/10
12500/12500 [==============================] - 15s - loss: 0.6939 - acc: 0.5046 - val_loss: 0.6948 - val_acc: 0.4948
Epoch 2/10
12500/12500 [==============================] - 12s - loss: 0.6724 - acc: 0.6647 - val_loss: 0.6956 - val_acc: 0.4975
Epoch 3/10
12500/12500 [==============================] - 14s - loss: 0.6490 - acc: 0.7389 - val_loss: 0.6970 - val_acc: 0.4982
Epoch 4/10
12500/12500 [==============================] - 16s - loss: 0.6247 - acc: 0.7881 - val_loss: 0.6986 - val_acc: 0.5014
Epoch 5/10
12500/12500 [==============================] - 15s - loss: 0.6011 - acc: 0.8234 - val_loss: 0.7003 - val_acc: 0.5038
Epoch 6/10
12500/12500 [==============================] - 16s - loss: 0.5788 - acc: 0.8457 - val_loss: 0.7022 - val_acc: 0.5030
Epoch 7/10
12500/12500 [==============================] - 13s - loss: 0.5581 - acc: 0.8634 - val_loss: 0.7043 - val_acc: 0.5010
Epoch 8/10
12500/12500 [==============================] - 15s - loss: 0.5386 - acc: 0.8755 - val_loss: 0.7063 - val_acc: 0.5010
Epoch 9/10
12500/12500 [==============================] - 17s - loss: 0.5203 - acc: 0.8862 - val_loss: 0.7080 - val_acc: 0.5022
Epoch 10/10
12500/12500 [==============================] - 18s - loss: 0.5036 - acc: 0.8962 - val_loss: 0.7104 - val_acc: 0.5026


@kgrm kgrm commented Jun 14, 2016

Do you get the same result if you use random initialisation for all weights?

@deepanwayx deepanwayx commented Jun 14, 2016

More often than not, with randomly initialized weights the model doesn't learn anything: the training loss stays at 0.7 and accuracy stays at ~50% even after numerous iterations.

@fchollet fchollet commented Jun 14, 2016

The Keras issues section is not the right place to discuss all the architecture choices you got wrong in your NNs, so I will be closing this issue.

But consider this:

  • if your deep net is not working, use fewer hidden layers until it works.
  • don't use tanh as an activation. It's not the 90s anymore.

In general: simplify until it works.
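A minimal sketch of the kind of simplified network being suggested here, written against the later Keras 2 API; the hidden-layer width, optimizer, and validation split are illustrative assumptions, and y_train is assumed to hold plain 0/1 labels rather than the one-hot y_train_ohe used above:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu'))  # single small hidden layer
model.add(Dense(1, activation='sigmoid'))                            # one sigmoid unit for binary labels
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X_train, y_train, epochs=10, batch_size=100, validation_split=0.2, verbose=1)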

@fchollet fchollet closed this Jun 14, 2016
@venuktan venuktan commented Sep 24, 2016

@deepanwayx How did you get around this? I am facing the same issue.
I have 561 classes and get the same prediction for whatever input I give it.

@MaratAkhmatnurov MaratAkhmatnurov commented Nov 13, 2016

Same problem.

Cifar-10 dataset

Simplest model:

model = Sequential()
model.add(Dense(output_dim=512, input_dim=32*32*3, init='uniform'))
model.add(Activation("relu"))
model.add(Dense(output_dim=10))
model.compile(loss='categorical_crossentropy', optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
model.fit(norm, y_ohe, nb_epoch=5, batch_size=32)

A more complicated CNN also predicts only one class.
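One thing worth checking in the snippet above: the final Dense(output_dim=10) layer has no softmax activation, while categorical_crossentropy expects probabilities. A sketch of the same model with the activation added, keeping the Keras 1.x style of the snippet:

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

model = Sequential()
model.add(Dense(output_dim=512, input_dim=32*32*3, init='uniform'))
model.add(Activation('relu'))
model.add(Dense(output_dim=10))
model.add(Activation('softmax'))   # missing in the original snippet
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(lr=0.01, momentum=0.9, nesterov=True))
model.fit(norm, y_ohe, nb_epoch=5, batch_size=32)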

@ankitshah009 ankitshah009 commented Feb 16, 2017

Facing the same problem.
Is there any suggested fix for this?

@ruzrobert ruzrobert commented Apr 2, 2017

Um, same.

@aalesar aalesar commented Apr 3, 2017

Same.

@stalagmite7 stalagmite7 commented May 4, 2017

Hi, I have the same problem: I'm trying to train on 560 classes, but every time I get an accuracy of 50%. However, this happens only when I train with a batch generator. When I load a subset of the whole data into memory, it works fine and gives reasonable accuracy metrics; the same data fails to reach the same accuracy when loaded via the generator.
Any clues from anyone here who has figured it out?

When I try loading the whole data into memory, my logs look something like this:

4000/4000 [==============================] - 12s - loss: 0.2583 - acc: 0.4953 - val_loss: 0.2573 - val_acc: 0.5000
Epoch 2/20
4000/4000 [==============================] - 10s - loss: 0.2527 - acc: 0.5030 - val_loss: 0.2546 - val_acc: 0.4990
Epoch 3/20
4000/4000 [==============================] - 10s - loss: 0.2517 - acc: 0.4848 - val_loss: 0.2555 - val_acc: 0.5000
Epoch 4/20
4000/4000 [==============================] - 10s - loss: 0.2522 - acc: 0.5002 - val_loss: 0.2508 - val_acc: 0.5000
Epoch 5/20
4000/4000 [==============================] - 10s - loss: 0.2521 - acc: 0.5125 - val_loss: 0.2505 - val_acc: 0.4860
Epoch 6/20
4000/4000 [==============================] - 10s - loss: 0.2506 - acc: 0.4805 - val_loss: 0.2511 - val_acc: 0.4960
Epoch 7/20
4000/4000 [==============================] - 11s - loss: 0.2506 - acc: 0.4798 - val_loss: 0.2513 - val_acc: 0.4880
Epoch 8/20
4000/4000 [==============================] - 9s - loss: 0.2500 - acc: 0.4765 - val_loss: 0.2549 - val_acc: 0.5030
Epoch 9/20
4000/4000 [==============================] - 11s - loss: 0.2510 - acc: 0.4875 - val_loss: 0.2531 - val_acc: 0.4950
Epoch 10/20
4000/4000 [==============================] - 10s - loss: 0.2506 - acc: 0.4825 - val_loss: 0.2514 - val_acc: 0.4970
Epoch 11/20
4000/4000 [==============================] - 10s - loss: 0.2503 - acc: 0.4730 - val_loss: 0.2510 - val_acc: 0.5070
Epoch 12/20
4000/4000 [==============================] - 10s - loss: 0.2502 - acc: 0.4792 - val_loss: 0.2549 - val_acc: 0.4980
Epoch 13/20
4000/4000 [==============================] - 9s - loss: 0.2505 - acc: 0.4810 - val_loss: 0.2516 - val_acc: 0.5040
Epoch 14/20
4000/4000 [==============================] - 11s - loss: 0.2502 - acc: 0.4860 - val_loss: 0.2553 - val_acc: 0.4920
Epoch 15/20
4000/4000 [==============================] - 9s - loss: 0.2498 - acc: 0.4783 - val_loss: 0.2547 - val_acc: 0.4860
Epoch 16/20
4000/4000 [==============================] - 9s - loss: 0.2497 - acc: 0.4780 - val_loss: 0.2523 - val_acc: 0.4900
Epoch 17/20
4000/4000 [==============================] - 11s - loss: 0.2497 - acc: 0.4770 - val_loss: 0.2519 - val_acc: 0.5100
Epoch 18/20
4000/4000 [==============================] - 10s - loss: 0.2489 - acc: 0.4653 - val_loss: 0.2506 - val_acc: 0.5040
Epoch 19/20
4000/4000 [==============================] - 10s - loss: 0.2460 - acc: 0.4477 - val_loss: 0.2440 - val_acc: 0.4490
Epoch 20/20
4000/4000 [==============================] - 10s - loss: 0.2307 - acc: 0.4103 - val_loss: 0.2448 - val_acc: 0.4700
* Accuracy on training set: 73.88%
* Accuracy on test set: 64.42%

However, when I load the data batchwise (batch size is 4 for both implementations), my logs look like this, and I don't understand why the overall accuracy suffers when the loss looks better in this case than in the previous one:

	375/375 [==============================] - 26s - loss: 0.1299 - val_loss: 0.1359
	Epoch 2/30
	375/375 [==============================] - 23s - loss: 0.1179 - val_loss: 0.1324
	Epoch 3/30
	375/375 [==============================] - 22s - loss: 0.1225 - val_loss: 0.1238
	Epoch 4/30
	375/375 [==============================] - 23s - loss: 0.1266 - val_loss: 0.1247
	Epoch 5/30
	375/375 [==============================] - 21s - loss: 0.1242 - val_loss: 0.1275
	Epoch 6/30
	375/375 [==============================] - 23s - loss: 0.1132 - val_loss: 0.1421
	Epoch 7/30
	375/375 [==============================] - 24s - loss: 0.1221 - val_loss: 0.1288
	Epoch 8/30
	375/375 [==============================] - 24s - loss: 0.1200 - val_loss: 0.1271
	Epoch 9/30
	375/375 [==============================] - 22s - loss: 0.1193 - val_loss: 0.1262
	Epoch 10/30
	375/375 [==============================] - 25s - loss: 0.1172 - val_loss: 0.1500
	Epoch 11/30
	375/375 [==============================] - 21s - loss: 0.1240 - val_loss: 0.1284
	Epoch 12/30
	375/375 [==============================] - 23s - loss: 0.1207 - val_loss: 0.1273
	Epoch 13/30
	375/375 [==============================] - 25s - loss: 0.1234 - val_loss: 0.1273
	Epoch 14/30
	375/375 [==============================] - 24s - loss: 0.1196 - val_loss: 0.1368
	Epoch 15/30
	375/375 [==============================] - 23s - loss: 0.1264 - val_loss: 0.1275
	Epoch 16/30
	375/375 [==============================] - 23s - loss: 0.1156 - val_loss: 0.1289
	Epoch 17/30
	375/375 [==============================] - 21s - loss: 0.1138 - val_loss: 0.1424
	Epoch 18/30
	375/375 [==============================] - 25s - loss: 0.1129 - val_loss: 0.1289
	Epoch 19/30
	375/375 [==============================] - 23s - loss: 0.1198 - val_loss: 0.1317
	Epoch 20/30
	375/375 [==============================] - 23s - loss: 0.1177 - val_loss: 0.1310
	Epoch 21/30
	375/375 [==============================] - 23s - loss: 0.1175 - val_loss: 0.1299
	Epoch 22/30
	375/375 [==============================] - 23s - loss: 0.1170 - val_loss: 0.1358
	Epoch 23/30
	375/375 [==============================] - 24s - loss: 0.1244 - val_loss: 0.1269
	Epoch 24/30
	375/375 [==============================] - 22s - loss: 0.1095 - val_loss: 0.1311
	Epoch 25/30
	375/375 [==============================] - 21s - loss: 0.1178 - val_loss: 0.1231
	Epoch 26/30
	375/375 [==============================] - 22s - loss: 0.1182 - val_loss: 0.1290
	Epoch 27/30
	375/375 [==============================] - 17s - loss: 0.1254 - val_loss: 0.1254
	Epoch 28/30
	375/375 [==============================] - 26s - loss: 0.1087 - val_loss: 0.1320
	Epoch 29/30
	375/375 [==============================] - 21s - loss: 0.1021 - val_loss: 0.1221
	Epoch 30/30
	375/375 [==============================] - 22s - loss: 0.1143 - val_loss: 0.1630
	
	
	* Accuracy on training set: 50.82%
	* Accuracy on test set: 49.52%

Regardless of how many runs I do, my accuracy is always stuck at this level.

Some more info:

This is what my Keras generator setup looks like:
tr_steps = args.num_classes * args.num_train_pairs / args.batchsize
val_steps = args.num_classes * args.num_test_pairs / args.batchsize

model.fit_generator( \
	generator= data_gen(args.base_path, args.name_list,\
 	args.batchsize, args.num_train_pairs, args.num_test_pairs, args.num_classes, 1), \
 	steps_per_epoch=tr_steps, \
 	epochs=args.epochs, \
 	validation_data=data_gen(args.base_path, args.name_list, \
 	args.batchsize, args.num_train_pairs, args.num_test_pairs, args.num_classes, 2), \
 	validation_steps=val_steps, \
 	verbose=1, \
 	callbacks=[checkpointer])

I also normalized the outputs of the final layer of the model to unit norm, to better match the margin condition of the contrastive loss. This is my architecture:

model = Sequential()
model.add(Conv2D(16, (5, 5), input_shape=(64, 64, 3), activation='relu'))
model.add(MaxPooling2D((2,2), strides=(2,2)))

model.add(Conv2D(64, (5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(Conv2D(128, (5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2), strides=(2, 2)))

model.add(Flatten())
model.add(Dense(640, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(128, activation='relu'))

I have tried different initializations for the model as well, to no avail. I noticed that this is not the first time somebody has been stuck at 50% accuracy with a Keras model (this Stack Overflow post and #1597); could this be a more fundamental Keras problem, or is there something wrong with my understanding?
Any help would be much appreciated, thanks!

PS: Also planning to add this question to stackoverflow.
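For anyone debugging a case like the one above, where training works from in-memory data but not through a generator, a common culprit is a generator that does not shuffle or normalize its batches the way the in-memory pipeline does. A minimal sketch of a Keras-compatible generator, assuming the data fits in NumPy arrays and that the /255 scaling matches the in-memory path (both are assumptions, not taken from the code above):

import numpy as np

def batch_generator(x, y, batch_size):
    """Yield shuffled, normalized (inputs, targets) batches forever."""
    n = len(x)
    while True:
        idx = np.random.permutation(n)                 # reshuffle on every pass
        for start in range(0, n - batch_size + 1, batch_size):
            b = idx[start:start + batch_size]
            yield x[b].astype('float32') / 255.0, y[b]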

@naomifridman naomifridman commented May 6, 2017

Same. I think it's stuck in a local minimum.
I have 5 classes, evenly distributed;
accuracy is stuck around 0.2 and the model predicts one class for all test examples.
Any ideas?

@ruzrobert ruzrobert commented May 7, 2017

@naomifridman Well, it really is stuck in a local minimum.
In the RNN case you can try saving states between batches (stateful RNN), and also try as many hyperparameter settings as possible. In my experience this can improve accuracy and the model can start learning, but maybe the dataset is just very hard.

@lynnwong11 lynnwong11 commented Sep 3, 2017

It almost drove me crazy getting the same output probability every time: binary classification and the same output all the time. Also, the results stop changing after epoch 1. @fchollet I think @stalagmite7 may be right that this could be a fundamental issue in Keras. The answer you gave us simply does not answer our question. After using the simplest model architecture and trying different inputs with the same input architecture, nothing changes. Can you please help us figure out this confusing issue? @fchollet

@ruzrobert ruzrobert commented Sep 3, 2017

@lynnwong11 I think the problem is only in the input data: it can't be learned, it's too hard for the model.

@lynnwong11 lynnwong11 commented Sep 14, 2017

@ruzrobert Thank you so much. I have figured out the problem: it turns out that softmax and resprop shouldn't be used together.

@ruzrobert ruzrobert commented Sep 14, 2017

@lynnwong11 Do you mean RMSprop? Well, they actually should work together, but as always it all depends on the input data. What optimizer are you using now?

@lynnwong11 lynnwong11 commented Sep 25, 2017

@ruzrobert Yes, I am using sigmoid with Adam now, and the result is acceptable. (Sorry for the late reply.)

@ruzrobert ruzrobert commented Sep 25, 2017

@lynnwong11 Adam is always the winner :)
If we are talking about binary classification, it can be better to model two separate classes, 0 for false and 1 for true, so that there are 2 output values. This allows us to use softmax.
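For illustration, a sketch of the two output-head choices being discussed; the hidden-layer size and input dimension are assumptions:

from keras.models import Sequential
from keras.layers import Dense

# Option A: one sigmoid unit, labels are 0/1, loss is binary_crossentropy.
binary_model = Sequential()
binary_model.add(Dense(64, input_dim=100, activation='relu'))
binary_model.add(Dense(1, activation='sigmoid'))
binary_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Option B: two softmax units, labels are one-hot pairs, loss is categorical_crossentropy.
softmax_model = Sequential()
softmax_model.add(Dense(64, input_dim=100, activation='relu'))
softmax_model.add(Dense(2, activation='softmax'))
softmax_model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])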

@ahundt ahundt commented Jan 12, 2018

Adam is always the winner :)

@ruzrobert I'm not sure that's so easy to say. See The Marginal Value of Adaptive Gradient Methods in Machine Learning:

Adaptive optimization methods, which perform local optimization with a metric constructed from the history of iterates, are becoming increasingly popular for training deep neural networks. Examples include AdaGrad, RMSProp, and Adam. We show that for simple overparameterized problems, adaptive methods often find drastically different solutions than gradient descent (GD) or stochastic gradient descent (SGD). We construct an illustrative binary classification problem where the data is linearly separable, GD and SGD achieve zero test error, and AdaGrad, Adam, and RMSProp attain test errors arbitrarily close to half. We additionally study the empirical generalization capability of adaptive methods on several state-of-the-art deep learning models. We observe that the solutions found by adaptive methods generalize worse (often significantly worse) than SGD, even when these solutions have better training performance. These results suggest that practitioners should reconsider the use of adaptive methods to train neural networks.

@ruzrobert ruzrobert commented Jan 12, 2018

@ahundt
That is an interesting paper, thank you for mentioning it!
Well, in most cases Adam can be a winner, compared for example to the AdaGrad used by the original poster.
But for simple problems we can often try using only one hidden layer, and also try simple optimizers like GD/SGD; that is definitely true.

My word "always" is not really correct, I agree.

@wasifmasood wasifmasood commented Mar 17, 2018

I am having the same problem: my dataset has two classes, but the model classifies everything as one of them.

model = Sequential()
model.add(Dense(20, input_dim=31, activation='relu', kernel_initializer='glorot_uniform'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Fit the model
model.fit(X_train, y_train, epochs=3, batch_size=10, verbose=1, class_weight=class_weight)
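One way the class_weight dict used above might be built (assuming scikit-learn is available and y_train holds 0/1 labels; this is a guess at the intent, not the poster's actual code):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

classes = np.unique(y_train)
weights = compute_class_weight(class_weight='balanced', classes=classes, y=y_train)
class_weight = dict(zip(classes, weights))   # e.g. {0: 0.6, 1: 3.1} for an imbalanced set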

@soumyapatro soumyapatro commented May 18, 2018

Using the SGD optimizer with clipvalue=0.5 instead of Adam improves the accuracy each time; with just Adam, training seems to suffer from vanishing gradients. I also used BatchNorm after each conv layer.
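A minimal sketch of the recipe described here, i.e. SGD with clipvalue=0.5 and BatchNormalization after each conv layer; the filter count, input shape, and learning rate are assumptions:

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation, MaxPooling2D, Flatten, Dense
from keras.optimizers import SGD

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(64, 64, 3)))
model.add(BatchNormalization())                          # BatchNorm after the conv layer
model.add(Activation('relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=SGD(lr=0.01, clipvalue=0.5),     # gradient clipping instead of Adam
              metrics=['accuracy'])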

@AndreaGndv AndreaGndv commented May 31, 2018

Hello there,
I had a similar issue with my model when using Adam or RMSprop. The combination of the relu activation function, the random_normal kernel initializer, and the SGD optimizer seems to have solved the problem for me. Good luck!

@ruzrobert ruzrobert commented May 31, 2018

@soumyapatro @AndreaVonBB
Hello there! I am using Adam for now, and yes, it may not be effective everywhere.
So, as I see it, you both used SGD instead of Adam or RMSprop.
Could you tell us more about your models? What layers? How many units in them? This will help others (and me) understand your cases better.

For example, mine is as follows: right now I am using the Adam optimizer with learning rate = 0.003. Inputs are 96 parameters, normalized to 0-1; the values themselves are 0.0..1.0 double values. Layers: Input (96) > Dropout (0.8 keep) > LSTM (30 seq, 100 units) > Dropout (0.8 keep) > Dense (2) (LSTM -> output categories). Results are pretty bad, but that is my case.

As I understood, @soumyapatro has convolution layers, but @AndreaVonBB? Dense (fully connected) layers?

@AndreaGndv AndreaGndv commented May 31, 2018

Sorry, I should have included more information about my model. Yes, I have dense layers only. I am currently playing around with the number of nodes, number of hidden layers, learning rates, dropout, and so on, as I am looking for the settings that give me the best predictor for my dataset. I am getting the best results when the number of nodes equals the number of features in the dataset, with learning rate = 0.01, dropout = 0.1, and three hidden layers. But the best hyperparameter settings for your dataset will definitely differ from mine :)

@ruzrobert ruzrobert commented May 31, 2018

@AndreaVonBB > But the best hyperparameter settings for your dataset will definitely differ from mine :)

I know that, of course, but it is interesting anyway. Thank you for the response!

@soumyapatro soumyapatro commented Jun 4, 2018

@ruzrobert I built an AlexNet (so I had a mix of conv and FC layers) and the layer dimensions were the same as the original AlexNet, except for the last layer, where I only had 2 classes; the input was a 256*256 image. I used a learning rate of 0.01 for SGD and dropout of 0.5 in the last 2 dense layers of AlexNet.

@gdrubich gdrubich commented Jun 26, 2018

Try using StandardScaler. In my own experience this happens because the data is not normalized, so for one reason or another the predicted values are always the same; demonstrating this rigorously would take a while. You're probably not doing this in the extract-inputs function you have defined.
I had this issue yesterday and solved it by applying StandardScaler to every set of inputs I used in the model. Hope this works for everyone. If someone's interested, I can post some code with more info about this problem.
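A short sketch of the suggested normalization (the X_train/X_test names are the ones used earlier in this thread):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)   # fit the scaler on training data only
X_test = scaler.transform(X_test)         # reuse the same mean/std for the test data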

@jupiters1117 jupiters1117 commented Aug 31, 2018

@gdrubich I faced the same issue. The suggestion to standardize the input dataset works for me!

@pranav-vempati pranav-vempati commented Sep 13, 2018

The other, unpalatable possibility for anyone else facing this issue is that your model may just be fitting on noise, so you might need to consider extracting more meaningful features. None of the above tips are likely to help if this is the case.

@ruzrobert ruzrobert commented Sep 14, 2018

@pranav-vempati That's true, although above we were not only talking about problems in the model. There are probably only ever two possible problems: the model, or the data :)

@yanaxu333 yanaxu333 commented Oct 2, 2018

In my case, the reason I got the same output class is that I had set epochs=1. When I change it to 10 or more, it works.

@ruzrobert ruzrobert commented Oct 2, 2018

@yanaxu333
Of course; one epoch, I think, can be enough only if the dataset is large and not very varied. Otherwise, the number of epochs can be quite large. I'd rather not give an exact number, but in my current task I need 20-200 epochs to get some interesting results.

@adityaradk adityaradk commented Nov 8, 2018

I was facing a nearly identical problem. I fixed it by using a binary_crossentropy loss and the sigmoid function for hidden layers instead of ReLU. I also used the Adamax optimizer. I got 100% accuracy on my test dataset.

robertvunabandi added a commit to robertvunabandi/mit-smart-confessions-data that referenced this issue Nov 27, 2018
@juebrauer juebrauer commented Dec 14, 2018

I had the same problem: I tried a very simple CNN (CONV-POOL-CONV-POOL-FLATTEN-5 OUTPUT NEURONS) for classifying images read in with OpenCV's imread() method. After training, the output was always the same.

Solution: image rescaling via image = image * (1.0/255.0)

--> Now loss went down and accuracy went up.
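For reference, a one-line version of that fix for images loaded with cv2.imread(); the image_path variable is an assumption:

import cv2

image = cv2.imread(image_path).astype('float32') * (1.0 / 255.0)   # scale pixel values to [0, 1]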

@Soufiane-Fartit Soufiane-Fartit commented Mar 12, 2019

Using sgd optimizer instead of Adam with clipvalue = 0.5 improves the accuracy each time. Using just Adam seems to suffer from vanishing gradients. Also used Batchnorm after each conv layer.

Thank you, I was stuck at 50%; I did that and it jumped to 91% in 5 epochs.

@ruzrobert ruzrobert commented Mar 12, 2019

@Soufiane-Fartit
Could you clarify a couple of points? Did you use SGD + clipvalue=0.5 or just SGD? Also, was the optimizer the only change, or were there other changes in the model as well? Thank you!

@Soufiane-Fartit Soufiane-Fartit commented Mar 12, 2019

@ruzrobert

I used SGD with clipvalue=0.5, and I added BatchNormalization layers between the convolutional layers and their activation functions.

@sergodeeva sergodeeva commented Mar 16, 2019

Thank you @soumyapatro, changing from Adam to SGD with clipvalue=0.5 worked for me!

My final code (for those experiencing the same problem):

model = VGG16(weights="imagenet", include_top=False, input_shape=(img_size, img_size, 3))
for layer in model.layers[:-5]:
    layer.trainable = False

top_layers = model.output
top_layers = Flatten(input_shape=model.output_shape[1:])(top_layers)
top_layers = Dense(4096, activation='relu', kernel_initializer='random_normal')(top_layers)
top_layers = Dense(1, activation='sigmoid')(top_layers)

model_final = Model(input=model.input, output=top_layers)

sgd = optimizers.SGD(lr=0.01, clipvalue=0.5)
model_final.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
@bruceweir bruceweir commented Apr 12, 2019

I have found that this can be a problem with perfectly balanced classes - I guess there must be a local loss minimum where the model just predicts a single class in that situation.

Slightly unbalancing the data by adding a few examples of one of the classes seems to fix it.

@ASRodrigo1 ASRodrigo1 commented Jul 24, 2019

Well, after 1 month of looking for a solution, I tried everything:
lowering the learning rate, changing the optimizer, using a bigger dataset, increasing and decreasing model complexity, changing the input shape to smaller and larger images, changing the imports from "from keras import ..." to "from tensorflow.keras import ..." and further to "from tensorflow.python.keras import ...", changing the activation function of every layer, combining them, trying other datasets, etc. Nothing helped. Even if I used a net like VGG16/19 my results would be the same.
But yesterday I was reading a book (Deep Learning with Keras - Antonio Gulli, Sujit Pal) and realized that the authors use imports like this:

from keras.layers.core import Dense, Flatten, Dropout

and not like this:

from keras.layers import Dense, Flatten, Dropout

The same for Conv; I was using:

from keras.layers import Conv2D, MaxPooling2D, SeparableConv2D

and the authors use:

from keras.layers.convolutional import Conv2D, MaxPooling2D, SeparableConv2D

And when I changed the imports, everything finally started to work!
I don't know if this is a bug or something like that, because now my models always work, even on datasets where they used to predict the same class.
So now I'm using the imports like this, for example:

from keras.layers.core import Dense, Dropout, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D, SeparableConv2D

Try this. If it doesn't work, check whether your dataset is balanced: for example, if your problem is classifying images of cats and dogs and you have 200 cat images and 800 dog images, try to use numbers of images that are not so different, because the imbalance can cause problems: your model can 'think' "if I say 10/10 images are dogs, I get 80% accuracy", but this is not what we want. You can use class_weight if you don't have more images to balance your dataset and everything will be fine; you can also use data augmentation. You can use callbacks too, like ReduceLROnPlateau, which reduces your learning rate when your loss stops decreasing. You can increase your batch_size, and don't forget to shuffle the data in your ImageDataGenerator and normalize your images, like this:

g2 = ImageDataGenerator(rescale=1./255)

All these things are very important, but nothing really helped me. The only thing that worked was importing from keras.layers.core and keras.layers.convolutional; may this help you too!
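For reference, a sketch of how the class_weight and ReduceLROnPlateau suggestions above can be wired into fit_generator; the weight values, patience, and the model/generator names are illustrative assumptions:

from keras.callbacks import ReduceLROnPlateau

reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, verbose=1)
class_weight = {0: 4.0, 1: 1.0}   # e.g. 200 cat images vs 800 dog images, as in the example above

model.fit_generator(train_generator,
                    steps_per_epoch=train_steps,
                    epochs=30,
                    validation_data=val_generator,
                    validation_steps=val_steps,
                    class_weight=class_weight,
                    callbacks=[reduce_lr])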

@ruzrobert ruzrobert commented Jul 24, 2019

@ASRodrigo1 Funny story :D.
Btw, you can also use a weighted loss to compute the loss with class weights, to fight imbalance; and there is also weighted accuracy with class weights (I don't remember whether it was in scikit-learn or Keras). The idea is that you would then see not 80% accuracy but, say, 50%, because the class weights are applied.
In that case you may not need to balance your images and can use them as they are. Just one of the possible solutions.

@michaels011235 michaels011235 commented Jul 28, 2019

@ruzrobert could you give an example of using 'weighted loss' and 'weighted accuracy'?

@ruzrobert ruzrobert commented Jul 28, 2019

@michaels011235
Sure! But keep in mind that I have not used this code for a long time and it may not be accurate, but here it is:

This is one of the ways 'weighted loss' can be implemented. As I remember, the class weights here are recalculated for each batch:

def cost_fn_weighted_multiclass(target_network, is_training):
    self.use_sample_weights = True
    losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.targets_ph,
                                                     logits=target_network.outputs,
                                                     name='cost_CE_weighted')
    if is_training:
        return tf.reduce_mean(losses * self.sample_weights_ph)
    else:
        return tf.reduce_mean(losses)

So when you are feeding data to the session, you need to pass the weights:
from sklearn.utils.class_weight import compute_sample_weight

sample_weights = compute_sample_weight(class_weight='balanced', y=batch_y)
feed_dict = {**feed_dict, **{self.sample_weights_ph: sample_weights}}

There are also other variants, for example static class weights for the whole dataset, etc. I think you can easily find them on the internet.

Weighted accuracy can probably be calculated like this:

sample_weight = compute_sample_weight(class_weight='balanced', y=dt.targets_argmax, indices=None)
acc_normal = round(skmetrics.accuracy_score(y_true=dt.targets_argmax, y_pred=dt.outputs_argmax), 4)
acc_weighted = round(skmetrics.accuracy_score(y_true=dt.targets_argmax, y_pred=dt.outputs_argmax, sample_weight=sample_weight), 4)

@ASRodrigo1 ASRodrigo1 commented Aug 11, 2019

A few weeks later, I got stuck on the same problem:
My acc couldn't get higher than 50% and my val_acc was always 60.17%; my loss was stuck at 0.6931, and even if I increased or reduced the model complexity, the problem continued.
So, after 3 days I just took out all of my dropout layers, and my model started overfitting, which is great! Because now I can solve overfitting more easily than this other problem that I don't know the name of.
So if you find yourself in a similar situation, try removing all of your dropout layers and see whether you get an overfitting problem instead, which is easier to solve.

@ASRodrigo1 ASRodrigo1 commented Aug 11, 2019

If you find any solution, please share it; I bet it will help someone.

@ruzrobert ruzrobert commented Aug 11, 2019

@ASRodrigo1
It is easy to get overfitting, but really hard to get rid of it if your data is bad.
The data I was using was very bad, without any correlation. Finally, after a lot of different attempts, I gave up on neural networks and switched to simple statistics and patterns. I work in the financial field.
So you will only get random NN decisions if your data is bad, with or without overfitting. Sometimes the model is bad, but the data is crucial. If you can easily get overfitting, that only means your model can remember things. But can it understand them?

@ASRodrigo1 ASRodrigo1 commented Aug 12, 2019

Oh, I understand you, it's true.
I'm working with some mammography images; they're reasonably correlated, so I think this time I can get past this overfitting problem because my images are not so bad, but it will be hard, as you mentioned. Let's hope I succeed 😁

@logar16 logar16 commented Sep 20, 2019

I had the problem of all outputs being identical regardless of input, and the biggest thing that helped me was setting the learning rate in the range [0.001, 0.01] and running more epochs (20 started to show decent results, but the more the merrier). From there I could do a random search of the hyper-parameter space.
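A sketch of the kind of random search described above; build_model() is a hypothetical helper that returns a freshly compiled model for a given learning rate, and the history key may be 'val_accuracy' instead of 'val_acc' on newer Keras versions:

import random

best = None
for _ in range(10):
    lr = 10 ** random.uniform(-3, -2)          # sample a learning rate in [0.001, 0.01]
    epochs = random.choice([20, 40, 80])       # more epochs, per the advice above
    model = build_model(lr)                    # hypothetical helper, not defined in this thread
    history = model.fit(X_train, y_train, epochs=epochs, validation_split=0.2, verbose=0)
    val_acc = history.history['val_acc'][-1]
    if best is None or val_acc > best[0]:
        best = (val_acc, lr, epochs)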

@sushanttripathy sushanttripathy commented Jan 4, 2020

@soumyapatro Just wondering what learning rate you used with clipvalue=0.5. I am trying to train a DNN using BatchNorm and elu activation. I set momentum=0.9, nesterov=True; however, convergence seems delayed. Thanks in advance for replying.
