acc and val_acc don't change? #1597

Closed
talentlei opened this issue Jan 30, 2016 · 58 comments

Comments

@talentlei

I use an LSTM for a sequence labeling task, but I get the same acc and val_acc for every epoch.
Here is my code:

def moduleRnn(self):
    model = Sequential()
    model.add(LSTM(output_dim=64,input_length=self.seq_len,batch_input_shape=(16,1,200),input_dim=self.embed_length,return_sequences=True,stateful=False ))
    #model.add(LSTM(output_dim=16,return_sequences=True,stateful=False ))
    model.add(Dropout(0.2))
    model.add(TimeDistributedDense(output_dim=self.labs_len))
    model.add(Activation('softmax'))
    model.compile(loss="categorical_crossentropy" , optimizer='rmsprop' , class_mode='categorical')
    #model.fit(self.train,self.train_lab,batch_size=16,nb_epoch=3,verbose=1, validation_split=0.1,show_accuracy=True)
    model.fit(self.X_train,self.Y_train,batch_size=16,nb_epoch=15,verbose=1,show_accuracy=True,validation_split=0.2)
    score = model.evaluate(self.X_test,self.Y_test,batch_size=16)
    print score

Has anyone met the same problem? Please help me.

@ymcui

ymcui commented Jan 30, 2016

Do you mean that the training accuracy and validation accuracy don't change during training?
You'd better post your logs.

@talentlei
Author

@ymcui Yes, that's right.

Epoch 1/15
18272/18272 [==============================] - 118s - loss: 0.0479 - acc: 0.4296 - val_loss: 0.0285 - val_acc: 0.4286
Epoch 2/15
18272/18272 [==============================] - 114s - loss: 0.0322 - acc: 0.4297 - val_loss: 0.0282 - val_acc: 0.4286
Epoch 3/15
18272/18272 [==============================] - 113s - loss: 0.0319 - acc: 0.4297 - val_loss: 0.0281 - val_acc: 0.4286
Epoch 4/15
18272/18272 [==============================] - 114s - loss: 0.0317 - acc: 0.4297 - val_loss: 0.0283 - val_acc: 0.4286
Epoch 5/15
18272/18272 [==============================] - 120s - loss: 0.0316 - acc: 0.4297 - val_loss: 0.0281 - val_acc: 0.4286
Epoch 6/15
18272/18272 [==============================] - 117s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0281 - val_acc: 0.4286
Epoch 7/15
18272/18272 [==============================] - 115s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 8/15
18272/18272 [==============================] - 119s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 9/15
18272/18272 [==============================] - 116s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 10/15
18272/18272 [==============================] - 116s - loss: 0.0314 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 11/15
18272/18272 [==============================] - 115s - loss: 0.0313 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 12/15
18272/18272 [==============================] - 113s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 13/15
18272/18272 [==============================] - 113s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0279 - val_acc: 0.4286
Epoch 14/15
18272/18272 [==============================] - 113s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286
Epoch 15/15
18272/18272 [==============================] - 114s - loss: 0.0312 - acc: 0.4297 - val_loss: 0.0280 - val_acc: 0.4286

How can this happen?

@ymcui

ymcui commented Feb 1, 2016

@talentlei
In your log, the loss seems to start from a very low value and converges after only a few epochs.
I have no particular idea about this, but I think you should check the validity of your data (and maybe remove the batch_input_shape argument from your LSTM layer, I guess).

@lqj1990

lqj1990 commented May 10, 2016

@talentlei Have you solved the problem? I'm stuck in the same situation when I use an RNN, and I don't know how to solve it.

@DSA101

DSA101 commented Jun 10, 2016

I have a similar problem. In my case, when I attempt LSTM time series classification, val_acc often starts with a high value and stays the same, even though loss, val_loss and acc change. I've narrowed down the issue to not having enough training sequences (around 300). When I increased the number to 500+, it started to converge better, but there are still periods when loss, acc and val_loss change while val_acc sticks to the same value. How could that be? Is there a bug where it's not updating (even though loss, acc and val_loss update during the same epoch)?

model = Sequential()
model.add(LSTM(256, input_shape=(6, 10)))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
hist = model.fit(X_train_mat, Y_train_mat, nb_epoch=10000, batch_size=30, validation_split=0.1)

Epoch 2816/10000
50/472 [==>...........................] - ETA: 0s - loss: 0.6281 - acc: 0.6800
Epoch 02815: val_acc did not improve
472/472 [==============================] - 0s - loss: 0.5151 - acc: 0.7648 - val_loss: 1.2978 - val_acc: 0.4151
Epoch 2817/10000
50/472 [==>...........................] - ETA: 0s - loss: 0.4406 - acc: 0.8600
Epoch 02816: val_acc did not improve
472/472 [==============================] - 0s - loss: 0.5179 - acc: 0.7479 - val_loss: 1.2844 - val_acc: 0.4151
Epoch 2818/10000
50/472 [==>...........................] - ETA: 0s - loss: 0.5385 - acc: 0.7400
Epoch 02817: val_acc did not improve
472/472 [==============================] - 0s - loss: 0.5100 - acc: 0.7585 - val_loss: 1.2699 - val_acc: 0.4151
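
One possible reading of the log above, offered as a sketch rather than a diagnosis: with 472 training samples and validation_split=0.1, the validation set holds only about 52 or 53 samples, so val_acc can move only in steps of roughly 0.02, and small changes in val_loss need not flip a single prediction. The snippet recovers the fraction behind 0.4151. The helper name closest_fraction is made up for illustration and is not part of Keras.

```python
from fractions import Fraction

def closest_fraction(acc_str, max_samples):
    """Return the simplest correct/total ratio that matches a logged accuracy."""
    return Fraction(acc_str).limit_denominator(max_samples)

ratio = closest_fraction("0.4151", 100)
print(ratio)                    # 22/53
print(round(float(ratio), 4))   # 0.4151
print(1 / ratio.denominator)    # smallest possible change in val_acc
```

If this reading is right, 22 of 53 validation samples are classified correctly in every epoch shown, and val_acc only changes when at least one of those 53 predictions crosses the decision threshold.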

@braingineer
Contributor

A good method for debugging this issue is to use an ipython/jupyter notebook, compile the model, and then have it predict for one of your batches. Then, go through the accuracy code with the ability to manually inspect the values of the matrices. I've found stepping through code like this in mysterious situations to be enlightening.
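
A minimal sketch of that kind of manual inspection, in plain Python so it runs anywhere. The probs list is a hypothetical stand-in for the output of model.predict on one batch, and labels stands in for the matching one-hot targets; the function names are illustrative.

```python
from collections import Counter

def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])

def inspect_predictions(probs, onehot_labels):
    """Recompute categorical accuracy by hand and count the predicted classes."""
    predicted = [argmax(p) for p in probs]
    actual = [argmax(y) for y in onehot_labels]
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual), Counter(predicted), Counter(actual)

# Hypothetical batch: stand-in for model.predict(X_batch) and Y_batch.
probs = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.7, 0.2, 0.1], [0.6, 0.2, 0.2]]
labels = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]

accuracy, predicted_counts, actual_counts = inspect_predictions(probs, labels)
print(accuracy)          # 0.5
print(predicted_counts)  # Counter({0: 4}): every sample is assigned class 0
```

When the predicted-class counts collapse onto a single class like this, the accuracy simply equals that class's share of the data, which is one way it can stay frozen while the loss still moves.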

@ersinyar

@DSA101 Have you solved the problem? I am doing a sentence classification task with variable sentence lengths using LSTMs. My problem is that training loss and training accuracy decrease over epochs but validation accuracy fluctuates in a small interval. Maybe your solution could be helpful for me too.

@DSA101

DSA101 commented Sep 15, 2016

My solution was to increase the size of the training set, reduce the number of features, and start with just one layer and not too many units (say 128). Once I had ensured that training progressed in a reasonable way in this configuration, I slowly added more features, more units, etc., and in the end got a satisfactory result. Still, if I make the model overly complex (e.g. increase to 3 layers with, say, 512 units without providing more training data), it behaves the same as before: flat or irregular training accuracy.

In the end I don't know if there is still a bug in the framework, or it all results from an overly complicated model and the insufficient size of the training set, but all things considered, I am satisfied with the performance of the model and the results that I have achieved and believe that Keras LSTM is usable for time series classification.

So if your training acc improves but validation accuracy stays in a small interval, can it be indicative of overfitting?

@maxpagels

I'm having the same issue. Loss and accuracy on the training set change from epoch to epoch, but the validation accuracy / loss doesn't, which is a bit odd.

Epoch 1/20
158/158 [==============================] - 24s - loss: 2.3558 - acc: 0.4051 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 2/20
158/158 [==============================] - 24s - loss: 1.8001 - acc: 0.3924 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 3/20
158/158 [==============================] - 24s - loss: 1.2940 - acc: 0.3608 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 4/20
158/158 [==============================] - 24s - loss: 1.8052 - acc: 0.4114 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 5/20
158/158 [==============================] - 24s - loss: 1.7127 - acc: 0.3734 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 6/20
158/158 [==============================] - 24s - loss: 1.8030 - acc: 0.3734 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 7/20
158/158 [==============================] - 24s - loss: 1.7076 - acc: 0.3861 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 8/20
158/158 [==============================] - 24s - loss: 1.4173 - acc: 0.4241 - val_loss: 1.0986 - val_acc: 0.3684
Epoch 9/20
158/158 [==============================] - 24s - loss: 1.3042 - acc: 0.3797 - val_loss: 1.0986 - val_acc: 0.3684

The model I'm using is a convnet:

model = Sequential()
model.add(Convolution2D(20, 5, 5, input_shape=(3, img_width, img_height)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))

model.add(Flatten())
model.add(Dense(20))
model.add(Activation('relu'))
model.add(Dropout(0.2))
model.add(Dense(3))
model.add(Activation('sigmoid'))

sgd = SGD(lr=0.0005)
model.compile(loss='categorical_crossentropy',
              optimizer=sgd,
              metrics=['accuracy'])
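
A hedged observation on the log above: the constant val_loss of 1.0986 matches ln 3, which is the categorical cross-entropy a 3-class model reports when it gives all three classes equal weight. That would suggest (though the log alone can't prove it) that the network emits the same uninformative output for every validation image. The function name below is illustrative.

```python
import math

def uniform_cross_entropy(n_classes):
    """Categorical cross-entropy when every class gets probability 1/n."""
    p = 1.0 / n_classes
    return -math.log(p)

print(round(uniform_cross_entropy(3), 4))  # 1.0986
```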

@savourylie

Similar problem here. It really feels like a bug to me. The reason is that my validation set has 2500+ observations; for a dataset of this size, as long as there's a change in the weights (and there is, since the training error is decreasing), there should be a change in val_loss, either positive or negative. Also, it's unlikely to be overfitting, as I'm using really heavy dropout (between 0.5 and 0.7 for each layer).

My solution to this is changing the learning rate of the optimizer... sometimes it helps, haha. I've never experienced the same phenomenon using raw TensorFlow, so I think it's a Keras thing.

@andrew-ayers

andrew-ayers commented Jan 26, 2017

I'm gonna throw my voice in here, too. I'm currently doing the Udacity Self-Driving Car Engineer Nanodegree course; my cohort is currently doing the behavioral cloning lab. We were given a dataset of approximately 20k+ features and labels; I take it and augment it with flipping, so I have about 40k samples. My convnet is the same one from the NVIDIA end-to-end paper (relu on all layers). I am using adam and mse for optimizer/loss. I've tried heavy dropout on the fully-connected layers, on all layers, on random layers. Ultimately, my validation accuracy stays stuck at a single value. I'd think that if I were overfitting, the accuracy would peg close to or at 100%? Rather, it seems like it is getting stuck in a local minimum. I think I'm going to need to do some visualization of the data, to verify that it is balanced, plus I have some other ideas to try, but so far it is very frustrating. I don't know if it is a bug in the framework; my best guess is that it is not, because other students are finding success.

@dvillevald

@andrew-ayers Did you manage to solve this issue? I have a similar problem with NVIDIA (adam, mse, 120k samples including flipped data) model for Self_Driving Car Engineer course - validation loss changes but validation accuracy stays the same.
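
A possible explanation, sketched under an assumption: when a regression model (mse loss, continuous steering angles) is compiled with metrics=['accuracy'], the reported accuracy is not a regression metric. Assuming it behaves like Keras's binary accuracy, which checks whether the rounded prediction equals the target exactly, it mostly counts the samples whose target is exactly 0 or 1 and can stay flat while the loss keeps improving. The data below is made up for illustration.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rounded_accuracy(y_true, y_pred):
    """Fraction of samples whose rounded prediction equals the target exactly."""
    return sum(t == round(p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical steering angles; two of the five targets are exactly 0.
targets = [0.0, 0.0, 0.12, -0.25, 0.31]
early_predictions = [0.30, -0.20, 0.40, 0.10, 0.05]
later_predictions = [0.05, -0.02, 0.15, -0.20, 0.28]

for name, preds in [("early", early_predictions), ("later", later_predictions)]:
    print(name, round(mse(targets, preds), 4), rounded_accuracy(targets, preds))
```

In this toy the error drops sharply between the two sets of predictions while the "accuracy" stays at 0.4, the share of targets that are exactly 0. For a regression task, tracking val_loss (or mean absolute error) is the more informative signal.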

@msmah

msmah commented Jan 23, 2018

I had the same problem while training a convolutional autoencoder. I made the learning rate (the "lr" parameter of the optimizer) smaller and it solved the problem.

@hujiao1314

Have you solved the problem? I met a similar problem with my Keras CNN model. I had 4000 training samples and 1000 validation samples. During training, the loss and val_loss were decreasing, but the acc and val_acc never changed.

This is my code:

inputs_x=Input(shape=(1,65,21))
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Conv2D(64,(3,3),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Conv2D(32,(5,5),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Conv2D(16,(5,5),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(x)
x=MaxPooling2D(pool_size=(2,2),strides=(2,2))(x)

x=Dropout(0.25)(x)
x=Flatten()(x)

inputs_y=Input(shape=(1,32,21))
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Conv2D(32,(2,2),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Conv2D(32,(4,4),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Conv2D(8,(4,4),padding='valid',data_format='channels_first',activation='relu',use_bias=True)(y)
y=MaxPooling2D(pool_size=(2,2),strides=(2,2))(y)

y=Dropout(0.30)(y)
y=Flatten()(y)

merged_input=keras.layers.concatenate([x,y],axis=-1)

z=Dense(16,activation='softmax')(merged_input)
z=Dense(8,activation='softmax')(z)
z=Dense(4,activation='softmax')(z)

outp=Dense(1,activation='softmax')(z)

model=Model(inputs=[inputs_x,inputs_y],outputs=outp)
model.compile(loss='binary_crossentropy',
              optimizer='sgd',
              metrics=['accuracy'])

history=model.fit(x=[train_inputs_x,train_inputs_y],y=train_label,batch_size=32,
                  epochs=30,validation_split=0.2,shuffle=True)

Any ideas?

@hadisaadat

Does anyone know how to solve this issue?
In my model, with an LSTM, I get the same training and validation accuracy for every epoch!
The model learns slightly within an epoch, after each batch, but it seems to reset before the next epoch and start again from the beginning!
This is the training log after each epoch:

  • 4s - loss: 0.2217 - acc: 0.6464 - val_loss: 0.1487 - val_acc: 0.8137
  • 3s - loss: 0.2217 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 3s - loss: 0.2217 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 3s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
  • 4s - loss: 0.2216 - acc: 0.6469 - val_loss: 0.1487 - val_acc: 0.8137
    ...

This is the log after each batch:
    Train on 21000 samples, validate on 9000 samples
    ==================New Training Start====================
    Epoch 1/50
    batch: 0 ended Loss: 0.21712732 Accuracy: 0.625
    batch: 1 ended Loss: 0.229398 Accuracy: 0.65166664
    batch: 2 ended Loss: 0.22204755 Accuracy: 0.6383333
    batch: 3 ended Loss: 0.21405634 Accuracy: 0.6533333
    batch: 4 ended Loss: 0.21910276 Accuracy: 0.63
    batch: 5 ended Loss: 0.22354788 Accuracy: 0.70166665
    batch: 6 ended Loss: 0.23390895 Accuracy: 0.62166667
    batch: 7 ended Loss: 0.21102294 Accuracy: 0.62833333
    batch: 8 ended Loss: 0.22611171 Accuracy: 0.66833335
    batch: 9 ended Loss: 0.21904916 Accuracy: 0.62
    batch: 10 ended Loss: 0.23376058 Accuracy: 0.645
    batch: 11 ended Loss: 0.21929795 Accuracy: 0.6766667
    batch: 12 ended Loss: 0.22111656 Accuracy: 0.6483333
    batch: 13 ended Loss: 0.2131401 Accuracy: 0.65
    batch: 14 ended Loss: 0.2148913 Accuracy: 0.6566667
    batch: 15 ended Loss: 0.22052963 Accuracy: 0.635
    batch: 16 ended Loss: 0.22950262 Accuracy: 0.6333333
    batch: 17 ended Loss: 0.22890009 Accuracy: 0.64666665
    batch: 18 ended Loss: 0.22269897 Accuracy: 0.65166664
    batch: 19 ended Loss: 0.22959195 Accuracy: 0.645
    batch: 20 ended Loss: 0.22551142 Accuracy: 0.6566667
    batch: 21 ended Loss: 0.2217158 Accuracy: 0.635
    batch: 22 ended Loss: 0.21928492 Accuracy: 0.64
    batch: 23 ended Loss: 0.21457554 Accuracy: 0.66333336
    batch: 24 ended Loss: 0.22461174 Accuracy: 0.655
    batch: 25 ended Loss: 0.21772751 Accuracy: 0.665
    batch: 26 ended Loss: 0.21689837 Accuracy: 0.63166666
    batch: 27 ended Loss: 0.22468112 Accuracy: 0.6333333
    batch: 28 ended Loss: 0.2141714 Accuracy: 0.6533333
    batch: 29 ended Loss: 0.22494899 Accuracy: 0.6483333
    batch: 30 ended Loss: 0.22441803 Accuracy: 0.62833333
    batch: 31 ended Loss: 0.22385867 Accuracy: 0.62666667
    batch: 32 ended Loss: 0.2221946 Accuracy: 0.66
    batch: 33 ended Loss: 0.2230069 Accuracy: 0.64166665
    batch: 34 ended Loss: 0.21400177 Accuracy: 0.66

@ManuConcepBrito

@hujiao1314 I do not know if I really understand what you are trying to do, so forgive me if this does not make sense. My observations:
In your last layer, outp, you are using softmax when you only have one output neuron. You might find it useful to change it to 'sigmoid'. Also, the layers named z do not seem to be final outputs, yet they use a softmax activation function.
It is a bit confusing, so if you could specify the characteristics of your problem I could be more helpful.
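
To illustrate the point about the single-unit softmax, here is a small plain-Python sketch (not Keras code). Softmax normalizes across the units of a layer, so with one unit the output is always 1.0 regardless of the input, and the predictions can never change. A sigmoid on the same logit does vary.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    """Logistic function, suitable for a single binary output."""
    return 1.0 / (1.0 + math.exp(-x))

for logit in (-4.0, 0.0, 2.5):
    print(softmax([logit]), round(sigmoid(logit), 4))
# softmax over one value is always [1.0]; sigmoid varies with the logit
```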

@AkhilAshref

@hadisaadat Reduce your learning rate and try a few smaller learning rates. That should solve your problem.

@ghost

ghost commented Jul 9, 2018

@AkhilAshref, I had a similar issue to @hadisaadat, and mine worked after reducing the lr. But could you give a more detailed explanation of why the gradient becomes zero?
Thanks

@amcneil1998

@vishnu-zsf I'm having the same problem it seems, what optimizer/ learning rate did you use?

@ghost

ghost commented Jul 11, 2018

@amcneil1998, I used the Adam optimizer and settled on a learning rate of 0.0008. This was when I used 100,000 data samples and 10 epochs. Later on, when I tried to run with 30 epochs, I switched to a decaying learning rate, which after tuning for a while gave me satisfactory results. I'm pretty sure that the learning rate and all the other optimizer parameters vary with the kind of data we have and the sheer magnitude of the features.

Initial code, when I ran with 10 epochs:

opt = optimizers.adam(lr=0.0008)
self.model.compile(loss='binary_crossentropy', optimizer=opt,metrics = ['accuracy'])

Code to run with a decaying lr in Keras:

keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

Do reply if the issue still persists.
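
A note on the second snippet, with an assumption stated up front: decay=0.0 keeps the learning rate constant, so a nonzero value is needed to get any decay. In the Keras 2 optimizers of that era, the decay argument applied a time-based schedule, lr / (1 + decay * iterations), where iterations counts batch updates. The sketch below only reproduces that formula in plain Python; the decay value 1e-4 is an arbitrary illustration, not the setting used above.

```python
def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay: lr / (1 + decay * iterations)."""
    return initial_lr / (1.0 + decay * iterations)

for step in (0, 1000, 10000, 100000):
    print(step, decayed_lr(0.001, 1e-4, step))
```

With these illustrative numbers the rate halves after 10,000 batch updates, so the right decay depends on how many batches an epoch contains.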

@amcneil1998

@vishnu-zsf Still having the issue. I have tried reducing the learning rate, increasing the learning rate, and tried both the SGD and Adam optimizers. I have even tried to overfit by using just a small part of my data. I currently have 900 data points, of which I am using 100 each for test and validation, and 700 for training. I have tried increasing my amount of data to 2800, using 400 each for test and validation, and 2000 for training. I'm currently using a batch size of 50, and even running past 50 epochs showed no change in accuracy or loss. I noticed later on, while trying to predict results, that my predictions were heading towards 0, coming closer the longer I trained. This seems to be the case no matter what I do.

@hadisaadat

@vishnu-zsf @amcneil1998 In my case, the lr actually had no impact, and the solution for me was shuffling the data for each epoch.
Depending on the nature of your data (time series or not), you should select a suitable cross-validation and shuffling strategy.
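
One concrete reason shuffling can matter, as a sketch: Keras's validation_split takes the last fraction of the arrays as the validation set, before any shuffling happens, so data sorted by class can leave a validation set with a single class. The example below mimics that split in plain Python with made-up labels. For time series, where order carries meaning, a chronological split is usually the safer choice than shuffling.

```python
import random

def split_last_fraction(samples, validation_split):
    """Hold out the last fraction of the samples, as validation_split does."""
    cut = int(len(samples) * (1.0 - validation_split))
    return samples[:cut], samples[cut:]

# Hypothetical labels sorted by class: 80 zeros followed by 20 ones.
labels = [0] * 80 + [1] * 20

_, val_sorted = split_last_fraction(labels, 0.2)
print(set(val_sorted))  # {1}: the validation set contains only class 1

shuffled = labels[:]
random.Random(0).shuffle(shuffled)  # shuffle once, before splitting
_, val_shuffled = split_last_fraction(shuffled, 0.2)
print(sorted(set(val_shuffled)))
```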

@amcneil1998

@hadisaadat Setting shuffle=True did not improve my results. Accuracy still stayed around 0.5, but loss started pretty low (0.01). So I increased the learning rate, and loss started around 5.1 and then dropped off to 0.02 after the 6th epoch. Accuracy started at 0.5 and averaged around that on both training and validation data for the 120 epochs that I trained. However, when predicting I am only able to get 2 values from the output.

@ghost

ghost commented Jul 13, 2018

@amcneil1998 You may have to regularize, and you can even use EarlyStopping in callbacks, but before that could you share your code and your data (5 sample points would do)? Like I said, the methods we use pretty much depend on the type of data we have. Mine is all resolved now, by the way.

@amcneil1998

@vishnu-zsf All of my input/output data is regularized to the range -1 to 1 with a mean of 0. The input data is a 3D array with the form (Nsamples, Entries/Sample, EntryDim). In this case it is (900, 225, 6). The output data is a 2D array with shape (Nsamples, 2), so in this case it is (900, 2). Some of the samples did not have enough entries, so they are zero-padded to the correct size. Here is the code for the model after the test data has been split off:

initilizer = RandomNormal(mean=0.0, stddev=0.05, seed=None)
modelInput = Input(batch_shape=(batch_size, 225, 6), name="Model_Input")
mid = LSTM(128, return_sequences=True, input_dim= (225, 6), bias_initializer=initilizer)(modelInput)
mid = LSTM(128, return_sequences=False, bias_initializer=initilizer)(mid)
output = Dense(2, activation='linear')(mid)
model = Model(inputs = modelInput, outputs = output)
adam = optimizers.Adam(lr = 0.000000001)
model.compile(loss='mean_squared_error', optimizer = adam, metrics=['accuracy'])
model.fit(trainInputData, trainTruthData, epochs=20, batch_size=batch_size, verbose=2, validation_split=(1/8), shuffle=True)
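
A hedged remark on the configuration above: with lr = 0.000000001, Adam's updates are so small that the weights barely move in 20 epochs, which by itself could account for flat metrics. The toy below runs a hand-written Adam on a one-parameter quadratic (it is not the model above) to compare how far the parameter travels at two learning rates. Separately, accuracy is a classification metric and says little about a regression trained with mean_squared_error.

```python
import math

def adam_minimize(lr, steps, start=5.0, beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize f(w) = w**2 with Adam and return the final w."""
    w, m, v = start, 0.0, 0.0
    for t in range(1, steps + 1):
        grad = 2.0 * w                               # derivative of w**2
        m = beta1 * m + (1.0 - beta1) * grad         # first-moment estimate
        v = beta2 * v + (1.0 - beta2) * grad ** 2    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)               # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

# About 20 epochs of 14 batches (700 training samples, batch size 50).
steps = 280
print(5.0 - adam_minimize(1e-9, steps))  # distance moved with lr = 1e-9
print(5.0 - adam_minimize(1e-2, steps))  # distance moved with lr = 1e-2
```

Each Adam step changes a parameter by roughly lr at most, so 280 steps at 1e-9 move it by well under a millionth, while 1e-2 covers a large part of the distance to the minimum.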

@AniketDhar

I have faced the same issue multiple times while using Keras. I have tried data normalization, shuffling, different learning rates and different optimizers. Nothing seems to help out, except increasing the data size. Now that is a problem for me, as I am trying to compare the effect of the data sample size on my network.
I see a lot of problems but rarely any solution in the discussions above. If anyone has a decent solution except sample size, kindly let me know.

@Timlo512

I used to face the same result. I found that using a smaller neural network architecture helped. The reason is probably vanishing gradients. In some situations, your input might not carry as much information as the neural network expects, and therefore the gradients shrink toward zero after several layers. This problem is more serious when you are using a ConvNet, and it's the reason why we have residual networks. Hope this helps.

@BahadirGLCK

BahadirGLCK commented Jan 15, 2019

I'm not sure why, but I solved this problem. I was using Keras for a CNN model on the Kaggle platform with a GPU.
I had the same problem: every epoch had the same val_loss and val_acc. For example:
Epoch 2/50 - val_loss: 0.6931 - val_acc: 0.5521
Epoch 3/50 - val_loss: 0.6931 - val_acc: 0.5521
...
When I changed the optimizer from Adam to RMSprop, it ran, but after I refreshed the kernel and restarted, I got the same issue. I then changed from RMSprop to SGD, and it worked.
Sometimes the problem is caused by unsuitable Dense layers.

@sayedathar11

For those who still have this problem and are wondering why it occurs: the reason is pretty straightforward. In your final Dense layer, where you specify the output (basically the softmax layer), the number of units should be equal to the number of classes.
If you are solving a binary classification problem, all you need to do is use 1 unit with sigmoid activation.

For binary classification:

model.add(Dense(1,activation='sigmoid'))

For n_class classes:

model.add(Dense(n_class,activation='softmax')) #where n_class is number of classes
Thanks to: https://stackoverflow.com/questions/51581521/accuracy-stuck-at-50-keras

@sowmy19

sowmy19 commented Feb 21, 2019

@sayedathar11
When I use model.add(Dense(1,activation='sigmoid')), I get the following error:
ValueError: Error when checking target: expected dense_4 to have shape (1,) but got array with shape (2,)

Here is my code:

batch_size = 32
nb_classes = 2
data_augmentation = True

img_rows, img_cols = 224,224
img_channels = 3

#Creating array of training samples
train_path = "D:/data/train*.*"
training_data=[]
for file in glob.glob(train_path):
    print(file)
    train_array= cv2.imread(file)
    train_array=cv2.resize(train_array,(img_rows,img_cols),3)
    training_data.append(train_array)

x_train=np.array(training_data)

#Creating array of validation samples
valid_path = "D:/data/valid*.*"
valid_data=[]
for file in glob.glob(valid_path):
    print(file)
    valid_array= cv2.imread(file)
    valid_array=cv2.resize(valid_array,(img_rows,img_cols),3)
    valid_data.append(train_array)

x_valid=np.array(valid_data)

x_train = np.array(x_train, dtype="float")/255.0
x_valid = np.array(x_valid, dtype="float")/255.0

#Creating array for Labels
y_train=np.ones((num_trainsamples,),dtype = int)
y_train[0:224]=0 #Class1=0
y_train[225:363]=1 #Class2=1
print(y_train)

y_valid=np.ones((num_validsamples,),dtype = int)
y_valid[0:101]=0
y_valid[102:155]=1
print(y_valid)

y_train = np_utils.to_categorical(y_train,nb_classes,dtype='int32')
y_valid = np_utils.to_categorical(y_valid,nb_classes,dtype='int32')

base_model=ResNet50(weights='imagenet',include_top=False)

x = base_model.output
x = GlobalMaxPooling2D()(x)
x=Dense(1024,activation='relu')(x)
x=Dense(1024,activation='relu')(x)
x=Dense(512,activation='relu')(x)
x=Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = x)

for i,layer in enumerate(model.layers):
    print(i,layer.name)

for layer in model.layers[:75]:
    layer.trainable=False
for layer in model.layers[75:]:
    layer.trainable=True

adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    brightness_range=(0.2,2.5),
    rotation_range=180,
    zoom_range=0.5,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True)

train_datagen.fit(x_train)

history= model.fit_generator(train_datagen.flow(x_train, y_train, batch_size = 10,shuffle=True),steps_per_epoch=len(x_train),epochs = 500,shuffle=True,
                             validation_data=(x_valid,y_valid),validation_steps=num_validsamples // batch_size,callbacks=[tensorboard])

eval = model.evaluate(x_valid, y_valid)
print ("Loss = " + str(eval[0]))
print ("Test Accuracy = " + str(eval[1]))

predictions= model.predict(x_valid)
print(predictions)
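
On the ValueError above, a likely cause (read off the posted code, not verified by running it): the labels are one-hot encoded with to_categorical, so each target has shape (2,), while Dense(1, activation='sigmoid') produces one value per sample. Either keep the integer 0/1 labels for the sigmoid output, or keep the one-hot labels and use Dense(2, activation='softmax') with categorical_crossentropy. The plain-Python one_hot helper below is a stand-in for to_categorical, used only to show the shapes. Note also that the validation loop appends train_array rather than valid_array.

```python
def one_hot(labels, n_classes):
    """Stand-in for to_categorical: one row of length n_classes per label."""
    return [[1 if k == label else 0 for k in range(n_classes)] for label in labels]

labels = [0, 0, 1, 1]

# Option 1: Dense(1, activation='sigmoid') expects one value per sample.
sigmoid_targets = [[label] for label in labels]
print(len(sigmoid_targets[0]))  # 1

# Option 2: Dense(2, activation='softmax') expects one-hot rows.
softmax_targets = one_hot(labels, 2)
print(len(softmax_targets[0]))  # 2
```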

@skhadem

skhadem commented Mar 21, 2019

I had the same issue: training accuracy was growing each epoch while validation accuracy stayed at the same value (0.41). But I saved the weights after an epoch, and when I loaded the weights and continued training, everything worked.

First time: create the model, compile, call fit_generator: bad validation results every epoch.
Then: create the model, compile, load weights, call fit_generator: everything works beautifully.

To me it seems like I missed a step, but calling load_weights on the model corrected it.

@abhineet99

Had the same issue. Reducing Initial Learning Rate helps.

@prabaHridayami

Hey, I'm new to deep learning, especially CNNs. I've been trying to train on 100 classes with 10 images for each class.
I've tried many kinds of architectures, but the val_loss is really high and the val_acc is really low.
Do you guys have any suggestions?

@skhadem

skhadem commented Apr 10, 2019

@prabaHridayami That is a very low amount of data, so it can be hard to obtain good results. Are you doing any type of data augmentation? That would be my suggestion to increase the variety of data your model sees.

@prabaHridayami

@skhadem Yeah, I'm doing several augmentations, so each image yields 88 augmented images. I'm currently trying to train on 10 classes, with a val_acc of 0.6870 and a val_loss of 1.4573. What do you think?

@skhadem

skhadem commented Apr 11, 2019

@prabaHridayami what architecture are you using?

@prabaHridayami


model = Sequential()

model.add(Conv2D(32, (3, 3), input_shape=(100, 400, 3), activation='relu', padding='same',name='block1_conv1'))
model.add(Conv2D(32, (3, 3), activation='relu',padding='same',name='block1_conv2'))
model.add(Conv2D(32, (3, 3), activation='relu',padding='same',name='block1_conv3'))
model.add(Conv2D(32, (3, 3), activation='relu',padding='same',name='block1_conv4'))
model.add(MaxPooling2D(pool_size=(2, 2),name='block1_pool'))

model.add(Conv2D(64, (3, 3), activation='relu',padding='same',name='block2_conv1'))
model.add(Conv2D(64, (3, 3), activation='relu',padding='same',name='block2_conv2'))
model.add(Conv2D(64, (3, 3), activation='relu',padding='same',name='block2_conv3'))
model.add(MaxPooling2D(pool_size=(2, 2),name='block2_pool'))

model.add(Conv2D(128, (3, 3), activation='relu',padding='same',name='block3_conv1'))
model.add(Conv2D(128, (3, 3), activation='relu',padding='same',name='block3_conv2'))
model.add(Conv2D(128, (3, 3), activation='relu',padding='same',name='block3_conv3'))
model.add(MaxPooling2D(pool_size=(2, 2),name='block3_pool'))

model.add(Conv2D(256, (3, 3), activation='relu',padding='same',name='block4_conv1'))
model.add(Conv2D(256, (3, 3), activation='relu',padding='same',name='block4_conv2'))
model.add(MaxPooling2D(pool_size=(2, 2),strides =(2,2),name='block4pool'))

model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.4))

model.add(Dense(256, activation='relu'))
model.add(Dropout(0.4))

model.add(Dense(20, activation='softmax'))

This is my model architecture, built with Sequential.

@skhadem

skhadem commented Apr 11, 2019

@prabaHridayami I would recommend using a pre trained and well studied architecture for feature extraction and then fine tuning the layers on the top. My personal go-to is VGG19. In keras you can do something like this:

import keras
from keras.layers import Dense, Dropout

# input_shape is the only size argument VGG19 accepts
base = keras.applications.VGG19(input_shape=(100,400,3),
                                include_top=False,
                                weights='imagenet',
                                pooling='max')
# freeze base layers so only the new top layers are trained
for layer in base.layers:
    layer.trainable=False

model = keras.Sequential()
model.add(base)
# you should experiment with different top level designs
model.add(Dense(1024, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(20, activation='softmax'))

check out https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html

@prabaHridayami


Thank you very much, I'll check that out...
I don't really understand Dense and Dropout.
Do you know what these two do?

@skhadem

skhadem commented Apr 11, 2019

So Dense is just a fully connected layer; it does a lot of the "decision making" based on the resulting feature vector. It's a way to take large feature vectors and map them to a class. The more units you have, the more "flexible" it can be, i.e. learn better, but that means more parameters. Dropout randomly picks a fraction of the layer's inputs during training and drops them by setting them to 0. The way I think about it is that if there are certain sections that contribute a lot to a correct result, the optimizer could ignore everything else. With Dropout the optimizer is forced to focus on many different places. It helps to avoid overfitting and is almost standard at this point.
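
To make the Dropout part concrete, here is a minimal pure-Python sketch of what a dropout layer does at training time, assuming the "inverted dropout" scheme that Keras uses: each input value is zeroed with probability `rate` and the survivors are scaled by `1 / (1 - rate)`. The helper name `dropout` and the sample values are made up for illustration; this is not the Keras implementation.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)      # fixed seed so the run is repeatable
features = [1.0] * 10       # stand-in for the output of the previous layer
dropped = dropout(features, rate=0.4, rng=rng)
print(dropped)              # each entry is either 0.0 or 1.0 / 0.6
```

At prediction time Keras skips this step entirely, so the layer only affects training.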

@prabaHridayami

prabaHridayami commented Apr 12, 2019

thank you very much... now i understand

@MukundGK1986

@sayedathar11
When I use model.add(Dense(1,activation='sigmoid')), am getting the following error.
ValueError: Error when checking target: expected dense_4 to have shape (1,) but got array with shape (2,)

Here is my code:

batch_size = 32
nb_classes = 2
data_augmentation = True

img_rows, img_cols = 224,224
img_channels = 3

#Creating array of training samples
train_path = "D:/data/train*.*"
training_data=[]
for file in glob.glob(train_path):
    print(file)
    train_array= cv2.imread(file)
    train_array=cv2.resize(train_array,(img_rows,img_cols),3)
    training_data.append(train_array)

x_train=np.array(training_data)

#Creating array of validation samples
valid_path = "D:/data/valid*.*"
valid_data=[]
for file in glob.glob(valid_path):
    print(file)
    valid_array= cv2.imread(file)
    valid_array=cv2.resize(valid_array,(img_rows,img_cols),3)
    valid_data.append(valid_array)

x_valid=np.array(valid_data)

x_train = np.array(x_train, dtype="float")/255.0
x_valid = np.array(x_valid, dtype="float")/255.0

#Creating array for Labels
y_train=np.ones((num_trainsamples,),dtype = int)
y_train[0:224]=0 #Class1=0
y_train[225:363]=1 #Class2=1
print(y_train)

y_valid=np.ones((num_validsamples,),dtype = int)
y_valid[0:101]=0
y_valid[102:155]=1
print(y_valid)

y_train = np_utils.to_categorical(y_train,nb_classes,dtype='int32')
y_valid = np_utils.to_categorical(y_valid,nb_classes,dtype='int32')

base_model=ResNet50(weights='imagenet',include_top=False)

x = base_model.output
x = GlobalMaxPooling2D()(x)
x=Dense(1024,activation='relu')(x)
x=Dense(1024,activation='relu')(x)
x=Dense(512,activation='relu')(x)
x=Dense(1, activation= 'sigmoid')(x)
model = Model(inputs = base_model.input, outputs = x)

for i,layer in enumerate(model.layers):
    print(i,layer.name)

for layer in model.layers[:75]:
    layer.trainable=False
for layer in model.layers[75:]:
    layer.trainable=True

adam = Adam(lr=0.0001)
model.compile(optimizer= adam, loss='binary_crossentropy', metrics=['accuracy'])

train_datagen = ImageDataGenerator(
    brightness_range=(0.2,2.5),
    rotation_range=180,
    zoom_range=0.5,
    width_shift_range=0.2,
    height_shift_range=0.2,
    horizontal_flip=True,
    vertical_flip=True)

train_datagen.fit(x_train)

history= model.fit_generator(train_datagen.flow(x_train, y_train, batch_size = 10,shuffle=True),steps_per_epoch=len(x_train),epochs = 500,shuffle=True,
validation_data=(x_valid,y_valid),validation_steps=num_validsamples // batch_size,callbacks=[tensorboard])

eval = model.evaluate(x_valid, y_valid)
print ("Loss = " + str(eval[0]))
print ("Test Accuracy = " + str(eval[1]))

predictions= model.predict(x_valid)
print(predictions)

I am also facing the exact same issue. If I keep the number of neurons in the output layer as it is and use sigmoid, the accuracy does not change from one epoch to the next. But if I make the change mentioned above, I get the same error as you. Were you able to resolve it? If so, please let us know the solution. Thank you in advance.
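
The `ValueError` quoted above is consistent with a label-shape mismatch: `to_categorical(..., nb_classes)` with two classes produces a row of two values per sample, while `Dense(1, activation='sigmoid')` expects one value per sample. A minimal pure-Python sketch of the two label layouts, using made-up labels; `to_one_hot` and `to_class_index` are hypothetical helpers, not Keras functions:

```python
def to_one_hot(labels, num_classes):
    """Integer class ids -> one-hot rows (the layout to_categorical produces)."""
    return [[1 if k == label else 0 for k in range(num_classes)] for label in labels]


def to_class_index(one_hot_rows):
    """One-hot rows -> integer class ids, one value per sample."""
    return [row.index(max(row)) for row in one_hot_rows]


labels = [0, 0, 1, 0, 1]                      # made-up binary labels
one_hot = to_one_hot(labels, num_classes=2)   # each row has 2 values
print(one_hot)
print(to_class_index(one_hot))                # back to one value per sample
```

Under that reading there are two consistent setups: keep plain 0/1 labels (skip `to_categorical`) with `Dense(1, activation='sigmoid')` and `binary_crossentropy`, or keep the one-hot labels and end the model with `Dense(2, activation='softmax')` and `categorical_crossentropy`.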

@DevMetwaly

I had a similar issue when I tried to build an autoencoder using LSTM for sequences or CNN for images: the model reaches around 50% accuracy and a loss of 2.5, then gets stuck, with nothing improving at all.
I tried increasing the number of nodes and the number of layers, but with no progress.

After 3 days I tuned the optimizer, changing the learning rate and the learning rate decay, and finally everything improved and made sense. I increased the learning rate decay slightly until the model started to improve without getting stuck at 50%.

I used the Adam optimizer with the following parameters:
Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.01, amsgrad=False)
The right values will of course differ from one problem to another.
Thanks!
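
For context on what `decay` does in the comment above: in the Keras 2 optimizers it applies a time-based schedule on every update step, `lr / (1 + decay * iterations)`. A small sketch of that formula with the values quoted above; the function name `decayed_lr` is made up for illustration.

```python
def decayed_lr(initial_lr, decay, iteration):
    """Time-based decay: the step size shrinks as 1 / (1 + decay * iteration)."""
    return initial_lr / (1.0 + decay * iteration)


for step in (0, 100, 1000, 10000):
    print(step, decayed_lr(initial_lr=0.001, decay=0.01, iteration=step))
```

With `decay=0.01` the step size is halved after 100 updates, so the schedule acts per batch rather than per epoch and can shrink the learning rate quickly.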

@Ahmed-Araby

this happened when I used
winit = RandomNormal(mean=0.0 , stddev=0.1)
for weight initialization in the Dense layers
and it just worked when I removed it and used the default settings !!!!

@Pratikdomadiya

I had the same problem while training a convolutional auto encoder. I made learning rate ("lr" parameter in optimizer) smaller and it solved the problem.

Can you send me the code you used to optimize your autoencoder? I want to optimize my autoencoder network but I have no idea how to do that. Can you please help me?

@AdislanSaidov

Had the same problem, solved by changing the optimizer from adam to sgd.

@amapic

amapic commented May 8, 2020

I think that the learning rate is the problem. Actually mine was equal to 7 ahah. I wrote 10-3 instead of 1e-3.
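
The slip is easy to reproduce: in Python `10-3` is a subtraction, while `1e-3` is a float literal in scientific notation. The variable names below are made up for illustration.

```python
wrong_lr = 10-3   # subtraction: evaluates to 7
right_lr = 1e-3   # scientific notation: evaluates to 0.001

print(wrong_lr, right_lr)
```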

@Terkea

Terkea commented Jul 4, 2020

model = keras.Sequential([
    keras.layers.Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"),
    keras.layers.Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
    keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
    keras.layers.Flatten(),
    keras.layers.Dense(units=4096,activation="relu"),
#     keras.layers.Dropout(.5),
    keras.layers.Dense(units=4096,activation="relu"),
    keras.layers.Dropout(.5),
    keras.layers.Dense(units=2, activation="sigmoid"),
])

model.compile(optimizer="adam",
            loss="categorical_crossentropy",
            metrics=['accuracy'])

with this architecture, I get 0.73 constantly. couldn't find a fix yet
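
When accuracy sits on one value like this, one thing worth ruling out is that the model predicts the same class for every sample, in which case accuracy equals the share of the most common class. A minimal sketch of that check; the labels below are made up (73 of one class, 27 of the other) and `majority_baseline` is a hypothetical helper.

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


labels = [0] * 73 + [1] * 27      # made-up class distribution
print(majority_baseline(labels))  # compare this with the stuck accuracy
```

If the two numbers match, the network is probably emitting a constant prediction, which points at the output layer, the loss, or the learning rate rather than at the convolutional stack.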

@kodon0

kodon0 commented Dec 1, 2020

reducing batch size solved it for me :)

I guess my test set was too small to feed large batches into the CNN.

i hope this may be of use!

@burhr2

burhr2 commented Dec 17, 2020

Hi, I recently had the same experience of training a CNN while my validation accuracy didn't change. I tried different setups: LR, optimizer, number of filters, and even the model size. But later I discovered it was an issue with my data preprocessing. Basically, I was doing some preprocessing on my data before training which ended up squeezing the pixel intensities to near zero (in short, all images were just black images). I discovered it while debugging my preprocessing step, when I tried writing some of the images to disk.

To be honest, I was suspecting it was a bug from Keras but boom! it was not. I tried to share my experience in case anyone else is facing the same issue.
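
A quick sanity check along these lines is to look at the value range of a few preprocessed images before training. The sketch below uses a made-up list of pixel values and a hypothetical `pixel_summary` helper; scaling by 255 twice is shown only as one example of how intensities can end up near zero, not as what happened in the case above.

```python
def pixel_summary(pixels):
    """Return (min, max, mean) of a flat sequence of pixel values."""
    return min(pixels), max(pixels), sum(pixels) / len(pixels)


raw = [0, 64, 128, 255]                           # made-up 8-bit pixel values
scaled_once = [p / 255.0 for p in raw]            # range 0.0 to 1.0
scaled_twice = [p / 255.0 for p in scaled_once]   # accidentally scaled again

print(pixel_summary(scaled_once))
print(pixel_summary(scaled_twice))                # max is about 0.0039
```

A maximum far below 1.0 after scaling suggests the images reaching the network are close to black.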

@yousofaly

yousofaly commented Dec 27, 2020

Were you dividing your images by 255? I am facing the same issue and am starting to suspect this is the problem. I divide my pixels by 255 (as is customary) but can still see what the image looks like when plotting it.

@shreyapamecha

I faced the same issue. It got resolved by changing the optimizer from 'rmsprop' to 'adam'.

@yousofaly

I tried changing optimizers, learning rates, momentum, network depth, and all other parameters. Turns out, I just needed to let it train for a long time before it started to find where the loss was decreasing. The AUC was stagnant for 35 epochs then it started increasing. Can't think of why, but it eventually started to learn.

@MuizU

MuizU commented Mar 10, 2021

What is the variable num_trainsamples?

@BhujayKumarBhatta

Go with the suggestion given by @kodon0. It works!

@npucino

npucino commented Aug 25, 2021

For those who still have this problem and are wondering why it occurs: the reason is pretty straightforward. In your final Dense layer, where you specify the output (basically the softmax layer), the number of cells should be equal to the number of classes.
If you are solving binary classification, all you need to do is use 1 cell with sigmoid activation.

for Binary

model.add(Dense(1,activation='sigmoid'))

for n_class

model.add(Dense(n_class,activation='softmax')) #where n_class is number of classes
Thanks to: https://stackoverflow.com/questions/51581521/accuracy-stuck-at-50-keras

This worked for me!
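
One way to see why the output layer matters here: softmax normalises across the units of a layer, so with a single unit it returns 1.0 whatever the input, whereas sigmoid on a single unit still varies with the input. A small pure-Python sketch with made-up logits; `softmax` and `sigmoid` below are plain re-implementations, not the Keras functions.

```python
import math


def softmax(logits):
    """Normalised exponentials; the outputs always sum to 1."""
    top = max(logits)                             # shift for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic function mapping a single logit to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


for logit in (-3.0, 0.0, 3.0):
    print(softmax([logit]), sigmoid(logit))   # softmax is [1.0] every time
```

So a one-unit softmax output cannot change its prediction during training, which shows up as accuracy that never moves.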

@MortezaKarimian

This solution works like a charm! Thanks.
