acc and val_acc don't change? #1597
Do you mean that training accuracy and validation accuracy don't change during the training procedure?
@ymcui yes, it is. Epoch 1/15 ... how does this come about?
@talentlei Have you solved the problem? I'm stuck in the same situation when I use an RNN, but I don't know how to solve it.
I have a similar problem. In my case, when I attempt LSTM time series classification, val_acc often starts at a high value and stays the same, even though loss, val_loss and acc change. I've narrowed the issue down to not having enough training sequences (around 300). When I increased the number to 500+ it started to converge better, but there are still periods when loss, acc and val_loss change while val_acc sticks to the same value. How could that be? Is there a bug where it's not updating (even though loss, acc and val_loss update during the same epoch)?
A good method for debugging this issue is to use an ipython/jupyter notebook, compile the model, and then have it predict for one of your batches. Then go through the accuracy code with the ability to manually inspect the values of the matrices. I've found stepping through code like this in mysterious situations to be enlightening.
@DSA101 Have you solved the problem? I am doing a sentence classification task with variable sentence lengths using LSTMs. My problem is that training loss and training accuracy decrease over epochs, but validation accuracy fluctuates in a small interval. Maybe your solution could be helpful for me too.
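For example, the manual check can be done in plain Python once you have a batch of predictions in hand (the preds and labels arrays below are made-up stand-ins for model.predict output and the true one-hot labels):

```python
# Stand-in for one batch of softmax outputs from model.predict(...).
preds = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.4, 0.3],
         [0.9, 0.05, 0.05]]
# Stand-in for the matching one-hot labels.
labels = [[1, 0, 0],
          [0, 1, 0],
          [1, 0, 0],
          [1, 0, 0]]

def argmax(row):
    return row.index(max(row))

# This mirrors what categorical accuracy computes internally: take the
# argmax over the class axis, then average the matches over the batch.
hits = [argmax(p) == argmax(t) for p, t in zip(preds, labels)]
accuracy = sum(hits) / len(hits)
print(accuracy)  # 0.75
```

If this hand-computed accuracy moves while the reported metric does not, the bug is in the metric wiring; if both are flat, the model really is predicting a constant.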
My solution was to increase the size of the training set, reduce the number of features, and start with just one layer and not too many units (say 128). Once I ensured that training progressed in a reasonable way in that configuration, I slowly added more features, more units, etc., and in the end got a satisfactory result. Still, if I make the model overly complex (e.g. increase to 3 layers with say 512 units without providing more training data), it behaves the same as before: flat or irregular training accuracy. In the end I don't know whether there is still a bug in the framework or it all results from an overly complicated model and an insufficient training set, but all things considered I am satisfied with the performance and the results I have achieved, and I believe that Keras LSTM is usable for time series classification. So if your training acc improves but validation accuracy stays in a small interval, can that be indicative of overfitting?
I'm having the same issue. Loss and accuracy on the training set change from epoch to epoch, but the validation accuracy / loss doesn't, which is a bit odd.
The model I'm using is a convnet:
Similar problem here. It really feels like a bug to me. My validation set has 2500+ observations; for a dataset of this size, as long as there's change in the weights (and there is, since the training error is decreasing), there should be some change in val_loss, either positive or negative. It's also unlikely to be overfitting, as I'm using heavy dropout (between 0.5 and 0.7 for each layer). My solution is changing the learning rate of the optimizer... sometimes it helps, haha. I've never experienced the same phenomenon using raw TensorFlow, so I think it's a Keras thing.
I'm gonna throw my voice in here, too. I'm currently doing the Udacity Self-Driving Car Engineer Nanodegree; my cohort is doing the behavioral cloning lab. We were given a dataset of approximately 20k+ samples with labels; I augment it with flipping, so I have about 40k samples. My convnet is the same one from the NVIDIA end-to-end paper (relu on all layers), and I am using adam and mse for optimizer/loss. I've tried heavy dropout on the fully connected layers, on all layers, and on random layers. Ultimately, my validation accuracy stays stuck at a single value. I'd think that if I were overfitting, the accuracy would peg close to or at 100%? Rather, it seems like it is getting stuck in a local minimum. I think I'm going to need to do some visualization of the data to verify that it is balanced, and I have some other ideas to try, but so far it is very frustrating. I don't know if it is a bug in the framework; my best guess is that it is not, because other students are finding success.
@andrew-ayers Did you manage to solve this issue? I have a similar problem with the NVIDIA model (adam, mse, 120k samples including flipped data) for the Self-Driving Car Engineer course: validation loss changes, but validation accuracy stays the same.
I had the same problem while training a convolutional autoencoder. I made the learning rate (the "lr" parameter of the optimizer) smaller and it solved the problem.
Have you solved the problem? I met a similar problem with my Keras CNN model: I had 4000 training samples and 1000 validation samples, and during training loss and val_loss were decreasing, but acc and val_acc never changed. This is my code (abridged, as posted):

inputs_x = Input(shape=(1, 65, 21))
x = Conv2D(32, (5, 5), padding='same', data_format='channels_first', activation='relu', use_bias=True)(x)
x = Dropout(0.25)(x)
inputs_y = Input(shape=(1, 32, 21))
y = Conv2D(32, (4, 4), padding='same', data_format='channels_first', activation='relu', use_bias=True)(y)
y = Dropout(0.30)(y)
merged_input = keras.layers.concatenate([x, y], axis=-1)
z = Dense(16, activation='softmax')(merged_input)
outp = Dense(1, activation='softmax')(z)
model = Model(inputs=[inputs_x, inputs_y], outputs=outp)
history = model.fit(x=[train_inputs_x, train_inputs_y], y=train_label, batch_size=32,

Any ideas for this?
Does anyone know how to solve this issue?
@hujiao1314 I do not know if I really understand what you are trying to do, so forgive me if it does not make sense. My observations:
@hadisaadat Reduce your learning rate and try a few smaller learning rates. Should solve your problem.
@AkhilAshref Even I had a similar issue to @hadisaadat; mine worked after reducing the lr. But could you give a more detailed explanation of why the gradient becomes zero?
@vishnu-zsf I'm having the same problem, it seems. What optimizer / learning rate did you use?
@amcneil1998 I used the Adam optimizer and settled on a learning rate of 0.0008. This was when I used 100,000 data samples and ran 10 epochs. Later, when I tried to run with 30 epochs, I shifted to a decaying learning rate, which after tuning for a while gave me satisfactory results. I'm pretty sure that the learning rate and all the optimizer parameters vary with the kind of data we have and the sheer magnitude of the features. Initial code when I ran with 10 epochs:

keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
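For reference, the time-based decay that Keras optimizers apply through the decay argument can be sketched like this (the numbers below are illustrative, not the tuned values above):

```python
def decayed_lr(initial_lr, decay, step):
    """Time-based decay, as Keras optimizers apply it via the `decay`
    argument: lr_t = initial_lr / (1 + decay * t)."""
    return initial_lr / (1.0 + decay * step)

# With decay=0.0 the rate never changes; with a small positive decay it
# shrinks gradually, which is what helps on longer runs (e.g. 30 epochs).
print(decayed_lr(0.001, 0.0, 100))   # 0.001
print(decayed_lr(0.001, 0.01, 100))  # 0.0005
```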
@vishnu-zsf Still having the issue. I have tried reducing the learning rate, increasing the learning rate, and both SGD and Adam optimizers. I have even tried to overfit by using just a small part of my data. I currently have 900 data points, of which I am using 100 each for test and validation and 700 for training. I have also tried increasing my amount of data to 2800, using 400 each for test and validation and 2000 for training. I'm currently using a batch size of 50, and even running past 50 epochs showed no increase in accuracy or loss. I noticed later, while trying to predict results, that my predictions were heading towards 0, coming closer the longer I trained. This seems to be the case no matter what I do.
@vishnu-zsf @amcneil1998 In my case the lr actually had no impact; the solution for me was shuffling the data for each epoch.
@hadisaadat Setting shuffle=True did not improve my results. Accuracy still stayed around 0.5, but loss started pretty low (0.01). So I increased the learning rate, and loss started around 5.1 and then dropped to 0.02 after the 6th epoch. Accuracy started at 0.5 and averaged around that on both training and validation data for the 120 epochs that I trained. However, when predicting I am only able to get 2 values from the output.
@amcneil1998 You may have to regularize, and can even use EarlyStopping in callbacks, but before that could you share your code and your data (5 sample points would do)? Like I said, the methods we use pretty much depend on the type of data. Mine is all resolved now, btw.
@vishnu-zsf All of my input/output data is normalized to the range -1 to 1 with a mean of 0. The input data is a 3D array of shape (Nsamples, Entries/Sample, EntryDim); in this case (900, 225, 6). The output data is a 2D array of shape (Nsamples, 2), so in this case (900, 2). Some of the samples did not have enough entries, so they are zero-padded to the correct size. Here is the code for the model after the test data has been split off:
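As a side note, the zero-padding step described above can be sketched in plain Python (the helper name and defaults are made up; only the (225, 6) per-sample shape comes from the post):

```python
def pad_sample(sample, target_len=225, entry_dim=6):
    """Zero-pad a sample (a list of entries, each a list of entry_dim
    floats) to target_len entries, as the post describes."""
    padding = [[0.0] * entry_dim for _ in range(target_len - len(sample))]
    return sample + padding

short = [[1.0] * 6 for _ in range(100)]  # a sample with too few entries
full = pad_sample(short)
print(len(full))  # 225
print(full[-1])   # [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

One caveat with heavy zero-padding: if many samples are mostly padding, the network sees a lot of constant input, which can itself make metrics look stuck.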
I have faced the same issue multiple times while using Keras. I have tried data normalization, shuffling, different learning rates and different optimizers. Nothing seems to help, except increasing the data size. That is a problem for me, as I am trying to compare the effect of the data sample size on my network.
I used to face the same result. I found that using a smaller neural network architecture helped. The reason is likely vanishing gradients: in some situations your input might not carry as much information as the network expects, and the weights are going to vanish to zeros after several layers. This problem is more serious in ConvNets, and it's the reason we got residual networks. Hope this helps.
I'm not sure, but I solved this problem. I used Keras for a CNN model on the Kaggle platform with a GPU.
For those who still have this problem and wonder why it occurs: the reason is pretty straightforward. In your final Dense layer, where you specify the output (basically the softmax layer), the number of cells should equal the number of classes:

for binary: model.add(Dense(1, activation='sigmoid'))
for n classes: model.add(Dense(n_class, activation='softmax'))  # where n_class is the number of classes
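The reason the wrong cell count freezes accuracy is easy to verify: a softmax over a single unit always outputs exactly 1.0, whatever the logit, so the predicted value can never change during training. A minimal check in plain Python:

```python
import math

def softmax(zs):
    """Standard softmax over a list of logits."""
    exps = [math.exp(z) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# A 1-unit softmax collapses to a constant: whatever logit comes in,
# the output is 1.0, so accuracy can never move during training.
print(softmax([-5.0]))  # [1.0]
print(softmax([42.0]))  # [1.0]

# With one unit per class, the outputs actually depend on the logits.
print(softmax([1.0, 2.0, 3.0]))  # roughly [0.09, 0.24, 0.67]
```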
@sayedathar11 Here is my code (abridged, as posted):

batch_size = 32
img_rows, img_cols = 224, 224
# Creating array of training samples
x_train = np.array(training_data)
# Creating array of validation samples
x_valid = np.array(valid_data)
x_train = np.array(x_train, dtype="float") / 255.0
# Creating array for labels
y_valid = np.ones((num_validsamples,), dtype=int)
y_train = np_utils.to_categorical(y_train, nb_classes, dtype='int32')
base_model = ResNet50(weights='imagenet', include_top=False)
x = base_model.output
for i, layer in enumerate(model.layers):
for layer in model.layers[:75]:
adam = Adam(lr=0.0001)
train_datagen = ImageDataGenerator(
train_datagen.fit(x_train)
history = model.fit_generator(train_datagen.flow(x_train, y_train, batch_size=10, shuffle=True), steps_per_epoch=len(x_train), epochs=500, shuffle=True,
eval = model.evaluate(x_valid, y_valid)
predictions = model.predict(x_valid)
I had the same issue: epoch accuracy was growing while validation stayed at the same value (0.41). But I saved the weights after an epoch, and when I loaded the weights and continued training, everything worked. First time: create the model, compile, call fit_generator: bad validation results every epoch. It seems like I missed a step, but calling load_weights on the model corrected it.
Had the same issue. Reducing the initial learning rate helps.
Hey, I'm new at deep learning, especially CNNs. I've been trying to train 100 classes with 10 images for each class.
@prabaHridayami That is a very low amount of data; it can be hard to obtain good results. Are you doing any type of data augmentation? That would be my suggestion, to increase the variety of data your model sees.
@skhadem Yeah, I'm doing several augmentations, so 1 image is going to have 88 augmented versions. I'm currently trying to train 10 classes; val_acc is 0.6870 and val_loss is 1.4573. What do you think?
@prabaHridayami What architecture are you using?
model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=(100, 400, 3), activation='relu', padding='same', name='block1_conv1'))
model.add(Conv2D(64, (3, 3), activation='relu', padding='same', name='block2_conv1'))
model.add(Conv2D(128, (3, 3), activation='relu', padding='same', name='block3_conv1'))
model.add(Conv2D(256, (3, 3), activation='relu', padding='same', name='block4_conv1'))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dense(20, activation='softmax'))

This is my architecture, using Sequential.
@prabaHridayami I would recommend using a pre-trained and well-studied architecture for feature extraction and then fine-tuning the layers on top. My personal go-to is VGG19. In Keras you can do something like this:
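The snippet itself is missing above; a rough sketch of that pattern (assuming a 224x224 RGB input and a made-up n_classes; a later fine-tuning stage would unfreeze some of the base layers) might look like:

```python
from tensorflow.keras.applications import VGG19
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

n_classes = 10  # hypothetical; use your own class count

# Pre-trained convolutional base, with the ImageNet classifier removed.
base = VGG19(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the feature extractor for the first stage

# Small trainable classification head on top.
x = GlobalAveragePooling2D()(base.output)
x = Dense(256, activation='relu')(x)
out = Dense(n_classes, activation='softmax')(x)

model = Model(inputs=base.input, outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])
```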
Check out https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
Thank you very much, I'll check that out...
So Dense is just a fully connected layer; it is what does a lot of the "decision making" based on the resulting feature vector. It's a way to take large feature vectors and map them to a class. The more you have, the more "flexible" it can be, i.e. it can learn better, but that means more parameters. Dropout literally takes random weights and drops them by setting them to 0. The way I think about it: if certain sections contribute a lot to a correct result, the optimizer could ignore everything else; with Dropout it is forced to spread its attention across many different places. It helps to avoid overfitting and is almost standard at this point.
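The masking described above can be sketched in plain Python (this is inverted dropout, where survivors are rescaled so the expected activation is unchanged; the helper is purely illustrative):

```python
import random

def dropout(activations, rate=0.5, training=True, seed=None):
    """Inverted dropout: zero each activation with probability `rate`
    and scale survivors by 1/(1-rate) so the expected sum is preserved."""
    if not training or rate == 0.0:
        return list(activations)  # at inference time, pass through unchanged
    rng = random.Random(seed)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

acts = [0.5, 1.0, 1.5, 2.0]
print(dropout(acts, rate=0.5, seed=0))  # some entries zeroed, survivors doubled
print(dropout(acts, training=False))    # inference: unchanged
```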
Thank you very much... now I understand.
I am also facing the exact same issue. If I keep the number of neurons in the output layer and use sigmoid, there is no change in the accuracy for any epoch. But if I change the number of layers as mentioned above, I get the same error as you. Were you able to resolve it? If yes, please let us know the solution. Thanks in advance.
I have a similar issue: when I tried to build an autoencoder using an LSTM for sequences or a CNN for images, the model reached around 50% accuracy and 2.5 loss, then got stuck, with nothing improving at all. After 3 days I tuned the optimizer, changing the learning rate and learning rate decay, and finally everything improved and made sense: increasing the learning rate decay slightly let the model start improving instead of sticking at 50%. I used the Adam optimizer with the following parameters
This happened when I used
Can you send me your code for optimizing the autoencoder? I want to optimize my autoencoder network but have no idea how to do it. Can you please help me?
Had the same problem, solved by changing
I think the learning rate is the problem. Mine was actually equal to 7, haha: I wrote 10-3 instead of 1e-3.
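That typo is easy to miss because Python happily accepts both spellings:

```python
# `10-3` is the subtraction 10 minus 3, not ten to the minus three.
lr_typo = 10-3      # evaluates to the integer 7
lr_intended = 1e-3  # scientific notation: 0.001

print(lr_typo)      # 7
print(lr_intended)  # 0.001
```

A learning rate of 7 is several thousand times too large, which alone is enough to keep accuracy pinned at a constant value.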
model = keras.Sequential([
keras.layers.Conv2D(input_shape=(224,224,3),filters=64,kernel_size=(3,3),padding="same", activation="relu"),
keras.layers.Conv2D(filters=64,kernel_size=(3,3),padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=128, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=256, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.Conv2D(filters=512, kernel_size=(3,3), padding="same", activation="relu"),
keras.layers.MaxPool2D(pool_size=(2,2),strides=(2,2)),
keras.layers.Flatten(),
keras.layers.Dense(units=4096,activation="relu"),
# keras.layers.Dropout(.5),
keras.layers.Dense(units=4096,activation="relu"),
keras.layers.Dropout(.5),
keras.layers.Dense(units=2, activation="sigmoid"),
])
model.compile(optimizer="adam",
loss="categorical_crossentropy",
metrics=['accuracy'])

With this architecture I get 0.73 constantly; couldn't find a fix yet.
Reducing the batch size solved it for me :) I guess my test set was too small to feed large batches into the CNN. I hope this may be of use!
Hi, I recently had the same experience of training a CNN whose validation accuracy didn't change. I tried different setups of LR, optimizer, number of filters, and even played with the model size, but later I discovered it was an issue with my data preprocessing: before training, it ended up squeezing the pixel intensities to near zero (in short, all images were just black). I discovered it after debugging my preprocessing step by writing some of the images to disk. To be honest, I suspected it was a bug in Keras, but boom! It was not. I'm sharing my experience in case anyone else is facing the same issue.
Were you dividing your images by 255? I am facing the same issue and am starting to suspect this is the problem. I divide my pixels by 255 (as is customary), but I can still see what the image looks like when plotting it.
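A cheap guard against the "all black images" failure mode is to assert on simple statistics of the inputs right before training (a plain-Python sketch; the function name and threshold are made up):

```python
import statistics

def check_inputs(pixels):
    """Fail fast if preprocessing has squeezed the inputs to (near)
    nothing. `pixels` is a flat list of already-preprocessed values;
    the 1e-3 threshold is an arbitrary illustrative choice."""
    spread = statistics.pstdev(pixels)
    if spread < 1e-3:
        raise ValueError(
            "inputs look constant (std=%g); check the preprocessing" % spread)
    return pixels

# Dividing by 255 keeps the variation, just on a smaller scale: fine.
check_inputs([v / 255.0 for v in [0.0, 64.0, 128.0, 255.0]])

try:
    check_inputs([0.0] * 100)  # the 'all black images' bug
except ValueError as exc:
    print("caught:", exc)
```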
I faced the same issue. It got resolved by changing the optimizer from 'rmsprop' to 'adam'.
I tried changing optimizers, learning rates, momentum, network depth, and all the other parameters. Turns out I just needed to let it train for a long time before it started to find where the loss was decreasing. The AUC was stagnant for 35 epochs, then it started increasing. Can't think of why, but it eventually started to learn.
What is the variable
Go with the suggestion given by @kodon0. It works!
This worked for me!
This solution works like a charm! Thx
I use an LSTM to do a sequence labeling task, but I get the same acc and val_acc for each epoch.
Here is my code:
def moduleRnn(self):
    model = Sequential()
    model.add(LSTM(output_dim=64, input_length=self.seq_len, batch_input_shape=(16, 1, 200), input_dim=self.embed_length, return_sequences=True, stateful=False))
    # model.add(LSTM(output_dim=16, return_sequences=True, stateful=False))
    model.add(Dropout(0.2))
    model.add(TimeDistributedDense(output_dim=self.labs_len))
    model.add(Activation('softmax'))
    model.compile(loss="categorical_crossentropy", optimizer='rmsprop', class_mode='categorical')
    # model.fit(self.train, self.train_lab, batch_size=16, nb_epoch=3, verbose=1, validation_split=0.1, show_accuracy=True)
    model.fit(self.X_train, self.Y_train, batch_size=16, nb_epoch=15, verbose=1, show_accuracy=True, validation_split=0.2)
    score = model.evaluate(self.X_test, self.Y_test, batch_size=16)
    print score
Has anyone met the same problem? Please help me.