Loss not changing when training #2711
Comments
Try increasing the learning rate to a higher value, possibly to 0.1. That way you can ensure that noticeable changes to weights are made for each successive update.
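A minimal sketch of what that looks like with the SGD optimizer (the 0.1 value follows the suggestion above; the loss and model are placeholders from the thread):

```python
from keras.optimizers import SGD

# Deliberately large learning rate so each update moves the weights noticeably.
sgd = SGD(lr=0.1)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```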
Tried it, it stopped after 2 epochs. Here are the results
The first class is from 0 to 999 and the second class is from 1000 to 1999. I tried to predict right at the border and got all [1,0]. Shuffling the training set should not matter, should it?
After reading some blogs, it looks like the batch size is important, because if our data is not shuffled the network will learn one class for a few batches and then another class for a few batches. Similarly, my loss seems to stay the same; here is an interesting read on the loss function. I really am still unsure as to what I may be doing wrong. Here are a few things I tried:
I am really unsure as to what I can do to get my loss to go down. Any other ideas? Code:
Here is the code for me to read in the images if it helps:
Here is my output:
Hi @joelthchao, I am unsure as to what you mean in 1 about the missing activation. Are you saying that if you remove the activation the loss increases, and when you use the activation it learns? For 2: I actually tried a deeper network, but since it was giving me no improvement, I figured it may be best to simplify the model and troubleshoot with that. I am wondering if this could be an issue with my data. I can try increasing the depth. Anything else I should look at? I will post my results from the cifar10. EDIT: I found this example: https://github.com/fchollet/keras/blob/master/examples/cifar10_cnn.py I will try it and adapt it to my needs. Can anyone explain why we do double convolution like this?
@kevkid Try this, does the loss still not decrease?
model.add(Convolution2D(32, 10, 10, border_mode='same', name='conv1', input_shape=(1, 106, 106)))
model.add(Activation('relu'))
# ...
Just tried it:
The loss goes to NaN. Could the weights be blowing up? I will try the Keras example for CIFAR-10.
Could this be my architecture? Are there any resources for designing the neural network? Here is how the model currently looks:
Based on the cifar10 example. I am currently running the model.
I was able to make a decent model that gave me excellent results. I am unsure why rmsprop seems to make the loss go up, but here is my model:
I have got the same problem as you, but I guess there must be something wrong with my load_data function, and I am not sure if I get the right Xtrain and Ytrain, so could you please share your load_data code? Thanks a lot.
@111hypo I have posted my loadCustomData function a few posts above this one. The portion:
is unnecessary because we do not need to shuffle the input (this was just a test to try and figure out why my network would not converge). I still have problems with RMSprop. It quickly gains loss, and the accuracy goes to 0 (which to me is funky). I tried a few different SGD configurations and the one in my latest post seemed to work the best for me.
@kevkid I also have your problem. I collected 1505 number pictures as my dataset and use a simple model.
@111hypo, how did you solve your problem?
@kevkid Have you tried changing the 'momentum=1.9'? I found that this problem may be connected to the argument named 'momentum' in the SGD optimizer. I didn't find the solution yet, but when I changed the momentum to 0.5, the loss changed. But after several epochs, the loss did not change again... hope this can help you!
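For context, momentum is meant to stay below 1; a value of 1.9 keeps amplifying previous updates and can freeze or blow up the loss. A hedged sketch of a more conventional setting (values are illustrative):

```python
from keras.optimizers import SGD

# Momentum in the 0.5-0.9 range is the usual choice; values >= 1.0 diverge.
sgd = SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
```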
You may want to reduce your dropout rate and shuffle your data.
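If the loader returns samples grouped by class, a single shuffle before fit already helps; a minimal sketch, assuming the Xtrain/Ytrain arrays described in the original post:

```python
import numpy as np

# Shuffle inputs and labels with the same permutation so every batch mixes classes.
perm = np.random.permutation(len(Xtrain))
Xtrain, Ytrain = Xtrain[perm], Ytrain[perm]
```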
Hello, I used the VGG19 architecture to classify my data set into 2 classes, but the problem is that the accuracy doesn't change after 30 iterations and I don't know what the problem is. This is my code:

# number of output classes
nb_classes = 3
# number of epochs to train
nb_epoch = 150  # each epoch contains around 70000/128=468 batches with 128 images
# STEP 1: split X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
# compute the total time of preparing and importing the dataset
t_generateArray = time.time()
print('X_train shape:', X_train.shape)
# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
# Step 1: Network structure
model = Sequential()
model.add(ZeroPadding2D((1, 1)))
model.add(ZeroPadding2D((1, 1)))
model.add(ZeroPadding2D((1, 1)))
model.add(ZeroPadding2D((1, 1)))
model.add(Flatten())
# step 2: Learning target (computing the loss using the entropy function)
adagrad = Adagrad(lr=0.01, epsilon=1e-08)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
# checkpointer = ModelCheckpoint(filepath='exp_161123_best_lr0.0001_weights.h5', monitor='acc', verbose=1, save_best_only=True, mode='max')
# training the model
model.save_weights('final_last5_weights.h5')  # model.save_weights('exp_161123_final_lr0.0001_weights.h5')
# evaluate the model
If you have unbalanced classes, maybe you should consider weighting the classes; check class_weight and sample_weight in the Keras docs. Here's a similar question asked on stackoverflow.
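A minimal sketch of the class_weight route (Keras 1-style fit arguments to match the rest of the thread; the 1:5 ratio is only illustrative, assuming class 1 is the minority):

```python
# Give the under-represented class a larger weight so its errors
# contribute more to the loss.
class_weight = {0: 1.0, 1: 5.0}
model.fit(X_train, Y_train, batch_size=32, nb_epoch=10, class_weight=class_weight)
```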
@redouanelg Do you mean by adding
@alyato Sorry for the late reply. My modest experience tells me that if you have only two classes, use a dict in class_weight. If you have more, you'll get the error that class_weight is not supported for 3+ dimensional targets. A way to overcome this is to pass sample_weight in fit() using a 2D weight array (one weight per timestep per sample), and to add sample_weight_mode="temporal" in compile(). It's not an elegant solution but it works. I'll be glad if someone has another answer.
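A sketch of that workaround, assuming one-hot targets of shape (samples, timesteps, classes) with the rare class in column 1 (all names and ratios are illustrative):

```python
import numpy as np

model.compile(loss='categorical_crossentropy', optimizer='adam',
              sample_weight_mode='temporal')

# One weight per sample per timestep; up-weight timesteps of the rare class.
sample_weight = np.ones(y_train.shape[:2])
sample_weight[y_train[:, :, 1] == 1] = 5.0  # illustrative ratio
model.fit(X_train, y_train, sample_weight=sample_weight)
```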
Hey, I am having a similar problem. I am trying to train a network to learn word embeddings using skip-grams. I have a vocabulary of 256 and a sequence of about 166000 words. But when I train, the accuracy stays the same at around 0.1327 no matter what I do; I tried changing learning rates and batch_size, but no luck. This has happened every time I used Keras, but it usually starts learning after tweaking the batch size a bit. This one just doesn't work. Here is the model:
y will be
This is what happens when I try to train: 9s - loss: 4.5012 - acc: 0.1794 - val_loss: 4.5873 - val_acc: 0.1327
I've waited for about 50 epochs and the accuracy still does not change. Any idea what I am doing wrong? I've faced this problem every time I've used Keras, even when training other models like language modelling using RNNs and text generation using LSTMs.
Hi @adityashinde1506, You may have already found a solution but if not, try to decrease your learning rate. |
Hi guys, I am having a similar problem. I am training an LSTM model for text classification and my loss does not improve on subsequent epochs. I tried many optimizers with different learning rates. But same problem.
The OUTPUT I get is this:
I am unable to figure out what the problem is.
Out of curiosity, why are you passing in a "weights" matrix to the Embedding layer? Thanks.
Hi @td2014, that
Okay. And you set the learning rate to 0.1 for your optimizer(s). Just curious, but was the default not working? Thanks.
Initially, it was the default. Then I read a similar issue page on stackoverflow that suggested altering learning rates, so I was trying to see the change with different values. Using lr=0.1 the loss starts at 0.83 and becomes constant at 0.69. When I was using the default value, the loss was stuck at 0.69 as well.
Okay. I created a simplified version of what you have implemented, and it does seem to work (loss decreases). Here is the code you can cut and paste. Note that the first section is setting up the environment for reproducible results (which I provide at the end in my case). In your case, you may want to check a few things:
I hope this helps. Thanks.

# Start: Set up environment for reproduction of results
import numpy as np
from keras import backend as K
# End: Set up environment for reproduction of results

from keras.layers import LSTM, Dense, Embedding

# Create input sequences
word_index = 21

# Preprocess
max_length = 5

# preparing y_train
y_train = []
y_train = np.array(y_train)

# Create model
EMBEDDING_DIM = 16
model = Sequential()

# Train
print('Training model...')

# output predictions
predictions = model.predict(X_train)

==== OUTPUT BELOW ====

Layer (type)                 Output Shape          Param #
embedding_29 (Embedding)     (None, 5, 16)         352
lstm_29 (LSTM)               (None, 5)              440
dense_29 (Dense)             (None, 2)               12

Total params: 804.0
None
Dude, your architecture just does not work. Try something different.
It works, I had to clean the data. Then the loss started to converge |
Try 'sigmoid' activation for the last layer since it's a binary classification problem.
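A minimal sketch of that change, assuming the labels are a single 0/1 column rather than one-hot pairs:

```python
from keras.layers import Dense

# One sigmoid unit with binary_crossentropy instead of a 2-way softmax.
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```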
Here is a good list of issues to check for that I have found useful: https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
Another reason could be class imbalance. Try upsampling or downsampling using SMOTE/OneSidedSelection from the imblearn package, then reshape your data back to 4 dimensions for your model.
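A hedged sketch of that approach, assuming images of shape (n, 106, 106, 1) and integer labels; imblearn resamplers expect 2-D inputs, hence the flatten/reshape (older imblearn versions name the method fit_sample instead of fit_resample):

```python
from imblearn.over_sampling import SMOTE

n, h, w, c = X_train.shape
# SMOTE works on 2-D feature matrices: flatten, resample, then restore the shape.
X_flat = X_train.reshape(n, -1)
X_res, y_res = SMOTE(random_state=42).fit_resample(X_flat, y_train)
X_res = X_res.reshape(-1, h, w, c)
```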
I encountered the problem while I was trying to finetune a pretrained VGGFace model, using keras_vggface.utils.preprocess_input as my custom preprocessing function.
The problem seems to come from the scaling. I used preprocessing_function=keras_vggface.utils.preprocess_input and ran into that problem. However, when I rescale with 1/255. the problem is fixed. I think the pretrained model may have been trained with an additional scaling factor to normalize inputs to [0,1], while the preprocessing function only gives us the means to subtract to center the data. I'd recommend you check if your scaling makes sense; badly scaled inputs to a neural network can make your updates move very slowly (e.g. the derivative of the sigmoid function beyond -3 and +3 is near 0, so your gradients are almost 0), or, if you're using something like the ReLU function, the updates may be big (the derivative is 1) and a wrong update can make you jump past the local minima very easily. ALSO, if you're rescaling in Python 2, make sure you have that dot in 1/255., or else all your inputs will be multiplied by 0 and you aren't making any updates!
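A minimal sketch of the rescaling point, assuming an ImageDataGenerator pipeline:

```python
from keras.preprocessing.image import ImageDataGenerator

# In Python 2, 1/255 is integer division and evaluates to 0;
# the trailing dot forces float division.
datagen = ImageDataGenerator(rescale=1/255.)
```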
sigmoid_cross_entropy_with_logits may encounter the exploding-gradients problem; try clipping the gradients.
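In Keras, gradient clipping can be set directly on the optimizer; a minimal sketch (the clipnorm value is illustrative):

```python
from keras.optimizers import Adam

# clipnorm rescales each gradient so its L2 norm never exceeds 1.0;
# clipvalue=0.5 would instead clip each component element-wise.
opt = Adam(lr=1e-3, clipnorm=1.0)
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])
```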
In my case, it was a normalization problem: x_train /= 255
I am trying to create a DNN but it is not converging, any ideas?
rmsprop = optimizers.RMSprop(lr=0.01, rho=0.7, epsilon=1e-8, decay=0.0)
I had a similar issue today when training on a Google Cloud GPU. I tried changing the network architecture, weights, etc. The solution was to reset the TF graph: tf.reset_default_graph()
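A minimal sketch of clearing leftover state between runs with the TF1-era API (K.clear_session() is the Keras-side equivalent):

```python
import tensorflow as tf
from keras import backend as K

# Drop ops/variables left over from earlier runs in the same process.
K.clear_session()
tf.reset_default_graph()
```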
Thanks, I will give it a try, hope it works!
For me, these 3 things did the trick:
For me, this works: |
I had the same problem, but for me the learning_rate was just too small. Try a bigger learning_rate; it might help.
Based on my own experience as a beginner, one possible reason for a bug in your model is that you used the wrong activation function at the last output layer. For example, if you are trying to solve a multi-class problem, we usually use softmax rather than sigmoid, while sigmoid is meant to activate the output for a binary task. In this case it's a binary application, so just change your activation function to sigmoid and you should not see such an exception.
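A minimal sketch of the two matching output/loss pairs (class counts are illustrative):

```python
from keras.layers import Dense

# Multi-class: one unit per class, softmax + categorical_crossentropy.
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

# Binary: a single sigmoid unit + binary_crossentropy.
# model.add(Dense(1, activation='sigmoid'))
# model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```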
Try removing the Activation('softmax') layer. |
I had a model that did not train at all. It just got stuck at the random-chance level, with no loss improvement during training: the loss was a constant 4.000 and accuracy 0.142 on a dataset with 7 target values. It turned out that I was doing regression with a ReLU last activation layer, which is obviously wrong. Before I knew this was wrong, I added a Batch Normalisation layer after every learnable layer, and that helped. However, training became somewhat erratic, so accuracy during training could easily drop from 40% down to 9% on the validation set; accuracy on the training dataset was always okay. Then I realized that it is enough to put Batch Normalisation before that last ReLU activation layer only to keep improving loss/accuracy during training. That probably compensated for the wrong activation method. However, when I replaced ReLU with a linear activation (for regression), no Batch Normalisation was needed any more and the model started to train significantly better.
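A minimal sketch of the regression head described above, contrasting the problematic ReLU output with a linear one (layer sizes illustrative):

```python
from keras.layers import Dense

# ReLU on the output clamps every negative prediction to 0, which breaks regression.
# model.add(Dense(1, activation='relu'))    # problematic

# A linear output lets the network predict any real value.
model.add(Dense(1, activation='linear'))
model.compile(loss='mse', optimizer='adam')
```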
I'd highly recommend messing around with learning rates... Testing with extreme variability, like |
I think the problem comes from the learning rate. Mine was actually equal to 7, haha: I wrote 10-3 instead of 1e-3.
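For anyone skimming, the slip looks like this (Python evaluates the subtraction; it is not scientific notation):

```python
lr = 10-3   # evaluates to 7, a huge learning rate
lr = 1e-3   # what was intended: 0.001
```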
My data had 3 classes but the last layer was
Also, if you are training a binary classifier, you can just use
This happened to me many times; my solution is to change or remove the activation function in the last layer.
finally
I have a model that I am trying to train where the loss does not go down. I have a custom image set that I am using. These images are 106 x 106 px (black and white) and I have two (2) classes, Bargraph or Gels. These two classes are very different. I have run the Cifar10 dataset and it did reduce the loss, but I am very confused as to why my model will always predict only one class for everything.
Xtrain is a numpy array of images (which are numpy arrays), Ytrain is a numpy array of arrays ([0,1] or [1,0]); the shapes look like this:
Here is my model:
Right now I am just doing very small training sets (I tried doing 1000 examples as well, with similar results).
I have also tried RMSprop and SGD with large and small learning rates.
What else can I try?
pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps