keras 2 - fit_generator broken? #5818
Comments
I'm also having a problem with fit_generator after upgrading to Keras 2. The model training time has gone up about 1000 times! I haven't figured out why yet. I read that in Keras 2's fit_generator the number of samples has been replaced by the number of batches. I suspect this is the cause of the issue, but I don't know for sure.
Hi @daavoo, can you explain why you closed this issue? Did you manage to solve it? I have seen several people hit this issue today, so if you have found a workaround it would be much appreciated.
Ok so:
Lastly, this is a little embarrassing, but I closed the issue by mistake while closing issues from my repos. xD
Tracked down the problem and I think that the issue is related to this class:
Hi, I'm having this same problem after upgrading to Keras 2 on Ubuntu 16.04. Any progress on a fix?
Hello, I'm not sure if this is the right place to ask, but hopefully someone can help. I'm having some issues with fit_generator() in Keras v1. It seems to work well on the first epoch, but not on the epochs afterwards. I say this because on the first epoch the model takes a significant amount of training time and returns accuracy metrics that seem plausible. However, from epoch 2 onwards the training time decreases significantly and accuracy shoots up to 1 (obviously suspicious). It seems as though the generator doesn't reset appropriately. Does anyone know what could be causing this? My code:
and my network definitions
this, in turn, returns
You should all note that generator methods have switched from being sample-based (one epoch = a defined number of samples) to being step-based (one epoch = a defined number of batches). The conversion is handled automatically when possible, but if you are using custom generators then that may not be possible.
For more info, see the release notes:
https://github.com/fchollet/keras/wiki/Keras-2.0-release-notes
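Concretely, for custom generators the conversion means dividing the old sample counts by the batch size. A small sketch of the arithmetic (the numbers are illustrative, not values from this thread):

```python
# Keras 1 expressed an epoch in samples; Keras 2 expresses it in steps (batches).
# Converting an old call is a matter of dividing the sample counts by the batch size.
batch_size = 64
samples_per_epoch = 6400   # Keras 1 style: one epoch = 6400 samples
nb_val_samples = 1280

steps_per_epoch = samples_per_epoch // batch_size   # Keras 2 style: one epoch = 100 batches
validation_steps = nb_val_samples // batch_size     # 20 validation batches

print(steps_per_epoch, validation_steps)  # → 100 20
```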
…On 22 March 2017 at 12:36, jerpint ***@***.***> wrote:
Hello, Im not sure if this is the right place to ask, but hopefully
someone can help. Im having some issues with fit_generator() in keras V1.
It seems to work well on the first epoch , but not on the epochs
afterwards. I say this because on the first epoch, the model takes a
significant amount of train time, and returns accuracy metrics that seem
plausible. However, epoch 2 onwards, the train time decreases
significantly, and accuracy shoots up to 1 ( obviously suspicious). It
seems as though the generator doesn't reset appropriately. Does anyone know
what could be causing this?
My code :
```python
def batch_generator_train():
    from keras.utils import np_utils
    global f_train
    dset_train = f_train['urbansound']
    global batch_size
    global count_train
    global meta_info_train
    global nb_classes
    idx = range(0, count_train)
    np.random.shuffle(idx)
    count = 0
    while 1:
        idx_tmp = idx[count * batch_size:(count + 1) * batch_size]
        X_train = np.zeros((batch_size, 128, 128, 1))
        y_train = np.zeros(batch_size)
        # y_meta_train_all = []
        for ii, jj in enumerate(idx_tmp):
            X_train[ii, :, :, 0] = dset_train[jj]
            y_train[ii] = meta_info_train[jj][6]
            # y_meta_train_all.append(meta_info_train[jj])
        Y_train = np_utils.to_categorical(y_train, nb_classes)
        yield X_train, Y_train
        count = count + 1


def batch_generator_val():
    from keras.utils import np_utils
    global f_val
    dset_val = f_val['urbansound']
    global batch_size
    global count_val
    global meta_info_valid
    global nb_classes
    idx = range(0, count_val)
    np.random.shuffle(idx)
    count = 0
    while 1:
        idx_tmp = idx[count * batch_size:(count + 1) * batch_size]
        X_val = np.zeros((batch_size, 128, 128, 1))
        y_val = np.zeros(batch_size)
        # y_meta_train_all = []
        for ii, jj in enumerate(idx_tmp):
            X_val[ii, :, :, 0] = dset_val[jj]
            y_val[ii] = meta_info_valid[jj][6]
            # y_meta_train_all.append(meta_info_train[jj])
        Y_val = np_utils.to_categorical(y_val, nb_classes)
        yield X_val, Y_val
        count = count + 1
```
and my network definitions
```python
f_train = h5py.File("/home/jerpint/Desktop/Audiostuff/aug/Xtrain.h5", "r")
f_val = h5py.File("/home/jerpint/Desktop/Audiostuff/aug/Xvalid.h5", "r")

generator_train = batch_generator_train()
generator_val = batch_generator_val()

# callbacks
filepath = 'test2_callback_audio.hdf5'
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=0, save_best_only=True, mode='max')
callbacks_list = [checkpoint]

# count_val = number of validation samples, count_train = number of train samples
history = model.fit_generator(generator=generator_train,
                              samples_per_epoch=int(np.floor(count_train / batch_size) * batch_size),
                              nb_epoch=5, verbose=2,
                              validation_data=generator_val,
                              nb_val_samples=int(np.floor(count_val / batch_size) * batch_size))  # , callbacks=callbacks_list

score = model.evaluate(X_test, Y_test, verbose=0)
print('Test score:', score[0])
print('Test accuracy:', score[1])
f_train.close()
f_val.close()
```
this, in turn, returns
```
75584 test samples
Epoch 1/5
7412s - loss: 0.8559 - acc: 0.7073 - val_loss: 1.3755 - val_acc: 0.6275
Epoch 2/5
435s - loss: 4.5010e-04 - acc: 0.9999 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 3/5
437s - loss: 3.4126e-06 - acc: 1.0000 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 4/5
437s - loss: 1.8840e-06 - acc: 1.0000 - val_loss: 1.1921e-07 - val_acc: 1.0000
Epoch 5/5
437s - loss: 1.6184e-06 - acc: 1.0000 - val_loss: 1.1921e-07 - val_acc: 1.0000
Test score: 7.35714074846
Test accuracy: 0.40946496613
```
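Note that the quoted `batch_generator_train` never wraps `count`, so once `count * batch_size` passes `count_train` the index slice becomes empty and every subsequent "epoch" trains on all-zero arrays, which is consistent with the suspicious accuracy of 1.0 shown above. A minimal corrected sketch of that pattern (standalone, with hypothetical toy lists in place of the HDF5 dataset):

```python
import random

def batch_generator(data, labels, batch_size):
    """Infinite batch generator that reshuffles and wraps at the end of each full pass."""
    idx = list(range(len(data)))
    random.shuffle(idx)
    count = 0
    batches_per_pass = len(data) // batch_size
    while True:
        if count == batches_per_pass:   # end of a pass: reshuffle and restart
            random.shuffle(idx)
            count = 0
        chunk = idx[count * batch_size:(count + 1) * batch_size]
        yield [data[i] for i in chunk], [labels[i] for i in chunk]
        count += 1

# 10 samples, batch size 4 → 2 full batches per pass; drawing 5 batches forces a wrap,
# and every batch stays full instead of going empty.
gen = batch_generator(list(range(10)), list(range(10)), batch_size=4)
for _ in range(5):
    x, y = next(gen)
    assert len(x) == 4
```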
I'm not having this issue anymore.
FYI, I'm getting the exact same weird character printing issue in a Jupyter notebook without data generators. Steps to reproduce: copy-paste the mnist_cnn example below into a notebook and evaluate the cell.
Are you using Windows?
Jupyter doesn't handle Keras verbosity very well for me either (Python 2.7, Ubuntu 16.04); it crashes most of the time for me as well. I recommend setting verbosity to a minimum or to 0 completely, or running Python directly from a terminal; that seems to work fine.
@daavoo Ubuntu 16.04 and Jupyter on latest Chrome
@harpone This is really strange. I was facing this issue last week on Windows, but right now I'm using the same configuration as yours and it works fine for me. Do you have the latest Jupyter and Keras versions? PS: solutions for those interested, apart from setting verbose to 0:
@harpone as I mentioned above, the issue is related to the `Progbar` class in keras/utils/generic_utils.py. I noticed the problem using `fit_generator`, but any method that uses the Progbar will also print those strange symbols.
Yes, latest Jupyter and Keras... I took a look at the Progbar class and the git history briefly, but can't figure out the cause.
…On Sun, Mar 26, 2017 at 9:51 PM David de la Iglesia Castro < ***@***.***> wrote:
@harpone <https://github.com/harpone> as I mentioned above, the issue is
related to the class Progbar(object) in :
https://github.com/fchollet/keras/blob/master/keras/utils/generic_utils.py#L211
I noticed the problem using fit_generator but any method that uses the
Progbar will also print those strange symbols
For those who are looking at this issue now, my solution: I believe this is what @fchollet mentioned in his response. My batch size is 32, the number of training samples was 8000, and the validation set size was 2000. As per the Keras documentation, steps_per_epoch = number of train samples / batch_size and validation_steps = number of validation samples / batch_size.
The following code was tested on the day of this comment, 08 Sep 2017.
Old Code
Changed Code
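Plugging in the numbers from this comment (batch size 32, 8000 training samples, 2000 validation samples), the arithmetic works out to (the exact fit_generator call is elided in the comment above, so this only shows the step counts):

```python
batch_size = 32
train_samples = 8000
val_samples = 2000

steps_per_epoch = train_samples // batch_size   # 250 batches per training epoch
validation_steps = val_samples // batch_size    # 62 full validation batches (2000 / 32 = 62.5)

print(steps_per_epoch, validation_steps)  # → 250 62
```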
In case it helps anyone coming across this issue of fit() calls flooding Jupyter with characters and crashing the tab: you can probably fix it with
if that doesn't work, you can try
The old Jupyter kernel doesn't correctly interpret these characters; the new Jupyter kernel properly interprets them and should not have this issue. There is a similar issue in VS Code's output, where
to:
or just using verbose=2, depending on how badly you want a progress bar. There is also another solution in #4880, which replaces the progbar entirely.
Were these issues (those that weren't user errors) with Keras 2 and fit_generator ever resolved? I arrived at this thread while googling for an explanation of, and solution to, similar problems of my own. I have a very simple (toy) model set up. Training etc. works very well when using fit:
Now I'm trying to get fit_generator to work, but without success. Note that all my data fits in memory at the same time, and the generator is just a dummy one, only taking care of shuffling and batch division, with no real data processing:
First, a comment regarding the verbose=1 outputs above: why does the fit version show "sample/samples" for each epoch, while the fit_generator version shows "epoch/epochs"? Shouldn't this have been a change from v1 to v2 of Keras, rather than a difference between fit and fit_generator? Second: loss and accuracy values seem fine, and converge nicely, in both cases. Running time is also about the same. However, testing on additional test data shows that only the fit version produces a good model (96.57% vs. 40.20% accuracy on test data!). Note that my number of training samples is 1507 and my batch size is 128, so the number of batches should be 12, with the last one not completely filled. Any comments appreciated!
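For the 1507-sample case described above, where the last batch is partial, the step count should round up so no samples are dropped. A quick check of the arithmetic:

```python
import math

samples, batch_size = 1507, 128

# Rounding up gives 12 steps: 11 full batches plus one partial batch.
steps = math.ceil(samples / batch_size)
assert steps == 12

# Size of the final, partial batch: 1507 - 11 * 128 = 99 samples.
assert samples - (steps - 1) * batch_size == 99
```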
A short followup to my own post: even stranger is that, for a given model, evaluating the model on a test data set gives different results depending on whether
In this case it's hard to make any mistakes (maybe I still managed?!), since the generator should be extremely simple; mine is this:
I really need to get this resolved before committing more effort to using this system... |
Ugh, yet another comment: I just noticed that subsequent calls to the test data evaluation with the
Uh... I must hurriedly admit to having made an embarrassingly stupid bug involving a missed indirection when dereferencing the training data, which appears to be the cause of my odd observations. Hope I didn't waste anybody's time too much... "Rubber duck debugging" with a real and observant colleague solved it!
Hi, I have the same issue with my notebook: https://gist.github.com/GuillaumeDesforges/da20d65b825a8e13da9cc1489eeee543/7be860e635b03291e0657f1f4896212d9ccf3f4c I can clearly see that before calling fit_generator, RAM usage is stable at 6 GB. When I start fit_generator, I see RAM usage grow until it reaches my PC's maximum (16 GB). However, getting an item from my Sequence class does not increase RAM. Does fit_generator do anything other than just computing the batches? Thanks,
I guess that's because you are passing a dataframe both as input and target.
Hi,

```python
# Convolutional Neural Network

# Importing the Keras libraries and packages
import keras

# Initializing CNN

# Step 1: Convolution

# Step 2: Max pooling
classifier.add(MaxPooling2D(pool_size=(2, 2)))

# Step 3: Flattening
classifier.add(Flatten())

# Step 4: Full Connection
classifier.add(Dense(units=128, activation='relu'))

# Adding the output layer
classifier.add(Dense(units=1, activation='sigmoid'))

# Compiling the CNN
classifier.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

# Fitting the CNN to the images
from keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1./255,
test_datagen = ImageDataGenerator(rescale=1./255)
training_set = train_datagen.flow_from_directory('dataset/training_set',
test_set = test_datagen.flow_from_directory('dataset/test_set',
```

Error in console:
Hi, I am facing an issue where, while training the dataset, fewer .model files are saved than the number of epochs: the epoch count was 10, but only 6 .model files were saved on my system.
I updated to Keras v2 yesterday. I adapted all my code from version 1 to the new API, following all the warnings I encountered. However, I'm having some very strange problems with the fit_generator method of Model.

Using this toy example, which worked totally fine in version 1:

The output in a Jupyter notebook is quite strange, printing an unknown symbol until the notebook crashes:

Running the code from the terminal doesn't print those strange symbols.

The code works perfectly when manually getting the batches from the generator to use with model.fit:

Is anyone facing similar problems with fit_generator, and/or does anyone know something about it?
and/or know something about it?The text was updated successfully, but these errors were encountered: