
Keras freezing on last batch of first epoch (can't move to second epoch) #8595

Closed
Moondra opened this issue Nov 26, 2017 · 71 comments

Moondra commented Nov 26, 2017

I'm using Keras 2.1.1 and Tensorflow 1.4, Python 3.6, Windows 7.

I'm attempting transfer learning using the Inception model.
The code is straight from the Keras Application API, just a few tweaks (using my data).

Here is the code

from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D
from keras import backend as K
from keras import optimizers


img_width, img_height = 299, 299
train_data_dir = r'C:\Users\Moondra\Desktop\Keras Applications\data\train'
total_samples = 13581
batch_size = 3
epochs = 5


train_datagen = ImageDataGenerator(
    rescale = 1./255,
    horizontal_flip = True,
    zoom_range = 0.1,
    rotation_range=15)


train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size = (img_height, img_width),
    batch_size = batch_size,
    class_mode = 'categorical')  #class_mode = 'categorical'


# create the base pre-trained model
base_model = InceptionV3(weights='imagenet', include_top=False)

# add a global spatial average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)
# let's add a fully-connected layer
x = Dense(1024, activation='relu')(x)
# and a logistic layer -- let's say we have 200 classes
predictions = Dense(12, activation='softmax')(x)

# this is the model we will train
model = Model(input=base_model.input, output=predictions)

# first: train only the top layers (which were randomly initialized)
# i.e. freeze all convolutional InceptionV3 layers
for layer in base_model.layers:
    layer.trainable = False

# compile the model (should be done *after* setting layers to non-trainable)
model.compile(optimizer=optimizers.SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics = ['accuracy'])

# train the model on the new data for a few epochs
model.fit_generator(
    train_generator,
    steps_per_epoch = 20,
    epochs = epochs)


# at this point, the top layers are well trained and we can start fine-tuning
# convolutional layers from inception V3. We will freeze the bottom N layers
# and train the remaining top layers.

# let's visualize layer names and layer indices to see how many layers
# we should freeze:
for i, layer in enumerate(base_model.layers):
   print(i, layer.name)

# we chose to train the top 2 inception blocks, i.e. we will freeze
# the first 249 layers and unfreeze the rest:
for layer in model.layers[:249]:
   layer.trainable = False
for layer in model.layers[249:]:
   layer.trainable = True

# we need to recompile the model for these modifications to take effect
# we use SGD with a low learning rate
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001, momentum=0.9), loss='categorical_crossentropy', metrics = ['accuracy'])

# we train our model again (this time fine-tuning the top 2 inception blocks
# alongside the top Dense layers)
model.fit_generator(
    train_generator,
    steps_per_epoch = 25,
    epochs = epochs)


Output is

Found 13581 images belonging to 12 classes.

Warning (from warnings module):
  File "C:\Users\Moondra\Desktop\Keras Applications\keras_transfer_learning_inception_problem_one_epoch.py", line 44
    model = Model(input=base_model.input, output=predictions)
UserWarning: Update your `Model` call to the Keras 2 API: `Model(inputs=Tensor("in..., outputs=Tensor("de...)`
Epoch 1/5

 1/20 [>.............................] - ETA: 38s - loss: 2.8652 - acc: 0.0000e+00
 3/20 [===>..........................] - ETA: 12s - loss: 2.6107 - acc: 0.1111
 4/20 [=====>........................] - ETA: 8s - loss: 2.6454 - acc: 0.0833
 5/20 [======>.......................] - ETA: 6s - loss: 2.6483 - acc: 0.0667
 6/20 [========>.....................] - ETA: 5s - loss: 2.6863 - acc: 0.0556
 7/20 [=========>....................] - ETA: 4s - loss: 2.6230 - acc: 0.0952
 8/20 [===========>..................] - ETA: 3s - loss: 2.6212 - acc: 0.0833
 9/20 [============>.................] - ETA: 3s - loss: 2.6192 - acc: 0.1111
10/20 [==============>...............] - ETA: 2s - loss: 2.6223 - acc: 0.1000
11/20 [===============>..............] - ETA: 2s - loss: 2.6626 - acc: 0.0909
12/20 [=================>............] - ETA: 2s - loss: 2.6562 - acc: 0.1111
13/20 [==================>...........] - ETA: 1s - loss: 2.6436 - acc: 0.1282
14/20 [====================>.........] - ETA: 1s - loss: 2.6319 - acc: 0.1190
15/20 [=====================>........] - ETA: 1s - loss: 2.6343 - acc: 0.1111

Warning (from warnings module):
  File "C:\Users\Moondra\AppData\Local\Programs\Python\Python36\lib\site-packages\keras\callbacks.py", line 116
    % delta_t_median)
UserWarning: Method on_batch_end() is slow compared to the batch update (0.102000). Check your callbacks.

16/20 [=======================>......] - ETA: 0s - loss: 2.6310 - acc: 0.1042
17/20 [========================>.....] - ETA: 0s - loss: 2.6207 - acc: 0.1176
18/20 [==========================>...] - ETA: 0s - loss: 2.6063 - acc: 0.1296
19/20 [===========================>..] - ETA: 0s - loss: 2.6056 - acc: 0.1228




It just hangs at 19/20.

I already asked on Stack Overflow but got no help.

https://stackoverflow.com/questions/47382952/cant-get-past-first-epoch-just-hangs-keras-transfer-learning-inception


@whatisAI

I have the same issue. I've been trying to change batch sizes, but that doesn't seem to change anything.

@moondra2017

I think there is a bug with ImageDataGenerator. If I load my images from h5py and use
model.train_on_batch, I have no problems.
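For reference, a minimal sketch of that workaround (the HDF5 file name and the 'images'/'labels' dataset names are hypothetical, and model and epochs are assumed to be defined as in the original post):

import h5py
import numpy as np

batch_size = 32
with h5py.File('train_data.h5', 'r') as f:            # hypothetical file name
    images = f['images']                               # hypothetical dataset names
    labels = f['labels']
    n_samples = images.shape[0]
    for epoch in range(epochs):
        for start in range(0, n_samples, batch_size):
            # h5py datasets support slicing, so each batch is read lazily from disk
            x_batch = np.asarray(images[start:start + batch_size], dtype='float32') / 255.0
            y_batch = labels[start:start + batch_size]
            loss, acc = model.train_on_batch(x_batch, y_batch)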

@moustaki
Contributor

moustaki commented Dec 1, 2017

Same issue here. fit_generator works fine in 2.0.9, but hangs indefinitely at the end of the first epoch from 2.1.0 onwards.

@fchollet
Member

fchollet commented Dec 1, 2017

This is likely due to changes in keras/utils/data_utils.py between 2.0.9 and 2.1.0. Specifically this: 612f530#diff-ba9d38600a2df565e5ae8757eb2b1b35

@Dref360 please take a look, this seems like a serious issue.

@Dref360
Contributor

Dref360 commented Dec 2, 2017

@moustaki Are you also using flow_from_directory?

@Dref360
Contributor

Dref360 commented Dec 2, 2017

Could you all update to master / 2.1.2 please?
Pretty sure this has been fixed with: 2f3edf9#diff-299cfd5886683a4b012f286403769fc1

@moustaki
Contributor

moustaki commented Dec 2, 2017

@Dref360 Thanks - just tried both master and 2.1.2 and it indeed fixes the issue. Should have tried that before -- sorry about that! For your earlier question, I am using a custom Sequence sub-class.

@NikeNano

I still have this problem with Keras 2.1.2 using tensorflow-gpu 1.4.1. Any advice on how to solve it?

@oliran

oliran commented Jan 4, 2018

@NikeNano - make sure that your validation_steps is reasonable. I had a similar problem, but it turns out I had forgotten to divide by batch_size.
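In other words (a minimal sketch; the sample count and batch size are made-up placeholders):

num_val_samples = 2000   # hypothetical size of the validation set
batch_size = 32

# each validation step consumes one batch from the generator, so the step count
# should be the number of batches, not the number of samples
validation_steps = num_val_samples // batch_size   # 62, not 2000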

@LivingProgram

Same as @NikeNano, using Keras 2.1.2 and tensorflow-gpu 1.4.1; Keras freezes on epoch 11.

@minaMagedNaeem

I have the same problem; it is stuck on the last batch of the first epoch.
Keras version 2.1.3
Tensorflow version 1.4.0

Epoch 1/30
C:\Users\Minal\AppData\Local\Programs\Python\Python36\lib\site-packages\skimage\transform_warps.py:84: UserWarning: The default mode, 'constant', will be changed to 'reflect' in skimage 0.15.
warn("The default mode, 'constant', will be changed to 'reflect' in "

1/6428 [..............................] - ETA: 9:25:55 - loss: 0.0580
2/6428 [..............................] - ETA: 7:46:11 - loss: 0.0560
3/6428 [..............................] - ETA: 7:14:06 - loss: 0.0569
4/6428 [..............................] - ETA: 6:52:54 - loss: 0.0536
5/6428 [..............................] - ETA: 6:49:36 - loss: 0.0541
6/6428 [..............................] - ETA: 6:51:51 - loss: 0.0556
7/6428 [..............................] - ETA: 6:45:15 - loss: 0.0580
8/6428 [..............................] - ETA: 6:33:50 - loss: 0.0595
9/6428 [..............................] - ETA: 6:20:48 - loss: 0.0594
10/6428 [..............................] - ETA: 6:12:55 - loss: 0.0604
11/6428 [..............................] - ETA: 6:07:12 - loss: 0.0596
12/6428 [..............................] - ETA: 6:00:31 - loss: 0.0588
13/6428 [..............................] - ETA: 6:00:06 - loss: 0.0589
14/6428 [..............................] - ETA: 5:59:53 - loss: 0.0591
15/6428 [..............................] - ETA: 5:57:44 - loss: 0.0590
16/6428 [..............................] - ETA: 5:55:21 - loss: 0.0601
.
.
.
6420/6428 [============================>.] - ETA: 14s - loss: 0.0213
6421/6428 [============================>.] - ETA: 12s - loss: 0.0213
6422/6428 [============================>.] - ETA: 10s - loss: 0.0213
6423/6428 [============================>.] - ETA: 8s - loss: 0.0213
6424/6428 [============================>.] - ETA: 7s - loss: 0.0213
6425/6428 [============================>.] - ETA: 5s - loss: 0.0213
6426/6428 [============================>.] - ETA: 3s - loss: 0.0213
6427/6428 [============================>.] - ETA: 1s - loss: 0.0212

@minaMagedNaeem

It's solved. It just took a long time on the last batch, but then it got to epoch 2.

@KenHollandWHY

I also have the same issue, where the first epoch hangs on the last step. Using the latest Keras, GPU, Python 3.5, Windows 10.

@LivingProgram

If you are still having this problem, try rebooting. I don't know why, but that fixed my issue; I was running Keras in the cloud.

@JackCurrie

JackCurrie commented Apr 13, 2018

Hello! I am running into this issue still on Ubuntu running Python 3.5.2 and Keras 2.1.4. I've been waiting a few hours at the end of the first epoch on a very similar issue (Training a transfer binary classifier on VGG19).

At first I thought that it must have been just running through my validation data which was taking an exorbitant amount of time until I found this thread. Is it still a possibility that it is just a very slow iteration over my validation set (it's about 12,000 images, running on a GTX 950)? Or is my mental model of how fit_generator works mistaken?

Also, thanks to all who are maintaining this project! It's been great to work with as I'm beginning to dive deeper into ML. 😄

Update: I found I was using the Keras 1 API for the fit_generator method; I switched to the Keras 2 API and it's working now.

@kaka7

kaka7 commented Apr 26, 2018

@minaMagedNaeem: same as @oliran, I had the same issue and resolved it after setting validation_steps=validation_size//batch_size

history_ft = model.fit_generator(
    generator_train,  # can be customized
    samples_per_epoch=4170,  # nb_train_samples
    # steps_per_epoch=10,  # nb_train_samples; samples traversed per epoch
    validation_data=generator_test,  # can be customized
    nb_epoch=10,
    # verbose=0,
    validation_steps=530//64,
    # epochs=100
    # nb_val_samples=530
    )

@ptah23

ptah23 commented May 2, 2018

Same here. I have this problem with the code from Deep Learning with Python, Listing 6.37.
I am on Ubuntu 18.04 with keras 2.1.6, tensorflow-gpu 1.8.0.

@Tensorfengsheng1926

I had the same issue when running Inception V3 for transfer learning. Windows 10, Python 3.5, Keras 2.1.6, tensorflow-gpu 1.4.

@hashJoe

hashJoe commented May 10, 2018

Same here with Python 3, Keras v2.1.6, TensorFlow v1.8, Ubuntu 18.04.
After multiple reinstallations and attempts,
the solution was to wait several minutes for it to jump to epoch 2/25 after it was stuck on epoch 1 (7999/8000) xD

@ldelphinpoulat

I had a similar issue with Python 3, Keras v2.1.6, TensorFlow v1.8.0, Ubuntu 16.04. I interrupted the processing and was able to see that it was busy running self.sess.run([self.merged], feed_dict=feed_dict) in keras/callbacks.py.
I guessed that it was related to histogram computations in TensorBoard, so I set histogram_freq=0 on TensorBoard object creation. For me that solved the issue, at the cost of losing the TensorBoard histograms.
I had previous versions of Keras and TensorFlow for which the histogram computation for TensorBoard did not take such a long time (unfortunately I do not recall which versions those were).
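For anyone who wants to try the same workaround, here is a minimal sketch; the generators, step counts, and log directory are placeholders assumed to be defined as in the earlier examples:

from keras.callbacks import TensorBoard

# histogram_freq=0 disables the weight/activation histograms that are otherwise
# computed at the end of every epoch and can stall it for a long time
tensorboard = TensorBoard(log_dir='./logs', histogram_freq=0)

model.fit_generator(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=validation_steps,
    callbacks=[tensorboard])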

@shaktisd

Changing validation_steps=validation_size//batch_size worked for me

@whatdhack

Experiencing the same with Keras 2.2.0 and Tensorflow 1.8 on Ubuntu 16.04.

@bmitrauncc

[screenshot of training progress output]

Getting stuck here

@yangjh39

Experiencing the same with Keras 2.2.0 and Tensorflow 1.10 on Ubuntu 16.04.

@kjaisingh

Experiencing the same - stuck on the final batch for my CNN!

@ejcer

ejcer commented Aug 4, 2018

Same. For what it's worth, I think this is a CPU thing, because when I run my code on a 1080 it works fine.

@dantheman3333

dantheman3333 commented Aug 10, 2018

Have the same issue. Stuck on first epoch at step 1999/2000. Using Windows, tensorflow-gpu 1.10.0, Keras 2.2.2, CUDA V9.0.176. Using ImageDataGenerator flow_from_directory for training and validation.

I have way too much data - I have 50 million images and I split it 70% train and 30% val, so I thought it had way too much validation data to run through every epoch. But if I set validation_steps in fit_generator to 1, shouldn't it only do one step of validation (one batch?) before moving on to the next epoch?

I'm new to this so I'm having a hard time debugging, but this is the profile after a few hours:
[profiler screenshots: call_count and time]

When sorted by time taken, the top two methods are get and wait in pool.py, and the other get is from keras' data_utils.py.

Edit: I downgraded Keras to 2.0.9 and now it works
Edit: I actually still sometimes have this issue on 2.0.9. Can't seem to find out why it's happening occasionally.

@MinnML

MinnML commented Aug 22, 2018

I had this issue with both CPU and GPU, keras 2.2.0. What solved it for me was to set workers=0.

@ashuta03

This worked for me (a short sketch follows the list):

  1. set workers=1, and use_multiprocessing=False in self.keras_model.fit_generator in model.py
  2. Make sure that:
    steps_per_epoch = number of train samples//batch_size
    and
    validation_steps = number of validation samples//batch_size
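A rough sketch of that combination as a plain Keras fit_generator call; the generator names and sample counts are made-up placeholders (in Mask R-CNN the same arguments would be passed to self.keras_model.fit_generator in model.py):

num_train_samples = 8000     # hypothetical sample counts
num_val_samples = 2000
batch_size = 32

model.fit_generator(
    train_generator,
    steps_per_epoch=num_train_samples // batch_size,
    epochs=10,
    validation_data=val_generator,
    validation_steps=num_val_samples // batch_size,
    workers=1,                   # a single worker thread
    use_multiprocessing=False)   # avoid the multiprocessing queue that can hang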

@Quetzalcohuatl

I encountered this problem using the fit function. I believe I fixed it by setting batch_size=2 and using Adam instead of SGD as my optimizer. I think it may be a memory issue, and the machine was coping by using swap memory, which is notoriously slow.

@mnguyenmti

I confirm the valid_generator was the problem. The problem was gone after I turned it off. But if the validation set is big, I still need the method. I would appreciate it if the Keras team could help with this!

@srv902

srv902 commented Aug 30, 2019

Any progress with this issue?

@SWHL

SWHL commented Sep 16, 2019

I am seeing the same issue with Keras 2.2.4, TF 1.8.
I think the reason is IMAGES_PER_GPU = 4. When I change IMAGES_PER_GPU from 4 to 1, the problem goes away.

@BioScince

I am seeing the same issue with Keras.
[screenshot of output]
The code stops at epoch 18 of 60. Can anyone help me?

@Quetzalcohuatl

@BioScince I think that's just a problem with the website. It looks like you can't scroll down within the output. Try committing and see if it still clips the output, or write your standard output to a text file.

@Akiqqqqqqq

I also solved this by removing the validation process entirely. I use Ubuntu 18.04 LTS, CUDA 10.0, cuDNN 7.6, Keras 2.3.1, and TensorFlow 1.14.

@ghost

ghost commented Nov 7, 2019

The same issue happened during fine-tuning of a VGG16 model, with Keras 2.2.0 and Python 2.7.15+.
Removing use_multiprocessing=True solves the freezing problem.

Any update from the development teams on this problem?

@BEEugene

I've got the same issue as @BioScince.
It wasn't an issue when I used a small number of images.
The training is run on a Linux cluster.
Python 3.6.8, Keras 2.3.1.
I waited for a day for it to start a new epoch after validation; nothing happened.
I'm training the network with the following generator.

import os
import cv2
import numpy as np
from random import shuffle  # note: importing shuffle from random (not sklearn.utils) is the bug discussed in the follow-up comment below
from keras.utils import Sequence

class DataGeneratorFilesCrop(Sequence):
    def __init__(self, image_mask_prepoc, image_filenames=None, mask_names=None, image_folder=None, mask_folder=None, root_dir=None,
                 batch_size=1, image_size=256, nb_y_features=1,
                 augmentation=None, mask_transform=lambda x: np.expand_dims((x > 0).astype(np.int8), -1),
                 suffle=True):
        self.image_filenames = image_filenames if image_filenames else self.listdir_fullpath(os.path.join(root_dir, image_folder))
        self.mask_names = mask_names if mask_names else self.listdir_fullpath(os.path.join(root_dir, mask_folder))
        self.batch_size = batch_size
        self.currentIndex = 0
        self.augmentation = augmentation
        self.image_size = image_size
        self.nb_y_features = nb_y_features
        self.indexes = None
        self.mask_transform = mask_transform
        self.suffle = suffle

    def listdir_fullpath(self, d):
        return np.sort([os.path.join(d, f) for f in os.listdir(d)])

    def __len__(self):
        """
        Calculates size of batch
        """
        return int(len(self.image_filenames) / (self.batch_size))

    def on_epoch_end(self):
        """Updates indexes after each epoch"""
        if self.suffle == True:
            self.image_filenames, self.mask_names = shuffle(self.image_filenames, self.mask_names)

    def read_image_mask(self, image_name, mask_name):
        return cv2.resize(cv2.imread(image_name), (self.image_size, self.image_size)) / 255, cv2.resize(cv2.imread(mask_name, 0), (self.image_size,self.image_size))

    def __getitem__(self, index):
        """
        Generate one batch of data

        """
        # Generate indexes of the batch
        data_index_min = int(index * self.batch_size)
        data_index_max = int(min((index + 1) * self.batch_size, len(self.image_filenames)))

        indexes = self.image_filenames[data_index_min:data_index_max]

        this_batch_size = len(indexes)  # The last batch can be smaller than the others

        # Defining dataset
        X = np.empty((this_batch_size, self.image_size, self.image_size, 3), dtype=np.float32)
        y = np.empty((this_batch_size, self.image_size, self.image_size, self.nb_y_features), dtype=np.uint8)

        for i, sample_index in enumerate(indexes):

            X_sample, y_sample = self.read_image_mask(self.image_filenames[index * self.batch_size + i],
                                                      self.mask_names[index * self.batch_size + i])

            if self.mask_transform:
                y_sample = self.mask_transform(y_sample)

            # if augmentation is defined, we assume its a train set

            X[i, ...] = np.clip(X_sample, a_min=0, a_max=1)
            y[i, ...] = y_sample

        return X, y

@BEEugene

In my case, the problem was here:

def on_epoch_end(self):
        """Updates indexes after each epoch"""
        if self.suffle == True:
            self.image_filenames, self.mask_names = shuffle(self.image_filenames, self.mask_names)

At the end of the epoch I shuffled the data with shuffle from the random package instead of sklearn.utils.shuffle; random.shuffle returns None because it shuffles in place.
In my case, it threw an exception which wasn't shown in the terminal, and the process stopped.
That is why it didn't start a new epoch and seemed to be frozen.
This is a clue to much of what was discussed here.
So, a note to the developers - let the exception appear :)
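For completeness, a sketch of the corrected method using sklearn.utils.shuffle, which returns the shuffled lists instead of shuffling in place (assuming the generator class posted above):

from sklearn.utils import shuffle

def on_epoch_end(self):
    """Updates indexes after each epoch."""
    if self.suffle == True:
        # sklearn.utils.shuffle applies the same permutation to both lists and
        # returns them, so image filenames and mask names stay aligned
        self.image_filenames, self.mask_names = shuffle(self.image_filenames, self.mask_names)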

@jalilasadi

I had the same problem and I solved it by downgrading the graphics card driver.

@doantientai

This problem can also occur when the path to the validation data is invalid, which was actually my case. I have two separate directories for training and validation. However, the path to my validation set was incorrect. So at the end of the epoch, Keras could not load the validation data and it froze.

I think it would be better if Keras could raise an error like "file not found" or something similar.

@erasmo-aln

erasmo-aln commented Apr 11, 2020

This worked for me:

  1. set workers=1, and use_multiprocessing=False in self.keras_model.fit_generator in model.py
  2. Make sure that:
    steps_per_epoch = number of train samples//batch_size
    and
    validation_steps = number of validation samples//batch_size

That worked for me, thanks! I only needed the second step; I had forgotten validation_steps.

@rajpaldish

Hi, I am very new to deep learning and I am using a CNN for image classification. I am having the same problem: the epoch is not moving beyond 1/15. I have left it to train overnight but got no response, and the kernel shows as busy. I am using Windows 10, TensorFlow 2.0.0, Keras 2.3.1 and Python 3.6.1.

import numpy as np
from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

def cnn(x_train, y_train, x_test, y_test):
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))

    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))

    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPool2D(pool_size=(2, 2)))

    model.add(Flatten())

    model.add(Dense(1, activation='sigmoid'))

    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(x_train, y_train, epochs=15, batch_size=16, verbose=1)
    loss, accuracy = model.evaluate(x_train, y_train)
    probabilities = model.predict(x_test)
    predictions = [float(np.round(x)) for x in probabilities]
    accuracy = np.mean(predictions == y_test)
    print("Prediction Accuracy: %.2f%%" % (accuracy * 100))
    model.save('result.h5')

The output is stuck at:

hello
i am in second block
i am in third block
(1, 64, 64)
(0, 64, 64)
Epoch 1/15

Please help me out. I tried changing the batch_size from 128 to 16 and setting verbose=1, but still no change.

@BEEugene

BEEugene commented Apr 28, 2020

Hi @rajpaldish!
It seems the shape of your second batch is zero: (0, 64, 64). That means it is an empty dataset, which is why Keras freezes. Try to test the generator separately; you will probably find some mistakes in it. Check this issue.

@dushyant-007

I had a similar issue; I just set initial_epoch=1 and the problem went away. Epochs will start from number 2 though, so add one more epoch to the existing number.

@yyhhlancelot

200k validation samples cost me 20 minutes on my 6-GPU machine at the end of every epoch, and it took me a week to accept the truth! Unbelievable, right? From our point of view it looks STUCK/FROZEN, but the machine is still computing. And there is no log output while it computes the validation loss, which makes us think something is going wrong...

@lalitbhagat7

I faced the same issue.
This is because the model is running on the validation dataset, and this usually takes a lot of time. Try reducing the validation dataset, or wait for a while; that worked for me. It seems like it's stuck, but it is running on the validation dataset.

bogdanlalu added a commit to bogdanlalu/maskrcnn_TF2 that referenced this issue Apr 25, 2021
matterport/Mask_RCNN#2243
SOLUTION: Downgrade to scikit-image==0.16.2

matterport/Mask_RCNN#749
You can safely ignore this warning. It's a preemptive warning from TensorFlow when it cannot be certain of the size of the generated tensor.
SOLUTION:
import warnings
warnings.filterwarnings('ignore')

matterport/Mask_RCNN#127
keras-team/keras#8595 (comment)
SOLUTION: Set workers=1 and ensure use_multiprocessing=False in model.py

matterport/Mask_RCNN#2111
Delete the use_mini_mask from the argument list. It's enabled by default in config.py in this version.
@NeerajanS

I also experienced a similar issue when running a training job with TensorFlow 1.15 using a Keras Sequential model. I also got the warning "Method (on_train_batch_end) is slow compared to the batch update. Check your callbacks" (see this issue).

I was able to overcome this issue by following these steps:

  1. Increase the batch size (earlier I was using 128; I increased it to 512)
  2. Set validation_steps = int(number of validation samples / batch_size) in model.fit

I found that changing the verbose value has no effect on the issue.

@ronithsaju

For those who were not able to solve this issue using the methods above: if you are using from tensorflow.keras.preprocessing.image import ImageDataGenerator, try changing it to from keras.preprocessing.image import ImageDataGenerator, or vice versa. It worked for me. It's said that you should never mix keras and tensorflow.keras imports.
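As a minimal illustration of keeping every import in a single namespace (here tensorflow.keras; the tiny model, paths and sizes are made-up placeholders, not from the comment above):

# everything comes from tensorflow.keras; do not mix with standalone keras.* imports
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPool2D, Flatten, Dense

datagen = ImageDataGenerator(rescale=1. / 255)
train_generator = datagen.flow_from_directory(
    'data/train',                    # hypothetical path
    target_size=(64, 64),
    batch_size=32,
    class_mode='binary')

model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    MaxPool2D(),
    Flatten(),
    Dense(1, activation='sigmoid')])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_generator, epochs=5)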

@supermomo668

supermomo668 commented Jul 9, 2021

I'd been stuck on this issue for about a day, but I found an elegant fix with this:

#train_generator = ...
#val_generator = ...
history = model.fit(
        train_generator,
        epochs=200,
        validation_data=val_generator,
        use_multiprocessing=True,
        workers=16,
        steps_per_epoch= train_generator.samples//train_generator.batch_size,  ######  Here
        validation_steps= val_generator.samples//val_generator.batch_size,   ##### Here
        callbacks=callbacks
        )

The key for me was to define validation_steps and steps_per_epoch from the samples and batch_size attributes of the generators themselves, so there won't be any discrepancies or mistakes.

@leon-kwy

leon-kwy commented Mar 4, 2022

[screenshot of the stack trace]

Same issue. I got stuck right at the beginning when training the model on Colab. It shows that I am stuck in the get function of _get_next_batch, and Colab doesn't show anything to tell me what went wrong. Could anyone tell me what is going on?

svbeuningen added a commit to Living-Technologies/Mask_RCNN that referenced this issue Mar 7, 2022
@emmanuel-nwogu

@leon-kwy Did you ever figure it out? I'm having the exact same problem with Mask RCNN.
