
TensorBoard callback does not create histograms when a generator is used to create validation data #3358

Closed
jdoerrie opened this issue Jul 31, 2016 · 32 comments

@jdoerrie (Contributor) commented Jul 31, 2016

Please make sure that the boxes below are checked before you submit your issue. Thank you!

  • Check that you are up-to-date with the master branch of Keras. You can update with:
    pip install git+git://github.com/fchollet/keras.git --upgrade --no-deps
  • If running on Theano, check that you are up-to-date with the master branch of Theano. You can update with:
    pip install git+git://github.com/Theano/Theano.git --upgrade --no-deps
  • Provide a link to a GitHub Gist of a Python script that can reproduce your issue (or just copy the script here if it is short).

Currently the TensorBoard callback does not create histograms when a generator is used to create the validation data. When a generator is passed as validation_data to model.fit_generator, the self.validation_data attribute is never set. However, in order to generate histograms, self.validation_data currently must evaluate to True.

I would like to see a way to create histograms even when using a val_gen; unfortunately I can't think of a very clean way to do this. My current workaround is to not pass a generator, but to exhaust it until I have the required number of validation samples. I then concatenate the samples and pass them as a plain array. However, this workaround will fail once my whole validation dataset no longer fits into memory, so I created this issue to discuss possible better solutions.
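
For illustration, a minimal sketch of that workaround, exhausting the generator into plain arrays (val_gen, nb_val_steps, the training arrays, and tb_callback are assumed names, not from this issue):

import numpy as np

# Exhaust the validation generator, concatenate the batches, and pass
# plain arrays so the TensorBoard callback can build histograms.
x_batches, y_batches = [], []
for _ in range(nb_val_steps):          # nb_val_steps: assumed step count
    xb, yb = next(val_gen)             # val_gen: assumed validation generator
    x_batches.append(xb)
    y_batches.append(yb)
x_val, y_val = np.concatenate(x_batches), np.concatenate(y_batches)

model.fit(x_train, y_train,
          validation_data=(x_val, y_val),
          callbacks=[tb_callback])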

@kuza55 (Contributor) commented Aug 30, 2016

I just ran into this; putting my validation data in memory works and doesn't crash anything, but it's a rough edge that could be cleaned up.

The way to do this is probably to set up an accumulator variable for the distribution, add to it through each batch of validation data, then divide and evaluate the summary node after we're done going through the batches. This would also resolve the existing comment in the code about GPU memory.

Alternatively, what Google's models seem to do is keep a moving average (https://github.com/tensorflow/models/blob/master/inception/inception/inception_train.py#L288). That might be easier to implement than an accumulator, since I don't really see a way to capture the histogram so as to accumulate it in the first place...
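
As a hedged sketch of the moving-average idea (TF1-style; this is an illustration, not the linked Google code, and act is an assumed activation tensor with a fixed per-batch shape):

import tensorflow as tf

# Keep an exponential moving average of an activation and histogram the
# smoothed values instead of accumulating raw batches.
act = tf.placeholder(tf.float32, shape=[64, 128], name='act')
ema = tf.train.ExponentialMovingAverage(decay=0.9)
maintain_ema_op = ema.apply([act])     # creates and updates a shadow variable
tf.summary.histogram('act_ema', ema.average(act))
merged = tf.summary.merge_all()
# Per epoch: feed each validation batch and run `maintain_ema_op`, then
# evaluate `merged` once and write the result to the event file.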

@mynameisguy (Contributor) commented Dec 12, 2016

Will the pull request be merged?
I have the same use case.

@s1st commented May 2, 2017

It would be really nice to get this fixed. I'm having the same problem in Keras 1.2.2 through 2.0.4.

@n1kt0 commented May 18, 2017

same problem here!

@alorozco53 commented Jun 28, 2017

Any comment, Monsieur @fchollet? :)

@n1kt0 commented Jun 28, 2017

I'm not sure, but I think this works now in 2.0.5.

@bharris47 commented Jun 28, 2017

This is still an issue for me in 2.0.5. It writes out scalars and graphs, but no histograms or distributions.

@alexklibisz commented Jul 3, 2017

A relatively simple fix is to "fill in" the validation_data property before the TensorBoard on_epoch_end hook is called, by inheriting TensorBoard in a wrapper like the one below. Obviously, the way you fill in validation_data is specific to your problem. You can just replace your TensorBoard callback with TensorBoardWrapper, pass the batch_gen and nb_steps arguments, and then all of the same arguments as TensorBoard. Unfortunately, this also means that if you are using a generator for validation, it will get called once in the wrapper and then again for validation. If you can afford to keep your data in memory, the solution below could be moved into the on_train_begin hook.

import numpy as np
from keras.callbacks import TensorBoard

class TensorBoardWrapper(TensorBoard):
    '''Sets the self.validation_data property for use with the TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, **kwargs):
        super().__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                # Preallocate arrays sized from the first batch.
                imgs = np.zeros((self.nb_steps * ib.shape[0], *ib.shape[1:]), dtype=np.float32)
                tags = np.zeros((self.nb_steps * tb.shape[0], *tb.shape[1:]), dtype=np.uint8)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb
        # [inputs, targets, sample_weights, learning_phase], as the callback expects.
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]
        return super().on_epoch_end(epoch, logs)


...

callbacks = [TensorBoardWrapper(gen_val, nb_steps=5, log_dir=self.cfg['cpdir'], histogram_freq=1,
                                batch_size=32, write_graph=False, write_grads=True)]
...
@s1st commented Jul 7, 2017

Thanks a lot! Unfortunately I ran into the next bug: issue #6364. But at least I can now fill in the validation data.
I will post an update once I upgrade to the newest Keras version.

@ViaFerrata commented Oct 5, 2017

Thanks a lot for the code snippet!
Here's the Python 2 copy-and-paste version for lazy people like me:

import numpy as np
import keras as ks

class TensorBoardWrapper(ks.callbacks.TensorBoard):
    '''Sets the self.validation_data property for use with the TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, **kwargs):
        super(TensorBoardWrapper, self).__init__(**kwargs)
        self.batch_gen = batch_gen # The generator.
        self.nb_steps = nb_steps   # Number of times to call next() on the generator.

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                # Tuple concatenation instead of * unpacking, for Python 2.
                imgs = np.zeros(((self.nb_steps * ib.shape[0],) + ib.shape[1:]), dtype=np.float32)
                tags = np.zeros(((self.nb_steps * tb.shape[0],) + tb.shape[1:]), dtype=np.uint8)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb
        # [inputs, targets, sample_weights, learning_phase].
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]
        return super(TensorBoardWrapper, self).on_epoch_end(epoch, logs)

@GalAvineri commented Nov 4, 2017

If I understand the solution @alexklibisz suggested correctly, we still have to load the entire validation set into memory.
Is there a way to view histograms over the entire validation set without loading it all into memory at once?

@isaacgerg commented Nov 10, 2017

Same issue. I train over multiple GPUs, and fitting my validation data into memory isn't possible on today's hardware. Following this thread.

@alexklibisz commented Nov 11, 2017

@GalAvineri I think that's the fundamental issue -- TensorBoard (or at least Keras' usage of TensorBoard) requires having all the data at once.

For what it's worth, in my experience, if the model is performing poorly enough to require debugging via TensorBoard, the problems will still show up on a subset of the original validation set. Similarly for @isaacgerg: perhaps you can still use a small dataset on a single GPU to replicate the behavior you're trying to debug/observe via TensorBoard.

@isaacgerg commented Nov 13, 2017

@alexklibisz Makes sense. However, there's still another issue: it looks like the weights are only shown for some of my convolutional layers. Any idea why?

@alexklibisz commented Nov 13, 2017

@isaacgerg I'm not sure why that would be. In my experience it's helpful to explicitly name the layers, because it's very easy to get lost in the many kernels, biases, gradients, activations, outputs, etc.

@isaacgerg commented Nov 13, 2017

@alexklibisz I usually develop with TensorFlow, but I had a small toy problem and thought I would give Keras a try again. Alas, the TensorBoard functionality doesn't seem to work right at the moment. Would providing screenshots or code help you to help me? (I don't want to make extra work for anyone.)

@paragon00 commented Nov 17, 2017

I used @alexklibisz's code above and it got me some TensorBoard functionality working with validation generators (i.e. TensorBoard scalars), but the Images tab was still blank (screenshot of the empty Images tab omitted).

Does anyone know a way to get images working with validation generators?

@isaacgerg commented Nov 17, 2017

@paragon00 This issue is similar to mine, except that I get a subset of the images from the conv2d layers. Oddly enough, it exclusively displays the layers with stride=2.

@NTNguyen13 commented Jul 28, 2018

@alexklibisz I adapted your code to my fit_generator:

callbacks = [
    # EarlyStopping(monitor='val_loss', patience=10, verbose=0),
    ModelCheckpoint('./models/XVIII-2-200-reseg5_weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', save_best_only=True, verbose=2),
    CSVLogger('./log/XVIII-log.csv', append=False, separator=','),
    TensorBoardWrapper(validation_generator, nb_steps=nb_validation_samples // batch_size, log_dir='./tf-log',
                       histogram_freq=1,
                       batch_size=int(batch_size), write_graph=False, write_grads=True)]

# train
model.fit_generator(train_generator,
                    steps_per_epoch=nb_train_samples // batch_size,
                    epochs=epochs,
                    validation_data=validation_generator,
                    validation_steps=nb_validation_samples // batch_size,
                    verbose=2,
                    callbacks=callbacks)

but after epoch 1 there's an error:

Traceback (most recent call last):
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 2963, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-10-d09cfa5a9da2>", line 6, in <module>
    ,callbacks=callbacks
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training.py", line 1426, in fit_generator
    initial_epoch=initial_epoch)
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/training_generator.py", line 229, in fit_generator
    callbacks.on_epoch_end(epoch, epoch_logs)
  File "/home/cngc3/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/callbacks.py", line 77, in on_epoch_end
    callback.on_epoch_end(epoch, logs)
  File "<ipython-input-7-c03528813be3>", line 20, in on_epoch_end
    imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
ValueError: could not broadcast input array from shape (64,250,250,3) into shape (47,250,250,3)

I guess there's a batch size mismatch; how can I fix this?

@ShiangYong commented Jul 28, 2018

Hi @NTNguyen13, what are the values of the following variables?

nb_train_samples
nb_validation_samples
batch_size

I suspect setting validation_steps=math.ceil(nb_validation_samples / batch_size) will help

@NTNguyen13 commented Jul 29, 2018

Hi @ShiangYong,
nb_train_samples = 9004
nb_validation_samples = 2245
batch_size = 64

I tried setting nb_validation_samples = 2240 (equal to 64 * 35), but I still got the same error. I will try your method and report back later! Thank you.
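
For reference, the arithmetic behind the mismatch, using the numbers above (an illustrative sketch, not code from this thread):

import math

nb_validation_samples, batch_size = 2245, 64
print(nb_validation_samples // batch_size)            # 35 -> covers only 35 * 64 = 2240 samples
print(math.ceil(nb_validation_samples / batch_size))  # 36 -> the final batch holds just 5 samples
# Either way the generator eventually yields a short batch, which breaks a
# preallocation sized as nb_steps * first_batch_length (the broadcast error above).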

@CMCDragonkai commented Sep 3, 2018

I was wondering why using a Sequence dataset is considered a generator. In keras/engine/training_generator.py, self.validation_data is not assigned to each callback if val_gen is True, and val_gen is True if the validation dataset is an instance of the Sequence class. But this doesn't make much sense: a Sequence dataset still offers a random-access interface, since you can always call its __getitem__ function with any index into the dataset.

This makes it difficult to write callbacks that are intended to use self.validation_data, as the TensorBoard callback currently does.
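
A minimal sketch of that random-access point (assuming a Sequence whose __getitem__ returns (x, y) batch tuples):

import numpy as np

def materialize_sequence(seq):
    # A keras.utils.Sequence can be walked by index via __getitem__,
    # which a plain generator cannot; this collects it into arrays.
    xs, ys = zip(*(seq[i] for i in range(len(seq))))
    return np.concatenate(xs), np.concatenate(ys)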

@juiceboxjoe commented Sep 19, 2018

Using

import numpy as np
from keras.callbacks import TensorBoard

class TensorBoardWrapper(TensorBoard):
    '''Sets the self.validation_data property for use with the TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, b_size, **kwargs):
        super(TensorBoardWrapper, self).__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.
        # self.batch_size is already set by the parent class via the batch_size kwarg.

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                # Preallocate using batch_size rather than the first batch's
                # length, so a short (remainder) batch doesn't shrink the arrays.
                imgs = np.zeros(((self.nb_steps * self.batch_size,) + ib.shape[1:]), dtype=np.float32)
                tags = np.zeros(((self.nb_steps * self.batch_size,) + tb.shape[1:]), dtype=np.uint8)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb

        # Only three elements here; see the AssertionError note below.
        self.validation_data = [imgs, tags, np.ones(imgs.shape[0])]

        return super(TensorBoardWrapper, self).on_epoch_end(epoch, logs)

and

TensorBoardWrapper(self.val_generator2, ceil(self.val_dataset_size / self.batch_size), self.batch_size, log_dir="{}/{}".format(self.logs_dir, time()), histogram_freq=1, batch_size=self.batch_size)

worked for me. Notice that the initialization of imgs and tags uses batch_size (imgs = np.zeros(((self.nb_steps * self.batch_size,) + ib.shape[1:]), dtype=np.float32)) instead of the first element of the batch's shape (imgs = np.zeros(((self.nb_steps * ib.shape[0],) + ib.shape[1:]), dtype=np.float32)).

This is because if the number of validation samples is not evenly divisible by the batch size, one of the calls to next(self.batch_gen) will return a batch whose first dimension is smaller than the batch size, resulting in the same broadcast shape error @NTNguyen13 reported.

Then I got an AssertionError at line 884 of Keras' callbacks.py. According to that line, validation_data must have at most three elements, to match the tensors list there.

I fixed that by changing self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0] to self.validation_data = [imgs, tags, np.ones(imgs.shape[0])]. I kept the third element as np.ones(imgs.shape[0]) because my generator only outputs images and labels.

Remember to use a generator that supports multi-threading, or use two instances of the same generator, to avoid getting ValueError: generator already executing. I used two instances as a quick fix.
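
A minimal sketch of that multi-threading point, using the common lock-wrapper pattern (an illustration, not code from this thread):

import threading

class ThreadSafeIterator:
    # Serializes next() calls so two consumers (e.g. fit_generator and the
    # TensorBoard wrapper) can share one generator without raising
    # `ValueError: generator already executing`.
    def __init__(self, iterator):
        self.iterator = iterator
        self.lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        with self.lock:
            return next(self.iterator)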

I'm using keras 2.1.4 and tensorflow-gpu 1.4.1 on one NVIDIA Titan Xp with CUDA 8 and I haven't run into any memory issues.

@wt-huang (Collaborator) commented Nov 12, 2018

Closing as this is resolved

@wt-huang closed this Nov 12, 2018

@MaximilianProll commented Nov 14, 2018

@juiceboxjoe thanks for sharing the fixed code!
Excuse me if my questions are not that well written; I am still adapting to Python, TensorFlow and Keras :)

I have two questions:

  1. I had to change the on_epoch_end function at the ib, tb = next(self.batch_gen) step, because I got an error saying that my DataGenerator is not an iterator. Instead I iterated with for ib, tb in self.batch_gen: directly; then you also don't need to take into account how often you call the DataGenerator. I am using the TensorBoardWrapper like this:
tbCallBack = TensorBoardWrapper(validation_generator, val_df.shape[0] // validation_generator.batch_size, validation_generator.batch_size, log_dir='./logs/')

val_df is a pandas data frame containing the IDs of the validation set, so val_df.shape[0] gives me the size of the validation set.

Am I using it wrongly?

  2. With the above changes everything runs well and I am able to create histograms for the weights in my network. But the moment I add one BatchNormalization layer, I run into the AssertionError you mentioned. Can someone explain why that happens? I would like to use BatchNormalization for the benefit of faster convergence.
@jtchilders

This comment has been minimized.

Copy link

@jtchilders jtchilders commented Dec 13, 2018

I can confirm what @MaximilianProll says about the AssertionError when there is a BatchNormalization layer in the model. It would be nice to know a fix.

@guysoft commented Dec 20, 2018

Using the TensorBoardWrapper classes provided here gives me:

Traceback (most recent call last):
  File "/home/guy/workspace/shapedo.com/shapedo/external/ml/cnn_sliding_window/train.py", line 156, in <module>
    shuffle=True,
  File "/usr/local/lib/python3.6/dist-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_generator.py", line 94, in fit_generator
    callbacks.set_model(callback_model)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 54, in set_model
    callback.set_model(model)
  File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 799, in set_model
    weight)
  File "/usr/local/lib/python3.6/dist-packages/keras/optimizers.py", line 91, in get_gradients
    raise ValueError('An operation has `None` for gradient. '
ValueError: An operation has `None` for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.

My current optimizer looks like this:

opt = keras.optimizers.rmsprop()

The same happens with adam().

@CMCDragonkai commented Feb 27, 2019

A Keras utils.Sequence should be allowed. But when using a Keras Sequence, the TensorBoard callback still fails with:

ValueError: If printing histograms, validation_data must be provided, and cannot be a generator.

Keras Sequences could support slices; at the least, the overloaded __getitem__ should handle a slice type.
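
As a hypothetical sketch of that slice support (not the Keras API; SliceableSequence is an invented name):

import numpy as np
from keras.utils import Sequence

class SliceableSequence(Sequence):
    # A Sequence whose __getitem__ also accepts slices, by concatenating
    # the covered batches.
    def __init__(self, x, y, batch_size):
        self.x, self.y, self.batch_size = x, y, batch_size

    def __len__(self):
        return int(np.ceil(len(self.x) / float(self.batch_size)))

    def __getitem__(self, idx):
        if isinstance(idx, slice):
            batches = [self[i] for i in range(*idx.indices(len(self)))]
            xs, ys = zip(*batches)
            return np.concatenate(xs), np.concatenate(ys)
        lo = idx * self.batch_size
        return self.x[lo:lo + self.batch_size], self.y[lo:lo + self.batch_size]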

@CinderellaRobaker commented Mar 15, 2019

The TensorBoardWrapper class is indeed a great help when my net contains no BatchNormalization layer. But if I add BatchNormalization, I get the same error as @guysoft:
ValueError: An operation has None for gradient. Please make sure that all of your ops have a gradient defined (i.e. are differentiable). Common ops without gradient: K.argmax, K.round, K.eval.
Any suggestions would be appreciated.

@haimat commented May 8, 2019

I tried @juiceboxjoe's code, but it leads to the following exception:

...
File "/usr/local/lib/python3.6/dist-packages/keras/callbacks.py", line 941, in on_epoch_end
  result = self.sess.run([self.merged], feed_dict=feed_dict)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 929, in run
  run_metadata_ptr)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1128, in _run
  str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (32,) for Tensor 'dense_2_target:0', which has shape '(?, ?)'

I am using the default ImageDataGenerator shipped with keras.
Any ideas what could be the problem here?
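
One possible cause, offered only as an assumption (not confirmed in this thread): the dense_2_target tensor expects a 2-D array, while the wrapper's tags array is 1-D with shape (32,); adding an explicit second axis before assigning validation_data would align the shapes:

import numpy as np

# Hypothetical sketch: give the 1-D label vector an explicit second axis.
tags = np.ones(32, dtype=np.uint8)   # stand-in for the wrapper's 1-D labels
tags = tags.reshape(-1, 1)           # shape (32, 1) matches a (?, ?) target
print(tags.shape)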

@rabenimmermehr commented Sep 19, 2019

I managed to get @juiceboxjoe's version running, but I had to re-include some of the lines he removed (self.batch_size = b_size and self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]).

I'm using Keras 2.2.4 on tensorflow-gpu 1.13.1.

My full Wrapper:

import numpy as np
import keras

class TensorBoardWrapper(keras.callbacks.TensorBoard):
    '''Sets the self.validation_data property for use with the TensorBoard callback.'''

    def __init__(self, batch_gen, nb_steps, b_size, **kwargs):
        super(TensorBoardWrapper, self).__init__(**kwargs)
        self.batch_gen = batch_gen  # The generator.
        self.nb_steps = nb_steps    # Number of times to call next() on the generator.
        self.batch_size = b_size

    def on_epoch_end(self, epoch, logs):
        # Fill in the `validation_data` property. Obviously this is specific to how your generator works.
        # Below is an example that yields images and classification tags.
        # After it's filled in, the regular on_epoch_end method has access to the validation_data.
        imgs, tags = None, None
        for s in range(self.nb_steps):
            ib, tb = next(self.batch_gen)
            if imgs is None and tags is None:
                imgs = np.zeros(((self.nb_steps * self.batch_size,) + ib.shape[1:]), dtype=np.float32)
                tags = np.zeros(((self.nb_steps * self.batch_size,) + tb.shape[1:]), dtype=np.float32)
            imgs[s * ib.shape[0]:(s + 1) * ib.shape[0]] = ib
            tags[s * tb.shape[0]:(s + 1) * tb.shape[0]] = tb

        self.validation_data = [imgs, tags, np.ones(imgs.shape[0]), 0.0]

        return super(TensorBoardWrapper, self).on_epoch_end(epoch, logs)

Called like this:

tBCallback = TensorBoardWrapper(test_it, math.ceil(image_count_test/config.batch_size), config.batch_size,
                                log_dir=model_dir_path, histogram_freq=5, write_graph=True, write_images=True, write_grads=True)