ImageDataGenerator samplewise_center standardize method subtracting mean over image channels #2559

Liyang90 · 2016-04-29T21:44:00Z

In the standardize method of ImageDataGenerator:

def standardize(self, x):
    # x is a single image, so it doesn't have image number at index 0
    img_channel_index = self.channel_index - 1
    if self.samplewise_center:
        x -= np.mean(x, axis=img_channel_index, keepdims=True)
    if self.samplewise_std_normalization:
        x /= (np.std(x, axis=img_channel_index, keepdims=True) + 1e-7)
    ...
    ...

The mean used by samplewise_center and the std used by samplewise_std_normalization are computed only over the image channel axis, instead of over the whole image (all pixels and all channels).

As an example (test code attached below), if the input image has only one channel (a grayscale image), then after
x -= np.mean(x, axis=img_channel_index, keepdims=True)
the image will be all zeros!
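The following NumPy-only sketch reproduces the behaviour described above without Keras. It assumes a single image in channels-first layout (channel, rows, cols), so img_channel_index is 0. The random arrays are made-up inputs, not CIFAR-10 data.

```python
import numpy as np

rng = np.random.default_rng(0)
img_channel_index = 0  # channel axis of one channels-first image: (channel, rows, cols)

# Hypothetical 1-channel (grayscale) image.
gray = rng.random((1, 4, 4)).astype("float32")
# The quoted line: the mean over a length-1 channel axis is the pixel itself.
gray_centered = gray - np.mean(gray, axis=img_channel_index, keepdims=True)
print(np.abs(gray_centered).max())  # 0.0: every pixel minus itself

# Hypothetical 3-channel image: the same line centers each pixel across its
# R, G, B values, removing per-pixel brightness rather than subtracting a
# single image-wide mean.
rgb = rng.random((3, 4, 4)).astype("float32")
rgb_centered = rgb - np.mean(rgb, axis=img_channel_index, keepdims=True)
print(np.allclose(rgb_centered.sum(axis=img_channel_index), 0, atol=1e-6))  # True
```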

According to the definitions of the two options:
samplewise_center: set each sample mean to 0.
samplewise_std_normalization: divide each input by its std.
I think it would make more sense to calculate the mean and std over the whole input image (Global Contrast Normalization), i.e. to remove "axis=img_channel_index, " from both calls.
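Here is a minimal sketch of the proposed per-sample Global Contrast Normalization: the same two operations with the axis argument removed. samplewise_gcn is a hypothetical helper name. Unlike standardize, it returns a new array instead of modifying x in place.

```python
import numpy as np

def samplewise_gcn(x, eps=1e-7):
    """Global contrast normalization of a single image (sketch).

    Mean and std are taken over every element of x (all rows, cols and
    channels), i.e. np.mean/np.std called without an axis argument.
    """
    x = x - np.mean(x)
    x = x / (np.std(x) + eps)  # eps guards against a constant image
    return x

rng = np.random.default_rng(0)
gray = rng.random((1, 32, 32)).astype("float32")  # hypothetical 1-channel image
out = samplewise_gcn(gray)
print(float(out.mean()), float(out.std()))  # approximately 0 and 1
```

For a grayscale image this no longer collapses to zeros, because the statistic is computed over 32 x 32 values instead of a single channel value per pixel.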

Test code:

# test the ImageDataGenerator

from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
from keras.utils import np_utils
import numpy as np

nb_classes = 10

(X_train, y_train), (X_test, y_test) = cifar10.load_data()
print('X_train shape:', X_train.shape)
print(X_train.shape[0], 'train samples')
print(X_test.shape[0], 'test samples')

# convert class vectors to binary class matrices
Y_train = np_utils.to_categorical(y_train, nb_classes)
Y_test = np_utils.to_categorical(y_test, nb_classes)

X_train = X_train.astype('float32')
X_test = X_test.astype('float32')
X_train /= 255
X_test /= 255

# change RGB images to 1-channel grayscale images
# (assumes channels-first ordering, i.e. X_train has shape (N, 3, 32, 32))
X_train = np.mean(X_train, axis=1, keepdims=True)

datagen_GCN = ImageDataGenerator(
        featurewise_center=False,
        samplewise_center=True,  # set each sample mean to 0
        featurewise_std_normalization=False,
        samplewise_std_normalization=False,
        zca_whitening=False)

datagen_GCN.fit(X_train)

# take a single batch of 20 images from the generator
X_batch1, Y_batch1 = next(datagen_GCN.flow(X_train, Y_train, batch_size=20))

# the images in X_batch1 are all 0!!!


joelthchao · 2016-04-30T03:37:12Z

I am also confused about the meaning of zero mean for each pixel over channels. In my opinion, standardization can take several forms:
(pixelwise, channelwise, imagewise) x (samplewise, datasetwise)
For example, featurewise_center does pixelwise x datasetwise.
But samplewise_center does something odd; it should be

# channelwise x samplewise
if self.samplewise_center:
    x = np.rollaxis(x, img_channel_index, 0)  # move the channel axis to the front
    for i in range(x.shape[0]):
        x[i] -= np.mean(x[i])  # subtract this channel's mean
    x = np.rollaxis(x, 0, img_channel_index + 1)  # restore the original axis order

if self.samplewise_std_normalization:
    x = np.rollaxis(x, img_channel_index, 0)
    for i in range(x.shape[0]):
        x[i] /= (np.std(x[i]) + 1e-7)  # epsilon avoids division by zero on constant channels
    x = np.rollaxis(x, 0, img_channel_index + 1)

or

# imagewise x samplewise
if self.samplewise_center:
    x -= np.mean(x)

if self.samplewise_std_normalization:
    x /= (np.std(x) + 1e-7)  # epsilon avoids division by zero on constant images

@Liyang90 can you give me some advice?

Liyang90 · 2016-04-30T04:00:18Z

For now I changed the code I'm using to

def standardize(self, x):
    # x is a single image, so it doesn't have image number at index 0
    img_row_index = self.row_index - 1
    img_col_index = self.col_index - 1
    img_channel_index = self.channel_index - 1
    if self.samplewise_center:
        x -= np.mean(x, axis=(img_channel_index, img_row_index, img_col_index), keepdims=True)
    if self.samplewise_std_normalization:
        x /= (np.std(x, axis=(img_channel_index, img_row_index, img_col_index), keepdims=True) + 1e-7)
    ...
    ...
    return x

This is equivalent to your second (imagewise) example, and it works fine.

Removing img_channel_index from the axis tuples makes it equivalent to your first (channelwise) example.
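The two equivalence claims above can be checked numerically with a small NumPy sketch. The image is a made-up channels-last array, so the rollaxis calls actually move the channel axis. The index variables mimic the ones in standardize, but their values here are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical single image in channels-last layout: (rows, cols, channel).
x = rng.random((8, 8, 3))
img_row_index, img_col_index, img_channel_index = 0, 1, 2

# joelthchao's channelwise version (rollaxis loop), applied to a copy.
a = np.rollaxis(x.copy(), img_channel_index, 0)  # view shaped (channel, rows, cols)
for i in range(a.shape[0]):
    a[i] -= np.mean(a[i])                        # subtract this channel's mean
a = np.rollaxis(a, 0, img_channel_index + 1)     # back to (rows, cols, channel)

# Liyang90's axis-tuple version with img_channel_index removed.
b = x - np.mean(x, axis=(img_row_index, img_col_index), keepdims=True)

# Imagewise: full axis tuple vs. a plain global mean.
c = x - np.mean(x, axis=(img_channel_index, img_row_index, img_col_index), keepdims=True)
d = x - np.mean(x)

print(np.allclose(a, b), np.allclose(c, d))  # True True
```

The axis-tuple form avoids the Python loop and works for either channel ordering, as long as the three indices are set correctly.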

sallamander · 2016-05-30T13:53:29Z

@Liyang90 @joelthchao How do you understand samplewise_center and samplewise_std_normalization, as opposed to featurewise_center and featurewise_std_normalization? My understanding was that there are two common ways of centering/normalizing batches of images:

Subtract off the mean image from every image in the batch. This looks to correspond to the featurewise_center (and then the featurewise_std_normalization does the equivalent with variance).
Subtract off the mean per channel from every image in the batch, where the mean per channel is obtained across the entire batch. I'm wondering if the samplewise_center was trying to get at this (I'm in agreement that it's confusing, and might not be doing what we expect), and then the samplewise_std_normalization the equivalent again with the variance. Currently, though, it looks as if this is not at all what samplewise_center does.

What were you expecting each of these methods to do?

joelthchao · 2016-05-30T14:16:08Z

featurewise subtracts the mean image from every image in the dataset.
samplewise can only use the image's own information and must not involve other images. Therefore, I think the correct approach is to subtract each channel's mean from that channel, as demonstrated in my previous code.
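The following sketch illustrates the distinction drawn above, assuming channels-last arrays. The helper names are hypothetical and this is not the Keras implementation. Featurewise statistics are fitted once on a dataset and then reused. The samplewise (channelwise) version needs nothing but the image itself.

```python
import numpy as np

# Hypothetical training batch, channels-last: (N, rows, cols, channels).
rng = np.random.default_rng(0)
X = rng.random((100, 8, 8, 3))

def fit_featurewise(X):
    # Dataset statistics computed once from many images (the role fit() plays):
    # a mean image and a std image, taken over the sample axis.
    return X.mean(axis=0), X.std(axis=0)

def featurewise_standardize(x, mean_img, std_img, eps=1e-7):
    # Standardizes with statistics that come from other images.
    return (x - mean_img) / (std_img + eps)

def samplewise_channel_standardize(x, eps=1e-7):
    # Uses only x itself: per-channel mean/std over rows and cols.
    mean = x.mean(axis=(0, 1), keepdims=True)
    std = x.std(axis=(0, 1), keepdims=True)
    return (x - mean) / (std + eps)

mean_img, std_img = fit_featurewise(X)
F = featurewise_standardize(X, mean_img, std_img)  # whole dataset
s = samplewise_channel_standardize(X[0])           # one image, no fit needed
print(np.allclose(F.mean(axis=0), 0), np.allclose(s.mean(axis=(0, 1)), 0))  # True True
```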

sallamander · 2016-05-30T14:41:18Z

Cool, that makes sense. The featurewise vs. samplewise distinction is clear, but it still leaves several possibilities for exactly what to do within each case (from what I see):

Featurewise

Supposing we have images that are 224 x 224 x 3:

1. Subtract the mean image of the dataset from each image (i.e., subtract a 224 x 224 x 3 image from every image in the dataset).
2. Subtract the channel means of the dataset from each image (i.e., subtract a 1 x 1 x 3 vector from every image in the dataset).
3. Subtract the mean pixel value of the dataset from each image. This would be a single value.

With samplewise, we have the same options, but the mean is calculated over the single sample (image) instead of over the dataset.

It seems like we would want to be able to do 1 and 2 in the featurewise case, and 2 and 3 in the samplewise case (1 doesn't make sense in the samplewise case, since the mean image of a single sample is the sample itself, and 3 seems like the samplewise equivalent of 1). Is that fair? It looks like your code above does 2 and 3 for samplewise, and from my reading the existing samplewise code does neither. I didn't see a PR open on this, but I would be happy to put something together if it would be valuable.
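The numbered options can be written as NumPy reductions on a hypothetical channels-last batch. This sketch only illustrates which axes each option averages over and relies on broadcasting to subtract the statistic. It is not a proposed ImageDataGenerator API.

```python
import numpy as np

# Hypothetical batch of 224 x 224 x 3 images, channels-last: (N, rows, cols, channels).
rng = np.random.default_rng(0)
X = rng.random((4, 224, 224, 3))
x = X[0]  # one sample

# Featurewise: statistics over the whole dataset (here, the batch).
mean_image = X.mean(axis=0)             # option 1: shape (224, 224, 3)
channel_means = X.mean(axis=(0, 1, 2))  # option 2: shape (3,), broadcasts like 1 x 1 x 3
mean_pixel = X.mean()                   # option 3: a single scalar

# Samplewise: statistics over one image only.
sample_channel_means = x.mean(axis=(0, 1))  # option 2: shape (3,)
sample_mean = x.mean()                      # option 3: a single scalar
# Option 1 per sample would be x itself, so x - x would be all zeros.

for name, centered in [("featurewise 1", X - mean_image),
                       ("featurewise 2", X - channel_means),
                       ("featurewise 3", X - mean_pixel),
                       ("samplewise 2", x - sample_channel_means),
                       ("samplewise 3", x - sample_mean)]:
    print(name, centered.shape)
```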

On a related note, I personally find the term featurewise a little misleading, since we're really distinguishing between the dataset (here a batch) and a single sample (here an image). I think the terminology could be clearer.

joelthchao · 2016-05-30T15:40:58Z

You will probably need to be careful with the function signature; a large change may break a lot of existing code.

eyaler · 2016-06-02T08:48:59Z

#1793

eyaler · 2016-10-25T15:35:41Z

#3422

sallamander · 2016-10-25T16:05:52Z

@eyaler is this what you had in mind?

#2879

isaacgerg · 2016-11-23T04:23:27Z

I fixed this for grayscale images but left RGB images as they are. See #4482.

stale · 2017-05-23T19:35:40Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs, but feel free to re-open it if needed.

eyaler · 2017-05-27T22:12:21Z

I believe this is still an issue.

stale · 2017-08-25T23:42:16Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

eyaler · 2017-08-25T23:49:53Z

Don't close, bot.

sallamander mentioned this issue Jun 1, 2016

Update ImageDataGenerator to allow channelwise standardization #2879

Closed

oeway mentioned this issue Aug 3, 2016

Extending ImageDataGenerator #3338

Closed

eyaler mentioned this issue Oct 25, 2016

ImageDataGenerator sample-wise mean removal and std normalization algorithms are incorrect for single channel imagery #4047

Closed

fchollet closed this as completed Jun 24, 2021
