
## Scaling Image Pixel Data with Keras

The pixel values in images must be scaled before the images are provided as input to a deep learning neural network model, whether during training or evaluation. Traditionally, the images would be scaled before developing the model and then stored in memory or on disk in the scaled format.

An alternative approach is to scale the images using a preferred scaling technique just-in-time during the training or model evaluation process. Keras supports this type of data preparation for image data via the **ImageDataGenerator** class and API.

## Setup

In [1]:
import tensorflow as tf

from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.datasets import mnist

## Load MNIST dataset

Before we dive into the usage of the ImageDataGenerator class for preparing image data, we must select an image dataset on which to test the generator. The MNIST problem is an image classification problem consisting of 70,000 grayscale 28x28 images of handwritten digits (60,000 for training and 10,000 for testing). The goal is to classify a given image of a handwritten digit as an integer from 0 to 9; as such, it is a multiclass image classification problem.

In [2]:
# loading the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# summarize dataset shape
print("Train: ", x_train.shape, y_train.shape)
print("Test: ", x_test.shape, y_test.shape)

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
Train:  (60000, 28, 28) (60000,)
Test:  (10000, 28, 28) (10000,)


In [3]:
# summarize pixel values
print("Train: ", x_train.min(), x_train.max(), x_train.mean(), x_train.std())
print("Test: ", x_test.min(), x_test.max(), x_test.mean(), x_test.std())

Train:  0 255 33.318421449829934 78.56748998339798
Test:  0 255 33.791224489795916 79.17246322228644


## Image Pixel Scaling

The ImageDataGenerator class in Keras provides a suite of techniques for scaling pixel values in your image dataset prior to modeling. The class will wrap your image dataset, then when requested, it will return images in batches to the algorithm during training, validation, or evaluation and apply the scaling operations just-in-time. This provides an efficient and convenient approach to scaling image data when modeling with neural networks.
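To make the just-in-time idea concrete, here is a minimal NumPy sketch of what such a wrapper does: it keeps the raw `uint8` images and scales each batch only when that batch is requested. The `scaled_batches` function and the toy arrays are hypothetical illustrations, not part of Keras; also, unlike `ImageDataGenerator.flow()`, which shuffles by default, this sketch returns batches in order.

```python
import numpy as np


def scaled_batches(images, labels, batch_size, scale):
    """Yield (x, y) batches, scaling each batch only when it is requested."""
    n = len(images)
    for start in range(0, n, batch_size):
        end = min(start + batch_size, n)
        # Convert and scale just this slice; the stored dataset stays uint8.
        x = images[start:end].astype("float32") * scale
        yield x, labels[start:end]


# Toy uint8 "images" standing in for MNIST (hypothetical data).
images = np.arange(10 * 2 * 2, dtype=np.uint8).reshape(10, 2, 2)
labels = np.arange(10)

for x, y in scaled_batches(images, labels, batch_size=4, scale=1.0 / 255.0):
    print(x.shape, x.dtype, y.tolist())
```

Because scaling happens inside the loop, only one scaled float batch exists at a time, which is the memory saving the generator approach offers over storing a pre-scaled copy of the dataset.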

The three main types of pixel scaling techniques supported by the ImageDataGenerator class are as follows:

- **Pixel Normalization**: scale pixel values to the range 0-1.
- **Pixel Centering**: scale pixel values to have a zero mean.
- **Pixel Standardization**: scale pixel values to have a zero mean and unit variance.

Centering and standardization are each supported at two levels: per-image (called sample-wise) or per-dataset (called feature-wise). Specifically, the statistics they require (the mean for centering, or the mean and standard deviation for standardization) can be calculated either from the pixel values of each image alone (sample-wise) or across the entire training dataset (feature-wise).
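The difference between the two levels comes down to which axes the statistics are averaged over. The following NumPy sketch, using a hypothetical batch of random single-channel 28x28 images, computes feature-wise statistics over all images and pixels (one value per channel) and sample-wise statistics over each image's own pixels. It illustrates the idea only; it is not the Keras implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical batch: 8 single-channel 28x28 images with pixel values 0-255.
x = rng.integers(0, 256, size=(8, 28, 28, 1)).astype("float32")

# Feature-wise: statistics over images, rows and columns (axes 0, 1, 2),
# giving one mean and one std per channel, shared by every image.
# (Centering alone would stop after subtracting the mean.)
feat_mean = x.mean(axis=(0, 1, 2), keepdims=True)
feat_std = x.std(axis=(0, 1, 2), keepdims=True)
x_feat = (x - feat_mean) / feat_std

# Sample-wise: statistics over each image's own pixels (axes 1, 2, 3),
# giving one mean and one std per image.
samp_mean = x.mean(axis=(1, 2, 3), keepdims=True)
samp_std = x.std(axis=(1, 2, 3), keepdims=True)
x_samp = (x - samp_mean) / samp_std

print("feature-wise, whole batch: mean=%.4f std=%.4f" % (x_feat.mean(), x_feat.std()))
print("feature-wise, per image:", np.round(x_feat.mean(axis=(1, 2, 3)), 3))
print("sample-wise, per image:  ", np.round(x_samp.mean(axis=(1, 2, 3)), 3))
```

After feature-wise standardization only the data as a whole has zero mean and unit variance; after sample-wise standardization every individual image does.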

## Normalizing Images

The ImageDataGenerator class can be used to rescale pixel values from the range of `0-255` to the range `0-1` preferred for neural network models. Scaling data to the range of `0-1` is traditionally referred to as normalization. This can be achieved by setting the rescale argument to a ratio by which each pixel can be multiplied to achieve the desired range. In this case, the ratio is $\frac{1}{255}$ or about `0.0039`.

The ImageDataGenerator does not need to be fit in this case because there are no global statistics, like the mean and standard deviation, that need to be calculated. Next, iterators can be created using the generator for both the train and test datasets. We will use a batch size of `64`. This means that the images in each of the train and test datasets are divided into groups of `64` that are scaled only when they are returned from the iterator (the final group of each dataset may be smaller).
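The number of batches follows from ceiling division, and the last batch holds whatever images remain. This small sketch (the `batch_counts` helper is hypothetical) reproduces the batch counts for MNIST's 60,000 training and 10,000 test images at a batch size of 64.

```python
import math


def batch_counts(n_samples, batch_size):
    """Return (number of batches, size of the final batch)."""
    n_batches = math.ceil(n_samples / batch_size)
    last = n_samples - (n_batches - 1) * batch_size
    return n_batches, last


for name, n in [("train", 60000), ("test", 10000)]:
    n_batches, last = batch_counts(n, 64)
    print("%s: %d batches, final batch holds %d images" % (name, n_batches, last))
```

It reports 938 training batches (the last holding 32 images) and 157 test batches (the last holding 16 images).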

We can then confirm that the pixel normalization has been performed as expected by retrieving the first batch of scaled images and inspecting the min and max pixel values.

In [4]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# reshape dataset to have a single channel
height, width, channels = x_train.shape[1], x_train.shape[2], 1
x_train = x_train.reshape((x_train.shape[0], height, width, channels))
x_test = x_test.reshape((x_test.shape[0], height, width, channels))

# confirm scale of pixels
print("Train min=%.3f, max=%.3f" % (x_train.min(), x_train.max()))
print("Test min=%.3f, max=%.3f" % (x_test.min(), x_test.max()))

Train min=0.000, max=255.000
Test min=0.000, max=255.000


In [5]:
# create generator (1.0/255.0 = 0.003921568627451)
datagen = ImageDataGenerator(rescale=1.0 / 255.0)

# Note: there is no need to fit the generator in this case
# prepare iterators to scale images
train_iterator = datagen.flow(x_train, y_train, batch_size=64)
test_iterator = datagen.flow(x_test, y_test, batch_size=64)
print("Batches train=%d, test=%d" % (len(train_iterator), len(test_iterator)))

Batches train=938, test=157


In [6]:
# confirm the scaling works
batch_x, batch_y = train_iterator.next()
print("Batch shape=%s, min=%.3f, max=%.3f" % (batch_x.shape, batch_x.min(), batch_x.max()))

Batch shape=(64, 28, 28, 1), min=0.000, max=1.000


The ImageDataGenerator does not need to be fit on the training dataset because there is nothing to calculate: we provided the scale factor directly. A single batch of normalized images is retrieved, and we can confirm that the min and max pixel values are zero and one, respectively.

## Centering Images

Another popular pixel scaling method is to calculate the mean pixel value across the entire training dataset, then subtract it from each image. This is called centering and has the effect of centering the distribution of pixel values on zero: that is, the mean pixel value for centered images will be zero. 

The ImageDataGenerator class refers to centering that uses the mean calculated on the training dataset as feature-wise centering, enabled with `featurewise_center=True`. It requires that the statistic be calculated on the training dataset, by calling the generator's `fit()` method, before any images are scaled.

In [8]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# reshape dataset to have a single channel
height, width, channels = x_train.shape[1], x_train.shape[2], 1
x_train = x_train.reshape((x_train.shape[0], height, width, channels))
x_test = x_test.reshape((x_test.shape[0], height, width, channels))

# report the mean pixel value of each split
print("Mean train=%.3f, test=%.3f" % (x_train.mean(), x_test.mean()))

# create generator that centers pixel values
datagen = ImageDataGenerator(featurewise_center=True)
# calculate the mean on the training dataset
datagen.fit(x_train)
# datagen.mean is a small array (one value per channel), so extract the scalar
print("Data Generator Mean:%.3f" % datagen.mean.item())

# demonstrate effect on a single batch of samples
iterator = datagen.flow(x_train, y_train, batch_size=64)
# get a batch
x_batch, y_batch = iterator.next()
# mean pixel value in the batch
print(x_batch.shape, x_batch.mean())

# demonstrate effect on entire training dataset
iterator = datagen.flow(x_train, y_train, batch_size=len(x_train), shuffle=False)
# get a batch
x_batch, y_batch = iterator.next()
# mean pixel value in the batch
print(x_batch.shape, x_batch.mean())

Mean train=33.318, test=33.791
Data Generator Mean:33.318
(64, 28, 28, 1) 0.2942951
(60000, 28, 28, 1) -1.9512918e-05


The ImageDataGenerator is fit on the training dataset, and we can confirm that the mean pixel value it stores matches our own manual calculation (33.318). A single batch of centered images is retrieved, and its mean pixel value (about 0.294) is small relative to the original mean of 33.318 but not exactly zero, because the mean of any one batch of 64 images can differ from the mean of the whole dataset. The test is repeated using the entire training dataset as the batch size, and in this case the mean pixel value of the scaled dataset is very close to zero, confirming that centering has the desired effect.
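Centering subtracts a single dataset-wide mean, so only the average over all images is guaranteed to be zero; any one batch inherits whatever brightness its particular images happen to have. The sketch below demonstrates this with hypothetical synthetic images whose overall brightness varies from image to image (small 8x8 images are used only to keep it fast; the effect does not depend on image size).

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical data: 6,400 small single-channel images whose overall
# brightness differs from image to image, plus per-pixel noise.
brightness = rng.uniform(0, 100, size=(6400, 1, 1, 1))
x = brightness + rng.normal(0, 10, size=(6400, 8, 8, 1))

# Feature-wise centering: subtract one mean computed over the whole dataset.
x_centered = x - x.mean()

# Mean of each consecutive batch of 64 centered images.
batch_means = [x_centered[i:i + 64].mean() for i in range(0, len(x_centered), 64)]

print("whole dataset mean: %.2e" % x_centered.mean())
print("batch means range from %.2f to %.2f" % (min(batch_means), max(batch_means)))
```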

## Standardizing Images

Standardization is a data scaling technique that is usually motivated by assuming the data is Gaussian; it shifts and rescales the distribution of the data to have a mean of zero and a standard deviation of one. Gaussian data with this distribution is referred to as a standard Gaussian. It can be beneficial when training neural networks because the values sum to zero and the inputs are small, for Gaussian data roughly in the range -3.0 to 3.0 (about 99.7% of values fall within three standard deviations of the mean).
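MNIST is far from Gaussian (most pixels are 0), so the three-standard-deviation rule of thumb does not describe it. Plugging the training statistics printed earlier (mean 33.318, standard deviation 78.567) into the standardization formula shows the actual range of standardized MNIST pixels. The `standardize` helper is just for illustration.

```python
# Training-set statistics printed by the earlier cell.
train_mean, train_std = 33.318, 78.567


def standardize(pixel):
    """Standardize one raw pixel value with the MNIST training statistics."""
    return (pixel - train_mean) / train_std


print("pixel 0   -> %.3f" % standardize(0))
print("pixel 255 -> %.3f" % standardize(255))
```

So standardized MNIST training pixels lie between about -0.424 and 2.822: the mean is zero and the spread is one, but the range is lopsided.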

Standardization of images is achieved by subtracting the mean pixel value and dividing the result by the standard deviation of the pixel values. The mean and standard deviation can be calculated on the training dataset, which Keras refers to as feature-wise standardization and enables with `featurewise_center=True` together with `featurewise_std_normalization=True`.
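As a rough sketch of what fitting and applying a feature-wise standardizer involves, the hypothetical `FeaturewiseStandardizer` class below computes the per-channel mean and standard deviation once, on training data, and then reuses them on any other data, such as a test set. The small `eps` added to the standard deviation is an assumption made here to guard against division by zero (we believe Keras uses a similar safeguard internally). The random arrays stand in for MNIST.

```python
import numpy as np


class FeaturewiseStandardizer:
    """Hypothetical fit/transform helper for feature-wise standardization.

    Not a Keras API: a sketch that computes one mean and one standard
    deviation per channel over all training images and pixels, then reuses
    those statistics on any data passed to transform().
    """

    def __init__(self, eps=1e-6):
        # Assumption: a small constant added to the std to avoid dividing by zero.
        self.eps = eps
        self.mean = None
        self.std = None

    def fit(self, x):
        x = np.asarray(x, dtype="float64")
        self.mean = x.mean(axis=(0, 1, 2), keepdims=True)
        self.std = x.std(axis=(0, 1, 2), keepdims=True)
        return self

    def transform(self, x):
        if self.mean is None:
            raise RuntimeError("call fit() on the training data first")
        x = np.asarray(x, dtype="float64")
        return (x - self.mean) / (self.std + self.eps)


rng = np.random.default_rng(3)
# Random stand-ins for MNIST-shaped arrays (samples, rows, columns, channels).
x_train = rng.integers(0, 256, size=(200, 28, 28, 1)).astype(np.uint8)
x_test = rng.integers(0, 128, size=(50, 28, 28, 1)).astype(np.uint8)

scaler = FeaturewiseStandardizer().fit(x_train)
z_train = scaler.transform(x_train)
z_test = scaler.transform(x_test)  # uses the training statistics

print("train: mean=%.2e std=%.6f" % (z_train.mean(), z_train.std()))
print("test:  mean=%.3f std=%.3f" % (z_test.mean(), z_test.std()))
```

Because the test set is transformed with the training statistics, its standardized mean is not exactly zero. For the real MNIST data in this notebook the test mean (33.791) is slightly above the training mean (33.318), so the standardized test data would have a small positive mean of about 0.006.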

In [10]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# reshape dataset to have a single channel
height, width, channels = x_train.shape[1], x_train.shape[2], 1
x_train = x_train.reshape((x_train.shape[0], height, width, channels))
x_test = x_test.reshape((x_test.shape[0], height, width, channels))

# report pixel means and standard deviations
print("Statistics train=%.3f (%.3f), test=%.3f (%.3f)" % (x_train.mean(), x_train.std(), x_test.mean(), x_test.std()))

# create generator that centers and standardizes pixel values
datagen = ImageDataGenerator(featurewise_center=True, featurewise_std_normalization=True)
# calculate the mean and standard deviation on the training dataset
datagen.fit(x_train)
# datagen.mean and datagen.std are small arrays (one value per channel)
print("Data Generator Mean:%.3f, std=%.3f" % (datagen.mean.item(), datagen.std.item()))

# demonstrate effect on a single batch of samples
iterator = datagen.flow(x_train, y_train, batch_size=64)
# get a batch
x_batch, y_batch = iterator.next()
# mean and standard deviation of the pixel values in the batch
print(x_batch.shape, x_batch.mean(), x_batch.std())

# demonstrate effect on entire training dataset
iterator = datagen.flow(x_train, y_train, batch_size=len(x_train), shuffle=False)
# get a batch
x_batch, y_batch = iterator.next()
# mean and standard deviation of the pixel values in the batch
print(x_batch.shape, x_batch.mean(), x_batch.std())

Statistics train=33.318 (78.567), test=33.791 (79.172)
Data Generator Mean:33.318, std=78.567
(64, 28, 28, 1) -0.010537894 0.98354304
(60000, 28, 28, 1) -3.4560264e-07 0.9999998
