To scale image data in preparation for a neural network in image classification, normalization or standardization can be used. Pixel values range between 0 and 255.
Normalization rescales the values to a range between 0 and 1.
Standardization assumes a Gaussian distribution of the data values. The rescaling sets the mean to zero and the standard deviation to one, so most of the data points fall within three standard deviations of the mean (a range of about -3 to 3).
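
As a minimal sketch of the two rescalings described above, written in plain NumPy rather than Keras: the tiny `image` array and its values are made up for illustration only.

```python
import numpy as np

# Hypothetical 2x2 grayscale image with 8-bit pixel values (0..255).
image = np.array([[0, 64], [128, 255]], dtype=np.float32)

# Normalization: rescale the pixel values to the range [0, 1].
normalized = image / 255.0

# Standardization: subtract the mean and divide by the standard deviation,
# which gives zero mean and unit standard deviation.
standardized = (image - image.mean()) / image.std()

print('normalized:   min=%.3f, max=%.3f' % (normalized.min(), normalized.max()))
print('standardized: mean=%.3f, std=%.3f' % (standardized.mean(), standardized.std()))
```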

For scaling images, the ImageDataGenerator can be used. It comes from Keras, a Python library.
The ImageDataGenerator can be used for various tasks, including scaling but also augmentation.
It takes batches of the data and, when used in model training (via fit_generator, or via fit in TensorFlow 2.x, where fit_generator is deprecated), it returns each batch scaled (and/or augmented) to the model.
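
The idea of scaling batch by batch can be sketched with a plain Python generator. This is not how Keras implements it (the Keras iterators also shuffle and loop indefinitely); `scaled_batches`, `X_demo` and `y_demo` are hypothetical names used only for this illustration.

```python
import numpy as np

def scaled_batches(X, y, batch_size, scale=1.0 / 255.0):
    """Yield (X_batch, y_batch) pairs, scaling each image batch on the fly."""
    for start in range(0, len(X), batch_size):
        stop = start + batch_size
        # Only the current batch is scaled; the array X itself stays untouched.
        yield X[start:stop] * scale, y[start:stop]

# Hypothetical data: 10 grayscale images of 4x4 pixels with integer labels.
rng = np.random.default_rng(0)
X_demo = rng.integers(0, 256, size=(10, 4, 4, 1)).astype(np.float32)
y_demo = rng.integers(0, 3, size=10)

for batch_X, batch_y in scaled_batches(X_demo, y_demo, batch_size=4):
    print('Batch shape=%s, min=%.3f, max=%.3f' % (batch_X.shape, batch_X.min(), batch_X.max()))
```

Because the scaling happens inside the generator, the original array keeps its 0 to 255 values and only the batches handed to the model are rescaled.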

Normalization is achieved by passing the argument (rescale = 1.0/255.0).
Standardization is done via (featurewise_center = True, featurewise_std_normalization = True). For standardization, the mean pixel value is subtracted from all pixels and the result is divided by the standard deviation of the pixel values.

If the statistics are calculated over the whole training set, it is a feature-wise standardization. If the statistics are calculated separately for each image, it is a sample-wise standardization.
For feature-wise: (featurewise_center = True, featurewise_std_normalization = True)
For sample-wise: (samplewise_center = True, samplewise_std_normalization = True)
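
The difference between the two variants can be sketched in NumPy. This only illustrates the idea and is not the Keras implementation; the random `images` array is a made-up stand-in for a data set of shape (samples, rows, cols, channels), and computing the feature-wise statistics per channel is an assumption of this sketch.

```python
import numpy as np

# Hypothetical data set: 4 grayscale images of 8x8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(4, 8, 8, 1)).astype(np.float32)

# Feature-wise: one mean/std per channel, computed over ALL images.
fw_mean = images.mean(axis=(0, 1, 2), keepdims=True)
fw_std = images.std(axis=(0, 1, 2), keepdims=True)
featurewise = (images - fw_mean) / fw_std

# Sample-wise: every image is standardized with its OWN mean/std.
sw_mean = images.mean(axis=(1, 2, 3), keepdims=True)
sw_std = images.std(axis=(1, 2, 3), keepdims=True)
samplewise = (images - sw_mean) / sw_std

print('feature-wise: overall mean=%.3f, std=%.3f' % (featurewise.mean(), featurewise.std()))
print('sample-wise:  per-image means =', samplewise.mean(axis=(1, 2, 3)).round(3))
```

With feature-wise scaling only the data set as a whole has zero mean, while with sample-wise scaling each individual image does.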

Installs / Imports necessary

In [None]:
%pip install keras
%pip install tensorflow

# converts a class vector (integers) to binary class matrix 
from keras.utils import to_categorical
# The ImageDataGenerator itself
from keras.preprocessing.image import ImageDataGenerator


Preparing the data

Before starting, all other preprocessing steps have to be finished and the data has to be split into training, test and validation sets.

In [None]:
# if necessary: reshape the dataset into the format expected by the convolutional layers later (rows, cols, channels); for grayscale images the channel axis has to be created
rows, cols, channels = X_train.shape[1], X_train.shape[2], 1
X_train = X_train.reshape((X_train.shape[0], rows, cols, channels))
X_test = X_test.reshape((X_test.shape[0], rows, cols, channels))

# one-hot encode the integer class labels (turns the label vector into a binary class matrix)
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)
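
What the one-hot encoding of the labels does can be sketched in NumPy. This is an illustration of the result, not the Keras implementation of to_categorical; the `labels` values are made up.

```python
import numpy as np

# Hypothetical integer class labels for four samples and three classes.
labels = np.array([0, 2, 1, 2])
num_classes = int(labels.max()) + 1

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels builds the binary class matrix.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot.shape)
print(one_hot)
```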

Optional part for testing the scaling

In [None]:
print('Train min=%.3f, max=%.3f' % (X_train.min(), X_train.max()))
print('Test min=%.3f, max=%.3f' % (X_test.min(), X_test.max()))

Data scaling in real time during model fitting

In [None]:
# Create an ImageDataGenerator and pass the chosen scaling options (augmentation options are also possible)
# for normalization: ImageDataGenerator(rescale = 1.0/255.0)
# for standardization (this includes centering): ImageDataGenerator(featurewise_center = True, featurewise_std_normalization = True)
# standardization is used here as an example; without any arguments no scaling is applied
datagen = ImageDataGenerator(featurewise_center = True, featurewise_std_normalization = True)

# if needed (only for the feature-wise options), calculate the statistics on the whole training data set using the .fit() function. The same statistics are then applied to the test and validation data sets.
datagen.fit(X_train)

# A neural network model can be fitted with the data generator by using .flow(). It returns an iterator that yields batches of scaled data; this iterator is then passed to the model's fit function.

# creating the iterator, choose the wanted batch size
train_iterator = datagen.flow(X_train, y_train, batch_size = 64)
test_iterator = datagen.flow(X_test, y_test, batch_size = 64) 

# optional: creating an iterator for the validation data set (only used if a validation data set is present)
val_iterator = datagen.flow(X_val, y_val, batch_size = 64)

# Optional: check the raw arrays
# Note: the arrays themselves are not modified (scaling happens per batch), so these values stay unchanged
# print('Train min=%.3f, max=%.3f' % (X_train.min(), X_train.max()))
# print('Test min=%.3f, max=%.3f' % (X_test.min(), X_test.max()))

# Optional: confirm that the iterators work and inspect the scaled values of one batch
# batchX, batchy = next(train_iterator)
# print('Batch shape=%s, min=%.3f, max=%.3f' % (batchX.shape, batchX.min(),
#                                               batchX.max()))

# fitting the model using the train_iterator for scaling in real time, choose the number of epochs
# (fit_generator is deprecated in TensorFlow 2.x; model.fit accepts the iterator directly)
model.fit(train_iterator, steps_per_epoch=len(train_iterator), epochs=5)

# Optional: The model can also be evaluated
# test_iterator = datagen.flow(X_test, y_test)
# loss = model.evaluate(test_iterator)
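
Why the statistics are calculated on the training set and then reused for the test set can be sketched in NumPy. This illustrates the principle only and is not the Keras implementation; `X_train_demo` and `X_test_demo` are made-up arrays.

```python
import numpy as np

# Hypothetical training and test sets of grayscale 4x4 images.
rng = np.random.default_rng(1)
X_train_demo = rng.integers(0, 256, size=(20, 4, 4, 1)).astype(np.float32)
X_test_demo = rng.integers(0, 256, size=(5, 4, 4, 1)).astype(np.float32)

# The statistics come from the training data only ...
train_mean = X_train_demo.mean(axis=(0, 1, 2), keepdims=True)
train_std = X_train_demo.std(axis=(0, 1, 2), keepdims=True)

# ... and the SAME values are applied to both sets.
X_train_scaled = (X_train_demo - train_mean) / train_std
X_test_scaled = (X_test_demo - train_mean) / train_std

print('Train mean=%.3f, std=%.3f' % (X_train_scaled.mean(), X_train_scaled.std()))
print('Test  mean=%.3f, std=%.3f' % (X_test_scaled.mean(), X_test_scaled.std()))
```

The scaled test set will usually not have a mean of exactly zero. That is expected, since no information from the test data is used for the scaling.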
