When training a neural network, we feed the training data to our network. <br>
Each full scan of the training data is called an **epoch**. <br>
If we feed all of the training data in one step, we call it **batch mode** <font color="blue">(the batch size equals the size of the training set)</font>. <br> 
However, in most cases, we divide the training data into smaller subsets while feeding the data to our model, just as in other machine learning algorithms. This is called **mini-batch** mode. <br>
Sometimes, we are forced to do this because the complete training set is too big to fit in memory. If we look only at the training time per epoch, we would say: the bigger the batch size, the better (as long as the batch fits in memory). <br><br>
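To make the terms above concrete, here is a minimal sketch in plain Python of how one epoch is split into mini-batches. The helper `iterate_minibatches` and the ten-sample `data` list are hypothetical and only serve as an illustration; Keras generators handle this step internally.

```python
import math
import random

def iterate_minibatches(samples, batch_size, shuffle=True, seed=0):
    """Yield successive mini-batches that together cover `samples` exactly once (one epoch)."""
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]

data = list(range(10))  # hypothetical training set with 10 samples

# Mini-batch mode: several updates per epoch; the last batch may be smaller.
batches = list(iterate_minibatches(data, batch_size=4))
steps_per_epoch = math.ceil(len(data) / 4)
print(steps_per_epoch, [len(b) for b in batches])

# Batch mode: the batch size equals the size of the training set, so one step per epoch.
full_batch = list(iterate_minibatches(data, batch_size=len(data)))
print(len(full_batch))
```

With a batch size of 4, the ten samples are covered in three steps, and the final batch holds the two remaining samples.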
However, using mini-batches also has other advantages.<br>
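**Firstly**, it reduces the cost of each update, because the gradient is computed on only part of the training set.<br> 
**Secondly**, compared with updating on single examples, summing or averaging the gradient over a mini-batch reduces the effect of noise on the model (it lowers the variance of the gradient estimate).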
<br><br>
**In mini-batch mode, the optimizer strikes a balance between efficiency and robustness.**

If the size of the mini-batch (also called the batch size) is too small, the model can converge faster but is more likely to pick up noise. <br>
If the batch size is too large, the model will probably converge slower but the estimates of error gradients will be more accurate.<br> 
<font color="red"> **Note:** </font> For deep learning, when dealing with a lot of data, it's good to use large batch sizes. These batches are also great for parallelization on GPUs. 
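The claim that larger batches give more accurate gradient estimates can be checked with a small experiment. The sketch below uses only the Python standard library and a made-up one-parameter regression problem (the data, the model `y ≈ w * x`, and all function names are assumptions for illustration). It measures how much the mini-batch gradient fluctuates from batch to batch for different batch sizes.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical 1-D regression data: y = 3x + noise; the model is y ≈ w * x.
xs = [rng.uniform(-1, 1) for _ in range(2048)]
ys = [3.0 * x + rng.gauss(0, 0.5) for x in xs]

def per_sample_grad(w, x, y):
    """Gradient of the squared error (w*x - y)**2 with respect to w."""
    return 2.0 * (w * x - y) * x

def minibatch_grad_std(batch_size, w=0.0, n_trials=500):
    """Standard deviation of the averaged mini-batch gradient across random batches."""
    estimates = []
    for _ in range(n_trials):
        idx = rng.sample(range(len(xs)), batch_size)
        grads = [per_sample_grad(w, xs[i], ys[i]) for i in idx]
        estimates.append(sum(grads) / batch_size)
    return statistics.pstdev(estimates)

std_by_size = {b: minibatch_grad_std(b) for b in (1, 32, 256)}
for b, s in std_by_size.items():
    print(f"batch size {b:>3}: std of gradient estimate = {s:.3f}")
```

The spread of the estimate shrinks as the batch size grows, which is the variance reduction described above. Each update with a large batch is also more expensive, which is the other side of the trade-off.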

In [None]:
from keras.datasets import cifar10
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten, Conv2D, MaxPooling2D
from keras.optimizers import Adam
from keras.callbacks import EarlyStopping, TensorBoard, ModelCheckpoint
from keras.utils import to_categorical

In [None]:
n_classes = 5
batch_size = 32  # must be defined before the generators below, which read it

In [None]:
train_datagen = ImageDataGenerator(rescale=1./255,
                                   shear_range=0.2,
                                   zoom_range=0.2,
                                   width_shift_range=0.1,
                                   height_shift_range=0.1,
                                   horizontal_flip=True,
                                   vertical_flip=False,
                                   validation_split=0.25)

test_datagen = ImageDataGenerator(rescale=1./255)

training_set = train_datagen.flow_from_directory('data',
                                                 target_size=(150, 150),
                                                 batch_size=batch_size,
                                                 class_mode='categorical',
                                                 subset='training')

validation_set = train_datagen.flow_from_directory('data',
                                                   target_size=(150, 150),
                                                   batch_size=batch_size,
                                                   class_mode='categorical',
                                                   subset='validation')

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', input_shape = (150, 150,3)))
model.add(Activation('relu'))
model.add(Conv2D(32, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(Conv2D(64, (3, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))

model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(n_classes))
model.add(Activation('softmax'))
model.summary()

In [None]:
opt = Adam()
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

Next, we set our callback functions. We use early stopping to prevent **overfitting**, ModelCheckpoint to save our best model automatically, and TensorBoard for an analysis of our results:

In [None]:
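# Note: with metrics=['accuracy'], Keras 2.3 and later report this metric as 'val_accuracy'
callbacks = [EarlyStopping(monitor='val_acc', patience=5, verbose=2),  # stop after 5 epochs without improvement
             ModelCheckpoint('checkpoints/weights.{epoch:02d}-' + str(batch_size) + '.hdf5',
                             save_best_only=True),  # save only when val_loss (the default monitor) improves
             TensorBoard()]  # write logs for TensorBoard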
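The patience rule behind early stopping can be sketched in a few lines of plain Python. This is a simplified illustration, not the Keras implementation; the function `early_stopping_epoch` and the validation accuracies are made up for the example.

```python
def early_stopping_epoch(val_scores, patience):
    """Return the 1-based epoch at which training would stop, or None if it runs to the end.

    Simplified rule: stop once the monitored score has failed to beat its
    best value for `patience` consecutive epochs.
    """
    best = float("-inf")
    wait = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best, wait = score, 0   # improvement: remember it and reset the counter
        else:
            wait += 1               # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation accuracies with a plateau followed by a late improvement.
val_acc = [0.50, 0.60, 0.65, 0.64, 0.66, 0.65, 0.65, 0.67, 0.68, 0.68]

print(early_stopping_epoch(val_acc, patience=2))  # stops during the plateau
print(early_stopping_epoch(val_acc, patience=5))  # survives the plateau
```

With a patience of 2, training stops at epoch 7 and never sees the later improvement; with a patience of 5, it runs through all ten epochs.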

Let's start training our first model with a batch size of **32**:

In [None]:
batch_size = 32
n_epochs = 50
history_32 = model.fit_generator(
            training_set,
            epochs = n_epochs,
            verbose = 1,
            validation_data = validation_set,
            callbacks = callbacks)  # without this argument, the callbacks defined above are never used

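Next, we reset the weights. Recompiling alone does not reinitialize them, so we first clone the model, which creates a copy with the same architecture and freshly initialized weights, and then compile the copy with a new optimizer:

In [None]:
from keras.models import clone_model
model = clone_model(model)  # same architecture, freshly initialized weights
model.compile(loss='categorical_crossentropy', optimizer=Adam(), metrics=['accuracy'])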

We can now start training our model with a batch size of **256**:

In [None]:
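batch_size = 256
n_epochs = 50
# Recreate the generators so that the new batch size actually takes effect
training_set = train_datagen.flow_from_directory('data', target_size=(150, 150), batch_size=batch_size,
                                                 class_mode='categorical', subset='training')
validation_set = train_datagen.flow_from_directory('data', target_size=(150, 150), batch_size=batch_size,
                                                   class_mode='categorical', subset='validation')
history_256 = model.fit_generator(
            training_set,
            epochs = n_epochs,
            verbose = 1,
            validation_data = validation_set,
            callbacks = callbacks)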

In [None]:
import matplotlib.pyplot as plt

val_acc_256 = history_256.history['val_acc']
val_acc_32 = history_32.history['val_acc']
# One curve per run; the curves can differ in length because of early stopping
plt.plot(range(len(val_acc_32)), val_acc_32, label='CNN model with batch size 32')
plt.plot(range(len(val_acc_256)), val_acc_256, label='CNN model with batch size 256')
plt.title('Validation accuracy on dataset')
plt.xlabel('epochs')
plt.ylabel('accuracy')
plt.legend()
plt.show()

In [None]:
print(max(val_acc_256), max(val_acc_32))
print(len(val_acc_256), len(val_acc_32))

<font color="red"> **Note:** </font><br>
When using a bigger batch size, we need more epochs: each epoch contains fewer weight updates, and, as a rough rule of thumb, the total number of steps/updates needed to converge is about the same regardless of the batch size. More interestingly, the validation accuracy of the model with a batch size of 256 is a bit higher than that of the model with a batch size of 32. <br><br>
When we train our model with a smaller batch size, the model might pick up a bit more noise. However, by fine-tuning our model further (for example, increasing the patience when using early stopping and decreasing the learning rate), the model with a batch size of 32 should be able to achieve similar accuracies. 
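The relationship between batch size, updates, and epochs mentioned in the note can be worked through with some simple arithmetic. The numbers below (a training set of 3,000 images and a budget of 1,000 updates) are hypothetical and chosen only for illustration; they are not taken from the experiment above.

```python
import math

def updates_per_epoch(n_samples, batch_size):
    """Number of weight updates in one full pass over the training data."""
    return math.ceil(n_samples / batch_size)

def epochs_for_updates(target_updates, n_samples, batch_size):
    """Number of epochs needed to perform at least `target_updates` weight updates."""
    return math.ceil(target_updates / updates_per_epoch(n_samples, batch_size))

n_samples = 3000       # hypothetical training set size
target_updates = 1000  # hypothetical number of updates needed to converge

for batch_size in (32, 256):
    steps = updates_per_epoch(n_samples, batch_size)
    epochs = epochs_for_updates(target_updates, n_samples, batch_size)
    print(f"batch size {batch_size:>3}: {steps:>2} updates per epoch, {epochs:>2} epochs for {target_updates} updates")
```

Under these assumptions, a batch size of 32 gives 94 updates per epoch and reaches 1,000 updates in 11 epochs, while a batch size of 256 gives 12 updates per epoch and needs 84 epochs. This is why a fixed epoch budget or a short early-stopping patience can affect the two runs differently.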