### Mini-batching

In this section, you'll go over what mini-batching is and how to apply it in TensorFlow.

Mini-batching is a technique for training on subsets of the dataset instead of all the data at one time. This provides the ability to train a model, even if a computer lacks the memory to store the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with stochastic gradient descent (SGD). The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
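The shuffle-then-batch step described above can be sketched with NumPy. This is only a minimal illustration, not the code used later in this section; `shuffled_batches` and its `seed` parameter are hypothetical names introduced here, and the toy arrays stand in for real data.

```python
import numpy as np

def shuffled_batches(features, labels, batch_size, seed=None):
    """Yield (features, labels) mini-batches in a freshly shuffled order."""
    assert len(features) == len(labels)
    rng = np.random.default_rng(seed)
    # One permutation for both arrays keeps each sample with its label
    order = rng.permutation(len(features))
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield features[idx], labels[idx]

# Toy data: 10 samples with 3 features each; label i belongs to sample i
toy_features = np.arange(30).reshape(10, 3)
toy_labels = np.arange(10)

for batch_x, batch_y in shuffled_batches(toy_features, toy_labels, batch_size=4, seed=0):
    print(batch_x.shape, batch_y)
```

Calling the generator again at the start of each epoch (with a different or no seed) would give a new random order for that epoch.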

Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.

In [4]:
from tensorflow import keras
from keras.utils import to_categorical
import tensorflow as tf
import numpy as np

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
# mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
mnist = keras.datasets.mnist

(train_features, train_labels), (test_features, test_labels) = mnist.load_data()

train_features = np.reshape(train_features, [-1, n_input])
test_features = np.reshape(test_features, [-1, n_input])

# to_categorical: one hot encoding
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

### Question 1

Calculate the memory size of ```train_features```, ```train_labels```, ```weights```, and ```bias``` in bytes. Ignore memory for overhead, just calculate the memory required for the stored data.

You may have to look up how much memory a float32 requires.

train_features Shape: (55000, 784) Type: float32

train_labels Shape: (55000, 10) Type: float32

weights Shape: (784, 10) Type: float32

bias Shape: (10,) Type: float32



In [5]:
(55000*784 + 55000*10 + 784*10 + 10) * 32

1397691200

So $1397691200$ bits is $174711400$ bytes, or about $174.7$ megabytes. Note that one float32 takes 32 bits (4 bytes).

-----
The total memory space required for the inputs, weights and bias is around 174 megabytes, which isn't that much memory. You could train this whole dataset on most CPUs and GPUs.
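The same figure can be reached directly in bytes by asking NumPy for the size of a float32. The sketch below assumes the shapes listed in Question 1 and, as in the question, ignores any overhead; the `shapes` dictionary is just an illustrative helper.

```python
import numpy as np

# Shapes quoted in Question 1; every array is assumed to hold float32 values
shapes = {
    'train_features': (55000, 784),
    'train_labels': (55000, 10),
    'weights': (784, 10),
    'bias': (10,),
}

bytes_per_value = np.dtype(np.float32).itemsize  # 4 bytes = 32 bits

total_bytes = 0
for name, shape in shapes.items():
    n_bytes = int(np.prod(shape)) * bytes_per_value
    total_bytes += n_bytes
    print('{:<15} {:>12,} bytes'.format(name, n_bytes))

print('Total: {:,} bytes (~{:.1f} MB)'.format(total_bytes, total_bytes / 1e6))
```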

But the larger datasets that you'll use in the future are measured in gigabytes or more. It's possible to purchase more memory, but it's expensive. A Titan X GPU with 12 GB of memory costs over $1,000.

Instead, in order to run large models on your machine, you'll learn how to use mini-batching.

Let's look at how you implement mini-batching in TensorFlow.

### TensorFlow Mini-batching

In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not evenly divide into 1000, you'd wind up with 7 batches of 128 samples, and 1 batch of 104 samples. (7*128 + 1*104 = 1000)

----
Sample size = 1000  
Batch = 128  

$1000 / 128 = 7.8125$  
$0.8125 \times 128 = 104$  

So 7 batches of 128 samples  
and 1 batch of 104 samples

----
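The arithmetic above generalizes to any sample size and batch size. Here is a small sketch of it; `batch_layout` is a hypothetical helper name, not something provided by TensorFlow.

```python
def batch_layout(sample_size, batch_size):
    """Return (number_of_batches, size_of_last_batch)."""
    full_batches, remainder = divmod(sample_size, batch_size)
    if remainder:
        # A smaller, final batch holds the leftover samples
        return full_batches + 1, remainder
    # The data divides evenly, so the last batch is a full one (or there is no data)
    return full_batches, min(batch_size, sample_size)

n_batches, last_size = batch_layout(1000, 128)
print(n_batches, last_size)  # 8 batches in total, the last one with 104 samples
```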

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's ```tf.placeholder()``` function to receive the varying batch sizes.

Continuing the example, if each sample had ```n_input = 784``` features and ```n_classes = 10``` possible labels, the dimensions for ```features``` would be ```[None, n_input]``` and ```labels``` would be ```[None, n_classes]```.

In [6]:
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

What does ```None``` do here?

The ```None``` dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed ```features``` and ```labels``` into the model as either the batches of 128 samples or the single batch of 104 samples.
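To see why a flexible first dimension is enough, here is a NumPy analogy rather than TensorFlow code. It assumes the same `n_input` and `n_classes` as above; `linear_logits` and the zero-filled batches are illustrative stand-ins. The weights never depend on the batch size, so the same computation accepts both the 128-sample and the 104-sample batch.

```python
import numpy as np

n_input = 784
n_classes = 10

rng = np.random.default_rng(0)
weights = rng.normal(size=(n_input, n_classes)).astype(np.float32)
bias = rng.normal(size=n_classes).astype(np.float32)

def linear_logits(batch_features):
    # The first axis is the batch size and can be any positive number;
    # only the second axis has to match n_input.
    return batch_features @ weights + bias

full_batch = np.zeros((128, n_input), dtype=np.float32)
last_batch = np.zeros((104, n_input), dtype=np.float32)

print(linear_logits(full_batch).shape)  # (128, 10)
print(linear_logits(last_batch).shape)  # (104, 10)
```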

### Question 2

Using the parameters below, how many batches are there, and what is the size of the last batch?

features is (50000, 400)

labels is (50000, 10)

batch_size is 128

----
Sample size = 50000  
Batch = 128  

$50000 / 128 = 390.625$  
$0.625 \times 128 = 80$  

So 390 batches of 128 samples  
and 1 batch of 80 samples

----

Now that you know the basics, let's learn how to implement mini-batching.

### Question 3

Implement the ```batches``` function to batch ```features``` and ```labels```. The function should return each batch with a maximum size of ```batch_size```. To help you with the quiz, look at the following example output of a working ```batches``` function.

In [7]:
import math
import pprint

def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # Slice the data into consecutive chunks of at most batch_size samples
    output_batches = []
    
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
        
    return output_batches

# 4 Samples of features
example_features = [
    ['F11','F12','F13','F14'],
    ['F21','F22','F23','F24'],
    ['F31','F32','F33','F34'],
    ['F41','F42','F43','F44']]
# 4 Samples of labels
example_labels = [
    ['L11','L12'],
    ['L21','L22'],
    ['L31','L32'],
    ['L41','L42']]

example_batches = batches(3, example_features, example_labels)

pp = pprint.PrettyPrinter(depth=4)
pp.pprint(example_batches)

[[[['F11', 'F12', 'F13', 'F14'],
   ['F21', 'F22', 'F23', 'F24'],
   ['F31', 'F32', 'F33', 'F34']],
  [['L11', 'L12'], ['L21', 'L22'], ['L31', 'L32']]],
 [[['F41', 'F42', 'F43', 'F44']], [['L41', 'L42']]]]
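The ```batches``` function above builds the full list of batches before returning it. One possible alternative, sketched here under the hypothetical name `iter_batches`, is a generator that yields one batch at a time, so the list of batches is never stored. The sample data is made up for the illustration.

```python
def iter_batches(batch_size, features, labels):
    """Yield (features, labels) batches one at a time."""
    assert len(features) == len(labels)
    for start_i in range(0, len(features), batch_size):
        end_i = start_i + batch_size
        yield features[start_i:end_i], labels[start_i:end_i]

sample_features = [['F11', 'F12'], ['F21', 'F22'], ['F31', 'F32'], ['F41', 'F42']]
sample_labels = [['L1'], ['L2'], ['L3'], ['L4']]

for batch_features, batch_labels in iter_batches(3, sample_features, sample_labels):
    print(len(batch_features), batch_labels)
```

A generator can only be consumed once, so it would have to be called again for every pass over the data.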


Let's use mini-batching to feed batches of MNIST features and labels into a linear model.

Set the batch size and run the optimizer over all the batches produced by the ```batches``` function. The recommended batch size is 128. If you have memory restrictions, feel free to make it smaller.

In [8]:
def normalize_grayscale(image_data):
    """
    Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]
    :param image_data: The image data to be normalized
    :return: Normalized image data
    """
    a = 0.1
    b = 0.9
    grayscale_min = 0
    grayscale_max = 255
    return a + ( ( (image_data - grayscale_min)*(b - a) )/( grayscale_max - grayscale_min ) )
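Here is a quick check of the scaling on a few pixel values. The cast to `float32` is an assumption added for this sketch, not part of the function above: the keras MNIST images are loaded as `uint8`, and multiplying an integer array by a Python float makes NumPy return `float64`, which takes twice the memory of `float32`. `normalize_grayscale_f32` is a hypothetical name.

```python
import numpy as np

def normalize_grayscale_f32(image_data, a=0.1, b=0.9):
    """Min-Max scale 0-255 pixel values to [a, b], returning float32."""
    grayscale_min = 0.0
    grayscale_max = 255.0
    # Cast first so the arithmetic stays in float32 instead of float64
    image_data = np.asarray(image_data, dtype=np.float32)
    return a + (image_data - grayscale_min) * (b - a) / (grayscale_max - grayscale_min)

pixels = np.array([0, 128, 255], dtype=np.uint8)
scaled = normalize_grayscale_f32(pixels)
print(scaled, scaled.dtype)
```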

In [9]:
"""
Note two known problems: if the grayscale features are normalized, the kernel dies.
If the labels are cast to float32, the kernel also dies.
"""
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
from tensorflow import keras
from keras.utils import to_categorical
import numpy as np

learning_rate = 0.001
n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

is_features_normal = True
is_labels_encod = False

# Import MNIST data
# mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
mnist = keras.datasets.mnist

(train_features, train_labels), (test_features, test_labels) = mnist.load_data()

train_features = np.reshape(train_features, [-1, n_input])
test_features = np.reshape(test_features, [-1, n_input])

# to_categorical: one hot encoding
if not is_labels_encod:
    train_labels = to_categorical(train_labels)
    test_labels = to_categorical(test_labels)
    
    # Change to float32, so it can be multiplied against the features in TensorFlow, which are float32
    # train_labels = train_labels.astype(np.float32)
    # test_labels = test_labels.astype(np.float32)
    is_labels_encod = True

if not is_features_normal:
    # Scale the raw 0-255 pixel values to [0.1, 0.9] (skipped while is_features_normal is True)
    train_features = normalize_grayscale(train_features)
    test_features = normalize_grayscale(test_features)
    is_features_normal = True


# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels))

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

# TODO: Set batch size
batch_size = 128
assert batch_size is not None, 'You must set the batch size'

init = tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    # TODO: Train optimizer on all batches
    for batch_features, batch_labels in batches(batch_size, train_features, train_labels):
        sess.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Test Accuracy: 0.8112999796867371


The accuracy is low, but you can train the model on the dataset more than once to improve it. You'll go over this subject in the next section, which covers "epochs".

### Epochs

An epoch is a single forward and backward pass of the whole dataset. Training for more than one epoch can increase the accuracy of the model without requiring more data. This section will cover epochs in TensorFlow and how to choose the right number of epochs.

The following TensorFlow code trains a model using 10 epochs.

In [12]:
"""
Please note that the dataset loaded with keras.datasets.mnist is different from the one loaded with
input_data.read_data_sets('/datasets/ud730/mnist', one_hot=True)

First, the data has 60,000 observations for the training set and
10,000 for the test set.

We have to split the training set into a training set of 55,000 and a validation set of 5,000,
which is why we use 0.08332 as the test size

Moreover, the target variable is not one-hot encoded

"""

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
import numpy as np
from tensorflow import keras
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
#from helper import batches  # Helper function created in Mini-batching section

def batches(batch_size, features, labels):
    """
    Create batches of features and labels
    :param batch_size: The batch size
    :param features: List of features
    :param labels: List of labels
    :return: Batches of (Features, Labels)
    """
    assert len(features) == len(labels)
    # Slice the data into consecutive chunks of at most batch_size samples
    output_batches = []
    
    sample_size = len(features)
    for start_i in range(0, sample_size, batch_size):
        end_i = start_i + batch_size
        batch = [features[start_i:end_i], labels[start_i:end_i]]
        output_batches.append(batch)
        
    return output_batches

def print_epoch_stats(epoch_i, sess, last_features, last_labels):
    """
    Print cost and validation accuracy of an epoch
    """
    current_cost = sess.run(
        cost,
        feed_dict={features: last_features, labels: last_labels})
    valid_accuracy = sess.run(
        accuracy,
        feed_dict={features: valid_features, labels: valid_labels})
    print('Epoch: {:<4} - Cost: {:<8.3} Valid Accuracy: {:<5.3}'.format(
        epoch_i,
        current_cost,
        valid_accuracy))

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

mnist = keras.datasets.mnist

(train_features, train_labels), (test_features, test_labels) = mnist.load_data()

train_features = np.reshape(train_features, [-1, n_input])
test_features = np.reshape(test_features, [-1, n_input])

# to_categorical: one hot encoding
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

train_features, valid_features, train_labels, valid_labels = train_test_split(
    train_features, train_labels, shuffle=True, test_size=0.08332)

# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

# Logits - xW + b
logits = tf.add(tf.matmul(features, weights), bias)

# Define loss and optimizer
learning_rate = tf.placeholder(tf.float32)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(logits=logits, labels=labels))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate).minimize(cost)

# Calculate accuracy
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

init = tf.global_variables_initializer()

batch_size = 128
epochs = 10
learn_rate = 0.001

train_batches = batches(batch_size, train_features, train_labels)

with tf.Session() as sess:
    sess.run(init)

    # Training cycle
    for epoch_i in range(epochs):

        # Loop over all batches
        for batch_features, batch_labels in train_batches:
            train_feed_dict = {
                features: batch_features,
                labels: batch_labels,
                learning_rate: learn_rate}
            sess.run(optimizer, feed_dict=train_feed_dict)

        # Print cost and validation accuracy of an epoch
        print_epoch_stats(epoch_i, sess, batch_features, batch_labels)

    # Calculate accuracy for test dataset
    test_accuracy = sess.run(
        accuracy,
        feed_dict={features: test_features, labels: test_labels})

print('Test Accuracy: {}'.format(test_accuracy))

Epoch: 0    - Cost: 2.51e+02 Valid Accuracy: 0.782
Epoch: 1    - Cost: 1.98e+02 Valid Accuracy: 0.82 
Epoch: 2    - Cost: 1.72e+02 Valid Accuracy: 0.832
Epoch: 3    - Cost: 1.62e+02 Valid Accuracy: 0.841
Epoch: 4    - Cost: 1.57e+02 Valid Accuracy: 0.848
Epoch: 5    - Cost: 1.48e+02 Valid Accuracy: 0.851
Epoch: 6    - Cost: 1.46e+02 Valid Accuracy: 0.855
Epoch: 7    - Cost: 1.39e+02 Valid Accuracy: 0.86 
Epoch: 8    - Cost: 1.32e+02 Valid Accuracy: 0.861
Epoch: 9    - Cost: 1.27e+02 Valid Accuracy: 0.865
Test Accuracy: 0.8745999932289124
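The pattern in the output above, where the cost falls as the epochs accumulate, can be reproduced on a much smaller problem. The sketch below is a stripped-down NumPy version of the same loop, not the TensorFlow code from this section. It assumes a made-up synthetic dataset in place of MNIST, and the sizes, learning rate, and names such as `mean_cross_entropy` are chosen only for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST: 200 samples, 5 features, 3 classes
n_samples, n_input, n_classes = 200, 5, 3
true_weights = rng.normal(size=(n_input, n_classes))
x = rng.normal(size=(n_samples, n_input))
y = np.eye(n_classes)[np.argmax(x @ true_weights, axis=1)]  # one-hot labels

weights = np.zeros((n_input, n_classes))
bias = np.zeros(n_classes)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def mean_cross_entropy(features, labels):
    probs = softmax(features @ weights + bias)
    return -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))

batch_size, epochs, learn_rate = 32, 10, 0.1
costs = []
for epoch_i in range(epochs):
    # One epoch = one pass over every mini-batch
    for start_i in range(0, n_samples, batch_size):
        batch_x = x[start_i:start_i + batch_size]
        batch_y = y[start_i:start_i + batch_size]
        probs = softmax(batch_x @ weights + bias)
        # Gradient of the mean cross-entropy with respect to the logits
        grad_logits = (probs - batch_y) / len(batch_x)
        weights -= learn_rate * batch_x.T @ grad_logits
        bias -= learn_rate * grad_logits.sum(axis=0)
    costs.append(mean_cross_entropy(x, y))
    print('Epoch: {:<4} - Cost: {:<8.3}'.format(epoch_i, costs[-1]))
```

Each extra epoch reuses the same data, which is why accuracy can keep improving without collecting more samples.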
