Mini-batching is a technique for training on subsets of the dataset instead of on all the data at once. This makes it possible to train a model even when the computer lacks the memory to hold the entire dataset.

Mini-batching is computationally inefficient, since you can't calculate the loss simultaneously across all samples. However, this is a small price to pay in order to be able to run the model at all.

It's also quite useful combined with stochastic gradient descent (SGD). The idea is to randomly shuffle the data at the start of each epoch, then create the mini-batches. For each mini-batch, you train the network weights with gradient descent. Since these batches are random, you're performing SGD with each batch.
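The shuffle-then-batch idea can be sketched in plain Python. This is only an illustration: the helper name `shuffled_batches`, the ten-element stand-in dataset, and the fixed seed are assumptions made for the example, and no actual gradient step is performed.

```python
import random


def shuffled_batches(samples, batch_size, rng):
    """Yield mini-batches drawn from a freshly shuffled ordering of the samples."""
    indices = list(range(len(samples)))
    rng.shuffle(indices)  # new random order every time this is called (once per epoch)
    for start in range(0, len(indices), batch_size):
        yield [samples[i] for i in indices[start:start + batch_size]]


rng = random.Random(0)  # fixed seed, only so the sketch is repeatable
samples = list(range(10))  # stand-in for 10 training samples

# Two epochs, each reshuffled before being cut into batches of 4
epochs = [list(shuffled_batches(samples, 4, rng)) for _ in range(2)]
for epoch, epoch_batches in enumerate(epochs):
    for batch in epoch_batches:
        # A real training loop would run one gradient descent step on `batch` here.
        print(epoch, batch)
```

Each epoch still visits every sample exactly once; only the order, and therefore the composition of each batch, changes.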

Let's look at the MNIST dataset with weights and a bias to see if your machine can handle it.

In [3]:
import numpy as np
import tensorflow as tf

In [1]:
from tensorflow.examples.tutorials.mnist import input_data

n_input = 784  # MNIST data input (img shape: 28*28)
n_classes = 10  # MNIST total classes (0-9 digits)

# Import MNIST data
mnist = input_data.read_data_sets('datasets/ud730/mnist', one_hot=True)

Extracting datasets/ud730/mnist/train-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/train-labels-idx1-ubyte.gz
Extracting datasets/ud730/mnist/t10k-images-idx3-ubyte.gz
Extracting datasets/ud730/mnist/t10k-labels-idx1-ubyte.gz


In [4]:
# The features are already scaled and the data is shuffled
train_features = mnist.train.images
test_features = mnist.test.images

train_labels = mnist.train.labels.astype(np.float32)
test_labels = mnist.test.labels.astype(np.float32)

# Weights & bias
weights = tf.Variable(tf.random_normal([n_input, n_classes]))
bias = tf.Variable(tf.random_normal([n_classes]))

## Question 1

Calculate the memory size of train_features, train_labels, weights, and bias in bytes. Ignore memory overhead; just calculate the memory required for the stored data.

You may have to look up how much memory a float32 requires: https://en.wikipedia.org/wiki/Single-precision_floating-point_format

`train_features` Shape: `(55000, 784)` Type: `float32`

In [6]:
55000*784*4

172480000

`train_labels` Shape: `(55000, 10)` Type: `float32`

In [7]:
55000*10*4

2200000

`weights` Shape: `(784, 10)` Type: `float32`

In [8]:
784*10*4

31360

`bias` Shape: `(10,)` Type: `float32`

In [10]:
10 * 4

40
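The four calculations above can be combined into one short sketch that also gives the total. It uses only the shapes stated in this question and the fact that a float32 occupies 4 bytes; the dictionary and variable names are made up for the example.

```python
from math import prod

FLOAT32_BYTES = 4  # a float32 is 32 bits, i.e. 4 bytes

# Shapes as listed in Question 1
shapes = {
    "train_features": (55000, 784),
    "train_labels": (55000, 10),
    "weights": (784, 10),
    "bias": (10,),
}

# Bytes for the stored data only: number of elements times bytes per element
sizes = {name: prod(shape) * FLOAT32_BYTES for name, shape in shapes.items()}
total_bytes = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name}: {size} bytes")
print(f"total: {total_bytes} bytes (~{total_bytes / 1e6:.1f} MB)")
```

The total comes to roughly 175 MB, almost all of it from `train_features`.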

## TensorFlow Mini-batching

In order to use mini-batching, you must first divide your data into batches.

Unfortunately, it's sometimes impossible to divide the data into batches of exactly equal size. For example, imagine you'd like to create batches of 128 samples each from a dataset of 1000 samples. Since 128 does not divide 1000 evenly, you'd wind up with 7 batches of 128 samples and 1 batch of 104 samples. (7*128 + 1*104 = 1000)
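The 1000-sample example can be checked with a small slicing sketch. The helper `split_into_batches` is a hypothetical name used only for this illustration, and a list of integers stands in for the real samples.

```python
def split_into_batches(samples, batch_size):
    """Cut samples into consecutive batches; the last one may be smaller."""
    return [samples[start:start + batch_size]
            for start in range(0, len(samples), batch_size)]


samples = list(range(1000))  # stand-in for a dataset of 1000 samples
batch_sizes = [len(batch) for batch in split_into_batches(samples, 128)]
print(batch_sizes)  # seven batches of 128 followed by one batch of 104
```

Slicing past the end of a list simply returns the remaining elements, which is why the final, smaller batch needs no special handling.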

In that case, the size of the batches would vary, so you need to take advantage of TensorFlow's tf.placeholder() function to receive the varying batch sizes.

Continuing the example, if each sample had n_input = 784 features and n_classes = 10 possible labels, the dimensions for features would be [None, n_input] and labels would be [None, n_classes].



In [11]:
# Features and Labels
features = tf.placeholder(tf.float32, [None, n_input])
labels = tf.placeholder(tf.float32, [None, n_classes])

What does None do here?

The None dimension is a placeholder for the batch size. At runtime, TensorFlow will accept any batch size greater than 0.

Going back to our earlier example, this setup allows you to feed features and labels into the model as either the batches of 128 samples or the single batch of 104 samples.


## Question 2

Using the parameters below, how many batches are there, and what is the size of the last batch?

features is (50000, 400)

In [16]:
def batches_last(n, batch_size=128):
    # Ceiling division for the count; the last batch is full when n divides evenly
    return -(-n // batch_size), (n % batch_size) or batch_size

batches_last(50000)

(391, 80)

labels is (50000, 10)