When the batch size is 1, the delay from INPUT -> OUTPUT appears to be 4 batches instead of the expected 3. This extra batch of delay does not appear for batch sizes other than 1.
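One way to check the observation above is to feed a single marker batch through the pipeline and count how many batches pass before the marker shows up at the output. The sketch below is only an illustration: `make_delay_pipeline` is a hypothetical stand-in that always delays by a fixed number of batches, so it does not reproduce the batch-size-1 behaviour. To test a real system, its per-batch processing call would be substituted for `process`, assuming it takes one input batch and returns one output batch per call.

```python
from collections import deque


def make_delay_pipeline(delay_batches, batch_size):
    """Hypothetical stand-in pipeline: returns each batch `delay_batches` calls later."""
    # Pre-fill the FIFO with all-zero batches so the first outputs are silent.
    fifo = deque([[0] * batch_size for _ in range(delay_batches)])

    def process(batch):
        fifo.append(list(batch))
        return fifo.popleft()

    return process


def measure_delay_in_batches(process, batch_size, max_batches=64):
    """Feed one marker batch followed by zero batches.

    Returns the index of the call whose output first contains the marker,
    or None if the marker never appears within `max_batches` calls.
    """
    for i in range(max_batches):
        if i == 0:
            batch = [1] + [0] * (batch_size - 1)  # marker in the first sample
        else:
            batch = [0] * batch_size
        out = process(batch)
        if any(out):
            return i
    return None


# Compare the measured delay across batch sizes (stand-in delay fixed at 3).
for batch_size in (1, 2, 8):
    process = make_delay_pipeline(delay_batches=3, batch_size=batch_size)
    print(batch_size, measure_delay_in_batches(process, batch_size))
```

Running the same measurement at batch size 1 and at a larger batch size against the real pipeline should show whether the reported delay differs by one batch.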