Data frames are pulled from tensors by an operation known as **slicing**

Slicing is a fundamental tensor operation: it lets us extract parts of a tensor. It is especially useful when we are designing network tests, and can also be necessary for data processing. We don't need to slice the training data into batches ourselves, however: TensorFlow does this for us automatically.

The slice operator is `:` and `a:b` means select elements `a` to `b-1`.

In [1]:
import numpy as np
x = np.array([3, 7, 5, 1])
print(x[1:3])

[7 5]


Elements 1 and 2.

The last element is `-1`, the second last is `-2` etc.

In [2]:
x = np.array([3, 7, 5, 1])
print(x[-3:-1])           

[7 5]


The vector has length 4.

`-3` is short for index 4 - 3 = 1, and `-1` is short for index 4 - 1 = 3.

So `-3:-1` means from element 1 (inclusive) to element 3 (exclusive).
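The translation from negative to non-negative indices can be checked directly. The following is a minimal sketch using the same vector; the variable names `n`, `negative` and `explicit` are introduced here for illustration only.

```python
import numpy as np

x = np.array([3, 7, 5, 1])
n = len(x)  # 4

# A negative index k refers to position n + k
negative = x[-3:-1]
explicit = x[n - 3:n - 1]  # the same as x[1:3]

print(negative)  # [7 5]
print(explicit)  # [7 5]
```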

In [3]:
# stack of three (2, 2) tensors
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = np.array([[9, 10], [11, 12]])
x = np.array([a, b, c])
print(x, '\t', x.shape)

[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]] 	 (3, 2, 2)


Let's demonstrate slicing with a rank 3 tensor. This tensor is a stack of 2 by 2 arrays, `a`, `b` and `c`.

In [4]:
# [a]
slice = x[0:1, :, :]
print(slice, '\t', slice.shape)

[[[1 2]
  [3 4]]] 	 (1, 2, 2)


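In this example, `0:1` selects element 0 only along the first axis. A blank on either side of a colon means 'from the start' or 'to the end', so a bare `:` selects all elements. The result is the first 2 by 2 array, `a`, with the first axis retained (shape `(1, 2, 2)`); the colons in positions two and three select everything along those axes. By contrast, `x[0]` returns `a` itself, with shape `(2, 2)`.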
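It may help to compare a length-one slice with a plain integer index. This sketch rebuilds the same stack and prints both shapes; the names `kept` and `dropped` are illustrative only.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = np.array([[9, 10], [11, 12]])
x = np.array([a, b, c])

kept = x[0:1]    # slice: axis 0 is kept, with length 1
dropped = x[0]   # integer index: axis 0 is removed

print(kept.shape)                        # (1, 2, 2)
print(dropped.shape)                     # (2, 2)
print(np.array_equal(kept[0], dropped))  # True
```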

In [5]:
# equivalent to x[0:1, :, :]
slice = x[:1, :, :]
print(slice, '\t', slice.shape)
print()
slice = x[:1]
print(slice, '\t', slice.shape)

[[[1 2]
  [3 4]]] 	 (1, 2, 2)

[[[1 2]
  [3 4]]] 	 (1, 2, 2)


Here are a couple of alternative forms. The first 2 by 2 array is selected in each case.

In [6]:
# [b, c]
print('[b, c] =\n', np.array([b, c]))
slice = x[1:3]
print('\nx[1:3] =\n', slice, '\t', slice.shape)

[b, c] =
 [[[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]

x[1:3] =
 [[[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]] 	 (2, 2, 2)


`1:3` means select elements 1 and 2, that is, `b` and `c`.

In [7]:
# the first row of a, b and c
slice = x[:, 0, :] # or, x[:, 0]
print(slice, '\t', slice.shape)

[[ 1  2]
 [ 5  6]
 [ 9 10]] 	 (3, 2)


The second position in the slice expression indexes axis 1, the rows of each 2 by 2 array. Index `0` in that position selects the first row (a vector) of each of `a`, `b` and `c`.
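The same idea applies to the third position, which indexes the columns. As a small sketch (the names `first_rows` and `first_cols` are illustrative), compare fixing index 0 on axis 1 with fixing it on axis 2:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
c = np.array([[9, 10], [11, 12]])
x = np.array([a, b, c])

first_rows = x[:, 0, :]  # index 0 on axis 1: first row of a, b and c
first_cols = x[:, :, 0]  # index 0 on axis 2: first column of a, b and c

print(first_rows)  # rows [1 2], [5 6], [9 10]
print(first_cols)  # rows [1 3], [5 7], [9 11]
```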

Slicing MNIST image tensors about the first axis selects contiguous images.

In [8]:
from tensorflow.keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# select images 10 through to image 19
slice = train_images[10:20]   
print(slice.shape)

(10, 28, 28)


The first index of the image tensor selects a specific image. `[10:20]` selects images 10 to 19.

`7:-7` means from element 7 up to, but not including, element 21 (= 28 - 7):

0, ..., 6, | 7, ..., 20 | 21, ..., 27

In [9]:
# extract the middle 14 x 14 sub-images from all training images
slice = train_images[:, 7:-7, 7:-7]
slice.shape

(60000, 14, 14)

Imagine we wish to extract the middle 14 by 14 sub-image of every training image. Easy with slicing!
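The central crop can be tried without downloading MNIST. The sketch below assumes a small synthetic array, `images`, as a hypothetical stand-in for the image tensor, and checks that the negative bound `-7` resolves to 28 - 7 = 21.

```python
import numpy as np

# Hypothetical stand-in for the MNIST tensor: 5 images of 28 x 28 pixels
images = np.arange(5 * 28 * 28).reshape(5, 28, 28)

centre = images[:, 7:-7, 7:-7]
explicit = images[:, 7:21, 7:21]  # -7 resolves to 28 - 7 = 21

print(centre.shape)                      # (5, 14, 14)
print(np.array_equal(centre, explicit))  # True
```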

Axis 0 (the first axis) is always the samples axis

- `train_images[0]` - the first 28 x 28 image

- `train_images[1]` - the second 28 x 28 image

Data is rarely processed in its entirety

Data is broken up into **mini-batches**

- `train_images[:128]` a mini-batch of the first 128 images

- `train_images[128:256]` the second mini-batch

- `train_images[128 * n : 128 * (n + 1)]` the $(n+1)$'th mini-batch

Luckily for us, TensorFlow automatically slices the training data into mini-batches. We just need to specify the size of the mini-batch with the `batch_size` argument to `fit`.
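To make the mini-batch slices concrete, here is a minimal sketch of a manual batching loop. It is not how `fit` is implemented; `iter_minibatches` is a hypothetical helper, and `images` is a synthetic stand-in for `train_images`.

```python
import numpy as np

def iter_minibatches(data, batch_size):
    """Yield consecutive slices of `data` along axis 0."""
    for start in range(0, len(data), batch_size):
        # Slicing past the end is safe: the final batch is simply shorter
        yield data[start:start + batch_size]

# Hypothetical stand-in for train_images: 300 blank 28 x 28 images
images = np.zeros((300, 28, 28), dtype=np.uint8)

batch_shapes = [batch.shape for batch in iter_minibatches(images, 128)]
print(batch_shapes)  # [(128, 28, 28), (128, 28, 28), (44, 28, 28)]
```

The last batch holds the 44 images left over after two full batches of 128.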