<img src = 'https://images.manning.com/book/3/8e5d003-09e3-430e-a5a3-f42ee1cafb5f/Stevens-DLPy-HI.png' width = '500' height = '770'>

# 4. Real-world data representation using tensors

Here's a question that we can already address: how do we take a piece of data, a video, or a line of text, and represent it with a tensor in a way that is appropriate for training a deep learning model?

This is what we'll learn in this chapter.

In every section, we will stop where a deep learning researcher would start: right before feeding the data to a model.
    
We encourage you to keep these datasets; they will constitute excellent material for when we start learning how to train neural network models in the next chapter.

## 4.1 Working with images

An image is represented as a collection of scalars arranged in a regular grid with a height and a width (in pixels).

We might have a single scalar per grid point (the pixel), which would be represented as a grayscale image; or multiple scalars per grid point, which would typically represent different colors, as we saw in the previous chapter, or different features like depth from a depth camera.

### 4.1.1 Adding color channels

There are several ways to encode colors into numbers.

The most common is RGB.

### 4.1.2 Loading an image file

Let's start by loading a JPEG image using the imageio module.

In [1]:
import imageio

In [2]:
imageio.__version__

'2.9.0'

In [3]:
img_arr = imageio.imread('bobby.jpg')

In [4]:
img_arr.shape

(720, 1280, 3)

(Note: For many purposes, using TorchVision is a great default choice to deal with image and video data. We go with imageio here for somewhat lighter exploration.)

At this point, img_arr is a NumPy array-like object with three dimensions: two spatial dimensions, height and width; and a third dimension corresponding to the red, green, and blue channels.

Any library that outputs a NumPy array will suffice to obtain a PyTorch tensor.

The only thing to watch out for is the layout of the dimensions.

PyTorch modules dealing with image data require tensors to be laid out as C x H x W: channels, height, and width, respectively.
    
### 4.1.3 Changing the layout

We can use the tensor's permute method, passing the old dimension index for each new dimension, to get to an appropriate layout.

Given an input tensor H x W x C as obtained previously, we get a proper layout by putting dimension 2 first and then dimensions 0 and 1.

In [5]:
import torch

img = torch.from_numpy(img_arr)

In [6]:
out = img.permute(2, 0, 1)
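A quick way to convince ourselves that permute only rearranges dimensions is to try it on a synthetic array. The sketch below uses a random 720 x 1280 x 3 uint8 array as a stand-in for img_arr (so bobby.jpg is not needed) and checks the new shape, the strides, and that no pixel data was copied.

```python
import numpy as np
import torch

# Stand-in for the array returned by imageio.imread: a random H x W x C uint8 image
# with the same shape as the one printed above.
np.random.seed(0)
img_arr = np.random.randint(0, 256, size=(720, 1280, 3), dtype=np.uint8)

img = torch.from_numpy(img_arr)   # shares memory with img_arr
out = img.permute(2, 0, 1)        # new dim 0 <- old dim 2 (C), then old dims 0 (H) and 1 (W)

print(img.shape, out.shape)       # torch.Size([720, 1280, 3]) torch.Size([3, 720, 1280])

# permute returns a view: the underlying storage is the same, only the strides change.
print(out.data_ptr() == img.data_ptr())  # True
print(img.stride(), out.stride())        # (3840, 3, 1) (1, 3840, 3)
```

Because out is a view, writing into it would also change img and img_arr; call .clone() if an independent copy is needed.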

As a slightly more efficient alternative to using 'stack' to build up the batch tensor, we can pre-allocate a tensor of appropriate size and fill it with images loaded from a directory.

In [8]:
batch_size = 3
batch = torch.zeros(batch_size, 3, 256, 256, dtype=torch.uint8)

This indicates that our batch will consist of three RGB images 256 pixels in height and 256 pixels in width.
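The loop that fills the batch is not shown here. A minimal sketch, assuming the images have already been read into 256 x 256 x C uint8 NumPy arrays (for example with imageio.imread over the files in a directory); fill_batch is a hypothetical helper name, and the random arrays stand in for real images. Slicing [:3] keeps the RGB channels and drops an alpha channel, which PNG files sometimes carry.

```python
import numpy as np
import torch

def fill_batch(arrays, height=256, width=256):
    """Copy H x W x C uint8 arrays into a preallocated N x 3 x H x W batch (hypothetical helper)."""
    batch = torch.zeros(len(arrays), 3, height, width, dtype=torch.uint8)
    for i, arr in enumerate(arrays):
        img_t = torch.from_numpy(arr).permute(2, 0, 1)  # H x W x C -> C x H x W
        batch[i] = img_t[:3]                            # keep RGB, drop alpha if present
    return batch

# Three fake 256 x 256 images: two RGBA (4 channels) and one RGB.
np.random.seed(0)
images = [np.random.randint(0, 256, size=(256, 256, c), dtype=np.uint8) for c in (4, 3, 4)]
batch = fill_batch(images)
print(batch.shape, batch.dtype)  # torch.Size([3, 3, 256, 256]) torch.uint8
```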

### 4.1.4 Normalizing the data

Neural networks exhibit the best training performance when the input data ranges roughly from 0 to 1, or from -1 to 1 (this is an effect of how their building blocks are defined).

So a typical thing we'll want to do is cast a tensor to floating-point and normalize the values of the pixels.

Casting to floating-point is easy, but normalization is trickier.

One possibility is to just divide the values of the pixels by 255 (the maximum representable number in an 8-bit unsigned integer):

In [15]:
batch = batch.float()
batch /= 255.0

Another possibility is to compute the mean and standard deviation of the input data and scale it so that the output has zero mean and unit standard deviation across each channel:

In [11]:
n_channels = batch.shape[1]

for c in range(n_channels):
    mean = torch.mean(batch[:, c])
    std = torch.std(batch[:, c])
    batch[:, c] = (batch[:, c] - mean) / std
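The per-channel loop can also be written as a single broadcasted expression by reducing over every dimension except the channel one. A sketch on random data standing in for the float batch; like torch.std in the loop, Tensor.std applies Bessel's correction by default, so the two versions should agree up to floating-point rounding.

```python
import torch

torch.manual_seed(0)
# Stand-in for a float N x C x H x W batch with pixel values in [0, 1].
batch = torch.rand(3, 3, 256, 256)

# One statistic per channel: reduce over batch (0), height (2), and width (3).
mean = batch.mean(dim=(0, 2, 3), keepdim=True)  # shape 1 x C x 1 x 1
std = batch.std(dim=(0, 2, 3), keepdim=True)    # shape 1 x C x 1 x 1

# Broadcasting applies each channel's mean and std to every pixel of that channel.
batch_norm = (batch - mean) / std
```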

## 4.2 3D images: Volumetric data

### 4.2.1 Loading a specialized format

In [19]:
import imageio

dir_path = 'data/p1ch4/volumetric-dicom/2-LUNG 3.0  B70f-04083'
vol_arr = imageio.volread(dir_path, 'DICOM')
vol_arr.shape

Reading DICOM (examining files): 99/99 files (100.0%)
  Found 1 correct series.
Reading DICOM (loading data): 99/99  (100.0%)

(99, 512, 512)

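The array has no channel dimension, so we use unsqueeze to make room for one, giving a C x D x H x W layout (here 1 x 99 x 512 x 512):

In [20]: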
vol = torch.from_numpy(vol_arr).float()
vol = torch.unsqueeze(vol, 0)

vol.shape

torch.Size([1, 99, 512, 512])
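To work with several scans at once, each C x D x H x W volume gets a batch dimension in front, giving the N x C x D x H x W layout that PyTorch's 3D layers such as torch.nn.Conv3d expect. A sketch with small random volumes standing in for real 99 x 512 x 512 scans:

```python
import torch

# Small random stand-ins for CT volumes after torch.from_numpy(...).float():
# each is D x H x W.
volumes = [torch.rand(8, 64, 64) for _ in range(3)]

# unsqueeze(0) adds the channel dimension (C = 1) to each volume;
# torch.stack then adds the batch dimension in front.
batch = torch.stack([v.unsqueeze(0) for v in volumes])
print(batch.shape)  # torch.Size([3, 1, 8, 64, 64])
```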

## 4.3 Representing tabular data

The simplest form of data we'll encounter on a machine learning job is sitting in a spreadsheet, CSV file, or database.

### 4.3.1 Using a real-world dataset

Our first job as deep learning practitioners is to encode heterogeneous, real-world data into a tensor of floating-point numbers, ready for consumption by a neural network.

Let's start with something fun: wine!

The file we'll load, winequality-white.csv, contains a semicolon-separated collection of values organized in 12 columns preceded by a header line containing the column names.

The first 11 columns contain values of chemical variables, and the last column contains the sensory quality score from 0 (very bad) to 10 (excellent).

Python offers several options for quickly loading a CSV file.

Three popular options are:

- csv module

- NumPy

- Pandas

The third option is the most time- and memory-efficient.

However, we'll avoid introducing an additional library in our learning trajectory just because we need to load a file, so we'll use NumPy.

In [25]:
import numpy as np
import csv
wine_path = 'data/p1ch4/tabular-wine/winequality-white.csv'
wineq_numpy = np.loadtxt(wine_path, dtype=np.float32, delimiter=';', skiprows=1)

wineq_numpy

array([[ 7.  ,  0.27,  0.36, ...,  0.45,  8.8 ,  6.  ],
       [ 6.3 ,  0.3 ,  0.34, ...,  0.49,  9.5 ,  6.  ],
       [ 8.1 ,  0.28,  0.4 , ...,  0.44, 10.1 ,  6.  ],
       ...,
       [ 6.5 ,  0.24,  0.19, ...,  0.46,  9.4 ,  6.  ],
       [ 5.5 ,  0.29,  0.3 , ...,  0.38, 12.8 ,  7.  ],
       [ 6.  ,  0.21,  0.38, ...,  0.32, 11.8 ,  6.  ]], dtype=float32)
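The file is read twice in this section: np.loadtxt skips the header with skiprows=1, and csv.reader picks the header up separately to get the column names. A self-contained sketch of the same pattern on a tiny in-memory sample, using io.StringIO in place of the file; the three columns and their values are made up for illustration.

```python
import csv
import io
import numpy as np

# Made-up excerpt in the same layout: quoted header line, ';' as the delimiter.
sample = io.StringIO(
    '"fixed acidity";"pH";"quality"\n'
    "7.0;3.00;6\n"
    "6.3;3.30;6\n"
)

arr = np.loadtxt(sample, dtype=np.float32, delimiter=";", skiprows=1)  # numbers only
sample.seek(0)                                                           # rewind for the header
header = next(csv.reader(sample, delimiter=";"))                         # quotes are stripped

print(arr.shape)  # (2, 3)
print(header)     # ['fixed acidity', 'pH', 'quality']
```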

In [26]:
col_list = next(csv.reader(open(wine_path), delimiter=';'))

wineq_numpy.shape, col_list

((4898, 12),
 ['fixed acidity',
  'volatile acidity',
  'citric acid',
  'residual sugar',
  'chlorides',
  'free sulfur dioxide',
  'total sulfur dioxide',
  'density',
  'pH',
  'sulphates',
  'alcohol',
  'quality'])

We then convert the NumPy array to a PyTorch tensor:

In [28]:
wineq = torch.from_numpy(wineq_numpy)

wineq.shape, wineq.dtype

(torch.Size([4898, 12]), torch.float32)

### 4.3.3 Representing scores

We will typically remove the score from the tensor of input data and keep it in a separate tensor, so that we can use the score as the ground truth without it being input to our model.

In [30]:
data = wineq[:, :-1] # selects all rows and all columns except the last
data, data.shape

(tensor([[ 7.0000,  0.2700,  0.3600,  ...,  3.0000,  0.4500,  8.8000],
         [ 6.3000,  0.3000,  0.3400,  ...,  3.3000,  0.4900,  9.5000],
         [ 8.1000,  0.2800,  0.4000,  ...,  3.2600,  0.4400, 10.1000],
         ...,
         [ 6.5000,  0.2400,  0.1900,  ...,  2.9900,  0.4600,  9.4000],
         [ 5.5000,  0.2900,  0.3000,  ...,  3.3400,  0.3800, 12.8000],
         [ 6.0000,  0.2100,  0.3800,  ...,  3.2600,  0.3200, 11.8000]]),
 torch.Size([4898, 11]))

In [33]:
target = wineq[:, -1]  # selects all rows and the last column
target, target.shape

(tensor([6., 6., 6.,  ..., 6., 7., 6.]), torch.Size([4898]))

If we want to transform the target tensor into a tensor of labels, we have two options.

One is simply to treat labels as an integer vector of scores:

In [35]:
target = wineq[:, -1].long()
target

tensor([6, 6, 6,  ..., 6, 7, 6])

### 4.3.4 One-hot encoding

The other approach is to build a one-hot encoding of the scores: that is, encode each score in a vector of 10 elements, with all elements set to 0 but one, at a different index for each score (10 elements suffice here because the highest score present in this dataset is 9).

We can achieve one-hot encoding using the 'scatter_' method, which fills the tensor with values from a source tensor along the indices provided as arguments.

In [37]:
target_onehot = torch.zeros(target.shape[0], 10)

target_onehot.scatter_(1, target.unsqueeze(1), 1.0)
# unsqueeze(1) inserts a new dimension of size 1, turning target into a 4,898 x 1 column

tensor([[0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.],
        ...,
        [0., 0., 0.,  ..., 0., 0., 0.],
        [0., 0., 0.,  ..., 1., 0., 0.],
        [0., 0., 0.,  ..., 0., 0., 0.]])

The arguments for scatter_ are as follows:

- The dimension along which the following two arguments are specified.

- A column tensor indicating the indices of the elements to scatter

- A tensor containing the elements to scatter or a single scalar to scatter (1, in this case)

In other words, the previous invocation reads, "For each row, take the index of the target label (which coincides with the score in our case) and use it as the column index to set the value 1.0."

The second argument of scatter_, the index tensor, is required to have the same number of dimensions as the tensor we scatter into.

Since target_onehot has two dimensions (4,898 x 10), we need to add an extra dummy dimension to target using unsqueeze:

In [38]:
target_unsqueezed = target.unsqueeze(1)
target_unsqueezed

tensor([[6],
        [6],
        [6],
        ...,
        [6],
        [7],
        [6]])
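PyTorch also provides torch.nn.functional.one_hot, which builds the same encoding directly from an integer tensor. A sketch comparing it with scatter_ on a few made-up scores; one_hot returns an int64 tensor, so we cast it to float to match.

```python
import torch
import torch.nn.functional as F

# A few made-up integer scores standing in for the 4,898-element target tensor.
target = torch.tensor([6, 6, 7, 3, 9])

# One-hot via scatter_, as above.
onehot_scatter = torch.zeros(target.shape[0], 10)
onehot_scatter.scatter_(1, target.unsqueeze(1), 1.0)

# Same encoding via torch.nn.functional.one_hot (int64 result, cast to float).
onehot_fn = F.one_hot(target, num_classes=10).float()

print(torch.equal(onehot_scatter, onehot_fn))  # True
print(onehot_fn.argmax(dim=1))                 # recovers the original scores
```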

### 4.3.5 When to categorize

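Let's go back to our 'data' tensor, containing 11 variables associated with the chemical analysis.

To put the columns on a common scale, we compute the mean and variance of each column and then standardize the data: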

In [39]:
data_mean = torch.mean(data, dim=0)
data_mean

tensor([6.8548e+00, 2.7824e-01, 3.3419e-01, 6.3914e+00, 4.5772e-02, 3.5308e+01,
        1.3836e+02, 9.9403e-01, 3.1883e+00, 4.8985e-01, 1.0514e+01])

In [40]:
data_var = torch.var(data, dim=0)
data_var

tensor([7.1211e-01, 1.0160e-02, 1.4646e-02, 2.5726e+01, 4.7733e-04, 2.8924e+02,
        1.8061e+03, 8.9455e-06, 2.2801e-02, 1.3025e-02, 1.5144e+00])

In [41]:
data_normalized = (data - data_mean) / torch.sqrt(data_var)
data_normalized

tensor([[ 1.7208e-01, -8.1761e-02,  2.1326e-01,  ..., -1.2468e+00,
         -3.4915e-01, -1.3930e+00],
        [-6.5743e-01,  2.1587e-01,  4.7996e-02,  ...,  7.3995e-01,
          1.3422e-03, -8.2419e-01],
        [ 1.4756e+00,  1.7450e-02,  5.4378e-01,  ...,  4.7505e-01,
         -4.3677e-01, -3.3663e-01],
        ...,
        [-4.2043e-01, -3.7940e-01, -1.1915e+00,  ..., -1.3130e+00,
         -2.6153e-01, -9.0545e-01],
        [-1.6054e+00,  1.1666e-01, -2.8253e-01,  ...,  1.0049e+00,
         -9.6251e-01,  1.8574e+00],
        [-1.0129e+00, -6.7703e-01,  3.7852e-01,  ...,  4.7505e-01,
         -1.4882e+00,  1.0448e+00]])

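### 4.3.6 Finding thresholds

Next, let's look for an easy way to tell good and bad wines apart at a glance.

First, we determine which rows in target correspond to a score less than or equal to 3:

In [42]:
bad_indexes = target <= 3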
bad_indexes.shape, bad_indexes.dtype, bad_indexes.sum()

(torch.Size([4898]), torch.bool, tensor(20))

Note that only 20 of the bad_indexes entries are set to True!

By using a feature in PyTorch called advanced indexing, we can use a tensor with data type torch.bool to index the data tensor.

In [43]:
bad_data = data[bad_indexes]
bad_data.shape

torch.Size([20, 11])
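Boolean-mask indexing keeps exactly the rows whose mask entry is True, and, unlike basic slicing, it returns a copy rather than a view. A toy sketch with made-up values:

```python
import torch

# Tiny made-up stand-ins: 5 wines with 2 features each, plus their scores.
data = torch.tensor([[7.0, 170.0],
                     [6.3, 120.0],
                     [8.1, 180.0],
                     [6.5, 110.0],
                     [5.5, 130.0]])
target = torch.tensor([3, 6, 3, 7, 5])

mask = target <= 3         # True for rows 0 and 2
bad_data = data[mask]      # keeps only rows 0 and 2, as a new 2 x 2 tensor
print(bad_data)
print(mask.sum().item())   # 2: summing a bool tensor counts its True entries
```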

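Now we can split the wines into bad, middling, and good categories and take the mean of each column for each category:

In [47]: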
bad_data = data[target <= 3]
mid_data = data[(target > 3) & (target < 7)]
good_data = data[target >= 7]

bad_mean = torch.mean(bad_data, dim=0)
mid_mean = torch.mean(mid_data, dim=0)
good_mean = torch.mean(good_data, dim=0)

In [55]:
for i, args in enumerate(zip(col_list, bad_mean, mid_mean, good_mean)):
    print('{:2} {:20} {:6.2f} {:6.2f} {:6.2f}'.format(i, *args))

 0 fixed acidity          7.60   6.89   6.73
 1 volatile acidity       0.33   0.28   0.27
 2 citric acid            0.34   0.34   0.33
 3 residual sugar         6.39   6.71   5.26
 4 chlorides              0.05   0.05   0.04
 5 free sulfur dioxide   53.33  35.42  34.55
 6 total sulfur dioxide 170.60 141.83 125.25
 7 density                0.99   0.99   0.99
 8 pH                     3.19   3.18   3.22
 9 sulphates              0.47   0.49   0.50
10 alcohol               10.34  10.26  11.42


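At a glance, the bad wines seem to have higher total sulfur dioxide, among other differences, so we could use a threshold on this column as a crude criterion for telling good wines from bad ones.

Let's get the indexes where the total sulfur dioxide column is below the middle-category mean we calculated earlier (141.83):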

In [57]:
total_sulfur_threshold = 141.83
total_sulfur_data = data[:, 6]
predicted_indexes = torch.lt(total_sulfur_data, total_sulfur_threshold)

# torch.lt(input, other, *, out=None)
# input: the tensor to compare; other: the tensor or value to compare against
# out: optional output tensor
# Returns a boolean tensor that is True where input is less than other and False elsewhere

predicted_indexes.shape, predicted_indexes.dtype, predicted_indexes.sum()

(torch.Size([4898]), torch.bool, tensor(2727))

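This means our threshold implies that just over half of all the wines (2,727 of 4,898) are going to be high quality.

Next, we need the indexes of the wines that actually are good, which we take to be those with a score above 5:

In [58]: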
actual_indexes = target > 5

actual_indexes.shape, actual_indexes.dtype, actual_indexes.sum()

(torch.Size([4898]), torch.bool, tensor(3258))

Since there are about 500 more actually good wines than our threshold predicted, we already have hard evidence that it's not perfect.

Now we need to see how well our predictions line up with the actual rankings.

In [60]:
n_matches = torch.sum(actual_indexes & predicted_indexes).item()
n_predicted = torch.sum(predicted_indexes).item()
n_actual = torch.sum(actual_indexes).item()

n_matches, n_matches / n_predicted, n_matches / n_actual

(2018, 0.74000733406674, 0.6193984039287906)

We got around 2,000 wines right!

Since we predicted 2,700 wines, this gives us a 74% chance that if we predict a wine to be high quality, it actually is.

Unfortunately, there are 3,200 good wines, and we only identified about 62% of them.
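The two ratios computed above are commonly called precision (n_matches / n_predicted) and recall (n_matches / n_actual). A sketch wrapping the same computation in a small function, precision_recall (a hypothetical name), and checking it on a made-up six-wine example:

```python
import torch

def precision_recall(predicted, actual):
    """Precision and recall for two boolean tensors of the same shape."""
    n_matches = torch.sum(predicted & actual).item()  # predicted good and actually good
    n_predicted = torch.sum(predicted).item()         # everything predicted good
    n_actual = torch.sum(actual).item()               # everything actually good
    return n_matches / n_predicted, n_matches / n_actual

# Made-up toy example: 6 wines.
predicted = torch.tensor([True, True, True, False, False, True])
actual = torch.tensor([True, False, True, True, True, True])
print(precision_recall(predicted, actual))  # (0.75, 0.6)
```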

## 4.4 Working with time series

We'll switch to another interesting dataset: data from the Washington, D.C., bike-sharing system reporting the hourly count of rental bikes in 2011-2012 in the Capital Bikeshare system, along with weather and seasonal information.

Our goal will be to take a flat, 2D dataset and transform it into a 3D one.

### 4.4.1 Adding a time dimension

In the source data, each row is a separate hour of data.

We want to change the row-per-hour organization so that we have one axis that increases at a rate of one day per index increment, and another axis that represents the hour of the day.

The third axis will be our different columns of data (weather, temperature, and so on).

In [61]:
bikes_numpy = np.loadtxt('data/p1ch4/bike-sharing-dataset/hour-fixed.csv',
                         dtype=np.float32,
                         delimiter=',',
                         skiprows=1,
                         # column 1 holds a date such as '2011-01-01'; keep only the day of month
                         converters={1: lambda x: float(x[8:10])})

bikes = torch.from_numpy(bikes_numpy)
bikes

tensor([[1.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 3.0000e+00, 1.3000e+01,
         1.6000e+01],
        [2.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 8.0000e+00, 3.2000e+01,
         4.0000e+01],
        [3.0000e+00, 1.0000e+00, 1.0000e+00,  ..., 5.0000e+00, 2.7000e+01,
         3.2000e+01],
        ...,
        [1.7377e+04, 3.1000e+01, 1.0000e+00,  ..., 7.0000e+00, 8.3000e+01,
         9.0000e+01],
        [1.7378e+04, 3.1000e+01, 1.0000e+00,  ..., 1.3000e+01, 4.8000e+01,
         6.1000e+01],
        [1.7379e+04, 3.1000e+01, 1.0000e+00,  ..., 1.2000e+01, 3.7000e+01,
         4.9000e+01]])

In [62]:
bikes.shape

torch.Size([17520, 17])
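The converters argument in the loading call turns the date column into a number: each date string such as '2011-01-01' is sliced to characters 8 to 10, its day of the month. A sketch on two made-up rows with only four of the columns, using io.StringIO in place of hour-fixed.csv:

```python
import io
import numpy as np

# Two made-up rows in the same style: instant, dteday, hr, cnt.
sample = io.StringIO(
    "instant,dteday,hr,cnt\n"
    "1,2011-01-01,0,16\n"
    "25,2011-01-02,0,17\n"
)
arr = np.loadtxt(sample, dtype=np.float32, delimiter=",", skiprows=1,
                 converters={1: lambda x: float(x[8:10])})  # '2011-01-02'[8:10] -> '02'
print(arr[:, 1])  # day of month for each row
```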

For every hour, the dataset reports the following variables:

- Index of record: instant
- Day of month: day
- Season: season (1: spring, 2: summer, 3: fall, 4: winter)

...

- Count of rental bikes: cnt

In a time series dataset such as this one, rows represent successive time-points: there is a dimension along which they are ordered.

### 4.4.2 Shaping the data by time period

We might want to break up the two-year dataset into wider observation periods, like days.

This way we'll have N (for number of samples) collections of C sequences of length L.

In other words, our time series dataset would be a tensor of dimension 3 and shape N x C x L.

The C would remain our 17 channels, while L would be 24: 1 per hour of the day.

Let's go back to our bike-sharing dataset.

The first column is the index (the global ordering of the data), the second is the date, and the sixth is the time of the day.

All we have to do to obtain our daily hours dataset is view the same tensor in batches of 24 hours.

In [63]:
bikes.shape, bikes.stride()

(torch.Size([17520, 17]), (17, 1))

That's 17,520 hours, 17 columns.

Now let's reshape the data to have three axes: day, hour, and then our 17 columns.

In [64]:
daily_bikes = bikes.view(-1, 24, bikes.shape[1])
daily_bikes.shape, daily_bikes.stride()

(torch.Size([730, 24, 17]), (408, 17, 1))

What happened here?

First, bikes.shape[1] is 17, the number of columns in the bikes tensor.

But the real crux of this code is the call to view, which is really important: it changes the way the tensor looks at the same data as contained in storage.

For daily_bikes, the stride is telling us that advancing by 1 along the hour dimension (the second dimension) requires us to advance by 17 places in the storage (or one set of columns), whereas advancing along the day dimension (the first dimension) requires us to advance by a number of elements equal to the length of a row in the storage times 24 (17 x 24 = 408).

We see that the rightmost dimension is the number of columns in the original dataset.

Then, in the middle dimension, we have time, split into chunks of 24 sequential hours.

In other words, we now have N sequences of L hours in a day, for C channels.

To get to our desired N x C x L ordering, we need to transpose the tensor:

In [65]:
daily_bikes = daily_bikes.transpose(1, 2)
daily_bikes.shape, daily_bikes.stride()

(torch.Size([730, 17, 24]), (408, 1, 17))
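The same view-then-transpose steps can be checked on a small synthetic tensor whose values are just their own storage positions. A sketch with three fake days; it also shows that the transposed tensor is no longer contiguous, so a later view on it needs .contiguous() first (reshape would copy as needed instead).

```python
import torch

# Stand-in for 'bikes': 3 days x 24 hours of fake hourly rows with 17 columns each.
n_days, n_hours, n_cols = 3, 24, 17
bikes = torch.arange(n_days * n_hours * n_cols, dtype=torch.float32).view(-1, n_cols)

daily = bikes.view(-1, n_hours, n_cols)  # N x L x C, no data copied
daily_ncl = daily.transpose(1, 2)        # N x C x L, still no data copied

print(daily_ncl.shape, daily_ncl.stride())  # torch.Size([3, 17, 24]) (408, 1, 17)
print(daily_ncl.is_contiguous())            # False: transpose only swapped the strides

# Row 24 of 'bikes' is hour 0 of day 1.
print(torch.equal(daily_ncl[1, :, 0], bikes[24]))  # True

# view() needs compatible strides, so make the data contiguous first.
flat = daily_ncl.contiguous().view(n_days, -1)
```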

### 4.4.3 Ready for training

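The 'weather situation' variable is ordinal.

It has four levels, from 1 for good weather to 4 for really bad.

We could treat this variable as categorical, with levels interpreted as labels, or as a continuous variable.

If we treat it as categorical, we turn it into a one-hot-encoded vector and concatenate the resulting columns with the dataset; to keep things small, let's start with the first day (the first 24 rows).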

In [66]:
first_day = bikes[:24].long()
weather_onehot = torch.zeros(first_day.shape[0], 4)
first_day[:,9]

tensor([1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 2, 2, 2, 2])

In [68]:
weather_onehot.scatter_(dim=1,
                        # subtract 1 because the weather situation ranges from 1 to 4,
                        # while indices are 0-based
                        index=first_day[:, 9].unsqueeze(1).long() - 1,
                        value=1.0)

tensor([[1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [1., 0., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 0., 1., 0.],
        [0., 0., 1., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.],
        [0., 1., 0., 0.]])

Last, we concatenate our one-hot matrix to the first 24 rows of our original dataset using the cat function.

Let's look at the first row of the result.

In [69]:
torch.cat((bikes[:24], weather_onehot), 1)[:1]

tensor([[ 1.0000,  1.0000,  1.0000,  0.0000,  1.0000,  0.0000,  0.0000,  6.0000,
          0.0000,  1.0000,  0.2400,  0.2879,  0.8100,  0.0000,  3.0000, 13.0000,
         16.0000,  1.0000,  0.0000,  0.0000,  0.0000]])

We could have done the same with the reshaped daily_bikes tensor.

Remember that it is shaped N x C x L, where L = 24.

We first create the zero tensor, with the same N and L, but with the number of additional columns (4) as C.

In [70]:
daily_weather_onehot = torch.zeros(daily_bikes.shape[0], 4,
                                  daily_bikes.shape[2])

daily_weather_onehot.shape

torch.Size([730, 4, 24])

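In [71]:
daily_weather_onehot.scatter_(1, daily_bikes[:, 9, :].long().unsqueeze(1) - 1, 1.0)  # weather 1-4 -> indices 0-3
daily_bikes = torch.cat((daily_bikes, daily_weather_onehot), dim=1)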
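As an alternative sketch, torch.nn.functional.one_hot can produce the weather columns for every day at once. Its output puts the classes last (N x L x 4), so we permute it to N x 4 x L before concatenating along the channel dimension. The random daily_bikes below is a stand-in with the same channel layout, holding weather situation values 1 to 4 in channel 9.

```python
import torch
import torch.nn.functional as F

# Stand-in for daily_bikes (N x C x L): 2 days, 17 channels, 24 hours.
torch.manual_seed(0)
daily_bikes = torch.rand(2, 17, 24)
daily_bikes[:, 9, :] = torch.randint(1, 5, (2, 24)).float()  # weather situation 1..4

weather = daily_bikes[:, 9, :].long() - 1          # N x L, shifted to 0-based classes
onehot = F.one_hot(weather, num_classes=4)         # N x L x 4 (int64)
onehot = onehot.permute(0, 2, 1).float()           # N x 4 x L, matching N x C x L

daily_bikes = torch.cat((daily_bikes, onehot), dim=1)
print(daily_bikes.shape)  # torch.Size([2, 21, 24])
```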

Other kinds of data look like a time series, in that there is a strict ordering.

Top two on the list? Text and audio.

We'll take a look at text next.

## 4.5 Representing text

Deep learning has taken the field of natural language processing (NLP) by storm, particularly using models that repeatedly consume a combination of new input and previous model output.

These models are called recurrent neural networks (RNNs), and they have been applied with great success to text categorization, text generation, and automated translation systems.

### 4.5.1 Converting text to numbers

There are two particularly intuitive levels at which networks operate on text: at the character level, by processing one character at a time, and at the word level, where individual words are the finest-grained entities to be seen by the network.

Let's start with a character-level example.

First, let's get some text to process.

An amazing resource here is Project Gutenberg.

Let's load Jane Austen's Pride and Prejudice.

In [72]:
with open('data/p1ch4/jane-austen/1342-0.txt', encoding='utf8') as f:
    text = f.read()

### 4.5.2 One-hot-encoding characters

We are going to one-hot encode our characters.

It is instrumental to limit the one-hot encoding to a character set that is useful for the text being analyzed.

In our case, since we loaded text in English, it is safe to use ASCII and deal with a small encoding.

We could also make all of the characters lowercase, to reduce the number of different characters in our encoding.

Similarly, we could screen out punctuation, numbers, or other characters that aren't relevant to our expected kinds of text.

We first split our text into a list of lines and pick an arbitrary line to focus on.

In [75]:
lines = text.split('\n')
line = lines[200]
line

'“Impossible, Mr. Bennet, impossible, when I am not acquainted with him'

Let's create a tensor that can hold the total number of one-hot-encoded characters for the whole line.

In [77]:
letter_t = torch.zeros(len(line), 128) # 128 hardcoded due to the limits of ASCII
letter_t.shape

torch.Size([70, 128])

In [78]:
for i, letter in enumerate(line.lower().strip()):
    letter_index = ord(letter) if ord(letter) < 128 else 0
    # The text uses directional double quotes, which are not valid ASCII,
    # so we screen them out here.
    letter_t[i][letter_index] = 1
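The same steps can be wrapped in a small function; one_hot_chars is a hypothetical name. Sizing the tensor from the stripped, lowercased text (rather than from len(line)) avoids trailing all-zero rows whenever strip removes whitespace. A sketch:

```python
import torch

def one_hot_chars(line, n_chars=128):
    """One-hot encode a string, one row per character (hypothetical helper).

    Characters outside ASCII, such as curly quotes, are mapped to index 0,
    mirroring the loop above.
    """
    text = line.lower().strip()
    t = torch.zeros(len(text), n_chars)
    for i, letter in enumerate(text):
        index = ord(letter) if ord(letter) < n_chars else 0
        t[i][index] = 1
    return t

encoded = one_hot_chars('“Impossible, Mr. Bennet')
print(encoded.shape)                       # torch.Size([23, 128])
print(encoded.argmax(dim=1)[:3].tolist())  # [0, 105, 109]: '“' -> 0, then 'i' and 'm'
```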

### 4.5.3 One-hot encoding whole words

We'll define 'clean_words', which takes text and returns it in lowercase and stripped of punctuation.

When we call it on our 'Impossible, Mr. Bennet' line, we get the following:

In [88]:
def clean_words(input_str):
    punctuation = '.,;:"!?_-'  # problem here: more characters should be included (the curly quotes “ ” are missing)
    word_list = input_str.lower().replace('\n',' ').split()
    word_list = [word.strip(punctuation) for word in word_list]
    return word_list

words_in_line = clean_words(line)
line, words_in_line

('“Impossible, Mr. Bennet, impossible, when I am not acquainted with him',
 ['“impossible',
  'mr',
  'bennet',
  'impossible',
  'when',
  'i',
  'am',
  'not',
  'acquainted',
  'with',
  'him'])

Next, let's build a mapping of words to indexes in our encoding:

In [85]:
word_list = sorted(set(clean_words(text)))
word2index_dict = {word: i for (i, word) in enumerate(word_list)}

len(word2index_dict), word2index_dict['impossible']

(8484, 3828)

Note that word2index_dict is now a dictionary with words as keys and integers as values.

We will use it to efficiently find the index of a word as we one-hot encode it.

In [90]:
word_t = torch.zeros(len(words_in_line), len(word2index_dict))

for i, word in enumerate(words_in_line):
    word_index = word2index_dict[word]
    word_t[i][word_index] = 1
    print('{:2} {:4} {}'.format(i, word_index, word))
    
print(word_t.shape)

 0 8324 “impossible
 1 4905 mr
 2  891 bennet
 3 3828 impossible
 4 8017 when
 5 3740 i
 6  445 am
 7 5054 not
 8  247 acquainted
 9 8094 with
10 3619 him
torch.Size([11, 8484])
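The output above lists '“impossible' and 'impossible' as two different words (indexes 8324 and 3828), because the punctuation string in clean_words does not contain the curly quotes used in this text. A sketch of a variant that also strips them; which extra characters to include is an assumption that may need adjusting for other texts.

```python
def clean_words(input_str):
    # Also strips the curly quotes used in the text, so '“impossible' becomes 'impossible'.
    punctuation = '.,;:"!?”“_-'
    word_list = input_str.lower().replace('\n', ' ').split()
    return [word.strip(punctuation) for word in word_list]

line = '“Impossible, Mr. Bennet, impossible, when I am not acquainted with him'
words = clean_words(line)
word2index = {w: i for i, w in enumerate(sorted(set(words)))}

print(words[:3])        # ['impossible', 'mr', 'bennet']
print(len(word2index))  # 10 distinct words: both spellings of 'impossible' now match
```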


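The choice between character-level and word-level encoding leaves us with a trade-off: character-level encoding needs only a small vocabulary (128 columns here) but produces one row per character, while word-level encoding produces fewer rows per line but needs a much larger vocabulary (8,484 columns here).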

### 4.5.4 Text embeddings

### 4.5.5 Text embeddings as a blueprint

## 4.6 Conclusion

We learned to load the most common types of data and shape them for consumption by a neural network.

Now that we're familiar with tensors and how to store data in them, we can move on to the next step toward the goal of the book: teaching you to train deep neural networks!