# 3 It starts with a tensor
This chapter covers
* Understanding tensors, the basic data structure in PyTorch
* Indexing and operating on tensors
* Interoperating with NumPy multidimensional arrays
* Moving computations to the GPU for speed

The process begins by converting our input into floating-point numbers. We will cover converting image pixels to numbers, as we see in the first step of figure 3.1, in chapter 4 (along with many other types of data). But before we can get to that, in this chapter, we learn how to deal with all the floating-point numbers in PyTorch by using tensors.

## 3.1 The world as floating-point numbers
Since floating-point numbers are the way a network deals with information, we need a way to encode real-world data of the kind we want to process into something digestible by a network and then decode the output back to something we can understand and use for our purpose.

![](images/3.1.jpg)

PyTorch introduces a fundamental data structure: **the tensor**. We already bumped into tensors in chapter 2, when we ran inference on pretrained networks. For those who come from mathematics, physics, or engineering, the term tensor comes bundled with the notion of spaces, reference systems, and transformations between them. For better or worse, those notions do not apply here. In the context of deep learning, tensors refer to the generalization of vectors and matrices to an arbitrary number of dimensions, as we can see in figure 3.2. Another name for the same concept is **multidimensional array**. The dimensionality of a tensor coincides with the number of indexes used to refer to scalar values within the tensor.

![](images/3.3.png)
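To make the link between dimensionality and the number of indexes concrete, here is a minimal sketch. The variable names (`scalar`, `vector`, `matrix`, `volume`) are ours and purely illustrative; the only assumption is that `torch` is installed.

```python
import torch

# Illustrative tensors with 0, 1, 2, and 3 dimensions
scalar = torch.tensor(3.0)         # no index needed
vector = torch.tensor([1.0, 2.0])  # one index selects a scalar
matrix = torch.zeros(2, 3)         # two indexes select a scalar
volume = torch.zeros(4, 2, 3)      # three indexes select a scalar

# ndim reports the number of dimensions of each tensor
dims = [t.ndim for t in (scalar, vector, matrix, volume)]
print(dims)

# Indexing a 3D tensor with three indexes leaves a zero-dimensional tensor
element = volume[1, 0, 2]
print(element.ndim)
```

Each additional dimension requires one more index before we reach a single scalar value.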

Compared to NumPy arrays, PyTorch tensors have a few superpowers, such as the ability to perform very fast operations on graphical processing units (GPUs), distribute operations on multiple devices or machines, and keep track of the graph of computations that created them. These are all important features when implementing a modern deep learning library.

## 3.2 Tensors: Multidimensional arrays
We have already learned that tensors are the fundamental data structure in PyTorch. A tensor is an array: that is, a data structure that stores a collection of numbers that are accessible individually using an index, and that can be indexed with multiple indices.

### 3.2.1 From Python lists to PyTorch tensors
Let’s see `list` indexing in action so we can compare it to tensor indexing. Take a list of three numbers in Python:

In [1]:
a = [1.0, 2.0, 1.0]

In [2]:
a[0]

1.0

In [3]:
a[2] = 3.0

In [4]:
a[2]

3.0

It is not unusual for simple Python programs dealing with vectors of numbers, such as the coordinates of a 2D line, to use Python lists to store the vectors. As we will see in the following chapter, using the more efficient tensor data structure, many types of data—from images to time series, and even sentences—can be represented. 

### 3.2.2 Constructing our first tensors
Let’s construct our first PyTorch tensor and see what it looks like. It won’t be a particularly meaningful tensor for now, just three ones in a column:

In [5]:
import torch      # Imports the torch module
a = torch.ones(3) # Creates a one-dimensional tensor of size 3 filled with 1s
a

tensor([1., 1., 1.])

In [6]:
a[1]

tensor(1.)

In [7]:
float(a[1])

1.0

In [9]:
a[2] = 2.0
a

tensor([1., 1., 2.])

After importing the `torch` module, we call a function that creates a (one-dimensional) tensor of size 3 filled with the value `1.0`. We can access an element using its zero-based index or assign a new value to it. Although on the surface this example doesn’t differ much from a list of number objects, under the hood things are completely different.

### 3.2.3 The essence of tensors
Python lists or tuples of numbers are collections of Python objects that are individually allocated in memory, as shown on the left in figure 3.3. PyTorch tensors or NumPy arrays, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects. Each element is a 32-bit (4-byte) float in this case, as we can see on the right side of figure 3.3. This means storing a 1D tensor of 1,000,000 float numbers will require exactly 4,000,000 contiguous bytes, plus a small overhead for the metadata (such as dimensions and numeric type).

![](images/3.4.png)
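We can check the storage arithmetic above directly. The sketch below assumes PyTorch's default floating-point type has not been changed from `float32`; the variable names are ours.

```python
import torch

t = torch.zeros(1_000_000)            # default dtype is float32 (assumed unchanged)

bytes_per_element = t.element_size()  # size of one element in bytes
total_bytes = bytes_per_element * t.nelement()

print(t.dtype, bytes_per_element, total_bytes)
print(t.is_contiguous())              # a freshly created tensor is laid out contiguously
```

The count covers only the numeric payload; the small amount of metadata mentioned above (shape, dtype, and so on) lives in the tensor object itself.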

Say we have a list of coordinates we’d like to use to represent a geometrical object: perhaps a 2D triangle with vertices at coordinates *(4, 1)*, *(5, 3)*, and *(2, 1)*. The example is not particularly pertinent to deep learning, but it’s easy to follow. Instead of having coordinates as numbers in a Python list, as we did earlier, we can use a one-dimensional tensor by storing Xs in the even indices and Ys in the odd indices, like this:

In [10]:
points = torch.zeros(6)  # Using .zeros is just a way to get an appropriately sized array.
points[0] = 4.0          # we overwrite those zeros with the values we actually want
points[1] = 1.0
points[2] = 5.0
points[3] = 3.0
points[4] = 2.0
points[5] = 1.0

We can also pass a Python list to the constructor, to the same effect:

In [11]:
points = torch.tensor([4.0, 1.0, 5.0, 3.0, 2.0, 1.0])
points

tensor([4., 1., 5., 3., 2., 1.])

To get the coordinates of the first point, we do the following:

In [12]:
float(points[0]), float(points[1])

(4.0, 1.0)

This is OK, although it would be practical to have the first index refer to individual 2D points rather than point coordinates. For this, we can use a 2D tensor:

In [13]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

Here, we pass a list of lists to the constructor. We can ask the tensor about its shape:

In [14]:
points.shape

torch.Size([3, 2])

This informs us about the size of the tensor along each dimension. We could also use `zeros` or `ones` to initialize the tensor, providing the size along each dimension (as separate arguments, as here, or as a single tuple):

In [15]:
points = torch.zeros(3, 2)
points

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

Now we can access an individual element in the tensor using two indices:

In [16]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [17]:
points[0, 1]

tensor(1.)

This returns the Y-coordinate of the zeroth point in our dataset. We can also access the first element in the tensor as we did before to get the 2D coordinates of the first point:

In [18]:
points[0]

tensor([4., 1.])

## 3.3 Indexing tensors
What if we need to obtain a tensor containing all points but the first? That’s easy using range indexing notation, which also applies to standard Python lists. Here’s a reminder:

In [20]:
some_list = list(range(6))
some_list[:]      # All elements in the list
some_list[1:4]    # From element 1 inclusive to element 4 exclusive
some_list[1:]     # From element 1 inclusive to the end of the list
some_list[:4]     # From the start of the list to element 4 exclusive
some_list[:-1]    # From the start of the list to one before the last element
some_list[1:4:2]  # From element 1 inclusive to element 4 exclusive, in steps of 2

[1, 3]

In [21]:
points[1:]        # All rows after the first; implicitly all columns
points[1:, :]     # All rows after the first; all columns
points[1:, 0]     # All rows after the first; first column
points[None]      # Adds a dimension of size 1, just like unsqueeze

tensor([[[4., 1.],
         [5., 3.],
         [2., 1.]]])
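Since the notebook cell above displays only the result of its last expression, it can help to look at the shape each indexing expression produces. This is a minimal sketch using the same `points` tensor; the names `rest`, `rest_x`, and `with_extra_dim` are ours.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

rest = points[1:]              # rows 1 and 2, all columns
rest_x = points[1:, 0]         # rows 1 and 2, first column only
with_extra_dim = points[None]  # new leading dimension of size 1

print(rest.shape)              # torch.Size([2, 2])
print(rest_x.shape)            # torch.Size([2])
print(with_extra_dim.shape)    # torch.Size([1, 3, 2])

# Indexing with None matches unsqueeze on the leading dimension
print(torch.equal(with_extra_dim, points.unsqueeze(0)))
```

Note that selecting a single column with an integer index drops that dimension, whereas a range keeps it.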

In addition to using ranges, PyTorch features a powerful form of indexing, called *advanced indexing*, which we will look at in the next chapter.

## 3.4 Named tensors
To make things concrete, imagine that we have a 3D tensor like `img_t` from section 2.1.4 (we will use dummy data for simplicity here), and we want to convert it to grayscale. We looked up typical weights for the colors to derive a single brightness value:

In [25]:
img_t = torch.randn(3, 5, 5) # shape [channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

In [26]:
batch_t = torch.randn(2, 3, 5, 5) # shape [batch, channels, rows, columns]

So sometimes the RGB channels are in dimension 0, and sometimes they are in dimension 1. But we can generalize by counting from the end: they are always in dimension -3, the third from the end. The lazy, unweighted mean can thus be written as follows:

In [31]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)
img_gray_naive.shape, batch_gray_naive.shape

(torch.Size([5, 5]), torch.Size([2, 5, 5]))
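As a quick sanity check on counting from the end, the sketch below confirms that dimension -3 is dimension 0 for the single image and dimension 1 for the batch. It uses the same dummy shapes; the names `same_img` and `same_batch` are ours.

```python
import torch

img_t = torch.randn(3, 5, 5)       # [channels, rows, columns]
batch_t = torch.randn(2, 3, 5, 5)  # [batch, channels, rows, columns]

# -3 resolves to a different positive index depending on the number of dimensions
same_img = torch.allclose(img_t.mean(-3), img_t.mean(0))
same_batch = torch.allclose(batch_t.mean(-3), batch_t.mean(1))

print(same_img, same_batch)
```

Writing `-3` lets the same line of code work whether or not a batch dimension is present.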

But now we have the weights, too. PyTorch will allow us to multiply things that are the same shape, as well as shapes where one operand is of size 1 in a given dimension. It also adds leading dimensions of size 1 automatically. This is a feature called *broadcasting*. `batch_t` of shape `(2, 3, 5, 5)` is multiplied by `unsqueezed_weights` of shape `(3, 1, 1)`, resulting in a tensor of shape `(2, 3, 5, 5)`, from which we can then sum the third dimension from the end (the three channels):

In [29]:
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze_(-1)
img_weights = (img_t * unsqueezed_weights)
batch_weights = (batch_t * unsqueezed_weights)
img_gray_weighted = img_weights.sum(-3)
batch_gray_weighted = batch_weights.sum(-3)
batch_weights.shape, batch_t.shape, unsqueezed_weights.shape

(torch.Size([2, 3, 5, 5]), torch.Size([2, 3, 5, 5]), torch.Size([3, 1, 1]))
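To see what broadcasting is doing for us, we can compare the result with an explicit loop over the channels. This is only an illustrative sketch: the loop and the names `w`, `broadcast_gray`, and `loop_gray` are ours, and the tolerance passed to `allclose` is an arbitrary choice to absorb floating-point rounding.

```python
import torch

batch_t = torch.randn(2, 3, 5, 5)  # [batch, channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

# Broadcasting version: weights reshaped to [3, 1, 1] line up with the channel dimension
w = weights.unsqueeze(-1).unsqueeze(-1)
broadcast_gray = (batch_t * w).sum(-3)

# Explicit version: accumulate one weighted channel at a time
loop_gray = torch.zeros(2, 5, 5)
for c in range(3):
    loop_gray += weights[c] * batch_t[:, c]

print(w.shape)
print(torch.allclose(broadcast_gray, loop_gray, atol=1e-5))
```

The broadcasting version expresses the same computation without the Python loop, and without having to write `batch_t[:, c]`, which hard-codes the position of the channel dimension.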

Because this gets messy quickly, and for the sake of efficiency, the PyTorch function `einsum` (adapted from NumPy) specifies an indexing mini-language that gives index names to dimensions for sums of such products. As is often the case in Python, broadcasting (a form of summarizing unnamed things) is done using three dots, `'...'`; but don’t worry too much about `einsum`, because we will not use it in the following:

In [32]:
img_gray_weighted_fancy = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_weighted_fancy = torch.einsum('...chw,c->...hw', batch_t, weights)
batch_gray_weighted_fancy.shape

torch.Size([2, 5, 5])
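For readers who want to convince themselves that the `einsum` expression computes the same weighted sum as the broadcasting version, here is a small check. The names `via_einsum` and `via_broadcast` are ours, and the `allclose` tolerance is an arbitrary allowance for rounding differences.

```python
import torch

batch_t = torch.randn(2, 3, 5, 5)  # [batch, channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

# 'c' appears in both inputs but not in the output, so it is summed over;
# '...' stands for any leading dimensions (here, the batch dimension)
via_einsum = torch.einsum('...chw,c->...hw', batch_t, weights)

# Same computation using broadcasting and an explicit sum over channels
via_broadcast = (batch_t * weights[:, None, None]).sum(-3)

print(via_einsum.shape)
print(torch.allclose(via_einsum, via_broadcast, atol=1e-5))
```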
