
This chapter covers
- Understanding tensors, the basic data structure in PyTorch
- Indexing and operating on tensors
- Interoperating with NumPy multidimensional arrays
- Moving computations to the GPU for speed

Deep learning really consists of building a system that can transform data from one representation to another.

In the context of deep learning, *tensors* refer to the generalization of vectors and matrices to an arbitrary number of dimensions. Another name for the same concept is *multidimensional array*. The dimensionality of a tensor coincides with the number of indexes used to refer to scalar values within the tensor.

PyTorch features seamless interoperability with NumPy, which brings with it first-class integration with the rest of the scientific libraries in Python, such as SciPy (www.scipy.org), Scikit-learn (https://scikit-learn.org), and Pandas (https://pandas.pydata.org).

Compared to NumPy arrays, PyTorch tensors have a few superpowers, such as the ability to perform very fast operations on graphics processing units (GPUs), distribute operations on multiple devices or machines, and keep track of the graph of computations that created them. These are all important features when implementing a modern deep learning library.

## From Python lists to PyTorch tensors

Let’s see list indexing in action so we can compare it to tensor indexing. Take a list of three numbers in Python ([code/p1ch3/1_tensors.ipynb](https://github.com/deep-learning-with-pytorch/dlwpt-code/blob/master/p1ch3/1_tensors.ipynb)):

In [38]:
a = [1.0, 2.0, 1.0]

We can access the first element of the list using the corresponding zero-based index:


In [39]:
a[0]

1.0

In [40]:
a[2] = 3.0
a

[1.0, 2.0, 3.0]

It is not unusual for simple Python programs dealing with vectors of numbers, such as the coordinates of a 2D line, to use Python lists to store the vectors. As we will see in the following chapter, the more efficient tensor data structure can represent many types of data, from images to time series and even sentences. By defining operations over tensors, some of which we’ll explore in this chapter, we can slice and manipulate data expressively and efficiently at the same time, even from a high-level (and not particularly fast) language such as Python.

## Constructing Our First Tensors

In [41]:
import torch #imports torch module

a = torch.ones(3) #Creates a 1-dimensional tensor with size 3, filled with 1s
a

tensor([1., 1., 1.])

In [42]:
a[1]

tensor(1.)

In [43]:
float(a[1])

1.0

In [44]:
a[2] = 2.0
a

tensor([1., 1., 2.])

After importing the `torch` module, we call a function that creates a (one-dimensional) tensor of size `3` filled with the value `1.0`. We can access an element using its zero-based index or assign a new value to it. Although on the surface this example doesn’t differ much from a list of number objects, under the hood things are completely different.

## The essence of tensors

Python `lists` or `tuples` of numbers are collections of Python objects that are individually allocated in memory. PyTorch `tensors` or `NumPy arrays`, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects. In the tensors we have created so far, each element is a 32-bit (4-byte) float. This means storing a 1D tensor of 1,000,000 float numbers will require exactly 4,000,000 contiguous bytes, plus a small overhead for the metadata (such as dimensions and numeric type).
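As a quick sanity check of that arithmetic, the sketch below asks a tensor for its per-element size with `element_size()` and multiplies it by the element count from `numel()`. It assumes the default floating-point dtype has not been changed, and the result counts only the raw data, not the metadata.

```python
import torch

# A 1D tensor of 1,000,000 elements with the default dtype (float32)
t = torch.zeros(1_000_000)

bytes_per_element = t.element_size()        # 4 bytes for float32
total_bytes = t.element_size() * t.numel()  # raw data only, metadata excluded

print(t.dtype, bytes_per_element, total_bytes)
```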


Say we have a list of coordinates we’d like to use to represent a geometrical object: perhaps a 2D triangle with vertices at coordinates `(4, 1)`, `(5, 3)`, and `(2, 1)`. The example is not particularly pertinent to deep learning, but it’s easy to follow. Instead of having coordinates as numbers in a Python list, as we did earlier, we can use a one-dimensional tensor by storing Xs in the even indices and Ys in the odd indices, like this:

In [45]:
points = torch.zeros(6)
points[0] = 4.0
points[1] = 1.0
points[2] = 5.0
points[3] = 3.0
points[4] = 2.0
points[5] = 1.0

points

tensor([4., 1., 5., 3., 2., 1.])

We can also pass a Python list to the constructor, to the same effect:

In [46]:
points = torch.tensor([4.0, 1.0, 5.0, 3.0, 2.0, 1.0])
points

tensor([4., 1., 5., 3., 2., 1.])

To get the coordinates of the first point, we do the following:

In [47]:
float(points[0]), float(points[1])

(4.0, 1.0)


This is OK, although it would be practical to have the first index refer to individual 2D points rather than point coordinates. For this, we can use a 2D tensor:

In [48]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

Here, we pass a list of lists to the constructor. We can ask the tensor about its shape:

In [49]:
points.shape


torch.Size([3, 2])

This tells us the size of the tensor along each dimension: 3 rows (one per point), each with 2 columns (the X and Y coordinates).


We could also use `zeros` or `ones` to initialize the tensor, providing the size of each dimension as arguments (a tuple such as `(3, 2)` works too):

In [50]:
points = torch.zeros(3, 2)
points

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

Now we can access an individual element in the tensor using two indices:


In [51]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [52]:
points[0, 1]

tensor(1.)

This returns the Y coordinate of the first (index 0) point in our dataset. We can also use a single index, as we did before, to get both coordinates of that point:

In [53]:
points[0]

tensor([4., 1.])

The output is another tensor that presents a different view of the same underlying data. The new tensor is a 1D tensor of size 2, referencing the values of the first row in the points tensor. Does this mean a new chunk of memory was allocated, values were copied into it, and the new memory was returned wrapped in a new tensor object? No, because that would be very inefficient, especially if we had millions of points.
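One way to convince ourselves that `points[0]` is a view rather than a copy is to write through it and look at the original tensor. The sketch below does that, and uses `clone()` to show the contrasting case where we explicitly ask for an independent copy.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

first_point = points[0]   # indexing returns a view that shares memory with points
first_point[0] = 10.0     # writing through the view...
print(points)             # ...changes the first row of the original: [10., 1.]

# clone() allocates new memory, so edits to the copy leave points untouched
second_point = points[1].clone()
second_point[0] = 99.0
print(points[1])          # still [5., 3.]
```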

## Indexing tensors

What if we need to obtain a tensor containing all points but the first? That’s easy using range indexing notation, which also applies to standard Python lists. Here’s a reminder:

In [54]:
some_list = list(range(6))
some_list

[0, 1, 2, 3, 4, 5]

In [55]:
some_list[:] #All elements of the list

[0, 1, 2, 3, 4, 5]

In [56]:
some_list[1:4] #From element 1 inclusive to element 4 exclusive

[1, 2, 3]

In [57]:
some_list[1:] #From element 1 inclusive to the end of the list

[1, 2, 3, 4, 5]

In [58]:
some_list[:4] #From the start of the list to element 4 exclusive

[0, 1, 2, 3]

In [59]:
some_list[:-1] #From the start of the list to one before the last element

[0, 1, 2, 3, 4]

In [60]:
some_list[1:4:2] #From element 1 inclusive to element 4 exclusive in steps of 2

[1, 3]

To get a tensor containing all points but the first, we can use the same notation for PyTorch tensors, with the added benefit that, just as in NumPy and other Python scientific libraries, we can use range indexing for each of the tensor’s dimensions:

In [61]:
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [62]:
points[1:] #All rows after the first; implicitly all columns

tensor([[5., 3.],
        [2., 1.]])

In [63]:
points[1:, :] #All rows after the first; explicitly all columns

tensor([[5., 3.],
        [2., 1.]])

In [64]:
points[1:, 0] #All rows after the first; first column

tensor([5., 2.])

In [65]:
points[None] #Adds a dimension of size 1, just like unsqueeze

tensor([[[4., 1.],
         [5., 3.],
         [2., 1.]]])
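A few more indexing patterns on the same `points` tensor, sketched below, show how closely tensor slicing follows list slicing, and confirm that indexing with `None` gives the same result as `unsqueeze(0)`.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

ys = points[:, 1]             # all rows, second column: the Y coordinates
every_other = points[::2]     # rows 0 and 2, using a step of 2 as with lists
last_point = points[-1]       # negative indices count from the end

with_none = points[None]              # shape [1, 3, 2]
with_unsqueeze = points.unsqueeze(0)  # the same new leading dimension

print(ys, every_other.shape, last_point)
print(with_none.shape, torch.equal(with_none, with_unsqueeze))
```

One difference from Python lists: tensors reject negative slice steps such as `points[::-1]` with an error; `torch.flip` is the usual way to reverse along a dimension.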

## Named tensors

The dimensions (or axes) of our tensors usually index something like pixel locations or color channels. This means when we want to index into a tensor, we need to remember the ordering of the dimensions and write our indexing accordingly. As data is transformed through multiple tensors, keeping track of which dimension contains what data can be error-prone.

To make things concrete, imagine that we have a 3D tensor like `img_t` from Chapter 2 (we will use dummy data for simplicity here), and we want to convert it to gray-scale. We looked up typical weights for the colors to derive a single brightness value:


In [70]:
img_t = torch.randn(3, 5, 5) # shape [channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])
img_t

tensor([[[ 2.0491, -1.7133,  0.3301, -0.4774,  1.5977],
         [-0.5554, -0.4161, -0.2039, -0.1720,  0.8689],
         [ 0.7432,  0.0708, -0.3763, -1.0310, -0.5771],
         [-1.9236, -0.7082, -0.2647, -1.5736,  0.8818],
         [ 0.6958,  0.4802,  0.6357, -1.1568,  0.9194]],

        [[-0.0711, -0.7977,  0.2972, -2.1691, -1.7855],
         [ 0.1464,  1.4155,  0.8484, -0.2839,  0.7080],
         [-1.3324, -0.2196, -1.0050, -1.0532, -0.4240],
         [ 0.6019,  1.0038, -0.6556, -1.2584,  0.2469],
         [ 0.4797, -0.3120,  0.3117,  0.8204, -0.9861]],

        [[-0.3466, -0.8451, -0.6231,  0.4115,  0.5478],
         [-0.2135, -1.2920, -1.6635,  0.1329,  0.6698],
         [-0.1089, -0.3372, -1.5829, -2.3031,  0.1435],
         [ 1.7641,  1.9216,  2.4142, -0.3526, -0.1309],
         [ 0.9444,  1.1911,  0.4640, -0.1588, -0.0162]]])

We also often want our code to generalize—for example, from grayscale images represented as 2D tensors with height and width dimensions to color images adding a third channel dimension (as in RGB), or from a single image to a batch of images. In Chapter 2, we introduced an additional batch dimension in `batch_t`; here we pretend to have a batch of 2:

In [71]:
batch_t = torch.randn(2, 3, 5, 5) # shape [batch, channels, rows, columns]
batch_t

tensor([[[[ 0.2516,  0.2138,  0.6414,  1.0603,  1.0442],
          [-1.0301,  0.7717,  1.4197,  0.7074,  2.0589],
          [ 0.0048,  1.1261,  1.2242, -2.5519, -1.2985],
          [ 0.6269, -0.0979, -0.3137, -0.3822,  0.6504],
          [ 1.5533, -0.2308, -0.2092, -0.8924, -1.8158]],

         [[-1.6189, -0.7874, -0.1779, -0.0462, -0.8193],
          [ 0.5442, -1.4447,  0.2230,  1.6417,  1.4519],
          [-0.7011, -0.2460, -0.4905, -0.0053,  0.2614],
          [ 0.0504, -1.3980, -1.3963, -1.3551, -0.7476],
          [ 0.2842, -2.8905,  0.2627, -0.9862,  0.4286]],

         [[ 2.1992, -0.7362,  0.1114, -1.2092, -0.9888],
          [ 0.1568,  0.3269, -1.3994,  0.4076, -0.4247],
          [-0.2315,  1.4396,  0.4597, -0.4249,  0.3956],
          [ 0.1637, -0.0583, -0.1873, -0.4336,  1.2129],
          [ 0.2462,  0.3260, -0.8600, -1.1604, -0.9840]]],


        [[[-0.4269,  1.3557, -0.3782,  0.5774,  0.9278],
          [ 0.1613,  1.1451, -0.5269, -0.9213, -0.5751],
          ...

So sometimes the RGB channels are in dimension 0, and sometimes they are in dimension 1. But we can generalize by counting from the end: they are always in dimension -3, the third from the end. The lazy, unweighted [mean](https://pytorch.org/docs/master/generated/torch.mean.html) (R+G+B)/3 can thus be written as follows:

In [74]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)
img_gray_naive, img_gray_naive.shape

(tensor([[ 5.4377e-01, -1.1187e+00,  1.3810e-03, -7.4498e-01,  1.2001e-01],
         [-2.0752e-01, -9.7497e-02, -3.3966e-01, -1.0767e-01,  7.4889e-01],
         [-2.3270e-01, -1.6200e-01, -9.8811e-01, -1.4624e+00, -2.8586e-01],
         [ 1.4744e-01,  7.3903e-01,  4.9796e-01, -1.0615e+00,  3.3262e-01],
         [ 7.0664e-01,  4.5308e-01,  4.7048e-01, -1.6509e-01, -2.7662e-02]]),
 torch.Size([5, 5]))

In [75]:
batch_gray_naive, batch_gray_naive.shape

(tensor([[[ 2.7729e-01, -4.3658e-01,  1.9163e-01, -6.5041e-02, -2.5462e-01],
          [-1.0970e-01, -1.1535e-01,  8.1088e-02,  9.1887e-01,  1.0287e+00],
          [-3.0925e-01,  7.7327e-01,  3.9779e-01, -9.9405e-01, -2.1381e-01],
          [ 2.8032e-01, -5.1807e-01, -6.3241e-01, -7.2363e-01,  3.7189e-01],
          [ 6.9453e-01, -9.3178e-01, -2.6883e-01, -1.0130e+00, -7.9038e-01]],
 
         [[-9.2533e-01,  3.5549e-01,  2.0077e-01, -2.2659e-01, -6.3116e-01],
          [ 6.0621e-01, -3.8013e-01, -1.3576e-03,  4.7599e-01, -1.3198e-01],
          [-6.6157e-01, -1.5290e-01,  3.8529e-02, -3.0042e-02,  1.5735e+00],
          [ 1.4536e-02, -7.8159e-01,  1.9534e-01, -6.7991e-01,  1.1842e+00],
          [-3.9573e-02, -4.6374e-01, -3.4047e-01, -1.3175e-01,  6.4945e-01]]]),
 torch.Size([2, 5, 5]))
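To check that counting from the end really picks the channel dimension in both layouts, the sketch below (with freshly generated dummy data and a fixed seed, an assumption for reproducibility) compares `mean(-3)` against an explicit (R+G+B)/3 and against reducing each image of the batch separately.

```python
import torch

torch.manual_seed(0)               # dummy data, as in the cells above
img_t = torch.randn(3, 5, 5)       # [channels, rows, columns]
batch_t = torch.randn(2, 3, 5, 5)  # [batch, channels, rows, columns]

img_gray = img_t.mean(-3)
batch_gray = batch_t.mean(-3)

# For a single image, mean(-3) is the unweighted (R + G + B) / 3
manual = (img_t[0] + img_t[1] + img_t[2]) / 3

# For the batch, reducing dim -3 equals reducing each image on its own
per_image = torch.stack([image.mean(-3) for image in batch_t])

print(img_gray.shape, batch_gray.shape)
print(torch.allclose(img_gray, manual), torch.allclose(batch_gray, per_image))
```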

But we also have the weights to apply. PyTorch lets us multiply tensors that have the same shape, as well as tensors whose shapes differ only in dimensions where one operand has size 1; it also adds missing leading dimensions of size 1 automatically. This feature is called [broadcasting](https://stackoverflow.com/questions/51371070/how-does-pytorch-broadcasting-work). To take advantage of it, we first reshape `weights` so that it has two trailing dimensions of size 1:

In [87]:
weights

tensor([0.2126, 0.7152, 0.0722])

In [88]:
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze_(-1)
unsqueezed_weights

tensor([[[0.2126]],

        [[0.7152]],

        [[0.0722]]])
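The chained `unsqueeze` call is one of several equivalent ways to get a `(3, 1, 1)` tensor. The sketch below compares it with `None` indexing and an explicit `view`. Note that `unsqueeze_` (with the trailing underscore) is the in-place variant: in the cell above it acts on the temporary tensor returned by the first `unsqueeze`, so `weights` itself keeps shape `(3,)`.

```python
import torch

weights = torch.tensor([0.2126, 0.7152, 0.0722])

a = weights.unsqueeze(-1).unsqueeze_(-1)  # as in the cell above
b = weights[:, None, None]                # same result with None indexing
c = weights.view(3, 1, 1)                 # same result with an explicit view

print(a.shape, torch.equal(a, b), torch.equal(a, c))
print(weights.shape)                      # unchanged: torch.Size([3])
```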

In [89]:
img_weights = (img_t * unsqueezed_weights)
img_weights

tensor([[[ 4.3563e-01, -3.6424e-01,  7.0171e-02, -1.0149e-01,  3.3967e-01],
         [-1.1808e-01, -8.8453e-02, -4.3358e-02, -3.6562e-02,  1.8472e-01],
         [ 1.5800e-01,  1.5048e-02, -8.0008e-02, -2.1919e-01, -1.2269e-01],
         [-4.0896e-01, -1.5057e-01, -5.6280e-02, -3.3454e-01,  1.8748e-01],
         [ 1.4793e-01,  1.0208e-01,  1.3516e-01, -2.4594e-01,  1.9545e-01]],

        [[-5.0879e-02, -5.7051e-01,  2.1257e-01, -1.5513e+00, -1.2770e+00],
         [ 1.0470e-01,  1.0124e+00,  6.0680e-01, -2.0307e-01,  5.0634e-01],
         [-9.5292e-01, -1.5706e-01, -7.1881e-01, -7.5324e-01, -3.0326e-01],
         [ 4.3048e-01,  7.1788e-01, -4.6889e-01, -8.9998e-01,  1.7659e-01],
         [ 3.4306e-01, -2.2313e-01,  2.2295e-01,  5.8676e-01, -7.0528e-01]],

        [[-2.5026e-02, -6.1017e-02, -4.4991e-02,  2.9713e-02,  3.9554e-02],
         [-1.5416e-02, -9.3280e-02, -1.2010e-01,  9.5962e-03,  4.8363e-02],
         [-7.8607e-03, -2.4345e-02, -1.1429e-01, -1.6628e-01,  1.0363e-02],
        ...

Similarly, `batch_t` of shape `(2, 3, 5, 5)` multiplied by `unsqueezed_weights` of shape `(3, 1, 1)` gives a tensor of shape `(2, 3, 5, 5)`:

In [92]:
batch_weights = (batch_t * unsqueezed_weights)
batch_weights, batch_weights.shape

(tensor([[[[ 5.3484e-02,  4.5450e-02,  1.3635e-01,  2.2542e-01,  2.2199e-01],
           [-2.1899e-01,  1.6406e-01,  3.0182e-01,  1.5039e-01,  4.3772e-01],
           [ 1.0207e-03,  2.3942e-01,  2.6027e-01, -5.4254e-01, -2.7605e-01],
           [ 1.3327e-01, -2.0808e-02, -6.6690e-02, -8.1265e-02,  1.3827e-01],
           [ 3.3022e-01, -4.9062e-02, -4.4472e-02, -1.8973e-01, -3.8603e-01]],
 
          [[-1.1578e+00, -5.6311e-01, -1.2725e-01, -3.3027e-02, -5.8594e-01],
           [ 3.8920e-01, -1.0332e+00,  1.5948e-01,  1.1741e+00,  1.0384e+00],
           [-5.0140e-01, -1.7592e-01, -3.5083e-01, -3.8208e-03,  1.8698e-01],
           [ 3.6014e-02, -9.9988e-01, -9.9861e-01, -9.6915e-01, -5.3471e-01],
           [ 2.0325e-01, -2.0673e+00,  1.8789e-01, -7.0533e-01,  3.0653e-01]],
 
          [[ 1.5878e-01, -5.3151e-02,  8.0466e-03, -8.7307e-02, -7.1388e-02],
           [ 1.1321e-02,  2.3604e-02, -1.0104e-01,  2.9427e-02, -3.0663e-02],
           ...

We can then sum over the third dimension from the end (the three channels):

In [93]:
img_gray_weighted = img_weights.sum(-3)
img_gray_weighted, img_gray_weighted.shape

(tensor([[ 0.3597, -0.9958,  0.2378, -1.6231, -0.8978],
         [-0.0288,  0.8307,  0.4433, -0.2300,  0.7394],
         [-0.8028, -0.1664, -0.9131, -1.1387, -0.4156],
         [ 0.1489,  0.7060, -0.3509, -1.2600,  0.3546],
         [ 0.5592, -0.0351,  0.3916,  0.3293, -0.5110]]), torch.Size([5, 5]))

In [94]:
batch_gray_weighted = batch_weights.sum(-3)
batch_gray_weighted, batch_gray_weighted.shape

(tensor([[[-0.9456, -0.5708,  0.0171,  0.1051, -0.4353],
          [ 0.1815, -0.8456,  0.3603,  1.3539,  1.4455],
          [-0.5171,  0.1674, -0.0574, -0.5770, -0.0605],
          [ 0.1811, -1.0249, -1.0788, -1.0817, -0.3089],
          [ 0.5512, -2.0928,  0.0813, -0.9788, -0.1505]],
 
         [[-1.0268,  0.1784,  0.5849, -0.2297, -1.3860],
          [ 1.2636, -0.8190, -0.5621,  0.5909, -0.5432],
          [-0.8169,  0.4074,  0.4215, -0.7123,  1.3442],
          [-0.3152, -0.7717,  0.3355, -0.8199,  1.4382],
          [ 0.2585, -0.3035, -0.7519, -0.1126,  0.4799]]]),
 torch.Size([2, 5, 5]))

In [95]:
batch_weights.shape, batch_t.shape, unsqueezed_weights.shape

(torch.Size([2, 3, 5, 5]), torch.Size([2, 3, 5, 5]), torch.Size([3, 1, 1]))
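The shapes above follow the broadcasting rule: align shapes from the right, treat missing leading dimensions as size 1, and require each pair of sizes to be equal or to contain a 1. The helper `broadcast_shape` below is a hypothetical, hand-written sketch of that rule for illustration; `torch.broadcast_shapes` (available in recent PyTorch versions) computes the same thing.

```python
import torch

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: compute the broadcast result shape by hand."""
    ndim = max(len(shape_a), len(shape_b))
    # Prepend size-1 dimensions so both shapes have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((2, 3, 5, 5), (3, 1, 1)))         # (2, 3, 5, 5)
print(broadcast_shape((3, 5, 5), (3, 1, 1)))            # (3, 5, 5)
print(torch.broadcast_shapes((2, 3, 5, 5), (3, 1, 1)))  # torch.Size([2, 3, 5, 5])
```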

Because this gets messy quickly, and for the sake of efficiency, PyTorch provides the function `einsum` (adapted from NumPy), which specifies an indexing mini-language that gives index names to dimensions for sums of such products. As is common in Python, the three dots `...` stand for the broadcast dimensions, that is, any leading dimensions we leave unnamed. Don’t worry too much about `einsum`, though, because we will not use it in the following:

In [96]:
img_gray_weighted_fancy = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_weighted_fancy = torch.einsum('...chw,c->...hw', batch_t, weights)
batch_gray_weighted_fancy.shape

torch.Size([2, 5, 5])
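To see that the `einsum` expression computes the same weighted sum as the explicit broadcast-multiply-then-sum cells, the sketch below compares the two on dummy data (seeded, an assumption for reproducibility). Small floating-point differences are possible because the summation order may differ, hence `allclose` rather than exact equality.

```python
import torch

torch.manual_seed(0)
batch_t = torch.randn(2, 3, 5, 5)  # dummy [batch, channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

# Multiply along c and sum it away, keeping any leading dims matched by '...'
fancy = torch.einsum('...chw,c->...hw', batch_t, weights)

# The explicit version: broadcast weights to (3, 1, 1), multiply, sum dim -3
explicit = (batch_t * weights[:, None, None]).sum(-3)

print(fancy.shape, torch.allclose(fancy, explicit, atol=1e-6))
```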

As we can see, there is quite a lot of bookkeeping involved. This is error-prone, especially when the locations where tensors are created and used are far apart in our code. This has caught the eye of practitioners, and so it has been suggested that the dimension be given a name instead.
PyTorch 1.3 added named tensors as an experimental feature (see https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html and https://pytorch.org/docs/stable/named_tensor.html). Tensor factory functions such as `tensor` and `rand` take a `names` argument. The names should be a sequence of strings:

In [97]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])
weights_named



tensor([0.2126, 0.7152, 0.0722], names=('channels',))

Note: PyTorch warns that named tensors and all their associated APIs are an experimental feature and subject to change. Avoid using them for anything important until they are released as stable.

When we already have a tensor and want to add names (but not change existing ones), we can call the method `refine_names` on it. Similar to indexing, the ellipsis (`...`) allows us to leave out any number of dimensions. With the sibling method `rename`, we can also overwrite existing names or drop them (by passing in `None`):


In [99]:
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')
batch_named = batch_t.refine_names(..., 'channels', 'rows', 'columns')
print("img named:", img_named.shape, img_named.names)
print("batch named:", batch_named.shape, batch_named.names)

img named: torch.Size([3, 5, 5]) ('channels', 'rows', 'columns')
batch named: torch.Size([2, 3, 5, 5]) (None, 'channels', 'rows', 'columns')
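The `rename` method mentioned above can be sketched as follows: keyword arguments map old names to new ones, and `rename(None)` drops all names. This relies on the experimental named-tensor API, so PyTorch may print a warning and the behavior may change between versions.

```python
import torch

img_t = torch.randn(3, 5, 5)  # dummy [channels, rows, columns]
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')

# Overwrite selected names; the result is a view on the same data
img_hw = img_named.rename(rows='height', columns='width')

# Drop all names to get back to an ordinary tensor
img_plain = img_named.rename(None)

print(img_hw.names)     # ('channels', 'height', 'width')
print(img_plain.names)  # (None, None, None)
```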


For operations with two inputs, in addition to the usual dimension checks—whether sizes are the same, or if one is 1 and can be broadcast to the other—PyTorch will now check the names for us. So far, it does not automatically align dimensions, so we need to do this explicitly. The method `align_as` returns a tensor with missing dimensions added and existing ones permuted to the right order:

In [100]:
weights_aligned = weights_named.align_as(img_named)
weights_aligned.shape, weights_aligned.names

(torch.Size([3, 1, 1]), ('channels', 'rows', 'columns'))

Functions accepting dimension arguments, like `sum`, also take named dimensions:

In [101]:
gray_named = (img_named * weights_aligned).sum('channels')
gray_named.shape, gray_named.names

(torch.Size([5, 5]), ('rows', 'columns'))

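If we try to combine dimensions with different names, we get an error. Here, the `columns` dimension of the image and the `channels` dimension of the weights end up at the same position from the right, so their names do not match: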

In [102]:
gray_named = (img_named[..., :3] * weights_named).sum('channels')

RuntimeError: ...

If we want to use tensors outside functions that operate on named tensors, we need to drop the names by renaming them to None. The following gets us back into the world of unnamed dimensions:

In [103]:
gray_plain = gray_named.rename(None)
gray_plain.shape, gray_plain.names

(torch.Size([5, 5]), (None, None))


Given the experimental nature of this feature at the time of writing, and to avoid mucking around with indexing and alignment, we will stick to unnamed tensors in the remainder of the book. Named tensors have the potential to eliminate many sources of alignment errors, which, if the PyTorch forum is any indication, can be a source of headaches. It will be interesting to see how widely they will be adopted.

## Tensor Element Types

So far, we have covered the basics of how tensors work, but we have not yet touched on what kinds of numeric types we can store in a tensor. As we hinted at in section 3.2, using the standard Python numeric types can be suboptimal for several reasons:
- Numbers in Python are objects. Whereas a floating-point number might require only, for instance, 32 bits to be represented on a computer, Python will convert it into a full-fledged Python object with reference counting, and so on. This operation, called boxing, is not a problem if we need to store a small number of numbers, but allocating millions gets very inefficient.
- Lists in Python are meant for sequential collections of objects. There are no operations defined for, say, efficiently taking the dot product of two vectors, or summing vectors together. Also, Python lists have no way of optimizing the layout of their contents in memory, as they are indexable collections of pointers to Python objects (of any kind, not just numbers). Finally, Python lists are one-dimensional, and although we can create lists of lists, this is again very inefficient.
- The Python interpreter is slow compared to optimized, compiled code. Performing mathematical operations on large collections of numerical data can be much faster using optimized code written in a compiled, low-level language like C.

For these reasons, data science libraries rely on NumPy or introduce dedicated data structures like PyTorch tensors, which provide efficient low-level implementations of numerical data structures and related operations on them, wrapped in a convenient high-level API. To enable this, the objects within a tensor must all be numbers of the same type, and PyTorch must keep track of this numeric type.

### Specifying the numeric type with dtype


The `dtype` argument to tensor constructors (that is, functions like `tensor`, `zeros`, and `ones`) specifies the numerical data (d) type that will be contained in the tensor. The data type specifies the possible values the tensor can hold (integers versus floating-point numbers) and the number of bytes per value. The `dtype` argument is deliberately similar to the standard NumPy argument of the same name. Here are the most commonly used values for the `dtype` argument:
- `torch.float32` or `torch.float`: 32-bit floating-point
- `torch.float64` or `torch.double`: 64-bit, double-precision floating-point
- `torch.float16` or `torch.half`: 16-bit, half-precision floating-point
- `torch.int8`: signed 8-bit integers
- `torch.uint8`: unsigned 8-bit integers
- `torch.int16` or `torch.short`: signed 16-bit integers
- `torch.int32` or `torch.int`: signed 32-bit integers
- `torch.int64` or `torch.long`: signed 64-bit integers
- `torch.bool`: Boolean

The default data type for floating-point tensors, such as those created by `zeros` and `ones`, is 32-bit floating-point.
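Since the `dtype` also fixes the number of bytes per value, a small sketch like the one below can list the per-element size of several of the types above with `element_size()`, and confirm the default with `torch.get_default_dtype()` (assuming it has not been changed with `torch.set_default_dtype`).

```python
import torch

dtypes = [torch.float64, torch.float32, torch.float16,
          torch.int64, torch.int32, torch.int16, torch.int8,
          torch.uint8, torch.bool]

# Bytes per element for each dtype
sizes = {dtype: torch.zeros(1, dtype=dtype).element_size() for dtype in dtypes}
for dtype, size in sizes.items():
    print(dtype, size)

print(torch.get_default_dtype())  # torch.float32 unless changed
```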

### A `dtype` for every occasion

As we will see in future chapters, computations happening in neural networks are typically executed with 32-bit floating-point precision. Higher precision, like 64-bit, typically does not improve a model’s accuracy, and it requires more memory and computing time. The 16-bit floating-point, half-precision data type is not present natively in standard CPUs, but it is offered on modern GPUs. If needed, we can switch to half precision to decrease the footprint of a neural network model, usually with only a minor impact on accuracy.


Tensors can be used as indexes in other tensors. In this case, PyTorch expects indexing tensors to have a 64-bit integer data type. Creating a tensor with integers as arguments, such as using `torch.tensor([2, 2])`, will create a 64-bit integer tensor by default. As such, we’ll spend most of our time dealing with `float32` and `int64`.

Finally, predicates on tensors, such as `points > 1.0`, produce `bool` tensors indicating whether each individual element satisfies the condition. These are the numeric types in a nutshell.
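The sketch below shows the default dtypes described above in practice: Python floats give `float32`, Python ints give `int64`, and a predicate gives `bool`. It also uses both kinds of result as indexes, an `int64` tensor to pick rows and a `bool` tensor to pick the elements where the predicate holds.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
idx = torch.tensor([2, 2])   # Python ints -> int64 by default
mask = points > 1.0          # predicate -> bool tensor

print(points.dtype, idx.dtype, mask.dtype)

picked = points[idx]         # rows 2 and 2 -> shape [2, 2]
selected = points[mask]      # elements where the mask is True, flattened
print(picked)
print(selected)              # tensor([4., 5., 3., 2.])
```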

### Managing a tensor’s `dtype` attribute

In order to allocate a tensor of the right numeric type, we can specify the proper `dtype` as an argument to the constructor. For example:

In [104]:
double_points = torch.ones(10, 2, dtype=torch.double)
short_points = torch.tensor([[1, 2], [3, 4]], dtype=torch.short)

We can find out about the dtype for a tensor by accessing the corresponding attribute:

In [105]:
short_points.dtype

torch.int16

We can also cast the output of a tensor creation function to the right type using the corresponding casting method, such as

In [107]:
double_points = torch.zeros(10, 2).double()
short_points = torch.ones(10, 2).short()
double_points.dtype, short_points.dtype

(torch.float64, torch.int16)

or the more convenient `to` method:

In [108]:
double_points = torch.zeros(10, 2).to(torch.double)
short_points = torch.ones(10, 2).to(dtype=torch.short)
double_points.dtype, short_points.dtype

(torch.float64, torch.int16)
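The three casting styles above are interchangeable, and in each case the original tensor keeps its dtype because a converted tensor is returned. The sketch below checks that, and also shows that casting floats to an integer type drops the fractional part (truncation toward zero).

```python
import torch

points = torch.zeros(10, 2)        # float32 by default

a = points.double()                # casting method
b = points.to(torch.double)        # to() with a positional dtype
c = points.to(dtype=torch.double)  # to() with the dtype keyword

print(a.dtype, b.dtype, c.dtype)   # all torch.float64
print(points.dtype)                # still torch.float32

# Float -> integer conversion truncates toward zero
truncated = torch.tensor([1.7, -1.7]).to(torch.short)
print(truncated)
```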

## The tensor API

It is worth taking a look at the tensor operations that PyTorch offers. It would be of little use to list them all here. Instead, we’re going to get a general feel for the API and establish a few directions on where to find things in the online documentation at http://pytorch.org/docs.

First, the vast majority of operations on and between tensors are available in the torch module and can also be called as methods of a tensor object.

For instance, the `transpose` function can be used from the `torch` module:

In [110]:
a = torch.ones(3, 2)
a_t = torch.transpose(a, 0, 1)
a.shape, a_t.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

or as a method of the `a` tensor:

In [111]:
a = torch.ones(3, 2)
a_t = a.transpose(0, 1)
a.shape, a_t.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

There is no difference between the two forms; they can be used interchangeably.
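The same duality holds well beyond `transpose`. The sketch below compares the function and method forms of a few common operations, and uses `t()`, a shorthand for `transpose(0, 1)` on 2D tensors.

```python
import torch

a = torch.ones(3, 2)

# Function form vs. method form of the same operations
same_transpose = torch.equal(torch.transpose(a, 0, 1), a.transpose(0, 1))
same_sum = torch.equal(torch.sum(a), a.sum())
same_mean = torch.equal(torch.mean(a, dim=0), a.mean(dim=0))

print(same_transpose, same_sum, same_mean)
print(a.t().shape)  # torch.Size([2, 3])
```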

The PyTorch docs are exhaustive and well organized, with the tensor operations divided into groups:

- *Creation ops*: Functions for constructing a tensor, like `ones` and `from_numpy`
- *Indexing, slicing, joining, mutating ops*: Functions for changing the shape, stride, or content of a tensor, like `transpose`
- *Math ops*: Functions for manipulating the content of the tensor through computations
    * *Pointwise ops*: Functions for obtaining a new tensor by applying a function to each element independently, like `abs` and `cos`
    * *Reduction ops*: Functions for computing aggregate values by iterating through tensors, like `mean`, `std`, and `norm`
    * *Comparison ops*: Functions for evaluating numerical predicates over tensors, like `equal` and `max`
    * *Spectral ops*: Functions for transforming in and operating in the frequency domain, like `stft` and `hamming_window`
    * *Other operations*: Special functions operating on vectors, like `cross`, or matrices, like `trace`
    * *BLAS and LAPACK operations*: Functions following the Basic Linear Algebra Subprograms (BLAS) specification for scalar, vector-vector, matrix-vector, and matrix-matrix operations
- *Random sampling*: Functions for generating values by drawing randomly from probability distributions, like `randn` and `normal`
- *Serialization*: Functions for saving and loading tensors, like `load` and `save`
- *Parallelism*: Functions for controlling the number of threads for parallel CPU execution, like `set_num_threads`
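As a quick tour of these groups, the sketch below calls one representative function from most of them. The choice of functions is ours, for illustration only; `torch.matmul` stands in for the BLAS-style group, `get_num_threads` (the read-only counterpart of `set_num_threads`) for parallelism, and serialization goes through an in-memory buffer rather than a file.

```python
import io
import torch

x = torch.tensor([-1.0, 0.0, 2.0])
m = torch.eye(3)

created = torch.ones(3)                     # creation
pointwise = torch.abs(x)                    # pointwise: [1., 0., 2.]
reduced = torch.mean(x)                     # reduction: 1/3
same = torch.equal(created, torch.ones(3))  # comparison: True
largest = torch.max(x)                      # comparison: 2.
crossed = torch.cross(torch.tensor([1.0, 0.0, 0.0]),
                      torch.tensor([0.0, 1.0, 0.0]), dim=0)  # other: [0., 0., 1.]
traced = torch.trace(m)                     # other: 3.
product = torch.matmul(m, x)                # matrix-vector product: equals x
sampled = torch.randn(2, 2)                 # random sampling
threads = torch.get_num_threads()           # parallelism

# Serialization round trip through an in-memory buffer
buffer = io.BytesIO()
torch.save(x, buffer)
buffer.seek(0)
loaded = torch.load(buffer)

print(pointwise, reduced, same, largest, crossed, traced, product, threads)
```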