<a href="https://colab.research.google.com/github/prat8897/DL_PyTorch/blob/master/Chapter3.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This chapter covers
- Understanding tensors, the basic data structure in PyTorch
- Indexing and operating on tensors
- Interoperating with NumPy multidimensional arrays
- Moving computations to the GPU for speed

Deep learning really consists of building a system that can transform data from one representation to another.

In the context of deep learning, *tensors* refer to the generalization of vectors and matrices to an arbitrary number of dimensions. Another name for the same concept is *multidimensional array*. The dimensionality of a tensor coincides with the number of indexes used to refer to scalar values within the tensor.

PyTorch features seamless interoperability with NumPy, which brings with it first-class integration with the rest of the scientific libraries in Python, such as SciPy (www.scipy.org), Scikit-learn (https://scikit-learn.org), and Pandas (https://pandas.pydata.org).

Compared to NumPy arrays, PyTorch tensors have a few superpowers, such as the ability to perform very fast operations on graphical processing units (GPUs), distribute operations on multiple devices or machines, and keep track of the graph of computations that created them. These are all important features when implementing a modern deep learning library.

## From Python lists to PyTorch tensors

Let’s see list indexing in action so we can compare it to tensor indexing. Take a list of three numbers in Python ([code/p1ch3/1_tensors.ipynb](https://github.com/deep-learning-with-pytorch/dlwpt-code/blob/master/p1ch3/1_tensors.ipynb)):

In [47]:
a = [1.0, 2.0, 1.0]

We can access the first element of the list using the corresponding zero-based index:


In [48]:
a[0]

1.0

Python also lets us change an element by assigning to its index:

In [49]:
a[2] = 3.0
a

[1.0, 2.0, 3.0]

It is not unusual for simple Python programs dealing with vectors of numbers, such as the coordinates of a 2D line, to use Python lists to store the vectors. As we will see in the following chapter, using the more efficient tensor data structure, many types of data (from images to time series, and even sentences) can be represented. By defining operations over tensors, some of which we’ll explore in this chapter, we can slice and manipulate data expressively and efficiently at the same time, even from a high-level (and not particularly fast) language such as Python.

## Constructing our first tensors

In [50]:
import torch #imports torch module

a = torch.ones(3) #Creates a 1-dimensional tensor with size 3, filled with 1s
a

tensor([1., 1., 1.])

In [51]:
a[1]

tensor(1.)

In [52]:
float(a[1])

1.0

In [53]:
a[2] = 2.0
a

tensor([1., 1., 2.])

After importing the `torch` module, we call a function that creates a (one-dimensional) tensor of size `3` filled with the value `1.0`. We can access an element using its zero-based index or assign a new value to it. Although on the surface this example doesn’t differ much from a list of number objects, under the hood things are completely different.

## The essence of tensors

Python `lists` or `tuples` of numbers are collections of Python objects that are individually allocated in memory. PyTorch `tensors` or `NumPy arrays`, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects. Each element is a 32-bit (4-byte) float in this case. This means storing a 1D tensor of 1,000,000 float numbers will require exactly 4,000,000 contiguous bytes, plus a small overhead for the metadata (such as dimensions and numeric type).
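
The memory arithmetic above can be checked directly. The following is a minimal sketch, assuming PyTorch's default floating-point dtype has not been changed from `float32`; it only measures the element data, not the small metadata overhead:

```python
import torch

# One million float values; with the default dtype each one is a 32-bit float.
t = torch.zeros(1_000_000)

bytes_per_element = t.element_size()            # bytes used by a single element
total_bytes = bytes_per_element * t.nelement()  # bytes used by all elements

print(bytes_per_element, total_bytes)
```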


Say we have a list of coordinates we’d like to use to represent a geometrical object: perhaps a 2D triangle with vertices at coordinates `(4, 1)`, `(5, 3)`, and `(2, 1)`. The example is not particularly pertinent to deep learning, but it’s easy to follow. Instead of having coordinates as numbers in a Python list, as we did earlier, we can use a one-dimensional tensor by storing Xs in the even indices and Ys in the odd indices, like this:

In [54]:
points = torch.zeros(6)
points[0] = 4.0
points[1] = 1.0
points[2] = 5.0
points[3] = 3.0
points[4] = 2.0
points[5] = 1.0

points

tensor([4., 1., 5., 3., 2., 1.])

We can also pass a Python list to the constructor, to the same effect:

In [55]:
points = torch.tensor([4.0, 1.0, 5.0, 3.0, 2.0, 1.0])
points

tensor([4., 1., 5., 3., 2., 1.])

To get the coordinates of the first point, we do the following:

In [56]:
float(points[0]), float(points[1])

(4.0, 1.0)


This is OK, although it would be practical to have the first index refer to individual 2D points rather than point coordinates. For this, we can use a 2D tensor:

In [57]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

Here, we pass a list of lists to the constructor. We can ask the tensor about its shape:

In [58]:
points.shape


torch.Size([3, 2])

This informs us about the size of the tensor along each dimension: we have 3 points (rows), each with 2 coordinates (columns).


We could also use `zeros` or `ones` to initialize the tensor, providing the size as arguments (a single tuple such as `(3, 2)` also works):

In [59]:
points = torch.zeros(3, 2)
points

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

Now we can access an individual element in the tensor using two indices:


In [60]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [61]:
points[0, 1]

tensor(1.)

This returns the Y-coordinate of the zeroth point in our dataset. We can also access the first element in the tensor as we did before to get the 2D coordinates of the first point:

In [62]:
points[0]

tensor([4., 1.])

The output is another tensor that presents a different view of the same underlying data. The new tensor is a 1D tensor of size 2, referencing the values of the first row in the points tensor. Does this mean a new chunk of memory was allocated, values were copied into it, and the new memory was returned wrapped in a new tensor object? No, because that would be very inefficient, especially if we had millions of points.
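
One way to convince ourselves that no copy was made is to compare memory addresses. This is a small sketch, assuming a contiguous CPU tensor; `data_ptr()` returns the address of a tensor's first element:

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
first_point = points[0]
second_point = points[1]

# The first row starts at the same address as the whole tensor.
first_shares_memory = first_point.data_ptr() == points.data_ptr()

# The second row starts two elements (one row) further into the same block.
row_bytes = 2 * points.element_size()
second_is_offset = second_point.data_ptr() == points.data_ptr() + row_bytes

print(first_shares_memory, second_is_offset)
```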

## Indexing tensors

What if we need to obtain a tensor containing all points but the first? That’s easy using range indexing notation, which also applies to standard Python lists. Here’s a reminder:

In [63]:
some_list = list(range(6))
some_list

[0, 1, 2, 3, 4, 5]

In [64]:
some_list[:] #All elements of the list

[0, 1, 2, 3, 4, 5]

In [65]:
some_list[1:4] #From element 1 inclusive to element 4 exclusive

[1, 2, 3]

In [66]:
some_list[1:] #From element 1 inclusive to the end of the list

[1, 2, 3, 4, 5]

In [67]:
some_list[:4] #From the start of the list to element 4 exclusive

[0, 1, 2, 3]

In [68]:
some_list[:-1] #From the start of the list to one before the last element

[0, 1, 2, 3, 4]

In [69]:
some_list[1:4:2] #From element 1 inclusive to element 4 exclusive in steps of 2

[1, 3]

To obtain a tensor containing all points but the first, we can use the same notation for PyTorch tensors, with the added benefit that, just as in NumPy and other Python scientific libraries, we can use range indexing for each of the tensor’s dimensions:

In [70]:
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [71]:
points[1:] #All rows after the first; implicitly all columns

tensor([[5., 3.],
        [2., 1.]])

In [72]:
points[1:, :] #All rows after the first; explicitly all columns

tensor([[5., 3.],
        [2., 1.]])

In [73]:
points[1:, 0] #All rows after the first; first column

tensor([5., 2.])

In [74]:
points[None] #Adds a dimension of size 1, just like unsqueeze

tensor([[[4., 1.],
         [5., 3.],
         [2., 1.]]])
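
To make the comparison with `unsqueeze` concrete, here is a short sketch showing that indexing with `None` and calling `unsqueeze` produce tensors of the same shape and content, and that `None` can be placed at other positions as well:

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

with_none = points[None]              # new leading dimension of size 1
with_unsqueeze = points.unsqueeze(0)  # same result via the method
same_content = torch.equal(with_none, with_unsqueeze)

middle = points[:, None]              # new dimension in the middle
trailing = points[:, :, None]         # new dimension at the end

print(with_none.shape, with_unsqueeze.shape, same_content)
print(middle.shape, trailing.shape)
```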

## Named tensors

The dimensions (or axes) of our tensors usually index something like pixel locations or color channels. This means when we want to index into a tensor, we need to remember the ordering of the dimensions and write our indexing accordingly. As data is transformed through multiple tensors, keeping track of which dimension contains what data can be error-prone.

To make things concrete, imagine that we have a 3D tensor like `img_t` from Chapter 2 (we will use dummy data for simplicity here), and we want to convert it to gray-scale. We looked up typical weights for the colors to derive a single brightness value:


In [75]:
img_t = torch.randn(3, 5, 5) # shape [channels, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])
img_t

tensor([[[-0.4908, -0.9236, -1.8775, -0.1125,  0.3821],
         [ 1.7600,  1.9684,  0.0066, -1.1974,  2.4375],
         [-1.9211, -0.4184, -1.6118, -0.3680, -1.1099],
         [-2.0732,  0.2650, -0.1579, -1.2324, -0.1139],
         [-0.8073,  1.1427, -0.5344,  0.2231,  0.9041]],

        [[-0.2148, -0.3572, -0.6933,  0.6115, -0.3495],
         [-1.3418, -1.8093, -0.1345, -0.1675,  0.9695],
         [ 0.3177,  0.3681,  1.6774,  2.0129, -0.6489],
         [ 2.0833,  0.2066,  0.2774, -0.4641,  2.6546],
         [ 0.0495, -0.9592,  1.2016,  1.0830,  1.2897]],

        [[-0.2775, -2.1811, -0.9999,  0.1867, -1.6738],
         [-1.1349,  0.2197, -0.0189, -0.2675,  1.5478],
         [ 0.6625, -0.0145, -1.5767, -1.5972,  0.8957],
         [-0.6628,  0.3079,  0.2111, -2.1949,  0.1105],
         [ 0.6951,  0.7535,  1.0956, -0.9477,  1.9169]]])

We also often want our code to generalize—for example, from grayscale images represented as 2D tensors with height and width dimensions to color images adding a third channel dimension (as in RGB), or from a single image to a batch of images. In Chapter 2, we introduced an additional batch dimension in `batch_t`; here we pretend to have a batch of 2:

In [76]:
batch_t = torch.randn(2, 3, 5, 5) # shape [batch, channels, rows, columns]
batch_t

tensor([[[[-2.2263, -2.0457, -0.6210,  0.5475, -1.7689],
          [-2.1972, -0.1958, -0.6730,  1.6901,  0.3577],
          [ 0.7147, -0.6981,  1.1645,  0.1658, -1.1968],
          [-1.3056, -0.5509, -0.9069,  0.3316,  0.2204],
          [-0.3466, -0.5804, -1.1529, -0.0614,  0.4614]],

         [[ 0.0676,  0.3888, -0.9560, -1.9303, -1.9234],
          [-0.8344, -0.8868, -0.1357, -0.1018,  1.3273],
          [-1.6775,  0.0770, -0.4906, -0.5011, -0.1115],
          [-0.3862,  0.5039, -1.3555, -1.6612, -0.2368],
          [-0.1960,  1.2312, -0.9096, -1.5027, -0.3757]],

         [[ 1.1961, -2.2081, -0.7627, -0.4595,  0.2581],
          [ 1.5618,  0.3051, -0.0406, -1.3396,  0.1423],
          [-0.3616, -1.6760,  0.7677,  1.4149, -0.9568],
          [ 0.4904, -1.0562, -0.5624, -0.8620,  0.0377],
          [-0.2996, -0.4733, -1.4669,  0.2975, -0.1048]]],


        [[[ 0.0209,  0.1161,  0.0762,  0.0722, -1.4618],
          [-1.0744, -0.7705,  0.2472, -0.0775,  0.2777],
          [-1.4294, -0.

So sometimes the RGB channels are in dimension 0, and sometimes they are in dimension 1. But we can generalize by counting from the end: they are always in dimension `-3`, the third from the end. The lazy, unweighted [mean](https://pytorch.org/docs/master/generated/torch.mean.html) (R+G+B)/3 can thus be written as follows:

In [77]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)
img_gray_naive, img_gray_naive.shape

(tensor([[-0.3277, -1.1539, -1.1902,  0.2286, -0.5471],
         [-0.2389,  0.1263, -0.0489, -0.5442,  1.6516],
         [-0.3136, -0.0216, -0.5037,  0.0159, -0.2877],
         [-0.2175,  0.2598,  0.1102, -1.2971,  0.8837],
         [-0.0209,  0.3123,  0.5876,  0.1195,  1.3702]]), torch.Size([5, 5]))

In [78]:
 batch_gray_naive, batch_gray_naive.shape

(tensor([[[-0.3209, -1.2883, -0.7799, -0.6141, -1.1447],
          [-0.4899, -0.2592, -0.2831,  0.0829,  0.6091],
          [-0.4414, -0.7657,  0.4806,  0.3599, -0.7550],
          [-0.4005, -0.3678, -0.9416, -0.7305,  0.0071],
          [-0.2807,  0.0592, -1.1765, -0.4222, -0.0064]],
 
         [[-0.7560,  0.0222,  0.3546, -0.0079,  0.0919],
          [-0.3747,  0.0023, -0.5811,  0.4492,  0.3752],
          [-0.8400, -0.2472,  0.0501,  0.1214,  0.5874],
          [ 0.6574, -0.4401, -0.0478, -0.3889,  0.1018],
          [ 0.0660,  0.3989, -0.5814,  0.1007, -0.2319]]]),
 torch.Size([2, 5, 5]))

But now we have the weights, too. PyTorch will allow us to multiply things that are the same shape, as well as shapes where one operand is of size 1 in a given dimension. It also prepends leading dimensions of size 1 automatically. This is a feature called [broadcasting](https://stackoverflow.com/questions/51371070/how-does-pytorch-broadcasting-work). `batch_t` of shape `(2, 3, 5, 5)` is multiplied by `unsqueezed_weights` of shape `(3, 1, 1)`, resulting in a tensor of shape `(2, 3, 5, 5)`, from which we can then sum the third dimension from the end (the three channels):

In [79]:
weights

tensor([0.2126, 0.7152, 0.0722])

In [80]:
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze_(-1)
unsqueezed_weights

tensor([[[0.2126]],

        [[0.7152]],

        [[0.0722]]])

In [81]:
img_weights = (img_t * unsqueezed_weights)
img_weights

tensor([[[-1.0435e-01, -1.9635e-01, -3.9916e-01, -2.3924e-02,  8.1233e-02],
         [ 3.7418e-01,  4.1847e-01,  1.3974e-03, -2.5457e-01,  5.1820e-01],
         [-4.0843e-01, -8.8961e-02, -3.4267e-01, -7.8228e-02, -2.3596e-01],
         [-4.4076e-01,  5.6331e-02, -3.3562e-02, -2.6202e-01, -2.4212e-02],
         [-1.7163e-01,  2.4293e-01, -1.1361e-01,  4.7432e-02,  1.9222e-01]],

        [[-1.5363e-01, -2.5548e-01, -4.9585e-01,  4.3736e-01, -2.4994e-01],
         [-9.5968e-01, -1.2940e+00, -9.6168e-02, -1.1982e-01,  6.9342e-01],
         [ 2.2722e-01,  2.6323e-01,  1.1997e+00,  1.4396e+00, -4.6411e-01],
         [ 1.4900e+00,  1.4773e-01,  1.9842e-01, -3.3189e-01,  1.8986e+00],
         [ 3.5423e-02, -6.8604e-01,  8.5939e-01,  7.7458e-01,  9.2238e-01]],

        [[-2.0034e-02, -1.5747e-01, -7.2195e-02,  1.3483e-02, -1.2085e-01],
         [-8.1943e-02,  1.5865e-02, -1.3617e-03, -1.9317e-02,  1.1175e-01],
         [ 4.7832e-02, -1.0461e-03, -1.1384e-01, -1.1532e-01,  6.4671e-02],
        

The same multiplication works for the batch: `batch_t` of shape `(2, 3, 5, 5)` is multiplied by `unsqueezed_weights` of shape `(3, 1, 1)`, resulting in a tensor of shape `(2, 3, 5, 5)`:

In [82]:
batch_weights = (batch_t * unsqueezed_weights)
batch_weights, batch_weights.shape

(tensor([[[[-0.4733, -0.4349, -0.1320,  0.1164, -0.3761],
           [-0.4671, -0.0416, -0.1431,  0.3593,  0.0760],
           [ 0.1520, -0.1484,  0.2476,  0.0353, -0.2544],
           [-0.2776, -0.1171, -0.1928,  0.0705,  0.0469],
           [-0.0737, -0.1234, -0.2451, -0.0131,  0.0981]],
 
          [[ 0.0483,  0.2781, -0.6837, -1.3806, -1.3756],
           [-0.5968, -0.6343, -0.0970, -0.0728,  0.9493],
           [-1.1997,  0.0551, -0.3509, -0.3584, -0.0797],
           [-0.2762,  0.3604, -0.9694, -1.1881, -0.1694],
           [-0.1402,  0.8806, -0.6506, -1.0747, -0.2687]],
 
          [[ 0.0864, -0.1594, -0.0551, -0.0332,  0.0186],
           [ 0.1128,  0.0220, -0.0029, -0.0967,  0.0103],
           [-0.0261, -0.1210,  0.0554,  0.1022, -0.0691],
           [ 0.0354, -0.0763, -0.0406, -0.0622,  0.0027],
           [-0.0216, -0.0342, -0.1059,  0.0215, -0.0076]]],
 
 
         [[[ 0.0045,  0.0247,  0.0162,  0.0154, -0.3108],
           [-0.2284, -0.1638,  0.0525, -0.0165,  0.0590],
  

We can then sum over the third dimension from the end (the three channels):

In [83]:
img_gray_weighted = img_weights.sum(-3)
img_gray_weighted, img_gray_weighted.shape

(tensor([[-0.2780, -0.6093, -0.9672,  0.4269, -0.2896],
         [-0.6674, -0.8597, -0.0961, -0.3937,  1.3234],
         [-0.1334,  0.1732,  0.7432,  1.2461, -0.6354],
         [ 1.0014,  0.2263,  0.1801, -0.7524,  1.8823],
         [-0.0860, -0.3887,  0.8249,  0.7536,  1.2530]]), torch.Size([5, 5]))

In [84]:
batch_gray_weighted = batch_weights.sum(-3)
batch_gray_weighted, batch_gray_weighted.shape

(tensor([[[-0.3386, -0.3162, -0.8708, -1.2973, -1.7330],
          [-0.9511, -0.6539, -0.2430,  0.1898,  1.0356],
          [-1.0739, -0.2143, -0.0478, -0.2210, -0.4033],
          [-0.5184,  0.1670, -1.2029, -1.1798, -0.1198],
          [-0.2355,  0.7230, -1.0016, -1.0663, -0.1782]],
 
         [[-0.9814,  0.5713, -0.0379,  0.4126, -0.0786],
          [ 0.1226,  0.2100, -1.5838,  0.8498,  0.4202],
          [-1.4340, -0.1109,  0.9596, -0.1831,  0.8977],
          [ 0.3783, -0.5823, -0.7482, -0.9408,  0.2297],
          [ 0.9486,  0.0388, -1.4439, -1.3740,  0.0971]]]),
 torch.Size([2, 5, 5]))

In [85]:
batch_weights.shape, batch_t.shape, unsqueezed_weights.shape

(torch.Size([2, 3, 5, 5]), torch.Size([2, 3, 5, 5]), torch.Size([3, 1, 1]))
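
The shape rule behind these results can be sketched in plain Python. The helper `broadcast_shape` below is hypothetical (written only for illustration, not part of PyTorch); it aligns two shapes from the trailing dimension, treats missing leading dimensions as size 1, and requires each pair of sizes to be equal or to contain a 1:

```python
def broadcast_shape(a, b):
    """Return the shape obtained by broadcasting shapes a and b together."""
    result = []
    # Walk both shapes from the last dimension toward the first.
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"sizes {da} and {db} cannot be broadcast")
        # A size of 1 stretches to match the other operand.
        result.append(db if da == 1 else da)
    return tuple(reversed(result))


print(broadcast_shape((2, 3, 5, 5), (3, 1, 1)))
print(broadcast_shape((3, 5, 5), (3, 1, 1)))
```

Under this rule, multiplying by a plain `weights` tensor of shape `(3,)` would fail, because its single dimension would be aligned with the columns (size 5) rather than the channels. That is why the weights are unsqueezed first.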

Because this gets messy quickly—and for the sake of efficiency—the PyTorch function `einsum` (adapted from NumPy) specifies an indexing mini-language giving index names to dimensions for sums of such products. As often in Python, broadcasting—a form of summarizing unnamed things—is done using three dots '...'; but don’t worry too much about `einsum`, because we will not use it in the following:

In [86]:
img_gray_weighted_fancy = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_weighted_fancy = torch.einsum('...chw,c->...hw', batch_t, weights)
batch_gray_weighted_fancy.shape

torch.Size([2, 5, 5])
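
As a sanity check, the `einsum` expression should agree with the explicit multiply-and-sum version. The sketch below uses deterministic dummy data instead of `randn` so the comparison is reproducible; the helper names `gray_explicit` and `gray_einsum` are made up for this illustration:

```python
import torch

weights = torch.tensor([0.2126, 0.7152, 0.0722])
img_t = torch.arange(75, dtype=torch.float32).reshape(3, 5, 5)  # [channels, rows, columns]
batch_t = torch.stack([img_t, 2 * img_t])                       # [batch, channels, rows, columns]


def gray_explicit(t):
    # Reshape the weights to (3, 1, 1), broadcast, then sum over the channel dimension.
    return (t * weights[:, None, None]).sum(-3)


def gray_einsum(t):
    # 'c' is summed over because it does not appear on the right-hand side.
    return torch.einsum('...chw,c->...hw', t, weights)


img_match = torch.allclose(gray_explicit(img_t), gray_einsum(img_t), atol=1e-5)
batch_match = torch.allclose(gray_explicit(batch_t), gray_einsum(batch_t), atol=1e-5)
print(img_match, batch_match, gray_einsum(batch_t).shape)
```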

As we can see, there is quite a lot of bookkeeping involved. This is error-prone, especially when the locations where tensors are created and used are far apart in our code. This has caught the eye of practitioners, and so it has been suggested that the dimension be given a name instead.
PyTorch 1.3 added named tensors as an experimental feature (see https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html and https://pytorch.org/docs/stable/named_tensor.html). Tensor factory functions such as `tensor` and `rand` take a `names` argument. The names should be a sequence of strings:

In [87]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])
weights_named

tensor([0.2126, 0.7152, 0.0722], names=('channels',))

Note: The PyTorch documentation warns that named tensors and all their associated APIs are an experimental feature and subject to change, and advises against using them for anything important until they are released as stable.

When we already have a tensor and want to add names (but not change existing ones), we can call the method `refine_names` on it. Similar to indexing, the ellipsis (`...`) allows you to leave out any number of dimensions. With the sibling method `rename`, you can also overwrite or drop (by passing in `None`) existing names:


In [88]:
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')
batch_named = batch_t.refine_names(..., 'channels', 'rows', 'columns')
print("img named:", img_named.shape, img_named.names)
print("batch named:", batch_named.shape, batch_named.names)

img named: torch.Size([3, 5, 5]) ('channels', 'rows', 'columns')
batch named: torch.Size([2, 3, 5, 5]) (None, 'channels', 'rows', 'columns')


For operations with two inputs, in addition to the usual dimension checks—whether sizes are the same, or if one is 1 and can be broadcast to the other—PyTorch will now check the names for us. So far, it does not automatically align dimensions, so we need to do this explicitly. The method `align_as` returns a tensor with missing dimensions added and existing ones permuted to the right order:

In [89]:
weights_aligned = weights_named.align_as(img_named)
weights_aligned.shape, weights_aligned.names

(torch.Size([3, 1, 1]), ('channels', 'rows', 'columns'))

Functions accepting dimension arguments, like `sum`, also take named dimensions:

In [90]:
gray_named = (img_named * weights_aligned).sum('channels')
gray_named.shape, gray_named.names

(torch.Size([5, 5]), ('rows', 'columns'))

If we try to combine dimensions with different names, we get an error:

In [91]:
#gray_named = (img_named[..., :3] * weights_named).sum('channels')

If we want to use tensors outside functions that operate on named tensors, we need to drop the names by renaming them to None. The following gets us back into the world of unnamed dimensions:

In [92]:
gray_plain = gray_named.rename(None)
gray_plain.shape, gray_plain.names

(torch.Size([5, 5]), (None, None))


Given the experimental nature of this feature at the time of writing, and to avoid mucking around with indexing and alignment, we will stick to unnamed in the remainder of the book. Named tensors have the potential to eliminate many sources of alignment errors, which—if the PyTorch forum is any indication—can be a source of headaches. It will be interesting to see how widely they will be adopted.

## Tensor element types

So far, we have covered the basics of how tensors work, but we have not yet touched on what kinds of numeric types we can store in a Tensor. As we hinted at in section 3.2, using the standard Python numeric types can be suboptimal for several reasons:
- Numbers in Python are objects. Whereas a floating-point number might require only, for instance, 32 bits to be represented on a computer, Python will convert it into a full-fledged Python object with reference counting, and so on. This operation, called boxing, is not a problem if we need to store a small number of numbers, but allocating millions gets very inefficient.
- Lists in Python are meant for sequential collections of objects. There are no operations defined for, say, efficiently taking the dot product of two vectors, or summing vectors together. Also, Python lists have no way of optimizing the layout of their contents in memory, as they are indexable collections of pointers to Python objects (of any kind, not just numbers). Finally, Python lists are one-dimensional, and although we can create lists of lists, this is again very inefficient.
- The Python interpreter is slow compared to optimized, compiled code. Performing mathematical operations on large collections of numerical data can be much faster using optimized code written in a compiled, low-level language like C.

For these reasons, data science libraries rely on NumPy or introduce dedicated data structures like PyTorch tensors, which provide efficient low-level implementations of numerical data structures and related operations on them, wrapped in a convenient high-level API. To enable this, the objects within a tensor must all be numbers of the same type, and PyTorch must keep track of this numeric type.

### Specifying the numeric type with dtype


The `dtype` argument to tensor constructors (that is, functions like `tensor`, `zeros`, and `ones`) specifies the numerical data type (the "d" in `dtype`) that will be contained in the tensor. The data type specifies the possible values the tensor can hold (integers versus floating-point numbers) and the number of bytes per value. The `dtype` argument is deliberately similar to the standard NumPy argument of the same name. Here's a list of the most commonly used values for the `dtype` argument:
- `torch.float32` or `torch.float`: 32-bit floating-point
- `torch.float64` or `torch.double`: 64-bit, double-precision floating-point
- `torch.float16` or `torch.half`: 16-bit, half-precision floating-point
- `torch.int8`: signed 8-bit integers
- `torch.uint8`: unsigned 8-bit integers
- `torch.int16` or `torch.short`: signed 16-bit integers
- `torch.int32` or `torch.int`: signed 32-bit integers
- `torch.int64` or `torch.long`: signed 64-bit integers
- `torch.bool`: Boolean

The default data type for tensors is 32-bit floating-point.
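
The number of bytes per value for each of these types can be inspected with `element_size()`. This is a minimal sketch, assuming the default dtype has not been changed with `torch.set_default_dtype`:

```python
import torch

dtypes = [torch.float16, torch.float32, torch.float64,
          torch.int8, torch.uint8, torch.int16, torch.int32, torch.int64,
          torch.bool]

# element_size() reports how many bytes one element of the tensor occupies.
sizes = {dtype: torch.zeros(1, dtype=dtype).element_size() for dtype in dtypes}
for dtype, nbytes in sizes.items():
    print(dtype, nbytes)

# Without a dtype argument, floating-point factory functions use the default dtype.
default_dtype = torch.zeros(1).dtype
print(default_dtype)
```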

### A `dtype` for every occasion

As we will see in future chapters, computations happening in neural networks are typically executed with 32-bit floating-point precision. Higher precision, like 64-bit, will not buy improvements in the accuracy of a model and will require more memory and computing time. The 16-bit floating-point, half-precision data type is not present natively in standard CPUs, but it is offered on modern GPUs. It is possible to switch to half-precision to decrease the footprint of a neural network model if needed, with a minor impact on accuracy.


Tensors can be used as indexes in other tensors. In this case, PyTorch expects indexing tensors to have a 64-bit integer data type. Creating a tensor with integers as arguments, such as using `torch.tensor([2, 2])`, will create a 64-bit integer tensor by default. As such, we’ll spend most of our time dealing with `float32` and `int64`.

Finally, predicates on tensors, such as `points > 1.0`, produce `bool` tensors indicating whether each individual element satisfies the condition. These are the numeric types in a nutshell.
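
A short sketch of both ideas, using the `points` tensor from earlier: a predicate yields a `bool` tensor that can be used as a mask, and a tensor built from Python integers is `int64` and can be used to pick rows:

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

mask = points > 1.0                # element-wise predicate, dtype torch.bool
selected_values = points[mask]     # 1D tensor of the elements where mask is True

idx = torch.tensor([2, 0])         # integer arguments give an int64 tensor
selected_rows = points[idx]        # rows 2 and 0, in that order

print(mask.dtype, idx.dtype)
print(selected_values)
print(selected_rows)
```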

### Managing a tensor’s `dtype` attribute

In order to allocate a tensor of the right numeric type, we can specify the proper `dtype` as an argument to the constructor. For example:

In [93]:
double_points = torch.ones(10, 2, dtype=torch.double)
short_points = torch.tensor([[1, 2], [3, 4]], dtype=torch.short)

We can find out about the dtype for a tensor by accessing the corresponding attribute:

In [94]:
short_points.dtype

torch.int16

We can also cast the output of a tensor creation function to the right type using the corresponding casting method, such as

In [95]:
double_points = torch.zeros(10, 2).double()
short_points = torch.ones(10, 2).short()
double_points.dtype, short_points.dtype

(torch.float64, torch.int16)

or the more convenient `to` method:

In [96]:
double_points = torch.zeros(10, 2).to(torch.double)
short_points = torch.ones(10, 2).to(dtype=torch.short)
double_points.dtype, short_points.dtype

(torch.float64, torch.int16)

## The tensor API

It is worth taking a look at the tensor operations that PyTorch offers. It would be of little use to list them all here. Instead, we’re going to get a general feel for the API and establish a few directions on where to find things in the online documentation at http://pytorch.org/docs.

First, the vast majority of operations on and between tensors are available in the torch module and can also be called as methods of a tensor object.

For instance, the transpose function we encountered earlier can be used from the torch module:

In [97]:
a = torch.ones(3, 2)
a_t = torch.transpose(a, 0, 1)
a.shape, a_t.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

or as a method of the `a` tensor:

In [98]:
a = torch.ones(3, 2)
a_t = a.transpose(0, 1)
a.shape, a_t.shape

(torch.Size([3, 2]), torch.Size([2, 3]))

There is no difference between the two forms; they can be used interchangeably.

The PyTorch docs are exhaustive and well organized, with the tensor operations divided into groups:

- *Creation ops*: Functions for constructing a tensor, like `ones` and `from_numpy`
- *Indexing, slicing, joining, mutating ops*: Functions for changing the shape, stride, or content of a tensor, like `transpose`
- *Math ops*: Functions for manipulating the content of the tensor through computations

    * *Pointwise ops*: Functions for obtaining a new tensor by applying a function to each element independently, like `abs` and `cos`
    * *Reduction ops*: Functions for computing aggregate values by iterating through tensors, like `mean`, `std`, and `norm`
    * *Comparison ops*: Functions for evaluating numerical predicates over tensors, like `equal` and `max`
    * *Spectral ops*: Functions for transforming in and operating in the frequency domain, like `stft` and `hamming_window`
    * *Other operations*: Special functions operating on vectors, like `cross`, or matrices, like `trace`
    * *BLAS and LAPACK operations*: Functions following the Basic Linear Algebra Subprograms (BLAS) specification for scalar, vector-vector, matrix-vector, and matrix-matrix operations

- *Random sampling*: Functions for generating values by drawing randomly from probability distributions, like `randn` and `normal`
- *Serialization*: Functions for saving and loading tensors, like `load` and `save`
- *Parallelism*: Functions for controlling the number of threads for parallel CPU execution, like `set_num_threads`

### Indexing into storage

Let’s see how indexing into the storage works in practice with our 2D points. The storage for a given tensor is accessible using the `.storage()` method:

In [99]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points.storage()

 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.FloatStorage of size 6]

Even though the tensor reports itself as having three rows and two columns, the storage under the hood is a contiguous array of size 6. In this sense, the tensor just knows how to translate a pair of indices into a location in the storage.

We can also index into a storage manually. For instance:

In [100]:
points_storage = points.storage()
points_storage[0]

4.0

In [101]:
points.storage()[1]

1.0

We can’t index a storage of a 2D tensor using two indices. The layout of a storage is always one-dimensional, regardless of the dimensionality of any and all tensors that might refer to it.

At this point, it shouldn’t come as a surprise that changing the value of a storage leads to changing the content of its referring tensor:

In [102]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points)
points_storage = points.storage()
points_storage[0] = 2.0
print(points)

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
tensor([[2., 1.],
        [5., 3.],
        [2., 1.]])


### Modifying stored values: In-place operations

In addition to the operations on tensors introduced in the previous section, a small number of operations exist only as methods of the `Tensor` object. They are recognizable from a trailing underscore in their name, like `zero_` , which indicates that the method operates in place by modifying the input instead of creating a new output tensor and returning it. For instance, the `zero_` method zeros out all the elements of the input. Any method **without** the trailing underscore leaves the source tensor unchanged and instead returns a new tensor:

In [103]:
a = torch.ones(3, 2)
a

tensor([[1., 1.],
        [1., 1.],
        [1., 1.]])

In [104]:
a.zero_()
a

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])
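
The contrast with the non-underscore variant can be sketched with `add` and `add_`, which are used here only as an illustrative pair. The out-of-place method leaves `a` untouched and returns a new tensor, while the in-place method modifies `a` and returns a tensor backed by the same memory:

```python
import torch

a = torch.ones(3, 2)

b = a.add(1.0)                                     # out of place: new tensor
a_unchanged = torch.equal(a, torch.ones(3, 2))     # a still holds ones
b_is_separate = b.data_ptr() != a.data_ptr()       # b has its own memory

c = a.add_(1.0)                                    # in place: modifies a
a_modified = torch.equal(a, torch.full((3, 2), 2.0))
c_shares_memory = c.data_ptr() == a.data_ptr()     # result refers to a's memory

print(a_unchanged, b_is_separate, a_modified, c_shares_memory)
```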

### Tensor metadata: Size, offset, and stride

In order to index into a storage, tensors rely on a few pieces of information that, together with their storage, unequivocally define them: ***size, offset, and stride***.

The `size` (or `shape`, in NumPy parlance) is a tuple indicating how many elements across each dimension the tensor represents.

The `storage` offset is the index in the storage corresponding to the first element in the tensor.

The `stride` is the number of elements in the storage that need to be skipped over to obtain the next element along each dimension.


### Views of another tensor’s storage

We can get the second point in the tensor by providing the corresponding index:

In [105]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
second_point = points[1]
second_point.storage_offset()

2

In [106]:
second_point.size()

torch.Size([2])

The resulting tensor has `offset` 2 in the storage (since we need to skip the first point, which has two items), and the size is an instance of the `Size` class containing one element, since the tensor is one-dimensional.

It’s important to note that this is the same information contained in the `shape` property of tensor objects:

In [107]:
second_point.shape

torch.Size([2])

The `stride` is a tuple indicating the number of elements in the storage that have to be skipped when the index is increased by 1 in each dimension. For instance, our `points` tensor has a stride of `(2, 1)`:


In [108]:
points.stride()

(2, 1)

Accessing an element `i`, `j` in a 2D tensor results in accessing the `storage_offset + stride[0] * i + stride[1] * j` element in the storage. The offset will usually be zero; if this tensor is a view of a storage created to hold a larger tensor, the offset might be a positive value.
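
The formula can be traced by hand with a plain Python list standing in for the storage. The helper `storage_index` is hypothetical, written only to illustrate the arithmetic; it is not a PyTorch function:

```python
def storage_index(index, stride, storage_offset=0):
    """Map a multi-dimensional index to a position in the flat storage."""
    return storage_offset + sum(i * s for i, s in zip(index, stride))


# Flat storage behind the 3x2 points tensor.
storage = [4.0, 1.0, 5.0, 3.0, 2.0, 1.0]

# points has stride (2, 1) and offset 0.
x_of_second_point = storage[storage_index((1, 0), (2, 1))]
y_of_third_point = storage[storage_index((2, 1), (2, 1))]

# second_point = points[1] has stride (1,) and offset 2.
y_via_subtensor = storage[storage_index((1,), (1,), storage_offset=2)]

print(x_of_second_point, y_of_third_point, y_via_subtensor)
```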

This indirection between `Tensor` and `Storage` makes some operations inexpensive, like transposing a tensor or extracting a subtensor, because they do not lead to memory reallocations. Instead, they consist of allocating a new Tensor object with a different value for `size`, `storage` offset, or `stride`.

We already extracted a subtensor when we indexed a specific point and saw the `storage` offset increasing. Let’s see what happens to the `size` and `stride` as well:

In [109]:
print(points)
second_point = points[1]
second_point.size()

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])


torch.Size([2])

In [110]:
second_point.storage_offset()

2

In [111]:
second_point.stride()

(1,)

The bottom line is that the subtensor has one less dimension, as we would expect, while still indexing the same storage as the original `points` tensor. This also means changing the subtensor will have a side effect on the original tensor:

In [112]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points)
second_point = points[1]
second_point[0] = 10.0
print(points)

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
tensor([[ 4.,  1.],
        [10.,  3.],
        [ 2.,  1.]])



This might not always be desirable, so when we need an independent copy we can clone the subtensor into a new tensor:

In [113]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points)
second_point = points[1].clone()
second_point[0] = 10.0
print(points)

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])


### Transposing without copying

Let’s try transposing now. Let’s take our `points` tensor, which has individual points in the rows and `X` and `Y` coordinates in the columns, and turn it around so that individual points are in the columns. We take this opportunity to introduce the `t` function, a shorthand alternative to `transpose` for two-dimensional tensors:

In [114]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])

In [115]:
points_t = points.t() #transpose
points_t

tensor([[4., 5., 2.],
        [1., 3., 1.]])


We can easily verify that the two tensors share the same storage


In [116]:
points.storage().data_ptr() == points_t.storage().data_ptr() # same memory address; id() on temporary storage objects is not a reliable check

True

and that they differ only in shape and stride:

In [117]:
points.stride()

(2, 1)

In [118]:
points_t.stride()

(1, 2)

This tells us that increasing the first index by one in `points` (for example, going from `points[0,0]` to `points[1,0]`) will skip along the storage by two elements, while increasing the second index (from `points[0,0]` to `points[0,1]`) will skip along the storage by one. In other words, the storage holds the elements in the tensor sequentially row by row.

To transpose `points` into `points_t`, we only change the order of the elements in the `stride`. After that, increasing the row (the first index of the tensor) will skip along the storage by one, just like when we were moving along columns in `points`. This is the very definition of transposing. No new memory is allocated: transposing is obtained only by creating a new `Tensor` instance with a different `stride` ordering than the original.
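To make the stride swap concrete, here is a small pure-Python sketch that reads one flat list through two different size and stride pairs. It is an illustration under simplifying assumptions, not how PyTorch is implemented; `storage` and `read_2d` are hypothetical names.

```python
# Flat list shared by both "tensors" (a stand-in for a real Storage).
storage = [4.0, 1.0, 5.0, 3.0, 2.0, 1.0]

def read_2d(storage, size, stride, offset=0):
    """Materialize a 2D view of the flat storage as nested lists."""
    rows, cols = size
    return [
        [storage[offset + i * stride[0] + j * stride[1]] for j in range(cols)]
        for i in range(rows)
    ]

points = read_2d(storage, size=(3, 2), stride=(2, 1))
# Transposing swaps size and stride; the storage itself is untouched.
points_t = read_2d(storage, size=(2, 3), stride=(1, 2))

print(points)    # [[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]]
print(points_t)  # [[4.0, 5.0, 2.0], [1.0, 3.0, 1.0]]
```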

### Transposing in higher dimensions

Transposing in PyTorch is not limited to matrices. We can transpose a multidimensional array by specifying the two dimensions along which transposing (flipping shape and stride) should occur:

In [119]:
some_t = torch.ones(3, 4, 5) # 3 matrices with 4 rows and 5 columns -> 3, 4, 5
some_t, some_t.shape, some_t.stride()

(tensor([[[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]],
 
         [[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]],
 
         [[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]]]), torch.Size([3, 4, 5]), (20, 5, 1))

In [120]:
transpose_t = some_t.transpose(0, 1) #flip 3 and 4 to get 4 matrices with 3 rows and 5 columns -> 4, 3, 5
transpose_t, transpose_t.shape, transpose_t.stride()

(tensor([[[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]],
 
         [[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]],
 
         [[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]],
 
         [[1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.],
          [1., 1., 1., 1., 1.]]]), torch.Size([4, 3, 5]), (5, 20, 1))

In [121]:
transpose_t = some_t.transpose(0, 2) #flip 3 and 5 to get 5 matrices with 4 rows and 3 columns -> 5, 4, 3
transpose_t, transpose_t.shape, transpose_t.stride()

(tensor([[[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],
 
         [[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],
 
         [[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],
 
         [[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]],
 
         [[1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.],
          [1., 1., 1.]]]), torch.Size([5, 4, 3]), (1, 5, 20))

In [122]:
transpose_t = some_t.transpose(1, 2) #flip 4 and 5 to get 3 matrices with 5 rows and 4 columns -> 3, 5, 4
transpose_t, transpose_t.shape, transpose_t.stride()

(tensor([[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]],
 
         [[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]],
 
         [[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]]), torch.Size([3, 5, 4]), (20, 1, 5))
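The shapes and strides printed above follow a simple rule, which the pure-Python sketch below reproduces. The helpers `contiguous_strides` and `transpose_meta` are hypothetical names used for illustration; they only manipulate tuples and never touch tensor data.

```python
def contiguous_strides(shape):
    """Strides of a freshly allocated, row-major tensor of the given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))

def transpose_meta(shape, stride, dim0, dim1):
    """Swap two dimensions in both shape and stride; no data is moved."""
    shape, stride = list(shape), list(stride)
    shape[dim0], shape[dim1] = shape[dim1], shape[dim0]
    stride[dim0], stride[dim1] = stride[dim1], stride[dim0]
    return tuple(shape), tuple(stride)

shape = (3, 4, 5)
stride = contiguous_strides(shape)
print(stride)                               # (20, 5, 1)
print(transpose_meta(shape, stride, 0, 1))  # ((4, 3, 5), (5, 20, 1))
print(transpose_meta(shape, stride, 0, 2))  # ((5, 4, 3), (1, 5, 20))
print(transpose_meta(shape, stride, 1, 2))  # ((3, 5, 4), (20, 1, 5))
```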

### Contiguous tensors

A tensor whose values are laid out in the storage starting from the rightmost dimension onward (that is, moving along rows for a 2D tensor) is defined as `contiguous`. Contiguous tensors are convenient because we can visit them efficiently in order without jumping around in the storage (improving data locality improves performance because of the way memory access works on modern CPUs). This advantage of course depends on the way algorithms visit it.

Some tensor operations in PyTorch only work on contiguous tensors, such as `view`, which we’ll encounter in the next chapter. In that case, PyTorch will throw an informative exception and require us to call `contiguous` explicitly. It’s worth noting that calling `contiguous` will do nothing (and will not hurt performance) if the tensor is already contiguous.

In our case, points is contiguous, while its transpose is not:


In [123]:
points.is_contiguous()

True

In [124]:
points_t.is_contiguous()

False

We can obtain a new contiguous tensor from a non-contiguous one using the contiguous method. The content of the tensor will be the same, but the stride will change, as will the storage:

In [125]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
print(points)
points_t = points.t()
print(points_t)

tensor([[4., 1.],
        [5., 3.],
        [2., 1.]])
tensor([[4., 5., 2.],
        [1., 3., 1.]])


In [126]:
points_t.storage()

 4.0
 1.0
 5.0
 3.0
 2.0
 1.0
[torch.FloatStorage of size 6]

In [127]:
points_t.stride()

(1, 2)

In [128]:
points_t_cont = points_t.contiguous() #new contiguous tensor
points_t_cont

tensor([[4., 5., 2.],
        [1., 3., 1.]])

In [129]:
points_t_cont.storage()

 4.0
 5.0
 2.0
 1.0
 3.0
 1.0
[torch.FloatStorage of size 6]

In [130]:
points_t_cont.stride()

(3, 1)

Notice that the `storage` has been reshuffled in order for elements to be laid out row by row in the new storage. The `stride` has been changed to reflect the new layout.
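As a quick illustration of why contiguity matters in practice, the sketch below tries to flatten the transposed tensor with `view`, which is expected to fail on this layout, and then succeeds after calling `contiguous`. The variable names `view_failed` and `flat` are ours, chosen for the example.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_t = points.t()  # non-contiguous view with stride (1, 2)

try:
    points_t.view(6)  # cannot flatten this stride layout without copying
    view_failed = False
except RuntimeError as err:
    view_failed = True
    print("view failed:", err)

# After contiguous(), the elements are laid out row by row in a new storage,
# so view can reinterpret them as a flat tensor.
flat = points_t.contiguous().view(6)
print(flat)  # tensor([4., 5., 2., 1., 3., 1.])
```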


## Moving tensors to the GPU

So far in this chapter, when we’ve talked about storage, we’ve meant memory on the CPU. PyTorch tensors also can be stored on a different kind of processor: a graphics processing unit (GPU). Every PyTorch tensor can be transferred to (one of) the GPU(s) in order to perform massively parallel, fast computations. All operations that will be performed on the tensor will be carried out using GPU-specific routines that come with PyTorch.

### Managing a tensor’s device attribute

In addition to `dtype`, a PyTorch `Tensor` also has the notion of `device`, which is where on the computer the tensor data is placed. Here is how we can create a tensor on the GPU by specifying the corresponding argument to the constructor:

In [138]:
points_gpu = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]], device='cuda')

If the above line doesn't work on Google Colab because no CUDA-capable device is detected, refer to [this link](https://stackoverflow.com/a/55723109/3252158).

We could instead copy a tensor created on the CPU onto the GPU using the `to` method:

In [133]:
points_gpu = points.to(device='cuda')

Doing so returns a new tensor that has the same numerical data, but stored in the RAM of the GPU, rather than in regular system RAM. Now that the data is stored locally on the GPU, we’ll start to see the speedups mentioned earlier when performing mathematical operations on the tensor. In almost all cases, CPU- and GPU-based tensors expose the same user-facing API, making it much easier to write code that is agnostic to where, exactly, the heavy number crunching is running.
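One common way to write such device-agnostic code is to pick the device once and pass it around. The following is a minimal sketch, assuming we want to fall back to the CPU when no CUDA device is available; the names `device`, `points_dev`, and `doubled` are ours.

```python
import torch

# Use the GPU when one is available, otherwise stay on the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points_dev = points.to(device=device)  # a copy on the chosen device (no copy if already there)

doubled = 2 * points_dev  # runs on whichever device holds points_dev
print(doubled.device)
```

The rest of the code does not need to know which branch was taken, which also means the same notebook runs on machines without a GPU.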

If our machine has more than one GPU, we can also decide on which GPU we allocate the tensor by passing a zero-based integer identifying the GPU on the machine, such as

In [134]:
points_gpu = points.to(device='cuda:0')

At this point, any operation performed on the tensor, such as multiplying all elements by a constant, is carried out on the GPU:

In [139]:
print(points)
points = 2 * points #Multiplication performed on the CPU
points_gpu = 2 * points.to(device='cuda') #Multiplication performed on the GPU
print(points)
print(points_gpu)

tensor([[32.,  8.],
        [40., 24.],
        [16.,  8.]])
tensor([[64., 16.],
        [80., 48.],
        [32., 16.]])
tensor([[128.,  32.],
        [160.,  96.],
        [ 64.,  32.]], device='cuda:0')


Note that the `points_gpu` tensor is not brought back to the CPU once the result has been computed. Here’s what happened in the line that assigns `points_gpu`:

1.  The `points` tensor is copied to the GPU.
2.  A new tensor is allocated on the GPU and used to store the result of the multiplication.
3.  A handle to that GPU tensor is returned.

Therefore, if we also add a constant to the result

In [140]:
points_gpu = points_gpu + 4

the addition is still performed on the GPU, and no information flows to the CPU (unless we print or access the resulting tensor). In order to move the tensor back to the CPU, we need to provide a `cpu` argument to the `to` method, such as

In [143]:
points_cpu = points_gpu.to(device='cpu')
print(points_cpu)

tensor([[132.,  36.],
        [164., 100.],
        [ 68.,  36.]])


We can also use the shorthand methods `cpu` and `cuda` instead of the `to` method to achieve the same goal:

In [144]:
points_gpu = points.cuda() #Defaults to GPU index 0
points_gpu = points.cuda(0) #GPU
points_cpu = points_gpu.cpu() #CPU

It’s also worth mentioning that by using the `to` method, we can change the placement and the data type simultaneously by providing both `device` and `dtype` as arguments.
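A short sketch of that combined call follows. It targets the CPU so that it runs on any machine; on a machine with a GPU, `device='cuda'` could be passed instead. The name `points_64` is ours.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

# Change placement and element type in a single call.
points_64 = points.to(device='cpu', dtype=torch.double)

print(points.dtype, points_64.dtype)  # torch.float32 torch.float64
print(points_64.device)               # cpu
```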

## NumPy interoperability

PyTorch tensors can be converted to NumPy arrays and vice versa very efficiently. By doing so, we can take advantage of the huge swath of functionality in the wider Python ecosystem that has built up around the NumPy array type. This zero-copy interoperability with NumPy arrays is due to the storage system working with the Python buffer protocol (https://docs.python.org/3/c-api/buffer.html).

To get a NumPy array out of our `points` tensor, we just call

In [145]:
points = torch.ones(3, 4)
points_np = points.numpy()
points_np

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]], dtype=float32)

which will return a NumPy multidimensional array of the right `size`, `shape`, and numerical `type`. Interestingly, the returned array shares the same underlying buffer with the tensor `storage`. This means the `numpy` method can be effectively executed at basically no cost, as long as the data sits in CPU RAM. It also means modifying the NumPy array will lead to a change in the originating tensor. If the tensor is allocated on the GPU, calling `numpy` directly raises an error; we first have to copy the tensor to the CPU (for example, with the `cpu` method) and then convert that copy.
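The buffer sharing is easy to observe: in the sketch below, a write through the NumPy array shows up in the tensor it came from. This assumes a CPU tensor, as in the cell above.

```python
import torch

points = torch.ones(3, 4)
points_np = points.numpy()  # no copy: same underlying buffer

points_np[0, 0] = 7.0       # modify through the NumPy array
print(points[0, 0])         # tensor(7.)
```

If an independent array is needed, cloning the tensor first (`points.clone().numpy()`) avoids this side effect.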

Conversely, we can obtain a PyTorch tensor from a NumPy array this way

In [146]:
points = torch.from_numpy(points_np)
points

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

which will use the same buffer-sharing strategy we just described.

Note: while the default numeric type in PyTorch is 32-bit floating-point, for NumPy it is 64-bit. As discussed in "*A `dtype` for every occasion*" in *Tensor Element Types*, we usually want to use 32-bit floating-points, so we need to make sure we have tensors of dtype `torch.float` after converting.
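The sketch below shows the dtype difference described in the note, and one way to get back to 32-bit floats. The names `arr`, `t64`, and `t32` are ours; note that the conversion to a different dtype allocates a new tensor, so `t32` no longer shares memory with the NumPy array.

```python
import numpy as np
import torch

arr = np.ones((3, 4))          # NumPy defaults to float64
t64 = torch.from_numpy(arr)    # shares memory with arr and keeps float64
t32 = t64.to(torch.float)      # converts to 32-bit floats (a new tensor)

print(arr.dtype, t64.dtype, t32.dtype)  # float64 torch.float64 torch.float32
```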

## Serializing tensors

Creating a tensor on the fly is all well and good, but if the data inside is valuable, we will want to save it to a file and load it back at some point. After all, we don’t want to have to retrain a model from scratch every time we start running our program! PyTorch uses `pickle` under the hood to serialize the `tensor` object, plus dedicated serialization code for the storage. Here’s how we can save our `points` tensor to an `ourpoints.t` file:


In [148]:
torch.save(points, '/content/ourpoints.t')

As an alternative, we can pass an open file object in lieu of the filename:

In [149]:
with open('/content/ourpoints.t','wb') as f:
  torch.save(points, f)

Loading our points back is similarly a one-liner

In [150]:
points = torch.load('/content/ourpoints.t')
print(points)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])


or, equivalently,

In [152]:
with open('/content/ourpoints.t','rb') as f:
  points = torch.load(f)

print(points)

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])
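Since `torch.save` and `torch.load` accept file-like objects as well as paths, the same round trip can be done without touching the disk. The sketch below uses an in-memory `io.BytesIO` buffer as a stand-in for a file opened in binary mode; the names `buffer` and `restored` are ours.

```python
import io
import torch

points = torch.ones(3, 4)

buffer = io.BytesIO()        # in-memory stand-in for a binary file
torch.save(points, buffer)

buffer.seek(0)               # rewind before reading back
restored = torch.load(buffer)

print(torch.equal(points, restored))  # True
```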


While we can quickly save tensors this way if we only want to load them with PyTorch, the file format itself is not interoperable: we can’t read the tensor with software other than PyTorch. Depending on the use case, this may or may not be a limitation, but we should learn how to save tensors interoperably for those times when it is. We’ll look next at how to do so.

### Serializing to HDF5 with h5py

Every use case is unique, but we suspect needing to save tensors interoperably will be more common when introducing PyTorch into existing systems that already rely on different libraries. New projects probably won’t need to do this as often.

For those cases when you need to, however, you can use the HDF5 format and library (www.hdfgroup.org/solutions/hdf5). HDF5 is a portable, widely supported format for representing serialized multidimensional arrays, organized in a nested key-value dictionary. Python supports HDF5 through the h5py library (www.h5py.org), which accepts and returns data in the form of NumPy arrays.

We can install h5py using

In [155]:
!pip install h5py



At this point, we can save our `points` tensor by converting it to a NumPy array (at no cost, as we noted earlier) and passing it to the `create_dataset` function:

In [157]:
import h5py
f = h5py.File('/content/ourpoints.hdf5', 'w')
dset = f.create_dataset('coords', data=points.numpy())
f.close()

Here '`coords`' is a key into the HDF5 file. We can have other keys—even nested ones. One of the interesting things in HDF5 is that we can index the dataset while on disk and access only the elements we’re interested in.

Let’s suppose we want to load just the last two points in our dataset:

In [158]:
f = h5py.File('/content/ourpoints.hdf5', 'r')
dset = f['coords']
last_points = dset[-2:]

The data is not loaded when the file is opened or the dataset is requested. Rather, the data stays on disk until we request the second and last rows in the dataset. At that point, `h5py` accesses those two rows and returns a NumPy array-like object encapsulating that region in the dataset, which behaves like a NumPy array and has the same API.

Owing to this fact, we can pass the returned object to the `torch.from_numpy` function to obtain a tensor directly. Note that in this case, the data is copied over to the tensor’s storage:

In [159]:
last_points = torch.from_numpy(dset[-2:])
f.close()

Once we’re finished loading data, we close the file. Closing the HDF5 file invalidates the datasets, and trying to access `dset` afterward will give an exception. As long as we stick to the order shown here, we are fine and can now work with the `last_points` tensor.
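One way to make that ordering harder to get wrong is to open the file in a `with` block, so it is closed automatically once the data has been copied into a tensor. The sketch below assumes `h5py` is installed and writes to a temporary directory rather than `/content`; the `path` variable is ours.

```python
import os
import tempfile

import h5py
import torch

points = torch.ones(3, 4)
path = os.path.join(tempfile.mkdtemp(), 'ourpoints.hdf5')  # scratch location for the example

# Write the tensor under the 'coords' key; the file closes when the block ends.
with h5py.File(path, 'w') as f:
    f.create_dataset('coords', data=points.numpy())

# Read only the last two rows and copy them into a tensor before the file closes.
with h5py.File(path, 'r') as f:
    last_points = torch.from_numpy(f['coords'][-2:])

print(last_points.shape)  # torch.Size([2, 4])
```

Because the slice is copied into the tensor inside the block, `last_points` remains usable after the file has been closed.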

## Conclusion

Now we have covered everything we need to get started with representing everything in floats. We’ll cover other aspects of tensors—such as creating views of tensors; indexing tensors with other tensors; and broadcasting, which simplifies performing element-wise operations between tensors of different sizes or shapes—as needed along the way.

In chapter 4, we will learn how to represent real-world data in PyTorch. We will start with simple tabular data and move on to something more elaborate. In the process, we will get to know more about tensors.


## Exercises


1. Create a tensor `a` from `list(range(9))`. Predict and then check the `size`, `offset`, and `stride`.

   

In [161]:
a = torch.Tensor(list(range(9)))
a

tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])

The size should be 9. Offset should be 0. Stride should be 1 because the tensor is one-dimensional.

In [166]:
a.size(), a.storage_offset(), a.stride()

(torch.Size([9]), 0, (1,))

a. Create a new tensor using `b = a.view(3, 3)`. What does `view` do? Check that `a` and `b` share the same storage.

In [168]:
b = a.view(3, 3)
b

tensor([[0., 1., 2.],
        [3., 4., 5.],
        [6., 7., 8.]])

The `view` shows us the same tensor with a different `shape`. To check that `a` and `b` share the same storage:

In [169]:
a.storage().data_ptr() == b.storage().data_ptr() # same memory address; id() on temporary storage objects is not a reliable check

True


b. Create a tensor `c = b[1:,1:]`. Predict and then check the `size`, `offset`, and `stride`.

In [170]:
c = b[1:,1:]
c

tensor([[4., 5.],
        [7., 8.]])

The size should be 2x2, offset should be 4, and stride should be (3, 1)

In [171]:
c.size(), c.storage_offset(), c.stride()

(torch.Size([2, 2]), 4, (3, 1))

2. Pick a mathematical operation like `cosine` or `square root`. Can you find a corresponding function in the torch library?


[Tan function](https://pytorch.org/docs/master/generated/torch.tan.html#torch-tan)

a. Apply the function element-wise to `a`. Why does it return an error?

In [172]:
a

tensor([0., 1., 2., 3., 4., 5., 6., 7., 8.])

In [177]:
import math

#print(math.tan(a))

ValueError: ignored

It returns an error because `math.tan` expects a single Python number, and only one-element tensors can be converted to Python scalars.

b. What operation is required to make the function work?

In [178]:
torch.tan(a)

tensor([ 0.0000,  1.5574, -2.1850, -0.1425,  1.1578, -3.3805, -0.2910,  0.8714,
        -6.7997])

c. Is there a version of your function that operates in place?

In [180]:
a_elementwise = []
for element in a:
  a_elementwise.append(math.tan(element))

print(a_elementwise)

[0.0, 1.5574077246549023, -2.185039863261519, -0.1425465430742778, 1.1578212823495775, -3.380515006246586, -0.29100619138474915, 0.8714479827243188, -6.799711455220379]
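The loop above builds a new Python list and leaves `a` untouched, so it is not an in-place operation. Following PyTorch's trailing-underscore convention, the in-place counterpart of `tan` is `tan_`. The sketch below recreates `a` with `torch.arange` so that it is self-contained; `expected` and `address_before` are names introduced for the check.

```python
import torch

a = torch.arange(9, dtype=torch.float)  # same values as in the exercise
expected = torch.tan(a)                 # out-of-place: returns a new tensor
address_before = a.data_ptr()

a.tan_()                                # in place: overwrites the values of a

print(torch.allclose(a, expected))      # True
print(a.data_ptr() == address_before)   # True: no new memory was allocated
```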


## Summary

- Neural networks transform floating-point representations into other floating-point representations. The starting and ending representations are typically human interpretable, but the intermediate representations are less so.
- These floating-point representations are stored in tensors.
- Tensors are multidimensional arrays; they are the basic data structure in PyTorch.
- PyTorch has a comprehensive standard library for tensor creation, manipulation, and mathematical operations.
- Tensors can be serialized to disk and loaded back.
- All tensor operations in PyTorch can execute on the CPU as well as on the GPU, with no change in the code.
- PyTorch uses a trailing underscore to indicate that a function operates in place on a tensor (for example, `Tensor.sqrt_`).