import torch
import torch.nn as nn

# 3.2 Multidimensional Arrays

In [6]:
a = [1.0,2.0,1.0]
# lists are mutable; tensor elements can be reassigned the same way
a[2] = 3.0
a

[1.0, 2.0, 3.0]

In [7]:
#now tensors
a = torch.ones(3)
a

tensor([1., 1., 1.])

In [8]:
a[1]

tensor(1.)

In [9]:
float(a[1])

1.0

In [10]:
a[2] = 2.0
a

tensor([1., 1., 2.])

 We can access an element using its zero-based
index or assign a new value to it. Although on the surface this example doesn’t differ
much from a list of number objects, under the hood things are completely different. 

Python lists or tuples of numbers are collections of Python objects that are individually
allocated in memory, as shown on the left in figure 3.3. PyTorch tensors or NumPy
arrays, on the other hand, are views over (typically) contiguous memory blocks containing unboxed C numeric types rather than Python objects. Each element is a 32-bit (4-byte)
float in this case, as we can see on the right side of figure 3.3. This means storing a 1D
tensor of 1,000,000 float numbers will require exactly 4,000,000 contiguous bytes, plus
a small overhead for the metadata (such as dimensions and numeric type).
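As a rough check of the storage arithmetic above, the following sketch asks a tensor for its element count and per-element size. The variable names are illustrative, and the numbers assume the default float32 dtype has not been changed.

```python
import torch

# One million float32 values, as in the text.
big = torch.ones(1_000_000)

bytes_per_element = big.element_size()              # size of one element in bytes
payload_bytes = big.nelement() * bytes_per_element  # metadata is not included

print(big.dtype, bytes_per_element, payload_bytes)
print(big.is_contiguous())
```

The metadata overhead (shape, dtype, and so on) lives in the tensor object itself and is not counted in this product.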

 Say we have a list of coordinates we’d like to use to represent a geometrical object:
perhaps a 2D triangle with vertices at coordinates (4, 1), (5, 3), and (2, 1). The
example is not particularly pertinent to deep learning, but it’s easy to follow. Instead
of having coordinates as numbers in a Python list, as we did earlier, we can use a one-dimensional tensor by storing Xs in the even indices and Ys in the odd indices,
like this:

In [12]:
points = torch.zeros(6)
points[0] = 4.0
points[1] = 1.0
points[2] = 5.0
points[3] = 3.0
points[4] = 2.0
points[5] = 1.0

# or
points = torch.tensor([4.0,1.0,5.0,3.0,2.0,1.0])

In [13]:
#To get the coordinates of the first point, we do the following:
float(points[0]), float(points[1])

(4.0, 1.0)

This is OK, although it would be practical to have the first index refer to individual 2D
points rather than point coordinates. For this, we can use a 2D tensor:

In [17]:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points,points.shape

(tensor([[4., 1.],
         [5., 3.],
         [2., 1.]]),
 torch.Size([3, 2]))

This informs us about the size of the tensor along each dimension. We could also use
zeros or ones to initialize the tensor, providing the size as a tuple:

In [20]:
points = torch.zeros(3,2)
points

tensor([[0., 0.],
        [0., 0.],
        [0., 0.]])

In [23]:
#Now we can access an individual element in the tensor using one or two indices:
points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
points, points[0,1], points[0]

(tensor([[4., 1.],
         [5., 3.],
         [2., 1.]]),
 tensor(1.),
 tensor([4., 1.]))

The output is another tensor that presents a different view of the same underlying data.
The new tensor is a 1D tensor of size 2, referencing the values of the first row in the
points tensor. Does this mean a new chunk of memory was allocated, values were copied
into it, and the new memory was returned wrapped in a new tensor object? No, because
that would be very inefficient, especially if we had millions of points. We’ll revisit how
tensors are stored later in this chapter when we cover views of tensors in section 3.7. 
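A small sketch can make the "same underlying data" claim tangible: writing through the row returned by indexing changes the original tensor. The name first_row is illustrative only.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])
first_row = points[0]          # a view of row 0, not a copy

# Writing through the view changes the original tensor.
first_row[0] = 10.0
print(points[0, 0])

# Both tensors start at the same memory address (row 0 has offset 0).
same_start = first_row.data_ptr() == points.data_ptr()
print(same_start)
```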

# 3.3 Indexing tensors
We can index a tensor with range (slice) notation along any of its dimensions, as shown here with points:


In [25]:
#grabs all rows after the first
points[1:,:]

tensor([[5., 3.],
        [2., 1.]])

In [26]:
# grab first column from all rows but first
points[1:, 0]

tensor([5., 2.])

In [27]:
#grabs first row and all columns
points[0,:]

tensor([4., 1.])

In [28]:
#adds dimension of size 1, just like unsqueeze
points[None]

tensor([[[4., 1.],
         [5., 3.],
         [2., 1.]]])

In addition to using ranges, PyTorch features a powerful form of indexing, called
advanced indexing, which we will look at in the next chapter. 
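The range indexing shown above can be checked with a short sketch. It compares points[None] with unsqueeze(0) and adds one slice with a step, which follows the same start:stop:step convention as Python lists. The variable names are illustrative.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

with_none = points[None]              # new leading dimension of size 1
with_unsqueeze = points.unsqueeze(0)  # the same thing, spelled out

print(with_none.shape, with_unsqueeze.shape)
print(torch.equal(with_none, with_unsqueeze))

# Ranges also accept a step: every second row, last column only.
every_other_last = points[::2, -1]
print(every_other_last)
```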


# 3.4 Named tensors

<b>Note: Named tensors are still experimental and should not be used in production code.</b>

The dimensions (or axes) of our tensors usually index something like pixel locations
or color channels. This means when we want to index into a tensor, we need to
remember the ordering of the dimensions and write our indexing accordingly. As
data is transformed through multiple tensors, keeping track of which dimension contains what data can be error-prone.

 To make things concrete, imagine that we have a 3D tensor like img_t from section
2.1.4 (we will use dummy data for simplicity here), and we want to convert it to grayscale. We looked up typical weights for the colors to derive a single brightness value:

In [31]:
img_t = torch.randn(3,5,5) # shape [channel,rows,columns]
weights = torch.tensor([0.2126,0.7152,0.0722])

We also often want our code to generalize—for example, from grayscale images represented as 2D tensors with height and width dimensions to color images adding a third
channel dimension (as in RGB), or from a single image to a batch of images. In section 2.1.4, we introduced an additional batch dimension in batch_t; here we pretend
to have a batch of 2:

In [33]:
batch_t = torch.randn(2,3,5,5) # shape [batch, channel,rows, columns]

So sometimes the RGB channels are in dimension 0, and sometimes they are in dimension 1. But we can generalize by counting from the end: they are always in dimension
-3, the third from the end. The lazy, unweighted mean can thus be written as follows:

In [34]:
img_gray_naive = img_t.mean(-3)
batch_gray_naive = batch_t.mean(-3)
img_gray_naive.shape, batch_gray_naive.shape

(torch.Size([5, 5]), torch.Size([2, 5, 5]))

But now we have the weight, too. PyTorch will allow us to multiply things that are the
same shape, as well as shapes where one operand is of size 1 in a given dimension. It
also appends leading dimensions of size 1 automatically. This is a feature called broadcasting. batch_t of shape (2, 3, 5, 5) is multiplied by unsqueezed_weights of shape (3,
1, 1), resulting in a tensor of shape (2, 3, 5, 5), from which we can then sum the third
dimension from the end (the three channels):

Note: This is what's usually done behind the scenes.

In [36]:
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze_(-1)

img_weights = (img_t * unsqueezed_weights)

batch_weights = (batch_t * unsqueezed_weights)

img_gray_weighted = img_weights.sum(-3)

batch_gray_weighted = batch_weights.sum(-3)

batch_weights.shape, batch_t.shape, unsqueezed_weights.shape

(torch.Size([2, 3, 5, 5]), torch.Size([2, 3, 5, 5]), torch.Size([3, 1, 1]))
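The shape rule can be explored on its own. The sketch below assumes a recent PyTorch release that provides torch.broadcast_shapes, and it also shows why the two trailing size-1 dimensions are needed. The flag failed_without_unsqueeze is an illustrative name.

```python
import torch

weights = torch.tensor([0.2126, 0.7152, 0.0722])
unsqueezed_weights = weights.unsqueeze(-1).unsqueeze(-1)   # shape (3, 1, 1)

# Shapes are matched from the right; missing leading dimensions count as size 1.
result_shape = torch.broadcast_shapes((2, 3, 5, 5), unsqueezed_weights.shape)
print(result_shape)

# Without the trailing size-1 dimensions, the 3 lines up with the last
# dimension (size 5), so the multiplication is rejected.
failed_without_unsqueeze = False
try:
    torch.randn(2, 3, 5, 5) * weights
except RuntimeError as err:
    failed_without_unsqueeze = True
    print("broadcast failed:", type(err).__name__)
```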

Because this gets messy quickly (and for the sake of efficiency), the PyTorch function
einsum (adapted from NumPy) specifies an indexing mini-language giving index
names to dimensions for sums of such products. As often in Python, broadcasting (a
form of summarizing unnamed things) is done using three dots '...'; but don't worry
too much about einsum, because we will not use it in the following:

In [37]:
img_gray_weighted_fancy = torch.einsum('...chw,c->...hw', img_t, weights)
batch_gray_weighted_fancy = torch.einsum('...chw,c->...hw', batch_t, weights)
batch_gray_weighted_fancy.shape

torch.Size([2, 5, 5])
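As a sanity check, the broadcasting version and the einsum version should agree up to floating-point rounding. This is a minimal sketch with dummy data; the seed is arbitrary and only makes the run repeatable.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, only for repeatability
batch_t = torch.randn(2, 3, 5, 5)   # shape [batch, channel, rows, columns]
weights = torch.tensor([0.2126, 0.7152, 0.0722])

# Weighted sum over channels, written two ways.
via_broadcast = (batch_t * weights.unsqueeze(-1).unsqueeze(-1)).sum(-3)
via_einsum = torch.einsum('...chw,c->...hw', batch_t, weights)

print(via_broadcast.shape, via_einsum.shape)
print(torch.allclose(via_broadcast, via_einsum, atol=1e-6))
```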

As we can see, there is quite a lot of bookkeeping involved. This is error-prone, especially when the locations where tensors are created and used are far apart in our code.
This has caught the eye of practitioners, and so it has been suggested that the dimension
be given a name instead.

 PyTorch 1.3 added named tensors as an experimental feature (see https://pytorch.org/tutorials/intermediate/named_tensor_tutorial.html and https://pytorch.org/docs/stable/named_tensor.html). Tensor factory functions such as tensor and rand
take a names argument. The names should be a sequence of strings:

In [39]:
weights_named = torch.tensor([0.2126, 0.7152, 0.0722], names=['channels'])
weights_named

tensor([0.2126, 0.7152, 0.0722], names=('channels',))

When we already have a tensor and want to add names (but not change existing
ones), we can call the method refine_names on it. Similar to indexing, the ellipsis (…)
allows you to leave out any number of dimensions. With the rename sibling method,
you can also overwrite or drop (by passing in None) existing names:

In [41]:
img_named = img_t.refine_names(..., 'channels', 'rows', 'columns')
batch_named = batch_t.refine_names(..., 'channels', 'rows', 'columns')
print('img named:', img_named.shape, img_named.names)
print('batch named:', batch_named.shape, batch_named.names)

img named: torch.Size([3, 5, 5]) ('channels', 'rows', 'columns')
batch named: torch.Size([2, 3, 5, 5]) (None, 'channels', 'rows', 'columns')
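The rename method mentioned above is not exercised in the cell, so here is a hedged sketch of both uses. It assumes the experimental named-tensor API behaves as described in the PyTorch documentation (keyword arguments of the form old_name='new_name'); the names height and width are illustrative. PyTorch may emit a warning that named tensors are experimental.

```python
import torch

img_named = torch.randn(3, 5, 5).refine_names('channels', 'rows', 'columns')

# Overwrite selected names via keyword arguments (old_name='new_name').
renamed = img_named.rename(rows='height', columns='width')
print(renamed.names)

# Passing None drops all names.
unnamed = img_named.rename(None)
print(unnamed.names)
```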


For operations with two inputs, in addition to the usual dimension checks—whether
sizes are the same, or if one is 1 and can be broadcast to the other—PyTorch will now
check the names for us. So far, it does not automatically align dimensions, so we need
to do this explicitly. The method align_as returns a tensor with missing dimensions
added and existing ones permuted to the right order:

In [43]:
weights_aligned = weights_named.align_as(img_named)
weights_aligned.shape, weights_aligned.names

(torch.Size([3, 1, 1]), ('channels', 'rows', 'columns'))

In [44]:
#Functions accepting dimension arguments, like sum, also take named dimensions:
gray_named = (img_named * weights_aligned).sum('channels')
gray_named.shape, gray_named.names

(torch.Size([5, 5]), ('rows', 'columns'))

In [45]:
#If we try to combine dimensions with different names, we get an error:
gray_named = (img_named[..., :3] * weights_named).sum('channels')

RuntimeError: Error when attempting to broadcast dims ['channels', 'rows', 'columns'] and dims ['channels']: dim 'columns' and dim 'channels' are at the same position from the right but do not match.

If we want to use tensors outside functions that operate on named tensors, we need to
drop the names by renaming them to None. The following gets us back into the world
of unnamed dimensions:

In [47]:
gray_plain = gray_named.rename(None)
gray_plain.shape, gray_plain.names

(torch.Size([5, 5]), (None, None))

Given the experimental nature of this feature at the time of writing, and to avoid
mucking around with indexing and alignment, we will stick to unnamed in the
remainder of the book. Named tensors have the potential to eliminate many sources
of alignment errors, which—if the PyTorch forum is any indication—can be a source
of headaches. It will be interesting to see how widely they will be adopted. 


# 3.5 Tensor Element Types

So far, we have covered the basics of how tensors work, but we have not yet touched on
what kinds of numeric types we can store in a Tensor. As we hinted at in section 3.2,
using the standard Python numeric types can be suboptimal for several reasons:
- Numbers in Python are objects. Whereas a floating-point number might require
only, for instance, 32 bits to be represented on a computer, Python will convert
it into a full-fledged Python object with reference counting, and so on. This
operation, called boxing, is not a problem if we need to store a small number of
numbers, but allocating millions gets very inefficient.

- Lists in Python are meant for sequential collections of objects. There are no operations
defined for, say, efficiently taking the dot product of two vectors, or summing vectors together. Also, Python lists have no way of optimizing the layout of their contents in memory, as they are indexable collections of pointers to Python objects (of any kind, not just numbers). Finally, Python lists are one-dimensional, and
although we can create lists of lists, this is again very inefficient.

- The Python interpreter is slow compared to optimized, compiled code. Performing mathematical operations on large collections of numerical data can be much faster
using optimized code written in a compiled, low-level language like C.

For these reasons, data science libraries rely on NumPy or introduce dedicated data
structures like PyTorch tensors, which provide efficient low-level implementations of
numerical data structures and related operations on them, wrapped in a convenient
high-level API. To enable this, the objects within a tensor must all be numbers of the
same type, and PyTorch must keep track of this numeric type.
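To illustrate the contrast drawn above, the sketch below computes the same dot product twice: once with an interpreted Python loop over boxed floats and once with a single call into PyTorch. The vectors are made-up example data, and no timing claim is made here.

```python
import torch

xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]

# Pure Python: an interpreted loop over individually allocated float objects.
dot_python = sum(x * y for x, y in zip(xs, ys))

# PyTorch: one call into compiled code over typed, contiguous buffers.
dot_torch = torch.dot(torch.tensor(xs), torch.tensor(ys))

print(dot_python, float(dot_torch))
```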


## 3.5.1 Specifying the numeric type with dtype

The dtype argument to tensor constructors (that is, functions like tensor, zeros, and
ones) specifies the numerical data (d) type that will be contained in the tensor. The
data type specifies the possible values the tensor can hold (integers versus floating-point numbers) and the number of bytes per value. The dtype argument is deliberately similar to the standard NumPy argument of the same name.

The default data type for tensors is 32-bit floating-point.
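A short sketch of how the dtype determines the bytes per value, assuming the global default dtype has not been changed. The tensor names are illustrative.

```python
import torch

default_t = torch.ones(3)                       # no dtype given
double_t = torch.ones(3, dtype=torch.float64)
half_t = torch.ones(3, dtype=torch.float16)
short_t = torch.ones(3, dtype=torch.int16)

# Print each dtype alongside its per-element storage size.
for t in (default_t, double_t, half_t, short_t):
    print(t.dtype, t.element_size(), "bytes per value")
```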
 
## 3.5.2 A dtype for every occasion

As we will see in future chapters, computations happening in neural networks are typically executed with 32-bit floating-point precision. Higher precision, like 64-bit, will
not buy improvements in the accuracy of a model and will require more memory and
computing time. The 16-bit floating-point, half-precision data type is not present
natively in standard CPUs, but it is offered on modern GPUs. It is possible to switch to
half-precision to decrease the footprint of a neural network model if needed, with a
minor impact on accuracy.

 Tensors can be used as indexes in other tensors. In this case, PyTorch expects
indexing tensors to have a 64-bit integer data type. Creating a tensor with integers as
arguments, such as using torch.tensor([2, 2]), will create a 64-bit integer tensor by
default. As such, we’ll spend most of our time dealing with float32 and int64.
 
 Finally, predicates on tensors, such as points > 1.0, produce bool tensors indicating whether each individual element satisfies the condition. These are the numeric
types in a nutshell. 
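The integer and boolean cases described above can be seen in a few lines. This is a minimal sketch reusing the points tensor from earlier in these notes; idx and mask are illustrative names.

```python
import torch

points = torch.tensor([[4.0, 1.0], [5.0, 3.0], [2.0, 1.0]])

idx = torch.tensor([2, 0])          # integer literals give a 64-bit integer tensor
print(idx.dtype)
print(points[idx])                  # rows 2 and 0, in that order

mask = points > 1.0                 # elementwise predicate
print(mask.dtype)
num_above = int(mask.sum())         # count of elements greater than 1.0
print(num_above)
```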

## 3.5.3 Managing a tensor’s dtype attribute

In [50]:
#We can specify the dtype as an argument when creating a tensor
double_points = torch.ones(10,2, dtype=torch.double)
short_points = torch.ones(10,2, dtype=torch.short)
double_points.dtype, short_points.dtype

(torch.float64, torch.int16)

In [52]:
#can also cast dtype using the following syntax:
double_points = torch.ones(10,2).double()
short_points = torch.ones(10,2).short()

#or more convenient 'to' method
double_points = torch.ones(10,2).to(torch.double)
short_points = torch.ones(10,2).to(dtype=torch.short)
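One detail worth checking is that casting does not modify the tensor it is called on. The sketch below assumes the documented behavior of to: it returns a converted copy when the dtype differs, and the tensor itself when no conversion is needed. The names floats, shorts, and same are illustrative.

```python
import torch

floats = torch.ones(10, 2)
shorts = floats.to(dtype=torch.short)

print(floats.dtype, shorts.dtype)   # the original keeps its dtype

# When no conversion is needed, `to` returns the very same tensor object.
same = floats.to(torch.float32)
print(same is floats)
```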