<a href="https://colab.research.google.com/github/jthemphill/pytorch_tutorial_2022/blob/automatic-reshaping-clarification/notebooks/00_Tensors.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
%matplotlib inline
%reload_ext autoreload
%autoreload 2

This notebook introduces the `Tensor` class of PyTorch, which is the fundamental class for storing numbers, whether they be data, parameters, gradients, etc. If you have any familiarity with NumPy arrays, you can treat Tensors in much the same way.

`Tensor` objects can be instantiated directly, or returned by various `torch` methods:

In [None]:
import torch
from torch import Tensor

## Creating a tensor

In [None]:
my_tensor = Tensor([1,2,3])
my_tensor

tensor([1., 2., 3.])

Here we've stored a list of numbers in a `Tensor`.

Alternatively, we could have done:

In [None]:
my_tensor = torch.tensor([1,2,3])
my_tensor

tensor([1, 2, 3])

**Important points**
- Although these are similar, `torch.tensor` has a few more options, such as `requires_grad` (more on that later). It also infers the data type from its input (integers above), whereas `Tensor` always produces floats
- If you miss out the brackets [ ] when initialising from a single integer, the two behave differently

In [None]:
Tensor(2), Tensor([2]), torch.tensor(2), torch.tensor([2])

(tensor([-2.9522e-20,  4.5794e-41]), tensor([2.]), tensor(2), tensor([2]))

Above, `Tensor(2)` interpreted its argument as a size and made a 2-element tensor of uninitialised memory (hence the arbitrary floats), whereas `torch.tensor(2)` made a 0-dimensional tensor (basically a scalar) with a value of 2.
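The difference between the two constructors can be checked directly. The following is a minimal sketch (the variable names are illustrative), assuming the default floating-point type has not been changed from `float32`:

```python
import torch

from_class = torch.Tensor([1, 2, 3])  # class constructor: always the default float type
from_func = torch.tensor([1, 2, 3])   # factory function: dtype inferred from the data
print(from_class.dtype, from_func.dtype)

sized = torch.Tensor(2)    # an int argument is read as a size: 2 uninitialised elements
scalar = torch.tensor(2)   # an int argument is read as data: a 0-dimensional tensor
print(sized.shape, scalar.shape, scalar.item())
```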

## Tensor shapes

Like NumPy arrays, Tensors can store multidimensional arrays of numbers. Each dimension can have a different number of elements. The number of dimensions that a tensor has is referred to as its *rank*.
E.g. a rank-1 tensor is just a vector of numbers, a rank-2 tensor is a matrix, etc.
The *shape* of a tensor is the number of elements per dimension. This can be accessed from either the `Tensor.shape` attribute, or by calling `Tensor.size()`. The number of elements in a specific dimension can be returned by either indexing `Tensor.shape`, e.g. `Tensor.shape[1]`, or by passing the dimension index in the call to `size`, e.g. `Tensor.size(1)`. Calling `len()` on a tensor returns the size of the zeroth dimension.

In [None]:
# rank-1 tensor:
r1 = Tensor([1,2,3])
r1.shape, r1.size(), len(r1)

(torch.Size([3]), torch.Size([3]), 3)

In [None]:
# rank-2 tensor:
r2 = Tensor([[1],[2],[3]])
r2.shape, r2.size(), r2.shape[0], r2.size(0), len(r2)

(torch.Size([3, 1]), torch.Size([3, 1]), 3, 3, 3)

In [None]:
# rank-3 tensor:
r3 = Tensor([[[1,4]],[[2,5]],[[3,6]]])
r3.shape, r3.size(), r3.shape[0], r3.size(0)

(torch.Size([3, 1, 2]), torch.Size([3, 1, 2]), 3, 3)

In [None]:
# rank-0 tensor
r0 = torch.tensor(2)
r0.shape, r0.size()

(torch.Size([]), torch.Size([]))
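The rank itself can also be read off directly, rather than by counting the entries of the shape. A short sketch using `Tensor.ndim`, `Tensor.dim()`, and `Tensor.numel()`:

```python
import torch

r3 = torch.zeros(3, 1, 2)
print(r3.ndim, r3.dim())  # rank: the number of dimensions
print(len(r3.shape))      # the same number, derived from the shape
print(r3.numel())         # total number of elements: 3*1*2
```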

## Tensors from torch methods

So far we've manually created tensors from lists, but `torch` has a few methods to create tensors according to specific rules by specifying the desired shape:

In [None]:
torch.zeros(2,3)

tensor([[0., 0., 0.],
        [0., 0., 0.]])

In [None]:
torch.ones(4,1)

tensor([[1.],
        [1.],
        [1.],
        [1.]])

In [None]:
torch.empty(2)

tensor([4.4650e+30, 1.4347e-19])

In [None]:
torch.rand(3,3)  # Uniform [0,1]

tensor([[0.5284, 0.0927, 0.7045],
        [0.9083, 0.1330, 0.3229],
        [0.6506, 0.9660, 0.5810]])

In [None]:
torch.randn(3,3)  # Normal (0,1)

tensor([[-0.9297, -1.0869,  1.2495],
        [ 0.7435,  0.4109,  0.4276],
        [ 0.2507, -1.3278, -0.5484]])

In [None]:
a = torch.rand(3,3)
torch.ones_like(a)  # Same shape

tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])

In [None]:
torch.arange(start=2, end=10, step=2)

tensor([2, 4, 6, 8])

In [None]:
torch.linspace(11,20,30)

tensor([11.0000, 11.3103, 11.6207, 11.9310, 12.2414, 12.5517, 12.8621, 13.1724,
        13.4828, 13.7931, 14.1034, 14.4138, 14.7241, 15.0345, 15.3448, 15.6552,
        15.9655, 16.2759, 16.5862, 16.8966, 17.2069, 17.5172, 17.8276, 18.1379,
        18.4483, 18.7586, 19.0690, 19.3793, 19.6897, 20.0000])

## Indexing
Indexing tensors is the same as indexing NumPy arrays; indices are provided in brackets [ ], and select the elements at those positions in the tensor. Indexing can be used both to read and to write values.

When indexing a tensor, indices can be provided per dimension, starting from the zeroth dimension. Any dimensions which are not explicitly indexed will have all their elements returned.
To skip over a dimension (i.e. keep all of its elements) while indexing a later one, set its index to `:`.
Dimension indices can be passed as:
 - A single index, e.g. `2`
 - A list of indices, e.g. `[0,1,4,5,8]`. Indices can also be repeated and provided in any order, e.g. `[0,0,3,2]`
 - A range, e.g. `2:6` will return indices 2,3,4,5, but not 6, and `2:6:2` will return indices 2 and 4
 - A `slice` object, e.g. `slice(6, 10, 2)` will return indices 6 and 8
 
Indexing begins from zero, i.e. indexing a tensor at one will return the second logical element, and indexing at zero will return the first.

Negative indices begin counting back from the last element, which is located at index -1.

In [None]:
a = torch.arange(18).reshape(3,2,3)
a

tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17]]])

In [None]:
a[2]  # single-element indexing of the zeroth dimension, this is the same as a[2,:,:]

tensor([[12, 13, 14],
        [15, 16, 17]])

In [None]:
a[:,1]  # single-element indexing of the first dimension, the : returns all elements from the zeroth dimension

tensor([[ 3,  4,  5],
        [ 9, 10, 11],
        [15, 16, 17]])

In [None]:
a[0,1]  # single-element indexing of the zeroth and first dimension

tensor([3, 4, 5])

In [None]:
a[[0,1,0]]  # indexing with a list

tensor([[[ 0,  1,  2],
         [ 3,  4,  5]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[ 0,  1,  2],
         [ 3,  4,  5]]])

In [None]:
a[:,:, 0:2]  # indexing with a range

tensor([[[ 0,  1],
         [ 3,  4]],

        [[ 6,  7],
         [ 9, 10]],

        [[12, 13],
         [15, 16]]])

In [None]:
start = 0
end = 2
a[:,:,slice(start,end)]  # indexing with a slice

tensor([[[ 0,  1],
         [ 3,  4]],

        [[ 6,  7],
         [ 9, 10]],

        [[12, 13],
         [15, 16]]])

In [None]:
a[-2]  # indexing with negative indices

tensor([[ 6,  7,  8],
        [ 9, 10, 11]])

In [None]:
a[[0,1,2], [0,1,1]]  # Advanced indexing with two lists. Can you tell what is happening here?

tensor([[ 0,  1,  2],
        [ 9, 10, 11],
        [15, 16, 17]])
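One way to read the two-list example above: the lists are paired up element by element, so the result collects `a[0,0]`, `a[1,1]`, and `a[2,1]`. The sketch below checks this against an explicit loop (`rows`, `cols`, `paired`, and `manual` are illustrative names, and `a` is rebuilt so that the block is self-contained):

```python
import torch

a = torch.arange(18).reshape(3, 2, 3)
rows, cols = [0, 1, 2], [0, 1, 1]

paired = a[rows, cols]
# Equivalent: pick a[0,0], a[1,1], a[2,1] one pair at a time and stack them
manual = torch.stack([a[i, j] for i, j in zip(rows, cols)])
print(torch.equal(paired, manual))
```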

In [None]:
a[0:1] = 10  # Setting element values
a

tensor([[[10, 10, 10],
         [10, 10, 10]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17]]])

In [None]:
a[0:1] *= -1
a

tensor([[[-10, -10, -10],
         [-10, -10, -10]],

        [[  6,   7,   8],
         [  9,  10,  11]],

        [[ 12,  13,  14],
         [ 15,  16,  17]]])

When indexing with a single value, note that the number of dimensions is reduced:

In [None]:
a[0].shape

torch.Size([2, 3])

This can be avoided by indexing with a range of length 1:

In [None]:
a[0:1].shape

torch.Size([1, 2, 3])
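Whether an indexed result shares memory with the original tensor depends on the kind of index used. As a minimal sketch (variable names are illustrative): indexing with single indices and ranges returns a view, whereas indexing with a list returns a copy.

```python
import torch

a = torch.arange(6).reshape(2, 3)

sliced = a[:, 0:2]     # range indexing: a view onto a's memory
picked = a[:, [0, 1]]  # list indexing: a new tensor with copied values

sliced[0, 0] = 100     # writes through to a
picked[0, 1] = -1      # does not affect a
print(a)
```

This only matters when the indexed result is stored and modified later; writing directly through an index, e.g. `a[:, [0, 1]] = 0`, always updates `a`.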

## Boolean Masking

Tensors can be masked to return elements for which the mask value is `True`. The mask must be of the same length as the dimension it is applied to. Masks can also be multi-dimensional.

In [None]:
a[[True,False,True]]

tensor([[[-10, -10, -10],
         [-10, -10, -10]],

        [[ 12,  13,  14],
         [ 15,  16,  17]]])

In [None]:
a[:,[True,False]]

tensor([[[-10, -10, -10]],

        [[  6,   7,   8]],

        [[ 12,  13,  14]]])

In [None]:
mask = a > 5
print(mask)
a[mask]  # Multi-dim mask. Note how it collapses to a rank-1 tensor

tensor([[[False, False, False],
         [False, False, False]],

        [[ True,  True,  True],
         [ True,  True,  True]],

        [[ True,  True,  True],
         [ True,  True,  True]]])


tensor([ 6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17])

Masks can also be used to write to elements that are selected by the mask

In [None]:
print(a)
a[mask] *= -1
a

tensor([[[-10, -10, -10],
         [-10, -10, -10]],

        [[  6,   7,   8],
         [  9,  10,  11]],

        [[ 12,  13,  14],
         [ 15,  16,  17]]])


tensor([[[-10, -10, -10],
         [-10, -10, -10]],

        [[ -6,  -7,  -8],
         [ -9, -10, -11]],

        [[-12, -13, -14],
         [-15, -16, -17]]])

### where function
In some cases, it can be simpler to avoid explicitly masking tensors and instead use the `where` function, which selects element-wise from one of two inputs depending on a condition:

In [None]:
torch.where(a < 0, a*-1, a)

tensor([[[10, 10, 10],
         [10, 10, 10]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17]]])

Although the code becomes more convoluted, multiple `where` statements can be chained together without having to worry about merging masks:

In [None]:
torch.where(a < 0, a*-1, torch.where(a>2, 10*a, a))

tensor([[[10, 10, 10],
         [10, 10, 10]],

        [[ 6,  7,  8],
         [ 9, 10, 11]],

        [[12, 13, 14],
         [15, 16, 17]]])
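In the cell above every element of `a` is negative, so only the first branch is ever taken. A small sketch with mixed values (the tensor `t` is made up for illustration) exercises all three outcomes of the chained `where`:

```python
import torch

t = torch.tensor([-4, -1, 0, 1, 2, 3, 5])
# Negative values are sign-flipped; of the remaining values, those above 2 are scaled by 10
out = torch.where(t < 0, t * -1, torch.where(t > 2, 10 * t, t))
print(out)  # tensor([ 4,  1,  0,  1,  2, 30, 50])
```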

### Mask functions
`isnan` and `isinf` can be used to return Boolean masks of elements which are NaN/Inf.

In [None]:
m = torch.randn(10).log().isnan()
m

tensor([False,  True,  True,  True,  True, False, False,  True,  True,  True])

Boolean masks can be inverted with the `~` operator

In [None]:
~m

tensor([ True, False, False, False, False,  True,  True, False, False, False])
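Masks can also be merged using the element-wise `|` (or) and `&` (and) operators. A brief sketch on a hand-made tensor (`t`, `bad`, and `good_pos` are illustrative names):

```python
import torch

t = torch.tensor([1.0, float("nan"), float("inf"), -2.0])
bad = t.isnan() | t.isinf()  # True where the element is NaN or infinite
good_pos = ~bad & (t > 0)    # True where the element is finite and positive
print(bad, good_pos)
```

Comparisons should be wrapped in parentheses, as above, since `&` and `|` bind more tightly than `>` in Python.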

## Data types

Tensors can have various types, e.g. float, bool, int, long. They can either be initialised with the specified type, or converted between types.

In [None]:
a = 10*torch.randn(5)
print(a, a.long())

tensor([ 9.5504, 15.1664,  5.1242, -4.5564, 16.1716]) tensor([ 9, 15,  5, -4, 16])


In [None]:
b = a > 0
print(b, b.float())

tensor([ True,  True,  True, False,  True]) tensor([1., 1., 1., 0., 1.])


In [None]:
torch.rand(5, dtype=torch.float64), torch.rand(5, dtype=torch.float32)

(tensor([0.0852, 0.0681, 0.9256, 0.6227, 0.8386], dtype=torch.float64),
 tensor([0.6645, 0.4892, 0.7237, 0.3992, 0.9269]))
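As a compact summary of the options above, the sketch below uses fixed values so that the effect of each conversion is visible. Note that converting floats to integers truncates towards zero rather than rounding; `.to(dtype)` is a general-purpose alternative to the named conversion methods.

```python
import torch

f = torch.tensor([9.55, -4.56, 0.5])
i = f.long()                          # float -> int64, truncating towards zero
d = f.to(torch.float64)               # general-purpose conversion to a given dtype
z = torch.zeros(3, dtype=torch.bool)  # dtype chosen at creation
print(i, d.dtype, z)
```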

## Basic mathematical operations

### Torch and tensor methods

Like NumPy, PyTorch has many methods that can be applied to `Tensor`s. These can be called either as `torch.method_name(tensor)` or as `tensor.method_name()`. If it is a common mathematical operation, there is generally a method for it. These operations are usually applied *element-wise*, meaning that the same operation is applied to each element, without regard for where it is located, or what the values of the other elements are. Multiple operations can be chained together. Some examples:

In [None]:
a = torch.randn(3,3)

In [None]:
torch.cos(a), a.sin()

(tensor([[ 0.7299, -0.0457,  0.1589],
         [ 0.9909,  0.7171,  0.9412],
         [ 0.2808,  0.8627,  0.0505]]),
 tensor([[ 0.6835,  0.9990, -0.9873],
         [-0.1348, -0.6970, -0.3379],
         [ 0.9598,  0.5058, -0.9987]]))

In [None]:
a.square(), torch.sqrt(a), torch.sqrt(a).nan_to_num()

(tensor([[0.5664, 2.6133, 1.9914],
         [0.0183, 0.5947, 0.1188],
         [1.6542, 0.2812, 2.3112]]),
 tensor([[0.8675, 1.2714,    nan],
         [   nan,    nan,    nan],
         [1.1341, 0.7282,    nan]]),
 tensor([[0.8675, 1.2714, 0.0000],
         [0.0000, 0.0000, 0.0000],
         [1.1341, 0.7282, 0.0000]]))

In [None]:
a.pow(3)

tensor([[ 4.2624e-01,  4.2245e+00, -2.8103e+00],
        [-2.4735e-03, -4.5861e-01, -4.0938e-02],
        [ 2.1276e+00,  1.4911e-01, -3.5137e+00]])

In [None]:
torch.abs(a)

tensor([[0.7526, 1.6166, 1.4112],
        [0.1352, 0.7712, 0.3446],
        [1.2862, 0.5303, 1.5203]])

In [None]:
a.exp(), torch.log10(a)

(tensor([[2.1225, 5.0357, 0.2439],
         [0.8735, 0.4625, 0.7085],
         [3.6189, 1.6994, 0.2187]]),
 tensor([[-0.1234,  0.2086,     nan],
         [    nan,     nan,     nan],
         [ 0.1093, -0.2755,     nan]]))

### Operations between tensors and floats
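Common mathematical operators `(+,-,*,**,/,//)` are overloaded to apply the same operation to each element. The number can appear on either side of the operator, e.g. `0.1*a` and `a*0.1` are both valid, although only `+` and `*` are commutative, so `a-3` and `3-a` give different results.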

In [None]:
a = torch.randn(3,3)
a

tensor([[-0.5958, -0.4090, -0.5656],
        [-0.9153, -0.3750, -2.3914],
        [-1.3762,  0.0594,  2.2618]])

In [None]:
a + 10, a-3, 0.1*a, a**-1, a/20, a//2

(tensor([[ 9.4042,  9.5910,  9.4344],
         [ 9.0847,  9.6250,  7.6086],
         [ 8.6238, 10.0594, 12.2618]]),
 tensor([[-3.5958, -3.4090, -3.5656],
         [-3.9153, -3.3750, -5.3914],
         [-4.3762, -2.9406, -0.7382]]),
 tensor([[-0.0596, -0.0409, -0.0566],
         [-0.0915, -0.0375, -0.2391],
         [-0.1376,  0.0059,  0.2262]]),
 tensor([[-1.6785, -2.4448, -1.7680],
         [-1.0926, -2.6669, -0.4182],
         [-0.7267, 16.8324,  0.4421]]),
 tensor([[-0.0298, -0.0205, -0.0283],
         [-0.0458, -0.0187, -0.1196],
         [-0.0688,  0.0030,  0.1131]]),
 tensor([[-1., -1., -1.],
         [-1., -1., -2.],
         [-1.,  0.,  1.]]))

These operations can also be called as methods:

In [None]:
torch.add(a, 10), a.sub(3), a.mul(0.1), torch.pow(a,-1), torch.div(a,20), torch.div(a,2,rounding_mode="floor")

(tensor([[ 9.4042,  9.5910,  9.4344],
         [ 9.0847,  9.6250,  7.6086],
         [ 8.6238, 10.0594, 12.2618]]),
 tensor([[-3.5958, -3.4090, -3.5656],
         [-3.9153, -3.3750, -5.3914],
         [-4.3762, -2.9406, -0.7382]]),
 tensor([[-0.0596, -0.0409, -0.0566],
         [-0.0915, -0.0375, -0.2391],
         [-0.1376,  0.0059,  0.2262]]),
 tensor([[-1.6785, -2.4448, -1.7680],
         [-1.0926, -2.6669, -0.4182],
         [-0.7267, 16.8324,  0.4421]]),
 tensor([[-0.0298, -0.0205, -0.0283],
         [-0.0458, -0.0187, -0.1196],
         [-0.0688,  0.0030,  0.1131]]),
 tensor([[-1., -1., -1.],
         [-1., -1., -2.],
         [-1.,  0.,  1.]]))

#### In-place operations

So far, these operations have returned a new tensor, leaving the values of the original tensor unchanged. Many of these operations can also be performed "in-place", in which case the values of the original tensor are updated. For torch methods, there is often a `method_` version (note the trailing underscore), which performs `method` in-place on the tensor.
Personally, I would **not** recommend this: it is more compact, but can cause issues when building a differentiable system, since an in-place operation can overwrite values that are needed to compute gradients, in which case the backward pass fails.

In [None]:
a = torch.randn(3,3)
print(a)
a += 10
a

tensor([[ 0.6071,  0.6758,  0.2167],
        [-0.7897, -0.6264,  0.9150],
        [-0.5337, -2.4725, -0.2235]])


tensor([[10.6071, 10.6758, 10.2167],
        [ 9.2103,  9.3736, 10.9150],
        [ 9.4663,  7.5275,  9.7765]])

In [None]:
a = torch.randn(3,3)
print(a)
a.pow_(-1)
a

tensor([[ 0.0747,  1.6477,  1.5224],
        [ 0.7767, -0.0370, -0.2346],
        [ 0.3004,  1.1650,  0.1740]])


tensor([[ 13.3866,   0.6069,   0.6569],
        [  1.2876, -27.0051,  -4.2634],
        [  3.3286,   0.8584,   5.7475]])

In [None]:
a = torch.randn(3,3).log()
print(a)
torch.nan_to_num_(a)
a

tensor([[    nan,     nan,     nan],
        [-0.4975,     nan,  0.5074],
        [ 0.6073,     nan,     nan]])


tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.4975,  0.0000,  0.5074],
        [ 0.6073,  0.0000,  0.0000]])

In-place operations can also happen unexpectedly, due to e.g. indexing, and can be quite difficult to find and fix. Overwriting tensors with out-of-place operations is (generally) fine:

In [None]:
a = a+3
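To illustrate how in-place operations can interfere with gradients, here is a minimal sketch of two common failure modes. It uses `requires_grad` and `backward()`, which are only introduced properly in later notebooks, and the variable names are illustrative.

```python
import torch

# Case 1: in-place update of a leaf tensor that requires a gradient
w = torch.ones(3, requires_grad=True)
try:
    w += 1
    leaf_error = None
except RuntimeError as err:
    leaf_error = str(err)
print(leaf_error)

# Case 2: in-place update of a value that autograd saved for the backward pass
x = torch.ones(3, requires_grad=True)
y = x.exp()  # the backward pass of exp reuses its own output
y += 1       # allowed at this point, but overwrites that saved output
try:
    y.sum().backward()
    backward_error = None
except RuntimeError as err:
    backward_error = str(err)
print(backward_error)
```

In the second case the error only appears when `backward()` is called, possibly far from the offending line, which is what makes such bugs hard to track down.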

### Operations between pairs of tensors
As with floats, common mathematical operators `(+,-,*,**,/,//)` are overloaded to apply the same operation to each pair of corresponding elements of tensors of the same shape.

**Important:** `*` is an element-wise multiplication (Hadamard product). For a traditional matrix multiplication, use `torch.mm` or the `@` operator. This is subject to the normal requirements on shapes, and is not (necessarily) commutative.

In [None]:
a,b = torch.randn(3,3),torch.randn(3,3)
a,b

(tensor([[-1.1703, -0.8555,  1.0839],
         [ 0.1600,  1.6309,  1.0055],
         [-0.4818,  0.7642,  0.4079]]),
 tensor([[-0.7183,  1.1314, -0.1432],
         [ 0.1695, -0.6910,  0.9068],
         [-0.4084, -0.8161,  0.6513]]))

In [None]:
a + b, a-b, a**b, a/b, a//b.long()

(tensor([[-1.8886,  0.2759,  0.9407],
         [ 0.3295,  0.9400,  1.9124],
         [-0.8901, -0.0520,  1.0591]]),
 tensor([[-0.4520, -1.9869,  1.2271],
         [-0.0094,  2.3219,  0.0987],
         [-0.0734,  1.5803, -0.2434]]),
 tensor([[   nan,    nan, 0.9885],
         [0.7331, 0.7132, 1.0050],
         [   nan, 1.2455, 0.5576]]),
 tensor([[ 1.6292, -0.7562, -7.5708],
         [ 0.9443, -2.3603,  1.1089],
         [ 1.1798, -0.9363,  0.6263]]),
 tensor([[-inf, -1., inf],
         [inf, inf, inf],
         [-inf, inf, inf]]))

In [None]:
a*b, a@b

(tensor([[ 0.8406, -0.9679, -0.1552],
         [ 0.0271, -1.1269,  0.9118],
         [ 0.1967, -0.6237,  0.2656]]),
 tensor([[ 0.2530, -1.6175,  0.0976],
         [-0.2492, -1.7666,  2.1109],
         [ 0.3090, -1.4060,  1.0276]]))

In [None]:
torch.randn(2,3)@torch.randn(3,2)

tensor([[0.0417, 0.6071],
        [0.6482, 0.2336]])

In [None]:
torch.mm(torch.randn(2,3),torch.randn(3,2))

tensor([[-0.5271, -1.5831],
        [-1.7283, -1.5105]])
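With random inputs it is hard to see at a glance what each operator did. The sketch below repeats the comparison using small hand-picked matrices (an illustrative choice: `b` is the matrix that swaps rows or columns), so that each result can be verified by eye:

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[0., 1.], [1., 0.]])

print(a * b)  # element-wise product: [[0, 2], [3, 0]]
print(a @ b)  # matrix product, columns of a swapped: [[2, 1], [4, 3]]
print(b @ a)  # opposite order, rows of a swapped: [[3, 4], [1, 2]]
```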

### Implicit reshaping
Operations between tensors of different shapes can either fail, or result in one of the tensors being automatically reshaped in order to perform the requested operation.

In [None]:
torch.randn(2)*torch.ones(2,3) # This fails: shapes are compared starting from the last dimension, and the size of 2 does not match the size of 3

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1

In [None]:
torch.randn(2,2)*torch.ones(2,3) # This fails since the first tensor has a different number of elements in the second dimension

RuntimeError: The size of tensor a (2) must match the size of tensor b (3) at non-singleton dimension 1

In [71]:
torch.randn(2,1)*torch.ones(2,3) # Here the first tensor is automatically copied along the second dimension to give an effective shape of (2,3), matching that of the second tensor

tensor([[-1.1172, -1.1172, -1.1172],
        [ 0.1716,  0.1716,  0.1716]])

In [72]:
torch.randn(1)*torch.ones(2,3) # Here the first tensor is automatically reshaped to (1,1) and then copied along the first & second dimensions to give an effective shape of (2,3), matching that of the second tensor

tensor([[0.8493, 0.8493, 0.8493],
        [0.8493, 0.8493, 0.8493]])
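The rule behind these four cases: shapes are aligned starting from the last dimension, and each pair of sizes must either be equal, or one of them must be 1 (missing leading dimensions count as 1). The sketch below checks this without creating any data, assuming a PyTorch version that provides `torch.broadcast_shapes`; `compatible` is a hypothetical helper written for this example.

```python
import torch

def compatible(shape_a, shape_b):
    """Return the combined shape, or None if the shapes cannot be combined."""
    try:
        return tuple(torch.broadcast_shapes(shape_a, shape_b))
    except RuntimeError:
        return None

print(compatible((2,), (2, 3)))    # None: the trailing sizes 2 and 3 differ
print(compatible((3,), (2, 3)))    # (2, 3): the ranks differ, but the trailing sizes match
print(compatible((2, 1), (2, 3)))  # (2, 3): the size-1 dimension is stretched
print(compatible((1,), (2, 3)))    # (2, 3)
```

Note that a difference in rank is not a problem by itself: the second case succeeds even though the first shape has only one dimension.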

**Important**, this automatic reshaping can happen unexpectedly. Imagine a function that computes values based on two tensors, say the x-position and momentum of a collection of muons:

In [None]:
n_muons = 5
xyz = torch.rand(n_muons,3)
mom = torch.ones(n_muons)
xyz.shape, mom.shape

(torch.Size([5, 3]), torch.Size([5]))

In [None]:
def my_func(x, mom):
    return x*mom

We expect to compute one value per muon, but if we accidentally pass tensors of different shapes, the result can be reshaped unexpectedly:

In [None]:
x = xyz[:,0:1]
x.shape

torch.Size([5, 1])

In [None]:
val = my_func(x, mom)
val.shape, val

(torch.Size([5, 5]),
 tensor([[0.4786, 0.4786, 0.4786, 0.4786, 0.4786],
         [0.6937, 0.6937, 0.6937, 0.6937, 0.6937],
         [0.2799, 0.2799, 0.2799, 0.2799, 0.2799],
         [0.1249, 0.1249, 0.1249, 0.1249, 0.1249],
         [0.0537, 0.0537, 0.0537, 0.0537, 0.0537]]))

In [None]:
val[0]

tensor([0.4786, 0.4786, 0.4786, 0.4786, 0.4786])

Here both the `x` and `mom` tensors were implicitly expanded to shape (5,5): `mom`, of shape (5), was treated as shape (1,5), and each tensor was then copied along its size-1 dimension.
Instead, if the tensors have the same shape, then this doesn't happen:

In [None]:
val = my_func(xyz[:,0], mom)
val.shape, val

(torch.Size([5]), tensor([0.4786, 0.6937, 0.2799, 0.1249, 0.0537]))

In [None]:
val = my_func(xyz[:,0:1], mom[:,None])  # The [:,None] index adds an extra dimension to the indexed tensor
val.shape, val

(torch.Size([5, 1]),
 tensor([[0.4786],
         [0.6937],
         [0.2799],
         [0.1249],
         [0.0537]]))
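One defensive option is to check shapes explicitly inside functions that expect matching inputs. The following is only a sketch: `my_func_checked` is a hypothetical variant of `my_func`, and raising a `ValueError` is just one possible design choice.

```python
import torch

def my_func_checked(x, mom):
    """Element-wise product that refuses to implicitly reshape its inputs."""
    if x.shape != mom.shape:
        raise ValueError(f"Shape mismatch: {tuple(x.shape)} vs {tuple(mom.shape)}")
    return x * mom

xyz = torch.rand(5, 3)
mom = torch.ones(5)

ok = my_func_checked(xyz[:, 0], mom)   # both (5,): one value per muon
try:
    my_func_checked(xyz[:, 0:1], mom)  # (5, 1) vs (5,): would silently become (5, 5)
    caught = False
except ValueError:
    caught = True
print(ok.shape, caught)
```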

## Advanced mathematical operations

### Reduction methods
Certain methods compute values from the elements of a tensor along specified dimensions, and can reduce the number of dimensions in the tensor. Often these take a `dim` argument for specifying the (zero-indexed) dimension along which to apply the operation. This can sometimes be a list/tuple of dimensions, in which case the operation is applied over all dimensions listed. By default this is `None`, in which case the operation is applied over all dimensions.
Normally, all dimensions listed will be removed from the resulting tensor. These operations also have a `keepdim` argument, which can be set to `True` in order to retain the same rank; the reduced dimensions then have size one.

In [None]:
a = torch.rand(3,2,4)
a

In [None]:
a.sum(), torch.sum(a, dim=0), torch.sum(a, dim=0, keepdim=True), torch.sum(a, dim=1), torch.sum(a, dim=-1), a.sum([0,-1])

In [None]:
a.mean(), a.prod(-2), a.norm(dim=0)
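To make the effect of `dim` and `keepdim` on the shape explicit, here is a small sketch using a tensor with known values (`torch.arange` rather than random numbers, so that the results are reproducible):

```python
import torch

a = torch.arange(24.).reshape(3, 2, 4)

print(a.sum().shape)                     # torch.Size([]): every dimension reduced
print(a.sum(dim=0).shape)                # torch.Size([2, 4]): dimension 0 removed
print(a.sum(dim=0, keepdim=True).shape)  # torch.Size([1, 2, 4]): dimension 0 kept with size 1
print(a.sum(dim=[0, -1]).shape)          # torch.Size([2]): dimensions 0 and 2 removed
print(a.sum(dim=[0, -1]))                # tensor([114., 162.])
```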

Min, max, and median operations are a bit strange in PyTorch, in that their return types vary depending on whether a dimension is specified. Without a dimension, a single 0-dimensional tensor is returned. When a dimension is specified, the return type has both `values` and `indices` attributes:

In [None]:
a.min()

In [None]:
m = a.min(0)
m, m.values, m.indices

In [None]:
torch.max(a), torch.max(a, 0)

In [None]:
torch.median(a), torch.median(a, 0)
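In practice, the two-part return value is often unpacked directly. A short sketch on a hand-made tensor; `argmax` is shown as an alternative for when only the indices are needed:

```python
import torch

a = torch.tensor([[1., 5., 3.],
                  [4., 2., 6.]])

overall = a.max()               # no dim: a 0-dimensional tensor
values, indices = a.max(dim=1)  # with dim: a (values, indices) pair that can be unpacked
print(overall, values, indices)
print(torch.equal(indices, a.argmax(dim=1)))  # argmax returns only the indices
```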

## Manipulations

### Reshaping tensors
Although tensors are created with a specific shape, as we've seen they needn't retain this shape forever. Their shapes can be manipulated by the following methods, provided the number of elements remains the same:

In [None]:
a = torch.arange(9)
print(a.shape, a)
b = a.reshape(3,3)  # reshape to the specified shape. If possible, the data is NOT copied and the result shares memory with a; when this isn't possible, the data is copied
print(b.shape)
print(b)

A similar function is `view`, which never copies data; instead it raises an error when a view isn't possible due to the way that the tensor is stored in memory.
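A transposed tensor is a typical case in which the two differ. The sketch below (variable names are illustrative) shows `view` failing where `reshape` quietly falls back to a copy:

```python
import torch

a = torch.arange(6).reshape(2, 3)
t = a.t()  # transpose: same memory as a, but no longer stored contiguously
print(t.is_contiguous())

try:
    t.view(6)  # view never copies, so it fails here
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.reshape(6)  # reshape copies the data when a view is impossible
print(view_failed, flat)
```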

In [None]:
a = torch.arange(9)
print(a.shape, a)
b = a.reshape(3,-1)  # The -1 indicates that the size of this dimension should be whatever is required to retain the same number of elements
print(b.shape)

In [None]:
a = torch.arange(18).reshape(3,2,3)
print(a.shape)
print(a)
b = a.transpose(0,1)  # swaps the specified dimensions
print(b.shape)
print(b)

In [None]:
a = torch.arange(24).reshape(3,2,4)
print(a.shape)
print(a)
b = a.permute(2,0,1)  # Changes the dimensions by listing the id of the dimension in the position where it should appear in the reshaped tensor
print(b.shape)

In [None]:
a = torch.arange(24).reshape(3,2,4)
print(a.shape)
b = a.unsqueeze(2)  # Adds an extra dimension of size 1 at position 2
print(b.shape)

In [None]:
a = torch.arange(24).reshape(3,2,4)
print(a.shape)
b = a[:,:,None]  # A shortcut to .unsqueeze(), place None where new dimensions should be added
print(b.shape)

In [None]:
a = torch.arange(24).reshape(3,2,1,4,1)
print(a.shape)
b = a.squeeze()  # Removes all size=1 dimensions
print(b.shape)

In [None]:
a = torch.arange(24).reshape(3,2,1,4,1)
print(a.shape)
b = a.squeeze(-1)  # Removes the size=1 dimension at the specified position
print(b.shape)

In [None]:
a = torch.arange(24).reshape(3,2,-1)
print(a.shape)
b = a.flatten(1)  # Merges all dimensions from the specified one onwards into a single dimension
print(b.shape)
b = a.flatten()  # Merges all dimensions into a single dimension
print(b.shape)

torch.Size([3, 2, 4])
torch.Size([3, 8])
torch.Size([24])


#### Expanding/repeating tensors

In [None]:
a = torch.arange(6).reshape(3,2,1)
print(a.shape)
b = a.expand(-1,-1,3)  # "Copies" the tensor along singleton (size=1) dimensions the specified number of times per dimension. -1 indicates "leave this dimension as is"
print(b.shape)         # This doesn't actually copy the values, instead it returns a view of the specified shape. An in place modification of one of the "copied"
b                      # elements will update all elements that were "copied", since they share the same memory address

In [None]:
b[0,0,0] += 1  # All elements in [0,0,:] are updated
b
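That `expand` shares memory rather than copying can be confirmed by inspecting the strides and the memory address of the result. This is a sketch using `Tensor.stride()` and `Tensor.data_ptr()`, neither of which is covered elsewhere in this notebook:

```python
import torch

a = torch.arange(6).reshape(3, 2, 1)
b = a.expand(-1, -1, 3)

print(a.stride(), b.stride())        # the expanded dimension has a stride of 0
print(a.data_ptr() == b.data_ptr())  # both tensors start at the same memory address
```

A stride of 0 means that moving along that dimension does not move through memory, so all three "copies" are the same stored element.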

Instead, to really copy elements, use the `repeat` and `repeat_interleave` functions.

In [None]:
b = a.repeat(1,1,3)  # Specify per dimension the number of times to copy data
print(b.shape)
print(b)
b[0,0,0] += 1
print(b)  # In-place addition only affects one element; the data really was copied

In [None]:
b = a.repeat_interleave(3, dim=2)  # Specify the number of times to repeat along a specific dimension
print(b.shape)
print(b)
b[0,0,0] += 1
print(b)  # In-place addition only affects one element; the data really was copied
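In the example above both functions give the same result, because the repeated dimension has size 1. The difference in ordering shows up once the dimension holds more than one element, as in this small sketch:

```python
import torch

a = torch.tensor([1, 2])
print(a.repeat(3))             # tiles the whole tensor: 1, 2, 1, 2, 1, 2
print(a.repeat_interleave(3))  # repeats each element in turn: 1, 1, 1, 2, 2, 2
```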

### Combining tensors
There are two methods for combining multiple tensors into one:
- cat (concatenate) combines tensors of the same rank along an existing dimension, in which they can potentially have different sizes
- stack combines tensors of the same shape along a new dimension

In [None]:
a = torch.arange(9).reshape(3,3)
b = torch.arange(9,18).reshape(3,3)
c = torch.arange(18,24).reshape(3,2)

In [None]:
torch.cat([a,b,c], dim=1).shape  # combine a,b,c along the last dimension (possible even though c only has 2 elements in that dimension)

In [None]:
torch.cat([a,b], dim=0).shape  # combine a,b along the first dimension (cannot combine c since it has a different size for the remaining dimensions)

In [None]:
torch.stack([a,b],dim=0).shape, torch.stack([a,b],dim=1).shape  # combine a,b along the new dimensions (cannot combine c since it has a different shape)
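One way to relate the two functions: `stack` behaves like `cat` applied after giving every tensor a new size-1 dimension. The sketch below checks this equivalence for the `a` and `b` defined above (redefined here so that the block is self-contained):

```python
import torch

a = torch.arange(9).reshape(3, 3)
b = torch.arange(9, 18).reshape(3, 3)

stacked = torch.stack([a, b], dim=1)                # new dimension at position 1
via_cat = torch.cat([a[:, None], b[:, None]], dim=1)  # add the dimension by hand, then cat
print(stacked.shape, torch.equal(stacked, via_cat))
```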

## Devices
By default, tensors are created on the CPU, however it is often beneficial to perform operations on GPUs. Tensors must either be manually transferred to the GPU, or explicitly created on it.
- Calling `.cpu()` or `.cuda()` on a `Tensor` returns a copy of it on that device (or the tensor itself, if it is already there). The `to(device)` method does the same for an arbitrary device.
- Devices can be created by e.g. `torch.device('cpu')`, `torch.device('cuda')`, `torch.device('cuda:0')`.
- Most torch functions that create tensors have a `device` argument, with which the user can specify that the tensor be created directly on a device, rather than created on the CPU and then moved
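A common pattern, sketched below, is to choose the device once and pass it around, so that the same code runs whether or not a GPU is available. Falling back to the CPU is an assumption about the desired behaviour, and the variable names are illustrative.

```python
import torch

# Fall back to the CPU when no GPU is visible, so that the same code runs anywhere
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

created = torch.zeros(2, 3, device=device)  # created directly on the device
moved = torch.ones(2, 3).to(device)         # created on the CPU, then transferred
total = (created + moved).cpu()             # bring the result back to the CPU
print(device, total)
```

Tensors taking part in the same operation generally need to be on the same device, which is why both inputs are placed on `device` before being added.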

## Retrieving data from tensors
Whilst a lot of other Python packages can extract data from tensors, this isn't always possible. The `item()`, and `numpy()` methods can be used to export data into more portable formats:

In [None]:
a = torch.rand(10)

In [None]:
a.numpy()

In [None]:
a[0].item()  # item only exports single elements

However, sometimes this isn't possible: for `numpy()`, the tensor must be on the CPU and must not require a gradient (more on gradients in later notebooks).
- `.detach()` creates a view of the tensor with no gradient
- the `.data` attribute is the values of the tensor, with no gradient

The most robust way to extract data is to call `.detach().cpu()` or `.data.cpu()` on the tensor

In [None]:
b = torch.ones(1, requires_grad=True)
c = a*b

In [None]:
c.numpy()  # Raises a RuntimeError, since c requires a gradient

In [None]:
%%timeit
a.data.cpu().numpy()

In [None]:
%%timeit
a.detach().cpu().numpy()
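Putting these pieces together, the sketch below shows the direct export failing for a tensor that requires a gradient, and the `.detach().cpu()` route succeeding (`direct_ok` and `arr` are illustrative names):

```python
import torch

a = torch.rand(10)
b = torch.ones(1, requires_grad=True)
c = a * b  # c requires a gradient, since b does

try:
    c.numpy()
    direct_ok = True
except RuntimeError:
    direct_ok = False

arr = c.detach().cpu().numpy()  # works regardless of device or gradient tracking
print(direct_ok, arr.shape)
```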

### clone
Sometimes it is necessary to copy a tensor. E.g. modifying the return of `a.detach()` in-place would still affect the values of `a` and cause problems.
Instead `a.detach().clone()` will create a new tensor with its own memory.
Cloning a non-detached tensor is a differentiable operation, i.e. gradients flow back through the clone to the original tensor.

In [None]:
a = torch.rand(10)
print(a)
b = torch.log_(a.detach())
a  # a is updated by the inplace operation

In [None]:
a = torch.rand(10)
print(a)
b = torch.log_(a.detach().clone())
a  # a is not updated by the inplace operation on the clone
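The statement that cloning is differentiable can be checked with a short sketch. It relies on `requires_grad` and `backward()`, which are covered in later notebooks, and the variable names are illustrative.

```python
import torch

a = torch.ones(3, requires_grad=True)
b = a.clone()  # a copy that stays connected to a for gradient purposes
(2 * b).sum().backward()
print(a.grad)  # the gradient flowed back through the clone: tensor([2., 2., 2.])

d = a.detach().clone()  # an independent copy, cut off from gradient tracking
print(d.requires_grad)  # False
```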