Source: https://deeplizard.com/learn/video/Csa5R12jYRg

### What are Tensors?

__Tensors__ in deep learning are $n$-dimensional arrays. (These are not the tensors of differential geometry.)

* The __rank__ of a tensor is its number of dimensions (axes), i.e. the number of indices required to access a single element of the tensor. 

* The $i$th __axis__ of a tensor is the rank-1 tensor obtained by fixing every index other than the $i$th and letting the $i$th index vary. 

* The __shape__ of a rank-$d$ tensor is a list $\left(m_0, m_1, \ldots, m_{d-1} \right)$ where $m_i$ is the length of the $i$th axis. The shape of a tensor encodes all relevant information about the axes and the rank of a tensor. 
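
As a small illustration of these three definitions, the sketch below reads off the rank, the shape, and one rank-1 slice along an axis. The shape `(2, 3, 4)` and the variable names are arbitrary choices for this example.

```python
import torch

t = torch.zeros(2, 3, 4)

rank = t.dim()           # number of axes = number of indices per element
shape = tuple(t.shape)   # length of each axis

# Fix every index except the one for axis 1: the result is a rank-1 tensor
# whose length equals shape[1].
along_axis_1 = t[0, :, 0]

print(rank, shape, tuple(along_axis_1.shape))
```

The rank always equals `len(t.shape)`, which is why the shape alone determines both the rank and the axis lengths.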

In [3]:
import torch

t = torch.Tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

In [4]:
type(t)

torch.Tensor

In [5]:
t.shape

torch.Size([3, 3])

In [6]:
t.reshape(1, 9)

tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]])

In [7]:
t.reshape(9, 1)

tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]])

In [8]:
t.reshape(9, 1).shape

torch.Size([9, 1])

### Tensor attributes

In [9]:
t = torch.Tensor() # class constructor

In [10]:
print(t.dtype)
print(t.device)
print(t.layout)

torch.float32
cpu
torch.strided


In [11]:
device = torch.device('cuda:0')
device

device(type='cuda', index=0)

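Tensors that take part in the same operation must live on the same device. Older PyTorch versions also required matching `dtype`s; recent versions promote mixed dtypes to a common type where possible. The layout describes how the tensor is stored in memory (`torch.strided` is the default dense layout).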
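
A minimal sketch of both constraints, assuming a reasonably recent PyTorch with type promotion and the default float dtype left at `float32`. The variable names are illustrative, and the GPU is used only if one is available.

```python
import torch

a = torch.tensor([1, 2, 3])        # int64
b = torch.tensor([1., 2., 3.])     # float32 (the default float dtype)

mixed = a + b                      # dtypes differ; result is promoted
print(mixed.dtype)

# Use the GPU only if one is present, otherwise stay on the CPU.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
a_dev, b_dev = a.to(device), b.to(device)
total = a_dev + b_dev              # both operands live on `device`
print(total.device)
```

Moving both operands with `.to(device)` before combining them avoids the device-mismatch error that occurs when one tensor is on the CPU and the other on the GPU.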

### Tensors from NumPy data

In [12]:
import numpy as np

data = np.array([1, 2, 3])
type(data)

numpy.ndarray

In [13]:
torch.Tensor(data) # class constructor

tensor([1., 2., 3.])

In [14]:
torch.tensor(data) # factory function. Note int dtype

tensor([1, 2, 3])

In [15]:
torch.Tensor(data) + torch.tensor(data) # mixed dtypes: an error in older PyTorch versions; here the result is promoted to float32

tensor([2., 4., 6.])

In [16]:
torch.as_tensor(data) # same as factory (?)

tensor([1, 2, 3])

In [17]:
torch.from_numpy(data) # same as factory (?)

tensor([1, 2, 3])

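The oddball is `torch.Tensor`, which converts the data to the default float dtype. The other three _seem_ to work the same here; they differ in whether they copy the data (see below).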

### Special Tensors (without data)

In [18]:
torch.eye(2)

tensor([[1., 0.],
        [0., 1.]])

In [19]:
torch.zeros(2,2)

tensor([[0., 0.],
        [0., 0.]])

In [20]:
torch.ones(2, 2)

tensor([[1., 1.],
        [1., 1.]])

In [21]:
torch.rand(2, 2) # U[0,1]

tensor([[0.7841, 0.4444],
        [0.7524, 0.1920]])

In [22]:
torch.randn(2, 2) # N(0,1)

tensor([[2.1807, 0.0730],
        [0.2537, 0.6845]])

### Transforming tensors: `.to()`, `.tolist()`, and `.numpy()`

In [23]:
t = torch.tensor([1, 2, 3])

In [24]:
t.tolist()

[1, 2, 3]

In [25]:
t.numpy()

array([1, 2, 3])

In [26]:
t.to(torch.float32)

tensor([1., 2., 3.])

In [27]:
t.to(float)

tensor([1., 2., 3.], dtype=torch.float64)

In [28]:
t.to(int).dtype

torch.int64

### Looking deeper at constructors & factories

Factory functions allow for a more dynamic instantiation of objects. For example, __factory functions__ 
* `tensor`, 
* `as_tensor`, 
* `from_numpy` 

infer the data type of their output from their input, whereas the __constructor__ 
* `Tensor` 

uses the global default type which can be accessed by `torch.get_default_dtype()`.

In [29]:
torch.get_default_dtype()

torch.float32

In [30]:
print(torch.tensor([1, 2, 3]))
print(torch.tensor([1., 2., 3.]))
print(torch.tensor([1, 2, 3], dtype=torch.int64))


tensor([1, 2, 3])
tensor([1., 2., 3.])
tensor([1, 2, 3])


In [31]:
data = np.array([1, 2, 3])
t1 = torch.Tensor(data)
t2 = torch.tensor(data)
t3 = torch.as_tensor(data)
t4 = torch.from_numpy(data)

In [32]:
data[0] = 0
data[1] = 0
data[2] = 0

In [33]:
print(t1)
print(t2)

tensor([1., 2., 3.])
tensor([1, 2, 3])


In [34]:
print(t3)
print(t4)

tensor([0, 0, 0])
tensor([0, 0, 0])


The tensors `t3` and `t4` changed along with `data`! It turns out that `torch.Tensor` and `torch.tensor` __copy__ the input data (i.e. they create a new object in memory). On the other hand, `as_tensor` and `from_numpy` __share__ memory with `data`. 

`torch.Tensor`    
 * copy
 * uses global data type
 * constructor

__`torch.tensor`__†
* copy
* dynamic; infers data type


__`torch.as_tensor`__†
* shares memory (when possible)
* Accepts _any_ array-like object as input.

`torch.from_numpy` 
* shares memory
* Accepts only NumPy arrays.

† We emphasize the factory functions that are better to use generally.
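
One caveat worth sketching: `torch.as_tensor` can only share memory when no conversion is needed. The example below, with illustrative variable names, requests a different `dtype`, which forces a copy.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

shared = torch.as_tensor(data)                           # same dtype: no copy
converted = torch.as_tensor(data, dtype=torch.float32)   # dtype change: copy

data[0] = 100
print(shared)      # reflects the change to data
print(converted)   # still holds the original values
```

So sharing is an optimization that applies when the input's dtype and device already match the requested ones, not a guarantee.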

### Flatten, Reshape, Squeeze

Source: https://deeplizard.com/learn/video/fCVuiW9AFzY

We can categorize high-level tensor operations into four categories:
1. Reshaping operations
2. Element-wise operations
3. Reduction operations
4. Access operations

#### Number of elements

In [35]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3],
])

In [36]:
torch.tensor(t.shape).prod() # hacky way to get num. of elements in a tensor

tensor(12)

In [37]:
t.numel() # use this instead

12

#### Reshape

In [38]:
t.reshape(-1) # -1 tells reshape to infer that dimension from the number of elements in t

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [39]:
t.reshape(4,-1)

tensor([[1, 1, 1],
        [1, 2, 2],
        [2, 2, 3],
        [3, 3, 3]])

In [40]:
t.reshape(2,2,3)

tensor([[[1, 1, 1],
         [1, 2, 2]],

        [[2, 2, 3],
         [3, 3, 3]]])

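Each flat index (top row) maps to one index per axis of the (2, 2, 3) tensor:

| flat index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| axis 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| axis 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| axis 2 | 0 | 1 | 2 | 0 | 1 | 2 | 0 | 1 | 2 | 0 | 1 | 2 |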
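
The index pattern above can be reproduced programmatically. The helper `unravel` below is a hypothetical name written for this sketch, not a PyTorch function. It converts a flat row-major index into one index per axis, and the loop checks the result against a tensor whose values equal their flat positions.

```python
import torch

shape = (2, 2, 3)
t = torch.arange(12).reshape(shape)   # element value == its flat index

def unravel(flat, shape):
    """Convert a flat (row-major) index into one index per axis."""
    idx = []
    for size in reversed(shape):
        flat, r = divmod(flat, size)
        idx.append(r)
    return tuple(reversed(idx))

indices = [unravel(n, shape) for n in range(t.numel())]
for n, (i, j, k) in enumerate(indices):
    assert t[i, j, k].item() == n

print(indices[7])
```

The last axis varies fastest, which is why `reshape` fills the new shape row by row.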

#### Squeeze

In [41]:
t.reshape(1, 12) # notice the double brackets

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])

In [42]:
t.reshape(1, 12).squeeze()

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [43]:
t.reshape(1, 12).squeeze().unsqueeze(dim=0) # (1, 12) -> 12 -> (1, 12)

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])

In [44]:
t.reshape(1, 12).squeeze().unsqueeze(dim=1) # (1, 12) -> 12 -> (12, 1)

tensor([[1],
        [1],
        [1],
        [1],
        [2],
        [2],
        [2],
        [2],
        [3],
        [3],
        [3],
        [3]])

In [45]:
t.unsqueeze(dim=2)

tensor([[[1],
         [1],
         [1],
         [1]],

        [[2],
         [2],
         [2],
         [2]],

        [[3],
         [3],
         [3],
         [3]]])

In [46]:
t.shape, t.unsqueeze(dim=2).shape

(torch.Size([3, 4]), torch.Size([3, 4, 1]))

In [47]:
t.unsqueeze(dim=2).shape

torch.Size([3, 4, 1])

In [48]:
t.unsqueeze(dim=1).shape

torch.Size([3, 1, 4])

In [49]:
t.unsqueeze(dim=1)

tensor([[[1, 1, 1, 1]],

        [[2, 2, 2, 2]],

        [[3, 3, 3, 3]]])

In [50]:
t

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])
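
To summarize how `squeeze` and `unsqueeze` act on shapes, here is a short sketch on a tensor with two length-1 axes. The shape `(1, 3, 1, 4)` is chosen only for illustration.

```python
import torch

x = torch.zeros(1, 3, 1, 4)

print(x.squeeze().shape)          # all length-1 axes removed
print(x.squeeze(dim=0).shape)     # only axis 0 removed
print(x.squeeze(dim=1).shape)     # axis 1 has length 3, so nothing changes
print(x.unsqueeze(dim=-1).shape)  # new length-1 axis appended at the end
```

Passing `dim` explicitly is the safer habit, since a bare `squeeze()` would also drop a batch axis of length 1.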

#### Flatten

In [51]:
def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

In [52]:
flatten(t)

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [53]:
t.reshape(-1) # one-liner

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [54]:
t.flatten() # built-in PyTorch method

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [55]:
t.reshape(1,-1)[0]

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [56]:
t.view(t.numel())

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
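
The variants above give the same result on this tensor, but `view` and `reshape` are not interchangeable in general. The sketch below uses a transpose as one example of a non-contiguous tensor: `view` refuses, whereas `reshape` copies when it has to.

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
])

tt = t.t()                 # transpose: same storage, non-contiguous strides
print(tt.is_contiguous())

try:
    tt.view(-1)
    view_failed = False
except RuntimeError:
    view_failed = True     # view cannot express this layout without a copy

flat = tt.reshape(-1)      # reshape falls back to copying when needed
print(view_failed, flat)
```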

#### Concat

In [57]:
t1 = torch.tensor([1,2])
t2 = torch.tensor([3,4])

torch.cat((t1, t2), dim=0)

tensor([1, 2, 3, 4])

In [58]:
t1 = t1.unsqueeze(dim=0)
t2 = t2.unsqueeze(dim=0)

In [59]:
torch.cat((t1, t2), dim=0) # dim selects the axis along which to concatenate; here the rows are joined

tensor([[1, 2],
        [3, 4]])

In [60]:
torch.cat((t1, t2), dim=1) # dim selects the axis along which to concatenate; here the columns are joined

tensor([[1, 2, 3, 4]])

#### Example: Batch image input for CNN

Consider a tensor with axes $[B, C, H, W]$, where $B$ indexes the images in the batch, $C$ indexes the color channels (three for RGB, one for grayscale), and $H$ and $W$ index the height and width coordinates of the pixels. Such a tensor represents a single batch of image inputs to a CNN. For instance, a tensor with shape $[3, 1, 28, 28]$ is a batch of three 28$\times$28 grayscale images.

Note that, given the first three indices, the final index iterates over scalars.

In [61]:
# three grayscale 4x4 images
t1 = torch.ones(4, 4)
t2 = torch.ones(4, 4)*2
t3 = torch.ones(4, 4)*3

In [62]:
t1

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [63]:
t2

tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])

In [64]:
t3

tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

In [65]:
batch = torch.stack((t1, t2, t3), dim=0)

In [66]:
batch.shape # stack creates a new axis that indexes the inputs (unlike cat, which joins along an existing axis)

torch.Size([3, 4, 4])
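
The relationship between `stack` and `cat` can be checked directly. In this sketch, stacking along a new axis gives the same result as unsqueezing each input and concatenating, while a plain `cat` on the original inputs only lengthens an existing axis.

```python
import torch

t1 = torch.ones(4, 4)
t2 = torch.ones(4, 4) * 2
t3 = torch.ones(4, 4) * 3

stacked = torch.stack((t1, t2, t3), dim=0)                        # new axis 0
via_cat = torch.cat([x.unsqueeze(0) for x in (t1, t2, t3)], dim=0)
joined = torch.cat((t1, t2, t3), dim=0)                           # existing axis 0

print(stacked.shape, via_cat.shape, joined.shape)
print(torch.equal(stacked, via_cat))
```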

In [67]:
batch

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.],
         [3., 3., 3., 3.],
         [3., 3., 3., 3.]]])

In [68]:
batch = batch.reshape(3, 1, 4, 4) 

In [69]:
batch # three grayscale images

tensor([[[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]],


        [[[2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.]]],


        [[[3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.]]]])

We flatten each (4, 4) image of pixel values while keeping the batch and channel axes.

In [70]:
batch = batch.flatten(start_dim=2) # flatten from axis 2 onward (the third index)
batch

tensor([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],

        [[2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]],

        [[3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.]]])

In [71]:
batch.shape

torch.Size([3, 1, 16])
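
A closely related variant, which is an addition here and not from the source, is `start_dim=1`. It merges the channel, height, and width axes so that each image becomes a single vector, as is often done before a fully connected layer. The input values below are made up for the sketch.

```python
import torch

batch = torch.arange(3 * 1 * 4 * 4, dtype=torch.float32).reshape(3, 1, 4, 4)

flat = batch.flatten(start_dim=1)   # keep axis 0 (batch), merge C, H, W
print(flat.shape)

# Each row of `flat` is exactly one image, flattened on its own.
for i in range(batch.shape[0]):
    assert torch.equal(flat[i], batch[i].reshape(-1))
```

Keeping axis 0 intact matters: flattening the whole tensor would mix pixels from different images into one vector.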

### Broadcasting in PyTorch

Broadcasting in PyTorch follows the same rules as broadcasting in NumPy. 

In [72]:
np.broadcast_to(np.array([1,2]), (3,2)) # broadcasts the array [1, 2] to an array of shape (3, 2)

array([[1, 2],
       [1, 2],
       [1, 2]])

(2,) -> (1, 2) -> (3, 2) (See rules for broadcasting in the Handbook.)

In [73]:
np.broadcast_to(np.array([[1],[2],[3]]), (3,3))

array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

In [74]:
np.array([[1],[2],[3]]) + np.array([[0, 0, 0]])

array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

In [75]:
t = torch.tensor([[1,1],
                  [0,1]], dtype=torch.int64)

In [76]:
t+1

tensor([[2, 2],
        [1, 2]])

In [77]:
t*2

tensor([[2, 2],
        [0, 2]])

In [78]:
t.gt(0)

tensor([[ True,  True],
        [False,  True]])

Conceptually, the scalar is first broadcast to the shape of `t`, and the comparison is then applied element-wise:

In [79]:
t > torch.tensor(np.broadcast_to(0, t.shape), # zero turned to an array of shape t.shape
                 dtype=torch.int64)

tensor([[ True,  True],
        [False,  True]])

() -> (1, 1) -> (2, 2)
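
The same idea can be written with PyTorch alone. This sketch uses `Tensor.expand` to make the broadcast explicit and compares it with the implicit version. It also repeats the column-plus-row example from the NumPy cells using tensors.

```python
import torch

t = torch.tensor([[1, 1],
                  [0, 1]])

zero = torch.tensor(0).expand(t.shape)   # shape () viewed as shape (2, 2)
explicit = t > zero
implicit = t > 0                         # the same comparison, broadcast implicitly
print(torch.equal(explicit, implicit))

col = torch.tensor([[1], [2], [3]])      # shape (3, 1)
row = torch.tensor([[0, 0, 0]])          # shape (1, 3)
print((col + row).shape)                 # both are broadcast to (3, 3)
```

`expand` returns a view without copying data, so the explicit form costs no extra memory. In practice the implicit form is what one writes.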

### Reduction operations

A reduction operation aggregates over the elements of a single tensor and returns a tensor with fewer elements.

In [81]:
import torch
import numpy as np


In [82]:
t = torch.tensor([
    [0,1,0],
    [2,0,2],
    [0,3,0]
], dtype=torch.float32)

In [83]:
t.sum()

tensor(8.)

In [84]:
t.numel()

9

In [85]:
t.sum().numel()

1

In [87]:
t.sum().numel() < t.numel()

True

In [88]:
t.sum(), t.prod(), t.mean(), t.std()

(tensor(8.), tensor(0.), tensor(0.8889), tensor(1.1667))
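
The value `1.1667` is the sample standard deviation: by default `std` divides by $n - 1$ instead of $n$. A quick manual check (variable names are illustrative):

```python
import torch

t = torch.tensor([
    [0, 1, 0],
    [2, 0, 2],
    [0, 3, 0]
], dtype=torch.float32)

n = t.numel()
squared_dev = ((t - t.mean()) ** 2).sum()

sample_std = torch.sqrt(squared_dev / (n - 1))   # divides by n - 1
population_std = torch.sqrt(squared_dev / n)     # divides by n

print(t.std(), sample_std, population_std)
```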

#### Reduction with dim

In [89]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3]
], dtype=torch.float32)

In [92]:
t.shape

torch.Size([3, 4])

In [90]:
t.sum(dim=0)

tensor([6., 6., 6., 6.])

In [91]:
t.sum(dim=1)

tensor([ 4.,  8., 12.])

In [93]:
t.sum(dim=0)

tensor([6., 6., 6., 6.])

In [94]:
t[0], t[1], t[2]

(tensor([1., 1., 1., 1.]), tensor([2., 2., 2., 2.]), tensor([3., 3., 3., 3.]))

In [95]:
t[0] + t[1] + t[2]

tensor([6., 6., 6., 6.])

In [96]:
t[0].sum(), t[1].sum(), t[2].sum()

(tensor(4.), tensor(8.), tensor(12.))

In [97]:
t.sum(dim=1)

tensor([ 4.,  8., 12.])

#### Argmax

In [103]:
t = torch.tensor([
    [1,0,0,2],
    [0,3,3,0],
    [4,0,0,5]
], dtype=torch.float32)

In [104]:
t.max()

tensor(5.)

In [105]:
t.argmax()

tensor(11)

In [106]:
t.flatten()

tensor([1., 0., 0., 2., 0., 3., 3., 0., 4., 0., 0., 5.])

In [107]:
t.max(dim=0)

torch.return_types.max(
values=tensor([4., 3., 3., 5.]),
indices=tensor([2, 1, 1, 2]))

In [108]:
t.argmax(dim=0)

tensor([2, 1, 1, 2])

In [109]:
t.max(dim=1)

torch.return_types.max(
values=tensor([2., 3., 5.]),
indices=tensor([3, 1, 3]))

In [110]:
t.argmax(dim=1)

tensor([3, 1, 3])

#### Mean

In [111]:
t = torch.tensor([
    [1,2,3],
    [4,5,6],
    [7,8,9]
], dtype=torch.float32)

In [112]:
t.mean()

tensor(5.)

In [113]:
t.mean().item()

5.0

In [114]:
t.mean(dim=0).tolist()

[4.0, 5.0, 6.0]

In [115]:
t.mean(dim=0).numpy()

array([4., 5., 6.], dtype=float32)

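In [80]:
s = torch.tensor([1, 2, 3]) # separate name, so that the 3x3 tensor t defined above is kept

In [80]:
s.sum()

tensor(6)

In [81]:
s.prod()

tensor(6)

Note that these operations return a tensor, not a Python number. To get the value, use `.item()`:

In [116]:
t.prod().item() # product over the 3x3 tensor t, i.e. 9!

362880.0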

In [117]:
t.to(torch.float32) # or t.float()

tensor([[1., 2., 3.],
        [4., 5., 6.],
        [7., 8., 9.]])

In [118]:
t = t.to(torch.float32)

In [119]:
t.mean()

tensor(5.)

In [120]:
t.std()

tensor(2.7386)

In [121]:
t = torch.tensor([[1,2,3], [4,5,6]], dtype=torch.float32)
t

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [122]:
print(t.std(dim=0))
print(t.std(dim=1))

tensor([2.1213, 2.1213, 2.1213])
tensor([1., 1.])


In [123]:
t

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [124]:
print(t.sum(dim=0))
print(t.sum(dim=1))

tensor([5., 7., 9.])
tensor([ 6., 15.])


Specifying `dim=k` aggregates all elements whose indices differ only along axis `k`, so axis `k` is removed from the result. 
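
A sketch of this rule on a rank-3 tensor, with values and shape picked arbitrarily. It also shows `keepdim=True`, which keeps the reduced axis with length 1.

```python
import torch

t = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

summed = t.sum(dim=1)                  # axis 1 disappears: shape (2, 4)
kept = t.sum(dim=1, keepdim=True)      # axis 1 kept with length 1: shape (2, 1, 4)

# Entry [0, 0] aggregates the elements that differ only in their axis-1 index.
manual = t[0, 0, 0] + t[0, 1, 0] + t[0, 2, 0]
print(summed.shape, kept.shape, summed[0, 0], manual)
```

`keepdim=True` is handy when the result must broadcast back against the original tensor, for example when subtracting a per-row mean.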

### Argmax

In [125]:
t = torch.tensor([1, 2, 3])
t.argmax()

tensor(2)

In [126]:
t = torch.rand(3,3)
t

tensor([[0.0618, 0.9840, 0.8646],
        [0.8293, 0.9310, 0.6668],
        [0.9801, 0.0957, 0.6880]])

In [127]:
t.argmax() # index of the flattened tensor!

tensor(1)

In [128]:
t.argmax(dim=0)

tensor([2, 0, 0])

In [129]:
t.max(dim=0) # max along a dim also returns the argmax, as `indices`

torch.return_types.max(
values=tensor([0.9801, 0.9840, 0.8646]),
indices=tensor([2, 0, 0]))

In [130]:
(t.max(dim=0)[1] == t.argmax(dim=0)).all()

tensor(True)
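
Since `argmax()` without `dim` returns an index into the flattened tensor, recovering the row and column takes one extra step. A minimal sketch for a rank-2 tensor, using `divmod` and assuming the usual row-major layout:

```python
import torch

t = torch.tensor([
    [1, 0, 0, 2],
    [0, 3, 3, 0],
    [4, 0, 0, 5]
], dtype=torch.float32)

flat_index = t.argmax().item()                 # index into t.flatten()
row, col = divmod(flat_index, t.shape[1])      # row-major layout
print(flat_index, (row, col), t[row, col])
```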