Source: https://deeplizard.com/learn/video/Csa5R12jYRg

### What are Tensors?

__Tensors__ in deep learning are $n$-dimensional arrays. (These are not the tensors of differential geometry.)

* The __rank__ of a tensor is the number of dimensions (axes) it has, i.e. the number of indices required to access any single element of the tensor.

* The $i$th __axis__ of a tensor is the direction along which the $i$th index varies. Fixing every index other than the $i$th yields a rank-1 tensor that runs along this axis.

* The __shape__ of a rank-$d$ tensor is a list $\left(m_0, m_1, \ldots, m_{d-1} \right)$ where $m_i$ is the length of the $i$th axis. The shape of a tensor encodes all relevant information about the axes and the rank of a tensor. 
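As a quick check of these definitions, the sketch below builds a rank-3 tensor and reads off its rank, its shape, and one slice along an axis. The variable names are illustrative only.

```python
import torch

# A rank-3 tensor with shape (2, 3, 4): 2 * 3 * 4 = 24 elements.
x = torch.arange(24).reshape(2, 3, 4)

rank = x.dim()          # number of indices needed to address one element
shape = tuple(x.shape)  # length of each axis

# Fixing every index except the last one leaves a rank-1 tensor
# that runs along axis 2.
along_axis_2 = x[1, 0, :]

print(rank)                # 3
print(shape)               # (2, 3, 4)
print(len(x.shape))        # 3, the rank is the length of the shape
print(along_axis_2.shape)  # torch.Size([4])
```

The rank can always be recovered from the shape, which is why the shape is usually the only attribute that needs to be inspected.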

In [10]:
import torch

t = torch.Tensor([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]])

In [11]:
type(t)

torch.Tensor

In [12]:
t.shape

torch.Size([3, 3])

In [13]:
t.reshape(1, 9)

tensor([[1., 2., 3., 4., 5., 6., 7., 8., 9.]])

In [6]:
t.reshape(9, 1)

tensor([[1.],
        [2.],
        [3.],
        [4.],
        [5.],
        [6.],
        [7.],
        [8.],
        [9.]])

In [7]:
t.reshape(9, 1).shape

torch.Size([9, 1])

### Tensor attributes

In [199]:
t = torch.Tensor() # class constructor

In [15]:
print(t.dtype)
print(t.device)
print(t.layout)

torch.float32
cpu
torch.strided


In [18]:
device = torch.device('cuda:0')
device

device(type='cuda', index=0)

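Operands of a tensor operation must live on the same device. In the PyTorch version used for these notes they must also share the same `dtype`; newer releases instead promote mixed dtypes (for example, `int64` combined with `float32` gives `float32`). The layout describes how the tensor is stored in memory; `torch.strided` is the default dense layout.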
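A minimal sketch of bringing two operands onto one device with one dtype before combining them. It falls back to the CPU when CUDA is not available, and the names `a`, `b`, and `device` are illustrative only.

```python
import torch

# Use the GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

a = torch.tensor([1, 2, 3])        # int64, created on the CPU
b = torch.tensor([1.0, 2.0, 3.0])  # float32, created on the CPU

# Move both operands to the same device and cast explicitly to one dtype.
a = a.to(device=device, dtype=b.dtype)
b = b.to(device)

c = a + b
print(c.dtype, c.device.type)
```

Casting explicitly keeps the code independent of whichever promotion rules the installed PyTorch version applies.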

### Tensors from NumPy data

In [28]:
import numpy as np

data = np.array([1, 2, 3])
type(data)

numpy.ndarray

In [29]:
torch.Tensor(data) # class constructor

tensor([1., 2., 3.])

In [34]:
torch.tensor(data) # factory function. Note int dtype

tensor([1, 2, 3])

In [35]:
torch.Tensor(data) + torch.tensor(data) # error in the PyTorch version used here: float32 vs int64

RuntimeError: expected device cpu and dtype Float but got device cpu and dtype Long

In [36]:
torch.as_tensor(data) # same values and dtype as the factory function

tensor([1, 2, 3])

In [38]:
torch.from_numpy(data) # same values and dtype as the factory function

tensor([1, 2, 3])

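The oddball is `torch.Tensor`: it converts the data to the default floating-point dtype. The other three _seem_ to work the same, since each keeps the integer dtype of the input.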

### Special Tensors (without data)

In [39]:
torch.eye(2)

tensor([[1., 0.],
        [0., 1.]])

In [41]:
torch.zeros(2,2)

tensor([[0., 0.],
        [0., 0.]])

In [44]:
torch.ones(2, 2)

tensor([[1., 1.],
        [1., 1.]])

In [43]:
torch.rand(2, 2) # U[0,1]

tensor([[0.5380, 0.5181],
        [0.9409, 0.4689]])

In [46]:
torch.randn(2, 2) # N(0,1)

tensor([[-0.2431, -0.6834],
        [ 0.2748,  0.6107]])

### Transforming tensors: `.to()`, `.tolist()`, and `.numpy()`

In [250]:
t = torch.tensor([1, 2, 3])

In [251]:
t.tolist()

[1, 2, 3]

In [252]:
t.numpy()

array([1, 2, 3])

In [253]:
t.to(torch.float32)

tensor([1., 2., 3.])

In [254]:
t.to(float)

tensor([1., 2., 3.], dtype=torch.float64)

In [257]:
t.to(int).dtype

torch.int64

### Looking deeper at constructors & factories

Factory functions allow for a more dynamic instantiation of objects. For example, the __factory functions__ 
* `tensor`, 
* `as_tensor`, 
* `from_numpy` 

infer the data type of their output from their input, whereas the __constructor__ 
* `Tensor` 

uses the global default data type, which can be read with `torch.get_default_dtype()`.

In [47]:
torch.get_default_dtype()

torch.float32

In [48]:
print(torch.tensor([1, 2, 3]))
print(torch.tensor([1., 2., 3.]))
print(torch.tensor([1, 2, 3], dtype=torch.int64))


tensor([1, 2, 3])
tensor([1., 2., 3.])
tensor([1, 2, 3])


In [49]:
data = np.array([1, 2, 3])
t1 = torch.Tensor(data)
t2 = torch.tensor(data)
t3 = torch.as_tensor(data)
t4 = torch.from_numpy(data)

In [50]:
data[0] = 0
data[1] = 0
data[2] = 0

In [51]:
print(t1)
print(t2)

tensor([1., 2., 3.])
tensor([1, 2, 3])


In [52]:
print(t3)
print(t4)

tensor([0, 0, 0])
tensor([0, 0, 0])


Modifying `data` also modified the tensors `t3` and `t4`! It turns out that `torch.Tensor` and `torch.tensor` __copy__ the input data (i.e. they create a new object in memory). On the other hand, `as_tensor` and `from_numpy` __share__ memory with `data`, so a change to the array is visible in the tensor and vice versa.

`torch.Tensor`    
 * copy
 * uses global data type

__`torch.tensor`__†
* copy
* dynamic; infers data type


__`torch.as_tensor`__†
* shares memory when possible (e.g. a NumPy array on the CPU with no dtype change); otherwise copies
* Accepts _any_ array-like object as input.

`torch.from_numpy` 
* shares memory
* Accepts only NumPy arrays.

† We emphasize the factory functions that are better to use generally.
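The copy-versus-share behaviour can also be checked directly instead of by mutating the array. The sketch below uses `np.shares_memory` on the NumPy view of each tensor; the variable names are illustrative.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

copied = torch.tensor(data)           # always copies
shared = torch.as_tensor(data)        # reuses the NumPy buffer
also_shared = torch.from_numpy(data)  # reuses the NumPy buffer

# Asking as_tensor for a different dtype forces a copy.
converted = torch.as_tensor(data, dtype=torch.float32)

print(np.shares_memory(data, copied.numpy()))       # False
print(np.shares_memory(data, shared.numpy()))       # True
print(np.shares_memory(data, also_shared.numpy()))  # True
print(np.shares_memory(data, converted.numpy()))    # False
```

Sharing is faster for large arrays, but any in-place change to one object silently changes the other, so `torch.tensor` is the safer default when in doubt.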

### Flatten, Reshape, Squeeze

Source: https://deeplizard.com/learn/video/fCVuiW9AFzY

We can categorize high-level tensor operations into four categories:
1. Reshaping operations
2. Element-wise operations
3. Reduction operations
4. Access operations

#### Number of elements

In [73]:
t = torch.tensor([
    [1,1,1,1],
    [2,2,2,2],
    [3,3,3,3],
])

In [74]:
torch.tensor(t.shape).prod() # hacky way to get num. of elements in a tensor

tensor(12)

In [75]:
t.numel() # use this instead

12

#### Reshape

In [76]:
t.reshape(-1) # -1 tells reshape to infer this dimension from the number of elements in t

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [79]:
t.reshape(4,-1)

tensor([[1, 1, 1],
        [1, 2, 2],
        [2, 2, 3],
        [3, 3, 3]])
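The value chosen for `-1` is the total number of elements divided by the product of the other requested dimensions. A short sketch of this rule, including the case where no valid value exists:

```python
import torch

t = torch.tensor([
    [1, 1, 1, 1],
    [2, 2, 2, 2],
    [3, 3, 3, 3],
])

# 12 elements in total, so -1 is inferred as 12 / 4 = 3 and 12 / 6 = 2.
print(t.reshape(4, -1).shape)  # torch.Size([4, 3])
print(t.reshape(-1, 6).shape)  # torch.Size([2, 6])

# 12 is not divisible by 5, so no valid size exists for -1.
try:
    t.reshape(5, -1)
    inferred = True
except RuntimeError:
    inferred = False
print(inferred)  # False
```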

In [80]:
t.reshape(2,2,3)

tensor([[[1, 1, 1],
         [1, 2, 2]],

        [[2, 2, 3],
         [3, 3, 3]]])

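Flat index (first row) versus the index along each axis after `t.reshape(2,2,3)`:

| flat index | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| axis 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 |
| axis 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 |
| axis 2 | 0 | 1 | 2 | 0 | 1 | 2 | 0 | 1 | 2 | 0 | 1 | 2 |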
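The rows above pair each flat position 0 to 11 with the three indices it receives in the `(2, 2, 3)` reshape. A small sketch that checks this correspondence, using a tensor whose values equal their own flat positions (an illustrative choice, not the `t` from the cells above):

```python
import itertools
import torch

t = torch.arange(12)    # the values equal the flat indices 0..11
r = t.reshape(2, 2, 3)

# itertools.product varies the last index fastest, which matches the
# row-major order that reshape uses for a contiguous tensor.
for flat, (i, j, k) in enumerate(itertools.product(range(2), range(2), range(3))):
    assert r[i, j, k].item() == flat
    assert flat == i * 6 + j * 3 + k  # 6, 3 and 1 are the strides

print(r.stride())  # (6, 3, 1)
```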

#### Squeeze

In [81]:
t.reshape(1, 12) # notice the double brackets

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])

In [82]:
t.reshape(1, 12).squeeze()

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [83]:
t.reshape(1, 12).squeeze().unsqueeze(dim=0) # (1, 12) -> 12 -> (1, 12)

tensor([[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]])

In [87]:
t.reshape(1, 12).squeeze().unsqueeze(dim=1) # (1, 12) -> 12 -> (12, 1)

tensor([[1],
        [1],
        [1],
        [1],
        [2],
        [2],
        [2],
        [2],
        [3],
        [3],
        [3],
        [3]])

In [89]:
t.unsqueeze(dim=2)

tensor([[[1],
         [1],
         [1],
         [1]],

        [[2],
         [2],
         [2],
         [2]],

        [[3],
         [3],
         [3],
         [3]]])

In [90]:
t.shape

torch.Size([3, 4])

In [92]:
t.unsqueeze(dim=2).shape

torch.Size([3, 4, 1])

In [93]:
t.unsqueeze(dim=1).shape

torch.Size([3, 1, 4])

In [94]:
t.unsqueeze(dim=1)

tensor([[[1, 1, 1, 1]],

        [[2, 2, 2, 2]],

        [[3, 3, 3, 3]]])

In [95]:
t

tensor([[1, 1, 1, 1],
        [2, 2, 2, 2],
        [3, 3, 3, 3]])
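Two details of these methods that the cells above do not show: `squeeze` accepts a `dim` argument and only removes that axis if its length is 1, and indexing with `None` inserts a length-1 axis just like `unsqueeze`. A minimal sketch:

```python
import torch

x = torch.zeros(1, 3, 1)

print(x.squeeze().shape)       # torch.Size([3]), all length-1 axes removed
print(x.squeeze(dim=0).shape)  # torch.Size([3, 1]), only axis 0 removed
print(x.squeeze(dim=1).shape)  # torch.Size([1, 3, 1]), axis 1 has length 3

t = torch.tensor([[1, 1, 1, 1],
                  [2, 2, 2, 2],
                  [3, 3, 3, 3]])

# Indexing with None inserts a length-1 axis, like unsqueeze.
same = torch.equal(t.unsqueeze(dim=1), t[:, None, :])
print(same)  # True
```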

#### Flatten

In [96]:
def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t

In [97]:
flatten(t)

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [99]:
t.reshape(-1) # one-liner

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])

In [132]:
t.flatten() # PyTorch lol

tensor([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3])
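The three variants agree on this input, but the hand-written version relies on `squeeze`, which removes every length-1 axis. For a tensor with a single element it therefore returns a 0-dimensional tensor rather than a rank-1 tensor. The sketch below reproduces the hand-written steps under the hypothetical name `flatten_via_squeeze`:

```python
import torch

def flatten_via_squeeze(t):
    # Same steps as the hand-written flatten above.
    return t.reshape(1, -1).squeeze()

single = torch.tensor([[5]])

print(flatten_via_squeeze(single).shape)  # torch.Size([])
print(single.reshape(-1).shape)           # torch.Size([1])
print(single.flatten().shape)             # torch.Size([1])
```

For this reason `reshape(-1)` or `flatten()` is the more predictable choice.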

#### Concat

In [120]:
t1 = torch.tensor([1,2])
t2 = torch.tensor([3,4])

torch.cat((t1, t2), dim=0)

tensor([1, 2, 3, 4])

In [121]:
t1 = t1.unsqueeze(dim=0)
t2 = t2.unsqueeze(dim=0)

In [122]:
torch.cat((t1, t2), dim=0) # dim selects the axis to concatenate along; here axis 0, so rows are appended

tensor([[1, 2],
        [3, 4]])

In [123]:
torch.cat((t1, t2), dim=1) # dim selects the axis to concatenate along; here axis 1, so columns are appended

tensor([[1, 2, 3, 4]])

#### Example: Batch image input for CNN

Consider a tensor with indices $[B, C, H, W]$, where $B$ indexes the image within the batch, $C = 1, 2, 3$ indexes the color channel (RGB), and $H$ and $W$ are the height and width coordinates of a pixel. Such a tensor represents a single batch of image inputs to a CNN. For instance, a tensor with shape $[3, 1, 28, 28]$ is a batch of three 28$\times$28 grayscale images.

Note that, given the first three indices, the final index iterates over scalars.

In [147]:
# three grayscale 4x4 images
t1 = torch.ones(4, 4)
t2 = torch.ones(4, 4)*2
t3 = torch.ones(4, 4)*3

In [153]:
t1

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [149]:
t2

tensor([[2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.],
        [2., 2., 2., 2.]])

In [150]:
t3

tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

In [157]:
batch = torch.stack((t1, t2, t3), dim=0)

In [158]:
batch.shape # stack creates a new axis that indexes the inputs (unlike cat, which joins along an existing axis)

torch.Size([3, 4, 4])

In [159]:
batch

tensor([[[1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.],
         [1., 1., 1., 1.]],

        [[2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.],
         [2., 2., 2., 2.]],

        [[3., 3., 3., 3.],
         [3., 3., 3., 3.],
         [3., 3., 3., 3.],
         [3., 3., 3., 3.]]])
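The difference between `stack` and `cat` shows up in the resulting shapes. The sketch below also checks that stacking gives the same result as first adding a length-1 axis to each input and then concatenating along it:

```python
import torch

t1 = torch.ones(4, 4)
t2 = torch.ones(4, 4) * 2
t3 = torch.ones(4, 4) * 3

stacked = torch.stack((t1, t2, t3), dim=0)  # new axis 0 -> (3, 4, 4)
joined = torch.cat((t1, t2, t3), dim=0)     # grows axis 0 -> (12, 4)

# stack matches cat applied after giving each input a new length-1 axis.
via_cat = torch.cat([t.unsqueeze(dim=0) for t in (t1, t2, t3)], dim=0)

print(stacked.shape)                  # torch.Size([3, 4, 4])
print(joined.shape)                   # torch.Size([12, 4])
print(torch.equal(stacked, via_cat))  # True
```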

In [160]:
batch = batch.reshape(3, 1, 4, 4) 

In [161]:
batch # three grayscale images

tensor([[[[1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.],
          [1., 1., 1., 1.]]],


        [[[2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.],
          [2., 2., 2., 2.]]],


        [[[3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.],
          [3., 3., 3., 3.]]]])

We flatten each (4, 4) image, i.e. the last two axes, which contain the pixel values.

In [165]:
batch = batch.flatten(start_dim=2) # flatten from the third axis (index 2) onward
batch

tensor([[[1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]],

        [[2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]],

        [[3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3., 3.]]])
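The `start_dim` argument decides which leading axes survive. As a sketch, and assuming the goal is one flat vector per image (for example to feed a fully connected layer, which these notes do not cover), `start_dim=1` keeps only the batch axis:

```python
import torch

# Batch of three 1-channel 4x4 images, as in the example above.
batch = torch.stack([torch.ones(4, 4) * k for k in (1, 2, 3)], dim=0).unsqueeze(dim=1)
print(batch.shape)  # torch.Size([3, 1, 4, 4])

per_channel = batch.flatten(start_dim=2)  # keep batch and channel axes
per_image = batch.flatten(start_dim=1)    # keep only the batch axis

print(per_channel.shape)  # torch.Size([3, 1, 16])
print(per_image.shape)    # torch.Size([3, 16])
```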

### Broadcasting in PyTorch

PyTorch follows the same broadcasting rules as NumPy, so the NumPy examples below carry over directly.

In [189]:
np.broadcast_to(np.array([1,2]), (3,2)) # returns the array [1, 2] broadcast to shape (3, 2)

array([[1, 2],
       [1, 2],
       [1, 2]])

(2,) -> (1, 2) -> (3, 2) (See rules for broadcasting in the Handbook.)

In [197]:
np.broadcast_to(np.array([[1],[2],[3]]), (3,3))

array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

In [198]:
np.array([[1],[2],[3]]) + np.array([[0, 0, 0]])

array([[1, 1, 1],
       [2, 2, 2],
       [3, 3, 3]])

In [181]:
t = torch.tensor([[1,1],
                  [0,1]], dtype=torch.int64)

In [188]:
t+1

tensor([[2, 2],
        [1, 2]])

In [187]:
t*2

tensor([[2, 2],
        [0, 2]])

In [182]:
t.gt(0)

tensor([[ True,  True],
        [False,  True]])

Conceptually, the scalar is first broadcast to the shape of `t` and the comparison is then done element-wise:

In [185]:
t > torch.tensor(np.broadcast_to(0, t.shape), # zero turned to an array of shape t.shape
                 dtype=torch.int64)

tensor([[ True,  True],
        [False,  True]])

() -> (1, 1) -> (2, 2)
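The same expansion can be made explicit on the PyTorch side. The sketch below uses `torch.broadcast_tensors` to expand a column and a row to their common shape and checks that this matches the implicit broadcast; the tensors `col` and `row` are illustrative.

```python
import torch

col = torch.tensor([[1], [2], [3]])  # shape (3, 1)
row = torch.tensor([[0, 10, 20]])    # shape (1, 3)

# broadcast_tensors returns views expanded to the common shape (3, 3).
col_b, row_b = torch.broadcast_tensors(col, row)
print(col_b.shape, row_b.shape)

# The implicit and the explicit version give the same result.
print(torch.equal(col + row, col_b + row_b))  # True
print((col + row).tolist())
# [[1, 11, 21], [2, 12, 22], [3, 13, 23]]
```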

### Reduction operations

A reduction operation combines the elements of a single tensor into fewer values, for example a sum, product, or mean.

In [200]:
t = torch.tensor([1, 2, 3])

In [201]:
t.sum()

tensor(6)

In [202]:
t.prod()

tensor(6)

To get the value as a plain Python number, use `.item()`:

In [258]:
t.prod().item()

6

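Note that reduction operations return a tensor (a 0-dimensional one when reducing over all elements), which is why `.item()` is needed to extract a plain Python number.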

In [209]:
t.to(torch.float32) # or t.float()

tensor([1., 2., 3.])

In [211]:
t = t.to(torch.float32)

In [212]:
t.mean()

tensor(2.)

In [213]:
t.std()

tensor(1.)

In [234]:
t = torch.tensor([[1,2,3], [4,5,6]], dtype=torch.float32)
t

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [226]:
print(t.std(dim=0))
print(t.std(dim=1))

tensor([2.1213, 2.1213, 2.1213])
tensor([1., 1.])


In [235]:
t

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [225]:
print(t.sum(dim=0))
print(t.sum(dim=1))

tensor([5., 7., 9.])
tensor([ 6., 15.])


Specifying `dim=k` aggregates along axis `k`: all elements whose indices differ only in position `k` are combined into one value, so axis `k` disappears from the result.
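A rank-3 example makes the effect of `dim` easier to see than the 2D case. The sketch below uses an illustrative tensor of shape `(2, 3, 4)` and compares `sum(dim=0)` with the sum written out by hand:

```python
import torch

t = torch.arange(24, dtype=torch.float32).reshape(2, 3, 4)

# Summing over axis k removes that axis from the shape.
print(t.sum(dim=0).shape)  # torch.Size([3, 4])
print(t.sum(dim=1).shape)  # torch.Size([2, 4])
print(t.sum(dim=2).shape)  # torch.Size([2, 3])

# dim=0: entries that differ only in their first index are added together.
manual = t[0] + t[1]
print(torch.equal(t.sum(dim=0), manual))  # True

# keepdim=True keeps the reduced axis with length 1.
print(t.sum(dim=1, keepdim=True).shape)  # torch.Size([2, 1, 4])
```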

### Argmax

In [237]:
t = torch.tensor([1, 2, 3])
t.argmax()

tensor(2)

In [238]:
t = torch.rand(3,3)
t

tensor([[0.0655, 0.0405, 0.5059],
        [0.4549, 0.7206, 0.9108],
        [0.4145, 0.2088, 0.3007]])

In [239]:
t.argmax() # index of the flattened tensor!

tensor(5)

In [240]:
t.argmax(dim=0)

tensor([1, 1, 1])

In [241]:
t.max(dim=0) # max along a dim also returns the argmax, as `indices`

torch.return_types.max(
values=tensor([0.4549, 0.7206, 0.9108]),
indices=tensor([1, 1, 1]))

In [247]:
(t.max(dim=0)[1] == t.argmax(dim=0)).all()

tensor(True)
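Since `argmax()` without `dim` returns a position in the flattened tensor, it has to be converted back to obtain a row and column. A minimal sketch using `divmod`, with the values of the random tensor above hard-coded so that the result is reproducible:

```python
import torch

t = torch.tensor([[0.0655, 0.0405, 0.5059],
                  [0.4549, 0.7206, 0.9108],
                  [0.4145, 0.2088, 0.3007]])

flat_index = t.argmax().item()             # position in the flattened tensor
row, col = divmod(flat_index, t.shape[1])  # row-major: flat = row * n_cols + col

print(flat_index)  # 5
print(row, col)    # 1 2
print(t[row, col].item() == t.max().item())  # True
```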