# Tensors

## Two definitions

Tensors have both a (strict) mathematical definition and a (loose) machine learning definition.

I have heard it said that because ML is a child of many disciplines it takes words from all of them. Those disciplines often have different words for the same concepts so ML inherits that messiness.

### Physics definition

> A rank-n tensor in m-dimensions is a mathematical object that has n indices
> and $m^n$ components and obeys certain transformation rules.

Mathematically, scalars, vectors and matrices are not the same thing as tensors, but I think from an ML point of view they are basically other names for rank 0, 1 and 2 tensors.

- The thing that makes a tensor a tensor is how it transforms.
- a tensor is a number or collection of numbers which **maintains its meaning under transformations**
- Physics seems to mostly deal with 3D (xyz) and 4D (xyz + time) tensors

Velocity is a rank-1 tensor describing the motion of an object. The **meaning** of this tensor does not change even if you move the coordinate system around. The numbers in the tensor do change but the meaning does not.

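Aside: angular momentum is a pseudovector, not a true vector. What makes it a pseudovector is how it transforms: under a reflection of the coordinate system it picks up an extra sign flip compared with a true vector. Separately, its value depends on where you put the origin: it is natural when the origin is at the centre of the rotation but changes if you put the origin elsewhere.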

If a real vector is zero in one set of coordinates it must be 0 in all of them - no exceptions.
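As a rough illustration of "the numbers change but the meaning does not", here is a minimal sketch in plain Python. It assumes a 2D velocity and a rotation of the coordinate axes; the helper `rotate_axes` is a made-up name for this example only.

```python
import math

def rotate_axes(v, angle):
    """Components of the 2D vector v in axes rotated anticlockwise by `angle` radians."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * y, -s * x + c * y)

velocity = (3.0, 4.0)
rotated = rotate_axes(velocity, math.pi / 2)  # components become roughly (4.0, -3.0)

# the components differ, but the speed (the length of the vector) is the same
speed_before = math.hypot(*velocity)
speed_after = math.hypot(*rotated)
print(speed_before, speed_after)

# a zero vector stays zero whichever way the axes are rotated
zero_rotated = rotate_axes((0.0, 0.0), 1.234)
print(zero_rotated)
```

The speed is the part of the "meaning" that survives the change of coordinates, while the individual components depend on how the axes are oriented.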

#### rank

- Rank is the amount of info you need to find a single component within the tensor.
- A 4x4 matrix has rank 2 because you need two indices (a row and a column) to locate a component; its shape is `[4,4]`.
- Matrix notation is just a crutch to make you more comfy when learning about tensors.
- Matrices are a convenient way to display any rank-2 tensor.
- rank-0 => you need 0 indices to find a component
- rank-1 => you need 1 index to find a component
- Traditionally, Latin letters are used for indices in 2 and 3 dimensions (e.g. $T_{ij}$) and Greek letters for indices in 4 dimensions (e.g. $T_{\alpha\beta}$ or $T_{\alpha\beta\mu}$). This lets you guess the number of dimensions without seeing it stated explicitly.
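The idea that rank is "how many indices you need" can be sketched with nested Python lists. The helper `shape_of` is hypothetical and assumes the lists are non-empty and regularly nested.

```python
def shape_of(nested):
    """Shape of a regularly nested, non-empty list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return shape

scalar = 7
vector = [1, 2, 3, 4]
matrix = [[0] * 4 for _ in range(4)]

rank_of_scalar = len(shape_of(scalar))  # 0 indices needed
rank_of_vector = len(shape_of(vector))  # 1 index needed: vector[i]
rank_of_matrix = len(shape_of(matrix))  # 2 indices needed: matrix[row][col]

# a rank-n tensor in m dimensions has m**n components
components = 4 ** rank_of_matrix
print(shape_of(matrix), rank_of_matrix, components)
```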

### Machine learning

- A ML generalisation of vectors and matrices to any number of dimensions
- think of tensors as nested groups rather than trying to visualise as coordinates - it helps make sense of higher dimensions
- primary data structure used in neural networks
- Rank matters, but when you define a tensor you specify its dimensions (the length of each axis) and the rank follows from that
- tensor = math generalisation of these concepts:
  - comp-sci words
    - number (0 indexes required)
    - array (1 indexes required)
    - 2d-array (2 indexes required)
    - nd-array (n indexes required)
  - math words
    - scalar (0 indexes required)
    - vector (1 indexes required)
    - matrix (2 indexes required)
    - nd-tensor (n indexes required)

The n tells us how many indexes we need to locate a specific element.

```python
# 500x500 image with 3 channels (RGB)
(500,500,3) # rank 3 tensor (3 numbers required to find an element)

# a simple video with 10 frames, each frame being one of the above images
(10,500,500,3) # rank 4 tensor

# a collection of 34 such videos
(34,10,500,500,3) # rank 5 tensor
```

- Rank is the number of axes (aka dimensions) in the tensor; "axis" and "dimension" are two words for the same thing.
- Each axis has a length, which is the number of elements the tensor has along that axis.
- Axes make visual sense up to rank 3, but you have to give up visualising beyond that.
- The shape of a tensor lists the length of every axis and is expressed as an array, e.g. `[3, 3]` is the shape of a 3x3 matrix.
- Tensor **shape** is also called tensor **size** (pytorch calls it size and has a `torch.Size` type to hold it).

The rank is the length of the shape
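A small sketch in plain Python, reusing the image and video shapes from the example above, shows how the rank and the number of elements both follow from the shape. The dictionary keys are made-up labels.

```python
import math

shapes = {
    "image": (500, 500, 3),
    "video": (10, 500, 500, 3),
    "videos": (34, 10, 500, 500, 3),
}

for name, shape in shapes.items():
    rank = len(shape)              # the rank is the length of the shape
    n_elements = math.prod(shape)  # the product of all the axis lengths
    print(f"{name}: rank={rank}, elements={n_elements}")
```

Each extra leading axis wraps the previous tensor in another group, which is why the element count is multiplied by that axis's length.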

By convention, ML uses $X$ for the "design matrix" and assumes the structure

$$X \in \mathbb{R}^{n\times m}$$

where

* n = number of training examples
* m = number of features

In the notation for each element, the subscript is the feature index and the superscript is the training-example index.
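Written out, the layout looks roughly like this. Putting the superscript in parentheses is an assumption borrowed from common ML notation, used so that it is not mistaken for a power:

$$X = \begin{bmatrix} x_1^{(1)} & x_2^{(1)} & \cdots & x_m^{(1)} \\ x_1^{(2)} & x_2^{(2)} & \cdots & x_m^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{(n)} & x_2^{(n)} & \cdots & x_m^{(n)} \end{bmatrix}$$

Each row is one training example and each column is one feature, so $x_j^{(i)}$ is feature $j$ of training example $i$.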

In the example of N training images, each with 3 channels, we typically use the convention NCHW:

- N = number of training examples
- C = number of color channels
- H = image height
- W = image width

The CHW part describes a single training example (its features).
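A minimal sketch of the NCHW layout in PyTorch. The batch size and image size are made-up values for illustration.

```python
import torch

# hypothetical batch: N=8 images, C=3 channels, H=32, W=32
batch_nchw = torch.zeros(8, 3, 32, 32)

# flatten each example's C*H*W values into one row of features,
# which gives a design matrix with one row per training example
design_matrix = batch_nchw.reshape(batch_nchw.shape[0], -1)

# reorder the axes to channels-last (NHWC) for comparison
batch_nhwc = batch_nchw.permute(0, 2, 3, 1)

print(design_matrix.shape)  # torch.Size([8, 3072])
print(batch_nhwc.shape)     # torch.Size([8, 32, 32, 3])
```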

?? Why is color first not last?

## Tensors in pytorch

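- Most tensor operations (e.g. `reshape`) return a new tensor rather than changing the original, so you have to re-assign the result to keep it. Tensors themselves are mutable: in-place methods (named with a trailing underscore, e.g. `add_`) and index assignment change them directly.
- Presumably under the hood the library tries to be as memory efficient as possible (`reshape`, for example, returns a view that shares memory with the original when it can)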
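A small sketch of the difference between operations that return a new tensor and operations that modify a tensor in place:

```python
import torch

t = torch.tensor([1, 2, 3])

u = t + 1          # out-of-place: returns a new tensor, t is untouched
print(t.tolist())  # [1, 2, 3]
print(u.tolist())  # [2, 3, 4]

t.add_(10)         # in-place (trailing underscore): modifies t itself
t[0] = 0           # index assignment also modifies t in place
print(t.tolist())  # [0, 12, 13]
```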

?? can you mix and match tensors stored in CPU memory and GPU memory in a single model?

In [58]:
import torch

example_array = [[1,2,3], [4,5,6], [7,8,9]]
t = torch.tensor(example_array)
t

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

Tensors have a shape which is an array giving the length of each axis of the tensor

In [59]:
# get the shape of the tensor
t.shape

torch.Size([3, 3])

In [60]:
derived_rank = len(t.shape)
print(f"This tensor has rank {derived_rank} which we derived from the length of its shape")

# you can also use ndim to get this
len(t.shape) == t.ndim

This tensor has rank 2 which we derived from the length of its shape


True

In [61]:

t.reshape(1,9)

tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [62]:

t.reshape(1,9).shape

torch.Size([1, 9])

In [63]:
reshaped_t = t.reshape(1,9)
reshaped_t.reshape(3,3)

tensor([[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]])

In [64]:
# a trailing `.` (e.g. `1.`) creates a float rather than an int in python

t1 = torch.tensor([1,3,5])
t2 = torch.tensor([1.,3.,5.])

[t1.type(), t2.type()]

['torch.LongTensor', 'torch.FloatTensor']

In [65]:
t3 = torch.rand(3,4,5, dtype=torch.float32)
t3

tensor([[[2.0385e-02, 6.3443e-01, 1.9392e-01, 8.0916e-01, 1.5694e-01],
         [6.8071e-01, 6.7303e-01, 1.0751e-01, 3.5412e-02, 6.3112e-01],
         [3.9185e-01, 7.2436e-01, 6.6475e-01, 5.6696e-01, 9.3499e-01],
         [1.3914e-01, 7.4445e-01, 6.2362e-01, 5.0541e-03, 8.9574e-01]],

        [[6.9122e-02, 6.6526e-01, 4.5589e-02, 2.8011e-01, 3.7344e-01],
         [5.3301e-01, 5.4943e-01, 4.2847e-01, 7.7138e-01, 4.9886e-01],
         [9.6639e-01, 8.8726e-01, 8.9064e-01, 6.4565e-01, 5.6689e-01],
         [9.3325e-01, 5.6758e-01, 6.8728e-01, 7.7214e-01, 3.8831e-01]],

        [[5.2126e-01, 8.4500e-01, 1.1187e-01, 8.9798e-01, 5.4011e-01],
         [3.6881e-01, 2.7149e-01, 3.1362e-01, 3.0655e-01, 6.8758e-01],
         [2.1923e-01, 8.7920e-01, 3.0118e-04, 5.5278e-01, 9.7220e-02],
         [7.9041e-02, 2.4013e-01, 6.0609e-01, 1.9326e-01, 8.9706e-02]]])

In [66]:
# get num elements in a tensor
t3.numel()

60

### Reshape
We can reshape the tensor to any other shape that has the same number of elements.

You can get the number of elements by multiplying the lengths of all the dimensions, e.g. 3 * 4 * 5 = 60.

`reshape` accepts `-1` for at most one dimension, which tells it to infer that dimension's length from the number of elements and the other dimensions.

In [67]:

t3.reshape(6,10)
t3.reshape(60, 1)
t3.reshape(1, 60)
t3.reshape(5, 2, 6)
t3.reshape(3,4,5)
t3.reshape(4,3,-1) # -1 means "infer this dimension": 60 / (4 * 3) = 5

tensor([[[2.0385e-02, 6.3443e-01, 1.9392e-01, 8.0916e-01, 1.5694e-01],
         [6.8071e-01, 6.7303e-01, 1.0751e-01, 3.5412e-02, 6.3112e-01],
         [3.9185e-01, 7.2436e-01, 6.6475e-01, 5.6696e-01, 9.3499e-01]],

        [[1.3914e-01, 7.4445e-01, 6.2362e-01, 5.0541e-03, 8.9574e-01],
         [6.9122e-02, 6.6526e-01, 4.5589e-02, 2.8011e-01, 3.7344e-01],
         [5.3301e-01, 5.4943e-01, 4.2847e-01, 7.7138e-01, 4.9886e-01]],

        [[9.6639e-01, 8.8726e-01, 8.9064e-01, 6.4565e-01, 5.6689e-01],
         [9.3325e-01, 5.6758e-01, 6.8728e-01, 7.7214e-01, 3.8831e-01],
         [5.2126e-01, 8.4500e-01, 1.1187e-01, 8.9798e-01, 5.4011e-01]],

        [[3.6881e-01, 2.7149e-01, 3.1362e-01, 3.0655e-01, 6.8758e-01],
         [2.1923e-01, 8.7920e-01, 3.0118e-04, 5.5278e-01, 9.7220e-02],
         [7.9041e-02, 2.4013e-01, 6.0609e-01, 1.9326e-01, 8.9706e-02]]])
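A short sketch of how `-1` behaves, using a tensor built with `torch.arange` so the values are easy to follow. The variable names are only for illustration.

```python
import torch

t = torch.arange(60).reshape(3, 4, 5)

inferred = t.reshape(4, 3, -1)  # 60 / (4 * 3) = 5, so the last dimension is 5
flat = t.reshape(-1)            # a lone -1 gives a 1d tensor of all 60 elements

print(inferred.shape)  # torch.Size([4, 3, 5])
print(flat.shape)      # torch.Size([60])

# the element count must divide evenly, otherwise reshape raises an error
try:
    t.reshape(7, -1)            # 60 is not divisible by 7
    reshape_failed = False
except RuntimeError:
    reshape_failed = True
print(reshape_failed)  # True
```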

### Squeeze

Squeezing a tensor removes all dimensions that have a length of 1

In [68]:
t3.reshape(1,60).squeeze()

tensor([2.0385e-02, 6.3443e-01, 1.9392e-01, 8.0916e-01, 1.5694e-01, 6.8071e-01,
        6.7303e-01, 1.0751e-01, 3.5412e-02, 6.3112e-01, 3.9185e-01, 7.2436e-01,
        6.6475e-01, 5.6696e-01, 9.3499e-01, 1.3914e-01, 7.4445e-01, 6.2362e-01,
        5.0541e-03, 8.9574e-01, 6.9122e-02, 6.6526e-01, 4.5589e-02, 2.8011e-01,
        3.7344e-01, 5.3301e-01, 5.4943e-01, 4.2847e-01, 7.7138e-01, 4.9886e-01,
        9.6639e-01, 8.8726e-01, 8.9064e-01, 6.4565e-01, 5.6689e-01, 9.3325e-01,
        5.6758e-01, 6.8728e-01, 7.7214e-01, 3.8831e-01, 5.2126e-01, 8.4500e-01,
        1.1187e-01, 8.9798e-01, 5.4011e-01, 3.6881e-01, 2.7149e-01, 3.1362e-01,
        3.0655e-01, 6.8758e-01, 2.1923e-01, 8.7920e-01, 3.0118e-04, 5.5278e-01,
        9.7220e-02, 7.9041e-02, 2.4013e-01, 6.0609e-01, 1.9326e-01, 8.9706e-02])
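`squeeze` also accepts a `dim` argument to remove only one specific axis. A small sketch with a made-up shape:

```python
import torch

t = torch.zeros(1, 60, 1)

all_squeezed = t.squeeze()         # every length-1 axis is removed
first_squeezed = t.squeeze(dim=0)  # only axis 0 is removed
unchanged = t.squeeze(dim=1)       # axis 1 has length 60, so nothing changes

print(all_squeezed.shape)    # torch.Size([60])
print(first_squeezed.shape)  # torch.Size([60, 1])
print(unchanged.shape)       # torch.Size([1, 60, 1])
```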

### Unsqueeze

Unsqueezing a tensor adds a dimension with a length of 1. It takes a `dim` argument, which is the index at which the new dimension is inserted.

In [69]:
t3.reshape(1,60).unsqueeze(dim=0)

tensor([[[2.0385e-02, 6.3443e-01, 1.9392e-01, 8.0916e-01, 1.5694e-01,
          6.8071e-01, 6.7303e-01, 1.0751e-01, 3.5412e-02, 6.3112e-01,
          3.9185e-01, 7.2436e-01, 6.6475e-01, 5.6696e-01, 9.3499e-01,
          1.3914e-01, 7.4445e-01, 6.2362e-01, 5.0541e-03, 8.9574e-01,
          6.9122e-02, 6.6526e-01, 4.5589e-02, 2.8011e-01, 3.7344e-01,
          5.3301e-01, 5.4943e-01, 4.2847e-01, 7.7138e-01, 4.9886e-01,
          9.6639e-01, 8.8726e-01, 8.9064e-01, 6.4565e-01, 5.6689e-01,
          9.3325e-01, 5.6758e-01, 6.8728e-01, 7.7214e-01, 3.8831e-01,
          5.2126e-01, 8.4500e-01, 1.1187e-01, 8.9798e-01, 5.4011e-01,
          3.6881e-01, 2.7149e-01, 3.1362e-01, 3.0655e-01, 6.8758e-01,
          2.1923e-01, 8.7920e-01, 3.0118e-04, 5.5278e-01, 9.7220e-02,
          7.9041e-02, 2.4013e-01, 6.0609e-01, 1.9326e-01, 8.9706e-02]]])
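To see what `dim` does, here is a small sketch that inserts the new length-1 axis at different positions of a made-up 3x4 tensor:

```python
import torch

t = torch.zeros(3, 4)

front = t.unsqueeze(dim=0)    # new axis becomes axis 0
middle = t.unsqueeze(dim=1)   # new axis becomes axis 1
back = t.unsqueeze(dim=-1)    # negative values count from the end

print(front.shape)   # torch.Size([1, 3, 4])
print(middle.shape)  # torch.Size([3, 1, 4])
print(back.shape)    # torch.Size([3, 4, 1])
```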

### Flatten

Flatten the tensor to a 1d array that contains all the scalar components of the tensor.

Common in a CNN when you connect a convolutional layer to a fully connected layer

In [70]:
def flatten(t):
    t = t.reshape(1, -1)
    t = t.squeeze()
    return t
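PyTorch also has a built-in `flatten` method, sketched below on a made-up NCHW batch. Its `start_dim` argument keeps the leading axes, which is handy when you want one flat feature vector per example rather than one for the whole batch.

```python
import torch

batch = torch.zeros(8, 3, 32, 32)  # hypothetical NCHW batch

fully_flat = batch.flatten()             # everything in one 1d tensor
per_example = batch.flatten(start_dim=1) # keep axis 0, flatten the rest

print(fully_flat.shape)   # torch.Size([24576])
print(per_example.shape)  # torch.Size([8, 3072])

# edge case: reshape + squeeze collapses a single-element tensor to rank 0,
# while the built-in flatten always returns a 1d tensor
single = torch.tensor([[5]])
print(single.reshape(1, -1).squeeze().shape)  # torch.Size([])
print(single.flatten().shape)                 # torch.Size([1])
```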

### Create a tensor

In [79]:
import numpy as np

data = np.array([1,2,3])

# IMPORTANT:
# the class constructor always uses the default dtype (torch.float32 unless
# you change it), no matter the dtype of the data
# this one makes a copy of the argument data
made_by_class_constructor = torch.Tensor(data)

# the factory functions below have better docs and are more configurable than
# the constructor; they also infer the dtype from the data
# this one makes a copy of the argument data
made_by_tensor = torch.tensor(data)

# these factory functions share memory with `data`, so you can move between
# numpy arrays and pytorch tensors without copying, but be careful:
# a change to one is visible in the other

# as_tensor accepts data structures other than a numpy array, unlike from_numpy which is stricter
made_by_as_tensor = torch.as_tensor(data)
made_by_from_numpy = torch.from_numpy(data)

print(torch.get_default_dtype())
print(made_by_class_constructor.dtype)
print("")
print(made_by_tensor.dtype)
print(made_by_as_tensor.dtype)
print(made_by_from_numpy.dtype)

print("")

# tensor factory func infers dtype from the incoming data
assert torch.tensor(np.array([1.,2.,3.])).dtype == torch.float64, "Unexpected dtype"
print(torch.tensor(np.array([1.,2.,3.]), dtype=torch.int32).dtype)


# recommendation is to use torch.tensor() as a default
# use torch.as_tensor() if you explicitly want data sharing (it's more flexible than torch.from_numpy())


torch.float32
torch.float32

torch.int64
torch.int64
torch.int64

torch.int32
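The "be careful" about shared data can be made concrete with a small sketch: change the numpy array after creating the tensors and see which tensors notice. The variable names are made up for this example.

```python
import numpy as np
import torch

data = np.array([1, 2, 3])

copied = torch.tensor(data)        # copies the data
shared_a = torch.as_tensor(data)   # shares memory with the numpy array
shared_b = torch.from_numpy(data)  # shares memory with the numpy array

data[0] = 100                      # mutate the numpy array after the fact

print(copied.tolist())    # [1, 2, 3]
print(shared_a.tolist())  # [100, 2, 3]
print(shared_b.tolist())  # [100, 2, 3]
```

Sharing avoids a copy, which matters for large arrays, but it also means a later change to the array silently changes the tensor.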
