# 1 Intro to torch

## 1.1 Tensors

In [5]:
import torch
import numpy as np

Let's start with the basics. We create a dummy dataset with $n = 2$ observations, where every observation has $m = 3$ features. We organize it as a matrix with dimensions $(n, m)$, which is $(2, 3)$ here.

In [6]:
data = [
    [1, 2, 3],
    [10, 20, 30]
]

Our data is a nested Python list (`List[List[int]]`), while PyTorch works with the `torch.Tensor` datatype.

In [7]:
X = torch.tensor(data)
type(X)

torch.Tensor

We can retrieve the shape:

In [8]:
X.shape

torch.Size([2, 3])

And the type of the data inside the tensor:

In [9]:
X.dtype

torch.int64

Or the number of observations:

In [10]:
len(X)

2

We can also start with a `numpy.array`:

In [11]:
npdata = np.array(
    data,
    dtype = np.float32
)

Note that we changed the data type to `np.float32`, and the tensor keeps that type:

In [12]:
X2 = torch.from_numpy(npdata)
X2

tensor([[ 1.,  2.,  3.],
        [10., 20., 30.]])

In [13]:
X2.dtype

torch.float32
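
One detail worth knowing when going back and forth between the two libraries: `torch.from_numpy` shares memory with the NumPy array, while `torch.tensor` makes a copy. The sketch below assumes a CPU tensor; the names `shared` and `copied` are illustrative only.

```python
import numpy as np
import torch

npdata = np.array([[1, 2, 3], [10, 20, 30]], dtype=np.float32)

shared = torch.from_numpy(npdata)  # shares memory with npdata
copied = torch.tensor(npdata)      # independent copy of the data

npdata[0, 0] = 100.0  # modify the NumPy array in place

print(shared[0, 0].item())  # 100.0, the change is visible in the tensor
print(copied[0, 0].item())  # 1.0, the copy is unaffected
```

If you need a tensor that is safe from later changes to the array, the copying variant is usually the better choice.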

## 1.2 Useful functions for creating tensors

We can easily create a stand-in tensor with the same shape and data type as our data:

In [14]:
ones = torch.ones_like(X2)
ones

tensor([[1., 1., 1.],
        [1., 1., 1.]])

Or random weights. `torch.rand` draws uniformly distributed numbers from the interval $[0, 1)$:

In [15]:
X3 = torch.rand(2,3)
X3

tensor([[0.7940, 0.6244, 0.0636],
        [0.4848, 0.1870, 0.7221]])

If we want normally distributed numbers, we need to specify mean and standard deviation:

In [16]:
X4 = torch.normal(mean=0.0, std=0.1, size=(2,3))
X4

tensor([[ 0.2193,  0.1203,  0.1384],
        [-0.0659, -0.0041,  0.0322]])
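
Because these numbers are random, your output will differ from the values shown here. As a small sketch (the seed value 42 and the sample size are arbitrary choices, not part of the lesson), you can fix the seed to make the draws reproducible, and check that a large sample roughly matches the requested mean and standard deviation:

```python
import torch

torch.manual_seed(42)  # arbitrary seed, chosen for illustration
first = torch.rand(2, 3)

torch.manual_seed(42)  # resetting the seed repeats the same draws
second = torch.rand(2, 3)

print(torch.equal(first, second))  # True

# With many samples, the empirical statistics approach the requested ones.
samples = torch.normal(mean=0.0, std=0.1, size=(100_000,))
print(samples.mean().item(), samples.std().item())
```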

If your laptop or server has a GPU, PyTorch can run the calculations on the GPU. You can check if the GPU can be found by PyTorch with:

In [17]:
torch.cuda.is_available()

False

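You can move a tensor to the GPU with `.to()`. By default, tensors are created on the `"cpu"` device. Note that `.to()` returns a tensor on the target device and does not move `X3` itself: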

In [18]:
if torch.cuda.is_available():
    tensor = X3.to("cuda")
else:
    print("cuda not found")
X3.device

cuda not found


device(type='cpu')

On a MacBook with an `mps` backend, mps acceleration is available:

In [19]:
if torch.backends.mps.is_available() and torch.backends.mps.is_built():
    device = torch.device("mps")
else:
    device = "cpu"
print(f"Using device {device}")
tensor = X3.to(device)
tensor

Using device cpu


tensor([[0.7940, 0.6244, 0.0636],
        [0.4848, 0.1870, 0.7221]])

Please note that acceleration with cuda or mps is not always faster!
Reasons why it can be slower are:
- Memory transfer: data needs to be transferred from CPU to GPU. This can be a bottleneck.
- Parallel processing limits: some architectures (especially the RNNs we will learn about in lesson 3) can't be fully parallelized, because every time step depends on the previous one.
- Synchronisation overhead: running things in parallel also takes some overhead to synchronise the calculations, like waiting for things to finish, merging results back together, etc.

This is especially true for the simpler models and datasets we use in these lessons.

Another useful trick is to create a tensor of ones. Can you figure out for yourself how to create a tensor of zeros?

In [20]:
ones = torch.ones(1, 10)
ones

tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])

Tensors can be concatenated. We need to specify the dimension along which the concatenation is done:

In [21]:
torch.cat([ones, ones, ones], dim=0)

tensor([[1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]])
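
To see what the `dim` argument does, here is a minimal sketch comparing concatenation along both dimensions. It also shows `torch.stack`, which is not used elsewhere in this lesson: instead of growing an existing dimension, it adds a new one.

```python
import torch

ones = torch.ones(1, 10)

rows = torch.cat([ones, ones, ones], dim=0)       # grow the first dimension
cols = torch.cat([ones, ones, ones], dim=1)       # grow the second dimension
stacked = torch.stack([ones, ones, ones], dim=0)  # add a new leading dimension

print(rows.shape)     # torch.Size([3, 10])
print(cols.shape)     # torch.Size([1, 30])
print(stacked.shape)  # torch.Size([3, 1, 10])
```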

## 1.3 Manipulation of tensors

The basis of most machine learning functions is the linear function. We can easily scale this by using matrix multiplication. Let's say we start with some random data, 32 observations with 10 features.

In [22]:
X = torch.rand(32, 10)

Now, if we want a linear map that transforms these 10 features into 2 dimensions, we can do that with a set of weights with dimensions $(10,2)$

In [23]:
W = torch.rand(10, 2)

In [24]:
yhat = X @ W
yhat.shape

torch.Size([32, 2])

This syntax is equivalent:

In [25]:
yhat = torch.matmul(X, W)
yhat.shape

torch.Size([32, 2])
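
To make the matrix multiplication concrete: entry $(i, j)$ of the result is the dot product of row $i$ of `X` with column $j$ of `W`. The sketch below checks this for one entry; the indices and the seed are arbitrary.

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
X = torch.rand(32, 10)
W = torch.rand(10, 2)
yhat = X @ W

# Entry (i, j) is the dot product of row i of X with column j of W.
i, j = 3, 1
manual = (X[i, :] * W[:, j]).sum()

print(torch.allclose(yhat[i, j], manual, atol=1e-6))
```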

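Matrix multiplication also works with more dimensions: the leading dimensions are treated as batch dimensions, and `W` is applied to every $(10, 16)$ slice: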

In [26]:
X = torch.rand(32, 10, 16)
W = torch.rand(16, 2)
yhat = X @ W
yhat.shape

torch.Size([32, 10, 2])
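
As a sketch of what happens under the hood (conceptually, not how PyTorch actually implements it), the batched product gives the same result as multiplying every slice along the first dimension with `W` separately and stacking the results:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
X = torch.rand(32, 10, 16)
W = torch.rand(16, 2)
yhat = X @ W

# Multiply each (10, 16) slice with W separately and stack the results.
looped = torch.stack([X[b] @ W for b in range(X.shape[0])])

print(looped.shape)  # torch.Size([32, 10, 2])
print(torch.allclose(yhat, looped, atol=1e-6))
```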

And finally, we can aggregate the tensor along the two features by taking the mean over the last dimension.

In [27]:
aggregate = yhat.mean(dim=-1)
aggregate.shape

torch.Size([32, 10])

Try to calculate the sum for yourself.

## 1.4 GPU or CPU

Tensors live on the CPU or on the GPU:

In [28]:
X.device

device(type='cpu')

You can check if you have a GPU available:

In [29]:
torch.cuda.is_available()

False

Or a Mac with an `mps` backend (for example an M1):

In [30]:
torch.backends.mps.is_available()

False

And move a tensor to the GPU, if available, which can speed up computation:

In [31]:
if torch.cuda.is_available():
    X_ = X.to("cuda")
elif torch.backends.mps.is_available():
    X_ = X.to("mps")
else:
    X_ = X.to("cpu")
X_.device

device(type='cpu')
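
The device check above can be wrapped in a small helper so the rest of your code does not need to care which backend is present. This is a minimal sketch; `pick_device` is a hypothetical name, not a PyTorch function.

```python
import torch


def pick_device() -> torch.device:
    """Return CUDA if available, then MPS, otherwise the CPU (hypothetical helper)."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    # torch.backends.mps only exists in newer PyTorch versions, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
X = torch.rand(32, 10, device=device)  # create the tensor directly on the device
print(X.device)
```

Creating the tensor directly on the target device avoids the extra transfer that `.to()` would need.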

## 1.5 Reshape or View

Often, you will need to reshape a tensor:

In [32]:
X = torch.rand(32, 28, 28, 1)
X_view = X.view(32, 28*28)
X_reshape = X.reshape(32, 28*28)
X.shape, X_view.shape, X_reshape.shape

(torch.Size([32, 28, 28, 1]), torch.Size([32, 784]), torch.Size([32, 784]))

The difference between `view` and `reshape` is this: `view` always returns a view on the original tensor, so if the underlying data changes, the view changes too. `reshape` returns a view when that is possible, and falls back to a copy when it is not.

No data movement occurs when creating a view; the view tensor just changes the way it interprets the same data.

In [33]:
X = torch.tensor([0.0, 0.0])
X_view = X.view(1, 2)
# Both tensors point to the same underlying storage (needs PyTorch >= 2.0).
X.untyped_storage().data_ptr() == X_view.untyped_storage().data_ptr()


True

In [34]:
X[0] = 1
X_view

tensor([[1., 0.]])

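`view` raises an error if the requested shape is not compatible with the way the tensor is laid out in memory. This typically happens when the tensor is not contiguous, for example after a transpose or a `permute`.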

> A tensor whose values are laid out in the storage starting from the rightmost dimension onward (that is, moving along rows for a 2D tensor) is defined as contiguous. Contiguous tensors are convenient because we can visit them efficiently in order without jumping around in the storage (improving data locality improves performance because of the way memory access works on modern CPUs). This advantage of course depends on the way algorithms visit.

You could call `.contiguous()` on the tensor before calling `view`, but `.reshape()` does that behind the scenes when needed.
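
A minimal sketch of this failure mode, using a transposed tensor as the non-contiguous example (the variable names are illustrative):

```python
import torch

X = torch.arange(6).reshape(2, 3)
Xt = X.t()  # transpose: same storage, different strides

print(Xt.is_contiguous())  # False

try:
    Xt.view(6)  # not possible without copying the data
    view_failed = False
except RuntimeError:
    view_failed = True
print(view_failed)  # True

flat_reshape = Xt.reshape(6)         # copies the data when a view is impossible
flat_view = Xt.contiguous().view(6)  # explicit copy first, then a view
print(flat_reshape)  # tensor([0, 3, 1, 4, 2, 5])
```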

## 1.6 Permute

Sometimes you might want to reorder the dimensions of a tensor.

For example, let's say we load a batch of 32 images, where every image has a size of 28x28 pixels and 3 channels (RGB color):

In [35]:
X = torch.rand(32, 28, 28, 3)

There are different conventions for the order of dimensions in image recognition models. Some models use a channel-last convention, like I used above, but some (like [PyTorch](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)) use a channel-first convention, which would be (batch, channel, height, width).

You would want to move the 4th dimension to the 2nd position or, counting from zero, the dimension at index 3 to index 1:

In [36]:
channel_first = X.permute(0, 3, 1, 2)
channel_first.shape

torch.Size([32, 3, 28, 28])
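
A common pitfall is to use `reshape` for this, since it produces the same shape. The sketch below (indices chosen arbitrarily) shows that `permute` keeps every pixel attached to its channel, while `reshape` only reinterprets the flat data and scrambles the image:

```python
import torch

torch.manual_seed(0)  # arbitrary seed, for reproducibility only
X = torch.rand(32, 28, 28, 3)  # (batch, height, width, channel)

permuted = X.permute(0, 3, 1, 2)     # (batch, channel, height, width)
reshaped = X.reshape(32, 3, 28, 28)  # same shape, but the values are scrambled

b, h, w, c = 0, 5, 7, 2
print(permuted[b, c, h, w] == X[b, h, w, c])  # tensor(True)
print(torch.equal(permuted, reshaped))        # False
print(permuted.is_contiguous())               # False
```

The last line ties back to the previous section: a permuted tensor is a non-contiguous view, so calling `view` on it can fail.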

## 1.7 Broadcasting

Broadcasting is something you might know from `numpy`, but it is also used by `tensorflow`, `jax` and `torch`. 

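Broadcasting lets you extend a dimension without doing so explicitly. The rules are simple. Shapes are compared dimension by dimension, starting from the last one, and two dimensions are compatible if:

- the two dimensions are equal, or
- one of the dimensions is 1

If one tensor has fewer dimensions than the other, its shape is padded with ones on the left. But let's show an example: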

In [37]:
a = torch.ones(2, 2)
b = torch.ones(2, 2)
a, b, a+b

(tensor([[1., 1.],
         [1., 1.]]),
 tensor([[1., 1.],
         [1., 1.]]),
 tensor([[2., 2.],
         [2., 2.]]))

This is straightforward. But what would happen in this case:

In [38]:
a = torch.ones(1, 2)
b = torch.ones(2, 2)

`b` is a 2x2 grid with four numbers. If we want to add `a`, we have only two numbers! Now, you could start stacking the `a` tensor to get matching dimensions. But you don't have to!

In [39]:
a, b, a + b

(tensor([[1., 1.]]),
 tensor([[1., 1.],
         [1., 1.]]),
 tensor([[2., 2.],
         [2., 2.]]))

See what happened here? 

`a` is magically broadcast over the first dimension. And what would you guess happens in this case:

In [40]:
a = torch.ones(1, 5, 1, 4)
b = torch.ones(3, 1, 3, 1)

First, predict the output shape, then check it for yourself.

In [41]:
(a + b).shape

torch.Size([3, 5, 3, 4])

<font color='teal'>
The output shape of `a + b` is $(3, 5, 3, 4)$: in every position where the sizes differ, one of them is 1, so that dimension is broadcast to match the other.
</font>

And what do you think happens here: does this raise an error, or does it broadcast?

In [42]:
a = torch.ones(5, 1, 4)
b = torch.ones(3, 1, 3, 1)

In [43]:
(a + b).shape

torch.Size([3, 5, 3, 4])

<font color='teal'>
The same goes for this example. `a` has only three dimensions, so its shape is first padded on the left to $(1, 5, 1, 4)$; after that, every mismatched dimension is broadcast to match the other one.
</font>
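
If you want to check a prediction without creating any tensors, `torch.broadcast_shapes` computes the resulting shape directly. This is a small sketch that assumes PyTorch 1.8 or newer; the last part shows what happens when the rules are violated.

```python
import torch

print(torch.broadcast_shapes((1, 5, 1, 4), (3, 1, 3, 1)))  # torch.Size([3, 5, 3, 4])
print(torch.broadcast_shapes((5, 1, 4), (3, 1, 3, 1)))     # torch.Size([3, 5, 3, 4])

# Sizes 3 and 4 in the last dimension are neither equal nor 1, so this fails.
try:
    torch.ones(2, 3) + torch.ones(2, 4)
    compatible = True
except RuntimeError:
    compatible = False
print(compatible)  # False
```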