#### Login
```
ssh -i ~/.ssh/id_ARCHER2_rsa liyiyan@login.cirrus.ac.uk
cd /work/m23ss/m23ss/liyiyan
```


#### Running Jobs
- `sinfo`
- `sbatch jobscript.slurm`: submit a job script containing `srun` commands
- `squeue`
- `scancel 12345`
```
srun --nodes=1 --time=00:05:00 --partition=standard --qos=short <path to file>
```

#### Interactive Jobs
```
# partitions = standard or gpu --gres=gpu:X
# qos = short (< 20 mins) or standard
# --exclusive
srun --nodes=1 --time=00:20:00 --partition=gpu --gres=gpu:1 --qos=short --account=m23ss --pty /usr/bin/bash --login
```


#### conda env
```
# Create env (CONDA_ROOT must already be set, as in "Activate env" below)
conda config --prepend envs_dirs ${CONDA_ROOT}/envs
conda config --prepend pkgs_dirs ${CONDA_ROOT}/pkgs
conda activate myenv

# Activate env
CONDA_ROOT=/work/m23ss/m23ss/liyiyan/condaenvs2
    # CONDA_ROOT=/work/m23ss/m23ss/liyiyan/plankton/conda_envs
export CONDARC=${CONDA_ROOT}/.condarc
eval "$(conda shell.bash hook)"

conda activate myvenv
```

#### CPRNet
```
# cv2
conda install -c conda-forge opencv
# tqdm
conda install -c conda-forge tqdm
```


#### Run JupyterLab on Cirrus
https://cirrus.readthedocs.io/en/main/user-guide/python.html#using-jupyterlab-on-cirrus
```
module load anaconda/python3

export HOME=/work/m23ss/m23ss/liyiyan
export JUPYTER_RUNTIME_DIR=$(pwd)

# Start Jupyter server.
jupyter notebook --ip=0.0.0.0 --no-browser
# --port=5000-65535

# Start a new terminal.
ssh <username>@login.cirrus.ac.uk -L<port_number>:<node_id>:<port_number>
ssh -i ~/.ssh/id_ARCHER2_rsa liyiyan@login.cirrus.ac.uk -L
```


# PyTorch
```
import torch
```

- `torch.Tensor` - A multi-dimensional array with support for autograd operations like backward(). Also holds the gradient w.r.t. the tensor.

- `nn.Module` - Neural network module. Convenient way of encapsulating parameters, with helpers for moving them to GPU, exporting, loading, etc.

- `nn.Parameter` - A kind of Tensor, that is automatically registered as a parameter when assigned as an attribute to a Module.

- `autograd.Function` - Implements forward and backward definitions of an autograd operation. Every Tensor operation creates at least a single Function node that connects to functions that created a Tensor and encodes its history.
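
The sketch below illustrates two of the building blocks listed above: automatic registration of an `nn.Parameter`, and a custom `autograd.Function`. The `Scale` module and the `Square` function are hypothetical names made up for this example and are not part of PyTorch.

```python
import torch
import torch.nn as nn


class Scale(nn.Module):
    """Hypothetical module: multiplies its input by a learnable scalar."""

    def __init__(self):
        super().__init__()
        # Assigning an nn.Parameter as an attribute registers it automatically.
        self.weight = nn.Parameter(torch.tensor(2.0))
        # A plain tensor attribute is NOT registered as a parameter.
        self.offset = torch.tensor(1.0)

    def forward(self, x):
        return self.weight * x + self.offset


class Square(torch.autograd.Function):
    """Hypothetical custom operation y = x^2 with a hand-written backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)  # keep x for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        return 2 * x * grad_output  # chain rule: dy/dx = 2x


module = Scale()
param_names = [name for name, _ in module.named_parameters()]
print(param_names)  # ['weight']

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()
print(x.grad)  # tensor(6.)
```

Only `weight` appears in `named_parameters()`, which is why an optimiser built from `module.parameters()` would never update `offset`.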


## Tensors
- Generalisation of vectors and matrices (multidimensional array).
- A tensor ${\mathcal A} \in {\mathbb F}^{I_{1} \times \cdots \times I_{C}}$, where each $I_{i}$ is a positive integer and $C$ is the number of dimensions (modes) of the tensor.

#### Initialisation
```
import numpy as np

# 1. Directly from data
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)

# 2. From a NumPy array
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

# 3. From another tensor
x_ones = torch.ones_like(x_data) # retains the properties of x_data (shape, datatype)

x_rand = torch.rand_like(x_data, dtype=torch.float) # overrides the datatype of x_data

```
#### Attributes
`shape`, `dtype`, `device` (device on which they are stored)
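
A minimal sketch of reading these attributes, assuming PyTorch's default floating-point dtype has not been changed. The move to the GPU is guarded so the snippet also runs on a CPU-only node.

```python
import torch

t = torch.rand(3, 4)

print(t.shape)   # torch.Size([3, 4])
print(t.dtype)   # torch.float32
print(t.device)  # cpu

# Move the tensor to the GPU only if one is available.
if torch.cuda.is_available():
    t = t.to("cuda")
```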

#### Operations
- Element-wise product: `tensor.mul(tensor)` or `tensor * tensor`
- Matrix multiplication: `tensor.matmul(tensor.T)` or `tensor @ tensor.T`

- In-place operations: operations with a `_` suffix, which modify the tensor they are called on
</br>`x.copy_(y)`
</br>`x.t_()`
</br>`tensor.add_(5)`

- Joining: `torch.cat`, `torch.stack`
- Tensor to NumPy array: `n = t.numpy()` (for a CPU tensor the two share memory, so a change in the tensor is reflected in the NumPy array and vice versa)
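
The operations above can be tried on a small tensor. This is only an illustrative sketch; the values are arbitrary and the tensor is assumed to live on the CPU so that `numpy()` can share its memory.

```python
import torch

t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

elementwise = t * t   # same as t.mul(t)       -> [[1, 4], [9, 16]]
matmul = t @ t.T      # same as t.matmul(t.T)  -> [[5, 11], [11, 25]]

joined = torch.cat([t, t], dim=0)     # concatenates along an existing dim: shape (4, 2)
stacked = torch.stack([t, t], dim=0)  # adds a new leading dim: shape (2, 2, 2)

n = t.numpy()  # NumPy view onto the same memory
t.add_(5)      # in-place: t becomes [[6, 7], [8, 9]], and so does n
print(n)
```

Note the difference between the two joining functions: `torch.cat` keeps the number of dimensions, while `torch.stack` adds one.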


## `torch.autograd`
PyTorch's automatic differentiation engine.

#### Training a NN:
1. Forward Propagation

   In forward prop, the NN makes its best guess about the correct output.
   
   It runs the input data through each of its functions to make this guess.

2. Backward Propagation
   
   In back prop, the NN adjusts its parameters proportionate to the error in its guess.
   
   It does this by traversing backwards from the output, collecting the gradients of the error with respect to the parameters of the functions, and optimising the parameters using gradient descent.
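
As a minimal sketch of these two phases, the toy problem below fits a single weight `w` so that `w * x` matches `y`. The data point, learning rate and iteration count are arbitrary choices for illustration.

```python
import torch

# Toy problem: one data point (x=2, y=6), so the ideal weight is 3.
x, y = torch.tensor(2.0), torch.tensor(6.0)
w = torch.tensor(0.0, requires_grad=True)
learning_rate = 0.1

for _ in range(50):
    prediction = w * x             # forward propagation: make a guess
    error = (prediction - y) ** 2  # squared error of the guess
    error.backward()               # backward propagation: fills w.grad
    with torch.no_grad():
        w -= learning_rate * w.grad  # gradient descent step
    w.grad.zero_()                 # reset the gradient for the next iteration

print(w.item())  # close to 3.0
```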

#### E.g. Implementation

```
import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
data = torch.rand(1, 3, 64, 64)
labels = torch.rand(1, 1000)
```

1. **Forward pass:**</br>
   Run the input data through each of the model's layers to make a prediction.
```
prediction = model(data)  # forward pass
```

2. **Backward pass:**</br>
Calculate the error (loss).</br>
Backpropagate the error through the network by calling `.backward()` on the loss tensor.</br>
Autograd then calculates and stores the gradient for each model parameter in the parameter's `.grad` attribute.
```
loss = (prediction - labels).sum()
loss.backward()  # backward pass
```

3. Load an **optimiser** (here SGD with a learning rate of 0.01 and a momentum of 0.9). Register all the parameters of the model in the optimiser.
```
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
```

4. **Gradient descent:**</br>
   Call `.step()` to initiate gradient descent.</br>
   The optimiser updates each parameter using the gradient stored in its `.grad` attribute.
```
optim.step()  # gradient descent
```
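
To see what `.step()` does to a parameter, the sketch below compares the weights before and after one step. It makes two simplifying assumptions that differ from the example above: a small `nn.Linear` layer stands in for ResNet-18, and momentum is left out, so the update reduces to `weight - lr * gradient`.

```python
import torch

model = torch.nn.Linear(3, 1)  # stand-in for a larger model
optim = torch.optim.SGD(model.parameters(), lr=1e-2)

data = torch.rand(4, 3)
labels = torch.rand(4, 1)

optim.zero_grad()  # clear any stale gradients
loss = (model(data) - labels).pow(2).mean()
loss.backward()    # fills .grad for weight and bias

before = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()
optim.step()       # gradient descent
after = model.weight.detach().clone()

# Plain SGD: new weight = old weight - lr * gradient
print(torch.allclose(after, before - 1e-2 * grad))  # True
```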


## Neural Networks

## `torch.nn`

A typical training procedure for a neural network:

- Define the neural network that has some learnable parameters (or weights)

- Iterate over a dataset of inputs

- Process input through the network

- Compute the loss (how far is the output from being correct)

- Propagate gradients back into the network’s parameters

- Update the weights of the network, typically using a simple update rule: `weight = weight - learning_rate * gradient`
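
The procedure above can be sketched end to end on a synthetic regression task. The dataset (`y = 2x + 1`), the single linear layer, the batch size and the number of epochs are all assumptions chosen to keep the example small.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic dataset: targets follow y = 2x + 1 exactly.
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)
targets = 2 * inputs + 1
loader = DataLoader(TensorDataset(inputs, targets), batch_size=16, shuffle=True)

net = nn.Linear(1, 1)      # network with learnable weight and bias
criterion = nn.MSELoss()
learning_rate = 0.1

for epoch in range(200):
    for batch_inputs, batch_targets in loader:   # iterate over the dataset
        output = net(batch_inputs)               # process input through the network
        loss = criterion(output, batch_targets)  # compute the loss
        net.zero_grad()
        loss.backward()                          # propagate gradients back
        with torch.no_grad():
            for param in net.parameters():
                # weight = weight - learning_rate * gradient
                param -= learning_rate * param.grad

print(net.weight.item(), net.bias.item())  # close to 2 and 1
```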

</br>

#### E.g. Implementation

1. Define the network:</br>
   (Only `forward` needs to be defined; the `backward` function is derived automatically by `autograd`.)
```
import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 5x5 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 5*5 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square, you can specify with a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = torch.flatten(x, 1) # flatten all dimensions except the batch dimension
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
```
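
The `16 * 5 * 5` input size of `fc1` follows from how each layer shrinks a 32x32 input. The sketch below traces the shapes using the same layer settings as `Net`; the standalone `conv1` and `conv2` layers are created here only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)

x = torch.randn(1, 1, 32, 32)
shapes = [tuple(x.shape)]

x = F.max_pool2d(F.relu(conv1(x)), 2)  # 5x5 conv: 32 -> 28, 2x2 pool: 28 -> 14
shapes.append(tuple(x.shape))

x = F.max_pool2d(F.relu(conv2(x)), 2)  # 5x5 conv: 14 -> 10, 2x2 pool: 10 -> 5
shapes.append(tuple(x.shape))

x = torch.flatten(x, 1)                # 16 channels * 5 * 5 = 400 features
shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
# (1, 1, 32, 32)
# (1, 6, 14, 14)
# (1, 16, 5, 5)
# (1, 400)
```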

2. Process inputs and call `backward`:
```
net = Net()

input = torch.randn(1, 1, 32, 32)  # random 32x32 input
out = net(input)

net.zero_grad()  # zero the gradient buffers of all parameters
out.backward(torch.randn(1, 10))  # backprop with random gradients

params = list(net.parameters())  # learnable parameters
```
- `torch.nn` only supports mini-batches: the whole package expects inputs that are a mini-batch of samples, not a single sample.
- If you have a single sample, just use `input.unsqueeze(0)` to add a fake batch dimension.
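
A short sketch of adding the batch dimension, assuming a single-channel 32x32 image and a standalone convolution layer created just for this example:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 5)

sample = torch.randn(1, 32, 32)  # one image: (channels, height, width)
batch = sample.unsqueeze(0)      # fake batch of one: (1, 1, 32, 32)
out = conv(batch)

print(batch.shape)  # torch.Size([1, 1, 32, 32])
print(out.shape)    # torch.Size([1, 6, 28, 28])
```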

3. Compute the loss:</br>
    A *loss function* takes (output, target) and computes a value that estimates how far away the output is from the target.</br>
    There are several different loss functions under the nn package. A simple loss is `nn.MSELoss` which computes the mean-squared error between the output and the target.
```
output = net(input)  # fresh forward pass; the graph behind `out` was freed by backward()
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make the target the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
```
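
To make the definition concrete, the sketch below checks `nn.MSELoss` against a hand-computed mean of squared differences. The output and target values are arbitrary.

```python
import torch
import torch.nn as nn

output = torch.tensor([[1.0, 2.0, 3.0]])
target = torch.tensor([[1.0, 0.0, 5.0]])

loss = nn.MSELoss()(output, target)

# Differences are (0, 2, -2), squares are (0, 4, 4), mean is 8/3.
manual = ((output - target) ** 2).mean()

print(torch.allclose(loss, manual))  # True
```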

4. Backpropagate the error:</br>
   Clear the existing gradients and then call `loss.backward()`; otherwise the new gradients are accumulated onto the existing ones.
```
net.zero_grad()  # zero the gradient buffers of all parameters

loss.backward()
```
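
The reason for zeroing the gradients first is that `backward()` adds to `.grad` rather than overwriting it. A minimal sketch with a single scalar parameter (an assumption made for brevity):

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
first = w.grad.item()        # 3.0

(3 * w).backward()           # no zeroing in between: gradients add up
accumulated = w.grad.item()  # 6.0

w.grad.zero_()               # what zero_grad() does for every parameter
(3 * w).backward()
after_reset = w.grad.item()  # 3.0

print(first, accumulated, after_reset)
```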

5. Update the weights:</br>
   The simplest update rule used in practice is stochastic gradient descent (SGD): `weight = weight - learning_rate * gradient`.</br>
   `torch.optim` implements various update rules such as SGD, Nesterov-SGD, Adam, RMSProp, etc.
```
# SGD
learning_rate = 0.01
with torch.no_grad():  # do not track the update itself in autograd
    for f in net.parameters():
        f.sub_(f.grad * learning_rate)


# Alternatively, use torch.optim.
import torch.optim as optim

# Create an optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# In the training loop.
optimizer.zero_grad()  # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()  # update
```
