# PyTorch

- PyTorch uses tensors as a basic datatype. Tensors are generalizations of matrices to an arbitrary number of dimensions (note that in the context of tensors, a dimension is often called an axis).
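- Imagine a color image. It has three axes: height (Y), width (X), and color channel. A single colored pixel is just the vector of channel values at one (X, Y) position.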
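
To make the idea of axes concrete, here is a minimal sketch that builds tensors with zero to three axes and prints how many axes each one has. The 4 x 6 image size and the variable names are assumptions chosen only for illustration.

```python
import torch

scalar = torch.tensor(3.0)          # 0 axes: a single number
vector = torch.tensor([1.0, 2.0])   # 1 axis
matrix = torch.zeros(2, 3)          # 2 axes: rows and columns
image = torch.zeros(4, 6, 3)        # 3 axes: height, width, color channel (hypothetical 4 x 6 RGB image)

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("image", image)]:
    # ndim is the number of axes; shape lists the length of each axis
    print(name, t.ndim, tuple(t.shape))
```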

# What is CUDA?
CUDA stands for Compute Unified Device Architecture. It is a parallel computing platform and application programming interface (API) created by Nvidia. It lets developers use a CUDA-enabled graphics processing unit (GPU) for general-purpose processing, an approach termed GPGPU (General-Purpose computing on Graphics Processing Units). The CUDA platform is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements for the execution of compute kernels.
PyTorch uses CUDA as its backend for running tensor operations on Nvidia GPUs; without a CUDA-capable GPU, the same operations run on the CPU.
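
Because not every machine has a CUDA-capable GPU, a common pattern is to pick the device at runtime and fall back to the CPU. The following is a minimal sketch of that pattern; the variable names are illustrative.

```python
import torch

# Use the GPU when CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device
x = torch.ones(2, 2, device=device)
print(x.device)
```

Code written this way runs unchanged on machines with and without a GPU.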

In [2]:
import torch

In [4]:
torch.rand(5, 3)

tensor([[0.7983, 0.0416, 0.6330],
        [0.9983, 0.3587, 0.4066],
        [0.1612, 0.3614, 0.8665],
        [0.2395, 0.2811, 0.1859],
        [0.6195, 0.9561, 0.3588]])

In [12]:
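# Create a tensor of shape (3, 2, 2): three 2 x 2 matrices
# Note: torch.Tensor(...) produces the default float dtype (float32); torch.tensor(...) would infer the dtype from the data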
tensor = torch.Tensor([[[3,2],[1,5]], [[7,9], [2,1]], [[3,4], [5,6]]])
tensor 

tensor([[[3., 2.],
         [1., 5.]],

        [[7., 9.],
         [2., 1.]],

        [[3., 4.],
         [5., 6.]]])

In [13]:
# View the device of the tensor
tensor.device

device(type='cpu')

In [14]:
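# Copy the tensor from CPU to GPU (requires a CUDA-capable GPU)
# .to() returns a new tensor; `tensor` itself stays on the CPU unless the result is assigned back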
tensor.to('cuda')

tensor([[[3., 2.],
         [1., 5.]],

        [[7., 9.],
         [2., 1.]],

        [[3., 4.],
         [5., 6.]]], device='cuda:0')
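
The `.to()` method does not modify a tensor in place; it returns a converted tensor and leaves the original untouched. The sketch below shows this with a dtype conversion, which behaves the same way as a device move but runs on any machine. The variable names are illustrative.

```python
import torch

t = torch.ones(2, 2)              # float32 by default
converted = t.to(torch.float64)   # returns a new tensor

print(t.dtype)          # the original is unchanged
print(converted.dtype)  # the returned tensor has the new dtype
```

For the same reason, moving a tensor to the GPU is usually written as `tensor = tensor.to('cuda')`, so that the name refers to the GPU copy afterwards.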

In [15]:
# Check the shape of the tensor
tensor.shape

torch.Size([3, 2, 2])

_It tells us that we have 3 elements in the first dimension, 2 elements in the second dimension, and 2 elements in the third dimension. The total number of elements is 3 x 2 x 2 = 12._
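
The element count can be checked directly. This short sketch uses a zero-filled tensor of the same shape and compares `numel()` with the product of the axis lengths.

```python
import math
import torch

t = torch.zeros(3, 2, 2)

print(t.numel())           # total number of elements
print(math.prod(t.shape))  # product of the axis lengths: 3 * 2 * 2
```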

# Element Retrieval

In [18]:
print(f"The first matrix \n {tensor[0]}") # Gives the first matrix
print(f"The first row of the first matrix \n {tensor[0][0]}") # Gives the first row of the first matrix

The first matrix 
 tensor([[3., 2.],
        [1., 5.]])
The first row of the first matrix 
 tensor([3., 2.])
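
Indices can also be combined in a single pair of brackets, and `:` selects everything along an axis. The sketch below rebuilds the same tensor with `torch.tensor` so that it is self-contained; the variable names are illustrative.

```python
import torch

tensor = torch.tensor([[[3., 2.], [1., 5.]],
                       [[7., 9.], [2., 1.]],
                       [[3., 4.], [5., 6.]]])

first_row = tensor[0, 0]       # same result as tensor[0][0]
single = tensor[1, 0, 1]       # second matrix, first row, second column
first_col = tensor[:, :, 0]    # first column of every matrix, shape (3, 2)

print(first_row)
print(single.item())           # .item() converts a one-element tensor to a Python number
print(first_col)
```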


# Using Neural Networks

In [20]:
import torch.nn as nn
linear = nn.Linear(10,2) # Creates a linear layer with 10 inputs and 2 outputs. It takes in a tensor of size (N, 10) and outputs a tensor of size (N, 2).
input = torch.randn(3,10) # Creates a random tensor of size (3, 10)
output = linear(input) # Passes the input tensor through the linear layer 

In [21]:
print(output)

tensor([[-0.2551,  0.1872],
        [ 1.0104, -0.3893],
        [ 0.4198, -0.5051]], grad_fn=<AddmmBackward0>)
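
A linear layer computes `input @ weight.T + bias`, where `weight` has shape (2, 10) and `bias` has shape (2,) for the layer above. The following sketch reproduces the layer's output by hand to show this; the seed and variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so repeated runs give the same numbers

linear = nn.Linear(10, 2)
x = torch.randn(3, 10)

# Apply the affine transformation manually with the layer's own parameters
manual = x @ linear.weight.T + linear.bias

print(linear.weight.shape, linear.bias.shape)
print(torch.allclose(linear(x), manual, atol=1e-6))
```

The `grad_fn` shown in the printed output indicates that PyTorch is tracking the operation so that gradients can be computed later.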


# What is an activation?
An activation is a non-linear function that is applied to the output of each processing element (neuron) in a neural network. Some activations, such as the sigmoid, squash any number into the range 0 to 1; others, such as ReLU, have a different output range.
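
As a rough illustration of how output ranges differ between activations, the sketch below applies three common ones to the same input. The input values are arbitrary.

```python
import torch

x = torch.tensor([-2.0, 0.0, 2.0])

sig = torch.sigmoid(x)   # values strictly between 0 and 1
tanh = torch.tanh(x)     # values strictly between -1 and 1
relu = torch.relu(x)     # negative values become 0, positive values pass through

print(sig)
print(tanh)
print(relu)
```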

## ReLU
ReLU stands for Rectified Linear Unit. It is a type of activation function. Mathematically, it is defined as y = max(0, x). It is a simple function which returns the value passed to it directly, or the value 0, whichever is greater, so its output is never negative. It is a common default activation function for many types of neural networks.

In [22]:
relu = nn.ReLU() # Creates a ReLU activation function

In [26]:
relu_output = relu(output) # Passes the output tensor through the ReLU activation function
print(relu_output)

tensor([[0.0000, 0.1872],
        [1.0104, 0.0000],
        [0.4198, 0.0000]], grad_fn=<ReluBackward0>)
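
Since ReLU is just an element-wise max(0, x), it can be reproduced with `torch.maximum`. This sketch uses the values printed above as a plain tensor (without gradient tracking) and checks that both approaches agree.

```python
import torch
import torch.nn as nn

# Values copied from the linear layer output shown above
x = torch.tensor([[-0.2551, 0.1872],
                  [1.0104, -0.3893],
                  [0.4198, -0.5051]])

manual = torch.maximum(x, torch.zeros_like(x))  # element-wise max(0, x)
builtin = nn.ReLU()(x)

print(manual)
print(torch.equal(manual, builtin))
```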


# What are optimisers?
Optimisers are the algorithms that adjust the weights of a network in order to minimise the loss. The loss itself is computed by a loss function; the optimiser uses the gradients of that loss with respect to the weights to decide how to update them. They help us fine-tune our neural network to get the best possible results. We need `torch.optim` for this.
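
To show what an optimiser does under the hood, here is a minimal sketch of one plain gradient descent step written by hand. The loss (sum of squares), the starting weights, and the learning rate are assumptions chosen so the arithmetic is easy to follow; real optimisers such as those in `torch.optim` wrap this kind of update.

```python
import torch

w = torch.tensor([1.0, -2.0], requires_grad=True)

loss = (w ** 2).sum()   # toy loss, minimised at w = 0
loss.backward()         # fills w.grad with d(loss)/dw = 2 * w

lr = 0.1                # learning rate (hypothetical value)
with torch.no_grad():   # the update itself must not be tracked by autograd
    w -= lr * w.grad    # move against the gradient

print(w)
```

After the step, each weight has moved 20% of the way towards zero, which lowers the loss.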

In [25]:
import torch.optim as optim

We can chain a series of layers so that they run one after another using `nn.Sequential`.

In [27]:
mlp = nn.Sequential(
  nn.Linear(5,2), # Creates a linear layer with 5 inputs and 2 outputs. This is a linear transformation.
  nn.BatchNorm1d(2), # Creates a batch normalization layer with 2 features. This is normalization.
  nn.ReLU(), # Creates a ReLU activation function
)

In [29]:
input = torch.randn(3,5) # Creates a random tensor of size (3, 5) 
mlp_output = mlp(input) # Passes the input tensor through the MLP
print(mlp_output)

tensor([[1.2417, 0.6178],
        [0.0000, 0.7928],
        [0.0000, 0.0000]], grad_fn=<ReluBackward0>)
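
The batch normalization layer in the middle rescales each feature using the statistics of the current batch. The sketch below, with an arbitrary batch of 8 samples and a fixed seed, checks that the output of a freshly created `nn.BatchNorm1d` in training mode has roughly zero mean and unit variance per feature.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(2)            # new modules start in training mode
x = torch.randn(8, 2) * 5 + 3     # batch with a mean and spread far from 0 and 1

y = bn(x)

print(y.mean(dim=0))                  # per-feature mean, close to 0
print(y.var(dim=0, unbiased=False))   # per-feature variance, close to 1
```

In evaluation mode (`bn.eval()`), the layer uses running statistics gathered during training instead of the current batch.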


Now we will use an optimiser. We will use the Adam optimiser, which is very popular. It combines ideas from RMSProp and Stochastic Gradient Descent with momentum, and it adapts the learning rate for each parameter.


In [30]:
adam_optimiser = optim.Adam(mlp.parameters(), lr=0.01) # Creates an Adam optimizer with a learning rate of 0.01

In [32]:
train_data = torch.randn(100,5) # Creates a random tensor of size (100, 5)
adam_optimiser.zero_grad() # Clears the gradients stored on the MLP's parameters
current_loss = torch.abs(mlp(train_data)).sum() # Toy loss: the sum of the absolute outputs of the MLP (there are no targets here)
current_loss.backward() # Backpropagates the loss
adam_optimiser.step() # Updates the parameters of the MLP
print(current_loss)


tensor(78.4811, grad_fn=<SumBackward0>)


This is one single step of training. We will use a loop to train for multiple epochs.
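
A minimal sketch of such a loop is shown here. It rebuilds the same MLP and reuses the toy loss from above so that it is self-contained; the seed, the number of epochs (50), and the variable names are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed so repeated runs give the same numbers

mlp = nn.Sequential(
    nn.Linear(5, 2),
    nn.BatchNorm1d(2),
    nn.ReLU(),
)
optimiser = optim.Adam(mlp.parameters(), lr=0.01)
train_data = torch.randn(100, 5)

losses = []
for epoch in range(50):                       # hypothetical number of epochs
    optimiser.zero_grad()                     # reset gradients from the previous step
    loss = torch.abs(mlp(train_data)).sum()   # same toy loss as above
    loss.backward()                           # compute gradients
    optimiser.step()                          # update the parameters
    losses.append(loss.item())                # store the loss as a plain float

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

In a real training setup, the loss would compare the model's predictions with targets (for example with `nn.MSELoss`), and the data would usually be split into mini-batches.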