# Why Use PyTorch?

PyTorch is an open-source machine learning library for Python, developed primarily by Facebook's AI research group (now Meta AI). Its popularity stems largely from its ease of use and flexibility.

![PyTorch Logo](https://upload.wikimedia.org/wikipedia/commons/thumb/c/c6/PyTorch_logo_black.svg/2560px-PyTorch_logo_black.svg.png)



## Dynamic Computation Graphs

A key feature that differentiates PyTorch from other deep learning libraries is its dynamic computation graph.

- A computation graph is an abstract representation of the mathematical operations your code performs.
- The graph is essential because it lets you not only visualize the computation but also compute gradients automatically.
- [Computation Graph Blog](https://towardsdatascience.com/getting-started-with-pytorch-part-1-understanding-how-automatic-differentiation-works-5008282073ec)

In a static computation graph (the model used by libraries such as TensorFlow 1.x), the graph is defined in advance and then executed. In PyTorch, however, the graph is built as the code runs:
- Operations can be executed immediately, making debugging easier.
- The overall experience is closer to the usual Python programming.
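
As a minimal sketch of what immediate execution makes possible, the loop below runs a different number of times depending on the data, and the graph is simply whatever operations were executed. The function name `scaled_until_large` and its parameters are hypothetical, chosen only for this illustration.

```python
import torch

def scaled_until_large(x, limit=10.0, max_steps=20):
    """Double x until its norm reaches `limit`; the loop length depends on the data."""
    steps = 0
    while x.norm() < limit and steps < max_steps:
        x = x * 2      # each iteration adds one multiplication to the graph at run time
        steps += 1
    return x, steps

x = torch.tensor([1.0, 1.0], requires_grad=True)
y, steps = scaled_until_large(x)
print(steps)           # intermediate values can be inspected with a plain print()

y.sum().backward()
print(x.grad)          # each element equals 2 ** steps
```

Because the loop is ordinary Python, a debugger or `print()` call can be placed anywhere inside it.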



## Role of Differentiation and Chain Rule

Differentiation is a crucial part of any deep learning library.

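- PyTorch uses a technique called automatic differentiation, which computes the derivative of a function by recording the elementary operations it performs and applying the chain rule to them, rather than approximating the derivative with finite differences.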
- It is beneficial when dealing with complex equations, such as the ones we encounter in deep learning models.

The Chain Rule:
- It is used for differentiating compositions of functions.
- This rule is central to backpropagation which is used for training neural networks.
- The ability to compute gradients or derivatives of our error or loss with respect to our model's parameters, using the chain rule, allows us to optimize our model.
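
To make the chain rule concrete, here is a small sketch that differentiates the composition sin(x²) twice: once by hand and once with PyTorch. The function and the value 1.5 are arbitrary choices for illustration.

```python
import torch

x = torch.tensor(1.5, requires_grad=True)
u = x ** 2            # inner function g(x) = x^2
f = torch.sin(u)      # outer function h(u) = sin(u)
f.backward()          # PyTorch applies the chain rule for us

# Chain rule by hand: df/dx = h'(u) * g'(x) = cos(x^2) * 2x
x0 = x.detach()
manual = torch.cos(x0 ** 2) * 2 * x0

print(x.grad, manual)                  # the two values should agree
print(torch.allclose(x.grad, manual))
```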

# A Beginner's Guide to Tensors

## What is a Tensor?

Just as kids can arrange their toys in many ways, data in computing can also be arranged in various forms. Imagine you're in a toy store.

- An action figure sitting alone represents a single number, also known as a **scalar** in mathematics. A scalar only has a magnitude (its price in our case).
- Action figures arranged in a row represent a one-dimensional array, popularly known as a **vector** in computing and mathematics.
- Multiple rows and columns of action figures showcased in a glass box constitute a two-dimensional array or a **matrix**.
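- And lastly, a pile of such glass boxes stacked on top of each other forms a three-dimensional array. **Tensor** is the general term for such an N-dimensional array; scalars, vectors, and matrices are tensors with 0, 1, and 2 dimensions respectively.

A tensor, in essence, can house data in N dimensions, but visualizing dimensions beyond the third can be quite challenging.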
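
The toy-store picture maps directly onto the number of dimensions a tensor reports. The sketch below uses made-up values; `ndim` and `shape` are standard tensor attributes.

```python
import torch

scalar = torch.tensor(9.99)                # one action figure (its price)
vector = torch.tensor([9.99, 14.5, 7.25])  # a row of action figures
matrix = torch.ones(2, 3)                  # one glass box: 2 rows x 3 columns
stack = torch.ones(4, 2, 3)                # 4 glass boxes stacked together

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("stack", stack)]:
    print(name, t.ndim, tuple(t.shape))
```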


## Tensors in PyTorch

In the context of PyTorch (a popular library for deep learning), a tensor is pretty similar to NumPy's ndarray, and serves as the core data structure of the library.

The three main components associated with tensors in PyTorch are:

- **`.grad`**: Holds the gradient accumulated for the tensor. PyTorch computes gradients by automatic differentiation when `.backward()` is called; until then, `.grad` is `None`.

- **`.grad_fn`**: References the backward function of the operation that produced the tensor. It is `None` for tensors created directly by the user.

- **`.data`**: Gives access to the raw data of the tensor.
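
The three attributes can be inspected on a tiny example. This is only a sketch; the names `w` and `loss` are placeholders.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)  # a leaf tensor created by the user
loss = (w * w).sum()                               # a tensor produced by operations

print(w.grad)        # None: no backward pass has run yet
print(w.grad_fn)     # None: w was not produced by an operation
print(loss.grad_fn)  # the backward node of the final sum

loss.backward()
print(w.grad)        # tensor([2., 4.]), since d(sum(w^2))/dw = 2w
print(w.data)        # tensor([1., 2.])
```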


# PyTorch Hands-On


In [1]:
import torch

In [2]:
l = [1, 2, 3]
l, type(l)

([1, 2, 3], list)

In [3]:
t1 = torch.tensor(l) # list
print(t1)

tensor([1, 2, 3])


In [4]:
t2 = torch.tensor([1, 3, 56]) # list
t2  # 1d tensor, aka vector

tensor([ 1,  3, 56])

In [5]:
t3 = torch.tensor([[71, 63, 56], [41, 33, 36], [12, 13, 56]]) #matrix or 2d array
t3

tensor([[71, 63, 56],
        [41, 33, 36],
        [12, 13, 56]])

In [6]:
t4 = torch.rand((3, 2, 3,  4)) # 4d tensor
t4

tensor([[[[0.2711, 0.5325, 0.0667, 0.7627],
          [0.6939, 0.4436, 0.0646, 0.9406],
          [0.4772, 0.9942, 0.9673, 0.9294]],

         [[0.1159, 0.0375, 0.5463, 0.7139],
          [0.9652, 0.8298, 0.2032, 0.2123],
          [0.7324, 0.1217, 0.8245, 0.0641]]],


        [[[0.9040, 0.5370, 0.7815, 0.4725],
          [0.4140, 0.5743, 0.1007, 0.2776],
          [0.5211, 0.6973, 0.3306, 0.0035]],

         [[0.5335, 0.6374, 0.6438, 0.3324],
          [0.8386, 0.5532, 0.1005, 0.1302],
          [0.6472, 0.0835, 0.6075, 0.9370]]],


        [[[0.8247, 0.3982, 0.2856, 0.4714],
          [0.1708, 0.3508, 0.5003, 0.2148],
          [0.3656, 0.9803, 0.6903, 0.9538]],

         [[0.8711, 0.9611, 0.3604, 0.3327],
          [0.8302, 0.8938, 0.3753, 0.0151],
          [0.3122, 0.2499, 0.4188, 0.8080]]]])

In [7]:
t5 = torch.rand((2, 2, 2, 3,  4)) # 5d tensor
t5

tensor([[[[[0.4159, 0.0592, 0.0559, 0.5674],
           [0.5666, 0.0352, 0.9652, 0.8828],
           [0.2336, 0.8954, 0.9035, 0.7558]],

          [[0.1671, 0.3765, 0.4618, 0.8796],
           [0.5110, 0.3395, 0.2673, 0.8305],
           [0.0149, 0.9670, 0.0200, 0.3209]]],


         [[[0.0341, 0.4285, 0.6979, 0.5017],
           [0.4821, 0.4929, 0.5933, 0.5376],
           [0.4807, 0.7165, 0.0828, 0.3125]],

          [[0.7447, 0.8070, 0.2704, 0.7317],
           [0.0670, 0.5016, 0.1057, 0.4898],
           [0.9406, 0.5799, 0.4563, 0.1760]]]],



        [[[[0.6988, 0.9275, 0.4731, 0.5632],
           [0.1305, 0.4814, 0.4525, 0.1024],
           [0.4771, 0.0871, 0.6763, 0.2731]],

          [[0.9425, 0.6928, 0.1970, 0.2595],
           [0.0968, 0.9732, 0.3993, 0.1889],
           [0.7184, 0.4649, 0.0021, 0.1680]]],


         [[[0.8225, 0.3465, 0.9506, 0.3313],
           [0.9457, 0.8236, 0.7229, 0.6946],
           [0.4006, 0.2454, 0.6333, 0.2207]],

          [[0.2927, 0.6895, 0.805

In [8]:
t1.shape, t2.shape, t3.shape, t4.shape, t5.shape

(torch.Size([3]),
 torch.Size([3]),
 torch.Size([3, 3]),
 torch.Size([3, 2, 3, 4]),
 torch.Size([2, 2, 2, 3, 4]))

## Data types in Tensor

[All tensor data types in PyTorch](https://pytorch.org/docs/stable/tensors.html)

In [9]:
t_long = torch.tensor([1, 3, 56], dtype=torch.long) # explicit 64-bit integer dtype
t_long

tensor([ 1,  3, 56])
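
A short sketch of how dtypes are inferred, converted, and promoted. It assumes PyTorch's default floating-point dtype has not been changed from `torch.float32`.

```python
import torch

ints = torch.tensor([1, 3, 56])           # Python ints are inferred as torch.int64
floats = torch.tensor([1.0, 3.0, 56.0])   # Python floats use the default float dtype
as_float = ints.to(torch.float32)         # explicit conversion returns a new tensor
mixed = ints + floats                     # type promotion: int64 + float32 -> float32

print(ints.dtype, floats.dtype, as_float.dtype, mixed.dtype)
```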

## Math Operations on Tensors


In [10]:

# common functions
a = torch.rand(2, 4) * 2 - 1
a

tensor([[-0.8709, -0.0640,  0.7953,  0.8383],
        [ 0.2413, -0.1402, -0.8101,  0.8598]])

In [11]:

print('Common functions:')
print("Absolute", torch.abs(a))
print("Ceiling", torch.ceil(a))
print("Floor", torch.floor(a))
print("Clamp", torch.clamp(a, -0.5, 0.5))

Common functions:
Absolute tensor([[0.8709, 0.0640, 0.7953, 0.8383],
        [0.2413, 0.1402, 0.8101, 0.8598]])
Ceiling tensor([[-0., -0., 1., 1.],
        [1., -0., -0., 1.]])
Floor tensor([[-1., -1.,  0.,  0.],
        [ 0., -1., -1.,  0.]])
Clamp tensor([[-0.5000, -0.0640,  0.5000,  0.5000],
        [ 0.2413, -0.1402, -0.5000,  0.5000]])


In [12]:
import math
# trigonometric functions and their inverses
angles = torch.tensor([0, math.pi, math.pi / 4, math.pi / 2, 3 * math.pi / 4])
sines = torch.sin(angles)
inverses = torch.asin(sines)
print('\nSine and arcsine:')
print(angles)
print(sines)
print(inverses)



Sine and arcsine:
tensor([0.0000, 3.1416, 0.7854, 1.5708, 2.3562])
tensor([ 0.0000e+00, -8.7423e-08,  7.0711e-01,  1.0000e+00,  7.0711e-01])
tensor([ 0.0000e+00, -8.7423e-08,  7.8540e-01,  1.5708e+00,  7.8540e-01])


In [13]:
# comparisons:
print('\nBroadcasted, element-wise equality comparison:')
d = torch.tensor([[1., 2.], [3., 4.]])
e = torch.ones(1, 2)  # many comparison ops support broadcasting!
print(torch.eq(d, e)) # returns a tensor of type bool


Broadcasted, element-wise equality comparison:
tensor([[ True, False],
        [False, False]])


In [14]:
d

tensor([[1., 2.],
        [3., 4.]])

In [15]:
e

tensor([[1., 1.]])
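
To see what broadcasting does in the comparison above, the sketch below expands `e` to the shape of `d` by hand and checks that the result is the same. The column tensor `col` is an extra example added here for illustration.

```python
import torch

d = torch.tensor([[1., 2.], [3., 4.]])   # shape (2, 2)
e = torch.ones(1, 2)                      # shape (1, 2)

# Broadcasting behaves as if e were repeated along the size-1 dimension
e_expanded = e.expand(2, 2)
print(torch.equal(torch.eq(d, e), torch.eq(d, e_expanded)))

# The resulting shape can be computed without touching any data
print(torch.broadcast_shapes(d.shape, e.shape))

# A (2, 1) column is compared against every column of d
col = torch.tensor([[1.], [3.]])
print(torch.eq(d, col))
```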

## Tensor Reduction/Aggregation




In [16]:
d = torch.tensor([[1., 2.], [3., 4.]])
d

tensor([[1., 2.],
        [3., 4.]])

In [17]:
# reductions:

print('\nReduction ops:')
print(torch.max(d))        # returns a single-element tensor
print(torch.max(d).item()) # extracts the value from the returned tensor
print(torch.mean(d))       # average
print(torch.std(d))        # standard deviation
print(torch.prod(d))       # product of all numbers
print(torch.unique(torch.tensor([1, 2, 1, 2, 1, 2]))) # filter unique elements


Reduction ops:
tensor(4.)
4.0
tensor(2.5000)
tensor(1.2910)
tensor(24.)
tensor([1, 2])


## Tensor Reduction/Aggregation along rows and columns



In [18]:
# reductions:
d = torch.tensor([[1., 2.], [3., 4.]])
print('\nReduction ops:')
print(torch.mean(d))       # average
print(torch.mean(d, dim=1))       # average each row
print(torch.mean(d, dim=0))       # average each col


Reduction ops:
tensor(2.5000)
tensor([1.5000, 3.5000])
tensor([2., 3.])


In [19]:
d

tensor([[1., 2.],
        [3., 4.]])
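
A reduction along a dimension normally removes that dimension. As a small sketch, passing `keepdim=True` keeps it with size 1, which makes the result easy to broadcast back against the original tensor:

```python
import torch

d = torch.tensor([[1., 2.], [3., 4.]])

row_means = d.mean(dim=1, keepdim=True)  # shape (2, 1) instead of (2,)
centered = d - row_means                  # each row now has mean 0

print(row_means.shape)
print(centered)
```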

# Manipulating Tensor Shapes
- This is one of the most important topics, if not the most important.

In [20]:
a = torch.rand(2, 3, 4)
a.shape


torch.Size([2, 3, 4])

In [21]:
a

tensor([[[0.0784, 0.7746, 0.2177, 0.9592],
         [0.3998, 0.0689, 0.0351, 0.4325],
         [0.7645, 0.4277, 0.2615, 0.3540]],

        [[0.1413, 0.5714, 0.0770, 0.1342],
         [0.9489, 0.6592, 0.5160, 0.6475],
         [0.3972, 0.7820, 0.1910, 0.9347]]])

In [22]:
a.view(-1, 1).shape, a.view(-1, 2).shape

(torch.Size([24, 1]), torch.Size([12, 2]))

In [23]:
a.view(-1, 2)

tensor([[0.0784, 0.7746],
        [0.2177, 0.9592],
        [0.3998, 0.0689],
        [0.0351, 0.4325],
        [0.7645, 0.4277],
        [0.2615, 0.3540],
        [0.1413, 0.5714],
        [0.0770, 0.1342],
        [0.9489, 0.6592],
        [0.5160, 0.6475],
        [0.3972, 0.7820],
        [0.1910, 0.9347]])

In [24]:
# you don't need -1 all the time; if you know the exact dimensions, you can pass them directly
# 2x3x4 = total 24

a.view(4, 6), a.view(4, 6).shape

(tensor([[0.0784, 0.7746, 0.2177, 0.9592, 0.3998, 0.0689],
         [0.0351, 0.4325, 0.7645, 0.4277, 0.2615, 0.3540],
         [0.1413, 0.5714, 0.0770, 0.1342, 0.9489, 0.6592],
         [0.5160, 0.6475, 0.3972, 0.7820, 0.1910, 0.9347]]),
 torch.Size([4, 6]))

In [25]:
# the requested shape must account for every element:
# 2x3x4 = 24 elements, but 4x10 = 40, so this raises an error

a.view(4, 10)

RuntimeError: shape '[4, 10]' is invalid for input of size 24
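
Two further properties of `view` are worth knowing: it shares memory with the original tensor, and it only works when the memory layout allows it. The following sketch uses `torch.arange` instead of random data so the values are easy to follow; `reshape` is shown as the fallback that copies when needed.

```python
import torch

a = torch.arange(24).reshape(2, 3, 4)

flat = a.view(-1)            # a view shares memory with the original tensor
flat[0] = 100
print(a[0, 0, 0].item())     # the change is visible through `a` as well

t = a.transpose(0, 2)        # shape (4, 3, 2), no longer contiguous in memory
print(t.is_contiguous())

view_failed = False
try:
    t.view(-1)               # view cannot flatten this layout without copying
except RuntimeError:
    view_failed = True
print(view_failed)

r = t.reshape(-1)            # reshape falls back to a copy when a view is impossible
print(r.shape)
```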

# Rand vs Randn
- `torch.rand` draws random numbers uniformly from the interval [0, 1).
- `torch.randn` draws random numbers from a standard normal distribution (mean 0, variance 1).

In [26]:
torch.manual_seed(42)
torch.rand(2, 3)

tensor([[0.8823, 0.9150, 0.3829],
        [0.9593, 0.3904, 0.6009]])

In [27]:
torch.randn(2, 3)

tensor([[ 1.1561,  0.3965, -2.4661],
        [ 0.3623,  0.3765, -0.1808]])

In [28]:
# random integers from 3 (inclusive) to 10 (exclusive), with the given shape
x = torch.randint(3, 10, (3, 4))
x

tensor([[5, 6, 4, 3],
        [3, 6, 6, 8],
        [4, 4, 5, 5]])

In [29]:
torch.numel(x) # number of elements in the tensor, in this case, 3x4 = 12

12
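
The difference between the two distributions shows up clearly in summary statistics over a large sample. This sketch also illustrates that re-seeding with `torch.manual_seed` reproduces the same numbers; the sample size of 100,000 is an arbitrary choice.

```python
import torch

torch.manual_seed(0)
u = torch.rand(100_000)     # uniform on [0, 1)
n = torch.randn(100_000)    # standard normal

print(u.min().item(), u.max().item())   # both inside [0, 1)
print(u.mean().item())                   # close to 0.5
print(n.mean().item(), n.std().item())   # close to 0 and 1

# Re-seeding makes random draws reproducible
torch.manual_seed(42)
first = torch.rand(3)
torch.manual_seed(42)
second = torch.rand(3)
print(torch.equal(first, second))
```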

# Zeros, Ones, and Likes

# Practical Use Cases for PyTorch's `torch.ones`, `zeros`, `like`

PyTorch provides several functions for creating tensors. Four of them are `torch.ones`, `torch.zeros`, `torch.ones_like`, and `torch.zeros_like`. Let's look at practical scenarios where they can be used.

## torch.ones

`torch.ones` is used to create a tensor filled with the scalar value 1, with the shape defined by the variable argument size.

### Use case:
- Initialization of bias or weights in a neural network: `torch.ones` can be used to fill a weight or bias tensor with an initial value of 1.

    ```python
    bias = torch.ones((10,))
    weights = torch.ones((5, 10))
    ```

- Generation of a mask tensor: for operations where we need to apply a mask to all elements.

    ```python
    mask = torch.ones_like(weights)
    ```

## torch.zeros

`torch.zeros` creates a tensor filled with the scalar value 0, with the shape defined by the variable argument size.

### Use case:
- Creation of a black image or background in computer vision tasks.
    ```python
    black_img = torch.zeros([256, 256, 3])
    ```
- Padding values in data analytics.
    ```python
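    sequence = [4, 7, 9]        # example values; any list of integers works
    max_sequence_length = 5     # example target length
    padded_sequence = torch.zeros(max_sequence_length, dtype=torch.long)
    padded_sequence[:len(sequence)] = torch.tensor(sequence)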
    ```
- In deep learning, to initialize a weight or bias tensor with zeros before training (most commonly biases).
    ```python
    weights = torch.zeros((5, 10))
    ```
- Generation of a masking tensor.

## torch.ones_like and torch.zeros_like

These functions come in handy when one needs to create a new tensor with the same size (and, by default, the same dtype and device) as a given tensor. The new tensor is then filled with ones or zeros, respectively.

### Use case:

```python
x = torch.randn((5, 5))

# creating tensor with same size as x
ones = torch.ones_like(x)
zeros = torch.zeros_like(x)
```

Operations like initializing tensors, creating masks, generating images, and padding sequences are just some of the areas where these functions are useful. Their applications are versatile and extend to many other areas of machine learning, deep learning, and data analytics.
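
As one concrete sketch of the masking use case, the snippet below builds a mask with `torch.zeros_like` and uses it to keep only the entries above a threshold. The tensor `scores` and the threshold 0.5 are made up for illustration.

```python
import torch

scores = torch.tensor([[0.2, 0.9, 0.4],
                       [0.7, 0.1, 0.8]])

mask = torch.zeros_like(scores)   # same shape and dtype as scores, filled with 0
mask[scores > 0.5] = 1.0          # switch on the positions we want to keep

kept = scores * mask              # masked-out entries become 0
print(mask)
print(kept)
```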

In [30]:
ones = torch.ones((2, 4))
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]])

In [31]:
ones = torch.ones((2, 4), dtype=torch.float64)
ones

tensor([[1., 1., 1., 1.],
        [1., 1., 1., 1.]], dtype=torch.float64)

In [32]:
zeros = torch.zeros((2, 4))
zeros

tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.]])

In [33]:
new_tensor = torch.rand_like(zeros, dtype=torch.float64)
new_tensor

tensor([[0.6124, 0.0881, 0.7012, 0.6234],
        [0.4373, 0.0747, 0.6834, 0.3122]], dtype=torch.float64)

In [34]:
t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([1, 2, 3])

t3 = t1.add(t2)
t3

tensor([2, 4, 6])

In [35]:
t1, t2

(tensor([1, 2, 3]), tensor([1, 2, 3]))

In [36]:
# in-place addition; methods ending in an underscore modify the tensor in place
t1.add_(t2)
t1

tensor([2, 4, 6])
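
One way to confirm the difference between `add` and `add_` is to look at where the data lives. This sketch uses `data_ptr()`, which returns the memory address of a tensor's first element:

```python
import torch

t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([1, 2, 3])
address_before = t1.data_ptr()

out = t1.add(t2)    # out-of-place: allocates a new tensor, t1 is unchanged
t1.add_(t2)         # in-place: writes the result into t1's existing memory

print(out.data_ptr() != address_before)  # out lives in different memory
print(t1.data_ptr() == address_before)   # t1 still uses the same memory
print(t1)
```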

# Understanding AutoGrad

PyTorch's `autograd` package provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It allows for on-the-fly operation tracking on tensors, where every operation is recorded on a graph. The graph is then used in the backward pass to compute gradients.

## Definition of Autograd

`autograd` is PyTorch’s automatic differentiation library. It keeps a record of all operations performed on a tensor and builds a directed acyclic computation graph from them. This allows backpropagation to calculate the gradients of the loss function with respect to the weights, which is essential when training neural networks.

## Importance of Autograd

In machine learning, especially in neural networks, weights are optimized using gradient information. Gradients tell the optimizer in which direction the parameters should be changed to reduce the loss. Autograd automates the backward pass through the computational graph, which is the step that produces these gradients. Hence, gradients are computed automatically and accurately.

## How Autograd works

- When a tensor's `.requires_grad` property is set to `True`, PyTorch starts tracking all operations on it.
- After the computation is finished, calling `.backward()` computes all gradients automatically.
- The gradient for a tensor is accumulated into its `.grad` attribute.
- To stop tracking history (and save memory), call `.detach()`, which returns a tensor that shares the same data but is detached from the graph.
- Tracking can also be stopped by wrapping the code block inside `with torch.no_grad():`

Here is an example demonstrating autograd:

```python
import torch

x = torch.randn(3, 3, requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()  # computes gradients
print(x.grad)  # the gradient of 'out' with respect to 'x'
```
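
The points about accumulation, `.detach()`, and `torch.no_grad()` can be checked with a short sketch. The variable names are placeholders:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)

(x * 2).sum().backward()
first = x.grad.clone()      # tensor([2., 2., 2.])

(x * 2).sum().backward()    # a second backward pass adds to the existing .grad
second = x.grad.clone()     # tensor([4., 4., 4.])
print(first, second)

x.grad.zero_()              # reset before the next pass, as optimizers do

with torch.no_grad():
    y = x * 2               # computed without recording history
z = (x * 2).detach()        # recorded, then cut off from the graph

print(y.requires_grad, z.requires_grad)
```

This accumulation behaviour is why training loops usually clear the gradients before each backward pass.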

## Conclusion

Using `autograd` simplifies the computation of gradients in neural networks. By calculating gradients automatically, it reduces the likelihood of errors in complex neural networks and spares you from deriving and coding the backward pass by hand.

In [37]:
x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x + 2
z = y * y * 3
out = z.mean()
out



tensor(50., grad_fn=<MeanBackward0>)

In [38]:
z

tensor([27., 48., 75.], grad_fn=<MulBackward0>)

In [39]:
y

tensor([3., 4., 5.], grad_fn=<AddBackward0>)

In [40]:
# x.grad is still None because backward() has not been called yet
print(x.grad)  # the gradient of 'out' with respect to 'x'

None


In [41]:
# before calling backward(), let's look at the grad_fn of each tensor

y.grad_fn, z.grad_fn, out.grad_fn

(<AddBackward0 at 0x799f84f01b70>,
 <MulBackward0 at 0x799f84f026e0>,
 <MeanBackward0 at 0x799f84f00820>)

## The Backward Function

The `backward()` function in PyTorch is a critical part of the Autograd system, which is responsible for automatic computation of gradients for various operations on Tensors.




### Functionality of the `backward()` Function

The `backward()` function is primarily used for computing the gradients of the loss with respect to some set of variables (usually the model parameters). It performs backpropagation starting from a variable.

Here's how it works:

- The `backward()` function computes the gradient of the current tensor w.r.t. the graph leaves.
- The graph is defined by following all the operations from the leaf/leaves to the current tensor.
- If the current tensor (`y`) is a scalar (i.e., it holds a single element), then you don’t need to specify any arguments to `backward()`. However, if it has more elements, you need to pass a `gradient` argument: a tensor with the same shape as `y`.
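
The non-scalar case can be sketched as follows. Passing a tensor of ones as `gradient` weights every element of `y` equally, which gives the same result as calling `y.sum().backward()`:

```python
import torch

x = torch.tensor([1., 2., 3.], requires_grad=True)
y = x * x                   # y has three elements, so it is not a scalar

# Calling y.backward() with no arguments would raise a RuntimeError here,
# because a gradient can only be created implicitly for scalar outputs.
y.backward(gradient=torch.ones_like(y))

print(x.grad)               # tensor([2., 4., 6.]), i.e. 2 * x
```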



### Example of how `backward()` works:

Here is a simple example:

```python
import torch

# Create a tensor and set requires_grad=True to track computation with it
x = torch.ones(2, 2, requires_grad=True)

# Perform a tensor operation
y = x + 2

# Perform more operations on y
z = y * y * 3
out = z.mean()

# Trigger the backpropagation with backward()
out.backward()

# Print gradients d(out)/dx
print(x.grad)
```

In the above code:
- We first define a tensor `x` with `requires_grad=True` so that operations on it are recorded for gradient computation.
- We then perform several operations on `x` to get the output tensor `out`.
- Calling `backward()` triggers backpropagation, which computes the gradients.
- Finally, the gradients are available in `x.grad`.

### Conclusion

The `backward()` function is essential for neural network training. With this function, PyTorch allows automatic and efficient gradient computations. This simplifies the implementation of many machine learning algorithms, making development faster and less error-prone.

In [42]:
out.backward()  # computes gradients
print(x.grad)  # the gradient of 'out' with respect to 'x'

tensor([ 6.,  8., 10.])


# Loss Functions in PyTorch
- [Every loss available in PyTorch](https://pytorch.org/docs/stable/nn.html#loss-functions)

In [44]:
import torch.nn as nn

In [45]:
# without any reduction: one loss value per element
loss_mse = nn.MSELoss(reduction="none")
input = torch.randn(3, 4, requires_grad=True)
target = torch.randn(3, 4)
loss = loss_mse(input, target)
loss

tensor([[2.2424, 1.8387, 1.9646, 0.1659],
        [6.6479, 0.1036, 0.0574, 0.0315],
        [0.5554, 4.1243, 1.2666, 4.2864]], grad_fn=<MseLossBackward0>)

In [46]:
# in general we want a single loss value; reduction="sum" adds up all elements
loss_mse = nn.MSELoss(reduction="sum")
loss = loss_mse(input, target)
loss

tensor(23.2846, grad_fn=<MseLossBackward0>)

In [47]:
# reduction="mean" (the default) averages over all elements
loss_mse = nn.MSELoss(reduction="mean")
loss = loss_mse(input, target)
loss

tensor(1.9404, grad_fn=<MseLossBackward0>)
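
To connect the three reductions, the sketch below recomputes the mean squared error by hand and compares it with `nn.MSELoss`. The names `pred` and `target` and the seed are arbitrary; `pred` is used instead of `input` only to avoid shadowing Python's built-in function.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.randn(3, 4, requires_grad=True)
target = torch.randn(3, 4)

squared_error = (pred - target) ** 2    # what reduction="none" returns

loss_none = nn.MSELoss(reduction="none")(pred, target)
loss_sum = nn.MSELoss(reduction="sum")(pred, target)
loss_mean = nn.MSELoss(reduction="mean")(pred, target)

print(torch.allclose(loss_none, squared_error))
print(torch.allclose(loss_sum, squared_error.sum()))
print(torch.allclose(loss_mean, squared_error.mean()))

# The gradient of the mean loss is 2 * (pred - target) / N
loss_mean.backward()
expected_grad = 2 * (pred.detach() - target) / pred.numel()
print(torch.allclose(pred.grad, expected_grad))
```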