# PyTorch

## What is PyTorch?

PyTorch is three things:
- A tensor library
- An automatic differentiation engine
- A deep learning library

It is `free` and `open source`.

### History of PyTorch
- `PyTorch` is based on `torch`, an earlier popular library written in the `lua` programming language.
- Because most people prefer Python and don't want to learn Lua, PyTorch was created to make Torch (Torch7) available in Python.
- This happened in 2016.
- It has become one of the most widely used deep learning libraries among researchers.

### Tensors
Mathematically: a tensor is a generalization of scalars, vectors, and matrices to an arbitrary number of dimensions.

Computationally: a tensor is a data container for storing multi-dimensional arrays.

### 1. Scalar (Rank - 0 Tensor)
- In `python` it's a number
- We can think of it as a float
```py
a = 10.
print(a)  # 10.0
```

In [None]:
a = 10.
print(a) # think of it as a float

10.0



- Equivalent in `PyTorch`


In [1]:
import torch

a = torch.tensor(10.)
a # scalar tensor or rank-0 tensor

tensor(10.)

In [None]:
# use a.shape to check the shape (and therefore the rank) of a tensor;
# a rank-0 tensor has no dimensions, so its shape is empty
a.shape

torch.Size([])
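As a small sketch of what "rank 0" means in practice, the snippet below inspects a scalar tensor and converts it back to a plain Python number. The variable name `value` is only illustrative.

```python
import torch

# A rank-0 (scalar) tensor: it has no dimensions at all.
a = torch.tensor(10.)

print(a.shape)  # the shape is empty because there are no dimensions
print(a.ndim)   # number of dimensions (the rank)

# .item() converts a single-element tensor into a plain Python number.
value = a.item()
print(type(value), value)
```

`.item()` only works on tensors that hold exactly one element, which is why it pairs naturally with rank-0 tensors.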

### 2. Vectors (Rank-1 Tensor)
- In Python, we can think of a simple list as a vector (rank-1 tensor).


In [None]:
a = [1., 2., 3.]
a # Vector: simple list

[1.0, 2.0, 3.0]

- In PyTorch it's the same as before, but we wrap the list in `torch.tensor`.


In [None]:
a = torch.tensor([1., 2., 3.])
a # Vector/Rank-1 Tensor

tensor([1., 2., 3.])

In [None]:
a.shape # returns torch.Size([3])
# because it's a 3-element tensor

torch.Size([3])

### 3. Matrices (Rank-2 Tensor)
- Here we use a list of lists.
- The outer list has two sub-lists, and each sub-list represents a row, so the result is a matrix with 2 rows and 3 columns.
- We can think of each row as a `training example`, and each column as one of the `features` of the dataset.

In [None]:
a = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])
a

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [None]:
a.shape

torch.Size([2, 3])

### Considering a real-world dataset
- We can think of an image as a matrix.

<img src="https://images.pexels.com/photos/539694/pexels-photo-539694.jpeg" width="400" >

- The rows and columns of the matrix represent the pixels of the image.
- Here one image corresponds to 1 training example.

**Look at RGB image**

Red, green, blue, 3 different color channels

- So, in the image above we have 3 color channels (RGB).
- We can think of this as a `stack of matrices`.
- Each layer (color channel) is one matrix.
- So far we know scalars (rank-0 tensors), vectors (rank-1 tensors), and matrices (rank-2 tensors).
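To make the "stack of matrices" idea concrete, here is a minimal sketch that builds a tiny made-up RGB image from three 2x3 channel matrices. The pixel values and the names `red`, `green`, `blue`, and `image` are invented for illustration, and the channels-first layout (channels, height, width) is an assumption of this example.

```python
import torch

# Hypothetical 2x3 "image": one small matrix of pixel intensities per channel.
red   = torch.tensor([[255., 0., 0.], [255., 128., 0.]])
green = torch.tensor([[0., 255., 0.], [255., 128., 0.]])
blue  = torch.tensor([[0., 0., 255.], [0., 128., 255.]])

# Stacking the three matrices adds a new leading dimension for the channels.
image = torch.stack((red, green, blue))

print(image.shape)  # torch.Size([3, 2, 3]): 3 channels, 2 rows, 3 columns
print(image[0])     # indexing the first dimension gives back the red channel
```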

### 4. 3D-tensors
- We call this 3-dimensional data a **3D tensor**.
- We can think of a stack of matrices as a 3D tensor.




In [None]:
a = torch.tensor([[[1., 2., 3.],[2., 3., 4.]],
                  [[4., 5., 6.], [7., 8., 9.]]])
a

tensor([[[1., 2., 3.],
         [2., 3., 4.]],

        [[4., 5., 6.],
         [7., 8., 9.]]])

In [None]:
"""
Here a.shape returns 3 numbers:
- each number corresponds to one dimension (one rank),
- and its value is the size of the tensor along that dimension
"""
a.shape

torch.Size([2, 2, 3])

### 5. 4D (rank-4) tensor

Going one step further, we can also have a stack of multiple color images.

This adds another dimension.

In this case we have a 4-dimensional tensor, or rank-4 tensor.

In [None]:
b = torch.stack((a, a))
b

tensor([[[[1., 2., 3.],
          [2., 3., 4.]],

         [[4., 5., 6.],
          [7., 8., 9.]]],


        [[[1., 2., 3.],
          [2., 3., 4.]],

         [[4., 5., 6.],
          [7., 8., 9.]]]])

In [None]:
b.shape

torch.Size([2, 2, 2, 3])

So a tensor looks similar to a NumPy array. Now let's see how a tensor differs from a NumPy array.


## Comparing Tensor library with Array library

- For most purposes they are the same thing.
- tensor library ≈ array library
- `torch.tensor` ≈ `numpy.array`: a `torch.tensor` behaves almost identically to a `numpy.array`.

Difference

|torch.tensor|
|:-|
|+ supports GPU computation|
|+ Automatic differentiation support, very useful when training neural nets|

### How do tensors and arrays differ from regular Python lists?

**Python Lists**
+ **Pros:** Can store heterogeneous types: you can mix floats, strings, and other objects in one list.
+ **Pros:** We can easily add or remove items using `.append` or `.pop`.
+ **Cons:** While lists are flexible and easy to use, they are very slow for numerical computation (that is the main motivation behind tensors).

**Tensors**
- One limitation of tensors is that all elements must have the same type (e.g. float or integer).
- In contrast to lists, tensors have a fixed size, so we can't easily add or remove elements. If we want a larger tensor, we have to create a new tensor with the larger size, copy over all the existing elements, and add the new ones.

This may sound like tensors are bad. However, tensors have advantages over lists that are extremely useful for deep learning, which is heavily based on numerical computation:

- tensors support a wide variety of computations.
- numerical computations on tensors are fast.
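The two limitations above can be sketched in a few lines. This example assumes PyTorch's default floating point type has not been changed (it is `float32` by default). The names `extra`, `bigger`, and `mixed` are only illustrative.

```python
import torch

t = torch.tensor([1., 2., 3.])
extra = torch.tensor([4., 5.])

# There is no in-place append: torch.cat builds a NEW, larger tensor
# and copies the elements of both inputs into it.
bigger = torch.cat((t, extra))
print(t.shape, bigger.shape)  # t keeps its original size

# All elements share one dtype: mixing an int and a float
# produces a floating point tensor, not a mixed container.
mixed = torch.tensor([1, 2.5])
print(mixed, mixed.dtype)
```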

# Using Tensors In PyTorch

1. `torch.tensor()`: Creating Tensors
    - it's the most fundamental function, because that's how we create tensors in PyTorch

In [None]:
a = torch.tensor([1., 2., 3.])
a

tensor([1., 2., 3.])

2. `.shape` Checking Shape of Tensors
    - use `tensor.shape` to check the shape of a tensor.
    - the `.shape` attribute tells us the number of elements along each dimension.
    - for a 2D tensor, the 1st number is the number of rows and the 2nd number is the number of columns.
    - to get the rank of the tensor, count how many numbers `.shape` returns.

In [None]:
a = torch.tensor([[1., 2., 3.], [3., 4., 5.]])
a

tensor([[1., 2., 3.],
        [3., 4., 5.]])

In [None]:
a.shape # returns [2, 3]: two numbers, meaning it's a 2D tensor

torch.Size([2, 3])

3. `.ndim`: Checking the Rank / Number of Dimensions
    - use `.ndim` to check the rank (number of dimensions) of the tensor

In [None]:
a = torch.tensor([[[1., 2., 3.], [4., 5., 6.]], [[3., 4., 5.], [6., 7., 8.]]])
a

tensor([[[1., 2., 3.],
         [4., 5., 6.]],

        [[3., 4., 5.],
         [6., 7., 8.]]])

In [None]:
a.ndim # number of dimensions of the tensor; in our case it's a 3D tensor

3

4. `.dtype`: Checking the Data type of Tensor
    - a tensor can only store elements of a single data type.
    - `.dtype` shows us that data type.
    - below it returns `torch.float32`, meaning 32-bit floating point precision
    - that's the preferred precision in deep learning for efficiency reasons

In [None]:
a = torch.tensor([[1., 2., 3.], [5., 6., 7.]])
a

tensor([[1., 2., 3.],
        [5., 6., 7.]])

In [None]:
a.dtype # will get data type of tensor that is float32

torch.float32

In [None]:
a = torch.tensor([[1,2,3],[4,5,6]])
a

tensor([[1, 2, 3],
        [4, 5, 6]])

In [None]:
a.dtype # int datatype

torch.int64

5. `torch.from_numpy(np_array)` Creating Tensor from Numpy Array
    - torch has the `from_numpy()` function, which lets us convert a NumPy array directly into a tensor.
    - we can also call `torch.tensor(np_array)` on a NumPy array, but that creates a copy in memory.
    - a tensor created with `from_numpy()` uses the same memory as the NumPy array.
    - since NumPy uses 64-bit floating point precision by default, the tensor converted from the NumPy array will also be 64-bit (`torch.float64`).
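The difference between sharing memory and copying can be checked directly. In this small sketch (the names `shared` and `copied` are illustrative), we change the NumPy array after the conversion and see which tensor follows the change.

```python
import numpy as np
import torch

np_array = np.array([1., 2., 3.])

shared = torch.from_numpy(np_array)  # uses the same memory as np_array
copied = torch.tensor(np_array)      # makes an independent copy

# Modify the NumPy array in place after both tensors were created.
np_array[0] = 100.

print(shared)  # the first element reflects the change
print(copied)  # the copy still holds the original values
```

Sharing memory avoids a copy for large arrays, but it also means that in-place changes on either side are visible on the other.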


In [None]:
import numpy as np

In [None]:
np_array = np.array([1., 2., 3.]) # creating numpy array
print(f"Numpy array: {np_array}")
m2 = torch.from_numpy(np_array) # creating a tensor from the NumPy array; its dtype matches NumPy's, i.e. 64-bit,
# because float64 is NumPy's default floating point type
m2

Numpy array: [1. 2. 3.]


tensor([1., 2., 3.], dtype=torch.float64)

6. `tensor_obj.to(new_dtype)` Change the dtype

In [None]:
m2 # currently its dtype is float64

tensor([1., 2., 3.], dtype=torch.float64)

In [None]:
m2 = m2.to(torch.float32) # changing 64bit datatype to 32
m2.dtype

torch.float32

7. `.device` Checking the device Type
    - tensors also have a `.device` attribute that shows us where on our computer the tensor is located.
    - usually it returns CPU, which means the tensor lives in the CPU's memory.
    - later we'll see how to transfer tensors to the GPU, which can be very useful for deep learning and for accelerating training.

In [None]:
m2.device # currently its on cpu

device(type='cpu')
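As a hedged preview of moving a tensor to another device, the sketch below picks the GPU only if one is available and otherwise stays on the CPU, so it runs on any machine. The names `device` and `t_dev` are illustrative.

```python
import torch

# Fall back to the CPU when no CUDA-capable GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

t = torch.tensor([1., 2., 3.])  # created on the CPU by default
t_dev = t.to(device)            # returns a tensor on the chosen device

print(t.device)      # the original tensor stays on the CPU
print(t_dev.device)  # cpu or cuda, depending on the machine
```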

8. Changing Shape of a Tensor
    - Using the `.view()` function we can change the shape of a tensor.
    - `.view(-1, 2)`: `-1` tells PyTorch to infer that dimension automatically. Here we ask for 2 columns, and since a tensor with 6 elements can only be arranged into 2 columns in one way, PyTorch uses 3 rows.
    - `.view(2, -1)`: if we put `-1` in the column position instead, PyTorch infers the number of columns from the number of rows we asked for.

In [None]:
print(a.shape)
a

torch.Size([2, 3])


tensor([[1, 2, 3],
        [4, 5, 6]])

In [None]:
a.view(3,2)

tensor([[1, 2],
        [3, 4],
        [5, 6]])

In [None]:
a.view(-1,3)

tensor([[1, 2, 3],
        [4, 5, 6]])
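The `-1` placeholder can sit in either position. This short sketch shows both cases for a 6-element tensor, plus flattening with a single `-1`. The variable names are illustrative.

```python
import torch

a = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])  # 6 elements in total

rows_inferred = a.view(-1, 2)  # 6 elements / 2 columns -> 3 rows
cols_inferred = a.view(2, -1)  # 6 elements / 2 rows -> 3 columns
flat = a.view(-1)              # a single -1 flattens to a rank-1 tensor

print(rows_inferred.shape)
print(cols_inferred.shape)
print(flat.shape)
```

Only one dimension can be `-1` at a time, since PyTorch needs the other sizes to work out the missing one.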

9. Transposing a Matrix
    - We can transpose a tensor just like we transpose a matrix.
    - Transposing a matrix means flipping it along its diagonal: rows become columns and columns become rows.

In [None]:
m = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
m # before transposing, the shape is (2, 3): 2 rows and 3 columns

tensor([[1., 2., 3.],
        [4., 5., 6.]])

In [None]:
m.T # applying transpose operation

tensor([[1., 4.],
        [2., 5.],
        [3., 6.]])

10. Multiplying Matrices
    - This is something we do a lot in deep learning.
    - We can use concepts from linear algebra to make our code more efficient and faster.

# Improving Code with Linear Algebra

- From for loops to dot products
- Now that we know the basics of tensors, let's see how to use concepts from linear algebra to compute things like the weighted sum more efficiently.

**Weighted Sum**

- it's computed by multiplying each input by its weight, summing the products, and adding the bias `b`

`TODO: Image of neuron`

$$ b + x_1w_1 + x_2w_2 $$

- we do this computation for each individual training example.

For 1 training example with 2 feature values:

$$ z = b + x_1 w_1 + x_2 w_2 $$

General form for `m` features:

$$ z = b + \sum_{j=1}^m x_j w_j $$

- we can express the weighted sum as the dot product between two vectors, `X` and `W`:

$$ z = b + X \cdot W $$

- so we can use the dot product, a concept from linear algebra, to compute the weighted sum


In [None]:
# dot product or multiplication of vectors in plain Python
b = 0
x = [1.2, 2.2]
w = [3.3, 4.3]

output = b
for xi, wi in zip(x,w):
  output += xi * wi

output

13.42

## Dot product between 2 vectors

**Another way to look at it is for single training example**

$$ z = b + X^TW $$

Let's see step by step

`X` is a vector with entries $x_1$ to $x_m$, and `W` is a vector with entries $w_1$ to $w_m$.

$$ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_m \end{bmatrix} \quad W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} $$

- Now we add a transpose operation. It will come in handy during matrix multiplication.
- The transpose turns the column vector `X` into the row vector `X^T` ("X transpose").
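To connect the notation $z = b + X^TW$ to code, here is a minimal sketch that stores `X` and `W` as explicit column vectors (shape `(2, 1)`) and checks that transposing and multiplying gives the same number as a plain dot product. The values are the same small example numbers used in these notes. The names `z` and `z_dot` are illustrative.

```python
import torch

b = torch.tensor(0.)
X = torch.tensor([[1.2], [2.2]])  # column vector, shape (2, 1)
W = torch.tensor([[3.3], [4.3]])  # column vector, shape (2, 1)

# X.T has shape (1, 2), so (1, 2) x (2, 1) gives a 1x1 matrix.
z = b + X.T.matmul(W)

# The same weighted sum using flat rank-1 vectors and .dot().
z_dot = b + X.view(-1).dot(W.view(-1))

print(z.shape, z)
print(z_dot)
```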

### `.dot()` to compute the dot product of two vectors

- It's much simpler and more compact than vanilla Python with a for loop.
- The biggest advantage of `.dot()` is that it is **much faster**.
- We can use `.dot()` to improve code efficiency.
- The question is: how much faster? Let's run a benchmark.

In [None]:
# Dot product of two vectors using .dot using PyTorch
b = torch.tensor(0.)
x = torch.tensor([1.2, 2.2])
w = torch.tensor([3.3, 4.3])

x.dot(w) + b  # much simpler and more compact than the plain Python for loop


tensor(13.4200)

## Benchmarking a Vanilla Python Loop against the PyTorch Dot Product


In [None]:
def plain_python(x, w, b):
  out = b
  for xi, wi in zip(x,w):
    out += xi * wi
  return out

import random

random.seed(123)

b = 0
x = [random.random() for _ in range(1000)]
w = [random.random() for _ in range(1000)]


%timeit plain_python(x, w, b)

84.7 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [None]:
# In PyTorch
def pytorch_dot(x, w, b):
  return x.dot(w) + b # dot product using pytorch

# First convert python list to tensor
b = torch.tensor(b)
x = torch.tensor(x)
w = torch.tensor(w)

%timeit pytorch_dot(x, w, b)

11.7 µs ± 2.08 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)


Comparing the runtimes above (84.7 µs for plain Python vs. 11.7 µs for the PyTorch tensor version), PyTorch gives roughly a 7X speedup, which is substantial.

So that's a good reason to replace a Python for loop with the PyTorch dot product.

So far we computed the weighted sum for a single vector (one training example). In practice, deep learning usually works with large datasets, and for that we use matrix multiplication.
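`%timeit` is an IPython/Jupyter magic, so it does not work in a plain Python script. As a rough sketch of the same comparison outside a notebook, the standard library `timeit` module can be used instead. The repetition count of 1000 is an arbitrary choice, and the timings depend heavily on the machine, so no expected numbers are shown.

```python
import random
import timeit

import torch

random.seed(123)
x_list = [random.random() for _ in range(1000)]
w_list = [random.random() for _ in range(1000)]


def plain_python(x, w, b):
    # Weighted sum with an explicit Python loop.
    out = b
    for xi, wi in zip(x, w):
        out += xi * wi
    return out


def pytorch_dot(x, w, b):
    # Weighted sum with a single vectorized dot product.
    return x.dot(w) + b


x_t, w_t, b_t = torch.tensor(x_list), torch.tensor(w_list), torch.tensor(0.)

n = 1000  # number of repetitions (arbitrary choice)
t_py = timeit.timeit(lambda: plain_python(x_list, w_list, 0.), number=n)
t_pt = timeit.timeit(lambda: pytorch_dot(x_t, w_t, b_t), number=n)

print(f"plain Python: {t_py / n * 1e6:.1f} µs per call")
print(f"PyTorch dot:  {t_pt / n * 1e6:.1f} µs per call")
```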

# Dealing with multiple training examples via matrix multiplication

## Matrix multiplication of a matrix and a vector
It can be seen as multiple input examples for a single neuron/node.

- Let's apply the dot product to multiple training examples using matrix multiplication.
- Previously we computed the weighted sum for only 1 training example.
- Now we extend that to multiple training examples.


$$ z^{[i]} = b + \sum_{j=1}^m x_j^{[i]} w_j $$
`i` refers to the training example index. The weights $w_j$ carry no example index because the same weights are used for every example.

- One way to look at it is as computing the weighted sum individually for each training example.
- We compute the weighted sum `n` times if the dataset contains `n` training examples.

$$ z^{[1]} = b + \sum_{j=1}^m x_j^{[1]} w_j $$
$$ z^{[2]} = b + \sum_{j=1}^m x_j^{[2]} w_j $$
$$ \dots $$
$$ z^{[n]} = b + \sum_{j=1}^m x_j^{[n]} w_j $$

**Look at more compact form**
$$ z^{[1]} = b + X^{[1]}W $$
$$ z^{[2]} = b + X^{[2]}W $$
$$...$$
$$ z^{[n]} = b + X^{[n]}W $$

- Zoom in on the `X` terms: each one is a vector.
- It is the feature vector corresponding to one training example.
- We can take these `vectors` and stack them into a `matrix`.

$$ X = \begin{bmatrix} x^{[1]} \\ x^{[2]} \\ \vdots \\ x^{[n]} \end{bmatrix} $$

- Each row here is a single training example.
- Expanding it, each column represents one feature.

$$ X = \begin{bmatrix} x^{[1]} \\ x^{[2]} \\ \vdots \\ x^{[n]} \end{bmatrix} = \begin{bmatrix} x_1^{[1]} & x_2^{[1]} & \cdots & x_m^{[1]} \\ x_1^{[2]} & x_2^{[2]} & \cdots & x_m^{[2]} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{[n]} & x_2^{[n]} & \cdots & x_m^{[n]} \end{bmatrix} $$

- Here **each feature column still corresponds to a single weight**.

$$ X = \begin{bmatrix} x_1^{[1]} & x_2^{[1]} & \cdots & x_m^{[1]} \\ x_1^{[2]} & x_2^{[2]} & \cdots & x_m^{[2]} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{[n]} & x_2^{[n]} & \cdots & x_m^{[n]} \end{bmatrix} \quad W = \begin{bmatrix} w_1 \\ w_2 \\ \vdots \\ w_m \end{bmatrix} $$

### My understanding
- The number of features is the number of inputs to the network.
- The weight vector has the same size, because a single neuron/perceptron has one weight per input.
- In the `X` matrix above, each row is a training example with `m` inputs, and the weight vector has `m` weights, one per input.
- 📍NOTE: `W` is a weight vector here, which means we have just a single neuron or node.


---
- Here we still have a weight vector and not a weight matrix, because we use the same weights for every training example.
- In this case `w1` is used for the whole 1st feature column `x1`.
- `w2` is used for the 2nd feature column `x2`, and so forth.
- So for a dataset with `m` features we still have a weight vector consisting of `m` weights.
- 💡Note: make sure the input matrix `X` has as many columns as the weight vector has rows. We can then compute the matrix multiplication as a series of dot products.
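The shape rule in the note above can be checked in code. This sketch, using small made-up numbers, verifies that the number of columns of `X` matches the length of `w`, and that one matrix-vector product gives the same result as computing one dot product per training example. The names `row_by_row` and `all_at_once` are illustrative.

```python
import torch

b = torch.tensor(0.)
X = torch.tensor([[1.2, 2.2],
                  [4.4, 5.5]])   # 2 training examples, 2 features each
w = torch.tensor([3.3, 4.3])     # one weight per feature

# The inner dimensions must match: columns of X == entries of w.
assert X.shape[1] == w.shape[0]

# One dot product per training example (row of X) ...
row_by_row = torch.stack([x.dot(w) + b for x in X])

# ... versus a single matrix-vector multiplication.
all_at_once = X.matmul(w) + b

print(row_by_row)
print(all_at_once)
```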





## Benchmarking



In [None]:
# Python simple for loop
#----------

b = 0.
X = [[1.2, 2.2],
     [4.4, 5.5]] # 2 training examples with 2 features each
w = [3.3, 4.3] # weight vector

outputs = []
for x in X: # computing the weighted sum of each row (training example) with the weight vector
  output = b
  for xi, wi in zip(x, w):
    output += xi*wi
  outputs.append(output)

outputs # final output

[13.42, 38.17]

In [None]:
# PyTorch matmul
#-----------------

# Converting python lists and values to tensor
b = torch.tensor(b)
X = torch.tensor(X)
w = torch.tensor(w)

# Applying matrix multiplication
X.matmul(w) + b # will return tensor result of matrix multiplication

tensor([13.4200, 38.1700])

As we can see, the results are exactly the same.

**Benchmarking with simple Python**

In [None]:
random.seed(123) # fixing the random seed so the values are reproducible

# initializing input matrix X and weight vector w
b = 0.
X = [[random.random() for _ in range(1000)] for _ in range(500)] # 500 rows and 1000 columns
w = [random.random() for _ in range(1000)]

# defining a function that computes the weighted sum of each row of the input matrix with the weight vector
def plain_python(X, w, b):
  outputs = []
  for x in X:
    output = b
    for xi, wi in zip(x, w):
      output += xi * wi
    outputs.append(output)

  return outputs

%timeit plain_python(X, w, b) # measures how long the matrix-vector product takes in plain Python

43.5 ms ± 1.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


**Benchmarking with PyTorch matmul**


In [None]:
# Converting
b = torch.tensor(b)
X = torch.tensor(X)
w = torch.tensor(w)

def pytorch_matmul(X, w, b):
  return X.matmul(w) + b

%timeit pytorch_matmul(X, w, b)

93.8 µs ± 10.2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


🟩 As we can see, PyTorch is roughly 460X faster here (43.5 ms vs. 93.8 µs), which is huge🔥

🟩 It's now clear why we prefer PyTorch over plain Python when implementing neural nets.

> ✍️ So that was matrix multiplication between a matrix and a vector.

# Multiplying two matrices
---
- This becomes more relevant for multilayer perceptrons.
- We now have a weight matrix in which each row contains the weights of one neuron in the layer.


$$ X = \begin{bmatrix} x_1^{[1]} & x_2^{[1]} & \cdots & x_m^{[1]} \\ x_1^{[2]} & x_2^{[2]} & \cdots & x_m^{[2]} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{[n]} & x_2^{[n]} & \cdots & x_m^{[n]} \end{bmatrix} \quad W = \begin{bmatrix} w_1^{(1)} & w_2^{(1)} & \cdots & w_m^{(1)} \\ w_1^{(2)} & w_2^{(2)} & \cdots & w_m^{(2)} \\ \vdots & \vdots & \ddots & \vdots \\ w_1^{(h)} & w_2^{(h)} & \cdots & w_m^{(h)} \end{bmatrix} $$

`n` training examples.

`h` weight vectors. We can think of each row of `W` as the weights of one node, with one weight per input feature.

- When training deep neural nets we often use matrix multiplication, because we now have a weight matrix.

- The input is the same `X` as before: the training dataset.
- However, we now have a weight matrix `W` of shape `h × m`. Each row is the weight vector of one neuron, and each column corresponds to one input feature.
- Each weight vector produces a different output.
- We can think of it as a perceptron's weighted sum, but with multiple outputs.
- In essence, the weight matrix consists of `h` weight vectors, and each weight vector corresponds to 1 output.
- In order to multiply the input matrix `X` with the weight matrix `W`, we have to **make sure that the dimensions match** and that **we are computing the right thing**.
- 💡 We want the dot product of each training example with each weight vector, so we transpose the weight matrix. After the transpose, the rows correspond to features and each column produces one output.

$$ X = \begin{bmatrix} x_1^{[1]} & x_2^{[1]} & \cdots & x_m^{[1]} \\ x_1^{[2]} & x_2^{[2]} & \cdots & x_m^{[2]} \\ \vdots & \vdots & \ddots & \vdots \\ x_1^{[n]} & x_2^{[n]} & \cdots & x_m^{[n]} \end{bmatrix} \quad W^{T} = \begin{bmatrix} w_1^{(1)} & w_1^{(2)} & \cdots & w_1^{(h)} \\ w_2^{(1)} & w_2^{(2)} & \cdots & w_2^{(h)} \\ \vdots & \vdots & \ddots & \vdots \\ w_m^{(1)} & w_m^{(2)} & \cdots & w_m^{(h)} \end{bmatrix} $$

- Now each column of the right-hand matrix, `W` transpose, corresponds to one weight vector. Just like before, we can compute the dot product between each row of the dataset `X` and each column of `W^T`.
- To obtain the first output value, take the dot product between the 1st row of the input matrix and the 1st column of `W^T`. Doing this for every row of the input gives the 1st column of the result.
- Repeating this for every column of `W^T` gives the full result.


In [None]:
X = torch.rand(100, 10)
W = torch.rand(50, 10)

R = torch.matmul(X, W.T) # matrix-matrix multiplication using PyTorch


In [None]:
X.shape

torch.Size([100, 10])

In [None]:
W.shape

torch.Size([50, 10])

In [None]:
R.shape

torch.Size([100, 50])
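To check the claim that every entry of the result is one dot product, the sketch below uses smaller random matrices (4 examples, 3 features, 2 neurons, all sizes chosen only for illustration) and compares a single entry `R[i, j]` with the dot product of training example `i` and weight vector `j`.

```python
import torch

torch.manual_seed(0)      # fixed seed so the random values are reproducible
X = torch.rand(4, 3)      # 4 training examples, 3 features
W = torch.rand(2, 3)      # 2 neurons, each with 3 weights

R = X.matmul(W.T)         # (4, 3) x (3, 2) -> (4, 2)
print(R.shape)

# Entry (i, j) of the result is the dot product of
# training example i with the weight vector of neuron j.
i, j = 1, 0
single = X[i].dot(W[j])
print(R[i, j], single)
```

The result has one row per training example and one column per neuron, which matches the `[100, 50]` shape obtained from the `[100, 10]` and `[50, 10]` inputs.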

## Benchmarking


In [None]:
def pytorch_matmul(X, W):
  return torch.matmul(X, W.T)

%timeit pytorch_matmul(X,W)

10.4 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


# Broadcasting

Let's see how we can work with tensors of different shapes.
- Previously we saw how to use concepts from linear algebra to make our code more efficient.
- Now we look at `broadcasting`: doing computations on tensors with unequal shapes, which saves us some typing work.

For example:

`v1 = [1.1, 2.1, 3.1, 4.1]`, `n = 5.6`

We want to compute
`[1.1, 2.1, 3.1, 4.1] + 5.6`

- Here we have a vector `v1` and we want to add the value of `n` to it.
- Strictly speaking, we can't add a single number to a vector.
- However, tensor and array libraries have a concept called `broadcasting`, where they create the missing dimensions implicitly.

PyTorch infers that we actually want to add a vector filled with 5.6 to the vector `v1`, like this:

`[1.1, 2.1, 3.1, 4.1] + [5.6, 5.6, 5.6, 5.6]`

So when we add a number to a vector, the output is a vector in which that number has been added to each element of the input vector.

In [None]:
a = torch.tensor([1.1, 2.1, 3.1, 4.1])
b = torch.tensor([5.6])

a + b

tensor([6.7000, 7.7000, 8.7000, 9.7000])

**Similarly we can add `vector` to a `matrix`.**

In [None]:
A = torch.tensor([[1.1, 2.1, 3.1, 4.1],
                 [1.2, 2.2, 3.2, 4.2]])

b = torch.tensor([5.4, 5.5, 5.6, 5.7])

A + b # addition of matrix A and vector b

tensor([[6.5000, 7.6000, 8.7000, 9.8000],
        [6.6000, 7.7000, 8.8000, 9.9000]])
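As a sketch of what broadcasting does behind the scenes, the example below first expands the vector to the matrix shape explicitly with `expand_as` and confirms it matches the implicit version. It then broadcasts along the other direction by adding a column of shape `(2, 1)`, which adds one value per row. The names `explicit`, `implicit`, `col`, and `C` are illustrative.

```python
import torch

A = torch.tensor([[1.1, 2.1, 3.1, 4.1],
                  [1.2, 2.2, 3.2, 4.2]])     # shape (2, 4)
b = torch.tensor([5.4, 5.5, 5.6, 5.7])       # shape (4,)

# Explicitly repeat b for every row of A, then add.
explicit = A + b.expand_as(A)
# Let broadcasting do the same thing implicitly.
implicit = A + b
print(torch.equal(explicit, implicit))

# A column of shape (2, 1) is broadcast across the 4 columns instead:
# 10 is added to every element of row 0, 20 to every element of row 1.
col = torch.tensor([[10.], [20.]])
C = A + col
print(C.shape)
print(C)
```

Broadcasting only works when the shapes are compatible, meaning that, compared from the last dimension backwards, each pair of sizes is either equal or one of them is 1.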

### Mean operation


In [2]:
a = torch.tensor([[1., 2., 3.], [3., 2., 5.]])
a

tensor([[1., 2., 3.],
        [3., 2., 5.]])

In [3]:
a.mean(dim=0) # or torch.mean(a, dim=0). dim=0 averages over the rows, giving the mean of each column

tensor([2., 2., 4.])

In [4]:
a.mean(dim=1) # or torch.mean(a, dim=1). dim=1 averages over the columns, giving the mean of each row

tensor([2.0000, 3.3333])
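The mean operation combines naturally with broadcasting. As a minimal sketch, the code below subtracts the per-column mean from every row (centering each feature), and uses `keepdim=True` so that the per-row mean keeps a shape of `(2, 1)` and can be broadcast across the columns. The variable names are illustrative.

```python
import torch

a = torch.tensor([[1., 2., 3.],
                  [3., 2., 5.]])

# Per-column mean has shape (3,), so it broadcasts across the 2 rows.
col_means = a.mean(dim=0)
centered = a - col_means
print(centered)

# keepdim=True keeps the reduced dimension with size 1: shape (2, 1).
row_means = a.mean(dim=1, keepdim=True)
row_centered = a - row_means  # broadcasts across the 3 columns
print(row_means.shape)
print(row_centered)
```

Without `keepdim=True`, the row means would have shape `(2,)`, which cannot be broadcast against a `(2, 3)` matrix.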