In [3]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
import time

PyTorch is a Python package that enables users to train state-of-the-art machine learning and deep learning models. To use PyTorch efficiently, one needs a firm understanding of its basic building block: the *torch.Tensor* object. In many ways, it is similar to a *NumPy array*.

# Numpy vs. Torch

NumPy arrays and PyTorch tensors can be created in much the same way:

In [11]:
np.arange(48)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47])

In [12]:
torch.arange(48)

tensor([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
        18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35,
        36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47])

In [4]:
n = np.linspace(0, 1, 5)
t = torch.linspace(0, 1, 5)

In [5]:
n

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [6]:
t

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])

They can be reshaped in a similar way:

In [8]:
n = np.arange(48).reshape(3,4,4) # 3*4*4=48
t = torch.arange(48).reshape(3,4,4)

In [9]:
n

array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11],
        [12, 13, 14, 15]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23],
        [24, 25, 26, 27],
        [28, 29, 30, 31]],

       [[32, 33, 34, 35],
        [36, 37, 38, 39],
        [40, 41, 42, 43],
        [44, 45, 46, 47]]])

In [10]:
t

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]],

        [[16, 17, 18, 19],
         [20, 21, 22, 23],
         [24, 25, 26, 27],
         [28, 29, 30, 31]],

        [[32, 33, 34, 35],
         [36, 37, 38, 39],
         [40, 41, 42, 43],
         [44, 45, 46, 47]]])
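Two related conveniences, sketched below as a small example: both libraries accept `-1` in `reshape` to infer one dimension from the total size, and `torch.from_numpy` / `Tensor.numpy()` convert between the two types. For CPU data, `torch.from_numpy` shares memory with the original array rather than copying it, so in this sketch a write to the array is visible through the tensor.

```python
import numpy as np
import torch

# -1 asks the library to infer that dimension from the total size (48 / (4*4) = 3)
n = np.arange(48).reshape(-1, 4, 4)
t = torch.arange(48).reshape(-1, 4, 4)
print(n.shape)  # (3, 4, 4)
print(t.shape)  # torch.Size([3, 4, 4])

# torch.from_numpy shares memory with the NumPy array (no copy is made)
shared = torch.from_numpy(n)
n[0, 0, 0] = 100
print(shared[0, 0, 0].item())  # 100

# .numpy() converts a CPU tensor back into a NumPy array
print(np.array_equal(t.numpy(), np.arange(48).reshape(3, 4, 4)))  # True
```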

Most importantly, they follow the same broadcasting rules. To use PyTorch (and NumPy, for that matter) efficiently, one needs a strong grasp of these rules.

# General Broadcasting Rules

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when: <br>
1. they are equal, or
2. one of them is 1. <br>

**Example** : The following are compatible <br>
Shape 1: (1,6,4,1,7,2) # 6D array<br>
Shape 2: (5,6,1,3,1,2) # 6D array <br>

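because, comparing from the right: 2 = 2; among 7 and 1, one is 1; among 1 and 3, one is 1; among 4 and 1, one is 1; 6 = 6; among 1 and 5, one is 1. In each pair the dimension that is not 1 is kept, so the result has shape (5,6,4,3,7,2).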
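The rule can be checked in code. Below is a minimal, hand-written sketch of the shape rule (`broadcast_shape` is a hypothetical helper for illustration, not a library function), compared against the built-in `np.broadcast_shapes` (NumPy 1.20+) and `torch.broadcast_shapes`. When the two shapes have different lengths, the shorter one is padded with 1s on the left, which is the standard convention in both libraries.

```python
import numpy as np
import torch

def broadcast_shape(shape1, shape2):
    """Hypothetical helper: apply the broadcasting rule to two shapes by hand."""
    # Pad the shorter shape with 1s on the left so both have the same length
    ndim = max(len(shape1), len(shape2))
    s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
    s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d2 == 1:   # equal, or the second one is 1
            result.append(d1)
        elif d1 == 1:             # the first one is 1
            result.append(d2)
        else:
            raise ValueError(f"incompatible dimensions {d1} and {d2}")
    return tuple(result)

shape1 = (1, 6, 4, 1, 7, 2)
shape2 = (5, 6, 1, 3, 1, 2)
print(broadcast_shape(shape1, shape2))         # (5, 6, 4, 3, 7, 2)
print(np.broadcast_shapes(shape1, shape2))     # (5, 6, 4, 3, 7, 2)
print(torch.broadcast_shapes(shape1, shape2))  # torch.Size([5, 6, 4, 3, 7, 2])
```

For example, `broadcast_shape((6, 5), (6,))` raises an error: after padding, `(6,)` becomes `(1, 6)`, and the trailing 5 and 6 do not match.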


In [13]:
a = np.array([1,2])
b = np.array([3,4])

In [14]:
a * b

array([3, 8])

In [17]:
a = np.ones((6, 5))
b = np.arange(5).reshape((1,5))

In [18]:
a

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [19]:
b

array([[0, 1, 2, 3, 4]])

In [21]:
a.shape

(6, 5)

In [22]:
b.shape

(1, 5)

Here, 5 = 5, and among 6 and 1, one of them is 1, so a and b are compatible for addition or multiplication.

In [23]:
a + b

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

b is (conceptually) repeated along the first axis to match the 6 rows of a, and then added or multiplied element-wise. No actual copy of b is made.

In [24]:
a * b

array([[0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.],
       [0., 1., 2., 3., 4.]])
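To make the "repetition" concrete, the sketch below compares broadcasting against an explicit copy made with `np.tile`. `np.broadcast_to` returns the broadcast view explicitly: it has the full `(6, 5)` shape, but a stride of 0 along the first axis, so all six rows point at the same memory as `b`.

```python
import numpy as np

a = np.ones((6, 5))
b = np.arange(5).reshape((1, 5))

# Explicitly repeating b 6 times along axis 0 gives the same answer...
b_tiled = np.tile(b, (6, 1))               # shape (6, 5), a real copy
print(np.array_equal(a + b, a + b_tiled))  # True

# ...but the broadcast view needs no copy: stepping to the next row moves 0 bytes
b_view = np.broadcast_to(b, (6, 5))        # read-only view of b
print(b_view.shape, b_view.strides[0])     # (6, 5) 0
print(np.shares_memory(b_view, b))         # True
```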

In [25]:
a = torch.ones((6, 5))
b = torch.arange(5).reshape((1,5))

In [26]:
a.shape

torch.Size([6, 5])

In [27]:
b.shape

torch.Size([1, 5])

In [28]:
a + b

tensor([[1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.],
        [1., 2., 3., 4., 5.]])

In [29]:
a * b

tensor([[0., 1., 2., 3., 4.],
        [0., 1., 2., 3., 4.],
        [0., 1., 2., 3., 4.],
        [0., 1., 2., 3., 4.],
        [0., 1., 2., 3., 4.],
        [0., 1., 2., 3., 4.]])

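The arrays/tensors don't need to have the same number of dimensions. If one of them has fewer dimensions than the other, its shape is padded with 1s on the left until both have the same number of dimensions, and then the rules above apply.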

**Example**: Scaling each of the color channels of an image by a different amount: <br>
Image (3d array): 256 x 256 x 3 <br>
Scale (1d array): &nbsp;    &emsp;  &emsp;  &emsp;  &emsp; 3 <br>
Result (3d array): 256 x 256 x 3

In [34]:
Image = torch.randn((256,256,3))
Scale = torch.tensor([0.5, 1.5, 1])

In [35]:
Result = Image * Scale

In [36]:
Result

tensor([[[ 5.1454e-02,  2.6802e-01,  4.1596e-01],
         [ 8.4332e-01, -1.8419e-01, -1.4681e-01],
         [ 1.1405e-01, -1.6455e+00,  1.4219e-01],
         ...,
         [-2.6965e-03, -1.4741e+00, -1.2482e+00],
         [ 1.1183e+00,  3.6781e-02, -2.3700e-02],
         [-1.0798e-01,  2.3609e+00,  1.7246e+00]],

        [[-2.3946e-01, -1.4655e+00,  1.4798e+00],
         [ 1.0470e+00,  3.8732e-01,  9.2530e-01],
         [ 1.4290e-01,  1.6302e-01, -3.4421e-01],
         ...,
         [-2.2645e-01,  1.9465e+00, -1.5431e+00],
         [ 2.0174e-01,  6.5452e-01, -7.3650e-01],
         [ 8.1948e-02, -3.9358e-01,  1.2933e+00]],

        [[ 2.1527e-01,  5.7599e-01, -1.1575e-01],
         [ 9.4873e-02, -1.5287e+00,  7.6560e-02],
         [-5.4928e-01, -5.1547e-01, -8.7000e-01],
         ...,
         [ 3.3562e-01,  8.3634e-01, -1.8612e+00],
         [-2.4708e-01,  4.2533e-01,  4.5570e-01],
         [ 2.8943e-01,  6.7803e-01, -5.6016e-01]],

        ...,

        [[-6.2291e-01,  3.0958e-01, -9
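A quick sanity check of the example above, written as a short sketch: every channel `c` of the result should equal channel `c` of the image times `Scale[c]`. The last lines assume a channels-first layout `(3, 256, 256)`, the convention used by e.g. torchvision; there a shape-`(3,)` scale would be compared against the width (256) and fail to broadcast, so it has to be reshaped to `(3, 1, 1)`.

```python
import torch

Image = torch.randn((256, 256, 3))
Scale = torch.tensor([0.5, 1.5, 1])
Result = Image * Scale          # Scale is treated as shape (1, 1, 3)

print(Result.shape)  # torch.Size([256, 256, 3])
# Channel c of every pixel was multiplied by Scale[c]
for c in range(3):
    print(torch.allclose(Result[..., c], Image[..., c] * Scale[c]))  # True

# Channels-first layout: the scale needs shape (3, 1, 1) to line up with dim 0
image_chw = Image.permute(2, 0, 1)                 # shape (3, 256, 256)
result_chw = image_chw * Scale.reshape(3, 1, 1)
print(torch.allclose(result_chw.permute(1, 2, 0), Result))  # True
```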

**Example**: One has an array of 2 images and wants to scale the color channels of each image by a slightly different amount: <br>
Images (4d array): 2 x 256 x 256 x 3 <br>
Scales (4d array): 2 x 1 x 1 x 3 <br>
Results (4d array): 2 x 256 x 256 x 3

In [37]:
Images = torch.randn((2, 256, 256, 3))
Scales =  torch.tensor([0.5, 1.5, 1, 1.5, 1, 0.5]).reshape((2,1,1,3))

In [38]:
Scales

tensor([[[[0.5000, 1.5000, 1.0000]]],


        [[[1.5000, 1.0000, 0.5000]]]])

In [39]:
Results = Images * Scales

In [40]:
Results

tensor([[[[ 6.9224e-01,  2.1387e+00,  5.6479e-01],
          [ 1.0987e+00,  2.5955e+00, -1.7519e+00],
          [-4.2801e-02, -2.4423e+00,  6.6390e-01],
          ...,
          [ 1.1118e+00, -1.6474e+00,  5.6583e-01],
          [ 8.6046e-01, -9.1463e-01, -7.8563e-02],
          [-5.9368e-02,  1.0753e+00,  2.1903e-02]],

         [[-4.9166e-01,  3.0684e+00, -9.7192e-02],
          [-4.2931e-01,  1.1056e+00, -8.6740e-01],
          [-4.0219e-01, -1.1884e+00,  6.3587e-01],
          ...,
          [ 3.4750e-01, -4.9126e-01,  2.7277e+00],
          [ 2.8576e-01,  1.1128e+00,  2.2119e-01],
          [-1.7568e-01,  2.4408e-01,  6.9873e-01]],

         [[-2.8574e-01,  1.3898e+00, -1.5342e+00],
          [-1.2999e-01, -1.1167e+00, -6.7655e-01],
          [-3.4302e-01,  5.1024e-01,  6.4373e-01],
          ...,
          [-5.2884e-01,  2.3116e-01, -4.7809e-01],
          [ 6.1831e-01,  3.3039e-01, -5.3643e-01],
          [ 3.2004e-01, -2.3021e+00,  4.7926e-01]],

         ...,

         [[ 3.41
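The same `(2, 1, 1, 3)` shape can also be built by indexing with `None` (which inserts a size-1 axis), as in the sketch below. The two size-1 axes matter: without them, a `(2, 3)` scale is right-aligned against the last two dimensions `(256, 3)` of the images, and the shapes no longer broadcast.

```python
import torch

Images = torch.randn((2, 256, 256, 3))
per_image = torch.tensor([[0.5, 1.5, 1.0],
                          [1.5, 1.0, 0.5]])   # shape (2, 3): one row of scales per image
Scales = per_image[:, None, None, :]         # shape (2, 1, 1, 3)
Results = Images * Scales

print(Scales.shape)                                          # torch.Size([2, 1, 1, 3])
print(torch.allclose(Results[0], Images[0] * per_image[0]))  # True
print(torch.allclose(Results[1], Images[1] * per_image[1]))  # True

try:
    Images * per_image   # (2, 256, 256, 3) vs (2, 3): 256 and 2 don't match
except RuntimeError as err:
    print("shape error:", err)
```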

# Operations Across Dimensions

This is fundamental to PyTorch. Simple operations can of course be applied to 1-dimensional tensors:

In [42]:
t = torch.tensor([0.5, 1, 3, 4])
torch.mean(t), torch.std(t), torch.max(t), torch.min(t)

(tensor(2.1250), tensor(1.6520), tensor(4.), tensor(0.5000))
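One difference hides in this output: `torch.std` divides by $N-1$ by default (the sample standard deviation), while `np.std` divides by $N$ unless `ddof=1` is passed. The sketch below reproduces both values for the same data. (In recent PyTorch versions this default is controlled by the `correction` keyword; older versions use `unbiased`.)

```python
import numpy as np
import torch

values = [0.5, 1, 3, 4]
t = torch.tensor(values)
n = np.array(values)

# torch.std divides by N - 1 by default ...
print(torch.std(t))          # tensor(1.6520)
# ... while np.std divides by N unless ddof=1 is passed
print(np.std(n))             # about 1.4307
print(np.std(n, ddof=1))     # about 1.6520
```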

But suppose we have a 2d tensor, for example, and want to compute the mean value of each column: <br>
* Note: taking the mean of each column means taking the mean **across** the rows (which are the first dimension, `axis=0`).

In [43]:
t = torch.arange(20, dtype=float).reshape(5,4)

In [44]:
t

tensor([[ 0.,  1.,  2.,  3.],
        [ 4.,  5.,  6.,  7.],
        [ 8.,  9., 10., 11.],
        [12., 13., 14., 15.],
        [16., 17., 18., 19.]], dtype=torch.float64)

In [50]:
torch.mean(t, axis = 0)

tensor([ 8.,  9., 10., 11.], dtype=torch.float64)
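Two follow-ups, sketched below: `dim=1` reduces across the columns instead (one mean per row), and `keepdim=True` keeps the reduced axis with size 1 so the result can be broadcast straight back against the original tensor, e.g. to center each row. Without `keepdim`, the row means have shape `(5,)`, which does not broadcast against `(5, 4)`. (`dim` is PyTorch's own name for the argument; `axis` works as a NumPy-style alias, as the cell above shows.)

```python
import torch

t = torch.arange(20, dtype=torch.float64).reshape(5, 4)

col_means = torch.mean(t, dim=0)   # shape (4,): one mean per column
row_means = torch.mean(t, dim=1)   # shape (5,): one mean per row
print(col_means)                   # 8, 9, 10, 11
print(row_means)                   # 1.5, 5.5, 9.5, 13.5, 17.5

# keepdim=True keeps the reduced axis as size 1, so the result broadcasts back
row_means_kd = torch.mean(t, dim=1, keepdim=True)  # shape (5, 1)
centered = t - row_means_kd                        # (5, 4) - (5, 1) -> (5, 4)
print(centered[0])                                 # -1.5, -0.5, 0.5, 1.5
```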

This can be done for higher-dimensional tensors as well:

In [51]:
t = torch.randn(5, 256, 256, 3)

Take the mean across the batch (size 5)

In [53]:
torch.mean(t, axis=0)

tensor([[[ 0.5862,  0.2114,  0.8305],
         [ 0.1073,  0.1592,  0.1144],
         [ 0.2857, -0.0277,  0.5187],
         ...,
         [-0.1007, -0.4852, -0.3457],
         [-0.2784,  0.5910, -0.2158],
         [-0.4625, -0.5209, -0.3702]],

        [[-0.6171,  0.4625, -0.1960],
         [ 1.2811, -0.3595,  0.0391],
         [ 0.0184, -0.5875, -0.2994],
         ...,
         [-0.6224,  0.1476,  0.5807],
         [ 0.6610, -0.3446,  0.4007],
         [ 0.4598,  0.0181, -0.2860]],

        [[-0.9435, -0.2387, -0.1080],
         [-0.4153, -0.3182, -0.7225],
         [-0.1607,  0.1500, -0.5762],
         ...,
         [ 0.2325,  0.5990, -0.3596],
         [-0.3315, -0.9168, -1.4226],
         [-0.4230,  0.4585,  0.3777]],

        ...,

        [[ 0.3885, -0.0937,  0.5384],
         [-0.0184, -0.3899,  0.4461],
         [-0.2193,  0.1354, -0.0564],
         ...,
         [ 0.4098, -0.4612,  0.0108],
         [ 0.3455, -0.0405, -0.0165],
         [ 0.2688,  0.3338, -0.5377]],

        [[

In [54]:
torch.mean(t, axis=0).shape

torch.Size([256, 256, 3])

Take the mean across the color channels

In [57]:
torch.mean(t, axis=-1)

tensor([[[ 3.8287e-01,  2.5946e-01, -6.8375e-01,  ..., -1.0806e-01,
           4.4483e-01, -2.0371e-02],
         [-7.5231e-01, -3.9452e-01, -9.9473e-02,  ..., -7.2902e-01,
           1.5516e+00,  3.7451e-01],
         [-2.2500e-01,  1.4093e-01, -2.6268e-01,  ...,  2.6568e-01,
          -9.8937e-01,  8.5870e-01],
         ...,
         [ 4.6504e-01,  9.4296e-01, -3.6420e-01,  ..., -1.0970e-01,
          -1.4246e-02, -4.1372e-01],
         [ 4.1170e-01,  1.0829e+00,  3.8034e-01,  ..., -4.1560e-02,
           1.4081e+00, -3.2450e-01],
         [-2.0361e+00,  1.4021e+00,  5.6690e-01,  ..., -1.2001e+00,
           6.1863e-01,  1.1562e-01]],

        [[ 1.1500e+00,  9.0068e-02,  1.4117e+00,  ..., -1.2572e-02,
           9.7607e-01, -1.0760e+00],
         [-4.5693e-01, -1.2365e-01,  1.9171e-01,  ...,  5.7745e-01,
          -1.2848e+00, -8.4582e-02],
         [ 3.5998e-01, -8.5583e-01, -6.8098e-01,  ..., -1.8599e-01,
          -6.1946e-01,  7.0077e-01],
         ...,
         [ 5.0455e-01,  1

In [58]:
torch.mean(t, axis=-1).shape

torch.Size([5, 256, 256])
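Reductions can also run over several axes at once by passing a tuple of dimensions. The sketch below (assuming the `(batch, height, width, channel)` layout used above) computes one mean per image and channel, and one mean per channel over the whole batch.

```python
import torch

t = torch.randn(5, 256, 256, 3)   # (batch, height, width, channel)

# Mean over height and width at once: one value per image and per channel
per_image_channel = torch.mean(t, dim=(1, 2))
print(per_image_channel.shape)    # torch.Size([5, 3])

# Mean over everything except the channel axis: one value per channel
per_channel = torch.mean(t, dim=(0, 1, 2))
print(per_channel.shape)          # torch.Size([3])

# Reducing in two steps gives the same numbers, up to floating-point rounding
print(torch.allclose(per_image_channel.mean(dim=0), per_channel, atol=1e-5))  # True
```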

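Take only the maximum color-channel value at each pixel (and get the corresponding indices): <br>
* The same operation is used all the time in image segmentation models, where the last dimension holds per-class scores instead of colors and the index of the maximum gives the predicted class of each pixel (e.g. deciding which pixels correspond to, say, a car).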

In [62]:
values, indices = torch.max(t, axis=-1)

In [63]:
values, indices

(tensor([[[ 1.0340,  1.1700, -0.0560,  ...,  0.3090,  1.5074,  0.7787],
          [ 0.3300,  0.6722,  0.7266,  ...,  0.2052,  2.8091,  0.9162],
          [ 1.2477,  1.5044,  0.6295,  ...,  0.7073, -0.6769,  1.4490],
          ...,
          [ 1.2723,  1.4845,  0.2509,  ...,  0.4104,  0.8282,  0.2327],
          [ 1.9201,  2.3256,  2.3640,  ...,  0.2043,  2.0907,  0.3436],
          [-0.8064,  1.7442,  1.3520,  ..., -0.2068,  1.3393,  0.2322]],
 
         [[ 1.8308,  0.3280,  2.2460,  ...,  0.2707,  1.8919, -0.5982],
          [-0.1824,  0.0938,  0.3293,  ...,  0.8796, -0.0886,  0.7682],
          [ 2.0298, -0.2823, -0.2881,  ...,  0.7440, -0.2202,  1.5713],
          ...,
          [ 0.8017,  1.0445, -0.1980,  ...,  1.0038, -0.2186,  0.2398],
          [ 0.2863,  1.0484,  1.4741,  ...,  1.2144,  0.1738,  1.1143],
          [ 0.8300,  0.7556,  0.4555,  ...,  0.0966, -0.0131,  0.4326]],
 
         [[ 1.1217,  0.9255, -0.1270,  ...,  0.0340,  1.1446,  1.0615],
          [ 0.6358,  2.7055,

In [64]:
values.shape

torch.Size([5, 256, 256])
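The sketch below checks that the `indices` returned by `torch.max` match `torch.argmax`, and uses `torch.gather` to pick the maximal values back out of `t`. The last lines mimic the segmentation use case with hypothetical random class scores in a channels-first `(batch, num_classes, height, width)` layout; treating class 2 as "car" is an arbitrary assumption.

```python
import torch

t = torch.randn(5, 256, 256, 3)
values, indices = torch.max(t, dim=-1)

# indices holds, for every pixel, which channel (0, 1 or 2) was largest
print(indices.shape, indices.dtype)                   # torch.Size([5, 256, 256]) torch.int64
print(torch.equal(indices, torch.argmax(t, dim=-1)))  # True

# gather picks t[b, h, w, indices[b, h, w]], recovering the max values
picked = torch.gather(t, -1, indices.unsqueeze(-1)).squeeze(-1)
print(torch.equal(picked, values))                    # True

# Hypothetical segmentation output: 4 class scores per pixel, channels-first
scores = torch.randn(1, 4, 64, 64)   # (batch, num_classes, height, width)
mask = scores.argmax(dim=1)          # (1, 64, 64): predicted class per pixel
car_pixels = mask == 2               # boolean mask; "class 2 = car" is arbitrary
```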

# So where do PyTorch and NumPy differ?

**PyTorch** really starts to differ from **NumPy** in its ability to automatically compute the gradients of operations. For example,
$$ y = \sum_{i} x_i^3 $$
has the gradient
$$ \frac{\partial y}{\partial x_i} = 3x_i^2 $$

In [67]:
x = torch.tensor([[5.,8.],[4.,6.]], requires_grad=True)
y = x.pow(3).sum()
y

tensor(917., grad_fn=<SumBackward0>)

Compute the gradient

In [69]:
y.backward() # compute the gradient


In [71]:
x.grad # the gradient dy/dx, stored (and accumulated) by backward()

tensor([[ 75., 192.],
        [ 48., 108.]])

Double-check using the analytical formula:

In [72]:
3*x**2

tensor([[ 75., 192.],
        [ 48., 108.]], grad_fn=<MulBackward0>)
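`x.grad` is an accumulator rather than a record of everything that happened to `x`: each call to `backward()` adds its gradient to whatever is already stored there. The sketch below runs the same computation twice to show this, then resets the gradient with `zero_()`.

```python
import torch

x = torch.tensor([[5., 8.], [4., 6.]], requires_grad=True)

y = x.pow(3).sum()
y.backward()
print(x.grad)           # 3 * x**2, as above

# Gradients accumulate: a second backward pass adds to x.grad
y = x.pow(3).sum()      # rebuild the graph (backward() freed the first one)
y.backward()
print(torch.equal(x.grad, 2 * 3 * x.detach() ** 2))  # True

# Reset before computing a fresh gradient (optimizers provide zero_grad() for this)
x.grad.zero_()
```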

The automatic computation of gradients is the backbone of training deep learning models. Unlike in the example above, the functions involved in practice are far too complicated to differentiate by hand, so computing gradients automatically is essential. In general, if one has
$$ y = f(\vec{x}) $$

then PyTorch can compute $\frac{\partial y}{\partial x_i}$ for each element $x_i$ of the vector $\vec{x}$. In the context of machine learning, $\vec{x}$ contains all the weights (also known as parameters) of the neural network and $y$ is the loss function of the neural network.
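To connect this to training, here is a minimal sketch of gradient descent on a one-neuron linear model, with hypothetical toy data generated from $y = 2x + 1$ (the data are called `xs` and `ys` to avoid confusion with $\vec{x}$ and $y$ above). The weights `w` and `b` play the role of $\vec{x}$ and the mean squared error plays the role of $y$; autograd supplies the gradients. The learning rate of 0.1 and the 500 steps are arbitrary choices that happen to converge for this data.

```python
import torch

# Hypothetical toy data generated from y = 2x + 1
xs = torch.linspace(-1, 1, 50)
ys = 2 * xs + 1

# The weights of a one-neuron linear model (the vector x in the formula above)
w = torch.zeros(1, requires_grad=True)
b = torch.zeros(1, requires_grad=True)
lr = 0.1

for step in range(500):
    loss = ((w * xs + b - ys) ** 2).mean()  # the loss (y in the formula above)
    loss.backward()                         # fills w.grad and b.grad
    with torch.no_grad():                   # update weights without recording the graph
        w -= lr * w.grad
        b -= lr * b.grad
    w.grad.zero_()                          # gradients accumulate, so reset them
    b.grad.zero_()

print(w.item(), b.item())  # close to 2 and 1
```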

# Additional Benefits

**In addition, large matrix multiplications can be much faster with torch tensors than with NumPy arrays when running on a GPU.** <br>
On a CPU, both libraries call optimized linear algebra (BLAS) routines, so the gap is small; in the run below, NumPy was actually slightly faster.

Using torch on a CPU (on a GPU this can be much faster):

In [93]:
A = torch.randn((1000,1000))
B = torch.randn((1000,1000))

In [94]:
t1 = time.perf_counter()
torch.matmul(A,B)
t2 = time.perf_counter()
print(t2-t1)

0.010379899998952169


Using numpy:

In [95]:
A = np.random.randn(int(1e6)).reshape((1000,1000))
B = np.random.randn(int(1e6)).reshape((1000,1000))

In [96]:
t1 = time.perf_counter()
A@B
t2 = time.perf_counter()
print(t2-t1)

0.007250399999975343
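Timing a single call with `perf_counter`, as above, is noisy, and it compares float32 (the `torch.randn` default) against float64 (the `np.random.randn` default). The sketch below is a fairer, though still rough, comparison: both libraries get the same float32 data, each timing is averaged over 20 runs after a warm-up call, and the GPU timing (run only if CUDA is available) calls `torch.cuda.synchronize()` because CUDA kernels run asynchronously. Actual numbers depend heavily on the hardware and on the BLAS library each package was built with.

```python
import timeit
import numpy as np
import torch

size = 1000
A_np = np.random.randn(size, size).astype(np.float32)  # same dtype for both libraries
B_np = np.random.randn(size, size).astype(np.float32)
A_t = torch.from_numpy(A_np)
B_t = torch.from_numpy(B_np)

# Warm up once, then average over several runs instead of timing a single call
A_np @ B_np
A_t @ B_t
np_time = timeit.timeit(lambda: A_np @ B_np, number=20) / 20
torch_time = timeit.timeit(lambda: A_t @ B_t, number=20) / 20
print(f"numpy (CPU): {np_time:.4f} s, torch (CPU): {torch_time:.4f} s")

if torch.cuda.is_available():
    A_gpu, B_gpu = A_t.cuda(), B_t.cuda()

    def gpu_matmul():
        C = A_gpu @ B_gpu
        torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
        return C

    gpu_matmul()                  # warm-up (CUDA initialization, kernel selection)
    gpu_time = timeit.timeit(gpu_matmul, number=20) / 20
    print(f"torch (GPU): {gpu_time:.4f} s")

# Both libraries compute the same product, up to float32 rounding
print(np.allclose(A_np @ B_np, (A_t @ B_t).numpy(), rtol=1e-3, atol=1e-3))  # True
```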
