In [1]:
import torch
import torch.nn as nn
import numpy as np
import matplotlib.pyplot as plt
import time

`pytorch` is a Python package for training state-of-the-art machine learning and deep learning models. To use `pytorch` efficiently, one needs a firm understanding of its basic building block: the `torch.Tensor` object. In many ways, it is similar to a numpy array.

# Numpy vs. Torch

Numpy **`array`s** and pytorch **`tensor`s** can be created in the same way:

In [3]:
n = np.linspace(0,1,5)
t = torch.linspace(0,1,5)

In [4]:
n

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [5]:
t

tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])

They can be reshaped in similar ways:

In [7]:
n = np.arange(48).reshape(3,4,4)
t = torch.arange(48).reshape(3,4,4)

In [8]:
t

tensor([[[ 0,  1,  2,  3],
         [ 4,  5,  6,  7],
         [ 8,  9, 10, 11],
         [12, 13, 14, 15]],

        [[16, 17, 18, 19],
         [20, 21, 22, 23],
         [24, 25, 26, 27],
         [28, 29, 30, 31]],

        [[32, 33, 34, 35],
         [36, 37, 38, 39],
         [40, 41, 42, 43],
         [44, 45, 46, 47]]])

Most importantly, they have the same broadcasting rules. In order to use `pytorch` (and even `numpy` for that matter) most efficiently, one needs to have a very strong grasp on the **broadcasting rules**.

# General Broadcasting Rules

When operating on two arrays, NumPy compares their shapes element-wise. It starts with the trailing (i.e. rightmost) dimensions and works its way left. Two dimensions are compatible when

1. they are equal, or
2. one of them is 1

**Example**: The following are compatible

Shape 1: (1,6,4,1,7,2)

Shape 2: (5,6,1,3,1,2)
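The compatibility rule can be written out in a few lines of plain Python. This is only an illustrative sketch: `broadcast_shape` is a hypothetical helper, not part of numpy or pytorch, and it assumes that a missing leading dimension is treated as size 1.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: return the broadcast shape, or raise ValueError."""
    result = []
    # Walk both shapes from the trailing dimension; a missing dimension counts as 1.
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"incompatible dimensions {dim_a} and {dim_b}")
    return tuple(reversed(result))

print(broadcast_shape((1, 6, 4, 1, 7, 2), (5, 6, 1, 3, 1, 2)))  # (5, 6, 4, 3, 7, 2)
```

Each output dimension is the larger of the two inputs, so the two example shapes combine to `(5, 6, 4, 3, 7, 2)`. Shapes such as `(6, 5)` and `(4, 5)` raise an error because 6 and 4 differ and neither is 1.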

In [14]:
a = np.ones((6,5))
b = np.arange(5).reshape((1,5))

In [22]:
a+b

array([[1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.],
       [1., 2., 3., 4., 5.]])

In [23]:
a = torch.ones((6,5))
b = torch.arange(5).reshape((1,5))

The arrays/tensors don't need to have the same number of dimensions. If one of them has fewer dimensions than the other, its shape is padded with 1s on the left before the shapes are compared.

**Example**: Scaling each of the color channels of an image by a different amount:

<pre>
Image  (3d array): 256 x 256 x 3
Scale  (1d array):             3
Result (3d array): 256 x 256 x 3
</pre>


In [30]:
Image = torch.randn((256,256,3))
Scale = torch.tensor([0.5,1.5,1])

In [33]:
Result = Image*Scale

**Example**: One has an array of 2 images and wants to scale the color channels of each image by a slightly different amount:

<pre>
Images  (4d array): 2 x 256 x 256 x 3
Scales  (4d array): 2 x   1 x   1 x 3
Results (4d array): 2 x 256 x 256 x 3
</pre>

In [35]:
Images = torch.randn((2,256,256,3))
Scales = torch.tensor([0.5,1.5,1,1.5,1,0.5]).reshape((2,1,1,3))
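To see what the multiplication produces, here is a small sketch with stand-in data. It uses numpy and tiny 4 x 4 "images" filled with ones (both are assumptions made for readability); the same expression works on torch tensors because the broadcasting rules are shared.

```python
import numpy as np

# Two small stand-in "images" (2 x 4 x 4 x 3), filled with ones so the scaling is easy to read.
images = np.ones((2, 4, 4, 3))
# One row of channel scales per image; the two spatial axes have size 1 so they broadcast.
scales = np.array([0.5, 1.5, 1.0, 1.5, 1.0, 0.5]).reshape(2, 1, 1, 3)

results = images * scales
print(results.shape)     # same shape as images
print(results[0, 0, 0])  # first image: channels scaled by 0.5, 1.5, 1.0
print(results[1, 0, 0])  # second image: channels scaled by 1.5, 1.0, 0.5
```

Every pixel of the first image is scaled by the first three values and every pixel of the second image by the last three.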

# Operations Across Dimensions

Reducing over chosen dimensions is fundamental to working with pytorch. Simple operations can of course be done on 1-dimensional tensors:

In [38]:
t = torch.tensor([0.5,1,3,4])
torch.mean(t), torch.std(t), torch.max(t), torch.min(t)

(tensor(2.1250), tensor(1.6520), tensor(4.), tensor(0.5000))

But suppose we have a 2d tensor, for example, and want to compute the mean value of each column:
* Note: taking the mean **of** each column means taking the mean **across** the rows (which are the first dimension)

In [43]:
t = torch.arange(20, dtype=float).reshape(5,4)
torch.mean(t, axis=0)

tensor([ 8.,  9., 10., 11.], dtype=torch.float64)
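The difference between reducing over the first and the second dimension is easiest to see from the result shapes. The sketch below uses numpy for illustration (an assumption; in torch the corresponding keyword arguments are `dim` and `keepdim`).

```python
import numpy as np

t = np.arange(20, dtype=float).reshape(5, 4)

col_means = t.mean(axis=0)  # collapses the 5 rows: one mean per column, shape (4,)
row_means = t.mean(axis=1)  # collapses the 4 columns: one mean per row, shape (5,)

# keepdims=True keeps the reduced axis with size 1, so the result broadcasts back against t.
centered = t - t.mean(axis=0, keepdims=True)

print(col_means)              # column means: 8, 9, 10, 11
print(row_means.shape)        # (5,)
print(centered.mean(axis=0))  # every column now has mean 0
```

Keeping the reduced dimension is a common way to combine a reduction with broadcasting, for example to center or normalize each column.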

This can be done for higher-dimensional tensors as well:

In [44]:
t = torch.randn(5,256,256,3)

Take the mean across the batch (size 5):

In [47]:
torch.mean(t,axis=0).shape

torch.Size([256, 256, 3])

Take the mean across the color channels:

In [49]:
torch.mean(t,axis=-1).shape

torch.Size([5, 256, 256])

Take only the maximum value over the color channels (and get the corresponding indices):
* The same operation is used all the time in image segmentation models (e.g. take an image and decide which pixels correspond to, say, a car), where the reduced dimension holds per-class scores rather than colors

In [50]:
values, indices = torch.max(t,axis=-1)
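As a rough illustration of how the indices are used, the sketch below builds a mask from per-pixel class scores. It uses numpy, a made-up 4 x 4 score array, and a hypothetical class index 2 standing in for "car"; numpy needs separate `max` and `argmax` calls, whereas `torch.max` with a dimension argument returns both at once.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical per-pixel class scores: a 4 x 4 image with 3 classes in the last axis.
scores = rng.normal(size=(4, 4, 3))

values = scores.max(axis=-1)      # highest score at each pixel, shape (4, 4)
indices = scores.argmax(axis=-1)  # which class produced it, shape (4, 4)

# The indices pick out exactly the maximum values.
picked = np.take_along_axis(scores, indices[..., None], axis=-1)[..., 0]
print(np.array_equal(picked, values))  # True

car_mask = indices == 2  # boolean mask of pixels assigned to the hypothetical class 2
print(car_mask.shape)    # (4, 4)
```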

# So Where Do Pytorch and Numpy Differ?

**Pytorch** differs most from **numpy** in that it can automatically compute the gradients of operations. For example,

$$y = \sum_i x_i^3$$

has a gradient

$$\frac{\partial y}{\partial x_i} = 3x_i^2$$

In [56]:
x = torch.tensor([[5.,8.],[4.,6.]], requires_grad=True)
y = x.pow(3).sum()
y

tensor(917., grad_fn=<SumBackward0>)

Compute the gradient:

In [57]:
y.backward()  # compute dy/dx for every element of x
x.grad        # the gradient, stored on x by backward()

tensor([[ 75., 192.],
        [ 48., 108.]])

Double check using the analytical formula:

In [59]:
3*x**2

tensor([[ 75., 192.],
        [ 48., 108.]], grad_fn=<MulBackward0>)
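A third, independent check is a numerical one. The sketch below approximates each partial derivative with a central finite difference in numpy; the step size `eps` and the tolerance are arbitrary choices made for this illustration.

```python
import numpy as np

def f(x):
    """The function from the example above: y = sum_i x_i^3."""
    return np.sum(x ** 3)

x = np.array([[5.0, 8.0], [4.0, 6.0]])
eps = 1e-5
numeric_grad = np.zeros_like(x)
for idx in np.ndindex(x.shape):
    step = np.zeros_like(x)
    step[idx] = eps
    # Central difference approximates the partial derivative with respect to x[idx].
    numeric_grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)

print(np.allclose(numeric_grad, 3 * x ** 2, rtol=1e-6))  # True
```

Finite differences need two function evaluations per element, which is one reason automatic differentiation is preferred for models with many parameters.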

The automatic computation of gradients is the backbone of training deep learning models. Unlike in the example above, the gradients of a realistic model are far too cumbersome to derive and code by hand, so automatic computation is essential. In general, if one has

$$y = f(\vec{x})$$

then pytorch can compute $\partial y / \partial x_i$ for each element of the vector $\vec{x}$. In the context of machine learning, $\vec{x}$ contains all the weights (also known as parameters) of the neural network and $y$ is the **Loss Function** of the neural network.
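To make the link to training concrete, here is a minimal gradient descent sketch. The toy loss, the `target` values, and the learning rate `lr` are all assumptions chosen for illustration; a real network would use a model and an optimizer instead of a single weight vector.

```python
import torch

# Hypothetical toy problem: find w that minimises the loss sum((w - target)^2).
target = torch.tensor([1.0, -2.0, 3.0])
w = torch.zeros(3, requires_grad=True)  # the "weights"
lr = 0.1                                # learning rate, chosen arbitrarily

for _ in range(100):
    loss = ((w - target) ** 2).sum()  # y = f(w)
    loss.backward()                   # fills w.grad with d(loss)/dw
    with torch.no_grad():
        w -= lr * w.grad              # gradient descent step
    w.grad.zero_()                    # gradients accumulate, so reset them

print(w.detach())  # close to target
```

The update is wrapped in `torch.no_grad()` so that the step itself is not recorded as part of the computation being differentiated.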

# Additional Benefits

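**In addition, large matrix multiplications are often faster with torch tensors than with numpy arrays, especially when running on a GPU.** Note that the timings below are not a like-for-like comparison: `torch.randn` returns float32 values by default, while `np.random.randn` returns float64.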

Using torch (on a CPU; on a GPU this is typically much faster):

In [60]:
A = torch.randn((1000,1000))
B = torch.randn((1000,1000))

In [61]:
t1 = time.perf_counter()
torch.matmul(A,B)
t2 = time.perf_counter()
print(t2-t1)

0.00980130000243662


Using numpy: 

In [62]:
A = np.random.randn(int(1e6)).reshape((1000,1000))
B = np.random.randn(int(1e6)).reshape((1000,1000))

In [63]:
t1 = time.perf_counter()
A@B
t2 = time.perf_counter()
print(t2-t1)

0.028782699999283068
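A single timed call like the ones above is noisy and includes one-off start-up costs. A somewhat more careful measurement might look like the sketch below, where `best_time` is a hypothetical helper written for this illustration and the matrix size is kept small so that it runs quickly.

```python
import time
import numpy as np

def best_time(fn, repeats=5):
    """Hypothetical helper: run fn once as warm-up, then return the fastest timed run."""
    fn()  # warm-up run, not timed
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)

# Use the same dtype on both sides of a comparison; float32 matches torch's default.
A = np.random.randn(300, 300).astype(np.float32)
B = np.random.randn(300, 300).astype(np.float32)
print(f"numpy float32 matmul: {best_time(lambda: A @ B):.6f} s")
```

The torch side can be timed the same way by passing `lambda: torch.matmul(A_t, B_t)` for float32 tensors `A_t` and `B_t` (names assumed here). On a GPU, `torch.cuda.synchronize()` should be called before reading the clock, because CUDA kernels are launched asynchronously.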
