# PyTorch

##### Keywords: gradient descent, pytorch, sgd, minibatch sgd

## Contents
{:.no_toc}
* 
{: toc}

## Installing PyTorch

Install PyTorch by following the instructions on its web page, https://pytorch.org.


#### Testing Installation

If the code cell shows an error, then your PyTorch installation is not working and you should contact one of the teaching staff.

In [1]:
### Code Cell to Test PyTorch

import torch
print(torch.__version__)
import torchvision
import torchvision.transforms as transforms
print(torchvision.__version__)

x = torch.rand(5, 3)
print(x)

transforms.RandomRotation(0.7)
transforms.RandomRotation([0.9, 0.2])

t = transforms.RandomRotation(10)
angle = t.get_params(t.degrees)

print(angle)


1.0.0
0.2.1
tensor([[0.7855, 0.5797, 0.4545],
        [0.2648, 0.5096, 0.0966],
        [0.0087, 0.1676, 0.7308],
        [0.2209, 0.0587, 0.7217],
        [0.5059, 0.1692, 0.7231]])
-8.00169423514493


## Why PyTorch?

*All the quotes will come from the PyTorch About Page http://pytorch.org/about/ from which we'll plagiarize shamelessly.  After all, who better to tout the virtues of PyTorch than the creators?*


### What is PyTorch?

According to the PyTorch about page, "PyTorch is a python package that provides two high-level features:

- Tensor computation (like numpy) with strong GPU acceleration
- Deep Neural Networks built on a tape-based autograd system"

### Why is it getting so popular?

#### It's quite fast

"PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (CuDNN, NCCL) to maximize speed. At the core, it’s CPU and GPU Tensor and Neural Network backends (TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API.
They are mature and have been tested for years.

Hence, PyTorch is quite fast – whether you run small or large neural networks."

#### Imperative programming experience

"PyTorch is designed to be intuitive, linear in thought and easy to use. When you execute a line of code, it gets executed. There isn’t an asynchronous view of the world. When you drop into a debugger, or receive error messages and stack traces, understanding them is straight-forward. The stack-trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines."

"PyTorch is not a Python binding into a monolothic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use numpy / scipy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate."

#### Takes advantage of GPUs easily

"PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerate compute by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, math operations, linear algebra, reductions. And they are fast!"
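As a minimal sketch of what "living on the CPU or the GPU" looks like in code, the snippet below picks a device and moves a tensor to it. It assumes nothing about your hardware: if CUDA is not available it falls back to the CPU. The variable names are illustrative only.

```python
import torch

# Pick the GPU if one is visible to PyTorch, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

a = torch.ones(2, 3)     # created on the CPU by default
b = a.to(device)         # moved to the chosen device (no-op if already there)
c = (b * 2).sum()        # the arithmetic runs on whichever device holds b

print(device, c.item())
```

The same code runs unchanged on a laptop and on a GPU machine, which is the usual reason for routing everything through a single `device` variable.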


#### Dynamic Graphs!!!

"Most frameworks such as TensorFlow, Theano, Caffe and CNTK have a static view of the world. One has to build a neural network, and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called Reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it’s one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research."
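To make the "dynamic" part concrete, here is a small sketch. The function `repeated_square` is made up for illustration and is not part of PyTorch. The number of operations recorded depends on an ordinary Python loop, and autograd differentiates through whatever actually ran.

```python
import torch

def repeated_square(x, n_steps):
    # Hypothetical helper: the graph is built on the fly, one op per loop pass.
    out = x
    for _ in range(n_steps):
        out = out * out
    return out

x = torch.tensor(2.0, requires_grad=True)

y = repeated_square(x, 2)          # two passes: y = x**4
y.backward()
grad_two_steps = x.grad.item()     # d(x^4)/dx at x = 2 is 4 * 2^3 = 32

x.grad.zero_()                     # gradients accumulate, so reset before reusing x
y = repeated_square(x, 1)          # a different graph this time: y = x**2
y.backward()
grad_one_step = x.grad.item()      # d(x^2)/dx at x = 2 is 4

print(grad_two_steps, grad_one_step)
```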



## Working with PyTorch Basics

Enough of the sales pitch!  Let's start to understand the PyTorch basics.

The basic unit of PyTorch is a tensor (basically a multi-dimensional array like a np.ndarray).

![](https://cdn-images-1.medium.com/max/2000/1*_D5ZvufDS38WkhK9rK32hQ.jpeg)

(image borrowed from https://hackernoon.com/learning-ai-if-you-suck-at-math-p4-tensors-illustrated-with-cats-27f0002c9b32 )

We can create PyTorch tensors directly.

In [2]:
# from https://www.stefanfiott.com/machine-learning/tensors-and-gradients-in-pytorch/
def tensor_properties(t, show_value=True):
    print('Tensor properties:')
    props = [('rank', t.dim()),
             ('shape', t.size()),
             ('data type', t.dtype),
             ('tensor type', t.type())]
    for s,v in props:
        print('\t{0:12}: {1}'.format(s,v))
    if show_value:
        #print('{0:12}: {1}'.format('value',t))
        print("Value:")
        print(t)

In [3]:
# torch.tensor always copies data. See below for 0-copy
scalar = torch.tensor(5)
tensor_properties(scalar)

Tensor properties:
	rank        : 0
	shape       : torch.Size([])
	data type   : torch.int64
	tensor type : torch.LongTensor
Value:
tensor(5)


In [4]:
## You can create torch.Tensor objects by giving them data directly

#  1D vector
vector_input = [1., 2., 3., 4., 5., 6.]
vector = torch.tensor(vector_input)
tensor_properties(vector)

Tensor properties:
	rank        : 1
	shape       : torch.Size([6])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([1., 2., 3., 4., 5., 6.])


In [5]:
# Matrix
matrix_input = [[1., 2., 3.], [4., 5., 6.]]
matrix = torch.tensor(matrix_input)
tensor_properties(matrix)

Tensor properties:
	rank        : 2
	shape       : torch.Size([2, 3])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[1., 2., 3.],
        [4., 5., 6.]])


In [6]:
# Create a 3D tensor of size 2x2x2.
tensor_input = [[[1., 2.], [3., 4.]],
          [[5., 6.], [7., 8.]]]
tensor3d = torch.tensor(tensor_input)

tensor_properties(tensor3d)

Tensor properties:
	rank        : 3
	shape       : torch.Size([2, 2, 2])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[[1., 2.],
         [3., 4.]],

        [[5., 6.],
         [7., 8.]]])


Tensors can also be created without any initialization, or initialized with random data drawn from a uniform (`rand()`) or normal (`randn()`) distribution.

In [7]:
# Tensors with no initialization
x_1 = torch.Tensor(2, 5)
y_1 = torch.Tensor(3, 5)
tensor_properties(x_1)
tensor_properties(y_1)

Tensor properties:
	rank        : 2
	shape       : torch.Size([2, 5])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[ 0.0000e+00, -3.6893e+19, -3.7212e-21, -1.5849e+29,  2.8131e+20],
        [ 1.7566e+25,  1.7748e+28,  0.0000e+00, -3.7237e-21,  8.5920e+09]])
Tensor properties:
	rank        : 2
	shape       : torch.Size([3, 5])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[ 0.0000e+00, -3.6893e+19,  0.0000e+00, -3.6893e+19,  4.2981e+21],
        [ 6.3828e+28,  3.8016e-39,  0.0000e+00,  0.0000e+00, -3.6893e+19],
        [ 0.0000e+00, -3.6893e+19,  2.8131e+20,  1.7566e+25,  1.7748e+28]])


In [8]:
# Tensors initialized from uniform
x_2 = torch.rand(5, 3)
y_2 = torch.rand(5, 5)

tensor_properties(x_2)
tensor_properties(y_2)

Tensor properties:
	rank        : 2
	shape       : torch.Size([5, 3])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[0.2612, 0.7302, 0.7795],
        [0.9621, 0.2362, 0.3938],
        [0.0813, 0.2182, 0.1140],
        [0.7256, 0.3401, 0.7494],
        [0.4888, 0.6210, 0.4900]])
Tensor properties:
	rank        : 2
	shape       : torch.Size([5, 5])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[0.8440, 0.6660, 0.4346, 0.3580, 0.3835],
        [0.9997, 0.9853, 0.2406, 0.6101, 0.9860],
        [0.9228, 0.9695, 0.3000, 0.4735, 0.2903],
        [0.9267, 0.5977, 0.2435, 0.0379, 0.6333],
        [0.4250, 0.6443, 0.6927, 0.4173, 0.5849]])


In [10]:
# Tensors initialized from normal
x_3 = torch.randn(5, 3)
y_3 = torch.randn(5, 5)

tensor_properties(x_3)
tensor_properties(y_3)

Tensor properties:
	rank        : 2
	shape       : torch.Size([5, 3])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[-1.0954, -0.0614, -1.0212],
        [ 1.0143, -0.0055, -1.2042],
        [ 1.7418, -0.2050, -1.3398],
        [ 1.2264,  0.9062,  0.5064],
        [ 0.1187, -0.9342,  0.8012]])
Tensor properties:
	rank        : 2
	shape       : torch.Size([5, 5])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([[ 0.1948,  0.4439, -0.2306, -0.4700,  0.7891],
        [-0.8612, -0.2490, -1.3734, -1.9507,  0.5581],
        [ 0.0487, -0.7507, -1.0863,  0.5295, -0.1510],
        [-0.2630, -1.1078,  1.4324,  0.2213, -0.0691],
        [-0.0661,  2.2021, -0.1563, -0.6238,  0.3503]])
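
`torch.Tensor(2, 5)` allocates memory without filling it, which is why the values in the uninitialized cell look like garbage. A sketch of the more explicit spellings follows: `torch.empty` for uninitialized memory, `torch.zeros` when you want a defined starting value, and `torch.manual_seed` to make the random draws repeatable (the seed value 0 is arbitrary).

```python
import torch

uninit = torch.empty(2, 5)    # uninitialized: contents are whatever was in memory
zeros = torch.zeros(2, 5)     # explicitly filled with 0.

torch.manual_seed(0)          # fix the RNG state so the draws below repeat
u1 = torch.rand(5, 3)         # uniform on [0, 1)
n1 = torch.randn(5, 3)        # standard normal

torch.manual_seed(0)
u2 = torch.rand(5, 3)         # same seed, same numbers

print(uninit.shape, zeros.sum().item(), torch.equal(u1, u2))
```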


The expected operations (arithmetic operations, addressing, etc.) are all in place.

In [11]:
# Expect (2,5)
print(x_1.size())

print(x_1)


# Addition
print(x_2)
print(x_3)

print(x_2 + x_3)

# Addressing
print(x_3[:, 2])

torch.Size([2, 5])
tensor([[ 0.0000e+00, -3.6893e+19, -3.7212e-21, -1.5849e+29,  2.8131e+20],
        [ 1.7566e+25,  1.7748e+28,  0.0000e+00, -3.7237e-21,  8.5920e+09]])
tensor([[0.2612, 0.7302, 0.7795],
        [0.9621, 0.2362, 0.3938],
        [0.0813, 0.2182, 0.1140],
        [0.7256, 0.3401, 0.7494],
        [0.4888, 0.6210, 0.4900]])
tensor([[-1.0954, -0.0614, -1.0212],
        [ 1.0143, -0.0055, -1.2042],
        [ 1.7418, -0.2050, -1.3398],
        [ 1.2264,  0.9062,  0.5064],
        [ 0.1187, -0.9342,  0.8012]])
tensor([[-0.8342,  0.6687, -0.2417],
        [ 1.9764,  0.2307, -0.8105],
        [ 1.8231,  0.0133, -1.2258],
        [ 1.9520,  1.2464,  1.2558],
        [ 0.6075, -0.3132,  1.2913]])
tensor([-1.0212, -1.2042, -1.3398,  0.5064,  0.8012])
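
With random inputs it is hard to check the results by eye, so here is a small sketch with hand-picked values. It shows that `+` and `*` act elementwise, that matrix multiplication is spelled `@`, and that indexing follows NumPy conventions.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

elementwise_sum = a + b     # [[11, 22], [33, 44]]
elementwise_prod = a * b    # [[10, 40], [90, 160]]
matrix_prod = a @ b         # [[70, 100], [150, 220]]

second_column = a[:, 1]     # tensor([2., 4.])
first_row = a[0]            # tensor([1., 2.])

print(elementwise_sum, elementwise_prod, matrix_prod, sep="\n")
```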


It's easy to move between the PyTorch and NumPy worlds with `numpy()` and `torch.from_numpy()`.

In [12]:
# PyTorch --> Numpy
print(x_1)
print(x_1.numpy())

print(type(x_1))
print(type(x_1.numpy()))

numpy_x_1 = x_1.numpy()

# does not make a copy: just wraps a tensor object around the numpy array
pytorch_x_1 = torch.from_numpy(numpy_x_1)

print(type(numpy_x_1))
print(type(pytorch_x_1))

tensor([[ 0.0000e+00, -3.6893e+19, -3.7212e-21, -1.5849e+29,  2.8131e+20],
        [ 1.7566e+25,  1.7748e+28,  0.0000e+00, -3.7237e-21,  8.5920e+09]])
[[ 0.0000000e+00 -3.6893488e+19 -3.7212154e-21 -1.5849494e+29
   2.8131290e+20]
 [ 1.7566070e+25  1.7748358e+28  0.0000000e+00 -3.7237446e-21
   8.5920276e+09]]
<class 'torch.Tensor'>
<class 'numpy.ndarray'>
<class 'numpy.ndarray'>
<class 'torch.Tensor'>
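
Because both conversions share memory for CPU tensors, a write on one side is visible on the other. A short sketch of that behavior, with `torch.tensor` shown as the way to get an independent copy:

```python
import numpy as np
import torch

arr = np.zeros(3, dtype=np.float32)
shared = torch.from_numpy(arr)    # wraps the same buffer, no copy
copied = torch.tensor(arr)        # torch.tensor always copies

arr[0] = 7.0                      # modify the NumPy array in place
print(shared)                     # the tensor sees the change
print(copied)                     # the copy does not

back = shared.numpy()             # also a view onto the same memory
shared[1] = 5.0                   # write through the tensor this time
print(back[1], arr[1])
```

If you need to keep the original data safe from in-place edits, copy before converting.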


Finally, PyTorch provides some convenience mechanisms for concatenating tensors via `torch.cat()` and reshaping them with `.view()`.

In [13]:
## Concatenating

# By default, it concatenates along the zeroth(first) axis (concatenates rows)
x_1 = torch.randn(2, 5)
y_1 = torch.randn(3, 5)
z_1 = torch.cat([x_1, y_1])
print(z_1.shape)

# Concatenate columns:
x_2 = torch.randn(2, 3)
y_2 = torch.randn(2, 5)
# second arg specifies which axis to concat along
z_2 = torch.cat([x_2, y_2], 1)
print(z_2.shape)

## Reshaping
x = torch.randn(2, 3, 4)
print(x)
print(x.view(2, 12))  # Reshape to 2 rows, 12 columns
# Same as above.  If one of the dimensions is -1, its size can be inferred
print(x.view(2, -1))

torch.Size([5, 5])
torch.Size([2, 8])
tensor([[[-1.0053,  1.0082,  0.0516, -2.6949],
         [ 2.1851,  2.2241,  1.0111,  0.6711],
         [-1.5217, -0.4370, -0.0411,  2.3265]],

        [[ 0.7362, -0.4631,  0.0565,  0.1602],
         [-0.2661,  0.5695, -0.1632, -0.5906],
         [ 1.2274, -0.3200, -0.3043, -0.9487]]])
tensor([[-1.0053,  1.0082,  0.0516, -2.6949,  2.1851,  2.2241,  1.0111,  0.6711,
         -1.5217, -0.4370, -0.0411,  2.3265],
        [ 0.7362, -0.4631,  0.0565,  0.1602, -0.2661,  0.5695, -0.1632, -0.5906,
          1.2274, -0.3200, -0.3043, -0.9487]])
tensor([[-1.0053,  1.0082,  0.0516, -2.6949,  2.1851,  2.2241,  1.0111,  0.6711,
         -1.5217, -0.4370, -0.0411,  2.3265],
        [ 0.7362, -0.4631,  0.0565,  0.1602, -0.2661,  0.5695, -0.1632, -0.5906,
          1.2274, -0.3200, -0.3043, -0.9487]])
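
One caveat worth knowing: `.view()` never copies, so it only works when the requested shape is compatible with the tensor's memory layout. After a transpose that is typically not the case. A sketch of the failure and two ways around it:

```python
import torch

x = torch.arange(6.).view(2, 3)     # [[0, 1, 2], [3, 4, 5]]
xt = x.t()                          # transpose: same memory, different strides

try:
    xt.view(6)                      # raises: xt is not contiguous
    view_failed = False
except RuntimeError:
    view_failed = True

flat_a = xt.contiguous().view(6)    # make a contiguous copy first, then view
flat_b = xt.reshape(6)              # reshape copies only when it has to

print(view_failed, flat_a)
```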


## PyTorch Variables and the Computational Graph

The other fundamental PyTorch construct besides the Tensor is the Variable. Variables are very similar to tensors, but they also keep track of the computational graph (including their gradients for autodifferentiation). They are defined in the autograd module of torch.

This has changed in recent versions of PyTorch (0.4 and later), where `Variable` was merged into `Tensor`, but I want to keep this section in as you will likely see code which uses `Variable`. A `Variable` now is just a tensor, and creating a tensor with `requires_grad=True` is all you need to have its gradients tracked.

In [14]:
from torch.autograd import Variable
import torch.nn as nn

# Let's create a variable by initializing it with a tensor
first_tensor = torch.Tensor([23.3])

In [15]:
tensor_properties(first_tensor)

Tensor properties:
	rank        : 1
	shape       : torch.Size([1])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([23.3000])


In [16]:
print("first_tensor.grad", first_tensor.grad)

first_tensor.grad None


In [17]:
first_variable = Variable(first_tensor, requires_grad=True)

print("first variables gradient: ", first_variable.grad)
print("first variables data: ", first_variable.data)

first variables gradient:  None
first variables data:  tensor([23.3000])


In [18]:
tensor_properties(first_tensor)

Tensor properties:
	rank        : 1
	shape       : torch.Size([1])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([23.3000])


In [19]:
first_tensor_new = torch.tensor([23.3], requires_grad=True)

In [20]:
tensor_properties(first_tensor_new)

Tensor properties:
	rank        : 1
	shape       : torch.Size([1])
	data type   : torch.float32
	tensor type : torch.FloatTensor
Value:
tensor([23.3000], requires_grad=True)


In [21]:
print("first variables gradient: ", first_tensor_new.grad)
print("first variables data: ", first_tensor_new.data)

first variables gradient:  None
first variables data:  tensor([23.3000])


In [22]:
print("first_tensor.grad", first_tensor.grad)

first_tensor.grad None
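
Since PyTorch 0.4 the two routes used in the cells above produce the same kind of object. A quick check, assuming a PyTorch version that still ships the deprecated `Variable` wrapper (the names `via_variable` and `via_tensor` are just for this sketch):

```python
import torch
from torch.autograd import Variable

via_variable = Variable(torch.tensor([23.3]), requires_grad=True)
via_tensor = torch.tensor([23.3], requires_grad=True)

# Both are plain tensors that autograd will track.
print(type(via_variable), type(via_tensor))
print(via_variable.requires_grad, via_tensor.requires_grad)

# Both are leaf tensors, so .grad will be populated after backward().
print(via_variable.is_leaf, via_tensor.is_leaf)
```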


Now let's create some new variables. We can do so implicitly just by creating other variables with functional relationships to our variable.

In [23]:
x = first_variable
print("x.data", x.data)
y = (x * x) * (x - 2) # y is a variable
z = torch.tanh(y) # z has a functional relationship to y, it's a variable
print("z.grad: ", z.grad)

z.backward()

print("y.data: ", y.data)
print("y.grad: ", y.grad)

print("z.data: ", z.data)
print("z.grad: ", z.grad)

print("x.grad:", x.grad)


x.data tensor([23.3000])
z.grad:  None
y.data:  tensor([11563.5557])
y.grad:  None
z.data:  tensor([1.])
z.grad:  None
x.grad: tensor([0.])


In [24]:
x = first_tensor_new
print("x.data", x.data)
y = (x ** x) * (x - 2) # y is a variable
z = torch.tanh(y) # z has a functional relationship to y
print("z.grad: ", z.grad)

z.backward()

print("y.data: ", y.data)
print("y.grad: ", y.grad)

print("z.data: ", z.data)
print("z.grad: ", z.grad)

print("x.grad:", x.grad)



x.data tensor([23.3000])
z.grad:  None
y.data:  tensor([1.5409e+33])
y.grad:  None
z.data:  tensor([1.])
z.grad:  None
x.grad: tensor([0.])


Variables (and now tensors that require gradients) have a `.backward()` method, which performs automatic differentiation by propagating gradients backwards through the recorded graph.
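
The zero gradient for `x` in the cells above is not a bug: at x = 23.3 the value of y is so large that tanh is completely saturated, so dz/dy rounds to 0. The `None` for `y.grad` and `z.grad` is also expected, since by default autograd only keeps gradients for leaf tensors. The sketch below reruns the first computation at a smaller input (0.5, chosen arbitrarily), asks for y's gradient with `retain_grad()`, and compares the results with the derivative worked out by hand.

```python
import torch

x = torch.tensor([0.5], requires_grad=True)   # leaf tensor
y = (x * x) * (x - 2)                         # y = x^3 - 2x^2
y.retain_grad()                               # keep dz/dy, which is normally discarded
z = torch.tanh(y)

z.backward()                                  # populates x.grad (and y.grad)

# Chain rule by hand: dz/dx = (1 - tanh(y)^2) * (3x^2 - 4x)
with torch.no_grad():
    dz_dy = 1 - torch.tanh(y) ** 2
    dz_dx = dz_dy * (3 * x ** 2 - 4 * x)

print("x.grad:", x.grad, "by hand:", dz_dx)
print("y.grad:", y.grad, "by hand:", dz_dy)
```

Note that calling `.backward()` with no arguments only works because `z` holds a single element; for larger outputs you would reduce to a scalar first (for example with `.sum()`).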