*Accompanying code examples of the book "Introduction to Artificial Neural Networks and Deep Learning: A Practical Guide with Applications in Python" by Sebastian Raschka. All code examples are released under the [MIT license](https://github.com/rasbt/deep-learning-book/blob/master/LICENSE). If you find this content useful, please consider supporting the work by buying a [copy of the book](https://leanpub.com/ann-and-deeplearning).*


Other code examples and content are available on [GitHub](https://github.com/rasbt/deep-learning-book). The PDF and ebook versions of the book are available through [Leanpub](https://leanpub.com/ann-and-deeplearning).

# Appendix G - PyTorch Basics

* [PyTorch in a Nutshell](#PyTorch-in-a-Nutshell)
* [Tensors](#Tensors)
* [Installation](#Installation)
* [CPU and GPU](#CPU-and-GPU)

This appendix offers a brief overview of [*PyTorch*](http://pytorch.org), an open-source library for numerical computation and deep learning. It is intended for readers who want a basic overview of the library before working through the hands-on sections that conclude the main chapters.
The majority of the hands-on sections in this book use PyTorch. If you plan to execute the code shown in this book, I assume that you have PyTorch >=0.1 installed.
In addition to glancing over this appendix, I recommend the following resources from the official documentation for more in-depth coverage of the library:

- [API documentation](http://pytorch.org/docs/)
- [Example projects](https://github.com/pytorch/examples)
- [Tutorials](https://github.com/pytorch/tutorials)

## PyTorch in a Nutshell

At its core, PyTorch is a library for efficient tensor operations with primitives for deep learning, which makes deep learning more efficient and convenient than it would be with standard linear algebra libraries that only execute operations on Central Processing Units (CPUs). If you have used [NumPy](http://www.numpy.org) before -- Python's main library for scientific computing -- PyTorch may look very familiar to you. However, the main strength of PyTorch is that it allows us to utilize Graphics Processing Units (GPUs). You may have already heard of GPUs; we can think of them as small computer clusters, real workhorses, inside our computers. And compared to state-of-the-art CPUs, GPUs are relatively cheap.

In a nutshell, PyTorch is a library for tensor computations, just like NumPy, but with massive GPU acceleration. And one advantage of PyTorch over other existing frameworks for deep learning is that it puts Python first. Thus, the API feels very familiar to Python and NumPy users, and everything happens dynamically without having to set up static computation graphs. (Note that if you don't have a GPU that supports CUDA, PyTorch runs on CPUs as well; the use of a GPU, or multiple GPUs, is entirely optional.)
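
The optional nature of the GPU can be illustrated with a minimal sketch. Note that it assumes a more recent PyTorch release (0.4 or later), where `torch.device` and the `device=` keyword are available; the variable names are chosen for illustration only.

```python
import torch

# Assumption: PyTorch >= 0.4. Pick a CUDA GPU when one is visible to
# PyTorch; otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create both operands directly on the selected device.
a = torch.ones(2, 3, device=device)
b = torch.full((2, 3), 2.0, device=device)

c = a + b        # the addition runs on whichever device holds a and b
c_cpu = c.cpu()  # bring the result back to host memory (no copy if already on CPU)

print(device.type, c_cpu.tolist())
```

The same code runs unchanged on a machine without a CUDA-capable GPU; only the value of `device` differs.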

Besides being efficient at performing highly parallelized numerical computations, PyTorch leverages reverse-mode auto-differentiation to compute derivatives very efficiently, in particular for functions that take many inputs and produce a single output. As the developers state in the documentation, PyTorch is

> [...] one of the fastest implementations of [reverse-mode auto-differentiation] to date. You get the best of speed and flexibility for your crazy research.

And besides being fast at training small or large neural networks, its elegant API makes neural networks implemented in PyTorch very easy to read and comprehend, which is a huge plus when it comes to academic research and the development of real-world applications.

## Tensors

As mentioned in the introduction, PyTorch mainly operates on tensors, but what is a *tensor*? In simple terms, we can think of tensors as multi-dimensional arrays of numbers, a generalization of scalars, vectors, and matrices.

1. Scalar: $\mathbb{R}$
2. Vector: $\mathbb{R}^n$
3. Matrix: $\mathbb{R}^{n \times m}$
4. 3-Tensor: $\mathbb{R}^{n \times m \times p}$
5. ...

When we describe a tensor, we refer to the number of its "dimensions" (axes) as the *rank* (or *order*) of the tensor, which is not to be confused with the dimensions of a matrix. For instance, an $m \times n$ matrix, where $m$ is the number of rows and $n$ is the number of columns, is a special case of a rank-2 tensor. A visual explanation of tensors and their ranks is given in the figure below.

![Tensors](images/appendix_pytorch/tensors.png)
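
To make the notion of rank concrete, here is a small sketch that builds one tensor of each rank listed above and queries its rank and shape. It assumes PyTorch 0.4 or later, where `torch.tensor` can create rank-0 tensors; the variable names are illustrative.

```python
import torch

# Assumption: PyTorch >= 0.4 (torch.tensor and rank-0 tensors).
scalar = torch.tensor(3.0)                 # rank 0
vector = torch.tensor([1.0, 2.0, 3.0])     # rank 1
matrix = torch.tensor([[1.0, 2.0, 3.0],
                       [4.0, 5.0, 6.0]])   # rank 2, a 2 x 3 matrix
tensor3 = torch.zeros(2, 3, 4)             # rank 3

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("3-tensor", tensor3)]:
    # dim() returns the rank; shape lists the size along each axis.
    print(name, "rank:", t.dim(), "shape:", tuple(t.shape))
```

Here, the matrix has rank 2 regardless of how many rows and columns it has; the row and column counts only show up in its shape.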

## Installation

Code conventions in this book follow the Python 3.x syntax, and while the code examples should be backward compatible with Python 2.7, I highly recommend the use of Python >=3.5.

Once you have your Python environment set up ([Appendix - Python Setup](#python-setup)), the two most convenient ways to install PyTorch are probably via `pip` or `conda` -- the latter only applies if you have the [Anaconda/Miniconda Python distribution](https://www.continuum.io/downloads) installed, which I prefer and recommend.

Since PyTorch is under active development, I recommend visiting the [PyTorch Website](http://pytorch.org) for more information on the installation procedure that fits your hardware and operating system, for example, as shown in the screenshot below:

![PyTorch installation](images/appendix_pytorch/pytorch-install.png)

In [1]:
import numpy as np

y = np.array([[1., 2., 3.],
              [4., 5., 6.]])

print('Dimensions:', y.shape)
print('Tensor contents:')
y

Dimensions: (2, 3)
Tensor contents:


array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]])

In [2]:
import torch

x = torch.Tensor([[1., 2., 3.],
                  [4., 5., 6.]])

print('Dimensions:', x.size())
print('Tensor contents:')
x

Dimensions: torch.Size([2, 3])
Tensor contents:



 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]

In [3]:
print(y + y)
print(x + x)

[[  2.   4.   6.]
 [  8.  10.  12.]]

  2   4   6
  8  10  12
[torch.FloatTensor of size 2x3]



In [4]:
x.numpy()

array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]], dtype=float32)

In [5]:
import torch
import numpy as np

y = np.array([[1., 2., 3.],
              [4., 5., 6.]])

torch.from_numpy(y)


 1  2  3
 4  5  6
[torch.DoubleTensor of size 2x3]

In [6]:
x.t()


 1  4
 2  5
 3  6
[torch.FloatTensor of size 3x2]

In [7]:
x


 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]

In [8]:
y.transpose()

array([[ 1.,  4.],
       [ 2.,  5.],
       [ 3.,  6.]])

## Autograd

In [9]:
from torch.autograd import Variable

In [10]:
x = Variable(torch.ones(2, 2), requires_grad=True)
print('x:\n', x)
print('x.creator:', x.creator)

x:
 Variable containing:
 1  1
 1  1
[torch.FloatTensor of size 2x2]

x.creator: None


In [11]:
z = (2*x)**3
print('1) z:\n', z)
print('2) z.creator:', z.creator)
print('3) z.requires_grad:', z.requires_grad)
print('4) z.grad:\n', z.grad)

1) z:
 Variable containing:
 8  8
 8  8
[torch.FloatTensor of size 2x2]

2) z.creator: <torch.autograd._functions.basic_ops.PowConstant object at 0x10a6a6508>
3) z.requires_grad: True
4) z.grad:
 Variable containing:
 0  0
 0  0
[torch.FloatTensor of size 2x2]



In [12]:
out = torch.sum(z)
print('1) out:', out)
print('2) out.creator', out.creator)

1) out: Variable containing:
 32
[torch.FloatTensor of size 1]

2) out.creator <torch.autograd._functions.reduce.Sum object at 0x10a6a65a0>


In [13]:
out.backward()

In [14]:
x.grad

Variable containing:
 24  24
 24  24
[torch.FloatTensor of size 2x2]
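
The value 24 in each entry of `x.grad` can be checked by hand. Since `out` is the sum over all entries of $(2x_{ij})^3 = 8x_{ij}^3$, the partial derivative with respect to each entry is $24x_{ij}^2$, which equals 24 at $x_{ij} = 1$. The sketch below repeats the computation and compares it against this closed form. It assumes PyTorch 0.4 or later, where tensors carry `requires_grad` directly and no `Variable` wrapper is needed.

```python
import torch

# Assumption: PyTorch >= 0.4 (tensors track gradients without Variable).
x = torch.ones(2, 2, requires_grad=True)

out = ((2 * x) ** 3).sum()   # out = sum_ij (2 * x_ij)^3
out.backward()               # reverse-mode pass fills x.grad

# Closed-form derivative: d out / d x_ij = 24 * x_ij^2
expected = 24 * x.detach() ** 2

print(x.grad)
print(torch.allclose(x.grad, expected))
```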

## Linear Regression

In [43]:
from torch.autograd import Variable
import torch


x = Variable(torch.Tensor([[1.0], 
                           [2.1], 
                           [3.6], 
                           [4.2], 
                           [6.0], 
                           [7.0]]))

y = Variable(torch.Tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0]))

weights = Variable(torch.zeros(1, 1), requires_grad=True)
bias = Variable(torch.zeros(1), requires_grad=True)

for i in range(5000):

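    # Forward pass: net_input = x * w + b
    net_input = x.mm(weights)
    net_input += bias.unsqueeze(0).expand_as(net_input)
    # equivalent for a 1x1 weight matrix: net_input = F.linear(x, weights, bias)
    
    # Mean squared error and its gradients
    loss = torch.mean((net_input - y)**2)
    loss.backward()
    # Gradient-descent step on the weights. The gradients are never reset,
    # so weights.grad accumulates across iterations.
    weights.data.add_(-0.00001 * weights.grad.data)
    # Note: this steps the bias by the loss value, not by bias.grad.
    bias.data.add_(-0.00001 * loss.data)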
    
    if loss.data[0] < 1e-3:
        break

print('n_iter', i)
print(loss.data[0])

n_iter 78
0.0005492811324074864


In [52]:
from torch.autograd import Variable
import torch
import torch.nn.functional as F


x = Variable(torch.Tensor([[1.0], 
                           [2.1], 
                           [3.6], 
                           [4.2], 
                           [6.0], 
                           [7.0]]))

y = Variable(torch.Tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0]))

weights = Variable(torch.zeros(1, 1), requires_grad=True)
bias = Variable(torch.zeros(1))

for i in range(5000):

    net_input = F.linear(x, weights, bias)
    
    loss = torch.mean((net_input - y)**2)
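    loss.backward()
    # Gradient-descent step on the weights (gradients accumulate, since
    # they are never reset between iterations).
    weights.data.add_(-0.00001 * weights.grad.data)
    # Note: this steps the bias by the loss value, not by a gradient;
    # bias was created without requires_grad=True in this cell.
    bias.data.add_(-0.00001 * loss.data)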
    
    if loss.data[0] < 1e-3:
        break

print('n_iter', i)
print(loss.data[0])

n_iter 78
0.0005492811324074864


In [61]:
from torch.autograd import Variable
import torch
import torch.nn.functional as F


x = Variable(torch.Tensor([[1.0], 
                           [2.1], 
                           [3.6], 
                           [4.2], 
                           [6.0], 
                           [7.0]]))

y = Variable(torch.Tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0]))

weights = Variable(torch.zeros(1, 1), requires_grad=True)
bias = Variable(torch.zeros(1), requires_grad=False)

for i in range(5000):

    net_input = F.linear(x, weights)
    
    loss = torch.mean((net_input - y)**2)
    loss.backward()
    weights.data.add_(-0.0001 * weights.grad.data)
    #bias.data.add_(-0.00001 * loss.data)
    
    if loss.data[0] < 1e-3:
        break

print('n_iter', i)
print(loss.data[0])

n_iter 222
0.0002557723200879991


In [68]:
from torch.autograd import Variable
import torch
import torch.nn.functional as F

class Model(torch.nn.Module):
    
    def __init__(self):
        super(Model, self).__init__()
        self.fc = torch.nn.Linear(1, 1, bias=True)
    
    def forward(self, x):
        return self.fc(x)
        
model = Model()
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)

for i in range(5000):
    optimizer.zero_grad()
    outputs = model(x)
    
    loss = criterion(outputs, y)
    loss.backward()        

    optimizer.step()
    
print(loss.data[0])

0.004064568784087896


In [72]:
list(model.parameters())

[Parameter containing:
  0.9725
 [torch.FloatTensor of size 1x1], Parameter containing:
  0.1380
 [torch.FloatTensor of size 1]]
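
For comparison, the following is a minimal sketch of a manual gradient-descent loop for the same data in which both the weight and the bias are updated with their own gradients and the gradients are reset after every step. It is not the code used to produce the outputs above. It assumes PyTorch 0.4 or later (no `Variable` wrapper, `torch.no_grad()` available); the learning rate `lr = 0.01` is an assumed value, and the targets are reshaped to a column so that they match the shape of the predictions.

```python
import torch

# Assumption: PyTorch >= 0.4. Same six data points as in the cells above.
x = torch.tensor([[1.0], [2.1], [3.6], [4.2], [6.0], [7.0]])
# Targets as a (6, 1) column so that (pred - y) is computed element-wise.
y = torch.tensor([1.0, 2.1, 3.6, 4.2, 6.0, 7.0]).unsqueeze(1)

weights = torch.zeros(1, 1, requires_grad=True)
bias = torch.zeros(1, requires_grad=True)
lr = 0.01  # assumed learning rate, chosen for this sketch

for i in range(5000):
    pred = x.mm(weights) + bias            # forward pass
    loss = torch.mean((pred - y) ** 2)     # mean squared error
    loss.backward()                        # compute d loss / d weights, d loss / d bias

    with torch.no_grad():                  # update parameters without tracking
        weights -= lr * weights.grad
        bias -= lr * bias.grad

    # Reset the gradients; otherwise backward() would accumulate them.
    weights.grad.zero_()
    bias.grad.zero_()

    if loss.item() < 1e-3:
        break

print('n_iter', i)
print('loss', loss.item())
print('weight', weights.item(), 'bias', bias.item())
```

Because the targets equal the inputs, the loop should approach a weight near 1 and a bias near 0, similar to the parameters learned by the `torch.nn.Linear` model above.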