# Deep Neural Network in PyTorch

In this notebook, we adapt our [TensorFlow Deep Net](https://github.com/jonkrohn/DLTFpT/blob/master/notebooks/deep_net_in_tensorflow.ipynb) to PyTorch.

#### Load dependencies

In [1]:
import torch
import torch.nn as nn

from torchvision.datasets import MNIST
from torchvision import transforms

from torchsummary import summary

#### Load data

In [2]:
train = MNIST('data', train=True, transform=transforms.ToTensor(), download=True)
test = MNIST('data', train=False, transform=transforms.ToTensor())

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to data/MNIST/raw/train-images-idx3-ubyte.gz
Extracting data/MNIST/raw/train-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to data/MNIST/raw/train-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/train-labels-idx1-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to data/MNIST/raw/t10k-images-idx3-ubyte.gz
Extracting data/MNIST/raw/t10k-images-idx3-ubyte.gz to data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to data/MNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting data/MNIST/raw/t10k-labels-idx1-ubyte.gz to data/MNIST/raw

#### Batch data

In [3]:
train_loader = torch.utils.data.DataLoader(train, batch_size=128) 
test_loader = torch.utils.data.DataLoader(test, batch_size=128) 

#### Design neural network architecture

In [4]:
n_input = 784
n_dense_1 = 64
n_dense_2 = 64
n_dense_3 = 64
n_out = 10
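
The value `n_input = 784` reflects each 28 × 28 pixel MNIST image being flattened into a vector before it reaches the first `nn.Linear` layer. Here is a minimal sketch of that reshaping, using a hypothetical all-zeros batch in place of real data (the batch size of 128 matches the loaders above):

```python
import torch

# Hypothetical stand-in for one batch from train_loader:
# 128 images, 1 channel, 28 x 28 pixels.
X = torch.zeros(128, 1, 28, 28)

# Keep the batch dimension and collapse everything else into one axis.
X_flat = X.view(X.shape[0], -1)

# torch.flatten with start_dim=1 should give the same shape.
X_flat_alt = torch.flatten(X, start_dim=1)

print(X_flat.shape, X_flat_alt.shape)
```

Both calls leave the batch dimension alone, so a smaller final batch is handled without any changes.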

In [5]:
model = nn.Sequential(
    
    # first hidden layer: 
    nn.Linear(n_input, n_dense_1), 
    nn.ReLU(), 
    nn.BatchNorm1d(n_dense_1),
    
    # second hidden layer: 
    nn.Linear(n_dense_1, n_dense_2), 
    nn.ReLU(), 
    nn.BatchNorm1d(n_dense_2),
    
    # third hidden layer: 
    nn.Linear(n_dense_2, n_dense_3), 
    nn.ReLU(), 
    nn.BatchNorm1d(n_dense_3),
    nn.Dropout(),  
    
    # output layer: 
    nn.Linear(n_dense_3, n_out) 
)

In [7]:
summary(model, input_size=(n_input,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 64]          50,240
              ReLU-2                   [-1, 64]               0
       BatchNorm1d-3                   [-1, 64]             128
            Linear-4                   [-1, 64]           4,160
              ReLU-5                   [-1, 64]               0
       BatchNorm1d-6                   [-1, 64]             128
            Linear-7                   [-1, 64]           4,160
              ReLU-8                   [-1, 64]               0
       BatchNorm1d-9                   [-1, 64]             128
          Dropout-10                   [-1, 64]               0
           Linear-11                   [-1, 10]             650
Total params: 59,594
Trainable params: 59,594
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/ba
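
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-output pair plus one bias per output, and a batch-norm layer has one learnable scale and one learnable shift per feature. The helper names below are hypothetical and are not part of `torchsummary`:

```python
# Layer widths copied from the notebook.
n_input, n_dense_1, n_dense_2, n_dense_3, n_out = 784, 64, 64, 64, 10

def linear_params(n_in, n_units):
    """Weights (n_in * n_units) plus one bias per unit."""
    return n_in * n_units + n_units

def batchnorm_params(n_features):
    """One learnable scale and one learnable shift per feature."""
    return 2 * n_features

counts = [
    linear_params(n_input, n_dense_1),    # Linear-1
    batchnorm_params(n_dense_1),          # BatchNorm1d-3
    linear_params(n_dense_1, n_dense_2),  # Linear-4
    batchnorm_params(n_dense_2),          # BatchNorm1d-6
    linear_params(n_dense_2, n_dense_3),  # Linear-7
    batchnorm_params(n_dense_3),          # BatchNorm1d-9
    linear_params(n_dense_3, n_out),      # Linear-11
]
total = sum(counts)
print(counts, total)
```

ReLU and dropout layers contribute no parameters, which is why they show `0` in the table.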

#### Configure training hyperparameters

In [8]:
cost_fxn = nn.CrossEntropyLoss() # applies log-softmax internally, so the model outputs raw logits
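
Because the softmax is folded into the loss, the model above ends in a plain `nn.Linear` layer with no activation. As a rough check, `nn.CrossEntropyLoss` should match a log-softmax followed by the negative log-likelihood loss. The logits and labels here are made-up values for illustration only:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)          # hypothetical raw model outputs
labels = torch.tensor([3, 0, 9, 1])  # hypothetical class labels

# Built-in loss applied directly to the raw logits.
loss_builtin = nn.CrossEntropyLoss()(logits, labels)

# The same quantity assembled from its two pieces.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), labels)

print(torch.allclose(loss_builtin, loss_manual))
```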

In [9]:
optimizer = torch.optim.Adam(model.parameters())
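
Calling `torch.optim.Adam` with only the parameters relies on the optimizer's default hyperparameters. The sketch below inspects them on a hypothetical one-layer stand-in for the notebook's model. The explicit values are the defaults as we understand them, so check them against your installed PyTorch version:

```python
import torch
import torch.nn as nn

toy_model = nn.Linear(784, 10)  # hypothetical stand-in for `model`

# As in the notebook: rely on the defaults.
optimizer_default = torch.optim.Adam(toy_model.parameters())

# The same configuration with the main hyperparameters spelled out.
optimizer_explicit = torch.optim.Adam(
    toy_model.parameters(), lr=1e-3, betas=(0.9, 0.999)
)

print(optimizer_default.defaults['lr'], optimizer_default.defaults['betas'])
```

Spelling the values out makes a later learning-rate experiment a one-line change.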

#### Train

In [10]:
def accuracy_pct(pred_y, true_y):
  _, prediction = torch.max(pred_y, 1) # max over dim 1 returns (values, indices); the indices are the predicted classes
  correct = (prediction == true_y).sum().item()
  return (correct / true_y.shape[0]) * 100.0
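
To see what `accuracy_pct` returns, here is a small usage sketch. The function is repeated so the block runs on its own. The two-class scores and labels are invented for illustration:

```python
import torch

def accuracy_pct(pred_y, true_y):
  _, prediction = torch.max(pred_y, 1)  # index of the largest score per row
  correct = (prediction == true_y).sum().item()
  return (correct / true_y.shape[0]) * 100.0

# Hypothetical scores for four samples and two classes.
scores = torch.tensor([[0.1, 0.9],
                       [0.8, 0.2],
                       [0.3, 0.7],
                       [0.6, 0.4]])
labels = torch.tensor([1, 0, 0, 0])

# Predicted classes are [1, 0, 1, 0], so three of four match the labels.
print(accuracy_pct(scores, labels))  # 75.0
```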

In [11]:
n_batches = len(train_loader)
n_batches

469
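
The figure of 469 follows from the 60,000 MNIST training images and the batch size of 128, assuming the `DataLoader` keeps the final partial batch (its default behavior, since `drop_last` was not set):

```python
import math

n_train = 60000   # size of the MNIST training set
batch_size = 128  # as passed to DataLoader above

# Round up so the final, smaller batch is counted.
n_batches = math.ceil(n_train / batch_size)

# Images left over for the last batch.
last_batch_size = n_train - (n_batches - 1) * batch_size

print(n_batches, last_batch_size)  # 469 96
```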

In [12]:
n_epochs = 10 

print('Training for {} epochs. \n'.format(n_epochs))

for epoch in range(n_epochs):
  
  avg_cost = 0.0
  avg_accuracy = 0.0
  
  for i, (X, y) in enumerate(train_loader): # enumerate() provides count of iterations  
    
    # forward propagation:
    X_flat = X.view(X.shape[0], -1)
    y_hat = model(X_flat)
    cost = cost_fxn(y_hat, y)
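    avg_cost += cost.item() / n_batches # .item() yields a Python float, so the running average does not hold on to the autograd graph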
    
    # backprop and optimization via gradient descent: 
    optimizer.zero_grad() # set gradients to zero; .backward() accumulates them in buffers
    cost.backward()
    optimizer.step()
    
    # calculate accuracy metric:
    accuracy = accuracy_pct(y_hat, y)
    avg_accuracy += accuracy / n_batches
    
    if (i + 1) % 100 == 0:
      print('Step {}'.format(i + 1))
    
  print('Epoch {}/{} complete. Cost: {:.3f}, Accuracy: {:.1f}% \n'
        .format(epoch + 1, n_epochs, avg_cost, avg_accuracy)) 

print('Training complete.')

Training for 10 epochs. 

Step 100
Step 200
Step 300
Step 400
Epoch 1/10 complete. Cost: 0.407, Accuracy: 88.9% 

Step 100
Step 200
Step 300
Step 400
Epoch 2/10 complete. Cost: 0.159, Accuracy: 95.5% 

Step 100
Step 200
Step 300
Step 400
Epoch 3/10 complete. Cost: 0.116, Accuracy: 96.7% 

Step 100
Step 200
Step 300
Step 400
Epoch 4/10 complete. Cost: 0.092, Accuracy: 97.4% 

Step 100
Step 200
Step 300
Step 400
Epoch 5/10 complete. Cost: 0.077, Accuracy: 97.7% 

Step 100
Step 200
Step 300
Step 400
Epoch 6/10 complete. Cost: 0.066, Accuracy: 98.1% 

Step 100
Step 200
Step 300
Step 400
Epoch 7/10 complete. Cost: 0.057, Accuracy: 98.3% 

Step 100
Step 200
Step 300
Step 400
Epoch 8/10 complete. Cost: 0.052, Accuracy: 98.4% 

Step 100
Step 200
Step 300
Step 400
Epoch 9/10 complete. Cost: 0.046, Accuracy: 98.6% 

Step 100
Step 200
Step 300
Step 400
Epoch 10/10 complete. Cost: 0.039, Accuracy: 98.8% 

Training complete.
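The training loop calls `optimizer.zero_grad()` on every step because `.backward()` adds to whatever gradients are already stored. This is a minimal sketch of that accumulation on a single hypothetical scalar parameter `w`, outside of any optimizer:

```python
import torch

w = torch.tensor(2.0, requires_grad=True)  # hypothetical lone parameter

(3 * w).backward()          # d(3w)/dw = 3
first = w.grad.item()       # 3.0

(3 * w).backward()          # a second backward pass adds another 3
second = w.grad.item()      # 6.0

w.grad.zero_()              # what zero_grad() does for each parameter, in effect
after_reset = w.grad.item() # 0.0

print(first, second, after_reset)
```

Without the reset, each step would update the weights using the sum of all gradients seen so far rather than the current batch's gradient.
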


#### Test model

In [13]:
n_test_batches = len(test_loader)
n_test_batches

79
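
With 10,000 test images and a batch size of 128, the 79th batch holds only 16 images. Averaging metrics per batch, as the next cell does, therefore gives those 16 images the same weight as a full batch of 128. The per-batch accuracies below are hypothetical and chosen only to make the effect visible:

```python
# 78 full batches of 128 plus a final batch of 16 covers 10,000 images.
batch_sizes = [128] * 78 + [16]

# Hypothetical accuracies (%): perfect on full batches, 50% on the last one.
batch_accs = [100.0] * 78 + [50.0]

# Every batch counts equally (the approach used in this notebook).
per_batch_mean = sum(batch_accs) / len(batch_accs)

# Every image counts equally.
per_sample_mean = (
    sum(acc * n for acc, n in zip(batch_accs, batch_sizes)) / sum(batch_sizes)
)

print(round(per_batch_mean, 2), round(per_sample_mean, 2))
```

On real MNIST results the gap is usually much smaller than in this contrived case, but weighting by batch size gives the exact dataset-level figure.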

In [14]:
model.eval() # disables dropout; batch norm switches to its running statistics instead of per-batch statistics

with torch.no_grad(): # disables autograd, reducing memory consumption
  
  avg_test_cost = 0.0
  avg_test_acc = 0.0
  
  for X, y in test_loader:
    
    # make predictions: 
    X_flat = X.view(X.shape[0], -1)
    y_hat = model(X_flat)
    
    # calculate cost: 
    cost = cost_fxn(y_hat, y)
    avg_test_cost += cost.item() / n_test_batches # .item() yields a Python float
    
    # calculate accuracy:
    test_accuracy = accuracy_pct(y_hat, y)
    avg_test_acc += test_accuracy / n_test_batches

print('Test cost: {:.3f}, Test accuracy: {:.1f}%'.format(avg_test_cost, avg_test_acc))

# model.train() # 'undoes' model.eval()

Test cost: 0.099, Test accuracy: 97.5%
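
The call to `model.eval()` matters because dropout and batch norm behave differently at inference time. As a small sketch on freshly created layers (not the trained model above), dropout becomes the identity and batch norm normalizes with its stored running statistics, which start at mean 0 and variance 1:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 4)      # hypothetical batch: 8 samples, 4 features

dropout = nn.Dropout()     # drop probability 0.5 by default
bn = nn.BatchNorm1d(4)     # untrained: running mean 0, running variance 1

dropout.eval()
bn.eval()

out_dropout = dropout(x)   # identity in eval mode
out_bn = bn(x)             # uses running statistics, not this batch's

# With running mean 0, running variance 1, scale 1 and shift 0,
# eval-mode batch norm reduces to x / sqrt(1 + eps).
expected_bn = x / torch.sqrt(torch.tensor(1.0 + bn.eps))

print(torch.equal(out_dropout, x), torch.allclose(out_bn, expected_bn))
```

Calling `model.train()` afterwards restores the training-time behavior of both layer types.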
