---
layout: code-post
title: First look at PyTorch
tags: [neural net]
---

I've never used PyTorch before, so I'm going to play around with
some simple multilayer perceptrons to get started. Much of this code is adapted
from PyTorch's [60 Minute Blitz](https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.html) tutorial.
This is a very simple walkthrough of creating a `Dataset` and then training a neural net on it. There's also
an example of a class that allows for a variable architecture.

In [225]:
import torch
import torch.nn as nn
import torch.nn.functional as F

## Creating a simple MLP

In previous notebooks, we were playing around with neural nets that have the following
architecture. The linear layers all have biases by default, but we can also set
`bias=False` to remove those.

In [226]:
class Net(nn.Module):

    def __init__(self, random_state=47):
        super(Net, self).__init__()
        torch.manual_seed(random_state)
        self.fc1 = nn.Linear(2, 9)
        self.fc2 = nn.Linear(9, 9)
        self.fc3 = nn.Linear(9, 9)
        self.fc4 = nn.Linear(9, 9)
        self.fc5 = nn.Linear(9, 1)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.relu(self.fc4(x))
        x = torch.sigmoid(self.fc5(x))
        return x


In [227]:
net = Net()

Let's see this `Net`'s parameters.

In [228]:
list(net.parameters())

[Parameter containing:
 tensor([[-0.6321, -0.6365],
         [-0.0457,  0.5313],
         [ 0.0793,  0.4220],
         [ 0.6728, -0.3561],
         [-0.4993, -0.0926],
         [ 0.2811,  0.5492],
         [-0.3340, -0.3312],
         [-0.5127, -0.0552],
         [ 0.3450, -0.6575]], requires_grad=True),
 Parameter containing:
 tensor([-0.5059, -0.1334,  0.0482,  0.1219, -0.4993, -0.2885, -0.3198,  0.3339,
          0.5823], requires_grad=True),
 Parameter containing:
 tensor([[-0.1811, -0.2940, -0.1910, -0.0098,  0.1716, -0.0090, -0.2765,  0.2010,
           0.1643],
         [-0.0962, -0.0630, -0.3145, -0.0620, -0.1136, -0.1233,  0.0276, -0.1000,
          -0.1335],
         [-0.0280, -0.1061, -0.0684, -0.1077, -0.0805, -0.2235, -0.3329,  0.0497,
          -0.1480],
         [-0.0531,  0.1734, -0.1789, -0.1540, -0.0209,  0.2476, -0.0416,  0.2050,
          -0.2495],
         [-0.1266,  0.1427,  0.2202, -0.1647,  0.1948, -0.3059, -0.0061, -0.1883,
           0.2310],
         [ 0.1373

And let's make a prediction.

In [229]:
net(torch.tensor([[1., 2.], [2., 3.]]))

tensor([[0.4689],
        [0.4699]], grad_fn=<SigmoidBackward>)
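
To make the `bias=False` remark above concrete, here is a minimal sketch comparing a single `nn.Linear(2, 9)` layer with and without a bias. The variable names are only for illustration and are not part of the `Net` class above.

```python
import torch.nn as nn

# The same 2 -> 9 layer as fc1, once with the default bias and once without.
with_bias = nn.Linear(2, 9)
no_bias = nn.Linear(2, 9, bias=False)

# named_parameters() yields (name, tensor) pairs for each registered parameter.
names_with = [name for name, _ in with_bias.named_parameters()]
names_without = [name for name, _ in no_bias.named_parameters()]

# Count the scalar entries: 2*9 weights, plus 9 biases when bias is enabled.
count_with = sum(p.numel() for p in with_bias.parameters())
count_without = sum(p.numel() for p in no_bias.parameters())

print(names_with, count_with)        # ['weight', 'bias'] 27
print(names_without, count_without)  # ['weight'] 18
```

With `bias=False` the layer's `bias` attribute is `None`, so it simply drops out of `parameters()`.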

## Loading Data

The cleanest way to use data with PyTorch -- I'm guessing here -- is to
use the classes native to `torch.utils.data`: wrap the data in a `Dataset`
(instead of, say, a numpy `ndarray` or a pandas `DataFrame`) and load
batches from it using a `DataLoader`. Our data starts out in a pandas
`DataFrame`, so we'll try to make a `Dataset` from it.

In [230]:
import numpy as np
import pandas as pd

def normalize_data(df, col_names=['x_1', 'x_2']):
    """ return normalized x_1 and x_2 columns """
    return (df[col_names] - df[col_names].mean()) / df[col_names].std()

def train_data(random_seed=3):

    np.random.seed(random_seed)
    
    def rad_sq(array):
        return array[:, 0]**2 + array[:, 1]**2

    data_pos_ = np.random.normal(0, .75, size=(100, 2))
    data_pos = data_pos_[rad_sq(data_pos_) < 4]

    data_neg_ = np.random.uniform(-5, 5, size=(1000, 2))
    data_neg = data_neg_[(rad_sq(data_neg_) > 6.25) & (rad_sq(data_neg_) < 16)]

    data = np.concatenate((data_pos, data_neg), axis=0)
    y = np.concatenate((np.ones(data_pos.shape[0]), np.zeros(data_neg.shape[0])), axis=0)

    df = pd.DataFrame({
        'x_1': data[:, 0]
        ,'x_2': data[:, 1]
        ,'const': 1.0
        ,'y': y
    })
    
    df[['x_1_norm', 'x_2_norm']] = normalize_data(df)
    
    return df

In [231]:
train_df = train_data()
train_df.head()

|   | x_1 | x_2 | const | y | x_1_norm | x_2_norm |
|---|-----|-----|-------|---|----------|----------|
| 0 | 1.341471 | 0.327382 | 1.0 | 1.0 | 0.595074 | 0.205659 |
| 1 | 0.072373 | -1.39762 | 1.0 | 1.0 | -0.001979 | -0.650942 |
| 2 | -0.208041 | -0.266069 | 1.0 | 1.0 | -0.133901 | -0.089037 |
| 3 | -0.062056 | -0.470251 | 1.0 | 1.0 | -0.065222 | -0.190429 |
| 4 | -0.032864 | -0.357914 | 1.0 | 1.0 | -0.051488 | -0.134645 |


The `TrainDataset` class will subclass `Dataset`. We need to fill
in the `__init__`, `__len__`, and `__getitem__` methods. We can then
pass this dataset to `DataLoader`, which will know how to batch
and shuffle it. We are relying on the fact that a `DataFrame`
can retrieve rows by integer position (via `.iloc`). If we had no such index
and only an iterable stream of examples, we could have used `IterableDataset` instead.

In [236]:
from torch.utils.data import Dataset, DataLoader

class TrainDataset(Dataset):
    
    def __init__(self, df):
        """ df contains the data that we want to use
        it should have x_1_norm, x_2_norm, and y columns """
        self.df = df[['x_1_norm', 'x_2_norm', 'y']].copy()
        
    def __len__(self):
        return self.df.shape[0]
    
    def __getitem__(self, idx):
        if torch.is_tensor(idx):
            idx = idx.tolist()
            
        samples = self.df[['x_1_norm', 'x_2_norm']].iloc[idx].values
        labels = self.df[['y']].iloc[idx].values
        
        return samples, labels

In [237]:
trainset = TrainDataset(train_df)

Let's print out a random sample from this.

In [240]:
trainloader_4 = DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)
trainiter = iter(trainloader_4)  # iterate over the loader defined on the line above

print('length of dataset:', len(trainset))
next(trainiter)  # the built-in next() fetches the first batch

length of dataset: 393


[tensor([[ 0.6871,  1.7872],
         [ 1.0827, -0.3425],
         [ 1.1661, -0.5120],
         [ 0.6863, -1.4712]], dtype=torch.float64),
 tensor([[0.],
         [0.],
         [0.],
         [0.]], dtype=torch.float64)]
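
For comparison, here is a rough sketch of the `IterableDataset` route mentioned above, for data that can only be streamed rather than indexed. The `StreamDataset` class and the `rows` list are hypothetical stand-ins, not part of this notebook's data.

```python
import torch
from torch.utils.data import IterableDataset, DataLoader

class StreamDataset(IterableDataset):
    """Hypothetical dataset that yields (sample, label) pairs from any iterable."""

    def __init__(self, rows):
        # rows: an iterable of (x_1, x_2, y) tuples; no indexing is required
        self.rows = rows

    def __iter__(self):
        for x_1, x_2, y in self.rows:
            sample = torch.tensor([x_1, x_2], dtype=torch.float32)
            label = torch.tensor([y], dtype=torch.float32)
            yield sample, label

# Made-up rows, only to show the batching behaviour.
rows = [(0.1, -0.2, 1.0), (2.5, 1.0, 0.0), (-3.0, 0.5, 0.0),
        (0.0, 0.3, 1.0), (1.0, 1.0, 1.0)]

# Shuffling is left off: a DataLoader cannot shuffle an IterableDataset itself.
loader = DataLoader(StreamDataset(rows), batch_size=2)
batches = list(loader)

for samples, labels in batches:
    print(samples.shape, labels.shape)
```

Since there is no `__len__` or `__getitem__`, the loader just consumes the stream in order and groups it into batches; the last batch here holds the single leftover row.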

## Train the Neural Net

With a neural net and training dataset ready to go, let's train the neural net
using the Adam optimizer, which we import from `torch.optim`. We will
use the mean squared error loss. We'll replace the trainloader from above
with one that uses a larger batch size of 50.

In [262]:
from torch.optim import Adam

net = Net()
trainloader = DataLoader(trainset, batch_size=50, shuffle=True, num_workers=0)
optimizer = Adam(net.parameters())
criterion = nn.MSELoss()

for epoch in range(1000): 

    running_loss = 0.0
    num_wrong = 0.0
    for i, data in enumerate(trainloader, 0):
        samples, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = net(samples.float())
        loss = criterion(outputs.float(), labels.float())
        loss.backward()
        optimizer.step()

        predictions = 1 * (outputs >= 0.5) + 0 * (outputs < 0.5)
        
        num_wrong += torch.sum(torch.abs(predictions - labels))
        running_loss += loss.item()
        
    class_error = num_wrong / len(trainset)
    # note: `loss` here is the loss of the last batch in the epoch, not an epoch average
    print('epoch: {} - loss: {:.3f} - classification error: {:.3f}'.format(epoch+1, loss, class_error))
    
    if class_error == 0.0:
        break

print('Finished Training')


epoch: 1 - loss: 0.240 - classification error: 0.249
epoch: 2 - loss: 0.231 - classification error: 0.249
epoch: 3 - loss: 0.226 - classification error: 0.249
epoch: 4 - loss: 0.219 - classification error: 0.249
epoch: 5 - loss: 0.246 - classification error: 0.249
epoch: 6 - loss: 0.214 - classification error: 0.249
epoch: 7 - loss: 0.215 - classification error: 0.249
epoch: 8 - loss: 0.205 - classification error: 0.249
epoch: 9 - loss: 0.215 - classification error: 0.249
epoch: 10 - loss: 0.238 - classification error: 0.249
epoch: 11 - loss: 0.168 - classification error: 0.249
epoch: 12 - loss: 0.184 - classification error: 0.249
epoch: 13 - loss: 0.191 - classification error: 0.249
epoch: 14 - loss: 0.169 - classification error: 0.249
epoch: 15 - loss: 0.166 - classification error: 0.249
epoch: 16 - loss: 0.187 - classification error: 0.249
epoch: 17 - loss: 0.174 - classification error: 0.249
epoch: 18 - loss: 0.139 - classification error: 0.249
epoch: 19 - loss: 0.173 - classificat

And there we have it: training finished after 65 epochs, once the classification error reached 0.
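
As a small sanity check of what the training loop measures, here is a sketch with made-up outputs and labels (not values from the run above). It compares `nn.MSELoss` against the mean of squared differences computed by hand, and counts misclassifications using the same 0.5 threshold.

```python
import torch
import torch.nn as nn

# Made-up sigmoid outputs and 0/1 labels, shaped (batch, 1) like in the loop above.
outputs = torch.tensor([[0.9], [0.2], [0.6], [0.4]])
labels = torch.tensor([[1.], [0.], [0.], [1.]])

# nn.MSELoss with its default reduction averages the squared differences.
loss = nn.MSELoss()(outputs, labels)
manual = ((outputs - labels) ** 2).mean()

# Threshold at 0.5 to get hard class predictions, then count the mistakes.
predictions = (outputs >= 0.5).float()
num_wrong = (predictions != labels).sum().item()

print(loss.item(), manual.item())  # both approximately 0.1925
print(num_wrong)                   # 2
```

The squared errors are 0.01, 0.04, 0.36 and 0.36, which average to 0.1925, and the last two rows land on the wrong side of the threshold.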

## Variable architecture Net

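Can we create a class that allows us to specify the number and width of
layers upon creation? Because the layers live in a plain Python list rather
than being assigned directly as attributes of `self`, PyTorch does not register
them automatically. One way around this is to register each layer with the
`add_module` function.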

In [97]:
class ReLuNet(nn.Module):
    
    
    def __init__(self, layer_widths, random_state=47):
        """ layer_widths should include the input and out layer widths."""
        torch.manual_seed(random_state)
        super(ReLuNet, self).__init__()
        self.layers = [
            nn.Linear(layer_widths[i], layer_widths[i+1])
            for i in range(len(layer_widths)-1)
        ]
        for i in range(len(self.layers)):
            self.add_module("hidden layer " + str(i), self.layers[i])
        
        
    def forward(self, x):
        for i in range(len(self.layers)-1):
            x = F.relu(self.layers[i](x))
        x = torch.sigmoid(self.layers[-1](x))
        return x
        

In [98]:
relu_net = ReLuNet([2, 9, 9, 9, 9, 1])

In [99]:
list(relu_net.parameters())

[Parameter containing:
 tensor([[-0.6321, -0.6365],
         [-0.0457,  0.5313],
         [ 0.0793,  0.4220],
         [ 0.6728, -0.3561],
         [-0.4993, -0.0926],
         [ 0.2811,  0.5492],
         [-0.3340, -0.3312],
         [-0.5127, -0.0552],
         [ 0.3450, -0.6575]], requires_grad=True),
 Parameter containing:
 tensor([-0.5059, -0.1334,  0.0482,  0.1219, -0.4993, -0.2885, -0.3198,  0.3339,
          0.5823], requires_grad=True),
 Parameter containing:
 tensor([[-0.1811, -0.2940, -0.1910, -0.0098,  0.1716, -0.0090, -0.2765,  0.2010,
           0.1643],
         [-0.0962, -0.0630, -0.3145, -0.0620, -0.1136, -0.1233,  0.0276, -0.1000,
          -0.1335],
         [-0.0280, -0.1061, -0.0684, -0.1077, -0.0805, -0.2235, -0.3329,  0.0497,
          -0.1480],
         [-0.0531,  0.1734, -0.1789, -0.1540, -0.0209,  0.2476, -0.0416,  0.2050,
          -0.2495],
         [-0.1266,  0.1427,  0.2202, -0.1647,  0.1948, -0.3059, -0.0061, -0.1883,
           0.2310],
         [ 0.1373

And yeah, this works and we can create predictions. (The prediction is the 
same as the initial state of the `net` created above since we used
the same random state.)

In [101]:
relu_net(torch.tensor([1., 2.]))

tensor([0.4689], grad_fn=<SigmoidBackward>)

And of course a different random state produces different results.

In [105]:
relu_net_2 = ReLuNet([2, 9, 9, 9, 9, 1], random_state=1)
relu_net_2(torch.tensor([1., 2.]))

tensor([0.4216], grad_fn=<SigmoidBackward>)
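
As an aside on the registration step in `ReLuNet`: a possible alternative to calling `add_module` in a loop is to store the layers in an `nn.ModuleList`, which registers each layer as it is added. The sketch below is only that, a sketch; `ModuleListNet` and `PlainListNet` are hypothetical names, and `PlainListNet` exists just to show what happens when a plain list is used with no registration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModuleListNet(nn.Module):
    """Hypothetical variant of ReLuNet that uses nn.ModuleList for registration."""

    def __init__(self, layer_widths, random_state=47):
        super().__init__()
        torch.manual_seed(random_state)
        # nn.ModuleList registers every layer it holds as a submodule.
        self.layers = nn.ModuleList([
            nn.Linear(layer_widths[i], layer_widths[i + 1])
            for i in range(len(layer_widths) - 1)
        ])

    def forward(self, x):
        for layer in self.layers[:-1]:
            x = F.relu(layer(x))
        return torch.sigmoid(self.layers[-1](x))

class PlainListNet(nn.Module):
    """Layers kept in a plain list and never registered."""

    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(2, 9), nn.Linear(9, 1)]

module_list_net = ModuleListNet([2, 9, 9, 9, 9, 1])
n_tensors = len(list(module_list_net.parameters()))
n_params = sum(p.numel() for p in module_list_net.parameters())
n_plain = len(list(PlainListNet().parameters()))

print(n_tensors, n_params)  # 10 307
print(n_plain)              # 0
out = module_list_net(torch.tensor([1., 2.]))
```

The plain-list version reports no parameters at all, so an optimizer built from `parameters()` would have nothing to update; that is the gap that `add_module` (or `nn.ModuleList`) closes.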