# 8. Feedforward Neural Network with PyTorch

## 1. About Feedforward Neural Network

### 1.1 Logistic Regression Transition to Neural Networks

#### Logistic Regression Review

<img src="./images/07-01.png">

In [1]:
import torch
import torch.nn as nn

In [2]:
class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LogisticRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)
    def forward(self, x):
        out = self.linear(x)
        return out

In [3]:
input_dim = 28 * 28
output_dim = 10

model = LogisticRegressionModel(input_dim, output_dim)

In [4]:
print (model)

LogisticRegressionModel(
  (linear): Linear(in_features=784, out_features=10, bias=True)
)


**Logistic Regression Problems**

- Can represent **linear** functions well
    - $y = 2x + 3$
    - $y = x_1 + x_2$
    - $y = x_1 + 3x_2 + 4x_3$
- Cannot represent **non-linear** functions
    - $y = 4x_1 + 2x_2^2 + 3x_3^3$
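
Stacking more linear layers does not fix this on its own. Composing two linear (affine) maps gives another linear map, so depth buys nothing unless a non-linearity sits between the layers. The minimal pure-Python sketch below checks this numerically. `linear` and `matmul` are helpers written for this example (not PyTorch functions), and the weights are arbitrary illustrative values.

```python
# Two stacked linear layers collapse into a single linear layer.
# Weights are row lists shaped (out_features, in_features), like nn.Linear.weight.

def linear(x, W, b):
    """Compute W @ x + b for a single input vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def matmul(A, B):
    """Plain matrix product A @ B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

# Layer 1: 3 inputs -> 2 outputs, layer 2: 2 inputs -> 1 output (made-up values)
W1, b1 = [[1.0, 3.0, 0.0], [0.0, 1.0, 4.0]], [1.0, -2.0]
W2, b2 = [[2.0, -1.0]], [0.5]

# Equivalent single layer: W = W2 @ W1, b = W2 @ b1 + b2
W = matmul(W2, W1)
b = linear(b1, W2, b2)

x = [1.0, 2.0, 3.0]
two_layers = linear(linear(x, W1, b1), W2, b2)
one_layer = linear(x, W, b)
print(two_layers, one_layer)  # [4.5] [4.5]
```

Because the collapsed layer reproduces the two-layer output exactly, a term such as $2x_2^2$ stays out of reach no matter how many linear layers are stacked.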

### 1.2 Introducing a Non-linear Function

<img src="./images/08-01.png">

### 1.3 Non-linear Function In-Depth

- Function: takes a number and performs a mathematical operation on it
- Common Types of Non-linearity
    - ReLUs (Rectified Linear Units)
    - Sigmoid
    - Tanh

#### Sigmoid (Logistic)

- $\sigma(x) = \frac{1}{1+e^{-x}}$ where $x$ are logits
- input number $\rightarrow$ [0, 1]
    - Large negative number $\rightarrow$ 0
    - Large positive number $\rightarrow$ 1
- Cons:
    1. Activation saturates at 0 or 1 with **gradients** $\approx$ 0
        - No signal to update weights $\rightarrow$ **cannot learn**
        - **Solution**: Have to carefully initialize weights to prevent this
    2. Outputs not centered around 0
        - If the outputs feeding the next layer are always positive $\rightarrow$ the gradients on that layer's weights are all positive or all negative $\rightarrow$ **bad for gradient updates** (updates zig-zag)
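
Saturation is easy to see from the derivative of the sigmoid, $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which peaks at 0.25 when $x = 0$ and shrinks toward 0 for large $|x|$. A small pure-Python sketch (the sample points are arbitrary):

```python
import math

def sigmoid(x):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [-10.0, -2.0, 0.0, 2.0, 10.0]:
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.6f}")
```

At $x = \pm 10$ the gradient is already below $10^{-4}$, and every output is positive, which illustrates both cons listed above.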

#### Tanh
- $\tanh(x) = 2\sigma(2x) - 1$ where $\sigma$ is the Sigmoid
    - A scaled and shifted sigmoid function
- Input number $\rightarrow$ [-1, 1]
- Cons:
    1. Activation saturates at -1 or 1 with **gradients** $\approx$ 0
        - No signal to update weights $\rightarrow$ **cannot learn**
        - **Solution**: Have to carefully initialize weights to prevent this
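
The identity $\tanh(x) = 2\sigma(2x) - 1$ and the saturation of the gradient $1 - \tanh^2(x)$ can both be checked numerically. A sketch using only Python's `math` module; `tanh_via_sigmoid` and `tanh_grad` are helper names chosen for this example:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh written as a scaled, shifted sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)**2 (largest value 1, at x = 0)."""
    return 1.0 - math.tanh(x) ** 2

for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
    print(f"x={x:5.1f}  tanh={math.tanh(x):+.5f}  "
          f"2*sigmoid(2x)-1={tanh_via_sigmoid(x):+.5f}  grad={tanh_grad(x):.5f}")
```

Unlike the sigmoid, $\tanh(0) = 0$, so its outputs are centered around 0; the saturation problem at the extremes remains.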

#### ReLUs

- $f(x) = \max(0, x)$
- Pros:
    1. Accelerates convergence $\rightarrow$ **train faster**
    2. **Less computationally expensive operation** compared to Sigmoid/Tanh exponentials
- Cons:
    1. Many ReLU units "die" $\rightarrow$ **gradients = 0** forever, meaning no update for parameters
        - **Solution**: careful learning rate choice
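
A "dead" ReLU can be illustrated with a single unit. The sketch below assumes a hypothetical unit whose bias was pushed to a large negative value (for example by an overly large update); every pre-activation is then negative, the ReLU gradient is 0 for all inputs, and gradient descent can no longer move `w` or `b`. This sketch uses gradient 0 at exactly $x = 0$.

```python
def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def relu_grad(x):
    """Gradient of ReLU w.r.t. its input: 1 for x > 0, else 0 (0 chosen at x == 0)."""
    return 1.0 if x > 0 else 0.0

# Hypothetical unit after a bad update: bias far below zero
w, b = 1.0, -100.0
inputs = [0.0, 0.5, 1.0, 2.0]      # example inputs, assumed to lie in [0, 2]

outputs = [relu(w * x + b) for x in inputs]
grads = [relu_grad(w * x + b) for x in inputs]
print(outputs, grads)  # all zeros: no signal reaches w or b
```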

## 2. Building a Feedforward Neural Network with PyTorch

### Model A: 1 Hidden Layer Feedforward Neural Network (Sigmoid Activation)

<img src="./images/08-02.png">

### Steps
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

### Step 1: Loading MNIST Train Dataset
**Images of handwritten digits 0 to 9**

In [5]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [6]:
train_dataset = dsets.MNIST(root = './data',
                            train = True,
                            transform = transforms.ToTensor(),
                            download = True)

test_dataset = dsets.MNIST(root = './data',
                           train = False,
                           transform = transforms.ToTensor())

### Step 2: Make Dataset Iterable

In [7]:
60000 / 100

600.0

In [8]:
3000 / 600

5.0

In [9]:
batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
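
The epoch arithmetic from the cells above (600 batches per epoch, 3000 / 600 = 5 epochs) can be wrapped in a small function. `compute_num_epochs` is a hypothetical helper name; the formula is the one used in the code.

```python
def compute_num_epochs(n_iters, dataset_size, batch_size):
    """Convert a target number of iterations into whole epochs."""
    iters_per_epoch = dataset_size / batch_size   # 60000 / 100 = 600.0
    return int(n_iters / iters_per_epoch)         # int() truncates toward zero

print(compute_num_epochs(3000, 60000, 100))  # 3000 / 600 = 5.0  -> 5
print(compute_num_epochs(3500, 60000, 100))  # 3500 / 600 = 5.83 -> 5
```

Because `int()` truncates, asking for 3500 iterations still trains for only 5 full epochs, i.e. 5 × 600 = 3000 iterations.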

### Step 3: Create Model Class

In [10]:
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Non-linearity
        self.sigmoid = nn.Sigmoid()
        # Linear function (readout)
        self.fc2 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # Linear function # LINEAR
        out = self.fc1(x)
        # Non-linearity # NON-LINEAR
        out = self.sigmoid(out)
        # Linear function (readout) # LINEAR
        out = self.fc2(out)
        return out
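
The `forward` method is just LINEAR, NON-LINEAR, LINEAR. `nn.Linear` stores its weight with shape `(out_features, in_features)` and, for a single input vector, computes `W @ x + b`. The pure-Python sketch below repeats that computation by hand for a toy network (input_dim=2, hidden_dim=3, output_dim=2); the weights are illustrative values, not trained ones.

```python
import math

def linear(x, W, b):
    """W @ x + b, with W shaped (out_features, in_features) like nn.Linear.weight."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid(v):
    """Element-wise sigmoid over a list."""
    return [1.0 / (1.0 + math.exp(-vi)) for vi in v]

def forward(x, W1, b1, W2, b2):
    out = linear(x, W1, b1)    # fc1: LINEAR
    out = sigmoid(out)         # NON-LINEAR
    out = linear(out, W2, b2)  # fc2 (readout): LINEAR, returns logits
    return out

W1, b1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
W2, b2 = [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]], [0.0, 1.0]
print(forward([0.0, 0.0], W1, b1, W2, b2))  # hidden = [0.5, 0.5, 0.5] -> [1.5, 1.0]
```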

### Step 4: Instantiate Model Class

- **Input** dimension: **784**
    - Size of image
    - 28 $\times$ 28 = 784
- **Output** dimension: **10**
    - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- **Hidden** dimension: **100**
    - Can be any number
    - Similar term
        - Number of neurons
        - Number of non-linear activation functions
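
The input dimension of 784 comes from flattening each 28 × 28 image into one vector, which is what `images.view(-1, 28*28)` does in the training loop. For a contiguous tensor, `view` keeps row-major order: row 0 first, then row 1, and so on. The sketch below mimics that with a fake image whose pixel values encode their own position (an assumption made only so the ordering is visible).

```python
H, W = 28, 28
# Fake image: pixel value r * 100 + c marks its (row, col) position
image = [[r * 100 + c for c in range(W)] for r in range(H)]

# Row-major flatten, the same element order as view(-1, 28*28)
flat = [pixel for row in image for pixel in row]

print(len(flat))             # 784
print(flat[28], flat[29])    # first two pixels of row 1: 100 101
```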

In [11]:
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

### Step 5: Instantiate Loss Class

- Feedforward Neural Network: **Cross Entropy Loss**
    - *Logistic Regression*: **Cross Entropy Loss**
    - *Linear Regression*: **MSE**
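
`nn.CrossEntropyLoss` takes raw logits and applies log-softmax followed by negative log-likelihood, which is why the model's `forward` returns logits without a softmax layer. For one sample the loss is $-\log \operatorname{softmax}(z)_y$. A pure-Python sketch of that formula (batch averaging, which PyTorch does by default, is left out):

```python
import math

def cross_entropy(logits, label):
    """Negative log of the softmax probability of the true class.
    Subtracting the max logit first (log-sum-exp trick) avoids overflow."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# 10 equal logits -> each digit gets probability 1/10 -> loss = log(10)
print(cross_entropy([0.0] * 10, label=3))          # ~2.302585
# Confident and correct -> small loss
print(cross_entropy([0.0] * 9 + [8.0], label=9))   # ~0.003
# Confident and wrong -> large loss
print(cross_entropy([0.0] * 9 + [8.0], label=0))   # ~8.003
```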

In [12]:
criterion = nn.CrossEntropyLoss()

### Step 6: Instantiate Optimizer Class

- Simplified equation
    - $\theta = \theta - \eta \cdot \nabla_{\theta}$
        - $\theta$: parameters (our variables)
        - $\eta$: learning rate (how fast we want to learn)
        - $\nabla_{\theta}$: parameters' gradients
        
    - Even simpler equation
        - parameters = parameters - learning_rate * parameters_gradients
        - **At every iteration, we update our model's parameters**
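
The update rule $\theta = \theta - \eta \cdot \nabla_{\theta}$ can be tried on a one-parameter toy loss, $L(\theta) = (\theta - 3)^2$ with gradient $2(\theta - 3)$. This is a sketch of plain SGD (no momentum or weight decay); the loss function is an assumption chosen because its minimum is known.

```python
def sgd_step(theta, grad, lr):
    """One plain SGD update: theta = theta - lr * grad."""
    return theta - lr * grad

theta, lr = 0.0, 0.1
theta = sgd_step(theta, 2 * (theta - 3), lr)
first_step = theta
print(round(first_step, 6))   # 0 - 0.1 * (-6) = 0.6

for _ in range(100):          # repeat the update at every iteration
    theta = sgd_step(theta, 2 * (theta - 3), lr)
print(round(theta, 4))        # approaches the minimum at 3.0
```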

In [13]:
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

### Parameters In-Depth

In [14]:
print (model.parameters())

<generator object Module.parameters at 0x113cd9e08>


In [15]:
print (len(list(model.parameters())))

4


In [16]:
# Hidden Layer Parameters
print (list(model.parameters())[0].size())

torch.Size([100, 784])


In [17]:
# FC 1 Bias Parameters
print (list(model.parameters())[1].size())

torch.Size([100])


In [18]:
# FC 2 Parameters
print (list(model.parameters())[2].size())

torch.Size([10, 100])


In [19]:
# FC 2 Bias Parameters
print (list(model.parameters())[3].size())

torch.Size([10])
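
These four sizes follow from how `nn.Linear(in_features, out_features)` stores its parameters: a weight of shape `(out_features, in_features)` and a bias of shape `(out_features,)`. The arithmetic below totals them; in PyTorch the same total is given by `sum(p.numel() for p in model.parameters())`.

```python
def linear_params(in_features, out_features):
    """Weight (out x in) plus bias (out) of one fully connected layer."""
    return out_features * in_features + out_features

fc1 = linear_params(784, 100)   # 78400 weights + 100 biases
fc2 = linear_params(100, 10)    # 1000 weights + 10 biases
print(fc1, fc2, fc1 + fc2)      # 78500 1010 79510
```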


<img src="./images/08-03.png">

### Step 7: Train Model
- Process
    1. Convert inputs/labels to variables
    2. Clear gradient buffers
    3. Get output given inputs
    4. Get loss
    5. Get gradients w.r.t. parameters
    6. Update parameters using gradients
        - parameters = parameters - learning_rate * parameters_gradients
    7. REPEAT
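
Steps 2 to 7 can be made concrete without PyTorch. The sketch below fits $y = 2x + 3$ (the first example in this chapter) with full-batch gradient descent. It assumes MSE loss and hand-derived gradients for simplicity; in the MNIST code, `loss.backward()` replaces the manual gradient lines and `optimizer.step()` replaces the manual update. Step 1 (wrapping data in Variables) has no counterpart here.

```python
# Toy data from y = 2x + 3
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2 * x + 3 for x in xs]

w, b = 0.0, 0.0
learning_rate = 0.1
n = len(xs)

for iteration in range(500):                              # 7. REPEAT
    grad_w, grad_b = 0.0, 0.0                             # 2. clear gradients
    preds = [w * x + b for x in xs]                       # 3. outputs given inputs
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n   # 4. MSE loss
    for x, p, y in zip(xs, preds, ys):                    # 5. gradients w.r.t. w, b
        grad_w += 2 * (p - y) * x / n
        grad_b += 2 * (p - y) / n
    w -= learning_rate * grad_w                           # 6. update parameters
    b -= learning_rate * grad_b

print(round(w, 4), round(b, 4), loss)   # w -> 2.0, b -> 3.0, loss -> ~0
```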

In [20]:
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
                
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')              

Iteration: 500, Loss: 0.7069175243377686, Accuracy: 86.23
Iteration: 1000, Loss: 0.47253838181495667, Accuracy: 89.28
Iteration: 1500, Loss: 0.37439411878585815, Accuracy: 90.57
Iteration: 2000, Loss: 0.40091001987457275, Accuracy: 91.16
Iteration: 2500, Loss: 0.29052475094795227, Accuracy: 91.64
Iteration: 3000, Loss: 0.38746798038482666, Accuracy: 92.01
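
The accuracy in these logs comes from taking the index of the largest logit per row (`torch.max(outputs.data, 1)`) and counting matches with the labels. Softmax preserves ordering, so the largest logit is also the most probable class. A pure-Python sketch of the same bookkeeping, using made-up logits for three samples and three classes (Python's `max` picks the first index on ties):

```python
def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

def accuracy(batch_logits, labels):
    """Percentage of rows whose highest logit matches the label."""
    correct = sum(argmax(row) == y for row, y in zip(batch_logits, labels))
    return 100 * correct / len(labels)

logits = [[0.1, 2.0, -1.0],   # predicted class 1
          [3.0, 0.5, 0.2],    # predicted class 0
          [0.0, 0.1, 0.3]]    # predicted class 2
print(accuracy(logits, [1, 0, 1]))   # 2 of 3 correct -> 66.66...
```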


### Model B: 2 Hidden Layer Feedforward Neural Network (ReLU Activation)

<img src="./images/08-02.png">

### Steps

- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- **Step 3: Create Model Class**
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [24]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform= transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root = './data',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''

class FeedforwardNeuralNetModel(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        
        super(FeedforwardNeuralNetModel, self).__init__()
        
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Non-Linearity 1
        self.relu1 = nn.ReLU()
        
        # Linear function 2: 100 --> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Non-Linearity 2
        self.relu2 = nn.ReLU()
        
        # Linear function 3 (readout): 100 --> 10
        self.fc3 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        # Non-linearity 1
        out = self.relu1(out)
        
        # Linear function 2
        out = self.fc2(out)
        # Non-Linearity 2
        out = self.relu2(out)
        
        # Linear function 3 (readout)
        out = self.fc3(out)
        return out
    
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28 * 28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get prediction from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
                
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')

Iteration: 500, Loss: 0.30187758803367615, Accuracy: 90.4
Iteration: 1000, Loss: 0.39137744903564453, Accuracy: 93.48
Iteration: 1500, Loss: 0.22136268019676208, Accuracy: 94.68
Iteration: 2000, Loss: 0.1438414305448532, Accuracy: 95.6
Iteration: 2500, Loss: 0.1414695680141449, Accuracy: 96.16
Iteration: 3000, Loss: 0.02799205295741558, Accuracy: 96.49


### Model E: 3 Hidden Layer Feedforward Neural Network (ReLU Activation)

<img src="./images/08-04.png">

### Steps

- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- **Step 3: Create Model Class**
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [28]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                            train=True,
                            transform= transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root = './data',
                           train=False,
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 3000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    
    def __init__(self, input_dim, hidden_dim, output_dim):
        
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        # Non-Linearity 1
        self.relu1 = nn.ReLU()
        
        # Linear function 2: 100 --> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Non-Linearity 2
        self.relu2 = nn.ReLU()
        
        # Linear function 3: 100 --> 100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # Non-Linearity 3
        self.relu3 = nn.ReLU()
        
        # Linear function 4 (readout): 100 --> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        # Non-linearity 1
        out = self.relu1(out)
        
        # Linear function 2
        out = self.fc2(out)
        # Non-linearity 2
        out = self.relu2(out)
        
        # Linear function 3
        out = self.fc3(out)
        # Non-linearity 3
        out = self.relu3(out)
        
        # Linear function 4 (readout)
        out = self.fc4(out)
        return out
    
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28 * 28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''

iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy
            correct = 0
            total = 0
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1, 28*28))
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get prediction from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                # Total correct predictions
                correct += (predicted == labels).sum()
                
            accuracy = 100 * int(correct) / int(total)
            
            # Print Loss
            print (f'Iteration: {iter}, Loss: {loss.item()}, Accuracy: {accuracy}')

Iteration: 500, Loss: 0.25381138920783997, Accuracy: 89.61
Iteration: 1000, Loss: 0.36012136936187744, Accuracy: 93.02
Iteration: 1500, Loss: 0.0649011954665184, Accuracy: 94.99
Iteration: 2000, Loss: 0.08892955631017685, Accuracy: 95.99
Iteration: 2500, Loss: 0.11272620409727097, Accuracy: 96.56
Iteration: 3000, Loss: 0.14828279614448547, Accuracy: 96.71


### Deep Learning
- 2 ways to expand a neural network
    - More non-linear activation units (neurons) per hidden layer
    - More hidden layers
- Cons
    - Need a larger dataset
        - Curse of dimensionality
    - Does not necessarily mean higher accuracy
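
Both ways of expanding the network add parameters, which is one reason a larger dataset is needed to fit them. The sketch below counts parameters for fully connected nets with equal-width hidden layers, where each `nn.Linear(in, out)` contributes `in * out` weights plus `out` biases. `count_params` is a hypothetical helper, and the 200-unit configuration is an illustrative choice not trained in this chapter.

```python
def count_params(input_dim, hidden_dim, output_dim, num_hidden_layers):
    """Total weights + biases of a fully connected net with equal-width hidden layers."""
    dims = [input_dim] + [hidden_dim] * num_hidden_layers + [output_dim]
    return sum(d_in * d_out + d_out for d_in, d_out in zip(dims, dims[1:]))

print(count_params(784, 100, 10, 1))  # 1 hidden layer (Model A): 79510
print(count_params(784, 100, 10, 2))  # 2 hidden layers: 89610
print(count_params(784, 100, 10, 3))  # 3 hidden layers (Model E): 99710
print(count_params(784, 200, 10, 1))  # wider instead of deeper: 159010
```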