# 3. Feedforward Neural Networks
## 1. About feedforward neural networks
- The difference from logistic regression is the addition of a hidden layer.
- The hidden layer combines a linear function with a non-linear function.
- The logistic regression part (logits -> softmax) becomes the readout layer.
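
The structure described above can be sketched in plain Python: a linear function, a non-linearity, then a linear readout followed by softmax. This is only an illustration under assumed toy sizes (3 inputs, 2 hidden units, 2 classes) with made-up weights; the helper names `linear`, `softmax`, and `forward` are hypothetical and are not part of the notebook's PyTorch code.

```python
import math

def linear(x, weights, bias):
    # One output per row of `weights`: dot(row, x) + bias entry
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    # Subtract the maximum for numerical stability
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, W1, b1, W2, b2):
    # Hidden layer: linear function followed by a non-linearity
    hidden = [sigmoid(z) for z in linear(x, W1, b1)]
    # Readout layer: linear function producing logits, then softmax
    logits = linear(hidden, W2, b2)
    return softmax(logits)

# Toy example: 3 inputs -> 2 hidden units -> 2 classes (made-up values)
x = [1.0, 0.5, -1.0]
W1 = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.5]]
b1 = [0.0, 0.1]
W2 = [[1.0, -1.0], [-0.5, 0.8]]
b2 = [0.0, 0.0]

probs = forward(x, W1, b1, W2, b2)
print(probs)
```

Removing the `sigmoid` call in `forward` would collapse the two linear functions into a single linear map, which is why the non-linearity is what separates this model from logistic regression.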

## 2. The non-linear function is also called the activation function
- function: takes a number and performs a mathematical operation on it
- common types of non-linear activation functions:
  - ReLUs (Rectified Linear Units)
  - Sigmoid
  - Tanh
  - Threshold
  - Leaky ReLUs
  
### Sigmoid (logistic)
- $\sigma(x) = 1/(1+e^{-x})$
- maps any input number into the range (0, 1)
    - large negative inputs give outputs close to 0 and large positive inputs give outputs close to 1
- Cons:
    1. Activation saturates at 0 or 1, where gradients are close to 0
        - no signal to update weights, so the unit cannot learn
        - solution: initialize weights carefully to prevent this
    2. Outputs are not centered around 0
        - if the outputs are always positive, the gradients on the following layer's weights are all positive or all negative, which is bad for gradient updates
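
The saturation problem can be checked numerically. The sketch below uses only the standard library and relies on the identity $\sigma'(x) = \sigma(x)(1-\sigma(x))$; the function names are illustrative and do not come from the notebook.

```python
import math

def sigmoid(x):
    # Numerically stable form: avoids overflow in exp() for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def sigmoid_grad(x):
    # Derivative of the sigmoid: s * (1 - s)
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at x = 0 and shrinks quickly as |x| grows
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(x, sigmoid(x), sigmoid_grad(x))
```

The gradient is at most 0.25 (at `x = 0`) and is already tiny at `x = 10` or `x = -10`, which is the "no signal to update weights" case.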

### Tanh
- $\tanh(x) = 2\sigma(2x)-1$
    - a scaled and shifted sigmoid function
- maps any input number into the range (-1, 1)
- Cons:
    1. Activation saturates at -1 or 1, where gradients are close to 0
        - no signal to update weights
        - solution: initialize weights carefully to prevent this
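
A quick numerical check of the relation between tanh and the sigmoid, together with the saturating gradient $1 - \tanh^2(x)$. This is a sketch using only the standard library; `tanh_from_sigmoid` and `tanh_grad` are illustrative names.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    # tanh written as a scaled and shifted sigmoid
    return 2.0 * sigmoid(2.0 * x) - 1.0

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2
    return 1.0 - math.tanh(x) ** 2

# The identity holds up to floating-point rounding
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert abs(tanh_from_sigmoid(x) - math.tanh(x)) < 1e-12

# Gradient is 1 at the origin and close to 0 once the output saturates
print(tanh_grad(0.0), tanh_grad(5.0))
```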

### ReLUs
- $f(x) = \max(0,x)$
- Pros
    1. Accelerates convergence, so training is faster
    2. Less computationally expensive than the exponentials in sigmoid / tanh
- Cons
    1. Many ReLU units can "die": their gradient becomes 0 and they stop updating
        - solution: choose the learning rate carefully
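
The "dying" behaviour follows directly from the gradient of ReLU, which is exactly 0 for every negative input. The sketch below compares it with a leaky ReLU, which keeps a small slope on the negative side. The slope of 0.01 is an assumed value chosen for illustration, and the function names are not from the notebook.

```python
def relu(x):
    return max(0.0, x)

def relu_grad(x):
    # Zero gradient for negative inputs: a unit stuck there receives no update
    return 1.0 if x > 0 else 0.0

def leaky_relu(x, slope=0.01):
    # Small assumed slope on the negative side instead of a hard zero
    return x if x > 0 else slope * x

def leaky_relu_grad(x, slope=0.01):
    return 1.0 if x > 0 else slope

for x in (-3.0, 2.0):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```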

# The first feedforward network: one hidden layer with a sigmoid activation function

### Steps
1. Load Dataset
2. Make Dataset Iterable
3. Create the model
4. Instantiate the model class
5. Instantiate the loss class
6. Instantiate the optimizer
7. Train Model
8. Measure Accuracy
9. Save the model
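
Step 2 fixes a target number of iterations (`n_iters`) and derives the number of epochs from it. The arithmetic can be sketched as follows, assuming the standard MNIST training set of 60,000 images; the helper name `epochs_for` is illustrative and does not appear in the notebook.

```python
def epochs_for(n_iters, dataset_size, batch_size):
    # One epoch is one full pass over the data: dataset_size / batch_size iterations
    iters_per_epoch = dataset_size / batch_size
    # Truncate to whole epochs, as the notebook does with int(...)
    return int(n_iters / iters_per_epoch)

# batch_size = 100: 600 iterations per epoch
print(epochs_for(13000, 60000, 100))

# batch_size = 50: 1200 iterations per epoch
print(epochs_for(13000, 60000, 50))
```

Because of the truncation, the number of iterations actually run (whole epochs times iterations per epoch) can be lower than `n_iters`.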

In [11]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 13000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity 1
        self.sigmoid = nn.Sigmoid()
        
        # Linear function 2 (readout): 100 --> 10
        self.fc2 = nn.Linear(hidden_dim,output_dim)
    
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        
        # Non-linearity 
        out = self.sigmoid(out)
        
        # Linear function (readout)
        out = self.fc2(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

print(model.parameters())
print(len(list(model.parameters())))

print(list(model.parameters())[1])

<generator object Module.parameters at 0x000001D6264F23B8>
4
Parameter containing:
tensor([-0.0272, -0.0260, -0.0171,  0.0030, -0.0104,  0.0176,  0.0075,  0.0195,
        -0.0239,  0.0223,  0.0211, -0.0225,  0.0120,  0.0124,  0.0096,  0.0064,
         0.0085, -0.0241,  0.0262,  0.0325, -0.0018,  0.0346, -0.0060, -0.0203,
         0.0011, -0.0080,  0.0189,  0.0021,  0.0333,  0.0272, -0.0092,  0.0031,
         0.0173,  0.0226, -0.0004, -0.0349,  0.0242,  0.0003,  0.0161,  0.0117,
         0.0316,  0.0220,  0.0123, -0.0017, -0.0021,  0.0011,  0.0121,  0.0035,
        -0.0219,  0.0307,  0.0159,  0.0261, -0.0211,  0.0229, -0.0066, -0.0020,
         0.0250,  0.0106, -0.0117,  0.0224,  0.0251, -0.0091,  0.0061,  0.0337,
        -0.0038,  0.0253, -0.0167, -0.0059, -0.0148, -0.0237, -0.0341, -0.0270,
         0.0351,  0.0071,  0.0097, -0.0214,  0.0345, -0.0224,  0.0161, -0.0095,
        -0.0149, -0.0053, -0.0311,  0.0121, -0.0077,  0.0055, -0.0033, -0.0244,
         0.0101,  0.0176, -0.0075,  0

In [12]:
'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, 28*28).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, 28*28))
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
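                images = Variable(images.view(-1, 28*28))
                # Move to the GPU only when one is available, as in the training loop
                if torch.cuda.is_available():
                    images = images.cuda()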
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                correct += (predicted.cpu() == labels.cpu()).sum()
            
            accuracy = 100 * correct / total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.5882385969161987. Accuracy: 86
Iteration: 1000. Loss: 0.5010350942611694. Accuracy: 89
Iteration: 1500. Loss: 0.24042809009552002. Accuracy: 90
Iteration: 2000. Loss: 0.22205448150634766. Accuracy: 91
Iteration: 2500. Loss: 0.4286780059337616. Accuracy: 91
Iteration: 3000. Loss: 0.4364495575428009. Accuracy: 92
Iteration: 3500. Loss: 0.22739818692207336. Accuracy: 92
Iteration: 4000. Loss: 0.34756019711494446. Accuracy: 92
Iteration: 4500. Loss: 0.24960219860076904. Accuracy: 92
Iteration: 5000. Loss: 0.28244122862815857. Accuracy: 93
Iteration: 5500. Loss: 0.1531527191400528. Accuracy: 93
Iteration: 6000. Loss: 0.30804160237312317. Accuracy: 93
Iteration: 6500. Loss: 0.493997722864151. Accuracy: 93
Iteration: 7000. Loss: 0.24398483335971832. Accuracy: 93
Iteration: 7500. Loss: 0.37047064304351807. Accuracy: 93
Iteration: 8000. Loss: 0.3526969254016876. Accuracy: 94
Iteration: 8500. Loss: 0.17212581634521484. Accuracy: 94
Iteration: 9000. Loss: 0.160203397274017

# Second feedforward model with tanh



In [13]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 13000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity 1
        self.tanh = nn.Tanh()
        
        # Linear function 2 (readout): 100 --> 10
        self.fc2 = nn.Linear(hidden_dim,output_dim)
    
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        
        # Non-linearity 
        out = self.tanh(out)
        
        # Linear function (readout)
        out = self.fc2(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

print(model.parameters())
print(len(list(model.parameters())))

print(list(model.parameters())[1])

<generator object Module.parameters at 0x000001D62B049468>
4
Parameter containing:
tensor([-0.0201, -0.0026,  0.0316, -0.0200, -0.0347,  0.0013,  0.0238,  0.0269,
        -0.0096, -0.0063, -0.0013,  0.0303, -0.0081, -0.0046,  0.0301,  0.0344,
        -0.0086, -0.0290,  0.0200, -0.0278,  0.0223,  0.0256, -0.0138, -0.0237,
         0.0085, -0.0213,  0.0353, -0.0163,  0.0173, -0.0064,  0.0210,  0.0135,
        -0.0329, -0.0134,  0.0117, -0.0293, -0.0229, -0.0259,  0.0228, -0.0139,
         0.0207,  0.0137,  0.0067,  0.0153,  0.0294,  0.0201,  0.0125,  0.0258,
        -0.0331,  0.0062, -0.0218,  0.0357,  0.0213, -0.0260,  0.0081, -0.0191,
         0.0005, -0.0095,  0.0109,  0.0039, -0.0132,  0.0325, -0.0275, -0.0076,
         0.0042, -0.0283,  0.0125, -0.0320, -0.0198, -0.0332,  0.0117, -0.0112,
        -0.0306,  0.0008, -0.0051,  0.0041,  0.0326, -0.0333, -0.0091, -0.0134,
         0.0184, -0.0062, -0.0135, -0.0127,  0.0149, -0.0063,  0.0077,  0.0215,
         0.0233, -0.0328,  0.0141,  0

In [14]:
'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, 28*28).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, 28*28))
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
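                images = Variable(images.view(-1, 28*28))
                # Move to the GPU only when one is available, as in the training loop
                if torch.cuda.is_available():
                    images = images.cuda()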
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                correct += (predicted.cpu() == labels.cpu()).sum()
            
            accuracy = 100 * correct / total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.3465653657913208. Accuracy: 91
Iteration: 1000. Loss: 0.2720365524291992. Accuracy: 92
Iteration: 1500. Loss: 0.2544703781604767. Accuracy: 93
Iteration: 2000. Loss: 0.2720886170864105. Accuracy: 93
Iteration: 2500. Loss: 0.20705772936344147. Accuracy: 94
Iteration: 3000. Loss: 0.12390336394309998. Accuracy: 95
Iteration: 3500. Loss: 0.1868055909872055. Accuracy: 95
Iteration: 4000. Loss: 0.13565988838672638. Accuracy: 95
Iteration: 4500. Loss: 0.12658077478408813. Accuracy: 96
Iteration: 5000. Loss: 0.06818234175443649. Accuracy: 96
Iteration: 5500. Loss: 0.08663061261177063. Accuracy: 96
Iteration: 6000. Loss: 0.09467755258083344. Accuracy: 96
Iteration: 6500. Loss: 0.08164642006158829. Accuracy: 96
Iteration: 7000. Loss: 0.13865233957767487. Accuracy: 96
Iteration: 7500. Loss: 0.126531183719635. Accuracy: 96
Iteration: 8000. Loss: 0.08130700886249542. Accuracy: 96
Iteration: 8500. Loss: 0.06163925677537918. Accuracy: 97
Iteration: 9000. Loss: 0.04121266677975

# With ReLU activation function

In [15]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100
n_iters = 13000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity 1
        self.relu = nn.ReLU()
        
        # Linear function 2 (readout): 100 --> 10
        self.fc2 = nn.Linear(hidden_dim,output_dim)
    
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        
        # Non-linearity 
        out = self.relu(out)
        
        # Linear function (readout)
        out = self.fc2(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

print(model.parameters())
print(len(list(model.parameters())))

print(list(model.parameters())[1])

<generator object Module.parameters at 0x000001D626A22D58>
4
Parameter containing:
tensor([-9.5067e-03,  3.2343e-02,  3.2037e-02, -1.9265e-02, -2.3669e-02,
        -1.1958e-02, -9.9688e-03, -8.9702e-03, -2.8975e-02,  4.4591e-03,
         4.8244e-03,  1.4223e-03,  7.5565e-03,  3.2924e-05,  1.0543e-02,
         1.5221e-02, -9.3783e-03,  2.6155e-03, -3.4561e-03,  2.3895e-02,
         1.6179e-02,  2.2199e-02,  9.7457e-03, -2.3617e-02, -1.4926e-02,
         1.7569e-02, -2.3757e-02, -3.5613e-02,  3.5169e-02, -1.7436e-02,
        -2.0572e-02,  1.7825e-02, -2.4569e-02,  1.7198e-02, -7.1652e-03,
         9.4106e-03,  2.0372e-02, -1.1275e-02,  3.2430e-02,  2.3100e-02,
        -2.8415e-02,  2.0464e-02,  3.3981e-02,  2.2645e-02,  6.3669e-03,
         2.9786e-02,  1.5544e-02,  3.5167e-02, -2.2952e-02,  3.5595e-02,
         1.0711e-03, -7.4445e-03, -2.9431e-02,  1.7700e-03, -6.9776e-03,
         3.2191e-02,  2.8742e-02,  6.3955e-04, -2.7285e-02,  3.3126e-02,
        -2.0060e-02, -6.6094e-04, -8.4831

In [16]:
'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, 28*28).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, 28*28))
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
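                images = Variable(images.view(-1, 28*28))
                # Move to the GPU only when one is available, as in the training loop
                if torch.cuda.is_available():
                    images = images.cuda()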
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                correct += (predicted.cpu() == labels.cpu()).sum()
            
            accuracy = 100 * correct / total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.49764227867126465. Accuracy: 91
Iteration: 1000. Loss: 0.34558406472206116. Accuracy: 92
Iteration: 1500. Loss: 0.18043696880340576. Accuracy: 93
Iteration: 2000. Loss: 0.14796826243400574. Accuracy: 94
Iteration: 2500. Loss: 0.18515442311763763. Accuracy: 95
Iteration: 3000. Loss: 0.24979309737682343. Accuracy: 95
Iteration: 3500. Loss: 0.14210090041160583. Accuracy: 96
Iteration: 4000. Loss: 0.06268902868032455. Accuracy: 96
Iteration: 4500. Loss: 0.06783333420753479. Accuracy: 96
Iteration: 5000. Loss: 0.0839662253856659. Accuracy: 96
Iteration: 5500. Loss: 0.08378181606531143. Accuracy: 97
Iteration: 6000. Loss: 0.15154466032981873. Accuracy: 97
Iteration: 6500. Loss: 0.07049443572759628. Accuracy: 97
Iteration: 7000. Loss: 0.05954616144299507. Accuracy: 97
Iteration: 7500. Loss: 0.025450147688388824. Accuracy: 97
Iteration: 8000. Loss: 0.03571431338787079. Accuracy: 97
Iteration: 8500. Loss: 0.1116565614938736. Accuracy: 97
Iteration: 9000. Loss: 0.05823264

In [17]:
# Three hidden layers

In [20]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data', 
                            train=True, 
                            transform=transforms.ToTensor(),
                            download=True)

test_dataset = dsets.MNIST(root='./data', 
                           train=False, 
                           transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 50
n_iters = 13000
num_epochs = n_iters / (len(train_dataset) / batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

'''
STEP 3: CREATE MODEL CLASS
'''
class FeedforwardNeuralNetModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(FeedforwardNeuralNetModel, self).__init__()
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim) 
        # Non-linearity 1
        self.relu1 = nn.ReLU()
        
        # Linear function 2: 100 --> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 2
        self.relu2 = nn.ReLU()
        
        # Linear function 3: 100 --> 100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        # Non-linearity 3
        self.relu3 = nn.ReLU()
        
        # Linear function 4 (readout): 100 --> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)  
    
    def forward(self, x):
        # Linear function 1
        out = self.fc1(x)
        # Non-linearity 1
        out = self.relu1(out)
        
        # Linear function 2
        out = self.fc2(out)
        # Non-linearity 2
        out = self.relu2(out)
        
        # Linear function 3
        out = self.fc3(out)
        # Non-linearity 3
        out = self.relu3(out)
        
        # Linear function 4 (readout)
        out = self.fc4(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

#######################
#  USE GPU FOR MODEL  #
#######################

if torch.cuda.is_available():
    model.cuda()

'''
STEP 5: INSTANTIATE LOSS CLASS
'''
criterion = nn.CrossEntropyLoss()


'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN THE MODEL
'''
iter = 0
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        
        #######################
        #  USE GPU FOR MODEL  #
        #######################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, 28*28).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, 28*28))
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            # Calculate Accuracy         
            correct = 0
            total = 0
            # Iterate through test dataset
            for images, labels in test_loader:
                #######################
                #  USE GPU FOR MODEL  #
                #######################
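                images = Variable(images.view(-1, 28*28))
                # Move to the GPU only when one is available, as in the training loop
                if torch.cuda.is_available():
                    images = images.cuda()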
                
                # Forward pass only to get logits/output
                outputs = model(images)
                
                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data, 1)
                
                # Total number of labels
                total += labels.size(0)
                
                #######################
                #  USE GPU FOR MODEL  #
                #######################
                # Total correct predictions
                correct += (predicted.cpu() == labels.cpu()).sum()
            
            accuracy = 100 * correct / total
            
            # Print Loss
            print('Iteration: {}. Loss: {}. Accuracy: {}'.format(iter, loss.data, accuracy))

Iteration: 500. Loss: 0.15162785351276398. Accuracy: 89
Iteration: 1000. Loss: 0.2101718932390213. Accuracy: 93
Iteration: 1500. Loss: 0.1529603749513626. Accuracy: 95
Iteration: 2000. Loss: 0.20929856598377228. Accuracy: 95
Iteration: 2500. Loss: 0.12695719301700592. Accuracy: 96
Iteration: 3000. Loss: 0.10877981036901474. Accuracy: 96
Iteration: 3500. Loss: 0.0709371268749237. Accuracy: 96
Iteration: 4000. Loss: 0.09012982249259949. Accuracy: 96
Iteration: 4500. Loss: 0.07318182289600372. Accuracy: 96
Iteration: 5000. Loss: 0.028534861281514168. Accuracy: 97
Iteration: 5500. Loss: 0.12468843162059784. Accuracy: 97
Iteration: 6000. Loss: 0.16371086239814758. Accuracy: 97
Iteration: 6500. Loss: 0.05104072391986847. Accuracy: 97
Iteration: 7000. Loss: 0.16311556100845337. Accuracy: 97
Iteration: 7500. Loss: 0.01565774902701378. Accuracy: 97
Iteration: 8000. Loss: 0.019158992916345596. Accuracy: 97
Iteration: 8500. Loss: 0.012381171807646751. Accuracy: 97
Iteration: 9000. Loss: 0.1026558
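
The accuracy values printed by the training cells come from taking the index of the largest logit as the predicted class and comparing it with the label. A minimal pure-Python sketch of that computation, using made-up logits for four samples and three classes; `argmax` and `accuracy_percent` are illustrative names, not functions from the notebook.

```python
def argmax(values):
    # Index of the largest entry, playing the role of torch.max(outputs, 1)
    return max(range(len(values)), key=lambda i: values[i])

def accuracy_percent(batch_logits, labels):
    # Count predictions that match the labels and convert to a percentage
    correct = sum(1 for logits, label in zip(batch_logits, labels)
                  if argmax(logits) == label)
    return 100.0 * correct / len(labels)

# Made-up logits: predicted classes are 0, 2, 1, 1
batch_logits = [[2.0, 0.1, -1.0],
                [0.3, 0.2, 0.9],
                [0.0, 1.5, 1.4],
                [1.0, 3.0, 0.5]]
labels = [0, 2, 2, 1]

print(accuracy_percent(batch_logits, labels))
```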

# Summary
- Limitation of logistic regression
 - Cannot represent non-linear functions such as
   - $y = 4x_1 + 3 x_2^2 + 3x_3^3$
   - $y = x_1 x_2$
- Introduced Non-linearity to logistic regression to form a neural network
- Types of non-linearity
 - Sigmoid
 - Tanh
 - ReLU
- Feedforward neural network models
 - Model A:  1 hidden layer (sigmoid activation)
 - Model B: 1 hidden layer (tanh activation)
 - Model C: 1 hidden layer (ReLU activation)
 - Model D: 2 hidden layers (ReLU activation)
 - Model E: 3 hidden layers (ReLU activation)
- Model variations in code
 - Only step 3 is modified
- Ways to expand a model's capacity
 - More non-linear activation units (neurons)
 - More hidden layers
- Cons of expanding capacity
 - Need more data
 - Does not necessarily mean higher accuracy
- GPU code
 - 2 things must be on the GPU
    - model
    - variables (the data batches)
 - Only step 4 and step 7 are modified
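
To make "expanding capacity" concrete, the number of trainable parameters can be counted from the layer sizes alone: each linear layer contributes a weight matrix plus a bias vector. The sketch below assumes fully connected layers with the sizes used in the notebook (784 inputs, 100 hidden units, 10 outputs); `count_parameters` is an illustrative helper, not part of the notebook.

```python
def count_parameters(layer_sizes):
    # Each linear layer has n_in * n_out weights and n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

# Logistic regression: 784 -> 10
print(count_parameters([784, 10]))

# One hidden layer: 784 -> 100 -> 10
print(count_parameters([784, 100, 10]))

# Three hidden layers: 784 -> 100 -> 100 -> 100 -> 10
print(count_parameters([784, 100, 100, 100, 10]))
```

Each linear layer holds two parameter tensors (weights and biases), which is consistent with the one-hidden-layer models reporting 4 from `len(list(model.parameters()))`.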
 
 