#### Linear Regresion / Logistic Regression problems
- Can represent linear functions well
    - $ y = 2x + 3 $
    - $ y = x_{1} + x_{2} $
    - $ y = x_{1} + 3x_{2} + 4x_{3} $
- Cannot represent <b> non-linear </b> functions 
    - $ y = 4x_{1} + 2x_{2}^2 + 3x_{3}^2 $

### 1.2 Introducing a Non-linear Function

- In Logistic Regression we have:
    - <b> Input </b> - > (Linear Function) <b>Logits </b>-> (Softmax Function) <b>Softmax </b>-> (Cross Entropy Function) <b>Labels</b>

- In 1 Layer Neural Network we have:
    - Input Layer : Input
    - 1 Hidden Layer : Linear Function, Logits, Non Linear Function, Non Linear Output
    - Readout Layer : Logits , Softmax Function, Softmax
    - Cross Entropy Function, Labels

- Nonlinear Functions:
    - ReLUS
    - Sigmoid
    - Tanh
    
#### Sigmoid ( Logistic )
- $ \sigma (x) = \frac{1}{1+e^{-x}} $
- Input number -> output is between [0,1]
    - Large negative number -> 0
    - Large positive number -> 1
- Cons:
    - Activation saturates at 0 or 1 with gradients ~ 0
        - No signal to update weights -> cannot learn
        - Solution: Have to carefully initialize weights to prevent this
    - Outputs not centerend around 0
        - If output always positive -> gradients always positive or negative -> bad for gradients updates
        
#### TanH
- $ tanh(x) = 2\sigma(2x) - 1 $
    - A scaled sigmoid function
- Input number -> [-1,1]
- Cons:
    - Activation saturates at 0 or 1 with gradients ~ 0
        - No signal to update weights -> cannot learn
        - Solution: Have to carefully initialize weights to prevent this
        
#### ReLUs
- $ f(x) = max(0,x) $
- Pros:
    - Accelerates convergence -> train faster -> Less iterations then others
    - Less computationally expensive operation compared to Sigmoid/Tanh exponentials
- Cons:
    - Many ReLU units "die" -> gradients = 0 forever
    - Solution: careful learning rate choice

### Building a FeedForward Neural Network with Pytorch
#### Model A: 1 Hidden Layer Feedforward Neural Network (Sigmoid Activation)

Steps:
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- Step 3: Create Model Class
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

#### Step 1: Loading MNIST Train Dataset

<b> Images from 1 to 9</b>

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

In [2]:
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                               train=False,
                               transform=transforms.ToTensor())

#### Step 2: Make Dataset Iterable

In [3]:
batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

#### Step 3: Create Model Class

In [4]:
class FeedforwardNeuralNetModel(nn.Module): # pyTorch stuff
    def __init__(self,input_size,hidden_size,num_classes): # hiden_size - nonliearity dimension
        super(FeedforwardNeuralNetModel,self).__init__()
        #Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim) # Input - > hidden
        #Non-linearity
        self.sigmoid = nn.Sigmoid() # activation function
        # Linear function(readout) -> hidden -> output
        self.fc2 = nn.Linear(hidden_dim, output_dim)

# running functions
    def forward(self,x):
        # LINEAR 
        out = self.fc1(x)
        # NON-LINEAR       
        out = self.sigmoid(out)
        # LINEAR - readout
        out = self.fc2(out)
        return out

#### Step 4: Instantiate Model Class
- <b> Input </b> dimension: <b> 784</b>
    - Size of image : $ 28x28 = 784 $
- <b> Output </b> dimension: <b> 10 </b>
    - 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
- <b> Hidden </b> dimension: <b> 100 </b>
    - Can be any number
    - Similar term
        - Number of neurons
        - Number of non-linear activation functions

In [5]:
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

#### Step 5: Instantiate Loss Class

- Feedforward Neural Network: <b> Cross Entropy Loss </b>
    - Logistic Regression: <b> Cross Entropy Loss </b>
    - Linear Regression : <b> MSE </b> 

In [6]:
criterion = nn.CrossEntropyLoss()

##### Step 6: Instatiate Optimizer Class
- Simplified equation
    - $ \theta = \theta - \eta * \nabla_{\theta} $
    - $ \theta $ : parameters (our variables)
    - $ \eta $ : learning rate (how fast we want to learn)
    - $ \nabla_{\theta}$ : paramterers' gradients
- Even simplifier equation
    - $ parameters = parameters - learning\_rate * parameter\_gradients $
    - <b> At every iteration(batch_size), we update our model's parameters

In [7]:
learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

#### Parameters In-Depth

In [8]:
print(model.parameters())

print(len(list(model.parameters())))

# Hidden Layer Parameters
print(list(model.parameters())[0].size())

# FC 1 Bias Parameters
print(list(model.parameters())[1].size())

# FC 2 Parameters
print(list(model.parameters())[2].size())

# FC 2 Bias Parameters
print(list(model.parameters())[3].size())

<generator object Module.parameters at 0x7f1451bd2678>
4
torch.Size([100, 784])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])


#### Step 7: Train Model
- Process
    - 1. Conver inputs / labels to variables
    - 2. Clearn gradient buffers
    - 3. Get output given inputs
    - 4. Get loss
    - 5. Get gradient w.r.t. parameters
    - 6. Update parameters using gradients
        - $ parameters = parameters - learning\_rate * parameters\_gradients$
    - 7. REPEAT

In [9]:
iter = 0
# Now 5 epochs in this case
for epoch in range(num_epochs):
    
    # 60k images loaded 100 by 100 <- batch_size
    for i, (images, labels) in enumerate(train_loader):
        # 1 ITERATION = 1 CYCLE
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        # Don't want to accumulate gradients
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter +=1
        if iter % 500 == 0:
            # Calculate accuracy
            correct = 0
            total = 0
            # Iterats through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1,28*28))
            
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data,1)

                # Total number of labels
                total += labels.size(0)

                # Total correcnt predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / total
        
            ## Print loss
            print('Iteration: {}, Loss: {}, Accuracy:{}'.format(iter,loss.data,accuracy))

Iteration: 500, Loss: 0.6876333355903625, Accuracy:86
Iteration: 1000, Loss: 0.500399649143219, Accuracy:89
Iteration: 1500, Loss: 0.5672891736030579, Accuracy:90
Iteration: 2000, Loss: 0.3102739453315735, Accuracy:91
Iteration: 2500, Loss: 0.20339469611644745, Accuracy:91
Iteration: 3000, Loss: 0.3302624523639679, Accuracy:91


#### Model B: 1 Hidden Layer Feedforward Neural Network (Tanh Activation)

Steps:
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- <b> Step 3: Create Model Class </b> - We will make changes in this step - changing the activation function
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [12]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                               train=False,
                               transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
'''
STEP 3: CREATE MODEL CLASS
'''

class FeedforwardNeuralNetModel(nn.Module): # pyTorch stuff
    def __init__(self,input_size,hidden_size,num_classes): # hiden_size - nonliearity dimension
        super(FeedforwardNeuralNetModel,self).__init__()
        #Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim) # Input - > hidden
        #Non-linearity
        self.tanh = nn.Tanh() ################################ ! Here is the change to tanh activation function
        # Linear function(readout) -> hidden -> output
        self.fc2 = nn.Linear(hidden_dim, output_dim)

# running functions
    def forward(self,x):
        # LINEAR 
        out = self.fc1(x)
        # NON-LINEAR       
        out = self.tanh(out) ################################ ! Here is the change to tanh activation function
        # LINEAR - readout
        out = self.fc2(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''

learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN MODEL
'''
iter = 0
# Now 5 epochs in this case
for epoch in range(num_epochs):
    
    # 60k images loaded 100 by 100 <- batch_size
    for i, (images, labels) in enumerate(train_loader):
        # 1 ITERATION = 1 CYCLE
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        # Don't want to accumulate gradients
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter +=1
        if iter % 500 == 0:
            # Calculate accuracy
            correct = 0
            total = 0
            # Iterats through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1,28*28))
            
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data,1)

                # Total number of labels
                total += labels.size(0)

                # Total correcnt predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / total
        
            ## Print loss
            print('Iteration: {}, Loss: {}, Accuracy:{}'.format(iter,loss.data,accuracy))

Iteration: 500, Loss: 0.3404415249824524, Accuracy:90
Iteration: 1000, Loss: 0.34508755803108215, Accuracy:92
Iteration: 1500, Loss: 0.4050217568874359, Accuracy:93
Iteration: 2000, Loss: 0.1702473759651184, Accuracy:93
Iteration: 2500, Loss: 0.12349386513233185, Accuracy:94
Iteration: 3000, Loss: 0.2238638550043106, Accuracy:95


#### Model C: 1 Hidden Layer Feedforward Neural Network (ReLU Activation)

Steps:
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- <b> Step 3: Create Model Class </b> - We will make changes in this step - changing the activation function
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [15]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                               train=False,
                               transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
'''
STEP 3: CREATE MODEL CLASS
'''

class FeedforwardNeuralNetModel(nn.Module): # pyTorch stuff
    def __init__(self,input_size,hidden_size,num_classes): # hiden_size - nonliearity dimension
        super(FeedforwardNeuralNetModel,self).__init__()
        #Linear function
        self.fc1 = nn.Linear(input_dim, hidden_dim) # Input - > hidden
        #Non-linearity
        self.relu = nn.ReLU() ################################ ! Here is the change to tanh activation function
        # Linear function(readout) -> hidden -> output
        self.fc2 = nn.Linear(hidden_dim, output_dim)

# running functions
    def forward(self,x):
        # LINEAR 
        out = self.fc1(x)
        # NON-LINEAR       
        out = self.relu(out) ################################ ! Here is the change to tanh activation function
        # LINEAR - readout
        out = self.fc2(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''

learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN MODEL
'''
iter = 0
# Now 5 epochs in this case
for epoch in range(num_epochs):
    
    # 60k images loaded 100 by 100 <- batch_size
    for i, (images, labels) in enumerate(train_loader):
        # 1 ITERATION = 1 CYCLE
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        # Don't want to accumulate gradients
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter +=1
        if iter % 500 == 0:
            # Calculate accuracy
            correct = 0
            total = 0
            # Iterats through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1,28*28))
            
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data,1)

                # Total number of labels
                total += labels.size(0)

                # Total correcnt predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / total
        
            ## Print loss
            print('Iteration: {}, Loss: {}, Accuracy:{}'.format(iter,loss.data,accuracy))

Iteration: 500, Loss: 0.31928738951683044, Accuracy:90
Iteration: 1000, Loss: 0.3033466041088104, Accuracy:93
Iteration: 1500, Loss: 0.36112144589424133, Accuracy:93
Iteration: 2000, Loss: 0.16185984015464783, Accuracy:94
Iteration: 2500, Loss: 0.11418130993843079, Accuracy:95
Iteration: 3000, Loss: 0.23605427145957947, Accuracy:95


#### Model D: 2 Hidden Layer Feedforward Neural Network (ReLU Activation)

Steps:
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- <b> Step 3: Create Model Class </b> - We will make changes in this step - adding 1 more layer
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [3]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                               train=False,
                               transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
'''
STEP 3: CREATE MODEL CLASS
'''

class FeedforwardNeuralNetModel(nn.Module): # pyTorch stuff
    def __init__(self,input_size,hidden_size,num_classes): # hiden_size - nonliearity dimension
        super(FeedforwardNeuralNetModel,self).__init__()
       
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim) # Input - > hidden        
        # Non-linearity 1
        self.relu1 = nn.ReLU() 
        
        #Linear fuction 2: 100-> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)   ###### ADDING 1 more hidden layer !!!!!!
        #Non-linearity2
        self.relu2= nn.ReLU()             ###### ADDING 1 more hidden layer !!!!!!
        
        # Linear function(readout) 3-> hidden -> output --> 100 -> 10
        self.fc3 = nn.Linear(hidden_dim, output_dim)

# running functions
    def forward(self,x):
        # LINEAR f1
        out = self.fc1(x)
        # NON-LINEAR       
        out = self.relu1(out)
        
        # LINEAR f2
        out = self.fc2(out)
        # NON-LINEAR       
        out = self.relu2(out)        
        
        # LINEAR - readout
        out = self.fc3(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''

learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN MODEL
'''
iter = 0
# Now 5 epochs in this case
for epoch in range(num_epochs):
    
    # 60k images loaded 100 by 100 <- batch_size
    for i, (images, labels) in enumerate(train_loader):
        # 1 ITERATION = 1 CYCLE
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        # Don't want to accumulate gradients
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter +=1
        if iter % 500 == 0:
            # Calculate accuracy
            correct = 0
            total = 0
            # Iterats through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1,28*28))
            
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data,1)

                # Total number of labels
                total += labels.size(0)

                # Total correcnt predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / total
        
            ## Print loss
            print('Iteration: {}, Loss: {}, Accuracy:{}'.format(iter,loss.data,accuracy))

Iteration: 500, Loss: 0.24639907479286194, Accuracy:90
Iteration: 1000, Loss: 0.28348368406295776, Accuracy:93
Iteration: 1500, Loss: 0.30106884241104126, Accuracy:94
Iteration: 2000, Loss: 0.12275117635726929, Accuracy:95
Iteration: 2500, Loss: 0.06751161813735962, Accuracy:95
Iteration: 3000, Loss: 0.2389032393693924, Accuracy:96


#### Model D: 3 Hidden Layer Feedforward Neural Network (ReLU Activation)

Steps:
- Step 1: Load Dataset
- Step 2: Make Dataset Iterable
- <b> Step 3: Create Model Class </b> - We will make changes in this step - adding 1 more layer
- Step 4: Instantiate Model Class
- Step 5: Instantiate Loss Class
- Step 6: Instantiate Optimizer Class
- Step 7: Train Model

In [5]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                               train=False,
                               transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
'''
STEP 3: CREATE MODEL CLASS
'''

class FeedforwardNeuralNetModel(nn.Module): # pyTorch stuff
    def __init__(self,input_size,hidden_size,num_classes): # hiden_size - nonliearity dimension
        super(FeedforwardNeuralNetModel,self).__init__()
       
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim)      
        # Non-linearity 1
        self.relu1 = nn.ReLU() 
        
        #Linear fuction 2: 100-> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)   
        #Non-linearity2
        self.relu2= nn.ReLU()  
        
        #Linear fuction 3: 100-> 100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)   ###### ADDING 1 more hidden layer !!!!!!
        #Non-linearity3
        self.relu3= nn.ReLU()             ###### ADDING 1 more hidden layer !!!!!!
        
        # Linear function(readout) 3-> hidden -> output --> 100 -> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)

# running functions
    def forward(self,x):
        # LINEAR f1
        out = self.fc1(x)
        # NON-LINEAR       
        out = self.relu1(out)
        
        # LINEAR f2
        out = self.fc2(out)
        # NON-LINEAR       
        out = self.relu2(out)  
        
        # LINEAR f3
        out = self.fc3(out)
        # NON-LINEAR       
        out = self.relu3(out)
        
        # LINEAR - readout
        out = self.fc4(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)
'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''

learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN MODEL
'''
iter = 0
# Now 5 epochs in this case
for epoch in range(num_epochs):
    
    # 60k images loaded 100 by 100 <- batch_size
    for i, (images, labels) in enumerate(train_loader):
        # 1 ITERATION = 1 CYCLE
        # Load images as Variable
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        # Don't want to accumulate gradients
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter +=1
        if iter % 500 == 0:
            # Calculate accuracy
            correct = 0
            total = 0
            # Iterats through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                images = Variable(images.view(-1,28*28))
            
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data,1)

                # Total number of labels
                total += labels.size(0)

                # Total correcnt predictions
                correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / total
        
            ## Print loss
            print('Iteration: {}, Loss: {}, Accuracy:{}'.format(iter,loss.data,accuracy))

Iteration: 500, Loss: 0.2807856500148773, Accuracy:89
Iteration: 1000, Loss: 0.2270146906375885, Accuracy:93
Iteration: 1500, Loss: 0.2654840350151062, Accuracy:94
Iteration: 2000, Loss: 0.0961790457367897, Accuracy:96
Iteration: 2500, Loss: 0.04602858051657677, Accuracy:96
Iteration: 3000, Loss: 0.20235863327980042, Accuracy:96


#### Deep Learning
- 2 ways to expand a neural network
    - More non-linear activation units (neurons)
    - More hidden layers
- Cons
    - Need a larger dataset
        - Curse of dimensionality
    - Does not necessarily mean higher accuracy
    
### 3. Building a Feedforward Neural Network with PyTorch (GPU)
- GPU : 2 things must be on GPU
    - model
    - variables
    - you can do it with $.cuda()$ function
    - add $.cuda()$ in <b> Step 4</b> and <b> Step 7</b>

In [6]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable

'''
STEP 1: LOADING DATASET
'''

train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                               train=False,
                               transform=transforms.ToTensor())

'''
STEP 2: MAKING DATASET ITERABLE
'''

batch_size = 100 # 100 images at time
n_iters = 3000 # 1 epochs = number of iterations
num_epochs = n_iters / (len(train_dataset)/ batch_size)
num_epochs = int(num_epochs)

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
'''
STEP 3: CREATE MODEL CLASS
'''

class FeedforwardNeuralNetModel(nn.Module): # pyTorch stuff
    def __init__(self,input_size,hidden_size,num_classes): # hiden_size - nonliearity dimension
        super(FeedforwardNeuralNetModel,self).__init__()
       
        # Linear function 1: 784 --> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim)      
        # Non-linearity 1
        self.relu1 = nn.ReLU() 
        
        #Linear fuction 2: 100-> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)   
        #Non-linearity2
        self.relu2= nn.ReLU()  
        
        #Linear fuction 3: 100-> 100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)   ###### ADDING 1 more hidden layer !!!!!!
        #Non-linearity3
        self.relu3= nn.ReLU()             ###### ADDING 1 more hidden layer !!!!!!
        
        # Linear function(readout) 3-> hidden -> output --> 100 -> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)

# running functions
    def forward(self,x):
        # LINEAR f1
        out = self.fc1(x)
        # NON-LINEAR       
        out = self.relu1(out)
        
        # LINEAR f2
        out = self.fc2(out)
        # NON-LINEAR       
        out = self.relu2(out)  
        
        # LINEAR f3
        out = self.fc3(out)
        # NON-LINEAR       
        out = self.relu3(out)
        
        # LINEAR - readout
        out = self.fc4(out)
        return out
'''
STEP 4: INSTANTIATE MODEL CLASS
'''
input_dim = 28*28
hidden_dim = 100
output_dim = 10

model = FeedforwardNeuralNetModel(input_dim, hidden_dim, output_dim)

# move model to cuda
if torch.cuda.is_available():
    model.cuda()
'''
STEP 5: INSTANTIATE LOSS CLASS
'''

criterion = nn.CrossEntropyLoss()

'''
STEP 6: INSTANTIATE OPTIMIZER CLASS
'''

learning_rate = 0.1

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

'''
STEP 7: TRAIN MODEL
'''
iter = 0
# Now 5 epochs in this case
for epoch in range(num_epochs):  
    # 60k images loaded 100 by 100 <- batch_size
    for i, (images, labels) in enumerate(train_loader):
        # 1 ITERATION = 1 CYCLE
        # Load images as Variable
        
        #####################
        # USE GPU FOR MODEL #
        #####################
        if torch.cuda.is_available():
            images = Variable(images.view(-1, 28*28).cuda())
            labels = Variable(labels.cuda())
        else:
            images = Variable(images.view(-1, 28*28))
            labels = Variable(labels)
        
        # Clear gradients w.r.t. parameters
        # Don't want to accumulate gradients
        optimizer.zero_grad()
        
        # Forward pass to get output/logits
        outputs = model(images)
        
        # Calculate Loss: softmax --> cross entropy loss
        loss = criterion(outputs, labels)
        
        # Getting gradients w.r.t. parameters
        loss.backward()
        
        # Updating parameters
        optimizer.step()
        
        iter +=1
        if iter % 500 == 0:
            # Calculate accuracy
            correct = 0
            total = 0
            # Iterats through test dataset
            for images, labels in test_loader:
                # Load images to a Torch Variable
                
                #####################
                # USE GPU FOR MODEL #
                #####################
                if torch.cuda.is_available():
                        images = Variable(images.view(-1,28*28).cuda())
                else:
                        images = Variable(images.view(-1,28*28))
            
                # Forward pass only to get logits/output
                outputs = model(images)

                # Get predictions from the maximum value
                _, predicted = torch.max(outputs.data,1)

                # Total number of labels
                total += labels.size(0)
                
                #####################
                # USE GPU FOR MODEL #
                #####################
                # Total correcnt predictions
                if torch.cuda.is_available():
                    correct += (predicted.cpu() == labels.cpu()).sum()
                else:
                    correct += (predicted == labels).sum()
            
            accuracy = 100 * correct / total
        
            ## Print loss
            print('Iteration: {}, Loss: {}, Accuracy:{}'.format(iter,loss.data,accuracy))

Iteration: 500, Loss: 0.2634340524673462, Accuracy:89
Iteration: 1000, Loss: 0.2613889276981354, Accuracy:93
Iteration: 1500, Loss: 0.25126415491104126, Accuracy:94
Iteration: 2000, Loss: 0.11134683340787888, Accuracy:96
Iteration: 2500, Loss: 0.038412827998399734, Accuracy:96
Iteration: 3000, Loss: 0.22977958619594574, Accuracy:96
