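Logistic regression can represent only linear decision boundaries; it cannot represent non-linear functions. A feedforward neural network adds non-linear activations to overcome this.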

![title](resource/logistic.png)
![title](resource/nn.png)

## Non-linear function
- Takes a number and performs a fixed mathematical operation on it
    - ReLUs
    - Sigmoid
    - Tanh
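
Why the non-linearity matters can be checked numerically: with no activation in between, two stacked linear layers collapse into a single linear map. The sketch below assumes PyTorch; the layer sizes and variable names are illustrative and not part of the original notes.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two linear layers with no activation in between (sizes are arbitrary).
fc1 = nn.Linear(4, 3)
fc2 = nn.Linear(3, 2)

# Equivalent single linear map: W = W2 W1, b = W2 b1 + b2.
W = fc2.weight @ fc1.weight
b = fc2.weight @ fc1.bias + fc2.bias

x = torch.randn(5, 4)
stacked = fc2(fc1(x))        # layer-by-layer output
collapsed = x @ W.T + b      # output of the single merged layer

# Both paths should agree up to floating-point error.
print(torch.allclose(stacked, collapsed, atol=1e-5))
```

Inserting a ReLU, Sigmoid, or Tanh between `fc1` and `fc2` breaks this equivalence, which is what lets the network represent non-linear functions.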
    
#### Sigmoid (Logistic)
- $\sigma(x) = \frac{1}{1 + e^{-x}}$
- Input number $\rightarrow$ [0, 1]
    - Large negative number $\rightarrow$ 0
    - Large positive number $\rightarrow$ 1
- Cons: 
    1. Activation saturates at 0 or 1 with **gradients $\approx$ 0**
        - No signal to update weights $\rightarrow$ **cannot learn**
        - Solution: Have to carefully initialize weights to prevent this
    2. Outputs are not centered around 0 
        - If a neuron's inputs are always positive $\rightarrow$ the gradients on its weights are all positive or all negative $\rightarrow$ **bad for gradient updates** (zig-zagging)
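
The saturation problem can be seen directly from the derivative $\sigma'(x) = \sigma(x)(1 - \sigma(x))$. A minimal sketch using only the standard library (the helper names are illustrative):

```python
import math

def sigmoid(x):
    """Logistic function 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Derivative of the sigmoid: s * (1 - s)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at 0.25 for x = 0 and shrinks towards 0 as |x| grows.
for x in (-10.0, -2.0, 0.0, 2.0, 10.0):
    print(f"x={x:6.1f}  sigmoid={sigmoid(x):.5f}  grad={sigmoid_grad(x):.5f}")
```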

#### Tanh
- $\tanh(x) = 2 \sigma(2x) -1$
    - A scaled sigmoid function
- Input number $\rightarrow$ [-1, 1]
- Cons: 
    1. Activation saturates at -1 or 1 with **gradients $\approx$ 0**
        - No signal to update weights $\rightarrow$ **cannot learn**
        - **Solution**: Have to carefully initialize weights to prevent this
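
The identity $\tanh(x) = 2\sigma(2x) - 1$ and the saturation at both ends can be checked numerically. A small sketch with the standard library (function names are illustrative); the gradient used here is $1 - \tanh^2(x)$:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_via_sigmoid(x):
    """tanh expressed as a scaled and shifted sigmoid."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

# Outputs stay in (-1, 1) and the gradient vanishes for large |x|.
for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"x={x:5.1f}  tanh={math.tanh(x):+.5f}  "
          f"via_sigmoid={tanh_via_sigmoid(x):+.5f}  grad={tanh_grad(x):.5f}")
```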

 
#### ReLUs
- $f(x) = \max(0, x)$
- Pros:
    1. Accelerates convergence $\rightarrow$ **train faster**
    2. **Less computationally expensive operation** compared to Sigmoid/Tanh exponentials
- Cons:
    1. Many ReLU units "die" $\rightarrow$ **gradients = 0** forever
        - **Solution**: careful learning rate choice
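
A "dead" unit is one whose pre-activation is negative for every input, so no gradient flows back through it. The sketch below assumes PyTorch and uses hand-picked, hypothetical weights to force that situation for a single input:

```python
import torch

# One ReLU unit; the weights and bias are made-up values chosen so that the
# pre-activation is negative for this input.
x = torch.tensor([[0.5, 1.0, 2.0]])
w = torch.tensor([[-1.0], [-1.0], [-1.0]], requires_grad=True)
b = torch.tensor([-1.0], requires_grad=True)

pre_activation = x @ w + b          # negative, so ReLU clamps it to 0
out = torch.relu(pre_activation)
out.sum().backward()

# Output is 0 and so are the gradients: SGD cannot change w or b from here.
print(out.item(), w.grad.flatten().tolist(), b.grad.item())
```

If a large update pushes a unit into this regime for all training inputs, it stays there, which is why the learning rate matters.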
      

### 1 Hidden layer Feedforward NN with sigmoid

1. load dataset
2. make dataset iterable
3. create model & instantiate
4. instantiate loss
5. instantiate optimizer
6. train

In [1]:
import torch
import torch.nn as nn
import torchvision.transforms as transforms
import torchvision.datasets as dsets
from torch.autograd import Variable
import datetime
import sys

In [2]:
train_dataset = dsets.MNIST(root='./data',
                           train=True,
                           transform=transforms.ToTensor(),
                           download=True)
test_dataset = dsets.MNIST(root='./data',
                           train=False,
                           transform=transforms.ToTensor())

In [3]:
batch_size = 100
n_iters = 3000
num_epochs = int(n_iters / (len(train_dataset) / batch_size))

train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                          batch_size=batch_size,
                                          shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
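
The `num_epochs` formula above converts a target number of iterations into whole passes over the training set. Worked through with the MNIST training set size of 60000 images:

```python
train_size = 60000   # number of images in the MNIST training set
batch_size = 100
n_iters = 3000

iters_per_epoch = train_size / batch_size      # 600.0 batches per epoch
num_epochs = int(n_iters / iters_per_epoch)    # 5 epochs
print(iters_per_epoch, num_epochs)
```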

In [4]:
class FeedfowardNNModel(nn.Module):
    def __init__(self, input_size, hidden_dim, num_class):
        super(FeedfowardNNModel, self).__init__()
        # Linear function: input_size -> hidden_dim
        self.fc1 = nn.Linear(input_size, hidden_dim)
        #Non-linear
        self.sigmoid = nn.Sigmoid()
        # Linear
        self.fc2 = nn.Linear(hidden_dim, num_class)
        
    def forward(self, x):
        # Linear
        out = self.fc1(x)
        # Non- Linear
        out = self.sigmoid(out)
        # linear
        return self.fc2(out)

- input dimension = 28*28 = 784
- output dimension = 10
- hidden dimension = 100 (number of neurons, i.e. number of non-linear activation units)

In [5]:
input_dim = 28*28
output_dim = 10
hidden_dim = 100

model = FeedfowardNNModel(input_dim ,hidden_dim, output_dim)

Loss class for FNN: Cross Entropy Loss
- PyTorch's `nn.CrossEntropyLoss` computes softmax and cross entropy loss together, so the model outputs raw scores (logits)

In [6]:
criterion = nn.CrossEntropyLoss()
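
That `nn.CrossEntropyLoss` combines the softmax with the loss can be verified by comparing it against an explicit log-softmax followed by the negative log-likelihood loss. A sketch assuming PyTorch; the logits and labels are made-up data:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)             # raw scores: 4 samples, 10 classes
targets = torch.tensor([3, 0, 9, 1])    # made-up class labels

# Built-in loss applied directly to raw logits.
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Same computation in two explicit steps.
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```

This is why the models in this notebook return the output of the last linear layer without applying a softmax themselves.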

In [7]:
learning_rate = 0.1
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [8]:
print(model.parameters())

<generator object Module.parameters at 0x7f3c595aed58>


In [9]:
# parameters
print(len(list(model.parameters())))

# hidden layer parameters
print(list(model.parameters())[0].size())

# FC1 bias
print(list(model.parameters())[1].size())

# FC2 parameters
print(list(model.parameters())[2].size())

# FC 2 bias
print(list(model.parameters())[3].size())

4
torch.Size([100, 784])
torch.Size([100])
torch.Size([10, 100])
torch.Size([10])
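
The four shapes printed above give the total number of trainable parameters. A quick arithmetic check in plain Python:

```python
input_dim, hidden_dim, output_dim = 28 * 28, 100, 10

fc1_params = hidden_dim * input_dim + hidden_dim    # weights [100, 784] + bias [100]
fc2_params = output_dim * hidden_dim + output_dim   # weights [10, 100] + bias [10]
total_params = fc1_params + fc2_params

print(fc1_params, fc2_params, total_params)         # 78500 1010 79510
```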


![title](resource/dot.png)

In [10]:
# train

start_time = datetime.datetime.now()

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                total += labels.size(0)
                correct += (predicted==labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration : {},  Loss: {}, Accuracy {}'.format(iter, loss.data.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration : 500,  Loss: 0.6108832359313965, Accuracy 85
Iteration : 1000,  Loss: 0.4910432696342468, Accuracy 89
Iteration : 1500,  Loss: 0.3296413719654083, Accuracy 90
Iteration : 2000,  Loss: 0.43148380517959595, Accuracy 91
Iteration : 2500,  Loss: 0.28478577733039856, Accuracy 91
Iteration : 3000,  Loss: 0.22945664823055267, Accuracy 92
Time 0:00:35.293384

### 1 Hidden layer Feedforward NN with Tanh

In [11]:
class FeedfowardNNModelWT(nn.Module):
    def __init__(self, input_size, hidden_dim, num_class):
        super(FeedfowardNNModelWT, self).__init__()
        #Linear function
        self.fc1 = nn.Linear(input_size, hidden_dim)
        #Non-linear
        self.tanh = nn.Tanh()
        # Linear
        self.fc2 = nn.Linear(hidden_dim, num_class)
        
    def forward(self, x):
        # Linear
        out = self.fc1(x)
        # Non- Linear
        out = self.tanh(out)
        # linear
        return self.fc2(out)

In [12]:
# train

start_time = datetime.datetime.now()
model = FeedfowardNNModelWT(input_dim, hidden_dim, output_dim)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                total += labels.size(0)
                correct += (predicted==labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration : {},  Loss: {}, Accuracy {}'.format(iter, loss.data.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration : 500,  Loss: 0.2955114245414734, Accuracy 91
Iteration : 1000,  Loss: 0.16988371312618256, Accuracy 92
Iteration : 1500,  Loss: 0.45927196741104126, Accuracy 93
Iteration : 2000,  Loss: 0.18294626474380493, Accuracy 94
Iteration : 2500,  Loss: 0.1142941266298294, Accuracy 94
Iteration : 3000,  Loss: 0.11751607060432434, Accuracy 95
Time 0:00:34.897965

### 1 Hidden layer Feedforward NN with ReLU

In [13]:
class FeedfowardNNModelWRe(nn.Module):
    def __init__(self, input_size, hidden_dim, num_class):
        super(FeedfowardNNModelWRe, self).__init__()
        #Linear function
        self.fc1 = nn.Linear(input_size, hidden_dim)
        #Non-linear
        self.relu = nn.ReLU()
        # Linear
        self.fc2 = nn.Linear(hidden_dim, num_class)
        
    def forward(self, x):
        # Linear
        out = self.fc1(x)
        # Non- Linear
        out = self.relu(out)
        # linear
        return self.fc2(out)

In [14]:
# train

start_time = datetime.datetime.now()
model = FeedfowardNNModelWRe(input_dim, hidden_dim, output_dim)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                total += labels.size(0)
                correct += (predicted==labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration : {},  Loss: {}, Accuracy {}'.format(iter, loss.data.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration : 500,  Loss: 0.2781760096549988, Accuracy 91
Iteration : 1000,  Loss: 0.1956309825181961, Accuracy 92
Iteration : 1500,  Loss: 0.11842720955610275, Accuracy 93
Iteration : 2000,  Loss: 0.35106608271598816, Accuracy 94
Iteration : 2500,  Loss: 0.20275720953941345, Accuracy 95
Iteration : 3000,  Loss: 0.13850967586040497, Accuracy 95
Time 0:00:34.425624

## 2 Hidden layer FNN

In [15]:
class TFeedforwardNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(TFeedforwardNNModel, self).__init__()
        # 784 -> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu1 = nn.ReLU()
        
        # 100 -> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.relu2 = nn.ReLU()
        
        # 100 -> 10
        self.fc3 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        
        out = self.fc2(out)
        out = self.relu2(out)
        
        return self.fc3(out)

In [20]:
model = TFeedforwardNNModel(input_dim, hidden_dim, output_dim)

# train
start_time = datetime.datetime.now()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                total += labels.size(0)
                correct += (predicted==labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration : {},  Loss: {}, Accuracy {}'.format(iter, loss.data.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration : 500,  Loss: 0.24669824540615082, Accuracy 91
Iteration : 1000,  Loss: 0.35514146089553833, Accuracy 92
Iteration : 1500,  Loss: 0.17534667253494263, Accuracy 94
Iteration : 2000,  Loss: 0.12188761681318283, Accuracy 95
Iteration : 2500,  Loss: 0.16846801340579987, Accuracy 96
Iteration : 3000,  Loss: 0.16696247458457947, Accuracy 96
Time 0:00:37.383510

## 3 Hidden layer FNN

In [22]:
class ThFeedforwardNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(ThFeedforwardNNModel, self).__init__()
        # 784 -> 100
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.relu1 = nn.ReLU()
        
        # 100 -> 100
        self.fc2 = nn.Linear(hidden_dim, hidden_dim)
        self.relu2 = nn.ReLU()
        
        # 100 -> 100
        self.fc3 = nn.Linear(hidden_dim, hidden_dim)
        self.relu3 = nn.ReLU()
        
        # 100 -> 10
        self.fc4 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu1(out)
        
        out = self.fc2(out)
        out = self.relu2(out)
        
        out = self.fc3(out)
        out = self.relu3(out)
        
        return self.fc4(out)

In [23]:
model = ThFeedforwardNNModel(input_dim, hidden_dim, output_dim)

# train
start_time = datetime.datetime.now()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = Variable(images.view(-1, 28*28))
        labels = Variable(labels)
        
        optimizer.zero_grad()
        
        outputs = model(images)
        
        loss = criterion(outputs, labels)
        
        loss.backward()
        
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in test_loader:
                images = Variable(images.view(-1, 28*28))
                
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                total += labels.size(0)
                correct += (predicted==labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration : {},  Loss: {}, Accuracy {}'.format(iter, loss.data.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration : 500,  Loss: 0.3227856457233429, Accuracy 91
Iteration : 1000,  Loss: 0.17028804123401642, Accuracy 93
Iteration : 1500,  Loss: 0.091800756752491, Accuracy 95
Iteration : 2000,  Loss: 0.09865974634885788, Accuracy 95
Iteration : 2500,  Loss: 0.2647742033004761, Accuracy 96
Iteration : 3000,  Loss: 0.08425090461969376, Accuracy 96
Time 0:00:39.460579

## Deep Learning
- 2 ways to expand a NN
    - more neurons per layer (non-linear activation units)
    - more hidden layers

- Cons: more parameters to fit, so larger datasets are needed (curse of dimensionality)
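
Both ways of expanding a network can be expressed with one builder function. The sketch below assumes PyTorch; `make_mlp` and `count_params` are hypothetical helpers, not part of the notebook, and the sizes are picked only for comparison:

```python
import torch
import torch.nn as nn

def make_mlp(input_dim, hidden_dim, output_dim, num_hidden_layers):
    """Hypothetical helper: stack Linear+ReLU blocks, then a linear output layer."""
    layers = []
    in_features = input_dim
    for _ in range(num_hidden_layers):
        layers.append(nn.Linear(in_features, hidden_dim))
        layers.append(nn.ReLU())
        in_features = hidden_dim
    layers.append(nn.Linear(in_features, output_dim))
    return nn.Sequential(*layers)

def count_params(model):
    """Total number of trainable values in the model."""
    return sum(p.numel() for p in model.parameters())

wider = make_mlp(784, 200, 10, num_hidden_layers=1)    # more neurons
deeper = make_mlp(784, 100, 10, num_hidden_layers=2)   # more hidden layers

print(count_params(wider), count_params(deeper))
```

Either change increases the parameter count relative to the 1 hidden layer, 100 neuron model, which is where the need for more data comes from.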

### GPU

In [27]:
model = ThFeedforwardNNModel(input_dim, hidden_dim, output_dim)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

start_time = datetime.datetime.now()

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

iter = 0
for e in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        images = images.view(-1, 28*28).requires_grad_().to(device)
        labels = labels.to(device)
#         images = images.view(-1, 28*28).cuda()
#         labels = labels.cuda()
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        iter += 1
        
        if iter % 500 == 0:
            correct = 0
            total = 0
            
            for images, labels in test_loader:
                # Move the test batch to the same device as the model
                images = images.view(-1, 28*28).to(device)
                labels = labels.to(device)
                
                outputs = model(images)
                _, predicted = torch.max(outputs.data, 1)
                
                total += labels.size(0)
                correct += (predicted == labels).sum()
                
            accuracy = 100 * correct / total
            print('Iteration : {},  Loss: {}, Accuracy {}'.format(iter, loss.data.item(), accuracy))

sys.stdout.write('Time '+ str(datetime.datetime.now() - start_time))

Iteration : 500,  Loss: 0.424159437417984, Accuracy 89
Iteration : 1000,  Loss: 0.18057289719581604, Accuracy 94
Iteration : 1500,  Loss: 0.2304164320230484, Accuracy 94
Iteration : 2000,  Loss: 0.1671176701784134, Accuracy 95
Iteration : 2500,  Loss: 0.11628420650959015, Accuracy 96
Iteration : 3000,  Loss: 0.1326976865530014, Accuracy 96
Time 0:00:41.997669
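
The evaluation step repeated in every training loop can be factored into one function that handles the device and disables gradient tracking. This is a sketch under assumptions rather than the notebook's own code: `evaluate` is a hypothetical helper, and the stand-in model and random batches exist only to make the example runnable without MNIST.

```python
import torch
import torch.nn as nn

def evaluate(model, loader, device):
    """Hypothetical helper: accuracy (in percent) of `model` over `loader`."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for images, labels in loader:
            images = images.view(-1, 28 * 28).to(device)
            labels = labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    model.train()  # restore training mode for the caller
    return 100.0 * correct / total

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = nn.Linear(28 * 28, 10).to(device)   # stand-in model for the demo
fake_loader = [(torch.rand(8, 1, 28, 28), torch.randint(0, 10, (8,)))
               for _ in range(3)]           # random data, not MNIST
print(evaluate(model, fake_loader, device))
```

In the training loops above, the inner `for images, labels in test_loader` block could then be replaced by a call such as `evaluate(model, test_loader, device)`.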