# CSE6250BDH Deep Learning Labs
## 1. Feed-forward Neural Network

In this chapter, we will learn how to implement a feed-forward neural network by using PyTorch.
Also, we will use the dataset generated from the previous lab [Spark-mllib](http://www.sunlab.org/teaching/cse6250/fall2017/lab/spark-mllib/#Scikit-learn). If you have not completed that part, please complete it first.

### Preparing the dataset

First, import the packages we need and load the data generated in the previous lab.

In [1]:
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_svmlight_file
from sklearn.preprocessing import MaxAbsScaler
from sklearn.metrics import roc_curve, auc

X, y = load_svmlight_file("patients.svmlight")
X = X.toarray() # make it dense

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=41)

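In this lab, we will scale the features with `MaxAbsScaler`, which divides each feature by its maximum absolute value in the training set. For non-negative features, the scaled training values lie between 0 and 1.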

In [2]:
scaler = MaxAbsScaler().fit(X_train)
X_train_transformed = scaler.transform(X_train)
X_test_transformed = scaler.transform(X_test)
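
To make the scaling concrete, here is a minimal sketch on a made-up toy matrix (the array values and the `toy_*` names are illustrative, not taken from the lab dataset). It shows that `MaxAbsScaler` divides each column by the largest absolute value seen during `fit`, so values in unseen data can fall outside the training range.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler

# Toy data (hypothetical values, not the lab dataset).
toy_train = np.array([[1.0, 10.0],
                      [2.0,  0.0],
                      [4.0,  5.0]])
toy_test = np.array([[8.0, 5.0]])

# Fit on the training data only, then reuse the same scaler for the test data.
toy_scaler = MaxAbsScaler().fit(toy_train)
toy_train_scaled = toy_scaler.transform(toy_train)
toy_test_scaled = toy_scaler.transform(toy_test)

print(toy_scaler.max_abs_)   # per-column maximum absolute value from training
print(toy_train_scaled)      # each column divided by its training maximum
print(toy_test_scaled)       # test values may exceed 1 if they exceed the training maximum
```

Fitting the scaler on the training split only, as the lab does, keeps information from the test split out of the preprocessing step.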

### Feedforward Neural Network

Now, we will train a feed-forward neural network by using PyTorch. We will do the following steps in order:

1. Load the training and test datasets using DataLoader
2. Define a Feed-forward Neural Network
3. Define a loss function
4. Train the network on the training data
5. Test the network on the test data

#### 1. Loading datasets
We will use DataLoader and TensorDataset (from [torch.utils.data](http://pytorch.org/docs/master/data.html#)) for convenience in data handling. You can also create a custom dataset class by inheriting from Dataset and implementing the required member functions.

In [3]:
import torch
from torch.utils.data import DataLoader, TensorDataset

# let's fix the random seeds for reproducibility.
torch.manual_seed(0)
if torch.cuda.is_available():
    torch.cuda.manual_seed(0)

trainset = TensorDataset(torch.from_numpy(X_train_transformed.astype('float32')), torch.from_numpy(y_train.astype('float32')).view(-1,1))
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4, shuffle=True, num_workers=2)

testset = TensorDataset(torch.from_numpy(X_test_transformed.astype('float32')), torch.from_numpy(y_test.astype('float32')).view(-1,1))
testloader = torch.utils.data.DataLoader(testset, batch_size=4, shuffle=False, num_workers=2)
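
As an alternative to `TensorDataset`, a custom dataset can be sketched as follows. This is only an illustration: the class name `PatientDataset` and the random stand-in data are assumptions, not part of the lab. A map-style dataset needs `__len__` and `__getitem__`.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader


class PatientDataset(Dataset):
    """Hypothetical custom dataset wrapping a feature matrix and a label vector."""

    def __init__(self, features, labels):
        # Store everything as float32 tensors; labels become a column vector.
        self.features = torch.from_numpy(features.astype('float32'))
        self.labels = torch.from_numpy(labels.astype('float32')).view(-1, 1)

    def __len__(self):
        # Number of samples in the dataset.
        return len(self.labels)

    def __getitem__(self, index):
        # Return one (record, label) pair.
        return self.features[index], self.labels[index]


# Random stand-in data (assumption: 8 samples with 5 features each).
rng = np.random.RandomState(0)
toy_X = rng.rand(8, 5)
toy_y = rng.randint(0, 2, size=8)

toy_dataset = PatientDataset(toy_X, toy_y)
toy_loader = DataLoader(toy_dataset, batch_size=4, shuffle=False)

batch_records, batch_labels = next(iter(toy_loader))
print(batch_records.shape, batch_labels.shape)
```

A custom class like this becomes useful when samples need per-item processing, for example loading from disk lazily, which `TensorDataset` does not do.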

Let's check some training samples.

In [4]:
# get some random training samples
dataiter = iter(trainloader)
records, labels = next(dataiter)

print(records)
print(labels)


 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
 0.0000  0.0000  0.0000  ...   0.0000  0.0000  0.0000
[torch.FloatTensor of size 4x9978]


 1
 1
 0
 1
[torch.FloatTensor of size 4x1]



#### 2. Define a Feed-forward Neural Network

Next, we will define the model for this chapter: a feed-forward neural network.
For simplicity, we will use a 3-layer feed-forward net: 2 hidden layers followed by 1 hidden-to-output layer. Each layer is fully connected and is implemented by the module `torch.nn.Linear`. We will apply a ReLU activation after each hidden layer, but not after the output layer.

When we define a class for a custom network, we must implement the member method `forward(self, x)`. It represents the forward pass of the computational graph; the backward pass (back-propagation) is performed later by automatic differentiation based on this forward definition.
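
The relationship between a forward computation and the automatically derived backward pass can be illustrated with a tiny sketch. The tensor values here are made up; only the forward computation is written by hand, and `backward()` fills in the gradients.

```python
import torch

# A small input that tracks gradients (illustrative values).
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# Forward pass: y = sum(x_i^2)
y = (x ** 2).sum()

# Backward pass: autograd computes dy/dx = 2 * x from the forward definition.
y.backward()

print(x.grad)
```

The same mechanism applies to a network: defining `forward` is enough for PyTorch to compute gradients for all parameters involved.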

Usually, we define the layers of the network in the constructor of the class, `__init__`, which may take some arguments. Then we define the `forward` function, which performs the forward computation using the layers defined in the constructor.

In [5]:
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F


class FeedForwardNet(nn.Module):
    def __init__(self, n_input, n_hidden, n_output):
        super(FeedForwardNet, self).__init__()
        self.hidden1 = nn.Linear(n_input, n_hidden)
        self.hidden2 = nn.Linear(n_hidden, n_hidden)
        self.out = nn.Linear(n_hidden, n_output)

    def forward(self, x):
        x = F.relu(self.hidden1(x))
        x = F.relu(self.hidden2(x))
        x = self.out(x)
        return x

net = FeedForwardNet(n_input=9978, n_hidden=256, n_output=1)
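
Before training, it can help to sanity-check the architecture. The sketch below is self-contained, so it rebuilds the same layer sizes with `nn.Sequential` as a stand-in for `FeedForwardNet`; the helper `count_parameters` and the dummy batch are assumptions introduced for illustration.

```python
import torch
import torch.nn as nn

# Stand-in with the same layer shapes as the network defined above.
sketch_net = nn.Sequential(
    nn.Linear(9978, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, 1),
)


def count_parameters(model):
    """Hypothetical helper: total number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# A dummy mini-batch of 4 records with 9978 features each.
dummy_batch = torch.zeros(4, 9978)
dummy_out = sketch_net(dummy_batch)

print(count_parameters(sketch_net))
print(dummy_out.shape)   # one raw score (logit) per record
```

Almost all parameters sit in the first layer because of the 9978-dimensional input, which is worth keeping in mind when choosing `n_hidden`.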

#### 3. Define a Loss function and Optimizer
We will use the binary cross-entropy loss, and SGD with momentum as our optimizer.
PyTorch provides the `BCEWithLogitsLoss` loss function, which combines a Sigmoid layer and `BCELoss` in a single class and is more numerically stable than using them separately. **Keep in mind that you should not apply a sigmoid activation after the output layer when using this combined loss.** See the last computation in the `forward` function above.

When we create an optimizer in PyTorch, we need to pass parameters that we want to optimize (train) as input arguments. We can retrieve all trainable parameters of the model by calling `MODEL.parameters()`.

In [6]:
import torch.optim as optim

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
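
To see what the combined loss does, the following sketch compares `BCEWithLogitsLoss` on raw logits with `BCELoss` applied after an explicit sigmoid. The logits and targets are made-up values chosen for illustration.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[-2.0], [0.5], [3.0]])   # made-up raw network outputs
targets = torch.tensor([[0.0], [1.0], [1.0]])   # made-up binary labels

# Combined version: takes logits directly.
loss_combined = nn.BCEWithLogitsLoss()(logits, targets)
# Separate version: explicit sigmoid followed by BCELoss.
loss_separate = nn.BCELoss()(torch.sigmoid(logits), targets)
print(loss_combined.item(), loss_separate.item())

# With an extreme logit, sigmoid saturates to exactly 1.0 in float32,
# so the separate version can no longer represent the true loss.
extreme_logit = torch.tensor([[200.0]])
extreme_target = torch.tensor([[0.0]])
extreme_combined = nn.BCEWithLogitsLoss()(extreme_logit, extreme_target)
extreme_separate = nn.BCELoss()(torch.sigmoid(extreme_logit), extreme_target)
print(extreme_combined.item(), extreme_separate.item())
```

For moderate logits the two versions agree; the difference only shows up when the sigmoid saturates, which is the numerical-stability argument for the combined loss.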

#### 4. Train the network

Now, we will actually train the model.
In each epoch (one full pass over the training dataset), we iterate over mini-batches; for each mini-batch we run a forward pass, a backward pass to compute gradients, and one optimization step.
We repeat this for a reasonable number of epochs.

In [7]:
for epoch in range(10):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs (plain tensors; the Variable wrapper is not needed since PyTorch 0.4)
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        # backward
        loss.backward()
        # optimize
        optimizer.step()

        # print statistics (.item() extracts the Python number from the scalar loss tensor)
        running_loss += loss.item()
        
        if i % 10 == 9:    # print every 10 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 10))
            running_loss = 0.0

print('Finished Training')

[1,    10] loss: 0.687
[1,    20] loss: 0.689
[1,    30] loss: 0.683
[1,    40] loss: 0.683
[1,    50] loss: 0.674
[1,    60] loss: 0.670
[2,    10] loss: 0.684
[2,    20] loss: 0.648
[2,    30] loss: 0.664
[2,    40] loss: 0.651
[2,    50] loss: 0.675
[2,    60] loss: 0.681
[3,    10] loss: 0.662
[3,    20] loss: 0.680
[3,    30] loss: 0.666
[3,    40] loss: 0.652
[3,    50] loss: 0.637
[3,    60] loss: 0.641
[4,    10] loss: 0.646
[4,    20] loss: 0.661
[4,    30] loss: 0.661
[4,    40] loss: 0.626
[4,    50] loss: 0.624
[4,    60] loss: 0.677
[5,    10] loss: 0.659
[5,    20] loss: 0.658
[5,    30] loss: 0.618
[5,    40] loss: 0.645
[5,    50] loss: 0.646
[5,    60] loss: 0.633
[6,    10] loss: 0.633
[6,    20] loss: 0.656
[6,    30] loss: 0.653
[6,    40] loss: 0.620
[6,    50] loss: 0.607
[6,    60] loss: 0.666
[7,    10] loss: 0.664
[7,    20] loss: 0.594
[7,    30] loss: 0.626
[7,    40] loss: 0.628
[7,    50] loss: 0.663
[7,    60] loss: 0.639
[8,    10] loss: 0.700
[8,    20] 

#### 5. Test the network on the test data

As always, we will calculate the performance on the test set.
To use scikit-learn packages, we need to convert a PyTorch Tensor to a NumPy ndarray by simply calling `TENSOR.numpy()`. **Note again that the Tensor and the corresponding ndarray share the same memory.**
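
The memory sharing can be checked with a short sketch (the tensor here is a made-up example on the CPU). Use `clone()` first if an independent copy is needed.

```python
import torch

t = torch.zeros(3)

# The ndarray returned by .numpy() shares memory with the CPU tensor.
a = t.numpy()
a[0] = 7.0
print(t)   # the first element of the tensor has changed too

# Cloning first gives an independent copy.
b = t.clone().numpy()
b[1] = 5.0
print(t)   # unaffected by the change to b
```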

In [8]:
y_true = []
y_scores = []

In [9]:
with torch.no_grad():  # inference only, so no gradients need to be tracked
    for inputs, labels in testloader:
        # the network outputs logits; apply sigmoid to get probabilities
        outputs = torch.sigmoid(net(inputs))
        y_true.extend(labels.numpy().flatten().tolist())
        y_scores.extend(outputs.numpy().flatten().tolist())

In [10]:
fpr, tpr, _ = roc_curve(y_true, y_scores)
auc_ffnet = auc(fpr, tpr)
auc_ffnet

0.78042328042328046
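
As a cross-check, scikit-learn also offers `roc_auc_score`, which computes the same area in one call. The sketch below uses a made-up set of four labels and scores rather than the lab's predictions.

```python
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Made-up labels and scores for illustration.
toy_true = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]

# Two-step route, as used above: ROC curve first, then the area under it.
toy_fpr, toy_tpr, _ = roc_curve(toy_true, toy_scores)
auc_two_step = auc(toy_fpr, toy_tpr)

# One-step route.
auc_one_step = roc_auc_score(toy_true, toy_scores)

print(auc_two_step, auc_one_step)
```

The two-step route is still useful when the curve itself is needed, for example for plotting.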

We have finished a simple example to get familiar with PyTorch. Let's try other types of neural networks in the following chapters.

### Exercise 1: Try to use GPU if you have one

### Exercise 2: How does the result compare to the previous lab, e.g. SVM? Is it better or worse? If it is worse, can you improve the performance of the network?

### Exercise 3: How could you check whether the network underfits, overfits, or fits well?