# Wide and Deep Neural Network

## More than one attribute

### Graduate School admission problem

Input attributes: GPA and experience. Target: a 0/1 admission label.

$$
\text{x\_data} = \left[ {\begin{array}{cc}
2.1 & 0.1 \\
4.2 & 0.8 \\
3.1 & 0.9 \\
3.3 & 0.2
\end{array} } \right]
\qquad
\text{y\_data} = \left[ {\begin{array}{c}
0.0 \\
1.0 \\
0.0 \\
1.0
\end{array} } \right]
$$
  
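Each row of `x_data` is one applicant, $x_i = (x_{i1}, x_{i2})$. Writing the two weights as a row vector $w = [w_1 \; w_2]$ and leaving out the bias, all four predictions come from one matrix product:

$$ w X^\top = \hat{y}^\top $$
$$
\left[ {\begin{array}{cc}
w_1 & w_2
\end{array} } \right]
\left[ {\begin{array}{cccc}
x_{11} & x_{21} & x_{31} & x_{41} \\
x_{12} & x_{22} & x_{32} & x_{42}
\end{array} } \right]
=
\left[ {\begin{array}{cccc}
\hat{y}_1 & \hat{y}_2 & \hat{y}_3 & \hat{y}_4
\end{array} } \right]
$$

so that $\hat{y}_i = w_1 x_{i1} + w_2 x_{i2}$.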

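```python
import torch
x_data = torch.tensor([[2.1, 0.1], [4.2, 0.8], [3.1, 0.9], [3.3, 0.2]])
linear = torch.nn.Linear(2, 1)  # 2 input attributes -> 1 output
y_pred = linear(x_data)         # shape (4, 1): one prediction per applicant
```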
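A quick check of how the matrix picture maps onto PyTorch, assuming a recent PyTorch version: `torch.nn.Linear(2, 1)` stores its weight as a 1×2 tensor (the row vector $w$) plus a bias of shape (1,). Its output on the 4×2 `x_data` is the product of $w$ with the 2×4 matrix of samples shown above, transposed into a column, with the bias added.

```python
import torch

torch.manual_seed(0)  # fixed seed so the random initial weights are reproducible
x_data = torch.tensor([[2.1, 0.1], [4.2, 0.8], [3.1, 0.9], [3.3, 0.2]])
linear = torch.nn.Linear(2, 1)

print(linear.weight.shape)  # torch.Size([1, 2]): the row vector [w1, w2]
print(linear.bias.shape)    # torch.Size([1])

# Row vector times the 2x4 matrix of samples -> one row of 4 predictions (no bias)
row = linear.weight @ x_data.T   # shape (1, 4)
# What nn.Linear computes: x_data @ W^T + b -> one column of 4 predictions
col = linear(x_data)             # shape (4, 1)
print(torch.allclose(col, row.T + linear.bias))  # True
```

Both layouts hold the same numbers; PyTorch simply puts one sample per row.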

### Go Wide

$$
\left[ {\begin{array}{c}
x_1 \\
x_2 \\
\vdots \\
x_n
\end{array} } \right] \implies \text{Linear} \implies \text{Sigmoid} \implies \hat{y}
$$
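A minimal sketch of the wide model in the diagram, assuming $n = 8$ input attributes (the number used in the diabetes example below) and random inputs standing in for real data. Going wide only changes the first argument of `Linear`:

```python
import torch

n_features = 8  # assumed n: one input column per attribute x_1 ... x_n
wide_model = torch.nn.Sequential(
    torch.nn.Linear(n_features, 1),  # one weight per attribute, plus a bias
    torch.nn.Sigmoid(),              # squashes the score into (0, 1)
)

x = torch.randn(4, n_features)  # 4 made-up samples
y_hat = wide_model(x)
print(y_hat.shape)  # torch.Size([4, 1])
```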

### Go Deep

$$
X \implies \text{Linear} \implies \text{Sigmoid} \implies \text{Linear} \implies \text{Sigmoid} \implies \cdots \implies \text{Linear} \implies \text{Sigmoid} \implies \hat{y}
$$

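```python
import torch
x_data = torch.tensor([[2.1, 0.1], [4.2, 0.8], [3.1, 0.9], [3.3, 0.2]])
sigmoid = torch.nn.Sigmoid()

l1 = torch.nn.Linear(2, 4)  # 2 inputs -> 4 hidden units
l2 = torch.nn.Linear(4, 3)  # 4 -> 3 hidden units
l3 = torch.nn.Linear(3, 1)  # 3 -> 1 output

out1 = sigmoid(l1(x_data))
out2 = sigmoid(l2(out1))
y_pred = sigmoid(l3(out2))  # shape (4, 1), values in (0, 1)
```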
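Going deep also adds parameters. Each `Linear(in, out)` holds an `out × in` weight matrix and `out` biases, so the 2 → 4 → 3 → 1 stack above has $(2\cdot4+4) + (4\cdot3+3) + (3\cdot1+1) = 12 + 15 + 4 = 31$ learnable numbers. A small check:

```python
import torch

layers = [torch.nn.Linear(2, 4), torch.nn.Linear(4, 3), torch.nn.Linear(3, 1)]
# numel() counts the entries of each weight matrix and bias vector
counts = [sum(p.numel() for p in layer.parameters()) for layer in layers]
print(counts, sum(counts))  # [12, 15, 4] 31
```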

### Sigmoid Activation Function causes Vanishing Gradient Problem

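The derivative of the sigmoid is $\sigma'(z) = a(1-a)$ with $a = \sigma(z)$. It is never larger than $0.25$ (reached at $z = 0$), and backpropagation multiplies in one such factor per sigmoid layer, so the gradient reaching the early layers of a deep sigmoid network tends to shrink toward zero.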
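A minimal sketch of the effect, assuming a toy chain in which each "layer" is just a sigmoid with weight 1 and no bias (a simplification, not the model used later). Autograd computes d(output)/d(input) at $x = 0$; since every factor is at most 0.25, the result for `depth` layers stays below $0.25^{\text{depth}}$. `chain_grad` is a hypothetical helper name.

```python
import torch

def chain_grad(depth: int) -> float:
    """Gradient of `depth` stacked sigmoids (weight 1, no bias) at x = 0."""
    x = torch.tensor(0.0, requires_grad=True)
    h = x
    for _ in range(depth):
        h = torch.sigmoid(h)  # each step multiplies the gradient by a(1 - a) <= 0.25
    h.backward()
    return x.grad.item()

for depth in (1, 2, 5, 10):
    print(depth, chain_grad(depth))
```

Depth 1 gives exactly 0.25; by depth 10 the gradient is below $0.25^{10} \approx 10^{-6}$.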

<img src="http://img1.daumcdn.net/thumb/R1920x0/?fname=http%3A%2F%2Fcfile1.uf.tistory.com%2Fimage%2F26698E4F592AE818420518" >

### Activation Functions
<img src="http://rasbt.github.io/mlxtend/user_guide/general_concepts/activation-functions_files/activation-functions.png">

[visualization of Activation Functions](https://dashee87.github.io/data%20science/deep%20learning/visualising-activation-functions-in-neural-networks/)
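To compare activation functions numerically, here is a small sketch using only `torch.sigmoid`, `torch.tanh` and `torch.relu` with autograd, evaluated at the integers from -4 to 4. The sigmoid's gradient peaks at 0.25 and tanh's at 1, while ReLU's gradient is exactly 1 for every positive input, which is one reason ReLU is a common choice for deep networks.

```python
import torch

z = torch.linspace(-4.0, 4.0, 9, requires_grad=True)  # -4, -3, ..., 4
activations = {"sigmoid": torch.sigmoid, "tanh": torch.tanh, "relu": torch.relu}

grads = {}
for name, fn in activations.items():
    # d(sum of outputs)/dz gives the elementwise derivative at each point
    (g,) = torch.autograd.grad(fn(z).sum(), z)
    grads[name] = g
    print(f"{name:8s}", [round(v, 3) for v in g.tolist()])
```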

## Classifying Diabetes

```python
import numpy as np
import torch

# Each row: 8 feature columns followed by the 0/1 diabetes label
xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, 0:-1])  # all columns but the last -> shape (N, 8)
y_data = torch.from_numpy(xy[:, [-1]])  # last column, kept 2-D -> shape (N, 1)
print(x_data.shape)
print(y_data.shape)
x_data[:5, :], y_data[:5, :]
```

```
torch.Size([759, 8])
torch.Size([759, 1])

(Variable containing:
 -0.2941  0.4874  0.1803 -0.2929  0.0000  0.0015 -0.5312 -0.0333
 -0.8824 -0.1457  0.0820 -0.4141  0.0000 -0.2072 -0.7669 -0.6667
 -0.0588  0.8392  0.0492  0.0000  0.0000 -0.3055 -0.4927 -0.6333
 -0.8824 -0.1055  0.0820 -0.5354 -0.7778 -0.1624 -0.9240  0.0000
  0.0000  0.3769 -0.3443 -0.2929 -0.6028  0.2846  0.8873 -0.6000
 [torch.FloatTensor of size 5x8], Variable containing:
  0
  1
  0
  1
  0
 [torch.FloatTensor of size 5x1])
```

```python
import torch

class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)  # 8 input features -> 6 hidden units
        self.l2 = torch.nn.Linear(6, 4)  # 6 -> 4 hidden units
        self.l3 = torch.nn.Linear(4, 1)  # 4 -> 1 output probability
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred
```

```python
import numpy as np
import torch

xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = torch.from_numpy(xy[:, 0:-1])  # features, shape (N, 8)
y_data = torch.from_numpy(xy[:, [-1]])  # labels, shape (N, 1)

print(x_data.shape)
print(y_data.shape)

class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules
        (8 -> 6 -> 4 -> 1) and one Sigmoid activation.
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)

        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a tensor of input data and must return
        a tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on tensors.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred

# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the three
# nn.Linear modules which are members of the model.
criterion = torch.nn.BCELoss(reduction='mean')  # mean binary cross-entropy over the batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)

# Training loop (full batch: every epoch uses all samples at once)
for epoch in range(100):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    if epoch % 20 == 0:
        print(epoch, round(loss.item(), 5))

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

```
torch.Size([759, 8])
torch.Size([759, 1])
0 0.75724
20 0.68613
40 0.66029
60 0.65089
80 0.64742
```
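The printed loss is the binary cross-entropy. With mean reduction, `torch.nn.BCELoss` computes $\ell = -\frac{1}{N}\sum_i \left[ y_i \log \hat{y}_i + (1-y_i)\log(1-\hat{y}_i) \right]$ (PyTorch additionally clamps the log terms at -100). A small sketch writing the formula by hand and comparing it with the built-in loss on made-up values; `bce` is a hypothetical helper name:

```python
import torch

def bce(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Mean binary cross-entropy written out explicitly."""
    return -(y * torch.log(y_hat) + (1 - y) * torch.log(1 - y_hat)).mean()

y_hat = torch.tensor([[0.9], [0.2], [0.6]])  # made-up predicted probabilities
y = torch.tensor([[1.0], [0.0], [1.0]])      # made-up 0/1 labels
print(bce(y_hat, y), torch.nn.BCELoss()(y_hat, y))  # the two values should match
```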


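The loop reports only the loss. To turn the sigmoid outputs into class labels, a minimal sketch assuming a 0.5 threshold (a common but not mandatory choice); `binary_accuracy` is a hypothetical helper, checked here on made-up values:

```python
import torch

def binary_accuracy(y_pred: torch.Tensor, y_true: torch.Tensor, threshold: float = 0.5) -> float:
    """Fraction of samples whose thresholded probability matches the 0/1 label."""
    predicted = (y_pred >= threshold).float()  # probability -> class 0 or 1
    return (predicted == y_true).float().mean().item()

y_pred = torch.tensor([[0.9], [0.2], [0.6], [0.4]])  # made-up probabilities
y_true = torch.tensor([[1.0], [0.0], [0.0], [0.0]])  # made-up labels
print(binary_accuracy(y_pred, y_true))  # 0.75
```

After training, it could be called as `binary_accuracy(model(x_data), y_data)`, ideally inside `with torch.no_grad():` since no gradients are needed for evaluation.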
##### Exercise 7-1

- Classify diabetes with a deep net (more than 10 layers)
- Find other classification problems or datasets and try a deep network on them
- Try different activation functions
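One possible starting point for the first and third items, as a sketch only: a hypothetical helper `make_deep_model` that stacks `depth` Linear layers with a chosen activation between them and keeps a single sigmoid output for binary classification. The hidden width of 6 and the ReLU default are assumptions, not values from the lecture.

```python
import torch

def make_deep_model(in_features=8, hidden=6, depth=10, activation=torch.nn.ReLU):
    """Hypothetical helper: `depth` Linear layers with `activation` between them."""
    layers = []
    width = in_features
    for _ in range(depth - 1):
        layers += [torch.nn.Linear(width, hidden), activation()]
        width = hidden
    # Final layer: one output squashed to a probability, as in the Model above
    layers += [torch.nn.Linear(width, 1), torch.nn.Sigmoid()]
    return torch.nn.Sequential(*layers)

model = make_deep_model(depth=12, activation=torch.nn.ReLU)
x = torch.randn(5, 8)  # 5 made-up samples with 8 features
print(model(x).shape)  # torch.Size([5, 1])
```

Passing `activation=torch.nn.Sigmoid` rebuilds an all-sigmoid stack, which makes it easy to compare how the two train at the same depth.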