# Lecture 7: Wide and Deep

How do we handle models with multiple input variables?

## Matrix Multiplication - Go Wide!

$$ \begin{bmatrix}
a_1 & b_1\\
a_2 & b_2\\
\vdots & \vdots \\
a_n & b_n
\end{bmatrix}
\begin{bmatrix}
w_1 \\
w_2
\end{bmatrix}
=
\begin{bmatrix}
y_1 \\
y_2 \\
\vdots\\
y_n
\end{bmatrix}
$$
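
The product above can be sketched in PyTorch as follows. This is a minimal illustration, not part of the original lecture code: the sample values in `X` and `w` are made up, and `n = 3` is an arbitrary choice. It also shows that `torch.nn.Linear(2, 1)` computes the same product (plus a bias, set to zero here for the comparison).

```python
import torch

# Hypothetical data: n = 3 samples, each with two features (a_i, b_i).
X = torch.tensor([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

# One weight per feature, stored as a 2x1 column vector.
w = torch.tensor([[0.5],
                  [-1.0]])

# (3x2) @ (2x1) -> (3x1): one output per sample.
y = X @ w
print(y.shape)

# torch.nn.Linear(2, 1) performs the same product and adds a learnable bias.
linear = torch.nn.Linear(2, 1)
with torch.no_grad():
    linear.weight.copy_(w.t())  # Linear stores its weights as (out_features, in_features)
    linear.bias.zero_()         # zero the bias so the result matches X @ w
y_linear = linear(X).detach()
```

Going "wide" then just means giving `X` more columns and `w` more rows; the matrix product handles all samples at once.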

## Multiple Layers - Go Deep!

X -> Linear -> Sigmoid -> Linear -> Sigmoid -> .... -> y
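
One way to write this chain directly is with `torch.nn.Sequential`. The sketch below assumes layer sizes of 8 -> 6 -> 4 -> 1, chosen to mirror the diabetes example later in this lecture, and feeds in a random batch of 5 samples purely to check shapes.

```python
import torch

# X -> Linear -> Sigmoid -> Linear -> Sigmoid -> Linear -> Sigmoid -> y
model = torch.nn.Sequential(
    torch.nn.Linear(8, 6), torch.nn.Sigmoid(),
    torch.nn.Linear(6, 4), torch.nn.Sigmoid(),
    torch.nn.Linear(4, 1), torch.nn.Sigmoid(),
)

# Random stand-in input: 5 samples with 8 features each.
x = torch.randn(5, 8)
y = model(x)
print(y.shape)
```

Each layer's output size must equal the next layer's input size; the final sigmoid keeps every output between 0 and 1.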

## Sigmoid: Vanishing Gradient Problem

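Sigmoid squashes its output into the range (0, 1), and its derivative is never larger than 0.25.
Backpropagation multiplies these small derivatives together layer by layer, so the gradient reaching the early layers becomes vanishingly small and those layers learn very slowly.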

Solution: Use other activation functions such as `ReLU`.
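
The effect can be illustrated with a little arithmetic. The sketch below is a simplification: it ignores the weight matrices and looks only at the activation derivatives that the chain rule multiplies together. The depth of 10 layers is an arbitrary choice for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # d/dx sigmoid(x) = sigmoid(x) * (1 - sigmoid(x))
    s = sigmoid(x)
    return s * (1.0 - s)

def relu_grad(x):
    # ReLU passes the gradient through unchanged for positive inputs.
    return 1.0 if x > 0 else 0.0

# The sigmoid derivative peaks at x = 0.
peak = sigmoid_grad(0.0)

# Best case for sigmoid: every layer sits exactly at its peak derivative.
depth = 10
sigmoid_factor = peak ** depth
relu_factor = relu_grad(1.0) ** depth

print(peak, sigmoid_factor, relu_factor)
```

Even in the best case, ten sigmoid layers scale the gradient by less than one millionth, while ReLU units with positive inputs leave it unscaled.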

## Example: Classifying Diabetes

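import torch
import torch.nn.functional as F
import numpy as np

# Load the dataset: every column but the last is a feature, the last column is the label
xy = np.loadtxt("data/diabetes.csv", delimiter=',', dtype=np.float32)
# Tensors track gradients themselves, so the deprecated Variable wrapper is not needed
x_data = torch.from_numpy(xy[:, 0:-1])  # features, shape (n, 8)
y_data = torch.from_numpy(xy[:, [-1]])  # labels, shape (n, 1)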

# Design Model
class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules
        """
        super().__init__()
        self.l1 = torch.nn.Linear(8, 6)  # 8 in, 6 out: wide model
        self.l2 = torch.nn.Linear(6, 4)  # 6 in, 4 out
        self.l3 = torch.nn.Linear(4, 1)  # 4 in, 1 out
        # 3 layers: deep model

        self.sigmoid = torch.nn.Sigmoid()
    
    def forward(self, x):
        """
        In the forward function we accept a Tensor of input data and we must return
        a Tensor of output data. We can use Modules defined in the constructor as
        well as arbitrary operators on Tensors.
        """
        out_1 = self.sigmoid(self.l1(x))
        out_2 = self.sigmoid(self.l2(out_1))
        y_pred = self.sigmoid(self.l3(out_2))
        return y_pred

# our Model
model = Model()

# Construct our loss function and an optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the three
# nn.Linear modules which are members of the model.
criterion = torch.nn.BCELoss(reduction='mean')  # size_average is deprecated; 'mean' averages the loss over the batch
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for epoch in range(100):
    # Forward pass: compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data)

    # Zero gradients, perform a backward pass, and update the weights
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()


0 tensor(0.7964)
1 tensor(0.7796)
2 tensor(0.7646)
3 tensor(0.7512)
4 tensor(0.7394)
5 tensor(0.7288)
6 tensor(0.7194)
7 tensor(0.7111)
8 tensor(0.7037)
9 tensor(0.6972)
10 tensor(0.6914)
11 tensor(0.6862)
12 tensor(0.6816)
13 tensor(0.6775)
14 tensor(0.6739)
15 tensor(0.6707)
16 tensor(0.6679)
17 tensor(0.6654)
18 tensor(0.6631)
19 tensor(0.6611)
20 tensor(0.6594)
21 tensor(0.6578)
22 tensor(0.6564)
23 tensor(0.6551)
24 tensor(0.6540)
25 tensor(0.6530)
26 tensor(0.6522)
27 tensor(0.6514)
28 tensor(0.6507)
29 tensor(0.6501)
30 tensor(0.6495)
31 tensor(0.6490)
32 tensor(0.6486)
33 tensor(0.6482)
34 tensor(0.6478)
35 tensor(0.6475)
36 tensor(0.6472)
37 tensor(0.6470)
38 tensor(0.6468)
39 tensor(0.6466)
40 tensor(0.6464)
41 tensor(0.6462)
42 tensor(0.6461)
43 tensor(0.6460)
44 tensor(0.6458)
45 tensor(0.6457)
46 tensor(0.6457)
47 tensor(0.6456)
48 tensor(0.6455)
49 tensor(0.6454)
50 tensor(0.6454)
51 tensor(0.6453)
52 tensor(0.6453)
53 tensor(0.6452)
54 tensor(0.6452)
55 tensor(0.6452)
56