and pass data through our layer

In [8]:
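import numpy as np

def init_model(n_features, neur_l1, neur_out):
    # Hidden layer: Gaussian weights scaled by 1/sqrt(n_features), zero biases
    W1 = np.random.randn(n_features, neur_l1) / np.sqrt(n_features)
    b1 = np.zeros((1, neur_l1))
    # Output layer: Gaussian weights scaled by 1/sqrt(neur_l1), zero biases
    W2 = np.random.randn(neur_l1, neur_out) / np.sqrt(neur_l1)
    b2 = np.zeros((1, neur_out))
    model = {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}
    return model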

We initialized a set of weight matrices (`W1` and `W2`) and bias vectors (`b1` and `b2`), which we later tune through our training procedure. Once these weights and biases are trained, we can use them to make predictions with our model.
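To make the shapes and the `1/sqrt(n_features)` scaling concrete, here is a small sketch. The sizes (784 input features, 8 hidden neurons, 10 outputs) and the random seed are assumptions chosen for illustration, and `model_np` is a hypothetical name used so as not to clash with the notebook's `model`.

```python
import numpy as np

def init_model(n_features, neur_l1, neur_out):
    # Same initialization as above: scaled Gaussian weights, zero biases
    W1 = np.random.randn(n_features, neur_l1) / np.sqrt(n_features)
    b1 = np.zeros((1, neur_l1))
    W2 = np.random.randn(neur_l1, neur_out) / np.sqrt(neur_l1)
    b2 = np.zeros((1, neur_out))
    return {'W1': W1, 'b1': b1, 'W2': W2, 'b2': b2}

np.random.seed(0)
# Assumed sizes: 784 features, 8 hidden neurons, 10 outputs
model_np = init_model(784, 8, 10)
for name, arr in model_np.items():
    print(name, arr.shape)
# W1 (784, 8)
# b1 (1, 8)
# W2 (8, 10)
# b2 (1, 10)

# With unit-variance inputs, dividing by sqrt(784) keeps the first layer's
# weighted sums at roughly unit variance instead of growing with n_features.
X = np.random.randn(1000, 784)
z1 = X.dot(model_np['W1']) + model_np['b1']
print(z1.std())  # roughly 1
```

Without the scaling, each weighted sum would add up 784 unit-variance terms, so its standard deviation would be around 28 and a sigmoid applied to it would saturate almost immediately.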

With PyTorch, we can initialize a layer's weights and biases all at once with the `nn.Linear` module.

In [16]:
import torch.nn as nn

W1 = nn.Linear(784, 8)  # 784 inputs -> 8 neurons
W1

Linear(in_features=784, out_features=8, bias=True)

Above we created a linear layer of eight neurons, each of which takes in 784 features. This `Linear` object holds both a weight matrix and a bias vector.
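A short sketch of what that object contains: one row of weights and one bias per neuron, 6,280 trainable numbers in total. It also checks PyTorch's default initialization, which draws weights and biases uniformly from (-1/sqrt(in_features), 1/sqrt(in_features)); with 784 inputs that bound is 1/28, about 0.0357, which is why the values printed below are so small.

```python
import math
import torch.nn as nn

layer = nn.Linear(784, 8)

# One weight row per neuron, one bias per neuron
print(layer.weight.shape)   # torch.Size([8, 784])
print(layer.bias.shape)     # torch.Size([8])

# Total trainable parameters: 784 * 8 weights + 8 biases
n_params = sum(p.numel() for p in layer.parameters())
print(n_params)             # 6280

# Default init bound: 1/sqrt(in_features) = 1/28 (small tolerance for float32 rounding)
bound = 1 / math.sqrt(784)
print(layer.weight.abs().max().item() <= bound + 1e-6)  # True
```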

In [24]:
W1.weight.data, W1.bias.data

(tensor([[ 0.0082, -0.0310, -0.0260,  ..., -0.0095, -0.0038,  0.0065],
         [ 0.0193,  0.0050,  0.0018,  ...,  0.0122, -0.0049, -0.0053],
         [-0.0328,  0.0264,  0.0198,  ..., -0.0072,  0.0268,  0.0058],
         ...,
         [ 0.0050, -0.0222,  0.0079,  ..., -0.0232, -0.0056, -0.0321],
         [-0.0302, -0.0064, -0.0283,  ...,  0.0234, -0.0241, -0.0021],
         [ 0.0050,  0.0298,  0.0083,  ...,  0.0335, -0.0152, -0.0197]]),
 tensor([ 0.0073,  0.0197,  0.0271, -0.0315,  0.0306,  0.0155,  0.0254, -0.0170]))

### Working with a PyTorch Tensor

So we just saw that with PyTorch we don't need to create a weight matrix and a bias vector separately; instead, we create both by initializing a linear layer. If we look closely at the output above, we can see that we are no longer working with NumPy arrays but with PyTorch tensors.


We'll explore some of the differences between NumPy arrays and PyTorch tensors a bit later; for now, it's enough to know that they work in similar ways.
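One difference is worth knowing already, because it appears as soon as NumPy data meets a PyTorch layer: NumPy defaults to 64-bit floats, while `nn.Linear` parameters are 32-bit. The sketch below (with made-up values) also shows that `torch.from_numpy` shares memory with the original array, and that a tensor tracked by autograd must be detached before converting back.

```python
import numpy as np
import torch

arr = np.array([[1.0, 2.0], [3.0, 4.0]])  # NumPy defaults to float64
t = torch.from_numpy(arr)                 # shares memory with arr, keeps float64
print(t.dtype)                            # torch.float64

arr[0, 0] = 10.0                          # changing the array...
print(t[0, 0].item())                     # ...is visible in the tensor: 10.0

layer = torch.nn.Linear(2, 1)
print(layer.weight.dtype)                 # torch.float32

# Cast to float32 before passing NumPy data to a layer;
# feeding the float64 tensor directly raises a dtype error.
out = layer(t.float())
print(out.shape)                          # torch.Size([2, 1])

back = out.detach().numpy()               # detach from autograd, then convert
print(type(back))                         # <class 'numpy.ndarray'>
```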

For example, to see the shape of our data, we use the `shape` attribute, just as with NumPy.

In [29]:
W1.weight.data.shape

torch.Size([8, 784])

And to select data, we use the same bracket (slicing) notation as in NumPy.

In [31]:
W1.weight.data[:2, :3]

tensor([[ 0.0082, -0.0310, -0.0260],
        [ 0.0193,  0.0050,  0.0018]])

Most importantly for us, we can perform the same matrix operations that we used with NumPy. Let's see this next.
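As a quick sketch of that claim, the matrix product, transpose and broadcasting behave identically on the same (made-up) data in both libraries:

```python
import numpy as np
import torch

A = np.arange(6, dtype=np.float32).reshape(2, 3)
B = np.ones((3, 4), dtype=np.float32)
A_t, B_t = torch.from_numpy(A), torch.from_numpy(B)

prod_np = A @ B            # matrix product in NumPy
prod_t = A_t @ B_t         # the same operator on PyTorch tensors
print(prod_t.shape)        # torch.Size([2, 4])
print(np.allclose(prod_np, prod_t.numpy()))  # True

# Transpose and broadcasting follow the same rules
gram_t = A_t.T @ A_t                                       # (3, 2) @ (2, 3) -> (3, 3)
shifted_t = A_t + torch.tensor([10.0, 20.0, 30.0])         # row vector added to every row
print(np.allclose(gram_t.numpy(), A.T @ A))                # True
print(np.allclose(shifted_t.numpy(), A + np.array([10.0, 20.0, 30.0])))  # True
```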

### Working towards a Forward function

Remember that the point of initializing our weights and biases is that they form the linear component of our neurons, and each neuron ultimately produces an output.

In the input layer of a neural network, we take in a row of data and pass it through the linear function (or linear layer, when we are talking about multiple neurons). We do so by performing a matrix-vector multiplication. In PyTorch, we do precisely the same thing.
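A minimal sketch of that matrix-vector multiplication, assuming a random 784-value row as stand-in input. Because each row of `layer.weight` holds one neuron's weights, `weight @ x` computes all eight weighted sums at once, and adding the bias vector reproduces what the layer returns when called directly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(784, 8)   # 8 neurons, 784 inputs each
x = torch.rand(784)         # hypothetical single row of data

# (8, 784) @ (784,) -> (8,): one weighted sum per neuron, then add the biases
z_manual = layer.weight @ x + layer.bias

z_layer = layer(x)          # calling the layer does the same computation

print(z_manual.shape)                                  # torch.Size([8])
print(torch.allclose(z_manual, z_layer, atol=1e-6))    # True
```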

Let's see this. We'll begin with the shape of the weight matrix we initialized:

In [35]:
W1.weight.data.shape

torch.Size([8, 784])

Let's select the weights of a single neuron.
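A sketch of what that selection looks like, again with a random stand-in row: indexing row 0 of the weight matrix gives the 784 weights of the first neuron, and their dot product with the input plus that neuron's bias equals the first entry of the layer's output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(784, 8)
x = torch.rand(784)            # hypothetical input row

w0 = layer.weight[0]           # the 784 weights of the first neuron
b0 = layer.bias[0]             # its single bias term
print(w0.shape)                # torch.Size([784])

z0 = torch.dot(w0, x) + b0     # that neuron's weighted sum
print(torch.allclose(z0, layer(x)[0], atol=1e-6))  # True
```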

In [5]:
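import torch

# `train` and `test` are the datasets loaded earlier in the notebook (not shown in this section)
trainset = torch.utils.data.DataLoader(train, batch_size=10, shuffle=True)
testset = torch.utils.data.DataLoader(test, batch_size=10, shuffle=True)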

In [6]:
import torch.nn as nn
import torch.nn.functional as F

### Initialize the model

In [9]:
def build_model():
    W1 = nn.Linear(28*28, 64)  # hidden layer: 784 pixel inputs -> 64 neurons
    W2 = nn.Linear(64, 10)     # output layer: 64 inputs -> 10 class scores
    return {'W1': W1, 'W2': W2}

In [41]:
model = build_model()
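Because `build_model` returns a plain dict rather than an `nn.Module`, nothing collects its parameters automatically. The sketch below gathers them by hand (for example, to pass to an optimizer later; this step is an illustration, not part of the original notebook) and confirms the size of the network.

```python
import torch.nn as nn

def build_model():
    W1 = nn.Linear(28*28, 64)  # hidden layer: 784 inputs -> 64 neurons
    W2 = nn.Linear(64, 10)     # output layer: 64 inputs -> 10 classes
    return {'W1': W1, 'W2': W2}

model = build_model()

# Collect every weight and bias tensor from the dict-based model
params = [p for layer in model.values() for p in layer.parameters()]
print(len(params))                      # 4: W1.weight, W1.bias, W2.weight, W2.bias
print(sum(p.numel() for p in params))   # 50890 = (784*64 + 64) + (64*10 + 10)
```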

### Write Forward Method

In [54]:
import torch

# A random stand-in for one flattened 28x28 image: 784 values in [0, 1)
X_1 = torch.rand(28*28)

In [56]:
W1 = model['W1']  # the 784 -> 64 hidden layer from build_model
W1(X_1)

tensor([-0.2319,  0.7662, -0.7318,  0.1258,  0.4845,  0.4858,  0.1362,  0.3198,
        -0.0967, -0.3745, -0.1128, -0.2208,  0.1297, -0.1437, -0.4191,  0.1762,
         0.1912,  0.3604, -0.4663, -0.0506, -0.3978, -0.0748, -0.3987, -0.2132,
         0.1109,  0.3667,  0.8042,  0.3077,  0.0352, -0.1009,  0.1312,  0.2380,
         0.1128,  0.0030,  0.1113, -0.5154, -0.3886,  0.0216,  0.6347, -0.7047,
        -0.6546,  0.2967,  0.1723,  0.0944,  0.1739, -0.3365, -0.4871, -0.1153,
         0.3490, -0.6262,  0.1084, -0.2347,  0.3372, -0.1943,  0.1036, -0.4603,
        -0.5398, -0.1378,  0.3041,  0.2011, -0.7347,  0.3061, -0.3538,  0.3311],
       grad_fn=<AddBackward0>)
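The call above passed a 1-D tensor of 784 values. `nn.Linear` operates on the last dimension of its input, so the same layer also accepts a row with a leading batch dimension, or a whole batch of rows; this is why `forward` is later called with `X_1.view(1, -1)`. A sketch with fresh random data:

```python
import torch
import torch.nn as nn

layer = nn.Linear(28*28, 64)
x = torch.rand(28*28)              # a single flattened image, shape (784,)

print(layer(x).shape)              # torch.Size([64])
print(layer(x.view(1, -1)).shape)  # torch.Size([1, 64]): a batch of one row

batch = torch.rand(10, 28*28)      # ten flattened rows
print(layer(batch).shape)          # torch.Size([10, 64])
```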

In [62]:
W1.weight.shape

torch.Size([64, 784])

In [86]:
z1 = weights @ (X_1.view(1, -1).T) + W1.bias
z1

tensor([[-0.2319, -0.2023, -0.2146,  ..., -0.2262, -0.2268, -0.2174],
        [ 0.7366,  0.7662,  0.7539,  ...,  0.7423,  0.7417,  0.7511],
        [-0.7491, -0.7195, -0.7318,  ..., -0.7434, -0.7440, -0.7345],
        ...,
        [ 0.3004,  0.3300,  0.3177,  ...,  0.3061,  0.3055,  0.3149],
        [-0.3589, -0.3293, -0.3415,  ..., -0.3532, -0.3538, -0.3443],
        [ 0.3165,  0.3461,  0.3339,  ...,  0.3223,  0.3216,  0.3311]],
       grad_fn=<AddBackward0>)

In [88]:
torch.sigmoid(z1)

tensor([[0.4423, 0.4496, 0.4466,  ..., 0.4437, 0.4435, 0.4459],
        [0.6762, 0.6827, 0.6800,  ..., 0.6775, 0.6774, 0.6794],
        [0.3210, 0.3275, 0.3248,  ..., 0.3223, 0.3221, 0.3242],
        ...,
        [0.5745, 0.5818, 0.5788,  ..., 0.5759, 0.5758, 0.5781],
        [0.4112, 0.4184, 0.4154,  ..., 0.4126, 0.4125, 0.4148],
        [0.5785, 0.5857, 0.5827,  ..., 0.5799, 0.5797, 0.5820]],
       grad_fn=<SigmoidBackward>)

### Building a Forward Method

In [42]:
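import numpy as np

def sigma(z):
    # Logistic sigmoid, the NumPy counterpart of torch.sigmoid
    return 1 / (1 + np.exp(-z))

def forward_numpy(X, model):
    W1, b1, W2, b2 = model['W1'], model['b1'], model['W2'], model['b2']
    z1 = X.dot(W1) + b1  # (n, n_features) @ (n_features, neur_l1)
    a1 = sigma(z1)
    z2 = a1.dot(W2) + b2
    return (z1, a1, z2)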

In [91]:
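def forward(X, model):
    W1, W2 = model['W1'], model['W2']
    Z1 = W1(X)                 # hidden pre-activations
    A1 = torch.sigmoid(Z1)     # F.sigmoid is deprecated
    Z2 = W2(A1)                # output logits
    A2 = F.softmax(Z2, dim=1)  # X is 2-D (batch, features): normalize across the 10 classes
    return (Z1, A1, Z2, A2)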
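`forward_numpy` expects weights in the (in, out) layout created by `init_model`, whereas `nn.Linear` stores its weight as (out, in) and computes `x @ weight.T + bias`. As a sanity check, the sketch below copies the parameters of two freshly initialized layers into NumPy, transposing them, and confirms that both forward passes produce the same logits. The inline NumPy lines mirror `forward_numpy`; the seed and input are arbitrary.

```python
import numpy as np
import torch
import torch.nn as nn

def sigma(z):
    return 1 / (1 + np.exp(-z))

torch.manual_seed(0)
W1, W2 = nn.Linear(784, 64), nn.Linear(64, 10)
x = torch.rand(1, 784)

# PyTorch forward pass up to the logits
a1 = torch.sigmoid(W1(x))
z2 = W2(a1)

# Same parameters, transposed into (in, out) layout, biases as row vectors
np_model = {
    'W1': W1.weight.detach().numpy().T, 'b1': W1.bias.detach().numpy().reshape(1, -1),
    'W2': W2.weight.detach().numpy().T, 'b2': W2.bias.detach().numpy().reshape(1, -1),
}
X = x.numpy()
z1_np = X.dot(np_model['W1']) + np_model['b1']
z2_np = sigma(z1_np).dot(np_model['W2']) + np_model['b2']

print(np.allclose(z2_np, z2.detach().numpy(), atol=1e-5))  # True
```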

In [96]:
Z1, A1, Z2, A2 = forward(X_1.view(1, -1), model)

In [99]:
A2

tensor([[0.0982, 0.1197, 0.1461, 0.0512, 0.1394, 0.0906, 0.0712, 0.0601, 0.1250,
         0.0986]], grad_fn=<SoftmaxBackward>)

In [100]:
Z2

tensor([[-0.0341,  0.1631,  0.3626, -0.6857,  0.3155, -0.1150, -0.3567, -0.5252,
          0.2067, -0.0308]], grad_fn=<AddmmBackward>)
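Softmax turns the logits in `Z2` into the probabilities in `A2`. As a check, the sketch below applies `torch.softmax` to the rounded `Z2` values printed above: it recovers `A2` to about three decimals, each row sums to 1, and the (still untrained) model's predicted class is the index with the largest probability, here index 2.

```python
import torch

# Logits copied from the Z2 output above (rounded to four decimals)
Z2 = torch.tensor([[-0.0341, 0.1631, 0.3626, -0.6857, 0.3155,
                    -0.1150, -0.3567, -0.5252, 0.2067, -0.0308]])

A2 = torch.softmax(Z2, dim=1)  # normalize across the 10 classes in each row
print(A2.sum(dim=1))           # each row sums to 1 (up to float rounding)
print(A2.argmax(dim=1))        # tensor([2]): the class with the largest logit
```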