In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

# Multi-Layer Perceptron in PyTorch

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

torch.manual_seed(1)

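Consider a classification task whose input is a two-dimensional feature vector $(x_1, x_2)$.
We build a toy dataset with two positive clusters (red), centered at $(5, -5)$ and $(-5, 5)$, and one elongated negative cluster (blue) around the origin: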

In [None]:
Np1 = 15
Np2 = 15
Nn = 30
N = Np1 + Np2 + Nn
# Two positive clusters centered at (5, -5) and (-5, 5)
xp1s = torch.randn(Np1, 2) + torch.tensor([5.0, -5.0])
xp2s = torch.randn(Np2, 2) + torch.tensor([-5.0, 5.0])
# One elongated negative cluster centered at the origin
xns = torch.randn(Nn, 2) @ torch.tensor([[2.0, 1.0], [-1.0, -2.0]])
xs = torch.cat((xp1s, xp2s, xns))
plt.scatter(xp1s[:, 0], xp1s[:, 1], color='red')
plt.scatter(xp2s[:, 0], xp2s[:, 1], color='red')
plt.scatter(xns[:, 0], xns[:, 1], color='blue')

In [None]:
ys_true = torch.cat((torch.ones(Np1 + Np2), torch.zeros(Nn)))
ys_true = ys_true.view(-1, 1)

In [None]:
ys_true
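The `view(-1, 1)` reshape above matters: `nn.Linear(2, 1)` outputs a tensor of shape `(N, 1)`, and `nn.BCEWithLogitsLoss` expects the target to have the same shape as its input, raising a `ValueError` otherwise. A small self-contained check, using made-up logits and labels rather than the dataset above:

```python
import torch
import torch.nn as nn

loss_fn = nn.BCEWithLogitsLoss()
logits = torch.zeros(4, 1)                        # shape (N, 1), like the output of nn.Linear(2, 1)
labels_flat = torch.tensor([1.0, 1.0, 0.0, 0.0])  # shape (N,)
labels_col = labels_flat.view(-1, 1)              # shape (N, 1), matching the logits

# Matching shapes work. With all logits 0, every sigmoid output is 0.5, so the loss is log(2).
loss_ok = loss_fn(logits, labels_col)

# Mismatched shapes: BCEWithLogitsLoss does not broadcast the target and raises ValueError.
try:
    loss_fn(logits, labels_flat)
    shape_error = False
except ValueError:
    shape_error = True

print(loss_ok.item(), shape_error)
```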

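As an experiment, let's first confirm that logistic regression fails to classify this dataset. Its decision boundary is a straight line, and no straight line can put both red clusters on one side and the blue cluster, which lies between them, on the other.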
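Why a linear model must fail can be seen from the three cluster centers alone (a simplification that ignores the spread of the points). The negative center $(0, 0)$ is exactly the midpoint of the positive centers $(5, -5)$ and $(-5, 5)$, so any linear score $f(x) = w^\top x + b$ satisfies $f(0, 0) = \tfrac{1}{2}\big(f(5, -5) + f(-5, 5)\big)$: if both positive centers score above 0, so does the origin. The sketch below checks this numerically with randomly sampled $(w, b)$; the helper names `score` and `separates_centers` are my own.

```python
import random

# Cluster centers from the dataset above: two positive, one negative.
P1 = (5.0, -5.0)
P2 = (-5.0, 5.0)
NEG = (0.0, 0.0)  # exactly the midpoint of P1 and P2

def score(w, b, x):
    """Linear score f(x) = w . x + b, the same form nn.Linear(2, 1) computes."""
    return w[0] * x[0] + w[1] * x[1] + b

def separates_centers(w, b):
    """True if f is positive at both positive centers and non-positive at the negative one."""
    return score(w, b, P1) > 0 and score(w, b, P2) > 0 and score(w, b, NEG) <= 0

rng = random.Random(0)
successes = 0
for _ in range(100_000):
    w = (rng.gauss(0, 1), rng.gauss(0, 1))
    b = rng.gauss(0, 1)
    successes += separates_centers(w, b)

print(successes)  # 0: no sampled line separates the centers
```

Random sampling only illustrates the point; the midpoint identity above is what guarantees that no choice of $(w, b)$ can succeed.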

In [None]:
linear = nn.Linear(2, 1)
bce_with_sigmoid = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(linear.parameters(), lr = 0.1)
for epoch in range(300):
    zs = linear(xs)
    loss = bce_with_sigmoid(zs, ys_true)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
ys_pred = torch.sigmoid(linear(xs))
xs_classified_pos = xs[ys_pred[:,0]>0.5]
xs_classified_neg = xs[ys_pred[:,0]<=0.5]
plt.scatter(xs_classified_pos[:, 0], xs_classified_pos[:, 1], color='red')
plt.scatter(xs_classified_neg[:, 0], xs_classified_neg[:, 1], color='blue')

To overcome this, we add a hidden layer with a nonlinear (sigmoid) activation, which increases the expressive power of the model.
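To see that two sigmoid hidden units are enough in principle, here is a sketch with hand-picked weights (chosen by me for illustration, not learned) for the same `Linear(2, 2) -> sigmoid -> Linear(2, 1)` stack. One hidden unit responds to points where $x_1 - x_2 > 5$, the other to points where $x_2 - x_1 > 5$, and the output logit is positive when either unit is active. It is checked on the three cluster centers only.

```python
import torch
import torch.nn as nn

hidden = nn.Linear(2, 2)
output = nn.Linear(2, 1)

# Hand-picked (illustrative) weights:
#   hidden unit 1 ~ sigmoid(x1 - x2 - 5): active near the cluster at (5, -5)
#   hidden unit 2 ~ sigmoid(x2 - x1 - 5): active near the cluster at (-5, 5)
#   output logit = 10 * (h1 + h2) - 5: positive if either hidden unit is active
with torch.no_grad():
    hidden.weight.copy_(torch.tensor([[1.0, -1.0], [-1.0, 1.0]]))
    hidden.bias.copy_(torch.tensor([-5.0, -5.0]))
    output.weight.copy_(torch.tensor([[10.0, 10.0]]))
    output.bias.copy_(torch.tensor([-5.0]))

centers = torch.tensor([[5.0, -5.0], [-5.0, 5.0], [0.0, 0.0]])
with torch.no_grad():
    logits = output(torch.sigmoid(hidden(centers)))

print((logits.squeeze(1) > 0).tolist())  # [True, True, False]
```

Training has to discover weights like these on its own, which is what the next cells do.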

In [None]:
from itertools import chain
linear1 = nn.Linear(2, 2)
linear2 = nn.Linear(2, 1)
bce_with_sigmoid = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(chain(linear1.parameters(), linear2.parameters()), lr = 0.1)

In [None]:
for epoch in range(3000):
    z1s = linear1(xs)
    z1s = torch.sigmoid(z1s)
    z2s = linear2(z1s)
    loss = bce_with_sigmoid(z2s, ys_true)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
z1s = linear1(xs)
z1s = torch.sigmoid(z1s)
z2s = linear2(z1s)
ys_pred = torch.sigmoid(z2s)
xs_classified_pos = xs[ys_pred[:,0]>0.5]
xs_classified_neg = xs[ys_pred[:,0]<=0.5]
plt.scatter(xs_classified_pos[:, 0], xs_classified_pos[:, 1], color='red')
plt.scatter(xs_classified_neg[:, 0], xs_classified_neg[:, 1], color='blue')

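Depending on the random initialization, training sometimes ends with a wrong classification.
In that case, reinitialize the model and run the training again.
This happens because the loss of a network with a hidden layer is non-convex, so gradient-based optimization can converge to a poor local minimum; this is a general issue with gradient-based optimization.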
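One common way to automate "reinitialize and retry" is random restarts: train several models from different seeds and keep the one with the lowest final training loss. A minimal sketch, assuming five restarts (an arbitrary number) and hypothetical helpers `make_dataset` and `train_once`; the dataset is rebuilt with the same construction as above so the block runs on its own, and the model uses `nn.Sequential` for brevity.

```python
import torch
import torch.nn as nn
import torch.optim as optim

def make_dataset():
    """Rebuild the toy dataset with the same construction as above."""
    torch.manual_seed(1)
    xp1s = torch.randn(15, 2) + torch.tensor([5.0, -5.0])
    xp2s = torch.randn(15, 2) + torch.tensor([-5.0, 5.0])
    xns = torch.randn(30, 2) @ torch.tensor([[2.0, 1.0], [-1.0, -2.0]])
    xs = torch.cat((xp1s, xp2s, xns))
    ys = torch.cat((torch.ones(30), torch.zeros(30))).view(-1, 1)
    return xs, ys

def train_once(seed, xs, ys, epochs=3000, lr=0.1):
    """Train a fresh 2-2-1 sigmoid MLP from a seeded initialization; return (final loss, model)."""
    torch.manual_seed(seed)  # controls the initial weights of this restart
    model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = loss_fn(model(xs), ys)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        final_loss = loss_fn(model(xs), ys).item()
    return final_loss, model

xs, ys = make_dataset()
results = [train_once(seed, xs, ys) for seed in range(5)]
losses = [loss for loss, _ in results]
best_loss, best_model = min(results, key=lambda r: r[0])  # keep the restart with the lowest loss
print([round(l, 4) for l in losses], "best:", round(best_loss, 4))
```

Restarts do not guarantee a global minimum, but they make an unlucky initialization much less likely to decide the outcome.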

In the code above, I intentionally wrote the model in a naive way. In PyTorch, however, a model is usually defined as a class that extends `nn.Module`, as follows.

In [None]:
class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim):
        super().__init__()
        self.linear1 = nn.Linear(input_dim, hidden_dim)
        self.linear2 = nn.Linear(hidden_dim, output_dim)
        
    def forward(self, xs):
        z1s = self.linear1(xs)
        z1s = torch.sigmoid(z1s)
        z2s = self.linear2(z1s)
        return z2s
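For a plain chain of layers like this one, `nn.Sequential` is a shorter alternative; the subclass form is more flexible when `forward` needs arbitrary logic. A sketch of an equivalent layer stack (the name `seq_model` is illustrative):

```python
import torch
import torch.nn as nn

# Same layer stack as MLP(2, 2, 1): Linear(2, 2) -> sigmoid -> Linear(2, 1)
seq_model = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1),
)

xs = torch.randn(5, 2)
zs = seq_model(xs)  # logits of shape (5, 1)
n_params = sum(p.numel() for p in seq_model.parameters())
print(zs.shape, n_params)  # torch.Size([5, 1]) 9
```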

A model class typically implements two methods.

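`__init__` is called once, when the model is constructed, and usually creates the submodules (such as `nn.Linear`) that hold the model parameters. It must call the superclass constructor (`super().__init__()`) before assigning any submodule.
Submodules and parameters assigned as attributes here are registered by the superclass `nn.Module`, so you can easily obtain the list of all parameters in the model with `model.parameters()`.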
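What gets registered depends on the type of the attribute. In the sketch below (a hypothetical `TinyModel`), a submodule and an `nn.Parameter` are registered, while a plain tensor is not; a second hypothetical class shows that assigning a submodule before calling `super().__init__()` fails.

```python
import torch
import torch.nn as nn

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 1)             # submodule: its weight and bias are registered
        self.scale = nn.Parameter(torch.ones(1))  # nn.Parameter attribute: registered
        self.offset = torch.zeros(1)              # plain tensor: NOT registered as a parameter

model = TinyModel()
names = [name for name, _ in model.named_parameters()]
print(names)  # contains 'linear.weight', 'linear.bias' and 'scale', but not 'offset'

class Broken(nn.Module):
    def __init__(self):
        self.linear = nn.Linear(2, 1)  # assigned before nn.Module.__init__ has run
        super().__init__()

try:
    Broken()
    broken_raised = False
except AttributeError:
    broken_raised = True  # nn.Module refuses to register the submodule
```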

`forward` defines how the model transforms an input `xs` into its output.
We do not usually call it directly: calling the model instance itself, as in `model(xs)`, goes through `nn.Module.__call__`, which invokes `forward` for us, as we see below.
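Calling the instance rather than `forward` also matters because `__call__` runs extra machinery such as forward hooks. A small sketch with a hypothetical parameter-free module `Doubler` (it omits `__init__`, since it has no submodules to create) and a hook registered through `register_forward_hook`:

```python
import torch
import torch.nn as nn

class Doubler(nn.Module):
    def forward(self, xs):
        return 2 * xs

model = Doubler()
calls = []
# The hook receives (module, inputs, output) after forward() runs inside __call__.
model.register_forward_hook(lambda module, inputs, output: calls.append(output.shape))

xs = torch.ones(3, 2)
out_call = model(xs)             # goes through nn.Module.__call__: runs forward() and the hook
out_forward = model.forward(xs)  # calls forward() directly: the hook is skipped

print(torch.equal(out_call, out_forward), len(calls))  # True 1
```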

In [None]:
model = MLP(2, 2, 1)
print(list(model.parameters()))

In [None]:
zs = model(xs) # forward() method is called through __call__() method in nn.Module
print(zs)

The training and testing code can then be written more simply, as follows (reusing the `bce_with_sigmoid` loss defined earlier).

In [None]:
model = MLP(2, 2, 1)
optimizer = optim.SGD(model.parameters(), lr = 0.1)
for epoch in range(3000):
    zs = model(xs)
    loss = bce_with_sigmoid(zs, ys_true)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

In [None]:
ys_pred = torch.sigmoid(model(xs))
xs_classified_pos = xs[ys_pred[:,0]>0.5]
xs_classified_neg = xs[ys_pred[:,0]<=0.5]
plt.scatter(xs_classified_pos[:, 0], xs_classified_pos[:, 1], color='red')
plt.scatter(xs_classified_neg[:, 0], xs_classified_neg[:, 1], color='blue')
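Besides plotting, it can help to report the accuracy as a number. A minimal sketch of a hypothetical helper `accuracy`: it evaluates under `torch.no_grad()` and thresholds the logits at 0, which mathematically matches thresholding `torch.sigmoid` at 0.5 as in the cell above (up to floating-point rounding very close to 0.5). The usage example uses a hand-set linear model and made-up points, not the dataset above.

```python
import torch
import torch.nn as nn

def accuracy(model, xs, ys_true):
    """Fraction of examples whose predicted 0/1 label matches ys_true (shape (N, 1))."""
    with torch.no_grad():  # no gradients are needed for evaluation
        logits = model(xs)
        preds = (logits > 0).float()  # sigmoid(z) > 0.5 exactly when z > 0
    return (preds == ys_true).float().mean().item()

# Usage with a tiny hand-set linear model that predicts 1 when x1 > 0.
clf = nn.Linear(2, 1)
with torch.no_grad():
    clf.weight.copy_(torch.tensor([[1.0, 0.0]]))
    clf.bias.zero_()
xs = torch.tensor([[1.0, 0.0], [2.0, 3.0], [-1.0, 0.0], [-2.0, 5.0]])
ys = torch.tensor([[1.0], [1.0], [0.0], [1.0]])
print(accuracy(clf, xs, ys))  # 0.75
```

With the trained model from above, `accuracy(model, xs, ys_true)` would give the training accuracy directly.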

In [None]:
list(model.parameters())