# Neural Networks in PyTorch
## Chapter 5: Binary Classification of Multidimensional Data
*Yen Lee Loh, 2022-12-3*

The previous chapter focused on curve fitting (machine learning for functions of a single variable).
For this and the next few chapters, we focus on binary classification (machine learning for binary functions of many variables).

---
## 1.  Setup

In [1]:
import numpy as np
import torch
import torch.nn as nn
import matplotlib.pyplot as plt
torch.cuda.is_available()

True

In [2]:
def train(xnd, ynd, model, lossFunc, epochs=10000, learningRate=0.001, lossTarget=0.01, reportInterval=1000):
  optimizer = torch.optim.Adam(model.parameters(), lr=learningRate)
  model.train()                  # put model in training mode
  for t in range(epochs):      # t is the epoch number
    Ynd = model(xnd)             # uppercase Y = model prediction
    loss = lossFunc(Ynd,ynd)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    F = loss.item()
    if t % reportInterval == 0 or t == epochs-1:   # also report the final epoch (t runs from 0 to epochs-1)
      print('Training epoch {}/{}  \t Loss = {:.4f}'.format(t, epochs, F))
    if F < lossTarget:
      print('Training epoch {}/{}  \t Loss = {:.4f} < lossTarget\n'.format(t, epochs, F))
      return
  print ('Warning: loss > lossTarget!\n')

def metrics (Yn, yn):   # Yn are model outputs, yn are true outputs
  nmax = len(yn)
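  ymax = max(max(yn), max(Yn)) + 1               # number of classes seen in labels or predictions
  confmat = np.zeros ([ymax, ymax], dtype=int)   # confmat[y][Y]: rows = true label, columns = prediction
  for n in range(nmax): confmat[yn[n], Yn[n]] += 1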
  ntot = np.sum(confmat)
  nerr = ntot - np.trace(confmat)
  return ntot,nerr,confmat

def assess (xnd, ynd, model, lossFunc):
  np.set_printoptions(precision=2,suppress=True,floatmode='fixed')
  nmax = xnd.size(0)
  #======== Feedforward
  model.eval()               # put model in evaluation mode
  Ynd = model(xnd)
  loss = lossFunc(Ynd,ynd)
  #======== Convert type
  xnd = xnd.numpy().astype(int)                            # integer just for printing purposes
  Yn = Ynd.detach().numpy().flatten().round().astype(int)  # round this
  yn = ynd.detach().numpy().flatten().astype(int)
  print ('{:10}{:>10}{:>15}'.format('input x', 'target y', 'prediction Y'))
  for n in range(nmax):
    print ('{:10}{:10}{:15}'.format(str(xnd[n]), yn[n], Yn[n]))
  print ()
  #======== Compute error count and confusion matrix (Y was already rounded above)
  ntot,nerr,Cnn = metrics (Yn, yn)
  print("Loss = {:.4f}      Error = {:d}/{:} = {:.1f}%       Confusion matrix = {}".format (loss, nerr, ntot, 100*nerr/ntot, Cnn.tolist()))
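
The confusion matrix printed by `assess` is indexed as `confmat[yn[n], Yn[n]]`, so rows correspond to true labels and columns to predictions. The following minimal sketch shows how to read such a matrix; the helper name `confusion` and the labels are made up for illustration and are not part of the notebook.

```python
import numpy as np

def confusion(y_true, y_pred, n_classes=2):
    # Hypothetical helper for this sketch: rows index the true label,
    # columns index the predicted label
    C = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        C[t, p] += 1
    return C

y_true = [0, 0, 0, 1, 1, 1]     # made-up labels
y_pred = [0, 0, 1, 1, 1, 0]     # made-up predictions
C = confusion(y_true, y_pred)
n_err = C.sum() - np.trace(C)   # off-diagonal entries are misclassifications
print(C.tolist(), n_err)        # [[2, 1], [1, 2]] 2
```

A perfect classifier gives a diagonal matrix, which is why the error count is the total minus the trace.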

---
## 2. Three-way Boolean AND

Binary classification is a supervised learning problem.  We are given a dataset consisting of input vectors $\mathbf{x}_n$ and scalar-valued outputs $y_n$ (training labels).  We wish to train a model on this data, so that the model predictions $Y_n = Y(\mathbf{x}_n)$ match the training labels $y_n$ as closely as possible.  Since the model output is supposed to be either 0 or 1, we will generally use `nn.Sigmoid()` as the last layer of our neural network: its output lies strictly between 0 and 1, so it can easily be rounded to 0 or 1.
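
As a minimal illustration of the rounding step, the snippet below applies `nn.Sigmoid()` to a few raw scores and thresholds the result at 0.5. The scores are arbitrary values chosen only for this sketch.

```python
import torch
import torch.nn as nn

# Arbitrary raw scores (logits), chosen only for illustration
z = torch.tensor([-4.0, -0.5, 0.5, 4.0])

sigmoid = nn.Sigmoid()
Y = sigmoid(z)                 # each output lies strictly between 0 and 1
labels = (Y > 0.5).int()       # threshold at 0.5 to obtain a hard 0/1 decision

print(Y)
print(labels)                  # negative scores map to 0, positive scores to 1
```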

For our first example, we consider a Boolean function of three Boolean variables, $y(x_0,x_1,x_2) = x_0 \text{ AND } x_1 \text{ AND } x_2$, where $y$ and $x_d$ are all either 0 or 1.  This function can easily be learned by a single-layer perceptron, which implements the model $Y=\text{sigmoid} (\mathbf{w}\cdot\mathbf{x} + b)$:

In [3]:
xnd      = torch.tensor([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]], dtype=torch.float32)
ynd      = torch.tensor([[0,0,0,0,0,0,0,1]], dtype=torch.float32).T
model    = nn.Sequential(
            nn.Linear(xnd.size(1), ynd.size(1)),   # linear layer with 3 inputs and 1 output: nn.Linear(3,1)
            nn.Sigmoid()                           # sigmoid function whose output lies between 0 and 1
           )
lossFunc = nn.BCELoss()                            # binary cross-entropy loss
train (xnd, ynd, model, lossFunc, epochs=1000, reportInterval=100, learningRate=0.1)
print ("Weight matrix = ", model[0].weight.detach().numpy())
print ("Bias vector   = ", model[0].bias.detach().numpy())
print ()
assess (xnd, ynd, model, lossFunc)

Training epoch 0/1000  	 Loss = 0.7436
Training epoch 100/1000  	 Loss = 0.1607
Training epoch 200/1000  	 Loss = 0.0906
Training epoch 300/1000  	 Loss = 0.0593
Training epoch 400/1000  	 Loss = 0.0419
Training epoch 500/1000  	 Loss = 0.0312
Training epoch 600/1000  	 Loss = 0.0242
Training epoch 700/1000  	 Loss = 0.0193
Training epoch 800/1000  	 Loss = 0.0158
Training epoch 900/1000  	 Loss = 0.0131

Weight matrix =  [[7.328484  7.3260894 7.332103 ]]
Bias vector   =  [-18.711174]

input x     target y   prediction Y
[0 0 0]            0              0
[0 0 1]            0              0
[0 1 0]            0              0
[0 1 1]            0              0
[1 0 0]            0              0
[1 0 1]            0              0
[1 1 0]            0              0
[1 1 1]            1              1

Loss = 0.0111      Error = 0/8 = 0.0%       Confusion matrix = [[7, 0], [0, 1]]
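
To see why these parameters implement AND, one can evaluate $Y=\text{sigmoid}(\mathbf{w}\cdot\mathbf{x}+b)$ by hand. The sketch below copies the weights and bias from the printed output above (a different run would give somewhat different numbers) and uses plain NumPy rather than the trained model.

```python
import numpy as np

# Weights and bias copied from the printed output above
w = np.array([7.328484, 7.3260894, 7.332103])
b = -18.711174

# All 8 Boolean inputs, in the same order as xnd
X = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])

z = X @ w + b                    # pre-activation w.x + b
Y = 1.0 / (1.0 + np.exp(-z))     # sigmoid
for x, zi, Yi in zip(X, z, Y):
    print(x, f"z = {zi:8.3f}", f"Y = {Yi:.4f}")
```

Each weight is about 7.33 and the bias is about -18.71, so the pre-activation is positive only when all three inputs are 1 (3 × 7.33 ≈ 21.99 > 18.71); with two inputs on it is still negative (2 × 7.33 ≈ 14.65 < 18.71).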


---
## 3. Three-way Boolean XOR

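A single-layer perceptron (SLP) cannot learn the function $y(x_0,x_1,x_2) = x_0 \text{ XOR } x_1 \text{ XOR } x_2$, because this function is not linearly separable: no single plane $\mathbf{w}\cdot\mathbf{x}+b=0$ separates the inputs with $y=1$ from those with $y=0$.  The attempt below fails accordingly: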

In [4]:
xnd      = torch.tensor([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]], dtype=torch.float32)
ynd      = torch.tensor([[0,1,1,0,1,0,0,1]], dtype=torch.float32).T
model    = nn.Sequential(
            nn.Linear(xnd.size(1), ynd.size(1)),   # linear layer with 3 inputs and 1 output
            nn.Sigmoid()                           # sigmoid function whose output lies between 0 and 1
           )
lossFunc = nn.BCELoss()                            # binary cross-entropy loss
train (xnd, ynd, model, lossFunc, epochs=1000, reportInterval=100, learningRate=0.1)
print ("Weight matrix = ", model[0].weight.detach().numpy())
print ("Bias vector   = ", model[0].bias.detach().numpy())
print ()
assess (xnd, ynd, model, lossFunc)

Training epoch 0/1000  	 Loss = 0.7012
Training epoch 100/1000  	 Loss = 0.6931
Training epoch 200/1000  	 Loss = 0.6931
Training epoch 300/1000  	 Loss = 0.6931
Training epoch 400/1000  	 Loss = 0.6931
Training epoch 500/1000  	 Loss = 0.6931
Training epoch 600/1000  	 Loss = 0.6931
Training epoch 700/1000  	 Loss = 0.6931
Training epoch 800/1000  	 Loss = 0.6931
Training epoch 900/1000  	 Loss = 0.6931

Weight matrix =  [[-0.00  0.00  0.00]]
Bias vector   =  [-0.00]

input x     target y   prediction Y
[0 0 0]            0              0
[0 0 1]            1              0
[0 1 0]            1              0
[0 1 1]            0              0
[1 0 0]            1              0
[1 0 1]            0              0
[1 1 0]            0              0
[1 1 1]            1              0

Loss = 0.6931      Error = 4/8 = 50.0%       Confusion matrix = [[4, 0], [4, 0]]
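
The plateau value 0.6931 can be understood as $\ln 2$. The weights and bias printed above are all approximately zero, so $Y \approx \text{sigmoid}(0) = 0.5$ for every input, and the binary cross-entropy of a constant prediction of 0.5 is $-\ln 0.5 = \ln 2$ regardless of the labels. The short check below assumes an exactly constant output of 0.5 rather than using the trained model.

```python
import math
import torch
import torch.nn as nn

ynd = torch.tensor([[0, 1, 1, 0, 1, 0, 0, 1]], dtype=torch.float32).T  # XOR targets
Ynd = torch.full_like(ynd, 0.5)      # output of sigmoid(0) for every input

loss = nn.BCELoss()(Ynd, ynd).item()
print(loss, math.log(2))             # both are approximately 0.6931
```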


Below, we demonstrate that the XOR function can be learned by a multilayer perceptron with the structure

$\qquad(x_0,x_1,x_2)
 \xrightarrow{\text{linear}} \xrightarrow{\text{sigmoid}} (u_0,u_1,u_2,u_3)
 \xrightarrow{\text{linear}} \xrightarrow{\text{sigmoid}} (Y)$

In [5]:
xnd      = torch.tensor([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]], dtype=torch.float32)
ynd      = torch.tensor([[0,1,1,0,1,0,0,1]], dtype=torch.float32).T
model    = nn.Sequential(
            nn.Linear(3, 4),   # linear layer with 3 inputs and 4 outputs
            nn.Sigmoid(),      # sigmoid function 
            nn.Linear(4, 1),   # linear layer with 4 inputs and 1 output
            nn.Sigmoid()       # sigmoid function whose output lies between 0 and 1
           )
lossFunc = nn.BCELoss()        # binary cross-entropy loss
train (xnd, ynd, model, lossFunc, epochs=1000, reportInterval=100, learningRate=0.1)
print ("Weights in first linear layer = \n", model[0].weight.detach().numpy())    # biases: model[0].bias.detach().numpy()
print ("Weights in second linear layer = \n", model[2].weight.detach().numpy())   # biases: model[2].bias.detach().numpy()
print ()
assess (xnd, ynd, model, lossFunc)

Training epoch 0/1000  	 Loss = 0.6945
Training epoch 100/1000  	 Loss = 0.0575
Training epoch 199/1000  	 Loss = 0.0100 < lossTarget

Weights in first linear layer = 
 [[-6.61 -9.27  7.88]
 [-8.41 -4.60 -4.59]
 [-6.84  8.05 -9.44]
 [-8.26 -8.85 -9.34]]
Weights in second linear layer = 
 [[-7.90  9.38 -7.88 -6.25]]

input x     target y   prediction Y
[0 0 0]            0              0
[0 0 1]            1              1
[0 1 0]            1              1
[0 1 1]            0              0
[1 0 0]            1              1
[1 0 1]            0              0
[1 1 0]            0              0
[1 1 1]            1              1

Loss = 0.0099      Error = 0/8 = 0.0%       Confusion matrix = [[4, 0], [0, 4]]


A zero error above means that the NN has reproduced the 3-way XOR truth table.  However, all 8 possible inputs were in the training set, so this result does not show that the network is useful for anything, or that it has learned any transferable knowledge.
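
One way to probe this point is to hide one of the 8 inputs during training and then ask the network about it. The sketch below is an assumption-laden experiment, not part of the original notebook: the seed, the choice of held-out input, and the inlined training loop are all arbitrary. The outcome depends on the random initialization, so no particular result is claimed here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)   # arbitrary seed, an assumption of this sketch

xnd = torch.tensor([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]], dtype=torch.float32)
ynd = torch.tensor([[0,1,1,0,1,0,0,1]], dtype=torch.float32).T

held_out = 7                                   # hide the input [1,1,1] during training
keep = [n for n in range(8) if n != held_out]
x_train, y_train = xnd[keep], ynd[keep]

model = nn.Sequential(nn.Linear(3, 4), nn.Sigmoid(), nn.Linear(4, 1), nn.Sigmoid())
lossFunc = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

model.train()
for t in range(1000):                          # train on the 7 remaining inputs only
    optimizer.zero_grad()
    loss = lossFunc(model(x_train), y_train)
    loss.backward()
    optimizer.step()

model.eval()
with torch.no_grad():                          # no gradients needed for evaluation
    Y_held = model(xnd[held_out:held_out+1]).item()
print("Training loss = {:.4f}".format(loss.item()))
print("Target for held-out input = {:.0f}, prediction = {:.2f}".format(ynd[held_out].item(), Y_held))
```

If the prediction for the hidden input does not match its target, the network has fitted the 7 training rows without inferring the parity rule behind them.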

---
## Appendix: Using pandas to format tables

In [12]:
#================ IF YOU WISH, YOU CAN FORMAT THE OUTPUT NICELY LIKE THIS
import pandas
Ynd = model(xnd).detach()
df = pandas.DataFrame(  np.hstack ([xnd, ynd, Ynd])  , columns=['x0','x1','x2','Target output y','Predicted output Y'])
df = df.style.format("{:.0f}").format("{:.2f}", subset='Predicted output Y')
df = df.set_properties (**{'font-size':'10pt'})
df = df.set_table_styles([dict(selector="th", props=[("font-size", '10pt')])])
df

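|   | x0 | x1 | x2 | Target output y | Predicted output Y |
|---|----|----|----|-----------------|--------------------|
| 0 | 0  | 0  | 0  | 0               | 0.01               |
| 1 | 0  | 0  | 1  | 1               | 0.98               |
| 2 | 0  | 1  | 0  | 1               | 1.0                |
| 3 | 0  | 1  | 1  | 0               | 0.0                |
| 4 | 1  | 0  | 0  | 1               | 1.0                |
| 5 | 1  | 0  | 1  | 0               | 0.02               |
| 6 | 1  | 1  | 0  | 0               | 0.01               |
| 7 | 1  | 1  | 1  | 1               | 0.98               |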


In [16]:
import pandas, IPython.display
#================ pretty (...) ================
# Pretty-prints a matrix via torch -> numpy -> pandas -> HTML -> display
def pretty (mat):
  if isinstance(mat, torch.Tensor):
    mat = mat.detach().cpu().numpy()   # detach from autograd and move to CPU before converting
  df = pandas.DataFrame(mat)
  df = df.style.set_properties (**{'font-size':'10pt'})
  df = df.set_table_styles([dict(selector="th", props=[("font-size", '10pt')])])
  IPython.display.display(IPython.display.HTML(df.to_html()))
pretty (xnd)

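|   | 0   | 1   | 2   |
|---|-----|-----|-----|
| 0 | 0.0 | 0.0 | 0.0 |
| 1 | 0.0 | 0.0 | 1.0 |
| 2 | 0.0 | 1.0 | 0.0 |
| 3 | 0.0 | 1.0 | 1.0 |
| 4 | 1.0 | 0.0 | 0.0 |
| 5 | 1.0 | 0.0 | 1.0 |
| 6 | 1.0 | 1.0 | 0.0 |
| 7 | 1.0 | 1.0 | 1.0 |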
