This notebook is my attempt to implement a plain neural network from scratch and to demonstrate that hidden layers (as used in deep neural networks) are essential for learning the non-linear functions found in most real-world data  
A neural network without a hidden layer cannot learn the non-linear XOR function, as shown below  
Reference: www.pyimagesearch.com

In [2]:
import numpy as np

In [3]:
class NeuralNetwork:
  def __init__(self, layers, alpha = 0.1):
    self.W = []
    self.layers = layers
    self.alpha = alpha

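    # Hidden-layer weights: +1 row for the bias input, +1 column for the extra
    # unit that feeds the next layer's bias row; scale by sqrt(fan-in)
    for i in np.arange(0, len(layers)-2):
      w = np.random.randn(layers[i]+1, layers[i+1]+1) / np.sqrt(layers[i])
      self.W.append(w)

    # Output-layer weights: bias row on the input side only, no extra column
    w = np.random.randn(layers[-2]+1, layers[-1]) / np.sqrt(layers[-2])
    self.W.append(w)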

  def __repr__(self):
    return "Neural Network : {}".format("-".join(str(l) for l in self.layers))

  def sigmoid(self, x):
    return 1 / (1 + np.exp(-x))

  def d_sigmoid(self, x):
    # Sigmoid derivative, assuming x is already a sigmoid output (an activation)
    return x * (1 - x)


  def fit(self, X, Y, epoch = 1000, displayUpdate = 1000):

    X = np.c_[X, np.ones((X.shape[0]))]

    for i in np.arange(0, epoch):
      for (x, y) in zip(X, Y):
        self.fit_partial(x, y)
      if i == 0 or (i+1) % displayUpdate == 0:
        loss = self.compute_loss(X, Y)
        print("Epoch {}, Loss {:.7f}".format(i+1, loss))


  def fit_partial(self, x, y):

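    # Forward pass: A[k] holds the activations entering layer k (A[0] is the input)
    A = [np.atleast_2d(x)]

    for layer_no in np.arange(0, len(self.W)):
      h = A[layer_no].dot(self.W[layer_no])
      a = self.sigmoid(h)
      A.append(a)

    # Backward pass: the output delta is the error times the sigmoid derivative
    error = A[-1] - y
    D = [error * self.d_sigmoid(A[-1])]

    # Chain rule: propagate the delta back through each earlier layer
    for layer_no in np.arange(len(A)-2, 0, -1):
      delta = (D[-1].dot(self.W[layer_no].T)) * self.d_sigmoid(A[layer_no])
      D.append(delta)

    # Deltas were collected output-first; reverse them to line up with self.W
    D = D[::-1]

    # Gradient descent step on every weight matrix
    for layer_no in np.arange(0, len(self.W)):
      self.W[layer_no] += -self.alpha * A[layer_no].T.dot(D[layer_no])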


  def predict(self, X, addBias = True):
    p = np.atleast_2d(X)

    if addBias:
      p = np.c_[p, np.ones((p.shape[0]))]
    
    for layer_no in np.arange(0, len(self.W)):
      p = self.sigmoid(np.dot(p, self.W[layer_no]))
    return p

  def compute_loss(self, X, Y):

    Y = np.atleast_2d(Y)
    preds = self.predict(X, addBias = False)
    loss = np.sum(np.square(preds - Y))/2
    return loss
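
A note on `d_sigmoid` in the class above: it computes `x * (1 - x)`, which is the sigmoid derivative only when `x` is already a sigmoid output. That holds in `fit_partial`, where it is always called on stored activations. The sketch below checks this convention against a finite-difference estimate. The standalone function names here are illustrative and are not part of the class.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid_from_activation(a):
    # Same formula as d_sigmoid in the class: expects a = sigmoid(x), not x itself
    return a * (1.0 - a)

x = np.linspace(-4.0, 4.0, 9)
eps = 1e-6

# Central finite difference as an independent estimate of the derivative
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)

# Analytic derivative, computed from the activations
analytic = d_sigmoid_from_activation(sigmoid(x))

# Largest disagreement between the two estimates (should be tiny)
print(np.max(np.abs(numeric - analytic)))
```

Passing raw pre-activations instead of activations would give a different (and wrong) result, which is why the backward pass reads from the list `A` rather than recomputing `h`.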

In [4]:
NeuralNetwork([2,2,1])

Neural Network : 2-2-1

Training the neural network on the XOR function using a single-hidden-layer [2-2-1] architecture  
Note that this network predicts the non-linear XOR function correctly

In [5]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork([2,2,1], alpha = 0.5)
nn.fit(X, Y, epoch = 20000)

for (x, y) in zip(X,Y):

  pred = nn.predict(x)[0][0]
  step = 1 if pred > 0.5 else 0
  print("Data {}, Ground Truth {} Prediction {} Step {}".format(x, y, pred, step))

Epoch 1, Loss 0.5166495
Epoch 1000, Loss 0.2523390
Epoch 2000, Loss 0.0071080
Epoch 3000, Loss 0.0029031
Epoch 4000, Loss 0.0018016
Epoch 5000, Loss 0.0013022
Epoch 6000, Loss 0.0010184
Epoch 7000, Loss 0.0008357
Epoch 8000, Loss 0.0007084
Epoch 9000, Loss 0.0006146
Epoch 10000, Loss 0.0005426
Epoch 11000, Loss 0.0004857
Epoch 12000, Loss 0.0004396
Epoch 13000, Loss 0.0004014
Epoch 14000, Loss 0.0003693
Epoch 15000, Loss 0.0003419
Epoch 16000, Loss 0.0003183
Epoch 17000, Loss 0.0002978
Epoch 18000, Loss 0.0002797
Epoch 19000, Loss 0.0002637
Epoch 20000, Loss 0.0002494
Data [0 0], Ground Truth [0] Prediction 0.01118402167124236 Step 0
Data [0 1], Ground Truth [1] Prediction 0.9946305738127579 Step 1
Data [1 0], Ground Truth [1] Prediction 0.9854332648170621 Step 1
Data [1 1], Ground Truth [0] Prediction 0.011520151727963304 Step 0
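
To see why two hidden units are enough, here is a minimal sketch of a 2-2-1 sigmoid network whose weights are picked by hand rather than learned. One hidden unit approximates OR, the other approximates NAND, and the output unit approximates AND of the two, which together give XOR. The weight values and the separate bias vectors are assumptions made for illustration; the trained network above stores its bias inside the weight matrices and will generally settle on different weights.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (not learned) weights; each column is one hidden unit
W_hidden = np.array([[20.0, -20.0],
                     [20.0, -20.0]])
b_hidden = np.array([-10.0, 30.0])   # unit 1 ~ OR, unit 2 ~ NAND

W_out = np.array([[20.0], [20.0]])
b_out = np.array([-30.0])            # output ~ AND of the two hidden units

hidden = sigmoid(X @ W_hidden + b_hidden)
output = sigmoid(hidden @ W_out + b_out)

# Threshold at 0.5, as in the prediction loop above
labels = (output > 0.5).astype(int).ravel()
print(labels)
```

The thresholded outputs are `[0 1 1 0]`, matching the XOR truth table. The hidden layer remaps the four inputs so that the output unit only has to solve a linearly separable problem.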


Training the neural network on the XOR function with no hidden layers (only an input and an output layer, like a simple perceptron)  
Here, as expected, the network fails to learn the non-linear XOR function

In [6]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])

nn_single_layer = NeuralNetwork([2, 1], alpha=0.5)
nn_single_layer.fit(X, Y, epoch=20000)

for (x, y) in zip(X,Y):

  pred = nn_single_layer.predict(x)[0][0]
  step = 1 if pred > 0.5 else 0
  print("Data {}, Ground Truth {} Prediction {} Step {}".format(x, y, pred, step))

Epoch 1, Loss 0.5241992
Epoch 1000, Loss 0.5007938
Epoch 2000, Loss 0.5007938
Epoch 3000, Loss 0.5007938
Epoch 4000, Loss 0.5007938
Epoch 5000, Loss 0.5007938
Epoch 6000, Loss 0.5007938
Epoch 7000, Loss 0.5007938
Epoch 8000, Loss 0.5007938
Epoch 9000, Loss 0.5007938
Epoch 10000, Loss 0.5007938
Epoch 11000, Loss 0.5007938
Epoch 12000, Loss 0.5007938
Epoch 13000, Loss 0.5007938
Epoch 14000, Loss 0.5007938
Epoch 15000, Loss 0.5007938
Epoch 16000, Loss 0.5007938
Epoch 17000, Loss 0.5007938
Epoch 18000, Loss 0.5007938
Epoch 19000, Loss 0.5007938
Epoch 20000, Loss 0.5007938
Data [0 0], Ground Truth [0] Prediction 0.51610600358629 Step 1
Data [0 1], Ground Truth [1] Prediction 0.5 Step 0
Data [1 0], Ground Truth [1] Prediction 0.4838939964137099 Step 0
Data [1 1], Ground Truth [0] Prediction 0.46782138179306076 Step 0
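
The failure is structural, not a training problem. A sigmoid output exceeds 0.5 exactly when its pre-activation is positive, so the thresholded no-hidden-layer network is a linear classifier `w1*x1 + w2*x2 + b > 0`. XOR would need `b <= 0`, `w1 + b > 0`, `w2 + b > 0` and `w1 + w2 + b <= 0`. Adding the two middle inequalities gives `w1 + w2 + 2b > 0`, so `w1 + w2 + b > -b >= 0`, which contradicts the last one. The sketch below illustrates this with a brute-force search over a small weight grid. The grid and the helper name `separable_on_grid` are assumptions for illustration, and a grid search is only a demonstration, not a proof.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

def separable_on_grid(X, y, grid):
    """Return True if some (w1, w2, b) on the grid classifies every point correctly."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        preds = (X @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(preds, y):
            return True
    return False

# Candidate values from -2 to 2 in steps of 0.25
grid = np.linspace(-2.0, 2.0, 17)

results = {name: separable_on_grid(X, y, grid) for name, y in targets.items()}
print(results)
```

The search finds a separating line for AND (for example `w1 = w2 = 1`, `b = -1.5`) but none for XOR, consistent with the loss above stalling near 0.5.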


Training and testing the neural network on scikit-learn's hand-written digits dataset (1,797 8x8 images; a small MNIST-like set, not MNIST itself)

In [7]:
from sklearn.preprocessing import LabelBinarizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn import datasets

In [8]:
digits = datasets.load_digits()
data = digits.data.astype("float")
data = (data - data.min())/(data.max()-data.min())
print("No of Examples {}, Dimension {}.".format(data.shape[0],data.shape[1]))

No of Examples 1797, Dimension 64.


In [18]:
train_X, test_X, train_Y, test_Y = train_test_split(data, digits.target, test_size = 0.25)
print(train_X.shape, train_Y.shape)

# Fit the encoder on the training labels only, then reuse it for the test labels
lb = LabelBinarizer()
train_Y = lb.fit_transform(train_Y)
test_Y = lb.transform(test_Y)
print(train_X.shape, train_Y.shape)

(1347, 64) (1347,)
(1347, 64) (1347, 10)
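
The change in label shape from `(1347,)` to `(1347, 10)` comes from one-hot encoding: each integer digit becomes a row of ten values with a single 1, so the network's ten sigmoid outputs can each be compared against one target. Below is a NumPy-only sketch of the same idea, used here as a stand-in for `LabelBinarizer` rather than a description of its internals. The sample labels are made up for illustration.

```python
import numpy as np

labels = np.array([3, 0, 9, 3])   # made-up digit labels for illustration
n_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_classes, dtype=int)[labels]

# argmax along each row recovers the original integer label
decoded = one_hot.argmax(axis=1)

print(one_hot.shape, decoded)
```

The same `argmax(axis = 1)` step is what later turns the network's ten output activations, and the one-hot test labels, back into digit classes for the classification report.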


In [19]:
print("Training neural network...")
nn = NeuralNetwork([train_X.shape[1],32,16,10])
print("Layer info of {}".format(nn))
nn.fit(train_X, train_Y, epoch=1000, displayUpdate = 100)

Training neural network...
Layer info of Neural Network : 64-32-16-10
Epoch 1, Loss 605.4132198
Epoch 100, Loss 6.6015819
Epoch 200, Loss 2.9785251
Epoch 300, Loss 2.5117352
Epoch 400, Loss 2.3399467
Epoch 500, Loss 2.2514718
Epoch 600, Loss 2.1952869
Epoch 700, Loss 1.7046182
Epoch 800, Loss 1.6584573
Epoch 900, Loss 1.6319657
Epoch 1000, Loss 1.6126860


In [20]:
print("Predicting the trained network...")
preds = nn.predict(test_X)
preds = preds.argmax(axis = 1)
print(classification_report(test_Y.argmax(axis = 1), preds))

Predicting the trained network...
              precision    recall  f1-score   support

           0       1.00      0.98      0.99        45
           1       1.00      1.00      1.00        52
           2       1.00      0.98      0.99        54
           3       1.00      0.98      0.99        55
           4       0.98      0.96      0.97        49
           5       0.97      0.97      0.97        39
           6       1.00      1.00      1.00        43
           7       0.98      0.98      0.98        45
           8       1.00      1.00      1.00        35
           9       0.89      1.00      0.94        33

    accuracy                           0.98       450
   macro avg       0.98      0.99      0.98       450
weighted avg       0.99      0.98      0.98       450

