This notebook is my attempt to implement a plain neural network from scratch and to show that hidden layers (as in deep neural networks) are essential for learning the non-linear functions found in most real-world data.  
A neural network without a hidden layer cannot learn the non-linear XOR function, as demonstrated below.  
Reference: www.pyimagesearch.com

In [1]:
import numpy as np

In [33]:
class NeuralNetwork:
  def __init__(self, layers, alpha = 0.1):
    self.W = []
    self.layers = layers
    self.alpha = alpha

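    # Hidden-layer weights: the +1 on each dimension makes room for the bias
    # trick, and dividing by sqrt(fan-in) keeps the initial weights small
    for i in np.arange(0, len(layers)-2):
      w = np.random.randn(layers[i]+1, layers[i+1]+1) / np.sqrt(layers[i])
      self.W.append(w)

    # Output-layer weights: the input side keeps its extra bias row,
    # but the output side gets no extra column
    w = np.random.randn(layers[-2]+1, layers[-1]) / np.sqrt(layers[-2])
    self.W.append(w)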

  def __repr__(self):
    return "Neural Network : {}".format("-".join(str(l) for l in self.layers))

  def sigmoid(self, x):
    return 1 / (1 + np.exp(-x))

  def d_sigmoid(self, x):
    # Derivative of the sigmoid, assuming x is already a sigmoid output
    return x * (1 - x)


  def fit(self, X, Y, epoch = 1000, displayUpdate = 1000):

    # Bias trick: append a column of ones so the bias is learned as a weight
    X = np.c_[X, np.ones((X.shape[0]))]

    for i in np.arange(0, epoch):
      for (x, y) in zip(X, Y):
        self.fit_partial(x, y)
      if i == 0 or (i+1) % displayUpdate == 0:
        loss = self.compute_loss(X, Y)
        print("Epoch {}, Loss {:.7f}".format(i+1, loss))


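  def fit_partial(self, x, y):

    # Forward pass: A[k] holds the activations of layer k (A[0] is the input)
    A = [np.atleast_2d(x)]

    for layer_no in np.arange(0, len(self.W)):

      h = A[layer_no].dot(self.W[layer_no])
      a = self.sigmoid(h)
      A.append(a)

    # Backward pass: start from the output error scaled by the sigmoid derivative
    error = A[-1] - y
    D = [error * self.d_sigmoid(A[-1])]

    # Propagate the deltas back through the hidden layers (chain rule)
    for layer_no in np.arange(len(A)-2, 0, -1):
      delta = (D[-1].dot(self.W[layer_no].T)) * self.d_sigmoid(A[layer_no])
      D.append(delta)

    # Deltas were collected output-first, so reverse them to match layer order
    D = D[::-1]

    # Gradient descent step on every weight matrix
    for layer_no in np.arange(0, len(self.W)):
      self.W[layer_no] += -self.alpha * A[layer_no].T.dot(D[layer_no])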


  def predict(self, X, addBias = True):
    p = np.atleast_2d(X)

    if addBias:
      p = np.c_[p, np.ones((p.shape[0]))]
    
    for layer_no in np.arange(0, len(self.W)):
      p = self.sigmoid(np.dot(p, self.W[layer_no]))
    return p

  def compute_loss(self, X, Y):

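    Y = np.atleast_2d(Y)
    # X already carries the bias column added in fit, so do not add it again
    preds = self.predict(X, addBias = False)
    # Half the sum of squared errors over all samples
    loss = np.sum(np.square(preds - Y))/2
    return loss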

In [34]:
NeuralNetwork([2,2,1])

Neural Network : 2-2-1

Training the neural network on the XOR function using a single hidden layer (2-2-1 architecture).  
Note that the network predicts the non-linear XOR function correctly.

In [35]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])
nn = NeuralNetwork([2,2,1], alpha = 0.5)
nn.fit(X, Y, epoch = 20000)

for (x, y) in zip(X,Y):

  pred = nn.predict(x)[0][0]
  step = 1 if pred > 0.5 else 0
  print("Data {}, Ground Truth {} Prediction {} Step {}".format(x, y, pred, step))

Epoch 1, Loss 0.5059337
Epoch 1000, Loss 0.3094045
Epoch 2000, Loss 0.0083001
Epoch 3000, Loss 0.0034353
Epoch 4000, Loss 0.0020839
Epoch 5000, Loss 0.0014714
Epoch 6000, Loss 0.0011281
Epoch 7000, Loss 0.0009108
Epoch 8000, Loss 0.0007618
Epoch 9000, Loss 0.0006538
Epoch 10000, Loss 0.0005720
Epoch 11000, Loss 0.0005082
Epoch 12000, Loss 0.0004569
Epoch 13000, Loss 0.0004150
Epoch 14000, Loss 0.0003800
Epoch 15000, Loss 0.0003504
Epoch 16000, Loss 0.0003250
Epoch 17000, Loss 0.0003030
Epoch 18000, Loss 0.0002838
Epoch 19000, Loss 0.0002669
Epoch 20000, Loss 0.0002519
Data [0 0], Ground Truth [0] Prediction 0.008439173929876975 Step 0
Data [0 1], Ground Truth [1] Prediction 0.9865558483532232 Step 1
Data [1 0], Ground Truth [1] Prediction 0.9901585484129376 Step 1
Data [1 1], Ground Truth [0] Prediction 0.012445631501320642 Step 0
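
To see why one hidden layer is enough, here is a minimal sketch of a 2-2-1 sigmoid network whose weights are picked by hand rather than learned. The weight values, the `add_bias` helper, and the choice of OR / NAND / AND units are assumptions made for illustration; they are not the weights the trained network above ends up with. Unlike the `NeuralNetwork` class, this sketch appends a constant bias of 1 to the hidden layer.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def add_bias(M):
    # Hypothetical helper: append a column of ones (bias trick)
    return np.c_[M, np.ones(M.shape[0])]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hand-picked (not learned) weights; the last row of each matrix is the bias.
# Hidden unit 1 approximates OR, hidden unit 2 approximates NAND.
W_hidden = np.array([[ 20.0, -20.0],
                     [ 20.0, -20.0],
                     [-10.0,  30.0]])

# The output unit approximates AND of the two hidden units.
W_out = np.array([[ 20.0],
                  [ 20.0],
                  [-30.0]])

hidden = sigmoid(add_bias(X).dot(W_hidden))
output = sigmoid(add_bias(hidden).dot(W_out))

# Threshold at 0.5, as in the prediction loop above
steps = (output > 0.5).astype(int).ravel()
print(steps)  # [0 1 1 0]
```

XOR is (x1 OR x2) AND NOT (x1 AND x2), so the hidden layer only has to build two linearly separable features, and the output unit can combine them.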


Training the neural network on the XOR function with no hidden layers (only the input and output layers, like a single perceptron unit).  
Here, the neural network fails to learn the non-linear XOR function, as expected: the loss plateaus at about 0.5 and the predictions stay close to 0.5.

In [37]:
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([[0], [1], [1], [0]])

nn_single_layer = NeuralNetwork([2, 1], alpha=0.5)
nn_single_layer.fit(X, Y, epoch=20000)

for (x, y) in zip(X,Y):

  pred = nn_single_layer.predict(x)[0][0]
  step = 1 if pred > 0.5 else 0
  print("Data {}, Ground Truth {} Prediction {} Step {}".format(x, y, pred, step))

Epoch 1, Loss 0.6289037
Epoch 1000, Loss 0.5007938
Epoch 2000, Loss 0.5007938
Epoch 3000, Loss 0.5007938
Epoch 4000, Loss 0.5007938
Epoch 5000, Loss 0.5007938
Epoch 6000, Loss 0.5007938
Epoch 7000, Loss 0.5007938
Epoch 8000, Loss 0.5007938
Epoch 9000, Loss 0.5007938
Epoch 10000, Loss 0.5007938
Epoch 11000, Loss 0.5007938
Epoch 12000, Loss 0.5007938
Epoch 13000, Loss 0.5007938
Epoch 14000, Loss 0.5007938
Epoch 15000, Loss 0.5007938
Epoch 16000, Loss 0.5007938
Epoch 17000, Loss 0.5007938
Epoch 18000, Loss 0.5007938
Epoch 19000, Loss 0.5007938
Epoch 20000, Loss 0.5007938
Data [0 0], Ground Truth [0] Prediction 0.5161060035862902 Step 1
Data [0 1], Ground Truth [1] Prediction 0.5000000000000001 Step 1
Data [1 0], Ground Truth [1] Prediction 0.4838939964137099 Step 0
Data [1 1], Ground Truth [0] Prediction 0.4678213817930606 Step 0
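
The failure is not a training problem. A sigmoid output exceeds 0.5 exactly when its pre-activation `w1*x1 + w2*x2 + b` is positive, so a network with no hidden layer can only draw a single straight decision line. For XOR we would need `b <= 0`, `w1 + b > 0`, `w2 + b > 0` and `w1 + w2 + b <= 0`. Adding the two middle inequalities gives `w1 + w2 + 2b > 0`, and since `-b >= 0` this means `w1 + w2 + b > 0`, a contradiction. The sketch below checks this empirically with a brute-force search; the grid range and resolution are arbitrary assumptions.

```python
import itertools
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Y = np.array([0, 1, 1, 0])

# Arbitrary search grid (assumption): 21 values from -5 to 5 in steps of 0.5
grid = np.linspace(-5, 5, 21)

best = 0
for w1, w2, b in itertools.product(grid, repeat=3):
    # Predict 1 when the pre-activation is positive (sigmoid output > 0.5)
    preds = (X.dot(np.array([w1, w2])) + b > 0).astype(int)
    best = max(best, int((preds == Y).sum()))

print(best)  # 3
```

No single linear unit on the grid classifies more than three of the four XOR points correctly, which matches the argument above.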
