# Perceptrons and XOR

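Perceptrons, part of work in cybernetics, showed early promise, but the single-layer perceptron and its learning algorithm could not learn the XOR function. This is because XOR is not *linearly separable*: no straight line in the input plane separates the inputs that map to 1, (0,1) and (1,0), from those that map to 0, (0,0) and (1,1).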
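
The claim can be checked directly. Suppose a single threshold unit outputs 1 exactly when $w_1 x_1 + w_2 x_2 + b > 0$. Requiring it to reproduce XOR on the four inputs gives four conditions:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0
\end{aligned}
$$

Adding the middle two gives $w_1 + w_2 + 2b > 0$, i.e. $w_1 + w_2 + b > -b \ge 0$, which contradicts the last condition. No choice of $w_1, w_2, b$ works.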

We demonstrate these ideas with linear and non-linear models built in TensorFlow/Keras rather than with the original perceptron formulation.
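
For comparison, here is a minimal NumPy sketch of the classic perceptron learning rule: a step-activation threshold unit whose weights are nudged toward each misclassified example. The function name `train_perceptron` and its parameters are illustrative, not from any library. An epoch with zero errors means one fixed set of weights classified every example correctly, which happens for AND but can never happen for XOR.

```python
import numpy as np

def train_perceptron(X, y, n_epochs=20, lr=1.0):
    """Perceptron learning rule with a step activation (illustrative sketch).

    Returns the weights, the bias and the number of misclassified
    examples seen during the final epoch.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    errors = 0
    for _ in range(n_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            pred = 1 if x_i @ w + b > 0 else 0
            update = lr * (y_i - pred)  # zero when the prediction is correct
            w += update * x_i
            b += update
            errors += int(update != 0)
    return w, b, errors

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
_, _, and_errors = train_perceptron(X, np.array([0, 0, 0, 1]))
_, _, xor_errors = train_perceptron(X, np.array([0, 1, 1, 0]))
print(and_errors)  # 0: AND is learned
print(xor_errors)  # at least 1: XOR never is
```

Raising `n_epochs` does not change the XOR result; the weights keep cycling without settling.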

```python
import warnings

import numpy as np
import tensorflow as tf
# Use the Keras bundled with TensorFlow; keras.layers.core no longer exists in recent Keras.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

warnings.filterwarnings("ignore", category=DeprecationWarning)
```

The following class implements a variety of simple multilayer perceptron (MLP) models for two-input logic gates.

```python
class LogicGateModel:
    '''Models a two-input logic gate function, e.g., AND, OR, XOR, NAND, NOR, XNOR.

    hidden_layers is a list of layer specifications of the form
    (number_of_nodes, activation_function); output_layer is a single
    layer specification of the same form.
    '''

    def __init__(self,
                 optimizer=None,
                 hidden_layers=(),
                 output_layer=(1, tf.keras.activations.linear),
                 loss_function=tf.keras.losses.mse
                 ):
        # A default argument such as SGD(...) is evaluated only once, when the
        # function is defined, so every model would share one optimizer object.
        if optimizer is None:
            optimizer = SGD(learning_rate=0.1)
        self.model = Sequential()
        self.model.add(tf.keras.Input(shape=(2,)))  # two binary inputs
        for n_nodes, activation in hidden_layers:
            self.model.add(Dense(n_nodes, activation=activation))
        self.model.add(Dense(output_layer[0], activation=output_layer[1]))
        self.model.compile(loss=loss_function,
                           optimizer=optimizer,
                           metrics=[tf.keras.metrics.binary_accuracy]
                           )

    def train(self, X, y, n_epochs=100):
        # batch_size=1: one weight update per training example
        self.model.fit(X, y, batch_size=1, epochs=n_epochs, verbose=0)

    def predict(self, X):
        return self.model.predict(X, verbose=0)
```
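
Each `(number_of_nodes, activation)` entry becomes a `Dense` layer, and a `Dense` layer with a bias has `n_in * n_out + n_out` trainable parameters. The small helper below (hypothetical, not part of Keras) makes the size of each model in this notebook explicit; for a built Keras model, `model.count_params()` should report the same totals.

```python
def count_dense_params(layer_sizes):
    """Count weights plus biases in a stack of fully connected layers.

    layer_sizes lists the width of every layer, starting with the input,
    e.g. [2, 8, 1] for two inputs, eight hidden units and one output.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_dense_params([2, 1]))        # single linear unit: 3
print(count_dense_params([2, 2, 1]))     # two hidden units: 9
print(count_dense_params([2, 4, 2, 1]))  # two hidden layers: 25
print(count_dense_params([2, 8, 1]))     # eight hidden units: 33
```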

We now define the inputs and the expected outputs for the AND, OR and XOR functions.

```python
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y_and = np.array([[0],[0],[0],[1]])
y_or = np.array([[0],[1],[1],[1]])
y_xor = np.array([[0],[1],[1],[0]])

print('Correct answers for the three logical functions:')
print('\t AND\tOR\tXOR')
for i in range(4):
    print(f'{X[i]} -> ',
          f'{X[i,0] & X[i,1]}\t',
          f'{X[i,0] | X[i,1]}\t',
          f'{X[i,0] ^ X[i,1]}'
          )
```

```
Correct answers for the three logical functions:
	 AND	OR	XOR
[0 0] ->  0	 0	 0
[0 1] ->  0	 1	 1
[1 0] ->  0	 1	 1
[1 1] ->  1	 1	 0
```
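
Although XOR is not linearly separable, it can be built from gates that are: XOR(a, b) = AND(OR(a, b), NAND(a, b)). The sketch below wires three threshold units into a two-layer network. The weights are chosen by hand for illustration, not learned; the point is that one extra layer of non-linear units is enough.

```python
import numpy as np

def step(z):
    """Threshold activation: 1 where z > 0, else 0."""
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: column 0 computes OR, column 1 computes NAND.
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output unit computes AND of the two hidden units.
w_out = np.array([1.0, 1.0])
b_out = -1.5

h = step(X @ W_hidden + b_hidden)
y = step(h @ w_out + b_out)
print(y)  # [0 1 1 0]
```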


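The AND function can be learned by a single-layer network with a linear activation function. The outputs are not exactly 0 or 1, but the output for [1,1] is clearly the highest, and thresholding at 0.5 gives the correct answer for all four inputs.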
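
Training a single linear unit with mean squared error drifts toward the ordinary least-squares fit, so the closed-form solution shows roughly what the network should converge to, and why its outputs are not exactly 0 and 1. A NumPy sketch using `np.linalg.lstsq`; the helper name `best_linear_fit` is ours:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1], dtype=float)

def best_linear_fit(X, y):
    """Least-squares fit of y ~ X @ w + b; returns the fitted outputs."""
    X1 = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return X1 @ coef

fit = best_linear_fit(X, y_and)
print(np.round(fit, 2))         # [-0.25  0.25  0.25  0.75]
print((fit > 0.5).astype(int))  # [0 0 0 1]
```

With the OR targets the same fit gives 0.25, 0.75, 0.75 and 1.25, close to the outputs of the OR model below.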

```python
model = LogicGateModel(output_layer=(1, tf.keras.activations.linear))
model.train(X, y_and)
print(model.predict(X))
```

```
[[-0.21683198]
 [ 0.3096367 ]
 [ 0.3371088 ]
 [ 0.86357754]]
```


The same holds for the OR function: the output for [0,0] is clearly lower than the others, and thresholding at 0.5 again gives the correct answers.

```python
model = LogicGateModel(output_layer=(1, tf.keras.activations.linear))
model.train(X, y_or)
print(model.predict(X))
```

```
[[0.28801927]
 [0.7916323 ]
 [0.7597754 ]
 [1.2633884 ]]
```


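Unfortunately, a linear model cannot learn the XOR function, regardless of how many layers we add, because a stack of linear layers still computes a linear function of its inputs.
All of the outputs below stay close to 0.5, so none of the models converges on the correct solution.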
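
Why extra linear layers do not help: in the row-vector form Keras uses, two stacked linear (affine) layers compute $(x W_1 + b_1) W_2 + b_2 = x (W_1 W_2) + (b_1 W_2 + b_2)$, which is again a single affine layer. The sketch below checks this numerically with random weights, then shows that the best affine fit to XOR predicts 0.5 for every input.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Two linear layers (2 -> 4 -> 1), like a Dense stack with linear activations.
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
two_layer = (X @ W1 + b1) @ W2 + b2

# The equivalent single layer.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = X @ W + b
print(np.allclose(two_layer, one_layer))  # True

# Best affine fit to XOR: append a bias column and solve least squares.
X1 = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(X1, np.array([0, 1, 1, 0], dtype=float), rcond=None)
print(np.round(X1 @ coef, 3))  # [0.5 0.5 0.5 0.5]
```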

```python
# One layer
model = LogicGateModel(output_layer=(1, tf.keras.activations.linear))
model.train(X, y_xor)
print(model.predict(X))

# Two layers
model = LogicGateModel(hidden_layers=[(2, tf.keras.activations.linear)])
model.train(X, y_xor)
print(model.predict(X))

# More layers
model = LogicGateModel(hidden_layers=[
    (4, tf.keras.activations.linear),
    (2, tf.keras.activations.linear)
])
model.train(X, y_xor)
print(model.predict(X))
```

```
[[0.5376339 ]
 [0.53214777]
 [0.4777659 ]
 [0.47227973]]
[[0.5545918 ]
 [0.5560949 ]
 [0.55774784]
 [0.559251  ]]
[[0.5256991 ]
 [0.5243015 ]
 [0.52510256]
 [0.52370495]]
```


A non-linear regression model with a two-layer network can learn the XOR function.
Here, the hidden layer uses the non-linear tanh() activation.
This model, suggested by Goodfellow et al., does not always find the solution;
I ran this code a few times until it found one.
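
For reference, Goodfellow et al. work through this XOR example with ReLU rather than tanh hidden units, and give an exact solution with two hidden units and a linear output: $W = \begin{bmatrix}1 & 1\\ 1 & 1\end{bmatrix}$, $c = (0, -1)$, $w = (1, -2)$, $b = 0$, for $f(x) = \max(0, xW + c)\,w + b$. A short NumPy check that these weights reproduce XOR:

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

W = np.array([[1.0, 1.0],
              [1.0, 1.0]])  # input -> hidden weights
c = np.array([0.0, -1.0])   # hidden biases
w = np.array([1.0, -2.0])   # hidden -> output weights
b = 0.0                     # output bias

h = np.maximum(0.0, X @ W + c)  # ReLU hidden layer
print(h)          # hidden points: [0, 0], [1, 0], [1, 0], [2, 1]
print(h @ w + b)  # [0. 1. 1. 0.]
```

The hidden layer maps (0,1) and (1,0) to the same point, after which the classes are linearly separable and a linear output unit suffices.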

```python
model = LogicGateModel(
    hidden_layers=[(2, tf.nn.tanh)],
    output_layer=(1, tf.keras.activations.linear)
)
model.train(X, y_xor, n_epochs=500)
print(model.predict(X))
```

```
[[0.       ]
 [0.9999998]
 [0.9999998]
 [0.       ]]
```


This slightly larger model, suggested by S. Park, uses eight tanh hidden units, a sigmoid output and binary cross-entropy loss, and finds the solution more reliably.
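
To make the pieces of this model concrete, here is a minimal NumPy sketch of a network with the same shape (8 tanh hidden units, a sigmoid output, binary cross-entropy loss), trained by plain full-batch gradient descent rather than Keras's per-example SGD. The learning rate, initialisation scale and step count are assumptions chosen for illustration. One convenient property of pairing a sigmoid output with cross-entropy is that the gradient of the loss with respect to the output unit's pre-activation simplifies to p - y.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-12):
    """Mean binary cross-entropy, with clipping to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_xor_mlp(X, y, n_hidden=8, lr=0.5, n_steps=5000, seed=0):
    """Full-batch gradient descent on a tanh -> sigmoid network (sketch)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(size=(n_hidden, 1))
    b2 = np.zeros(1)
    losses = []
    for _ in range(n_steps):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        p = sigmoid(h @ W2 + b2)
        losses.append(bce(p, y))
        # Backward pass: for sigmoid + BCE, dLoss/dlogit = (p - y) / n
        d_logit = (p - y) / len(X)
        dW2 = h.T @ d_logit
        db2 = d_logit.sum(axis=0)
        d_h = (d_logit @ W2.T) * (1 - h ** 2)  # tanh'(z) = 1 - tanh(z)^2
        dW1 = X.T @ d_h
        db1 = d_h.sum(axis=0)
        # Gradient descent update (in place)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2

    def predict(X_new):
        return sigmoid(np.tanh(X_new @ W1 + b1) @ W2 + b2)

    return predict, losses

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([[0], [1], [1], [0]], dtype=float)
predict, losses = train_xor_mlp(X, y_xor)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
print(np.round(predict(X), 3))
```

Like the Keras models above, whether a given run reaches the exact solution can depend on the random initialisation (`seed`).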

```python
model = LogicGateModel(
    hidden_layers=[(8, tf.nn.tanh)],
    output_layer=(1, tf.nn.sigmoid),
    loss_function=tf.keras.losses.binary_crossentropy
)
model.train(X, y_xor, n_epochs=1000)
print(model.predict(X))
```

```
[[0.0014103 ]
 [0.9952068 ]
 [0.99467033]
 [0.00644412]]
```
