# Perceptrons and XOR

Perceptrons, part of early work in cybernetics, showed promise, but the perceptron learning algorithm could not learn the XOR function. This is because XOR is not *linearly separable*: no single straight line in the input plane separates the inputs that map to 1 from those that map to 0.
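
A short argument makes this concrete. Suppose a single threshold unit outputs 1 exactly when $w_1 x_1 + w_2 x_2 + b > 0$ (the symbols $w_1, w_2, b$ are introduced here only for illustration). Matching XOR on all four inputs would require:

```latex
\begin{aligned}
(0,0) \mapsto 0 &: \quad b \le 0 \\
(1,1) \mapsto 0 &: \quad w_1 + w_2 + b \le 0 \\
(0,1) \mapsto 1 &: \quad w_2 + b > 0 \\
(1,0) \mapsto 1 &: \quad w_1 + b > 0
\end{aligned}
```

Adding the last two inequalities gives $w_1 + w_2 + 2b > 0$, while adding the first two gives $w_1 + w_2 + 2b \le 0$, a contradiction. By contrast, AND is separated by $x_1 + x_2 - 1.5 > 0$ and OR by $x_1 + x_2 - 0.5 > 0$.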

We demonstrate these ideas with linear and non-linear models built in TensorFlow/Keras, rather than with the original perceptron formulation.

In [1]:
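import numpy as np
import tensorflow as tf
# Use the Keras bundled with TensorFlow throughout, so that the tf.keras
# activations, losses and metrics used below match the model classes.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD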


The following class builds a variety of simple multilayer perceptron (MLP) models for logic gates.

In [2]:
class LogicGateModel:
    '''This class models a two-input logic gate function, such as AND, OR, XOR, NAND, NOR, XNOR.
    hidden_layers gives a list of layer specifications of the form (#ofNodes, activationFunction);
    output_layer is similar but holds a single layer specification.
    '''

    def __init__(self,
                 optimizer=None,
                 hidden_layers=(),
                 output_layer=(1, tf.keras.activations.linear),
                 loss_function=tf.keras.losses.mse
                 ):
        # Build a fresh optimizer for each model. A default of SGD(...) in the
        # signature would be created once and shared by every model.
        if optimizer is None:
            optimizer = SGD(learning_rate=0.1)
        self.model = Sequential()
        # Only the first layer needs the input size (two inputs).
        layers = list(hidden_layers) + [output_layer]
        for i, (n_nodes, activation) in enumerate(layers):
            if i == 0:
                self.model.add(Dense(n_nodes, input_dim=2, activation=activation))
            else:
                self.model.add(Dense(n_nodes, activation=activation))
        self.model.compile(loss=loss_function,
                           optimizer=optimizer,
                           metrics=[tf.keras.metrics.binary_accuracy]
                           )

    def train(self, X, y, n_epochs=100):
        # Per-sample updates (batch_size=1), with the training log suppressed.
        self.model.fit(X, y, batch_size=1, epochs=n_epochs, verbose=0)

    def predict(self, X):
        return self.model.predict(X, verbose=0)

We now define the truth tables for the AND, OR, and XOR functions.

In [4]:
X = np.array([[0,0],[0,1],[1,0],[1,1]])
y_and = np.array([[0],[0],[0],[1]])
y_or = np.array([[0],[1],[1],[1]])
y_xor = np.array([[0],[1],[1],[0]])

# Print the correct answers.
print('Correct answers')
print("         AND OR XOR")
for i in range(4):
    print('{} ->  '.format(X[i]),
          '{}  '.format(X[i,0] & X[i,1]),
          '{}   '.format(X[i,0] | X[i,1]),
          '{}'.format(X[i,0] ^ X[i,1])
          )

Correct answers
         AND OR XOR
[0 0] ->   0   0    0
[0 1] ->   0   1    1
[1 0] ->   0   1    1
[1 1] ->   1   1    0


The AND function can be learned by a one-layer network with a linear activation function. Thresholding the outputs at 0.5 gives the correct answers: the output for [1, 1] is clearly the highest.

In [7]:
model = LogicGateModel(output_layer=(1, tf.keras.activations.linear))
model.train(X, y_and)
print(model.predict(X))

[[-0.23015495]
 [ 0.304143  ]
 [ 0.33643454]
 [ 0.8707325 ]]
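
Because a single linear unit trained with mean squared error is ordinary linear regression, its target can also be computed in closed form. The sketch below (a NumPy illustration, not part of the Keras model) fits AND by least squares with a bias column, giving $w_1 = w_2 = 0.5$, $b = -0.25$ and outputs $-0.25, 0.25, 0.25, 0.75$, roughly the values the trained model produced above.

```python
import numpy as np

# Inputs and AND targets, as in the notebook.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_and = np.array([0, 0, 0, 1], dtype=float)

# Append a column of ones so the bias is fitted along with the weights.
A = np.hstack([X, np.ones((4, 1))])

# Closed-form least-squares solution: the minimiser that MSE training
# of a single linear unit approaches.
coef, *_ = np.linalg.lstsq(A, y_and, rcond=None)
pred = A @ coef

print(np.round(coef, 4))          # weights w1, w2 and bias b
print(np.round(pred, 4))          # fitted outputs for the four inputs
print((pred > 0.5).astype(int))   # thresholded at 0.5
```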


The same holds for the OR function. The output for [0, 0] is clearly lower than the others and is the only one below 0.5.

In [8]:
model = LogicGateModel(output_layer=(1, tf.keras.activations.linear))
model.train(X, y_or)
print(model.predict(X))

[[0.31762263]
 [0.81649065]
 [0.81242156]
 [1.3112895 ]]
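
The models are compiled with `tf.keras.metrics.binary_accuracy`, which counts an output as 1 when it exceeds a threshold of 0.5. A minimal NumPy sketch of that rule, applied to the OR outputs copied from the cell above, shows why these real-valued outputs count as a correct answer; the helper `binary_accuracy_np` is a hypothetical name used only here.

```python
import numpy as np

# OR outputs copied from the cell above, and the OR targets.
pred_or = np.array([[0.31762263], [0.81649065], [0.81242156], [1.3112895]])
y_or = np.array([[0], [1], [1], [1]])

def binary_accuracy_np(y_true, y_pred, threshold=0.5):
    """Hypothetical helper: fraction of outputs on the correct side of `threshold`.

    Follows the same thresholding rule as tf.keras.metrics.binary_accuracy
    (an output counts as 1 only if it is strictly greater than 0.5).
    """
    return np.mean((y_pred > threshold).astype(int) == y_true)

print(binary_accuracy_np(y_or, pred_or))
```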


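Unfortunately, a linear model cannot learn the XOR function, regardless of how many layers we add: a stack of linear layers is itself a linear function of the input, and XOR is not linearly separable. See how the outputs fail to separate the two classes; the deeper models' outputs cluster around 0.5.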

In [9]:
# One layer
model = LogicGateModel(output_layer=(1, tf.keras.activations.linear))
model.train(X, y_xor)
print(model.predict(X))

# Two layers
model = LogicGateModel(hidden_layers=[(2, tf.keras.activations.linear)])
model.train(X, y_xor)
print(model.predict(X))

# More layers
model = LogicGateModel(hidden_layers=[
    (4, tf.keras.activations.linear),
    (2, tf.keras.activations.linear)
])
model.train(X, y_xor)
print(model.predict(X))

[[0.47008473]
 [0.356476  ]
 [0.40504622]
 [0.2914375 ]]
[[0.49500963]
 [0.49542087]
 [0.47822714]
 [0.47863835]]
[[0.5077244 ]
 [0.49466753]
 [0.51906437]
 [0.5060075 ]]
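
Both halves of that claim can be checked directly in NumPy. The sketch below uses arbitrary random weights (an illustrative assumption) for two stacked activation-free layers and shows they equal one linear layer with $W = W_1 W_2$ and $b = b_1 W_2 + b_2$. It then computes the best linear least-squares fit to XOR, which predicts 0.5 for every input, matching the outputs above that hover around 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0], dtype=float)

# Two stacked linear (no activation) layers, 2 -> 4 -> 1, random weights.
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
stacked = (X @ W1 + b1) @ W2 + b2

# The same map written as a single linear layer.
W, b = W1 @ W2, b1 @ W2 + b2
single = X @ W + b
print(np.allclose(stacked, single))  # depth adds no expressive power

# Best possible linear fit to XOR: least squares with a bias column.
A = np.hstack([X, np.ones((4, 1))])
coef, *_ = np.linalg.lstsq(A, y_xor, rcond=None)
print(np.round(A @ coef, 4))  # every input gets the same output, 0.5
```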


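A regression model with a non-linear hidden layer can learn the XOR function. Here, the two hidden units use a non-linear tanh() activation. This model, suggested by Goodfellow et al., does not always find the solution; depending on the random initial weights, training can fail to converge to it.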

In [10]:
model = LogicGateModel(
    hidden_layers=[(2, tf.nn.tanh)],
    output_layer=(1, tf.keras.activations.linear)
)
model.train(X, y_xor, n_epochs=500)
print(model.predict(X))

[[2.3841858e-07]
 [9.9999976e-01]
 [9.9999988e-01]
 [0.0000000e+00]]
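
Goodfellow et al. also give an exact solution for this two-hidden-unit, linear-output architecture, using a rectified linear (ReLU) activation rather than tanh. The NumPy sketch below plugs in those hand-set weights. The hidden layer maps [0, 1] and [1, 0] to the same point, after which the four points are linearly separable and the linear output unit reproduces XOR exactly.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-set parameters: 2 ReLU hidden units followed by a linear output unit.
W = np.array([[1.0, 1.0], [1.0, 1.0]])
c = np.array([0.0, -1.0])
w = np.array([1.0, -2.0])
b = 0.0

h = np.maximum(0.0, X @ W + c)  # hidden layer: ReLU(XW + c)
out = h @ w + b                 # linear output unit

print(h)    # hidden representation of each input
print(out)  # network output for each input
```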


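This slightly more complicated model, suggested by S. Park, works more reliably. It uses eight tanh hidden units, a sigmoid output, and binary cross-entropy loss, and trains for 1000 epochs.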

In [11]:
model = LogicGateModel(
    hidden_layers=[(8, tf.nn.tanh)],
    output_layer=(1, tf.nn.sigmoid),
    loss_function=tf.keras.losses.binary_crossentropy
)
model.train(X, y_xor, n_epochs=1000)
print(model.predict(X))

[[0.00219102]
 [0.99306774]
 [0.9937768 ]
 [0.00844336]]
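
To make what `fit` does less opaque, here is a plain-NumPy sketch of the same 2-8-1 architecture (tanh hidden layer, sigmoid output, binary cross-entropy) trained with hand-written backpropagation. It is not how Keras trains the model above: it uses full-batch gradient descent instead of per-sample SGD, and the seed, learning rate, step count, and normal weight initialisation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# 2 inputs -> 8 tanh hidden units -> 1 sigmoid output.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
lr = 1.0  # assumed step size for full-batch gradient descent

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass.
    h = np.tanh(X @ W1 + b1)
    p = sigmoid(h @ W2 + b2)
    # Backward pass: for a sigmoid output with mean binary cross-entropy,
    # the gradient with respect to the output logit is (p - y) / N.
    d_out = (p - y) / len(X)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)  # tanh'(a) = 1 - tanh(a)^2
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)
    # Gradient-descent update.
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2)
eps = 1e-12  # guards the logs against exact 0 or 1
loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
print(np.round(p, 3))
print(loss)
```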
