# Teaching a NN to predict the output of an XOR gate

# XOR Problem Theory

Let's imagine neurons with the following attributes:
- they are arranged in a single layer
- each of them has its own bias (sometimes called polarity), i.e. a weight $b_i$ attached to a constant input signal
- each of them has its own weights $W_{ij}$, one for each input $x_j$

This structure of neurons, together with their attributes, forms a single-layer neural network. The parameters above are set during the learning process: the output signals $y_i$ are adjusted toward the expected signals $u_i$ (Fig. 1). This type of network has limited abilities. For example, it cannot implement the XOR function. (Assume that the activation function is a step function.)

![sln](http://home.agh.edu.pl/~vlsi/AI/xor_t/en/files/fig1.jpg)
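
As a minimal sketch of the single-layer network described above, the forward pass can be written in NumPy. The names `step` and `single_layer_forward`, the threshold convention (output 1 when the weighted sum is positive) and the example weights are assumptions made for illustration; they are not taken from the figure.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z > 0, otherwise 0."""
    return (z > 0).astype(int)

def single_layer_forward(W, b, x):
    """Compute y_i = f(sum_j W_ij * x_j + b_i) for every neuron i."""
    return step(W @ x + b)

# Hypothetical example: 2 neurons, 2 inputs, weights picked by hand.
W = np.array([[1.0, 1.0],     # weights of neuron 1
              [-1.0, 1.0]])   # weights of neuron 2
b = np.array([-1.5, 0.5])     # one bias per neuron
x = np.array([1.0, 1.0])      # one input vector

y = single_layer_forward(W, b, x)
print(y)
```

Each row of `W` holds the weights of one neuron, so a single matrix product evaluates the whole layer at once.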

Whether a single-layer network can learn a function depends on the linear separability of the training data: a single line must separate the points with $u=1$ from the points with $u=0$. Functions such as OR and AND satisfy this condition.

For example, the AND function has the following set of vectors (Table 1):

| $x_1$ | $x_2$ | $u$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 1 |



The neural network that implements this function consists of one output neuron with two inputs $x_1$, $x_2$ and a bias $b_1$.

![fig2](http://home.agh.edu.pl/~vlsi/AI/xor_t/en/files/fig2.jpg)


Assume that after training the output matches the target, $y_1 = f ( W_{11}x_1 + W_{12}x_{2} + b_1 ) = u_1$, as shown below:

![fig3](http://home.agh.edu.pl/~vlsi/AI/xor_t/en/files/fig3.jpg)

As seen in Table 1, the output signal should be '1' only at the point (1,1). The line that separates the two classes is $W_{11}x_1 + W_{12}x_{2} + b_1 = 0$: the output is '1' on one side of it and '0' on the other. We can therefore obtain linear separability by finding suitable coefficients of the line ($W_{11}$, $W_{12}$, $b_1$). As the figure above shows, this is no problem for the AND function.
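
To make this concrete, here is a small check with one hand-picked set of coefficients. The values $W_{11} = W_{12} = 1$ and $b_1 = -1.5$ are an assumption chosen for illustration (many other lines work equally well), and the step function is taken to output 1 when the weighted sum is positive.

```python
import numpy as np

# The four input vectors and the expected AND outputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
u_and = np.array([0, 0, 0, 1])

# Hand-picked (assumed) coefficients of the line x1 + x2 - 1.5 = 0.
W = np.array([1.0, 1.0])
b = -1.5

z = X @ W + b             # weighted sum for each input vector
y = (z > 0).astype(int)   # step activation
print(y)
```

Only the point (1,1) lies on the positive side of the line, so the neuron reproduces the AND table.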

The XOR function, however, is not linearly separable:

| $x_1$ | $x_2$ | $u$ |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

This means that no single line can divide the input space into a region where the output signal is 0 and a region where the output signal is 1.

![fig4](http://home.agh.edu.pl/~vlsi/AI/xor_t/en/files/fig4.jpg)

Inside the oval area the output signal is '1'. Outside this area the output signal is '0'. A single line cannot draw this boundary.
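
The same conclusion follows from a short calculation. Assume, as a convention for this sketch, that the step function outputs 1 exactly when the weighted sum is positive. A single neuron that reproduces the XOR table would then have to satisfy all four conditions at once:

$$
\begin{aligned}
(0,0) \mapsto 0 &: \quad b_1 \le 0 \\
(0,1) \mapsto 1 &: \quad W_{12} + b_1 > 0 \\
(1,0) \mapsto 1 &: \quad W_{11} + b_1 > 0 \\
(1,1) \mapsto 0 &: \quad W_{11} + W_{12} + b_1 \le 0
\end{aligned}
$$

Adding the second and third conditions gives $W_{11} + W_{12} + 2b_1 > 0$, so $W_{11} + W_{12} + b_1 > -b_1 \ge 0$ by the first condition. This contradicts the fourth condition, so no such coefficients exist.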

No choice of the weights $W_{11}$, $W_{12}$ and the bias $b_1$ changes this, because every choice still describes a single line. So we can't implement the XOR function with one perceptron.
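
A second layer removes this limitation. The sketch below wires up a two-layer network of step neurons by hand: one hidden unit behaves like OR, the other like NAND, and the output neuron takes the AND of the two. All weights are hand-picked assumptions for illustration, not values learned by training.

```python
import numpy as np

def step(z):
    """Step activation: 1 where z > 0, otherwise 0."""
    return (z > 0).astype(int)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden layer: column 0 acts as OR, column 1 acts as NAND.
W_hidden = np.array([[1.0, -1.0],
                     [1.0, -1.0]])
b_hidden = np.array([-0.5, 1.5])

# Output neuron: AND of the two hidden units.
W_out = np.array([1.0, 1.0])
b_out = -1.5

h = step(X @ W_hidden + b_hidden)   # hidden activations, shape (4, 2)
y = step(h @ W_out + b_out)         # network output, shape (4,)
print(y)
```

Each hidden unit draws its own line, and the output neuron keeps only the strip between the two lines, which is where the XOR output is 1.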

# Implementing a simple NN to solve the XOR problem

This is the model that we want to create:

![fig5](https://blog.thoughtram.io/images/xor_model.png)
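
Before turning to Keras, it may help to see what one forward pass through such a model computes. The NumPy sketch below assumes the layer sizes used in the Keras code of this section (2 inputs, 16 hidden units with ReLU, 1 sigmoid output). The weights are random and untrained, and the seed is arbitrary, so the outputs carry no meaning beyond their shapes and value range.

```python
import numpy as np

def relu(z):
    """Rectified linear unit: max(z, 0) element-wise."""
    return np.maximum(z, 0.0)

def sigmoid(z):
    """Logistic function, squashes values into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)      # arbitrary seed, untrained weights

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype="float32")

W1 = rng.normal(size=(2, 16))       # input -> hidden weights
b1 = np.zeros(16)                   # hidden biases
W2 = rng.normal(size=(16, 1))       # hidden -> output weights
b2 = np.zeros(1)                    # output bias

hidden = relu(X @ W1 + b1)          # shape (4, 16)
output = sigmoid(hidden @ W2 + b2)  # shape (4, 1)

print(hidden.shape, output.shape)
```

Training consists of adjusting `W1`, `b1`, `W2` and `b2` until `output` is close to the XOR targets.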

In [10]:
import numpy as np

# Keras has two different APIs to construct a model: functional and sequential
from keras.models import Sequential

# Neural networks consist of different layers where input data flows 
# through and gets transformed on its way. There are a bunch of different 
# layer types available in Keras. These different types of layer help us 
# to model individual kinds of neural nets for various machine learning 
# tasks. In our specific case the Dense layer is what we want.
from keras.layers import Dense


# the four different states of the XOR gate
# this is a 2D array where each inner array has two items.
training_data = np.array([[0,0],[0,1],[1,0],[1,1]], "float32")


# the four expected results in the same order
# these each correspond to the order of the training data
target_data = np.array([[0],[1],[1],[0]], "float32")


# create a sequential model.
model = Sequential()


# `Dense` is used here because our input data is 1D
# create a layer with 16 outputs - i.e. we have two input neurons spreading
# into 16 neurons in a "hidden layer".
#
# the `input_dim` is 2 because of the size / length of our training data
#
# `activation='relu'` says to use the `relu` function as the activation function
# i.e. activation functions take the inputs and transform them into some output
# See: https://en.wikipedia.org/wiki/Rectifier_(neural_networks)
model.add(Dense(16, input_dim=2, activation='relu'))

# add another layer with an output dimension of 1 and no explicit input dimension
# in this case, the input dimension is implicitly bound to be 16 since that's the
# output dimension of the previous layer.
model.add(Dense(1, activation='sigmoid'))  # this is our output dimension


# before we can train the model we have to compile it with a loss function,
# an optimizer and a list of metrics. The `loss` tells us how
# badly our model is performing, and we want to keep it low. In this case
# we're using the MSE - an alternative is `binary_crossentropy`.
#
# The optimizer finds the right adjustments for the weights. `adam` is a
# well-known and well-proven one, hence its use.
#
# `metrics` tells Keras which metrics to collect during the training. We're
# interested in `binary_accuracy`, which is the fraction of predictions
# that are correct. For example, a binary_accuracy of 0.25 means that the
# model correctly predicts one out of the 4 target states
model.compile(loss='mean_squared_error',
              optimizer='adam',
              metrics=['binary_accuracy'])

# Now we begin training: the model's predictions for the training data are
# compared against the target data, the loss is computed, and the optimizer
# adjusts the weights. The `binary_accuracy` is reported after each epoch.
#
# `epochs` is the number of passes over the training data
# `verbose` is how much information Keras should print
model.fit(training_data, target_data, epochs=500, verbose=2)

# at this point training is complete. Now we can begin making predictions.
# Note that we're predicting on the training data, which is already known;
# in the real world you should use data that the NN has not seen.
# Since in our case there are only 4 possible inputs, this is unavoidable.
print(model.predict(training_data).round())



Epoch 1/500
 - 0s - loss: 0.2552 - binary_accuracy: 0.5000
Epoch 2/500
 - 0s - loss: 0.2550 - binary_accuracy: 0.5000
Epoch 3/500
 - 0s - loss: 0.2548 - binary_accuracy: 0.5000
Epoch 4/500
 - 0s - loss: 0.2546 - binary_accuracy: 0.5000
Epoch 5/500
 - 0s - loss: 0.2543 - binary_accuracy: 0.5000
Epoch 6/500
 - 0s - loss: 0.2541 - binary_accuracy: 0.5000
Epoch 7/500
 - 0s - loss: 0.2539 - binary_accuracy: 0.5000
Epoch 8/500
 - 0s - loss: 0.2537 - binary_accuracy: 0.5000
Epoch 9/500
 - 0s - loss: 0.2534 - binary_accuracy: 0.5000
Epoch 10/500
 - 0s - loss: 0.2532 - binary_accuracy: 0.5000
Epoch 11/500
 - 0s - loss: 0.2530 - binary_accuracy: 0.5000
Epoch 12/500
 - 0s - loss: 0.2527 - binary_accuracy: 0.5000
Epoch 13/500
 - 0s - loss: 0.2525 - binary_accuracy: 0.5000
Epoch 14/500
 - 0s - loss: 0.2523 - binary_accuracy: 0.5000
Epoch 15/500
 - 0s - loss: 0.2520 - binary_accuracy: 0.5000
Epoch 16/500
 - 0s - loss: 0.2518 - binary_accuracy: 0.5000
Epoch 17/500
 - 0s - loss: 0.2516 - binary_accuracy: ...

[...]

 - 0s - loss: 0.2187 - binary_accuracy: 1.0000
Epoch 138/500
 - 0s - loss: 0.2183 - binary_accuracy: 1.0000
Epoch 139/500
 - 0s - loss: 0.2180 - binary_accuracy: 1.0000
Epoch 140/500
 - 0s - loss: 0.2176 - binary_accuracy: 1.0000
Epoch 141/500
 - 0s - loss: 0.2173 - binary_accuracy: 1.0000
Epoch 142/500
 - 0s - loss: 0.2169 - binary_accuracy: 1.0000
Epoch 143/500
 - 0s - loss: 0.2166 - binary_accuracy: 1.0000
Epoch 144/500
 - 0s - loss: 0.2162 - binary_accuracy: 1.0000
Epoch 145/500
 - 0s - loss: 0.2158 - binary_accuracy: 1.0000
Epoch 146/500
 - 0s - loss: 0.2155 - binary_accuracy: 1.0000
Epoch 147/500
 - 0s - loss: 0.2151 - binary_accuracy: 1.0000
Epoch 148/500
 - 0s - loss: 0.2148 - binary_accuracy: 1.0000
Epoch 149/500
 - 0s - loss: 0.2144 - binary_accuracy: 1.0000
Epoch 150/500
 - 0s - loss: 0.2140 - binary_accuracy: 1.0000
Epoch 151/500
 - 0s - loss: 0.2137 - binary_accuracy: 1.0000
Epoch 152/500
 - 0s - loss: 0.2133 - binary_accuracy: 1.0000
Epoch 153/500
 - 0s - loss: 0.2130 - binary_accuracy: ...

[...]

Epoch 272/500
 - 0s - loss: 0.1668 - binary_accuracy: 1.0000
Epoch 273/500
 - 0s - loss: 0.1664 - binary_accuracy: 1.0000
Epoch 274/500
 - 0s - loss: 0.1660 - binary_accuracy: 1.0000
Epoch 275/500
 - 0s - loss: 0.1656 - binary_accuracy: 1.0000
Epoch 276/500
 - 0s - loss: 0.1652 - binary_accuracy: 1.0000
Epoch 277/500
 - 0s - loss: 0.1648 - binary_accuracy: 1.0000
Epoch 278/500
 - 0s - loss: 0.1644 - binary_accuracy: 1.0000
Epoch 279/500
 - 0s - loss: 0.1640 - binary_accuracy: 1.0000
Epoch 280/500
 - 0s - loss: 0.1636 - binary_accuracy: 1.0000
Epoch 281/500
 - 0s - loss: 0.1632 - binary_accuracy: 1.0000
Epoch 282/500
 - 0s - loss: 0.1628 - binary_accuracy: 1.0000
Epoch 283/500
 - 0s - loss: 0.1624 - binary_accuracy: 1.0000
Epoch 284/500
 - 0s - loss: 0.1620 - binary_accuracy: 1.0000
Epoch 285/500
 - 0s - loss: 0.1616 - binary_accuracy: 1.0000
Epoch 286/500
 - 0s - loss: 0.1612 - binary_accuracy: 1.0000
Epoch 287/500
 - 0s - loss: 0.1608 - binary_accuracy: 1.0000
Epoch 288/500
 - 0s - loss: ...

[...]

Epoch 407/500
 - 0s - loss: 0.1149 - binary_accuracy: 1.0000
Epoch 408/500
 - 0s - loss: 0.1146 - binary_accuracy: 1.0000
Epoch 409/500
 - 0s - loss: 0.1142 - binary_accuracy: 1.0000
Epoch 410/500
 - 0s - loss: 0.1139 - binary_accuracy: 1.0000
Epoch 411/500
 - 0s - loss: 0.1135 - binary_accuracy: 1.0000
Epoch 412/500
 - 0s - loss: 0.1132 - binary_accuracy: 1.0000
Epoch 413/500
 - 0s - loss: 0.1128 - binary_accuracy: 1.0000
Epoch 414/500
 - 0s - loss: 0.1125 - binary_accuracy: 1.0000
Epoch 415/500
 - 0s - loss: 0.1122 - binary_accuracy: 1.0000
Epoch 416/500
 - 0s - loss: 0.1118 - binary_accuracy: 1.0000
Epoch 417/500
 - 0s - loss: 0.1115 - binary_accuracy: 1.0000
Epoch 418/500
 - 0s - loss: 0.1111 - binary_accuracy: 1.0000
Epoch 419/500
 - 0s - loss: 0.1108 - binary_accuracy: 1.0000
Epoch 420/500
 - 0s - loss: 0.1105 - binary_accuracy: 1.0000
Epoch 421/500
 - 0s - loss: 0.1101 - binary_accuracy: 1.0000
Epoch 422/500
 - 0s - loss: 0.1098 - binary_accuracy: 1.0000
Epoch 423/500
 - 0s - loss: ...

[...]
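
The log shows `binary_accuracy` reaching 1.0 while the loss is still around 0.2. The sketch below suggests why: accuracy only checks which side of 0.5 each prediction falls on, whereas the MSE measures how far the prediction is from the target. The two helper functions are simplified re-implementations written for this example, not Keras's own code, and the prediction values are hypothetical.

```python
import numpy as np

def mean_squared_error(targets, predictions):
    """Average squared difference between targets and predictions."""
    return float(np.mean((targets - predictions) ** 2))

def binary_accuracy(targets, predictions, threshold=0.5):
    """Fraction of predictions on the same side of the threshold as the target."""
    return float(np.mean((predictions > threshold) == (targets > threshold)))

targets = np.array([[0], [1], [1], [0]], dtype="float64")

# Hypothetical sigmoid outputs that sit just on the correct side of 0.5.
predictions = np.array([[0.45], [0.55], [0.55], [0.45]])

mse = mean_squared_error(targets, predictions)
acc = binary_accuracy(targets, predictions)
print(mse, acc)
```

With these values every prediction rounds to the right answer, so the accuracy is 1.0, yet the loss stays near 0.2 because the outputs are still far from 0 and 1. Further epochs keep lowering the loss even though the accuracy cannot improve any more.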

# Activity 1: What happens if we increase the size of the hidden layer?

# Activity 2: What happens if we add another layer?

# Activity 3: What happens if you use a different activation function?
Hint: see https://keras.io/activations/

# Extra: Modify the NN to train / learn how to predict the 3-input XOR gate problem.

![fig6](https://www.electronicshub.org/wp-content/uploads/2015/07/3-IP-TRUTH-TABLE2.jpg)
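
As a possible starting point, the training data for the 3-input case can be generated rather than typed by hand. This sketch assumes the usual definition of a 3-input XOR gate, where the output is 1 when an odd number of inputs is 1; check it against the truth table above before relying on it.

```python
import itertools
import numpy as np

# All 2**3 input combinations, in truth-table order 000 .. 111.
training_data = np.array(list(itertools.product([0, 1], repeat=3)), dtype="float32")

# Assumption: the gate outputs 1 when an odd number of inputs is 1.
target_data = (training_data.sum(axis=1) % 2).reshape(-1, 1).astype("float32")

print(training_data.shape)
print(target_data.ravel())
```

With three inputs per sample, the first `Dense` layer would need `input_dim=3` instead of 2.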

# References
1. http://home.agh.edu.pl/~vlsi/AI/xor_t/en/main.htm
2. https://blog.thoughtram.io/machine-learning/2016/11/02/understanding-XOR-with-keras-and-tensorlow.html