# MLP XOR Solution

A Multi-Layer Perceptron (a.k.a. MLP), the basic model of deep learning, works as shown below.

![Image of Perceptron](https://www.ritchievink.com/img/post-9-mlp/nn_diagram_1.png) 

A neuron (a.k.a. node or perceptron) works as shown in the picture below.

![Image of Perceptron](https://i.stack.imgur.com/VqOpE.jpg)
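In short, a neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function. The following is a minimal pure-Python sketch of that computation. The function names `neuron`, `step`, and `sigmoid` are illustrative and do not come from any library.

```python
import math


def step(x):
    """Classic perceptron activation: 1 if the input is non-negative, else 0."""
    return 1.0 if x >= 0 else 0.0


def sigmoid(x):
    """Smooth activation, the one used later in the TensorFlow model."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(total)


if __name__ == "__main__":
    # Same weights and bias, two different activation functions.
    print(neuron([1.0, 0.0], [0.5, -0.5], 0.0, step))
    print(neuron([1.0, 0.0], [0.5, -0.5], 0.0, sigmoid))
```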

Many activation functions are used across different types of deep learning models.  
Traditionally, the single-neuron perceptron used a step function as its activation function.  
A single perceptron can solve the AND and OR operations, but it cannot solve XOR.
XOR can be solved by a Multi-Layer Perceptron (MLP). Although MLP has "perceptron" in its name,  
any activation function can be used in an MLP.

![Image of Perceptron](https://qph.ec.quoracdn.net/main-qimg-01c26eabd976b027e49015428b7fcf01?convert_to_webp=true)
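The following sketch shows that a single step-activation perceptron can compute AND and OR. The weights and biases are hand-picked and are only one valid choice among many. No choice of `(w1, w2, b)` reproduces XOR's `[0, 1, 1, 0]`, because XOR is not linearly separable.

```python
def step(x):
    # Step activation: fires (1) when the weighted sum reaches the threshold.
    return 1 if x >= 0 else 0


def perceptron(x1, x2, w1, w2, b):
    # Single step-activation perceptron with two inputs.
    return step(w1 * x1 + w2 * x2 + b)


INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]

# Hand-picked weights: AND needs both inputs to pass -1.5, OR needs only one to pass -0.5.
AND = [perceptron(x1, x2, 1, 1, -1.5) for x1, x2 in INPUTS]
OR = [perceptron(x1, x2, 1, 1, -0.5) for x1, x2 in INPUTS]

print("AND:", AND)  # AND: [0, 0, 0, 1]
print("OR: ", OR)   # OR:  [0, 1, 1, 1]
```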

## A single perceptron only works on linearly separable classification
One perceptron defines one decision boundary (a straight line when there are two inputs), so it can only solve linearly separable problems.

![Image of Perceptron](https://qph.fs.quoracdn.net/main-qimg-a6c557af4280d1f85cacc66e048e82f3)
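The Rosenblatt perceptron learning rule makes this limitation concrete. On linearly separable data such as AND, the rule is guaranteed to converge to a separating line. For XOR no such line exists, so at least one of the four points always remains misclassified. The sketch below is in pure Python and assumes a learning rate of 1, zero initial weights, and 100 epochs, all of which are arbitrary choices. `train_perceptron` is a hypothetical helper name.

```python
def step(x):
    return 1 if x >= 0 else 0


def train_perceptron(samples, epochs=100, lr=1.0):
    """Rosenblatt perceptron learning rule on (x1, x2, target) samples.

    Returns the learned (w1, w2, b) and the training accuracy.
    """
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for x1, x2, t in samples:
            y = step(w1 * x1 + w2 * x2 + b)
            error = t - y  # -1, 0 or +1
            # Nudge the boundary toward misclassified points only.
            w1 += lr * error * x1
            w2 += lr * error * x2
            b += lr * error
    correct = sum(step(w1 * x1 + w2 * x2 + b) == t for x1, x2, t in samples)
    return (w1, w2, b), correct / len(samples)


AND_DATA = [(0, 0, 0), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
XOR_DATA = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

_, and_acc = train_perceptron(AND_DATA)
_, xor_acc = train_perceptron(XOR_DATA)
print("AND accuracy:", and_acc)
print("XOR accuracy:", xor_acc)
```

Training on AND reaches full accuracy. Training on XOR never does, however many epochs are allowed.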

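An MLP with two neurons in its hidden layer can solve XOR.  
The two hidden neurons draw two boundary lines, z1 and z2.

Mapping each input to its (z1, z2) values gives the truth table below. The inputs (0, 0) and (1, 1) both map to z = (0, 0), so only three rows appear:

| z1 | z2 | value (XOR output) |
|----|----|--------------------|
| 0  | 0  | 0                  |
| 0  | 1  | 1                  |
| 1  | 0  | 1                  |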

As the upper 2D chart below shows, the points are now linearly separable on the z1, z2 axes,  
so one perceptron in the next layer can classify the output of the hidden layer.

![Image of Perceptron](http://cps0715.weebly.com/uploads/7/4/0/3/74035485/8009014_orig.png)
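The following sketch builds a two-layer network with hand-picked weights that reproduces the z1, z2 truth table using step activations. In this construction, z1 fires only for input (1, 0), z2 fires only for (0, 1), and the output perceptron computes z1 OR z2. These weights are one possible choice for illustration. The TensorFlow model below uses sigmoid activations and learns its own weights.

```python
def step(x):
    return 1 if x >= 0 else 0


def xor_mlp(x1, x2):
    # Hidden layer: two boundary lines.
    z1 = step(x1 - x2 - 0.5)  # 1 only for (1, 0)
    z2 = step(x2 - x1 - 0.5)  # 1 only for (0, 1)
    # Output layer: one perceptron computing z1 OR z2, separable in (z1, z2) space.
    y = step(z1 + z2 - 0.5)
    return (z1, z2), y


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z, y = xor_mlp(x1, x2)
    print((x1, x2), "-> z =", z, "-> y =", y)
# (0, 0) -> z = (0, 0) -> y = 0
# (0, 1) -> z = (0, 1) -> y = 1
# (1, 0) -> z = (1, 0) -> y = 1
# (1, 1) -> z = (0, 0) -> y = 0
```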

# Practice with TensorFlow

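```python
# The cells below use the TensorFlow 1.x graph API (tf.placeholder, tf.Session).
# On TensorFlow 2.x that API is available through the compat.v1 module.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()
```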

# Define TensorFlow Graph
First, we define the shape of the training data.  
The XOR training data has input X and output Y.  

X has shape [4, 2]:  
[[0, 0], [0, 1], [1, 0], [1, 1]]  

Y has shape [4, 1]:  
[[0], [1], [1], [0]]  
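The sketch below writes the same data as nested Python lists and derives each target with Python's bitwise XOR operator `^`. This confirms that Y is the XOR of the two inputs and shows where the [4, 2] and [4, 1] shapes come from.

```python
# Each row of X is one (x1, x2) input pair: 4 rows, 2 columns.
train_X = [[0, 0], [0, 1], [1, 0], [1, 1]]
# Each row of Y is the XOR of that pair, wrapped in a list: 4 rows, 1 column.
train_Y = [[x1 ^ x2] for x1, x2 in train_X]

print(train_Y)                          # [[0], [1], [1], [0]]
print(len(train_X), len(train_X[0]))    # 4 2
print(len(train_Y), len(train_Y[0]))    # 4 1
```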

```python
# Placeholders for the 4 training examples: 2 inputs and 1 target each.
X = tf.placeholder(tf.float32, shape=[4,2])
Y = tf.placeholder(tf.float32, shape=[4,1])
```

# First Layer

```python
# The first layer has two neurons, each taking the two input values.
W1 = tf.Variable(tf.random_uniform([2,2]))
# Each neuron has one bias.
B1 = tf.Variable(tf.zeros([2]))
# The first layer's output Z is sigmoid(X * W1 + B1), with shape [4, 2].
Z = tf.sigmoid(tf.matmul(X, W1) + B1)
```

# Second Layer

```python
# The second layer has one neuron taking the two hidden-layer values.
W2 = tf.Variable(tf.random_uniform([2,1]))
# The single neuron has one bias.
B2 = tf.Variable(tf.zeros([1]))
# The second layer's output Y_hat is sigmoid(Z * W2 + B2), with shape [4, 1].
Y_hat = tf.sigmoid(tf.matmul(Z, W2) + B2)
```
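The two layers amount to two matrix products, each with a bias broadcast across every row. Because each row of X is one example, the product is X · W1 rather than W1 · X. The following is a dependency-free sketch of the same forward pass. It assumes plain Python lists in place of tensors and uniform [0, 1) weights, mirroring the default range of `tf.random_uniform`. `matmul` and `dense_sigmoid` are hypothetical helper names, and the seed is arbitrary.

```python
import math
import random


def matmul(a, b):
    """Plain-Python product of an [n, k] and a [k, m] list of lists."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def dense_sigmoid(x, w, bias):
    """sigmoid(x @ w + bias), adding bias[j] to column j of every row."""
    out = matmul(x, w)
    return [[1.0 / (1.0 + math.exp(-(v + bias[j]))) for j, v in enumerate(row)]
            for row in out]


random.seed(0)
X = [[0, 0], [0, 1], [1, 0], [1, 1]]                          # [4, 2]
W1 = [[random.random() for _ in range(2)] for _ in range(2)]  # [2, 2]
B1 = [0.0, 0.0]                                               # [2]
W2 = [[random.random()] for _ in range(2)]                    # [2, 1]
B2 = [0.0]                                                    # [1]

Z = dense_sigmoid(X, W1, B1)      # hidden layer output, [4, 2]
Y_hat = dense_sigmoid(Z, W2, B2)  # network output, [4, 1]
print(len(Z), len(Z[0]), len(Y_hat), len(Y_hat[0]))  # 4 2 4 1
```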

# Loss Function

```python
# Binary cross-entropy, averaged over the 4 examples
loss = tf.reduce_mean(-1*((Y*tf.log(Y_hat))+((1-Y)*tf.log(1.0-Y_hat))))
```
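The loss line computes the mean of −(y·log(ŷ) + (1−y)·log(1−ŷ)) over the four examples. The pure-Python sketch below computes the same formula. The clipping by `eps` is an addition made here and is not part of the TensorFlow line. Without it, a float32 sigmoid output that saturates to exactly 0 or 1 turns the TensorFlow loss into NaN. Within TensorFlow itself, `tf.nn.sigmoid_cross_entropy_with_logits` applied to the pre-sigmoid values is the numerically stable built-in alternative.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -(y*log(p) + (1-y)*log(1-p)) over all examples.

    p is clipped to [eps, 1 - eps] so math.log never receives 0.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Predicting 0.5 everywhere costs log(2) per example.
print(binary_cross_entropy([0, 1, 1, 0], [0.5, 0.5, 0.5, 0.5]))
# Confident, correct predictions give a loss close to 0.
print(binary_cross_entropy([0, 1, 1, 0], [0.01, 0.99, 0.99, 0.01]))
```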

# Optimizer

```python
# Gradient descent with a learning rate of 0.05
train_step = tf.train.GradientDescentOptimizer(0.05).minimize(loss)
```
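Each run of `train_step` makes TensorFlow differentiate `loss` with respect to W1, B1, W2, and B2, then subtract 0.05 × gradient from each variable. To show what that involves, the sketch below writes out the same computation by hand in dependency-free Python, deriving the gradients manually with backpropagation. It assumes the same 2-2-1 sigmoid architecture, initialization ranges, mean binary cross-entropy loss, and learning rate as the cells here. `train_xor` is a hypothetical helper name. How quickly training progresses depends on the random initialization, so results vary with `seed`.

```python
import math
import random


def sigmoid(x):
    # Numerically safe logistic function: math.exp is never given a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_xor(steps=20000, lr=0.05, seed=None):
    """Full-batch gradient descent on a 2-2-1 sigmoid MLP with mean binary cross-entropy.

    Returns (initial_loss, final_loss, outputs for the 4 XOR inputs).
    """
    rng = random.Random(seed)
    X = [[0, 0], [0, 1], [1, 0], [1, 1]]
    Y = [0, 1, 1, 0]
    n = len(X)
    # Like the TensorFlow cells: weights uniform in [0, 1), biases zero.
    W1 = [[rng.random() for _ in range(2)] for _ in range(2)]  # W1[k][j]: input k -> hidden j
    B1 = [0.0, 0.0]
    W2 = [rng.random() for _ in range(2)]  # W2[j]: hidden j -> output
    B2 = [0.0]                             # shape [1], like tf.zeros([1])

    def forward(x):
        z = [sigmoid(x[0] * W1[0][j] + x[1] * W1[1][j] + B1[j]) for j in range(2)]
        return z, sigmoid(z[0] * W2[0] + z[1] * W2[1] + B2[0])

    def mean_loss():
        eps = 1e-12  # clip so math.log never receives 0
        total = 0.0
        for x, y in zip(X, Y):
            p = min(max(forward(x)[1], eps), 1 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        return total / n

    initial_loss = mean_loss()
    for _ in range(steps):
        gW1 = [[0.0, 0.0], [0.0, 0.0]]
        gB1 = [0.0, 0.0]
        gW2 = [0.0, 0.0]
        gB2 = 0.0
        for x, y in zip(X, Y):
            z, y_hat = forward(x)
            # For a sigmoid output with BCE, d(loss)/d(output logit) = (y_hat - y) / n.
            d_out = (y_hat - y) / n
            gB2 += d_out
            for j in range(2):
                gW2[j] += d_out * z[j]
                # Chain rule back through W2 and the hidden sigmoid.
                d_hidden = d_out * W2[j] * z[j] * (1 - z[j])
                gB1[j] += d_hidden
                for k in range(2):
                    gW1[k][j] += d_hidden * x[k]
        # Gradient descent update: variable <- variable - lr * gradient.
        for j in range(2):
            W2[j] -= lr * gW2[j]
            B1[j] -= lr * gB1[j]
            for k in range(2):
                W1[k][j] -= lr * gW1[k][j]
        B2[0] -= lr * gB2
    outputs = [forward(x)[1] for x in X]
    return initial_loss, mean_loss(), outputs


if __name__ == "__main__":
    start, end, outputs = train_xor(seed=0)
    print("loss:", round(start, 4), "->", round(end, 4))
    print("outputs:", [round(p, 3) for p in outputs])
```

The gradients are accumulated over all four examples before any variable changes. This matches the TensorFlow cell, which feeds the whole batch to every `train_step`.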

# Train

```python
# Training data
train_X = [[0,0],[0,1],[1,0],[1,1]]
train_Y = [[0],[1],[1],[0]]
```

```python
# Initialize all variables
init = tf.global_variables_initializer()
# Start training
with tf.Session() as sess:
    # Run the initializer
    sess.run(init)
    print("train data: "+str(train_X))
    for i in range(20000):
        # One full-batch gradient descent step on all 4 examples
        sess.run(train_step, feed_dict={X: train_X, Y: train_Y})
        if i % 5000 == 0:
            print('Epoch : ', i)
            # Hidden-layer biases, shown as "Bias1" in the output below
            print('Bias1 ', sess.run(B1))
            print('Output : ', sess.run(Y_hat, feed_dict={X: train_X, Y: train_Y}))
```

Output (the numbers differ from run to run because W1 and W2 are randomly initialized):

```
train data: [[0, 0], [0, 1], [1, 0], [1, 1]]
Epoch :  0
Bias1  [-0.0015105  -0.00148694]
Output :  [[0.6631402 ]
 [0.6786914 ]
 [0.71121323]
 [0.722656  ]]
Epoch :  5000
Bias1  [-0.03477474 -0.30622375]
Output :  [[0.2932655 ]
 [0.5733818 ]
 [0.5659242 ]
 [0.60077167]]
Epoch :  10000
Bias1  [-4.524982  -2.0431015]
Output :  [[0.08664235]
 [0.89662325]
 [0.89681584]
 [0.13940719]]
Epoch :  15000
Bias1  [-6.298708 -2.640406]
Output :  [[0.03407335]
 [0.97077954]
 [0.9708251 ]
 [0.033969  ]]
```
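The network's outputs are sigmoid values in (0, 1). To read them as XOR answers, threshold them at 0.5, which is a conventional cut-off and an assumption made here. The sketch below applies that threshold to the epoch 15000 outputs printed above.

```python
# Final outputs reported at epoch 15000 in the run above, one per input row.
outputs = [0.03407335, 0.97077954, 0.9708251, 0.033969]

# Threshold at 0.5 to turn sigmoid outputs into 0/1 class labels.
predictions = [1 if p >= 0.5 else 0 for p in outputs]
print(predictions)  # [0, 1, 1, 0]
```

The predicted labels match the XOR targets `[[0], [1], [1], [0]]` for the inputs `[[0, 0], [0, 1], [1, 0], [1, 1]]`.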
