# XOR Problem in Deep Neural Network
> This post reviews the basic concepts of deep neural networks and looks at the XOR problem that motivates them.

- toc: true 
- badges: true
- comments: true
- author: Chanseok Kang
- categories: [Python, Deep_Learning]
- image: images/linearly_separable_xor.png

```python
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

plt.rcParams['figure.figsize'] = (16, 10)
plt.rcParams['text.usetex'] = True
plt.rc('font', size=15)
```

## Linearly Separable
Neurons, the basic building blocks of a neural network, work much like their biological counterparts. In a biological neuron, signals arrive through synapses, the cell body interprets the information they carry, and the result is sent out through the axon. An artificial neuron does almost the same thing: the outputs of the previous neurons become its input ($x$), the neuron combines that input with its own weight vector and bias ($W, b$), and an activation function then filters the result before passing it on to the next neuron.
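
As a minimal sketch of the computation described above, the snippet below implements one artificial neuron with NumPy. The sigmoid activation and the numeric values of `x`, `W`, and `b` are assumptions chosen for illustration; the post does not prescribe them.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, W, b, activation=sigmoid):
    """Weighted sum of the inputs plus the bias, passed through an activation."""
    return activation(np.dot(W, x) + b)

# Illustrative values only; none of these numbers come from the post.
x = np.array([1.0, 0.0])    # input signal
W = np.array([0.5, -0.5])   # weight vector
b = 0.0                     # bias
y = neuron(x, W, b)         # sigmoid(0.5), roughly 0.62
```

Swapping `activation` for another function (for example a step function) changes how the weighted sum is filtered, but the weighted-sum-plus-bias part stays the same.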

We have already covered these concepts and their implementation in TensorFlow, but let's review the basic problem with a simple example. If you studied EECS, you may have heard of logic gates. The most basic ones are the AND and OR gates, which have the following truth tables:

- AND

| $x_1$   	| $x_2$  	| $y$  	|
|---	|---	|---	|
| 0  	| 0  	| 0  	|
| 0  	| 1  	| 0  	|
| 1  	| 0  	| 0  	|
| 1  	| 1  	| 1  	|


- OR

| $x_1$   	| $x_2$  	| $y$  	|
|---	|---	|---	|
| 0  	| 0  	| 0  	|
| 0  	| 1  	| 1  	|
| 1  	| 0  	| 1  	|
| 1  	| 1  	| 1  	|

We can plot AND and OR like this:

![linearly_separable_and_or](image/linearly_separable_and_or.png)

Here a black dot means 1 and a white dot means 0. To split the two classes, we only need to draw a single straight line. Data that can be split by a straight line like this is called **linearly separable**. But what about XOR (eXclusive OR)?

- XOR

| $x_1$   	| $x_2$  	| $y$  	|
|---	|---	|---	|
| 0  	| 0  	| 0  	|
| 0  	| 1  	| 1  	|
| 1  	| 0  	| 1  	|
| 1  	| 1  	| 0  	|

And its plot looks like this:

![linearly_separable_xor](image/linearly_separable_xor.png)

Can we separate these points with a single straight line in the same way? No, we cannot. This kind of **non-linearly separable** data cannot be separated by a linear equation.
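
To make the difference concrete, here is a small sketch that searches a grid of candidate lines $w_1 x_1 + w_2 x_2 + b = 0$ for each gate. The helper names (`perceptron`, `separates`, `find_line`) and the grid range are hypothetical choices for this example. A grid search is only a demonstration, not a proof.

```python
import itertools

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
AND = [0, 0, 0, 1]
OR = [0, 1, 1, 1]
XOR = [0, 1, 1, 0]

def perceptron(x1, x2, w1, w2, b):
    """Output 1 when the point lies on the positive side of the line."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

def separates(targets, w1, w2, b):
    """True when the line classifies all four input points correctly."""
    return all(perceptron(x1, x2, w1, w2, b) == t
               for (x1, x2), t in zip(INPUTS, targets))

def find_line(targets, grid):
    """Return the first (w1, w2, b) on the grid that separates the data, or None."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if separates(targets, w1, w2, b):
            return (w1, w2, b)
    return None

# Candidate values from -2.0 to 2.0 in steps of 0.25 (an arbitrary choice).
grid = [i / 4 for i in range(-8, 9)]

and_line = find_line(AND, grid)   # a separating line exists
or_line = find_line(OR, grid)     # a separating line exists
xor_line = find_line(XOR, grid)   # None: no line on the grid works
```

The search fails for XOR on any grid, not just this one. The four rows of the XOR table require $b \le 0$, $w_2 + b > 0$, $w_1 + b > 0$, and $w_1 + w_2 + b \le 0$. Adding the two middle inequalities gives $w_1 + w_2 + 2b > 0$, so $w_1 + w_2 + b > -b \ge 0$, which contradicts the last one.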

[Marvin Minsky](https://en.wikipedia.org/wiki/Marvin_Minsky), co-founder of the MIT AI Lab, showed in his book "Perceptrons" (written with Seymour Papert) that a single-layer perceptron cannot solve a non-linearly separable problem such as XOR. The book noted instead that such problems could be solved by stacking perceptrons in a hierarchical architecture, the so-called **Multi-Layer Perceptron** (MLP for short). At that time, however, there was no known way to train such a network, that is, no method for updating its weights and biases, so most people thought it was impossible to train.
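
The following sketch shows why a second layer is enough for XOR. It wires up a tiny two-layer network by hand, using the identity XOR = AND(OR, NAND). The weights are picked manually for illustration and are not learned, and the function names are hypothetical.

```python
def step(z):
    """Step activation: fire (1) only when the weighted sum is positive."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    """Two hidden perceptrons (OR and NAND) feeding one output perceptron (AND)."""
    h_or = step(x1 + x2 - 0.5)          # hidden unit 1 behaves like OR
    h_nand = step(-x1 - x2 + 1.5)       # hidden unit 2 behaves like NAND
    return step(h_or + h_nand - 1.5)    # output unit behaves like AND

outputs = [xor_mlp(x1, x2) for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]]
# outputs == [0, 1, 1, 0], matching the XOR truth table
```

Each hidden unit draws one straight line, and the output unit combines the two half-planes, which is something a single perceptron cannot do.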

## Backpropagation
In 1974, [Paul Werbos](https://en.wikipedia.org/wiki/Paul_Werbos) described **backpropagation** in his dissertation, which made it possible to handle the problem mentioned above.

![backpropagation](image/training_inference1.png) [^1]

Backpropagation is a way of updating weights based on the error. When an input arrives at the input layer, the network processes it in the forward direction and produces an inference. The inferred output may differ from the actual output. Based on this error, the network is then traversed in the backward direction to update the weights, and the updated weights are used in the next step.
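
The forward, backward, and update steps can be sketched with NumPy on the XOR data. Everything specific here is an assumption made for the example: a sigmoid activation, a mean squared error loss, 4 hidden units, a learning rate of 0.5, and the hypothetical function name `train_xor`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The four XOR input/target pairs from the truth table.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
Y = np.array([[0.0], [1.0], [1.0], [0.0]])

def train_xor(hidden=4, lr=0.5, steps=5000, seed=0):
    """Train a small two-layer sigmoid network with plain gradient descent."""
    rng = np.random.default_rng(seed)
    W1, b1 = rng.normal(size=(2, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(size=(hidden, 1)), np.zeros(1)
    losses = []
    for _ in range(steps):
        # Forward pass: compute the inference and its error.
        H = sigmoid(X @ W1 + b1)
        P = sigmoid(H @ W2 + b2)
        losses.append(float(np.mean((P - Y) ** 2)))

        # Backward pass: apply the chain rule from the output back to the input.
        dZ2 = 2.0 * (P - Y) / P.size * P * (1.0 - P)
        dW2, db2 = H.T @ dZ2, dZ2.sum(axis=0)
        dZ1 = (dZ2 @ W2.T) * H * (1.0 - H)
        dW1, db1 = X.T @ dZ1, dZ1.sum(axis=0)

        # Update: the new weights are used in the next step.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    predictions = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
    return predictions, losses

predictions, losses = train_xor()
```

With settings like these the predictions typically move toward `[0, 1, 1, 0]`, but how close they get depends on the seed, the learning rate, and the number of steps.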

This approach was rediscovered and popularized by [Geoffrey Hinton](https://en.wikipedia.org/wiki/Geoffrey_Hinton) and his colleagues in the mid-1980s.

## Convolutional Neural Networks
There is another approach to non-linearly separable problems, especially for visual data. Researchers studying the visual cortex found that individual cells respond to particular local patterns. Imitating this process, Yann LeCun introduced the **Convolutional Neural Network** (CNN for short) with his LeNet-5 network and showed how effective it is at handwriting recognition.

![lenet5](image/LeNet_Original_Image.jpg)

Unlike an MLP, a CNN compresses the input signal into features of the visual data and works with those features.
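
A rough sketch of this compression is shown below: a small filter slides over the image to produce a feature map, and a pooling step then shrinks that map. The filter values, the image, and the use of 2x2 average pooling are assumptions for illustration and are not taken from LeNet-5.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as used in CNNs (technically cross-correlation)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Each output value summarizes one local patch of the image.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool2x2(feature_map):
    """Average each non-overlapping 2x2 block, dropping any odd last row/column."""
    h, w = feature_map.shape
    h, w = h - h % 2, w - w % 2
    blocks = feature_map[:h, :w].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

image = np.arange(36, dtype=float).reshape(6, 6)   # toy 6x6 "image"
kernel = np.array([[1.0, -1.0]])                   # horizontal difference filter
features = conv2d(image, kernel)                   # shape (6, 5)
pooled = avg_pool2x2(features)                     # shape (3, 2)
```

The same small kernel is reused at every position, so the layer needs far fewer weights than a fully connected layer over the same image.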

## The problems

Approaches like backpropagation and CNNs advanced AI technology, but a problem with the architecture remained. Extracting a lot of information from the features requires many layers. But as the number of layers grows, the error signal needed for backpropagation fades away, a problem known as the **Vanishing Gradient**. Each weight in the network receives an update proportional to the partial derivative of the error function with respect to that weight. By the chain rule, this derivative is a product with one factor per layer, so when those factors are smaller than 1, the more layers there are, the closer the gradient reaching the early layers gets to zero.
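
The effect can be illustrated with a deliberately simplified model: a chain of one-unit sigmoid layers, each with weight 1. This setup and the function name `gradient_through_chain` are assumptions for the example. Since the sigmoid derivative is at most 0.25, the gradient after $n$ layers is at most $0.25^n$.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid; its maximum value is 0.25 at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)

def gradient_through_chain(num_layers, x=0.0, w=1.0):
    """d(output)/d(input) for a chain of layers a = sigmoid(w * a_prev)."""
    a = x
    grad = 1.0
    for _ in range(num_layers):
        z = w * a
        grad *= w * sigmoid_grad(z)   # chain rule: one factor per layer
        a = sigmoid(z)
    return grad

# The gradient shrinks rapidly as the chain gets deeper.
grads = {n: gradient_through_chain(n) for n in (1, 5, 10, 20)}
```

Real networks have many units per layer and learned weights, so the numbers differ, but the multiplicative structure behind the shrinking gradient is the same.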


## Breakthrough
But in 2006, Hinton and Bengio found that neural networks with many layers could be trained well with proper **weight initialization**. Previously there was no rule for choosing the initial weights, so they were usually just set at random. Thanks to this approach, deep models became more effective than shallow ones on difficult problems. In fact, the word **"Deep"** in Deep Learning refers to a learning architecture with many layers.
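
The post does not say which initialization scheme was used. As one illustration of what a rule for initial weights can look like, the sketch below implements Glorot (Xavier) uniform initialization. This is a later scheme (Glorot and Bengio, 2010) and is not the 2006 method itself. The layer sizes are arbitrary.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    """Sample weights uniformly from [-limit, limit], scaled by the layer size."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

rng = np.random.default_rng(0)
W = glorot_uniform(256, 128, rng)   # weights for a 256 -> 128 layer
```

Scaling the range by the number of inputs and outputs keeps the signal from growing or shrinking too much as it passes through each layer, which is one way to ease the training of deep networks.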

And this was the beginning of the AI era!

[^1]: Figure from the [NVIDIA blog](https://developer.nvidia.com/blog/inference-next-step-gpu-accelerated-deep-learning/)