# Solving the XOR Problem with Deep Learning

A well-known limitation of a linear model is that it cannot solve the XOR problem. A linear model can represent the AND and OR operations, but it fails on XOR. A Multilayer Perceptron (MLP), by contrast, can solve the XOR problem. This document explores how this is achieved.

## 1. Limitations of a Linear Model

When classifying data, a linear model can only create a linear decision boundary: in two dimensions, a single straight line that divides the data into two categories. This works for operations like AND and OR, where one line can separate the inputs that produce 0 from those that produce 1. XOR, however, is not linearly separable: no single straight line separates its outputs.
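The difference between AND, OR, and XOR can be checked numerically. The sketch below is an illustration, not part of the original notebook: the helper `find_linear_separator` and the coarse weight grid are assumptions made for this example. It searches for a single line `w1*x1 + w2*x2 + b = 0` that reproduces each truth table. Finding no line on a grid is not a proof, but it matches the known result that XOR is not linearly separable.

```python
import itertools
import numpy as np

# The four possible binary inputs
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Truth tables for the three gates, in the same order as `inputs`
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR": np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),
}

def find_linear_separator(labels, grid=np.linspace(-2, 2, 9)):
    """Search a coarse grid for (w1, w2, b) whose thresholded output matches labels."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        pred = (inputs @ np.array([w1, w2]) + b > 0).astype(int)
        if np.array_equal(pred, labels):
            return (w1, w2, b)
    return None  # no line on this grid reproduces the truth table

separable = {
    name: find_linear_separator(labels) is not None
    for name, labels in targets.items()
}
print(separable)  # {'AND': True, 'OR': True, 'XOR': False}
```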

This limitation of the perceptron was mathematically demonstrated by Marvin Minsky and Seymour Papert in their book *Perceptrons: An Introduction to Computational Geometry* (1969).

## 2. Solving XOR with a Hidden Layer

To solve the XOR problem, at least two lines or a non-linear boundary are required. In this document, we demonstrate how two lines can be used to solve the XOR problem.

### XOR Logic

Conceptually, XOR can be expressed as the conjunction (AND) of two logical operations, each of which is linearly separable and therefore corresponds to one line:

1. **NOT (x1 AND x2)** - One line implements this operation.
2. **(x1 OR x2)** - A second line implements this operation.

The region where both conditions hold (their AND) is the band between the two lines. The result is 1 there, which is exactly where the XOR condition is met, and 0 otherwise.
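The decomposition above can be sketched with three hand-built threshold units. The weights and thresholds below are hand-picked assumptions chosen for illustration (they are not learned, and the gate function names are hypothetical); each gate is a single line in the input plane.

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def nand_gate(x1, x2):
    # NOT (x1 AND x2): the line -x1 - x2 + 1.5 = 0
    return step(-x1 - x2 + 1.5)

def or_gate(x1, x2):
    # x1 OR x2: the line x1 + x2 - 0.5 = 0
    return step(x1 + x2 - 0.5)

def and_gate(a, b):
    # a AND b: the line a + b - 1.5 = 0
    return step(a + b - 1.5)

def xor_gate(x1, x2):
    # XOR is the AND of the two linearly separable operations
    return and_gate(nand_gate(x1, x2), or_gate(x1, x2))

xor_table = [xor_gate(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(xor_table)  # [0, 1, 1, 0]
```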

In [None]:
import tensorflow as tf
import numpy as np

In [20]:
# XOR input output data
x = [
     [0, 0],
     [0, 1],
     [1, 0],
     [1, 1]
] * 5000

y = [
     [0],
     [1],
     [1],
     [0]
] * 5000

x = np.array(x)
y = np.array(y)

In [41]:
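def print_weights(model, state):
    """Print the kernel of the first Dense layer, one row per input feature."""
    print()
    print(f"Model weights {state}:")
    # get_weights()[0] is the first layer's kernel, shape (inputs, units);
    # biases and the output layer's weights are not printed.
    weights = model.get_weights()[0]
    # Despite the "layer" label, row i holds the weights from input i to each hidden unit.
    for i in range(len(weights)):
        print(f"{i} layer weights = {weights[i]}")
    print()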

## 3. Classification of XOR with MLP

### Model Architecture

Each XOR input sample is a pair (x1, x2) drawn from (0,0), (0,1), (1,0), and (1,1), so the input layer contains two neurons, one per input value. The hidden layer needs at least two neurons to form the two lines necessary for classification. The hidden and output layers are fully connected (FC), each with weights, biases, and the Sigmoid activation function.
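To make the 2-2-1 architecture concrete, the following NumPy sketch runs a forward pass through such a network. The parameter values are hand-picked assumptions for illustration, not the weights that training produces, and the variable names (`W_hidden`, `b_hidden`, `W_out`, `b_out`) are hypothetical. With these values one hidden unit behaves roughly like OR, the other like NAND, and the output unit like AND.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Hand-picked (hypothetical) parameters, not values learned by training.
# Hidden unit 0 approximates OR, hidden unit 1 approximates NAND.
W_hidden = np.array([[20.0, -20.0],
                     [20.0, -20.0]])  # shape (inputs, hidden units)
b_hidden = np.array([-10.0, 30.0])

# The output unit approximates AND of the two hidden activations.
W_out = np.array([[20.0], [20.0]])    # shape (hidden units, outputs)
b_out = np.array([-30.0])

hidden = sigmoid(X @ W_hidden + b_hidden)   # two "lines" in the input plane
output = sigmoid(hidden @ W_out + b_out)    # combines them into one decision
predictions = (output > 0.5).astype(int).ravel()
print(predictions)  # [0 1 1 0]
```

Large-magnitude weights are used here only so that the Sigmoid outputs sit close to 0 or 1; a trained network may arrive at a different but equivalent solution.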

### Activation Function

The Sigmoid function is used as the activation function. It approaches 1 as the input becomes more positive and 0 as the input becomes more negative.
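The limiting behavior described above can be checked with a minimal sketch of the standard logistic form, sigmoid(z) = 1 / (1 + exp(-z)). The function name `sigmoid` and the sample inputs are chosen here for illustration only.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Strongly negative, zero, and strongly positive inputs
values = sigmoid(np.array([-10.0, 0.0, 10.0]))
print(values)  # close to 0, exactly 0.5, close to 1
```

Because the output is 0.5 at an input of 0, thresholding the output at 0.5 corresponds to checking the sign of the weighted sum.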

```python
# Define the neural network model
model = tf.keras.Sequential([
    # Fully connected layer with sigmoid activation
    tf.keras.layers.Dense(2, activation="sigmoid", input_shape=(2,)),
    
    # Output layer with sigmoid activation
    tf.keras.layers.Dense(1, activation="sigmoid")
])
```

In [42]:
# Define the neural network model
model = tf.keras.Sequential([
    # Fully connected layer with sigmoid activation
    tf.keras.layers.Dense(2, activation="sigmoid", input_shape=(2,)),

    # Output layer with sigmoid activation
    tf.keras.layers.Dense(1, activation="sigmoid")
])

model.summary()

# Model weights before training
print_weights(model, "before")

optimizer = tf.keras.optimizers.Adam(learning_rate=0.1)
model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=['accuracy'])

model.fit(x, y, batch_size=64, epochs=10)

# Model weights after training
print_weights(model, "after")

Model weights before:
0 layer weights = [-0.99807197 -0.00325847]
1 layer weights = [-1.0873995  0.8312775]

Epoch 1/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 3ms/step - accuracy: 0.6718 - loss: 0.5083
Epoch 2/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 1.0000 - loss: 0.0084
Epoch 3/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 1.0000 - loss: 0.0029
Epoch 4/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 1.0000 - loss: 0.0015
Epoch 5/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 1.0000 - loss: 9.6297e-04
Epoch 6/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 1.0000 - loss: 6.6401e-04
Epoch 7/10
[1m313/313[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m1s[0m 2ms/step - accuracy: 1.0000 - loss: 4.7684e-04
Epoch 8/10
[1m313/313[0m [32m━━━━━━