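## What is XOR?
XOR (exclusive or) is a binary operator on two binary inputs: it returns 1 when exactly one of the inputs is 1, and 0 otherwise.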
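
As a quick illustration, the truth table can be enumerated in Python. This is a minimal sketch using the built-in `^` operator on integers; the name `truth_table` is only illustrative.

```python
# Enumerate the XOR truth table with Python's bitwise XOR operator (^).
# `truth_table` is an illustrative name chosen for this example.
truth_table = [(a, b, a ^ b) for a in (0, 1) for b in (0, 1)]

for a, b, y in truth_table:
    print(f"{a} XOR {b} = {y}")
```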

## Choosing model

### Linear model?
Suppose we train with the mean squared error (MSE) loss and a linear model of the form $f(\mathbf{x}; \mathbf{w}, b) = \mathbf{x}^T \mathbf{w} + b$.

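A linear model **is not able** to represent the XOR function: the four points are not linearly separable, and minimizing the MSE yields $\mathbf{w} = \mathbf{0}$ and $b = \frac{1}{2}$, so the model outputs 0.5 for every input.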
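
This failure can be checked numerically. The sketch below fits the linear model to the four XOR points by least squares (which minimizes the MSE) using `np.linalg.lstsq`. The variable names, and the trick of appending a column of ones so the bias is learned as an extra weight, are choices made for this example.

```python
import numpy as np

# The four XOR inputs and their targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 1, 1, 0], dtype=float)

# Append a column of ones so the bias b is learned as a third weight.
A = np.hstack([X, np.ones((4, 1))])

# The least-squares solution minimizes the mean squared error.
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:2], params[2]

predictions = A @ params
print("w =", w, "b =", b)
print("predictions =", predictions)
```

Up to floating-point error, the fit gives zero weights and a bias of 0.5, so every prediction is 0.5 regardless of the input.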

### Solution
One way to solve this problem is to use a model that learns a different feature space in which a linear model is able to represent the solution.

The model chains two functions, a hidden layer $f^{(1)}$ and an output layer $f^{(2)}$:

$\mathbf{h} = f^{(1)}(\mathbf{x}; \mathbf{W} , \mathbf{c})$ 

$y = f^{(2)}(\mathbf{h}; \mathbf{w}, b)$

$y = f(\mathbf{x}; \mathbf{W}, \mathbf{c}, \mathbf{w}, b) = f^{(2)}(f^{(1)}(\mathbf{x}))$

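We need a nonlinear function to describe the features: if $f^{(1)}$ were linear, the whole model would remain linear in $\mathbf{x}$. The usual choice is an **affine transformation** followed by a fixed nonlinear function $g$, called the **activation function**: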

$\mathbf{h} = g(\mathbf{W}^{T} \mathbf{x} + \mathbf{c})$

The recommended default activation is the **rectified linear unit** (ReLU): $g(z) = \max\{0, z\}$, applied element-wise.
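
To see what the new feature space looks like, the sketch below applies the hidden layer to the four XOR inputs, using the same `W` and `c` values as the implementation later in this document. The helper name `relu` is introduced here for illustration only.

```python
import numpy as np


def relu(z):
    # Rectified linear unit: element-wise max{0, z}.
    return np.maximum(z, 0)


# Inputs, one example per row.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])

# Hidden-layer parameters (same values as in the implementation section).
W = np.array([[1, 1], [1, 1]])
c = np.array([0, -1])

# With examples stored as rows, W^T x + c for every row is X @ W + c.
H = relu(X @ W + c)
print(H)
```

The inputs $[0, 1]$ and $[1, 0]$ both map to $\mathbf{h} = [1, 0]$, while $[0, 0]$ and $[1, 1]$ map to $[0, 0]$ and $[2, 1]$. In this space a linear function such as $h_1 - 2 h_2$ reproduces the XOR targets.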

#### The complete model:
$f(\mathbf{x}; \mathbf{W}, \mathbf{c}, \mathbf{w}, b) = \mathbf{w}^T \max\{0, \mathbf{W}^T \mathbf{x} + \mathbf{c}\} + b$

## Implementation in Python

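```python
import numpy as np

# Inputs: the four XOR cases, one example per row
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])


def xor(inputs):
    # Parameters (hard-coded here, not learned)
    W = np.array([[1, 1], [1, 1]])
    c = np.array([0, -1])
    w = np.array([[1], [-2]])
    b = 0

    # Hidden layer: affine transformation followed by the rectified linear unit
    h = np.maximum(inputs @ W + c, 0)

    # Linear output layer
    return h @ w + b


print(xor(X))
```

Output:

```
[[0]
 [1]
 [1]
 [0]]
```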
