# The XOR problem

## Static Networks

Source: http://dynet.readthedocs.io/en/latest/tutorials_notebooks/tutorial-1-xor.html

Consider a model for solving the “xor” problem. The network has two inputs, which can be 0 or 1, and a single output which should be the xor of the two inputs. We will model this as a multi-layer perceptron with a single hidden layer.
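For reference, the full xor truth table can be enumerated in a few lines of plain Python. This is only an illustrative sketch. The name `xor_examples` is our own and does not appear in the DyNet tutorial.

```python
# Enumerate every (input pair, target) combination of the xor function.
# `xor_examples` is a hypothetical name used only for this illustration.
xor_examples = [((a, b), a ^ b) for a in (0, 1) for b in (0, 1)]

for inputs, target in xor_examples:
    print(inputs, "->", target)
```

The target is 1 exactly when the two inputs differ.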

Let $x = x_1, x_2$ be our input. We will have a hidden layer of 8 nodes, and an output layer of a single node. The activation on the hidden layer will be a tanh. Our network will then be:

$\sigma(V(\tanh(Wx+b)))$

Where $W$ is an $8 \times 2$ matrix, $V$ is a $1 \times 8$ matrix (so that $V$ times the 8-dim hidden vector is a single number), and $b$ is an 8-dim vector.

We want the output to be either 0 or 1, so we take the output layer to be the logistic-sigmoid function, $\sigma(x) = 1/(1+e^{-x})$, which accepts any real input between $-\infty$ and $+\infty$ and returns a number strictly between 0 and 1.
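To make the shapes concrete, here is a minimal sketch of the same forward pass in plain Python, without DyNet. The helper names (`sigmoid`, `forward`) and the uniform random initialization are assumptions made for illustration. They are not DyNet's API or its default initializer.

```python
import math
import random

def sigmoid(z):
    # Logistic sigmoid: maps any real number into the open interval (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, b, V, x):
    # Hidden layer: tanh(W x + b), with W of shape 8x2 and b of length 8.
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
        for row, bias in zip(W, b)
    ]
    # Output layer: V has shape 1x8, so V times hidden is a single score.
    score = sum(v * h for v, h in zip(V, hidden))
    return sigmoid(score)

# Illustrative random weights (assumption: uniform in [-1, 1]).
rng = random.Random(0)
W = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(8)]
b = [rng.uniform(-1, 1) for _ in range(8)]
V = [rng.uniform(-1, 1) for _ in range(8)]

outputs = {x: forward(W, b, V, x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]}
for x, p in outputs.items():
    print(x, p)
```

With untrained weights the outputs carry no meaning yet. The point is only that each one is a single number between 0 and 1.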

We will begin by defining the model and the computation graph.

In [2]:
import dynet as dy

Create a parameter collection and add the parameters.

In [3]:
m = dy.ParameterCollection()
W = m.add_parameters((8,2))  # 8x2 matrix
V = m.add_parameters((1,8))  # 1x8 matrix
b = m.add_parameters((8))    # 8-dim vector

Create a new computation graph. Not strictly needed here, but good practice.

In [4]:
dy.renew_cg()

<_dynet.ComputationGraph at 0x7fa0580ed0d8>

The model parameters can be used as expressions in the computation graph. We now make use of V, W, and b in order to create the complete expression for the network.

In [5]:
x = dy.vecInput(2)  # an input vector of size 2. Also an expression.
output = dy.logistic(V*(dy.tanh((W*x)+b)))

We can now query our network:

In [6]:
x.set([0,0])
output.value()

0.47304537892341614

We want to be able to define a loss, so we need an input expression to work against.

In [7]:
y = dy.scalarInput(0)  # this will hold the correct answer
loss = dy.binary_log_loss(output, y)

Loss examples:

In [8]:
x.set([1,0])
y.set(0)
print(loss.value())  # xor(1, 0) = 1, so y = 0 --> high loss

y.set(1)
print(loss.value())  # xor(1, 0) = 1, so y = 1 --> lower loss

0.9884130954742432
0.465480774641037
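The two numbers above are consistent with the usual binary cross-entropy formula, $-(y \log p + (1-y) \log(1-p))$, where $p$ is the network output. The sketch below is plain Python, not DyNet's implementation, and the function defined here is our own stand-in. It recovers $p$ from the second loss value and checks that the first one follows from it.

```python
import math

def binary_log_loss(p, y):
    # Binary cross-entropy for a predicted probability p and a target y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# With y = 1 the loss is -log(p), so p can be recovered from the printed value.
p = math.exp(-0.465480774641037)

loss_if_y1 = binary_log_loss(p, 1)  # target matches xor(1, 0) = 1
loss_if_y0 = binary_log_loss(p, 0)  # target contradicts xor(1, 0) = 1

print(p, loss_if_y0, loss_if_y1)
```

The recovered output is above 0.5, so the untrained network happens to lean toward the correct answer for this input, which is why the loss is lower when `y` is 1.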


## Training

We now want to set the parameter weights such that the loss is minimized.

For this, we will use a trainer object. A trainer is constructed with respect to the parameters of a given model.

In [9]:
trainer = dy.SimpleSGDTrainer(m)
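
As background, plain stochastic gradient descent moves each parameter a small step against its gradient. The sketch below shows that generic update rule on a toy one-parameter problem. It is not DyNet's trainer code. `sgd_step`, the learning rate of 0.1, and the toy objective are all assumptions chosen for illustration.

```python
def sgd_step(params, grads, learning_rate=0.1):
    # Generic SGD update: move each parameter against its gradient.
    return [p - learning_rate * g for p, g in zip(params, grads)]

# Toy objective (hypothetical): f(w) = (w - 3)^2, with gradient 2 * (w - 3).
w = [0.0]
for _ in range(100):
    grad = [2.0 * (w[0] - 3.0)]
    w = sgd_step(w, grad)

print(w[0])  # approaches the minimizer w = 3
```

In the network above the parameters are `W`, `V`, and `b`, and the gradients come from differentiating the loss through the computation graph.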