# The XOR problem

As an example, consider a model for solving the “xor” problem. The network has two inputs, which can be 0 or 1, and a single output which should be the xor of the two inputs. We will model this as a multi-layer perceptron with a single hidden layer.

In [2]:
import dynet as dy

## Create a parameter collection and add the parameters.
Let x = (x1, x2) be our input. We will have a hidden layer of 8 nodes and an output layer of a single node. The hidden layer uses a tanh activation. With W an 8×2 weight matrix, b an 8-dimensional bias vector, and V a 1×8 weight matrix, our network is:
σ(V(tanh(Wx+b)))

We want the output to be close to either 0 or 1, so the output layer applies the logistic sigmoid function, σ(x), which maps any real input in (−∞, +∞) to a number strictly between 0 and 1.
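The squashing behaviour of the logistic sigmoid can be checked numerically. The following is a minimal pure-Python sketch that does not use DyNet; the helper name `sigmoid` is our own and is not part of the DyNet API.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 lands exactly in the middle.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, sigmoid(z))
```

Because the output never quite reaches 0 or 1, a trained network is usually read by thresholding its output at 0.5.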

Create a parameter collection and populate it with parameters.

In [3]:
m = dy.ParameterCollection()
W = m.add_parameters((8,2))  # 8x2 matrix
V = m.add_parameters((1,8))  # 1x8 matrix
b = m.add_parameters((8))  # 8-dim vector

b.value()

[0.527769148349762,
 -0.5609636902809143,
 -0.5765582919120789,
 0.08570045232772827,
 0.5291308760643005,
 0.08153563737869263,
 0.46551305055618286,
 -0.03702342510223389]

Create a computation graph.

In [4]:
dy.renew_cg() # new computation graph. not strictly needed here, but good practice.

<_dynet.ComputationGraph at 0x7f9d1ada4c60>

The model parameters can be used as expressions in the computation graph. We now make use of V, W, and b in order to create the complete expression for the network.

In [5]:
x = dy.vecInput(2) # an input vector of size 2. Also an expression.
output = dy.logistic(V*(dy.tanh((W*x)+b)))

We can now query the (untrained) network:

In [6]:
x.set([0,0])
output.value()

0.49302706122398376

To define a loss, we need an input expression that holds the correct answer, so that the network output can be compared against it.

In [8]:
y = dy.scalarInput(0)  # this will hold the correct answer
loss = dy.binary_log_loss(output, y)  # binary log loss between the network output and the expected answer y

x.set([1,0])
y.set(0)
print(loss.value())

y.set(1)
print(loss.value())

0.6106782555580139
0.7830334901809692
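The two printed losses can be related to each other through the usual definition of the binary log loss, −(y·log p + (1−y)·log(1−p)), where p is the network output. The sketch below assumes that `dy.binary_log_loss` follows this definition; the helper `binary_log_loss` is our own pure-Python stand-in, and the two loss values are copied from the output above. If both losses come from the same output p, then exp(−loss) for y=1 gives p and exp(−loss) for y=0 gives 1−p, so the two must sum to 1.

```python
import math

def binary_log_loss(p, y):
    """Binary log loss for a predicted probability p and a target y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

loss_y0 = 0.6106782555580139  # printed loss for x = [1, 0], y = 0
loss_y1 = 0.7830334901809692  # printed loss for x = [1, 0], y = 1

p = math.exp(-loss_y1)  # network output implied by the y = 1 loss
print("implied output p:", p)
print("p + (1 - p):", math.exp(-loss_y1) + math.exp(-loss_y0))
print("recomputed losses:", binary_log_loss(p, 0), binary_log_loss(p, 1))
```

The implied output is a little below 0.5, which is why the loss for y=1 is the larger of the two.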


## Training

We now want to set the parameter weights such that the loss is minimized.

For this, we will use a trainer object. A trainer is constructed with respect to the parameters of a given model.
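Conceptually, a plain SGD trainer moves each parameter a small step against the gradient of the loss. The following pure-Python sketch shows one such step on a one-parameter logistic model. It only illustrates the update rule and is not how `SimpleSGDTrainer` is implemented; the learning rate of 0.1 is an arbitrary choice for the example, not necessarily DyNet's default.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_and_grad(w, x, y):
    """Binary log loss of p = sigmoid(w * x) and its derivative with respect to w."""
    p = sigmoid(w * x)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad = (p - y) * x  # d(loss)/dw for a sigmoid output with log loss
    return loss, grad

w = 0.0              # single trainable parameter
x, y = 1.0, 1        # one training example
learning_rate = 0.1  # arbitrary value chosen for this sketch

loss_before, grad = loss_and_grad(w, x, y)
w = w - learning_rate * grad  # the SGD update
loss_after, _ = loss_and_grad(w, x, y)
print(loss_before, loss_after)
```

In the DyNet code, `loss.backward()` plays the role of computing the gradients and `trainer.update()` applies the step to every parameter in the collection.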

In [13]:
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for _ in range(num_rounds):  # the loop variable is unused; avoids shadowing the built-in round()
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers

questions, answers = create_xor_instances()
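To see what the training data looks like, here is a compact, self-contained variant of the generator. It is a sketch that uses Python's `^` operator in place of the conditional expression, which gives the same answers for 0/1 inputs.

```python
def create_xor_instances(num_rounds=2000):
    questions, answers = [], []
    for _ in range(num_rounds):
        for x1 in (0, 1):
            for x2 in (0, 1):
                questions.append((x1, x2))
                answers.append(x1 ^ x2)  # same as: 0 if x1 == x2 else 1
    return questions, answers

questions, answers = create_xor_instances()
print(len(questions))                          # 4 instances per round
print(list(zip(questions[:4], answers[:4])))   # the first round
```

With the default of 2000 rounds this yields 8000 training instances, cycling through the four possible inputs in a fixed order.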

In [14]:
trainer = dy.SimpleSGDTrainer(m)  # remember that m is the ParameterCollection

In [16]:
total_loss = 0
seen_instances = 0

for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()

    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)  # the running average loss should shrink as training proceeds

average loss is: 0.0018376447784248739
average loss is: 0.00182388814719161
average loss is: 0.0018103889521444217
average loss is: 0.0017971337407652755
average loss is: 0.001784119611256756
average loss is: 0.0017713370975495006
average loss is: 0.00175878117850516
average loss is: 0.0017464446159283398
average loss is: 0.0017343213751963857
average loss is: 0.001722404639062006
average loss is: 0.0017106894676213746
average loss is: 0.001699170211601692
average loss is: 0.001687841442001697
average loss is: 0.001676698186908782
average loss is: 0.0016657353423846265
average loss is: 0.001654948228133435
average loss is: 0.0016443321316996042
average loss is: 0.001633883116786213
average loss is: 0.0016235966343542954
average loss is: 0.0016134688405436465
average loss is: 0.0016034952484603438
average loss is: 0.0015936724746345797
average loss is: 0.001583997006249695
average loss is: 0.0015744649877403086
average loss is: 0.001565072685922496
average loss is: 0.001555817284836219


The network is now trained. Let’s verify that it indeed learned the xor function:

In [17]:
x.set([0,1])
print("0,1", output.value())

x.set([1,0])
print("1,0", output.value())

x.set([0,0])
print("0,0", output.value())

x.set([1,1])
print("1,1", output.value())


0,1 0.9992327690124512
1,0 0.9989888072013855
0,0 0.0003380033012945205
1,1 0.0011132430518046021
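The outputs are probabilities rather than hard 0/1 decisions. A minimal sketch of turning them into predictions, using the values printed above and a threshold of 0.5 (the threshold is our assumption; the notebook itself does not apply one):

```python
# Network outputs copied from the cell output above.
outputs = {
    (0, 1): 0.9992327690124512,
    (1, 0): 0.9989888072013855,
    (0, 0): 0.0003380033012945205,
    (1, 1): 0.0011132430518046021,
}

# Threshold each probability at 0.5 and compare with the true xor.
predictions = {inputs: int(p > 0.5) for inputs, p in outputs.items()}
for (x1, x2), pred in predictions.items():
    print((x1, x2), pred, pred == (x1 ^ x2))
```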


In case we are curious about the parameter values, we can query them:

In [18]:
W.value()

array([[ 3.35719442, -2.4858532 ],
       [-0.31870559, -0.13677683],
       [-2.44874334, -2.38197374],
       [-1.51086211, -1.50610352],
       [-1.20791674, -1.4196291 ],
       [-2.87451792,  3.69964123],
       [ 0.59257954,  0.49650541],
       [-2.24656987,  1.25947213]])

In [19]:
V.value()

array([[-4.58468819, -0.41393092, -3.70492268, -2.07314849,  2.64222288,
        -5.93753099,  0.61036408,  2.10825324]])

In [20]:
b.value()

[1.0862807035446167,
 -0.6335340142250061,
 0.6043263077735901,
 -0.15373480319976807,
 2.006375312805176,
 1.2247034311294556,
 0.8017867207527161,
 -0.428931325674057]
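As a sanity check, the printed parameters can be plugged back into σ(V(tanh(Wx+b))) by hand. The sketch below does this in pure Python, without DyNet; the parameter values are copied from the outputs above, and the function name `forward` is our own. Small differences from DyNet's results are to be expected, since the printed values are rounded and DyNet computes in single precision.

```python
import math

W = [[ 3.35719442, -2.4858532 ],
     [-0.31870559, -0.13677683],
     [-2.44874334, -2.38197374],
     [-1.51086211, -1.50610352],
     [-1.20791674, -1.4196291 ],
     [-2.87451792,  3.69964123],
     [ 0.59257954,  0.49650541],
     [-2.24656987,  1.25947213]]
V = [-4.58468819, -0.41393092, -3.70492268, -2.07314849,
      2.64222288, -5.93753099,  0.61036408,  2.10825324]
b = [1.0862807035446167, -0.6335340142250061, 0.6043263077735901,
     -0.15373480319976807, 2.006375312805176, 1.2247034311294556,
     0.8017867207527161, -0.428931325674057]

def forward(x):
    """Compute sigmoid(V . tanh(W x + b)) for a 2-element input x."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bias)
              for row, bias in zip(W, b)]
    z = sum(v * h for v, h in zip(V, hidden))
    return 1.0 / (1.0 + math.exp(-z))

for x in [(0, 1), (1, 0), (0, 0), (1, 1)]:
    print(x, forward(x))
```

The recomputed values should agree closely with the four network outputs printed after training.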

## Dynamic Networks

Dynamic networks are very similar to static ones, but instead of creating the network once and then calling “set” for each training example to change the inputs, we create a new network for each training example.

We present an example below. While the benefit may not be obvious in the xor example, the dynamic approach is very convenient for networks whose structure is not fixed, such as recurrent or recursive networks.
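To make "structure that is not fixed" concrete, here is a toy, DyNet-free illustration in which the number of computation steps depends on the length of the input. The names `xor_cell` and `run_chain` are hypothetical, and the cell is a plain xor rather than a learned unit; it only mirrors the way a recurrent network applies the same unit once per input element.

```python
def xor_cell(state, bit):
    """Hypothetical stand-in for a recurrent unit: merge the running state with one input."""
    return state ^ bit

def run_chain(bits):
    """Apply xor_cell once per input bit; the chain is rebuilt on every call."""
    state = 0
    steps = 0
    for bit in bits:
        state = xor_cell(state, bit)
        steps += 1
    return state, steps

# Inputs of different lengths produce chains of different depths.
for bits in [(1, 0), (1, 1, 1), (0, 1, 0, 1, 1)]:
    print(bits, run_chain(bits))
```

A static graph would have to fix the chain length in advance, whereas rebuilding the computation per example handles each length naturally.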

In [21]:
import dynet as dy

def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for _ in range(num_rounds):  # the loop variable is unused; avoids shadowing the built-in round()
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers

questions, answers = create_xor_instances()

# create a network for the xor problem given input and output
def create_xor_network(W, V, b, inputs, expected_answer):
    dy.renew_cg() # new computation graph
    x = dy.vecInput(len(inputs))
    x.set(inputs)
    y = dy.scalarInput(expected_answer)
    output = dy.logistic(V*(dy.tanh((W*x)+b)))
    loss =  dy.binary_log_loss(output, y)
    return loss

m2 = dy.ParameterCollection()
W = m2.add_parameters((8,2))
V = m2.add_parameters((1,8))
b = m2.add_parameters((8))
trainer = dy.SimpleSGDTrainer(m2)

seen_instances = 0
total_loss = 0
for question, answer in zip(questions, answers):
    loss = create_xor_network(W, V, b, question, answer)  # we create a new computation graph for each example 
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)

average loss is: 0.7246274840831757
average loss is: 0.7049259021878242
average loss is: 0.6761396740873654
average loss is: 0.6295092997699976
average loss is: 0.5688946435749531
average loss is: 0.5064878949895502
average loss is: 0.4511629040965012
average loss is: 0.40475043436978014
average loss is: 0.3662109077721834
average loss is: 0.334039968220517
average loss is: 0.30691831898452204
average loss is: 0.28380593460518866
average loss is: 0.26390510631725195
average loss is: 0.24660537498870067
average loss is: 0.2314363291611274
average loss is: 0.21803172358020675
average loss is: 0.2061031749472022
average loss is: 0.19542103442932582
average loss is: 0.1858004413378474
average loss is: 0.1770910691670142
average loss is: 0.1691695119431686
average loss is: 0.1619335715574297
average loss is: 0.15529792264427827
average loss is: 0.14919079013906109
average loss is: 0.14355137522779404
average loss is: 0.1383278423553118
average loss is: 0.13347573350062938
average loss is: 0