# Fundamentals

*Main piece*: **ComputationGraph** (created when `dynet` is imported; it lives in the background as a singleton object)

ComputationGraph = **expressions** (related to the inputs and outputs of the network) + **ParameterCollection** (containing the parameters that are optimized over time)


# The XOR problem

## Static Networks

Source: http://dynet.readthedocs.io/en/latest/tutorials_notebooks/tutorial-1-xor.html

Consider a model for solving the “xor” problem. The network has two inputs, which can be 0 or 1, and a single output which should be the xor of the two inputs. We will model this as a multi-layer perceptron with a single hidden layer.

Let $x = (x_1, x_2)$ be our input. We will have a hidden layer of 8 nodes, and an output layer of a single node. The activation on the hidden layer will be a tanh. Our network will then be:

$\sigma(V(\tanh(Wx+b)))$

Where $W$ is an $8 \times 2$ matrix, $V$ is a $1 \times 8$ matrix, and $b$ is an 8-dim vector.

We want the output to be either 0 or 1, so we take the output layer to be the logistic-sigmoid function, $\sigma(x)$, which accepts any real input between $-\infty$ and $+\infty$ and returns a number strictly between 0 and 1.
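To make the squashing behaviour concrete, here is a minimal pure-Python sketch of the logistic sigmoid. It does not use DyNet; the helper name `sigmoid` and the sample inputs are chosen only for illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative inputs: a strongly negative, a zero, and a strongly positive value.
for z in (-10.0, 0.0, 10.0):
    print(z, sigmoid(z))
```

Strongly negative inputs land close to 0, strongly positive inputs close to 1, and an input of 0 gives exactly 0.5, which is why 0.5 is a natural decision threshold for a 0/1 target.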

We will begin by defining the model and the computation graph.

In [1]:
import dynet as dy

Create a parameter collection and add the parameters.

In [2]:
m = dy.ParameterCollection()
W = m.add_parameters((8,2))  # 8x2 matrix
V = m.add_parameters((1,8))  # 1x8 matrix
b = m.add_parameters((8))    # 8-dim vector

Create a new computation graph. Not strictly needed here, but good practice.

In [3]:
dy.renew_cg()

<_dynet.ComputationGraph at 0x7f96ac7bd090>

The model parameters can be used as expressions in the computation graph. We now make use of V, W, and b in order to create the complete expression for the network.

In [4]:
x = dy.vecInput(2)  # an input vector of size 2. Also an expression.
output = dy.logistic(V*(dy.tanh((W*x)+b)))

We can now query our network:

In [5]:
x.set([0,0])
output.value()

0.659337043762207

We want to be able to define a loss, so we need an input expression to work against.

In [6]:
y = dy.scalarInput(0)  # this will hold the correct answer
loss = dy.binary_log_loss(output, y)

Loss examples:

In [7]:
x.set([1,0])
y.set(0)
print(loss.value())  # xor(1, 0) = 1, so y = 0 --> high loss

y.set(1)
print(loss.value())  # xor(1, 0) = 1, so y = 1 --> lower loss

1.3439751863479614
0.30219602584838867
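The two printed numbers are consistent with the standard binary cross-entropy, $-(y \log p + (1-y)\log(1-p))$, where $p$ is the network output. The sketch below assumes that this is the formula behind `dy.binary_log_loss`. It recovers $p$ from the second printed loss and recomputes the first one in plain Python. The function and variable names are illustrative and not part of DyNet.

```python
import math

def binary_log_loss(p, y):
    """Binary cross-entropy for a predicted probability p and a 0/1 target y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Loss printed above for y = 1. In that case the loss is -log(p),
# so the network output p can be recovered by inverting it.
loss_when_y_is_1 = 0.30219602584838867
p = math.exp(-loss_when_y_is_1)

# With the same output p, the loss for the wrong target y = 0 is -log(1 - p).
loss_when_y_is_0 = binary_log_loss(p, 0)
print("network output p:", p)
print("loss for y = 0:", loss_when_y_is_0)
```

The recomputed loss for `y = 0` agrees with the first printed value (about 1.344) up to floating-point precision. This shows that the loss grows when the output is far from the target.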


### Training
We now want to set the parameter weights such that the loss is minimized. 

For this, we will use a trainer object. A trainer is constructed with respect to the parameters of a given model.

In [8]:
trainer = dy.SimpleSGDTrainer(m)

To use the trainer, we need to:
* **call the `forward_scalar`** method of `ComputationGraph`. This will run a forward pass through the network, calculating all the intermediate values until the last one (`loss`, in our case), and then convert the value to a scalar. The final output of our network **must** be a single scalar value. However, if we do not care about the value, we can just use `cg.forward()` instead of `cg.forward_scalar()`. In the code below, calling `loss.value()` performs this forward pass.
* **call the `backward`** method of `ComputationGraph`. This will run a backward pass from the last node, calculating the gradients with respect to minimizing the last expression (in our case we want to minimize the loss). The gradients are stored in the parameter collection, and we can now let the `trainer` take care of the optimization step.
* **call `trainer.update()`** to optimize the values with respect to the latest gradients.
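The forward / backward / update cycle can be sketched without DyNet on a toy model with a single weight. This is only an illustration of what the three steps do. The model $p = \sigma(w x)$, the starting weight, and the learning rate are all hypothetical, and the gradient is written out by hand instead of being computed by a computation graph.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_log_loss(p, y):
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

w = 0.5              # hypothetical single parameter
x, y = 1.0, 1.0      # one training example
learning_rate = 0.1  # hypothetical step size

# 1. forward pass: compute the output and the scalar loss
p = sigmoid(w * x)
loss_before = binary_log_loss(p, y)

# 2. backward pass: gradient of the loss with respect to w.
#    For a sigmoid output with cross-entropy loss this is (p - y) * x.
grad_w = (p - y) * x

# 3. update: move the parameter against the gradient
w -= learning_rate * grad_w

loss_after = binary_log_loss(sigmoid(w * x), y)
print("the loss before step is:", loss_before)
print("the loss after step is:", loss_after)
```

In DyNet the gradient in step 2 is computed automatically for every parameter in the collection, and the trainer applies step 3 to all of them at once.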

In [9]:
x.set([1,0])
y.set(1)
loss_value = loss.value() # this performs a forward through the network.
print("the loss before step is:",loss_value)

# now do an optimization step
loss.backward()  # compute the gradients
trainer.update()

# see how it affected the loss:
loss_value = loss.value(recalculate=True) # recalculate=True means "don't use precomputed value"
print("the loss after step is:",loss_value)


the loss before step is: 0.30219602584838867
the loss after step is: 0.262879341840744


The optimization step indeed made the loss decrease. We now need to run this in a loop.
To this end, we will create a training set, and iterate over it.

For the xor problem, the training instances are easy to create.

In [10]:
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

We now feed each question / answer pair to the network, and try to minimize the loss.


In [11]:
total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)


average loss is: 0.7172141650319099
average loss is: 0.6497034585475921
average loss is: 0.5619007851183414
average loss is: 0.47428334223106505
average loss is: 0.40338586842268703
average loss is: 0.3489488636702299
average loss is: 0.30685874039041144
average loss is: 0.2736465136078186
average loss is: 0.24687683546087807
average loss is: 0.22488178132474423
average loss is: 0.20650553174401548
average loss is: 0.19093013191440453
average loss is: 0.17756360904241983
average loss is: 0.16596830448035949
average loss is: 0.15581427463392417
average loss is: 0.14684836332366102
average loss is: 0.1388732382690753
average loss is: 0.13173288560368948
average loss is: 0.12530237336446973
average loss is: 0.11948049996187911
average loss is: 0.11418442966460828
average loss is: 0.10934571184408427
average loss is: 0.10490729135906567
average loss is: 0.10082123500023348
average loss is: 0.09704697908721864
average loss is: 0.09354996512473847
average loss is: 0.09030056918617684
...

Our network is now trained. Let's verify that it indeed learned the xor function:

In [12]:
x.set([0,1])
print("0,1",output.value())

x.set([1,0])
print("1,0",output.value())

x.set([0,0])
print("0,0",output.value())

x.set([1,1])
print("1,1",output.value())


0,1 0.9983042478561401
1,0 0.9983420372009277
0,0 0.0007349436054937541
1,1 0.0017125688027590513


In case we are curious about the parameter values, we can query them:

In [13]:
W.value()

array([[ 2.62200642, -1.80553484],
       [ 2.69344974, -1.90905726],
       [ 0.6123637 ,  0.4486255 ],
       [-1.75319183,  2.35105109],
       [-2.57931232, -2.61533093],
       [ 2.35912442, -3.0319798 ],
       [-1.13818753, -1.09748042],
       [ 1.15303671,  1.14093041]])

In [14]:
V.value()

array([[-3.00425816, -3.13614678, -1.25156963, -2.33445621, -4.22193146,
         3.8618443 , -1.67410648, -2.27995872]])

In [15]:
b.value()

[0.7008389830589294,
 0.764814019203186,
 -1.0103081464767456,
 0.7629812359809875,
 0.76901775598526,
 -1.0365179777145386,
 -0.45914003252983093,
 -1.862380862236023]
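As a sanity check, the printed parameter values can be plugged back into the formula $\sigma(V(\tanh(Wx+b)))$ in plain Python, without DyNet. The sketch below copies `W`, `V` and `b` from the outputs above. The helper names `sigmoid` and `forward` are illustrative.

```python
import math

# Parameter values copied from the outputs printed above.
W = [[ 2.62200642, -1.80553484],
     [ 2.69344974, -1.90905726],
     [ 0.6123637 ,  0.4486255 ],
     [-1.75319183,  2.35105109],
     [-2.57931232, -2.61533093],
     [ 2.35912442, -3.0319798 ],
     [-1.13818753, -1.09748042],
     [ 1.15303671,  1.14093041]]
V = [[-3.00425816, -3.13614678, -1.25156963, -2.33445621, -4.22193146,
       3.8618443 , -1.67410648, -2.27995872]]
b = [0.7008389830589294, 0.764814019203186, -1.0103081464767456,
     0.7629812359809875, 0.76901775598526, -1.0365179777145386,
     -0.45914003252983093, -1.862380862236023]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x):
    """Compute sigmoid(V * tanh(W * x + b)) for a 2-element input x."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
              for row, bi in zip(W, b)]
    return sigmoid(sum(v * h for v, h in zip(V[0], hidden)))

outputs = {(x1, x2): forward([x1, x2]) for x1 in (0, 1) for x2 in (0, 1)}
for inputs, value in outputs.items():
    print(inputs, value, "-> predicted xor:", int(value > 0.5))
```

Thresholding each output at 0.5 gives the xor of the two inputs, in line with the network outputs printed earlier.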

### To summarize
Here is a complete program:

In [16]:
# define the parameters
m = dy.ParameterCollection()
W = m.add_parameters((8,2))
V = m.add_parameters((1,8))
b = m.add_parameters((8))

# renew the computation graph
dy.renew_cg()

# create the network
x = dy.vecInput(2) # an input vector of size 2.
output = dy.logistic(V*(dy.tanh((W*x)+b)))
# define the loss with respect to an output y.
y = dy.scalarInput(0) # this will hold the correct answer
loss = dy.binary_log_loss(output, y)

# create training instances
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# train the network
trainer = dy.SimpleSGDTrainer(m)

total_loss = 0
seen_instances = 0
for question, answer in zip(questions, answers):
    x.set(question)
    y.set(answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)



average loss is: 0.7239488640427589
average loss is: 0.6833679360151291
average loss is: 0.6269646707177162
average loss is: 0.554561566747725
average loss is: 0.48260273870825765
average loss is: 0.42192818397035203
average loss is: 0.3730690868145653
average loss is: 0.33372908423654735
average loss is: 0.3016572737486826
average loss is: 0.27511965315882114
average loss is: 0.25284487135877665
average loss is: 0.2339041292628584
average loss is: 0.2176117386124455
average loss is: 0.20345395165961236
average loss is: 0.19103979984298347
average loss is: 0.18006722247693688
average loss is: 0.17029953107046072
average loss is: 0.16154883304766068
average loss is: 0.1536641826679146
average loss is: 0.1465229888765607
average loss is: 0.14002469280512914
average loss is: 0.13408605291571637
average loss is: 0.12863758449503424
average loss is: 0.12362083888729103
average loss is: 0.11898630112111569
average loss is: 0.11469174928918409
average loss is: 0.11070096134863518
...

## Dynamic Networks

Dynamic networks are very similar to static ones, but instead of creating the network once and then calling `set` for each training example to change the inputs, we create a new network (with a fresh computation graph via `dy.renew_cg()`) for each training example.

We present an example below. While the value of this may not be clear in the `xor` example, the dynamic approach
is very convenient for networks for which the structure is not fixed, such as recurrent or recursive networks.

In [17]:
import dynet as dy
# create training instances, as before
def create_xor_instances(num_rounds=2000):
    questions = []
    answers = []
    for round in range(num_rounds):
        for x1 in 0,1:
            for x2 in 0,1:
                answer = 0 if x1==x2 else 1
                questions.append((x1,x2))
                answers.append(answer)
    return questions, answers 

questions, answers = create_xor_instances()

# create a network for the xor problem given input and output
def create_xor_network(W, V, b, inputs, expected_answer):
    dy.renew_cg() # new computation graph
    x = dy.vecInput(len(inputs))
    x.set(inputs)
    y = dy.scalarInput(expected_answer)
    output = dy.logistic(V*(dy.tanh((W*x)+b)))
    loss =  dy.binary_log_loss(output, y)
    return loss

m2 = dy.ParameterCollection()
W = m2.add_parameters((8,2))
V = m2.add_parameters((1,8))
b = m2.add_parameters((8))
trainer = dy.SimpleSGDTrainer(m2)

seen_instances = 0
total_loss = 0
for question, answer in zip(questions, answers):
    loss = create_xor_network(W, V, b, question, answer)
    seen_instances += 1
    total_loss += loss.value()
    loss.backward()
    trainer.update()
    if (seen_instances > 1 and seen_instances % 100 == 0):
        print("average loss is:",total_loss / seen_instances)



average loss is: 0.736072083413601
average loss is: 0.7029607383906842
average loss is: 0.657716857890288
average loss is: 0.5936274072155356
average loss is: 0.522500530987978
average loss is: 0.45876798949514824
average loss is: 0.4062296753270285
average loss is: 0.36354592158226295
average loss is: 0.32861912462446424
average loss is: 0.29967247347719966
average loss is: 0.2753582175998864
average loss is: 0.25467700297789025
average loss is: 0.23688568113873212
average loss is: 0.22142534360688712
average loss is: 0.20786977378775676
average loss is: 0.19588929961028043
average loss is: 0.1852254262692569
average loss is: 0.17567287252698507
average loss is: 0.16706668734746544
average loss is: 0.1592728893388994
average loss is: 0.15218156833756005
average loss is: 0.1457017392155037
average loss is: 0.13975745311310594
average loss is: 0.13428482367482503
average loss is: 0.12922973018987105
average loss is: 0.12454602355531488
average loss is: 0.12019411381713493
average loss i