# Dynet tutorial: Approximating $\sqrt{2}$ with gradient descent

Here's a quick tutorial to get to know the dynet workflow on a simple optimization problem: computing the value of $\sqrt{2}$.

This can be reframed as an optimization problem where we are looking for 

$$\text{argmin}_{0\leq \theta\leq 2} \left[\theta^2-2\right]^2$$

which we can solve via gradient descent. Note that this function is *not* convex on this domain (its second derivative, $4(3\theta^2-2)$, is negative near zero, for instance). However, it has only one local minimum on the $[0, 2]$ interval, at $\theta=\sqrt{2}$, which is therefore also the global minimum.
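
To make the non-convexity remark concrete, here is the derivation written out. The names $f$ and $\eta$ (the learning rate) are shorthand introduced only for this aside:

$$f(\theta) = \left[\theta^2-2\right]^2,\qquad f'(\theta) = 4\theta(\theta^2-2),\qquad f''(\theta) = 4(3\theta^2-2)$$

So $f''(\theta)<0$ for $0\leq\theta<\sqrt{2/3}\approx 0.82$, while $f'(\theta)<0$ on $(0,\sqrt{2})$ and $f'(\theta)>0$ on $(\sqrt{2}, 2]$. A gradient descent step is

$$\theta \leftarrow \theta - \eta f'(\theta) = \theta - 4\eta\,\theta(\theta^2-2)$$

which, for a small enough $\eta$ and any starting point in $(0, 2]$, moves $\theta$ toward $\sqrt{2}$. The point $\theta=0$ is stationary ($f'(0)=0$), so it is the one starting value to avoid.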

Here's how to solve this problem in dynet:

Import dynet and numpy to get started.

I'm assuming that you're using the latest version of dynet (I wrote the tutorial on [this commit](https://github.com/clab/dynet/tree/d060d1103424cfecfaa39739d21d8e37818170be) but the latest master should do).

In [1]:
import numpy as np
import dynet as dy

Next we will define the parameters of our optimization problem. Parameters in dynet are stored in a `ParameterCollection`. The `ParameterCollection` is used to save/load and update multiple parameters at the same time. Here we only have one (scalar) parameter.

In [2]:
# Declare the parameter collection
pc = dy.ParameterCollection()
# Add our parameter to the collection. Here the shape of the parameter is just 1: it's a scalar.
# The parameter will be initialized at random between 0 and 2 (we know our solution to be in this interval)
theta = pc.add_parameters(1, init=np.random.uniform(0, 2))
# Let's print the value
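# First retrieve the parameter value as a numpy array of shape (1,)
theta_val = theta.as_array()
# Then print it! We index the single element explicitly, since recent numpy
# versions deprecate implicit conversion of a 1-element array to a scalar
print('Initial value: theta=%.10f' % theta_val[0])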

Initial value: theta=0.3572946191


Next we will declare our `Trainer` object which will define the optimization method that we will be using.

In [3]:
# Number of iterations
n_iterations=20
# Learning rate
learning_rate=0.1
# SGD trainer (i.e. vanilla gradient descent)
trainer = dy.SimpleSGDTrainer(pc, learning_rate)

Now we can start optimizing our objective. The optimization cycle always follows the same order in dynet:
1. Call `dy.renew_cg()`: this discards the previous computation graph and starts a fresh one. This is VERY important: think of it as garbage collection.
2. Add the constants to the computation graph (in the case of neural nets that would be the inputs).
3. Build the objective function with dynet's built-in operations: no computation is performed yet, but dynet will build a computation graph with all the operations in the right order.
4. Call the forward pass on the objective: this will compute the value of the objective function and all intermediate values needed for the backward pass.
5. Call the backward pass on the objective: this is where the magic happens! Every operation has its derivative computed for you, and using the chain rule dynet automatically derives the gradient of the objective with respect to the parameters.
6. Call `.update()` on your trainer object to update the parameters according to your optimizer
7. Back to step 1.
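
To demystify steps 4 and 5, here is a rough hand-written sketch of what a forward and backward pass amount to for our particular objective. It uses plain Python rather than dynet, and the function name `forward_backward` and the intermediate names `a` and `b` are made up for this illustration; dynet's internals are certainly more general than this.

```python
def forward_backward(theta):
    """Evaluate loss = (theta^2 - 2)^2 and d(loss)/d(theta) by hand."""
    # Forward pass: compute and keep every intermediate value
    a = theta * theta      # a = theta^2
    b = a - 2.0            # b = theta^2 - 2
    loss = b * b           # loss = (theta^2 - 2)^2

    # Backward pass: apply the chain rule from the loss back to theta
    d_loss_d_b = 2.0 * b                       # derivative of b^2 with respect to b
    d_loss_d_a = d_loss_d_b * 1.0              # b = a - 2, so db/da = 1
    d_loss_d_theta = d_loss_d_a * 2.0 * theta  # a = theta^2, so da/dtheta = 2 * theta
    return loss, d_loss_d_theta


loss, grad = forward_backward(1.0)
print('loss=%.4f grad=%.4f' % (loss, grad))  # prints: loss=1.0000 grad=-4.0000
```

The backward pass reuses the intermediate value `b` stored during the forward pass, which is why the forward pass has to come first.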

In [4]:
for iteration in range(n_iterations):
    # Step 1: renew the computation graph
    dy.renew_cg()
    # Step 2: add constants in the computation graph
    two = dy.scalarInput(2)   # Input the scalar 2
    # Step 3: define the loss function: here the loss is (theta^2 - 2)^2
    loss = dy.square(dy.square(theta) - two)
    # Step 4: Call the forward pass to compute the value of the loss function (and then print it)
    loss.forward()
    loss_value = loss.value() # Retrieve the value from the computation graph
    print('Loss @ %d iterations:\t%.10f' % (iteration+1, loss_value))
    # Step 5: Call the backward pass (this computes the gradients)
    loss.backward()
    # Step 6: update the parameter(s)
    trainer.update()
    

Loss @ 1 iterations:	3.5056591034
Loss @ 2 iterations:	2.5905482769
Loss @ 3 iterations:	0.8927946091
Loss @ 4 iterations:	0.0000116434
Loss @ 5 iterations:	0.0000042144
Loss @ 6 iterations:	0.0000015123
Loss @ 7 iterations:	0.0000005452
Loss @ 8 iterations:	0.0000001960
Loss @ 9 iterations:	0.0000000706
Loss @ 10 iterations:	0.0000000254
Loss @ 11 iterations:	0.0000000092
Loss @ 12 iterations:	0.0000000033
Loss @ 13 iterations:	0.0000000012
Loss @ 14 iterations:	0.0000000004
Loss @ 15 iterations:	0.0000000001
Loss @ 16 iterations:	0.0000000001
Loss @ 17 iterations:	0.0000000000
Loss @ 18 iterations:	0.0000000000
Loss @ 19 iterations:	0.0000000000
Loss @ 20 iterations:	0.0000000000
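
As a cross-check on the trace above, the same iteration can be sketched without dynet, using the closed-form derivative $4\theta(\theta^2-2)$. This assumes that the trainer applies the plain update $\theta \leftarrow \theta - \text{learning rate} \times \text{gradient}$, with no gradient clipping or other adjustment kicking in; the function name `gradient_descent` is made up for this illustration.

```python
import math


def gradient_descent(theta, learning_rate=0.1, n_iterations=20):
    """Minimize (theta^2 - 2)^2 with the update theta <- theta - lr * gradient."""
    losses = []
    for _ in range(n_iterations):
        losses.append((theta ** 2 - 2.0) ** 2)       # loss before the update
        gradient = 4.0 * theta * (theta ** 2 - 2.0)  # closed-form derivative
        theta -= learning_rate * gradient            # vanilla gradient descent step
    return theta, losses


# Start from the initial value printed earlier in the tutorial
theta_final, losses = gradient_descent(0.3572946191)
print('First losses: ' + ', '.join('%.4f' % value for value in losses[:4]))
print('Distance to sqrt(2): %.1e' % abs(theta_final - math.sqrt(2)))
```

If the assumption holds, the first few losses should land close to the values dynet printed, with small differences expected because dynet computes in single precision while Python floats are double precision.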


We can then look at the final values for $\theta$ and $\theta^2$:

In [5]:
theta_val = theta.as_array()
print('Final values:\n\ttheta\t= %.10f\n\ttheta^2\t= %.10f' % (theta_val[0], theta_val[0]**2))

Final values:
	theta	= 1.4142132998
	theta^2	= 1.9999992572


Let's compare the result with numpy's value for $\sqrt{2}$:

In [6]:
target = np.sqrt(2)
print('Distance to sqrt(2): %.1e' % np.abs(target - theta_val[0]))

Distance to sqrt(2): 2.6e-07


Not too shabby!