# Neural network - First steps with Theano

In [1]:
%pylab
%matplotlib inline

%config InlineBackend.figure_format = 'retina'

import numpy as np

Using matplotlib backend: Qt5Agg
Populating the interactive namespace from numpy and matplotlib


### Theano installation

If `Theano` is not installed, you can install it with conda:

    conda install theano

### Import Theano 

In [2]:
env MKL_THREADING_LAYER=GNU

env: MKL_THREADING_LAYER=GNU


In [3]:
import theano
import theano.tensor as T

### Symbolic variables

In [4]:
# Create a symbolic scalar variable
x = T.scalar('x')
# Create another symbolic variable defined as the square of x
y = x ** 2

# Symbolic variables have no value of their own
# Evaluate y with x = 2
y.eval({x : 2})

array(4.0)

In [5]:
# Create two symbolic variables and combine them into a more complex expression
x = T.scalar('x')
y = T.scalar('y')

# Create the symbolic variable
z = 2 * x + 3 * y
# Evaluate the variable
z.eval({x : 1, y : 10})

array(32.0)

Apart from `scalar`, there are several kinds of symbolic variables:

- `scalar`: a scalar (0-dimensional) variable.
- `vector`: a vector (1-dimensional) variable.
- `matrix`: a matrix (2-dimensional) variable.
- `row`: a matrix with a single row.
- `col`: a matrix with a single column.
- `tensor3`: a 3-dimensional tensor variable.
- `tensor4`: a 4-dimensional tensor variable.
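
As a rough guide to what each type accepts, the sketch below builds NumPy arrays with the matching number of dimensions. It uses only NumPy, and the concrete sizes are arbitrary assumptions chosen for illustration: a symbolic type fixes the number of dimensions, not the size of each one.

```python
import numpy as np

# Example arrays with the number of dimensions each symbolic type expects.
# The sizes (3, 2x3, ...) are arbitrary; only the dimension count matters.
examples = {
    "scalar": np.array(1.0),            # 0 dimensions
    "vector": np.zeros(3),              # 1 dimension
    "matrix": np.zeros((2, 3)),         # 2 dimensions
    "row": np.zeros((1, 3)),            # 2 dimensions, a single row
    "col": np.zeros((3, 1)),            # 2 dimensions, a single column
    "tensor3": np.zeros((2, 3, 4)),     # 3 dimensions
    "tensor4": np.zeros((2, 3, 4, 5)),  # 4 dimensions
}

for name, value in examples.items():
    print(name, value.ndim, value.shape)
```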

### Evaluating variables with functions

In practice, the `eval()` method is not normally used. To evaluate variables, we compile functions with `theano.function`:

In [6]:
# With the same example:
x = T.scalar('x')
y = T.scalar('y')

# Create the symbolic variable
z = 2 * x + 3 * y

# Create the function
f = theano.function(inputs = [x, y], outputs = z)
f(1, 10)

array(32.0)

In [7]:
# Ordinary Python functions also work on symbolic variables:
def sum_vars(a, b) :
    return a + b

z = sum_vars(x, y)
f = theano.function(inputs = [x, y], outputs = z)
f(1, 2)

array(3.0)

In [8]:
# Theano also provides its own math functions, such as T.cos:
y = T.cos(x)
f = theano.function(inputs = [x], outputs = y)
f(0)

array(1.0)

### Conditional operations 

Python's standard conditional statements cannot be used on symbolic variables. Instead, `Theano` provides `T.switch()` and `ifelse()` (imported from `theano.ifelse`), together with its own comparison and logical functions:

- `T.lt()`
- `T.le()`
- `T.gt()`
- `T.ge()`
- `T.and_()`
- `T.or_()`

In [9]:
# Let's create the absolute value function:
x = T.scalar('x')

y = T.switch(T.gt(x, 0), x, -x)

Tabs = theano.function(inputs = [x], outputs = y)

print("Tabs(3) = ", Tabs(3))
print("Tabs(-3) = ", Tabs(-3))

Tabs(3) =  3.0
Tabs(-3) =  3.0


`T.switch()` and `ifelse()` serve the same purpose, but `T.switch()` evaluates both branches before applying the condition, whereas `ifelse()` is lazy and evaluates only the branch it needs. As a result, `T.switch()` can take longer to execute than `ifelse()`. Let's check this:

In [10]:
import time
from theano.ifelse import ifelse

In [11]:
# Create the variables
a, b = T.scalars('a', 'b')
x, y = T.matrices('x', 'y')

# Create the conditional expressions
z_switch = T.switch(T.lt(a, b), T.mean(x), T.mean(y))
z_ifelse = ifelse(T.lt(a, b), T.mean(x), T.mean(y))

# Create the functions
f_switch = theano.function(inputs = [a, b, x, y], outputs = z_switch)
f_ifelse = theano.function(inputs = [a, b, x, y], outputs = z_ifelse)


# Create data
val1 = 0.
val2 = 1.
big_mat = np.ones((15000, 15000))

# Execute with switch
# (time.clock() was removed in Python 3.8, so time.perf_counter() is used)
tic = time.perf_counter()
f_switch(val1, val2, big_mat, big_mat)
print('The time using switch is:', (time.perf_counter() - tic))

# Execute with ifelse
tic = time.perf_counter()
f_ifelse(val1, val2, big_mat, big_mat)
print('The time using ifelse is:', (time.perf_counter() - tic))

The time using switch is: 0.47379799999999994
The time using ifelse is: 0.23953200000000008


As we can observe, the time using `T.switch()` is about twice the time using `ifelse()`, because `T.switch()` computes both branches before applying the condition.
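
The difference between eager and lazy evaluation can be sketched in plain Python and NumPy. This is only an analogy, not how Theano works internally: `np.where` stands in for the eager, switch-like behaviour and a Python conditional expression stands in for the lazy, ifelse-like behaviour. The helper functions and the `calls` counter are hypothetical names introduced for this illustration.

```python
import numpy as np

# Count how many times each branch is actually computed.
calls = {"mean_x": 0, "mean_y": 0}

def mean_x(mat):
    calls["mean_x"] += 1
    return mat.mean()

def mean_y(mat):
    calls["mean_y"] += 1
    return mat.mean()

x = np.ones((100, 100))
y = np.zeros((100, 100))
a, b = 0.0, 1.0

# Eager, switch-like: both branches are computed, then one is selected.
eager = np.where(a < b, mean_x(x), mean_y(y))
eager_calls = dict(calls)

# Lazy, ifelse-like: only the branch that is needed is computed.
calls.update(mean_x=0, mean_y=0)
lazy = mean_x(x) if a < b else mean_y(y)
lazy_calls = dict(calls)

print(eager_calls)  # both branches were evaluated
print(lazy_calls)   # only the selected branch was evaluated
```

Both versions return the same result; only the amount of work differs.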

### Default values

Theano lets us specify default values for function inputs, like this:

In [12]:
x, y = T.scalars('x', 'y')

z = x * y

f = theano.function(inputs = [x, theano.In(y, value = 3)], outputs= z)
print(f(10))
print(f(10,2))

30.0
20.0


### Shared variables 

In the previous examples, the variables had no value until the function was called. Sometimes, however, a variable needs to hold a value that persists between calls. For that, `Theano` provides shared variables:

In [13]:
# Create the shared variable, specifying its dtype
x = theano.shared(np.array(1, dtype = theano.config.floatX))

# Create a scalar
A = T.scalar()

# Create the function
f = theano.function(inputs = [A], 
                    outputs = x, 
                    updates = {x : x - A}) # 'updates' specifies how to update the shared value

print(f(np.array(1)))
print(x.get_value()) # Get value of shared variable

1.0
0.0


As we can observe, the function first returns the output values and only then updates the shared variables. This ordering is important.

In [14]:
# Let's see another example:
x = theano.shared(np.array([[1, 2], [3, 4]], dtype = theano.config.floatX))
A = T.matrix()
f = theano.function(inputs = [A], outputs = x, updates = {x: x - A})

print(f(np.array([[1, 1], [1, 1]])))
print(x.get_value())

[[ 1.  2.]
 [ 3.  4.]]
[[ 0.  1.]
 [ 2.  3.]]
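
The "return first, update afterwards" ordering can be mimicked in plain NumPy. The sketch below is an analogy under stated assumptions: `SharedValue` and `step` are hypothetical names that are not part of Theano, and they only reproduce the ordering seen in the outputs above.

```python
import numpy as np

class SharedValue:
    """Hypothetical stand-in for a shared variable (not part of Theano)."""

    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)

    def get_value(self):
        return self.value

def step(shared, a):
    """Return the current value, then apply the update shared <- shared - a."""
    output = shared.value.copy()     # the output uses the pre-update value
    shared.value = shared.value - a  # the update is applied afterwards
    return output

x = SharedValue([[1, 2], [3, 4]])
out = step(x, np.ones((2, 2)))
print(out)            # value before the update
print(x.get_value())  # value after the update
```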


### Matrix operations 

Theano is very efficient at matrix operations, so let's use it to implement the operation:
$$ x = v W + b $$

In [15]:
W = T.matrix()
v = T.vector()
b = T.vector()

# Use the 'dot' function to compute the matrix product
x = T.dot(v, W) + b 

f = theano.function(inputs = [v, W, b], outputs = x)
f([1,1], [[2,4],[3,5]], [2, 3])

array([  7.,  12.])
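
As a quick cross-check, the same operation can be computed directly with NumPy using the same input values. This is a minimal sketch to confirm the arithmetic, not a replacement for the compiled function.

```python
import numpy as np

v = np.array([1.0, 1.0])
W = np.array([[2.0, 4.0], [3.0, 5.0]])
b = np.array([2.0, 3.0])

# v has shape (2,) and W has shape (2, 2), so the product has shape (2,)
x = np.dot(v, W) + b
print(x)
```

Here `np.dot(v, W)` gives `[5, 9]`, and adding `b` gives `[7, 12]`, which matches the output above.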

### Gradient function 

An advantage of `Theano` is that it can compute the gradient of any function, however simple or complex. For that, `Theano` provides the `T.grad()` function:

In [16]:
# Let's use a simple example
x = T.scalar()
y = x ** 2

# y_grad = dy/dx
y_grad = T.grad(y, x)

# dy/dx = 2x
y_grad.eval({x : 10})

array(20.0)
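
One way to sanity-check a symbolic gradient is to compare it with a numerical approximation. The sketch below uses a central finite difference in plain Python; the helper name `central_difference` and the step size `h` are assumptions made for this illustration.

```python
def f(x):
    return x ** 2

def central_difference(func, x, h=1e-5):
    """Approximate the derivative of func at x with a central difference."""
    return (func(x + h) - func(x - h)) / (2 * h)

# Should be very close to the symbolic result dy/dx = 2x = 20 at x = 10
approx = central_difference(f, 10.0)
print(approx)
```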

### Using `Theano` to fit a linear regression 

In [17]:
# Create the training data to fit the linear regression y = 2x + 10
trX = np.linspace(-1, 1, 101)
trY = 2 * trX + np.random.randn(*trX.shape) * 0.5 + 10

# Create the variables
X = T.scalar()
Y = T.scalar()

# Create the model
def model(X, w, c) :
    return w * X + c

# Create the variables to estimate
w = theano.shared(np.asarray(0., dtype = theano.config.floatX))
c = theano.shared(np.asarray(0., dtype = theano.config.floatX))
y = model(X, w, c)

# Create the cost function as the mean squared error (MSE)
cost = T.mean(T.sqr(y - Y))

# Create the gradient of each variable using the cost
gradient_w = T.grad(cost = cost, wrt = w)
gradient_c = T.grad(cost = cost, wrt = c)

# Create the update rules using the learning rate
learning_rate = 0.01
updates = [[w, w - gradient_w * learning_rate], [c, c - gradient_c * learning_rate]]

# Create the function to train the model
train = theano.function(inputs = [X, Y],
                        outputs = cost,
                        updates = updates)


# Train the model for 15 passes over the training data:
for i in range(15) :
    for x, y in zip(trX, trY) :
        cost_i = train(x, y)
    
    print('In the step', i, '\n\tw value:', w.get_value(), 
            '\n\tc value:', c.get_value(), '\n\twith a cost:', cost_i)

In the step 0 
	w value: -0.8221021448232505 
	c value: 9.65237606595713 
	with a cost: 8.609280468825466
In the step 1 
	w value: 0.36946501346751454 
	c value: 10.377253126985792 
	with a cost: 0.8795841520802172
In the step 2 
	w value: 1.1757373611025022 
	c value: 10.215922512143077 
	with a cost: 0.07078109831206114
In the step 3 
	w value: 1.6090842956416629 
	c value: 10.067365519899944 
	with a cost: 0.0009369119092350618
In the step 4 
	w value: 1.8313476775888404 
	c value: 9.983782915080479 
	with a cost: 0.030648860518913155
In the step 5 
	w value: 1.9440745550926066 
	c value: 9.940466962171929 
	with a cost: 0.0611925136029158
In the step 6 
	w value: 2.0010878271859944 
	c value: 9.918442217430444 
	with a cost: 0.08055242368342794
In the step 7 
	w value: 2.0299029720179305 
	c value: 9.907295804724823 
	with a cost: 0.09133845041495778
In the step 8 
	w value: 2.044463909839546 
	c value: 9.901661385078475 
	with a cost: 0.09704531754099789
In the step 9 
	w value: 2

We can observe that, with 15 iterations, the model approximates the real values `w = 2` and `c = 10`.
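
To make the update rule explicit, the same training loop can be sketched in plain NumPy with the gradients written out by hand. This is a sketch under stated assumptions rather than a reproduction of the run above: the fixed random seed is an addition for repeatability, so the printed numbers will differ from the Theano output, and the `np.polyfit` comparison is also an addition.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (an assumption, for repeatability)
trX = np.linspace(-1, 1, 101)
trY = 2 * trX + rng.standard_normal(trX.shape) * 0.5 + 10

w, c = 0.0, 0.0
learning_rate = 0.01

for epoch in range(15):
    for x_i, y_i in zip(trX, trY):
        error = (w * x_i + c) - y_i
        # For cost = error**2: d(cost)/dw = 2*error*x_i, d(cost)/dc = 2*error
        grad_w = 2 * error * x_i
        grad_c = 2 * error
        # Both gradients use the pre-update values of w and c
        w -= learning_rate * grad_w
        c -= learning_rate * grad_c

# Closed-form least-squares fit, for comparison with the iterative estimate
w_ls, c_ls = np.polyfit(trX, trY, deg=1)
print(w, c)
print(w_ls, c_ls)
```

Both estimates should land near the values used to generate the data, although the sample-by-sample updates keep some noise in the final `w` and `c`.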