# Introduction to Theano

This notebook contains the code snippets from the slides, so you can execute them and tinker with those examples.

To execute a cell: Ctrl-Enter.

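The slides were executed with the default configuration of Theano: `floatX=float64`, `device=cpu`. The following cell sets `THEANO_FLAGS`, which must happen before Theano is first imported. As written, it selects `floatX=float32,device=gpu`; to force the default configuration instead, uncomment the first assignment and comment out the second.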

In [1]:
import os
#os.environ['THEANO_FLAGS'] = 'floatX=float64,device=cpu'
os.environ['THEANO_FLAGS'] = 'floatX=float32,device=gpu'

In [None]:
from theano import function, config, shared, tensor
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], tensor.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# Use `node` as the loop variable so the shared variable `x` is not shadowed
if numpy.any([isinstance(node.op, tensor.Elemwise) and
              ('Gpu' not in type(node.op).__name__)
              for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

# Theano concepts

## Symbolic inputs

The symbolic inputs that you operate on are **Variables**, and what you get from applying various **Ops** to these inputs are also Variables. A Variable is the main data structure you work with. A **Type** in Theano represents a set of constraints on potential data objects. These constraints allow Theano to tailor C code to handle them and to statically optimize the computation graph. In the cell below, the Type of both `x` and `y` is `dmatrix`, a matrix of `float64` values. Here is [the complete list of types](http://deeplearning.net/software/theano/library/tensor/basic.html#all-fully-typed-constructors).

In [2]:
import numpy as np
import theano
import theano.tensor as T
x = T.dmatrix('x')
y = T.dmatrix('y')

Using gpu device 0: GeForce GTX TITAN Black (CNMeM is disabled, cuDNN 6021)


## Operation

An **Op** defines a certain computation on some types of inputs, producing some types of outputs. From a list of input Variables and an Op, you can build an **Apply** node representing the application of the Op to the inputs.

An Apply node is a type of internal node used to represent a computation graph.
It represents the application of an Op on one or more inputs, where each input is a Variable. By convention, each Op is responsible for knowing how to build an Apply node from a list of inputs.

In [3]:
z = x + y

![image.png](attachment:image.png)
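
To make the relationship between Variables, Ops, and Apply nodes concrete, here is a toy model in plain Python. It is only a sketch and not Theano's actual implementation: the classes `Variable`, `Apply`, and `Add` and the helper `evaluate` are hypothetical stand-ins, although the attribute names `owner`, `op`, and `inputs` and the method names `make_node` and `perform` are chosen to mirror Theano's vocabulary.

```python
# Toy model of a computation graph. These classes are hypothetical
# illustrations, NOT Theano's real Variable/Apply/Op classes.

class Variable:
    def __init__(self, name=None, owner=None):
        self.name = name
        # Apply node that produced this variable, or None for a graph input
        self.owner = owner


class Apply:
    """Records the application of one Op to a list of input Variables."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = list(inputs)
        self.output = Variable(owner=self)


class Add:
    """An Op: knows how to build an Apply node and how to compute a value."""
    def make_node(self, a, b):
        return Apply(self, [a, b])

    def perform(self, a_val, b_val):
        return a_val + b_val


def evaluate(var, values):
    """Recursively evaluate `var` given a dict {input Variable: value}."""
    if var.owner is None:
        return values[var]
    args = [evaluate(v, values) for v in var.owner.inputs]
    return var.owner.op.perform(*args)


u = Variable('u')
v = Variable('v')
w = Add().make_node(u, v).output   # plays the role of z = x + y

result = evaluate(w, {u: 2.0, v: 3.0})
print(result)  # 5.0
```

Building `w` computes nothing; it only records which Op was applied to which inputs. The value appears once concrete inputs are supplied, which is the role `theano.function` plays for a real graph.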

## Functions

`theano.function` is the interface for compiling graphs into callable objects. When `theano.function` is executed, the computation graph is optimized and Theano generates efficient C code (with calls to CUDA if a GPU device is selected). This is transparent to the user, except for the choice of compilation mode.
The `mode` argument controls which optimizations are applied to the graph and how the optimized graph is evaluated. The modes are:
- `FAST_COMPILE`: Apply just a few graph optimizations and use only Python implementations, so the GPU is disabled.

- `FAST_RUN`: Apply all optimizations and use C implementations where possible.

- `DebugMode`: Verify the correctness of all optimizations, and compare C and Python implementations. This mode can take much longer than the other modes, but can identify several kinds of problems.

The default is typically `FAST_RUN`, but this can be changed through `theano.config.mode`.

In [None]:
a = np.random.randn(1, 3)
b = np.random.randn(1, 3)

# theano.function([inputs], [outputs])
f = theano.function([x, y], z)
f(a,b)

## Shared variables

A **Shared Variable** is a hybrid symbolic and non-symbolic variable. It can be used in symbolic expressions, but it also has an internal value that defines the value taken by this symbolic variable in every function that uses it. It is called a shared variable because that value is shared between those functions. The value can be read and modified with the `.get_value()` and `.set_value()` methods.

In [None]:
import numpy as np
np.random.seed(42)

W_val = np.random.randn(4, 3)
b_val = np.ones(3)

W = theano.shared(W_val)
b = theano.shared(b_val)

W.name = 'W'
b.name = 'b'

In [None]:
print(W.get_value())
print('Before ', b.get_value())
# b.set_value(1)  # TypeError: b is a 1-D vector, and a scalar has the wrong number of dimensions
# b.set_value(np.array([[1, 2], [3, 4]]))  # TypeError: a 2-D array has the wrong number of dimensions
b.set_value(np.array([1,2,3]))
print('After ', b.get_value())

## Shared variables and functions

Shared variables can be used to represent the internal state of a function. To modify this internal state, `theano.function` has an argument called `updates`, which takes an iterable of `(shared_variable, new_expression)` pairs: a list, tuple, or dict.

Note in the following that `state` is an implicit input of the function `accumulator`.

In [None]:
state = theano.shared(0)
inc = T.iscalar('inc')
accumulator = theano.function([inc], state, updates=[(state, state+inc)])

The function is evaluated first, and then the update is applied, so each call returns the value of `state` from before that call's update.

In [None]:
print('First call to accumulator: {}'.format(accumulator(1)))
print('Second call to accumulator: {}'.format(accumulator(10)))
print('Third call to accumulator: {}'.format(accumulator(100)))
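
The order of evaluation and update can be imitated in plain Python. The class `ToyAccumulator` below is a hypothetical stand-in for the compiled function, not part of Theano; it only illustrates that the output is computed from the current state before the update is applied.

```python
class ToyAccumulator:
    """Plain-Python stand-in for a compiled function with one update.

    Hypothetical illustration only; this is not a Theano class.
    """
    def __init__(self, initial=0):
        self.state = initial

    def __call__(self, inc):
        output = self.state             # 1. compute the output from the current state
        self.state = self.state + inc   # 2. then apply the update: state <- state + inc
        return output


toy = ToyAccumulator(0)
returned = [toy(1), toy(10), toy(100)]
print(returned)    # [0, 1, 11]
print(toy.state)   # 111
```

Each call returns the state left behind by the previous call, so the increment passed to a call only becomes visible in the next one.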

## A regression toy example
### Build a simple model
The following is a simple linear transformation (`out = xW + b`) followed by a nonlinearity (`T.nnet.sigmoid`).

In [None]:
x = T.vector('x')
y = T.vector('y')

W_val = np.random.randn(4, 3)
b_val = np.ones(3)
W = theano.shared(W_val)
b = theano.shared(b_val)
W.name = 'W'
b.name = 'b'

dot = T.dot(x, W)
out = T.nnet.sigmoid(dot + b)

predict = theano.function([x], out)
x_val = np.random.rand(4).astype(np.float32)
result = predict(x_val)
print(result)
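
For comparison, the same forward pass can be written directly in NumPy. This is a sketch of what the compiled graph computes, not of how Theano computes it; the names `sigmoid`, `W_np`, `b_np`, `x_np`, and `out_np` and the seed are assumptions made for the illustration.

```python
import numpy as np


def sigmoid(z):
    """Elementwise logistic function, 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.RandomState(0)   # arbitrary seed, for reproducibility only
W_np = rng.randn(4, 3)           # weights: 4 inputs, 3 outputs
b_np = np.ones(3)                # bias, one entry per output
x_np = rng.rand(4)               # a single input vector

# x has shape (4,) and W has shape (4, 3), so x.dot(W) has shape (3,)
out_np = sigmoid(x_np.dot(W_np) + b_np)
print(out_np.shape)  # (3,)
```

The difference from the Theano version is that each NumPy line computes a value immediately, whereas `T.dot` and `T.nnet.sigmoid` only extend the graph until `predict` is called.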

In order to train the model, we define a cost function that will evaluate how far the model is from the target.

In [None]:
C = ((out - y) ** 2).sum()
C.name = 'C'
# `out` is an intermediate variable of the graph, used here as an input
error = theano.function([out, y], C)

y_val = np.random.uniform(size=3).astype(np.float32)
print(error([0.942, 0.737, 0.676], y_val))
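
As a worked instance of the cost, the sum of squared differences can be checked by hand in NumPy. The target `y_np` below is a hypothetical fixed vector chosen to make the arithmetic easy to follow; it is not the random `y_val` used above.

```python
import numpy as np

out_np = np.array([0.942, 0.737, 0.676])   # the model outputs used above
y_np = np.array([1.0, 0.0, 1.0])           # hypothetical target, for illustration

# (0.942 - 1)^2 + (0.737 - 0)^2 + (0.676 - 1)^2
#   = 0.003364 + 0.543169 + 0.104976
cost = ((out_np - y_np) ** 2).sum()
print(cost)  # approximately 0.651509
```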

## Automatic differentiation

Now that the graph is defined, we can compute the gradient of the cost `C` with respect to the parameters `W` and `b`. `theano.grad` requires the expression being differentiated to be a scalar, such as the cost `C`.

In [None]:
# theano.grad(cost, wrt): cost is a scalar expression, wrt is a Variable or a list of Variables
dC_dW, dC_db = theano.grad(C, [W, b])
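
To see what these gradients are for this particular model, they can be derived by hand with the chain rule and checked numerically. The NumPy sketch below is an independent derivation, not Theano's output; the helper names `cost_fn` and `grads`, the seed, and the random data are assumptions made for the illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_fn(W, b, x, y):
    """Squared-error cost C = sum((sigmoid(xW + b) - y)^2)."""
    return ((sigmoid(x.dot(W) + b) - y) ** 2).sum()


def grads(W, b, x, y):
    """Hand-derived gradients of the cost with respect to W and b."""
    s = sigmoid(x.dot(W) + b)
    # Chain rule: dC/ds = 2 (s - y) and ds/dz = s (1 - s), with z = xW + b
    delta = 2.0 * (s - y) * s * (1.0 - s)    # dC/dz, shape (3,)
    return np.outer(x, delta), delta          # dC/dW (4, 3), dC/db (3,)


rng = np.random.RandomState(0)               # arbitrary seed
W0, b0 = rng.randn(4, 3), np.ones(3)
x0, y0 = rng.rand(4), rng.rand(3)

gW, gb = grads(W0, b0, x0, y0)

# Central finite-difference check on a single entry of W
eps = 1e-6
W_plus, W_minus = W0.copy(), W0.copy()
W_plus[1, 2] += eps
W_minus[1, 2] -= eps
num_gW = (cost_fn(W_plus, b0, x0, y0) - cost_fn(W_minus, b0, x0, y0)) / (2 * eps)
print(abs(num_gW - gW[1, 2]))  # should be very small
```

The gradients have the same shapes as the parameters they belong to, which is what makes the elementwise update rule in the next step well defined.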

Now that we can compute the gradients, we define the gradient descent update rule.

In [None]:
# Gradient descent step with a learning rate of 1
upd_W = W - 1 * dC_dW
upd_b = b - 1 * dC_db

Finally, we compile the expressions and the update rules.

In [None]:
train = theano.function([x, y], C,
                        updates=[(W, upd_W),
                                 (b, upd_b)])
print(b.get_value())
print(W.get_value())

We iterate the gradient descent update rule in order to minimize the cost.

In [None]:
for i in range(25):
    # Use a new name so the symbolic cost C is not overwritten by a number
    cost_val = train(x_val, y_val)
    print('Cost {} at iteration {}'.format(cost_val, i))
print(b.get_value())
print(W.get_value())
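
The same training loop can be sketched in NumPy with hand-derived gradients, which makes each step of the update rule explicit. This is an illustration under assumed data (arbitrary seed, random `x_np` and `y_np`), not a reproduction of the Theano run above, so the printed numbers will differ.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.RandomState(0)           # arbitrary seed
W_np, b_np = rng.randn(4, 3), np.ones(3)
x_np, y_np = rng.rand(4), rng.rand(3)    # one training example and its target
lr = 1.0                                 # learning rate of 1, as in the update rule

costs = []
for i in range(25):
    s = sigmoid(x_np.dot(W_np) + b_np)
    # Like `train`, record the cost computed before this step's update
    costs.append(((s - y_np) ** 2).sum())
    delta = 2.0 * (s - y_np) * s * (1.0 - s)      # dC/d(xW + b)
    W_np = W_np - lr * np.outer(x_np, delta)      # W <- W - lr * dC/dW
    b_np = b_np - lr * delta                      # b <- b - lr * dC/db

print(costs[0], costs[-1])
```

With a single training example, the cost recorded at the last iteration should be lower than the one recorded at the first.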

# Visualization and debugging
## Graph visualization
### Comparing `out` with `predict`

In [None]:
from theano.printing import pydotprint
from IPython.display import Image, SVG

In [None]:
Image(pydotprint(out, format='png', compact=False, return_image=True))

In [None]:
Image(pydotprint(out, format='png', return_image=True))

In [None]:
Image(pydotprint(predict, format='png', return_image=True))

### Comparing `upd_*` with `train`

In [None]:
Image(pydotprint([upd_W, upd_b], format='png', return_image=True), width=1000)

In [None]:
Image(pydotprint(train, format='png', return_image=True), width=1000)

### `debugprint`

In [None]:
from theano.printing import debugprint
debugprint(out)

In [None]:
debugprint(predict)