# Theano Tutorials - Basics
This is a collection of tutorials for Theano.

Let's import the `theano` library:

In [1]:
from theano import *

Several of the symbols you will need to use are in the tensor subpackage of Theano. Let us import that subpackage under a handy name like T (the tutorials will frequently use this convention).

In [2]:
import theano.tensor as T

## 1. Numpy refresher

Here are some quick guides to NumPy:
- [Numpy quick guide for Matlab users](http://www.scipy.org/NumPy_for_Matlab_Users)
- [Numpy User Guide](http://docs.scipy.org/doc/numpy/user/index.html)
- [More detailed Numpy tutorial](http://www.scipy.org/Tentative_NumPy_Tutorial)
- [100 NumPy exercises](https://github.com/rougier/numpy-100)
- [Numpy tutorial](http://www.labri.fr/perso/nrougier/teaching/numpy/numpy.html)

### Matrix conventions for machine learning

Rows are horizontal and columns are vertical. Every row is an example. Therefore, inputs[10,5] is a matrix of 10 examples, where each example has dimension 5. If this were the input to a neural network, the weights from the input to the first hidden layer would form a matrix of shape (5, #hid).
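As a minimal NumPy sketch of this convention (the hidden-layer size `n_hid = 3` and the random values are arbitrary assumptions for illustration), multiplying a batch of inputs by a weight matrix yields one row of hidden values per example:

```python
import numpy as np

rng = np.random.RandomState(0)       # fixed seed so the sketch is repeatable

n_examples, n_in, n_hid = 10, 5, 3   # n_hid is an arbitrary choice
inputs = rng.rand(n_examples, n_in)  # one example per row
weights = rng.rand(n_in, n_hid)      # input-to-hidden weights

hidden = np.dot(inputs, weights)     # one row of hidden values per example
print(hidden.shape)                  # (10, 3)
```

The inner dimensions (5 and 5) must match, and the result keeps the example axis first.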

Consider this array:

In [3]:
import numpy as np

In [4]:
a = np.asarray([[1., 2], [3, 4], [5, 6]])

In [5]:
a.shape

(3, 2)

This is a 3x2 matrix, i.e. there are 3 rows and 2 columns.

To access the entry in the 3rd row (row #2) and the 1st column (column #0):

In [6]:
a[2, 0]

5.0

### Broadcasting

NumPy broadcasts arrays of different shapes during arithmetic operations. In general, this means that the smaller array (or scalar) is broadcast across the larger array so that the two have compatible shapes. The example below shows an instance of broadcasting:

In [7]:
a = np.asarray([1.0, 2.0, 3.0])
b = 2.0

In [12]:
a * b

array([ 2.,  4.,  6.])

The smaller array b (actually a scalar here, which works like a 0-d array) is broadcast to the same size as a during the multiplication. This trick is often useful for simplifying how expressions are written. More detail about broadcasting can be found in the [NumPy user guide](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html).
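Broadcasting is not limited to scalars. As a small sketch (the names `m` and `row` are chosen here for illustration only), a 1-d array can be broadcast across every row of a 2-d array when the trailing dimensions match:

```python
import numpy as np

m = np.asarray([[1., 2.], [3., 4.], [5., 6.]])  # shape (3, 2)
row = np.asarray([10., 20.])                    # shape (2,)

# row is stretched along the first axis, so it is added to each row of m
shifted = m + row
print(shifted)
# [[11. 22.]
#  [13. 24.]
#  [15. 26.]]
```

This is the same pattern used when adding a bias vector to every example in a batch.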

## 2. Basics

### Baby steps - Algebra

#### Adding two scalars

To get us started with Theano and get a feel for what we’re working with, let’s make a simple function that adds two numbers together. Here is how you do it:

In [13]:
import numpy as np
import theano.tensor as T
from theano import function

In [14]:
x = T.dscalar('x')
y = T.dscalar('y')
z = x + y
f = function([x, y], z)

And now that we’ve created our function we can use it:


In [15]:
f(2, 3)

array(5.0)

In [16]:
np.allclose(f(18.4, 12.1), 30.5)

True

Let’s break this down into several steps. The first step is to define two symbols (Variables) representing the quantities that you want to add. Note that from now on, we will use the term Variable to mean “symbol” (in other words, x, y, z are all Variable objects). The output of the function f is a numpy.ndarray with zero dimensions.
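To illustrate what a zero-dimensional array looks like, here is a NumPy-only sketch. The value `result` is a stand-in built by hand, not an actual return value from a Theano function:

```python
import numpy as np

result = np.asarray(5.0)   # stand-in for a value such as the one f(2, 3) returns

print(result.shape)        # ()
print(result.ndim)         # 0
print(float(result))       # 5.0, converted to a plain Python float
```

A 0-d array has an empty shape tuple and can be converted to a Python scalar with `float()`.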

In [17]:
from theano import pp
print(pp(z))

(x + y)


#### Adding two Matrices

In [18]:
x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y
f = function([x, y], z)

In [19]:
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])

array([[ 11.,  22.],
       [ 33.,  44.]])

#### Exercise

In [20]:
a = T.fvector('a')
b = T.fvector('b')
out = a**2 + b**2 + 2*a*b
f = function([a,b], out)

In [21]:
f([0,1,2], [2,3,4])

array([  4.,  16.,  36.], dtype=float32)
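The expression in this exercise is the expansion of (a + b)², which gives a quick way to sanity-check the result. A minimal NumPy sketch of that check (the names `a_val` and `b_val` are introduced here for illustration):

```python
import numpy as np

a_val = np.asarray([0, 1, 2], dtype=np.float32)
b_val = np.asarray([2, 3, 4], dtype=np.float32)

expanded = a_val**2 + b_val**2 + 2 * a_val * b_val  # same form as the exercise
factored = (a_val + b_val)**2                       # algebraically equivalent

print(np.allclose(expanded, factored))              # True
```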

### More examples

#### Logistic Function

The logistic curve is given by:
$$s(x) = \frac{1}{1 + e^{-x}}$$

In [22]:
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = function([x], s)

In [23]:
logistic([[0,1],[-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

The logistic function can also be expressed as:
$$s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh(x/2)}{2}$$
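The identity follows from writing tanh in terms of exponentials and multiplying the numerator and denominator by the factor exp(-x/2):

$$\tanh\left(\frac{x}{2}\right) = \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}} = \frac{1 - e^{-x}}{1 + e^{-x}}$$

Substituting this into the right-hand side gives:

$$\frac{1 + \tanh(x/2)}{2} = \frac{1}{2} \cdot \frac{(1 + e^{-x}) + (1 - e^{-x})}{1 + e^{-x}} = \frac{1}{1 + e^{-x}}$$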

We can verify that this alternate form produces the same values:

In [24]:
s2 = (1 + T.tanh(x / 2)) / 2
log2 = function([x], s2)

In [25]:
log2([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

#### Computing More than one thing at the same time

Theano supports functions with multiple outputs. For example, we can compute the elementwise difference, absolute difference, and squared difference between two matrices a and b at the same time:

In [26]:
a, b = T.matrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = function([a, b], [diff, abs_diff, diff_squared])

In [27]:
f([[1,1],[1,1]],[[0,1],[2,3]])

[array([[ 1.,  0.],
        [-1., -2.]]), array([[ 1.,  0.],
        [ 1.,  2.]]), array([[ 1.,  0.],
        [ 1.,  4.]])]
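As a cross-check on the three outputs above, the same quantities can be computed directly in NumPy. This is only a sketch for comparison, not how Theano evaluates the function, and the names `a_val` and `b_val` are introduced here for illustration:

```python
import numpy as np

a_val = np.asarray([[1., 1.], [1., 1.]])
b_val = np.asarray([[0., 1.], [2., 3.]])

diff = a_val - b_val       # elementwise difference
abs_diff = np.abs(diff)    # absolute difference
diff_squared = diff**2     # squared difference

for out in (diff, abs_diff, diff_squared):
    print(out)
```

The Theano function returns its outputs as a list in the same order they were listed, so they can also be unpacked into three names in a single assignment.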

#### Setting a Default Value for an Argument



Let’s say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. You can do it like this:

In [28]:
from theano import In

In [30]:
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, In(y, value=1)], z)

In [31]:
f(33)

array(34.0)

In [32]:
f(33, 2)

array(35.0)

Inputs with default values must follow inputs without default values (like Python’s functions). There can be multiple inputs with default values. These parameters can be set positionally or by name, as in standard Python:


In [33]:
x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, In(y, value=1), In(w, value=2, name='w_by_name')], z)

In [34]:
f(33)

array(68.0)

In [35]:
f(33, 2)

array(70.0)

In [36]:
f(33, 0, 1)

array(33.0)

In [37]:
f(33, w_by_name=1)

array(34.0)

In [38]:
f(33, w_by_name=1, y=0)

array(33.0)
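For comparison, the calling behaviour above mirrors an ordinary Python function with default arguments. The following is a plain-Python analogue written for illustration, not Theano code; note that the third parameter is addressed by the name given in `In(..., name='w_by_name')`:

```python
def f(x, y=1, w_by_name=2):
    """Plain-Python stand-in for the Theano function defined above."""
    return (x + y) * w_by_name

print(f(33))                    # 68
print(f(33, 2))                 # 70
print(f(33, 0, 1))              # 33
print(f(33, w_by_name=1))       # 34
print(f(33, w_by_name=1, y=0))  # 33
```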

#### Using Shared Variables