# More examples

Following: http://deeplearning.net/software/theano/tutorial/examples.html

See also: [Basic Tensor Functionality](http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-basic-tensor)

## Logistic function

Suppose we wish to compute the logistic curve:

\begin{equation}
s(x) = \frac{1}{1 + e^{-x}}
\end{equation}

_elementwise_ on a matrix of doubles.

In [1]:
import theano
import theano.tensor as T

In [2]:
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = theano.function([x], s)

In [3]:
logistic([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

It is also true that:

\begin{equation}
s(x) = \frac{1}{1 + e^{-x}} = \frac{1 + \tanh (x/2)}{2}
\end{equation}

In [4]:
s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = theano.function([x], s2)

In [5]:
logistic2([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
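The two compiled functions agree because the identity is exact, not an approximation. A short derivation, multiplying the numerator and denominator of $\tanh(x/2)$ by $e^{-x/2}$:

\begin{align}
\tanh(x/2) &= \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}} = \frac{1 - e^{-x}}{1 + e^{-x}} \\
\frac{1 + \tanh(x/2)}{2} &= \frac{(1 + e^{-x}) + (1 - e^{-x})}{2\,(1 + e^{-x})} = \frac{1}{1 + e^{-x}} = s(x)
\end{align}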

## Computing more than one thing at the same time

Theano supports functions with multiple outputs. For example, we can compute the _elementwise_ difference, absolute difference, and squared difference between two matrices `a` and `b` all at the same time.

In [6]:
a, b = T.dmatrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff ** 2
f = theano.function([a, b], [diff, abs_diff, diff_squared])

**Note**: `dmatrices()` produces as many outputs as names you provide.

In [7]:
f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

[array([[ 1.,  0.],
        [-1., -2.]]), array([[ 1.,  0.],
        [ 1.,  2.]]), array([[ 1.,  0.],
        [ 1.,  4.]])]

## Setting a default value for an argument

Suppose we want a function that adds two numbers, but, if we only provide one number, it assumes the other number is 1.

In [8]:
from theano import Param
from theano import function

In [9]:
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, Param(y, default=1)], z)

In [10]:
f(33)

array(34.0)

In [11]:
f(33, 2)

array(35.0)

Inputs with default values must follow inputs without default values. There may be multiple inputs with default values. We may also set parameters positionally or by name, as in standard Python:

In [12]:
x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)

In [13]:
f(33)

array(68.0)

In [14]:
f(33, 2)

array(70.0)

In [15]:
f(33, 0, 1)

array(33.0)

In [16]:
f(33, w_by_name=1)

array(34.0)

In [17]:
f(33, w_by_name=1, y=0)

array(33.0)

**Note**: `Param` does not know the name of the local variables `y` and `w` that are passed as arguments. The symbolic variable objects have name attributes (set by `dscalars()` above) and _these_ are the names of the keyword parameters in the functions we build. We can override those names though, as seen above.
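As a rough analogy only (plain Python, not Theano), the compiled function behaves much like an ordinary Python function with default arguments. The sketch below assumes a hypothetical stand-in `f` whose keyword names mirror the ones used in this section (`y` from the variable's own name, `w_by_name` from the override):

```python
# Plain-Python analogy of the Theano function built with Param defaults.
# This is an illustration only; it does not use Theano at all.
def f(x, y=1, w_by_name=2):
    # same formula as z = (x + y) * w
    return (x + y) * w_by_name


results = [
    f(33),                    # both defaults used
    f(33, 2),                 # y set positionally
    f(33, 0, 1),              # y and w set positionally
    f(33, w_by_name=1),       # w set by its overridden name
    f(33, w_by_name=1, y=0),  # both set by name, in any order
]
print(results)
```

The values match the Theano outputs in this section (68, 70, 33, 34, 33), which is the sense in which parameters may be set "as in standard Python".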

See [Function](http://deeplearning.net/software/theano/library/compile/function.html#usingfunction) for more detail.

## Using shared variables

It is also possible to make a function with internal state. Suppose we want an accumulator: its state is initially zero, and after each function call the state is incremented by the value of the function argument.

In [18]:
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state + inc)])

`shared` constructs [shared variables](http://deeplearning.net/software/theano/library/compile/shared.html#libdoc-compile-shared).
These are hybrid symbolic and non-symbolic variables whose values may be shared between multiple functions. The value may be accessed and modified with `.get_value()` and `.set_value()`.

`updates` must be supplied with a list of pairs of the form (shared variable, new expression). It may also be a dictionary whose keys are shared variables and whose values are the new expressions. In both cases, the meaning is "whenever this function runs, it replaces the value of each shared variable with the result of the corresponding expression."
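One consequence of this rule is that a function's output is computed from the state as it was _before_ the update is applied. The following is a minimal plain-Python sketch of that behavior, not Theano's implementation; `SharedValue` and `make_function` are hypothetical names introduced only for illustration.

```python
class SharedValue(object):
    """Hypothetical stand-in for a shared variable (illustration only)."""

    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_function(output, updates):
    """Mimic `function(..., updates=...)` with plain callables.

    `output` and every update expression are evaluated against the
    pre-update state; only then are the new values stored.
    """
    def call(arg):
        result = output(arg)
        new_values = [(var, expr(arg)) for var, expr in updates]
        for var, value in new_values:
            var.set_value(value)
        return result
    return call


state = SharedValue(0)
accumulator = make_function(
    output=lambda inc: state.get_value(),
    updates=[(state, lambda inc: state.get_value() + inc)],
)

first = accumulator(1)     # returns the old state, then stores 0 + 1
second = accumulator(300)  # returns the old state, then stores 1 + 300
final = state.get_value()
print(first, second, final)
```

Under these assumptions the calls return 0 and then 1, and the state ends at 301, which is the same pattern the Theano accumulator shows in the cells that follow.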

In [19]:
state.get_value()

array(0)

In [20]:
accumulator(1)

array(0)

In [21]:
state.get_value()

array(1)

In [22]:
accumulator(300)

array(1)

In [23]:
state.get_value()

array(301)

We may also `.set_value()`:

In [24]:
state.set_value(-1)

In [25]:
accumulator(3)

array(-1)

In [26]:
state.get_value()

array(2)

We may define more than one function to use the same shared variable:

In [27]:
decrementor = function([inc], state, updates=[(state, state - inc)])

In [28]:
decrementor(2)

array(2)

In [29]:
state.get_value()

array(0)

The update mechanism can be a syntactic convenience, but it is mainly for efficiency. Updates to shared variables can sometimes be faster using in-place algorithms. Also, Theano has more control over where and how shared variables are allocated, and this is important for getting good performance on a [GPU](http://deeplearning.net/software/theano/tutorial/using_gpu.html#using-gpu).

We may have expressed some formula using a shared variable in a case where we don't want to use its value. In this case, we use the `givens` parameter, which replaces a particular node in a graph for the purpose of one particular function.

In [30]:
fn_of_state = state * 2 + inc
# the type of foo must match the shared variable we are replacing with the 'givens'
foo = T.scalar(dtype=state.dtype)
skip_shared = function([inc, foo], fn_of_state, givens=[(state, foo)])

In [31]:
skip_shared(1, 3)   # we're using 3 (foo) for the state, not state.value

array(7)

In [32]:
state.get_value()   # old state still there, we didn't use it

array(0)

`givens` may be used to replace any symbolic variable, not just a shared variable. Be careful, however, not to let the expressions introduced by a `givens` substitution depend on one another. The order of substitution is not defined, so the substitutions need to work in any order.

In practice, a good way of thinking about `givens` is as a mechanism for replacing any part of our formula with a different expression that evaluates to a tensor of the same shape and `dtype`.

**Note**: The broadcast pattern of a Theano shared variable defaults to `False` for each dimension. The size of a shared variable can change over time, so Theano cannot use its shape to infer a broadcastable pattern. If you want a different pattern, pass it as a parameter: `theano.shared(..., broadcastable=(True, False))`.

## Using random numbers

Because we first express everything in Theano symbolically and then compile the expressions into functions, using pseudo-random numbers is not as straightforward as it is with NumPy, etc.

The way to think about adding randomness to Theano's computations is to put random variables in our graph. Theano will allocate a NumPy `RandomState` object (a random number generator) for each such variable, and use it as needed. These _random streams_ are essentially shared variables, so the observations made about shared variables hold here as well.

See [RandomStreams](http://deeplearning.net/software/theano/library/tensor/shared_randomstreams.html#libdoc-tensor-shared-randomstreams) and [RandomStreamsBase](http://deeplearning.net/software/theano/library/tensor/raw_random.html#libdoc-tensor-raw-random).

### Brief example

In [33]:
from theano.tensor.shared_randomstreams import RandomStreams

In [34]:
srng = RandomStreams(seed=234)
rv_u = srng.uniform((2, 2))    # 2x2 matrices
rv_n = srng.normal((2, 2))
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True)   # not updating rv_n.rng
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

In [35]:
f_val0 = f()
f_val1 = f()
print f_val0, f_val1

[[ 0.12672381  0.97091597]
 [ 0.13989098  0.88754825]] [[ 0.31971415  0.47584377]
 [ 0.24129163  0.42046081]]


In [36]:
g_val0 = g()
g_val1 = g()
print g_val0
print g_val1

[[ 0.37328447 -0.65746672]
 [-0.36302373 -0.97484625]]
[[ 0.37328447 -0.65746672]
 [-0.36302373 -0.97484625]]


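Because `g` was compiled with `no_default_updates=True`, calling it does not advance the state of the generator behind `rv_n`, which is why `g_val0` and `g_val1` are identical.

A random variable is drawn at most once during any single function execution, so `nearly_zeros` returns zeros (up to rounding error) even though `rv_u` appears three times in its output expression.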

In [37]:
nearly_zeros()

array([[ 0.,  0.],
       [ 0.,  0.]])

### Seeding streams

Random variables may be seeded collectively or individually. We can seed just one r.v. by seeding or assigning to the `.rng` attribute with `.rng.set_value()`.

In [38]:
rng_val = rv_u.rng.get_value(borrow=True)  # get rng for rv_u
print rng_val                              #
rng_val.seed(89234)                        # seed the generator
rv_u.rng.set_value(rng_val, borrow=True)   # assign back seeded rng

<mtrand.RandomState object at 0x107d33a10>


We can seed _all_ the r.v.s allocated by a `RandomStreams` object using that object's `seed` method. This seed will seed a temporary random number generator that will in turn create seeds for each of the r.v.s.

In [39]:
srng.seed(902340)    # seed rv_u and rv_n with different seeds each
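The idea of one master seed producing a separate seed per random variable can be sketched with NumPy alone. This is an illustration under stated assumptions, not Theano's actual implementation: the helper `derive_stream_seeds` and the seed range `2 ** 30` are choices made for this example.

```python
import numpy as np


def derive_stream_seeds(master_seed, n_streams):
    """Derive one integer seed per stream from a single master seed.

    A temporary generator is seeded once and used only to draw the
    per-stream seeds (hypothetical helper, for illustration).
    """
    seed_gen = np.random.RandomState(master_seed)
    return [int(seed_gen.randint(2 ** 30)) for _ in range(n_streams)]


# The same master seed always yields the same per-stream seeds.
seeds_a = derive_stream_seeds(902340, 2)
seeds_b = derive_stream_seeds(902340, 2)

# Each stream gets its own independent generator.
streams = [np.random.RandomState(s) for s in seeds_a]
draws = [rs.uniform(size=(2, 2)) for rs in streams]
print(seeds_a == seeds_b)
```

The design point is reproducibility: a single number fixes every stream, while each stream still has its own generator.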

### Sharing streams between functions

The random number generators used for random variables are common between functions, as we expect for shared variables. For example, we can capture a seed state, make a call, and make another call (using a different function that shares the same generator), and watch the underlying seed evolve in the same way twice:

In [40]:
state_after_v0 = rv_u.rng.get_value().get_state()

In [41]:
nearly_zeros()       # this affects rv_u's generator

array([[ 0.,  0.],
       [ 0.,  0.]])

In [42]:
v1 = f()

In [43]:
rng = rv_u.rng.get_value(borrow=True)

In [44]:
rng.set_state(state_after_v0)

In [45]:
rv_u.rng.set_value(rng, borrow=True)

In [46]:
v2 = f()   # v2 != v1

In [47]:
v1

array([[ 0.5025809 ,  0.99544429],
       [ 0.75073355,  0.17926032]])

In [48]:
v2

array([[ 0.33919835,  0.85344878],
       [ 0.14881562,  0.79659413]])

In [49]:
v3 = f()   # v3 == v1

In [50]:
v3

array([[ 0.5025809 ,  0.99544429],
       [ 0.75073355,  0.17926032]])

In [51]:
v3 - v1

array([[ 0.,  0.],
       [ 0.,  0.]])
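The same capture-and-replay pattern can be reproduced with a bare NumPy `RandomState`, which is the kind of object stored in `rv_u.rng`. The sketch below uses no Theano; the variable names are chosen to mirror the cells in this section, and `hidden` plays the role of the draw made inside `nearly_zeros()`.

```python
import numpy as np

rng = np.random.RandomState(902340)
saved_state = rng.get_state()       # snapshot, like state_after_v0

hidden = rng.uniform(size=(2, 2))   # first draw advances the generator
v1 = rng.uniform(size=(2, 2))       # second draw

rng.set_state(saved_state)          # rewind to the snapshot
v2 = rng.uniform(size=(2, 2))       # replays the first draw, so v2 != v1
v3 = rng.uniform(size=(2, 2))       # replays the second draw, so v3 == v1

print(np.array_equal(v3, v1))
```

Restoring the state replays the draws in the same order, so the first draw after the rewind matches `hidden` rather than `v1`.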

### Copying random state between Theano graphs

Sometimes we may wish to transfer the "state" of all random number generators from one graph to another. For example, we may be trying to initialize the state of a model from the parameters of a pickled version of a previous model. For `theano.tensor.shared_randomstreams.RandomStreams` and `theano.sandbox.rng_mrg.MRG_RandomStreams` this may be achieved by copying elements of the `state_updates` attribute.

Every time a random variable is drawn from a `RandomStreams` object, a tuple is added to the `state_updates` list. The first element is a shared variable that represents the state of the random number generator associated with the _particular_ variable, while the second represents the Theano graph corresponding to the random number generation process.

In [52]:
from __future__ import print_function

In [53]:
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.tensor.shared_randomstreams import RandomStreams

In [54]:
class Graph():
    def __init__(self, seed=123):
        self.rng = RandomStreams(seed)
        self.y = self.rng.uniform(size=(1,))

In [55]:
g1 = Graph(seed=123)
f1 = function([], g1.y)

In [56]:
g2 = Graph(seed=987)
f2 = function([], g2.y)

In [57]:
# by default, the functions are out of sync
print(f1())
print(f2())

[ 0.72803009]
[ 0.55056769]


In [58]:
def copy_random_state(g1, g2):
    if isinstance(g1.rng, MRG_RandomStreams):
        g2.rng.rstate = g1.rng.rstate
    for (su1, su2) in zip(g1.rng.state_updates, g2.rng.state_updates):
        su2[0].set_value(su1[0].get_value())

In [59]:
# now copy the state of the rng's
copy_random_state(g1, g2)

In [60]:
print(f1())
print(f2())

[ 0.59044123]
[ 0.59044123]


### Other random distributions

See http://deeplearning.net/software/theano/library/tensor/raw_random.html#libdoc-tensor-raw-random

### Other implementations

See

http://deeplearning.net/software/theano/library/sandbox/rng_mrg.html#libdoc-rng-mrg

and

http://deeplearning.net/software/theano/library/sandbox/cuda/op.html#module-theano.sandbox.cuda.rng_curand

## A real example: logistic regression

In [61]:
cat logistic_regression_example.py

#!/usr/bin/env python
"""
See:
    http://deeplearning.net/software/theano/tutorial/examples.html
"""
from __future__ import print_function
import numpy as np
import theano
import theano.tensor as T


rng = np.random

N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# declare Theano symbolic variables
x = T.matrix('x')
y = T.vector('y')
w = theano.shared(rng.randn(feats), name='w')
b = theano.shared(0., name='b')
print("Initial model:")
print(w.get_value())
print(b.get_value())

# construct the Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))            # P(target = 1)
prediction = p_1 > 0.5                             # prediction threshold
xent = -y * T.log(p_1) - (1 - y) * T.log(1 - p_1)  # cross-entropy loss func.
cost = xent.mean() + 0.01 * (w ** 2).sum()         # cost func. to minimize
gw, gb = T.grad(cost, [w, b])                      # gradient of cost

# compile
train = th