# More Examples

Are you comfortable with Theano's basic objects and operations by now? If not,
review [Basic Tensor Functionality](http://deeplearning.net/software/theano/library/tensor/basic.html#libdoc-basic-tensor) first.

## Logistic Function

![logistic](http://deeplearning.net/software/theano/_images/math/943718fb001e2e9576e781d97946d74e44de5251.png)

![logistic2](http://deeplearning.net/software/theano/_images/logistic.png)
<center>Plot of logistic function</center>

### Elementwise

Computing a function elementwise means applying the function to each element of the matrix individually.  

In [25]:
import theano.tensor as T
from theano import function

In [26]:
x = T.dmatrix('x')

In [44]:
s = 1/(1 + T.exp(-x)) # logistic function

In [29]:
logistic = function([x],s) 

In [31]:
logistic([[0,1],[-1,-2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

logistic is computed elementwise because all of its operations (division, addition, exponentiation) are themselves elementwise operations.

The same holds for the following equivalent expression:

![logistic](http://deeplearning.net/software/theano/_images/math/d27229dfcd1ce305c126bbbd8a2e0fa867ccc503.png)

Let's check that the two expressions produce the same values.

In [43]:
s2 = (1+T.tanh(x/2))/2  # alternative logistic function

In [34]:
logistic2 = function([x],s2)

In [35]:
logistic2([[0,1],[-1,-2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

In [46]:
# produce the same values~!!
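
As a cross-check outside Theano, the two forms of the logistic function can be compared with plain Python. This is only a sketch using the standard `math` module; the helper names `logistic_tanh` and `elementwise` are made up for illustration and are not part of Theano.

```python
import math

def logistic(v):
    """Scalar logistic function 1 / (1 + e^(-v))."""
    return 1.0 / (1.0 + math.exp(-v))

def logistic_tanh(v):
    """Alternative form (1 + tanh(v / 2)) / 2."""
    return (1.0 + math.tanh(v / 2.0)) / 2.0

def elementwise(fn, matrix):
    """Apply a scalar function to every entry of a nested-list matrix."""
    return [[fn(v) for v in row] for row in matrix]

m = [[0, 1], [-1, -2]]
out1 = elementwise(logistic, m)       # same input as logistic(...) above
out2 = elementwise(logistic_tanh, m)  # same input as logistic2(...) above
```

Both results agree with each other up to floating-point rounding, and with the arrays printed by the Theano functions.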

## Computing More than one Thing at the same Time

Theano supports functions with multiple outputs.  


In [36]:
a,b = T.dmatrices('a','b')

In [53]:
diff = a-b                # (1)

In [54]:
abs_diff = abs(diff)      # (2)

In [55]:
diff_squared = diff**2    # (3)

In [40]:
f = function([a,b],[diff,abs_diff, diff_squared])

*dmatrices* produces as many outputs as names that you provide.   
It is a shortcut for allocating symbolic variables that we will often use in the tutorials.

In [48]:
f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

[array([[ 1.,  0.],
        [-1., -2.]]), array([[ 1.,  0.],
        [ 1.,  2.]]), array([[ 1.,  0.],
        [ 1.,  4.]])]

## Setting a Default Value for an Argument

Let's define a function that adds two numbers.   
If only one number is provided, the other input is assumed to be 1.

In [57]:
from theano import Param

In [58]:
x, y = T.dscalars('x', 'y')

In [59]:
z = x + y

In [60]:
f = function([x, Param(y, default=1)], z)

In [63]:
f(33)  # only one input is given, so the second input defaults to 1: 33 + 1

array(34.0)

In [64]:
f(33, 2) # 33 + 2

array(35.0)

*Param class*

We give y a default value of 1 by creating a Param instance with its default field set to 1.

Inputs with default values must follow inputs without default values (as in Python's functions). There can be several inputs with default values, and they can be set positionally or by name, as in standard Python:

In [65]:
x, y, w = T.dscalars('x', 'y', 'w')

In [66]:
z = (x + y) * w

In [67]:
f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)

In [70]:
f(33)  # (33 + 1)* 2

array(68.0)

In [74]:
f(33, 2) # (33 + 2) * 2

array(70.0)

In [75]:
f(33, 0, 1) # (33 + 0) * 1

array(33.0)

In [76]:
f(33, w_by_name=1)  # (33 + 1) * 1

array(34.0)

In [78]:
f(33, w_by_name=1, y=0)   # (33 + 0) * 1

array(33.0)

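Param does not know the names of the local variables y and w that are passed to it as arguments.
Each symbolic variable object has a name attribute (set by dscalars in the example above), and these names become the keyword parameter names of the function we build. That is the mechanism at work in Param(y, default=1). In Param(w, default=2, name='w_by_name'), the variable's name attribute is overridden with a name to use for this function.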

In [80]:
Param(y, default=1)

<theano.compile.pfunc.Param at 0x16644ef0>

In [81]:
Param(w, default=2, name='w_by_name')

<theano.compile.pfunc.Param at 0x16644a90>

You may like to see [Function](http://deeplearning.net/software/theano/library/compile/function.html#usingfunction) in the library for more detail.
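
For comparison, the calling behaviour of the three-input function above matches an ordinary Python function with default and keyword arguments. This plain-Python analogue is only an illustration; it involves no symbolic variables and is not how Theano implements Param.

```python
def f(x, y=1, w_by_name=2):
    """Plain-Python analogue of the compiled function: (x + y) * w."""
    return (x + y) * w_by_name

results = [
    f(33),                    # (33 + 1) * 2
    f(33, 2),                 # (33 + 2) * 2
    f(33, 0, 1),              # (33 + 0) * 1
    f(33, w_by_name=1),       # (33 + 1) * 1
    f(33, w_by_name=1, y=0),  # (33 + 0) * 1
]
```

The five calls mirror the five Theano calls shown earlier, in the same order.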

## Using Shared Variables

A function can also have an internal state.

Let's build an accumulator.   
The state is initialized to 0, and on each function call it is incremented by the function's argument.

In [82]:
from theano import shared

In [83]:
state = shared(0)

In [84]:
inc = T.iscalar('inc')

In [85]:
accumulator = function([inc], state, updates=[(state, state+inc)])

### new concept 1 : shared variables

The shared function constructs so-called [shared variables](http://deeplearning.net/software/theano/library/compile/shared.html#libdoc-compile-shared).

These are hybrid symbolic and non-symbolic variables whose value may be shared between multiple functions.

A shared variable can be used in symbolic expressions just like the objects returned by dmatrices(), but it also has an internal value.

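### new concept 2 : updates parameter of function

updates takes one of two forms:

1. a list of (shared-variable, new expression) pairs
2. a dictionary whose keys are shared variables and whose values are new expressions

Each time the function runs, the .value of each shared variable is replaced by the result of its corresponding expression.

The accumulator defined above replaces the value of state with the sum of state and the increment. Note that the function returns the value of state from before the update. Let's check: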

In [90]:
state.get_value()

array(0)

In [91]:
accumulator(1)

array(0)

In [92]:
state.get_value()

array(1)

In [93]:
accumulator(300)

array(1)

In [94]:
state.get_value()

array(301)

##### .set_value()   
Resets the state.

In [95]:
state.set_value(-1)

In [96]:
accumulator(3)

array(-1)

In [97]:
state.get_value()

array(2)

As mentioned above, you can define more than one function that uses the same shared variable. These functions can all update its value.

In [98]:
decrementor = function([inc], state, updates=[(state, state-inc)])

In [99]:
decrementor(2)

array(2)

In [100]:
state.get_value()

array(0)
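
The order of events in these calls (return the current value first, then apply the update) can be mimicked in plain Python. The sketch below is an analogue for intuition only: `SharedValue` and `make_stepper` are hypothetical names, and nothing here reflects Theano's actual implementation.

```python
class SharedValue:
    """Minimal stand-in for a shared variable holding one value."""
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value


def make_stepper(shared, update):
    """Build a callable that returns the current value, then applies the update."""
    def step(inc):
        old = shared.get_value()
        shared.set_value(update(old, inc))
        return old
    return step


state = SharedValue(0)
accumulator = make_stepper(state, lambda s, inc: s + inc)
decrementor = make_stepper(state, lambda s, inc: s - inc)

trace = []
trace.append(accumulator(1))     # returns the old state, then state becomes 1
trace.append(accumulator(300))   # returns 1, then state becomes 301
state.set_value(-1)              # reset the state
trace.append(accumulator(3))     # returns -1, then state becomes 2
trace.append(decrementor(2))     # returns 2, then state becomes 0
```

Both steppers read and write the same `state` object, which is the sense in which a shared variable is shared between functions.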

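Why does the updates mechanism exist?
You can always get a similar result by returning the new expressions and working with them in NumPy as usual.
The updates mechanism can be a syntactic convenience, but it is mainly there for efficiency: updates to shared variables can sometimes be done more quickly using in-place algorithms.

Theano also has more control over where and how shared variables are allocated, which is important for getting good performance on the GPU.

Sometimes you have expressed a formula using a shared variable but do not want to use its value. In that case you can use the givens parameter of function, which replaces a particular node in the graph for that one function.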

In [102]:
fn_of_state = state * 2 + inc

In [103]:
# The type of foo must match the shared variable we are replacing with the ``givens``

In [104]:
foo = T.scalar(dtype=state.dtype)

In [105]:
skip_shared = function([inc, foo], fn_of_state,
                           givens=[(state, foo)])

In [106]:
skip_shared(1, 3)  # we're using 3 for the state, not state.value

array(7)

In [107]:
state.get_value()  # old state still there, but we didn't use it

array(0)

The givens parameter can be used to replace any symbolic variable, not just a shared variable. You can replace constants and expressions in general. Be careful, though, not to let the expressions introduced by a givens substitution be co-dependent: the order of substitution is not defined, so the substitutions have to work in any order.

With givens, you can replace any part of your formula with a different expression that evaluates to a tensor of the same shape and dtype.

The broadcast pattern of a Theano shared variable defaults to False for each dimension. The size of a shared variable can change over time, so the shape cannot be used to find the broadcastable pattern. If you want a different pattern, pass it as a parameter: theano.shared(..., broadcastable=(True, False))

## Using Random Numbers

1. express everything symbolically
2. compile this expression to get functions using pseudo-random numbers

The way to think about putting randomness into Theano's computations is to put random variables in your graph. Theano will allocate a NumPy RandomState object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well. Theano's random objects are defined and implemented in RandomStreams and, at a lower level, in RandomStreamsBase.

NumPy RandomState object: a random number generator  
random stream: a sequence of random numbers  

### Brief Example

In [111]:
from theano.tensor.shared_randomstreams import RandomStreams
from theano import function
srng = RandomStreams(seed=234)
rv_u = srng.uniform((2,2))
rv_n = srng.normal((2,2))
f = function([], rv_u)
g = function([], rv_n, no_default_updates=True)    #Not updating rv_n.rng
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

Here, ‘rv_u’ represents a random stream of 2x2 matrices of draws from a uniform distribution. Likewise, ‘rv_n’ represents a random stream of 2x2 matrices of draws from a normal distribution. The distributions that are implemented are defined in RandomStreams and, at a lower level, in raw_random. They only work on CPU. See Other Implementations for GPU version.

Now let’s use these objects. If we call f(), we get random uniform numbers. The internal state of the random number generator is automatically updated, so we get different random numbers every time.

In [115]:
f_val0 = f()
f_val0

array([[ 0.44078224,  0.26993381],
       [ 0.14317277,  0.43571539]])

In [116]:
f_val1 = f()  #different numbers from f_val0
f_val1

array([[ 0.86342685,  0.81031029],
       [ 0.86695784,  0.6813093 ]])

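### no_default_updates=True

When no_default_updates=True is passed to function (as in g), calling the function does not update the state of the random number generator, so repeated calls return the same numbers.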

In [120]:
g_val0 = g()  # different numbers from f_val0 and f_val1
g_val0

array([[ 0.37328447, -0.65746672],
       [-0.36302373, -0.97484625]])

In [121]:
g_val1 = g()  # same numbers as g_val0!
g_val1

array([[ 0.37328447, -0.65746672],
       [-0.36302373, -0.97484625]])

An important remark is that a random variable is drawn at most once during any single function execution. So the nearly_zeros function is guaranteed to return approximately 0 (except for rounding error) even though the rv_u random variable appears three times in the output expression.

In [122]:
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)
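
The "drawn at most once per execution" rule can be pictured with the standard `random` module. This is a plain-Python analogy, not Theano code; the function names `one_draw_reused` and `three_separate_draws` are made up for the illustration.

```python
import random

rng = random.Random(234)

def one_draw_reused():
    """Analogue of nearly_zeros: a single draw is reused in all three places."""
    u = rng.random()
    return u + u - 2 * u

def three_separate_draws():
    """What would happen if every occurrence triggered a fresh draw."""
    return rng.random() + rng.random() - 2 * rng.random()

reused = one_draw_reused()
separate = three_separate_draws()
```

In the first function the terms cancel because they all refer to the same number; in the second they generally do not.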

### Seeding Streams

Random variables can be seeded individually or collectively.
You can seed just one random variable by seeding or assigning to the .rng attribute, using .rng.set_value().

In [124]:
rng_val = rv_u.rng.get_value(borrow=True)   # Get the rng for rv_u

In [125]:
rng_val.seed(89234)                         # seeds the generator

In [126]:
rv_u.rng.set_value(rng_val, borrow=True)    # Assign back seeded rng

You can also seed all of the random variables allocated by a RandomStreams object by that object’s seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.

In [127]:
srng.seed(902340)  # seeds rv_u and rv_n with different seeds each
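
The collective seeding scheme described above (one temporary generator that hands out a seed to each random variable) can be sketched with the standard library. This only mirrors the description; `seed_streams` is a hypothetical helper, and the seed range is an arbitrary choice rather than anything Theano specifies.

```python
import random

def seed_streams(master_seed, n_streams):
    """Create n generators, each seeded by a temporary master generator."""
    master = random.Random(master_seed)  # temporary generator
    # Assumption: any integer range works here; 2**30 is arbitrary.
    return [random.Random(master.randrange(2 ** 30)) for _ in range(n_streams)]

# Two stream sets built from the same master seed behave identically.
streams_a = seed_streams(902340, 2)
streams_b = seed_streams(902340, 2)
draws_a = [s.random() for s in streams_a]
draws_b = [s.random() for s in streams_b]
```

A single master seed is therefore enough to make every stream reproducible while still giving each stream its own seed.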

### Sharing Streams Between Functions

As usual for shared variables, the random number generators used for random variables are common between functions. So our nearly_zeros function will update the state of the generators used in function f above.

In [143]:
state_after_v0 = rv_u.rng.get_value().get_state()

In [144]:
nearly_zeros()       # this affects rv_u's generator

array([[ 0.,  0.],
       [ 0.,  0.]])

In [145]:
v1 = f()
v1

array([[ 0.23219826,  0.25305996],
       [ 0.02116774,  0.65845077]])

In [146]:
rng = rv_u.rng.get_value(borrow=True)

In [147]:
rng.set_state(state_after_v0)

In [148]:
rv_u.rng.set_value(rng, borrow=True)

In [149]:
v2 = f()             # v2 != v1
v2

array([[ 0.62720432,  0.90458979],
       [ 0.14363919,  0.89279932]])

In [150]:
v3 = f()             # v3 == v1
v3

array([[ 0.23219826,  0.25305996],
       [ 0.02116774,  0.65845077]])
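
Why v2 differs from v1 while v3 equals it becomes clearer with a plain-Python replay of the same sequence. The sketch uses `random.Random` with `getstate`/`setstate` as a stand-in for the generator behind rv_u; `hidden_draw` is a made-up name for the draw consumed by nearly_zeros().

```python
import random

rng = random.Random(234)

state_after_v0 = rng.getstate()  # snapshot of the generator state
hidden_draw = rng.random()       # plays the role of nearly_zeros(): advances the generator
v1 = rng.random()                # next draw, like v1 = f()

rng.setstate(state_after_v0)     # rewind to the snapshot
v2 = rng.random()                # repeats the draw that nearly_zeros() consumed
v3 = rng.random()                # repeats v1
```

After the rewind, the first draw reproduces the one that nearly_zeros() used up, and only the second one lines up with v1.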

### Copying Random State Between Theano Graphs

In some use cases, a user might want to transfer the “state” of all random number generators associated with a given theano graph (e.g. g1, with compiled function f1 below) to a second graph (e.g. g2, with function f2). This might arise for example if you are trying to initialize the state of a model, from the parameters of a pickled version of a previous model. For theano.tensor.shared_randomstreams.RandomStreams and theano.sandbox.rng_mrg.MRG_RandomStreams this can be achieved by copying elements of the state_updates parameter.

Each time a random variable is drawn from a RandomStreams object, a tuple is added to the state_updates list. The first element is a shared variable, which represents the state of the random number generator associated with this particular variable, while the second represents the theano graph corresponding to the random number generation process (i.e. RandomFunction{uniform}.0).

An example of how “random states” can be transferred from one theano function to another is shown below.

In [151]:
import theano
import numpy
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.tensor.shared_randomstreams import RandomStreams

In [152]:

class Graph():
    def __init__(self, seed=123):
        self.rng = RandomStreams(seed)
        self.y = self.rng.uniform(size=(1,))


In [153]:

g1 = Graph(seed=123)
f1 = theano.function([], g1.y)

g2 = Graph(seed=987)
f2 = theano.function([], g2.y)


In [154]:

print 'By default, the two functions are out of sync.'
print 'f1() returns ', f1()
print 'f2() returns ', f2()


By default, the two functions are out of sync.
f1() returns  [ 0.72803009]
f2() returns  [ 0.55056769]


In [155]:

def copy_random_state(g1, g2):
    if isinstance(g1.rng, MRG_RandomStreams):
        g2.rng.rstate = g1.rng.rstate
    for (su1, su2) in zip(g1.rng.state_updates, g2.rng.state_updates):
        su2[0].set_value(su1[0].get_value())


In [156]:

print 'We now copy the state of the theano random number generators.'
copy_random_state(g1, g2)
print 'f1() returns ', f1()
print 'f2() returns ', f2()

We now copy the state of the theano random number generators.
f1() returns  [ 0.59044123]
f2() returns  [ 0.59044123]


### Other Random Distributions

[other distributions implemented](http://deeplearning.net/software/theano/library/tensor/raw_random.html#libdoc-tensor-raw-random)


### Other Implementations

There are 2 other implementations, based on MRG31k3p and CURAND.
RandomStreams only works on the CPU; MRG31k3p
works on the CPU and GPU; CURAND only works on the GPU.

To use the MRG version easily, you can just change the import to:

In [None]:
from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams

## A Real Example: Logistic Regression

In [157]:
import numpy
import theano
import theano.tensor as T

In [158]:
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000


In [159]:

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print "Initial model:"
print w.get_value(), b.get_value()



Initial model:
[ 2.97739684  0.60328309  0.18612802 -0.17431635  1.81582416 -0.7556303
  0.05762588  0.74761964  0.5758447  -0.99091574  0.14852788  0.89403287
 -0.41224442 -0.15732817 -0.74100024 -2.44914531  0.9619713   0.26468002
  0.45709349  0.27335298 -1.11268119  0.69512412 -0.81561863 -0.57028575
  0.61096082  1.03591175  0.69089474  0.89338241  0.09597136 -1.06471582
  0.60605161  0.53348035  1.0884427  -0.25794321 -2.31603685 -0.64299083
 -1.79738322  1.72116934 -0.12311971  1.71783085 -0.47786566 -0.17668474
 -0.34049941 -0.40684382  0.04639455  1.31671869 -0.61527003 -0.50243778
 -0.2934254  -1.59780094 -0.7067815  -0.68389618  0.55261793  1.41730855
  0.73349963  0.66580206  0.33274777 -0.16525912 -0.712255   -0.37436395
 -0.48237688 -1.77934276 -0.8181882   0.18564964  0.42333525  0.85046899
 -1.41237427 -0.90589871  0.29309088  1.93621144 -1.29793247  0.60649278
  0.50386246 -0.98540444 -0.15027369 -0.28208615 -0.10324082  0.39118048
 -1.54055852  0.85125075 -1.23926884 

In [160]:
# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following section of this tutorial)


In [161]:

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)


In [162]:

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])
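
To see what each training step computes, here is a small pure-Python sketch of the same model: mean cross-entropy plus 0.01 times the sum of squared weights, minimized by gradient descent with step size 0.1. The gradients are written out by hand (Theano derives them with T.grad), and the tiny dataset, the sizes, and the helper name `cost_and_grads` are assumptions made for the illustration.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grads(X, Y, w, b, l2=0.01):
    """Return the regularized cost and its gradients with respect to w and b."""
    n = len(X)
    cost = l2 * sum(wj * wj for wj in w)
    gw = [2 * l2 * wj for wj in w]   # gradient of the L2 penalty
    gb = 0.0
    for x, y in zip(X, Y):
        p = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
        cost += -(y * math.log(p) + (1 - y) * math.log(1 - p)) / n
        for j, xj in enumerate(x):
            gw[j] += (p - y) * xj / n   # d(mean xent)/dw_j
        gb += (p - y) / n               # d(mean xent)/db
    return cost, gw, gb

rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(20)]  # 20 samples, 3 features
Y = [rng.randint(0, 1) for _ in range(20)]                    # random binary targets

w, b = [0.0, 0.0, 0.0], 0.0
initial_cost, _, _ = cost_and_grads(X, Y, w, b)
for _ in range(200):
    _, gw, gb = cost_and_grads(X, Y, w, b)
    w = [wj - 0.1 * gj for wj, gj in zip(w, gw)]  # same rule as (w, w - 0.1 * gw)
    b -= 0.1 * gb                                 # same rule as (b, b - 0.1 * gb)
final_cost, _, _ = cost_and_grads(X, Y, w, b)
```

With the weights starting at zero the initial cost is log 2, and the cost should go down as the updates are applied.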



In [163]:
print "Final model:"
print w.get_value(), b.get_value()
print "target values for D:", D[1]
print "prediction on D:", predict(D[0])

Final model:
[ -1.81131249e-01   7.91474551e-02  -1.44525934e-01  -1.00037207e-01
   1.04376413e-01   7.24664898e-02  -2.65346334e-02   5.74266922e-02
   4.42154749e-02  -8.14707713e-03   1.52100176e-01   1.06668865e-01
   1.69001666e-01   8.38883510e-02  -1.47190716e-01  -1.27891085e-01
  -7.11170825e-02  -1.66955393e-02  -7.55397395e-03  -1.77481040e-01
   6.56628547e-02   1.99854380e-01   1.43514942e-01  -1.48513939e-01
   1.68757308e-02  -1.47978583e-01  -6.67616177e-02  -4.64896635e-03
   1.17690142e-01   1.06428018e-01   2.95507573e-02  -8.96944360e-02
  -3.72206422e-02  -1.81272779e-02   7.50819753e-03   4.77551512e-02
  -7.86157462e-02   3.99775263e-02   5.41979454e-02  -7.07490788e-02
  -5.04209744e-02   5.02318171e-02  -3.83757740e-02  -9.00337194e-02
   1.28004116e-01   1.44180562e-01  -1.70874519e-01   2.07105118e-02
   2.51054609e-03  -9.05010910e-02   9.86029481e-02   1.48841783e-02
   7.70972541e-02   1.03998241e-01   1.00480749e-01  -5.88197873e-02
  -3.69735828e-02  -7