# 01. Theano Installation and Environment Setup, Demo / NumPy refresher / Baby Steps - Algebra / More Examples

# Contents

* Installation and Environment Setup
* Demo
* NumPy refresher
* Baby Steps - Algebra
* More Examples

# Installation and Environment Setup

In [1]:
!pip install theano

Downloading/unpacking theano
  Downloading Theano-0.7.0.tar.gz (2.0MB): 2.0MB downloaded
  Running setup.py (path:/home/konan/.venv/deep/build/theano/setup.py) egg_info for package theano
    
    
Installing collected packages: theano
  Running setup.py install for theano
    changing mode of build/scripts-2.7/theano-cache from 664 to 775
    changing mode of build/scripts-2.7/theano-nose from 664 to 775
    changing mode of build/scripts-2.7/theano-test from 664 to 775
    
    
    changing mode of /home/konan/.venv/deep/bin/theano-test to 775
    changing mode of /home/konan/.venv/deep/bin/theano-nose to 775
    changing mode of /home/konan/.venv/deep/bin/theano-cache to 775
Successfully installed theano
Cleaning up...


# Demo

# NumPy refresher

* Matrix conventions for machine learning
* Broadcasting

## Matrix conventions for machine learning

In [2]:
import numpy

This is a 3x2 matrix, i.e. there are 3 rows and 2 columns.

In [3]:
a = numpy.asarray([[1., 2], [3, 4], [5, 6]])
print a

[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]]


In [4]:
a.shape

(3, 2)

In [5]:
# access the element at row 3, column 1 (indices are zero-based)
a[2,0]

5.0

## Broadcasting

* NumPy broadcasts arrays of different shapes during arithmetic operations. In general, this means that the smaller array (or scalar) is broadcast across the larger array so that the two have compatible shapes. The example below shows an instance of broadcasting:

In [6]:
a = numpy.asarray([1.0, 2.0, 3.0])
print a

[ 1.  2.  3.]


In [7]:
b = 2.0
print b

2.0


In [8]:
a * b

array([ 2.,  4.,  6.])
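Broadcasting also applies between arrays, not only between an array and a scalar. The following is a minimal sketch (the names m, v and result are chosen here for illustration) that adds a length-2 vector to every row of a 3x2 matrix, and shows that shapes which do not line up are rejected:

```python
import numpy

m = numpy.asarray([[1., 2.], [3., 4.], [5., 6.]])   # shape (3, 2)
v = numpy.asarray([10., 20.])                        # shape (2,)

result = m + v            # v is stretched across the 3 rows of m
print(result.shape)
print(result)

try:
    m + numpy.asarray([1., 2., 3.])   # shape (3,) does not match the last axis (2)
except ValueError:
    print("shapes (3, 2) and (3,) cannot be broadcast together")
```

The rule of thumb is that shapes are compared from the last axis backwards, and each pair of sizes must either match or contain a 1.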

# Baby Steps - Algebra

* Adding two Scalars
* Adding two Matrices

## Adding two Scalars

In [9]:
import theano.tensor as T
from theano import function

In [10]:
x = T.dscalar('x')

In [11]:
y = T.dscalar('y')

In [12]:
z = x + y

In [14]:
f = function([x, y], z)

In [16]:
f(2, 3)

array(5.0)

In [17]:
f(16.3, 12.1)

array(28.4)

#### Let’s break this down into several steps.

If you are following along and typing into an interpreter, you may have noticed a slight delay when executing the function instruction. Behind the scenes, f was being compiled into C code.

### Step 1

In [18]:
x = T.dscalar('x')
y = T.dscalar('y')       

In [19]:
type(x)

theano.tensor.var.TensorVariable

In [21]:
x.type

TensorType(float64, scalar)

In [22]:
T.dscalar

TensorType(float64, scalar)

In [23]:
x.type is T.dscalar

True

### Step 2

In [24]:
z = x + y

z is yet another Variable, which represents the addition of x and y. You can use the pp function to pretty-print the computation associated with z.

In [25]:
from theano import pp
print pp(z)

(x + y)


### Step 3

In [26]:
f = function([x, y], z)

* The first argument to function is a list of Variables that will be provided as inputs to the function. 
* The second argument is a single Variable or a list of Variables. 
* For either case, the second argument is what we want to see as output when we apply the function. f may then be used like a normal Python function.
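To make the three steps concrete, here is a toy sketch in plain Python of the same idea: declare placeholders, build an expression object without computing anything, then turn it into a callable. The classes Var and Add and the helper toy_function are hypothetical names invented for this illustration. They are not Theano's implementation, which additionally optimizes the graph and compiles it to C.

```python
class Var(object):
    """A named placeholder, loosely analogous to a Theano Variable."""
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Add(self, other)          # build a graph node, compute nothing

    def evaluate(self, env):
        return env[self.name]            # look up the value bound to this name

    def __str__(self):
        return self.name


class Add(Var):
    """A node that records 'left + right' for later evaluation."""
    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def __str__(self):
        return "(%s + %s)" % (self.left, self.right)


def toy_function(inputs, output):
    """Return a callable that binds positional arguments to the input placeholders."""
    def call(*args):
        env = dict((v.name, a) for v, a in zip(inputs, args))
        return output.evaluate(env)
    return call


x, y = Var('x'), Var('y')     # Step 1: declare the symbols
z = x + y                     # Step 2: build the expression
f = toy_function([x, y], z)   # Step 3: obtain a callable
print(str(z))                 # (x + y)
print(f(2, 3))                # 5
```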

## Adding two Matrices

In [27]:
x = T.dmatrix('x')

In [28]:
y = T.dmatrix('y')

In [29]:
z = x + y

In [30]:
f = function([x, y], z)

In [31]:
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])

array([[ 11.,  22.],
       [ 33.,  44.]])

In [32]:
import numpy
a = numpy.array([[1, 2], [3, 4]])
b = numpy.array([[10, 20], [30, 40]])
f(a, b)

array([[ 11.,  22.],
       [ 33.,  44.]])

It is possible to add scalars to matrices, vectors to matrices, scalars to vectors, and so on. The behavior of these operations is defined by broadcasting.

The following types are available:

* <b>byte</b>: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
* <b>16-bit integers</b>: wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4
* <b>32-bit integers</b>: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
* <b>64-bit integers</b>: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
* <b>float</b>: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
* <b>double</b>: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* <b>complex</b>: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
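The names in this list follow a pattern: the first letter selects the element type and the rest selects the number of dimensions. The sketch below spells that pattern out in plain Python. The helper describe and the two dictionaries are hypothetical names used only for illustration, and the dtype strings (for example complex64 for the "c" prefix) are stated from the Theano documentation as best understood here, not taken from the cells in this notebook.

```python
# Prefix letter -> element type, following the list above.
PREFIX_DTYPE = {
    'b': 'int8', 'w': 'int16', 'i': 'int32', 'l': 'int64',
    'f': 'float32', 'd': 'float64', 'c': 'complex64',
}

# Suffix -> number of dimensions; row and col are 2-D with one broadcastable axis.
SUFFIX_NDIM = {
    'scalar': 0, 'vector': 1, 'matrix': 2, 'row': 2, 'col': 2,
    'tensor3': 3, 'tensor4': 4,
}


def describe(type_name):
    """Split a name such as 'dmatrix' into (dtype, ndim)."""
    prefix, suffix = type_name[0], type_name[1:]
    return PREFIX_DTYPE[prefix], SUFFIX_NDIM[suffix]


print(describe('dmatrix'))   # ('float64', 2)
print(describe('iscalar'))   # ('int32', 0)
```

This agrees with the output of x.type shown earlier, where dscalar is reported as TensorType(float64, scalar).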

In [33]:
import theano
a = theano.tensor.vector() # declare variable
out = a + a ** 10               # build symbolic expression
f = theano.function([a], out)   # compile function
print f([0, 1, 2])  # prints [    0.     2.  1026.]

[    0.     2.  1026.]


# More Examples

* Logistic Function
* Computing More than one Thing at the Same Time
* Setting a Default Value for an Argument
* Using Shared Variables
* Using Random Numbers
    - Brief Example
    - Seeding Streams
    - Sharing Streams Between Functions
    - Copying Random State Between Theano Graphs
    - Other Random Distributions
    - Other Implementations
* Real Example: Logistic Regression

### Logistic Function

<img src="http://deeplearning.net/software/theano/_images/math/943718fb001e2e9576e781d97946d74e44de5251.png" />

<img src="http://deeplearning.net/software/theano/_images/logistic.png" />

You want to compute the function elementwise on matrices of doubles, which means that you want to apply this function to each individual element of the matrix.

In [34]:
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = function([x], s)

In [35]:
logistic([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

<img src="http://deeplearning.net/software/theano/_images/math/d27229dfcd1ce305c126bbbd8a2e0fa867ccc503.png" />
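Judging from the code in the next cell, the formula above rewrites the logistic function in terms of tanh. A short derivation, using only the definition of tanh, shows why the two forms agree:

$$
\frac{1 + \tanh(x/2)}{2}
= \frac{1}{2}\left(1 + \frac{e^{x/2} - e^{-x/2}}{e^{x/2} + e^{-x/2}}\right)
= \frac{e^{x/2}}{e^{x/2} + e^{-x/2}}
= \frac{1}{1 + e^{-x}}
$$

The last step divides the numerator and the denominator by $e^{x/2}$.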

In [36]:
s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = function([x], s2)

In [37]:
logistic2([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
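The agreement between the two outputs can also be checked without Theano. The following sketch uses only the standard math module. The function names logistic and logistic_tanh are chosen here for illustration, and the tolerance 1e-12 is an arbitrary choice.

```python
import math


def logistic(x):
    # Direct form: 1 / (1 + e^(-x))
    return 1.0 / (1.0 + math.exp(-x))


def logistic_tanh(x):
    # Equivalent form: (1 + tanh(x / 2)) / 2
    return (1.0 + math.tanh(x / 2.0)) / 2.0


for value in [0.0, 1.0, -1.0, -2.0]:
    # The two forms should agree up to floating-point rounding.
    assert abs(logistic(value) - logistic_tanh(value)) < 1e-12
    print("%.8f" % logistic(value))
```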

### Computing More than one Thing at the Same Time

Theano supports functions with multiple outputs. For example, we can compute the elementwise difference, absolute difference, and squared difference between two matrices a and b at the same time:

In [38]:
a, b = T.dmatrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = function([a, b], [diff, abs_diff, diff_squared])

In [39]:
f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

[array([[ 1.,  0.],
        [-1., -2.]]), array([[ 1.,  0.],
        [ 1.,  2.]]), array([[ 1.,  0.],
        [ 1.,  4.]])]

### Setting a Default Value for an Argument

Let’s say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. You can do it like this:

In [40]:
from theano import Param
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, Param(y, default=1)], z)

In [41]:
f(33)

array(34.0)

In [42]:
f(33, 2)

array(35.0)

Inputs with default values must follow inputs without default values (like Python’s functions). There can be multiple inputs with default values. These parameters can be set positionally or by name, as in standard Python:

In [43]:
x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)

In [44]:
f(33)

array(68.0)

In [47]:
f(33, 2)

array(70.0)

In [48]:
f(33, 0, 1)

array(33.0)

In [49]:
f(33, w_by_name=1)

array(34.0)

In [50]:
f(33, w_by_name=1, y=0)

array(33.0)
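For comparison, the same calling behavior can be written as an ordinary Python function with default arguments. This is only an analogy to show which value each call binds to which input. The compiled Theano function returns NumPy arrays, not Python numbers.

```python
def f(x, y=1, w_by_name=2):
    # Plain-Python stand-in for the compiled function: z = (x + y) * w
    return (x + y) * w_by_name


print(f(33))                     # 68: y and w take their defaults
print(f(33, 2))                  # 70: y is set positionally
print(f(33, 0, 1))               # 33: y and w are both set positionally
print(f(33, w_by_name=1))        # 34: w is set by name, y keeps its default
print(f(33, w_by_name=1, y=0))   # 33: both are set by name
```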

### Using Shared Variables

It is also possible to make a function with an internal state. For example, let’s say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function’s argument.

In [51]:
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])

In [52]:
state.get_value()

array(0)

In [53]:
accumulator(1)

array(0)

In [54]:
state.get_value()

array(1)

In [55]:
accumulator(300)

array(1)

In [56]:
state.get_value()

array(301)
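Note that each call to accumulator returns the value of state from before the update. The outputs above are consistent with the output being computed first and the update applied afterwards. A minimal plain-Python sketch of that ordering, with the hypothetical class name ToyAccumulator, looks like this:

```python
class ToyAccumulator(object):
    """Mimics function([inc], state, updates=[(state, state + inc)])."""

    def __init__(self, initial=0):
        self.state = initial

    def __call__(self, inc):
        old = self.state          # the output is the current state ...
        self.state = old + inc    # ... and only then is the update applied
        return old


acc = ToyAccumulator(0)
print(acc(1))      # 0
print(acc.state)   # 1
print(acc(300))    # 1
print(acc.state)   # 301
```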

### Using Random Numbers

The way to think about putting randomness into Theano’s computations is to put random variables in your graph. Theano will allocate a NumPy RandomState object (a random number generator) for each such variable, and draw from it as necessary. We will call this sort of sequence of random numbers a random stream. Random streams are at their core shared variables, so the observations on shared variables hold here as well. Theano’s random objects are defined and implemented in RandomStreams and, at a lower level, in RandomStreamsBase.

#### Brief Example

In [57]:
from theano.tensor.shared_randomstreams import RandomStreams
from theano import function

In [58]:
srng = RandomStreams(seed=234)

In [59]:
rv_u = srng.uniform((2,2))

In [60]:
rv_n = srng.normal((2,2))

In [61]:
f = function([], rv_u)

In [62]:
g = function([], rv_n, no_default_updates=True)    #Not updating rv_n.rng

In [63]:
nearly_zeros = function([], rv_u + rv_u - 2 * rv_u)

Here, ‘rv_u’ represents a random stream of 2x2 matrices of draws from a uniform distribution. Likewise, ‘rv_n’ represents a random stream of 2x2 matrices of draws from a normal distribution.

Now let’s use these objects. If we call f(), we get random uniform numbers.

In [64]:
f_val0 = f()

In [65]:
f_val0

array([[ 0.12672381,  0.97091597],
       [ 0.13989098,  0.88754825]])

In [66]:
f_val1 = f()  #different numbers from f_val0
f_val1

array([[ 0.31971415,  0.47584377],
       [ 0.24129163,  0.42046081]])

When we add the extra argument no_default_updates=True to function (as in g), then the random number generator state is not affected by calling the returned function. So, for example, calling g multiple times will return the same numbers.

In [67]:
g_val0 = g()  # different numbers from f_val0 and f_val1
g_val0

array([[ 0.37328447, -0.65746672],
       [-0.36302373, -0.97484625]])

In [68]:
g_val1 = g()  # same numbers as g_val0!
g_val1

array([[ 0.37328447, -0.65746672],
       [-0.36302373, -0.97484625]])
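The difference between f and g can be imitated with a bare NumPy generator. In the sketch below, the two helper functions are hypothetical names, and a single RandomState is used for brevity even though Theano keeps one generator per random variable. One function lets the generator state advance, the other puts the state back after drawing.

```python
import numpy

rng = numpy.random.RandomState(234)   # one generator, for brevity


def draw_and_advance():
    # Like f: each call moves the generator forward.
    return rng.uniform(size=(2, 2))


def draw_without_advancing():
    # Like g: the state is restored after the draw, so nothing is consumed.
    saved = rng.get_state()
    value = rng.normal(size=(2, 2))
    rng.set_state(saved)
    return value


f_val0, f_val1 = draw_and_advance(), draw_and_advance()
g_val0, g_val1 = draw_without_advancing(), draw_without_advancing()
print((f_val0 != f_val1).any())   # True: successive draws differ
print((g_val0 == g_val1).all())   # True: the same draw is repeated
```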

#### Seeding Streams

Random variables can be seeded individually or collectively.

You can seed just one random variable by seeding or assigning to the .rng attribute, using .rng.set_value().

In [69]:
rng_val = rv_u.rng.get_value(borrow=True)   # Get the rng for rv_u

In [70]:
rng_val.seed(89234)                         # seeds the generator

In [71]:
rv_u.rng.set_value(rng_val, borrow=True)    # Assign back seeded rng

You can also seed all of the random variables allocated by a RandomStreams object by that object’s seed method. This seed will be used to seed a temporary random number generator, that will in turn generate seeds for each of the random variables.

In [72]:
srng.seed(902340)  # seeds rv_u and rv_n with different seeds each
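The collective seeding scheme described above can be sketched with the standard random module. This is an illustration of the idea only. The helper seed_streams, the use of random.Random in place of NumPy generators, and the seed range 2 ** 30 are assumptions made for this sketch.

```python
import random


def seed_streams(master_seed, stream_names):
    # A temporary generator, seeded once, hands out one seed per stream.
    seed_gen = random.Random(master_seed)
    return dict((name, random.Random(seed_gen.randrange(2 ** 30)))
                for name in stream_names)


streams_a = seed_streams(902340, ['rv_u', 'rv_n'])
streams_b = seed_streams(902340, ['rv_u', 'rv_n'])

# The same master seed reproduces every stream.
print(streams_a['rv_u'].random() == streams_b['rv_u'].random())   # True
```

Because each stream receives its own derived seed, the streams differ from one another while the whole set remains reproducible from a single number.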

#### Sharing Streams Between Functions

As usual for shared variables, the random number generators used for random variables are common between functions. So our nearly_zeros function will update the state of the generators used in function f above.

In [73]:
state_after_v0 = rv_u.rng.get_value().get_state()

In [74]:
nearly_zeros()       # this affects rv_u's generator

array([[ 0.,  0.],
       [ 0.,  0.]])

In [81]:
v1 = f()
v1

array([[ 0.62720432,  0.90458979],
       [ 0.14363919,  0.89279932]])

In [76]:
rng = rv_u.rng.get_value(borrow=True)

In [77]:
rng.set_state(state_after_v0)

In [78]:
rv_u.rng.set_value(rng, borrow=True)

In [80]:
v2 = f()             # v2 != v1
v2

array([[ 0.5025809 ,  0.99544429],
       [ 0.75073355,  0.17926032]])

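In [82]:
v3 = f()             # v3 == v1 when the cells are run in order
v3

array([[ 0.23219826,  0.25305996],
       [ 0.02116774,  0.65845077]])

Note: the execution counters indicate that the v1 cell was re-run as In [81], after v2 (In [80]), so the v1 printed above is a later draw and the v3 shown here does not match it. When the cells are executed once from top to bottom, v3 equals v1, because restoring state_after_v0 rewinds the generator to the point just before nearly_zeros() was called.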
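The intended relationship between v1, v2 and v3 can be reproduced with a plain NumPy generator. In this sketch, the variable hidden is a hypothetical name that stands in for the draw consumed by nearly_zeros(). Everything else mirrors the get_state / set_state calls used above.

```python
import numpy

rng = numpy.random.RandomState(234)
state_after_v0 = rng.get_state()     # remember the generator state

hidden = rng.uniform(size=(2, 2))    # stands in for the draw made by nearly_zeros()
v1 = rng.uniform(size=(2, 2))

rng.set_state(state_after_v0)        # rewind to the saved state
v2 = rng.uniform(size=(2, 2))        # repeats the hidden draw, so v2 != v1
v3 = rng.uniform(size=(2, 2))        # repeats v1

print((v2 == hidden).all())   # True
print((v3 == v1).all())       # True
```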

#### Copying Random State Between Theano Graphs

* In some use cases, a user might want to transfer the “state” of all random number generators associated with a given Theano graph (e.g. g1, with compiled function f1 below) to a second graph (e.g. g2, with function f2). This might arise, for example, if you are trying to initialize the state of a model from the parameters of a pickled version of a previous model.
* For theano.tensor.shared_randomstreams.RandomStreams and theano.sandbox.rng_mrg.MRG_RandomStreams, this can be achieved by copying elements of the state_updates parameter.
* Each time a random variable is drawn from a RandomStreams object, a tuple is added to the state_updates list.
    - The first element is a shared variable, which represents the state of the random number generator associated with this particular variable.
    - The second element represents the Theano graph corresponding to the random number generation process (i.e. RandomFunction{uniform}.0).

An example of how “random states” can be transferred from one Theano function to another is shown below.

In [83]:
import theano
import numpy
import theano.tensor as T
from theano.sandbox.rng_mrg import MRG_RandomStreams
from theano.tensor.shared_randomstreams import RandomStreams

In [84]:
class Graph():
    def __init__(self, seed=123):
        self.rng = RandomStreams(seed)
        self.y = self.rng.uniform(size=(1,))

In [85]:
g1 = Graph(seed=123)
g1

<__main__.Graph instance at 0x7f4abe181908>

In [86]:
f1 = theano.function([], g1.y)
f1

<theano.compile.function_module.Function at 0x7f4abe8e2210>

In [87]:
g2 = Graph(seed=987)
f2 = theano.function([], g2.y)

In [88]:
print 'By default, the two functions are out of sync.'
print 'f1() returns ', f1()
print 'f2() returns ', f2()

By default, the two functions are out of sync.
f1() returns  [ 0.72803009]
f2() returns  [ 0.55056769]


In [89]:
def copy_random_state(g1, g2):
    if isinstance(g1.rng, MRG_RandomStreams):
        g2.rng.rstate = g1.rng.rstate
    for (su1, su2) in zip(g1.rng.state_updates, g2.rng.state_updates):
        su2[0].set_value(su1[0].get_value())

In [90]:
print 'We now copy the state of the theano random number generators.'
copy_random_state(g1, g2)
print 'f1() returns ', f1()
print 'f2() returns ', f2()

We now copy the state of the theano random number generators.
f1() returns  [ 0.59044123]
f2() returns  [ 0.59044123]


#### Other Random Distributions

Other distributions are also implemented; see http://deeplearning.net/software/theano/library/tensor/raw_random.html#libdoc-tensor-raw-random

#### Other Implementations

* There are two other implementations, based on CURAND and MRG31k3p:
    - CURAND - http://deeplearning.net/software/theano/library/sandbox/cuda/op.html#module-theano.sandbox.cuda.rng_curand
    - MRG31k3p - http://deeplearning.net/software/theano/library/sandbox/rng_mrg.html#libdoc-rng-mrg

### Real Example: Logistic Regression

In [92]:
import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print "Initial model:"
print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following section of this tutorial)

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print "Final model:"
print w.get_value(), b.get_value()
print "target values for D:", D[1]
print "prediction on D:", predict(D[0])

Initial model:
[  1.67019495e+00   2.92417189e-01  -7.46813878e-01   1.30802467e+00
   2.49083200e-03   1.46521564e+00   1.52705311e+00  -1.86482485e-01
  -8.67383502e-01  -1.47840817e+00   2.04217345e+00  -1.79772329e-01
   4.68483034e-01   9.15536222e-02  -6.87248735e-01   1.63020176e-02
   4.07246700e-01   7.04426327e-01   1.37555142e+00  -7.60415114e-01
   4.93261508e-01   1.95323720e-01   1.17411339e-01  -8.94363620e-01
  -2.59411103e-01   1.94318463e+00  -1.52386756e-01   1.09117029e+00
   1.19284815e-01  -1.05395816e+00  -3.18195284e-01  -3.89012203e-01
   2.05109678e+00   1.06314377e+00   1.35568454e+00   1.21947194e-01
  -4.57464894e-01  -6.62252040e-01   7.95171362e-01  -1.42240859e-01
  -4.60494242e-01  -8.94458180e-01  -1.77438878e-01   1.56352842e+00
   4.02649656e-01   6.26960777e-02  -5.15609361e-02  -9.85760276e-01
   3.77736771e-01  -1.46082120e+00   1.77843757e+00  -1.57858054e+00
   1.28744097e+00  -4.45699479e-01   6.09213766e-01   6.50906451e-02
   1.41345030e+00  
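To see what T.grad contributes in the example above, the same training loop can be written in NumPy with the gradients derived by hand. This is a sketch under stated assumptions: the data set is much smaller (50 x 5, not 400 x 784), the seed 0 and the 1000 steps are arbitrary, and the function cost_and_grads is a hypothetical helper. The expressions for p_1, xent, cost and the two updates follow the cell above.

```python
import numpy as np

rng = np.random.RandomState(0)      # fixed seed, an assumption for repeatability
N, feats = 50, 5                    # much smaller than the 400 x 784 used above
X = rng.randn(N, feats)
y = rng.randint(low=0, high=2, size=N).astype(float)

w = rng.randn(feats)
b = 0.0


def cost_and_grads(w, b):
    p_1 = 1.0 / (1.0 + np.exp(-X.dot(w) - b))            # probability that target = 1
    xent = -y * np.log(p_1) - (1 - y) * np.log(1 - p_1)  # cross-entropy per example
    cost = xent.mean() + 0.01 * (w ** 2).sum()
    gw = X.T.dot(p_1 - y) / N + 0.02 * w                 # d(cost)/dw, derived by hand
    gb = (p_1 - y).mean()                                # d(cost)/db, derived by hand
    return cost, gw, gb


initial_cost, _, _ = cost_and_grads(w, b)
for _ in range(1000):
    _, gw, gb = cost_and_grads(w, b)
    w = w - 0.1 * gw                                     # same update rule as in train
    b = b - 0.1 * gb
final_cost, _, _ = cost_and_grads(w, b)

print(initial_cost, final_cost)     # the cost should have decreased
```

With Theano, the two hand-derived lines for gw and gb are what T.grad(cost, [w, b]) produces symbolically from the cost expression.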

# References

* [1] Theano 0.7 Tutorial - http://deeplearning.net/software/theano/tutorial/index.html