# 01. Theano Installation and Setup, Demo / NumPy refresher / Baby Steps - Algebra / More Examples

# Contents

* Installation and Setup
* Demo
* NumPy refresher
* Baby Steps - Algebra
* More Examples

# Installation and Setup

In [59]:
!pip install theano

Cleaning up...


# Demo

# NumPy refresher

* Matrix conventions for machine learning
* Broadcasting

## Matrix conventions for machine learning

In [1]:
import numpy

This is a 3x2 matrix, i.e. there are 3 rows and 2 columns.

In [2]:
a = numpy.asarray([[1., 2], [3, 4], [5, 6]])
print a

[[ 1.  2.]
 [ 3.  4.]
 [ 5.  6.]]


In [3]:
a.shape

(3, 2)

In [4]:
# Access the element at row 3, column 1 (zero-based index [2, 0])
a[2,0]

5.0
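As a small extension of the indexing example above, the sketch below pulls out a whole row and a whole column of the same 3x2 matrix. It assumes the convention used in the Theano tutorial, where each row is one example and each column is one feature; the variable names `third_row` and `first_column` are illustrative only.

```python
import numpy

a = numpy.asarray([[1., 2], [3, 4], [5, 6]])

# Indexing is zero-based, so index 2 selects the third row (one example).
third_row = a[2]

# A colon selects every row; index 0 then picks the first column (one feature).
first_column = a[:, 0]

# shape is (number of rows, number of columns).
n_rows, n_cols = a.shape
```

Here `third_row` holds the values 5 and 6, and `first_column` holds 1, 3 and 5.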

## Broadcasting

* NumPy broadcasts arrays of different shapes during arithmetic operations. In general, the smaller array (or scalar) is broadcast across the larger array so that the two have compatible shapes. The example below shows an instance of broadcasting:

In [5]:
a = numpy.asarray([1.0, 2.0, 3.0])
print a

[ 1.  2.  3.]


In [6]:
b = 2.0
print b

2.0


In [7]:
a * b

array([ 2.,  4.,  6.])
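The scalar case above is the simplest form of broadcasting. As a further sketch (the arrays `m` and `v` are made up for illustration), a vector of shape (2,) can be combined with a matrix of shape (3, 2): NumPy treats the vector as if it were repeated once per row.

```python
import numpy

m = numpy.asarray([[1., 2.], [3., 4.], [5., 6.]])  # shape (3, 2)
v = numpy.asarray([10., 20.])                      # shape (2,)

# v is broadcast across each of the 3 rows of m.
result = m + v
# result has shape (3, 2) and equals
# [[11., 22.],
#  [13., 24.],
#  [15., 26.]]
```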

# Baby Steps - Algebra

* Adding two Scalars
* Adding two Matrices

## Adding two Scalars

In [11]:
import theano.tensor as T
from theano import function

In [12]:
x = T.dscalar('x')

In [13]:
y = T.dscalar('y')

In [14]:
z = x + y

In [15]:
f = function([x, y], z)

In [16]:
f(2, 3)

array(5.0)

In [17]:
f(16.3, 12.1)

array(28.4)

#### Let’s break this down into several steps.

If you are following along and typing into an interpreter, you may have noticed a slight delay when the function instruction was executed. Behind the scenes, f was being compiled into C code.

### Step 1

In [19]:
x = T.dscalar('x')
y = T.dscalar('y')       

In [20]:
type(x)

theano.tensor.var.TensorVariable

In [21]:
x.type

TensorType(float64, scalar)

In [22]:
T.dscalar

TensorType(float64, scalar)

In [23]:
x.type is T.dscalar

True

### Step 2

In [24]:
z = x + y

z is yet another Variable, which represents the addition of x and y. You can use the pp function to pretty-print the computation associated with z.

In [25]:
from theano import pp
print pp(z)

(x + y)


### Step 3

In [26]:
f = function([x, y], z)

* The first argument to function is a list of Variables that will be provided as inputs to the function. 
* The second argument is a single Variable or a list of Variables. 
* In either case, the second argument is what we want to see as output when we apply the function. f may then be used like a normal Python function.
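The three steps above (declare symbolic variables, build an expression, turn it into a callable) can be mimicked with a toy model in plain Python. This is only a conceptual sketch: the names `Var`, `Add` and `toy_function` are hypothetical, they are not Theano classes, and the sketch interprets the expression graph in Python instead of compiling it to C as Theano does.

```python
class Var(object):
    """Hypothetical stand-in for a symbolic variable (not Theano's class)."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Adding two symbolic values builds a new graph node; nothing is computed yet.
        return Add(self, other)

    def evaluate(self, env):
        # Look up the concrete value bound to this variable.
        return env[self.name]

    def __str__(self):
        return self.name


class Add(Var):
    """Graph node representing the sum of two symbolic values."""

    def __init__(self, left, right):
        self.left, self.right = left, right

    def evaluate(self, env):
        return self.left.evaluate(env) + self.right.evaluate(env)

    def __str__(self):
        return "(%s + %s)" % (self.left, self.right)


def toy_function(inputs, output):
    """Return a callable that binds values to inputs and evaluates output."""
    def compiled(*values):
        env = dict((var.name, value) for var, value in zip(inputs, values))
        return output.evaluate(env)
    return compiled


x = Var('x')                 # Step 1: declare symbolic variables
y = Var('y')
z = x + y                    # Step 2: build the expression graph
f = toy_function([x, y], z)  # Step 3: turn the graph into a callable
```

With these definitions, `str(z)` gives `(x + y)`, which mirrors what `pp(z)` printed earlier, and `f(2, 3)` returns 5.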

## Adding two Matrices

In [27]:
x = T.dmatrix('x')

In [28]:
y = T.dmatrix('y')

In [29]:
z = x + y

In [30]:
f = function([x, y], z)

In [31]:
f([[1, 2], [3, 4]], [[10, 20], [30, 40]])

array([[ 11.,  22.],
       [ 33.,  44.]])

In [32]:
import numpy
a = numpy.array([[1, 2], [3, 4]])
b = numpy.array([[10, 20], [30, 40]])
f(a, b)

array([[ 11.,  22.],
       [ 33.,  44.]])

It is possible to add scalars to matrices, vectors to matrices, scalars to vectors, etc. The behavior in these cases is defined by broadcasting.

The following types are available:

* <b>byte</b>: bscalar, bvector, bmatrix, brow, bcol, btensor3, btensor4
* <b>16-bit integers</b>: wscalar, wvector, wmatrix, wrow, wcol, wtensor3, wtensor4
* <b>32-bit integers</b>: iscalar, ivector, imatrix, irow, icol, itensor3, itensor4
* <b>64-bit integers</b>: lscalar, lvector, lmatrix, lrow, lcol, ltensor3, ltensor4
* <b>float</b>: fscalar, fvector, fmatrix, frow, fcol, ftensor3, ftensor4
* <b>double</b>: dscalar, dvector, dmatrix, drow, dcol, dtensor3, dtensor4
* <b>complex</b>: cscalar, cvector, cmatrix, crow, ccol, ctensor3, ctensor4
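The naming scheme in the list above is regular: the first letter of each constructor selects the element type, and the rest of the name selects the number of dimensions. The sketch below expresses the first-letter rule as a lookup table of NumPy dtypes. The table `PREFIX_TO_DTYPE` and the helper `dtype_for` are hypothetical names written for this illustration; they are not part of Theano, and the mapping reflects my understanding of the Theano documentation (for example, `c` as 64-bit complex).

```python
import numpy

# Hypothetical lookup table: first letter of the constructor name -> dtype.
PREFIX_TO_DTYPE = {
    'b': numpy.int8,       # byte
    'w': numpy.int16,      # 16-bit integers
    'i': numpy.int32,      # 32-bit integers
    'l': numpy.int64,      # 64-bit integers
    'f': numpy.float32,    # float
    'd': numpy.float64,    # double
    'c': numpy.complex64,  # complex
}


def dtype_for(constructor_name):
    """Return the NumPy dtype implied by a name such as 'dmatrix'."""
    return numpy.dtype(PREFIX_TO_DTYPE[constructor_name[0]])
```

For example, `dtype_for('dmatrix')` is `float64`, which agrees with the `TensorType(float64, scalar)` shown for `dscalar` earlier, and `dtype_for('ivector')` is `int32`.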

In [33]:
import theano
a = theano.tensor.vector() # declare variable
out = a + a ** 10               # build symbolic expression
f = theano.function([a], out)   # compile function
print f([0, 1, 2])  # prints the array [0., 2., 1026.]

[    0.     2.  1026.]


# More Examples

* Logistic Function
* Computing More than one Thing at the Same Time
* Setting a Default Value for an Argument
* Using Shared Variables
* Using Random Numbers
    - Brief Example
    - Seeding Streams
    - Sharing Streams Between Functions
    - Copying Random State Between Theano Graphs
    - Other Random Distributions
    - Other Implementations
* Real Example: Logistic Regression

## Logistic Function

<img src="http://deeplearning.net/software/theano/_images/math/943718fb001e2e9576e781d97946d74e44de5251.png" />

<img src="http://deeplearning.net/software/theano/_images/logistic.png" />

You want to compute the function elementwise on matrices of doubles, which means that you want to apply this function to each individual element of the matrix.

In [35]:
x = T.dmatrix('x')
s = 1 / (1 + T.exp(-x))
logistic = function([x], s)

In [36]:
logistic([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])

<img src="http://deeplearning.net/software/theano/_images/math/d27229dfcd1ce305c126bbbd8a2e0fa867ccc503.png" />

In [38]:
s2 = (1 + T.tanh(x / 2)) / 2
logistic2 = function([x], s2)

In [39]:
logistic2([[0, 1], [-1, -2]])

array([[ 0.5       ,  0.73105858],
       [ 0.26894142,  0.11920292]])
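The two Theano functions above return the same values because the two formulas are algebraically identical: tanh(x/2) = (1 - exp(-x)) / (1 + exp(-x)), so (1 + tanh(x/2)) / 2 = 1 / (1 + exp(-x)). As a quick numerical cross-check outside Theano, the sketch below evaluates both forms with NumPy on the same input matrix.

```python
import numpy

x = numpy.asarray([[0., 1.], [-1., -2.]])

# Logistic function written with exp.
s = 1 / (1 + numpy.exp(-x))

# Equivalent form written with tanh.
s2 = (1 + numpy.tanh(x / 2)) / 2

# True when the two results agree to within floating-point tolerance.
same = numpy.allclose(s, s2)
```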

## Computing More than one Thing at the Same Time

Theano supports functions with multiple outputs. For example, we can compute the elementwise difference, absolute difference, and squared difference between two matrices a and b at the same time:

In [41]:
a, b = T.dmatrices('a', 'b')
diff = a - b
abs_diff = abs(diff)
diff_squared = diff**2
f = function([a, b], [diff, abs_diff, diff_squared])

In [42]:
f([[1, 1], [1, 1]], [[0, 1], [2, 3]])

[array([[ 1.,  0.],
        [-1., -2.]]), array([[ 1.,  0.],
        [ 1.,  2.]]), array([[ 1.,  0.],
        [ 1.,  4.]])]
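To cross-check the three outputs above, the same quantities can be computed directly with NumPy. This is only a verification sketch: unlike the Theano function, it evaluates each expression eagerly and does not share a compiled graph between the outputs.

```python
import numpy

a = numpy.asarray([[1., 1.], [1., 1.]])
b = numpy.asarray([[0., 1.], [2., 3.]])

diff = a - b              # elementwise difference
abs_diff = abs(diff)      # elementwise absolute difference
diff_squared = diff ** 2  # elementwise squared difference
```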

## Setting a Default Value for an Argument

Let’s say you want to define a function that adds two numbers, except that if you only provide one number, the other input is assumed to be one. You can do it like this:

In [43]:
from theano import Param
x, y = T.dscalars('x', 'y')
z = x + y
f = function([x, Param(y, default=1)], z)

In [44]:
f(33)

array(34.0)

In [45]:
f(33, 2)

array(35.0)

Inputs with default values must follow inputs without default values (like Python’s functions). There can be multiple inputs with default values. These parameters can be set positionally or by name, as in standard Python:

In [46]:
x, y, w = T.dscalars('x', 'y', 'w')
z = (x + y) * w
f = function([x, Param(y, default=1), Param(w, default=2, name='w_by_name')], z)

In [47]:
f(33)

array(68.0)

In [48]:
f(33, 2)

array(70.0)

In [49]:
f(33, 0, 1)

array(33.0)

In [50]:
f(33, w_by_name=1)

array(34.0)

In [51]:
f(33, w_by_name=1, y=0)

array(33.0)
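The calling behavior shown above matches that of an ordinary Python function with default and keyword arguments. The plain-Python analogue below is a sketch for checking the arithmetic, not a Theano function; the parameter name `w_by_name` is reused from the example so the calls line up.

```python
def f(x, y=1, w_by_name=2):
    # Same expression as z = (x + y) * w in the Theano example.
    return (x + y) * w_by_name


results = [
    f(33),                    # (33 + 1) * 2 = 68
    f(33, 2),                 # (33 + 2) * 2 = 70
    f(33, 0, 1),              # (33 + 0) * 1 = 33
    f(33, w_by_name=1),       # (33 + 1) * 1 = 34
    f(33, w_by_name=1, y=0),  # (33 + 0) * 1 = 33
]
```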

## Using Shared Variables

It is also possible to make a function with an internal state. For example, let’s say we want to make an accumulator: at the beginning, the state is initialized to zero. Then, on each function call, the state is incremented by the function’s argument.

In [52]:
from theano import shared
state = shared(0)
inc = T.iscalar('inc')
accumulator = function([inc], state, updates=[(state, state+inc)])

In [53]:
state.get_value()

array(0)

In [54]:
accumulator(1)

array(0)

In [55]:
state.get_value()

array(1)

In [56]:
accumulator(300)

array(1)

In [57]:
state.get_value()

array(301)
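Note that each call to accumulator returns the state as it was before the update: the output is computed from the current value, and only then is the update applied. The plain-Python class below sketches that ordering. `ToyAccumulator` is a hypothetical name for this illustration and does not reflect how Theano stores shared variables internally.

```python
class ToyAccumulator(object):
    """Plain-Python sketch of a function with internal state."""

    def __init__(self, initial=0):
        self.state = initial

    def __call__(self, inc):
        old = self.state        # the output uses the state before the update
        self.state = old + inc  # the update (state, state + inc) is applied afterwards
        return old


acc = ToyAccumulator()
first = acc(1)      # returns 0, state becomes 1
second = acc(300)   # returns 1, state becomes 301
```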

## Using Random Numbers

### Brief Example

### Seeding Streams

### Sharing Streams Between Functions

### Copying Random State Between Theano Graphs

### Other Random Distributions

### Other Implementations

## Real Example: Logistic Regression

In [58]:
import numpy
import theano
import theano.tensor as T
rng = numpy.random

N = 400
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w")
b = theano.shared(0., name="b")
print "Initial model:"
print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w) - b))   # Probability that target = 1
prediction = p_1 > 0.5                    # The prediction thresholded
xent = -y * T.log(p_1) - (1-y) * T.log(1-p_1) # Cross-entropy loss function
cost = xent.mean() + 0.01 * (w ** 2).sum()# The cost to minimize
gw, gb = T.grad(cost, [w, b])             # Compute the gradient of the cost
                                          # (we shall return to this in a
                                          # following section of this tutorial)

# Compile
train = theano.function(
          inputs=[x,y],
          outputs=[prediction, xent],
          updates=((w, w - 0.1 * gw), (b, b - 0.1 * gb)))
predict = theano.function(inputs=[x], outputs=prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

print "Final model:"
print w.get_value(), b.get_value()
print "target values for D:", D[1]
print "prediction on D:", predict(D[0])

Initial model:
[ -1.42111412e-01   6.94340172e-01   1.28997755e-02   1.50638369e+00
   2.82010656e+00  -1.69207620e+00   4.28408181e-01  -5.57769398e-01
   1.07503183e+00  -2.03069110e+00  -3.88104714e-01  -1.06068334e+00
  -2.04697300e+00  -6.10142058e-01  -8.73433858e-01  -1.73390466e+00
  -8.34776497e-01   5.08669506e-01   3.32153194e-01  -1.07191715e+00
   1.27794970e+00   3.80428164e-01  -2.48096498e+00   1.42942676e+00
  -7.07996228e-01   1.58385998e+00   8.85295462e-01   6.17573224e-01
  -2.73795434e-01   5.96132223e-01  -7.48656907e-01   7.23562800e-02
  -3.56461864e-01   2.58109768e-02  -1.27895692e-01  -7.00580793e-02
   1.16765724e+00  -1.21837648e+00   4.79024817e-01  -1.87097567e+00
   1.75591356e+00   7.13428910e-01  -1.27778281e+00   4.43080622e-01
   4.66769970e-01  -9.25411578e-01   5.37686827e-01  -7.49149481e-01
   2.17391478e+00  -1.52930349e-01  -3.88442543e-01   4.43644289e-01
   1.11423181e-01  -6.20008671e-01   7.18401482e-01   1.08682712e-01
   3.56102291e-01  
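In the logistic regression example above, T.grad derives the gradients symbolically. For comparison, the sketch below runs the same model in plain NumPy with the gradients written out by hand: for the cost mean(xent) + 0.01 * sum(w ** 2), the gradient with respect to w is X^T (p_1 - y) / N + 0.02 * w, and the gradient with respect to b is mean(p_1 - y). The fixed seed, the smaller problem size (100 x 20 instead of 400 x 784), the 1000 training steps and the helper name `cost_and_grads` are assumptions made here to keep the sketch quick to run.

```python
import numpy

rng = numpy.random.RandomState(0)  # fixed seed (an assumption, for repeatability)
N, feats = 100, 20                 # smaller than the notebook's 400 x 784
X = rng.randn(N, feats)
y = rng.randint(low=0, high=2, size=N)
w = rng.randn(feats)
b = 0.0


def cost_and_grads(X, y, w, b):
    """Return the regularized cost and its hand-derived gradients."""
    p_1 = 1 / (1 + numpy.exp(-X.dot(w) - b))  # probability that target = 1
    xent = -y * numpy.log(p_1) - (1 - y) * numpy.log(1 - p_1)
    cost = xent.mean() + 0.01 * (w ** 2).sum()
    gw = X.T.dot(p_1 - y) / len(y) + 0.02 * w  # gradient with respect to w
    gb = (p_1 - y).mean()                      # gradient with respect to b
    return cost, gw, gb


initial_cost, _, _ = cost_and_grads(X, y, w, b)

# Gradient descent with the same step size (0.1) as the Theano updates.
for i in range(1000):
    cost, gw, gb = cost_and_grads(X, y, w, b)
    w = w - 0.1 * gw
    b = b - 0.1 * gb

final_cost, _, _ = cost_and_grads(X, y, w, b)
prediction = 1 / (1 + numpy.exp(-X.dot(w) - b)) > 0.5
```

After training, `final_cost` should be lower than `initial_cost`. Because the targets are random, the fit only reflects how well the model memorizes this particular sample.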

# References

* [1] Theano 0.7 Tutorial - http://deeplearning.net/software/theano/tutorial/index.html