# 02. Graph Structures, Printing/Drawing Theano graphs, Derivatives in Theano

# Contents

* Graph Structures
* Printing/Drawing Theano graphs
* Derivatives in Theano

# Graph Structures

* Theano Graphs
* Automatic Differentiation
* Optimizations

## Theano Graphs

* This chapter introduces the minimum you need to know about the inner workings of Theano. For more detail, see Extending Theano.
    - variables : The first step in writing Theano code is to write down all mathematical relations using symbolic placeholders (variables).
    - ops : The expressions you write use operations such as +, -, **, sum(), and tanh(). Internally, each of these is represented as an op.
* Internally, Theano builds a graph structure composed of interconnected
    - variable nodes
    - op nodes
    - apply nodes : An apply node represents the application of an op to some variables.

In [1]:
import theano.tensor as T

x = T.dmatrix('x')
y = T.dmatrix('y')
z = x + y

<img src="http://deeplearning.net/software/theano/_images/apply1.png" />

Interaction between instances of Apply (blue), Variable (red), Op (green), and Type (purple).

In [2]:
import theano
x = theano.tensor.dmatrix('x')
y = x * 2.

In [3]:
y.owner

Elemwise{mul,no_inplace}(x, DimShuffle{x,x}.0)

In [4]:
type(y.owner)

theano.gof.graph.Apply

In [5]:
y.owner.op.name

'Elemwise{mul,no_inplace}'

In [6]:
len(y.owner.inputs)

2

In [7]:
y.owner.inputs[0]

x

In [8]:
y.owner.inputs[1]

DimShuffle{x,x}.0

In [9]:
type(y.owner.inputs[1])

theano.tensor.var.TensorVariable

In [10]:
type(y.owner.inputs[1].owner)

theano.gof.graph.Apply

In [11]:
y.owner.inputs[1].owner.op

<theano.tensor.elemwise.DimShuffle at 0x7f195e14d910>

In [12]:
y.owner.inputs[1].owner.inputs

[TensorConstant{2.0}]
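
The session above follows the `owner` and `inputs` links by hand. The same walk can be sketched without Theano by using small stand-in classes. `Var`, `Apply`, `apply_op`, and `describe` are hypothetical names chosen for illustration; they are not Theano's classes and only mirror the `owner`, `op`, and `inputs` attributes inspected above.

```python
# Toy stand-ins for Theano's graph nodes (illustrative only, not Theano's API).

class Var(object):
    """A variable node; `owner` is the Apply node that produced it, or None."""
    def __init__(self, name, owner=None):
        self.name = name
        self.owner = owner


class Apply(object):
    """An apply node: one op (here just a string) applied to input variables."""
    def __init__(self, op, inputs):
        self.op = op
        self.inputs = list(inputs)


def apply_op(op, inputs, out_name):
    """Create an Apply node and return its single output variable."""
    return Var(out_name, owner=Apply(op, inputs))


def describe(var, depth=0):
    """Return indented lines walking from `var` back to the graph inputs."""
    indent = "  " * depth
    if var.owner is None:          # graph input or constant: nothing above it
        return [indent + var.name]
    lines = [indent + var.owner.op]
    for inp in var.owner.inputs:
        lines.extend(describe(inp, depth + 1))
    return lines


# Rebuild the shape of y = x * 2. from the session above.
x = Var("x")
two = Var("2.0")
two_b = apply_op("DimShuffle", [two], "two_broadcast")
y = apply_op("mul", [x, two_b], "y")

print("\n".join(describe(y)))
```

The printed tree has `mul` at the root, with `x` and a `DimShuffle` node as its inputs, and the constant `2.0` under `DimShuffle`. This matches the structure found by inspecting `y.owner` step by step.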

## Automatic Differentiation

* Given the graph structure, automatic differentiation is straightforward. tensor.grad() only has to traverse the graph from the outputs back towards the inputs through all apply nodes (the apply nodes are the ones that define which computations the graph performs).
* For each apply node, its op defines how to compute the gradient of the node’s outputs with respect to its inputs. If an op does not provide this information, the gradient is assumed to be undefined.
* Using the chain rule, these gradients are composed to obtain the expression for the gradient of the graph’s output with respect to the graph’s inputs.
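
The traversal described above can be sketched in plain Python. This is a minimal scalar illustration under simplifying assumptions, not Theano's implementation: the `Node` class, the `add` and `mul` helpers, and the `grad` function are hypothetical names, and each node simply stores the local derivatives of the op that produced it.

```python
# Minimal reverse-mode differentiation over a scalar graph (illustration only).

class Node(object):
    """A value plus the inputs and local gradients of the op that produced it."""
    def __init__(self, value, inputs=(), local_grads=()):
        self.value = value
        self.inputs = inputs            # parent nodes (empty for graph inputs)
        self.local_grads = local_grads  # d(self)/d(input), one per input


def add(a, b):
    return Node(a.value + b.value, (a, b), (1.0, 1.0))


def mul(a, b):
    return Node(a.value * b.value, (a, b), (b.value, a.value))


def grad(output, wrt):
    """Return d(output)/d(wrt) by walking from the output back to the inputs."""
    # Build a topological order (inputs first) so each node is visited once.
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for inp in node.inputs:
            visit(inp)
        order.append(node)

    visit(output)

    # Accumulate gradients from the output towards the inputs (chain rule).
    grads = {id(output): 1.0}
    for node in reversed(order):
        g = grads.get(id(node), 0.0)
        for inp, local in zip(node.inputs, node.local_grads):
            grads[id(inp)] = grads.get(id(inp), 0.0) + g * local
    return grads.get(id(wrt), 0.0)


x = Node(3.0)
y = add(mul(x, x), x)    # y = x*x + x, so dy/dx = 2*x + 1
print(grad(y, x))        # 7.0 at x = 3
```

Because `x` feeds into the graph along three paths (twice through `mul`, once through `add`), its gradient is the sum of the contributions from each path. That is the chain rule applied over the whole graph.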

## Optimizations

In [13]:
import theano

In [14]:
a = theano.tensor.vector("a")      # declare symbolic variable

In [15]:
b = a + a ** 10                    # build symbolic expression

In [16]:
f = theano.function([a], b)        # compile function

In [17]:
print(f([0, 1, 2]))

[    0.     2.  1026.]


#### Unoptimized graph

In [18]:
theano.printing.pydotprint(b, outfile="./symbolic_graph_unopt.png", var_with_name_simple=True)

The output file is available at ./symbolic_graph_unopt.png


<img src="./symbolic_graph_unopt.png" />

#### Optimized graph

In [19]:
theano.printing.pydotprint(f, outfile="./symbolic_graph_opt.png", var_with_name_simple=True)

The output file is available at ./symbolic_graph_opt.png


<img src="./symbolic_graph_opt.png" />

# Printing/Drawing Theano graphs

* Pretty Printing
* Debug Printing
* Picture Printing

# Derivatives in Theano

* Computing Gradients
* Computing the Jacobian
* Computing the Hessian
* Jacobian times a Vector
    - R-operator
    - L-operator
* Hessian times a Vector
* Final Pointers

# References

* [1] Theano 0.7 Tutorial - http://deeplearning.net/software/theano/tutorial/index.html