# Workflow when using Theano
1. Symbolically define mathematical functions
  * Automatically derive gradient expressions
2. Compile expressions into executable functions
  * theano.function([input params], output)
3. Execute expression

## Building Symbolic Expressions
In Theano, all algorithms are defined symbolically. It's more like writing out math than writing code. The following Theano variables are symbolic; they don't have an explicit value.
1. Tensor
  * Scalars: 0th-order tensors
  * Vectors: 1st-order tensors
  * Matrices: 2nd-order tensors
2. Tensors ...
  * Reductions (e.g. a sum or mean over an axis)
  * Dimshuffle (reordering axes or adding broadcastable ones)

In [58]:
import theano
from theano import tensor as T
from theano import pp
import numpy as np

### Scalar math

In [16]:
x = T.scalar()
y = T.scalar('dd') #Optional name to help with debugging
z = x + y
w = z * x
a = T.sqrt(w)
b = T.exp(a)
c = a ** b
d = T.log(c)

$$c = \sum\limits_{i=1}^n i^2 = \frac{n(n+1)(2n+1)}{6}$$

$$c = \sqrt{a^2 + b^2}$$

### Vector Math

In [11]:
x = T.vector()
y = T.vector()
# Scalar math applied elementwise
a = x * y
# The dot (inner) product of x and y: the sum of the elementwise products, which is a scalar.
b = T.dot(x, y)


$$\vec{a} = \vec{x} \odot \vec{y}$$
$$b = \vec{x} \cdot \vec{y} = \sum_i x_i y_i$$

### Matrix Math

In [20]:
X = T.matrix()
Y = T.matrix()
a = T.vector()
# Matrix-matrix product
B = T.dot(X, Y)
# Matrix-vector product
c = T.dot(X, a)

$$B_{m,p} = X_{m,n} \times Y_{n,p}$$
$$\vec{c}_m = X_{m,n} \times \vec{a}_n$$

## Compiling and executing expressions

### theano.function
To compute anything with Theano, you compile a symbolic expression into a function with `theano.function`. Calling that function with numerical inputs returns a numerical result.

In [25]:
x = T.scalar()
y = T.scalar()

# First arg is a list of SYMBOLIC inputs
# Second arg is the SYMBOLIC output
f = theano.function([x, y], x + y)

# Call it with NUMERICAL values
# Get a NUMERICAL output
f(1., 2.)

array(3.0)

### Shared variables

Shared variables are a little different: they do have an explicit value, which can be read and set, and which is shared across all functions that use the variable. Because the value persists between function calls, they are also useful for holding state.
The value of a shared variable can be updated inside a function through the `updates` argument of `theano.function`.
To read or modify it outside a function, use `get_value` and `set_value`.
Use shared variables for values that change often, such as model parameters. Theano can keep them on the GPU between calls, which helps performance.

In [48]:
from collections import OrderedDict

x = theano.shared(0.)
updates = OrderedDict()
updates[x] = x + 1

# `updates` tells the compiled function how to modify shared variables on each call
f = theano.function([], updates=updates)

f()  # applies the update: x goes from 0.0 to 1.0
x.get_value()

x.set_value(100.)
f()  # applies the update again: x goes from 100.0 to 101.0
x.get_value()

## Manipulating Symbolic Expressions

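Automatic differentiation
`tensor.grad(cost, params)` returns a symbolic expression for the gradient of the scalar `cost` with respect to `params`.
The second argument can also be a list of variables, in which case `grad` returns one partial derivative per variable.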

In [56]:
x = T.dscalar('x')
y = x ** 2
gy = T.grad(y, x)
#pp(gy)

f = theano.function([x], gy)
f(4)

array(8.0)
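
One way to sanity-check a symbolic gradient is to compare it with a finite-difference estimate. The sketch below uses plain Python rather than Theano; `numeric_grad` and the step size `h` are hypothetical names and choices made for illustration, not part of the Theano API.

```python
def numeric_grad(func, x, h=1e-5):
    """Central-difference estimate of d(func)/dx at x."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


def square(x):
    return x ** 2


# For y = x ** 2 the exact derivative is 2 * x, so at x = 4 the estimate
# should be close to 8, the value returned by the compiled gradient function.
estimate = numeric_grad(square, 4.0)
print(estimate)
```

If the estimate and the compiled gradient disagree by more than rounding error, the symbolic expression is the first place to look.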

## Loop: scan
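`theano.scan` expresses a loop symbolically. Each step can carry state forward, so it can be used to accumulate values across iterations, for example a counter or a running sum.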

In [65]:
# define shared variables
k = theano.shared(0)
n_sym = T.iscalar("n_sym")

results, updates = theano.scan(lambda:{k:(k+1)}, n_steps=n_sym)
accumulator = theano.function([n_sym], [], updates=updates, allow_input_downcast=True)

k.get_value()
accumulator(5)
k.get_value()

array(5)
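
To see what the `scan` call computes, it can help to write the same loop in plain Python. This is only an analogy, assuming each step maps the previous state to the next one; `scan_like` is a hypothetical helper, not a Theano function. The second call shows the same pattern accumulating a sum.

```python
def scan_like(step, initial, sequence):
    """Apply step(state, item) for each item and collect every intermediate state."""
    states = []
    state = initial
    for item in sequence:
        state = step(state, item)
        states.append(state)
    return states


# Counter: add 1 on each of 5 steps, like the update k -> k + 1.
counter_states = scan_like(lambda k, _: k + 1, 0, range(5))

# Running sum of 1..5: the state carries the partial sum forward.
running_sums = scan_like(lambda total, value: total + value, 0, [1, 2, 3, 4, 5])

print(counter_states[-1])
print(running_sums)
```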

# Linear regression

Model: $$\hat{y} = m x + c$$
Cost function: $$J = \frac{1}{2N} \sum_{i=1}^{N} (\hat{y}_i - y_i)^2$$

In [119]:
rng = np.random
X = np.asarray([3,4,5,6.1,6.3,2.88,8.89,5.62,6.9,1.97,8.22,9.81,4.83,7.27,5.14,3.08])
Y = np.asarray([0.9,1.6,1.9,2.9,1.54,1.43,3.06,2.36,2.3,1.11,2.57,3.15,1.5,2.64,2.20,1.21])
N = X.shape[0]
training_steps = 10
learning_rate = 0.01

# Declare Theano symbolic variables
m = theano.shared(rng.rand(), name = 'm')
c = theano.shared(rng.rand(), name = 'c')
x = T.vector('X')
y = T.vector('y')

# Construct Theano expression graph
prediction = x * m + c  # m is a scalar, so this is one prediction per sample
cost = T.sum(T.pow(prediction-y,2))/(2*N)
g_m, g_c = T.grad(cost,[m, c])

# Compile
train = theano.function(inputs = [x,y],
                        outputs = cost,
                        updates = [(m,m-learning_rate*g_m), (c,c-learning_rate*g_c)])
test = theano.function([x], prediction)

# Train
for i in range(training_steps):
    val = train(X,Y)
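
To make explicit what `T.grad` derives here, the sketch below runs the same gradient-descent loop in plain Python with the gradients written out by hand. The starting values `m = 0` and `c = 0` are an assumption made so the run is reproducible (the cell above starts from random values), and `cost_and_grads` is a hypothetical helper name.

```python
X = [3, 4, 5, 6.1, 6.3, 2.88, 8.89, 5.62, 6.9, 1.97, 8.22, 9.81, 4.83, 7.27, 5.14, 3.08]
Y = [0.9, 1.6, 1.9, 2.9, 1.54, 1.43, 3.06, 2.36, 2.3, 1.11, 2.57, 3.15, 1.5, 2.64, 2.20, 1.21]
N = len(X)
learning_rate = 0.01


def cost_and_grads(m, c):
    """Return the cost J and the partial derivatives dJ/dm and dJ/dc."""
    residuals = [m * x + c - y for x, y in zip(X, Y)]
    cost = sum(r ** 2 for r in residuals) / (2 * N)
    g_m = sum(r * x for r, x in zip(residuals, X)) / N
    g_c = sum(residuals) / N
    return cost, g_m, g_c


m, c = 0.0, 0.0  # fixed start (an assumption) instead of random initial values
costs = []
for _ in range(10):
    cost, g_m, g_c = cost_and_grads(m, c)
    costs.append(cost)
    # Same update rule as the `updates` list passed to theano.function
    m -= learning_rate * g_m
    c -= learning_rate * g_c

print(costs[0], costs[-1])
```

The recorded cost should shrink from the first step to the last, which is the behavior to expect from `train` as well.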

# Logistic regression / Softmax (Binary classification)

Discriminative function
$$p(y = 1|x) = {1 \over 1+\exp(-w \cdot x-b)}$$

Objective function (Cross-entropy)
$$ J = -y \log p - (1 - y)\log(1 - p)$$
Model: $$\hat{y} = [p > 0.5]$$
Cost function: $$C = \frac{1}{N} \sum_{i=1}^{N} J_i + 0.01 \sum_j w_j^2$$

In [100]:
rng = np.random

N = 400 # number of samples
feats = 784 # dimensionality of features

# Randomly generated training data: (inputs, binary labels)
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2)) #(X, y)
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w = theano.shared(rng.randn(feats), name="w") # vector, one weight per feature
b = theano.shared(0., name="b") # scalar
#print "Initial model: "
#print w.get_value(), b.get_value()

# Construct Theano expression graph
p_1 = 1 / (1 + T.exp(-T.dot(x, w)-b)) # probability that target = 1
prediction = p_1 > 0.5 # the prediction threshold
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # cross-entropy loss function
cost = xent.mean() + 0.01 * (w**2).sum() # the cost to minimize
gw, gb = T.grad(cost, [w, b])

# Compile
train = theano.function(
                inputs = [x,y],
                outputs = [prediction, xent],
                updates = [(w, w-0.1*gw), (b, b-0.1*gb)]) # an OrderedDict also works
predict = theano.function(inputs = [x], outputs = prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])
    
#print 'Final model:'
#print w.get_value(), b.get_value()
#print 'target values for D: ', D[1]
#print 'predictions on D: ', predict(D[0])
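
For a single example, the pieces of the expression graph above can be written out in plain Python. This is a sketch for illustration only: the two-feature weights and input are made-up values, and `predict_proba` is a hypothetical helper name.

```python
import math


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def cross_entropy(y, p):
    """Binary cross-entropy for a label y in {0, 1} and a probability p in (0, 1)."""
    return -y * math.log(p) - (1 - y) * math.log(1 - p)


def predict_proba(x, w, b):
    """p(y = 1 | x) for a feature list x, weight list w and bias b."""
    return sigmoid(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)


# Hypothetical two-feature example: w . x + b = 2 - 1 + 0.5 = 1.5
w, b = [2.0, -1.0], 0.5
p = predict_proba([1.0, 1.0], w, b)

print(p > 0.5)                                     # the thresholded prediction
print(cross_entropy(1, p) < cross_entropy(0, p))   # label 1 is the cheaper guess
```

The loss is small when the predicted probability agrees with the label and grows without bound as the prediction becomes confidently wrong.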

# Multi-Layer Perceptron (Hidden layer(s))

Discriminative function
$$p(y=1|x) = f(w_2 \cdot g(w_1 \cdot x + b_1) + b_2)$$ (f and g can be sigmoid or tanh functions)

Objective function (Cross-entropy)
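$$ J = -y \log p - (1 - y)\log(1 - p)$$

Model: $$\hat{y} = [p > 0.5]$$
Cost function: $$C = \frac{1}{N} \sum_{i=1}^{N} J_i + 0.01 \sum_k w_k^2$$ (the sum over $k$ runs over the regularized weights)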

In [115]:
rng = np.random
N = 400 #number of samples
feats = 784
D = (rng.randn(N, feats), rng.randint(size=N, low=0, high=2))
training_steps = 10000

# Declare Theano symbolic variables
x = T.matrix("x")
y = T.vector("y")
w_1 = theano.shared(rng.randn(784,300), name="w1")
b_1 = theano.shared(np.zeros((300,)), name="b1")
w_2 = theano.shared(rng.randn(300), name="w2")
b_2 = theano.shared(0., name="b2")

# Construct Theano expression graph
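# T.nnet.sigmoid(z) already computes 1 / (1 + exp(-z)), so its argument is not negated
hidden = T.nnet.sigmoid(T.dot(x, w_1) + b_1) # hidden layer activations
p_1 = T.nnet.sigmoid(T.dot(hidden, w_2) + b_2) # probability target = 1
prediction = p_1 > 0.5 # prediction threshold
xent = -y*T.log(p_1) - (1-y)*T.log(1-p_1) # Cross-entropy loss func
cost = xent.mean() + 0.01 * ((w_1**2).sum() + (w_2**2).sum()) # The cost to minimize, regularizing this model's weights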
gw_1, gb_1, gw_2, gb_2 = T.grad(cost, [w_1, b_1, w_2, b_2])

# Compile
train = theano.function(
                inputs = [x, y],
                outputs = [prediction, xent],
                updates = [(w_1, w_1-0.1*gw_1), (b_1,  b_1-0.1*gb_1), (w_2,  w_2-0.1*gw_2), (b_2, b_2-0.1*gb_2)])
predict = theano.function(inputs = [x], outputs = prediction)

# Train
for i in range(training_steps):
    pred, err = train(D[0], D[1])

# Recurrent Neural Network
Use `theano.scan` to implement the loop over time steps.
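
As a rough illustration of the kind of loop that `scan` would express, here is a plain-Python sketch of a scalar recurrent update. The recurrence $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$ and every name and value below are assumptions chosen for illustration; these notes do not specify a particular network.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run h_t = tanh(w_x * x_t + w_h * h_prev + b) over a sequence of scalars."""
    h = h0
    hidden_states = []
    for x_t in inputs:
        # The new hidden state depends on the current input and the previous state
        h = math.tanh(w_x * x_t + w_h * h + b)
        hidden_states.append(h)
    return hidden_states


# Hypothetical weights and a three-step input sequence
states = rnn_forward([1.0, 0.0, -1.0], w_x=0.5, w_h=0.8, b=0.0)
print(states)
```

In a `scan` version, the loop body would become the step function and the hidden state would be the value carried from one step to the next.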