# Starting with TensorFlow

In this notebook we'll study the basic features of [TensorFlow](https://www.tensorflow.org/).

## Python example
Let's implement a Python function that computes the sum of squares of numbers from 0 to N-1.
We'll use two methods: plain Python and NumPy.
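
As a sanity check for either implementation, the sum has a closed form, $\sum_{x=0}^{N-1} x^2 = \frac{(N-1)N(2N-1)}{6}$. The sketch below is not part of the original notebook and its function names are illustrative only; it compares the closed form with direct summation.

```python
def sum_of_squares_closed_form(n):
    """Sum of x**2 for x in range(n), via the identity (n-1)*n*(2n-1)/6."""
    # Integer division is exact here: the product is always divisible by 6.
    return (n - 1) * n * (2 * n - 1) // 6


def sum_of_squares_loop(n):
    """Direct summation with a plain Python generator expression."""
    return sum(x ** 2 for x in range(n))


# The two versions should agree for any non-negative n.
for n in (0, 1, 10, 1000):
    assert sum_of_squares_closed_form(n) == sum_of_squares_loop(n)

print(sum_of_squares_closed_form(10))  # 285
```

Python integers do not overflow, so the closed form is also a convenient reference when checking fixed-width integer results.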

In [None]:
def sum_(N):
    return sum([x**2 for x in range(N)])

In [None]:
%%time
print(sum_(10**6))

In [None]:
import numpy as np

# The same function, vectorized with NumPy
def sum_np(N):
    return np.sum(np.arange(N)**2)

In [None]:
%%time
print(sum_np(10**6))

## TensorFlow translation

Now let's do the very same computation with TensorFlow.

In [None]:
import tensorflow as tf
import os

tf.logging.set_verbosity(tf.logging.INFO)

# Let's reset the default graph
tf.reset_default_graph()

# Let's start a new interactive session
sess = tf.InteractiveSession()

In [None]:
# An integer parameter
N = tf.placeholder('int64', name="input_value")

# A recipe on how to produce the same result
result = tf.reduce_sum(tf.range(N)**2, name="reduce_sum")

In [None]:
N

In [None]:
result

In [None]:
%%time
result.eval({N: 10**6}) # evaluate graph: method 1

In [None]:
%%time
sess.run(result, {N:10**6}) # evaluate graph: method 2

In [None]:
# Graph definition in sess.graph 
# Let's enable the Graph Visualize 

writer = tf.summary.FileWriter("/tmp/tboard/1", graph=sess.graph)

In [None]:
# Let's run TensorBoard
os.system("tensorboard --logdir=/tmp/tboard/1")

In [None]:
sess.close()

If you run the notebook locally, you should be able to access TensorBoard at http://127.0.0.1:6006/

## How does it work?
1. Define <font color='red'>placeholders</font> where you'll send the <font color='red'>inputs</font>
2. Make symbolic graph: a recipe for mathematical transformation of those placeholders
3. Compute outputs of your graph with particular values for each placeholder:
  * `output.eval({placeholder:value})`
  * `sess.run(output, {placeholder:value})`
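
The three steps above can be imitated in plain Python to show what "symbolic" means: building the graph computes nothing, and values appear only when the graph is evaluated with a feed dictionary. The `Node` class below is a hypothetical toy, not TensorFlow's implementation or API.

```python
class Node:
    """Toy stand-in for a symbolic tensor: it stores a recipe, not a value."""

    def __init__(self, fn=None, inputs=()):
        self.fn = fn          # how to compute this node from its inputs
        self.inputs = inputs  # upstream nodes this node depends on

    def eval(self, feed):
        # A fed value (e.g. for a placeholder) takes priority over the recipe.
        if self in feed:
            return feed[self]
        if self.fn is None:
            raise ValueError("placeholder was not fed a value")
        return self.fn(*(node.eval(feed) for node in self.inputs))


# 1. Define a placeholder (a node without a recipe).
n_input = Node()

# 2. Build the graph: range -> square -> sum. Nothing is computed yet.
range_node = Node(lambda n: list(range(n)), (n_input,))
squares_node = Node(lambda xs: [x ** 2 for x in xs], (range_node,))
result_node = Node(sum, (squares_node,))

# 3. Evaluate the graph for a particular placeholder value.
print(result_node.eval({n_input: 10}))  # 285
```

The same graph can be evaluated again with a different value for `n_input` without being rebuilt, which is the point of separating graph definition from execution.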

So far there are two main entities: "placeholder" and "transformation"
* Both can be numbers, vectors, matrices, tensors, etc.
* Both can be int32/64, floats, or booleans of various sizes.

* You can define new transformations as an arbitrary operation on placeholders and other transformations
 * `tf.reduce_sum(tf.range(N)**2)` is a chain of 3 sequential transformations of placeholder `N`
 * There's a tensorflow symbolic version for most numpy functions
   * `a+b, a/b, a**b, ...` behave just like in numpy
   * `np.mean` -> `tf.reduce_mean`
   * `np.arange` -> `tf.range`
   * `np.cumsum` -> `tf.cumsum`
   * If you can't find the op you need, see the [docs](https://www.tensorflow.org/api_docs/python).
   
`tf.contrib` has many high-level features and may be worth a look.

In [None]:
# Let's reset the default graph
tf.reset_default_graph()

# Let's start a new interactive session
sess = tf.InteractiveSession()

In [None]:
with tf.name_scope("Placeholders_examples"):
    # Default placeholder that can be arbitrary float32
    # scalar, vector, matrix, etc.
    arbitrary_input = tf.placeholder('float32')

    # Input vector of arbitrary length
    input_vector = tf.placeholder('float32', shape=(None,))

    # Input vector that must have 10 elements and integer type
    fixed_vector = tf.placeholder('int32', shape=(10,))

    # Matrix of arbitrary number of rows and 10 columns
    # (e.g. a minibatch of your data table)
    input_matrix = tf.placeholder('float32', shape=(None, 10))
    
    # You can generally use None whenever you don't need a specific shape
    input1 = tf.placeholder('float64', shape=(None, 100, None))
    input2 = tf.placeholder('int32', shape=(None, None, 3, 224, 224))

    # elementwise multiplication
    double_the_vector = input_vector*2

    # elementwise cosine
    elementwise_cosine = tf.cos(input_vector)

    # difference between squared vector and vector itself plus one
    vector_squares = input_vector**2 - input_vector + 1

In [None]:
with tf.name_scope("transformation"):
    my_vector =  tf.placeholder('float32', shape=(None,), name="VECTOR_1")
    my_vector2 = tf.placeholder('float32', shape=(None,))
    my_transformation = my_vector * my_vector2 / (tf.sin(my_vector) + 1)

In [None]:
print(my_transformation)

In [None]:
dummy = np.arange(5).astype('float32')
print(dummy)
my_transformation.eval({my_vector:dummy, my_vector2:dummy[::-1]})

In [None]:
# Graph definition in sess.graph 
# Let's enable the Graph Visualize 

writer = tf.summary.FileWriter("/tmp/tboard/2", graph=sess.graph)

In [None]:
writer.add_graph(my_transformation.graph)
writer.flush()

In [None]:
# Let's run TensorBoard
os.system("tensorboard --logdir=/tmp/tboard/2")

In [None]:
sess.close()

TensorBoard can also record scalars, images, audio, and histograms. You can read more on TensorBoard usage [here](https://www.tensorflow.org/get_started/graph_viz).

## Summary
* Tensorflow is based on computation graphs
* The graphs consist of placeholders and transformations

# Mean squared error

Your assignment is to implement mean squared error in tensorflow.
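
When debugging the assignment, it helps to have a reference value to compare against. The plain-Python function below is a minimal sketch for that purpose; the name `mse_reference` is illustrative and not part of the original notebook.

```python
def mse_reference(y_true, y_predicted):
    """Mean squared error of two equal-length sequences of numbers."""
    if len(y_true) != len(y_predicted):
        raise ValueError("inputs must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_predicted)]
    return sum(squared_errors) / len(squared_errors)


# Squared errors are 0, 0 and 4, so the mean is 4/3.
print(mse_reference([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

Feeding the same two vectors to the TensorFlow version should give the same number up to float32 rounding.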

In [None]:
# Let's reset the default graph
tf.reset_default_graph()

# Let's start a new interactive session
sess = tf.InteractiveSession()

In [None]:
with tf.name_scope("MSE"):
    y_true = tf.placeholder("float32", shape=(None,), name="y_true")
    y_predicted = tf.placeholder("float32", shape=(None,), name="y_predicted")
    mse = tf.reduce_mean((y_true-y_predicted)**2)
    
def compute_mse(vector1, vector2):
    return mse.eval({y_true: vector1, y_predicted: vector2})

In [None]:
# Graph definition in sess.graph 
# Let's enable the Graph Visualize 

writer = tf.summary.FileWriter("/tmp/tboard/3", graph=sess.graph)

In [None]:
writer.add_graph(mse.graph)
writer.flush()

In [None]:
os.system("tensorboard --logdir=/tmp/tboard/3")

In [None]:
sess.close()

# Variables

The inputs and transformations have no value outside of a `sess.run(...)` or `.eval(...)` call. This is inconvenient if you want your model to have parameters (e.g. network weights) that are always present but can change their value over time.

Tensorflow solves this with `tf.Variable` objects.
* You can assign a variable a value at any time in your graph
* Unlike placeholders, there's no need to explicitly pass values to variables when `sess.run(...)`-ing
* You can use variables the same way you use transformations 
 

In [None]:
# Let's start a new (non-interactive) session
sess = tf.Session()

In [None]:
# Creating a shared variable
shared_vector_1 = tf.Variable(initial_value=np.ones(5),
                              name="example_variable")

In [None]:
# Initialize variable(s) with initial values
sess.run(tf.global_variables_initializer())

# Evaluating shared variable (outside symbolic graph)
print("Initial value", sess.run(shared_vector_1))

# Within the symbolic graph you use them just like any other input or transformation, no "get value" needed

In [None]:
# Setting a new value
sess.run(shared_vector_1.assign(np.arange(5)))

# Getting that new value
print("New value", sess.run(shared_vector_1))

# Gradients
* Tensorflow can compute derivatives and gradients automatically using the computation graph
* True to its name, it can manage matrix derivatives
* Gradients are computed as a product of elementary derivatives via the chain rule:

$$ {\partial f(g(x)) \over \partial x} = {\partial f(g(x)) \over \partial g(x)}\cdot {\partial g(x) \over \partial x} $$

It can get you the derivative of any graph as long as it knows how to differentiate each elementary operation in it.
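
To make the chain rule concrete, here is a small plain-Python check, independent of TensorFlow. The choice $g(x) = x^2$ and $f(u) = \sin(u)$ is an arbitrary example picked for illustration: the derivative is assembled from the two elementary derivatives and compared with a central finite difference.

```python
import math


def g(x):
    return x ** 2


def f(u):
    return math.sin(u)


def chain_rule_derivative(x):
    """d f(g(x)) / dx as a product of elementary derivatives."""
    df_dg = math.cos(g(x))  # derivative of sin(u) with respect to u, at u = g(x)
    dg_dx = 2 * x           # derivative of x**2 with respect to x
    return df_dg * dg_dx


def finite_difference(func, x, eps=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + eps) - func(x - eps)) / (2 * eps)


x0 = 0.7
analytic = chain_rule_derivative(x0)
numeric = finite_difference(lambda x: f(g(x)), x0)
assert abs(analytic - numeric) < 1e-6
print(analytic, numeric)
```

The same finite-difference comparison is a handy way to sanity-check gradients returned by an automatic differentiation tool.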

In [None]:
my_scalar = tf.placeholder('float32')

scalar_squared = my_scalar**2

# The derivative of scalar_squared with respect to my_scalar
derivative = tf.gradients(scalar_squared, [my_scalar,])

In [None]:
derivative

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

x = np.linspace(-3, 3)
x_squared, x_squared_der = sess.run([scalar_squared, derivative[0]],
                                 {my_scalar:x})

plt.plot(x, x_squared,label="$x^2$")
plt.plot(x, x_squared_der, label=r"$\frac{dx^2}{dx}$")
plt.legend();

In [None]:
my_vector = tf.placeholder('float32', [None])
# Compute the gradient of the following weird function with respect to my_scalar and my_vector
mixed_function = tf.reduce_mean(
    (my_vector+my_scalar)**(1+tf.nn.moments(my_vector,[0])[1]) + 
    1./ tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(
    2*my_scalar**1.5)*(tf.reduce_sum(my_vector)* my_scalar**2
                      )*tf.exp((my_scalar-4)**2)/(
    1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2)
                                    )/(1+tf.exp(-(my_scalar-4)**2)))**2

der_by_scalar = tf.gradients(mixed_function, my_scalar)
der_by_vector = tf.gradients(mixed_function, my_vector)

In [None]:
# Plotting the derivative
scalar_space = np.linspace(1, 7, 100)

y = [sess.run(mixed_function, {my_scalar:x, my_vector:[1, 2, 3]})
     for x in scalar_space]

plt.plot(scalar_space, y, label='function')

y_der_by_scalar = [sess.run(der_by_scalar,
                         {my_scalar:x, my_vector:[1, 2, 3]})
                   for x in scalar_space]

plt.plot(scalar_space, y_der_by_scalar, label='derivative')
plt.grid()
plt.legend();

# Optimizers

While you can perform gradient descent by hand with automatic grads from above, tensorflow also has some optimization methods implemented for you. Recall momentum & rmsprop?
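
For intuition, here is momentum gradient descent done by hand in plain Python. This is a sketch under assumptions: it uses the classical momentum update (velocity = momentum * velocity + gradient, then parameter -= learning_rate * velocity), minimizes a noise-free mean squared distance to a fixed target, and all function names are illustrative.

```python
def loss_gradient(guess, target):
    """Gradient of mean((guess - target)**2) with respect to guess."""
    n = len(guess)
    return [2.0 * (g - t) / n for g, t in zip(guess, target)]


def momentum_descent(target, learning_rate=0.01, momentum=0.5, steps=400):
    """Minimize the mean squared distance to target, starting from zeros."""
    guess = [0.0] * len(target)
    velocity = [0.0] * len(target)
    for _ in range(steps):
        grad = loss_gradient(guess, target)
        # Accumulate a decaying sum of past gradients.
        velocity = [momentum * v + g for v, g in zip(velocity, grad)]
        # Step against the accumulated direction.
        guess = [w - learning_rate * v for w, v in zip(guess, velocity)]
    return guess


final_guess = momentum_descent([1.0, 2.0])
print(final_guess)  # close to [1.0, 2.0]
```

With momentum 0.5 the long-run step size is roughly doubled compared to plain gradient descent with the same learning rate, which is why 400 small steps are enough here.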

In [None]:
y_guess = tf.Variable(np.zeros(2, dtype='float32'))
y_true = tf.range(1, 3, dtype='float32')
loss = tf.reduce_mean((y_guess - y_true + tf.random_normal([2]))**2)  
optimizer = tf.train.MomentumOptimizer(0.01, 0.5).minimize(loss, var_list=[y_guess])

In [None]:
import matplotlib_utils
from matplotlib import animation, rc
from IPython.display import HTML, display_html

fig, ax = plt.subplots()
y_true_value = sess.run(y_true)
level_x = np.arange(0, 2, 0.02)
level_y = np.arange(0, 3, 0.02)
X, Y = np.meshgrid(level_x, level_y)
Z = (X - y_true_value[0])**2 + (Y - y_true_value[1])**2
ax.set_xlim(-0.02, 2)
ax.set_ylim(-0.02, 3)
sess.run(tf.global_variables_initializer())
ax.scatter(*sess.run(y_true), c='red')
contour = ax.contour(X, Y, Z, 10)
ax.clabel(contour, inline=1, fontsize=10)
line, = ax.plot([], [], lw=2)

def init():
    line.set_data([], [])
    return (line,)

guesses = [sess.run(y_guess)]

def animate(i):
    sess.run(optimizer)
    guesses.append(sess.run(y_guess))
    line.set_data(*zip(*guesses))
    return (line,)

anim = animation.FuncAnimation(fig, animate, init_func=init,
                               frames=400, interval=20, blit=True)

anim.save(None, writer=matplotlib_utils.SimpleMovieWriter(0.001))

# Logistic regression
Now let's implement logistic regression.

Plan:
* Use a shared variable for weights
* Use a matrix placeholder for `X`
 
The training is done on a two-class subset (digits 0 and 1) of scikit-learn's 8x8 handwritten digits dataset, a small MNIST-like dataset
* Please note that the targets `y` are `{0,1}` and not `{-1,1}` as in some formulae

In [None]:
my_scalar = tf.placeholder('float32')
my_vector = tf.placeholder('float32', [None])
# Re-create the weird function from above; it is used below to generate deterministic test weights
mixed_function = tf.reduce_mean(
    (my_vector+my_scalar)**(1+tf.nn.moments(my_vector,[0])[1]) + 
    1./ tf.atan(my_scalar))/(my_scalar**2 + 1) + 0.01*tf.sin(
    2*my_scalar**1.5)*(tf.reduce_sum(my_vector)* my_scalar**2
                      )*tf.exp((my_scalar-4)**2)/(
    1+tf.exp((my_scalar-4)**2))*(1.-(tf.exp(-(my_scalar-4)**2)
                                    )/(1+tf.exp(-(my_scalar-4)**2)))**2

In [None]:
from sklearn.datasets import load_digits
mnist = load_digits(n_class=2) # loads images of only two digits (0/1)

X, y = mnist.data, mnist.target

print("y [shape - %s]:" % (str(y.shape)))
print("X [shape - %s]:" % (str(X.shape)))

In [None]:
plt.imshow(X[5].reshape([8,8]));

It's your turn now!
Just a small reminder of the relevant math:

$$
P(y=1|X) = \sigma(X \cdot W + b)
$$
$$
\text{loss} = -\log\left(P\left(y_\text{predicted} = 1\right)\right)\cdot y_\text{true} - \log\left(1 - P\left(y_\text{predicted} = 1\right)\right)\cdot\left(1 - y_\text{true}\right)
$$

$\sigma(x)$ is available via `tf.nn.sigmoid` and matrix multiplication via `tf.matmul`
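
The two formulas can be checked on a single example in plain Python before translating them to TensorFlow. This is a minimal sketch with illustrative function names; it handles one object at a time, whereas the graph version should work on a whole batch and average the loss.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """P(y=1 | x) for one feature vector x, weights w and bias b."""
    return sigmoid(sum(xi * wi for xi, wi in zip(x, w)) + b)


def log_loss(p, y_true):
    """Cross-entropy loss for one example with label y_true in {0, 1}."""
    return -math.log(p) * y_true - math.log(1.0 - p) * (1.0 - y_true)


# With zero weights and zero bias the model is maximally unsure: p = 0.5,
# and the loss equals log(2) regardless of the label.
p = predict_proba([1.0, 2.0], [0.0, 0.0], 0.0)
print(p, log_loss(p, 1), log_loss(p, 0))
```

Since the notebook initializes `weights` and `b` to zeros, a loss near `log(2)` (about 0.693) before the first training step is a quick sign that the loss is wired correctly.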

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42) # 75% data in train set

In [None]:
# Model parameters - weights and bias
weights = tf.Variable(np.zeros([X.shape[1],1], dtype='float32'), name = "W") 
b = tf.Variable(0., dtype='float32', name = "b")

In [None]:
# Placeholders for the input data
input_X = tf.placeholder('float32', shape=(None,X.shape[1]))
input_y = tf.placeholder('float32', shape=(None,))

In [None]:
# The model

# Compute a vector of predictions, resulting shape should be [input_X.shape[0],]
predicted_y = tf.squeeze(tf.nn.sigmoid(tf.matmul(input_X, weights) + b)) # <predicted probabilities for input_X>
# Loss. Should be a scalar number - average loss over all the objects
loss = tf.reduce_mean(-input_y*tf.log(predicted_y)-(1-input_y)*tf.log(1-predicted_y)) 
optimizer = tf.train.GradientDescentOptimizer(0.001).minimize(loss)

A test to help with the debugging

In [None]:
validation_weights = 1e-3 * np.fromiter(map(lambda x:
        sess.run(mixed_function, {my_scalar:x, my_vector:[1, 0.1, 2]}),
                                   0.15 * np.arange(1, X.shape[1] + 1)),
                                   count=X.shape[1], dtype=np.float32)[:, np.newaxis]
# Compute predictions for given weights and bias
prediction_validation = sess.run(
    predicted_y, {
    input_X: X,
    weights: validation_weights,
    b: 1e-1})

# Load the reference values for the predictions
validation_true_values = np.loadtxt("validation_predictons.txt")

assert prediction_validation.shape == (X.shape[0],),\
       "Predictions must be a 1D array with length equal to the number " \
       "of examples in input_X"
assert np.allclose(validation_true_values, prediction_validation)
loss_validation = sess.run(
        loss, {
            input_X: X[:100],
            input_y: y[-100:],
            weights: validation_weights+1.21e-3,
            b: -1e-1})
assert np.allclose(loss_validation, 0.728689)

In [None]:
from sklearn.metrics import roc_auc_score
sess.run(tf.global_variables_initializer())
for i in range(5):
    sess.run(optimizer, {input_X: X_train, input_y: y_train})
    loss_i = sess.run(loss, {input_X: X_train, input_y: y_train})
    print("loss at iter %i: %.4f" % (i, loss_i))
    print("train auc:", roc_auc_score(y_train, sess.run(predicted_y, {input_X:X_train})))
    print("test auc:", roc_auc_score(y_test, sess.run(predicted_y, {input_X:X_test})))

## More

In [None]:
test_weights = 1e-3 * np.fromiter(map(lambda x:
    sess.run(mixed_function, {my_scalar:x, my_vector:[1, 2, 3]}),
                               0.1 * np.arange(1, X.shape[1] + 1)),
                               count=X.shape[1], dtype=np.float32)[:, np.newaxis]

First, test prediction and loss computation. This part doesn't require a fitted model.

In [None]:
prediction_test = sess.run(
    predicted_y, {
    input_X: X,
    weights: test_weights,
    b: 1e-1})

In [None]:
assert prediction_test.shape == (X.shape[0],),\
       "Predictions must be a 1D array with length equal to the number " \
       "of examples in X"

In [None]:
loss_test = sess.run(
    loss, {
        input_X: X[:100],
        input_y: y[-100:],
        weights: test_weights+1.21e-3,
        b: -1e-1})