# TensorFlow

We don't have to code up backpropagation for every possible function or neural network architecture that we want to fit. Many machine learning libraries make this task easy and computationally efficient. One of the most popular is [TensorFlow](https://www.tensorflow.org/). It was developed by Google Brain and is now open source under the Apache License 2.0.

The workflow consists of building a computational graph where "operations" act on "tensors" that can be automatically differentiated. The tensors themselves don't hold values, but instead are "initialized" or "fed" when actually running the computation. We will see how that works in this tutorial.

The TensorFlow website contains a much more [detailed introduction](https://www.tensorflow.org/guide/low_level_intro) if you want to learn more.

## Auto differentiation
By building up a computational graph, the gradient w.r.t. arbitrary parameters in the graph can be calculated via backpropagation. Let's try this for the toy example of calculating the first and second derivative of the sine function.

In [None]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

In [None]:
t = tf.constant(np.linspace(0, 2*np.pi, 100))
t

In [None]:
f = tf.sin(t)
f

In [None]:
df = tf.gradients(f, t)
df

Since the gradient is also just a computation graph, we can calculate the gradient of the gradient to get the second derivative.

In [None]:
ddf = tf.gradients(df, t)
ddf

The `tf.Tensor` objects above don't contain any values yet. We need to actually *run* the computation graph in a `tf.Session` to obtain the output values.

In [None]:
with tf.Session() as sess:
    t_np = sess.run(t)
    plt.plot(t_np, sess.run(f), label="f(t)")
    plt.plot(t_np, sess.run(df[0]), label="f'(t)")
    plt.plot(t_np, sess.run(ddf[0]), label="f''(t)")
    plt.legend()
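
As a cross-check that does not depend on TensorFlow, the two derivative curves should match `cos(t)` and `-sin(t)`. The following is a minimal numpy sketch using finite differences; the names `t_grid`, `df_fd` and `ddf_fd` are introduced here for illustration only and are not part of the notebook above.

```python
import numpy as np

# Same grid as in the TensorFlow cells above.
t_grid = np.linspace(0, 2 * np.pi, 100)
f_grid = np.sin(t_grid)

# Central finite differences (one-sided at the two end points).
df_fd = np.gradient(f_grid, t_grid)
ddf_fd = np.gradient(df_fd, t_grid)

# Analytic derivatives of sin(t).
df_exact = np.cos(t_grid)
ddf_exact = -np.sin(t_grid)

# Compare on interior points only; the one-sided edge estimates are less accurate.
interior = slice(2, -2)
print(np.max(np.abs(df_fd[interior] - df_exact[interior])))
print(np.max(np.abs(ddf_fd[interior] - ddf_exact[interior])))
```

Both printed errors should be small compared to the amplitude of the curves. Unlike automatic differentiation, finite differences are only approximate and their error depends on the grid spacing.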

## Manually build a NN in TensorFlow

Let's build a 1-hidden-layer NN, similar to what we did in [NNFromScratch.ipynb](NNFromScratchNumpy.ipynb), but now with TensorFlow.

Placeholders are used for values that should be fed in later. Dimensions of size `None` can have arbitrary size; in this case, this is the training example index.

When we create Tensors, they are added to TensorFlow's current graph. To identify them later in case we haven't assigned them to a Python variable, it is useful to give them a name. TensorFlow will attach an index to the name if it is already taken.

The first Tensor will hold the input values that we will feed in later.

In [None]:
inp = tf.placeholder(tf.float32, (None, 2), name='input')
inp

Next, we define the weights and biases for the hidden layer. We could use a placeholder as well and feed it the initial weights, but to illustrate a different concept, let's use a variable. We use the `tf.get_variable` method. Note that this method won't add indices to the names when they are already taken, but throws an exception instead.

Variables have to be initialized. We could e.g. specify the initial values explicitly, here as a numpy array of random numbers:

In [None]:
W = tf.get_variable("W", dtype=tf.float32, initializer=np.random.randn(2, 16).astype(np.float32))

In [None]:
b = tf.get_variable("b", dtype=tf.float32, initializer=tf.zeros(16))

In [None]:
print(W)
print(b)

Now let's define the output of the first hidden layer

In [None]:
z = tf.add(tf.matmul(inp, W), b, name="z")
z

In [None]:
a = tf.nn.relu(z, name="a")
a

And the weights and outputs of the output layer.

Let's try another method for variable initialization here - using TensorFlow's initializers:

In [None]:
W2 = tf.get_variable("W2", dtype=tf.float32, initializer=tf.random_normal_initializer()(shape=(16, 1)))
b2 = tf.get_variable("b2", dtype=tf.float32, initializer=tf.zeros(1))
print(W2)
print(b2)

In [None]:
z2 = tf.add(tf.matmul(a, W2), b2)
z2

We will skip the activation function, since we will use a loss function that already applies the sigmoid transformation. This is numerically more stable.

But first, we need to define a Tensor that will hold the labels

In [None]:
y = tf.placeholder(tf.float32, (None,1))
y

Now we define the binary cross entropy. This version applies the sigmoid transformation internally, so it expects input values (logits) that have not had the sigmoid applied yet.

In [None]:
L = tf.nn.sigmoid_cross_entropy_with_logits(labels=y, logits=z2)
L
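
To see why folding the sigmoid into the loss is numerically more stable, here is a small numpy sketch. The TensorFlow documentation for `tf.nn.sigmoid_cross_entropy_with_logits` describes the formulation `max(x, 0) - x * z + log(1 + exp(-abs(x)))` for logits `x` and labels `z`. The two functions below are illustrative re-implementations written for this comparison, not TensorFlow's actual code.

```python
import numpy as np

def naive_sigmoid_cross_entropy(labels, logits):
    # Apply the sigmoid first, then take logs; log(0) appears for large |logits|.
    p = 1.0 / (1.0 + np.exp(-logits))
    return -(labels * np.log(p) + (1 - labels) * np.log(1 - p))

def stable_sigmoid_cross_entropy(labels, logits):
    # Algebraically equivalent form that never exponentiates a positive number.
    return np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))

labels = np.array([1.0, 0.0, 1.0, 0.0])
logits = np.array([2.0, -1.0, -50.0, 50.0])

# Silence the floating point warnings that the naive version triggers.
with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
    print(naive_sigmoid_cross_entropy(labels, logits))
print(stable_sigmoid_cross_entropy(labels, logits))
```

For moderate logits both versions agree. For the confidently wrong prediction in the last entry, the sigmoid rounds to exactly 1 in floating point, so the naive version ends up taking `log(0)`, while the stable form stays finite.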

Now - **and this is the whole point of this tutorial** - to get the gradients w.r.t. all parameters, we can simply create the gradients as a `tf` operation, and we're done!

In [None]:
grad = tf.gradients(L, [W, b, W2, b2])
grad

We can use the `tensorboard` extension for Jupyter notebooks to display the graph we have created:

In [None]:
%load_ext tensorboard

In [None]:
from tensorflow.python.summary.writer.writer import FileWriter
FileWriter('logs/train', graph=tf.get_default_graph()).close()
%tensorboard --logdir logs/train

## Run the NN and calculate the gradient

Getting the output of the NN and the gradient w.r.t. the parameters is now simply a matter of feeding in the values and running the actual `tf` graph. Let's create the feed values with numpy:

In [None]:
inp_val = np.random.randn(10, 2)
inp_val

In [None]:
y_val = np.random.randint(0, 2, size=10).reshape(-1, 1)
y_val

To run the graph, we have to create a `tf.Session` and pass two things to its `run` method: the Tensor we want to evaluate and a dictionary with the values for all placeholder tensors - in our case the input and targets. But before that, the values of the variables have to be initialized.

In [None]:
sess = tf.Session()

In [None]:
sess.run(tf.global_variables_initializer())

In [None]:
sess.run(b2)

Now, let's just output the values of the gradients:

In [None]:
inp

In [None]:
feed_dict = {inp : inp_val, y : y_val}
grad_val = sess.run(
    grad,
    feed_dict=feed_dict
)
grad_val

And store them for later use

In [None]:
dW_tf, db_tf, dW2_tf, db2_tf = grad_val

Let's see if we can reproduce that with the formulas we used in [NNFromScratch.ipynb](NNFromScratchNumpy.ipynb).

Here is a copy of the relevant functions:

In [None]:
def sigmoid(Z):
    return 1/(1+np.exp(-Z))

def relu(Z):
    return np.maximum(0,Z)

def sigmoid_derivative(Z):
    sig = sigmoid(Z)
    return sig * (1 - sig)

def relu_derivative(Z):
    dZ = (Z >= 0)
    return dZ

In [None]:
def single_layer_forward_propagation(A_prev, W_curr, b_curr, activation="relu"):
    Z_curr = np.matmul(W_curr, A_prev) + b_curr
    
    if activation == "relu":
        activation_func = relu
    elif activation == "sigmoid":
        activation_func = sigmoid
    else:
        raise Exception('Non-supported activation function')
        
    return activation_func(Z_curr), Z_curr

In [None]:
def single_layer_backward_propagation(dA_curr, W_curr, b_curr, Z_curr, A_prev, activation="relu"):
    
    if activation == "relu":
        derivative_activation_func = relu_derivative
    elif activation == "sigmoid":
        derivative_activation_func = sigmoid_derivative
    else:
        raise Exception('Non-supported activation function')
            
    dZ_curr = dA_curr * derivative_activation_func(Z_curr)
    dW_curr = np.matmul(
        dZ_curr,
        # need to transpose only the last 2 dimensions, 
        # since the first dimension is the training example index
        np.transpose(A_prev, (0, 2, 1))
    )
    db_curr = dZ_curr
    dA_prev = np.matmul(W_curr.T, dZ_curr)

    return dA_prev, dW_curr, db_curr

In [None]:
def get_loss_value(Y_hat, Y):
    return - np.mean(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))

In [None]:
def get_loss_derivative(Y_hat, Y):
    return - (np.divide(Y, Y_hat) - np.divide(1 - Y, 1 - Y_hat))
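
A side note on these formulas: when the loss derivative is chained with the sigmoid derivative, the product simplifies to `Y_hat - Y`, because `-(Y/Y_hat - (1-Y)/(1-Y_hat)) * Y_hat * (1-Y_hat) = Y_hat - Y`. The sketch below checks this numerically; `z_out` and `y_true` are hypothetical values made up for the check.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

rng = np.random.RandomState(0)
z_out = rng.randn(5, 1)                                 # hypothetical output-layer pre-activations
y_true = rng.randint(0, 2, size=(5, 1)).astype(float)   # hypothetical binary labels

y_hat = sigmoid(z_out)

# Chain rule as in the functions above: dL/dY_hat times dY_hat/dZ.
dL_dyhat = -(y_true / y_hat - (1 - y_true) / (1 - y_hat))
dZ_chain = dL_dyhat * y_hat * (1 - y_hat)

# Simplified closed form.
dZ_closed = y_hat - y_true

print(np.allclose(dZ_chain, dZ_closed))
```

The closed form avoids dividing by `Y_hat` or `1 - Y_hat`, which can become very small. This is one reason why loss functions that take the raw logits are often preferred.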

First let's calculate the forward pass. Unfortunately, `tf` and numpy use slightly different conventions for matmul here, so I'm very sorry for all the confusing transposes and reshapes in the following section. If you have to do something like that, it's usually best to experiment and check that the output dimensions are correct after each step.

First, let's store the initialized values of the NN parameters in Python variables:
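
The difference between the two conventions can be sketched as follows, using made-up arrays with the same shapes as in this notebook (`X`, `W_tf` and `b_vec` are illustrative names, not variables defined above). The TensorFlow graph treats each example as a row and computes `inp @ W`, while the numpy functions treat each example as a column vector and compute `W.T @ x`.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 2)      # 10 examples as rows, like `inp_val`
W_tf = rng.randn(2, 16)   # weight layout used in the TensorFlow graph
b_vec = rng.randn(16)

# Row-vector convention (TensorFlow cells): one matrix product for the whole batch.
Z_rows = X @ W_tf + b_vec  # shape (10, 16)

# Column-vector convention (numpy functions): one (2, 1) column per example.
# np.matmul broadcasts the (16, 2) matrix over the leading example axis.
Z_cols = np.matmul(W_tf.T, X.reshape(-1, 2, 1)) + b_vec.reshape(-1, 1)  # shape (10, 16, 1)

print(Z_rows.shape, Z_cols.shape)
print(np.allclose(Z_rows, Z_cols[:, :, 0]))
```

Both give the same numbers; only the array layout differs, which is why the weights are transposed and the inputs reshaped before they are passed to the numpy functions.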

In [None]:
W_val, b_val, W2_val, b2_val = sess.run([W, b, W2, b2], feed_dict=feed_dict)

And then run our manual `numpy` forward propagation

In [None]:
a_val, z_val = single_layer_forward_propagation(inp_val.reshape(-1, 2, 1), W_val.T, b_val.reshape(-1, 1))
print(a_val[0].reshape(-1))
print(z_val[0].reshape(-1))

Compare this to what `tf` gives:

In [None]:
print(sess.run(a, feed_dict=feed_dict)[0])
print(sess.run(z, feed_dict=feed_dict)[0])

Next layer:

In [None]:
a2_val, z2_val = single_layer_forward_propagation(
    a_val.reshape(-1, 16, 1), W2_val.T, b2_val.reshape(-1, 1), activation="sigmoid"
)
print(a2_val.reshape(-1))
print(z2_val.reshape(-1))

For `tf`, we don't have `a2` because we used a definition of the loss function where the sigmoid activation is already included. But we have `z2`:

In [None]:
print(sess.run(z2, feed_dict=feed_dict).reshape(-1))

Great, we implemented the forward pass correctly, so now let's do the backward pass and check if we get the same gradients as TensorFlow:

In [None]:
dL = get_loss_derivative(a2_val, y_val.reshape(-1, 1, 1))
dL.reshape(-1)

Propagate back into the output layer

In [None]:
da, dW2, db2 = single_layer_backward_propagation(
    dL, W2_val.T, b2_val.reshape(-1, 1), z2_val, a_val, activation="sigmoid"
)
print(np.sum(dW2, axis=0).reshape(-1))
print(np.sum(db2, axis=0).reshape(-1))

In [None]:
print(dW2_tf.reshape(-1))
print(db2_tf.reshape(-1))

And from there into the hidden layer

In [None]:
dinp, dW, db = single_layer_backward_propagation(
    da, W_val.T, b_val.reshape(-1, 1), z_val, inp_val.reshape(-1, 2, 1), activation="relu"
)
print(np.sum(dW, axis=0).T)
print(np.sum(db, axis=0).reshape(-1))

In [None]:
print(dW_tf)
print(db_tf)

Great! TensorFlow does the same thing we attempted to do before.
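
The same gradients can also be written directly in the batched row-vector convention, which avoids the per-example reshapes. The following is a self-contained numpy sketch under stated assumptions: the data and parameters are random stand-ins, all function names are introduced here for illustration, and the loss is summed over examples to match the `np.sum(..., axis=0)` used above. Instead of TensorFlow, the analytic gradients are checked against central finite differences.

```python
import numpy as np

def forward_loss(X, Y, W, b, W2, b2):
    """Summed sigmoid cross entropy of a 2-16-1 network (row-vector convention)."""
    Z1 = X @ W + b
    A1 = np.maximum(0, Z1)
    Z2 = A1 @ W2 + b2
    # Numerically stable cross entropy on the logits Z2.
    loss = np.sum(np.maximum(Z2, 0) - Z2 * Y + np.log1p(np.exp(-np.abs(Z2))))
    return loss, (Z1, A1, Z2)

def backward(X, Y, W2, cache):
    """Analytic gradients of the summed loss w.r.t. W, b, W2, b2."""
    Z1, A1, Z2 = cache
    dZ2 = 1 / (1 + np.exp(-Z2)) - Y   # sigmoid(Z2) - Y
    dW2 = A1.T @ dZ2
    db2 = dZ2.sum(axis=0)
    dZ1 = (dZ2 @ W2.T) * (Z1 > 0)     # ReLU passes gradient only where Z1 > 0
    dW = X.T @ dZ1
    db = dZ1.sum(axis=0)
    return dW, db, dW2, db2

def numerical_grad(f, param, eps=1e-6):
    """Central finite differences, perturbing `param` in place one entry at a time."""
    grad = np.zeros_like(param)
    for idx in np.ndindex(*param.shape):
        old = param[idx]
        param[idx] = old + eps
        plus = f()
        param[idx] = old - eps
        minus = f()
        param[idx] = old
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.RandomState(0)
X = rng.randn(10, 2)
Y = rng.randint(0, 2, size=(10, 1)).astype(float)
W, b = rng.randn(2, 16), np.zeros(16)
W2, b2 = rng.randn(16, 1), np.zeros(1)

_, cache = forward_loss(X, Y, W, b, W2, b2)
analytic = backward(X, Y, W2, cache)
loss_fn = lambda: forward_loss(X, Y, W, b, W2, b2)[0]
numeric = [numerical_grad(loss_fn, p) for p in (W, b, W2, b2)]

for name, g_a, g_n in zip(["W", "b", "W2", "b2"], analytic, numeric):
    print(name, np.allclose(g_a, g_n, atol=1e-4))
```

A finite-difference check like this is slow, since it needs two loss evaluations per parameter, but it is a useful independent test when no automatic differentiation library is at hand.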