[View in Colaboratory](https://colab.research.google.com/github/gbaeke/xylosai/blob/master/intro/TensorFlow_Eager.ipynb)

# 0 Eager execution

This notebook section follows the same structure as the eager execution guide on the TensorFlow website.   
https://www.tensorflow.org/guide/eager

## 0.1 Basics

In [None]:
import tensorflow as tf
tf.enable_eager_execution()

In [None]:
#x = tf.Variable(2) # raised an error under eager execution in older TensorFlow 1.x releases; tfe.Variable is used instead (see section 0.3)

In [None]:
import tensorflow.contrib.eager as tfe

In [None]:
import numpy as np

In [None]:
vector_a = tf.constant([3,4])
vector_b = tf.constant([1,2])

print(vector_a)
print(vector_b)

vector_c = vector_a + vector_b               # operator overloading
vector_cc = tf.add(vector_a, vector_b)       # equivalent TensorFlow op
vector_ccc = np.add(vector_a, vector_b)      # NumPy accepts eager tensors and returns an ndarray

print(vector_c)
print(vector_cc)

print(vector_ccc)

In [None]:
print(vector_a.numpy())
print(vector_b.numpy())

In [None]:
a = tf.constant(0)
print(a)
print(a + tf.constant(1))
print(a+1)

**conclusion** 
 
 1. Eager execution is imperative. TensorFlow operations evaluate and return values immediately (no session.run()). This means you can write code in an intuitive, imperative way. It also makes debugging easier.
 2. tf.Tensor objects reference concrete values instead of symbolic handles to nodes in a computational graph.
 3. Tensors integrate well with NumPy: NumPy operations accept tf.Tensor arguments, and the .numpy() method returns a tensor's value as an ndarray.
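
The difference between symbolic handles and concrete values in point 2 can be illustrated without TensorFlow. The following is only a toy sketch: `Node`, `Const`, `Add` and `run` are hypothetical names invented for this illustration and are not how TensorFlow is implemented.

```python
# Toy contrast between deferred (graph-style) and eager evaluation.
# Nothing here is TensorFlow code; Node, Const, Add and run are hypothetical names.

class Node:
    """Base class: adding two nodes builds a new node instead of a number."""
    def __add__(self, other):
        return Add(self, other)


class Const(Node):
    """Graph node that stores a value."""
    def __init__(self, value):
        self.value = value


class Add(Node):
    """Graph node that remembers its two inputs instead of adding them."""
    def __init__(self, left, right):
        self.left, self.right = left, right


def run(node):
    """Evaluate a node recursively, loosely like session.run() evaluates a graph."""
    if isinstance(node, Add):
        return run(node.left) + run(node.right)
    return node.value


# Deferred style: building the expression computes nothing yet.
deferred = Const(3) + Const(4)
deferred_result = run(deferred)   # the addition happens only here

# Eager style: the value exists as soon as the line has executed.
eager_result = 3 + 4

print(type(deferred).__name__, deferred_result, eager_result)
```

In the deferred style, `deferred` is only a description of the computation until `run` is called, which is why intermediate values are hard to inspect. In the eager style there is nothing left to run.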

## 0.2 Dynamic control flow

In [None]:
#Summing all even numbers between 0 and 100

tf_i = tf.constant(0)
tf_stop = tf.constant(100)
tf_sum = tf.constant(0)

while tf_i.numpy() <= tf_stop.numpy():  # use the tf_stop tensor defined above
    if int(tf_i % 2) == 0:
        tf_sum = tf_sum + tf_i
    tf_i = tf_i + 1
    
print(tf_sum)
print(tf_sum.numpy())

    



**conclusion**  
All functionality of the host language is available while executing. This lets you, for example, write conditionals and loops that depend on tensor values.
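
As a sanity check for the loop above, the same sum can be computed in plain Python. This is a small sketch for comparison only; the names `expected_sum` and `closed_form` are introduced here and do not appear in the notebook.

```python
# Plain-Python reference value for the sum of the even numbers from 0 to 100.
expected_sum = sum(range(0, 101, 2))

# Closed form: 2 * (0 + 1 + ... + 50) = 2 * (50 * 51 / 2) = 50 * 51
closed_form = 50 * 51

print(expected_sum, closed_form)  # 2550 2550
```

The value printed by `tf_sum.numpy()` in the TensorFlow loop should match this number.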

## 0.3 Eager training

Much of the power of machine learning frameworks like TensorFlow comes from **automatic differentiation**, which makes it easy to compute gradients during **backpropagation** when training a neural network with gradient descent.

In the original TensorFlow, a computational graph was fully built first and run afterwards. When running a graph that contains an optimizer (which involves backpropagation), TensorFlow keeps track of the gradients at every step of the graph and uses them to update the parameters.

In eager execution, the calculations are made "on the fly". A **tf.GradientTape** is used to trace operations for computing gradients later. All forward-pass operations are recorded to the tape. To compute the gradient, the tape is played backwards and then discarded. By default, a tf.GradientTape can compute only one gradient; subsequent calls throw a runtime error unless the tape was created with persistent=True.
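
The record-then-replay idea can be illustrated with a toy tape in plain Python. This is a conceptual sketch under simplifying assumptions (scalars only, multiplication only); `ToyTape` and `ToyValue` are hypothetical names, and the code does not reflect how tf.GradientTape is actually implemented.

```python
# Conceptual sketch of a gradient tape (reverse-mode automatic differentiation).
# ToyTape and ToyValue are hypothetical names, not TensorFlow APIs.

class ToyValue:
    """A scalar that records every multiplication on its tape."""
    def __init__(self, data, tape):
        self.data = data
        self.tape = tape

    def __mul__(self, other):
        out = ToyValue(self.data * other.data, self.tape)
        # Record the output together with the local derivative for each input.
        self.tape.records.append((out, [(self, other.data), (other, self.data)]))
        return out


class ToyTape:
    def __init__(self):
        self.records = []

    def gradient(self, target, source):
        grads = {id(target): 1.0}
        # Play the tape backwards and apply the chain rule at each recorded step.
        for out, inputs in reversed(self.records):
            upstream = grads.get(id(out), 0.0)
            for inp, local in inputs:
                grads[id(inp)] = grads.get(id(inp), 0.0) + upstream * local
        self.records = []  # discard the recording after use
        return grads.get(id(source), 0.0)


toy_tape = ToyTape()
toy_x = ToyValue(2.0, toy_tape)
toy_loss = toy_x * toy_x * toy_x      # forward pass: x**3 as two multiplications
toy_grad = toy_tape.gradient(toy_loss, toy_x)
print(toy_grad)  # 12.0, i.e. 3 * x**2 at x = 2
```

Because the recording is cleared inside `gradient`, a second call on the same toy tape would find nothing to replay, which loosely mirrors the single-use behaviour of a non-persistent tape.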

In [None]:
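x = tfe.Variable(2.0)  # notice the float type: gradients need a floating-point value

with tf.GradientTape() as tape:
    loss = x**3
    # a is the integer constant from section 0.1; it is not a variable,
    # so the tape does not watch it and loss_a is not used below
    loss_a = a**3

grad = tape.gradient(loss, x)
print(grad)  # d(x**3)/dx = 3 * x**2 = 12 at x = 2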


Below is a simple training loop for linear regression, using the tf.GradientTape. First, a toy dataset of 1000 points around 3 \* x + 2 is generated. Then, the weight (W) and bias (B) are estimated with gradient descent. The estimated W and B should be close to 3 and 2 respectively. 

In [None]:
# A toy dataset of points around 3 * x + 2
NUM_EXAMPLES = 1000
training_inputs = tf.random_normal([NUM_EXAMPLES])
noise = tf.random_normal([NUM_EXAMPLES])
training_outputs = training_inputs * 3 + 2 + noise

def prediction(inputs, weight, bias):  # "inputs" avoids shadowing the built-in input()
    return inputs * weight + bias

# A loss function using mean-squared error
def loss(weights, biases):
    error = prediction(training_inputs, weights, biases) - training_outputs
    return tf.reduce_mean(tf.square(error))

# Return the derivative of loss with respect to weight and bias
def grad(weights, biases):
    with tf.GradientTape() as tape:
        loss_value = loss(weights, biases)
    return tape.gradient(loss_value, [weights, biases])

train_steps = 200
learning_rate = 0.01
# Start with arbitrary values for W and B on the same batch of data
W = tfe.Variable(5.)
B = tfe.Variable(10.)

print("Initial loss: {:.3f}".format(loss(W, B)))

for i in range(train_steps):
    dW, dB = grad(W, B)
    # a tfe.Variable object has the handy assign_sub method. See next section.
    W.assign_sub(dW * learning_rate)
    B.assign_sub(dB * learning_rate)
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(i, loss(W, B)))

print("Final loss: {:.3f}".format(loss(W, B)))
print("W = {}, B = {}".format(W.numpy(), B.numpy()))

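Success! In the run recorded here, the estimated line after 200 steps was 2.99 \* x + 2.122, which is close to the original 3 \* x + 2. Because the dataset is random, the exact values differ slightly from run to run.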
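
To make explicit what tape.gradient computes in the loop above, the same regression can be written in plain Python with hand-derived gradients. For the loss mean((w \* x + b - y)^2), the derivatives are 2 \* mean(error \* x) with respect to w and 2 \* mean(error) with respect to b. The sketch below uses only the standard library; the names `xs`, `ys`, `mse_gradients`, `w` and `b` are introduced for this illustration, and the fixed seed is an assumption made for repeatability.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an assumption of this sketch)

# Toy dataset of points around 3 * x + 2 with unit Gaussian noise
NUM_POINTS = 1000
xs = [random.gauss(0.0, 1.0) for _ in range(NUM_POINTS)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 1.0) for x in xs]


def mse_gradients(w, b):
    """Hand-derived gradients of mean((w * x + b - y)**2) w.r.t. w and b."""
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    dw = 2.0 * sum(e * x for e, x in zip(errors, xs)) / NUM_POINTS
    db = 2.0 * sum(errors) / NUM_POINTS
    return dw, db


w, b = 5.0, 10.0        # same starting values as W and B above
step_size = 0.01        # same value as learning_rate above
for _ in range(200):
    dw, db = mse_gradients(w, b)
    w -= step_size * dw
    b -= step_size * db

print("w = {:.3f}, b = {:.3f}".format(w, b))
```

With automatic differentiation, the body of `mse_gradients` does not have to be derived by hand, which matters once the model is more complicated than a straight line.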

## 0.4 About variables in eager execution: tfe.Variable()

Variables are objects! During eager execution, a variable persists until the last reference to the object is removed, and is then deleted. Sharing a variable simply means reusing the same object, so there is no need to worry about variable scopes. tfe.Variable objects store **mutable** tf.Tensor values.

Thus, TensorFlow eager is more object-oriented and Pythonic. Read more in the [TensorFlow guide on eager execution](https://www.tensorflow.org/guide/eager).
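
The lifetime rule described above is ordinary Python object lifetime. The sketch below shows it with a stand-in class and the standard weakref module; `FakeVariable` is a hypothetical class, not a TensorFlow type, and the immediate clean-up relies on CPython's reference counting.

```python
import weakref


class FakeVariable:
    """Hypothetical stand-in for a variable object; not a TensorFlow class."""
    def __init__(self, value):
        self.value = value


var = FakeVariable([0.0] * 1000)
alias = var                   # "sharing" is just a second reference to the same object
probe = weakref.ref(var)      # weak reference: observes the object without keeping it alive

var = None
still_alive = probe() is not None   # alias still refers to the object

alias = None
freed = probe() is None             # on CPython the object is gone once the last reference is dropped

print(still_alive, freed)
```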

In [None]:
with tf.device("gpu:0"):  # requires a machine with a GPU
    v = tfe.Variable(tf.random_normal([1000, 1000]))
    v = None  # v no longer takes up GPU memory

## 0.5 Working with graphs

We have seen the advantages of eager execution in action. Graph code is more complex to write and more difficult to debug.

But **graph execution** still has important advantages over eager execution: it is better suited to distributed training, performance optimizations and, most importantly, **deployment**.

From the TensorFlow guide:  

"  
For building and training graph-constructed models, the Python program first builds a graph representing the computation, then invokes Session.run to send the graph for execution on the C++-based runtime. This provides:

1. Automatic differentiation using static autodiff.
2. Simple deployment to a platform independent server.
3. Graph-based optimizations (common subexpression elimination, constant-folding, etc.).
4. Compilation and kernel fusion.
5. Automatic distribution and replication (placing nodes on the distributed system).  

Deploying code written for eager execution is more difficult: either generate a graph from the model, or run the Python runtime and code directly on the server.

The same code written for eager execution will also build a graph during graph execution. Do this by simply running the same code in a new Python session where eager execution is not enabled. 
"

Write compatible code! That is, write code that works in both eager and graph mode. Tip: use the Keras API.

Refer to the resources below for more details and ideas.

Conversely, it is possible to force eager execution inside a graph session. This is useful, for example, when there is a specific piece of code that you don't know how to express as graph operations. Use **tfe.py_func**, a wrapper for eager Python functions.

In [None]:
#To see that this code actually works, restart the Jupyter kernel (Note that you will lose all existing variables).
#notice that tf.enable_eager_execution() is not called.

import tensorflow as tf
import tensorflow.contrib.eager as tfe

def my_py_func(x):
    x = tf.matmul(x, x)  # You can use tf ops
    print(x)  # but it's eager!
    print(x.numpy())
    if int(x.numpy()) > 3:
        print("the result is higher than 3")
    return x

with tf.Session() as sess:
    x = tf.placeholder(dtype=tf.float32)
    # Call eager function in graph!
    pf = tfe.py_func(my_py_func, [x], tf.float32)
    sess.run(pf, feed_dict={x: [[2.0]]})  # [[4.0]]

Inside the eager function my_py_func, you can do all "eager things" like inspecting values and using the dynamic control flow with conditionals.

## 0.6 Resources

This notebook was inspired by the [TensorFlow guide on eager execution](https://www.tensorflow.org/guide/eager), found on the official website, and a [video from the TensorFlow Dev Summit 2018](https://www.youtube.com/watch?v=T8AW0fKP0Hs). Follow the links to learn more about eager execution!





# Eager basics

Eager execution is an imperative programming environment. There is no need to build a graph and then execute it. Operations return concrete values.

For more details, see [this article on Towards Data Science](https://towardsdatascience.com/eager-execution-tensorflow-8042128ca7be).

In [None]:
import tensorflow as tf

tf.enable_eager_execution()

In [None]:
tf.executing_eagerly() # should return True

In [None]:
vector_a = tf.constant([3,4])
vector_b = tf.constant([1,2])

In [None]:
total = vector_a + vector_b
print(total) # prints a tf.Tensor that already holds the concrete values [4 6], not a symbolic graph node

In [None]:
# tensors of different ranks
# rank 0 (just a scalar)
my_scalar = tf.constant("Elephant", tf.string)

# rank 1 (vector)
my_vector = tf.constant([1,2,3,4])

# rank 2 (matrix)
my_matrix = tf.constant([[1,2], [3,4]])


# rank 3 (cube, e.g. x and y coordinates for pixels + RGB values)
my_cube = tf.constant([ [[1,2],[3,4]], [[5,6], [7,8]] ])

In [None]:
# print the ranks
print(tf.rank(my_scalar), tf.rank(my_vector), tf.rank(my_matrix), tf.rank(my_cube))
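
The rank of a tensor is the number of dimensions, which for a nested Python list is the nesting depth. The helper below is a plain-Python sketch of that idea; `nested_rank` is a hypothetical function written for this illustration and assumes non-empty, rectangular nesting.

```python
def nested_rank(value):
    """Rank of a regular nested list: the number of list levels around a scalar."""
    rank = 0
    while isinstance(value, list):
        rank += 1
        value = value[0]   # assumes non-empty, rectangular nesting
    return rank


print(nested_rank("Elephant"))                              # 0
print(nested_rank([1, 2, 3, 4]))                            # 1
print(nested_rank([[1, 2], [3, 4]]))                        # 2
print(nested_rank([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))    # 3
```

These are the same values that tf.rank reports for my_scalar, my_vector, my_matrix and my_cube above.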