### Forked from https://github.com/zaidalyafeai/Notebooks/blob/master/Eager_Execution_Enabled.ipynb

# Introduction 

Eager Execution (EE) lets you run operations immediately. In graph-mode TensorFlow, you have to build a graph and run it within a session to execute its operations. With EE, operations run directly and you can inspect their outputs as they execute, which is especially useful for debugging. EE is also Pythonic and integrates well with NumPy, which makes programming easier and more flexible. The next version of TensorFlow, 2.0, will enable EE by default.

According to the Google AI [article](https://ai.googleblog.com/2017/10/eager-execution-imperative-define-by.html), the benefits of using EE include:

   * Fast debugging with immediate run-time errors and integration with Python tools
   * Support for dynamic models using easy-to-use Python control flow
   * Strong support for custom and higher-order gradients
   * Almost all of the available TensorFlow operations
   
   ![alt text](https://i.imgur.com/YUlhihi.png)

# Enabling Eager Execution 
In current versions of TensorFlow, eager execution is not enabled by default, so you have to enable it explicitly.

In [0]:
import tensorflow as tf

tf.enable_eager_execution()

Check whether eager execution is enabled:

In [0]:
tf.executing_eagerly()

True

# Executing Ops Eagerly 
When you perform an operation, you see its output directly, without creating a session.

In [0]:
x = [[2.]]
m = tf.square(x)
print(m)

tf.Tensor([[4.]], shape=(1, 1), dtype=float32)


You can call `.numpy()` to retrieve the value of the tensor as a NumPy array

In [0]:
m.numpy()

array([[4.]], dtype=float32)

You can also apply an operation to two tensors, for example a matrix multiplication

In [0]:
a = tf.constant([[1, 2],
                 [3, 4]])

b = tf.constant([[2, 1],
                 [3, 4]])

ab = tf.matmul(a, b)

print('a * b = \n', ab.numpy())

a * b = 
 [[ 8  9]
 [18 19]]
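
Note that the label `a * b` printed above denotes the matrix product computed by `tf.matmul`, whereas the `*` operator on tensors multiplies elementwise. The following is a minimal sketch in plain Python (no TensorFlow) that reproduces both results for the same inputs; the helper names `matmul` and `elementwise` are illustrative only and are not part of any library.

```python
def matmul(a, b):
    """Matrix product of two lists of rows: out[i][j] = sum_k a[i][k] * b[k][j]."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def elementwise(a, b):
    """Elementwise product of two equally shaped lists of rows."""
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


a = [[1, 2], [3, 4]]
b = [[2, 1], [3, 4]]

print(matmul(a, b))       # [[8, 9], [18, 19]]
print(elementwise(a, b))  # [[2, 2], [9, 16]]
```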


# Constants and Variables 


*   `tf.constant` creates a constant tensor populated with the values passed as an argument. The values are immutable.
*   `tf.Variable` encapsulates a mutable tensor whose value can be changed later using `assign`.


Creating a constant tensor 

In [0]:
a = tf.constant([[2,3]])
print(a)

tf.Tensor([[2 3]], shape=(1, 2), dtype=int32)


A constant tensor is immutable so you cannot assign a new value to it.

In [0]:
try:
    # constant tensors have no assign method, so this lookup fails
    a.assign([[3,4]])
except AttributeError:
    print('Exception raised ')

Exception raised 


On the other hand, variables are mutable and can be assigned a new value

In [0]:
v = tf.Variable(5.)

print('Old value for v =', v.numpy())
v.assign(2.)
print('New value for v =', v.numpy())

Old value for v = 5.0
New value for v = 2.0


You can also increment or decrement the value of a variable

In [0]:
v.assign(2.)
print('value     : ', v.numpy())
print('increment : ', tf.assign_add(v, 1).numpy())
print('decrement : ', tf.assign_sub(v, 1).numpy())

value     :  2.0
increment :  3.0
decrement :  2.0


You can retrieve a lot of information from a variable, such as its name, type, shape, and the device it is placed on.

In [0]:
print('name  : ', v.name)
print('type  : ', v.dtype)
print('shape : ', v.shape)
print('device: ', v.device)

name  :  Variable:0
type  :  <dtype: 'float32'>
shape :  ()
device:  /job:localhost/replica:0/task:0/device:GPU:0


# Gradient Evaluation

Gradient evaluation is central to machine learning, because training is based on function optimization. You can use `tf.GradientTape()` to record operations and then compute the gradient of an arbitrary function

In [0]:
w = tf.Variable(2.0)

# record the operations that compute the loss
with tf.GradientTape() as tape:
    loss = w * w

grad = tape.gradient(loss, w)
print(f'The gradient of w^2 at {w.numpy()} is {grad.numpy()}')

The gradient of w^2 at 2.0 is 4.0
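
To build intuition for what "recording" means, here is a toy sketch of reverse-mode differentiation on scalars in plain Python. It is not how TensorFlow implements `tf.GradientTape`. The `Value` class and its methods are hypothetical names invented for this illustration. Each result remembers its inputs and the local derivatives, and `backward` applies the chain rule from the output back to the inputs.

```python
class Value:
    """A scalar that remembers how it was computed (toy stand-in for a taped tensor)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        # pairs of (parent Value, derivative of this value w.r.t. that parent)
        self.parents = parents

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def backward(self):
        # post-order traversal: every node is listed after all of its inputs
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # walk from the output towards the inputs, accumulating gradients
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad  # chain rule


w = Value(2.0)
loss = w * w
loss.backward()
print(w.grad)  # prints 4.0, matching the TensorFlow result above
```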


You can also compute the gradient directly using `tfe.gradients_function`. In this example, we evaluate the gradient of the sigmoid function

$$f(x) = \frac{1}{1+e^{-x}}$$

Note that 

$$f'(x) = \frac{e^{-x}}{(1+e^{-x})^2} = f(x)(1-f(x)) $$

In [0]:
import tensorflow.contrib.eager as tfe 

def sigmoid(x):
    return 1/(1 + tf.exp(-x))

grad_sigmoid = tfe.gradients_function(sigmoid)

print('The gradient of the sigmoid function at 2.0 is ', grad_sigmoid(2.0)[0].numpy())

The gradient of the sigmoid function at 2.0 is  0.104993574
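
As a sanity check that does not depend on TensorFlow, the printed value can be compared against the closed form $f(x)(1-f(x))$ and against a central finite difference. This is a minimal sketch in plain Python; the helper names `sigmoid_grad` and `central_difference` and the step size `h` are choices made for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Closed-form derivative f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def central_difference(f, x, h=1e-5):
    """Numerical derivative (f(x + h) - f(x - h)) / (2h)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


print(sigmoid_grad(2.0))                 # closed form
print(central_difference(sigmoid, 2.0))  # numerical approximation
```

Both values should agree with the TensorFlow output to within float32 precision.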


You can also compute higher-order derivatives by nesting gradient functions. For instance,

$$f(x) = \log(x) , f'(x) = \frac{1}{x}, f''(x) = \frac{-1}{x^2}, f'''(x) = \frac{2}{x^3}$$

In [0]:
dx = tfe.gradients_function

def log(x):
    return tf.log(x)

dx_log = dx(log)
dx2_log = dx(dx(log))
dx3_log = dx(dx(dx(log)))

print('The first  derivative of log at x = 1 is ', dx_log(1.)[0].numpy())
print('The second derivative of log at x = 1 is ', dx2_log(1.)[0].numpy())
print('The third  derivative of log at x = 1 is ', dx3_log(1.)[0].numpy())

The first  derivative of log at x = 1 is  1.0
The second derivative of log at x = 1 is  -1.0
The third  derivative of log at x = 1 is  2.0
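
These values can be cross-checked against the closed form for the $n$-th derivative of the logarithm, $(-1)^{n-1}(n-1)!/x^n$ for $n \geq 1$. The sketch below uses plain Python, and the function name `log_derivative` is introduced only for this illustration.

```python
import math


def log_derivative(n, x):
    """n-th derivative of log(x) for n >= 1: (-1)**(n-1) * (n-1)! / x**n."""
    return (-1) ** (n - 1) * math.factorial(n - 1) / x ** n


for n in (1, 2, 3):
    print(n, log_derivative(n, 1.0))
# 1 1.0
# 2 -1.0
# 3 2.0
```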


# Linear Regression 

This example is refactored from https://www.tensorflow.org/guide/eager. We build a complete example that uses linear regression to estimate the parameters of the function

$$f(x) = 3 x + 2 + \text{noise}$$

Given a point $x$, we want to predict the value of $y$. We train the model on 1000 data pairs $(x, f(x))$.

The model to learn is a linear model 

$$\hat{y} = W x + b$$

Note that we use `tf.GradientTape` to record the operations needed to compute the gradient with respect to our trainable parameters.

We use the mean squared error (MSE) over the $N$ training examples to calculate the loss

$$g = \frac{1}{N}\sum_{i=1}^{N} (y_i-\hat{y}_i)^2$$

We use gradient descent with learning rate $\alpha$ to update the parameters

$$W = W - \alpha  \frac{\partial g}{\partial W}$$

$$b = b - \alpha  \frac{\partial g}{\partial b}$$
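
For reference, the two partial derivatives in these updates can be worked out by hand. The sketch below assumes the loss is averaged over the $N$ training examples, which is what `tf.reduce_mean(tf.square(error))` computes in the training code; the index $i$ and the count $N$ are notation introduced here. With $\hat{y}_i = W x_i + b$ and $g = \frac{1}{N}\sum_{i=1}^{N} (y_i - \hat{y}_i)^2$, the chain rule gives

$$\frac{\partial g}{\partial W} = \frac{2}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)\, x_i$$

$$\frac{\partial g}{\partial b} = \frac{2}{N}\sum_{i=1}^{N} (\hat{y}_i - y_i)$$

These are the quantities that `tape.gradient(loss, [W, b])` evaluates automatically.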

In [0]:
# 1000 data points
NUM_EXAMPLES = 1000

# define inputs and outputs with some noise
X = tf.random_normal([NUM_EXAMPLES])  # inputs
noise = tf.random_normal([NUM_EXAMPLES])  # noise
y = X * 3 + 2 + noise  # true output

# create model parameters with initial values
W = tf.Variable(0.)
b = tf.Variable(0.)

# training info
train_steps = 200
learning_rate = 0.01

for i in range(train_steps):

    # record the forward pass so the tape can differentiate the loss
    with tf.GradientTape() as tape:

        # forward pass
        yhat = X * W + b

        # calculate the loss (mean squared error)
        error = yhat - y
        loss = tf.reduce_mean(tf.square(error))

    # evaluate the gradient with respect to the parameters
    dW, db = tape.gradient(loss, [W, b])

    # update the parameters using gradient descent
    W.assign_sub(dW * learning_rate)
    b.assign_sub(db * learning_rate)

    # print the loss every 20 iterations
    if i % 20 == 0:
        print("Loss at step {:03d}: {:.3f}".format(i, loss))

print(f'W : {W.numpy()} , b  = {b.numpy()} ')

Loss at step 000: 14.478
Loss at step 020: 6.947
Loss at step 040: 3.632
Loss at step 060: 2.173
Loss at step 080: 1.531
Loss at step 100: 1.248
Loss at step 120: 1.124
Loss at step 140: 1.069
Loss at step 160: 1.045
Loss at step 180: 1.035
W : 2.9950246810913086 , b  = 1.9608360528945923 
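
As a cross-check that does not rely on automatic differentiation, the same training loop can be written in plain Python with hand-derived gradients of the mean squared error. This is a minimal sketch, not the notebook's method: the function name `fit_line`, the random seed, and the use of the standard `random` module instead of `tf.random_normal` are assumptions made for this illustration, so the exact numbers will differ from the run above.

```python
import random


def fit_line(xs, ys, steps=200, lr=0.01):
    """Fit y ~ W * x + b by gradient descent on the mean squared error."""
    W, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # residuals yhat - y for the current parameters
        errors = [W * x + b - y for x, y in zip(xs, ys)]
        # hand-derived gradients of mean(error ** 2) w.r.t. W and b
        dW = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        db = 2.0 / n * sum(errors)
        # gradient descent update
        W -= lr * dW
        b -= lr * db
    return W, b


random.seed(0)  # arbitrary seed, used only to make the run repeatable
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ys = [3.0 * x + 2.0 + random.gauss(0.0, 1.0) for x in xs]

W, b = fit_line(xs, ys)
print(f"W = {W:.3f}, b = {b:.3f}")
```

With the same step count and learning rate, the estimates should land close to the true values 3 and 2, as in the TensorFlow run.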
