<a href="https://colab.research.google.com/github/hellocybernetics/TensorFlow_Eager_Execution_Tutorials/blob/master/TF_eager_basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
import tensorflow as tf
import numpy as np
import pandas as pd

# Enable eager execution. This must be called at program startup,
# before any other TensorFlow operation.
tf.enable_eager_execution()

### Create Tensors

In [30]:
x = tf.convert_to_tensor(1.)
w = tf.convert_to_tensor(2.)
b = tf.convert_to_tensor(3.)

print(type(x))
print(x)

<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(1.0, shape=(), dtype=float32)


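### Build a computational graph for automatic differentiation
Consider two functions of $x$,

$$
y(x) = w x + b
$$
and
$$
z(x) = w x^2 + b x.
$$
Operations executed inside a `tf.GradientTape` context are recorded so that both functions can later be differentiated with respect to $x$. Because `x` is a plain tensor rather than a variable, it must be registered with `g.watch(x)`, and `persistent=True` allows `g.gradient` to be called more than once on the same tape.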

In [31]:
with tf.GradientTape(persistent=True) as g:
    g.watch(x)
    y = w * x + b
    z = w * x**2 + b * x

# dy/dx = w = 2
# dz/dx = 2 * w * x + b = 4 * x + 3  (x = 1, so dz/dx = 7)
dy_dx = g.gradient(y, x)
dz_dx = g.gradient(z, x)
    
print(dy_dx)
print(dz_dx)

tf.Tensor(2.0, shape=(), dtype=float32)
tf.Tensor(7.0, shape=(), dtype=float32)
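The two gradients above can be cross-checked without TensorFlow. The following is a minimal sketch in plain Python that compares the analytic derivatives with a central finite difference; the helper name `central_difference` and the step size `h` are illustrative assumptions, not part of the tutorial.

```python
# Plain-Python sanity check of the gradients (no TensorFlow required).
# `central_difference` is a hypothetical helper introduced for illustration.
w, b = 2.0, 3.0

def y(x):
    return w * x + b

def z(x):
    return w * x**2 + b * x

def central_difference(f, x, h=1e-5):
    """Approximate df/dx at x with a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 1.0
dy_dx_numeric = central_difference(y, x0)
dz_dx_numeric = central_difference(z, x0)

# Analytic derivatives: dy/dx = w and dz/dx = 2*w*x + b.
dy_dx_exact = w
dz_dx_exact = 2 * w * x0 + b

print(dy_dx_numeric, dy_dx_exact)
print(dz_dx_numeric, dz_dx_exact)
```

Both numeric values should agree with the tape results (2 and 7) up to floating-point rounding.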


### Linear model
A linear model maps each input $x_i$ to an output $y_i$ as follows.
$$
 y_i = Wx_i + b
$$

In [63]:
x = tf.random_normal(shape=[10, 3])
y = tf.random_normal(shape=[10, 2])

# tf.keras.layers.Dense only needs the output dimension (units).
# The input dimension is inferred the first time the layer is called
# on an input, which is also when the weights are created.
linear = tf.keras.layers.Dense(units=2)
predict_y = linear(x)

print("weight: \n", linear.weights[0])
print("bias:\n", linear.weights[1], end="\n\n")
print("output shape:\n", predict_y.shape)

weight: 
 <tf.Variable 'dense_16/kernel:0' shape=(3, 2) dtype=float32, numpy=
array([[ 0.8318362 , -0.61764836],
       [ 0.06596565, -0.2781443 ],
       [-0.324251  ,  0.02151608]], dtype=float32)>
bias:
 <tf.Variable 'dense_16/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>

output shape:
 (10, 2)
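The shapes printed above follow from how a dense layer applies the affine map. The sketch below is a plain-Python illustration only, not Keras's implementation; `dense` and the size names are hypothetical. It assumes the row-vector convention $y_i = x_i W + b$, which is why the kernel has shape (3, 2), the transpose of $W$ in the formula above.

```python
import random

random.seed(0)
n_samples, n_in, n_out = 10, 3, 2

# Inputs of shape (10, 3), kernel of shape (3, 2), bias of shape (2,).
x = [[random.gauss(0, 1) for _ in range(n_in)] for _ in range(n_samples)]
W = [[random.gauss(0, 1) for _ in range(n_out)] for _ in range(n_in)]
b = [0.0] * n_out

def dense(x, W, b):
    """Apply out[i][j] = sum_k x[i][k] * W[k][j] + b[j] to every row of x."""
    return [[sum(row[k] * W[k][j] for k in range(len(W))) + b[j]
             for j in range(len(b))]
            for row in x]

out = dense(x, W, b)
print(len(out), len(out[0]))  # 10 2
```

The kernel's first dimension has to match the input width, which is why the layer can only create its weights once it has seen an input.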


#### loss function
TensorFlow eager execution has an API similar to PyTorch's, but the way it builds a computational graph for automatic differentiation is slightly different.
In PyTorch, each tensor holds its own computation graph and provides a method for automatic differentiation. In TensorFlow eager execution, by contrast, the computational graph is held by separate objects and functions (for example, `tf.GradientTape()`).

When training a neural network, we can use `tf.contrib.eager.implicit_value_and_gradients()`. It wraps a loss function so that each call tracks the trainable parameters used by the network, records the related computational graph, and returns the loss value together with a list of (gradient, parameter) pairs.

In [64]:
def loss_fn(model, x, y):
    predict_y = model(x)
    # Keras losses take (y_true, y_pred); the MSE value is the same either way.
    return tf.keras.losses.mean_squared_error(y, predict_y)

value_and_grads = tf.contrib.eager.implicit_value_and_gradients(loss_fn)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.01)


loss, grads = value_and_grads(model=linear, x=x, y=y)

print("loss: \n", loss, end="\n\n")
print("weight grads: \n", grads[0][0], end="\n\n")
print("weight instances: \n", grads[0][1], end="\n\n")
print("bias grads: \n", grads[1][0], end="\n\n")
print("bias instances: \n", grads[1][1], end="\n\n")

loss: 
 tf.Tensor(
[0.04972696 1.6548975  0.33088797 0.23095478 0.6001038  1.9319891
 2.4406717  0.98343223 0.9945836  0.40538147], shape=(10,), dtype=float32)

weight grads: 
 tf.Tensor(
[[ 2.0563087 -3.6274142]
 [ 4.147823  -0.8882311]
 [-1.2567902  2.3985233]], shape=(3, 2), dtype=float32)

weight instances: 
 <tf.Variable 'dense_16/kernel:0' shape=(3, 2) dtype=float32, numpy=
array([[ 0.8318362 , -0.61764836],
       [ 0.06596565, -0.2781443 ],
       [-0.324251  ,  0.02151608]], dtype=float32)>

bias grads: 
 tf.Tensor([-4.62727    1.6120654], shape=(2,), dtype=float32)

bias instances: 
 <tf.Variable 'dense_16/bias:0' shape=(2,) dtype=float32, numpy=array([0., 0.], dtype=float32)>
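The loss printed above has shape (10,), one value per sample, while the gradients have the shapes of the kernel and bias. A minimal plain-Python sketch of where such gradients come from is given below. It assumes the gradients are those of the sum of the per-sample losses (the usual TensorFlow behavior for a non-scalar target); all function names are hypothetical, and the data is random, so the numbers will not match the output above.

```python
import random

random.seed(0)
n, d_in, d_out = 10, 3, 2

x = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]
y = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(n)]
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]
b = [0.0] * d_out

def predict(W, b):
    """Row-wise affine map: one output row per input row."""
    return [[sum(x[i][k] * W[k][j] for k in range(d_in)) + b[j]
             for j in range(d_out)] for i in range(n)]

def per_sample_mse(W, b):
    """One loss value per sample, averaged over the output dimension."""
    p = predict(W, b)
    return [sum((p[i][j] - y[i][j]) ** 2 for j in range(d_out)) / d_out
            for i in range(n)]

def grads_of_summed_loss(W, b):
    """Analytic gradients of sum_i loss_i with respect to W and b."""
    p = predict(W, b)
    scale = 2.0 / d_out
    grad_W = [[scale * sum(x[i][k] * (p[i][j] - y[i][j]) for i in range(n))
               for j in range(d_out)] for k in range(d_in)]
    grad_b = [scale * sum(p[i][j] - y[i][j] for i in range(n))
              for j in range(d_out)]
    return grad_W, grad_b

losses = per_sample_mse(W, b)
grad_W, grad_b = grads_of_summed_loss(W, b)

# Central finite difference on one kernel entry as a sanity check.
h = 1e-6
W_plus = [row[:] for row in W]
W_minus = [row[:] for row in W]
W_plus[0][1] += h
W_minus[0][1] -= h
numeric = (sum(per_sample_mse(W_plus, b))
           - sum(per_sample_mse(W_minus, b))) / (2 * h)

print(len(losses))
print(grad_W[0][1], numeric)
```

The analytic and numeric values for the chosen kernel entry should agree closely, which supports reading the returned gradients as derivatives of the summed loss.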



#### Optimizing
We aim to decrease the loss value by updating the parameters as follows.
$$
\begin{align}
W & \leftarrow W - \epsilon \frac{\partial \mathrm{Loss}(W, b)}{\partial W} \\
b & \leftarrow b - \epsilon \frac{\partial \mathrm{Loss}(W, b)}{\partial b}
\end{align}
$$

where $\epsilon$ is the learning rate.

Once you understand the single update step in the following code, you can extend it into a full training loop.

In [65]:
# Initial loss, summed over all samples, and the gradients at the current parameters.
loss, grads = value_and_grads(model=linear, x=x, y=y)
print("loss: ", tf.reduce_sum(loss))

# Update the parameters using the (gradient, variable) pairs.
optimizer.apply_gradients(grads)

# Loss after the update (expected to be lower for a small enough learning rate).
loss, grads = value_and_grads(model=linear, x=x, y=y)
print("loss: ", tf.reduce_sum(loss))

loss:  tf.Tensor(9.622629, shape=(), dtype=float32)
loss:  tf.Tensor(9.004152, shape=(), dtype=float32)
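Repeating the update step gives a training loop. The sketch below applies the same rule, parameter minus learning rate times gradient, to a one-dimensional linear model in plain Python so that it runs without TensorFlow. The synthetic data, the starting values, and the function names are assumptions made for illustration; only the learning rate of 0.01 is taken from the optimizer above.

```python
import random

random.seed(0)

# Synthetic 1-D data (hypothetical): targets near 2*x + 3 with small noise.
xs = [random.gauss(0, 1) for _ in range(10)]
ys = [2.0 * x + 3.0 + random.gauss(0, 0.1) for x in xs]

w, b = 0.0, 0.0
learning_rate = 0.01

def total_loss(w, b):
    """Squared error summed over all samples."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

def gradients(w, b):
    """Analytic gradients of total_loss with respect to w and b."""
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys))
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys))
    return grad_w, grad_b

history = [total_loss(w, b)]
for step in range(20):
    grad_w, grad_b = gradients(w, b)
    # Gradient descent update: parameter <- parameter - learning_rate * gradient.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    history.append(total_loss(w, b))

print(history[0], history[-1])
```

In the eager API used in this notebook, the body of the loop would correspond to one call to `value_and_grads` followed by `optimizer.apply_gradients`.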


### Data
#### Convert to tf.Tensor from numpy.ndarray

In [76]:
X_numpy = np.random.randn(3, 3)
print(type(X_numpy))
print(X_numpy)

X_tensor = tf.convert_to_tensor(X_numpy)
print(type(X_tensor))
print(X_tensor)

<class 'numpy.ndarray'>
[[-0.58254555  0.31973299 -1.05691421]
 [-0.50315322  0.52309492 -0.38714436]
 [-0.20711872  0.55952568  0.17786334]]
<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(
[[-0.58254555  0.31973299 -1.05691421]
 [-0.50315322  0.52309492 -0.38714436]
 [-0.20711872  0.55952568  0.17786334]], shape=(3, 3), dtype=float64)
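Note that the converted tensor above has `dtype=float64`: `np.random.randn` produces 64-bit floats and the conversion keeps that dtype, whereas tensors created by `tf.random_normal` default to `float32`. Mixing the two may lead to dtype errors or implicit casts. One option, assuming `float32` is the dtype you want, is to cast on the NumPy side before converting; the NumPy-only sketch below uses hypothetical variable names.

```python
import numpy as np

# np.random.randn returns float64 by default.
X64 = np.random.randn(3, 3)

# Cast to float32 before handing the array to TensorFlow.
X32 = X64.astype(np.float32)

print(X64.dtype)  # float64
print(X32.dtype)  # float32
```

The cast loses some precision, which is usually acceptable for neural network inputs.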


#### Convert to numpy.ndarray from tf.Tensor

In [77]:
X_tensor = tf.random_normal(shape=[3, 3])
print(type(X_tensor))
print(X_tensor)

X_numpy = X_tensor.numpy()
print(type(X_numpy))
print(X_numpy)

<class 'tensorflow.python.framework.ops.EagerTensor'>
tf.Tensor(
[[-0.91925687 -0.6136823  -1.4136612 ]
 [-0.5081144  -1.202485    1.4589684 ]
 [ 0.24295777 -0.2634425  -0.9960576 ]], shape=(3, 3), dtype=float32)
<class 'numpy.ndarray'>
[[-0.91925687 -0.6136823  -1.4136612 ]
 [-0.5081144  -1.202485    1.4589684 ]
 [ 0.24295777 -0.2634425  -0.9960576 ]]


### tf.data.Dataset pipeline