# TensorFlow 2 Introduction

[TensorFlow](https://www.tensorflow.org/) is an open-source software library for numerical computations developed by Google. Its main application area is machine learning, in particular deep learning with neural networks. TensorFlow can distribute computations across CPUs and GPUs with little extra effort, which makes it well suited for large-scale data analysis.

A comparison of deep learning frameworks taken from [towardsdatascience.com](https://towardsdatascience.com/):
<img src="https://cdn-images-1.medium.com/max/1600/1*dYjDEI0mLpsCOySKUuX1VA.png"></img>
<img src="https://cdn-images-1.medium.com/max/800/1*s_BwkYxpGv34vjOHi8tDzg.png"></img>
<img src="https://cdn-images-1.medium.com/max/800/1*eOOYV3C5klsuVBMX31ngxw.png"></img>

The following introduction is based on the [Getting Started With TensorFlow](https://www.tensorflow.org/get_started/get_started) tutorial and the first lessons from [LearningTensorFlow](http://learningtensorflow.com/). Go to these sites to learn more.

## Tensors

Let's start by importing TensorFlow and checking its version, which we assume to be at least 2 in this tutorial:

In [None]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

If you are using the *Google Colaboratory* notebook environment, you may need to install the proper TensorFlow version first, using something like `!pip install -q tensorflow==2.0.0-alpha0`.

In a TensorFlow program, the computational operations are organized in a graph. The nodes correspond to particular operations, and the edges define the data flow between the nodes. These data are tensors, which in this context are simply multidimensional arrays.
From TensorFlow 2.0 onward, the graph can almost be regarded as an implementation detail that is hidden from the user.

Each node can take some tensors as input and outputs a tensor. One basic type of node without input is a constant node representing a constant tensor. Let's define some constants holding tensors of different shapes:

In [None]:
node1 = tf.constant(3.0, tf.float32, name='Klaus') # rank 0 tensor; scalar with shape [], tf.float32 is default and can be omitted
print("node1, rank 0 tensor, a scalar:", node1)

In [None]:
print("rank 1 tensor; a vector:", tf.constant([1. ,2., 3.]))
print("rank 2 tensor; a matrix:", tf.constant([[1., 2., 3.], [4., 5., 6.]]))
print("rank 3 tensor:", tf.constant([[[1., 2., 3.], [1., 2., 3.]], [[7., 8., 9.], [1., 2., 3.]]] ) )
print("rank 3 tensor:", tf.constant([[[1., 2., 3.], [4., 5., 6.]]]))

To get the shape and type of a tensor:

In [None]:
print(node1.shape)
print(node1.dtype)

Tensors have the advantage of being easily handled by GPUs and  TPUs.

## Simple computations

Let's do some simple computations:

In [None]:
print(tf.square(5))
print(tf.reduce_sum([1, 2, 3]))

In [None]:
node1 = tf.constant(3.0)
print("node1:", node1)

node2 = tf.constant(4.0)
print("node2:", node2)

node3 = tf.add(node1, node2)
print("node3:", node3)

node4 = node3 + node2  # The plus operator is mapped to tf.add
print("node4:", node4)

## Automatic conversion

In [None]:
node1 = tf.constant(3.0)
print("node1:", node1)

x = 5.0
print("x:", x)

node3 = tf.add(node1, x)  # x is automatically converted
print("node3:", node3)
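
The automatic conversion applies to plain Python values, not to tensors that already have a type. The sketch below, with made-up values, shows what the conversion amounts to via `tf.convert_to_tensor`, and how a mismatch between two existing tensors can be resolved with `tf.cast`. The exact exception type raised on a mismatch is an assumption here, so the example catches two plausible candidates.

```python
import tensorflow as tf

node = tf.constant(3.0)                 # float32 tensor

# A plain Python number adopts the dtype of the tensor operand
converted = tf.convert_to_tensor(5.0, dtype=node.dtype)
print(tf.add(node, 5.0), converted.dtype)

# Two tensors with different dtypes are not converted implicitly
int_node = tf.constant(5)               # int32 tensor
try:
    tf.add(node, int_node)
except (tf.errors.InvalidArgumentError, TypeError) as err:
    print("dtype mismatch:", type(err).__name__)

# An explicit cast resolves the mismatch
total = tf.add(node, tf.cast(int_node, node.dtype))
print(total)
```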

### TensorFlow and NumPy

TensorFlow and NumPy go hand-in-hand, as shown in this example from the official TensorFlow documentation:

In [None]:
import numpy as np

ndarray = np.ones([3, 3])

print("TensorFlow operations convert NumPy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)

print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a NumPy array")
print(tensor.numpy())

## Using functions

In TensorFlow, we can distinguish between

1. Defining the computational graph, and

2. Running the computational graph.

TensorFlow 2 uses "eager execution", which hides this implementation detail. Running graphs now looks like executing normal Python code.

It is recommended to group operations in a function:

In [None]:
def first_TF_function():
    node1 = tf.constant(3.0)
    node2 = tf.constant(4.0)
    return tf.add(node1, node2)

In [None]:
print(first_TF_function())

Adding the `tf.function` decorator to a function
has the effect that all of its operations are combined in a single graph, which is
compiled just-in-time. This usually speeds up computations.

In [None]:
@tf.function
def first_TF_function():
    node1 = tf.constant(3.0)
    node2 = tf.constant(4.0)
    return tf.add(node1, node2)

When the decorator is added, TensorFlow tries to convert the code in the function into TensorFlow operations, which can be executed on GPUs or TPUs. Without decoration:

In [None]:
def first_TF_function():
    node1 = 3.0
    node2 = 4.0
    return node1 + node2

In [None]:
first_TF_function()

With decoration:

In [None]:
@tf.function
def first_TF_function():
    node1 = 3.0
    node2 = 4.0
    return node1 + node2

In [None]:
first_TF_function()
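
The difference between the two variants shows up in the type of the result. The following sketch uses two hypothetical function names, `plain_sum` and `traced_sum`, so that both versions can be compared side by side instead of overwriting each other.

```python
import tensorflow as tf

def plain_sum():
    return 3.0 + 4.0

@tf.function
def traced_sum():
    return 3.0 + 4.0

plain_result = plain_sum()
traced_result = traced_sum()

print(type(plain_result))      # an ordinary Python float
print(type(traced_result))     # a TensorFlow tensor
print(traced_result.numpy())   # the value as a NumPy scalar
```

The decorated version returns a tensor even though the function body only contains Python numbers.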

## Visualizing the computational graph

In [None]:
# Load the TensorBoard notebook extension.
# (Early TensorFlow 2 alpha releases used `tensorboard.notebook` instead.)
%load_ext tensorboard

In [None]:
# Clear any logs from previous runs
!rm -rf ./logs/ 

# Import for timestamp
from datetime import datetime

In [None]:
@tf.function
def my_func():
  node1 = tf.constant(3.0, name="a")
  node2 = tf.constant(4.0, name="b")
  return tf.add(node1, node2, name="sum")

# Set up logging.
writer = tf.summary.create_file_writer("logs/func/trial0")

# Bracket the function call with
# tf.summary.trace_on() and tf.summary.trace_export().
tf.summary.trace_on(graph=True)
# Call only one tf.function when tracing.
z = my_func()
#z = first_TF_function()
with writer.as_default():
  tf.summary.trace_export(
      name="my_func_trace",
      step=0)

In [None]:
%tensorboard --logdir logs/func

You can also call

`tensorboard --logdir logs/func`

outside the notebook (from the notebook's working directory), which gives you a link to view the graph (and more) in a browser (e.g., by opening `http://localhost:6006`).
You may have to locate the `tensorboard` executable first.

## Broadcasting

In [None]:
node1 = tf.constant([1, 2, 3])
node2 = tf.constant([4, 5, 6])
node3 = tf.add(node1, node2)

print(node3)

In [None]:
node1 = tf.constant([1, 2, 3])
node2 = tf.constant([4])
node3 = tf.add(node1, node2)

print(node3)

In [None]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([[1, 2, 3], [4, 5, 6]])
node3 = tf.add(node1, node2)

print(node3)

In [None]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant(1)
node3 = tf.add(node1, node2)

print(node3)

Now it gets trickier. Have a close look at the shapes involved:

In [None]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([100, 101, 103])
node3 = tf.add(node1, node2)

print(node3)

In [None]:
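node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([100, 103])
# Shapes (2, 3) and (2,) cannot be broadcast, so tf.add raises an error
try:
    node3 = tf.add(node1, node2)
    print(node3)
except tf.errors.InvalidArgumentError as err:
    print("Broadcasting failed:", err)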

In [None]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([[100],[103]])
node3 = tf.add(node1, node2)

print(node3)
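
The cells above follow the same broadcasting rule as NumPy: shapes are compared from the last dimension backwards, and two dimensions are compatible if they are equal or if one of them is 1. The following pure-Python sketch makes this rule explicit. The helper `broadcast_shape` is a hypothetical function written for this illustration and is not part of TensorFlow.

```python
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Compare dimensions from the last axis backwards; missing axes count as 1
    pairs = zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1)
    for dim_a, dim_b in pairs:
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((2, 3), (3,)))    # the vector is added to every row
print(broadcast_shape((2, 3), (2, 1)))  # the column is added to every column
try:
    broadcast_shape((2, 3), (2,))       # trailing dimensions 3 and 2 clash
except ValueError as err:
    print(err)
```

This explains why a vector of length 2 cannot be added to a 2x3 matrix directly, while the same values arranged as a 2x1 column can.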

## Function arguments

External inputs are fed into the graph via arguments to functions:

In [None]:
@tf.function
def second_TF_function(x):
    node1 = 3.0
    return node1 + x

In [None]:
print(second_TF_function(2))
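
How an argument is passed affects how often the graph is rebuilt. In the sketch below, the list `trace_log` and the function `add_three` are hypothetical names introduced for this illustration. The Python side effect inside the function runs only while TensorFlow traces the function, so the length of the list counts the traces.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper list recording each trace

@tf.function
def add_three(x):
    trace_log.append("traced")   # Python side effect: runs only while tracing
    return 3.0 + x

add_three(tf.constant(2.0))
add_three(tf.constant(5.0))      # same dtype and shape: the graph is reused
print("traces after tensor arguments:", len(trace_log))

add_three(2.0)
add_three(5.0)                   # each new Python value is traced again
print("traces after Python arguments:", len(trace_log))
```

Passing tensors rather than Python numbers is therefore usually preferable when a function is called with many different values.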

## Variables

A graph can of course also contain variables, for example the parameters of a machine learning model.

In [None]:
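# Pass the type via the dtype keyword; the second positional argument of tf.Variable is `trainable`
W = tf.Variable(tf.ones(shape=(2, 2)), dtype=tf.float32, name="W")
b = tf.Variable(tf.zeros(shape=(2,)), name="b")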

In [None]:
print(W)
print(b)

Let's change the value of a variable:

In [None]:
print(b)
b.assign([-1.,1])
print(b)
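
Besides `assign`, variables offer in-place updates, which is what distinguishes them from constant tensors. The sketch below uses a hypothetical variable `v` with made-up values to illustrate `assign_add` and `assign_sub`.

```python
import tensorflow as tf

v = tf.Variable([0.0, 0.0], name="v")   # hypothetical variable for illustration

v.assign_add([1.0, 2.0])    # in-place addition
v.assign_sub([0.5, 0.5])    # in-place subtraction
print(v.numpy())

# Reading a variable in a computation yields an ordinary tensor
doubled = v * 2.0
print(doubled)
```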

## Defining an affine linear function

In [None]:
in_dim = 2
out_dim = 2

W = tf.Variable(tf.ones(shape=(in_dim, out_dim)), dtype=tf.float32, name="W")
b = tf.Variable(tf.zeros(shape=(out_dim,)), name="b")

@tf.function
def linear_model(x):
    return tf.matmul(x, W) + b

In [None]:
x1 = [[1., 0.]]
x2 = [[1., 0.], [0., 1.]]
x3 = [[1., 0.], [0., 1.], [1, 2]]

print(linear_model(x1))
print(linear_model(x2))
print(linear_model(x3))

In [None]:
b.assign([-1.,1])
print(linear_model(x1))

## Automatic differentiation

One of the main benefits of TensorFlow is its ability to perform gradient-based optimization automatically. This is possible because TensorFlow records the computation as a graph of operations, so that it can trace the computation backwards and compute derivatives using the chain rule. Optimizers can then use these derivatives to minimize any tensor in the graph with respect to the variables.

In [None]:
w = tf.Variable([[1.0]])
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print(grad)  # => tf.Tensor([[ 2.]], shape=(1, 1), dtype=float32)
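
To see the chain rule at work with more than one variable, consider the following sketch. The variables `w` and `b`, the input `x`, and all values are made up for this example. For `loss = (w * x + b)^2`, the chain rule gives `dloss/dw = 2 * (w * x + b) * x` and `dloss/db = 2 * (w * x + b)`.

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    y = w * x + b          # y = 2 * 3 + 1 = 7
    loss = tf.square(y)    # loss = y^2

# Request the gradients with respect to both variables at once
grad_w, grad_b = tape.gradient(loss, [w, b])
print(grad_w.numpy(), grad_b.numpy())   # 2 * 7 * 3 = 42 and 2 * 7 = 14
```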

## Training a linear model

We use the following operation, `tf.squeeze`, which removes dimensions of size 1,

In [None]:
y = tf.constant([[1.], [0.], [1], [10]])
print(y)
print(tf.squeeze(y))

in our revised definition of an affine linear model with a scalar output:

In [None]:
in_dim = 2

W = tf.Variable(tf.ones(shape=(in_dim, 1), dtype=tf.float64), name="W", dtype=tf.float64)
b = tf.Variable(tf.zeros(shape=(1), dtype=tf.float64), name="b", dtype=tf.float64)

@tf.function
def linear_model(x):
    return tf.squeeze(tf.matmul(x, W) + b)

We define the mean-squared error:

In [None]:
@tf.function
def loss(y, y_prime):
    return tf.reduce_mean(tf.square(y-y_prime))
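
The squeeze in the model matters for this loss. If the targets have shape `(n,)` and the predictions have shape `(n, 1)`, broadcasting turns their difference into an `n x n` matrix without raising an error, and the mean is then taken over the wrong set of values. The sketch below demonstrates this with made-up numbers.

```python
import tensorflow as tf

targets = tf.constant([1.0, 0.0, 2.0])              # shape (3,)
column_preds = tf.constant([[1.0], [2.0], [4.0]])   # shape (3, 1)

# Without squeezing, broadcasting silently produces a 3x3 matrix of differences
print((targets - column_preds).shape)

# After squeezing, the shapes match and the errors are 0, -2, -2
preds = tf.squeeze(column_preds)
mse = tf.reduce_mean(tf.square(targets - preds))
print(mse.numpy())   # (0 + 4 + 4) / 3
```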

Next, we define an optimizer, here the popular Adam:

In [None]:
optimizer = tf.optimizers.Adam(learning_rate=0.1)
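
To give an idea of what an optimizer does with the gradients, here is a minimal sketch of plain gradient descent written by hand. This is not how Adam works internally, since Adam additionally adapts the step size per variable. The variable `w`, the toy loss, and the learning rate are assumptions chosen for illustration.

```python
import tensorflow as tf

w = tf.Variable(5.0)
learning_rate = 0.1

for _ in range(50):
    with tf.GradientTape() as tape:
        loss = tf.square(w - 2.0)       # toy loss with its minimum at w = 2
    grad = tape.gradient(loss, w)
    w.assign_sub(learning_rate * grad)  # w <- w - learning_rate * grad

print(w.numpy())   # close to 2
```

An optimizer object encapsulates this update rule, so that the training code only needs to pass gradients and variables to `apply_gradients`.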

Let's generate some toy data:

In [None]:
x = np.array([[1., 0.], [0., 1.], [1, 2], [5, 10]])
y = np.matmul(x, np.array([-1, 2])) - 1.
# Use `inp` instead of `input` to avoid shadowing the Python built-in
for inp, target in zip(x, y):
    print(inp, "->", target)

Just some sanity checks:

In [None]:
print(linear_model(x))
print(loss(y,y))
print(loss(y,linear_model(x)))

Here comes the training loop:

In [None]:
#@tf.function
def train_step(inputs, targets, step):
    with tf.GradientTape() as tape:
        prediction = linear_model(inputs)
        loss_value = loss(targets, prediction)
    # Compute the gradients after leaving the tape context
    grads = tape.gradient(loss_value, [W, b])
    # Report the loss every 10 steps
    if (step + 1) % 10 == 0:
        print(step + 1, "{:.5f}".format(loss_value))
    optimizer.apply_gradients(zip(grads, [W, b]))

for i in range(100):
    train_step(x, y, i)

### Metrics

We want the whole training step to be executed on the GPU/TPU. Thus, we cannot simply `print` values. Instead, we use the concept of metrics:

In [None]:
# Re-initialize variables 
W.assign(tf.ones(shape=(in_dim, 1), dtype=tf.float64))
b.assign(tf.zeros(shape=(1), dtype=tf.float64))

# Create the metric
loss_metric = tf.metrics.Mean(name='train_loss')

@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        prediction = linear_model(inputs)
        loss_value = loss(targets, prediction)
    # Compute the gradients after leaving the tape context
    grads = tape.gradient(loss_value, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))
    loss_metric.update_state(loss_value)

for i in range(0,100):
    loss_metric.reset_states()
    train_step(x, y)
    if (not ((i+1) % 10)):
        print(i+1, "{:.5f}".format(loss_metric.result()))