# TensorFlow 2 Introduction
## Christian Igel, 2020

[TensorFlow](https://www.tensorflow.org/) is an open-source software library for numerical computations developed by Google. Its main application area is machine learning, in particular deep learning with neural networks. TensorFlow makes it easy to distribute computations over CPUs and GPUs, which makes it well suited for large-scale data analysis.

A map of deep learning frameworks taken from [towardsdatascience.com](https://towardsdatascience.com/):
<img src="https://cdn-images-1.medium.com/max/1600/1*dYjDEI0mLpsCOySKUuX1VA.png"></img>

The following introduction is based on the [Getting Started With TensorFlow](https://www.tensorflow.org/get_started/get_started) tutorial and the first lessons from [LearningTensorFlow](http://learningtensorflow.com/). Go to these sites to learn more.

## Tensors

Let's start by importing TensorFlow and checking its version, which we assume to be 2.x in this tutorial:

In [78]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.2.0-rc3


In a TensorFlow program, the computational operations are organized in a graph. The nodes correspond to particular operations. The edges define the data flow between the nodes. These data are tensors, which in this context are simply multidimensional arrays.
In TensorFlow 2.0 and later, this can almost be regarded as an implementation detail hidden from the user.

Each node can take some tensors as input and outputs a tensor. One basic type of node without input is a constant node representing a constant tensor. Let's define some constants holding tensors of different shapes:

In [79]:
# Rank 0 tensor; scalar with shape []
# tf.float32 is default and can be omitted
node1 = tf.constant(3.0, tf.float32, name='Klaus') 
print("rank 0 tensor, a scalar:", node1)

rank 0 tensor, a scalar: tf.Tensor(3.0, shape=(), dtype=float32)


In [80]:
print("rank 1 tensor; a vector:\n", tf.constant([1. ,2., 3.]))
print("\nrank 2 tensor; a matrix:\n", tf.constant([[1., 2., 3.], [4., 5., 6.]]))
print("\nrank 3 tensor:\n", tf.constant([[[1., 2., 3.], [1., 2., 3.]], 
                                     [[7., 8., 9.], [1., 2., 3.]]] ))
print("\nrank 3 tensor:\n", tf.constant([[[1., 2., 3.], [4., 5., 6.]]]))

rank 1 tensor; a vector:
 tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)

rank 2 tensor; a matrix:
 tf.Tensor(
[[1. 2. 3.]
 [4. 5. 6.]], shape=(2, 3), dtype=float32)

rank 3 tensor:
 tf.Tensor(
[[[1. 2. 3.]
  [1. 2. 3.]]

 [[7. 8. 9.]
  [1. 2. 3.]]], shape=(2, 2, 3), dtype=float32)

rank 3 tensor:
 tf.Tensor(
[[[1. 2. 3.]
  [4. 5. 6.]]], shape=(1, 2, 3), dtype=float32)


To get the shape and type of a tensor:

In [81]:
print("shape:\t", node1.shape)
print("type:\t", node1.dtype)

shape:	 ()
type:	 <dtype: 'float32'>


Tensors have the advantage that they can be easily handled by massively parallel hardware such as GPUs and TPUs.

## Simple computations

Let's do some simple computations:

In [82]:
print(tf.square(5))
print(tf.reduce_sum([1, 2, 3]))

tf.Tensor(25, shape=(), dtype=int32)
tf.Tensor(6, shape=(), dtype=int32)


In [83]:
node1 = tf.constant(3.0)
print("node1:", node1)

node2 = tf.constant(4.0)
print("node2:", node2)

node3 = tf.add(node1, node2)
print("node3:", node3)

node4 = node3 + node2  # The plus operator is mapped to tf.add
print("node4:", node4)

node1: tf.Tensor(3.0, shape=(), dtype=float32)
node2: tf.Tensor(4.0, shape=(), dtype=float32)
node3: tf.Tensor(7.0, shape=(), dtype=float32)
node4: tf.Tensor(11.0, shape=(), dtype=float32)


## Automatic conversion

In [84]:
node1 = tf.constant(3.0)
print("node1:", node1)

x = 5.0
print("node2:", x)

node3 = tf.add(node1, x)  # x is automatically converted
print("node3:", node3)

node1: tf.Tensor(3.0, shape=(), dtype=float32)
node2: 5.0
node3: tf.Tensor(8.0, shape=(), dtype=float32)
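
The automatic conversion above applies to Python values, not to tensors that already carry a dtype. The following is a minimal sketch, assuming default TensorFlow 2 settings (no NumPy-style type promotion enabled); the names `a`, `k`, and `total` are only for illustration. It shows that a Python number adopts the dtype of the tensor it is combined with, while two tensors of different dtypes need an explicit `tf.cast`.

```python
import tensorflow as tf

a = tf.constant(3.0)   # float32 tensor
k = tf.constant(4)     # int32 tensor

# A plain Python number is converted to the dtype of the tensor operand.
print(tf.add(a, 4))

# Two tensors with different dtypes are not converted automatically
# (assumption: default settings, which raise an error here).
try:
    tf.add(a, k)
except (tf.errors.InvalidArgumentError, TypeError) as err:
    print("dtype mismatch:", type(err).__name__)

# An explicit cast makes the dtypes agree.
total = tf.add(a, tf.cast(k, tf.float32))
print(total)
```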


### TensorFlow and NumPy

TensorFlow and NumPy go hand-in-hand, as shown in this example from the official TensorFlow documentation:

In [85]:
import numpy as np

#ndarray = np.ones([3, 3])  # A NumPy array
ndarray = np.ones([2,2, 2])

print("TensorFlow operations convert NumPy arrays to Tensors automatically")
tensor = tf.multiply(ndarray, 42)
print(tensor)

print("And NumPy operations convert Tensors to NumPy arrays automatically")
print(np.add(tensor, 1))

print("The .numpy() method explicitly converts a Tensor to a NumPy array")
print(tensor.numpy())

TensorFlow operations convert NumPy arrays to Tensors automatically
tf.Tensor(
[[[42. 42.]
  [42. 42.]]

 [[42. 42.]
  [42. 42.]]], shape=(2, 2, 2), dtype=float64)
And NumPy operations convert Tensors to NumPy arrays automatically
[[[43. 43.]
  [43. 43.]]

 [[43. 43.]
  [43. 43.]]]
The .numpy() method explicitly converts a Tensor to a NumPy array
[[[42. 42.]
  [42. 42.]]

 [[42. 42.]
  [42. 42.]]]


## Using functions

In TensorFlow, we can distinguish between

1. Defining the computational graph, and

2. Running the computational graph.

TensorFlow 2 uses "eager execution" by default, which hides this implementation detail. Running graphs now looks like executing normal Python code.

It is recommended to group operations in a function:

In [0]:
def first_TF_function():
    node1 = tf.constant(3.0)
    node2 = tf.constant(4.0)
    return tf.add(node1, node2)

In [87]:
print(first_TF_function())

tf.Tensor(7.0, shape=(), dtype=float32)


Adding the decorator `tf.function` to a function
has the effect that all its operations are combined in a single graph, which is
built when the function is first called. This usually speeds up computations.

In [0]:
@tf.function
def first_TF_function():
    node1 = tf.constant(3.0)
    node2 = tf.constant(4.0)
    return tf.add(node1, node2)

When adding the decorator, TensorFlow tries to convert the code in the function into TensorFlow operations, which can be executed on GPUs or TPUs. Without the decorator:

In [0]:
def first_TF_function():
    node1 = 3.0
    node2 = 4.0
    return node1 + node2

In [90]:
first_TF_function()

7.0

With the decorator:

In [0]:
@tf.function
def first_TF_function():
    node1 = 3.0
    node2 = 4.0
    return node1 + node2

In [92]:
first_TF_function()

<tf.Tensor: shape=(), dtype=float32, numpy=7.0>
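
To see when a decorated function is turned into a graph, one can record a Python side effect. The sketch below is illustrative only: `trace_log` and `add_one` are hypothetical names, not part of TensorFlow. Plain Python statements inside a `tf.function` run only while the function is being traced, so the length of the list counts the traces.

```python
import tensorflow as tf

trace_log = []  # hypothetical helper: one entry per trace


@tf.function
def add_one(x):
    trace_log.append("traced")  # plain Python: runs only during tracing
    return x + 1.0


add_one(tf.constant(1.0))
add_one(tf.constant(2.0))  # same dtype and shape: the existing graph is reused
print("traces after two scalar calls:", len(trace_log))

add_one(tf.constant([1.0, 2.0]))  # new input shape: the function is traced again
print("traces after a call with a new shape:", len(trace_log))
```

Calling the function with new values of the same dtype and shape does not trigger a new trace, which is where the speed-up comes from.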

## Visualizing the computational graph

In [93]:
# Load the TensorBoard notebook extension.
%load_ext tensorboard

The tensorboard extension is already loaded. To reload it, use:
  %reload_ext tensorboard


In [0]:
# Clear any logs from previous runs
!rm -rf ./logs/ 

# Import for timestamp
from datetime import datetime

In [0]:
@tf.function
def my_func():
  node1 = tf.constant(3.0, name="a")
  #print("Hallo", node1)
  node2 = tf.constant(4.0, name="b")
  return tf.add(node1, node2, name="sum")

# Set up logging.
writer = tf.summary.create_file_writer("logs/func/trial1")

# Bracket the function call with
# tf.summary.trace_on() and tf.summary.trace_export().
tf.summary.trace_on(graph=True)
# Call only one tf.function when tracing.
z = my_func()
#z = first_TF_function()
with writer.as_default():
  tf.summary.trace_export(
      name="my_func_trace",
      step=0)

In [0]:
%tensorboard --logdir logs/func

You can call

`tensorboard --logdir=logs/func`

outside the notebook, giving you a link to view the graph (and more) in a browser (via opening, e.g., `http://localhost:6006`).
You may have to locate the `tensorboard` executable first. 

## Broadcasting

In [97]:
node1 = tf.constant([1, 2, 3])
node2 = tf.constant([4, 5, 6])
node3 = tf.add(node1, node2)

print(node3)

tf.Tensor([5 7 9], shape=(3,), dtype=int32)


In [98]:
node1 = tf.constant([1, 2, 3])
node2 = tf.constant([4])
node3 = tf.add(node1, node2)

print(node3)

tf.Tensor([5 6 7], shape=(3,), dtype=int32)


In [99]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([[1, 2, 3], [4, 5, 6]])
node3 = tf.add(node1, node2)

print(node3)

tf.Tensor(
[[ 2  4  6]
 [ 8 10 12]], shape=(2, 3), dtype=int32)


In [100]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant(1)
node3 = tf.add(node1, node2)

print(node3)

tf.Tensor(
[[2 3 4]
 [5 6 7]], shape=(2, 3), dtype=int32)


Now it gets trickier. Have a close look:

In [101]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([100, 101, 103])
node3 = tf.add(node1, node2)

print(node3)

tf.Tensor(
[[101 103 106]
 [104 106 109]], shape=(2, 3), dtype=int32)


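The following does not work. Broadcasting aligns shapes starting from the last dimension, and the last dimensions of (2, 3) and (2,) differ: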

In [102]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([100, 103])
node3 = tf.add(node1, node2)

print(node3)

InvalidArgumentError: ignored

Giving the second operand the shape (2, 1) works, because its single column is repeated along the last dimension:

In [103]:
node1 = tf.constant([[1, 2, 3], [4, 5, 6]])
node2 = tf.constant([[100],[103]])
node3 = tf.add(node1, node2)

print(node3)

tf.Tensor(
[[101 102 103]
 [107 108 109]], shape=(2, 3), dtype=int32)
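
The broadcasting examples above can be made explicit with `tf.broadcast_to`, which materializes the expanded operand. This is a small sketch using the same numbers as above; the names `row`, `col`, `row_full`, and `col_full` are chosen for illustration.

```python
import tensorflow as tf

a = tf.constant([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = tf.constant([100, 101, 103])       # shape (3,)
col = tf.constant([[100], [103]])        # shape (2, 1)

# Shapes are aligned from the right: (3,) is treated like (1, 3)
# and the single row is repeated for every row of `a`.
row_full = tf.broadcast_to(row, tf.shape(a))
print(row_full)

# (2, 1) has a size-1 last dimension, so the column is repeated
# for every column of `a`.
col_full = tf.broadcast_to(col, tf.shape(a))
print(col_full)

# Adding the small operand gives the same result as adding the expanded one.
print(tf.reduce_all(a + row == a + row_full))
print(tf.reduce_all(a + col == a + col_full))
```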


## Function arguments

External inputs are fed into the graph via arguments to functions:

In [0]:
@tf.function
def second_TF_function(x):
    node1 = 3.0
    return node1 + x

In [0]:
print(second_TF_function(2))

## Variables

A graph can of course also contain variables, for example the parameters of a machine learning model.

In [0]:
# Pass dtype by keyword: the second positional argument of tf.Variable is `trainable`
W = tf.Variable(tf.ones(shape=(2, 2)), dtype=tf.float32, name="W")
b = tf.Variable(tf.zeros(shape=(2,)), name="b")

In [0]:
print(W)
print(b)

Let's change the value of a variable:

In [0]:
print(b)
b.assign([-1.,1])
print(b)

## Defining an affine linear function

In [0]:
in_dim = 2
out_dim = 2

# Pass dtype by keyword: the second positional argument of tf.Variable is `trainable`
W = tf.Variable(tf.ones(shape=(in_dim, out_dim)), dtype=tf.float32, name="W")
b = tf.Variable(tf.zeros(shape=(out_dim,)), name="b")

@tf.function
def linear_model(x):
    return tf.matmul(x, W) + b

In [0]:
x1 = [[1., 0.]]
x2 = [[1., 0.], [0., 1.]]
x3 = [[1., 0.], [0., 1.], [1, 2]]

print(linear_model(x1))
print(linear_model(x2))
print(linear_model(x3))

In [0]:
b.assign([-1.,1])
print(linear_model(x1))
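
As a sanity check, the affine function can be compared with the same computation in NumPy. The sketch below repeats the definitions from above so that it is self-contained; `x3`, `out`, `expected`, and `shifted` are illustrative names. It also shows that the decorated function picks up a new value of `b` after `assign`.

```python
import numpy as np
import tensorflow as tf

W = tf.Variable(tf.ones(shape=(2, 2)), name="W")
b = tf.Variable(tf.zeros(shape=(2,)), name="b")


@tf.function
def linear_model(x):
    return tf.matmul(x, W) + b


x3 = np.array([[1., 0.], [0., 1.], [1., 2.]], dtype=np.float32)

out = linear_model(x3)
print(out.shape)  # one output row per input row

# The same affine map computed with NumPy
expected = x3 @ W.numpy() + b.numpy()
print(np.allclose(out.numpy(), expected))

# Variables are read at call time, so the new bias is used here
b.assign([-1., 1.])
shifted = linear_model(x3[:1])
print(shifted)
```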

## Automatic differentiation

One of the main benefits of TensorFlow is its ability to automatically compute the exact gradient of a computation with respect to its inputs. This is possible because the computation is represented as a graph of operations, which TensorFlow can traverse backwards while computing derivatives using the chain rule.

To do this efficiently, TensorFlow has to keep track of the operations it executes.
We tell it to do so by defining a `tf.GradientTape()` context. All relevant computations in this context are "recorded on a tape".

For example, we can compute the derivative of a loss tensor with respect to variables:

In [0]:
w = tf.Variable([[9.0, 3.0], [2.0, 5.0]])
print("w: ", w)
with tf.GradientTape() as tape:
  loss = w * w

grad = tape.gradient(loss, w)
print("gradient: ", grad)  

TensorFlow automatically keeps track of derivatives with respect to variables. If you need a derivative with respect to a constant `c`, you have to tell the tape by calling `tape.watch(c)` inside the context. See the tutorial [Automatic differentiation and gradient tape](https://www.tensorflow.org/tutorials/customization/autodiff) for more details.
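
A minimal sketch of watching a constant, with illustrative names `c`, `dy_dc`, and `unwatched`. For $y = c^2$ the derivative is $2c$, so at $c = 3$ the watched tape should return 6, while a tape that does not watch `c` returns `None`.

```python
import tensorflow as tf

c = tf.constant(3.0)

with tf.GradientTape() as tape:
    tape.watch(c)   # constants are only tracked when watched explicitly
    y = c * c       # y = c^2

dy_dc = tape.gradient(y, c)  # dy/dc = 2c
print(dy_dc)

with tf.GradientTape() as tape:
    y = c * c       # same computation, but c is not watched

unwatched = tape.gradient(y, c)
print(unwatched)
```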

TensorFlow provides gradient-based optimizers, which can be used to minimize a loss function.
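
To illustrate what a gradient-based optimizer does in its simplest form, here is plain gradient descent written by hand with `tf.GradientTape` and `assign_sub`. This is a sketch and not one of TensorFlow's built-in optimizer classes; the loss $(w - 2)^2$, the starting value, and the learning rate are arbitrary choices for the example.

```python
import tensorflow as tf

w = tf.Variable(5.0)
learning_rate = 0.1

for step in range(50):
    with tf.GradientTape() as tape:
        loss = tf.square(w - 2.0)        # minimum at w = 2
    grad = tape.gradient(loss, w)        # d loss / d w = 2 (w - 2)
    w.assign_sub(learning_rate * grad)   # gradient-descent update

print(w.numpy())
```

Each step shrinks the distance to the minimum by a constant factor, so `w` ends up very close to 2. Built-in optimizers such as Adam apply more elaborate update rules to the same gradients.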

## Training a linear model

We use the `tf.squeeze` operation in the following: 

In [0]:
y = tf.constant([[1.], [0.], [1], [10]])
print("Before squeeze:\n", y)
print("After squeeze:\n", tf.squeeze(y))

x = tf.constant([[1., 2.], [0., 3.], [1, 4.], [10, 9]], dtype=tf.float64)
print("\nBefore squeeze:\n", x)
print("After squeeze:\n", tf.squeeze(x))
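
The effect of `tf.squeeze` is easiest to see on the shapes. In this sketch (tensor names are illustrative), a size-1 dimension is removed, a tensor without size-1 dimensions is left unchanged, and the `axis` argument restricts the operation to one dimension.

```python
import tensorflow as tf

y = tf.constant([[1.], [0.], [1.], [10.]])
y_squeezed = tf.squeeze(y)
print(y.shape, "->", y_squeezed.shape)   # the size-1 dimension is removed

x = tf.constant([[1., 2.], [0., 3.]])
x_squeezed = tf.squeeze(x)
print(x.shape, "->", x_squeezed.shape)   # no size-1 dimension: unchanged

z = tf.zeros((1, 4, 1))
z_squeezed = tf.squeeze(z, axis=0)
print(z.shape, "->", z_squeezed.shape)   # only the named dimension is removed
```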

There is an easy way to get information about an operation:

In [0]:
tf.squeeze?

We revise our definition of an affine linear model with a scalar output:

In [0]:
in_dim = 2

W = tf.Variable(tf.ones(shape=(in_dim, 1), dtype=tf.float64), name="W", dtype=tf.float64)
b = tf.Variable(tf.zeros(shape=(1), dtype=tf.float64), name="b", dtype=tf.float64)

@tf.function
def linear_model(x):
    #return tf.matmul(x, W) + b
    return tf.squeeze(tf.matmul(x, W) + b)

In [0]:
x = tf.constant([[1., 2.], [0., 3.], [1, 4.], [10, 9]], dtype=tf.float64)
print(linear_model(x))

We define the mean-squared error:

In [0]:
@tf.function
def loss(y, y_prime):
    return tf.reduce_mean(tf.square(y-y_prime))

Next, we define an optimizer, here the popular Adam:

In [0]:
optimizer = tf.optimizers.Adam(learning_rate=0.1)

Let's generate some toy data:

In [106]:
x = np.array([[1., 0.], [0., 1.], [1, 2], [5, 10]])
y = np.matmul(x, np.array([-1, 2])) - 1.
for input, target in zip(x,y):
    print(input, "->", target)

[1. 0.] -> -2.0
[0. 1.] -> 1.0
[1. 2.] -> 2.0
[ 5. 10.] -> 14.0


Just some sanity checks:

In [107]:
print("Model prediction before training:\n", linear_model(x))

Model prediction before training:
 tf.Tensor([ 1.  1.  3. 15.], shape=(4,), dtype=float64)


Here comes the training loop:

In [113]:
#@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        prediction = linear_model(inputs)
        loss_value = loss(prediction, targets)
    grads = tape.gradient(loss_value, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))
    return loss_value  # Return the loss instead of relying on the global i

for i in range(100):
    loss_value = train_step(x, y)
    if (i + 1) % 10 == 0:
        # Convert the scalar tensor to a Python float before formatting
        print(i + 1, "{:.5f}".format(float(loss_value)))

In [0]:
print("Model prediction after training:\n", linear_model(x))

### Metrics

We want the whole training step to be executed on the GPU/TPU. Inside a `tf.function`, a Python `print` only runs while the function is traced, so we cannot simply `print` values there. Instead, we use the concept of metrics:

In [114]:
# Re-initialize variables 
W.assign(tf.ones(shape=(in_dim, 1), dtype=tf.float64))
b.assign(tf.zeros(shape=(1), dtype=tf.float64))

# Create the metric
loss_metric = tf.metrics.Mean(name='train_loss')

@tf.function
def train_step(inputs, targets):
    with tf.GradientTape() as tape:
        prediction = linear_model(inputs)
        loss_value = loss(prediction, targets)
    grads = tape.gradient(loss_value, [W, b])
    optimizer.apply_gradients(zip(grads, [W, b]))
    loss_metric.update_state(loss_value)

for i in range(0,100):
    loss_metric.reset_states()
    train_step(x, y)
    if (not ((i+1) % 10)):
        # result() returns a tensor; convert it to a float before formatting
        print(i+1, "{:.5f}".format(float(loss_metric.result())))
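
A mean metric simply accumulates the values passed to `update_state` and reports their average in `result`. The sketch below assumes `tf.keras.metrics.Mean`, which to my knowledge is the class reached through `tf.metrics.Mean` in TensorFlow 2.x; the metric name and the values are made up for the example.

```python
import tensorflow as tf

mean_loss = tf.keras.metrics.Mean(name="demo_loss")  # illustrative metric

for value in [4.0, 2.0, 0.0]:
    mean_loss.update_state(value)  # accumulate one value per call

# result() returns a scalar tensor; convert it for printing
print(float(mean_loss.result()))
```

Because the metric keeps accumulating across calls, the training loop above resets it at the start of every iteration so that each printed value reflects only the current step.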