# Overview

TensorFlow is a library for high-performance (parallel, GPU-accelerated, ...) computation that also supports reverse-mode automatic differentiation (aka backpropagation).

References:

[[1] Automatic differentiation](http://www.columbia.edu/~ahd2125/post/2015/12/5/)

[[2] Backpropagation, Chris Olah](http://colah.github.io/posts/2015-08-Backprop/)

[[3] Reverse mode automatic differentiation](https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation)


## Runtimes
Runtime environments can be local (CPU, GPU, ...) or remote (distributed training). Here is how to list the devices available locally:

```python
import tensorflow as tf
from tensorflow.python.client import device_lib

# Names of all devices the local TensorFlow runtime can use
[x.name for x in device_lib.list_local_devices()]
```

```
['/device:CPU:0', '/device:XLA_CPU:0']
```
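`tensorflow.python.client.device_lib` is an internal module. As a sketch, assuming a recent TF2 release, the public `tf.config` API gives similar information:

```python
import tensorflow as tf

# Physical devices visible to the TensorFlow runtime
cpus = tf.config.list_physical_devices("CPU")
gpus = tf.config.list_physical_devices("GPU")
print("CPUs:", [d.name for d in cpus])
print("GPUs:", [d.name for d in gpus])

# Logical devices are the ones ops are actually placed on
print([d.name for d in tf.config.list_logical_devices()])
```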

By default, if both a CPU and a GPU are available, TensorFlow places operations that have a GPU implementation on the GPU (hoping for better performance). This is true in eager mode as well. Device placement context managers work as in TF1: `with tf.device("/gpu:0"):`.
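A minimal sketch of explicit placement: the ops created inside the `tf.device` block are pinned to the first CPU even if a GPU is present, and an eager tensor reports where it was placed through its `device` attribute.

```python
import tensorflow as tf

# Pin these ops to the first CPU, regardless of GPU availability
with tf.device("/CPU:0"):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.matmul(a, a)

# Eager tensors record the device they were computed on
print(b.device)  # ends with "device:CPU:0"
print(b.numpy())
```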

## Tensors

Tensors are either simple values or (the output of) operations.

### Values (tensors)
Values are always rank-n tensors (scalar, vector, matrix, ...) with a given shape and dtype. Depending on their role in the computation they can be:
* **[`tf.constant`](https://www.tensorflow.org/api_docs/python/tf/constant)** - immutable tensors whose values never change

* **[`tf.Variable`](https://www.tensorflow.org/api_docs/python/tf/Variable)** - these hold the model parameters that will be optimized and are usually initialized randomly at the start of the computation. Thanks to automatic differentiation, TensorFlow can compute the gradients of a specified function (e.g. the loss) with respect to the variables. Trainable variables are watched by `tf.GradientTape` automatically; variables created with `trainable=False` are not.

Note: There are no placeholders in TF2, because the graph is built dynamically (define-by-run, as in PyTorch).

Note: Strictly speaking, in TensorFlow's graph terminology, constants and variables (as well as TF1 placeholders) are also created by operations that return tensors.
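A minimal sketch of the difference (the names `c`, `v` and `frozen` are illustrative): a variable can be updated in place, and the tape only tracks trainable variables automatically, so the gradient with respect to the non-trainable one is `None`.

```python
import tensorflow as tf

c = tf.constant([1.0, 2.0], name="c")                   # immutable
v = tf.Variable([1.0, 2.0], name="v")                   # trainable by default
frozen = tf.Variable(0.5, trainable=False, name="frozen")

v.assign_add([1.0, 1.0])  # variables can be updated in place

with tf.GradientTape() as tape:
    loss = tf.reduce_sum(c * v) + frozen
grads = tape.gradient(loss, [v, frozen])

print(v.numpy())         # [2. 3.]
print(grads[0].numpy())  # [1. 2.], i.e. d(loss)/dv = c
print(grads[1])          # None: non-trainable variables are not watched
```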


### Operations
[Operations](https://www.tensorflow.org/api_docs/python/tf/Operation) are used to perform computations with tensors. They take $m \geq 0$ tensors as input and produce $n \geq 0$ tensors as output. Inputs to operations can be values or outputs of other operations. Most operations that process data treat the leading dimension as the batch dimension, which can be used for parallelization. This has two effects:

1. The computations for all examples in the batch are done in parallel.
2. When the loss is averaged over the (mini-)batch, its gradients are the average of the per-example gradients.

For example, the shape of a batch of images is (batch_size, height, width, channels) in TensorFlow's default channels-last layout.
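A small sketch of both effects, using made-up random data: a single `tf.matmul` processes the whole batch, and the gradient of a batch-averaged loss equals the average of the per-example gradients $2 y_i x_i$. The image batch at the end only illustrates the shape convention.

```python
import tensorflow as tf

tf.random.set_seed(0)
x = tf.random.normal((4, 3))       # batch of 4 examples, 3 features each
w = tf.Variable(tf.ones((3, 1)))   # parameters shared by all examples

with tf.GradientTape() as tape:
    y = tf.matmul(x, w)                  # one op for the whole batch: shape (4, 1)
    loss = tf.reduce_mean(tf.square(y))  # mean over the batch dimension
grad = tape.gradient(loss, w)

# Same gradient by hand: average over the batch of 2 * y_i * x_i
manual = tf.matmul(x, 2.0 * y, transpose_a=True) / 4.0
print(tf.reduce_max(tf.abs(grad - manual)).numpy())  # close to 0

# Images follow the (batch_size, height, width, channels) convention
images = tf.zeros((8, 28, 28, 3))
features = tf.keras.layers.Conv2D(16, 3)(images)
print(features.shape)  # (8, 26, 26, 16)
```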

## Eager mode vs graph mode


Eager execution is the default mode in TF2. Tensors behave much like NumPy ndarrays: operations run immediately and return concrete values, and no graph is built. This has several implications:

- Performance may be lower.
- You can't save the graph (e.g. as part of a `keras.Model` or a checkpointable object); without a graph, only variable values can be saved.
- You can't write the graph for TensorBoard visualization.

If you want to run computations as a single graph, use the `@tf.function` decorator (e.g. on the `train_step` function). It applies AutoGraph to the function and to the functions it calls, building a single callable graph (Python control flow such as `if` and `for` over tensors is converted to graph ops). Inside the decorated function, all tensors are symbolic.
The graph is created (traced) on the first call of the decorated function and then cached. If the function is called again with inputs of a different shape or dtype, another graph is traced and cached for that combination. This may exhaust memory if the function is called with many different input shapes/dtypes.

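> CAUTION: The same applies when calling the function with Python scalars (or other non-tensor Python values), but then a separate graph is traced and cached for each distinct value, not for each shape and dtype. Pass tensors (e.g. `tf.constant(3)`) to avoid this.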
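A sketch that counts traces with a Python side effect (the counter `trace_count` and the function `square` are illustrative): the Python body only runs while tracing, so the counter shows when a new graph is built.

```python
import tensorflow as tf

trace_count = 0

@tf.function
def square(x):
    global trace_count
    trace_count += 1  # Python side effect: runs only while tracing
    return x * x

square(tf.constant(2.0))         # first trace (float32 scalar)
square(tf.constant(3.0))         # same shape/dtype: cached graph is reused
square(tf.constant([1.0, 2.0]))  # new shape: retrace
square(2)                        # Python scalar: retrace for this value
square(3)                        # another value: retrace again
print(trace_count)               # 4
```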

It is recommended to iterate over `tf.data` datasets inside a `@tf.function` too.

> There are no placeholders in TF2. Just feed the values as function arguments and return the "fetches".

> There are no explicit [Sessions](https://www.tensorflow.org/api_docs/python/tf/Session#run) in TF2. Execution is managed by the runtime instead.

Python `print` statements in a `@tf.function` are executed only while the function is being traced. The result of tracing is a graph that doesn't run Python code, so print statements are not part of it.

```python
def func1(a, b):
    print('Eager1: ', tf.executing_eagerly())
    return tf.add(a, b)  # when called from func2, this op becomes part of func2's graph

@tf.function  # this function and the ones it calls will be run in graph mode
def func2(a, b):
    with tf.name_scope("Linear_function"):  # name scopes help group ops visually in tensorboard
        result = tf.add(b, func1(a, b))

    print('Inside2: ', a, b, result)
    print('Eager2: ', tf.executing_eagerly())

    return result

res = func2(1, 2)
# print(tf.autograph.to_code(func2.python_function))  # uncomment to see the AutoGraph-generated code
res
```

```
Eager1:  False
Inside2:  1 2 Tensor("Linear_function/Add_1:0", shape=(), dtype=int32)
Eager2:  False

<tf.Tensor: shape=(), dtype=int32, numpy=5>
```

```python
res = func2(3, 5)
res
```

```
<tf.Tensor: shape=(), dtype=int32, numpy=13>
```
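To print on every call, `tf.print` can be used instead: it is a graph op, so it runs each time the graph runs (by default it writes to stderr). A minimal sketch with an illustrative function `add_and_log`:

```python
import tensorflow as tf

@tf.function
def add_and_log(a, b):
    result = a + b
    print("traced with", a)          # Python print: runs only while tracing
    tf.print("executed with", a, b)  # graph op: runs on every call
    return result

x = add_and_log(tf.constant(1), tf.constant(2))  # traces: both messages appear
y = add_and_log(tf.constant(3), tf.constant(4))  # no retrace: only the tf.print message
```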

# Visualizing graphs
Tensorflow comes with a visualization tool called [**tensorboard**](https://youtu.be/eBbEDRsCmv4). Let's use this tool to visualize our graph.

To do this, you either use the Keras TensorBoard callback (`tf.keras.callbacks.TensorBoard`), or you write the graph explicitly.

## Writing graphs
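Graphs are written to an output folder using the `tf.summary` module. The graph is recorded while the function is traced, so graph recording must be enabled with `tf.summary.trace_on(graph=True)` right before the function is called (ideally its first call, immediately after definition).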

https://www.tensorflow.org/tensorboard/graphs

```python
logdir = "tensorboard2"
writer = tf.summary.create_file_writer(logdir)

tf.summary.trace_on(graph=True)
res = func2(8, 5)  # Call only one tf.function when tracing.
with writer.as_default():
    tf.summary.trace_export(name="func2", step=0)
res
```

```
Eager1:  False
Inside2:  8 5 Tensor("Linear_function/Add_1:0", shape=(), dtype=int32)
Eager2:  False

<tf.Tensor: shape=(), dtype=int32, numpy=18>
```

Now run:

```bash
tensorboard --logdir tensorboard2
```

Then open the URL printed in the console (by default http://localhost:6006) in a browser to see the graph. The name scope set above (`Linear_function`) groups the corresponding ops in the graph.

<!-- ![example_graph.png](example_graph.png "Resulting graph") -->


## Calculating gradients

As in TF1, gradients can be calculated with `tf.GradientTape`. It works both in eager mode and inside a `tf.function`.

```python
# y = ax + b

x = tf.Variable(5.0, name='x')

def calculate(a, b):
    # Trainable variables used inside the tape are watched automatically
    with tf.GradientTape() as tape:
        y = a * x + b
    dydx = tape.gradient(y, [x])  # a list with one gradient per source
    return x, y, dydx

x_val, y_val, dydx_val = calculate(5, 10)
print('x=', x_val)
print('y=', y_val)
print('dydx=', dydx_val)
```

```
x= <tf.Variable 'x:0' shape=() dtype=float32, numpy=5.0>
y= tf.Tensor(35.0, shape=(), dtype=float32)
dydx= [<tf.Tensor: shape=(), dtype=float32, numpy=5.0>]
```


A common approach is to encapsulate variables and operations in a Keras layer or model. Both have a `trainable_variables` property. However, the variables are created lazily, on the first call to the layer/model or when `build` is called.

```python
import numpy as np

a = np.ones((1, 2, 2, 1), dtype=np.float32)
conv1 = tf.keras.layers.Conv2D(1, (1, 1))
# conv1.build(input_shape=(1, 2, 2, 1))  # without build() or a first call, the layer has no variables
conv1(a)  # the first call builds the layer and creates its variables
print(conv1.trainable_variables)
```

```
[<tf.Variable 'conv2d_28/kernel:0' shape=(1, 1, 1, 1) dtype=float32, numpy=array([[[[0.32172024]]]], dtype=float32)>, <tf.Variable 'conv2d_28/bias:0' shape=(1,) dtype=float32, numpy=array([0.], dtype=float32)>]
```
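A minimal sketch of how `trainable_variables` is typically used in a custom training step. The toy data (y = 2x + 1), the `Dense` layer, the SGD learning rate and the number of steps are arbitrary assumptions for illustration; the layer is built eagerly first so its variables exist before tracing.

```python
import tensorflow as tf

# Hypothetical toy data: y = 2x + 1
x = tf.constant([[1.0], [2.0], [3.0]])
y_true = 2.0 * x + 1.0

layer = tf.keras.layers.Dense(1)
layer(x)  # build the layer so its variables exist
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

@tf.function
def train_step(x, y_true):
    with tf.GradientTape() as tape:
        y_pred = layer(x)
        loss = tf.reduce_mean(tf.square(y_pred - y_true))
    grads = tape.gradient(loss, layer.trainable_variables)
    optimizer.apply_gradients(zip(grads, layer.trainable_variables))
    return loss

losses = [float(train_step(x, y_true)) for _ in range(200)]
print(losses[0], losses[-1])  # the loss should shrink toward 0
```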


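For a non-scalar target, `tape.jacobian` computes the full Jacobian matrix. Constants are not watched automatically, so they must be watched explicitly:

```python
with tf.GradientTape() as g:
    x = tf.constant([1.0, 2.0])
    g.watch(x)  # constants must be watched explicitly
    y = x * x
jacobian = g.jacobian(y, x)
y, jacobian
```

```
(<tf.Tensor: shape=(2,), dtype=float32, numpy=array([1., 4.], dtype=float32)>,
 <tf.Tensor: shape=(2, 2), dtype=float32, numpy=
 array([[2., 0.],
        [0., 4.]], dtype=float32)>)
```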

## Summaries

Summaries are a way to collect and visualize tensors.

https://itnext.io/how-to-use-tensorboard-5d82f8654496

### Scalar
The simplest summary is the scalar summary. Its purpose is to track the value of a scalar over steps. It is often used to plot the evolution of the loss and accuracy during training.


```python
import numpy as np
import tensorflow as tf

a = tf.constant(5.0, name='a')
data = np.random.randint(10, size=100)

writer = tf.summary.create_file_writer("tensorboard3")
with writer.as_default():  # summaries go to the default writer
    for step, val in enumerate(data):
        y = a * float(val)  # no placeholders: compute directly with the new value
        tf.summary.scalar("y_scalar_summary", y, step=step)  # Passing the step is important !!!
writer.flush()
```

<img src="scalar_summary.png" alt="Scalar summary" style="width: 400px;"/>

### Histogram
Histograms are useful for matrices and higher-rank tensors. They show the distribution of values at each step and are often used to detect unexpected behavior of the weights and biases of neural networks.

Additionally, the distributions tab shows the evolution of the percentiles of the values (max, 93%, 84%, 69%, 50%, 31%, 16%, 7%, min) over time/steps.


```python
import tensorflow as tf

b = tf.Variable(tf.random.uniform((50, 30), -1, 1), name='b')

writer = tf.summary.create_file_writer("tensorboard4")
with writer.as_default():
    for step in range(50):
        b.assign(tf.random.uniform((50, 30), -1, 1))  # new random values on every step
        tf.summary.histogram("b_tensor_summary", b, step=step)
writer.flush()
```

<table>
<tr>
    <td> <img src="tensor_summary.png" alt="Histogram summary" style="width: 400px;"/> </td>
    <td> <img src="distributions.png" alt="Distributions" style="width: 400px;"/> </td>
</tr>
</table>

### Image
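[tf.summary.image](https://www.tensorflow.org/api_docs/python/tf/summary/image) accepts data as a 4-D tensor of shape (batch_size, height, width, channels), where channels can be:

* 1: Grayscale
* 2: Grayscale with alpha (TF2 only)
* 3: RGB
* 4: RGBA

How the values are converted to pixels depends on the API version:
* **TF2** (`tf.summary.image`): float data is expected in the range [0, 1] and is clipped to it; other dtypes (e.g. **uint8**, used as is in the range 0-255) are converted with `tf.image.convert_image_dtype`.
* **TF1** (`tf.compat.v1.summary.image`):
    * **uint8**: interpreted as is. Must be in the range 0-255
    * **float32**, normalized one image at a time:
        * **all values positive**: rescaled so that the largest value is 255
        * **any value negative**: shifted so that 0.0 maps to 127, then rescaled to 0-255

At most `max_outputs` images of the batch are written per step. When reshaping flat data into this 4-D shape, the batch dimension can be given as -1 (e.g. `tf.reshape(x, [-1, 28, 28, 1])`).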
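A minimal TF2 sketch, assuming random grayscale images with float values in [0, 1]; the log directory `tensorboard5` and the summary name are arbitrary.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data: 8 random grayscale images with values in [0, 1]
images = np.random.uniform(0.0, 1.0, size=(8, 28, 28, 1)).astype(np.float32)

writer = tf.summary.create_file_writer("tensorboard5")
with writer.as_default():
    # Only the first max_outputs images of the batch are written
    written = tf.summary.image("random_images", images, step=0, max_outputs=4)
writer.flush()
```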


There are other [summaries](https://www.tensorflow.org/api_guides/python/summary#Generation_of_Summaries), such as audio, text and the [embedding projector (t-SNE, PCA)](https://www.tensorflow.org/guide/embedding).





