# Exploring Tensorflow

This notebook will broadly follow the topics covered in the [Tensorflow introductory guide](https://www.tensorflow.org/guide/low_level_intro).

In [None]:
import tensorflow as tf

## Eager execution

Tensorflow now supports [eager execution](https://www.tensorflow.org/tutorials/eager/eager_basics) (operations are executed on the fly). With eager execution, we no longer have to define a graph structure with placeholders first and then run it in a separate step.

In [None]:
# tf.enable_eager_execution()  # left disabled: the rest of this notebook uses graph mode and sessions

## Fundamental Tensorflow objects

There are 3 fundamental objects that Tensorflow provides:
- Tensors (see [the documentation](https://www.tensorflow.org/api_docs/python/tf/Tensor) and [a guide](https://www.tensorflow.org/guide/tensors)),
- Sessions,
- Operations.

## Tensors

### Generalities

Tensors are multidimensional arrays that work as inputs or outputs of operations. Tensorflow has fast routines to manipulate them.

There are special types of tensors, such as `tf.Variable`, `tf.constant`, `tf.placeholder` and `tf.SparseTensor`.
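A `tf.SparseTensor` stores only the indices and values of its nonzero entries, together with the dense shape. The sketch below builds a small sparse matrix and expands it with `tf.sparse.to_dense`. It assumes Tensorflow 1.13+ or 2.x and imports `tensorflow.compat.v1` so that graph mode and sessions are available in both. It also uses its own graph and session (`demo_graph`, `demo_sess`, illustrative names) so that it leaves the notebook's `sess` alone.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in the rest of this notebook

demo_graph = tf.Graph()
with demo_graph.as_default():
    # A 2x3 matrix whose only nonzero entries are at (0, 0) and (1, 2)
    sparse = tf.SparseTensor(indices=[[0, 0], [1, 2]], values=[1, 2], dense_shape=[2, 3])
    # Materialize the full matrix, filling the missing entries with zeros
    dense = tf.sparse.to_dense(sparse)

    with tf.Session(graph=demo_graph) as demo_sess:
        dense_value = demo_sess.run(dense)

print(dense_value)
# [[1 0 0]
#  [0 0 2]]
```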

All tensors apart from `tf.Variable` are immutable (although their values may differ between runs of an operation fed with different inputs). Before eager execution was introduced in Tensorflow, tensors were just handles representing the edges of a graph, with no particular value: a value existed only when the graph was actually evaluated. With eager execution, tensors can have values outside the evaluation of a graph.

__Eager execution:__ eager execution has to be enabled at program startup (right after the initial import statements). If it is not, tensors do not hold any values for their components, because without eager execution tensors must be evaluated explicitly inside a Tensorflow session.
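The difference can be seen with a minimal sketch. In graph mode, printing a tensor shows only a symbolic handle, and its entries appear only after a session evaluates it. The sketch assumes Tensorflow 1.13+ or 2.x and imports `tensorflow.compat.v1` to get graph mode in both versions. The names `demo_graph` and `demo_sess` are illustrative and keep the example separate from the notebook's own session.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()  # graph mode, as in the rest of this notebook

demo_graph = tf.Graph()
with demo_graph.as_default():
    m = tf.constant([[1, 2], [3, 4]])
    # Without eager execution this prints a symbolic handle
    # (name, shape and dtype), not the matrix entries
    print(m)

    # The entries only become available when a session evaluates the tensor
    with tf.Session(graph=demo_graph) as demo_sess:
        m_value = demo_sess.run(m)

print(m_value)
# [[1 2]
#  [3 4]]
```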

In [None]:
m1 = tf.constant([[1, 2], [3, 4]]) # A 2x2 matrix
m2 = tf.constant([[1, 0], [0, 1]]) # The 2x2 identity matrix

print(m1)
print(m2)

In [None]:
# Tensors have a shape
print(tf.constant([[12, 3], [12, 2]], dtype=tf.int32).shape)
print(tf.constant([[2], [4], [6]]).shape)

# In Tensorflow, the rank of a tensor is the number of
# its dimensions (the number of indices)
print(tf.rank(tf.Variable([[1, 2], [1, 3], [8, 9]])))

In [None]:
# Tensorflow implements matrix multiplication
a = tf.constant([[2], [4], [5]])
b = tf.constant([[1, 0, 0]])

print(f"a={a}")
print(f"b={b}")
print(tf.matmul(a, b))
print(f"shape(a)={a.shape}")
print(f"shape(b)={b.shape}")
print(f"shape(matmul(a, b))={tf.matmul(a, b).shape}")

# Tensorflow can also return the shape of a tensor
# as another tensor, which can be used at runtime,
# even if the shapes change dynamically
print(tf.shape(a))

# Tensors' components can be accessed by the same
# indexing as NumPy arrays
print(a[0,0])

# Element-wise addition and multiplication of tensors can also
# be written with the + and * operators (* is element-wise,
# not matrix multiplication)
print(m1*m1)

### Tensorflow Variables

Variables (`tf.Variable`) are tensors whose value can be changed by operations run on them. When eager execution is off, a variable exists outside the context of a single `session.run` call, so its value persists from one run to the next.

Variables can be created with `tf.get_variable()`, by specifying a variable name and a shape. Unless another initializer is given, a variable created this way gets a random initial value (drawn with `tf.glorot_uniform_initializer`), which is assigned only when an initialization operation is run in a session.
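Here is a minimal sketch that makes the contrast with immutable tensors concrete. A variable is created with an explicit initializer and then modified in place with `assign_add`. It assumes Tensorflow 1.13+ or 2.x and uses `tensorflow.compat.v1` for graph mode. The variable name `counter` and the helper names are purely illustrative.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

demo_graph = tf.Graph()
with demo_graph.as_default():
    # Explicit initializer instead of the default random one
    counter = tf.get_variable("counter", shape=(2,), initializer=tf.zeros_initializer())
    # An operation that adds [1, 2] to the variable in place
    increment = counter.assign_add([1.0, 2.0])
    init = tf.global_variables_initializer()

    with tf.Session(graph=demo_graph) as demo_sess:
        demo_sess.run(init)       # counter = [0, 0]
        demo_sess.run(increment)  # counter = [1, 2]
        demo_sess.run(increment)  # counter = [2, 4]
        counter_value = demo_sess.run(counter)

print(counter_value)  # [2. 4.]
```

The value survives between the `run` calls because the variable's state is kept by the session, not by any single evaluation.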

In [None]:
variable_1 = tf.get_variable("first_variable", (3, 2))

print(variable_1)

### Placeholders

Placeholders (`tf.placeholder`) stand for tensors that will be passed to an operation later. If the operation is a sum of tensors, for example, we can pass it tensors with specified values (which are used when the graph is run, if eager execution is not active). Alternatively, we can pass it two placeholders that "promise" that tensors of a given type (and, optionally, shape) will be supplied as its inputs when the graph is run.

We do not even have to specify the shape of a placeholder. If the values actually fed in when the graph is run are inconsistent with the operation, we simply get an error. This also means that the shape of the operation's output tensor is not known a priori.
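A partially specified shape sits between these two extremes. In the sketch below, Tensorflow infers the output shape `[None, 5]` while building the graph, and the row count is fixed only by the values fed at run time. It assumes Tensorflow 1.13+ or 2.x via `tensorflow.compat.v1`. The names `left`, `right` and `product` are illustrative.

```python
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

demo_graph = tf.Graph()
with demo_graph.as_default():
    # The number of rows of `left` is left open (None)
    left = tf.placeholder(dtype=tf.float32, shape=(None, 2))
    right = tf.placeholder(dtype=tf.float32, shape=(2, 5))
    product = tf.matmul(left, right)

    # Shape inferred while building the graph: the row count is still unknown
    static_shape = product.shape.as_list()
    print(static_shape)  # [None, 5]

    # The actual shape is fixed only by the values fed at run time
    with tf.Session(graph=demo_graph) as demo_sess:
        result = demo_sess.run(
            product,
            feed_dict={
                left: np.ones((3, 2), dtype=np.float32),
                right: np.ones((2, 5), dtype=np.float32),
            },
        )

print(result.shape)  # (3, 5)
```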

In [None]:
# Placeholders
p1 = tf.placeholder(dtype=tf.int32)
p2 = tf.placeholder(dtype=tf.int32)

# Operation
op = tf.matmul(p1, p2)

print(op)

## Sessions

A Tensorflow session can be thought of as an executable, an object that executes a graph, performing operations among tensors and giving them specific values.

__Note:__ variables created with `tf.get_variable()` get their values from an initializer (random, by default) rather than from values we type in by hand. To build a simple graph with known values, it is therefore easiest to use `tf.constant`.

In [None]:
sess = tf.Session()

In [None]:
# Build a graph
v1 = tf.constant([[1, 1], [3, 1]])
v2 = tf.constant([[1, 0], [0, 1]])

prod = tf.matmul(v1, v2) # Operation between tensors, returning another tensor

# Evaluate the graph
sess.run(prod)

Not all operations return a value when run. Some of them are run only for their side effects, such as initializing variables. In these cases, running them performs the side effect but returns `None`.

In [None]:
# Declare a variable to initialize
var_to_initialize = tf.get_variable("to_init", (4,2))

# Instantiate an initializer
initializer = tf.global_variables_initializer()

# Run the initialization step
print(sess.run(initializer))

If a graph has placeholders as inputs, we pass explicit values for them through the `feed_dict` argument of `sess.run`, and the output is computed when the graph is run.

In [None]:
sess.run(
    op, 
    feed_dict = {
        p1: [[1], [1], [2], [4]],
        p2: [[1, 3, 5, 9]]
    }
)

## Tensorflow datasets

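The `tf.data` module provides richer ways than placeholders to feed input into a graph.

We can wrap data in a `tf.data.Dataset`, create an iterator (`tf.data.Iterator`) over it, and get the next sample through the iterator's `get_next()` method. Each time a session evaluates the `get_next()` tensor, it returns the next element. Once the data is exhausted, Tensorflow raises `tf.errors.OutOfRangeError`.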
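Datasets can also be transformed before iterating, for example grouped into batches with `batch()`. The following sketch iterates over batches until `tf.errors.OutOfRangeError` signals the end of the data. It assumes Tensorflow 1.13+ or 2.x and uses `tensorflow.compat.v1`, where the one-shot iterator is created with `tf.data.make_one_shot_iterator`. The data and names are illustrative.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

demo_graph = tf.Graph()
with demo_graph.as_default():
    # Five scalar samples, grouped into batches of (at most) two
    dataset = tf.data.Dataset.from_tensor_slices([1.0, 2.0, 3.0, 4.0, 5.0]).batch(2)
    next_batch = tf.data.make_one_shot_iterator(dataset).get_next()

    batches = []
    with tf.Session(graph=demo_graph) as demo_sess:
        while True:
            try:
                batches.append(demo_sess.run(next_batch).tolist())
            except tf.errors.OutOfRangeError:
                # Raised once every element has been consumed
                break

print(batches)  # [[1.0, 2.0], [3.0, 4.0], [5.0]]
```

Catching the specific `OutOfRangeError`, rather than every exception, keeps real errors from being silently swallowed.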

In [None]:
# Define some data
data = [
    [1.0],
    [2.0],
    [3.0],
    [4.0]
]

# Create slices from the data
slices = tf.data.Dataset.from_tensor_slices(data)

# Create a one-shot iterator and get the samples
next_item = slices.make_one_shot_iterator().get_next()

# Have a session run the operation to get the data,
# until the iterator is exhausted
while True:
    try:
        print(sess.run(next_item))
    except tf.errors.OutOfRangeError:
        break

## Layers

Layers (`tf.layers`) are the building blocks of the graph. They add __trainable parameters__ to it: the numbers that are optimized while training the neural network.

Let's create a simple graph with an input `x` (a placeholder) that is a vector with three components, and a linear, fully connected (dense) layer that produces an output `y`.

The `shape` argument of the placeholder specifies the shape of the input. In this case, `(None, n)` stands for an input with shape `(n_samples, n)`, where `n_samples` is left unspecified.

The `units` argument of the layer is the dimension of its output. Given this and the dimension of the input, the layer automatically determines the shape of the weight matrix that maps input to output (here 3×1), plus a bias vector of length `units`.
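The arithmetic a `Dense` layer performs without an activation can be sketched in NumPy. This is an illustration of the math, not Tensorflow's implementation. An input batch of shape `(n_samples, 3)` is multiplied by a weight matrix (the "kernel") of shape `(3, units)`, and a bias vector of length `units` is added. The uniform random kernel below is just a stand-in for Tensorflow's initializer, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, units = 3, 1
# Stand-in for the layer's randomly initialized kernel, shape (n_inputs, units)
kernel = rng.uniform(-1.0, 1.0, size=(n_inputs, units))
# Dense layers add a bias vector of length `units`, initialized to zeros by default
bias = np.zeros(units)

# Two samples with three features each: shape (n_samples, n_inputs)
inputs = np.array([[0.1, 0.4, 12.9],
                   [12.2, 1.1, 3.0]])

outputs = inputs @ kernel + bias  # shape (n_samples, units)
print(kernel.shape, outputs.shape)  # (3, 1) (2, 1)
```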



In [None]:
# Create placeholder for input
x = tf.placeholder(dtype=tf.float32, shape=(None, 3))

# Create a model with a single layer
linear_model = tf.layers.Dense(units=1)

# Create the output
y = linear_model(x)

The weights inside the layer must be initialized. To do so, a session has to run an initialization operation. The initialized values remain valid only within that session.
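Here is a minimal sketch of what "only within that session" means. Two sessions share the same graph, but only one of them runs the initializer, so reading the variable in the other one fails. It assumes Tensorflow 1.13+ or 2.x via `tensorflow.compat.v1`. The variable name `w` and the session names are illustrative.

```python
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

demo_graph = tf.Graph()
with demo_graph.as_default():
    w = tf.get_variable("w", shape=(3, 1))  # random (Glorot uniform) initial value
    init = tf.global_variables_initializer()

    # Session A runs the initializer, so it can read the variable
    with tf.Session(graph=demo_graph) as sess_a:
        sess_a.run(init)
        value_a = sess_a.run(w)

    # Session B shares the graph but never ran the initializer
    with tf.Session(graph=demo_graph) as sess_b:
        try:
            sess_b.run(w)
            uninitialized_error = False
        except tf.errors.FailedPreconditionError:
            uninitialized_error = True

print(value_a.shape, uninitialized_error)  # (3, 1) True
```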

In [None]:
sess.run(tf.global_variables_initializer())

The output of our linear model can now be run (evaluated) given a value for the input.

In [None]:
# One single sample as the input
print(sess.run(
    y,
    feed_dict = {x: [[0.1, 0.4, 12.9]]}
))

# Two samples as the input
print(sess.run(
    y,
    feed_dict = {x: [[0.1, 0.4, 12.9], [12.2, 1.1, 3.0]]}
))

For most layer classes, Tensorflow also offers a function that creates the layer and applies it to an input tensor in one go. For a `Dense` layer, this is the `tf.layers.dense()` function.

__Note:__ the following cell also runs the initialization routine. This assigns new random values to the weights of the dense layer every time it is run, and re-initializes every other variable in the graph as well. Therefore, the output is different every time the cell is run.

In [None]:
# Define input tensor (placeholder)
z_in = tf.placeholder(dtype=tf.float32, shape=(None, 3))

# Define output tensor, including the intermediate layer
z_out = tf.layers.dense(z_in, units=1)

# Initialize global variables
sess.run(tf.global_variables_initializer())

# Execute the graph
print(sess.run(
    z_out,
    feed_dict = {z_in: [[1.2, 23.4, 44.1]]}
))

## Training a neural network

Training a neural network means optimizing the internal values (weights) of its layers according to some criterion, i.e. minimizing a loss function. The loss function maps the predictions made on the training data, together with the true target values of the training data, to a real number. Since the predictions depend on the weights inside the layers, we adjust the weights to reach a minimum of the loss. This is the job of an optimizer.
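In symbols (the notation here is ours, not Tensorflow's), let $W$ be the weights, $f(x; W)$ the model's prediction, $(x_i, y_i)$ the $N$ training pairs and $\ell$ the per-sample loss. Training then looks for

$$
W^* = \arg\min_W L(W), \qquad L(W) = \frac{1}{N} \sum_{i=1}^{N} \ell\big(f(x_i; W),\, y_i\big),
$$

and plain gradient descent approaches $W^*$ by repeating the update $W \leftarrow W - \eta \, \nabla_W L(W)$ with a learning rate $\eta > 0$.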

Let's build a simple model and train it on the iris dataset.

Load the data.

In [None]:
from sklearn.datasets import load_iris

In [None]:
iris_data = load_iris()

In [None]:
iris_data_input = tf.constant(iris_data['data'])
iris_data_labels = tf.constant(iris_data['target'])

print(iris_data_input.eval(session=sess)[:10,:])
print(iris_data_labels.eval(session=sess)[:10])

Define the graph. The input has 4 components (the 4 features of each Iris sample), and the output layer has 3 units, one per class, because we are going to one-hot encode the 3 class labels.

In [None]:
# 4 input features per sample
nn_input = tf.placeholder(dtype=tf.float64, shape=(None, 4))

# One output unit (logit) per class
linear_layer = tf.layers.Dense(units=3)

y_pred = linear_layer(nn_input)

# Get predictions with randomly initialized weights and show
# those for the first 10 samples.
sess.run(tf.global_variables_initializer())
sess.run(
    y_pred,
    feed_dict = {nn_input: iris_data_input.eval(session=sess)}
)[:10]

Define a loss function. A common loss for multiclass classification problems is the __softmax categorical cross entropy__, which requires the labels to be one-hot encoded.

In [None]:
# One-hot encode the labels (Iris has 3 classes: 0, 1 and 2)
y_true = tf.one_hot(iris_data_labels, depth=3)

print(y_true.eval(session=sess)[:10])

In [None]:
# Define the loss function as a function of the labels
# and the predictions
loss = tf.losses.softmax_cross_entropy(y_true, y_pred)

# Compute loss given the current (randomly initialized)
# values for the weights
print(sess.run(
    loss,
    feed_dict={nn_input: iris_data_input.eval(session=sess)}
))
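The loss computed above can be spelled out in NumPy to show what it measures. This is a sketch of the math, not Tensorflow's implementation, and it assumes unweighted samples, in which case `tf.losses.softmax_cross_entropy` averages the per-sample losses. The helper names and the toy logits are hypothetical.

```python
import numpy as np

def one_hot(labels, depth):
    # Row i is all zeros except for a 1 in column labels[i]
    return np.eye(depth)[labels]

def softmax_cross_entropy(onehot_labels, logits):
    # Subtract each row's maximum before exponentiating, for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Cross entropy of each sample, averaged over the samples
    return -(onehot_labels * log_probs).sum(axis=1).mean()

# Hypothetical logits for 3 samples and 3 classes
labels = np.array([0, 2, 1])
logits = np.array([[2.0, 0.5, -1.0],
                   [0.1, 0.2, 3.0],
                   [0.0, 0.0, 0.0]])

print(one_hot(labels, depth=3))
print(softmax_cross_entropy(one_hot(labels, depth=3), logits))
```

With all-zero logits the predicted distribution is uniform, so the loss of a sample is $\log 3$. This is roughly the value to expect from an untrained 3-class model.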

Train the model. To train our model, we need to choose an optimizer and have it minimize the loss function with respect to the parameters (weights) of the layers. Here we use a standard __gradient descent__ optimizer.
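To see what each training step does, here is a plain NumPy sketch of full-batch gradient descent for the same kind of model: a single dense layer followed by softmax cross entropy. It is an illustration only. Tensorflow computes gradients by automatic differentiation of the graph rather than with the hand-derived formula used here, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def softmax(z):
    # Row-wise softmax, shifted by the row maximum for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train_softmax_regression(x, y_onehot, learning_rate=0.1, n_epochs=200):
    """Full-batch gradient descent on a dense layer with softmax cross entropy."""
    n_samples, n_features = x.shape
    n_classes = y_onehot.shape[1]
    kernel = np.zeros((n_features, n_classes))
    bias = np.zeros(n_classes)
    losses = []
    for _ in range(n_epochs):
        probs = softmax(x @ kernel + bias)
        # Mean cross entropy over the samples
        losses.append(-np.mean(np.sum(y_onehot * np.log(probs), axis=1)))
        # Gradient of the mean loss w.r.t. the logits
        grad_logits = (probs - y_onehot) / n_samples
        # Chain rule through logits = x @ kernel + bias, then one descent step
        kernel -= learning_rate * (x.T @ grad_logits)
        bias -= learning_rate * grad_logits.sum(axis=0)
    return kernel, bias, losses

# Hypothetical toy data: 4 features, label = index of the largest of the first 3
rng = np.random.default_rng(0)
toy_x = rng.normal(size=(60, 4))
toy_y = np.eye(3)[np.argmax(toy_x[:, :3], axis=1)]

_, _, toy_losses = train_softmax_regression(toy_x, toy_y)
print(toy_losses[0], toy_losses[-1])
```

Starting from zero weights, the first loss is exactly $\log 3$. With a small enough learning rate, the loss then decreases at every step, which is the behavior the plot at the end of this notebook should show for the Iris model.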

In [None]:
# Instantiate an optimizer
learning_rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)

# Create a training operation
train = optimizer.minimize(loss)

# Train the model
loss_values = []
n_epochs = 300
for i in range(n_epochs):
    _, loss_value = sess.run(
        (train, loss),
        feed_dict = {nn_input: iris_data_input.eval(session=sess)}
    )
    loss_values.append(loss_value)
    if i%20==0:
        print(loss_value)

Plot the values of the loss function for each epoch.

In [None]:
from plotly.offline import init_notebook_mode, iplot
import plotly.graph_objs as go
init_notebook_mode(connected=True)

In [None]:
trace = go.Scatter(
    x=list(range(len(loss_values))),
    y=loss_values,
    mode='markers'
)

layout = go.Layout(
    xaxis=dict(
        title='epoch'
    ),
    yaxis=dict(
        title='loss function value'
    )
)

fig = go.Figure(data=[trace], layout=layout)

iplot(fig)