In [0]:
import os
import sys

import tensorflow as tf
import random
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from tensorflow import keras
from sklearn.preprocessing import OneHotEncoder

%matplotlib inline

# Some auxiliary functions

In [0]:
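def plot_learning_curves(history):
    """Plot every metric recorded in a Keras History object against the epoch number."""
    pd.DataFrame(history.history).plot(figsize=(8, 5))
    plt.grid(True)
    plt.gca().set_ylim(0, 1)  # values above 1 fall outside the visible range
    plt.show()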

# Introduction to TensorFlow

TensorFlow is a popular open-source deep learning library developed by Google.

\\

TensorFlow uses a dataflow graph to represent your computation in terms of the dependencies between individual operations. 

A computational graph is a series of TensorFlow operations arranged into a graph. The graph is composed of two types of objects:


*   Operations (or "ops"): The nodes of the graph. Operations describe calculations that consume and produce tensors.
*   Tensors: The edges in the graph. These represent the values that will flow through the graph.
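
To make the nodes-and-edges picture concrete, here is a minimal pure-Python sketch of deferred evaluation. The `Node` class and the `constant` and `run` helpers are hypothetical names invented for this illustration; they are not part of TensorFlow and only mimic the "build first, run later" idea.

```python
import operator


class Node:
    """A toy graph node: an operation plus the nodes that feed it."""

    def __init__(self, op, inputs=(), value=None):
        self.op = op          # callable applied to the input values, or None for a constant
        self.inputs = inputs  # upstream nodes (the incoming "edges")
        self.value = value    # only used by constant nodes


def constant(value):
    """Create a node that simply holds a value."""
    return Node(op=None, value=value)


def run(node):
    """Evaluate a node by recursively evaluating everything it depends on."""
    if node.op is None:
        return node.value
    return node.op(*(run(n) for n in node.inputs))


# Building the graph performs no arithmetic yet.
a = constant(3.0)
b = constant(4.0)
total = Node(operator.add, (a, b))
scaled = Node(operator.mul, (total, constant(2.0)))

# Only run() triggers the computation: (3 + 4) * 2.
result = run(scaled)
print(result)
```

Separating construction from execution is what lets a real runtime inspect, optimize and distribute the whole graph before any value is computed.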

\\

For building and training graph-constructed models, the Python program first builds a graph representing the computation, then invokes TensorFlow Session.run to send the graph for execution on the C++-based runtime. This provides:

*    Automatic differentiation using static autodiff.
*    Simple deployment to a platform independent server.
*    Graph-based optimizations (common subexpression elimination, constant-folding, etc.).
*    Compilation and kernel fusion.
*    Automatic distribution and replication (placing nodes on the distributed system).


\\

We highly recommend reading the resources at the [following link](https://www.tensorflow.org/programmers_guide/) to understand the TF architecture.

## Session initialization

To evaluate tensors, instantiate a tf.Session object, informally known as a session. A session encapsulates the state of the TensorFlow runtime, and runs TensorFlow operations. If a tf.Graph is like a .py file, a tf.Session is like the python executable.

We could define a single global session with *sess = tf.InteractiveSession()*.

However, the most common way to use a session is as a context manager:

*with tf.Session() as sess:*

## Tensor

TensorFlow is a framework to define and run computations involving tensors. A tensor is a generalization of vectors and matrices to potentially higher dimensions. Internally, TensorFlow represents tensors as n-dimensional arrays of base datatypes.

\\

TensorFlow programs work by first building a graph of tf.Tensor objects, detailing how each tensor is computed based on the other available tensors and then by running parts of this graph to achieve the desired results.

\\

A tf.Tensor has no value outside the context of a single session.run call: its value is recomputed on every call and is not kept between calls.

### Initialization

In [0]:
node1 = tf.constant(3.0)
node2 = tf.constant(4, dtype=tf.int32)
node3 = tf.constant(0., name="Zero_tensor")
node4 = tf.constant([1., 2., 3.])
node5 = tf.constant([[1., 2., 3.], [4., 5., 6.]])
node6 = tf.ones([3, 3, 3])

In [0]:
print(node1)
print(type(node1))

In [0]:
print(node2.eval())

The code above fails because no default session exists yet: we must create a session before calling eval() to get tensor values.

In [0]:
with tf.Session() as sess:
    print(node1)
    print(type(node1))
    print()

    print(node2.eval())
    print(sess.run(node2))
    print()

    print(node3.name)
    print()

    print(node4.shape)
    print()

    print(node5.eval())
    print()

    print(node6)
    print(sess.run(node6))

Now we can create an InteractiveSession. It installs itself as the default session, so eval() and run() work without a with block.

In [0]:
sess = tf.InteractiveSession()

### Operations

In [0]:
x = tf.constant(2.)
y = tf.constant(3.)

add_1 = x + y
add_2 = tf.add(x, y, name='our_add_node')

print(add_1)
print(add_1.eval())
print()

print(add_2)
print(add_2.eval())

In [0]:
x = tf.constant(2.)
y = tf.constant([3., 4., 5.])

add_3 = x + y

print(add_3)
print(add_3.eval())
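
The scalar-plus-vector addition above works because of broadcasting, which TensorFlow applies with NumPy-style rules. As a rough illustration, the same computation can be reproduced in plain NumPy; the names `x_np`, `y_np` and `sum_np` are chosen here only for the example.

```python
import numpy as np

x_np = np.float32(2.0)                               # scalar, shape ()
y_np = np.array([3.0, 4.0, 5.0], dtype=np.float32)   # vector, shape (3,)

# The scalar is conceptually stretched to shape (3,) before the element-wise addition.
sum_np = x_np + y_np

print(sum_np.shape)
print(sum_np)
```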

In [0]:
xx = tf.random_normal(shape=(100, 10), name='xx')
yy = tf.random_normal(shape=(10, 2), name='yy')

xyxy = tf.matmul(xx, yy)

print(xyxy)

### Tensor does not exist outside the context of a single session.run call

The result shows a different random value on each call to run, but a consistent value during a single run (out1 and out2 receive the same random input):

In [0]:
vec = tf.random_uniform(shape=(3,))
out1 = vec + 1
out2 = vec + 2

print(sess.run(vec))
print(sess.run(vec))
print(sess.run((out1, out2)))

### Placeholders

A placeholder represents an entry point for us to feed actual data values
into tensors. It is not initialized and contains no data. A placeholder
generates an error if it is executed without a feed.

#### Placeholder for a single number

In [0]:
x = tf.placeholder(tf.float32, shape=[1,1])
y = tf.matmul(x, x)

In [0]:
""" A placeholder generates an error if it is executed without a feed """
print(sess.run(y))  # ERROR: will fail because x was not fed.

In [0]:
number = [[3.]]
print(sess.run(y, feed_dict={x: number}))  # Will succeed.

#### Placeholder for a tensor with undefined length

In [0]:
x = tf.placeholder(tf.float32, shape=[None, 5])
y = x * 2

In [0]:
tensor = np.ones((1, 5))
print(tensor)
print()
print(sess.run(y, feed_dict={x: tensor}))

In [0]:
tensor = np.ones((10, 5))
print(tensor)
print()
print(sess.run(y, feed_dict={x: tensor}))

#### Operations on placeholders

In [0]:
x = tf.placeholder(tf.float32, shape=[1, None])
y = tf.placeholder(tf.float32, shape=[1, None])

z_1 = x + y
z_2 = tf.matmul(x, tf.transpose(y))

In [0]:
x_tensor = [[1., 2., 3.]]
y_tensor = [[11., 12., 13.]]
print(sess.run(z_1, feed_dict={x: x_tensor, y: y_tensor}))
print()
print(sess.run(z_2, feed_dict={x: x_tensor, y: y_tensor}))

## Variable

A TensorFlow variable is the best way to represent the state manipulated
by your program. A tf.Variable represents a tensor whose value can be
changed by running ops on it.
Internally, a tf.Variable stores a tensor. Specific ops allow you to read and
modify the values of this tensor.

\\

Unlike tf.Tensor objects, a tf.Variable exists outside the context of a single session.run call.

\\

**Note:** TensorFlow 1.X relied heavily on implicitly global namespaces. When you called tf.Variable(), it would be put into the default graph, and it would remain there, even if you lost track of the Python variable pointing to it. You could then recover that tf.Variable, but only if you knew the name that it had been created with. TensorFlow 2.0 eliminates all of these mechanisms (Variables 2.0 RFC) in favor of the default mechanism: Keep track of your variables! If you lose track of a tf.Variable, it gets garbage collected.

In [0]:
var_1 = tf.get_variable("var_1", shape=[2, 3]) # default type tf.float32, default init tf.glorot_uniform_initializer
var_2 = tf.get_variable("var_2", shape=[5], initializer=tf.constant_initializer(1000.))
var_3 = tf.get_variable("var_3", shape=[3, 3, 3], initializer=tf.initializers.random_normal())

var_4 = tf.Variable(tf.constant(3., shape=[1, 2]))
var_5 = tf.Variable(tf.random_normal([2, 1]))
var_6 = tf.Variable(tf.random_uniform([1, 1]), name="var_6")

In [0]:
""" Before you can use a variable, it must be initialized!!! """
print(var_1)
print(var_1.eval()) # ERROR: will fail because variables are not initialized!

In [0]:
""" Before you can use a variable, it must be initialized!!! """
sess.run(tf.global_variables_initializer())

In [0]:
print(var_1)
print(var_1.eval())
print()

print(var_2)
print(var_2.eval())
print()

print(var_3)
print(sess.run(var_3))
print()

print(var_4)
print(sess.run(var_4))
print()

print(var_5)
print(sess.run(var_5))
print()

print(var_6)
print(var_6.eval())
print()

### Variable exists outside the context of a single session.run call

In [0]:
vec = tf.get_variable("vec", shape=(3,), initializer=tf.initializers.random_normal())
sess.run(tf.global_variables_initializer())
out1 = vec + 1
out2 = vec + 2

print(sess.run(vec))
print(sess.run(vec))
print(sess.run((out1, out2)))

### Variable scopes for better naming convention

In [0]:
with tf.variable_scope('some_scope'):
    var_scope = tf.get_variable("variable_in_scope", shape=(2,2))
    
print(var_scope)

Variable scopes also let us reuse variables. With reuse=False, requesting an existing variable a second time raises an error:

In [0]:
def foo():
    with tf.variable_scope("foo_non_reusing", reuse=False):
        v = tf.get_variable("v", [1])
    return v

v1 = foo()  # Creates v.
v2 = foo()  # ERROR: ValueError, the variable foo_non_reusing/v already exists.
assert v1 == v2

In [0]:
def foo():
    with tf.variable_scope("foo_reusing", reuse=tf.AUTO_REUSE):
        v = tf.get_variable("v", [1])
    return v

v1 = foo()  # Creates v.
v2 = foo()  # Gets the same, existing v.
assert v1 == v2

### Computing gradients

In [0]:
# Reminder:
# var_4 = tf.Variable(tf.constant(3., shape=[1, 2]))
# var_5 = tf.Variable(tf.random_normal([2, 1]))

In [0]:
var_7 = tf.matmul(var_4, var_5)

In [0]:
g = tf.gradients(var_7, [var_4, var_5])
print(g)
print(g[0].eval())
print(g[1].eval())

In [0]:
x_1 = tf.constant(2., shape=[2,1])
x_2 = tf.constant(3., shape=[1,2])
y = tf.matmul(x_1, x_2)

g = tf.gradients(y, [x_1, x_2])
print(g)
print(g[0].eval())
print(g[1].eval())
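
For a non-scalar output, tf.gradients differentiates the sum of all output entries with respect to each input. The NumPy sketch below reproduces the values for the constant example above and checks them with finite differences. The helper names `total` and `numeric_grad` are made up for this illustration.

```python
import numpy as np

x_1 = np.full((2, 1), 2.0)
x_2 = np.full((1, 2), 3.0)


def total(a, b):
    """Sum of all entries of the matrix product a @ b."""
    return (a @ b).sum()


# Analytic gradients of sum(a @ b).
upstream = np.ones((2, 2))       # d(sum) / d(product): one per output entry
grad_x_1 = upstream @ x_2.T      # shape (2, 1)
grad_x_2 = x_1.T @ upstream      # shape (1, 2)


def numeric_grad(f, x, eps=1e-6):
    """Central finite differences, one entry at a time."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad


print(grad_x_1)  # every entry is 6
print(grad_x_2)  # every entry is 4
print(np.allclose(grad_x_1, numeric_grad(lambda a: total(a, x_2), x_1)))
print(np.allclose(grad_x_2, numeric_grad(lambda b: total(x_1, b), x_2)))
```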

## Optimizers

In TensorFlow one can use fast, efficient gradient-based optimizers to minimize a given function. \\
In the following example we show how to use optimizers in TF by minimizing the function x^2 with the Gradient Descent Optimizer.

#### Defining the starting point x and the function y = x^2

Note that the starting point should be initialized as a variable, not a constant tensor, because the optimizer has to change its value by repeatedly subtracting the scaled gradient of the function in order to minimize y.

In [0]:
x = tf.get_variable("opt_x", dtype=tf.float32, initializer=tf.constant_initializer(1000.), shape=[1, 1])
y = tf.pow(x, [2.])

#### Defining the Gradient Descent Optimizer

In [0]:
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(y)

#### Iterate to minimize function

In [0]:
with tf.Session() as opt_sess:  # separate name, so `sess` keeps pointing to the InteractiveSession
    opt_sess.run(tf.global_variables_initializer())
    print("x = ", x.eval())
    print("y = ", y.eval())
    for i in range(200):
        train_step.run()
        if i % 10 == 0:
            print(x.eval())
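
The update performed by each training step can be written out by hand. The following plain-Python sketch assumes the same setup as above (y = x^2, start at 1000, learning rate 0.1). The names `x_val`, `learning_rate` and `trace` are illustrative only.

```python
x_val = 1000.0
learning_rate = 0.1
trace = []

for step in range(200):
    grad = 2.0 * x_val                      # dy/dx for y = x**2
    x_val = x_val - learning_rate * grad    # equivalent to multiplying x_val by 0.8
    if step % 10 == 0:
        trace.append(x_val)

print(trace[:3])
print(x_val)
```

Because every step multiplies x by the same factor 0.8, the iterate shrinks geometrically towards the minimum at 0.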

In [0]:
""" Optimizer needs variables that can be mutated in order to minimize a function! """
x = tf.constant(1000.)
y = tf.pow(x, [2.])

train_step = tf.train.GradientDescentOptimizer(0.1).minimize(y) # ERROR: there are no gradients provided for any variable!

Now we can close our session.

In [0]:
sess.close()

# Image classification with tf and tf.keras

In the following section we will use tf and the tf.keras library to create feedforward neural networks (FNNs) that classify the Fashion MNIST dataset.

Some parts of this code come from the [tf 2.0 tutorial](https://github.com/ageron/tf2_course) by Aurélien Géron.


## Load the Fashion MNIST dataset


In [0]:
fashion_mnist = keras.datasets.fashion_mnist

(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

In [0]:
X_train = X_train.reshape((X_train.shape[0], -1))
X_test = X_test.reshape((X_test.shape[0], -1))

print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)

In [0]:
X_train = X_train / 255
X_test = X_test / 255
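
The two preprocessing steps above (flattening and scaling) can be checked on a small synthetic batch. This is only a sketch: the random `images` array is a stand-in for real Fashion MNIST data, and the names are chosen for the example.

```python
import numpy as np

# Synthetic stand-in for a batch of 28x28 grayscale images with uint8 pixels.
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

flat = images.reshape((images.shape[0], -1))  # (5, 28, 28) -> (5, 784)
scaled = flat / 255                           # integer pixels -> floats in [0, 1]

print(flat.shape, scaled.dtype)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```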

## Build a classification neural network with Keras sequential model

Build a Sequential model ([keras.models.Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/models/Sequential)) with the following layers:

1.   Dense layer ([keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)) with 300 neurons (aka units), and the "relu" activation function. Since it is the first layer in your model, you should specify the input_shape argument, leaving out the batch size: [784].
2.   Another Dense layer with 100 neurons, also with the "relu" activation function.
3.  A final Dense layer with 10 neurons (one per class), and with the "softmax" activation function to ensure that the sum of all the estimated class probabilities for each image is equal to 1.


You can do it either by calling Sequential without any arguments and then adding layers with its add() method, or by passing a list containing the 3 layers to the constructor of the Sequential model.

In [0]:
model = keras.models.Sequential([
    keras.layers.Dense(300, activation="relu", input_shape=[784]),
    keras.layers.Dense(100, activation="relu"),
    keras.layers.Dense(10, activation="softmax")
])

Call the model's **summary()** method and examine the output.

In [0]:
model.summary()
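
The parameter counts reported by summary() can be verified by hand: a Dense layer with n_in inputs and n_out units has n_in * n_out weights plus n_out biases. A small sketch, with `layer_sizes` as an illustrative name:

```python
layer_sizes = [784, 300, 100, 10]  # input features followed by the units of each Dense layer

# Each Dense layer contributes n_in * n_out weights and n_out biases.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

print(params_per_layer)       # [235500, 30100, 1010]
print(sum(params_per_layer))  # 266610
```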

After a model is created, you must call its **compile()** method to specify the loss function and the optimizer to use.

In this case, you want to use:

*   the **sparse_categorical_crossentropy** loss -- cross-entropy loss for labels that are integer class indices rather than one-hot vectors
*   an optimizer, for example **sgd** (stochastic gradient descent) or **adam**; the cell below uses **adam**
*   you can optionally specify a list of additional metrics that should be measured during training. In this case you should specify **metrics=["accuracy"]**.

Note: you can find more loss functions in [keras.losses](https://www.tensorflow.org/api_docs/python/tf/keras/losses), more metrics in [keras.metrics](https://www.tensorflow.org/api_docs/python/tf/keras/metrics) and more optimizers in [keras.optimizers](https://www.tensorflow.org/api_docs/python/tf/keras/optimizers).
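
To see why the "sparse" loss is the right choice for integer labels, the following NumPy sketch compares it with the one-hot form on made-up probabilities. This is an illustration of the formulas, not the Keras implementation; all array values are hypothetical.

```python
import numpy as np

# Hypothetical predicted probabilities for 3 samples and 4 classes (rows sum to 1).
probs = np.array([
    [0.7, 0.1, 0.1, 0.1],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.1, 0.2, 0.6],
])
labels = np.array([0, 1, 3])      # integer class indices ("sparse" labels)
one_hot = np.eye(4)[labels]       # the same labels, one-hot encoded

# Sparse form: pick the probability of the true class directly.
sparse_ce = -np.log(probs[np.arange(len(labels)), labels])

# One-hot form: the sum keeps only the true-class term.
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1)

print(np.allclose(sparse_ce, categorical_ce))
print(sparse_ce.mean())
```

Both forms give the same per-sample loss; the sparse variant just skips building the one-hot matrix.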




In [0]:
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])

Now your model is ready to be trained. Call its **fit()** method, passing it the input features (X_train) and the target classes (y_train). Set epochs=10 (or else it will just run for a single epoch). 

You can also (optionally) pass the validation data by setting validation_data=(X_valid, y_valid). If you do, Keras will compute the loss and the additional metrics (the accuracy in this case) on the validation set at the end of each epoch. This notebook has no separate validation split, so the cell below passes the test set (X_test, y_test) instead.

If the performance on the training set is much better than on the validation set, your model is probably overfitting the training set (or there is a bug, such as a mismatch between the training set and the validation set). 

**Note**: the fit() method will return a History object containing training stats. Make sure to preserve it (history = model.fit(...)).

In [0]:
history = model.fit(X_train, y_train, epochs=10,
                    validation_data=(X_test, y_test))

Plot the learning curves for our model:

In [0]:
plot_learning_curves(history)

Call the model's **evaluate()** method, passing it the test set (X_test and y_test). This will compute the loss (cross-entropy) on the test set, as well as all the additional metrics (in this case, the accuracy).

In [0]:
test_loss, test_acc = model.evaluate(X_test, y_test)
test_loss, test_acc

You can also estimate the probability of each class for each instance for a given dataset, by calling the model's **predict()** method.

In [0]:
model.predict(X_test[0:10])
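
Each row returned by predict() is a probability distribution over the 10 classes, so the predicted class is the index of the largest entry. The sketch below uses a made-up `y_proba` array in place of real model output. The `class_names` list follows the standard Fashion MNIST label order.

```python
import numpy as np

class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Made-up output of predict() for 3 images (each row sums to 1).
y_proba = np.array([
    [0.01, 0.01, 0.02, 0.01, 0.01, 0.04, 0.01, 0.10, 0.02, 0.77],
    [0.02, 0.01, 0.85, 0.01, 0.05, 0.01, 0.03, 0.01, 0.005, 0.005],
    [0.05, 0.60, 0.05, 0.10, 0.05, 0.05, 0.04, 0.03, 0.02, 0.01],
])

y_pred = y_proba.argmax(axis=1)       # most likely class per image
confidence = y_proba.max(axis=1)      # probability assigned to that class

for cls, conf in zip(y_pred, confidence):
    print(class_names[cls], round(float(conf), 2))
```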

## Build a classification neural network with Keras functional API

The tf.keras.Sequential model is a simple stack of layers that cannot represent arbitrary models. Use the Keras functional API to build complex model topologies such as:

*    Multi-input models,
*    Multi-output models,
*    Models with shared layers (the same layer called several times),
*    Models with non-sequential data flows (e.g. residual connections).

Building a model with the functional API works like this:

1.    A layer instance is callable and returns a tensor.
2.    Input tensors and output tensors are used to define a tf.keras.Model instance.
3.    This model is trained just like the Sequential model.

In the following subsection we will train the fashion-MNIST classifier using the functional API.

Define the following layers of your new model:

1.   [keras.layers.Input](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Input) layer to represent the inputs. Don't forget to specify the input shape.
2.   Dense layer ([keras.layers.Dense](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense)) with 300 neurons (aka units), and the "relu" activation function. Since you already specified the input layer of the network, you no longer need to specify the input_shape argument.
3.   Another Dense layer with 100 neurons, also with the "relu" activation function.
4.  A final Dense layer with 10 neurons (one per class), and with the "softmax" activation function to ensure that the sum of all the estimated class probabilities for each image is equal to 1.

In [0]:
input_layer = keras.layers.Input(shape=X_train.shape[1:])
hidden1 = keras.layers.Dense(300, activation="relu")(input_layer)
hidden2 = keras.layers.Dense(100, activation="relu")(hidden1)
output_layer = keras.layers.Dense(10, activation="softmax")(hidden2)

Now create a [keras.models.Model](https://www.tensorflow.org/api_docs/python/tf/keras/models/Model) and specify its inputs and outputs (e.g., **inputs=[input_layer]**).

In [0]:
model = keras.models.Model(inputs=[input_layer], outputs=[output_layer])

With the functional API you use this model just like a Sequential model: you need to compile it, display its summary, train it, evaluate it and use it to make predictions.

In [0]:
model.summary()

In [0]:
model.compile(loss="sparse_categorical_crossentropy",
              optimizer="adam", metrics=["accuracy"])

In [0]:
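# Logs training metrics for TensorBoard; pass callbacks=[tensorboard_cbk] to model.fit() to enable it.
tensorboard_cbk = keras.callbacks.TensorBoard(log_dir='/full_path_to_your_logs')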

In [0]:
model.fit(X_train, y_train, epochs=10, batch_size=32,
          validation_data=(X_test, y_test))

In [0]:
test_loss, test_acc = model.evaluate(X_test, y_test)
test_loss, test_acc

In [0]:
model.predict(X_test[0:10])

## Build a classification neural network with TensorFlow

You can also use lower-level APIs to create your model.

In the following subsection we will build the model using the classic tf model creation pipeline.

### Create one-hot-encoding of train and test labels

In [0]:
enc = OneHotEncoder(sparse=False, categories='auto')

In [0]:
y_train = enc.fit_transform(y_train.reshape(-1, 1))
y_test = enc.transform(y_test.reshape(-1, 1))

print(y_train.shape, y_test.shape)
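
For integer labels in the range 0..9, the same encoding can be sketched without scikit-learn by indexing an identity matrix. The `labels` array below is hypothetical and only serves as an example.

```python
import numpy as np

labels = np.array([9, 0, 3, 3])   # hypothetical integer class labels
num_classes = 10

# Row k of the identity matrix is the one-hot vector for class k.
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

print(one_hot.shape)
print(one_hot.argmax(axis=1))  # recovers the original labels
```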

### Define model

**Create placeholders for training data.**

Remember to use the proper shapes for the training images (in X_train every image is a 784-dimensional vector) and labels (the training labels are now one-hot encoded).

In [0]:
x_placeholder = tf.placeholder(tf.float32, [None, 784])
y_placeholder = tf.placeholder(tf.float32, [None, 10])

**Define layers of your new model**

With the lower-level APIs you can still use the tf.keras layers. Please define the following layers of your network:

1.    Dense layer (keras.layers.Dense) with 300 neurons, and the "relu" activation function. You don't need to specify the input_shape argument nor define the keras input layer of the network, as you will pass the input data into placeholders that you have already defined.
2.    Another Dense layer with 100 neurons, also with the "relu" activation function.
3.    A final Dense layer with 10 neurons (one per class), **without any activation function**, so that it returns the logits (not softmax probabilities). Combined with a loss function that operates on logits, this ensures numerical stability.



In [0]:
hidden1 = keras.layers.Dense(300, activation="relu")(x_placeholder)
hidden2 = keras.layers.Dense(100, activation="relu")(hidden1)
logits = keras.layers.Dense(10, activation=None)(hidden2)

In [0]:
print(x_placeholder)
print(y_placeholder)
print()
print(hidden1)
print(hidden2)
print(logits)

**Define the loss function**

This is the place to define the loss function for our model. Cross-entropy is the standard loss for multi-class classification. For numerical stability the last layer returns logits instead of softmax probabilities, so the loss has to apply softmax itself before taking the logarithm. In this case you could use the [tf.nn.softmax_cross_entropy_with_logits_v2](https://www.tensorflow.org/api_docs/python/tf/nn/softmax_cross_entropy_with_logits_v2) loss function, which combines both steps.

In [0]:
cross_entropy = tf.nn.softmax_cross_entropy_with_logits_v2(labels=y_placeholder, logits=logits)
# cross_entropy = tf.losses.softmax_cross_entropy(onehot_labels=y_placeholder, logits=logits)
cross_entropy = tf.reduce_mean(cross_entropy)
print(cross_entropy)
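
The reason for working on logits can be illustrated in NumPy. The sketch below is not TensorFlow's implementation; it is a hand-written version of the usual log-sum-exp trick, and the function name `softmax_cross_entropy` and the example values are assumptions made for illustration.

```python
import numpy as np


def softmax_cross_entropy(labels, logits):
    """Per-sample cross-entropy between one-hot labels and softmax(logits)."""
    # Subtracting the row maximum leaves softmax unchanged but keeps exp() from overflowing.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_softmax = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_softmax).sum(axis=1)


logits = np.array([[1000.0, 0.0, -1000.0],   # extreme values that would overflow a naive exp()
                   [0.0, 0.0, 0.0]])
labels = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

loss = softmax_cross_entropy(labels, logits)
print(loss)         # finite values, no overflow
print(loss.mean())
```

Computing softmax first and taking its logarithm afterwards would produce infinities or NaNs for the first row, which is why the final layer above has no activation.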

**Define the optimizer**

Use the [SGD optimizer](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer) and set it to minimize the loss function.

In [0]:
opt = tf.train.GradientDescentOptimizer(0.1)
train_step = opt.minimize(cross_entropy)

**Check whether the softmax classifier returns correct predictions and calculate the accuracy**

In [0]:
# Create a vector that tells us whether the predictions from our net (logits)
# match the correct class labels (y_placeholder).
correct_prediction = tf.equal(tf.argmax(y_placeholder, 1), tf.argmax(logits, 1))
correct_prediction = tf.cast(correct_prediction, tf.float32)

# Calculate the accuracy as the fraction of correct predictions.
accuracy = tf.reduce_mean(correct_prediction)
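
The same accuracy computation can be followed step by step on a tiny hand-made batch. This NumPy sketch mirrors the argmax, equal, cast and mean steps; the arrays `y_true` and `scores` are hypothetical.

```python
import numpy as np

# One-hot labels for 4 samples and 3 classes.
y_true = np.array([[0, 0, 1],
                   [1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]], dtype=np.float32)

# Made-up logits produced by a network for the same samples.
scores = np.array([[0.1, 0.2, 3.0],
                   [2.0, 0.5, 0.1],
                   [1.5, 0.3, 0.2],
                   [0.0, 0.1, 0.9]], dtype=np.float32)

# 1.0 where the predicted class matches the label, 0.0 otherwise.
correct = (y_true.argmax(axis=1) == scores.argmax(axis=1)).astype(np.float32)
accuracy_np = correct.mean()

print(correct)
print(accuracy_np)
```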

**Train the model**

Note that before training the model you must create a tf.Session and initialize all variables.

**Note**: after the session ends, the weights of the trained model are lost!

In [0]:
epoch_num = 10
batch_size = 32
set_size = X_train.shape[0]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
    for epoch in range(epoch_num):
        perm = np.random.permutation(set_size)
        X_train = X_train[perm, :]
        y_train = y_train[perm, :]

        for i in range(0, set_size, batch_size):
            step_size = min(batch_size, set_size - i)

            if step_size > 1:
                y_batch = y_train[i:(i + step_size), :]
                x_batch = X_train[i:(i + step_size), :]
                train_step.run(feed_dict={
                                x_placeholder: x_batch, y_placeholder: y_batch})
                
        validation_accuracy = accuracy.eval(feed_dict={
                                  x_placeholder: X_test, y_placeholder: y_test})
        print('epoch: {}, validation accuracy: {}'.format(epoch, validation_accuracy))


    # Print the test set accuracy
    print('test accuracy: {}'.format(accuracy.eval(feed_dict={
                           x_placeholder: X_test, y_placeholder: y_test})))
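
The shuffling and batching logic of the loop above can be factored into a small generator. This is one possible sketch, not the notebook's own code: `iterate_minibatches`, `X_demo` and `y_demo` are hypothetical names, and unlike the loop above it leaves the input arrays untouched and does not skip a final batch of size 1.

```python
import numpy as np


def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that together cover the data once."""
    perm = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]   # the last batch may be smaller
        yield X[idx], y[idx]


rng = np.random.default_rng(42)
X_demo = np.arange(20, dtype=np.float32).reshape(10, 2)   # 10 samples, 2 features
y_demo = np.arange(10)

batches = list(iterate_minibatches(X_demo, y_demo, 4, rng))
batch_sizes = [len(x_batch) for x_batch, _ in batches]
print(batch_sizes)  # [4, 4, 2]
```

Indexing both arrays with the same permutation slice keeps every sample paired with its own label.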

In [0]:
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    
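    # The variables were re-initialized in this new session, so the trained weights are gone:
    # the test set accuracy printed here is that of an untrained network.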
    print('test accuracy: {}'.format(accuracy.eval(feed_dict={
                           x_placeholder: X_test, y_placeholder: y_test})))

## You could also specify your own custom tf.keras layers

You could create a custom layer by subclassing tf.keras.layers.Layer and implementing the following methods:

*    build: Create the weights of the layer. Add weights with the add_weight method.
*    call: Define the forward pass.
*    compute_output_shape: Specify how to compute the output shape of the layer given the input shape.
*    Optionally, a layer can be serialized by implementing the get_config method and the from_config class method.


\\

Now let's create a custom layer with its own weights. Use the following template to create a MyDense layer that computes $\phi(\mathbf{X} \mathbf{W} + \mathbf{b})$, where $\phi$ is the (optional) activation function, $\mathbf{X}$ is the input data, $\mathbf{W}$ represents the kernel (i.e., connection weights), and $\mathbf{b}$ represents the biases. Then train and evaluate a model using this layer instead of a regular Dense layer.

In [0]:
class MyDense(keras.layers.Layer):
    def __init__(self, units, activation=None, **kwargs):
        # Call the base constructor first so Keras can track attributes set afterwards.
        super(MyDense, self).__init__(**kwargs)
        self.units = units
        self.activation = keras.layers.Activation(activation)

    def build(self, input_shape):
        shape = tf.TensorShape((input_shape[1], self.units))
        self.kernel = self.add_weight(name='kernel', 
                                      shape=shape,
                                      initializer='uniform',
                                      trainable=True)
        self.biases = self.add_weight(name='bias', 
                                      shape=(self.units,),
                                      initializer='zeros',
                                      trainable=True)
        super(MyDense, self).build(input_shape)

    def call(self, X):
        return self.activation(tf.matmul(X, self.kernel) + self.biases)    
    
    def compute_output_shape(self, input_shape):
        shape = tf.TensorShape(input_shape).as_list()
        shape[-1] = self.units
        return tf.TensorShape(shape)
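
The forward pass defined in call() is just a matrix product, a bias addition and an activation. A NumPy sketch with small hand-picked numbers makes the shapes explicit. The functions `relu` and `dense_forward` and all values below are illustrative and not part of Keras.

```python
import numpy as np


def relu(z):
    """Element-wise max(z, 0)."""
    return np.maximum(z, 0.0)


def dense_forward(X, W, b, activation=None):
    """Compute activation(X @ W + b); with activation=None the layer is linear."""
    z = X @ W + b            # b is broadcast across the batch dimension
    return z if activation is None else activation(z)


X = np.array([[1.0, -2.0], [0.5, 0.5]])             # batch of 2 samples, 2 features
W = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 1.0]])   # kernel: 2 inputs -> 3 units
b = np.array([0.0, 1.0, 0.0])                       # one bias per unit

out = dense_forward(X, W, b, activation=relu)
print(out.shape)
print(out)
```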


**Now please use the code you have written earlier to train the model with your custom layer instead of keras.layers.Dense**

You could use the Keras Sequential API, the Keras functional API or the TensorFlow API.

In [0]:
model = keras.models.Sequential([
    MyDense(300, activation="relu", input_shape=X_train.shape[1:]),
    MyDense(100, activation="relu"),  # input_shape is only needed on the first layer
    MyDense(10, activation="softmax")
])

# y_train is one-hot encoded by now, so use categorical (not sparse) cross-entropy.
model.compile(loss="categorical_crossentropy", optimizer="sgd")
model.fit(X_train, y_train, epochs=10,
          validation_data=(X_test, y_test))