# Introduction to TensorFlow

A __tensor__ is a multidimensional array, a generalization of vectors and matrices to an arbitrary number of axes, that stores data and is combined with other tensors through transformations. In the TensorFlow framework, a tensor is like a Lego piece: we use these pieces to build neural networks.
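
As a rough illustration of the "generalized matrix" idea, the sketch below uses NumPy arrays as a stand-in for TensorFlow tensors and prints the rank (number of axes) and shape of a few of them. The variable names and the example shapes are illustrative assumptions, not part of the notebook.

```python
import numpy as np

# Tensors of increasing rank (number of axes), using NumPy arrays as a stand-in
scalar = np.array(3.0)               # rank 0: a single number
vector = np.array([1.0, 2.0, 3.0])   # rank 1: e.g. one data point with 3 features
matrix = np.ones((2, 3))             # rank 2: e.g. 2 data points with 3 features each
images = np.zeros((4, 28, 28, 1))    # rank 4: batch, height, width, channels

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("images", images)]:
    print(f"{name}: rank={t.ndim}, shape={t.shape}")
```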

In TensorFlow 1.0, the operations on the data were represented as a _static_ computation __graph__ that had to be defined before it was run. A similar framework, PyTorch 1.0, has a __dynamic__ computation graph that can be updated at run time and thus provides more flexibility. Following the success of PyTorch, TensorFlow 2.0 followed suit with a more user-friendly framework that executes operations eagerly by default. This module covers TensorFlow 2.0.

---

Use a separate Anaconda environment for TensorFlow:  
`conda create -n tf tensorflow jupyter matplotlib pandas scikit-learn`  
`conda activate tf`  

Run the notebook under the virtual environment `tf`.

---

TensorFlow API: https://www.tensorflow.org/api_docs/python/tf

## TensorFlow Architecture
In general, popular and fast Python libraries are implemented in C++ for speed. The GPU build of TensorFlow uses Nvidia CUDA; it requires the CUDA SDK and the `cuDNN` library to be installed on a computer with CUDA-capable GPU hardware.

The low-level Python API wraps the C++ core and makes it possible to perform basic operations such as matrix multiplication and convolution.

The top-level API is made of two components, Keras and the Estimator API. Keras is a user-friendly, modular, and extensible wrapper for TensorFlow. The Estimator API contains several components that allow building ML models easily. As in any other ML methodology, in deep learning a model usually refers to a neural network that was trained on data. Thus, a model is composed of a neural network architecture, its weights, and the hyperparameters of the model.

## Convolutional Neural Networks
We have seen that multilayer perceptrons are generally composed of __fully-connected__ layers, that is, each neuron in one layer is connected to all neurons in the next layer; the perceptron $y=w^{\top} x+b$ is the simplest case. From one perspective, the _fully-connectedness_ of these networks makes them prone to overfitting the data. Convolutional Neural Networks (CNN) address this and can be considered regularized versions of multilayer perceptrons. In TensorFlow, the class `Dense` encapsulates a regular fully-connected layer.
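
One way to see why full connectivity invites overfitting is to count trainable weights. The sketch below compares a fully-connected layer with a convolutional layer on a hypothetical 28 x 28 single-channel image. The helper names `dense_params` and `conv2d_params` and the layer sizes are assumptions chosen for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully-connected layer."""
    return n_inputs * n_units + n_units

def conv2d_params(k_h, k_w, in_channels, filters):
    """Weights plus biases of a 2D convolutional layer (weights shared across positions)."""
    return k_h * k_w * in_channels * filters + filters

# Hypothetical 28x28x1 input, 16 output units / 16 filters
dense = dense_params(28 * 28 * 1, 16)
conv = conv2d_params(5, 5, 1, 16)
print(f"Dense(16) parameters:        {dense}")
print(f"Conv2D(16, 5x5) parameters:  {conv}")
```

Because the same small filter is reused at every image position, the convolutional layer has far fewer free parameters than the dense layer over the same input.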

Furthermore, CNNs take regularization a step further by taking advantage of the hierarchical pattern in data: they assemble more complex patterns from smaller and simpler patterns, like Lego pieces. The connectivity pattern of neurons in a CNN resembles the organization of the animal visual cortex, where each cluster of neurons responds to stimuli only in a restricted region, its receptive field. These fields overlap to cover the complete visual field.

CNNs require relatively little pre-processing compared to other image classification methods because they learn the features used for classification, which would otherwise have to be derived or engineered from the data by hand. Not needing manual feature engineering is considered a major advantage.

As a final note, RNNs track the temporal patterns and CNNs track the spatial patterns.

---

## A Perceptron in TensorFlow

Consider a perceptron and its representation in the TensorFlow framework.

$y=w^{\top} x+b$ where $y \in \mathbb{R}$ is the Perceptron output, $w \in \mathbb{R}^{M}$ are weights, $x \in \mathbb{R}^{M}$ is the input (a single data point with all $M$ features), and $b \in \mathbb{R}$ is the offset of the hyperplane defined by $w$. Then, the output can be passed to an activation function.
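
The equation above can be sketched in a few lines of NumPy. This is only an illustration of the arithmetic, not how TensorFlow implements it; the function name `perceptron`, the sigmoid activation, and the numeric values are assumptions chosen for the example.

```python
import numpy as np

def perceptron(x, w, b):
    """Compute sigmoid(w^T x + b) for a single data point x."""
    z = np.dot(w, x) + b             # the linear part y = w^T x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid activation (assumed for this sketch)

# Illustrative values with M = 3 features
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 2.0, 0.5])
b = 0.5

y = perceptron(x, w, b)
print(f"y = {y:.3f}")
```

Here the linear part evaluates to 0.5 - 2.0 + 1.0 + 0.5 = 0, so the sigmoid output sits exactly on the decision boundary at 0.5.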

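The code below demonstrates three models (the Linear SVC is a scikit-learn reference classifier; the other two are built with TensorFlow 2.0):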
* a Linear SVC
* a Perceptron
* a 1-hidden layer neural network

In the TensorFlow framework, similar to PyTorch, most of the mechanics of forward propagation, back-propagation of the error, and optimization steps are hidden from the user. In fact, defining a neural network with Keras is arguably even simpler than in PyTorch.

The output of the perceptron is mapped to a single binary value (not one-hot encoded), so we have to use `tf.keras.losses.BinaryCrossentropy()` and `tf.keras.metrics.BinaryAccuracy()` for loss and accuracy.

If the network has a one-hot-encoded output instead, we can convert the `y` vectors accordingly: `y_bin_tr, y_bin_ts = tf.keras.utils.to_categorical(y_tr), tf.keras.utils.to_categorical(y_ts)`.
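
To make the conversion concrete, here is a minimal NumPy sketch of what one-hot encoding does to a label vector. The helper `one_hot` is a hypothetical stand-in written for this illustration; it is not the implementation of `tf.keras.utils.to_categorical`.

```python
import numpy as np

def one_hot(y, num_classes):
    """Map integer labels to rows of the identity matrix (hypothetical helper)."""
    y = np.asarray(y, dtype=int)
    return np.eye(num_classes, dtype=np.float32)[y]

y = np.array([0, 1, 1, 0])          # binary labels, as in the cancer dataset
y_bin = one_hot(y, num_classes=2)   # shape (4, 2), one column per class
print(y_bin)

# argmax along the class axis recovers the original labels
recovered = y_bin.argmax(axis=1)
```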

__Question:__ Why does Linear SVC warn about non-convergence for a lower value of `tol`? Why does it perform worse than expected?

---

## Loss Functions
The following table helps in picking the correct loss function for a given output layer (source: Textbook Raschka, 2019). Probability outputs correspond to `from_logits=False` and logit outputs to `from_logits=True`.

| Loss Function | Classification | Example Probability Output | Example Logit Output |
|---------------|----------------|----------------------------|----------------------|
| `BinaryCrossentropy` | binary | `y_true= [1]` `y_pred= [0.69]` | `y_true= [1]` `y_pred= [0.8]` |
| `CategoricalCrossentropy` | multi-class | `y_true= [0,0,1]` `y_pred= [0.30,0.15,0.55]` | `y_true= [0,0,1]` `y_pred= [1.5,0.8,2.1]` |
| `SparseCategoricalCrossentropy` | multi-class | `y_true= [2]` `y_pred= [0.30,0.15,0.55]` | `y_true= [2]` `y_pred= [1.5,0.8,2.1]` |
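
The two example columns in the table describe the same predictions in different forms: applying a sigmoid or softmax to the logits gives (approximately) the probabilities. The NumPy sketch below checks this and computes the cross-entropy values by hand. It illustrates the arithmetic only and is not the Keras implementation; the helper names are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

# Binary case: logit 0.8 corresponds to probability ~0.69
p_bin = sigmoid(0.8)
bce = -np.log(p_bin)            # binary cross-entropy for y_true = 1

# Multi-class case: logits [1.5, 0.8, 2.1] correspond to ~[0.30, 0.15, 0.55]
logits = np.array([1.5, 0.8, 2.1])
probs = softmax(logits)
cce = -np.sum(np.array([0, 0, 1]) * np.log(probs))  # one-hot label [0, 0, 1]
scce = -np.log(probs[2])                            # integer label 2

print(f"sigmoid(0.8)= {p_bin:.2f}, softmax(logits)= {np.round(probs, 2)}")
print(f"BCE= {bce:.3f}, categorical CE= {cce:.3f}, sparse categorical CE= {scce:.3f}")
```

The categorical and sparse categorical variants give the same loss value; they differ only in how the true label is encoded.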


In [17]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import datetime
import os
import struct

import tensorflow as tf

print(f'TensorFlow version= {tf.__version__}')
print(f"CUDA available= {tf.test.gpu_device_name()}")

# Check CUDA TensorFlow
tf.test.is_built_with_cuda()

TensorFlow version= 2.3.0
CUDA available= 


False

In [18]:
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Locate and load the cancer data file
bc = load_breast_cancer()
bc_df = pd.DataFrame(data= np.c_[bc.data, [bc.target_names[v] for v in bc.target]],
                     columns= list(bc.feature_names)+['cancer'])

# Populate the dataset, cancer column is target variable
X = bc_df.loc[:, bc_df.columns != 'cancer'].astype(float).values  # np.float was removed in NumPy 1.24
y = bc_df.loc[:, bc_df.columns == 'cancer'].replace({'benign':0, 'malignant':1}).values.ravel()

# Sanity
(N, M), K = X.shape, len(np.unique(y))
print(f'#data points N= {N}, #features M= {M}, #classes K= {K}')

X_tr1, X_ts1, y_tr1, y_ts1 = train_test_split(X, y, stratify=y, test_size=0.5, random_state=0)

# Build a reference classifier model
clf = LinearSVC(random_state=0, tol=4).fit(X_tr1, y_tr1)
print(f'Linear SVC accuracy={accuracy_score(y_ts1, clf.predict(X_ts1)):.2f}')

#data points N= 569, #features M= 30, #classes K= 2
Linear SVC accuracy=0.92


__Important:__ Always clear the Keras session to make sure the new model starts cleanly.

In [19]:
# Clear the session
tf.keras.backend.clear_session()

In [20]:
# Define the perceptron model
nn1 = tf.keras.Sequential()
nn1.add(tf.keras.layers.Dense(1, input_shape=(M,), activation='sigmoid'))

# Sanity
print(nn1.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 1)                 31        
Total params: 31
Trainable params: 31
Non-trainable params: 0
_________________________________________________________________
None


In [21]:
# Build the model
nn1.compile(
    optimizer='SGD',
    loss=tf.keras.losses.BinaryCrossentropy(),
    metrics=[tf.keras.metrics.BinaryAccuracy()])

In [22]:
# Train the model
nn1.fit(X_tr1, y_tr1, epochs=200, batch_size=64, verbose=0)
loss, acc = nn1.evaluate(X_ts1, y_ts1, verbose=0)

# Print the loss and accuracy
print(f'Loss= {loss:.3f}, Testing accuracy= {acc:.3f}')

Loss= 46.795, Testing accuracy= 0.891


In [23]:
# Sanity
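# Threshold the sigmoid output at 0.5 (Sequential.predict_classes was removed in TensorFlow 2.6)
y_pred = (nn1.predict(X_ts1) > 0.5).astype(int).ravel()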
print(' '.join([f"{int(_):d}" for _ in y_pred[:30]]))

1 1 0 0 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0


__Model:__ 1-hidden layer neural network.

In [24]:
# Clear the session
tf.keras.backend.clear_session()

nn2 = tf.keras.Sequential()
nn2.add(tf.keras.layers.Dense(10, input_shape=(M,), activation='sigmoid'))
nn2.add(tf.keras.layers.Dense(K, input_shape=(10,), activation='softmax'))

# Sanity
print(nn2.summary())

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 10)                310       
_________________________________________________________________
dense_1 (Dense)              (None, 2)                 22        
Total params: 332
Trainable params: 332
Non-trainable params: 0
_________________________________________________________________
None


In [25]:
# Build the model
nn2.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
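    # NOTE: the output layer already applies softmax, so from_logits=False would be the
    # consistent setting; the results shown below were obtained with from_logits=True
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),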
    metrics=['accuracy'])

In [26]:
# Convert the vector y to one-hot encoded
y_bin_tr, y_bin_ts = tf.keras.utils.to_categorical(y_tr1), tf.keras.utils.to_categorical(y_ts1)

# Train the model
nn2.fit(X_tr1, y_bin_tr, epochs=200, batch_size=64, verbose=0)
loss, acc = nn2.evaluate(X_ts1, y_bin_ts, verbose=0)

# Print the loss and accuracy
print(f'Loss= {loss:.3f}, Testing accuracy= {acc:.3f}')

Loss= 0.513, Testing accuracy= 0.909


In [27]:
# Sanity
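# Pick the class with the highest softmax output (Sequential.predict_classes was removed in TensorFlow 2.6)
y_pred = np.argmax(nn2.predict(X_ts1), axis=1)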
print(' '.join([f"{int(_):d}" for _ in y_pred[:30]]))

0 1 0 0 0 0 1 0 0 1 1 0 1 0 1 1 1 0 0 1 0 0 0 1 0 0 1 0 1 0


__Exercise:__ Run the previous four cells several times and observe that, from time to time, the predictions turn out to be constant and the test accuracy drops to a low value. Change the `learning_rate` to higher and lower values to see the effect.

---

## CNN Design
Consider the `Conv2D` layer:

tf.keras.layers.Conv2D(  
  filters, kernel_size, strides=(1, 1), padding='valid', data_format=None,  
  dilation_rate=(1, 1), activation=None, use_bias=True,  
  kernel_initializer='glorot_uniform', bias_initializer='zeros',  
  kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None,  
  kernel_constraint=None, bias_constraint=None, **kwargs)
  
### Filters 
Consider a convolutional filter with `kernel_size=(k_H, k_W)`, i.e. of size $k_{H} \times k_{W}$, in a layer with $K$ such filters (`filters=K`). Each filter maps a 2-dimensional patch of the input to a single output neuron. If the image has more than one channel, such as R, G, B, the filter also spans all $D$ channels.

$z_{i, j}=\phi\left(b+\sum_{l=0}^{k_{H}-1} \sum_{m=0}^{k_{W}-1} \sum_{n=0}^{D-1} w_{l, m, n} \, x_{i+l, j+m, n}\right)$, where $\phi$ is the activation function, $b$ is the bias, and $w \in \mathbb{R}^{k_{H} \times k_{W} \times D}$ is the weight of the neuron.
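
The triple sum can be written out directly with NumPy for a single filter. The sketch below assumes unit strides, no padding, and ReLU as the activation $\phi$; the function name `conv2d_single_filter` is hypothetical, and the explicit loops are for readability rather than speed.

```python
import numpy as np

def conv2d_single_filter(x, w, b=0.0, phi=lambda z: np.maximum(z, 0.0)):
    """Apply one filter w of shape (kH, kW, D) to an input x of shape (H, W, D).

    Uses stride 1 and no padding, so the output has shape (H-kH+1, W-kW+1).
    """
    kH, kW, _ = w.shape
    H, W, _ = x.shape
    z = np.empty((H - kH + 1, W - kW + 1))
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            # Sum over l, m, n of w[l, m, n] * x[i+l, j+m, n], plus the bias
            z[i, j] = b + np.sum(w * x[i:i + kH, j:j + kW, :])
    return phi(z)

x = np.arange(16, dtype=float).reshape(4, 4, 1)  # 4x4 input with D = 1 channel
w = np.ones((3, 3, 1))                           # 3x3 filter of ones: sums each patch
z = conv2d_single_filter(x, w)
print(z)
```

A layer with `filters=K` repeats this computation with $K$ different weight tensors and stacks the results along the last axis.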

### Strides
The `strides` parameter is a 2-tuple of integers specifying the step of the convolution along the height and width of the input volume in _pixels_. It is generally left at the default `(1, 1)` and occasionally increased to `(2, 2)` to reduce the size of the output volume.

### Padding
Padding is necessary to map an input of size $H \times W$ to an output of the same size $H \times W$ (`padding='same'` with unit strides). This requires $\frac{1}{2}(k-1)$ zeros on each of the left, right, top and bottom boundaries of the input, where $k$ is the `kernel_size` along that axis. Consequently, `kernel_size` should be an odd number for the padding to be symmetric.
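
The relation between input size, kernel size, stride, and padding can be checked with the standard output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$. The sketch below assumes symmetric zero padding of $p$ pixels on each side; the helper names are hypothetical.

```python
def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Output length along one axis for input length n and symmetric zero padding."""
    return (n + 2 * padding - kernel_size) // stride + 1

def same_padding(kernel_size):
    """Zeros needed on each side to preserve the size at stride 1 (odd kernel sizes)."""
    return (kernel_size - 1) // 2

n, k = 28, 5
p = same_padding(k)
print(f"padding per side= {p}")
print(f"no padding:        {n} -> {conv_output_size(n, k)}")
print(f"with padding:      {n} -> {conv_output_size(n, k, padding=p)}")
print(f"padding, stride 2: {n} -> {conv_output_size(n, k, stride=2, padding=p)}")
```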

### Subsampling Layers
A pooling ($P$) or subsampling layer after a `Conv2D` layer reduces the spatial size of the representation, which reduces the number of parameters (i.e. weights) in the layers that follow and the amount of computation in the network. A max-pooling layer returns only the maximum value of each pooled area, and an average-pooling layer returns the average of each pooled area. A pooling layer operates on each feature map (depth slice) independently; $P_{2 \times 2}$ pooling, for example, halves each image dimension.
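
A $P_{2 \times 2}$ pooling step can be sketched in NumPy by grouping the input into non-overlapping 2 x 2 blocks and reducing each block. The helper `pool2x2` is hypothetical and assumes a pool size and stride of 2; it is meant to show the operation, not to mirror the Keras layer's implementation.

```python
import numpy as np

def pool2x2(x, reduce=np.max):
    """Pool an (H, W, D) array over non-overlapping 2x2 blocks, per feature map."""
    H, W, D = x.shape
    # Drop a trailing odd row/column, then expose each 2x2 block on its own axes
    blocks = x[:H - H % 2, :W - W % 2, :].reshape(H // 2, 2, W // 2, 2, D)
    return reduce(blocks, axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)  # one 4x4 feature map
max_pooled = pool2x2(x, np.max)    # keeps the largest value of each block
avg_pooled = pool2x2(x, np.mean)   # keeps the average of each block
print(max_pooled[:, :, 0])
print(avg_pooled[:, :, 0])
```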

---

## A CNN Example on MNIST Dataset
The following example uses the MNIST database: http://yann.lecun.com/exdb/mnist/

Build a deep learning network with:
* $5 \times 5$ kernel size
* 2 convolutional layers with 16 and 32 filters, each followed by $P_{2 \times 2}$ pooling
* ReLU activation on the convolutional layers
* Dense layer after the second pooling with ReLU activation
* Final dense layer with as many units as classes (i.e. 10) and Softmax activation
* Input data loaded as 28 $\times$ 28 $\times$ 1 matrices
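
Before building the network, it can help to trace how the feature-map size changes through the layers listed above. The sketch below assumes unpadded convolutions with unit strides (the Keras default `padding='valid'`) and non-overlapping 2 x 2 pooling; the helper names and the layer labels `conv_1`, `pool_1`, and so on are used here for illustration.

```python
def valid_conv(n, k):
    """Output size of an unpadded, stride-1 convolution along one axis."""
    return n - k + 1

def pool(n, p=2):
    """Output size of non-overlapping pooling along one axis."""
    return n // p

size, channels = 28, 1
trace = [("input", size, channels)]
for name, k, filters in [("conv_1", 5, 16), ("conv_2", 5, 32)]:
    size, channels = valid_conv(size, k), filters
    trace.append((name, size, channels))
    size = pool(size)
    trace.append((name.replace("conv", "pool"), size, channels))

flat = size * size * channels  # number of inputs to the first dense layer
for name, s, c in trace:
    print(f"{name:7s} {s}x{s}x{c}")
print(f"flatten {flat}")
```

Under these assumptions the first dense layer receives a flattened vector of 4 x 4 x 32 = 512 values.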

In [28]:
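def load_mnist(path, kind='train'):
    """Load MNIST images and labels from the IDX files under `path`."""
    labels_path = os.path.join(path, f'{kind}-labels-idx1-ubyte')
    images_path = os.path.join(path, f'{kind}-images-idx3-ubyte')
    # Label file: 8-byte header (magic number, item count), then one byte per label
    with open(labels_path, 'rb') as lbpath:
        magic, n = struct.unpack('>II', lbpath.read(8))
        labels = np.fromfile(lbpath, dtype=np.uint8)
    # Image file: 16-byte header (magic number, count, rows, cols), then the pixels
    with open(images_path, 'rb') as imgpath:
        magic, num, rows, cols = struct.unpack('>IIII', imgpath.read(16))
        images = np.fromfile(imgpath, dtype=np.uint8).reshape(len(labels), rows, cols, 1)
    # Scale pixel values from [0, 255] to [-1, 1]
    images = ((images / 255.) - .5) * 2
    return images, labels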

X_tr, y_tr = load_mnist('datasets/', kind='train')
print(f'N= {X_tr.shape[0]}, HxW= {X_tr.shape[1]}x{X_tr.shape[2]}')

X_ts, y_ts = load_mnist('datasets/', kind='t10k')
print(f'N= {X_ts.shape[0]}, HxW= {X_ts.shape[1]}x{X_ts.shape[2]}')

N= 60000, HxW= 28x28
N= 10000, HxW= 28x28


In [29]:
# Clear session
tf.keras.backend.clear_session()

In [30]:
# Our full CNN neural network
cnn1 = tf.keras.Sequential()

cnn1.add(tf.keras.layers.Conv2D(filters=16, kernel_size=(5, 5),
    data_format='channels_last',
    name='conv_1', activation='relu'))

cnn1.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), name='pool_1'))

cnn1.add(tf.keras.layers.Conv2D(filters=32, kernel_size=(5, 5),
    name='conv_2', activation='relu'))

cnn1.add(tf.keras.layers.MaxPool2D(pool_size=(2, 2), name='pool_2'))

cnn1.add(tf.keras.layers.Flatten())

cnn1.add(tf.keras.layers.Dense(units=1024, name='fc_1', activation='relu'))

cnn1.add(tf.keras.layers.Dense(units=10, name='fc_2', activation='softmax'))

In [31]:
# Set a seed for repeatability
tf.random.set_seed(0)

# Build the model
cnn1.build(input_shape=(None, 28, 28, 1))

# Compile the model with the optimizer, loss function and metric
cnn1.compile(
    optimizer=tf.keras.optimizers.Adam(),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=['accuracy'])

NUM_EPOCHS = 7

In [32]:
# Save weights for debugging purposes and saving the model
cnn1.save_weights('cnn1_weights.h5')

In [33]:
%%time
history = cnn1.fit(X_tr, y_tr,
        epochs=NUM_EPOCHS,
        shuffle=True)

Epoch 1/7
Epoch 2/7
Epoch 3/7
Epoch 4/7
Epoch 5/7
Epoch 6/7
Epoch 7/7
CPU times: user 9min 11s, sys: 3min 42s, total: 12min 53s
Wall time: 3min 1s


In [34]:
# Testing dataset
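# Pick the class with the highest softmax output (Sequential.predict_classes was removed in TensorFlow 2.6)
y_pred = np.argmax(cnn1.predict(X_ts), axis=1)
print(f'Accuracy= {np.mean(y_pred == y_ts):.3f}')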

Accuracy= 0.990


--- 

## TensorBoard Introduction
TensorBoard provides visualization and tooling for machine learning experimentation and development:
* Tracking and visualizing metrics such as loss and accuracy
* Visualizing the model graph (operations and layers)
* Viewing histograms of weights, biases, or other tensors as they change over time
* Projecting embeddings to a lower dimensional space
* Displaying images, text, and audio data
* Profiling TensorFlow programs

More about TensorBoard: https://www.tensorflow.org/tensorboard

In [35]:
# Build a timestamped, platform-independent log directory under logs/fit
log_dir = os.path.join('logs', 'fit', datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))

tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)

# Clear the session
tf.keras.backend.clear_session()

# Our previous neural network
nn2 = tf.keras.Sequential()
nn2.add(tf.keras.layers.Dense(10, input_shape=(M,), activation='sigmoid', name='fc_1'))
nn2.add(tf.keras.layers.Dense(K, input_shape=(10,), activation='softmax', name='fc_2'))

nn2.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
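    # NOTE: the output layer already applies softmax, so from_logits=False would be the
    # consistent setting; the results shown below were obtained with from_logits=True
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),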
    metrics=['accuracy'])

history = nn2.fit(X_tr1, y_bin_tr, epochs=200, batch_size=64, verbose=0,
                  validation_data=(X_tr1, y_bin_tr),
                  callbacks=[tensorboard_callback])
loss, acc = nn2.evaluate(X_ts1, y_bin_ts, verbose=0)

# Sanity
print(f'Loss= {loss:.3f}, Testing accuracy= {acc:.3f}')

Loss= 0.662, Testing accuracy= 0.628


---

__Important:__ In a new Anaconda Prompt, activate the `tf` virtual environment, go to the notebook folder, and run `tensorboard --logdir logs/fit`.

---

In [36]:
# Load the TensorBoard notebook extension, then start TensorBoard using magics
%load_ext tensorboard
%tensorboard --logdir logs/fit


## CNN TensorBoard Demonstration
Run TensorBoard from the command line and open it in a browser on localhost (port 6006 by default).

![title](img/tensorboard_demo.png)

In [None]:
from tensorboard import notebook

# This info is stored under C:\Users\guvene1\AppData\Local\Temp\.tensorboard-info
notebook.list()

__Exercise:__ Run the CNN above with a proper validation dataset pulled from the training portion of the MNIST dataset. Create the `fit` history and display it on TensorBoard.

---

In [None]:
%%html
<style>
    table {margin-left: 0 !important;}
</style>

---