### Running shell commands from inside the notebook
To forward a command to the computer's underlying shell, prefix it with `!`.

In [None]:
#Windows, Mac OS and Linux, Google Colab
! echo Hello world
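
Outside a notebook, a similar effect can be approximated with the standard `subprocess` module. The sketch below is only an analogy to `!`, not a description of how IPython implements it. It calls the Python interpreter instead of `echo` so that it behaves the same on Windows, Mac OS and Linux.

```python
import subprocess
import sys

# Run a child process and capture what it prints, similar in spirit to `! echo ...`.
# sys.executable is the interpreter that runs this code, so the command is portable.
result = subprocess.run(
    [sys.executable, "-c", "print('Hello world')"],
    capture_output=True,  # collect stdout/stderr instead of printing them directly
    text=True,            # decode the output to str
    check=True,           # raise an error if the command fails
)
output = result.stdout.strip()
print(output)
```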

If you are missing some packages, you can install them into the virtual environment you created without leaving the notebook.  
Follow these steps:  
1. Activate (source) the environment:
    * Windows: `C:\Users\<USER>\ml_env\Scripts\activate`
    * Mac/Linux: `. /home/<USER>/workspaces/ml_env/bin/activate`
2. Run the desired command, e.g. `python -m pip install <PACKAGE_NAME>`
    
Let's try to upgrade pip - the Python package manager. Uncomment the respective line depending on your host system and run the command below.

In [None]:
# Windows
#! C:\Users\<USER>\ml_env\Scripts\activate && python -m pip install --upgrade pip
# Mac OS and Linux
#! . /home/<USER>/workspaces/ml_env/bin/activate && python -m pip install --upgrade pip
# Google Colab
#! python -m pip install --upgrade pip

# TensorFlow (TF 2.x) Tutorial 

In [None]:
from IPython.display import clear_output, Image, HTML, display

Image(url= "https://i.imgur.com/4nk5b4c.jpg", width=700) # from Google

Google's TensorFlow is a framework for solving machine learning problems efficiently. Machine learning is now used in almost all areas of life and work, with well-known applications in computer vision, speech recognition, language translation, healthcare, and many more.

Production-oriented and capable of handling different computational architectures (CPUs, GPUs, and now TPUs), TensorFlow is a framework for any kind of computation that requires high performance and easy distribution. It excels at deep learning, making it possible to create everything from shallow networks (neural networks made of a few layers) to complex deep networks for image recognition and natural language processing.

In this semester we will use TensorFlow to implement different machine learning algorithms:

* **Feed Forward Neural Networks (FFNNs)**: classification and regression based on features.
* **Convolutional Neural Networks (CNNs)**: image classification, object detection, video action recognition, etc.
* **Generative Adversarial Networks (GANs)**: unsupervised generation of realistic images, etc.
* **Deep Reinforcement Learning**: game playing, robotics in simulation, self-play, neural architecture search, etc.

## Installing TensorFlow
To install TensorFlow on your local machine, you can use pip, the Python package manager.
```console
!python3 -m pip install tensorflow
```

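If you have an NVIDIA GPU, you can install `tensorflow-gpu` instead. Note that since TensorFlow 2.1 the regular `tensorflow` package also includes GPU support on Linux and Windows, and the separate `tensorflow-gpu` package has since been deprecated.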
```console
!python3 -m pip install tensorflow-gpu
```

## Importing TensorFlow
The first step is to import TensorFlow.

In [None]:
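import tensorflow as tf
# Parse the major version number instead of reading only the first character,
# so that multi-digit major versions are handled correctly as well.
major_version = int(tf.__version__.split('.')[0])
if major_version < 2:
    print('tensorflow {} detected; Please install tensorflow >= 2.0.0'.format(tf.__version__))
else:
    print('tensorflow {} detected'.format(tf.__version__))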

The next step is to import all required packages for this notebook.

In [None]:
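# Note: the `shutil` module imported below is part of the Python standard library,
# so this extra package is optional.
%pip install pytest-shutil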

In [None]:
import numpy as np
import os
import pathlib
import shutil
import tempfile
from matplotlib import pyplot as plt

In [None]:
# Install some functions from tensorflow_docs - requires GIT to be installed
# Windows
#! C:\Users\<USER>\ml_env\Scripts\activate && python -m pip install git+https://github.com/tensorflow/docs
# Mac OS and Linux
#! . /home/<USER>/workspaces/ml_env/bin/activate && python -m pip install git+https://github.com/tensorflow/docs
# Google Colab
#! python -m pip install git+https://github.com/tensorflow/docs

In [None]:
import tensorflow_docs as tfdocs
import tensorflow_docs.modeling


## Tensors
TensorFlow has its own data structure, the tensor, designed for performance and ease of use. You can think of a TensorFlow tensor as an n-dimensional array or list.

**Example**: The shape of a tensor can be described with a vector $[ 10000, 256, 256, 3 ]$
* 10,000 images
* Each image has 256 rows
* Each row has 256 pixels
* Each pixel has 3 channels (RGB)

Each tensor has a data type and a shape:  
* **Data Types Include**: float32, int32, string and others.
* **Shape**: Represents the dimension of data.
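
To get a feeling for what such a shape implies, the following sketch computes the rank, the number of elements, and the approximate memory footprint of the example tensor above for a few data types. It uses only the standard library; the byte sizes per element are the usual ones for these types.

```python
import math

shape = (10000, 256, 256, 3)  # images, rows, pixels per row, channels
bytes_per_element = {"float32": 4, "int32": 4, "uint8": 1}

rank = len(shape)                # number of dimensions
num_elements = math.prod(shape)  # total number of stored values

# Approximate size in gigabytes (1 GB = 1e9 bytes) for each data type
size_gb = {dtype: num_elements * nbytes / 1e9
           for dtype, nbytes in bytes_per_element.items()}

print("rank:", rank)
print("elements:", num_elements)
for dtype, gb in size_gb.items():
    print(f"{dtype}: {gb:.2f} GB")
```

Stored as `float32`, this single tensor needs close to 8 GB, which is one reason why large datasets are usually processed in batches.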

Just like vectors and matrices, tensors support operations such as addition, subtraction, dot product, and cross product.
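
A minimal sketch of these operations, using small hand-picked tensors (the values are chosen only for illustration):

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0, 1.0], [1.0, 1.0]])

elementwise_sum = tf.add(a, b)        # same as a + b
elementwise_diff = tf.subtract(a, b)  # same as a - b
matrix_product = tf.matmul(a, b)      # same as a @ b

u = tf.constant([1.0, 0.0, 0.0])
v = tf.constant([0.0, 1.0, 0.0])

dot_product = tf.tensordot(u, v, axes=1)  # scalar product of two vectors
cross_product = tf.linalg.cross(u, v)     # vector perpendicular to u and v

print(elementwise_sum.numpy())
print(matrix_product.numpy())
print(dot_product.numpy(), cross_product.numpy())
```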

In [None]:
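# Creating tensors (here as variables). Pass the data type via the `dtype` keyword:
# the second positional argument of tf.Variable is `trainable`, not the data type.
tensor1 = tf.Variable(300, dtype=tf.int16)
tensor2 = tf.Variable(2.123, dtype=tf.float64)
tensor3 = tf.Variable("Hello World", dtype=tf.string)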

### Rank and Shape of Tensors
The rank of a tensor is its number of dimensions.

In [None]:
rank_0_tensor = tf.constant(4)
print(rank_0_tensor)

In [None]:
rank_1_tensor = tf.constant([2.0, 3.0, 4.0])
print(rank_1_tensor)

In [None]:
rank_2_tensor = tf.constant([[2, 2],
                             [3, 3],
                             [4, 4]], dtype=tf.float16)
print(rank_2_tensor)

**To determine the rank** of a tensor we can call the following method.

In [None]:
tf.rank(rank_2_tensor)

The shape of a tensor is simply the number of elements that exist in each dimension.

In [None]:
rank_2_tensor.shape

## Using eager execution

When developing deep and complex neural networks, you need to continuously experiment with architectures and data. This proved difficult in TensorFlow 1.x because you always needed to run your code from beginning to end to check whether it worked. TensorFlow 2.x works in eager execution mode by default, which means that you can develop and check your code step by step as you progress through your project.

TensorFlow 1.x performed optimally because it executed its computations after compiling a static computational graph. All computations were distributed and connected into a graph as you compiled your network, and that graph helped TensorFlow to execute computations, leveraging the available resources (multi-core CPUs or multiple GPUs) in the best way and splitting operations between the resources in the most timely and efficient way. It also meant, however, that once you defined and compiled your graph, you could not change it at runtime but had to instantiate it from scratch, incurring some extra work.

In TensorFlow 2.x, you can still define your network, compile it, and run it optimally, but the team of TensorFlow developers has now favored, by default, a more experimental approach, allowing immediate evaluation of operations, thus making it easier to debug and to try network variations. This is called eager execution. Operations now return concrete values instead of pointers to parts of a computational graph to be built later.

In [None]:
tf.executing_eagerly()
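
The following sketch contrasts the two modes on a small example. In eager mode the result of an operation is available immediately, while wrapping the same computation in `tf.function` lets TensorFlow trace it into a graph. The function name `squared` is only an illustrative choice.

```python
import tensorflow as tf

x = tf.constant([[1.0, 2.0], [3.0, 4.0]])

# Eager execution: the operation runs right away and returns a concrete value.
y = tf.matmul(x, x)
eager_result = y.numpy()
print(eager_result)

# Graph execution: tf.function traces the Python function into a graph
# on the first call and reuses that graph afterwards.
@tf.function
def squared(m):
    return tf.matmul(m, m)

graph_result = squared(x).numpy()
print(graph_result)
```

Both variants produce the same values; the difference lies in when and how the computation is executed.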

## tf.keras
Keras was popular because the API was clean and simple, allowing standard deep learning models to be defined, fit, and evaluated in just a few lines of code. In 2019, Google released TensorFlow 2, which integrated the Keras API directly and promoted this interface as the default or standard interface for deep learning development on the platform.

On the hardware side, Keras runs on CPUs, GPUs, and Google's TPUs:

<table>
  <tr><td>
    <img src="https://static.packt-cdn.com/products/9781838821654/graphics/Images/B14853_01_01.png"
         alt="Fashion MNIST sprite"  width="600">
  </td></tr>
  <tr><td align="center">
    <b>Figure 1.</b> Keras is a high-level library that sits on top of other deep learning frameworks. Keras is supported on CPU, GPU, and TPU. <br/>&nbsp;
  </td></tr>
</table>

## Machine Learning Model with TensorFlow
The basic structure of training a machine learning model is as follows.

1. Import the dataset.
2. Select the type of model.
3. Train the model.
4. Evaluate the model's effectiveness.
5. Use the trained model to make predictions.
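
The five steps above can be sketched end to end on a toy problem. Everything in this example is an assumption made for illustration: the data is synthetic (the label is 1 when the two features sum to a positive number), and the model size and number of epochs are arbitrary.

```python
import numpy as np
import tensorflow as tf

# 1. "Import" the dataset: synthetic points with a simple labeling rule.
rng = np.random.default_rng(seed=0)
features = rng.normal(size=(200, 2)).astype("float32")
labels = (features.sum(axis=1) > 0).astype("float32")
x_train, y_train = features[:150], labels[:150]
x_test, y_test = features[150:], labels[150:]

# 2. Select the type of model: a small fully connected classifier.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(8, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# 3. Train the model.
model.fit(x_train, y_train, epochs=5, verbose=0)

# 4. Evaluate the model on data it has not seen during training.
loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
print("test loss:", loss, "test accuracy:", accuracy)

# 5. Use the trained model to make predictions (probabilities of class 1).
probabilities = model.predict(x_test[:5], verbose=0)
print(probabilities)
```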

A regression model predicts a continuous value, such as a stock price or the next value of a time series. A classification model, in contrast, predicts a discrete label, e.g. whether a picture contains a dog or a cat.

## Overfitting and underfitting
Learning how to deal with overfitting is important. Although it is often possible to achieve high accuracy on the training set, what we really want is to develop models that generalize well to a testing set (or data they haven't seen before).

The opposite of overfitting is underfitting. Underfitting occurs when there is still room for improvement on the test data. This can happen for a number of reasons, for example if the model:
* is not powerful enough,
* is over-regularized, or 
* has simply not been trained long enough. 

This means the model has not learned the relevant patterns in the training data.

If you train for too long though, the model will start to overfit and learn patterns from the training data that don't generalize to the test data. We need to strike a balance. Understanding how to train for an appropriate number of epochs as we'll explore below is a useful skill.

To prevent overfitting, the best solution is to use more complete training data. The dataset should cover the full range of inputs that the model is expected to handle. Additional data may only be useful if it covers new and interesting cases.
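
One simple way to spot the transition from underfitting to overfitting is to compare the training and validation loss per epoch. The sketch below uses made-up loss values (they are not measured results) to show the typical pattern: the training loss keeps falling while the validation loss reaches a minimum and then rises again.

```python
# Illustrative, made-up loss values per epoch (not measured results).
train_loss = [0.70, 0.60, 0.52, 0.45, 0.39, 0.33, 0.28]
val_loss = [0.71, 0.63, 0.58, 0.56, 0.57, 0.60, 0.64]

# Epoch (0-based index) with the lowest validation loss: a reasonable place to stop.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__)

# The gap between validation and training loss grows once the model overfits.
gaps = [v - t for t, v in zip(train_loss, val_loss)]

print("best epoch:", best_epoch)
print("gap at best epoch:", round(gaps[best_epoch], 2))
print("gap at last epoch:", round(gaps[-1], 2))
```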

## Dataset
Data Set Information:

The data has been produced using Monte Carlo simulations. The first 21
features (columns 2-22) are kinematic properties measured by the particle
detectors in the accelerator. The last seven features are functions of the
first 21 features; these are high-level features derived by physicists to help
discriminate between the two classes. There is an interest in using deep
learning methods to obviate the need for physicists to manually develop such
features. Benchmark results using Bayesian Decision Trees from a standard
physics package and 5-layer neural networks are presented in the original
paper. The last 500,000 examples are used as a test set.
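
The column layout described above can be sketched as follows, assuming (as the loading code in this notebook does) that the first column holds the class label. The helper name `split_row` and the example values are hypothetical.

```python
N_LOW_LEVEL = 21   # kinematic properties (columns 2-22)
N_HIGH_LEVEL = 7   # features derived from the low-level ones

def split_row(row):
    """Split one record into label, low-level and high-level features."""
    label = row[0]
    low_level = row[1:1 + N_LOW_LEVEL]
    high_level = row[1 + N_LOW_LEVEL:]
    return label, low_level, high_level

# Made-up record: one label followed by 28 feature values.
example_row = [1.0] + [0.1 * i for i in range(N_LOW_LEVEL + N_HIGH_LEVEL)]
label, low_level, high_level = split_row(example_row)

print(label, len(low_level), len(high_level))
```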

In [None]:
# Applied to each row (or batch of rows) in the dataset. This function is also a
# good place to add noise to the data or to do image preprocessing.
def pack_row(*row):
    label = row[0]
    # Stack the remaining columns into a single feature tensor along axis 1
    features = tf.stack(row[1:], 1)
    return features, label

def load_train_validate_ds(N_TRAIN,N_VALIDATION,BATCH_SIZE):
    # If you want to use the original data set then remove # from the next line
    # gz = tf.keras.utils.get_file('HIGGS.csv.gz', 'http://mlphysics.ics.uci.edu/data/higgs/HIGGS.csv.gz')
    # The Higgs dataset contains 11 000 000 examples, each with 28 features, and a binary class label
    gz = tf.keras.utils.get_file('HIGGS.csv.gz', 'export_dataframe.csv.gz')
    # The tf.data.experimental.CsvDataset class can be used to read csv records directly from a gzip 
    # file with no intermediate decompression step
    FEATURES = 28
    ds = tf.data.experimental.CsvDataset(gz,[float(),]*(FEATURES+1), compression_type="GZIP")
    # Instead of repacking each row individually, make a new Dataset that takes batches
    # of 10000 examples, applies the pack_row function to each batch, and then splits the
    # batches back up into individual records:
    packed_ds = ds.batch(10000).map(pack_row).unbatch()  
    BUFFER_SIZE = int(1e4)
    STEPS_PER_EPOCH = N_TRAIN//BATCH_SIZE
    validate_ds = packed_ds.take(N_VALIDATION).cache()
    train_ds = packed_ds.skip(N_VALIDATION).take(N_TRAIN).cache()
    # These datasets return individual examples. Use the .batch method to create batches of
    # an appropriate size for training. Before batching also remember to .shuffle and 
    #.repeat the training set.
    validate_ds = validate_ds.batch(BATCH_SIZE)
    train_ds = train_ds.shuffle(BUFFER_SIZE).repeat().batch(BATCH_SIZE)
    return train_ds, validate_ds
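
How `take`, `skip` and `batch` divide a dataset is easier to see on a toy example. The sketch below uses a dataset of the numbers 0 to 9 in place of the real records; the sizes are chosen only for illustration.

```python
import tensorflow as tf

ds = tf.data.Dataset.range(10)  # stand-in for the real records
n_validation, n_train, batch_size = 3, 5, 2

# The first n_validation records become the validation set,
# the next n_train records become the training set.
validate = ds.take(n_validation)
train = ds.skip(n_validation).take(n_train)

validate_items = [int(x) for x in validate.as_numpy_iterator()]
train_items = [int(x) for x in train.as_numpy_iterator()]
train_batches = [b.tolist() for b in train.batch(batch_size).as_numpy_iterator()]

print(validate_items)
print(train_items)
print(train_batches)
```

Because validation and training records come from disjoint ranges, no record ends up in both sets.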

For more information on datasets in TensorFlow, see the following videos:
* https://www.youtube.com/watch?v=oFFbKogYdfc
* https://www.youtube.com/watch?v=TOP2aLxcuu8

In [None]:
# THIS DATASET IS TOO LARGE
# To keep this tutorial relatively short use just the first 1000 samples for validation,
# and the next 10 000 for training:
n_train      = int(1e4) # The number of samples in training data
n_validation = int(1e3)
FEATURES = 28
batch_size   = 500
train_ds, validate_ds =  load_train_validate_ds(n_train,n_validation,batch_size)

In [None]:
train_ds

In [None]:
STEPS_PER_EPOCH = n_train//batch_size
print("The number of steps per epoch: ", STEPS_PER_EPOCH)

### Demonstrate overfitting
The simplest way to prevent overfitting is to start with a small model: A model with a small number of learnable parameters (which is determined by the number of layers and the number of units per layer). In deep learning, the number of learnable parameters in a model is often referred to as the model's "capacity".

Intuitively, a model with more parameters will have more "memorization capacity" and therefore will be able to easily learn a perfect dictionary-like mapping between training samples and their targets, a mapping without any generalization power, but this would be useless when making predictions on previously unseen data.

Always keep this in mind: deep learning models tend to be good at fitting to the training data, but the real challenge is generalization, not fitting.

On the other hand, if the network has limited memorization resources, it will not be able to learn the mapping as easily. To minimize its loss, it will have to learn compressed representations that have more predictive power. At the same time, if you make your model too small, it will have difficulty fitting to the training data. There is a balance between "too much capacity" and "not enough capacity".

To find an appropriate model size, it's best to start with relatively few layers and parameters, then begin increasing the size of the layers or adding new layers until you see diminishing returns on the validation loss.

Start with a simple model using only `layers.Dense` as a baseline, then create larger versions, and compare them.
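
To make "capacity" concrete, the number of learnable parameters of a stack of dense layers can be counted by hand: each layer has one weight per input-output pair plus one bias per unit. The sketch below applies this to the layer sizes of the tiny and large models used in this notebook; the helper name `dense_param_count` is hypothetical.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases of consecutive fully connected layers."""
    total = 0
    for units in layer_units:
        total += n_inputs * units + units  # weight matrix + bias vector
        n_inputs = units                   # this layer's output feeds the next layer
    return total

FEATURES = 28
tiny_params = dense_param_count(FEATURES, [16, 1])
large_params = dense_param_count(FEATURES, [512, 512, 512, 512, 1])

print("tiny model: ", tiny_params)
print("large model:", large_params)
```

The large model has more than a thousand times as many parameters as the tiny one, which gives it far more room to memorize the training data.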




## Training procedure
Many models train better if you gradually reduce the learning rate during training. Use `optimizers.schedules` to reduce the learning rate over time:

In [None]:
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
  0.001,
  decay_steps=STEPS_PER_EPOCH*1000,
  decay_rate=1,
  staircase=False)

The code above sets a `schedules.InverseTimeDecay` to hyperbolically decrease the learning rate to 1/2 of the base rate at 1000 epochs, 1/3 at 2000 epochs and so on.
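
Without the staircase option, the schedule follows the formula `initial_rate / (1 + decay_rate * step / decay_steps)`. The plain-Python sketch below re-implements this formula to check the numbers stated above; it assumes 20 steps per epoch, which corresponds to 10 000 training samples and a batch size of 500.

```python
STEPS_PER_EPOCH = 10000 // 500  # n_train // batch_size = 20

def inverse_time_decay(step, initial_lr=0.001,
                       decay_steps=STEPS_PER_EPOCH * 1000, decay_rate=1.0):
    """Learning rate after `step` optimizer steps (no staircase)."""
    return initial_lr / (1.0 + decay_rate * step / decay_steps)

rates = {epoch: inverse_time_decay(epoch * STEPS_PER_EPOCH)
         for epoch in (0, 1000, 2000, 3000)}

for epoch, rate in rates.items():
    print(f"epoch {epoch}: learning rate {rate:.6f}")
```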

In [None]:
step = np.linspace(0,100000)
lr = lr_schedule(step)
plt.figure(figsize = (8,6))
plt.plot(step/STEPS_PER_EPOCH, lr)
plt.ylim([0,max(plt.ylim())])
plt.xlabel('Epoch')
_ = plt.ylabel('Learning Rate')

For this tutorial, choose the Adam optimizer:

In [None]:
# Define the Optimizer
def get_optimizer():
    return tf.keras.optimizers.Adam(lr_schedule)

The training for this tutorial runs for many short epochs. To reduce the logging noise, use `tfdocs.modeling.EpochDots`, which simply prints a `.` for each epoch and a full set of metrics every 100 epochs.

Next include `callbacks.EarlyStopping` to avoid long and unnecessary training times. Note that this callback is set to monitor the `val_binary_crossentropy`, not the `val_loss`. This difference will be important later.
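
The idea behind the `patience` parameter can be illustrated with a simplified re-implementation in plain Python. This is only a sketch of the principle, not the Keras implementation (it ignores options such as a minimum improvement threshold), and the loss values are made up.

```python
def early_stopping_epoch(monitored_values, patience):
    """Return the epoch at which training would stop, or None if it never stops.

    Training stops once the monitored value has not improved
    for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, value in enumerate(monitored_values):
        if value < best:
            best = value  # improvement: remember it and reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this epoch
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: improvement stops after the third epoch (index 2).
val_losses = [0.90, 0.80, 0.70, 0.71, 0.72, 0.73]
stop_epoch = early_stopping_epoch(val_losses, patience=2)
print("training would stop at epoch", stop_epoch)
```

A large patience value, such as the 200 used in this notebook, tolerates long plateaus before giving up.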

Use `callbacks.TensorBoard` to generate TensorBoard logs for the training.

In [None]:
def get_callbacks(name):
    return [
    tfdocs.modeling.EpochDots(),
    tf.keras.callbacks.EarlyStopping(monitor='val_binary_crossentropy', patience=200),
    tf.keras.callbacks.TensorBoard(logdir/name),
    ]

In [None]:
logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True) 

Similarly, each model will use the same `Model.compile` and `Model.fit` settings. Choose the binary cross-entropy loss function. To read more about this loss, see https://towardsdatascience.com/understanding-binary-cross-entropy-log-loss-a-visual-explanation-a3ac6025181a
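
For a single example with true label `y` and predicted probability `p`, the binary cross-entropy is `-(y*log(p) + (1-y)*log(1-p))`. When the model outputs a raw score (a logit `z`) instead of a probability, the same loss can be computed directly from `z` in a numerically stable way. The following plain-Python sketch shows both forms and checks that they agree; the function names are chosen for illustration.

```python
import math

def sigmoid(z):
    """Map a logit to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def bce_from_probability(y, p):
    """Binary cross-entropy for label y and predicted probability p."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_from_logit(y, z):
    """Same loss computed from the logit z, avoiding overflow for large |z|."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))

for y, z in [(1, 2.0), (0, 2.0), (1, 0.0), (0, -3.0)]:
    via_probability = bce_from_probability(y, sigmoid(z))
    via_logit = bce_from_logit(y, z)
    print(f"y={y}, z={z}: {via_probability:.6f} vs {via_logit:.6f}")
```

A confident, correct prediction gives a loss close to zero, while a confident, wrong prediction is penalized heavily.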

In [None]:
def compile_and_fit(model, name, optimizer=None, max_epochs=10000):
    if optimizer is None:
        optimizer = get_optimizer()
    model.compile(optimizer=optimizer,
                loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
                metrics=[
                  tf.keras.losses.BinaryCrossentropy(
                      from_logits=True, name='binary_crossentropy'),
                  'accuracy'])

    model.summary()

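    # Keep the History object so the training curves can be inspected afterwards
    history = model.fit(
        train_ds,
        steps_per_epoch=STEPS_PER_EPOCH,
        epochs=max_epochs,
        validation_data=validate_ds,
        callbacks=get_callbacks(name),
        verbose=0)
    return history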

## Tiny model
Start by training a model:

In [None]:
FEATURES = 28
tiny_model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation='elu', input_shape=(FEATURES,)),
    tf.keras.layers.Dense(1)
])

In [None]:
compile_and_fit(tiny_model, 'sizes/Tiny')

### View in TensorBoard

These models all wrote TensorBoard logs during training.

Open an embedded  TensorBoard viewer inside a notebook:

In [None]:
%load_ext tensorboard

In [None]:
#docs_infra: no_execute
%tensorboard --logdir {logdir}/sizes

## Large model
As an exercise, you can create an even larger model, and see how quickly it begins overfitting. Next, let's add to this benchmark a network that has much more capacity, far more than the problem would warrant:

In [None]:
large_model = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation='elu', input_shape=(FEATURES,)),
    tf.keras.layers.Dense(512, activation='elu'),
    tf.keras.layers.Dense(512, activation='elu'),
    tf.keras.layers.Dense(512, activation='elu'),
    tf.keras.layers.Dense(1)
])

In [None]:
compile_and_fit(large_model, "sizes/large")

In [None]:
#docs_infra: no_execute
%tensorboard --logdir {logdir}/sizes

## Strategies to prevent overfitting

## References
* https://androidkt.com/split-the-data-into-train-test-dev/
* https://www.tensorflow.org/tutorials/keras/overfit_and_underfit
* https://www.tensorflow.org/guide/tensor