# Introduction
This is a summary of the course topics that highlights the elements listed in the official TensorFlow exam checklist. It follows the order of the course but, unlike the previous scripts, it does not retell the storyline of each topic; instead it goes through the individual requirements of the checklist and examines each one in full.

# Part 1: TensorFlow Developer Skills
You need to demonstrate that you understand how to develop software programs using TensorFlow and
that you can find the information you need to work as an ML practitioner.

In [None]:
import tensorflow as tf
print(tf.__version__) # check the version (should be 2.x+)

2.8.0


## 1.1
Know how to program in Python, resolve Python issues, and compile and run Python programs in PyCharm.
**TODO**


## 1.2
Know how to find information about TensorFlow APIs, including how to find guides and API references on tensorflow.org.
**DONE +-**


## 1.3
Know how to debug, investigate, and solve error messages from the TensorFlow API.
**DONE +-**


## 1.4
Know how to search beyond tensorflow.org, as and when necessary, to solve your TensorFlow questions.
**DONE +-**

## 1.5
Know how to create ML models using TensorFlow where the model size is reasonable for the problem being solved.
**DONE +-**


## 1.6
Know how to save ML models and check the model file size.
**DONE +-**


There are two ways to save a model in TensorFlow:

1.   The [SavedModel format](https://www.tensorflow.org/tutorials/keras/save_and_load#savedmodel_format) (default).
2.   [The HDF5 format](https://www.tensorflow.org/tutorials/keras/save_and_load#hdf5_format)


The main difference between the two is that the SavedModel format can save custom objects (such as special layers) automatically, without additional modifications when loading the model back in.

In [None]:
# Save a model using the SavedModel format
model.save('best_model_SavedModel_format')
# Save a model using the HDF5 format
model.save("best_model_HDF5_format.h5") # note the addition of '.h5' on the end

In [None]:
# Load a model from the SavedModel format
loaded_saved_model = tf.keras.models.load_model("best_model_SavedModel_format")
loaded_saved_model.summary()
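
The checklist item also asks to check the model file size. Here is a minimal sketch, assuming the two paths used in the cells above. A SavedModel is a directory, so its size is the sum of the files inside it, while an HDF5 model is a single file. The helper name `get_size_in_mb` is hypothetical and is not part of TensorFlow.

```python
import os

def get_size_in_mb(path):
  """Return the size on disk of a file or directory in megabytes (hypothetical helper)."""
  if os.path.isfile(path):
    total_bytes = os.path.getsize(path)
  else:
    # A SavedModel is a directory: add up every file inside it
    total_bytes = 0
    for root, _, files in os.walk(path):
      for name in files:
        total_bytes += os.path.getsize(os.path.join(root, name))
  return total_bytes / (1024 * 1024)

# Usage (assumes the models above were already saved to these paths)
for path in ["best_model_SavedModel_format", "best_model_HDF5_format.h5"]:
  if os.path.exists(path):
    print(f"{path}: {get_size_in_mb(path):.2f} MB")
```

Comparing the two numbers gives a quick idea of whether the model size is reasonable for the problem being solved.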

## 1.7
Understand the compatibility discrepancies between different versions of TensorFlow
**TODO**

# Part 2: Building and training neural network models using TensorFlow 2.x
You need to understand the foundational principles of machine learning (ML) and deep learning (DL) using TensorFlow 2.x.

## 2.1
Use TensorFlow 2.x.
**DONE GENERIC**


## 2.2
Build, compile and train machine learning (ML) models using TensorFlow.
**DONE**


1.   Creating a model - piece together the layers of a neural network yourself (using the [Functional](https://www.tensorflow.org/guide/keras/functional) or [Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) API) 
2.   Compiling a model - defining Loss, Metrics and Optimizer
3.   Fitting a model



In [None]:
# Set random seed
tf.random.set_seed(42)

# Create a model using the Sequential API
model = tf.keras.Sequential([
  tf.keras.layers.Dense(1)
])

# Compile the model
model.compile(loss=tf.keras.losses.mae, # mae is short for mean absolute error
              optimizer=tf.keras.optimizers.SGD(), # SGD is short for stochastic gradient descent
              metrics=["mae"])

# Fit the model
# model.fit(X, y, epochs=5) # this will break with TensorFlow 2.7.0+ (1-D inputs are no longer expanded automatically)
model.fit(tf.expand_dims(X, axis=-1), y, epochs=5) # add a feature dimension: shape (n,) -> (n, 1)

## 2.3
Preprocess data to get it ready for use in a model.
**DONE BUT MAKE A DOCUMENT WITH THE DIFFERENT WAYS**

https://www.tensorflow.org/guide/data This guide shows lots of useful ways to import various data sources into a tf.data.Dataset.

Normalization and Standardization


1.   Normalization: scaling all values from their original range to be between 0 and 1.
2.   Standardization: converting all of your data to zero mean and unit variance.
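
The two transformations can be sketched with NumPy as follows. The values in `x` are made up for illustration; in practice the statistics (min, max, mean, standard deviation) should be computed on the training set only and then reused on the test set.

```python
import numpy as np

# Toy feature column (made-up values, only for illustration)
x = np.array([18.0, 25.0, 40.0, 64.0])

# Normalization (min-max scaling): values end up between 0 and 1
x_normalized = (x - x.min()) / (x.max() - x.min())

# Standardization: values end up with mean 0 and variance 1
x_standardized = (x - x.mean()) / x.std()

print(x_normalized)
print(x_standardized.mean(), x_standardized.std())
```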



When you are dealing with text data you have to convert strings into numbers; we will look at this more closely in the NLP part.

When a categorical variable is not already in numeric (dummy) form, it is better to one-hot encode it.

In [None]:
import pandas as pd
# Read in the insurance dataset
insurance = pd.read_csv("https://raw.githubusercontent.com/stedy/Machine-Learning-with-R-datasets/master/insurance.csv")
# Turn all categories into numbers
insurance_one_hot = pd.get_dummies(insurance)
insurance_one_hot.head() # view the converted columns

## 2.4
Use models to predict results.


In [None]:
# Make predictions
y_preds = model.predict(X_test)

## 2.5
Build sequential models with multiple layers.
**DONE +-**

[Functional](https://www.tensorflow.org/guide/keras/functional) or [Sequential](https://www.tensorflow.org/api_docs/python/tf/keras/Sequential) API

## 2.6
Build and train models for binary classification.
**DONE +-**

The standard setup for binary classification is a sigmoid activation function in the output layer, binary cross-entropy as the loss function and accuracy as the base metric (there are plenty of other binary classification metrics for more elaborate model comparisons).

In [None]:
##### BINARY CLASSIFICATION BASIC EXAMPLE
# Set random seed
tf.random.set_seed(42)

# Create a model
model_7 = tf.keras.Sequential([
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 1, ReLU activation
  tf.keras.layers.Dense(4, activation=tf.keras.activations.relu), # hidden layer 2, ReLU activation
  tf.keras.layers.Dense(1, activation=tf.keras.activations.sigmoid) # output layer, sigmoid activation
])

# Compile the model
model_7.compile(loss=tf.keras.losses.binary_crossentropy, # loss function: binary cross-entropy
                optimizer=tf.keras.optimizers.Adam(),
                metrics=['accuracy']) # most used evaluation metric: accuracy

# Fit the model
history = model_7.fit(X, y, epochs=100, verbose=0)
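
To make the roles of the sigmoid output, the binary cross-entropy loss and the accuracy metric concrete, here is a small NumPy sketch. The logits and labels are made-up numbers and the helper functions are illustrative re-implementations, not the Keras ones.

```python
import numpy as np

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_prob, eps=1e-7):
  # Clip probabilities to avoid log(0)
  y_prob = np.clip(y_prob, eps, 1 - eps)
  return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

logits = np.array([2.0, -1.0, 0.5, -3.0])  # made-up raw outputs of the last layer
y_true = np.array([1.0, 0.0, 0.0, 0.0])    # made-up labels

y_prob = sigmoid(logits)               # probabilities between 0 and 1
y_pred = (y_prob >= 0.5).astype(int)   # threshold at 0.5 to get class labels
accuracy = np.mean(y_pred == y_true)

print(y_pred, accuracy, binary_crossentropy(y_true, y_prob))
```

The same thresholding step is needed after `model.predict`, because a sigmoid output layer returns probabilities rather than class labels.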

## 2.7
Build and train models for multi-class categorization.
**DONE +-**

It is mostly the same as binary classification; we have to change:

1.   Output size: number of classes
2.   Output activation function: softmax
3.   Loss function: categorical cross-entropy (CCE)
        1.   Sparse CCE if the labels are expressed in integer form
        2.   CCE if the labels are expressed in one-hot encoded form
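
The difference between the two loss variants is only the label format. The NumPy sketch below, with made-up logits for 2 samples and 3 classes, shows that both give the same value; the functions are illustrative re-implementations, not the Keras ones.

```python
import numpy as np

def softmax(logits):
  # Subtract the row max for numerical stability
  exp = np.exp(logits - logits.max(axis=-1, keepdims=True))
  return exp / exp.sum(axis=-1, keepdims=True)

logits = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.3]])    # made-up outputs for 2 samples, 3 classes
probs = softmax(logits)                 # each row sums to 1

labels_int = np.array([0, 1])           # integer labels -> sparse CCE
labels_one_hot = np.eye(3)[labels_int]  # one-hot labels -> CCE

# Sparse CCE: pick the predicted probability of the true class by index
sparse_cce = -np.log(probs[np.arange(len(labels_int)), labels_int]).mean()
# CCE: the one-hot vector selects the same probability
cce = -(labels_one_hot * np.log(probs)).sum(axis=-1).mean()

print(probs.sum(axis=-1), sparse_cce, cce)
```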







## 2.8
Plot loss and accuracy of a trained model.
**DONE**

The fastest way is to save the history (the output of a model's .fit function) and plot it with pandas. The content of the history depends on the loss function we use, the metrics we set when compiling the model and, optionally, any callbacks that write their information into the history.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt

# This is a regression model, so there is no accuracy curve; to get more lines in the plot, specify more metrics when compiling the model
history_2 = insurance_model_2.fit(X_train, y_train, epochs=100, verbose=0)
# Plot the loss curves of the trained model
pd.DataFrame(history_2.history).plot()
plt.ylabel("loss")
plt.xlabel("epochs"); # note: only the 100 epochs of this fit are shown, since the history variable was overwritten

## 2.9
Identify strategies to prevent overfitting, including augmentation and dropout.
**DONE BUT MAKE DOCUMENT**

Augmentation is a technique for preventing overfitting that we use for image recognition. It consists of randomly altering the images of the training dataset so that learning does not depend too much on the training set (overfitting).

In TF we can do it in two ways:


*   Using ImageDataGenerator object
*   Creating an ad hoc layer in the model with tf.keras.layers.experimental.preprocessing



In [None]:
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Create ImageDataGenerator training instance with data augmentation
train_datagen_augmented = ImageDataGenerator(rescale=1/255.,
                                             rotation_range=20, # rotate the image slightly between 0 and 20 degrees (note: this is an int not a float)
                                             shear_range=0.2, # shear the image
                                             zoom_range=0.2, # zoom into the image
                                             width_shift_range=0.2, # shift the image width ways
                                             height_shift_range=0.2, # shift the image height ways
                                             horizontal_flip=True) # flip the image on the horizontal axis

# Create ImageDataGenerator training instance without data augmentation
train_datagen = ImageDataGenerator(rescale=1/255.) 

# Create ImageDataGenerator test instance without data augmentation
test_datagen = ImageDataGenerator(rescale=1/255.)

# Import data and augment it from training directory
print("Augmented training images:")
train_data_augmented = train_datagen_augmented.flow_from_directory(train_dir,
                                                                   target_size=(224, 224),
                                                                   batch_size=32,
                                                                   class_mode='binary',
                                                                   shuffle=False) # Don't shuffle for demonstration purposes, usually a good thing to shuffle

# Create non-augmented data batches
print("Non-augmented training images:")
train_data = train_datagen.flow_from_directory(train_dir,
                                               target_size=(224, 224),
                                               batch_size=32,
                                               class_mode='binary',
                                               shuffle=False) # Don't shuffle for demonstration purposes

print("Unchanged test images:")
test_data = test_datagen.flow_from_directory(test_dir,
                                             target_size=(224, 224),
                                             batch_size=32,
                                             class_mode='binary')

In [None]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing

# Create a data augmentation stage with horizontal flipping, rotations, zooms
data_augmentation = keras.Sequential([
  preprocessing.RandomFlip("horizontal"),
  preprocessing.RandomRotation(0.2),
  preprocessing.RandomZoom(0.2),
  preprocessing.RandomHeight(0.2),
  preprocessing.RandomWidth(0.2),
  # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNetB0
], name ="data_augmentation")

# Use it in a model built with the Functional API (it should also work with the Sequential API)
# Setup input shape and base model, freezing the base model layers
input_shape = (224, 224, 3)
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False
# Create input layer
inputs = layers.Input(shape=input_shape, name="input_layer")
# Add in data augmentation Sequential model as a layer
x = data_augmentation(inputs)
# Give base_model inputs (after augmentation) and don't train it
x = base_model(x, training=False)
# Pool output features of base model
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
# Put a dense layer on as the output
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)
# Make a model with inputs and outputs
model_1 = keras.Model(inputs, outputs)

The Dropout layer randomly sets input units to 0 with a frequency of rate at each step during training time, which helps prevent overfitting. Inputs not set to 0 are scaled up by 1/(1 - rate) so that the expected sum over all inputs is unchanged.

In [None]:
combined_dropout = layers.Dropout(0.5)(token_char_concat)
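
What the Dropout layer does during training can be sketched in NumPy. This is an illustrative re-implementation under the description above, not the Keras code; the `dropout` function and the all-ones input are assumptions made for the example.

```python
import numpy as np

def dropout(inputs, rate, rng):
  # Keep each unit with probability (1 - rate)
  keep_mask = rng.random(inputs.shape) >= rate
  # Scale the kept units by 1/(1 - rate) so the expected sum stays the same
  return inputs * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
inputs = np.ones(100_000)
outputs = dropout(inputs, rate=0.5, rng=rng)

print(np.unique(outputs))             # only 0.0 (dropped) and 2.0 (kept and scaled)
print(inputs.mean(), outputs.mean())  # the two means are close to each other
```

At inference time the Keras layer is a no-op: units are only dropped when the layer is called in training mode.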

## 2.10
Use pretrained models (transfer learning).
**DONE +-**

Transfer learning means using an already trained model so as not to waste time training a huge model from scratch. The easiest way is to use the original model as it is, but we can also replace the output layer to adapt the pretrained model to our task (feature extraction).

In [None]:
import tensorflow_hub as hub
from tensorflow.keras import layers

# Resnet 50 V2 feature vector
resnet_url = "https://tfhub.dev/google/imagenet/resnet_v2_50/feature_vector/4"

# Download the pretrained model and save it as a Keras layer
feature_extractor_layer = hub.KerasLayer(resnet_url,
                                          trainable=False, # freeze the underlying patterns
                                          name='feature_extraction_layer',
                                          input_shape=IMAGE_SHAPE+(3,)) # define the input image shape

# Create our own model
resnet_model = tf.keras.Sequential([
  feature_extractor_layer, # use the feature extraction layer as the base
  layers.Dense(num_classes, activation='softmax', name='output_layer') # create our own output layer      
])

# Compile
resnet_model.compile(loss='categorical_crossentropy',
                     optimizer=tf.keras.optimizers.Adam(),
                     metrics=['accuracy'])
# Fit the model
resnet_history = resnet_model.fit(train_data_10_percent,
                                  epochs=5,
                                  steps_per_epoch=len(train_data_10_percent),
                                  validation_data=test_data,
                                  validation_steps=len(test_data),
                                  # Add TensorBoard callback to model (callbacks parameter takes a list)
                                  callbacks=[create_tensorboard_callback(dir_name="tensorflow_hub", # save experiment logs here
                                                                         experiment_name="resnet50V2")]) # name of log files


## 2.11
Extract features from pre-trained models.
**DONE +-**

Fine-tuning is a more complex approach: it consists of taking a pretrained model, possibly adjusting the output layer, and training the last layers again (not only the output layer). It takes more time but can achieve better performance.

In the code below we train a model with feature extraction for 5 epochs, then unfreeze the last layers and train the model for 5 more epochs.

In [None]:
## FEATURE EXTRACTION
# Create a functional model with data augmentation
import tensorflow as tf
from tensorflow.keras import layers
from tensorflow.keras.layers.experimental import preprocessing
from tensorflow.keras.models import Sequential

# Build data augmentation layer
data_augmentation = Sequential([
  preprocessing.RandomFlip('horizontal'),
  preprocessing.RandomHeight(0.2),
  preprocessing.RandomWidth(0.2),
  preprocessing.RandomZoom(0.2),
  preprocessing.RandomRotation(0.2),
  # preprocessing.Rescaling(1./255) # keep for ResNet50V2, remove for EfficientNet                 
], name="data_augmentation")

# Setup the input shape to our model
input_shape = (224, 224, 3)

# Create a frozen base model
base_model = tf.keras.applications.EfficientNetB0(include_top=False)
base_model.trainable = False

# Create input and output layers
inputs = layers.Input(shape=input_shape, name="input_layer") # create input layer
x = data_augmentation(inputs) # augment our training images
x = base_model(x, training=False) # pass augmented images to base model but keep it in inference mode, so batchnorm layers don't get updated: https://keras.io/guides/transfer_learning/#build-a-model 
x = layers.GlobalAveragePooling2D(name="global_average_pooling_layer")(x)
outputs = layers.Dense(10, activation="softmax", name="output_layer")(x)
model_2 = tf.keras.Model(inputs, outputs)

# Compile
model_2.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), # use Adam optimizer with base learning rate
              metrics=["accuracy"])

# Fit the model, logging metrics with the TensorBoard callback
initial_epochs = 5
history_10_percent_data_aug = model_2.fit(train_data_10_percent,
                                          epochs=initial_epochs,
                                          validation_data=test_data,
                                          validation_steps=int(0.25 * len(test_data)), # do less steps per validation (quicker)
                                          callbacks=[create_tensorboard_callback("transfer_learning", "10_percent_data_aug")])

# FINE TUNING
base_model.trainable = True

# Freeze all layers except for the last 10
for layer in base_model.layers[:-10]:
  layer.trainable = False

# Recompile the model (always recompile after any adjustments to a model)
model_2.compile(loss="categorical_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001), # learning rate is 10x lower than before for fine-tuning
              metrics=["accuracy"])

# Fine tune for another 5 epochs
fine_tune_epochs = initial_epochs + 5

# Refit the model (same as model_2 except with more trainable layers)
history_fine_10_percent_data_aug = model_2.fit(train_data_10_percent,
                                               epochs=fine_tune_epochs,
                                               validation_data=test_data,
                                               initial_epoch=history_10_percent_data_aug.epoch[-1], # start from previous last epoch
                                               validation_steps=int(0.25 * len(test_data)),
                                               callbacks=[create_tensorboard_callback("transfer_learning", "10_percent_fine_tune_last_10")]) # name experime
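
To see what the slicing in the fine-tuning step does without downloading a model, here is a small sketch with stand-in objects. `FakeLayer` and the count of 15 layers are made up for illustration; only the `[:-10]` pattern and the epoch values come from the cell above. The last lines mimic the epoch loop, assuming `fit` iterates over `range(initial_epoch, epochs)`.

```python
class FakeLayer:
  """Hypothetical stand-in for a Keras layer: only has a name and a trainable flag."""
  def __init__(self, name):
    self.name = name
    self.trainable = True

# Pretend base model with 15 layers (the count is made up)
base_layers = [FakeLayer(f"layer_{i}") for i in range(15)]

# Same pattern as above: freeze everything except the last 10 layers
for layer in base_layers[:-10]:
  layer.trainable = False

trainable_names = [layer.name for layer in base_layers if layer.trainable]
print(len(trainable_names), trainable_names[0], trainable_names[-1])

# Epoch bookkeeping for the second fit() call
initial_epochs = 5
fine_tune_epochs = initial_epochs + 5
previous_epochs = list(range(initial_epochs))  # what history.epoch holds after the first fit
initial_epoch = previous_epochs[-1]            # history.epoch is 0-indexed, so this is 4
epochs_run = list(range(initial_epoch, fine_tune_epochs))
print(epochs_run)
```

Under that assumption, the second `fit` call with `initial_epoch=history_10_percent_data_aug.epoch[-1]` runs 6 epochs rather than 5; passing `initial_epoch=initial_epochs` would give exactly 5 more.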

## 2.12
Ensure that inputs to a model are in the correct shape.
**DONE +-**

## 2.13
Ensure that you can match test data to the input shape of a neural network.
**DONE BUT MAKE DOCUMENT**

## 2.14
Ensure you can match output data of a neural network to specified input shape for test data.
**DONE +-**

## 2.15
Understand batch loading of data.
**DONE +-**

## 2.16
Use callbacks to trigger the end of training cycles.
**DONE BUT MAKE DOCUMENT**

Callbacks are functions that run during the training of a neural network. Each one has a purpose and can produce an output at the end of every epoch, so you can save information at each epoch and take actions based on it (early stopping, learning rate adaptation).

Kinds of callbacks we saw:
1.   TensorBoard callback: saves model metrics in a log_dir so that you can access them using TensorBoard
2.   ModelCheckpoint callback: saves the model every epoch, or only on the epochs where performance improves
3.   EarlyStopping callback
4.   Learning rate adaptation (ReduceLROnPlateau)





In [None]:
##TENSORBOARD CALLBACK
# Create tensorboard callback (functionized because need to create a new one for each model)
import datetime
def create_tensorboard_callback(dir_name, experiment_name):
  log_dir = dir_name + "/" + experiment_name + "/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
  tensorboard_callback = tf.keras.callbacks.TensorBoard(
      log_dir=log_dir
  )
  print(f"Saving TensorBoard log files to: {log_dir}")
  return tensorboard_callback

# Use it in training
# Fit the model
resnet_history = resnet_model.fit(train_data_10_percent,
                                  epochs=5,
                                  steps_per_epoch=len(train_data_10_percent),
                                  validation_data=test_data,
                                  validation_steps=len(test_data),
                                  # Add TensorBoard callback to model (callbacks parameter takes a list)
                                  callbacks=[create_tensorboard_callback(dir_name="tensorflow_hub", # save experiment logs here
                                                                         experiment_name="resnet50V2")]) # name of log files

## MODEL CHECKPOINT CALLBACK
# Setup checkpoint path
checkpoint_path = "ten_percent_model_checkpoints_weights/checkpoint.ckpt" # note: remember saving directly to Colab is temporary

# Create a ModelCheckpoint callback that saves the model's weights only
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_path,
                                                         save_weights_only=True, # set to False to save the entire model
                                                         save_best_only=False, # set to True to save only the best model instead of a model every epoch 
                                                         save_freq="epoch", # save every epoch
                                                         verbose=1)
## EARLY STOPPING CALLBACK
# Setup EarlyStopping callback to stop training if model's val_loss doesn't improve for 3 epochs
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_loss", # watch the val loss metric
                                                  patience=3) # if val loss doesn't improve for 3 epochs in a row, stop training

## LEARNING RATE ADAPTATION CALLBACK
# Creating learning rate reduction callback
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",  
                                                 factor=0.2, # multiply the learning rate by 0.2 (reduce by 5x)
                                                 patience=2,
                                                 verbose=1, # print out when learning rate goes down 
                                                 min_lr=1e-7)
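
The patience logic behind early stopping can be sketched in plain Python. This is a simplified illustration (no `min_delta`, no weight restoring), and both the helper name and the validation losses are made up.

```python
def epochs_until_early_stop(val_losses, patience=3):
  """Return the number of epochs run before stopping (simplified patience logic, hypothetical helper)."""
  best_loss = float("inf")
  epochs_without_improvement = 0
  for epoch, val_loss in enumerate(val_losses, start=1):
    if val_loss < best_loss:
      best_loss = val_loss
      epochs_without_improvement = 0
    else:
      epochs_without_improvement += 1
    if epochs_without_improvement >= patience:
      return epoch  # stop: no improvement for `patience` epochs in a row
  return len(val_losses)  # never triggered, all epochs run

# Made-up validation losses: the best value so far is reached at epoch 3
val_losses = [0.90, 0.70, 0.60, 0.61, 0.65, 0.62, 0.58, 0.55]
print(epochs_until_early_stop(val_losses, patience=3))
print(epochs_until_early_stop(val_losses, patience=4))
```

With `patience=3` training stops after epoch 6 and never reaches the better losses at epochs 7 and 8, which is the trade-off to keep in mind when choosing the patience value.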


## 2.17
Use datasets from different sources.
**DONE BUT MAKE DOCUMENT**


tf.data.Dataset objects are useful because their input pipeline (parallel preprocessing, prefetching) keeps the GPU busy, which increases training performance. That is the reason why tf.keras.preprocessing.image_dataset_from_directory() outperforms ImageDataGenerator. **TODO**

## 2.18
Use datasets in different formats, including json and csv.
**DONE +-**

## 2.19
Use datasets from tf.data.datasets.
**TODO**

tf.data datasets are faster at preprocessing because the functions applied to them can be parallelized and overlapped with training on the GPU, and they can be combined with mixed precision training, but they are a bit more complex to manage:


1.   Have to be scaled and resized with functions
2.   Have to be batched and prepared for training



tf.data data preprocessing


1.   Cast the data to the required tensor dtype
2.   Resize the images to a standard size
3.   Scale the values in the tensors

For simplicity you can create a function that performs these transformations and apply it to the dataset using the **.map()** function.





In [None]:
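# Preprocessing function to use with .map() (assumes the dataset yields (image, label) pairs)
def preprocess_img(image, label, img_shape=224):
  # Cast data
  image = tf.cast(image, tf.float32)
  # Image resizing
  image = tf.image.resize(image, [img_shape, img_shape])
  # Scaling values
  image = image/255.
  return image, label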

tf.data data preparation and batching
1.  Apply all the data preprocessing transformations using map()
2.  Shuffle the elements of the dataset using the **.shuffle()** function with **buffer_size** as parameter. In theory the buffer should hold the entire dataset, but for memory reasons the shuffling is done over a window of **buffer_size** elements at a time
3.  Batch the data to make the training sustainable
4.  Prefetch the data to parallelize computation between the CPU (preprocessing and preparation) and the GPU (actual training of the model)
5.  Cache the data to avoid loading time on later epochs (for now we do not use it)
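
How **buffer_size** limits the shuffling can be sketched in plain Python. This is a simplified model of a shuffle buffer, not the actual tf.data implementation; the function name and the toy data are made up.

```python
import random

def buffered_shuffle(elements, buffer_size, seed=42):
  """Simplified sketch of how a shuffle buffer works (not the actual tf.data implementation)."""
  rng = random.Random(seed)
  buffer = []
  for element in elements:
    if len(buffer) < buffer_size:
      # Fill the buffer first
      buffer.append(element)
    else:
      # Buffer is full: emit a random element and replace it with the new one
      index = rng.randrange(buffer_size)
      yield buffer[index]
      buffer[index] = element
  # Input exhausted: emit what is left in random order
  rng.shuffle(buffer)
  yield from buffer

data = list(range(10))
print(list(buffered_shuffle(data, buffer_size=1)))   # buffer of 1: the order does not change
print(list(buffered_shuffle(data, buffer_size=3)))   # small buffer: elements only move locally
print(list(buffered_shuffle(data, buffer_size=10)))  # buffer as large as the dataset: full shuffle
```

An element can only be emitted once it has entered the buffer, so a small buffer on a dataset sorted by class still produces batches dominated by one class.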


In [None]:
# Map preprocessing function to training data (and parallelize)
train_data = train_data.map(map_func=preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
# Shuffle train_data and turn it into batches and prefetch it (load it faster)
train_data = train_data.shuffle(buffer_size=1000).batch(batch_size=32).prefetch(buffer_size=tf.data.AUTOTUNE)

# Map preprocessing function to test data
test_data = test_data.map(preprocess_img, num_parallel_calls=tf.data.AUTOTUNE)
# Turn test data into batches (don't need to shuffle)
test_data = test_data.batch(32).prefetch(tf.data.AUTOTUNE)

## 2.20
Split data into train and test sets

The easy, naive way is to define a splitting proportion and index the array to separate train and test; remember that this is not a random split.

In [None]:
import numpy as np
# Make a bigger dataset
X = np.arange(-100, 100, 4)
# Make labels for the dataset (adhering to the same pattern as before)
y = np.arange(-90, 110, 4)
print(f"Len of data: {len(X)}")
# Split data into train and test sets
X_train = X[:40] # first 40 examples (80% of data)
y_train = y[:40]

X_test = X[40:] # last 10 examples (20% of data)
y_test = y[40:]

len(X_train), len(X_test)

Len of data: 50


(40, 10)

Use sklearn.model_selection.train_test_split to split the data randomly.

In [None]:
# Create training and test sets
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, 
                                                    y, 
                                                    test_size=0.2, 
                                                    random_state=42) # set random state for reproducible splits

## 2.21
Enable mixed precision training and check it

Mixed precision training uses float16 for most computations, instead of float32 everywhere, so we can speed up training time at the cost of a bit of precision in the computation.

In [None]:
# Turn on mixed precision training
from tensorflow.keras import mixed_precision
mixed_precision.set_global_policy(policy="mixed_float16") # set global policy to mixed precision
mixed_precision.global_policy() # should output "mixed_float16"
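
The precision that is traded away can be seen directly with NumPy dtypes. This is only an illustration of float16 versus float32 arithmetic, not of what Keras does internally; the value of `small_update` is made up.

```python
import numpy as np

# Machine epsilon: the gap between 1.0 and the next representable number
print(np.finfo(np.float16).eps, np.finfo(np.float32).eps)

# A small update is rounded away in float16 but kept in float32
small_update = 1e-4
print(np.float16(1.0) + np.float16(small_update) == np.float16(1.0))
print(np.float32(1.0) + np.float32(small_update) == np.float32(1.0))

# float16 also has a much smaller range
print(np.finfo(np.float16).max, np.finfo(np.float32).max)
```

This is why the `mixed_float16` policy computes in float16 but keeps the model variables in float32.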

# Part 3: Image classification
You need to understand how to build image recognition and object detection models with deep neural
networks and convolutional neural networks using TensorFlow 2.x.

## 3.1
Define Convolutional neural networks with Conv2D and pooling layers.

Conv2D takes an input tensor of rank 4 or higher:

*   4+D tensor with shape: batch_shape + (channels, rows, cols) if data_format='channels_first' or 4+D tensor with shape: batch_shape + (rows, cols, channels) if data_format='channels_last'.

MaxPool2D takes a 4D tensor as input: (batch_size, rows, cols, channels) if data_format='channels_last' or (batch_size, channels, rows, cols) if data_format='channels_first'.

In [None]:
# Example of a model with Conv2D layers (MaxPool2D layers could be added between them)
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, Dense, Flatten
from tensorflow.keras.optimizers import Adam

# Create the model (this can be our baseline, a 3 layer Convolutional Neural Network)
model_4 = Sequential([
  Conv2D( # The "2D" means each filter slides over two dimensions (height and width); every filter still spans all 3 colour channels.
         filters=10, # these are the number of "feature extractors" that will be moving over our images.
         kernel_size=3, # the size of our filters, a kernel_size of (3, 3) will mean each filter will have the size 3x3. The smaller the kernel, the more fine-grained features it will extract.
         strides=1, # the number of pixels a filter will move across as it covers the image. A stride of 1 means the filter moves across each pixel 1 by 1.
         padding='valid', # this can be either 'same' or 'valid': 'same' adds zeros to the outside of the image so the output of the convolutional layer has the same size as the input, 'valid' adds no padding so the output is smaller.
         activation='relu', 
         input_shape=(224, 224, 3)), # input layer (specify input shape)
  Conv2D(10, 3, activation='relu'),
  Conv2D(10, 3, activation='relu'),
  Flatten(),
  Dense(1, activation='sigmoid') # output layer (specify output shape)
])
# Compile the model
model_4.compile(loss='binary_crossentropy',
                optimizer=Adam(),
                metrics=['accuracy'])
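
How the spatial size shrinks through the three Conv2D layers of model_4 can be checked with a little arithmetic. The helper below is a hypothetical illustration of the usual 'valid' and 'same' output-size formulas, not a Keras function.

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
  """Spatial output size of a convolution along one dimension."""
  if padding == "same":
    return math.ceil(input_size / stride)
  # 'valid': no padding, the filter must fit entirely inside the input
  return (input_size - kernel_size) // stride + 1

size = 224
for _ in range(3):  # the three Conv2D layers of model_4 (kernel_size=3, strides=1, padding='valid')
  size = conv_output_size(size, kernel_size=3)
  print(size)  # 222, then 220, then 218

flattened_units = size * size * 10  # 10 filters in the last Conv2D layer
print(flattened_units)
```

The Flatten layer therefore feeds 475240 values into the final Dense layer, which is one reason pooling layers are usually added between convolutions.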

In [None]:
# Check lengths of training and test data generators
len(train_data), len(test_data)

# Fit the model
history_4 = model_4.fit(train_data,
                        epochs=5,
                        steps_per_epoch=len(train_data), #this is the number of batches a model will go through per epoch,
                        #in our case, we want our model to go through all batches so it's equal to the length of train_data (1500 images in batches of 32 = 1500/32 = ~47 steps)
                        validation_data=test_data,
                        validation_steps=len(test_data)) #same as above, except for the validation_data parameter
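
The number of steps mentioned in the comment above follows from the dataset size and the batch size. A quick check, assuming the 1500 images and batch size of 32 quoted there:

```python
import math

num_images = 1500   # number of training images mentioned in the comment above
batch_size = 32

steps_per_epoch = math.ceil(num_images / batch_size)
last_batch_size = num_images - (steps_per_epoch - 1) * batch_size

print(steps_per_epoch)   # 47
print(last_batch_size)   # 28
```

The last batch is smaller than the others because 1500 is not a multiple of 32.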

For multi-class classification we had to make 3 main changes:


1.   Changing the output layer to have n output neurons (the same number as the number of classes we have).
2.   Changing the output layer to use 'softmax' activation instead of 'sigmoid' activation.
3.   Changing the loss function to be 'categorical_crossentropy' instead of 'binary_crossentropy'.




## 3.2
Build and train models to process real-world image datasets.


We need to resize real-world images to a common shape; a usual choice is (224, 224, 3).

## 3.3
Understand how to use convolutions to improve your neural network.



Some tips to improve the image classification model are:
*   Increase the number of model layers (e.g. add more convolutional layers).
*   Increase the number of filters in each convolutional layer (e.g. from 10 to 32, 64, or 128, these numbers aren't set in stone either, they are usually found through trial and error).
*   Train for longer (more epochs).
*   Find an ideal learning rate.
*   Get more data (give the model more opportunities to learn).
*   Use transfer learning to leverage what another image model has learned and adjust it for our own use case.







## 3.4
Use real-world images in different shapes and sizes.



*   See point 3.6: the parameter target_size in train_datagen.flow_from_directory resizes the images
*   We can also use tf.io.read_file and tf.image; the example code is below

In [None]:
# Read in target file (an image)
img = tf.io.read_file(filename)

# Decode the read file into a tensor & ensure 3 colour channels 
# (our model is trained on images with 3 colour channels and sometimes images have 4 colour channels)
img = tf.image.decode_image(img, channels=3)

# Resize the image (to the same size our model was trained on)
img = tf.image.resize(img, size = [img_shape, img_shape])

# Rescale the image (get all values between 0 and 1)
img = img/255.

# Then handle the batch dimension by adding a dimension to the image tensor
img = tf.expand_dims(img, axis=0) # add an extra dimension at axis 0

## 3.5
Use image augmentation to prevent overfitting.

We can increase the number of layers with pooling, both MaxPool and AvgPool (we have seen how to do it in the model-building part).

We can use data augmentation, which is the process of altering our training data, leading to it having more diversity and in turn allowing our models to learn more generalizable patterns.

In [None]:
#We can do it directly with the ImageDataGenerator Object initialization
# Create ImageDataGenerator training instance with data augmentation
train_datagen_augmented = ImageDataGenerator(rescale=1/255.,
                                             rotation_range=20, # rotate the image slightly between 0 and 20 degrees (note: this is an int not a float)
                                             shear_range=0.2, # shear the image
                                             zoom_range=0.2, # zoom into the image
                                             width_shift_range=0.2, # shift the image width ways
                                             height_shift_range=0.2, # shift the image height ways
                                             horizontal_flip=True) # flip the image on the horizontal axis

# Create ImageDataGenerator training instance without data augmentation
train_datagen = ImageDataGenerator(rescale=1/255.) 

# Create ImageDataGenerator test instance without data augmentation
test_datagen = ImageDataGenerator(rescale=1/255.)

# Import data and augment it from directories
train_data_augmented_shuffled = train_datagen_augmented.flow_from_directory(train_dir,
                                                                            target_size=(224, 224),
                                                                            batch_size=32,
                                                                            class_mode='binary',
                                                                            shuffle=True) # Shuffle data (default): images of the different classes are mixed within each batch
                                                                            # if False, the batches contain first all the images from one directory and then those from the other

Other ways to prevent overfitting (preventing overfitting is also referred to as regularization):


1.   Get more data - Having more data gives the model more opportunities to learn patterns, patterns which may be more generalizable to new examples.
2.   Simplify model - If the current model is already overfitting the training data, it may be too complicated of a model. This means it's learning the patterns of the data too well and isn't able to generalize well to unseen data. One way to simplify a model is to reduce the number of layers it uses or to reduce the number of hidden units in each layer.
3.   Use data augmentation - Data augmentation manipulates the training data so that it is harder for the model to learn, because it artificially adds more variety to the data. If a model is able to learn patterns in augmented data, the model may be able to generalize better to unseen data.
4.   Use transfer learning - Transfer learning leverages the patterns (also called pretrained weights) one model has learned and uses them as the foundation for your own task. In our case, we could use one computer vision model pretrained on a large variety of images and then tweak it slightly to be more specialized for food images.





##3.6
Use ImageDataGenerator.


ImageDataGenerator is a useful tool when your images are split into directories. It lets you rescale, resize and augment the images, and it creates a stream of batches from the train and test directories that flows directly into the neural network during training.

In [None]:
# Create train and test data generators and rescale the data 
from tensorflow.keras.preprocessing.image import ImageDataGenerator
train_datagen = ImageDataGenerator(rescale=1/255.)
test_datagen = ImageDataGenerator(rescale=1/255.)

# Turn it into batches
train_data = train_datagen.flow_from_directory(directory=train_dir, # training directory with one subdirectory per class (2 here); labels are taken from the subdirectory names
                                               target_size=(224, 224), # size every image is resized to
                                               class_mode='binary', # type of classification; use 'categorical' for more than 2 classes
                                               batch_size=32) # number of images per batch

test_data = test_datagen.flow_from_directory(directory=test_dir,
                                             target_size=(224, 224),
                                             class_mode='binary',
                                             batch_size=32)

In [None]:
# Get a sample of the training data batch 
images, labels = next(train_data) # get the next batch of images/labels
len(images), len(labels)

##3.7
Understand how ImageDataGenerator labels images based on the directory structure.
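
As noted in 3.6, the labels come from the subdirectory names. As far as we know, `flow_from_directory` treats each subdirectory as one class and assigns the integer labels in alphanumeric order of the names; the real mapping can be read from the `class_indices` attribute of the returned iterator. The sketch below only imitates that rule with the standard library, and the helper name `infer_class_indices` and the demo folders are hypothetical.

```python
import os
import tempfile

def infer_class_indices(directory):
    """Map each class subdirectory name to an integer label, in sorted order."""
    class_names = sorted(
        entry for entry in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, entry))
    )
    return {name: index for index, name in enumerate(class_names)}

# Build a hypothetical two-class layout: <tmp>/pizza and <tmp>/steak
with tempfile.TemporaryDirectory() as train_dir_demo:
    for class_name in ("steak", "pizza"):
        os.makedirs(os.path.join(train_dir_demo, class_name))
    class_indices_demo = infer_class_indices(train_dir_demo)

print(class_indices_demo)  # {'pizza': 0, 'steak': 1}
```

With `class_mode='binary'` this means that every image under `pizza/` would get label 0 and every image under `steak/` label 1.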

# Part 4: NLP
You need to understand how to use neural networks to solve natural language processing problems
using TensorFlow.

##4.1
Build natural language processing systems using TensorFlow.


##4.2
Prepare text to use in TensorFlow models. TOKENIZATION


 Tokenization is a direct mapping from a word, character or sub-word to a numerical value. There are 3 levels:
 1.  Using word-level tokenization with the sentence "I love TensorFlow" might result in "I" being 0, "love" being 1 and "TensorFlow" being 2. In this case, every word in a sequence is considered a single token.
 2.  Character-level tokenization, such as converting the letters A-Z to values 1-26. In this case, every character in a sequence is considered a single token.
 3.  Sub-word tokenization is in between word-level and character-level tokenization. It involves breaking individual words into smaller parts and then converting those smaller parts into numbers. For example, "my favourite food is pineapple pizza" might become "my, fav, avour, rite, fo, oo, od, is, pin, ine, app, le, piz, za". After doing this, these sub-words would then be mapped to a numerical value. In this case, every word could be considered multiple tokens.
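
The first two levels can be sketched in plain Python. This is only a toy illustration of the mapping idea and not what TextVectorization does internally; the vocabularies are built on the fly from the example sentence.

```python
sentence = "I love TensorFlow"

# Word-level: one token per whitespace-separated word
word_tokens = sentence.split()
word_vocab = {word: index for index, word in enumerate(word_tokens)}
word_ids = [word_vocab[word] for word in word_tokens]

# Character-level: map the letters a-z to 1-26 (case-insensitive), skip everything else
char_vocab = {chr(ord("a") + i): i + 1 for i in range(26)}
char_ids = [char_vocab[ch] for ch in sentence.lower() if ch in char_vocab]

print(word_ids)      # [0, 1, 2]
print(char_ids[:5])  # [9, 12, 15, 22, 5]
```

The same sentence becomes 3 tokens at word level and 15 tokens at character level, which is the trade-off between vocabulary size and sequence length.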

tf.keras.layers.experimental.preprocessing.TextVectorization **DOCUMENTATION**
1.  max_tokens - The maximum number of words in your vocabulary (e.g. 20000 or the number of unique words in your text), includes a value for OOV (out of vocabulary) tokens.
2.  standardize - Method for standardizing text. Default is "lower_and_strip_punctuation" which lowers text and removes all punctuation marks.
3.  split - How to split text, default is "whitespace" which splits on spaces.
4.  ngrams - How many words to contain per token split, for example, ngrams=2 splits tokens into continuous sequences of 2.
5.  output_mode - How to output tokens, can be "int" (integer mapping), "multi_hot" (named "binary" in older releases), "count" or "tf_idf" (spelled "tf-idf" in older releases). See documentation for more.
6.  output_sequence_length - Length of tokenized sequence to output. For example, if output_sequence_length=150, all tokenized sequences will be 150 tokens long.
7.  pad_to_max_tokens - Defaults to False, if True, the output feature axis will be padded to max_tokens even if the number of unique tokens in the vocabulary is less than max_tokens. Only valid in certain modes, see docs for more.

In [None]:
import tensorflow as tf
from tensorflow.keras.layers.experimental.preprocessing import TextVectorization
# Setup text vectorization with custom variables
max_vocab_length = 10000 # max number of words to have in our vocabulary
max_length = 15 # max length our sequences will be (e.g. how many words from a Tweet does our model see?)

text_vectorizer = TextVectorization(max_tokens=max_vocab_length,
                                    output_mode="int",
                                    output_sequence_length=max_length)
# Fit the text vectorizer to the training text
text_vectorizer.adapt(train_sentences)
# Create sample sentence and tokenize it
sample_sentence = "There's a flood in my street!"
text_vectorizer([sample_sentence])

##4.3
Build models that identify the category of a piece of text using binary categorization


In [None]:
# Example of a basic dense model for binary categorization (a simple baseline, its performance is poor)
# Build model with the Functional API
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string") # inputs are 1-dimensional strings
x = text_vectorizer(inputs) # turn the input text into numbers
x = embedding(x) # create an embedding of the tokenized text (`embedding` is the Embedding layer created in 4.5)
x = layers.GlobalAveragePooling1D()(x) # lower the dimensionality of the embedding (try running the model without this layer and see what happens)
outputs = layers.Dense(1, activation="sigmoid")(x) # create the output layer, want binary outputs so use sigmoid activation
model_1 = tf.keras.Model(inputs, outputs, name="model_1_dense") # construct the model

# Compile model
model_1.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])
# Fit the model
model_1_history = model_1.fit(train_sentences, # input sentences can be a list of strings because the text preprocessing layer is built into the model
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(dir_name=SAVE_DIR, 
                                                                     experiment_name="simple_dense_model")])

##4.4
Build models that identify the category of a piece of text using multi-class categorization


The differences between binary and multi-class categorization are the usual ones: the loss function, the number of neurons in the output layer and its activation function. In this point we explore the structure of a more complex multi-class model to understand how to build one.

A complex model can be structured as a concatenation of simple models, and the Functional API gives us a powerful tool to implement it. Some focal points:
1.  Construct the simple model parts.
2.  Concatenate the outputs of the simple models using concatenation layers, i.e. layers whose only purpose is to join tensors so that the later layers of the network receive a single input tensor.
3.  Define the overall model: its output is the output layer, i.e. the result of all the functional transformations of the previous layers, and its inputs are the input layers of the different parts of the network.
4.  Finally, generate a tf.data.Dataset that matches the inputs of the overall model and zip it with the labels. The best way is to use **tf.data.Dataset.from_tensor_slices**: you can think of it as a structure that holds different tensors (different data types and tensor sizes are possible) so that all the features needed for training are available together. Do not think of it as a linear algebra concatenation; it is something more flexible.

In [None]:
# DEFINITION OF THE COMPLEX MODEL
# 1. Token inputs
token_inputs = layers.Input(shape=[], dtype="string", name="token_inputs")
token_embeddings = tf_hub_embedding_layer(token_inputs)
token_outputs = layers.Dense(128, activation="relu")(token_embeddings)
token_model = tf.keras.Model(inputs=token_inputs,
                             outputs=token_outputs)

# 2. Char inputs
char_inputs = layers.Input(shape=(1,), dtype="string", name="char_inputs")
char_vectors = char_vectorizer(char_inputs)
char_embeddings = char_embed(char_vectors)
char_bi_lstm = layers.Bidirectional(layers.LSTM(32))(char_embeddings)
char_model = tf.keras.Model(inputs=char_inputs,
                            outputs=char_bi_lstm)

# 3. Line numbers inputs
line_number_inputs = layers.Input(shape=(15,), dtype=tf.int32, name="line_number_input")
x = layers.Dense(32, activation="relu")(line_number_inputs)
line_number_model = tf.keras.Model(inputs=line_number_inputs,
                                   outputs=x)

# 4. Total lines inputs
total_lines_inputs = layers.Input(shape=(20,), dtype=tf.int32, name="total_lines_input")
y = layers.Dense(32, activation="relu")(total_lines_inputs)
total_line_model = tf.keras.Model(inputs=total_lines_inputs,
                                  outputs=y)

# 5. Combine token and char embeddings into a hybrid embedding
combined_embeddings = layers.Concatenate(name="token_char_hybrid_embedding")([token_model.output, 
                                                                              char_model.output])
z = layers.Dense(256, activation="relu")(combined_embeddings)
z = layers.Dropout(0.5)(z)

# 6. Combine positional embeddings with combined token and char embeddings into a tribrid embedding
z = layers.Concatenate(name="token_char_positional_embedding")([line_number_model.output,
                                                                total_line_model.output,
                                                                z])

# 7. Create output layer
output_layer = layers.Dense(5, activation="softmax", name="output_layer")(z)

# 8. Put together model
model_5 = tf.keras.Model(inputs=[line_number_model.input,
                                 total_line_model.input,
                                 token_model.input, 
                                 char_model.input],
                         outputs=output_layer)

In [None]:
## DEFINITION OF TRAIN AND TEST DATASET
# Create training and validation datasets (all four kinds of inputs)
train_pos_char_token_data = tf.data.Dataset.from_tensor_slices((train_line_numbers_one_hot, # line numbers
                                                                train_total_lines_one_hot, # total lines
                                                                train_sentences, # train tokens
                                                                train_chars)) # train chars
train_pos_char_token_labels = tf.data.Dataset.from_tensor_slices(train_labels_one_hot) # train labels
train_pos_char_token_dataset = tf.data.Dataset.zip((train_pos_char_token_data, train_pos_char_token_labels)) # combine data and labels
train_pos_char_token_dataset = train_pos_char_token_dataset.batch(32).prefetch(tf.data.AUTOTUNE) # turn into batches and prefetch appropriately

# Validation dataset
val_pos_char_token_data = tf.data.Dataset.from_tensor_slices((val_line_numbers_one_hot,
                                                              val_total_lines_one_hot,
                                                              val_sentences,
                                                              val_chars))
val_pos_char_token_labels = tf.data.Dataset.from_tensor_slices(val_labels_one_hot)
val_pos_char_token_dataset = tf.data.Dataset.zip((val_pos_char_token_data, val_pos_char_token_labels))
val_pos_char_token_dataset = val_pos_char_token_dataset.batch(32).prefetch(tf.data.AUTOTUNE) # turn into batches and prefetch appropriately

# Check input shapes
train_pos_char_token_dataset, val_pos_char_token_dataset

##4.5
Use word embeddings in your TensorFlow model.


An embedding is a representation of natural language which can be learned. It is a semantic representation: strings are mapped to vectors, and the relations between them are expressed as distances in the vector space. There are two ways to get one:
1.  **Create your own embedding** - Once your text has been turned into numbers (required for an embedding), you can put them through an embedding layer (such as tf.keras.layers.Embedding) and an embedding representation will be learned during model training.
2.  **Reuse a pre-learned embedding** - Many pre-trained embeddings exist online. These pre-trained embeddings have often been learned on large corpuses of text (such as all of Wikipedia) and thus have a good underlying representation of natural language. You can use a pre-trained embedding to initialize your model and fine-tune it to your own specific task.
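
To make the idea of "distance in the vector space" concrete, here is a minimal sketch using cosine similarity. The three vectors are made-up values chosen by hand for illustration, not real learned embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional embeddings (made-up values)
toy_embeddings = {
    "flood": [0.9, 0.1, 0.0],
    "storm": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
}

sim_related = cosine_similarity(toy_embeddings["flood"], toy_embeddings["storm"])
sim_unrelated = cosine_similarity(toy_embeddings["flood"], toy_embeddings["pizza"])
print(f"flood vs storm: {sim_related:.3f}")
print(f"flood vs pizza: {sim_unrelated:.3f}")
```

In a well-trained embedding we would hope to see the same pattern: related words end up closer together than unrelated ones.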

tf.keras.layers.Embedding **DOCUMENTATION**
1.  input_dim - The size of the vocabulary (e.g. len(text_vectorizer.get_vocabulary())).
2.  output_dim - The size of the output embedding vector, for example, a value of 100 outputs a feature vector of size 100 for each word.
3.  embeddings_initializer - How to initialize the embeddings matrix, default is "uniform" which randomly initializes the embedding matrix with a uniform distribution. This can be changed for using pre-learned embeddings.
4.  input_length - Length of sequences being passed to embedding layer.
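
Conceptually, an embedding layer is a lookup table with one row per token id. The NumPy sketch below imitates that lookup to show where the output shape (batch, sequence length, output_dim) comes from. The sizes are small hypothetical values, and the uniform range of ±0.05 is an assumption about the default initializer; no training happens here.

```python
import numpy as np

vocab_size, embed_dim = 10, 4  # stand-ins for input_dim and output_dim
rng = np.random.default_rng(42)
embedding_matrix = rng.uniform(-0.05, 0.05, size=(vocab_size, embed_dim))

# Two already-tokenized sentences, padded with 0 to length 5 (stand-in for input_length)
token_ids = np.array([[3, 1, 7, 0, 0],
                      [2, 2, 9, 4, 0]])

# The lookup: every integer id is replaced by its row of the matrix
embedded = embedding_matrix[token_ids]
print(embedded.shape)  # (2, 5, 4)
```

With max_vocab_length=10000, output_dim=128 and max_length=15 the same logic gives an output of shape (batch, 15, 128).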

In [None]:
import random
tf.random.set_seed(42)
from tensorflow.keras import layers

embedding = layers.Embedding(input_dim=max_vocab_length, # size of the vocabulary
                             output_dim=128, # set size of embedding vector
                             embeddings_initializer="uniform", # default, initialize randomly
                             input_length=max_length, # how long is each input
                             name="embedding_1") 

embedding
# Get a random sentence from training set
random_sentence = random.choice(train_sentences)
print(f"Original text:\n{random_sentence}\
      \n\nEmbedded version:")

# Embed the random sentence (turn it into numerical representation)
sample_embed = embedding(text_vectorizer([random_sentence]))
sample_embed

##4.6
Use LSTMs in your model to classify text for either binary or multi-class categorization.


How an LSTM works in theory is well known. The point here is that in Keras you only have to be careful with the input of the LSTM layer: it expects a 3D input (batch size, timesteps, features). In the case above the embedding layer output has shape (None, 15, 128), and the output of the LSTM layer below is (None, 64).

In [None]:
# LSTM example in binary categorization, the multi-class case works the same way
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_2_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_2")


# Create LSTM model
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_2_embedding(x)
print(x.shape)
# x = layers.LSTM(64, return_sequences=True)(x) # return vector for each word in the Tweet (you can stack RNN cells as long as return_sequences=True)
x = layers.LSTM(64)(x) # return vector for whole sequence
print(x.shape)
# x = layers.Dense(64, activation="relu")(x) # optional dense layer on top of output of LSTM cell
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = tf.keras.Model(inputs, outputs, name="model_2_LSTM")

# Compile model
model_2.compile(loss="binary_crossentropy",
                optimizer=tf.keras.optimizers.Adam(),
                metrics=["accuracy"])

# Fit model
model_2_history = model_2.fit(train_sentences,
                              train_labels,
                              epochs=5,
                              validation_data=(val_sentences, val_labels),
                              callbacks=[create_tensorboard_callback(SAVE_DIR, 
                                                                     "LSTM")])

## 4.7
Add RNN and GRU layers to your model.


In [None]:
# Using GRU layer, compile and fit steps are the same
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_3_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_3")

# Build an RNN using the GRU cell
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_3_embedding(x)
# x = layers.GRU(64, return_sequences=True)(x) # stacking recurrent cells requires return_sequences=True
x = layers.GRU(64)(x) 
# x = layers.Dense(64, activation="relu")(x) # optional dense layer after GRU cell
outputs = layers.Dense(1, activation="sigmoid")(x)
model_3 = tf.keras.Model(inputs, outputs, name="model_3_GRU")

In [None]:
# Using Bi-Directional LSTM layer, compile and fit steps are the same
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_4_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_4")

# Build a Bidirectional RNN in TensorFlow
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_4_embedding(x)
# x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x) # stacking RNN layers requires return_sequences=True
x = layers.Bidirectional(layers.LSTM(64))(x) # bidirectional goes both ways so has double the parameters of a regular LSTM layer
outputs = layers.Dense(1, activation="sigmoid")(x)
model_4 = tf.keras.Model(inputs, outputs, name="model_4_Bidirectional")

##4.8
Use RNNs, LSTMs, GRUs and CNNs in models that work with text.

We talked about RNNs and LSTMs (an LSTM is a kind of RNN) in the previous subsections, so let's focus on the use of CNNs with text.

The main difference between using CNNs for images and sequences is the shape of the data. Images come in 2 dimensions (height x width) whereas sequences are often 1-dimensional (a string of text).

So to use CNNs with sequences, we use a 1-dimensional convolution instead of a 2-dimensional convolution.
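
To see what a 1-dimensional convolution does to an embedded sentence, here is a NumPy sketch under simplifying assumptions: no bias, no activation, no padding, random weights. The helper `conv1d_valid` is hypothetical and only mirrors the shapes of a Conv1D layer with filters=32 and kernel_size=5 applied to a (15, 128) embedding.

```python
import numpy as np

def conv1d_valid(sequence, kernels):
    """Slide kernels along the sequence axis.

    sequence: (seq_len, embed_dim)
    kernels:  (kernel_size, embed_dim, filters)
    returns:  (seq_len - kernel_size + 1, filters)
    """
    kernel_size, _, filters = kernels.shape
    out_len = sequence.shape[0] - kernel_size + 1
    output = np.zeros((out_len, filters))
    for t in range(out_len):
        window = sequence[t:t + kernel_size]  # kernel_size consecutive tokens
        # Sum over the token and embedding axes, one value per filter
        output[t] = np.tensordot(window, kernels, axes=([0, 1], [0, 1]))
    return output

rng = np.random.default_rng(0)
sequence = rng.normal(size=(15, 128))   # one sentence: 15 tokens, 128-dim embeddings
kernels = rng.normal(size=(5, 128, 32)) # 32 filters, each looking at 5 tokens at a time

feature_map = conv1d_valid(sequence, kernels)
pooled = feature_map.max(axis=0)        # what a global max pooling step keeps
print(feature_map.shape, pooled.shape)  # (11, 32) (32,)
```

Each filter slides over 5 tokens at a time, so the 15 tokens give 11 positions, and the global max pooling keeps the strongest response of each filter.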

In [None]:
# Using Conv1D layer + MaxPool, compile and fit steps are the same
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_5_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_5")

# Create 1-dimensional convolutional layer to model sequences
from tensorflow.keras import layers
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_5_embedding(x)
x = layers.Conv1D(filters=32, kernel_size=5, activation="relu")(x)
x = layers.GlobalMaxPool1D()(x)
# x = layers.Dense(64, activation="relu")(x) # optional dense layer
outputs = layers.Dense(1, activation="sigmoid")(x)
model_5 = tf.keras.Model(inputs, outputs, name="model_5_Conv1D")

##4.9
Train LSTMs on existing text to generate text (such as songs and poetry)

# Part 5: Time series, sequences and predictions
You need to understand how to solve time series and forecasting problems in TensorFlow.

##5.1
Train, tune and use time series, sequence and prediction models.


##5.2
Train models to predict values for both univariate and multivariate time series.


We saw the univariate model in other points; now we focus on the multivariate case and on the data preprocessing with pandas.

In [None]:
# Block reward values
block_reward_1 = 50 # 3 January 2009 (2009-01-03) - this block reward isn't in our dataset (it starts from 01 October 2013)
block_reward_2 = 25 # 28 November 2012 
block_reward_3 = 12.5 # 9 July 2016
block_reward_4 = 6.25 # 11 May 2020

# Block reward dates (datetime form of the above date stamps)
block_reward_2_datetime = np.datetime64("2012-11-28")
block_reward_3_datetime = np.datetime64("2016-07-09")
block_reward_4_datetime = np.datetime64("2020-05-11")
# Number of days from the start of the dataset to the dates on which the reward changes to block_reward_3 and block_reward_4
block_reward_2_days = (block_reward_3_datetime - bitcoin_prices.index[0]).days
block_reward_3_days = (block_reward_4_datetime - bitcoin_prices.index[0]).days
block_reward_2_days, block_reward_3_days
# Add block_reward column
bitcoin_prices_block = bitcoin_prices.copy()
bitcoin_prices_block["block_reward"] = None

# Set values of block_reward column (it's the last column hence -1 indexing on iloc)
bitcoin_prices_block.iloc[:block_reward_2_days, -1] = block_reward_2
bitcoin_prices_block.iloc[block_reward_2_days:block_reward_3_days, -1] = block_reward_3
bitcoin_prices_block.iloc[block_reward_3_days:, -1] = block_reward_4
bitcoin_prices_block.head()
# Setup dataset hyperparameters
HORIZON = 1
WINDOW_SIZE = 7
# Make a copy of the Bitcoin historical data with block reward feature
bitcoin_prices_windowed = bitcoin_prices_block.copy()

# Add windowed columns
for i in range(WINDOW_SIZE): # Shift values for each step in WINDOW_SIZE
  bitcoin_prices_windowed[f"Price+{i+1}"] = bitcoin_prices_windowed["Price"].shift(periods=i+1)
bitcoin_prices_windowed.head(10)

# Let's create X & y, remove the NaN's and convert to float32 to prevent TensorFlow errors 
X = bitcoin_prices_windowed.dropna().drop("Price", axis=1).astype(np.float32) 
y = bitcoin_prices_windowed.dropna()["Price"].astype(np.float32)
X.head()

# Make train and test sets
split_size = int(len(X) * 0.8)
X_train, y_train = X[:split_size], y[:split_size]
X_test, y_test = X[split_size:], y[split_size:]
len(X_train), len(y_train), len(X_test), len(y_test)

In [None]:
tf.random.set_seed(42)

# Make multivariate time series model
model_6 = tf.keras.Sequential([
  layers.Dense(128, activation="relu"),
  # layers.Dense(128, activation="relu"), # adding an extra layer here should lead to beating the naive model
  layers.Dense(HORIZON)
], name="model_6_dense_multivariate")

# Compile
model_6.compile(loss="mae",
                optimizer=tf.keras.optimizers.Adam())

# Fit
model_6.fit(X_train, y_train,
            epochs=100,
            batch_size=128,
            verbose=0, # silent: print nothing during training (verbose=2 prints one line per epoch)
            validation_data=(X_test, y_test),
            callbacks=[create_model_checkpoint(model_name=model_6.name)])

##5.3
Prepare data for time series learning.

Before splitting and windowing we have to import the data and build proper tensors (the data usually arrives as a pd.DataFrame, np.array or even a list). This is quite simple because the data often comes as CSV, and Python has plenty of tools to handle CSV data, such as pd.read_csv.

Before splitting, we can use windowing to divide the dataset into features and labels so that it can be used for training.

We need something like this

[0, 1, 2, 3, 4, 5, 6] -> [7]

[1, 2, 3, 4, 5, 6, 7] -> [8]

[2, 3, 4, 5, 6, 7, 8] -> [9]

In [None]:
# Create function to label windowed data
def get_labelled_windows(x, horizon=1):
  """
  Creates labels for windowed dataset.

  E.g. if horizon=1 (default)
  Input: [1, 2, 3, 4, 5, 6] -> Output: ([1, 2, 3, 4, 5], [6])
  """
  return x[:, :-horizon], x[:, -horizon:]

# Create function to view NumPy arrays as windows 
def make_windows(x, window_size=7, horizon=1):
  """
  Turns a 1D array into a 2D array of sequential windows of window_size.
  """
  # 1. Create a window of specific window_size (add the horizon on the end for later labelling)
  window_step = np.expand_dims(np.arange(window_size+horizon), axis=0)
  # print(f"Window step:\n {window_step}")

  # 2. Create a 2D array of multiple window steps (minus 1 to account for 0 indexing)
  window_indexes = window_step + np.expand_dims(np.arange(len(x)-(window_size+horizon-1)), axis=0).T # create 2D array of windows of size window_size
  # print(f"Window indexes:\n {window_indexes[:3], window_indexes[-3:], window_indexes.shape}")

  # 3. Index on the target array (time series) with 2D array of multiple window steps
  windowed_array = x[window_indexes]

  # 4. Get the labelled windows
  windows, labels = get_labelled_windows(windowed_array, horizon=horizon)

  return windows, labels

full_windows, full_labels = make_windows(prices, window_size=WINDOW_SIZE, horizon=HORIZON)
len(full_windows), len(full_labels)

Since order matters in a time series, splitting the data randomly would be counter-productive: we would lose all the causal effects expressed by the time order. So we simply take the first 80% of the dataset as the train set and the rest as the test set.

In [None]:
# Make the train/test splits
def make_train_test_splits(windows, labels, test_split=0.2):
  """
  Splits matching pairs of windows and labels into train and test splits.
  """
  split_size = int(len(windows) * (1-test_split)) # this will default to 80% train/20% test
  train_windows = windows[:split_size]
  train_labels = labels[:split_size]
  test_windows = windows[split_size:]
  test_labels = labels[split_size:]
  return train_windows, test_windows, train_labels, test_labels

train_windows, test_windows, train_labels, test_labels = make_train_test_splits(full_windows, full_labels)
len(train_windows), len(test_windows), len(train_labels), len(test_labels)

##5.4
Understand Mean Absolute Error (MAE) and how it can be used to evaluate accuracy of
sequence models.


Here is a list of the most common error metrics in time series analysis:

Scale-Dependent Errors
1.  mean absolute error
2.  root mean square error

Percentage Errors
1.  mean absolute percentage error - avoid it when y = 0, since it divides by the true value
2.  symmetric mean absolute percentage error

Scaled Errors
1.  mean absolute scaled error
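
The metrics above can be sketched in a few lines of NumPy. These are the textbook definitions written by hand, not the TensorFlow implementations. The MASE here assumes no seasonality and scales by the MAE of a naive forecast (previous value) computed on the same series; the toy numbers are made up.

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    # Divides by y_true, so it is undefined when a true value is 0
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

def smape(y_true, y_pred):
    return 100 * np.mean(2 * np.abs(y_pred - y_true) / (np.abs(y_true) + np.abs(y_pred)))

def mase(y_true, y_pred):
    # Scale by the MAE of the naive forecast (assumption: no seasonality)
    naive_mae = np.mean(np.abs(y_true[1:] - y_true[:-1]))
    return mae(y_true, y_pred) / naive_mae

y_true = np.array([10.0, 12.0, 11.0, 13.0])
y_pred = np.array([11.0, 11.0, 12.0, 12.0])

print(f"MAE:   {mae(y_true, y_pred):.2f}")   # MAE:   1.00
print(f"RMSE:  {rmse(y_true, y_pred):.2f}")  # RMSE:  1.00
print(f"MAPE:  {mape(y_true, y_pred):.2f}%")
print(f"sMAPE: {smape(y_true, y_pred):.2f}%")
print(f"MASE:  {mase(y_true, y_pred):.2f}")  # MASE:  0.60
```

MAE stays in the units of the series (here every prediction is off by exactly 1), while a MASE below 1 means the predictions beat the naive forecast on this series.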

##5.5
Use RNNs and CNNs for time series, sequence and forecasting models.


With a CNN you have to pay attention to the input shape, and you may need a layer that transforms the inputs into the shape Conv1D expects.

In [None]:
# Before we pass our data to the Conv1D layer, we have to reshape it in order to make sure it works
x = tf.constant(train_windows[0])
expand_dims_layer = layers.Lambda(lambda x: tf.expand_dims(x, axis=1)) # add an extra dimension for timesteps
print(f"Original shape: {x.shape}") # (WINDOW_SIZE)
print(f"Expanded shape: {expand_dims_layer(x).shape}") # (WINDOW_SIZE, input_dim) 
print(f"Original values with expanded shape:\n {expand_dims_layer(x)}")

tf.random.set_seed(42)

# Create model
model_4 = tf.keras.Sequential([
  # Create Lambda layer to reshape inputs, without this layer, the model will error
  layers.Lambda(lambda x: tf.expand_dims(x, axis=1)), # resize the inputs to adjust for window size / Conv1D 3D input requirements
  layers.Conv1D(filters=128, kernel_size=5, padding="causal", activation="relu"),
  layers.Dense(HORIZON)
], name="model_4_conv1D")

# Compile model
model_4.compile(loss="mae",
                optimizer=tf.keras.optimizers.Adam())

# Fit model
model_4.fit(train_windows,
            train_labels,
            batch_size=128, 
            epochs=100,
            verbose=0,
            validation_data=(test_windows, test_labels),
            callbacks=[create_model_checkpoint(model_name=model_4.name)])

With an LSTM, as above, pay attention to the input shape.

In [None]:
tf.random.set_seed(42)

# Let's build an LSTM model with the Functional API
inputs = layers.Input(shape=(WINDOW_SIZE,)) # note the trailing comma: shape is a tuple
x = layers.Lambda(lambda x: tf.expand_dims(x, axis=1))(inputs) # expand input dimension to be compatible with LSTM
# print(x.shape)
# x = layers.LSTM(128, activation="relu", return_sequences=True)(x) # this layer will error if the inputs are not the right shape
x = layers.LSTM(128, activation="relu")(x) # using the default tanh activation function resulted in a massive error here
# print(x.shape)
# Add another optional dense layer (you could add more of these to see if they improve model performance)
# x = layers.Dense(32, activation="relu")(x)
output = layers.Dense(HORIZON)(x)
model_5 = tf.keras.Model(inputs=inputs, outputs=output, name="model_5_lstm")

# Compile model
model_5.compile(loss="mae",
                optimizer=tf.keras.optimizers.Adam())

# Seems when saving the model several warnings are appearing: https://github.com/tensorflow/tensorflow/issues/47554 
model_5.fit(train_windows,
            train_labels,
            epochs=100,
            verbose=0,
            batch_size=128,
            validation_data=(test_windows, test_labels),
            callbacks=[create_model_checkpoint(model_name=model_5.name)])

##5.6
Identify when to use trailing versus centred windows.


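A trailing MA is the usual one: the MA value of each window is computed from the last n steps, looking back only.

A centered MA uses (n-1)/2 steps back, the current value and (n-1)/2 steps ahead (for an odd window size n). Because it needs future values, it suits smoothing data that has already been observed, while a trailing window is the one that can be used when forecasting.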
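
A small NumPy sketch of the two kinds of window. The helper names are hypothetical, the window size is assumed to be odd so that the centered window is symmetric, and the trailing window here ends at the current step.

```python
import numpy as np

def trailing_moving_average(series, window):
    """Mean of the `window` most recent values, ending at the current step."""
    return np.array([series[i - window + 1:i + 1].mean()
                     for i in range(window - 1, len(series))])

def centered_moving_average(series, window):
    """Mean over (window - 1) / 2 values on each side of the current step (odd window)."""
    half = (window - 1) // 2
    return np.array([series[i - half:i + half + 1].mean()
                     for i in range(half, len(series) - half)])

series = np.arange(10, dtype=float)  # 0, 1, ..., 9: a steadily rising series
trailing = trailing_moving_average(series, window=3)
centered = centered_moving_average(series, window=3)

# Value of each average at time step t=2 (the series itself is 2.0 there)
print(trailing[0])  # 1.0 -> first trailing value belongs to t=2 and lags behind
print(centered[1])  # 2.0 -> second centered value belongs to t=2 and needs the value at t=3
```

On a rising series the trailing average lags behind the current value, while the centered one matches it but only because it looks one step into the future.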

##5.7
Use TensorFlow for forecasting.


In [None]:
def make_preds(model, input_data):
  """
  Uses model to make predictions on input_data.

  Parameters
  ----------
  model: trained model 
  input_data: windowed input data (same kind of data model was trained on)

  Returns model predictions on input_data.
  """
  forecast = model.predict(input_data)
  return tf.squeeze(forecast) # return 1D array of predictions

##5.8
Prepare features and labels.


You can use a pandas DataFrame or even a list, but when the problem requires a complex architecture the best tool is tf.data.Dataset.from_tensor_slices, which composes heterogeneous training data and labels; then combine them using tf.data.Dataset.zip.

##5.9
Identify and compensate for sequence bias.


##5.10
Adjust the learning rate dynamically in time series, sequence and prediction models.

See the callback tf.keras.callbacks.ReduceLROnPlateau in the previous point.
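
As a reminder of the rule that callback applies, here is a simplified pure-Python simulation. It is a sketch and not the Keras implementation: it ignores options such as cooldown and min_delta, and the function name and the loss values are made up. The arguments `factor`, `patience` and `min_lr` mirror the parameters of the real callback.

```python
def simulate_reduce_lr_on_plateau(val_losses, initial_lr=1e-3, factor=0.1,
                                  patience=2, min_lr=1e-6):
    """Return the learning rate in effect after each epoch."""
    lr, best, wait = initial_lr, float("inf"), 0
    history = []
    for loss in val_losses:
        if loss < best:
            # Improvement: remember it and reset the patience counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                # No improvement for `patience` epochs: shrink the learning rate
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation losses: the plateau at epochs 3-4 triggers one reduction
lr_history = simulate_reduce_lr_on_plateau([1.0, 0.8, 0.81, 0.82, 0.7, 0.71])
print(lr_history)
```

In a real training run the callback is simply passed to `model.fit` through `callbacks=[...]`, with `monitor` set to the metric to watch (for example "val_loss").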