# Exercises

### 4. Fiddle with the activation functions. Try applying a sigmoid transformation to both hidden layers. The sigmoid activation is given by the function: tf.nn.sigmoid()

**Solution**

Find the part where we stack layers (Sequential()).

Adjust the activations from 'relu' to 'sigmoid'.
    
Generally, we should expect a <b>worse</b> result. One intuition is that relu 'cleans' the noise in the data: if a value is negative, relu filters it out (sets it to 0), while if it is positive, relu passes it through unchanged. For the MNIST dataset, we care mostly about the intensely black and white parts of the digit images, so such filtering proves beneficial.

The sigmoid does not filter the signals as well as relu, but still reaches a respectable result (around 95%).
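
To make the difference concrete, here is a minimal plain-Python sketch (not part of the original notebook) that applies both activations to a few hand-picked pre-activation values. The helper names `relu` and `sigmoid` and the sample values are assumptions chosen for illustration; in the model itself, Keras applies the activations for us.

```python
import math

def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# hypothetical pre-activation values, chosen only for illustration
pre_activations = [-3.0, -0.5, 0.0, 0.5, 3.0]

relu_out = [relu(x) for x in pre_activations]
sigmoid_out = [sigmoid(x) for x in pre_activations]

for x, r, s in zip(pre_activations, relu_out, sigmoid_out):
    print(f"x={x:+.1f}  relu={r:.3f}  sigmoid={s:.3f}")
```

relu maps every negative input to exactly 0, while sigmoid maps every input to a value strictly between 0 and 1, so negative inputs still pass a small positive signal to the next layer.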

**Try using softmax activations for all layers. How does the result change? Can you explain why that happens?**

# Deep Neural Network for MNIST Classification

We'll apply all the knowledge from the lectures in this section to write a deep neural network. The problem we've chosen is referred to as the "Hello World" of deep learning because for most students it is the first deep learning algorithm they see.

The dataset provides 70,000 images (28x28 pixels) of handwritten digits (1 digit per image).

The goal is to write an algorithm that detects which digit is written. Since there are only 10 digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), this is a classification problem with 10 classes.

Our goal is to build a neural network with 2 hidden layers.

## Import relevant libraries

In [1]:
import numpy as np
import tensorflow as tf

import tensorflow_datasets as tfds

In [2]:
import h5py
print(h5py.__version__)

3.4.0


## Import data from tensorflow dataset

<b>tfds.load(name, with_info, as_supervised)</b> loads a dataset from TensorFlow Datasets

-> <b>as_supervised = True</b> loads the data in a 2-tuple structure (input, target); alternatively, as_supervised=False would return a dictionary


-> <b>with_info = True</b> makes tfds.load also return a second object containing information about the dataset (version, features, number of samples)

we will use this information a bit below and we will store it in mnist_info


In [3]:
mnist_dataset, mnist_info = tfds.load(name='mnist', with_info=True, as_supervised=True)

## Split into training and testing dataset

<b>tf.cast(x, dtype) casts (converts) a variable into a given data type</b>

<b> Scaling: </b>Normally, we'd like to scale our data in some way to make the result more numerically stable (e.g. inputs between 0 and 1).

Each pixel contains a number from 0 to 255, representing the 256 shades of gray. Therefore, we divide each pixel by 255 to get the desired result (between 0 and 1).

In [4]:
# once we have loaded the dataset, we can easily extract the training and testing datasets with the built-in references
mnist_train, mnist_test = mnist_dataset['train'], mnist_dataset['test']

# by default, TF has training and testing datasets, but no validation sets
# thus we must split it on our own

# we start by defining the number of validation samples as a % of the train samples
# this is also where we make use of mnist_info (we don't have to count the observations)
num_validation_samples = 0.1 * mnist_info.splits['train'].num_examples
# let's cast this number to an integer, as a float may cause an error along the way
num_validation_samples = tf.cast(num_validation_samples, tf.int64)

# let's also store the number of test samples in a dedicated variable (instead of using the mnist_info one)
num_test_samples = mnist_info.splits['test'].num_examples
# once more, we cast it to tf.int64 so that it has the same type as num_validation_samples
num_test_samples = tf.cast(num_test_samples, tf.int64)

## Scale the data (both train and test)

In [5]:
def scale(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255.
    return image, label

# the method .map() allows us to apply a custom transformation to a given dataset
# we have already decided that we will get the validation data from mnist_train, so we scale them together
scaled_train_and_validation_data = mnist_train.map(scale)


test_data = mnist_test.map(scale)
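
As a quick sanity check of the arithmetic, the sketch below applies the same division by 255 to a few hand-picked pixel values in plain Python. The name `scale_pixel` and the sample values are hypothetical, used only for illustration; the notebook itself relies on the `scale` function above.

```python
def scale_pixel(value):
    """Map an 8-bit grayscale value (0..255) into the range [0, 1]."""
    return value / 255.0

# hypothetical raw pixel intensities
raw_pixels = [0, 64, 128, 255]
scaled_pixels = [scale_pixel(p) for p in raw_pixels]

print(scaled_pixels)
```

The darkest value maps to 0.0 and the brightest to 1.0, with everything else falling in between.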

## Shuffle the data randomly

<b>take(count):</b> creates a dataset containing at most the first count samples (here, the validation data)

<b>skip(count):</b> creates a dataset that skips the first count samples <b>(skip the validation data)</b> and keeps the rest (here, the training data)
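
The split can be pictured with plain Python lists. This is only a sketch of the idea, assuming a toy dataset of 10 samples and using `itertools.islice` as a stand-in for the dataset methods; it does not use `tf.data`.

```python
from itertools import islice

samples = list(range(10))   # toy stand-in for the shuffled dataset
num_validation = 3          # hypothetical validation size

# take(n): keep only the first n samples
validation = list(islice(samples, num_validation))
# skip(n): drop the first n samples and keep the rest
train = list(islice(samples, num_validation, None))

print(validation)  # [0, 1, 2]
print(train)       # [3, 4, 5, 6, 7, 8, 9]
```

Together the two pieces cover the whole toy dataset without overlap.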

In [6]:
BUFFER_SIZE = 10000
# this BUFFER_SIZE parameter is here for cases when we're dealing with enormous datasets
# then we can't shuffle the whole dataset in one go because we can't fit it all in memory
# so instead TF only stores BUFFER_SIZE samples in memory at a time and shuffles them
# if BUFFER_SIZE=1 => no shuffling will actually happen
# if BUFFER_SIZE >= num samples => shuffling is uniform
# BUFFER_SIZE in between - a computational optimization to approximate uniform shuffling

shuffled_train_and_validation_data = scaled_train_and_validation_data.shuffle(BUFFER_SIZE)

# once we have scaled and shuffled the data, we can proceed to actually extracting the train and validation
# our validation data would be equal to 10% of the training set, which we've already calculated
validation_data = shuffled_train_and_validation_data.take(num_validation_samples)
train_data = shuffled_train_and_validation_data.skip(num_validation_samples)
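
The effect of BUFFER_SIZE can be illustrated with a simplified shuffle buffer in plain Python. This is a sketch of the general idea under stated assumptions, not TensorFlow's actual implementation; the function name `buffered_shuffle` and its `seed` parameter are hypothetical.

```python
import random

def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in an order randomized only within a sliding buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # still filling the buffer
            buffer.append(item)
            continue
        # emit a random element from the buffer and replace it with the new item
        index = rng.randrange(buffer_size)
        yield buffer[index]
        buffer[index] = item
    # the input is exhausted: emit what is left in random order
    rng.shuffle(buffer)
    yield from buffer

# buffer of 1: the order is unchanged, so no shuffling happens
print(list(buffered_shuffle(range(10), buffer_size=1)))
# small buffer: elements can only move a limited distance forward
print(list(buffered_shuffle(range(10), buffer_size=4, seed=0)))
# buffer at least as large as the data: a full shuffle
print(list(buffered_shuffle(range(10), buffer_size=10, seed=0)))
```

With a small buffer, the first emitted element can only come from the first `buffer_size` inputs, which is why the shuffle is only approximately uniform.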


## Set batch size and prepare our data for batching

batch_size = 1 -> stochastic gradient descent (SGD)

batch_size = number of samples -> (full-)batch gradient descent

1 < batch_size < number of samples -> mini-batch gradient descent

<b>dataset.batch(batch_size): </b>a method that combines the consecutive elements of a dataset into batches

<b>iter(): </b>creates an iterator, an object which can be traversed one element at a time (e.g. in a for loop or while loop)

<b>next(): </b>returns the next element of an iterator -> wrap <b>iter()</b> inside a <b>next()</b>
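
The following plain-Python sketch mimics these three pieces on a toy list. The generator `batch` is a hypothetical stand-in for `dataset.batch`, written only for illustration; it keeps the final, smaller batch.

```python
def batch(items, batch_size):
    """Group consecutive items into lists of at most batch_size elements."""
    current = []
    for item in items:
        current.append(item)
        if len(current) == batch_size:
            yield current
            current = []
    if current:  # keep the final, smaller batch
        yield current

samples = list(range(7))

# mini-batches of 3 samples each
mini_batches = list(batch(samples, 3))
# one batch holding everything, extracted with next(iter(...))
single_batch = next(iter(batch(samples, len(samples))))

print(mini_batches)  # [[0, 1, 2], [3, 4, 5], [6]]
print(single_batch)  # [0, 1, 2, 3, 4, 5, 6]
```

Setting the batch size to the number of samples yields exactly one batch, which is why a single `next(iter(...))` call is enough to extract a whole dataset batched that way.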

In [7]:
BATCH_SIZE = 100

train_data = train_data.batch(BATCH_SIZE)

validation_data = validation_data.batch(num_validation_samples)

test_data = test_data.batch(num_test_samples)

# takes next batch (it is the only batch)
# because as_supervised=True, we've got a 2-tuple structure
validation_inputs, validation_targets = next(iter(validation_data))

## Model

### Outline the model

<b>tf.keras.Sequential()</b>: lays down the model (used to 'stack' layers)

<b>tf.keras.layers.Flatten(input_shape)</b>: transforms (flattens) a tensor into a vector  

<b>tf.keras.layers.Dense(units, activation)</b> implements the operation: <b>output = activation(dot(input, kernel) + bias)</b>, where <b>activation</b> is the <b>activation function</b> passed as the activation argument, <b>kernel</b> is a weights matrix created by the layer, <b>bias</b> is a bias vector created by the layer, and <b>units</b> is a positive integer giving the dimensionality of the output space.
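
The Dense formula can be traced by hand on a tiny example. The sketch below is plain Python for a single sample, with hypothetical helper names (`relu`, `dense`) and made-up weights; it only illustrates the formula and is not how Keras implements the layer.

```python
def relu(values):
    """Apply relu element-wise to a list of numbers."""
    return [max(0.0, v) for v in values]

def dense(inputs, kernel, bias, activation):
    """Compute activation(dot(inputs, kernel) + bias) for one sample."""
    pre_activations = [
        sum(x * row[j] for x, row in zip(inputs, kernel)) + bias[j]
        for j in range(len(bias))
    ]
    return activation(pre_activations)

inputs = [1.0, 2.0]            # 2 input features
kernel = [[0.5, -1.0, 0.0],    # shape (2, 3): inputs x units
          [0.25, 0.5, 0.0]]
bias = [0.0, 0.0, 1.0]         # one bias per unit

outputs = dense(inputs, kernel, bias, relu)
print(outputs)  # [1.0, 0.0, 1.0]
```

The output has one value per unit, so `units` alone determines the size of the layer's output.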

In [8]:
input_size = 784
output_size = 10
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 50
    
# define what the model will look like
model = tf.keras.Sequential([
    
    # the first layer (the input layer)
    # each observation is 28x28x1 pixels, therefore it is a tensor of rank 3
    # since we don't know CNNs yet, we don't know how to feed such input into our net, so we must flatten the images
    # there is a convenient layer 'Flatten' that simply takes our 28x28x1 tensor and orders it into a
    # (28x28x1,) = (784,) vector
    # this allows us to actually create a feed forward neural network
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)), # input layer
    
    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # it takes several arguments, but the most important ones for us are the hidden_layer_size and the activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='sigmoid'), # 1st hidden layer
    tf.keras.layers.Dense(hidden_layer_size, activation='sigmoid'), # 2nd hidden layer
    
    # the final layer is no different, we just make sure to activate it with softmax
    tf.keras.layers.Dense(output_size, activation='softmax') # output layer
])
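
As a rough check on the size of this architecture, the number of trainable parameters can be worked out by hand. The sketch below assumes the layer sizes defined above (784, 50, 50, 10) and counts one weight per input-unit pair plus one bias per unit; the variable names `layer_sizes` and `params_per_layer` are introduced here for illustration.

```python
input_size = 784
hidden_layer_size = 50
output_size = 10

# sizes of the vectors flowing through the network, layer by layer
layer_sizes = [input_size, hidden_layer_size, hidden_layer_size, output_size]

# each Dense layer has (inputs x units) weights plus one bias per unit
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [39250, 2550, 510]
print(total_params)      # 42310
```

The Flatten layer only reshapes its input and adds no parameters, so the total should agree with what `model.summary()` reports for this model.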

## Choose the optimizer and the loss function

1. Data (done)

2. Model (done)

3. Objective function -> current

4. Optimization algorithm

There are 3 built-in variations of the cross-entropy loss

<b>binary_crossentropy</b>: used when the targets have <b>binary encoding</b> (two classes)

<b>categorical_crossentropy</b>: expects that you've one-hot encoded the targets

<b>sparse_categorical_crossentropy</b>: expects integer class labels, so we don't have to one-hot encode the targets ourselves
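
The relationship between the last two variants can be checked on a single sample. The sketch below is plain Python with hypothetical function names that mirror the Keras loss names, and made-up probabilities; it illustrates the formulas only and is not the Keras implementation.

```python
import math

def categorical_crossentropy(one_hot_target, probabilities):
    """Cross-entropy for a one-hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot_target, probabilities))

def sparse_categorical_crossentropy(class_index, probabilities):
    """Cross-entropy for a target given as an integer class index."""
    return -math.log(probabilities[class_index])

# hypothetical softmax output for a 3-class problem
probabilities = [0.1, 0.7, 0.2]

loss_one_hot = categorical_crossentropy([0.0, 1.0, 0.0], probabilities)
loss_sparse = sparse_categorical_crossentropy(1, probabilities)

print(round(loss_one_hot, 4), round(loss_sparse, 4))
```

Both calls give the same loss value; the only difference is the format of the target. Since the MNIST labels here are integers from 0 to 9, the sparse variant fits without any extra preprocessing.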


<b>model.compile(optimizer, loss, metrics)</b> configures the model for training

In [9]:
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

## Training

https://www.tensorflow.org/api_docs/python/tf/keras/Model#fit



## What happens inside an epoch?

1. At the beginning of each epoch, the training loss will be set to 0

2. The algorithm will iterate over a preset number of batches, all from train_data

3. The weights and biases will be updated as many times as there are batches

4. We will get a value for the loss function, indicating how the training is going

5. We will also see the training accuracy

6. At the end of each epoch, the algorithm will forward propagate the whole validation set

*When we reach the maximum number of epochs, the training will be over
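
The "preset number of batches" follows from the split and the batch size chosen earlier. The sketch below reproduces the arithmetic in plain Python, assuming the standard MNIST train split of 60,000 images, the 10% validation split, and BATCH_SIZE = 100 from above; the variable names are introduced here for illustration.

```python
import math

num_train_examples = 60_000   # size of the MNIST 'train' split
batch_size = 100              # BATCH_SIZE used in the notebook

# 10% of the training split is held out for validation
num_validation = num_train_examples // 10
num_training = num_train_examples - num_validation

# one weight update per batch, so this is also the number of updates per epoch
steps_per_epoch = math.ceil(num_training / batch_size)

print(num_validation)   # 6000
print(num_training)     # 54000
print(steps_per_epoch)  # 540
```

This matches the `540/540` counter that appears in the training log below.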

In [10]:
NUM_EPOCHS = 5

model.fit(train_data, epochs=NUM_EPOCHS, validation_data=(validation_inputs, validation_targets), verbose=2)

Epoch 1/5
540/540 - 4s - loss: 1.0223 - accuracy: 0.7726 - val_loss: 0.4489 - val_accuracy: 0.8942
Epoch 2/5
540/540 - 3s - loss: 0.3381 - accuracy: 0.9115 - val_loss: 0.2922 - val_accuracy: 0.9212
Epoch 3/5
540/540 - 3s - loss: 0.2498 - accuracy: 0.9289 - val_loss: 0.2411 - val_accuracy: 0.9338
Epoch 4/5
540/540 - 3s - loss: 0.2036 - accuracy: 0.9417 - val_loss: 0.2044 - val_accuracy: 0.9417
Epoch 5/5
540/540 - 3s - loss: 0.1729 - accuracy: 0.9501 - val_loss: 0.1781 - val_accuracy: 0.9505


<tensorflow.python.keras.callbacks.History at 0x21205db89d0>

## Test the model

In [11]:
test_loss, test_accuracy = model.evaluate(test_data)



In [12]:
# We can apply some nice formatting if we want to
print('Test loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))

Test loss: 0.17. Test accuracy: 94.89%
