#### 1. Is it OK to initialize all the weights to the same value as long as that value is selected randomly using He initialization?
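No. All weights should be sampled independently; they must not share the same initial value, even a randomly chosen one. Random initialization exists to break symmetry: if every weight starts at the same value, all neurons in a given layer compute the same output and receive the same gradient during backpropagation, so they stay identical and the layer behaves as if it had a single neuron.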
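
The symmetry argument can be checked numerically. The following is a minimal sketch, assuming a toy one-hidden-layer tanh network and random data (the helper name `hidden_weight_gradient` and all shapes are illustrative, not part of the exercise): when every weight shares one value, all hidden neurons get the same gradient.

```python
import numpy as np

def hidden_weight_gradient(W1, w2, X, y):
    """Gradient of 0.5 * mean squared error w.r.t. W1 for a tiny tanh network."""
    h = np.tanh(X @ W1)                           # hidden activations, (n_samples, n_hidden)
    err = h @ w2 - y                              # prediction error, (n_samples,)
    # Backpropagate through the output layer and the tanh nonlinearity.
    delta_h = np.outer(err, w2) * (1.0 - h ** 2)
    return X.T @ delta_h / len(X)                 # (n_inputs, n_hidden)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                       # toy inputs (assumption)
y = rng.normal(size=8)                            # toy targets (assumption)
he_scale = np.sqrt(2.0 / 3)                       # He-style scale for fan_in = 3

# Case 1: one randomly drawn value copied into every weight.
c = rng.normal() * he_scale
grad_same = hidden_weight_gradient(np.full((3, 4), c), np.full(4, c), X, y)

# Case 2: every weight drawn independently.
grad_rand = hidden_weight_gradient(rng.normal(size=(3, 4)) * he_scale,
                                   rng.normal(size=4), X, y)

# Compare each neuron's gradient (a column) with the first neuron's gradient.
print("identical init, columns equal:", np.allclose(grad_same, grad_same[:, :1]))
print("independent init, columns equal:", np.allclose(grad_rand, grad_rand[:, :1]))
```

With identical columns in the gradient, one update step leaves the hidden neurons identical again, which is why the symmetry never breaks on its own.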

#### 2. Is it OK to initialize the bias terms to 0?

Yes, it is perfectly fine. Initializing the bias terms like the weights or to zeros makes little difference, because the randomly initialized weights already break the symmetry between neurons.

#### 3. Name three advantages of the SELU activation function over ReLU



#### 4. In which cases would you want to use each of the following activation functions: SELU, leaky ReLU (and its variants), ReLU, tanh, logistic, softmax?

- SELU - if you want a self-normalizing network, which addresses the vanishing/exploding gradients problem. Self-normalization requires a few conditions:
    - the input features must be standardized
    - every hidden layer's weights must be initialized with LeCun normal initialization
    - the network's architecture must be sequential
    - all layers must be dense
- leaky ReLU (and its variants) - whenever SELU is not appropriate
- ReLU - when you can benefit from neurons that are turned off (outputting exactly zero)
- tanh - if you want the output to be between -1 and 1 (this might be needed in recurrent nets)
- logistic - for output layers when you need an estimated probability; it is rarely used in hidden layers (there are exceptions, for example the coding layer of variational autoencoders)
- softmax - in the output layer to output probabilities for mutually exclusive classes; it is rarely used in hidden layers
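
To make the differences between these functions concrete, here is a small pure-Python sketch of each one. The `negative_slope=0.01` default for leaky ReLU is an illustrative assumption; the two SELU constants are the standard values from the SELU definition.

```python
import math

# Standard SELU constants.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805

def relu(z):
    return max(0.0, z)                      # exactly zero for negative inputs

def leaky_relu(z, negative_slope=0.01):     # slope value is an assumption
    return z if z > 0 else negative_slope * z

def selu(z):
    # Scaled ELU: linear for z > 0, saturating exponential for z <= 0.
    return SELU_SCALE * (z if z > 0 else SELU_ALPHA * (math.exp(z) - 1.0))

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))       # output in (0, 1)

def softmax(zs):
    m = max(zs)                             # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]        # non-negative, sums to 1

for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), leaky_relu(z), round(selu(z), 4),
          round(math.tanh(z), 4), round(logistic(z), 4))
print(softmax([1.0, 2.0, 3.0]))
```

Note how ReLU is the only one of the hidden-layer candidates that maps every negative input to exactly zero, while leaky ReLU and SELU keep a nonzero output and gradient there.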

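#### 5. What may happen if you set the momentum hyperparameter too close to 1 (e.g. 0.99999) when using an SGD optimizer?
The velocity will grow very large, so the optimizer is likely to shoot right past the minimum, come back, overshoot again, and oscillate this way many times before settling, which makes convergence much slower than with a smaller momentum value.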
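
The effect is easy to reproduce on a toy problem. This is a minimal sketch, assuming the one-dimensional objective f(x) = x**2, a learning rate of 0.1, and 200 steps (all illustrative choices), comparing a momentum of 0.9 with 0.99999.

```python
def momentum_sgd(momentum, learning_rate=0.1, steps=200, x0=1.0):
    """Minimize f(x) = x**2 with SGD plus momentum; return every visited x."""
    x, velocity = x0, 0.0
    trajectory = [x]
    for _ in range(steps):
        grad = 2.0 * x                                      # f'(x)
        velocity = momentum * velocity - learning_rate * grad
        x += velocity
        trajectory.append(x)
    return trajectory

def tail_amplitude(trajectory, last=50):
    """Largest distance from the minimum (x = 0) over the last few steps."""
    return max(abs(x) for x in trajectory[-last:])

def crossings(trajectory):
    """Number of times the iterate jumps over the minimum at x = 0."""
    return sum(1 for a, b in zip(trajectory, trajectory[1:]) if a * b < 0)

for m in (0.9, 0.99999):
    traj = momentum_sgd(m)
    print(f"momentum={m}: crossings={crossings(traj)}, "
          f"tail amplitude={tail_amplitude(traj):.6f}")
```

With momentum 0.9 the oscillation around the minimum dies out quickly; with 0.99999 almost no velocity is lost per step, so the iterate keeps swinging across the minimum with nearly its initial amplitude.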

#### 6. Name three ways you can produce a sparse model.
- use $\ell_1$ regularization
- train the model normally, then zero out tiny weights
- use the TensorFlow Model Optimization Toolkit
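
The second approach (zeroing out tiny weights after training) can be sketched in a few lines. The `threshold` value and the example weight matrix below are hypothetical; in practice the threshold would be tuned against validation performance.

```python
def prune_small_weights(weights, threshold=1e-2):
    """Return a copy of a 2-D weight list with small-magnitude entries set to 0."""
    return [[w if abs(w) >= threshold else 0.0 for w in row] for row in weights]

def sparsity(weights):
    """Fraction of entries that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)

# Hypothetical trained weights for a 2x3 layer.
weights = [[0.8, -0.003, 0.0004],
           [-0.009, 0.5, -0.02]]

pruned = prune_small_weights(weights)
print(pruned)
print("sparsity before:", sparsity(weights), "after:", sparsity(pruned))
```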

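#### 7. Does dropout slow down training? Does it slow down inference? What about MC Dropout?
Dropout slows down training but does not slow down inference, since it is only active during training. MC Dropout slows down both: it behaves like regular dropout during training, and at inference time it stays active, so each prediction requires averaging many stochastic forward passes.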
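
The inference cost of MC Dropout comes from repeating the forward pass. Below is a toy sketch, assuming a single linear unit with inverted dropout applied to its inputs (the functions `forward` and `mc_dropout_predict` and all numbers are illustrative, not a Keras API).

```python
import random
import statistics

def forward(x, weights, drop_rate=0.0, rng=None):
    """One forward pass of a linear unit, with optional inverted dropout on inputs."""
    keep_prob = 1.0 - drop_rate
    total = 0.0
    for xi, wi in zip(x, weights):
        if drop_rate > 0.0 and rng.random() < drop_rate:
            continue                        # this input is dropped for this pass
        total += xi * wi / keep_prob        # rescale so the expected value is unchanged
    return total

def mc_dropout_predict(x, weights, drop_rate, n_samples, rng):
    """Average n_samples stochastic passes; cost is n_samples forward passes."""
    samples = [forward(x, weights, drop_rate, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.stdev(samples)

x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, -0.25, 0.1, 0.2]

plain = forward(x, w)                       # regular inference: one deterministic pass
mc_mean, mc_std = mc_dropout_predict(x, w, drop_rate=0.5, n_samples=2000,
                                     rng=random.Random(42))
print("plain prediction:", plain)
print("MC Dropout mean and std:", mc_mean, mc_std)
```

The MC estimate also comes with a spread (`mc_std`), which is the uncertainty information that the extra forward passes buy.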

#### 8. Practice training a deep neural network on the CIFAR10 image dataset:
- Build a DNN with 20 hidden layers of 100 neurons each (that's too many, but it's the point of this exercise). Use He initialization and the ELU activation function.

In [1]:
import os
import numpy as np
import tensorflow as tf
from tensorflow import keras

%load_ext tensorboard


In [2]:
len(tf.config.list_physical_devices('GPU')) > 0


True

In [3]:
os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'

In [4]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.cifar10.load_data()
X_train = X_train_full[:40000]
y_train = y_train_full[:40000]
X_val = X_train_full[40000:]
y_val = y_train_full[40000:]

In [5]:
input_tensor = keras.layers.Input(shape=X_train.shape[1:])
x = keras.layers.Flatten()(input_tensor)
for _ in range(20):
    x = keras.layers.Dense(100, activation=keras.activations.elu, kernel_initializer=keras.initializers.he_normal())(x)
output_ = keras.layers.Dense(10, activation=keras.activations.softmax)(x)
model = keras.Model(inputs=[input_tensor], outputs=[output_])

optimizer = keras.optimizers.Nadam(learning_rate=1e-3)

model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy')

In [6]:
early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)
model_checkpoint_cb = keras.callbacks.ModelCheckpoint('my_cifar10_model.h5', save_best_only=True)
run_index = 4 # increment every time you train the model
run_logdir = os.path.join(os.curdir, 'my_cifar10_logs', "run_{:03d}".format(run_index))
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]
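
The idea behind the `patience` argument can be illustrated with a simplified sketch. This is not Keras's implementation, only the basic rule under the assumption that training stops once the validation loss has failed to improve for `patience` consecutive epochs; the loss values are made up.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-indexed, for a simple patience rule."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch     # new best: reset the wait
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch                # waited long enough: stop
    return len(val_losses), best_epoch              # ran out of epochs first

# Hypothetical validation losses, one per epoch.
val_losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]

print(early_stopping_epoch(val_losses, patience=3))
print(early_stopping_epoch(val_losses, patience=20))
```

With a patience at least as large as the number of epochs, the rule can never trigger, and only the checkpoint callback decides which weights are kept.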

In [7]:
%tensorboard --logdir=./my_cifar10_logs --port=6006


Launching TensorBoard...

In [8]:
# model.fit(X_train, y_train, epochs=10, validation_data=(X_val, y_val), callbacks=callbacks)

In [9]:
K = keras.backend

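class ExponentialLearningRate(keras.callbacks.Callback):
    """Multiply the learning rate by `factor` after every batch, recording rates and losses."""
    def __init__(self, factor):
        super().__init__()
        self.factor = factor
        self.rates = []
        self.losses = []
    def on_batch_end(self, batch, logs=None):
        # Record the rate used for this batch and the loss it produced.
        self.rates.append(K.get_value(self.model.optimizer.lr))
        self.losses.append(logs["loss"])
        # Grow the learning rate exponentially for the next batch.
        K.set_value(self.model.optimizer.lr, self.rates[-1] * self.factor)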

def find_learning_rate(model, X, y, epochs=1, batch_size=32, min_rate=10**-5, max_rate=10):
    init_weights = model.get_weights()
    iterations = len(X) // batch_size * epochs
    factor = np.exp(np.log(max_rate / min_rate) / iterations)
    init_lr = K.get_value(model.optimizer.lr)
    K.set_value(model.optimizer.lr, min_rate)
    exp_lr = ExponentialLearningRate(factor)
    history = model.fit(X, y, epochs=epochs, batch_size=batch_size, callbacks=[exp_lr])
    K.set_value(model.optimizer.lr, init_lr)
    model.set_weights(init_weights)
    return exp_lr.rates, exp_lr.losses
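
The `factor` computed in `find_learning_rate` is chosen so that multiplying the learning rate by it once per batch moves it from `min_rate` to `max_rate` over the whole run. A quick standalone check, assuming the 40,000 training images and the default batch size of 32 used above (the helper name `exponential_factor` is illustrative):

```python
import math

def exponential_factor(min_rate, max_rate, iterations):
    """Per-iteration multiplier that takes min_rate to max_rate in `iterations` steps."""
    return math.exp(math.log(max_rate / min_rate) / iterations)

min_rate, max_rate = 1e-5, 10.0
iterations = 40000 // 32 * 1            # batches per epoch times number of epochs
factor = exponential_factor(min_rate, max_rate, iterations)

# Applying the factor once per iteration should land on max_rate.
final_rate = min_rate * factor ** iterations
print("iterations:", iterations)
print("factor:", factor)
print("final rate:", final_rate)
```

Plotting the recorded losses against the recorded rates on a logarithmic x-axis then shows where the loss starts to blow up, and a learning rate somewhat below that point is a reasonable choice.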

In [10]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
model.add(keras.layers.BatchNormalization())
for _ in range(20):
    model.add(keras.layers.Dense(100, kernel_initializer="he_normal"))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.Activation("elu"))
model.add(keras.layers.Dense(10, activation="softmax"))

optimizer = keras.optimizers.Nadam(learning_rate=5e-4)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])

early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)
model_checkpoint_cb = keras.callbacks.ModelCheckpoint("my_cifar10_bn_model.h5", save_best_only=True)
run_index = 2 # increment every time you train the model
run_logdir = os.path.join(os.curdir, "my_cifar10_logs", "run_bn_{:03d}".format(run_index))
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]

model.fit(X_train, y_train, epochs=20,
          validation_data=(X_val, y_val),
          callbacks=callbacks)

model = keras.models.load_model("my_cifar10_bn_model.h5")
model.evaluate(X_val, y_val)

Train on 40000 samples, validate on 10000 samples
Epoch 1/20
...
Epoch 20/20


AttributeError: 'str' object has no attribute 'decode'

In [11]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

model = keras.models.Sequential()
model.add(keras.layers.Flatten(input_shape=[32, 32, 3]))
for _ in range(20):
    model.add(keras.layers.Dense(100,
                                 kernel_initializer="lecun_normal",
                                 activation="selu"))
model.add(keras.layers.Dense(10, activation="softmax"))

optimizer = keras.optimizers.Nadam(learning_rate=7e-4)
model.compile(loss="sparse_categorical_crossentropy",
              optimizer=optimizer,
              metrics=["accuracy"])

early_stopping_cb = keras.callbacks.EarlyStopping(patience=20)
model_checkpoint_cb = keras.callbacks.ModelCheckpoint("my_cifar10_selu_model.h5", save_best_only=True)
run_index = 1 # increment every time you train the model
run_logdir = os.path.join(os.curdir, "my_cifar10_logs", "run_selu_{:03d}".format(run_index))
tensorboard_cb = keras.callbacks.TensorBoard(run_logdir)
callbacks = [early_stopping_cb, model_checkpoint_cb, tensorboard_cb]

X_means = X_train.mean(axis=0)
X_stds = X_train.std(axis=0)
X_train_scaled = (X_train - X_means) / X_stds
X_valid_scaled = (X_val - X_means) / X_stds
X_test_scaled = (X_test - X_means) / X_stds

model.fit(X_train_scaled, y_train, epochs=40,
          validation_data=(X_valid_scaled, y_val),
          callbacks=callbacks)

Train on 40000 samples, validate on 10000 samples
Epoch 1/40
...
Epoch 30/40


<tensorflow.python.keras.callbacks.History at 0x7f1b703d7750>
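
The standardization step used for the SELU model computes the mean and standard deviation on the training set only and reuses them for the validation and test sets. The sketch below wraps that step in two helpers and adds a small `eps` to the standard deviation to guard against constant features; the helper names, the `eps` value, and the random stand-in data are assumptions, not part of the original code.

```python
import numpy as np

def fit_standardizer(X_train, eps=1e-7):
    """Per-feature mean and std from the training set (eps avoids division by zero)."""
    means = X_train.mean(axis=0, keepdims=True)
    stds = X_train.std(axis=0, keepdims=True) + eps
    return means, stds

def standardize(X, means, stds):
    """Apply training-set statistics to any split."""
    return (X - means) / stds

rng = np.random.default_rng(42)
# Random stand-in for image data: 100 "images" of shape 4x4x3 with values in [0, 255].
X_tr = rng.integers(0, 256, size=(100, 4, 4, 3)).astype(np.float64)
X_va = rng.integers(0, 256, size=(20, 4, 4, 3)).astype(np.float64)
X_tr[:, 0, 0, 0] = 7.0                  # a constant feature, whose std is zero

means, stds = fit_standardizer(X_tr)
X_tr_scaled = standardize(X_tr, means, stds)
X_va_scaled = standardize(X_va, means, stds)   # same statistics as the training set

print("max |mean| after scaling:", np.abs(X_tr_scaled.mean(axis=0)).max())
print("all finite:", np.isfinite(X_tr_scaled).all() and np.isfinite(X_va_scaled).all())
```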