# Best Practices for the Real World

## Getting the most out of your models

Blindly trying out different architecture configurations works well enough if you just need something that works okay. In this section, we'll go beyond "works okay" to "works great and wins ML competitions" via a set of must-know techniques for building state-of-the-art deep learning models.

### Hyperparameter optimization

When building a deep learning model, you have to make many seemingly arbitrary decisions, such as how many units to put in a layer or which optimizer to use. These architecture-level parameters are called hyperparameters to distinguish them from the parameters of a model, which are trained via backpropagation.\
\
In practice, experienced ML engineers and researchers build intuition over time as to what works and what does not when it comes to these choices: they develop hyperparameter-tuning skills. But there are no formal rules. If you want to get to the very limit of what can be achieved, you can't be content with such arbitrary choices. Your initial decisions are almost always suboptimal, even if you have good intuition. You can refine your choices by tweaking them by hand and retraining the model repeatedly. But it shouldn't be your job as a human to fiddle with hyperparameters all day.\
Thus you need to explore the space of possible decisions automatically, systematically, and in a principled way. You need to search the architecture space and find the best-performing architectures empirically.\
The process of optimizing hyperparameters typically looks like this:
1. Choose a set of hyperparameters.
2. Build the corresponding model.
3. Fit it to your training data, and measure performance on the validation data.
4. Choose the next set of hyperparameters to try.
5. Repeat.
6. Eventually, measure performance on your test data.
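
The loop above can be sketched in a few lines of plain Python. This is only a minimal illustration using random search: the `SEARCH_SPACE` values and the `validation_score` function are made up for the example and stand in for steps 2 and 3 (building and training a real model).

```python
import random

# Hypothetical search space: each key maps to the candidate values to sample from.
SEARCH_SPACE = {
    "units": [16, 32, 48, 64],
    "optimizer": ["rmsprop", "adam"],
}

def validation_score(config):
    """Stand-in for 'build the model, fit it, evaluate on validation data'.

    The formula is invented for illustration; a real version would train a model.
    """
    bonus = 0.01 if config["optimizer"] == "adam" else 0.0
    return 0.90 + config["units"] / 1000 + bonus

def random_search(num_trials, seed=0):
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(num_trials):
        # Steps 1 and 4: choose the next set of hyperparameters (here, at random).
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        # Steps 2 and 3: build, train, and measure validation performance.
        score = validation_score(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score

best_config, best_score = random_search(num_trials=20)
print(best_config, round(best_score, 3))
```

Smarter strategies such as Bayesian optimization keep the same loop and only change how the next configuration is chosen from the results seen so far.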

The key to this process is the algorithm that analyzes the relationship between validation performance and various hyperparameter values to choose the next set of hyperparameters to evaluate. Many different techniques are possible: Bayesian optimization, genetic algorithms, simple random search, and so on.\
Training the weights of a model is relatively easy: you compute a loss function on a mini-batch of data and then use backpropagation to move the weights in the right direction. Updating hyperparameters, on the other hand, presents unique challenges.
Consider these points:
* The hyperparameter space is typically made up of discrete decisions and thus is not continuous or differentiable. Hence, you typically cannot do gradient descent in hyperparameter space. Instead, you must rely on gradient-free optimization techniques, which are naturally far less efficient than gradient descent.
* Computing the feedback signal of this optimization process can be extremely expensive: it requires creating and training a new model from scratch on your dataset.
* The feedback signal may be noisy: if a training run performs 0.2\% better, is that because of a better model configuration, or because you got lucky with the initial weight values?
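
One common way to tame a noisy feedback signal is to run each configuration several times with different random initializations and compare the averages rather than single runs. The sketch below simulates this with an invented `noisy_run` function; the "true" score of 0.970 and the noise level are assumptions chosen for illustration, not measurements.

```python
import random
import statistics

def noisy_run(true_score, rng, noise_std=0.003):
    """Simulate one training run: the true score plus seed-dependent noise (hypothetical)."""
    return true_score + rng.gauss(0.0, noise_std)

def mean_score(true_score, num_runs, rng):
    """Average the validation score over several independent runs."""
    return statistics.mean(noisy_run(true_score, rng) for _ in range(num_runs))

rng = random.Random(42)
# Spread of single-run scores vs. spread of 10-run averages for the same configuration.
single_runs = [noisy_run(0.970, rng) for _ in range(200)]
averaged_runs = [mean_score(0.970, 10, rng) for _ in range(200)]

print(f"std of single runs:     {statistics.stdev(single_runs):.4f}")
print(f"std of 10-run averages: {statistics.stdev(averaged_runs):.4f}")
```

Under these assumed numbers, a gap of about 0.002 between two single runs sits well inside the run-to-run noise, while averaged scores are noticeably tighter. The price is that every configuration costs several training runs instead of one.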

#### Using KerasTuner


In [1]:
!pip install keras-tuner -q



KerasTuner lets you replace hard-coded hyperparameter values, such as units=32, with a range of possible choices, such as Int(name="units", min_value=16, max_value=64, step=16). This set of choices in a given model is called the search space of the hyperparameter tuning process.\
To specify a search space, define a model-building function. It takes an hp argument, from which you can sample hyperparameter ranges, and it returns a compiled Keras model.

In [2]:
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    units = hp.Int(name="units", min_value=16, max_value=64, step=16)
    model = keras.Sequential([
        layers.Dense(units, activation="relu"),
        layers.Dense(10, activation="softmax")
    ])
    optimizer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
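
To get a feel for how large this search space is, it can be enumerated by hand. The sketch below uses plain Python rather than KerasTuner itself, and assumes that `max_value` is inclusive, as described in the KerasTuner documentation for `hp.Int`.

```python
from itertools import product

# Values covered by hp.Int(name="units", min_value=16, max_value=64, step=16);
# max_value is inclusive, hence the +1 on the range's stop.
units_values = list(range(16, 64 + 1, 16))
optimizer_values = ["rmsprop", "adam"]

# Every (units, optimizer) combination the tuner could try.
configurations = list(product(units_values, optimizer_values))

print(units_values)         # [16, 32, 48, 64]
print(len(configurations))  # 8
```

With only eight distinct configurations, this particular space is small enough to search exhaustively; automated tuners pay off as more hyperparameters are added and the number of combinations multiplies.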

For a more modular and configurable approach to model building, you can also subclass the HyperModel class and define a build method.

In [4]:
import keras_tuner as kt

class SimpleMLP(kt.HyperModel):
    def __init__(self, num_classes):
        # Initialize the base HyperModel before storing our own settings
        super().__init__()
        self.num_classes = num_classes

    def build(self, hp):
        units = hp.Int(name="units", min_value=16, max_value=64, step=16)
        model = keras.Sequential([
            layers.Dense(units, activation="relu"),
            layers.Dense(self.num_classes, activation="softmax")
        ])
        optimizer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])
        model.compile(optimizer=optimizer,
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model

The next step is to define a "tuner". Schematically, you can think of a tuner as a for loop that will:
* Pick a set of hyperparameter values
* Call the model-building function with these values to create a model
* Train the model and record its metrics

KerasTuner has several built-in tuners available: RandomSearch, BayesianOptimization, and Hyperband. Let's try BayesianOptimization, a tuner that attempts to make smart predictions for which new hyperparameter values are likely to perform best given the outcomes of previous choices:

In [6]:
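tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",   # metric the tuner tries to maximize
    max_trials=100,             # maximum number of configurations to try
    executions_per_trial=2,     # training runs per configuration, to reduce noise
    directory="mnist_kt_test",  # where search logs are stored
    overwrite=True,             # start a fresh search instead of resuming
)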

In [7]:
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Int)
{'default': None, 'conditions': [], 'min_value': 16, 'max_value': 64, 'step': 16, 'sampling': 'linear'}
optimizer (Choice)
{'default': 'rmsprop', 'conditions': [], 'values': ['rmsprop', 'adam'], 'ordered': False}


In [8]:
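(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28 * 28)).astype("float32") / 255
x_test = x_test.reshape((-1, 28 * 28)).astype("float32") / 255
# Keep the full training set for the final retraining step
x_train_full = x_train[:]
y_train_full = y_train[:]
# Hold out the last 10,000 samples as a validation set
num_val_samples = 10000
x_train, x_val = x_train[:-num_val_samples], x_train[-num_val_samples:]
y_train, y_val = y_train[:-num_val_samples], y_train[-num_val_samples:]
callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
]
# Note: no `epochs` argument is passed here, so each run uses the Keras fit()
# default of a single epoch and early stopping cannot trigger. Pass a larger
# value (for example epochs=100) to let EarlyStopping decide when to stop.
tuner.search(
    x_train, y_train,
    batch_size=128,
    validation_data=(x_val, y_val),
    callbacks=callbacks,
    verbose=2
)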

Trial 100 Complete [00h 00m 34s]
val_accuracy: 0.934249997138977

Best val_accuracy So Far: 0.9381999969482422
Total elapsed time: 00h 48m 19s
INFO:tensorflow:Oracle triggered exit


In [9]:
# Query the best hyperparameter configurations
top_n = 4
best_hps = tuner.get_best_hyperparameters(top_n)

Before we can train on the full training data, there is one last parameter we need to settle: the optimal number of epochs to train for. Typically, you will want to train the new models for longer than you did during the search: using an aggressive patience value in the EarlyStopping callback saves time during the search, but it may lead to underfit models.

In [10]:
def get_best_epoch(hp):
    model = build_model(hp)
    callbacks = [
        keras.callbacks.EarlyStopping(
            monitor="val_loss", mode="min", patience=10
        )
    ]
    history = model.fit(x_train,y_train,
                        validation_data=(x_val,y_val),
                        epochs=100,
                        batch_size=128,
                        callbacks=callbacks)
    val_loss_per_epoch = history.history["val_loss"]
    best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
    print(f"Best epoch: {best_epoch}")
    return best_epoch
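
The best-epoch arithmetic is easy to check on a small example. The validation losses below are made-up numbers for illustration. The 1.2 factor mirrors the one used when retraining on the full data; it presumably reflects that the full training set (60,000 samples) is 1.2 times the size of the 50,000-sample training split.

```python
# Hypothetical validation losses, one per epoch (made-up numbers for illustration).
val_loss_per_epoch = [0.42, 0.31, 0.27, 0.25, 0.26, 0.28]

# list.index returns a 0-based position, while Keras counts epochs from 1.
best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1

# Scale the epoch count for the larger training set; int() truncates 4.8 down to 4.
final_epochs = int(best_epoch * 1.2)

print(best_epoch)    # 4
print(final_epochs)  # 4
```

Because `int()` truncates rather than rounds, the scaling only adds epochs once `best_epoch` reaches 5 or more.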

In [11]:
def get_best_trained_model(hp):
    best_epoch = get_best_epoch(hp)
    # Build a fresh model for this configuration before retraining it
    model = build_model(hp)
    # Train on the full training set for slightly longer than the best epoch
    model.fit(
        x_train_full, y_train_full,
        batch_size=128, epochs=int(best_epoch * 1.2)
    )
    return model

best_models = []
for hp in best_hps:
    model = get_best_trained_model(hp)
    model.evaluate(x_test, y_test)
    best_models.append(model)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Best epoch: 18

In [12]:
# Shortcut: reload the top-performing models saved during the search.
# get_best_models expects the number of models to return, not a HyperParameters object.
best_models = tuner.get_best_models(top_n)