# Best Practices for real World

## Getting the most out of your models

Blindly trying out different architecture configurations works well enough if you just need something that works okay. In this section, we'll go beyond "works okay" to "works great and wins ml competitions" via a set of must-know techniques for building state-of-the-art deep learning models.

### Hyperparams optimization

When building a DL model, you have to make many seemingly arbitrary decisions. These architecture-level parameters are called hyperparameters to distinguish them from the parameters of a model, which are trained via backpropagation.\
\
In practice, experienced ML engineers and researchers build intuition over time as to what works and what does not when it comes to these choices- they develop hyperparameters-tuning skills. But there are no formal rules. If you want to get the very limit of what can be achieved, you can't be content with such arbitrary choices. Your initial decisions are almost always suboptimal, even if you have good intuition. You can refine your choices by tweaking them by hand and retraining the model repeatedly. But it shouldn't be your job as a human to fiddle with hyperparams all day.\
Thus you need to explore the space of possible decisions automatically, systematically, in principled way. You need to search the architecture space and find the best performing architectures empirically.\
The process of optimizing hyperparameters typically looks like this:
1. Choose a set of hyperparams.
2. Build the corresponding model.
3. Fit it to your training data, and measure performance on the validation data.
4. Choose the next set of hyperparams to try
5. Repeat
6. Eventually, measure performance on your test data.

The key to this process is the algorithm that analyzes the relationship between validatio performance and various hyperparameter values to choose the next set of hyperparameters to evaluate. Many different techniques are possible: Bayesian optimization, generic algorithms, simple random search, and so on.\
Training the weights of a model is relatively easy: you compute a loss function on a mini-batch of data and then use backpropagation to move the weights in the right direction. Updating hyperparameters, on the other hand, presents unique challenges.
Consider these points:
* The hyperparam space is typically made up of discrete decisions and thus is not continuous or differentiable. Hence, you typically cannot do gradient descent in hyperparameters space. Instead, you must rely on gradient-free optimization techniques, which naturally are far less efficient than gradient descent.
* Computing the feedback signal of this optimization process can be extremely expensive: it requires creating and training a new model from scratch on your dataset.
* The feedback signal may be noisy: if a training run performs 0.2\% better, is that because of better model configuration, or because you got lucky with the initial weight values?

#### Using KerasTuner


In [1]:
!pip install keras-tuner -q

KerasTuner lets you replace hard-coded hyperparams values, such as units=32, with a range of possible choices, such as Int(name="units",min_value=16,max_values=64, step=16). This set of choices in a given model is called the search space of the hyperparameter tuning process.\
To specify a search space, define a model-building function. It takes an hp argument, from which you can sample hyperparameter ranges, and it returns a compiled Keras model.

In [1]:
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    units = hp.Int(name="units", min_value=16, max_value=64, step=16)
    model = keras.Sequential([
        layers.Dense(units, activation="relu"),
        layers.Dense(10, activation="softmax")
    ])
    optimizer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])
    model.compile(optimizer=optimizer,
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

2023-07-25 09:33:00.385201: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


More modular and configurable approach to model-building, you can also subclass HyperModel class and define a build method.

In [2]:
import keras_tuner as kt

class SimpleMLP(kt.HyperModel):
    def __init__(self, num_classes):
        self.num_classes = num_classes
    
    def build(self, hp):
        units = hp.Int(name="units", min_value=16, max_value=64, step=16)
        model = keras.Sequential([
            layers.Dense(units, activation="relu"),
            layers.Dense(self.num_classes, activation="softmax")
        ])
        optimzer = hp.Choice(name="optimizer", values=["rmsprop", "adam"])
        model.compile(optimizer=optimzer,loss="sparse_categorical_crossentropy",metrics=["accuracy"])
        return model

The next step is to define a "tuner". Schematically, you can think of a tuner as a for loop that will:
* Pick a set of hyperparams values
* Call the model-building function with these values to create a model
* Train the model and record its metrics

KerasTuner has several built-in tuners available-RandomSearch, BayesianOptimization, and Hyperband. Let's try BayesianOptimization, a tuner that attempts to make smart predictions for which new hyperparameter values are likely to perform best given the outcomes of previous choices:

In [3]:
tuner=kt.BayesianOptimization(build_model,objective="val_accuracy",max_trials=100,executions_per_trial=2, directory="mnist_kt_test",overwrite=True)

2023-07-25 09:33:02.314172: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:840] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-07-25 09:33:02.431599: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:840] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-07-25 09:33:02.431667: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:840] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-07-25 09:33:02.432579: I tensorflow/compiler/xla/stream_executor/rocm/rocm_gpu_executor.cc:840] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2023-07-25 09:33:02.432652: I tensorflow/compiler/xla/stream_executo

In [4]:
tuner.search_space_summary()

Search space summary
Default search space size: 2
units (Int)
{'default': None, 'conditions': [], 'min_value': 16, 'max_value': 64, 'step': 16, 'sampling': 'linear'}
optimizer (Choice)
{'default': 'rmsprop', 'conditions': [], 'values': ['rmsprop', 'adam'], 'ordered': False}


In [5]:
(x_train,y_train), (x_test,y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape((-1, 28*28)).astype("float32")/255
x_test = x_test.reshape((-1, 28*28)).astype("float32")/255
x_train_full = x_train[:]
y_train_full = y_train[:]
num_val_samples=10000
x_train,x_val = x_train[:-num_val_samples], x_train[-num_val_samples:]
y_train,y_val = y_train[:-num_val_samples], y_train[-num_val_samples:]
callbacks = [
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=5)
]
tuner.search(
    x_train,y_train,
    batch_size=128,
    validation_data=(x_val,y_val),
    callbacks=callbacks,
    verbose=2
)

Trial 100 Complete [00h 00m 03s]
val_accuracy: 0.9319999814033508

Best val_accuracy So Far: 0.9382999837398529
Total elapsed time: 00h 05m 36s
INFO:tensorflow:Oracle triggered exit


In [6]:
#Querying the best hyperparameter configurations
top_n=4
best_hps = tuner.get_best_hyperparameters(top_n)

Before we can train on the full training data, there is one last parameter we need to settle: optimal number of epochs to train for. Typically, you will want to train the new models for longer than you did during the search: using aggressive patience value in the EarlyStopping callback saves time during search, but it may lead to underfit.

In [7]:
def get_best_epoch(hp):
    model = build_model(hp)
    callbacks=[
        keras.callbacks.EarlyStopping(
        monitor="val_loss", mode="min", patience=10
        )
    ]
    history = model.fit(x_train,y_train,
                        validation_data=(x_val,y_val),
                        epochs=100,
                        batch_size=128,
                        callbacks=callbacks)
    val_loss_per_epoch = history.history["val_loss"]
    best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1
    print(f"Best epoch: {best_epoch}")
    return best_epoch

In [8]:
def get_best_trained_model(hp):
    best_epoch = get_best_epoch(hp)
    model.fit(
        x_train_full,y_train_full,
        batch_size=128, epochs=int(best_epoch*1.2)
    )

    return model

best_models = []
for hp in best_hps:
    model = get_best_trained_model(hp)
    model.evaluate(x_test, y_test)
    best_models.append(model)

Epoch 1/100


2023-07-25 09:38:39.127860: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:39.134924: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:39.142437: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:39.187955: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:39.193056: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:39.196563: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:39.198921: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


 52/391 [==>...........................] - ETA: 0s - loss: 1.2248 - accuracy: 0.6705

2023-07-25 09:38:39.382910: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 2/100
  1/391 [..............................] - ETA: 0s - loss: 0.1790 - accuracy: 0.9531

2023-07-25 09:38:40.161187: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:40.166418: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:40.178051: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:40.182666: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:40.185726: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:40.188024: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:40.228846: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 3/100
  3/391 [..............................] - ETA: 15s - loss: 0.2792 - accuracy: 0.9427

2023-07-25 09:38:41.061634: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:41.066135: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:41.069735: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 4/100
 28/391 [=>............................] - ETA: 0s - loss: 0.1444 - accuracy: 0.9570

2023-07-25 09:38:41.995163: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:41.999851: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:42.003359: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 5/100
 30/391 [=>............................] - ETA: 0s - loss: 0.1203 - accuracy: 0.9654

2023-07-25 09:38:42.917295: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:42.922334: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:42.926090: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 6/100
 28/391 [=>............................] - ETA: 0s - loss: 0.1024 - accuracy: 0.9718

2023-07-25 09:38:43.767312: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:43.771718: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:43.774846: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 7/100
 28/391 [=>............................] - ETA: 0s - loss: 0.0862 - accuracy: 0.9738

2023-07-25 09:38:44.603526: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:44.608472: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:44.612192: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 8/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0682 - accuracy: 0.9812

2023-07-25 09:38:45.465331: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:45.469579: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:45.472955: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 9/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0652 - accuracy: 0.9821

2023-07-25 09:38:46.327152: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:46.331395: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:46.335329: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 10/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0724 - accuracy: 0.9790

2023-07-25 09:38:47.234437: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:47.238463: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:47.241961: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 11/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0479 - accuracy: 0.9873

2023-07-25 09:38:48.066217: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:48.070340: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:48.073804: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 12/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0497 - accuracy: 0.9873

2023-07-25 09:38:48.918755: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:48.923263: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:48.926585: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 13/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0405 - accuracy: 0.9884

2023-07-25 09:38:49.773970: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:49.777943: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:49.780301: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 14/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0377 - accuracy: 0.9913

2023-07-25 09:38:50.647200: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:50.653224: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:50.655960: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 15/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0327 - accuracy: 0.9919

2023-07-25 09:38:51.521829: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:51.525041: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:51.527450: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 16/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0298 - accuracy: 0.9922

2023-07-25 09:38:52.367598: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:52.372211: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:52.375306: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 17/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0246 - accuracy: 0.9939

2023-07-25 09:38:53.179605: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:53.184450: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:53.187766: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 18/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0235 - accuracy: 0.9954

2023-07-25 09:38:54.044316: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:54.047692: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:54.050093: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 19/100
 28/391 [=>............................] - ETA: 0s - loss: 0.0194 - accuracy: 0.9964

2023-07-25 09:38:54.877552: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:54.882383: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:54.893267: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 20/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0278 - accuracy: 0.9928

2023-07-25 09:38:55.736642: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:55.741015: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:55.743961: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 21/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0158 - accuracy: 0.9965

2023-07-25 09:38:56.589463: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:56.592676: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:56.595079: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 22/100
 28/391 [=>............................] - ETA: 0s - loss: 0.0158 - accuracy: 0.9972

2023-07-25 09:38:57.427586: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:57.431832: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:57.435471: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 23/100
 27/391 [=>............................] - ETA: 0s - loss: 0.0120 - accuracy: 0.9980

2023-07-25 09:38:58.273337: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:58.277375: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:58.280881: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 24/100
 29/391 [=>............................] - ETA: 0s - loss: 0.0115 - accuracy: 0.9987

2023-07-25 09:38:59.144152: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:59.148057: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:59.151524: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Epoch 25/100
 28/391 [=>............................] - ETA: 0s - loss: 0.0117 - accuracy: 0.9989

2023-07-25 09:38:59.984311: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:59.989769: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:38:59.994038: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


Best epoch: 15


2023-07-25 09:39:00.833553: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:39:00.838182: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.
2023-07-25 09:39:00.841754: I tensorflow/core/common_runtime/gpu_fusion_pass.cc:508] ROCm Fusion is enabled.


NameError: name 'model' is not defined

In [9]:
best_models = tuner.get_best_models(hp)

TypeError: '<' not supported between instances of 'int' and 'HyperParameters'

### Model Ensembling

Another powerful technique for obtaining the best possible results on a task is model ensembling. Ensembling consists of pooling together the predictions of a set of different models to produce better predictions. If you look at ML competitions you will see that the winners use very large ensembles of models that beat any single model.\
Ensembling relies on the assumption that different well-performing models trained indepenently are likely to be good for different reasons: each model looks at slightly different aspects of the data to make its predictions, getting part of the "truth" but not all of it.\
Let's use classification as an example. The easiest way to pool the predictions of a set of classifiers is to average their predictions at inference time:

In [1]:
# preds_a = model_a.predict(x_val)
# preds_b = model_b.predict(x_val)
# preds_c = model_c.predict(x_val)
# preds_d = model_d.predict(x_val)
# final_preds = 0.25 * (preds_a + preds_b + preds_c + preds_d)

However this will work only if the classifiers are more or less equally good. A smarter way is to do a weighted average, where the weights are learned on the validation data-typically, the better classifiers are given a higher weight, adn the worse are given a lower weight. To search for a good set of ensembling weights, you can use random search or a simple optimization algorithm, such as Nelder-Mead:

In [2]:
# preds_a = model_a.predict(x_val)
# preds_b = model_b.predict(x_val)
# preds_c = model_c.predict(x_val)
# preds_d = model_d.predict(x_val)
# final_preds = 0.5 * preds_a + 0.25*preds_b + 0.1*preds_c + 0.15*preds_d

There are many variants you can do an average of an exponential of the predictions. In general, a simple weighted average with weights optimized on the validation data provides a very strong baseline.\
The key to making ensembling work is the diversity of the set of classifiers. Diversity is strength, it's what makes ensembling work. In ML terms, if all of your models are biased in the same way, your ensemble will retain this same bias. If your models are biased in different ways, the biases will cancel each other out, and the ensemble will be more robust and more accurate.\
For this reason, you should ensemble models that are as good as possible while being as different as possible. This typically means using very different architectures or even different brands of ML approaches. One thisng largely not worth doing is ensembling the same network trained several times independently, from different random initializations. If the only differente vetween your models is their random initialization and the order in which they were exposed to the training data, then your ensemble will be low-diversity and will provide only a tiny improvement over any single model.\
It's not so much how good your best model is; it's about the diversity of your set of candidate models.

## Scaling-up model training

Recall "loop of progress" concept, the quality of your ideas is a function of how many refinement cucles they've been through. As the speed at which you can iterate on an idea is a function of how fast you can set up an experiment, how fast you can run that experiment, and finally, how well yuo can analyze the results.\
As you develop your expertise with the Keras API, how fast you can code up your deep
learning experiments will cease to be the bottleneck of this progress cycle. The next
bottleneck will become the speed at which you can train your models. Fast training
infrastructure means that you can get your results back in 10–15 minutes, and hence,
that you can go through dozens of iterations every day. Faster training directly improves
the quality of your deep learning solutions.\
In this section, we see three ways to train model faster:
* Mixed-precision training, which you can use even with a single GPU
* Training on multiple GPUs
* Training on multiple TPUs

# Continues on the book