# Fire from forest

- Authors:
  - Axel Suárez Polo (@ggzor)
  - Sergio Daniel Cortez Chaves (@SerCor)

This notebook shows the training of a neural network to identify fire and smoke within sub-regions
of an image. Most of this notebook walks through tuning the hyperparameters with an automatic
heuristic search, guided by the mean validation score across the cross-validation partitions and its standard deviation.

The dataset was generated by @SerCor using [tk-tagger](https://github.com/ggzor/tk-tagger), a
tkinter Python application for interactively tagging sub-regions of an image, which was developed to ease
the dataset creation.

[scikit-learn](https://scikit-learn.org/stable/) is used to train the neural network, but the training runs
go through a parallel multiprocessing interface with shared memory, developed to speed up the training process.
The relevant module is `experimenter.parallel`.
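`experimenter.parallel` itself is not shown in this notebook. As a hedged illustration of the general mechanism only, the sketch below uses the standard library's `multiprocessing.shared_memory` to place a NumPy array in a shared block that worker processes can attach to by name, instead of each one receiving a pickled copy of the data. The helpers `share_array` and `column_means` are hypothetical, and for simplicity the worker function is called in the same process.

```python
from multiprocessing import shared_memory

import numpy as np


def share_array(arr):
    # Copy `arr` into a new shared-memory block and return the block plus a
    # small picklable descriptor that another process can use to attach to it.
    shm = shared_memory.SharedMemory(create=True, size=arr.nbytes)
    np.ndarray(arr.shape, dtype=arr.dtype, buffer=shm.buf)[:] = arr
    return shm, (shm.name, arr.shape, arr.dtype.str)


def column_means(descriptor):
    # Worker side (hypothetical): attach to the block by name, compute
    # something from the shared data, then detach.
    name, shape, dtype = descriptor
    shm = shared_memory.SharedMemory(name=name)
    data = np.ndarray(shape, dtype=np.dtype(dtype), buffer=shm.buf)
    result = data.mean(axis=0)  # a new array, not a view into shared memory
    del data  # release the view so the block can be closed
    shm.close()
    return result


features = np.arange(12, dtype=np.float64).reshape(4, 3)
shm, descriptor = share_array(features)
try:
    # In a real pool, `descriptor` would be sent to each worker process;
    # here the worker function is called in-process for illustration.
    means = column_means(descriptor)
finally:
    shm.close()
    shm.unlink()  # only the creating process should unlink the block
```

Only the short descriptor (name, shape, dtype) needs to cross the process boundary, which is what makes this cheaper than pickling the dataset for every training run.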

This notebook goes step-by-step:
  1. Configure logging and some global variables.
  2. Load the dataset using [pandas](https://pandas.pydata.org/).
  3. Configure the experiment ranges.
  4. Run the hyperparameter search:
       - Optimal number of neurons and layers.
       - Optimal number of epochs.
       - Optimal value for the learning rate.
       - Optimal value for the momentum.

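After the first phase, which tries every combination of neurons and layers, each phase of the hyperparameter search
takes the best `BEST_FOREACH_PHASE` configurations (12 in this notebook) from the previous phase and searches again
over the range specified in the configuration. `BEST_BY_MEAN` of them are chosen by mean score and the rest by a
custom criterion that penalizes high standard deviations.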
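The phases described above amount to a greedy, coordinate-wise search with a fixed number of survivors per phase. A simplified, hypothetical sketch of that control flow follows. Unlike the notebook, it keeps the `keep` best configurations by a single score instead of splitting them between the mean and the custom criterion, and `toy_score` is an invented objective whose optimum is known.

```python
from itertools import product


def staged_search(score, initial, ranges, first_phase, later_phases, keep):
    """Simplified staged search: keep the `keep` best configurations per phase."""
    # Phase 1: exhaustive grid over the parameters in `first_phase`; every
    # other parameter stays at its initial value.
    candidates = [
        {**initial, **dict(zip(first_phase, combo))}
        for combo in product(*(ranges[name] for name in first_phase))
    ]
    survivors = sorted(candidates, key=score, reverse=True)[:keep]

    # Later phases: vary a single parameter for every survivor of the previous phase.
    for name in later_phases:
        candidates = [{**s, name: value} for s in survivors for value in ranges[name]]
        survivors = sorted(candidates, key=score, reverse=True)[:keep]
    return survivors


# Hypothetical toy objective with its maximum (0) at a known point.
def toy_score(p):
    return -(
        (p["neurons"] - 20) ** 2
        + 10 * (p["layers"] - 2) ** 2
        + (p["epochs"] - 100) ** 2 / 100
        + 1000 * (p["learning_rate"] - 0.1) ** 2
    )


ranges = {
    "neurons": range(3, 43),
    "layers": range(1, 6),
    "epochs": range(50, 301, 10),
    "learning_rate": [0.05, 0.1, 0.15, 0.2],
}
initial = {"neurons": 3, "layers": 1, "epochs": 110, "learning_rate": 0.15}
best = staged_search(
    toy_score, initial, ranges, ["neurons", "layers"], ["epochs", "learning_rate"], keep=12
)[0]
```

With `keep=12` this evaluates 200 + 12 × 26 + 12 × 4 configurations instead of the full grid. Because each later phase moves only one parameter, the search can miss optima that require changing several parameters at once; it finds the optimum here because the toy objective is separable.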

A heatmap is shown for each phase, relating the optimized parameter to the mean
validation score.

Some graphics in this notebook are not shown by the default GitHub notebook viewer, because they are drawn
with the [altair](https://github.com/altair-viz/altair) visualization library. The
recommended way to view this notebook is [Visual Studio Code](https://code.visualstudio.com/), which
integrates a [vega](https://github.com/vega/vega) renderer, required by the `altair` library.


## General Configuration

In [11]:
# Enable some logging capabilities
import logging
logging.basicConfig(level='INFO')

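# Silence warnings that are not useful here. PYTHONWARNINGS is read when a
# Python interpreter starts, so it applies to newly started processes rather
# than to the already running kernel.
import os

os.environ["PYTHONWARNINGS"] = "ignore"

# The random seed to use for all the training initializations.
RANDOM_SEED = 0
# The number of subprocesses to use for the parallel training: keep two cores
# free, but always use at least one process (os.cpu_count() may return None).
POOL_SIZE = max(1, (os.cpu_count() or 1) - 2)
# Set to True for a quick test run with an approximate duration of 5 minutes.
# With False the full search runs, which can take hours.
TEST_ENVIRONMENT = False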

print(f"Using {POOL_SIZE=}")


Using POOL_SIZE=30


## Data load

The data is loaded with pandas: every `(train, test)` pair of files (`train_<i>.csv`, `test_<i>.csv`) in the `dataset/` folder is read, one pair per cross-validation partition.

In [12]:
from pathlib import Path
import pandas as pd
import re

# Get all the integer values from the dataset/ file names.
# A raw string avoids the invalid "\d" escape warning.
filenames_integers = (
    int(m[0]) for p in Path("dataset/").iterdir() if (m := re.search(r"\d+", p.name))
)

# The number of partitions to use for the cross validation
PARTITIONS = max(filenames_integers) + 1

datasets = [
    [pd.read_csv(f"dataset/{ftype}_{i}.csv") for ftype in ["train", "test"]]
    for i in range(PARTITIONS)
]

datasets[0][0]


|  | rgb__mean_c0 | rgb__mean_c1 | rgb__mean_c2 | rgb__stdev_c0 | rgb__stdev_c1 | rgb__stdev_c2 | rgb__median_c0 | rgb__median_c1 | rgb__median_c2 | rgb__cov_0 | ... | hsv__median_c0 | hsv__median_c1 | hsv__median_c2 | hsv__cov_0 | hsv__cov_1 | hsv__cov_2 | gray__mean | gray__stdev | gray__median | tag |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | -0.854147 | -0.833795 | -0.853242 | -0.491280 | -0.470497 | -0.446814 | -0.857708 | -0.861224 | -0.898734 | -0.797766 | ... | -0.763448 | -0.130435 | -0.857708 | -0.271764 | 0.263642 | -0.137542 | -0.836924 | -0.431625 | -0.873950 | OTHER |
| 1 | -0.343325 | -0.108892 | -0.585593 | -0.234473 | -0.033132 | -0.202226 | -0.343874 | -0.134694 | -0.662447 | -0.543264 | ... | -0.554119 | 0.209490 | -0.162055 | -0.206723 | 0.207260 | -0.324056 | -0.176654 | -0.044259 | -0.210084 | OTHER |
| 2 | -0.183933 | 0.237009 | -0.409963 | -0.249102 | -0.181247 | -0.203540 | -0.154150 | 0.208163 | -0.417722 | -0.589167 | ... | -0.488982 | 0.058824 | 0.154150 | -0.195829 | 0.157020 | -0.275458 | 0.122746 | -0.146074 | 0.109244 | OTHER |
| 3 | -0.680830 | -0.633464 | -0.654085 | -0.186207 | -0.153143 | -0.158475 | -0.731225 | -0.722449 | -0.755274 | -0.519254 | ... | -0.662069 | -0.517241 | -0.723320 | -0.216011 | 0.194512 | -0.189558 | -0.638210 | -0.098288 | -0.731092 | OTHER |
| 4 | -0.171050 | 0.088423 | -0.500224 | -0.081952 | -0.040828 | -0.226752 | -0.083004 | 0.110204 | -0.468354 | -0.448166 | ... | -0.558090 | 0.119334 | 0.098814 | -0.143581 | 0.115447 | -0.335477 | 0.011585 | -0.014120 | 0.050420 | OTHER |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1642 | -0.182917 | -0.102501 | -0.018490 | -0.866989 | -0.857741 | -0.844239 | -0.169960 | -0.134694 | -0.037975 | -0.968427 | ... | 0.393966 | -0.858407 | -0.114625 | -0.195889 | 0.149306 | 0.003867 | -0.092206 | -0.842001 | -0.117647 | SMOKE |
| 1643 | 0.317514 | -0.588199 | -0.997015 | -0.887188 | -0.997200 | -0.990703 | 0.335968 | -0.583673 | -1.000000 | -0.985231 | ... | -0.900609 | 1.000000 | 0.335968 | -0.196159 | 0.150517 | 0.004518 | -0.402942 | -0.982982 | -0.411765 | SMOKE |
| 1644 | 0.034336 | 0.121826 | 0.175029 | -0.028794 | 0.069586 | 0.016202 | 0.146245 | 0.216327 | 0.291139 | -0.288142 | ... | 0.259561 | -0.817204 | 0.252964 | -0.167841 | -0.187197 | -0.037730 | 0.132040 | 0.121805 | 0.243697 | SMOKE |
| 1645 | 0.247614 | -0.348639 | -0.847863 | -0.923813 | -0.954711 | -0.960968 | 0.256917 | -0.363265 | -0.848101 | -0.980592 | ... | -0.856562 | 0.775148 | 0.256917 | -0.196021 | 0.151759 | 0.005965 | -0.235698 | -0.942772 | -0.252101 | SMOKE |
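The loading cell assumes the files are numbered contiguously from 0, since it takes the largest number found plus one as the partition count. A hedged, self-contained sketch of the same logic follows, with a hypothetical `load_partitions` helper that also reports missing partition numbers, run against a throwaway folder of tiny CSV files.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def load_partitions(folder):
    """Load [train, test] DataFrame pairs from train_<i>.csv / test_<i>.csv files."""
    folder = Path(folder)
    indices = {
        int(m[0]) for p in folder.glob("*.csv") if (m := re.search(r"\d+", p.name))
    }
    partitions = max(indices) + 1
    # Fail early if a partition number is missing (e.g. only *_0 and *_2 exist).
    missing = set(range(partitions)) - indices
    if missing:
        raise FileNotFoundError(f"missing partitions: {sorted(missing)}")
    return [
        [pd.read_csv(folder / f"{ftype}_{i}.csv") for ftype in ("train", "test")]
        for i in range(partitions)
    ]


with tempfile.TemporaryDirectory() as tmp:
    # Two tiny hypothetical partitions with one feature column and the tag.
    for i in range(2):
        for ftype in ("train", "test"):
            pd.DataFrame({"gray__mean": [0.1, -0.2], "tag": ["FIRE", "OTHER"]}).to_csv(
                Path(tmp) / f"{ftype}_{i}.csv", index=False
            )
    datasets = load_partitions(tmp)
```

Writing the CSV files with `index=False` keeps pandas from adding an extra unnamed index column when the files are read back.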


A label encoder is used to transform the tags of the data frames to numbers.

In [13]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
label_encoder.fit(
    [tag for experiment in datasets for df in experiment for tag in df["tag"].unique()]
)

label_encoder.classes_


array(['FIRE', 'OTHER', 'SMOKE'], dtype='<U5')
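`LabelEncoder` assigns integer codes in the sorted order of the class names, which is why `classes_` above is alphabetical: FIRE becomes 0, OTHER 1 and SMOKE 2. A small standalone check of that mapping and of its inverse:

```python
from sklearn.preprocessing import LabelEncoder

# The classes are sorted when fitting, regardless of the order they appear in.
encoder = LabelEncoder().fit(["SMOKE", "OTHER", "FIRE", "OTHER"])
codes = encoder.transform(["FIRE", "SMOKE", "OTHER"])
# inverse_transform turns predicted codes back into tags.
tags = encoder.inverse_transform([2, 0])
```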

A `DataExperiment` is created for each `(train, test)` dataframe pair.

In [14]:
from experimenter.model import DataExperiment


def to_data_experiment(train_test_df) -> DataExperiment:
    train, test = train_test_df

    return DataExperiment(
        train.loc[:, train.columns != "tag"].to_numpy(),
        label_encoder.transform(train["tag"]),
        test.loc[:, test.columns != "tag"].to_numpy(),
        label_encoder.transform(test["tag"]),
    )


all_experiments = [to_data_experiment(pair) for pair in datasets]


## Experiment configuration

In [15]:
from pprint import pprint
from functools import reduce
import numpy as np

from experimenter.model import ExperimentResult

# The number of classes within the dataset
CLASSES = label_encoder.classes_.size
# The number of attributes found in the dataset
ATTRIBUTES = datasets[0][0].columns.size - 1

# A range for each optimizable hyperparameter
NEURONS_RANGE = np.arange(CLASSES, ATTRIBUTES + CLASSES + 1, 1)
LAYERS_RANGE = np.arange(1, 5 + 1, 1)
EPOCHS_RANGE = np.arange(50, 300 + 1, 10)
LEARNING_RATE_RANGE = np.around(np.linspace(0.001, 0.200, 30), 4)
MOMENTUM_RANGE = np.around(np.linspace(0.01, 0.4, 30), 4)

# Number of best parameters to keep for each training phase
BEST_FOREACH_PHASE = 12
BEST_BY_MEAN = 3
BEST_BY_CUSTOM_CRITERIA = BEST_FOREACH_PHASE - BEST_BY_MEAN


# The custom criteria to use to choose the best that go to the next phase
# It prefers low deviations and high precision
TOLERABLE_DEVIATION = 0.4
NEGATIVE_DEVIATION_WEIGHT = -10

def experiment_result_criteria(experiment: ExperimentResult) -> float:
    deviation_factor = 0
    if experiment.stddev > TOLERABLE_DEVIATION:
        deviation_factor = NEGATIVE_DEVIATION_WEIGHT * experiment.stddev

    return experiment.mean + deviation_factor


# Total number of experiments an exhaustive grid search over all the ranges
# would need. This is only a theoretical figure: the staged search below runs
# a small fraction of it.
total_experiments = reduce(
    lambda x, y: x * y,
    map(
        lambda arr: np.size(arr, 0),
        [
            NEURONS_RANGE,
            LAYERS_RANGE,
            EPOCHS_RANGE,
            LEARNING_RATE_RANGE,
            MOMENTUM_RANGE,
        ],
    ),
)
print(f"Complete total: {total_experiments}")


Complete total: 4680000


## Visual configuration

These variables control how the heatmaps are split into side-by-side chunks, so that they do not use too much
vertical space.

In [16]:
NEURON_BIN_SIZE = 20
EPOCH_BIN_SIZE = 600
LEARNING_RATE_BIN_SIZE = 0.300
MOMENTUM_BIN_SIZE = 0.500

# Some results score far below this value, which stretches the color scale
# and hides the differences between the good configurations. The detailed
# heatmaps only show results at or above this threshold.
DETAILED_VIEW_TRESHOLD = 85
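The heatmap cells later split each range into `range.max() // BIN_SIZE + 1` chunks with `np.array_split` and draw one heatmap per chunk, side by side. A hedged sketch of what that produces, assuming the neuron range is 3 to 42 (40 values, consistent with the 200 = 40 × 5 runs logged in the neurons and layers phase):

```python
import numpy as np

NEURON_BIN_SIZE = 20
EPOCH_BIN_SIZE = 600
# Assumed neuron range 3..42 (40 values); epochs as in the configuration cell.
NEURONS_RANGE = np.arange(3, 42 + 1)
EPOCHS_RANGE = np.arange(50, 300 + 1, 10)

# Same formula the heatmap cells use: one chunk per BIN_SIZE of the largest value.
NEURON_BINS = np.array_split(NEURONS_RANGE, NEURONS_RANGE.max() // NEURON_BIN_SIZE + 1)
EPOCHS_BINS = np.array_split(EPOCHS_RANGE, EPOCHS_RANGE.max() // EPOCH_BIN_SIZE + 1)

# First and last value of each chunk.
neuron_chunks = [(int(b[0]), int(b[-1])) for b in NEURON_BINS]
epoch_chunks = [(int(b[0]), int(b[-1])) for b in EPOCHS_BINS]
```

With these settings only the neuron heatmap is actually split (into 3..16, 17..29 and 30..42). The epochs, learning rate and momentum maxima (300, 0.2 and 0.4) are all below their bin sizes, so each of those ranges stays in a single chunk.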

## Experimentation

A `ParallelExperimenter` is created from the `DataExperiment` objects built in the previous step.
This class runs the training of many configurations in parallel.

In [17]:
from experimenter.parallel import ParallelExperimenter
experimenter = ParallelExperimenter(all_experiments, POOL_SIZE, RANDOM_SEED)
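The notebook does not show how `ParallelExperimenter` turns a `ModelParams` into a trained network. As a hedged sketch only, assuming the network is scikit-learn's `MLPClassifier` trained with stochastic gradient descent, one configuration could be evaluated across the cross-validation partitions as below. `ModelParams` here is a hypothetical stand-in with the field names the logs show; `evaluate`, the synthetic data, and reporting the population standard deviation of the accuracy in percent are all assumptions. In the notebook, calls like this would be spread across `POOL_SIZE` worker processes.

```python
import warnings
from dataclasses import dataclass

import numpy as np
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier


@dataclass(frozen=True)
class ModelParams:
    """Hypothetical stand-in for experimenter.model.ModelParams (same field names as the logs)."""

    neurons: int
    layers: int
    epochs: int
    learning_rate: float
    momentum: float


def evaluate(params, folds, seed=0):
    """Train one MLP per (x_train, y_train, x_test, y_test) fold.

    Returns the mean and the population standard deviation of the test
    accuracy, in percent (an assumption about how the notebook scores runs).
    """
    scores = []
    for x_train, y_train, x_test, y_test in folds:
        model = MLPClassifier(
            hidden_layer_sizes=(params.neurons,) * params.layers,  # equal hidden layers
            solver="sgd",  # plain SGD, so `momentum` is used
            learning_rate_init=params.learning_rate,
            momentum=params.momentum,
            max_iter=params.epochs,  # for SGD this is the number of epochs
            random_state=seed,
        )
        with warnings.catch_warnings():
            # Short runs often stop at max_iter before converging.
            warnings.simplefilter("ignore", ConvergenceWarning)
            model.fit(x_train, y_train)
        scores.append(100 * model.score(x_test, y_test))
    return float(np.mean(scores)), float(np.std(scores))


# Synthetic stand-in for the real partitions: 3 stratified folds, 3 classes.
x, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
folds = [
    (x[train], y[train], x[test], y[test])
    for train, test in StratifiedKFold(n_splits=3, shuffle=True, random_state=0).split(x, y)
]
mean, std = evaluate(ModelParams(15, 2, 50, 0.05, 0.2), folds)
```

Note that scikit-learn uses Nesterov momentum by default (`nesterovs_momentum=True`), and that training can also stop before `max_iter` epochs when the loss stops improving (`tol`, `n_iter_no_change`).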


## Test parameters

If the `TEST_ENVIRONMENT` flag is set, only a fraction of each hyperparameter range is explored, so the search finishes much faster.

In [18]:
if TEST_ENVIRONMENT:
    NEURONS_RANGE = NEURONS_RANGE[: NEURONS_RANGE.size // 4]
    LAYERS_RANGE = LAYERS_RANGE[: LAYERS_RANGE.size // 4]
    EPOCHS_RANGE = EPOCHS_RANGE[: EPOCHS_RANGE.size // 5]
    LEARNING_RATE_RANGE = LEARNING_RATE_RANGE[: LEARNING_RATE_RANGE.size // 4]
    MOMENTUM_RANGE = MOMENTUM_RANGE[: MOMENTUM_RANGE.size // 4]


## Initial experimentation

The initial hyperparameter values are set. The first phase varies the neurons and layers and keeps the other parameters at these values:

In [19]:
INITIAL_NEURONS = NEURONS_RANGE[0]
INITIAL_LAYERS = LAYERS_RANGE[0]
INITIAL_EPOCHS = EPOCHS_RANGE[EPOCHS_RANGE.size // 4]
INITIAL_LEARNING_RATE = LEARNING_RATE_RANGE[LEARNING_RATE_RANGE.size // 2]
INITIAL_MOMENTUM = MOMENTUM_RANGE[MOMENTUM_RANGE.size // 2]

pprint(
    {k: v for k, v in locals().items() if k.isupper() and k.startswith("INITIAL")},
    sort_dicts=False,
)

# The number of experiments that will actually run, out of the total shown above.
real_experiments = NEURONS_RANGE.size * LAYERS_RANGE.size + BEST_FOREACH_PHASE * (
    EPOCHS_RANGE.size + LEARNING_RATE_RANGE.size + MOMENTUM_RANGE.size
)
print(f"Real experiments count: {real_experiments}")


{'INITIAL_NEURONS': 3,
 'INITIAL_LAYERS': 1,
 'INITIAL_EPOCHS': 110,
 'INITIAL_LEARNING_RATE': 0.1039,
 'INITIAL_MOMENTUM': 0.2117}
Real experiments count: 1232
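The two counts printed so far can be checked by hand. The exhaustive grid multiplies the sizes of all the ranges, while the staged search runs the full neurons × layers grid once and then `BEST_FOREACH_PHASE` runs per value for each remaining parameter. A sketch of that arithmetic, assuming the neuron range 3 to 42 inferred from the 200 runs of the first phase:

```python
import numpy as np

# Ranges as in the configuration cell; NEURONS_RANGE is assumed to be 3..42.
NEURONS_RANGE = np.arange(3, 42 + 1)
LAYERS_RANGE = np.arange(1, 5 + 1)
EPOCHS_RANGE = np.arange(50, 300 + 1, 10)
LEARNING_RATE_RANGE = np.around(np.linspace(0.001, 0.200, 30), 4)
MOMENTUM_RANGE = np.around(np.linspace(0.01, 0.4, 30), 4)
BEST_FOREACH_PHASE = 12

sizes = [
    r.size
    for r in (NEURONS_RANGE, LAYERS_RANGE, EPOCHS_RANGE, LEARNING_RATE_RANGE, MOMENTUM_RANGE)
]
exhaustive = int(np.prod(sizes))
# Phase 1 is exhaustive over neurons x layers; every later phase tries each
# value of one parameter for each of the BEST_FOREACH_PHASE survivors.
staged = sizes[0] * sizes[1] + BEST_FOREACH_PHASE * sum(sizes[2:])
fraction = staged / exhaustive
```

That is 200 + 12 × (26 + 30 + 30) = 1232 runs, about 0.03% of the 4,680,000 that an exhaustive search would need.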


## Hyperparameter tuning phase

Now the hyperparameters are tuned. The first step is to find the best number of neurons and layers, where
"better" means a higher mean accuracy and a lower standard deviation.

### Neurons and layers tuning

In [20]:
from experimenter.model import ModelParams
import dataclasses

initial_experiment = ModelParams(
    INITIAL_NEURONS,
    layers=INITIAL_LAYERS,
    epochs=INITIAL_EPOCHS,
    learning_rate=INITIAL_LEARNING_RATE,
    momentum=INITIAL_MOMENTUM,
)

print(f'Running {NEURONS_RANGE.size * LAYERS_RANGE.size} experiments')
neurons_result = experimenter.run_all(
    [
        dataclasses.replace(initial_experiment, neurons=neurons, layers=layers)
        for neurons in NEURONS_RANGE
        for layers in LAYERS_RANGE
    ]
)

Running 200 experiments


INFO:experimenter.parallel:0 / 200: Starting...
INFO:experimenter.parallel:1 / 200: ModelParams(neurons=3, layers=1, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:2 / 200: ModelParams(neurons=5, layers=1, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:3 / 200: ModelParams(neurons=5, layers=3, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:4 / 200: ModelParams(neurons=6, layers=2, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:5 / 200: ModelParams(neurons=13, layers=1, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:6 / 200: ModelParams(neurons=11, layers=1, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:7 / 200: ModelParams(neurons=9, layers=1, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:8 / 200: ModelParams(neurons=3, layers=3, epochs=110, learning_rate=0.1039, momentu

In [21]:
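from typing import List


def to_dataframe(experiments: List[ExperimentResult]):
    # One row per experiment: its hyperparameters plus the mean and the
    # standard deviation of the validation score across the partitions.
    return pd.DataFrame(
        [
            {
                **dataclasses.asdict(result.params),
                "result_mean": result.mean,
                "result_std": result.stddev,
            }
            for result in experiments
        ]
    )


def split_best(results):
    # The BEST_BY_MEAN results with the highest mean score...
    ranked_by_mean = sorted(results, key=lambda r: r.mean, reverse=True)
    by_mean = ranked_by_mean[:BEST_BY_MEAN]

    # ...plus the BEST_BY_CUSTOM_CRITERIA best of the remaining results,
    # ranked by the custom criterion that penalizes high deviations.
    by_criteria = sorted(
        ranked_by_mean[BEST_BY_MEAN:], key=experiment_result_criteria, reverse=True
    )[:BEST_BY_CUSTOM_CRITERIA]

    return by_mean, by_criteria


def prepare_to_show_best(results, relevant_columns, transform=lambda x: x):
    # Returns the configurations that go to the next phase, a data frame with
    # all the results, and the "best by mean" and "best by criteria" tables
    # restricted to `relevant_columns`.
    by_mean, by_criteria = split_best(results)
    by_mean_df, by_criteria_df = (
        transform(to_dataframe(best)) for best in (by_mean, by_criteria)
    )
    result_df = transform(to_dataframe(results))

    return (
        by_mean + by_criteria,
        result_df,
        by_mean_df[relevant_columns],
        by_criteria_df[relevant_columns],
    )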


(
    best_by_neurons_layers,
    neurons_result_df,
    neurons_by_mean_df,
    neurons_by_criteria_df,
) = prepare_to_show_best(
    neurons_result, ["result_mean", "result_std", "neurons", "layers"]
)

print('Best by mean')
neurons_by_mean_df

Best by mean


|  | result_mean | result_std | neurons | layers |
|---|---|---|---|---|
| 0 | 93.783573 | 1.203922 | 15 | 3 |
| 1 | 93.734793 | 1.251018 | 42 | 2 |
| 2 | 93.394515 | 1.813784 | 35 | 4 |


In [22]:
print('Best by criteria')
neurons_by_criteria_df

Best by criteria


|  | result_mean | result_std | neurons | layers |
|---|---|---|---|---|
| 0 | 91.161033 | 0.710203 | 24 | 3 |
| 1 | 92.374979 | 0.928341 | 42 | 4 |
| 2 | 93.005693 | 1.016326 | 41 | 3 |
| 3 | 92.035292 | 0.961088 | 40 | 2 |
| 4 | 91.451822 | 0.916722 | 21 | 1 |
| 5 | 91.986157 | 1.065456 | 9 | 2 |
| 6 | 92.181159 | 1.086725 | 40 | 1 |
| 7 | 92.666478 | 1.182289 | 36 | 3 |
| 8 | 92.763447 | 1.203029 | 41 | 5 |
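Every standard deviation in the table above exceeds `TOLERABLE_DEVIATION` (0.4), so for these rows the criterion reduces to `mean - 10 * stddev`. The following standalone sketch re-applies that rule to the nine rows, copied from the table and shuffled, and recovers the same order.

```python
TOLERABLE_DEVIATION = 0.4
NEGATIVE_DEVIATION_WEIGHT = -10


def criterion(mean, stddev):
    # Same rule as experiment_result_criteria, on plain numbers.
    if stddev > TOLERABLE_DEVIATION:
        return mean + NEGATIVE_DEVIATION_WEIGHT * stddev
    return mean


# (result_mean, result_std, neurons, layers) copied from the table, shuffled.
rows = [
    (92.763447, 1.203029, 41, 5),
    (91.451822, 0.916722, 21, 1),
    (93.005693, 1.016326, 41, 3),
    (91.161033, 0.710203, 24, 3),
    (92.181159, 1.086725, 40, 1),
    (92.374979, 0.928341, 42, 4),
    (91.986157, 1.065456, 9, 2),
    (92.666478, 1.182289, 36, 3),
    (92.035292, 0.961088, 40, 2),
]
ranked = sorted(rows, key=lambda r: criterion(r[0], r[1]), reverse=True)
order = [(n, l) for _, _, n, l in ranked]
scores = [round(criterion(m, s), 2) for m, s, _, _ in ranked]
```

For comparison, the best-by-mean configuration (15 neurons, 3 layers) would score 93.78 - 12.04 = 81.74 under this rule, below the 84.06 of the 24-neuron, 3-layer network: the criterion trades about 2.6 points of mean score for a much lower deviation.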


In [23]:
import altair as alt


def heatmap(title, source, tooltip="props", **kwargs):
    # tooltip="props" means: show every column of `source` in the tooltip.
    if tooltip == "props":
        tooltip = source.columns.to_list()

    return (
        alt.Chart(source)
        .mark_rect()
        .encode(tooltip=tooltip, **kwargs)
        .properties(title=title)
        .interactive()
    )


def binify(dataframe, param, bins):
    # Split `dataframe` into one sub-frame per bin of `param` values.
    return [dataframe[dataframe[param].isin(chunk)] for chunk in bins]


def binified_heatmap(title, source, param, bins, tooltip="props", **kwargs):
    # Draw one heatmap per bin, side by side, to save vertical space.
    return alt.hconcat(
        *(
            heatmap(title, chunk_df, tooltip=tooltip, **kwargs)
            for chunk_df in binify(source, param, bins)
        )
    )


neurons_result_encoding = {"x": "layers:O", "y": "neurons:O", "color": "result_mean:Q"}
neurons_result_tooltip = ["result_mean", "result_std", "neurons", "layers"]
NEURON_BINS = np.array_split(NEURONS_RANGE, NEURONS_RANGE.max() // NEURON_BIN_SIZE + 1)


def binified_neurons_heatmap(title, neurons_df):
    return binified_heatmap(
        title,
        neurons_df,
        param="neurons",
        bins=NEURON_BINS,
        tooltip=neurons_result_tooltip,
        **neurons_result_encoding
    )


binified_neurons_heatmap("Mean precision", neurons_result_df)


In [24]:
range_neurons_result_df = neurons_result_df[
    neurons_result_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_neurons_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", range_neurons_result_df
)

### Epochs tuning

In [25]:
print(f"Running {BEST_FOREACH_PHASE * EPOCHS_RANGE.size} experiments")
epochs_results = experimenter.run_all(
    [
        dataclasses.replace(p.params, epochs=epochs)
        for p in best_by_neurons_layers
        for epochs in EPOCHS_RANGE
    ]
)


Running 312 experiments


INFO:experimenter.parallel:0 / 312: Starting...
INFO:experimenter.parallel:1 / 312: ModelParams(neurons=15, layers=3, epochs=50, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:2 / 312: ModelParams(neurons=24, layers=3, epochs=50, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:3 / 312: ModelParams(neurons=15, layers=3, epochs=80, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:4 / 312: ModelParams(neurons=24, layers=3, epochs=80, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:5 / 312: ModelParams(neurons=24, layers=3, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:6 / 312: ModelParams(neurons=42, layers=2, epochs=60, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:7 / 312: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:8 / 312: ModelParams(neurons=24, layers=3, epochs=140, learning_rate=0.1039, moment

In [26]:
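def prepare_reverse_index(source, columns):
    # Map the values of `columns` of each configuration in `source` (the
    # survivors of the previous phase) to its position in that list. Keys are
    # built in `columns` order, the same order used by the lookup below.
    results_reverse_index = {
        tuple(dataclasses.asdict(r.params)[k] for k in columns): i
        for i, r in enumerate(source)
    }

    def modify_dataframe(results_df):
        # Add a src_index column: the previous configuration each row came from.
        results_df["src_index"] = results_df.apply(
            lambda row: results_reverse_index[tuple(row[k] for k in columns)], axis=1
        )
        return results_df

    return modify_dataframe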


(
    best_by_epochs,
    epochs_results_df,
    epochs_by_mean_df,
    epochs_by_criteria_df,
) = prepare_to_show_best(
    epochs_results,
    ["result_mean", "result_std", "neurons", "layers", "epochs", "src_index"],
    transform=prepare_reverse_index(best_by_neurons_layers, ["neurons", "layers"]),
)

print("Best by mean")
epochs_by_mean_df


Best by mean


|  | result_mean | result_std | neurons | layers | epochs | src_index |
|---|---|---|---|---|---|---|
| 0 | 93.783573 | 1.203922 | 15 | 3 | 110 | 0 |
| 1 | 93.734793 | 1.251018 | 42 | 2 | 110 | 1 |
| 2 | 93.686486 | 1.372032 | 15 | 3 | 140 | 0 |
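`prepare_reverse_index` adds a `src_index` column telling which survivor of the previous phase each row was derived from; it is the x axis of the heatmaps from here on. A minimal pandas sketch of the same lookup, using plain dicts instead of `ModelParams` and values taken from the tables above, reproduces the `src_index` column of this table.

```python
import pandas as pd

# First three survivors of the neurons/layers phase, in ranking order (from
# its "Best by mean" table); plain dicts stand in for ModelParams.
previous = [
    {"neurons": 15, "layers": 3},
    {"neurons": 42, "layers": 2},
    {"neurons": 35, "layers": 4},
]
columns = ["neurons", "layers"]

# Key every survivor by its values in `columns` order.
reverse_index = {tuple(p[k] for k in columns): i for i, p in enumerate(previous)}

# The three rows of the epochs "Best by mean" table.
results_df = pd.DataFrame(
    {"neurons": [15, 42, 15], "layers": [3, 2, 3], "epochs": [110, 110, 140]}
)
results_df["src_index"] = [
    reverse_index[key] for key in zip(*(results_df[k] for k in columns))
]
```

The lookup works with the NumPy integers pandas yields because they compare and hash equal to the plain Python integers used as keys.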


In [27]:
print("Best by criteria")
epochs_by_criteria_df

Best by criteria


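|  | result_mean | result_std | neurons | layers | epochs | src_index |
|---|---|---|---|---|---|---|
| 0 | 92.666241 | 0.641486 | 42 | 4 | 120 | 4 |
| 1 | 93.588808 | 0.86546 | 42 | 2 | 100 | 1 |
| 2 | 91.161033 | 0.710203 | 24 | 3 | 110 | 3 |
| 3 | 93.006638 | 0.911077 | 42 | 2 | 160 | 1 |
| 4 | 93.006638 | 0.911077 | 42 | 2 | 170 | 1 |
| 5 | 93.006638 | 0.911077 | 42 | 2 | 180 | 1 |
| 6 | 93.006638 | 0.911077 | 42 | 2 | 190 | 1 |
| 7 | 93.006638 | 0.911077 | 42 | 2 | 200 | 1 |
| 8 | 93.006638 | 0.911077 | 42 | 2 | 210 | 1 |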


In [28]:
epochs_results_encoding = {
    "x": "src_index:O",
    "y": "epochs:O",
    "color": "result_mean:Q",
}
epochs_results_tooltip = ["result_mean", "result_std", "neurons", "layers", "epochs", "src_index"]
EPOCHS_BINS = np.array_split(EPOCHS_RANGE, EPOCHS_RANGE.max() // EPOCH_BIN_SIZE + 1)


def binified_epochs_heatmap(title, epochs_df):
    return binified_heatmap(
        title,
        epochs_df,
        "epochs",
        EPOCHS_BINS,
        epochs_results_tooltip,
        **epochs_results_encoding
    )


binified_epochs_heatmap("Mean precision", epochs_results_df)


In [29]:
range_epoch_results_df = epochs_results_df[
    epochs_results_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_epochs_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", 
    range_epoch_results_df
)


### Learning rate tuning

In [30]:
print(f"Running {BEST_FOREACH_PHASE * LEARNING_RATE_RANGE.size} experiments")

learning_rate_results = experimenter.run_all(
    [
        dataclasses.replace(p.params, learning_rate=learning_rate)
        for p in best_by_epochs
        for learning_rate in LEARNING_RATE_RANGE
    ]
)


Running 360 experiments


INFO:experimenter.parallel:0 / 360: Starting...
INFO:experimenter.parallel:1 / 360: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.1657, momentum=0.2117)
INFO:experimenter.parallel:2 / 360: ModelParams(neurons=15, layers=3, epochs=140, learning_rate=0.1657, momentum=0.2117)
INFO:experimenter.parallel:3 / 360: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.1863, momentum=0.2117)
INFO:experimenter.parallel:4 / 360: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.1451, momentum=0.2117)
INFO:experimenter.parallel:5 / 360: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.0628, momentum=0.2117)
INFO:experimenter.parallel:6 / 360: ModelParams(neurons=15, layers=3, epochs=140, learning_rate=0.1863, momentum=0.2117)
INFO:experimenter.parallel:7 / 360: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.1039, momentum=0.2117)
INFO:experimenter.parallel:8 / 360: ModelParams(neurons=15, layers=3, epochs=110, learning_rate=0.0833, m

In [31]:
(
    best_by_learning_rate,
    learning_rate_results_df,
    learning_rate_by_mean_df,
    learning_rate_by_criteria_df,
) = prepare_to_show_best(
    learning_rate_results,
    [
        "result_mean",
        "result_std",
        "neurons",
        "layers",
        "epochs",
        "learning_rate",
        "src_index",
    ],
    transform=prepare_reverse_index(best_by_epochs, ["neurons", "layers", "epochs"]),
)

print("Best by mean")
learning_rate_by_mean_df


Best by mean


|  | result_mean | result_std | neurons | layers | epochs | learning_rate | src_index |
|---|---|---|---|---|---|---|---|
| 0 | 93.97763 | 0.975881 | 42 | 2 | 110 | 0.1794 | 1 |
| 1 | 93.928968 | 1.030509 | 42 | 2 | 160 | 0.1931 | 6 |
| 2 | 93.928968 | 1.030509 | 42 | 2 | 170 | 0.1931 | 7 |


In [32]:
print("Best by criteria")
learning_rate_by_criteria_df

Best by criteria


|  | result_mean | result_std | neurons | layers | epochs | learning_rate | src_index |
|---|---|---|---|---|---|---|---|
| 0 | 92.034937 | 0.445404 | 24 | 3 | 110 | 0.0422 | 5 |
| 1 | 91.403987 | 0.469086 | 24 | 3 | 110 | 0.049 | 5 |
| 2 | 92.666241 | 0.641486 | 42 | 4 | 120 | 0.1039 | 3 |
| 3 | 92.90896 | 0.678537 | 42 | 4 | 120 | 0.1108 | 3 |
| 4 | 91.93785 | 0.621407 | 24 | 3 | 110 | 0.1245 | 5 |
| 5 | 92.277538 | 0.679347 | 42 | 4 | 120 | 0.1588 | 3 |
| 6 | 92.714903 | 0.736236 | 42 | 2 | 160 | 0.0833 | 6 |
| 7 | 92.714903 | 0.736236 | 42 | 2 | 170 | 0.0833 | 7 |
| 8 | 92.714903 | 0.736236 | 42 | 2 | 180 | 0.0833 | 8 |


In [33]:
learning_rate_results_encoding = {
    "x": "src_index:O",
    "y": "learning_rate:O",
    "color": "result_mean:Q",
}
learning_rate_results_tooltip = [
    "result_mean",
    "result_std",
    "neurons",
    "layers",
    "epochs",
    "learning_rate",
    "src_index",
]
LEARNING_RATE_BINS = np.array_split(
    LEARNING_RATE_RANGE, LEARNING_RATE_RANGE.max() // LEARNING_RATE_BIN_SIZE + 1
)


def binified_learning_rate_heatmap(title, lr_df):
    return binified_heatmap(
        title,
        lr_df,
        "learning_rate",
        LEARNING_RATE_BINS,
        learning_rate_results_tooltip,
        **learning_rate_results_encoding
    )


binified_learning_rate_heatmap("Mean precision", learning_rate_results_df)


In [34]:
range_learning_rate_results_df = learning_rate_results_df[
    learning_rate_results_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_learning_rate_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", range_learning_rate_results_df
)


### Momentum tuning

In [35]:
print(f"Running {BEST_FOREACH_PHASE * MOMENTUM_RANGE.size} experiments")

momentum_results = experimenter.run_all(
    [
        dataclasses.replace(p.params, momentum=momentum)
        for p in best_by_learning_rate
        for momentum in MOMENTUM_RANGE
    ]
)


Running 360 experiments


INFO:experimenter.parallel:0 / 360: Starting...
INFO:experimenter.parallel:1 / 360: ModelParams(neurons=42, layers=2, epochs=160, learning_rate=0.1931, momentum=0.3731)
INFO:experimenter.parallel:2 / 360: ModelParams(neurons=42, layers=2, epochs=170, learning_rate=0.1931, momentum=0.3731)
INFO:experimenter.parallel:3 / 360: ModelParams(neurons=42, layers=2, epochs=110, learning_rate=0.1794, momentum=0.2924)
INFO:experimenter.parallel:4 / 360: ModelParams(neurons=42, layers=2, epochs=110, learning_rate=0.1794, momentum=0.3328)
INFO:experimenter.parallel:5 / 360: ModelParams(neurons=42, layers=2, epochs=160, learning_rate=0.1931, momentum=0.2924)
INFO:experimenter.parallel:6 / 360: ModelParams(neurons=42, layers=2, epochs=110, learning_rate=0.1794, momentum=0.3731)
INFO:experimenter.parallel:7 / 360: ModelParams(neurons=42, layers=2, epochs=170, learning_rate=0.1931, momentum=0.2924)
INFO:experimenter.parallel:8 / 360: ModelParams(neurons=42, layers=2, epochs=170, learning_rate=0.1931, m

## Best results

The best results found are shown below.

In [36]:
(
    best_by_momentum,
    momentum_results_df,
    momentum_by_mean_df,
    momentum_by_criteria_df,
) = prepare_to_show_best(
    momentum_results,
    [
        "result_mean",
        "result_std",
        "neurons",
        "layers",
        "epochs",
        "learning_rate",
        "momentum",
        "src_index",
    ],
    transform=prepare_reverse_index(
        best_by_learning_rate, ["neurons", "layers", "epochs", "learning_rate"]
    ),
)

print("Best by mean")
momentum_by_mean_df


Best by mean


|  | result_mean | result_std | neurons | layers | epochs | learning_rate | momentum | src_index |
|---|---|---|---|---|---|---|---|---|
| 0 | 94.074363 | 1.126455 | 42 | 2 | 110 | 0.1794 | 0.3462 | 0 |
| 1 | 94.026055 | 1.037634 | 42 | 2 | 110 | 0.1794 | 0.2386 | 0 |
| 2 | 94.025819 | 1.061693 | 42 | 2 | 110 | 0.1794 | 0.3597 | 0 |


In [37]:
print("Best by criteria")
momentum_by_criteria_df

Best by criteria


|  | result_mean | result_std | neurons | layers | epochs | learning_rate | momentum | src_index |
|---|---|---|---|---|---|---|---|---|
| 0 | 92.472185 | 0.340815 | 42 | 4 | 120 | 0.1108 | 0.2924 | 6 |
| 1 | 92.326318 | 0.249118 | 42 | 4 | 120 | 0.1108 | 0.279 | 6 |
| 2 | 92.083481 | 0.395406 | 42 | 4 | 120 | 0.1039 | 0.2521 | 5 |
| 3 | 91.986275 | 0.31028 | 42 | 4 | 120 | 0.1588 | 0.3328 | 8 |
| 4 | 93.00652 | 0.467722 | 42 | 4 | 120 | 0.1039 | 0.4 | 5 |
| 5 | 92.180568 | 0.41947 | 42 | 4 | 120 | 0.1108 | 0.2386 | 6 |
| 6 | 93.831762 | 0.588698 | 42 | 2 | 110 | 0.1794 | 0.3731 | 0 |
| 7 | 92.034937 | 0.445404 | 24 | 3 | 110 | 0.0422 | 0.2117 | 3 |
| 8 | 91.452531 | 0.410568 | 24 | 3 | 110 | 0.049 | 0.1714 | 4 |


In [38]:
momentum_results_encoding = {
    "x": "src_index:O",
    "y": "momentum:O",
    "color": "result_mean:Q",
}
momentum_results_tooltip = [
    "result_mean",
    "result_std",
    "neurons",
    "layers",
    "epochs",
    "learning_rate",
    "momentum",
    "src_index",
]
MOMENTUM_BINS = np.array_split(
    MOMENTUM_RANGE, MOMENTUM_RANGE.max() // MOMENTUM_BIN_SIZE + 1
)


def binified_momentum_heatmap(title, momentum_df):
    return binified_heatmap(
        title,
        momentum_df,
        "momentum",
        MOMENTUM_BINS,
        momentum_results_tooltip,
        **momentum_results_encoding
    )


binified_momentum_heatmap("Mean precision", momentum_results_df)


In [39]:
range_momentum_results_df = momentum_results_df[
    momentum_results_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_momentum_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", range_momentum_results_df
)
