# Fire from forest

- Authors:
  - Axel Suárez Polo (@ggzor)
  - Sergio Daniel Cortez Chaves (@SerCor)

This notebook shows the training of a neural network to identify fire and smoke within sub-regions
of an image. Most of this notebook shows the process of adjusting the hyperparameters using an automatic
heuristic search (based on average precision and standard deviation).

The dataset was generated by @SerCor using [tk-tagger](https://github.com/ggzor/tk-tagger), a
tkinter Python application for interactively tagging sub-regions of an image, developed to ease
dataset creation.

[scikit-learn](https://scikit-learn.org/stable/) trains the neural network, but it is called
behind a parallel multiprocessing interface with shared memory, developed to speed up the training process.
The relevant module is `experimenter.parallel`.
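
The internals of `experimenter.parallel` are not shown in this notebook. As a rough illustration of the shared-memory idea only, the following sketch uses the standard library's `multiprocessing.shared_memory` to publish a few floats under a name and read them back through a second handle, which is what a worker process would do instead of receiving a pickled copy. The values and the single-process setup are assumptions made to keep the example small.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical feature values; the real module would share the training arrays.
values = [0.5, -0.25, 1.0]
fmt = f"{len(values)}d"  # three C doubles

owner = shared_memory.SharedMemory(create=True, size=struct.calcsize(fmt))
try:
    # Write the data once into the shared block.
    struct.pack_into(fmt, owner.buf, 0, *values)

    # A worker would attach by name; here a second handle in the same
    # process stands in for it.
    worker_view = shared_memory.SharedMemory(name=owner.name)
    try:
        read_back = list(struct.unpack_from(fmt, worker_view.buf, 0))
    finally:
        worker_view.close()
finally:
    owner.close()
    owner.unlink()  # release the block once nobody needs it

print(read_back)
```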

This notebook goes step-by-step:
  1. Configure logging and some global variables.
  2. Load the dataset using [pandas](https://pandas.pydata.org/).
  3. Configure the experiment ranges.
  4. Do hyperparameter search:
       - Optimal number of neurons and layers.
       - Optimal number of epochs.
       - Optimal value for the learning rate.
       - Optimal value for the momentum.

Each step of the hyperparameter search takes the best `BEST_FOREACH_PHASE` results (15 in this notebook)
from the previous phase and searches again over the range specified in the configuration.
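
The idea of a phased search can be sketched on a toy problem. Everything here is hypothetical: `ToyParams`, `toy_score` and the small ranges stand in for the real parameters, the cross-validated score and the configured ranges, and only two phases are shown.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class ToyParams:
    neurons: int
    layers: int
    epochs: int


def toy_score(p: ToyParams) -> float:
    # Hypothetical stand-in for the mean validation score; it peaks at
    # neurons=8, layers=2, epochs=200.
    return -abs(p.neurons - 8) - 2 * abs(p.layers - 2) - abs(p.epochs - 200) / 50


def phased_search(keep: int) -> list:
    neurons_range = range(3, 13)
    layers_range = range(1, 4)
    epochs_range = range(50, 301, 50)

    # Phase 1: full grid over neurons x layers, epochs fixed to an initial value.
    candidates = [
        ToyParams(n, l, epochs_range[0])
        for n, l in product(neurons_range, layers_range)
    ]
    best = sorted(candidates, key=toy_score, reverse=True)[:keep]

    # Phase 2: only the survivors are re-evaluated over the epochs range.
    candidates = [replace(p, epochs=e) for p in best for e in epochs_range]
    return sorted(candidates, key=toy_score, reverse=True)[:keep]


best = phased_search(keep=3)
print(best[0])
```

With these toy ranges the phased search evaluates 10 × 3 + 3 × 6 = 48 configurations instead of the 180 in the full grid.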

A heatmap is shown for each experiment, showing the relation of the optimized parameter and the mean 
validation score.

Some charts in this notebook are not shown in the default GitHub notebook viewer because the
notebook uses the [altair](https://github.com/altair-viz/altair) visualization library. The
recommended way to view this notebook is with [Visual Studio Code](https://code.visualstudio.com/), which
integrates the [vega](https://github.com/vega/vega) renderer required by the `altair` library.


## General Configuration

In [1]:
# Enable some logging capabilities
import logging

logging.basicConfig(filename="notebook.ipynb.log", level="WARNING")

# Disable warnings that are not useful here
import os

os.environ["PYTHONWARNINGS"] = "ignore"

# The random seed to use for all the training initializations.
RANDOM_SEED = 0
# The number of subprocesses to use for the parallel training.
POOL_SIZE = os.cpu_count()
# Set to `True` to do a test run with an approximate duration of 5 minutes.
# Setting this value to `False` runs the full search, which can take hours.
TEST_ENVIRONMENT = False

print(f"Using {POOL_SIZE=}")


Using POOL_SIZE=8


## Data load

The data is loaded with pandas: all the `(train, test)` pairs are read from the `dataset/` folder.

In [2]:
from pathlib import Path
import pandas as pd
import re

# Get all the integer values from the dataset/ files.
filenames_integers = (
    int(m[0]) for p in Path("dataset/").iterdir() if (m := re.search(r"\d+", str(p)))
)

# The number of partitions to use for the cross validation
PARTITIONS = max(filenames_integers) + 1

datasets = [
    [pd.read_csv(f"dataset/{ftype}_{i}.csv") for ftype in ["train", "test"]]
    for i in range(PARTITIONS)
]

datasets[0][0]


Unnamed: 0,rgb__mean_c0,rgb__mean_c1,rgb__mean_c2,rgb__stdev_c0,rgb__stdev_c1,rgb__stdev_c2,rgb__median_c0,rgb__median_c1,rgb__median_c2,rgb__cov_0,...,hsv__cov_0,hsv__cov_1,hsv__cov_2,hsv__cov_3,hsv__cov_4,hsv__cov_5,gray__mean,gray__stdev,gray__median,tag
0,-0.343603,-0.399791,-0.483646,0.155106,0.143023,0.015099,-0.407115,-0.510204,-0.578059,-0.108204,...,-0.181116,0.032211,-0.181116,-0.388510,0.032211,-0.388510,-0.380955,0.216305,-0.480820,OTHER
1,-0.148882,-0.023510,-0.462608,-0.170048,-0.102284,-0.205109,-0.146245,-0.036735,-0.527426,-0.496538,...,-0.180401,0.149459,-0.180401,-0.204963,0.149459,-0.204963,-0.063179,-0.071409,-0.071546,OTHER
2,-0.384243,-0.314732,-0.390246,-0.084516,-0.023385,-0.078241,-0.470356,-0.436735,-0.485232,-0.386402,...,-0.284572,0.137453,-0.284572,-0.140462,0.137453,-0.140462,-0.321888,0.026419,-0.441796,OTHER
3,-0.348830,0.000724,-0.632997,-0.155131,0.059389,-0.243767,-0.351779,-0.036735,-0.696203,-0.439717,...,-0.190342,0.176021,-0.190342,-0.328477,0.176021,-0.328477,-0.103680,0.039713,-0.128140,OTHER
4,-0.425188,-0.155956,-0.593988,-0.368570,-0.152293,-0.440119,-0.375494,-0.151020,-0.603376,-0.656547,...,-0.172915,0.170774,-0.172915,-0.161318,0.170774,-0.161318,-0.231420,-0.183313,-0.212932,OTHER
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1529,0.348918,-0.459141,-0.850878,-0.960172,-0.977257,-0.980148,0.351779,-0.469388,-0.848101,-0.990671,...,-0.196171,0.151483,-0.196171,0.004657,0.151483,0.004657,-0.293153,-0.976295,-0.300791,SMOKE
1530,0.711822,-0.237385,-0.887499,-0.167512,-0.382122,-0.799823,0.889328,-0.175510,-0.898734,-0.647891,...,-0.194703,0.181633,-0.194703,0.004211,0.181633,0.004211,-0.052989,-0.336232,0.031360,SMOKE
1531,1.000000,0.173099,-0.926741,-0.983273,-0.672096,-0.856934,0.992095,0.142857,-0.940928,-0.991329,...,-0.198950,0.151451,-0.198950,0.004869,0.151451,0.004869,0.306354,-0.748378,0.284923,SMOKE
1532,0.212767,-0.332275,-0.954472,-0.583690,-0.683132,-0.841309,0.177866,-0.379592,-0.966245,-0.900064,...,-0.196516,0.154068,-0.196516,-0.010151,0.154068,-0.010151,-0.240067,-0.658883,-0.285511,SMOKE
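
The partition count in the cell above is the highest integer found in the file names plus one, which assumes the files are numbered from 0 without gaps. A minimal sketch on hypothetical file names (the real folder contents are not listed here):

```python
import re

# Hypothetical directory listing following the train_<i>.csv / test_<i>.csv pattern.
names = [
    "train_0.csv", "test_0.csv",
    "train_1.csv", "test_1.csv",
    "train_2.csv", "test_2.csv",
]

# Extract the first run of digits from each name, skipping names without one.
indices = [int(m[0]) for name in names if (m := re.search(r"\d+", name))]
partitions = max(indices) + 1

pairs = [(f"train_{i}.csv", f"test_{i}.csv") for i in range(partitions)]
print(partitions, pairs[0])
```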


A label encoder is used to transform the tags of the data frames to numbers.

In [3]:
from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
label_encoder.fit(
    [tag for experiment in datasets for df in experiment for tag in df["tag"].unique()]
)

label_encoder.classes_


array(['FIRE', 'OTHER', 'SMOKE'], dtype='<U5')
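
`LabelEncoder` stores the distinct labels in sorted order and maps each label to its position in that list, which is why `FIRE`, `OTHER` and `SMOKE` become 0, 1 and 2. The following pure-Python sketch mimics that mapping for illustration; it is not scikit-learn's implementation, and the sample tags are made up.

```python
# Made-up tags in arbitrary order.
tags = ["OTHER", "SMOKE", "FIRE", "OTHER", "FIRE"]

# Sorted distinct labels play the role of `classes_`.
classes = sorted(set(tags))
to_index = {label: i for i, label in enumerate(classes)}

encoded = [to_index[t] for t in tags]    # like transform
decoded = [classes[i] for i in encoded]  # like inverse_transform

print(classes)
print(encoded)
```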

A `DataExperiment` is created for each `(train, test)` dataframe pair.

In [4]:
from experimenter.model import DataExperiment


def to_data_experiment(train_test_df) -> DataExperiment:
    train, test = train_test_df

    return DataExperiment(
        train.loc[:, train.columns != "tag"].to_numpy(),
        label_encoder.transform(train["tag"]),
        test.loc[:, test.columns != "tag"].to_numpy(),
        label_encoder.transform(test["tag"]),
    )


all_experiments = [to_data_experiment(pair) for pair in datasets]


## Experiment configuration

In [5]:
from pprint import pprint
from functools import reduce
import numpy as np

# The number of classes within the dataset
CLASSES = label_encoder.classes_.size
# The number of attributes found in the dataset
ATTRIBUTES = datasets[0][0].columns.size - 1

# Number of best parameters to keep for each training phase
BEST_FOREACH_PHASE = 15

# A range for each optimizable hyperparameter
NEURONS_RANGE = np.arange(CLASSES, ATTRIBUTES + CLASSES + 1, 1)
LAYERS_RANGE = np.arange(1, 7 + 1, 1)
EPOCHS_RANGE = np.arange(50, 500 + 1, 15)
LEARNING_RATE_RANGE = np.around(np.linspace(0.001, 0.200, 50), 4)
MOMENTUM_RANGE = np.around(np.linspace(0.01, 0.4, 30), 4)

# Total number of hyperparameter combinations.
# Note that this is the theoretical number of experiments an exhaustive
# search over every combination would have to run.
total_experiments = reduce(
    lambda x, y: x * y,
    map(
        lambda arr: np.size(arr, 0),
        [
            NEURONS_RANGE,
            LAYERS_RANGE,
            EPOCHS_RANGE,
            LEARNING_RATE_RANGE,
            MOMENTUM_RANGE
        ],
    ),
)
print(f"Complete total: {total_experiments}")

Complete total: 17902500


## Visual configuration

These variables are used to show the results in the heatmaps without using too much
vertical space.

In [6]:
NEURON_BIN_SIZE = 20
EPOCH_BIN_SIZE = 250
LEARNING_RATE_BIN_SIZE = 0.100
MOMENTUM_BIN_SIZE = 0.200

# Some results have an accuracy lower than this number, which stretches the
# color scale and hides detail, so this threshold is used for a detailed view.
DETAILED_VIEW_TRESHOLD = 85
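
Later cells turn each bin size into a number of chunks with `np.array_split(values, values.max() // BIN_SIZE + 1)`, so the bin size controls how many side-by-side heatmaps are drawn rather than the exact number of values per chunk. The sketch below reproduces that splitting in plain Python for the epochs range; `split_evenly` is a hypothetical helper that follows the same rule as `np.array_split` (chunk sizes differ by at most one, larger chunks first).

```python
def split_evenly(values, sections):
    # Same rule as np.array_split: sizes differ by at most one,
    # and the larger chunks come first.
    size, extras = divmod(len(values), sections)
    chunks, start = [], 0
    for i in range(sections):
        end = start + size + (1 if i < extras else 0)
        chunks.append(values[start:end])
        start = end
    return chunks


epochs = list(range(50, 500 + 1, 15))  # same values as EPOCHS_RANGE
epoch_bin_size = 250                   # same value as EPOCH_BIN_SIZE

sections = max(epochs) // epoch_bin_size + 1
bins = split_evenly(epochs, sections)

print(sections, [len(b) for b in bins])
print(bins[0][0], bins[0][-1])
```

Here 31 epoch values end up in three chunks of 11, 10 and 10 values, one heatmap per chunk.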

## Experimentation

A `ParallelExperimenter` is created from the `DataExperiment` instances built in the previous step.
This class allows the training to run in parallel.

In [7]:
from experimenter.parallel import ParallelExperimenter
experimenter = ParallelExperimenter(all_experiments, POOL_SIZE, RANDOM_SEED)

## Test parameters

If the `TEST_ENVIRONMENT` flag is set, only a fraction of the hyperparameter tuning phases will be run.

In [8]:
if TEST_ENVIRONMENT:
    NEURONS_RANGE = NEURONS_RANGE[: NEURONS_RANGE.size // 4]
    LAYERS_RANGE = LAYERS_RANGE[: LAYERS_RANGE.size // 4]
    EPOCHS_RANGE = EPOCHS_RANGE[: EPOCHS_RANGE.size // 5]
    LEARNING_RATE_RANGE = LEARNING_RATE_RANGE[: LEARNING_RATE_RANGE.size // 4]
    MOMENTUM_RANGE = MOMENTUM_RANGE[: MOMENTUM_RANGE.size // 4]


## Initial experimentation

The initial experimentation values are set:

In [9]:
INITIAL_NEURONS = NEURONS_RANGE[0]
INITIAL_LAYERS = LAYERS_RANGE[0]
INITIAL_EPOCHS = EPOCHS_RANGE[EPOCHS_RANGE.size // 4]
INITIAL_LEARNING_RATE = LEARNING_RATE_RANGE[LEARNING_RATE_RANGE.size // 2]
INITIAL_MOMENTUM = MOMENTUM_RANGE[MOMENTUM_RANGE.size // 2]

pprint(
    {k: v for k, v in locals().items() if k.isupper() and k.startswith("INITIAL")},
    sort_dicts=False,
)

# This is the real number of experiments that will be run, out of the total shown above.
real_experiments = NEURONS_RANGE.size * LAYERS_RANGE.size + BEST_FOREACH_PHASE * (
    EPOCHS_RANGE.size + LEARNING_RATE_RANGE.size + MOMENTUM_RANGE.size
)
print(f"Real experiments count: {real_experiments}")


{'INITIAL_NEURONS': 3,
 'INITIAL_LAYERS': 1,
 'INITIAL_EPOCHS': 155,
 'INITIAL_LEARNING_RATE': 0.1025,
 'INITIAL_MOMENTUM': 0.2117}
Real experiments count: 2050
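
The two totals can be checked by hand. The range sizes below follow from the configuration cell; the neurons range size of 55 is inferred from the printed total (17902500 / (7 × 31 × 50 × 30) = 55), not read from the dataset.

```python
from math import prod

# Sizes of the search ranges; 55 for neurons is inferred from the printed total.
sizes = {"neurons": 55, "layers": 7, "epochs": 31, "learning_rate": 50, "momentum": 30}
best_foreach_phase = 15

# Exhaustive grid: every combination of every range.
full_grid = prod(sizes.values())

# Phased search: one neurons x layers grid, then three phases that each
# re-evaluate the kept candidates over a single range.
phased = sizes["neurons"] * sizes["layers"] + best_foreach_phase * (
    sizes["epochs"] + sizes["learning_rate"] + sizes["momentum"]
)

print(full_grid, phased)
```

The phased search therefore runs 385 + 15 × 111 = 2050 experiments, roughly 0.01% of the full grid.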


## Hyperparameter tuning phase

Now the hyperparameters are tuned. The first step is to find the best number of neurons and layers, where
a result is "better" when its accuracy is higher and its standard deviation is lower.

### Neurons and layers tuning

In [None]:
from experimenter.model import ModelParams
import dataclasses

initial_experiment = ModelParams(
    INITIAL_NEURONS,
    layers=INITIAL_LAYERS,
    epochs=INITIAL_EPOCHS,
    learning_rate=INITIAL_LEARNING_RATE,
    momentum=INITIAL_MOMENTUM,
)

print(f'Running {NEURONS_RANGE.size * LAYERS_RANGE.size} experiments')
neurons_result = experimenter.run_all(
    [
        dataclasses.replace(initial_experiment, neurons=neurons, layers=layers)
        for neurons in NEURONS_RANGE
        for layers in LAYERS_RANGE
    ]
)

In [None]:
from experimenter.model import ExperimentResult
from typing import List

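# Standard deviations up to this value are not penalized.
TOLERABLE_DEVIATION = 0.4
# Multiplier applied to the standard deviation once it exceeds the tolerance.
NEGATIVE_DEVIATION_WEIGHT = -10


def experiment_result_criteria(experiment: ExperimentResult) -> float:
    """Score a result by its mean, penalized when the deviation is too high."""
    deviation_factor = 0
    if experiment.stddev > TOLERABLE_DEVIATION:
        deviation_factor = NEGATIVE_DEVIATION_WEIGHT * experiment.stddev

    return experiment.mean + deviation_factor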


def to_dataframe(experiments: List[ExperimentResult]):
    return pd.DataFrame(
        [
            {
                **dataclasses.asdict(result.params),
                "result_mean": result.mean,
                "result_std": result.stddev,
            }
            for result in experiments
        ]
    )


neurons_result.sort(key=experiment_result_criteria, reverse=True)
neurons_result_df = to_dataframe(neurons_result)

neurons_result_df.head(BEST_FOREACH_PHASE)[["result_mean", "result_std", "neurons", "layers"]]
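
To see how this criterion ranks results, consider the following standalone sketch. `ToyResult` is a hypothetical stand-in for `ExperimentResult` with only the two fields the criterion reads, and the numbers are invented, assuming the mean is expressed as a percentage.

```python
from dataclasses import dataclass

TOLERABLE_DEVIATION = 0.4
NEGATIVE_DEVIATION_WEIGHT = -10


@dataclass
class ToyResult:
    mean: float
    stddev: float


def criteria(result: ToyResult) -> float:
    # Same rule as experiment_result_criteria: penalize only above the tolerance.
    penalty = 0.0
    if result.stddev > TOLERABLE_DEVIATION:
        penalty = NEGATIVE_DEVIATION_WEIGHT * result.stddev
    return result.mean + penalty


stable = ToyResult(mean=90.0, stddev=0.25)
noisy = ToyResult(mean=92.0, stddev=0.5)

ranked = sorted([noisy, stable], key=criteria, reverse=True)
print(criteria(stable), criteria(noisy))
```

The noisier result has the higher mean, yet it scores 87.0 and ranks below the stable one at 90.0.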


In [None]:
import altair as alt


def heatmap(title, source, tooltip="props", **kwargs):
    if tooltip == "props":
        tooltip = source.columns.to_list()

    return (
        alt.Chart(source)
        .mark_rect()
        .encode(tooltip=tooltip, **kwargs)
        .properties(title=title)
        .interactive()
    )


def binify(dataframe, param, bins):
    return [dataframe[dataframe[param].isin(bin)] for bin in bins]


def binified_heatmap(title, source, param, bins, tooltip="props", **kwargs):
    return alt.hconcat(
        *(
            heatmap(title, source[source[param].isin(bin)], tooltip=tooltip, **kwargs)
            for bin in bins
        )
    )


neurons_result_encoding = {"x": "layers:O", "y": "neurons:O", "color": "result_mean:Q"}
neurons_result_tooltip = ["result_mean", "result_std", "neurons", "layers"]
NEURON_BINS = np.array_split(NEURONS_RANGE, NEURONS_RANGE.max() // NEURON_BIN_SIZE + 1)


def binified_neurons_heatmap(title, neurons_df):
    return binified_heatmap(
        title,
        neurons_df,
        param="neurons",
        bins=NEURON_BINS,
        tooltip=neurons_result_tooltip,
        **neurons_result_encoding
    )


binified_neurons_heatmap("Mean precision", neurons_result_df)


In [None]:
range_neurons_result_df = neurons_result_df[
    neurons_result_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_neurons_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", range_neurons_result_df
)


### Epochs tuning

In [None]:
best_by_neurons_layers = list(n.params for n in neurons_result[:BEST_FOREACH_PHASE])
print(f'Running {BEST_FOREACH_PHASE * EPOCHS_RANGE.size} experiments')
epochs_results = experimenter.run_all(
    dataclasses.replace(p, epochs=epochs)
    for p in best_by_neurons_layers
    for epochs in EPOCHS_RANGE
)


In [None]:
epochs_results.sort(key=experiment_result_criteria, reverse=True)
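# Map each (neurons, layers) pair to its rank in the previous phase, so every
# row can be traced back to the configuration it was derived from.
epochs_results_reverse_index = {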
    (r.neurons, r.layers): i for i, r in enumerate(best_by_neurons_layers)
}

epochs_results_df = to_dataframe(epochs_results)

epochs_results_df["src_index"] = epochs_results_df.apply(
    lambda row: epochs_results_reverse_index[row["neurons"], row["layers"]], axis=1
)

epochs_results_df.head(BEST_FOREACH_PHASE)[
    ["result_mean", "result_std", "neurons", "layers", "epochs", "src_index"]
]


In [None]:
epochs_results_encoding = {
    "x": "src_index:O",
    "y": "epochs:O",
    "color": "result_mean:Q",
}
epochs_results_tooltip = ["result_mean", "result_std", "neurons", "layers", "epochs", "src_index"]
EPOCHS_BINS = np.array_split(EPOCHS_RANGE, EPOCHS_RANGE.max() // EPOCH_BIN_SIZE + 1)


def binified_epochs_heatmap(title, epochs_df):
    return binified_heatmap(
        title,
        epochs_df,
        "epochs",
        EPOCHS_BINS,
        epochs_results_tooltip,
        **epochs_results_encoding
    )


binified_epochs_heatmap("Mean precision", epochs_results_df)


In [None]:
range_epoch_results_df = epochs_results_df[
    epochs_results_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_epochs_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", 
    range_epoch_results_df
)


### Learning rate tuning

In [None]:
best_by_epochs = list(r.params for r in epochs_results[:BEST_FOREACH_PHASE])

print(f'Running {BEST_FOREACH_PHASE * LEARNING_RATE_RANGE.size} experiments')

learning_rate_results = experimenter.run_all(
    dataclasses.replace(p, learning_rate=learning_rate)
    for p in best_by_epochs
    for learning_rate in LEARNING_RATE_RANGE
)

In [None]:
learning_rate_results.sort(key=experiment_result_criteria, reverse=True)
learning_rate_results_reverse_index = {
    (r.neurons, r.layers, r.epochs): i for i, r in enumerate(best_by_epochs)
}

learning_rate_results_df = to_dataframe(learning_rate_results)

learning_rate_results_df["src_index"] = learning_rate_results_df.apply(
    lambda row: learning_rate_results_reverse_index[
        row["neurons"], row["layers"], row["epochs"]
    ],
    axis=1,
)

learning_rate_results_df.head(BEST_FOREACH_PHASE)[
    [
        "result_mean",
        "result_std",
        "neurons",
        "layers",
        "epochs",
        "learning_rate",
        "src_index",
    ]
]


In [None]:
learning_rate_results_encoding = {
    "x": "src_index:O",
    "y": "learning_rate:O",
    "color": "result_mean:Q",
}
learning_rate_results_tooltip = [
    "result_mean",
    "result_std",
    "neurons",
    "layers",
    "epochs",
    "learning_rate",
    "src_index",
]
LEARNING_RATE_BINS = np.array_split(
    LEARNING_RATE_RANGE, LEARNING_RATE_RANGE.max() // LEARNING_RATE_BIN_SIZE + 1
)


def binified_learning_rate_heatmap(title, lr_df):
    return binified_heatmap(
        title,
        lr_df,
        "learning_rate",
        LEARNING_RATE_BINS,
        learning_rate_results_tooltip,
        **learning_rate_results_encoding
    )


binified_learning_rate_heatmap("Mean precision", learning_rate_results_df)


In [None]:
range_learning_rate_results_df = learning_rate_results_df[
    learning_rate_results_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_learning_rate_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", range_learning_rate_results_df
)


### Momentum tuning

In [None]:
best_by_learning_rate = list(r.params for r in learning_rate_results[:BEST_FOREACH_PHASE])

print(f'Running {BEST_FOREACH_PHASE * MOMENTUM_RANGE.size} experiments')

momentum_results = experimenter.run_all(
    dataclasses.replace(p, momentum=momentum)
    for p in best_by_learning_rate
    for momentum in MOMENTUM_RANGE
)

## Best results

The best results found are shown below.

In [None]:
momentum_results.sort(key=experiment_result_criteria, reverse=True)
momentum_results_reverse_index = {
    (r.neurons, r.layers, r.epochs, r.learning_rate): i for i, r in enumerate(best_by_learning_rate)
}

momentum_results_df = to_dataframe(momentum_results)

momentum_results_df["src_index"] = momentum_results_df.apply(
    lambda row: momentum_results_reverse_index[
        row["neurons"], row["layers"], row["epochs"], row["learning_rate"]
    ],
    axis=1,
)

momentum_results_df.head(BEST_FOREACH_PHASE)[
    [
        "result_mean",
        "result_std",
        "neurons",
        "layers",
        "epochs",
        "learning_rate",
        "momentum",
        "src_index",
    ]
]

In [None]:
momentum_results_encoding = {
    "x": "src_index:O",
    "y": "momentum:O",
    "color": "result_mean:Q",
}
momentum_results_tooltip = [
    "result_mean",
    "result_std",
    "neurons",
    "layers",
    "epochs",
    "learning_rate",
    "momentum",
    "src_index",
]
MOMENTUM_BINS = np.array_split(
    MOMENTUM_RANGE, MOMENTUM_RANGE.max() // MOMENTUM_BIN_SIZE + 1
)


def binified_momentum_heatmap(title, momentum_df):
    return binified_heatmap(
        title,
        momentum_df,
        "momentum",
        MOMENTUM_BINS,
        momentum_results_tooltip,
        **momentum_results_encoding
    )


binified_momentum_heatmap("Mean precision", momentum_results_df)


In [None]:
range_momentum_results_df = momentum_results_df[
    momentum_results_df.result_mean >= DETAILED_VIEW_TRESHOLD
]

binified_momentum_heatmap(
    f"Mean precision (>= {DETAILED_VIEW_TRESHOLD})", range_momentum_results_df
)
