Welcome to Toupee

"The ugly thing on top that covers up what's missing" A library for Deep Learning ensembles, with a tooolkit for running experiments, based on Keras.

Usage:

Experiments are described in a common YAML format, and each network structure is in serialised Keras format.

Results can optionally be saved to MongoDB for later analysis.

In bin/ you will find 3 files:

  • mlp.py: takes an experiment description and runs it as a single network. Ignores all ensemble directives.

  • ensemble.py: takes an experiment description and runs it as an ensemble.

  • distilled_ensemble.py: takes an experiment description and runs it as an ensemble, and then distils the ensemble into a single network.

In examples/ there are a few ready-cooked models that you can look at.

Quick-start

  • Install Keras from the fork
  • Clone this repo
  • In examples/ there are a few working examples of experiments:
    • Download the needed dataset from here and save it to the correct location (or change the location in the example)
  • Run bin/mlp.py for single-network experiments, or bin/ensemble.py for ensemble experiments

Datasets

Datasets are saved in the .npz format, with three files in a directory:

  • train.npz: the training set
  • valid.npz: the validation set
  • test.npz: the test set

Each of these files is a serialised dictionary {x: numpy.array, y: numpy.array}, where x is the input data and y is the expected classification output.
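
As a sketch of the expected layout, the snippet below writes a compatible dataset directory with numpy, using the dataset path from the example experiment file further down. It assumes the dictionary keys are literally x and y, that files written by numpy.savez match the expected serialisation, and that y holds one-hot labels (this README does not specify the label encoding):

import numpy as np

# Hypothetical toy data: 100 samples of 784 features, 10 classes.
# Real shapes depend on your model; only the x/y keys and the three
# file names are prescribed by the format described above.
x = np.random.rand(100, 784).astype("float32")
y = np.eye(10)[np.random.randint(0, 10, size=100)]  # one-hot labels (assumption)

for split in ("train", "valid", "test"):
    np.savez("/local/mnist_th/%s.npz" % split, x=x, y=y)

# Sanity check: each file loads back as a mapping with x and y arrays.
data = np.load("/local/mnist_th/train.npz")
print(data["x"].shape, data["y"].shape)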

Experiment files

This is the file given as an argument to mlp.py, ensemble.py or distilled_ensemble.py. It is a YAML description of the experiment. Here is an example experiment file:

---
## MLP Parameters ##
dataset: /local/mnist_th/
pickled: false
model_file: mnist.model
optimizer:
  class_name: WAME
  config:
    lr: 0.001
    decay: 1e-4
n_epochs: 100 #max number of training epochs
batch_size: 128
cost_function: categorical_crossentropy
shuffle_dataset: true

## Ensemble Parameters ##
ensemble_size: 10
method: !AdaBoostM1 { }
resample_size: 60000

The parameters are as follows:

network parameters

  • dataset: the location of the dataset. If in "pickle" format, this is a file; if in "npz" format, this is a directory.
  • pickled: true if the dataset is in "pickle" format, false if "npz". Default is false.
  • model_file: the location of the serialised Keras model description.
  • optimizer: the SGD optimization method. See separate section for description.
  • n_epochs: the maximum number of training epochs.
  • batch_size: the number of samples to use at each iteration.
  • cost_function: the cost function to use. Any string accepted by Keras works.
  • shuffle_dataset: whether to shuffle the dataset at each epoch.

ensemble parameters

  • ensemble_size: the number of ensemble members to create.
  • method: a class describing the ensemble method. See separate section for description.
  • resample_size: if the ensemble method uses resampling, this is the size of the set to be resampled at each round.

optimizer subparameters

  • class_name: a string that Keras can deserialise to a learning algorithm. Note that our Keras fork includes WAME, presented at ESANN.
  • config:
    • lr: either a float for a fixed learning rate, or a dictionary of (epoch, rate) pairs (see the sketch after this list)
    • decay: the learning rate decay
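
For illustration, a scheduled learning rate might look like the sketch below. Writing the (epoch, rate) pairs as a YAML mapping from epoch number to rate is an assumption, not something this README confirms:

optimizer:
  class_name: WAME
  config:
    lr:          # hypothetical schedule: rate changes at epochs 0, 50 and 80
      0: 0.001
      50: 0.0005
      80: 0.0001
    decay: 1e-4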

ensemble methods

  • Bagging: Bagging
  • AdaBoostM1: AdaBoost.M1
  • DIB: Deep Incremental Boosting. Parameters are as follows.
    • n_epochs_after_first: The number of epochs for which to train from the second round onwards
    • freeze_old_layers: true if the layers transferred to the next round are to be frozen (made not trainable)
    • incremental_index: the location where the new layers are to be inserted
    • incremental_layers: a serialised YAML description of the layers to be added at each round (see the sketch after this list)
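
Putting these parameters together, a DIB configuration might look like the following sketch. The !DIB tag mirrors the !AdaBoostM1 example above; the parameter values and the incremental_layers value are hypothetical placeholders, since this README does not show the exact layer-serialisation format:

ensemble_size: 10
method: !DIB
  n_epochs_after_first: 10    # shorter training rounds after the first
  freeze_old_layers: true     # transferred layers are made not trainable
  incremental_index: 4        # where the new layers are inserted (placeholder)
  incremental_layers: dib_layers.yaml  # placeholder: serialised layers added each round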

Model files

These are standard Keras models, serialised to YAML. Effectively, this is the verbatim output of a model's to_yaml().
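
As a sketch, such a file could be produced as follows, assuming the Keras fork exposes the standard to_yaml() serialiser; the architecture below is a placeholder:

import keras
from keras.layers import Dense

# Hypothetical model: substitute whatever architecture your experiment needs.
model = keras.models.Sequential([
    Dense(128, activation="relu", input_shape=(784,)),
    Dense(10, activation="softmax"),
])

# Write the serialised description to the path named by model_file
# in the experiment file (mnist.model in the example above).
with open("mnist.model", "w") as f:
    f.write(model.to_yaml())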
