# Stacked Autoencoders
by [Mehdi Mirza](http://www-etud.iro.umontreal.ca/~mirzamom/)

## Introduction
This notebook will show you how to perform layer-wise pre-training using denoising autoencoders (DAEs), and subsequently stack the layers to form a multilayer perceptron (MLP) which can be fine-tuned using supervised training. You can also look at this [more detailed tutorial of training DAEs using Theano](http://deeplearning.net/tutorial/dA.html#daa) as well as [this tutorial](http://deeplearning.net/tutorial/SdA.html#sda) which covers the stacked version.

The methods used here can be adapted, with only small modifications, to other models such as contractive autoencoders (CAEs) or restricted Boltzmann machines (RBMs).

## First layer

The first layer and its training algorithm are defined in the file `dae_l1.yaml`. Here we read that file and fill in some of its hyperparameters.
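The hyperparameters are filled in with Python's `%` string formatting applied to a dictionary. As a minimal sketch, assuming the YAML file contains named placeholders such as `%(nhid)i` (the fragment below is hypothetical; the real `dae_l1.yaml` is not reproduced here):

```python
# Hypothetical template fragment, for illustration only.
template = """model: {
    nhid : %(nhid)i,
    batch_size : %(batch_size)i,
    save_path: "%(save_path)s/dae_l1.pkl"
}"""

# Each %(key)i or %(key)s placeholder is replaced by the matching dict value.
hyper_params = {'nhid': 500, 'batch_size': 100, 'save_path': '.'}
filled = template % hyper_params
print(filled)
```

Dictionary keys that the template never mentions are ignored, whereas a placeholder with no matching key raises a `KeyError`.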

In [4]:
# Read the YAML template and substitute the hyperparameters into it.
with open('dae_l1.yaml', 'r') as f:
    layer1_yaml = f.read()
hyper_params_l1 = {'train_stop' : 50000,
                   'batch_size' : 100,
                   'monitoring_batches' : 5,
                   'nhid' : 500,
                   'max_epochs' : 10,
                   'save_path' : '.'}
layer1_yaml = layer1_yaml % hyper_params_l1
print(layer1_yaml)

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 784,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .2,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            max_epochs: 10,
        },
    },
    save_path: "./dae_l1.pkl",
    save_freq: 1
}
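To make the configuration above concrete, here is a minimal NumPy sketch of the computation it describes: binomial corruption of the input, a `tanh` encoder, a linear decoder, and a squared reconstruction error. It is an illustration rather than Pylearn2's implementation. The initialisation of `Wprime`, the random batch `X`, and the exact form of the cost (squared error summed over input dimensions and averaged over the batch) are assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)

nvis, nhid = 784, 500        # layer sizes from the YAML above
corruption_level = 0.2       # probability of zeroing each input value
irange = 0.05                # weights drawn uniformly from [-irange, irange]

# Parameter names follow the training log (W, hb, Wprime, vb).
W = rng.uniform(-irange, irange, size=(nvis, nhid))       # encoder weights
hb = np.zeros(nhid)                                       # hidden bias
Wprime = rng.uniform(-irange, irange, size=(nhid, nvis))  # decoder weights (assumed init)
vb = np.zeros(nvis)                                       # visible bias

def corrupt(x):
    """Zero each entry independently with probability corruption_level."""
    mask = rng.random_sample(x.shape) >= corruption_level
    return x * mask

def encode(x):
    """tanh encoder (act_enc: "tanh")."""
    return np.tanh(np.dot(x, W) + hb)

def decode(h):
    """Linear decoder (act_dec: null)."""
    return np.dot(h, Wprime) + vb

def reconstruction_cost(x):
    """Reconstruct the clean input from a corrupted copy and score the error."""
    x_hat = decode(encode(corrupt(x)))
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

X = rng.random_sample((100, nvis))   # stand-in for one batch of MNIST images
cost = reconstruction_cost(X)
```

Note that the target of the reconstruction is the clean input `x`, not its corrupted copy; that is what makes the autoencoder "denoising".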



Now we can train the model using the YAML string in the same way as the previous tutorials:

In [5]:
from pylearn2.config import yaml_parse
train = yaml_parse.load(layer1_yaml)
train.main_loop()

Parameter and initial learning rate summary:
	vb: 0.0010000000475
	hb: 0.0010000000475
	W: 0.0010000000475
	Wprime: 0.0010000000475
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 0.000000 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.000000 seconds
Monitored channels: 
	learning_rate
	monitor_seconds_per_epoch
	objective
Compiling accum...
graph size: 23
Compiling accum done. Time elapsed: 0.000000 seconds
Monitoring step:
	Epochs seen: 0
	Batches seen: 0
	Examples seen: 0
	learning_rate: 0.00100000016391
	monitor_seconds_per_epoch: 0.0
	objective: 85.4375915527
Time this epoch: 1.000000 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 500
	Examples seen: 50000
	learning_rate: 0.00100000016391
	monitor_seconds_per_epoch: 1.0
	objective: 29.1613636017
Saving to ./dae_l1.pkl...
Saving to ./dae_l1.pkl done. Time elapsed: 0.000000 seconds
Time this epoch: 1.000000 seconds
Monitoring step:
	Epochs seen: 2
	Batches seen: 1
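The log above shows the objective falling as training proceeds. What one SGD update does for a denoising autoencoder can be sketched in NumPy as follows. This is a toy illustration with hand-written gradients and small, made-up sizes and data; it is not how Pylearn2 computes its updates (Pylearn2 derives them symbolically through Theano), and the learning rate here is chosen for the toy problem rather than taken from the YAML.

```python
import numpy as np

rng = np.random.RandomState(0)
nvis, nhid, batch_size = 20, 10, 50      # toy sizes, not the tutorial's 784/500/100
learning_rate, corruption_level = 0.01, 0.2

params = {
    'W': rng.uniform(-0.05, 0.05, (nvis, nhid)),
    'hb': np.zeros(nhid),
    'Wprime': rng.uniform(-0.05, 0.05, (nhid, nvis)),
    'vb': np.zeros(nvis),
}

def forward(x_in):
    """Return hidden activations and reconstruction for input x_in."""
    h = np.tanh(np.dot(x_in, params['W']) + params['hb'])
    return h, np.dot(h, params['Wprime']) + params['vb']

def cost(x):
    """Squared error on clean input, summed per example, averaged over the batch."""
    _, x_hat = forward(x)
    return np.mean(np.sum((x - x_hat) ** 2, axis=1))

def sgd_step(x):
    """One SGD update using a freshly corrupted copy of the batch x."""
    x_in = x * (rng.random_sample(x.shape) >= corruption_level)
    h, x_hat = forward(x_in)
    d_out = 2.0 * (x_hat - x) / x.shape[0]                    # dCost/dx_hat
    d_pre = np.dot(d_out, params['Wprime'].T) * (1.0 - h ** 2)  # back through tanh
    grads = {
        'Wprime': np.dot(h.T, d_out),
        'vb': d_out.sum(axis=0),
        'W': np.dot(x_in.T, d_pre),
        'hb': d_pre.sum(axis=0),
    }
    for name in params:
        params[name] -= learning_rate * grads[name]

X = rng.random_sample((500, nvis))       # made-up data standing in for MNIST
cost_before = cost(X)
for epoch in range(20):
    for start in range(0, len(X), batch_size):
        sgd_step(X[start:start + batch_size])
cost_after = cost(X)
```

Comparing `cost_before` with `cost_after` plays the role of the `objective` channel in the monitoring output.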

## Second layer
The second layer takes the output of the first layer as its input. Hence we must first apply the first layer's transformations to the raw data using `datasets.transformer_dataset.TransformerDataset`. This class takes two arguments:

   - `raw`: the raw data
   - `transformer`: a Pylearn2 block that transforms the raw data; in our case, the trained first-layer model saved in `dae_l1.pkl` by the previous step
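Conceptually, the transformed dataset hands the second layer the first layer's hidden activations instead of raw pixels. The sketch below illustrates that idea with a toy class. `TransformedData`, `layer1_encode`, and the random parameters are hypothetical stand-ins; they are not Pylearn2's `TransformerDataset` or the contents of `dae_l1.pkl`.

```python
import numpy as np

rng = np.random.RandomState(0)

# Hypothetical stand-ins for the trained first-layer encoder parameters.
W1 = rng.uniform(-0.05, 0.05, size=(784, 500))
hb1 = np.zeros(500)

def layer1_encode(x):
    """Map raw inputs to first-layer hidden activations (tanh encoder)."""
    return np.tanh(np.dot(x, W1) + hb1)

class TransformedData(object):
    """Toy analogue of a transformer dataset: raw data plus a transform."""

    def __init__(self, raw, transformer):
        self.raw = raw
        self.transformer = transformer

    def get_batch(self, start, stop):
        # Apply the transform to the requested slice of the raw data.
        return self.transformer(self.raw[start:stop])

raw = rng.random_sample((1000, 784))     # made-up data standing in for MNIST
dataset = TransformedData(raw, layer1_encode)
batch = dataset.get_batch(0, 100)        # 100 examples, each with 500 features
```

The second layer therefore sees 500-dimensional inputs, which is why its `nvis` is set to the first layer's `nhid`.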

To train the second layer, we load the YAML file as before and set the hyperparameters before starting the training loop.

In [6]:
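# Read the second-layer YAML template and substitute the hyperparameters.
with open('dae_l2.yaml', 'r') as f:
    layer2_yaml = f.read()
hyper_params_l2 = {'train_stop' : 50000,
                   'batch_size' : 100,
                   'monitoring_batches' : 5,
                   # The second layer's input size is the first layer's hidden size.
                   'nvis' : hyper_params_l1['nhid'],
                   'nhid' : 500,
                   'max_epochs' : 10,
                   'save_path' : '.'}
layer2_yaml = layer2_yaml % hyper_params_l2
print(layer2_yaml)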

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.transformer_dataset.TransformerDataset {
        raw: !obj:pylearn2.datasets.mnist.MNIST {
            which_set: 'train',
            start: 0,
            stop: 50000
        },
        transformer: !pkl: "./dae_l1.pkl"
    },
    model: !obj:pylearn2.models.autoencoder.DenoisingAutoencoder {
        nvis : 500,
        nhid : 500,
        irange : 0.05,
        corruptor: !obj:pylearn2.corruption.BinomialCorruptor {
            corruption_level: .3,
        },
        act_enc: "tanh",
        act_dec: null,    # Linear activation on the decoder side.
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate : 1e-3,
        batch_size : 100,
        monitoring_batches : 5,
        monitoring_dataset : *train,
        cost : !obj:pylearn2.costs.autoencoder.MeanSquaredReconstructionError {},
        termination_criterion : !obj:pylearn2.termination_criteria.EpochCounter {
            ma

In [7]:
train = yaml_parse.load(layer2_yaml)
train.main_loop()

Parameter and initial learning rate summary:
	vb: 0.0010000000475
	hb: 0.0010000000475
	W: 0.0010000000475
	Wprime: 0.0010000000475
Compiling sgd_update...
Compiling sgd_update done. Time elapsed: 0.000000 seconds
compiling begin_record_entry...
compiling begin_record_entry done. Time elapsed: 0.000000 seconds
Monitored channels: 
	learning_rate
	monitor_seconds_per_epoch
	objective
Compiling accum...
graph size: 23
Compiling accum done. Time elapsed: 0.000000 seconds
Monitoring step:
	Epochs seen: 0
	Batches seen: 0
	Examples seen: 0
	learning_rate: 0.00100000016391
	monitor_seconds_per_epoch: 0.0
	objective: 51.0506210327
Time this epoch: 1.000000 seconds
Monitoring step:
	Epochs seen: 1
	Batches seen: 500
	Examples seen: 50000
	learning_rate: 0.00100000016391
	monitor_seconds_per_epoch: 1.0
	objective: 20.0142116547
Saving to ./dae_l2.pkl...
Saving to ./dae_l2.pkl done. Time elapsed: 0.000000 seconds
Time this epoch: 1.000000 seconds
Monitoring step:
	Epochs seen: 2
	Batches seen: 1

## Supervised fine-tuning
Now that we have two pre-trained layers, we can stack them to form an MLP which can be trained in a supervised fashion. We use the MLP class as usual for this, except that the hidden layers are now `models.mlp.PretrainedLayer` instances, which receive our pre-trained layers (as pickle files) through the `layer_content` argument.
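The forward pass of such a stacked network can be sketched in NumPy: the two pre-trained encoders are applied in sequence and a softmax layer maps the result to class probabilities. The parameters below are random stand-ins for the contents of `dae_l1.pkl` and `dae_l2.pkl`, and the sketch assumes `tanh` encoders, a 500-unit second layer, and a 10-class output; it is not Pylearn2's `MLP` implementation.

```python
import numpy as np

rng = np.random.RandomState(0)

def init(n_in, n_out, irange):
    """Uniform weights in [-irange, irange] and zero biases."""
    return rng.uniform(-irange, irange, size=(n_in, n_out)), np.zeros(n_out)

W1, b1 = init(784, 500, 0.05)    # stand-in for the first pre-trained encoder
W2, b2 = init(500, 500, 0.05)    # stand-in for the second pre-trained encoder
Wy, by = init(500, 10, 0.005)    # freshly initialised softmax layer

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_forward(x):
    """Apply both encoders, then the softmax output layer."""
    h1 = np.tanh(np.dot(x, W1) + b1)
    h2 = np.tanh(np.dot(h1, W2) + b2)
    return softmax(np.dot(h2, Wy) + by)

X = rng.random_sample((100, 784))   # made-up batch standing in for MNIST
probs = mlp_forward(X)              # one row of 10 class probabilities per example
```

During fine-tuning, the supervised cost is backpropagated through all three layers, so the pre-trained weights serve as an initialisation rather than fixed features.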

In [8]:
# Read the MLP YAML template and substitute the hyperparameters.
with open('dae_mlp.yaml', 'r') as f:
    mlp_yaml = f.read()
hyper_params_mlp = {'train_stop' : 50000,
                    'valid_stop' : 60000,
                    'batch_size' : 100,
                    'max_epochs' : 50,
                    'save_path' : '.'}
mlp_yaml = mlp_yaml % hyper_params_mlp
print(mlp_yaml)

!obj:pylearn2.train.Train {
    dataset: &train !obj:pylearn2.datasets.mnist.MNIST {
        which_set: 'train',
        start: 0,
        stop: 50000
    },
    model: !obj:pylearn2.models.mlp.MLP {
        batch_size: 100,
        layers: [
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h1',
                     layer_content: !pkl: "./dae_l1.pkl"
                 },
                 !obj:pylearn2.models.mlp.PretrainedLayer {
                     layer_name: 'h2',
                     layer_content: !pkl: "./dae_l2.pkl"
                 },
                 !obj:pylearn2.models.mlp.Softmax {
                     max_col_norm: 1.9365,
                     layer_name: 'y',
                     n_classes: 10,
                     irange: .005
                 }
                ],
        nvis: 784
    },
    algorithm: !obj:pylearn2.training_algorithms.sgd.SGD {
        learning_rate: .05,
        learning_rule: !obj:pylearn2.training_algori

train = yaml_parse.load(mlp_yaml)
train.main_loop()