mlp with pretraining #3281


@IssamLaradji
Contributor

This extends the generic multi-layer perceptron (MLP) of #3204 with a pre-training capability.

An example file, mlp_with_pretraining.py, is added to show how pre-training with an RBM is done and how it can improve the MLP's performance on the digits dataset.

I got the following results:

  1. Testing accuracy of MLP without pretraining: 0.964
  2. Testing accuracy of MLP with pretraining: 0.978
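
For orientation, here is a minimal sketch of the kind of comparison the example runs. It assumes the MultilayerPerceptronClassifier from the #3204 branch (not part of released scikit-learn at the time) and treats its n_hidden parameter as a list of layer sizes, as in the snippets later in this thread; the pre-trained variant is shown further below.

from sklearn.datasets import load_digits
from sklearn.cross_validation import train_test_split
from sklearn.neural_network import MultilayerPerceptronClassifier  # PR #3204 branch only

digits = load_digits()
X = digits.data / 16.  # scale pixel values to [0, 1]
X_train, X_test, y_train, y_test = train_test_split(
    X, digits.target, random_state=0)

# Baseline: MLP with randomly initialized weights
mlp = MultilayerPerceptronClassifier(n_hidden=[40], random_state=0)
mlp.fit(X_train, y_train)
print("Testing accuracy without pretraining: %.3f" % mlp.score(X_test, y_test))
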
@coveralls

Coverage Status

Coverage increased (+0.07%) when pulling e10d2d6 on IssamLaradji:mlp-with-pretraining into 31f2e07 on scikit-learn:master.

@IssamLaradji IssamLaradji referenced this pull request Jun 16, 2014
Closed

[MRG] Generic multi layer perceptron #3204

@ogrisel
Member
ogrisel commented Jun 18, 2014

You can make a pipeline with make_pipeline instead of calling fit_transform on the first RBM and passing the result to fit on the second.

dbn = make_pipeline(
    BernoulliRBM(n_components=40, random_state=random_state,
                 learning_rate=0.01, n_iter=100),
    BernoulliRBM(n_components=10, random_state=random_state,
                 learning_rate=0.01, n_iter=100),
).fit(X_train)

Also, true pre-training is done only on X_train rather than on the full X.

@ogrisel
Member
ogrisel commented Jun 18, 2014

Also, please put a global random_state=0 at the beginning and use it as the argument everywhere. You will see that pre-training does not always help when you try several times with random_state=None. I think this might be caused by the fact that we have very few samples to train the RBMs. It should be explicitly stressed in the docstring that pre-training does not seem to help consistently in this case.
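
A small sketch of what this looks like, using the estimator names from this thread (MultilayerPerceptronClassifier is the class from the PR branch):

from sklearn.neural_network import BernoulliRBM

random_state = 0  # set to None to observe run-to-run variability

rbm = BernoulliRBM(n_components=40, learning_rate=0.01, n_iter=100,
                   random_state=random_state)
mlp = MultilayerPerceptronClassifier(n_hidden=[40], random_state=random_state)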

@ogrisel
Member
ogrisel commented Jun 18, 2014

@IssamLaradji, could you work on the newly created #3289 independently? I think it should be pretty quick to merge and could be useful in all the MLP + digits examples.

@IssamLaradji
Contributor

Hi @ogrisel, I updated the code to train the RBMs using make_pipeline. Is this the right, concise way to initialize the MLP? It looks quite generic, but one thing is hard-coded: the RBMs are initialized manually. Maybe I could create the RBMs dynamically based on the number of layers and their number of neurons. What do you think?

mlp = MultilayerPerceptronClassifier(
    n_hidden=n_hidden, random_state=random_state)

# Set warm start to true
mlp.warm_start = True

# Train RBMs
dbn = make_pipeline(
    BernoulliRBM(n_components=n_hidden, random_state=random_state,
                 learning_rate=0.001, n_iter=50),
    BernoulliRBM(n_components=10, random_state=random_state,
                 learning_rate=0.001, n_iter=50),
).fit(X_train)

coefficients = [param[1].components_ for param in dbn.steps]
intercepts = [param[1].intercept_hidden_ for param in dbn.steps]

# Assign initial coefficients and intercepts with rbm's parameters
mlp.layers_coef_ = coefficients
mlp.layers_intercept_ = intercepts

Thanks.

@coveralls

Coverage Status

Coverage increased (+0.08%) when pulling 70d8f8c on IssamLaradji:mlp-with-pretraining into 31f2e07 on scikit-learn:master.

@ogrisel
Member
ogrisel commented Jun 24, 2014
coefficients = [param[1].components_ for param in dbn.steps]
intercepts = [param[1].intercept_hidden_ for param in dbn.steps]

# Assign initial coefficients and intercepts with rbm's parameters
mlp.layers_coef_ = coefficients
mlp.layers_intercept_ = intercepts

could be directly written:

# Assign initial coefficients and intercepts with rbm's parameters
mlp.layers_coef_ = [param[1].components_ for param in dbn.steps]
mlp.layers_intercept_ = [param[1].intercept_hidden_ for param in dbn.steps]
@ogrisel
Member
ogrisel commented Jun 24, 2014

And:

mlp = MultilayerPerceptronClassifier(
    n_hidden=n_hidden, random_state=random_state)

# Set warm start to true
mlp.warm_start = True

could be written:

mlp = MultilayerPerceptronClassifier(
    n_hidden=n_hidden, random_state=random_state, warm_start=True)
@IssamLaradji
Contributor

Fixed, thanks :)

@coveralls

Coverage Status

Coverage increased (+0.08%) when pulling 408e0af on IssamLaradji:mlp-with-pretraining into 31f2e07 on scikit-learn:master.

@IssamLaradji
Contributor

Here is a dynamic, generic way of constructing and training the RBMs. I am using Pipeline because it lets me pass the estimators as a list; this might be better than make_pipeline, unless make_pipeline also accepts a list of estimators as input. What do you think?

# Multi-layer perceptron with RBM pre-training
mlp = MultilayerPerceptronClassifier(
    n_hidden=n_hidden, random_state=random_state, warm_start=True)

# Construct one RBM per layer
layer_sizes = n_hidden + [n_output]
estimators = [('rbm' + str(i),
               BernoulliRBM(n_components=layer_sizes[i],
                            random_state=random_state,
                            learning_rate=0.001, n_iter=50))
              for i in range(len(layer_sizes))]

# Train RBMs
dbn = Pipeline(estimators).fit(X_train)

# Assign initial coefficients and intercepts with rbm's parameters
mlp.layers_coef_ = [param[1].components_ for param in dbn.steps]
mlp.layers_intercept_ = [param[1].intercept_hidden_ for param in dbn.steps]
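
For completeness, a sketch of the step that would follow (assuming X_train, y_train, X_test and y_test from the example file): with warm_start=True and the RBM parameters assigned, fit() fine-tunes them with back-propagation instead of starting from random weights.

mlp.fit(X_train, y_train)
print("Testing accuracy with pretraining: %.3f" % mlp.score(X_test, y_test))
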
@IssamLaradji
Contributor

In addition, I could add a list parameter_list whose ith entry defines the parameters of the ith RBM. The user could fill the list with parameters using the keyword-argument syntax, **kwargs.

@IssamLaradji
Contributor

I think the parameter_list is overkill. I believe the user should only need to set the estimators list manually, which can be given as input to the warm_start parameter. The rest should be automated.

@ogrisel ogrisel and 1 other commented on an outdated diff Jun 27, 2014
examples/neural_network/mlp_with_pretraining.py
@@ -0,0 +1,68 @@
+"""
+=======================================================================
+Pre-training Multi-layer Perceptron using Restricted Boltzmann Machines
+=======================================================================
+
+This compares the performance of a multi-layer perceptron (MLP) with and
+without pre-training using Restricted Boltzmann Machines. Without
+pre-training, the MLP initializes its coefficients and intercepts from a
+scaled random distribution. With pre-training, an RBM is trained on the
+dataset and the resulting parameters are given to the MLP as initial
+coefficients and intercepts. This example supports the hypothesis that
+pre-training allows the MLP to converge to a better local minimum.
@ogrisel
ogrisel Jun 27, 2014 Member

Have you tried changing the random_state to other values? Is pre-training always beneficial on this dataset?

I would add a remark that pre-training can be beneficial when the training set is small, which is the case here, but is generally considered useless when the (labeled) training set grows large.

@IssamLaradji
IssamLaradji Jun 27, 2014 Contributor

@ogrisel, in 7 out of 10 random states, MLP with pre-training performed better, so it is not always beneficial.

I changed the wording to state that pre-training becomes less beneficial as the labeled training set grows larger; it can even result in worse performance than random weight initialization.

@GaelVaroquaux
Member

The user can fill the list with the parameters using the keyword argument syntax, **kwargs.

I personally strive to avoid such function signatures. They make code that is not self-documenting (the function signature is not helpful for understanding the function) and has weak error-catching behavior (a typo in an argument name will not raise an error message in the right part of the codebase, and in the worst case will be ignored). Finally, they are not great for further extension of the API, as it often happens that, in the evolution of an API, the **kwargs get routed to several sub-routines.

If you really need a fairly generic container for arguments used to initialize the objects, I'd rather have a dictionary. But in general I favor passing an object rather than a dictionary of parameters to initialize the corresponding object.
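
A hypothetical sketch of the two styles being contrasted (make_rbm_layer is an illustrative helper, not an API that exists anywhere):

from sklearn.neural_network import BernoulliRBM

# Discouraged: the signature documents nothing, and a mistyped keyword is not
# caught at the call site; if it gets routed further, it may be silently ignored.
def make_rbm_layer(n_components, **rbm_kwargs):
    return BernoulliRBM(n_components=n_components, **rbm_kwargs)

# Favoured: pass already-configured estimator objects; settings are visible and
# validated where the objects are constructed.
rbms = [BernoulliRBM(n_components=64, learning_rate=0.001, n_iter=50),
        BernoulliRBM(n_components=10, learning_rate=0.001, n_iter=50)]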

@IssamLaradji
Contributor

@GaelVaroquaux, you are absolutely right. I realized that a while after I suggested **kwargs; it is too low-level for the user. Like you said, passing an object is best, since the user wouldn't have to dig too deep to learn the syntax for initializing the pre-training parameters.

@IssamLaradji
Contributor

So, for pre-training, I allowed warm_start to accept a list of unsupervised learning objects, such as RBMs (please see the example file, mlp_with_pretraining.py). I also added a _pretraining method in the MLP that trains the list of RBMs and uses their parameters to initialize the corresponding layer weights. This might be a clean implementation.
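
A hypothetical sketch of how the proposed API might be used, based only on the description above (warm_start accepting a list of unsupervised estimators, with X_train and y_train as in the example file); this is illustrative, not the final scikit-learn API:

from sklearn.neural_network import BernoulliRBM

n_hidden, n_output = [64], 10
rbms = [BernoulliRBM(n_components=size, learning_rate=0.001, n_iter=50,
                     random_state=0)
        for size in n_hidden + [n_output]]

# warm_start takes the RBM list; fit() runs _pretraining on them and then
# fine-tunes the resulting weights with back-propagation.
mlp = MultilayerPerceptronClassifier(n_hidden=n_hidden, warm_start=rbms,
                                     random_state=0)
mlp.fit(X_train, y_train)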

Thanks.

@coveralls

Coverage Status

Coverage increased (+0.07%) when pulling 7cd49c4 on IssamLaradji:mlp-with-pretraining into 31f2e07 on scikit-learn:master.

@coveralls

Coverage Status

Coverage increased (+0.06%) when pulling 46ae255 on IssamLaradji:mlp-with-pretraining into 6d8ccbc on scikit-learn:master.

@IssamLaradji IssamLaradji reopened this Aug 16, 2014

@coveralls

Coverage Status

Coverage increased (+0.06%) when pulling 4261df1 on IssamLaradji:mlp-with-pretraining into 6d8ccbc on scikit-learn:master.

@ogrisel
Member
ogrisel commented Aug 21, 2014

This PR creates the examples/neural_network folder while there already exists a examples/neural_networks folder. Please reuse the existing folder instead.

@ogrisel
Member
ogrisel commented Aug 21, 2014

> This PR creates the examples/neural_network folder while there already exists a examples/neural_networks folder. Please reuse the existing folder instead.

Actually this should be fixed in the parent #3204 PR.
