[MRG] GSoC 2014: Standard Extreme Learning Machines #3306

Open
wants to merge 2 commits into scikit-learn:master from IssamLaradji:Extreme-Learning-Machines
@IssamLaradji

I finished implementing the standard extreme learning machine (ELM). I am getting the following results with 550 hidden neurons on the digits dataset:

Training accuracy using the logistic activation function: 0.999444
Training accuracy using the tanh activation function: 1.000000

Fortunately, this algorithm is much easier to implement and debug than the multi-layer perceptron :).
I will push a test file soon.

@ogrisel , @larsmans

@coveralls

Coverage Status

Coverage increased (+0.0%) when pulling e5e363d on IssamLaradji:Extreme-Learning-Machines into 68b0a28 on scikit-learn:master.

@sveitser sveitser referenced this pull request in Shippable/support
Closed

Display Code Coverage on Github Pull Request Page #239

@larsmans larsmans commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((109 lines not shown))
+ A += self.intercept_hidden_
+
+ Z = self._activation_func(A)
+
+ return Z
+
+ def fit(self, X, y):
+ """Fit the model to the data X and target y.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Training data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ y : numpy array of shape (n_samples)
@larsmans scikit-learn member

y should be an "array-like" and be validated as such.

Thanks for bringing this up. I made the changes in multi-layer perceptron as well.

@NelleV NelleV commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((149 lines not shown))
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+
+ Returns
+ -------
+ array, shape (n_samples)
+ Predicted target values per element in X.
+ """
+ X = atleast2d_or_csr(X)
+
+ self.hidden_activations_ = self._get_hidden_activations(X)
+ output = safe_sparse_dot(self.hidden_activations_, self.coef_output_)
+
+ return output
+
@NelleV scikit-learn member
NelleV added a note

There should be an extra blank line here. Can you run pep8 on the file and check for pep8 compliance?

Thanks - I had problems with pep8 auto-formatting in Sublime Text; it is fixed now.

@IssamLaradji

Hi, I am wondering what extreme learning machines should display in verbose mode. Any ideas?

Thanks

@IssamLaradji

Travis is acting strangely: it raises an error for test_multilabel_classification(), although on my local machine the test_multilabel_classification() method in test_elm passes with 1000 different seeds. The pull request also passed the local tests after running make test on the whole library.

Is there a chance that Travis uses library versions different from (or modified relative to) my local setup for testing?

@IssamLaradji IssamLaradji referenced this pull request
Closed

[MRG] Generic multi layer perceptron #3204

3 of 4 tasks complete
@arjoly
scikit-learn member

It might be worth having a look at https://github.com/dclambert/Python-ELM.

@larsmans
scikit-learn member

Training squared error loss would seem appropriate for verbose output. Not every estimator has verbose output, though (naive Bayes doesn't because it runs instantly on typical problem sizes).

@coveralls

Coverage Status

Coverage increased (+0.07%) when pulling 2be2941 on IssamLaradji:Extreme-Learning-Machines into 68b0a28 on scikit-learn:master.

@IssamLaradji

Thanks, displaying the training error as verbose output is a useful idea.

@ogrisel
scikit-learn member

However, Travis raises an error for test_multilabel_classification(). Is there a chance that Travis uses library versions different from (or modified relative to) my local setup for testing?

The versions of numpy / scipy used by the various Travis workers are given in the environment variables of each build. You can see the exact setup in:

@IssamLaradji

@ogrisel thanks, I will dig deeper to see where multi-label classification is being affected.

@IssamLaradji

Hi guys, I implemented weighted and regularized ELMs - here are their results on an imbalanced dataset. :) :)

Non-Regularized ELMs (Large C)
non_regularized_elm

Regularized ELMs (Small C)
regularized_elm

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((174 lines not shown))
+
+ self._init_random_weights()
+
+ H_tmp = self._get_hidden_activations(X)
+
+ if self.class_weight != None:
+ # compute weighted output coefficients using eq. 12 in
+ # reference [1]
+ W = self._set_weights(y, n_samples)
+ H = safe_sparse_dot(H_tmp.T, W).T
+ else:
+ H = H_tmp
+
+ # compute regularized output coefficients using eq. 3 in reference [1]
+ left_part = pinv2(
+ safe_sparse_dot(H.T, H_tmp) + identity(self.n_hidden) / self.C)
@agramfort scikit-learn member

you should use the ridge implementation here.

Hi @agramfort, isn't this technically ridge regression? I am minimizing the L2 norm of the coefficients in the objective function - as in the equation below. Or do you mean I should use the scikit-learn implementation of ridge? Thanks.

l_elm

@agramfort scikit-learn member

this does not look like ridge, but you seem to compute

(H'H + 1/C Id)^{-1} H'

and this is really a ridge solution where H plays the role of X, y is y, and C = 1/alpha

Sorry, the equation I gave is for weighted ELMs, as it contains the weight term W which is not part of ridge. However, the implementation contains both versions - with W and without W.
The version without W computes the formula you mentioned, (H'H + 1/C Id)^{-1} H'y.
Thanks.
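To make the equivalence concrete, here is a minimal sketch (hypothetical data, not the PR code) checking that the unweighted closed-form ELM solution matches scikit-learn's Ridge with alpha = 1/C:

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
H = rng.randn(200, 50)   # hidden activations (hypothetical)
y = rng.randn(200)
C = 1e5

# closed form: (H'H + I/C)^{-1} H'y
beta_closed = np.linalg.solve(np.dot(H.T, H) + np.eye(50) / C, np.dot(H.T, y))

# the same solution through Ridge with alpha = 1/C and no intercept
beta_ridge = Ridge(alpha=1.0 / C, fit_intercept=False).fit(H, y).coef_

print(np.allclose(beta_closed, beta_ridge))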

@agramfort scikit-learn member

without W it is then a ridge

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((330 lines not shown))
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+
+ Returns
+ -------
+ array, shape (n_samples, n_outputs)
+ Returns the probability of the sample for each class in the model,
+ where classes are ordered as they are in `self.classes_`.
+ """
+ scores = self.decision_function(X)
+
+ if len(self.classes_) == 2:
+ scores = logistic_sigmoid(scores)
+ return np.hstack([1 - scores, scores])
+ else:
+ return _softmax(scores)
@agramfort scikit-learn member

getting a proba here seems like a hack unless you use a log reg on top of your hidden features. Not a ridge.

I see, so to get proper probabilities I should use the regular least-squares solution ||Ax - B||^2 without minimizing the norm of the coefficients?
Thank you
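For reference, a small sketch of the suggestion above, with a random tanh layer standing in for the ELM hidden layer (that stand-in is an assumption here): the probabilities come from a logistic regression fitted on the hidden features rather than from squashing the ridge scores.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression

digits = load_digits()
X, y = digits.data, digits.target

# hypothetical stand-in for the ELM hidden layer: random projection + tanh
rng = np.random.RandomState(0)
coef_hidden = rng.uniform(-1, 1, size=(X.shape[1], 100))
H = np.tanh(np.dot(X, coef_hidden))

# probabilities from a logistic regression on the hidden features
prob_model = LogisticRegression().fit(H, y)
print(prob_model.predict_proba(H).shape)   # (n_samples, n_classes)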

@IssamLaradji

Pushed a lot of improvements.
1) Added sequential ELM support - with partial_fit
2) Added relevant tests for sequential ELM and weighted ELM

Created two examples.
1) Weighted ELM plot
plot_weighted

2) Training vs. testing error with respect to the number of hidden neurons
plot_testing_training

I will leave the documentation until the end - after I implement the remaining part, which is kernel support, and after the code has been reviewed. Thanks.

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_weighted_elm.py
((54 lines not shown))
+
+clf_weightless = ELMClassifier(n_hidden=n_hidden, class_weight=None)
+clf_weightless.fit(X, Y)
+
+clf_weight_auto = ELMClassifier(n_hidden=n_hidden, class_weight='auto')
+clf_weight_auto.fit(X, Y)
+
+clf_weight_1000 = ELMClassifier(n_hidden=n_hidden, class_weight={1: 1000})
+clf_weight_1000.fit(X, Y)
+
+fig, axes = plt.subplots(1, 3, figsize=(20, 7))
+
+plot_decision_function(
+ clf_weightless, axes[0], 'ELM(class_weight=None, C=10e5)')
+plot_decision_function(
+ clf_weight_auto, axes[1], 'ELM(class_weight=\'auto\', C=10e5)')
@agramfort scikit-learn member

rather than escaping with \', use " to define the string: 'ELM(class_weight="auto", C=10e5)'

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((21 lines not shown))
+from ..utils import check_random_state, atleast2d_or_csr
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _identity(X):
+ """Return the same input array."""
+ return X
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
@agramfort scikit-learn member

parameter descriptions are missing here and below

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
@@ -0,0 +1,680 @@
+"""Extreme Learning Machines
+"""
+
+# Author: Issam H. Laradji <issam.laradji@gmail.com>
+# Licence: BSD 3 clause
+
+from abc import ABCMeta, abstractmethod
+
+import numpy as np
+
+from numpy import diag
@agramfort scikit-learn member

why not use np.diag? That's what we usually do.

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((41 lines not shown))
+ """
+ return np.tanh(X, X)
+
+
+def _softmax(Z):
+ """Compute the K-way softmax, (exp(Z).T / exp(Z).sum(axis=1)).T
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+
+ Returns
+ -------
+ X_new : {array-like, sparse matrix}, shape (n_samples, n_features)
+ """
+ exp_Z = np.exp(Z.T - Z.max(axis=1)).T
@agramfort scikit-learn member

rather than transposing twice, use newaxis:

np.exp(Z - Z.max(axis=1)[:, np.newaxis])

same below in the return statement. It's more readable.
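The suggestion amounts to something like this (a standalone sketch, not the PR code):

import numpy as np

def _softmax(Z):
    # subtract the row-wise max for numerical stability, then normalize each row
    Z = np.exp(Z - Z.max(axis=1)[:, np.newaxis])
    Z /= Z.sum(axis=1)[:, np.newaxis]
    return Z

print(_softmax(np.array([[1., 2., 3.], [0., 0., 0.]])).sum(axis=1))  # rows sum to 1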

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((60 lines not shown))
+class BaseELM(six.with_metaclass(ABCMeta, BaseEstimator)):
+
+ """Base class for ELM classification and regression.
+
+ Warning: This class should not be used directly.
+ Use derived classes instead.
+ """
+ _activation_functions = {
+ 'tanh': _tanh,
+ 'logistic': logistic_sigmoid
+ }
+
+ @abstractmethod
+ def __init__(
+ self, n_hidden, activation, algorithm, C, class_weight, batch_size,
+ verbose, random_state):
@agramfort scikit-learn member
    def __init__(self, n_hidden, activation, algorithm, C, class_weight,
                 batch_size, verbose, random_state):

is more standard indentation.

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((136 lines not shown))
+ class_weight = {}
+
+ for class_ in np.unique(y_original):
+ class_size = len(np.where(y_original == class_)[0])
+ class_weight[class_] = 0.618 / class_size
+ else:
+ class_weight = dict(self.class_weight)
+
+ for class_ in self.classes_:
+ indices = np.where(y_original == class_)[0]
+ if class_ in class_weight.keys():
+ diagonals[indices] = class_weight[class_]
+ else:
+ diagonals[indices] = 1
+
+ return diag(diagonals)
@agramfort scikit-learn member

do you really need to allocate a full dense matrix? I doubt it.

Fixed :)
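One way to avoid the dense allocation (a sketch with hypothetical per-sample weights, using scipy.sparse.diags; whether the PR uses exactly this is not shown here):

import numpy as np
from scipy import sparse

diagonals = np.array([0.1, 0.1, 0.9, 0.9, 0.9])   # one weight per sample

W_dense = np.diag(diagonals)         # allocates an n_samples x n_samples array
W_sparse = sparse.diags(diagonals)   # stores only the diagonal

# since W is diagonal, multiplying by it is just a scaling of the samples
print(np.allclose(W_sparse.dot(np.ones(5)), W_dense.dot(np.ones(5))))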

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((162 lines not shown))
+ """Compute the least-square solutions for the whole dataset."""
+ H = self._get_hidden_activations(X)
+
+ # compute output coefficients by evaluating
+ # (ZH + identity/C)^{-1}Zy
+ if self.class_weight != None:
+ # set Z = H'W for weighted ELM
+ W = self._assign_weights(y)
+ Z = safe_sparse_dot(H.T, W)
+ else:
+ # set Z = H' for ELM
+ Z = H.T
+
+ left_part = safe_sparse_dot(Z, H) + identity(self.n_hidden) / self.C
+ right_part = safe_sparse_dot(Z, y)
+ self.coef_output_ = safe_sparse_dot(pinv2(left_part), right_part)
@agramfort scikit-learn member

solving a linear system with pinv2 is never recommended due to numerical errors. I am pretty sure there is a better way to do it.

I am now solving the system using scipy.linalg.solve - is this more efficient? Thanks.
It takes around half the computation time as well. :)

@agramfort scikit-learn member

Oh yes, I am using what ridge is using. I will try creating a RidgeClassifier object to compute the ELM solutions. Thanks.

So, I am reusing ridge.ridge_regression, which makes the code much cleaner :). But I don't think I can reuse it for the sequential ELM since the equation is fundamentally different.
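For context, the sequential algorithm relies on the recursion sketched below. This is a plain numpy sketch with hypothetical data (using np.linalg.solve rather than the ridge_regression call in the PR), showing that accumulating K = sum H_b'H_b and correcting the coefficients batch by batch reproduces the full-data regularized solution:

import numpy as np

rng = np.random.RandomState(0)
H = rng.randn(400, 50)    # hidden activations (hypothetical)
y = rng.randn(400, 2)
C, n_hidden = 1e5, 50

# full-data solution: (H'H + I/C)^{-1} H'y
beta_full = np.linalg.solve(np.dot(H.T, H) + np.eye(n_hidden) / C,
                            np.dot(H.T, y))

# sequential solution, one batch at a time
K = np.zeros((n_hidden, n_hidden))
beta = np.zeros((n_hidden, 2))
for H_b, y_b in zip(np.split(H, 4), np.split(y, 4)):
    K += np.dot(H_b.T, H_b)
    residual = np.dot(H_b.T, y_b - np.dot(H_b, beta))
    beta += np.linalg.solve(K + np.eye(n_hidden) / C, residual)

print(np.allclose(beta, beta_full))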

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((379 lines not shown))
+ References
+ ----------
+ Zong, Weiwei, Guang-Bin Huang, and Yiqiang Chen.
+ "Weighted extreme learning machine for imbalance learning."
+ Neurocomputing 101 (2013): 229-242.
+
+ Liang, Nan-Ying, et al.
+ "A fast and accurate online sequential learning algorithm for
+ feedforward networks." Neural Networks, IEEE Transactions on
+ 17.6 (2006): 1411-1423.
+ http://www.ntu.edu.sg/home/egbhuang/pdf/OS-ELM-TNN.pdf
+ """
+
+ def __init__(
+ self, n_hidden=500, activation='tanh', algorithm='regular', C=10e5,
+ class_weight=None, batch_size=200, verbose=False, random_state=None):
@agramfort scikit-learn member

same remark on indent

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((528 lines not shown))
+
+ y = column_or_1d(y, warn=True)
+
+ y = self._lbin.fit_transform(y)
+ super(ELMClassifier, self).partial_fit(X, y)
+
+ return self
+
+
+class ELMRegressor(BaseELM, RegressorMixin):
+
+ """Extreme learning machines regressor.
+
+ The algorithm trains a single-hidden layer feedforward network by computing
+ the hidden layer values using randomized parameters, then solving
+ for the output weights using least-square solutions.
@agramfort scikit-learn member

this description is the same as for ELMClassifier, is that intentional?

No, sorry. The difference is that ELMClassifier has an output gate function that converts continuous values to integers. Will change it now.

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((593 lines not shown))
+ ----------
+ Zong, Weiwei, Guang-Bin Huang, and Yiqiang Chen.
+ "Weighted extreme learning machine for imbalance learning."
+ Neurocomputing 101 (2013): 229-242.
+
+ Liang, Nan-Ying, et al.
+ "A fast and accurate online sequential learning algorithm for
+ feedforward networks." Neural Networks, IEEE Transactions on
+ 17.6 (2006): 1411-1423.
+ http://www.ntu.edu.sg/home/egbhuang/pdf/OS-ELM-TNN.pdf
+ """
+
+ def __init__(
+ self, n_hidden=100, activation='tanh', algorithm='regular',
+ batch_size=200, C=10e5, verbose=False, random_state=None):
+ class_weight = None
@agramfort scikit-learn member

indent

@IssamLaradji

@agramfort thanks for your comments. I pushed the updated code.

@IssamLaradji

Updates,
1) ELM now uses ridge regression as an off-the-shelf solver to compute its solutions.
2) Added support for kernels - linear, poly, rbf, sigmoid.
Is there a way we could reuse the fast, efficient SVM kernel methods?
Thanks.

@larsmans
scikit-learn member

There are kernels in sklearn.metrics. The ones in sklearn.svm are buried deep down in the C++ code for LibSVM.
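For example, the pairwise kernels could be reused along these lines (a sketch; how exactly they are wired into the ELM is an assumption here):

from sklearn.datasets import load_digits
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

X = load_digits().data

# kernel "hidden activations" are the kernel matrix against the training set
H_rbf = rbf_kernel(X, X, gamma=1.0 / X.shape[1])
H_poly = polynomial_kernel(X, X, degree=3)
print(H_rbf.shape, H_poly.shape)   # (n_samples, n_samples) each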

@agramfort agramfort commented on an outdated diff
doc/modules/neural_networks_supervised.rst
@@ -0,0 +1,330 @@
+.. _neural_network:
+
+==================================
+Neural network models (supervised)
+==================================
+
+.. currentmodule:: sklearn.neural_network
+
+
+.. _multilayer_perceptron:
+
+Multi-layer Perceptron
+======================
@agramfort scikit-learn member

why is the doc for MLP in this ELM PR?

Hi, I removed it, it was added by accident in the last push :)

@IssamLaradji

Thanks! Reusing scikit-learn kernels made the code much cleaner.

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_elm_training_vs_testing.py
((7 lines not shown))
+neurons. The more hidden neurons the less the training error, which eventually
+reaches zero. However, testing error does not necessarily decrease as having
+more hidden neurons than necessary would cause overfitting on the data.
+
+"""
+print(__doc__)
+
+# Author: Issam H. Laradji <issam.laradji@gmail.com>
+# License: BSD 3 clause
+
+import numpy as np
+import matplotlib.pyplot as plt
+import random
+
+from sklearn import cross_validation
+from sklearn.datasets import load_digits, fetch_mldata
@agramfort scikit-learn member

'load_digits' imported but unused

run pyflakes on your files

@agramfort scikit-learn member

do we really need to fetch the full MNIST to illustrate this? it makes it impossible to run on crappy internet connection like mine now :(

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_weighted_elm.py
((37 lines not shown))
+ axis.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired)
+ axis.axis('off')
+ axis.set_title(title)
+
+
+# we create 40 separable points
+rng = np.random.RandomState(0)
+n_samples_1 = 1000
+n_samples_2 = 100
+X = np.r_[1.5 * rng.randn(n_samples_1, 2),
+ 0.5 * rng.randn(n_samples_2, 2) + [2, 2]]
+Y = [0] * (n_samples_1) + [1] * (n_samples_2)
+
+
+# for reference, first fit without class weights
+# fit the model
@agramfort scikit-learn member

why 2 lines of comments?

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_weighted_elm.py
((50 lines not shown))
+
+# for reference, first fit without class weights
+# fit the model
+n_hidden = 100
+
+clf_weightless = ELMClassifier(n_hidden=n_hidden, C=10e5, class_weight=None)
+clf_weightless.fit(X, Y)
+
+clf_weight_auto = ELMClassifier(n_hidden=n_hidden, C=10e5, class_weight='auto')
+clf_weight_auto.fit(X, Y)
+
+clf_weight_1000 = ELMClassifier(n_hidden=n_hidden, C=10e5,
+ class_weight={1: 1000})
+clf_weight_1000.fit(X, Y)
+
+fig, axes = plt.subplots(1, 3, figsize=(20, 7))
@agramfort scikit-learn member

this figsize is way too big for the doc. Make it no bigger than 10 inches wide and remove empty space with plt.subplots_adjust

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_elm_training_vs_testing.py
((16 lines not shown))
+
+import numpy as np
+import matplotlib.pyplot as plt
+import random
+
+from sklearn import cross_validation
+from sklearn.datasets import load_digits, fetch_mldata
+from sklearn.neural_network import ELMClassifier
+
+np.random.seed(0)
+
+# Generate sample data
+mnist = fetch_mldata('MNIST original')
+X, y = mnist.data, mnist.target
+
+indices = np.array(random.sample(range(70000), 2000))
@agramfort scikit-learn member

use np.random and get rid of the random from standard lib
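Something like this would avoid the extra import (a minimal sketch):

import numpy as np

rng = np.random.RandomState(0)
indices = rng.permutation(70000)[:2000]   # 2000 distinct indices, no stdlib random needed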

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_elm_training_vs_testing.py
((18 lines not shown))
+import matplotlib.pyplot as plt
+import random
+
+from sklearn import cross_validation
+from sklearn.datasets import load_digits, fetch_mldata
+from sklearn.neural_network import ELMClassifier
+
+np.random.seed(0)
+
+# Generate sample data
+mnist = fetch_mldata('MNIST original')
+X, y = mnist.data, mnist.target
+
+indices = np.array(random.sample(range(70000), 2000))
+X, y = X[indices].astype('float64'), y[indices]
+X /= 255
@agramfort scikit-learn member

255.

it's a float

@agramfort agramfort commented on an outdated diff
examples/neural_networks/plot_weighted_elm.py
((5 lines not shown))
+
+Plot decision functions of extreme learning machines with different class
+weights. Assigning larger weight to a class will push the decision function
+away from that class to have more of its samples correctly classified.
+Such scheme is useful for imbalanced data so that underrepresented classes
+are emphasized and therefore not ignored by the classifier.
+
+"""
+print(__doc__)
+
+# Author: Issam H. Laradji <issam.laradji@gmail.com>
+# License: BSD 3 clause
+
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn import svm
@agramfort scikit-learn member

svm unused

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((18 lines not shown))
+from ..linear_model import ridge
+from ..utils import gen_even_slices
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils import check_random_state, atleast2d_or_csr
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function."""
+ return np.tanh(X, X)
+
+
+def _softmax(Z):
+ """Compute the K-way softmax function. """
+ exp_Z = np.exp(Z - Z.max(axis=1)[:, np.newaxis])
@agramfort scikit-learn member

can you use Z to store the output of np.exp, like you did for tanh above?

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((20 lines not shown))
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils import check_random_state, atleast2d_or_csr
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function."""
+ return np.tanh(X, X)
+
+
+def _softmax(Z):
+ """Compute the K-way softmax function. """
+ exp_Z = np.exp(Z - Z.max(axis=1)[:, np.newaxis])
+
+ return (exp_Z / exp_Z.sum(axis=1)[:, np.newaxis])
@agramfort scikit-learn member

and do this division inplace with a /=

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((95 lines not shown))
+ self._activation_func = self._activation_functions[self.activation]
+
+ if (self.kernel in ['poly', 'rbf', 'sigmoid']) and (self.gamma == 0):
+ # if custom gamma is not provided ...
+ self.gamma = 1.0 / self._n_features
+
+ if self.kernel != 'random':
+ self._X_train = X
+
+ def _scaled_weight_init(self, fan_in, fan_out):
+ """Scale the initial, random parameters for a specific layer."""
+ if self.activation == 'tanh':
+ interval = np.sqrt(6. / (fan_in + fan_out))
+
+ elif self.activation == 'logistic':
+ interval = 4. * np.sqrt(6. / (fan_in + fan_out))
@agramfort scikit-learn member

where are these numbers coming from? please point to the paper

Hi @agramfort, I am using the scaling scheme given here [Xavier10]: http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_GlorotB10.pdf
It gives me better results than the alternatives - cleaner plots and usually a higher score.

Since I am adding the ReLU activation function, I will add an else clause that sets the interval to 1./np.sqrt(n_features), a popular initialization method according to the paper. Thanks
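For reference, a sketch of the initialization intervals being discussed (the helper name is hypothetical; the interval values follow [Xavier10] and the 1/sqrt(n_features) fallback mentioned above):

import numpy as np

def _init_interval(activation, fan_in, fan_out):
    # scaling intervals from Glorot & Bengio (2010)
    if activation == 'tanh':
        return np.sqrt(6. / (fan_in + fan_out))
    elif activation == 'logistic':
        return 4. * np.sqrt(6. / (fan_in + fan_out))
    else:  # e.g. 'relu'
        return 1. / np.sqrt(fan_in)

rng = np.random.RandomState(0)
interval = _init_interval('tanh', fan_in=64, fan_out=100)
coef_hidden = rng.uniform(-interval, interval, size=(64, 100))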

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((194 lines not shown))
+ H = self._get_hidden_activations(X_batch)
+
+ if self._K is None:
+ # initialize K and coef_output_
+ self._K = safe_sparse_dot(H.T, H)
+ y_ = safe_sparse_dot(H.T, y_batch)
+
+ self.coef_output_ = ridge.ridge_regression(self._K, y_,
+ 1.0 / self.C).T
+ else:
+ self._K += safe_sparse_dot(H.T, H)
+ H_updated = safe_sparse_dot(H, self.coef_output_)
+ y_ = safe_sparse_dot(H.T, (y_batch - H_updated))
+
+ self.coef_output_ += ridge.ridge_regression(self._K, y_,
+ 1.0 / self.C).T
@agramfort scikit-learn member

put this line outside of the if and remove this call above

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((227 lines not shown))
+ X = atleast2d_or_csr(X)
+
+ self._validate_params()
+
+ n_samples, self._n_features = X.shape
+ self.n_outputs_ = y.shape[1]
+ self._init_param(X)
+
+ if self.algorithm == 'standard':
+ # compute the least-square solutions for the whole dataset
+ self._solve_lsqr(X, y)
+
+ elif self.algorithm == 'sequential':
+ # compute the least-square solutions in batches
+ batch_size = np.clip(self.batch_size, 0, n_samples)
+ n_batches = int(n_samples / batch_size)
@agramfort scikit-learn member

n_samples // batch_size

to force integer division

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((239 lines not shown))
+ elif self.algorithm == 'sequential':
+ # compute the least-square solutions in batches
+ batch_size = np.clip(self.batch_size, 0, n_samples)
+ n_batches = int(n_samples / batch_size)
+ batch_slices = list(gen_even_slices(n_batches * batch_size,
+ n_batches))
+ self._K = None
+
+ for batch, batch_slice in enumerate(batch_slices):
+ self._sequential_solve_procedure(X[batch_slice],
+ y[batch_slice])
+
+ if self.verbose:
+ # compute training square error
+ cost = np.sum((y[batch_slice] - self.decision_function(
+ X[batch_slice])) ** 2) / (2 * batch_size)
@agramfort scikit-learn member
  1. * batch_size
@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((242 lines not shown))
+ n_batches = int(n_samples / batch_size)
+ batch_slices = list(gen_even_slices(n_batches * batch_size,
+ n_batches))
+ self._K = None
+
+ for batch, batch_slice in enumerate(batch_slices):
+ self._sequential_solve_procedure(X[batch_slice],
+ y[batch_slice])
+
+ if self.verbose:
+ # compute training square error
+ cost = np.sum((y[batch_slice] - self.decision_function(
+ X[batch_slice])) ** 2) / (2 * batch_size)
+
+ print("Training square error for batch %d = %f" %
+ (batch, cost))
@agramfort scikit-learn member

bad indent

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((247 lines not shown))
+ for batch, batch_slice in enumerate(batch_slices):
+ self._sequential_solve_procedure(X[batch_slice],
+ y[batch_slice])
+
+ if self.verbose:
+ # compute training square error
+ cost = np.sum((y[batch_slice] - self.decision_function(
+ X[batch_slice])) ** 2) / (2 * batch_size)
+
+ print("Training square error for batch %d = %f" %
+ (batch, cost))
+
+ if self.verbose:
+ # compute training square error
+ cost = (np.sum((y - self.decision_function(X)) ** 2) /
+ (2 * n_samples))
@agramfort scikit-learn member

bad indent

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((248 lines not shown))
+ self._sequential_solve_procedure(X[batch_slice],
+ y[batch_slice])
+
+ if self.verbose:
+ # compute training square error
+ cost = np.sum((y[batch_slice] - self.decision_function(
+ X[batch_slice])) ** 2) / (2 * batch_size)
+
+ print("Training square error for batch %d = %f" %
+ (batch, cost))
+
+ if self.verbose:
+ # compute training square error
+ cost = (np.sum((y - self.decision_function(X)) ** 2) /
+ (2 * n_samples))
+ print("Training square error for the dataset = %f" % (cost))
@agramfort scikit-learn member

remove the () around cost

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((253 lines not shown))
+ cost = np.sum((y[batch_slice] - self.decision_function(
+ X[batch_slice])) ** 2) / (2 * batch_size)
+
+ print("Training square error for batch %d = %f" %
+ (batch, cost))
+
+ if self.verbose:
+ # compute training square error
+ cost = (np.sum((y - self.decision_function(X)) ** 2) /
+ (2 * n_samples))
+ print("Training square error for the dataset = %f" % (cost))
+
+ return self
+
+ def decision_function(self, X):
+ """Fit the model to the data X and target y.
@agramfort scikit-learn member

bad docstring

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((281 lines not shown))
+ X = atleast2d_or_csr(X)
+
+ self.hidden_activations_ = self._get_hidden_activations(X)
+ output = safe_sparse_dot(self.hidden_activations_, self.coef_output_)
+
+ return output
+
+ def partial_fit(self, X, y):
+ """Fit the model to the data X and target y.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Subset of training data.
+
+ y : array-like, shape (n_samples)
@agramfort scikit-learn member

(n_samples) -> (n_samples,)

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((286 lines not shown))
+ return output
+
+ def partial_fit(self, X, y):
+ """Fit the model to the data X and target y.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Subset of training data.
+
+ y : array-like, shape (n_samples)
+ Subset of target values.
+
+ Returns
+ -------
+ self : returns an instance of self.
@agramfort scikit-learn member

make the formatting of return self consistent

@ogrisel scikit-learn member
ogrisel added a note

Also it's better to explain the motivation rather than stating a tautology:

self : return the estimator itself to chain a call to the predict method for instance.

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((9 lines not shown))
+import numpy as np
+from scipy import linalg
+from scipy.sparse import identity
+
+from ..base import BaseEstimator, ClassifierMixin, RegressorMixin
+from ..externals import six
+from ..preprocessing import LabelBinarizer
+from ..metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
+from ..metrics.pairwise import sigmoid_kernel
+from ..linear_model import ridge
+from ..utils import gen_even_slices
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils import check_random_state, atleast2d_or_csr
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
@agramfort scikit-learn member

run pyflakes:

sklearn/neural_network/extreme_learning_machines.py:10 'linalg' imported but unused
sklearn/neural_network/extreme_learning_machines.py:11 'identity' imported but unused
sklearn/neural_network/extreme_learning_machines.py:21 redefinition of unused 'check_random_state' from line 20
sklearn/neural_network/extreme_learning_machines.py:21 redefinition of unused 'atleast2d_or_csr' from line 20
@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((500 lines not shown))
+ """Return the log of probability estimates.
+
+ Parameters
+ ----------
+ X : array-like, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ T : array-like, shape (n_samples, n_outputs)
+ Returns the log-probability of the sample for each class in the
+ model, where classes are ordered as they are in
+ `self.classes_`. Equivalent to log(predict_proba(X))
+ """
+ return np.log(self.predict_proba(X))
@agramfort scikit-learn member

use a temp var and apply the log inplace

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((553 lines not shown))
+ self.classes_ = classes
+
+ if not hasattr(self, '_lbin'):
+ self._lbin = LabelBinarizer()
+ self._lbin._classes = classes
+
+ y = column_or_1d(y, warn=True)
+
+ y = self._lbin.fit_transform(y)
+ super(ELMClassifier, self).partial_fit(X, y)
+
+ return self
+
+
+class ELMRegressor(BaseELM, RegressorMixin):
+
@agramfort scikit-learn member

remove empty line

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((704 lines not shown))
+ Predicted target values per element in X.
+ """
+ X = atleast2d_or_csr(X)
+
+ return self.decision_function(X)
+
+ def partial_fit(self, X, y):
+ """Fit the model to the data X and target y.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Training data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ y : array-like, shape (n_samples)
@agramfort scikit-learn member

(n_samples) -> (n_samples,)

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((714 lines not shown))
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Training data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ y : array-like, shape (n_samples)
+ Subset of the target values.
+
+ Returns
+ -------
+ self
+ """
+ y = np.atleast_1d(y)
+
+ if y.ndim == 1:
+ y = np.reshape(y, (-1, 1))
@agramfort scikit-learn member

doc says ndim has to be 1 so force it to be the case.

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/tests/test_elm.py
((67 lines not shown))
+ y_train = y[:150]
+ X_test = X[150:]
+
+ expected_shape_dtype = (X_test.shape[0], y_train.dtype.kind)
+
+ for activation in ACTIVATION_TYPES:
+ elm = ELMClassifier(n_hidden=50, activation=activation,
+ random_state=random_state)
+ elm.fit(X_train, y_train)
+
+ y_predict = elm.predict(X_test)
+ assert_greater(elm.score(X_train, y_train), 0.95)
+ assert_equal(
+ (y_predict.shape[0],
+ y_predict.dtype.kind),
+ expected_shape_dtype)
@agramfort scikit-learn member

indent looks weird

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/tests/test_elm.py
((179 lines not shown))
+ assert_raises(ValueError, clf(algorithm='standard').partial_fit, X, y)
+
+ elm = clf(algorithm='sequential')
+ elm.partial_fit(X, y, classes=[0, 1])
+ # different classes passed
+ assert_raises(ValueError, elm.partial_fit, X, y, classes=[0, 1, 2])
+
+
+def test_partial_fit_classification():
+ """
+ Test that partial_fit yields same results as 'fit'
+ for binary- and multi-class classification.
+ """
+ for X, y in classification_datasets:
+ X = X
+ y = y
@agramfort scikit-learn member

??

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/tests/test_elm.py
((190 lines not shown))
+ for binary- and multi-class classification.
+ """
+ for X, y in classification_datasets:
+ X = X
+ y = y
+ batch_size = 200
+ n_samples = X.shape[0]
+
+ elm = ELMClassifier(algorithm='sequential', random_state=random_state,
+ batch_size=batch_size)
+ elm.fit(X, y)
+ pred1 = elm.predict(X)
+
+ elm = ELMClassifier(algorithm='sequential', random_state=random_state)
+
+ n_batches = int(n_samples / batch_size)
@agramfort scikit-learn member

//

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/tests/test_elm.py
((222 lines not shown))
+ """
+ X = Xboston
+ y = yboston
+ batch_size = 100
+ n_samples = X.shape[0]
+
+ for activation in ACTIVATION_TYPES:
+ elm = ELMRegressor(algorithm='sequential', random_state=random_state,
+ activation=activation, batch_size=batch_size)
+ elm.fit(X, y)
+ pred1 = elm.predict(X)
+
+ elm = ELMRegressor(algorithm='sequential', activation=activation,
+ random_state=random_state)
+
+ n_batches = int(n_samples / batch_size)
@agramfort scikit-learn member

//

@agramfort agramfort commented on an outdated diff
sklearn/neural_network/tests/test_elm.py
((351 lines not shown))
+ elm_weightless = ELMClassifier(n_hidden=n_hidden,
+ class_weight=None,
+ random_state=random_state)
+ elm_weightless.fit(X_train, y_train)
+
+ elm_weight_auto = ELMClassifier(n_hidden=n_hidden,
+ class_weight='auto',
+ random_state=random_state)
+ elm_weight_auto.fit(X_train, y_train)
+
+ score_weightless = roc_auc_score(
+ y_test, elm_weightless.predict_proba(X_test)[:, 1])
+ score_weighted = roc_auc_score(
+ y_test, elm_weight_auto.predict_proba(X_test)[:, 1])
+
+ assert_greater(score_weighted, score_weightless)
@agramfort scikit-learn member

bad indent

@agramfort
scikit-learn member

@IssamLaradji please address my comments and start a complete benchmark to find the best default parameters. I would also add the ReLU activation function, which is fast to compute.

cc @ogrisel

@ogrisel
scikit-learn member

And also sparse random weights (reusing code from sklearn.random_projection) as an alternative to dense Gaussian random weights. It can be significantly faster.

Also, the amplitude of the random weights seems to be a hyperparameter with a large impact, as demonstrated in slides 14 and 15 of this deck: http://www.lce.hut.fi/~eiparvia/publ/KDIR_Parviainen_slides.pdf

We should therefore make the scale of the random weights an explicit hyperparameter of the ELM estimator(s) and write an example to highlight its importance, for instance using a grid search with a grid that includes the regularizer strength, the number of hidden nodes and the scale of the random weights.

@IssamLaradji

Thanks @agramfort and @ogrisel for all your comments. I pushed the updated code.
I added a weight_scale parameter that sets the interval from which the uniform distribution picks values.
By default weight_scale='auto', which selects the interval based on this paper (http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_GlorotB10.pdf), depending on the activation function. It works very well compared to other initialization methods.

For sklearn.random_projection.sparse_random_matrix, the output matrix contains only 3 unique values, a, -a, and 0, which doesn't provide the asymmetry we need for weight initialization, right?

Running grid-search on the load_digits dataset with the following range of parameters,

parameters = {'weight_scale': np.arange(0.1, 1, 0.1),
              'n_hidden': np.arange(50, 800, 50), 'C': [1, 10, 100, 1000]}

I got the following best combination,

ELMClassifier(C=1, activation='tanh', algorithm='standard', batch_size=200,
       class_weight=None, coef0=0.0, degree=3, gamma=0.0, kernel='random',
       n_hidden=500, random_state=None, verbose=False,
       weight_scale=0.10000000000000001)

Thanks.
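For reference, the grid search above can be reproduced with something like the following sketch (ELMClassifier only exists on this branch, and the cross-validation settings here are assumptions):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.grid_search import GridSearchCV        # sklearn.model_selection in later versions
from sklearn.neural_network import ELMClassifier    # only available on this PR's branch

digits = load_digits()
parameters = {'weight_scale': np.arange(0.1, 1, 0.1),
              'n_hidden': np.arange(50, 800, 50),
              'C': [1, 10, 100, 1000]}

search = GridSearchCV(ELMClassifier(), parameters, cv=3)
search.fit(digits.data, digits.target)
print(search.best_estimator_)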

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((100 lines not shown))
+ if self.algorithm is not 'standard' and self.class_weight is not None:
+ raise NotImplementedError("class_weight is only supported "
+ "when algorithm='standard'.")
+
+ def _init_param(self, X):
+ """Set initial parameters."""
+ self._activation_func = self._activation_functions[self.activation]
+
+ if (self.kernel in ['poly', 'rbf', 'sigmoid']) and (self.gamma == 0):
+ # if custom gamma is not provided ...
+ self.gamma = 1.0 / self._n_features
+
+ if self.kernel != 'random':
+ self._X_train = X
+
+ def _scaled_weight_init(self, fan_in, fan_out):
@arjoly scikit-learn member
arjoly added a note

Maybe inline this function? It is called only once.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((210 lines not shown))
+
+ if self._K is None:
+ # initialize K and coef_output_
+ self.coef_output_ = np.zeros((self.n_hidden, self.n_outputs_))
+ self._K = safe_sparse_dot(H.T, H)
+ y_ = safe_sparse_dot(H.T, y_batch)
+
+ else:
+ self._K += safe_sparse_dot(H.T, H)
+ H_updated = safe_sparse_dot(H, self.coef_output_)
+ y_ = safe_sparse_dot(H.T, (y_batch - H_updated))
+
+ self.coef_output_ += ridge.ridge_regression(self._K, y_,
+ 1.0 / self.C).T
+
+ def fit(self, X, y):
@arjoly scikit-learn member
arjoly added a note

Do you think the common parts of the fit and partial_fit implementations could be merged?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((370 lines not shown))
+
+ n_hidden: int, default 100
+ The number of neurons in the hidden layer, it only applies to
+ kernel='random'.
+
+ activation : {'logistic', 'tanh', 'relu'}, default 'tanh'
+ Activation function for the hidden layer. It only applies to
+ kernel='random'.
+
+ - 'logistic' for 1 / (1 + exp(x)).
+
+ - 'tanh' for the hyperbolic tangent.
+
+ - 'relu' for log(1 + exp(x))
+
+ algorithm : {'standard', 'sequential'}, default 'standard'
@arjoly scikit-learn member
arjoly added a note

Does it make any sense to have other solvers such as SGD?

@arjoly scikit-learn member
arjoly added a note

Why not call this parameter solver?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((391 lines not shown))
+
+ - 'sequential' computes the least-square solutions by training
+ on the dataset in batches using a recursive least-square
+ algorithm.
+
+ kernel : {'random', 'linear', 'poly', 'rbf', 'sigmoid'},
+ optional, default 'random'
+ Specifies the kernel type to be used in the algorithm.
+
+ degree : int, optional, default 3
+ Degree of the polynomial kernel function 'poly'.
+ Ignored by all other kernels.
+
+ gamma : float, optional, default 0.0
+ Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. If gamma is
+ 0.0 then 1/n_features will be used instead.
@arjoly scikit-learn member
arjoly added a note

Maybe a better default is to set it to None instead of 0.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((566 lines not shown))
+ raise ValueError(
+ "only 'sequential' algorithm supports partial fit")
+
+ if self.classes_ is None and classes is None:
+ raise ValueError("classes must be passed on the first call "
+ "to partial_fit.")
+ elif self.classes_ is not None and classes is not None:
+ if np.any(self.classes_ != np.unique(classes)):
+ raise ValueError("`classes` is not the same as on last call "
+ "to partial_fit.")
+ elif classes is not None:
+ self.classes_ = classes
+
+ if not hasattr(self, '_lbin'):
+ self._lbin = LabelBinarizer()
+ self._lbin._classes = classes
@arjoly scikit-learn member
arjoly added a note

I would avoid this. It would be better to patch the label binarizer to accept a classes argument. (+1 for another PR)

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((750 lines not shown))
+ Training data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ y : array-like, shape (n_samples, n_outputs)
+ Subset of the target values.
+
+ Returns
+ -------
+ self : returns an instance of self.
+ """
+ y = np.atleast_1d(y)
+
+ if y.ndim == 1:
+ # reshape is necessary to preserve the data contiguity against vs
+ # [:, np.newaxis] that does not.
+ y = np.reshape(y, (-1, 1))
@arjoly scikit-learn member
arjoly added a note

With a private _validate_y in the base class, the partial_fit function could be shared between regression and classification.

@arjoly scikit-learn member
arjoly added a note

It might not be necessary to have a _validate_y since it's possible to distinguish classification from regression using the ClassifierMixin.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((725 lines not shown))
+
+ def predict(self, X):
+ """Predict using the multi-layer perceptron model.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ array, shape (n_samples,)
+ Predicted target values per element in X.
+ """
+ X = atleast2d_or_csr(X)
@arjoly scikit-learn member
arjoly added a note

Since @amueller's improvements to the utils module, we could use the new awesome check_X_y, check_array or check_consistent_length. :-)

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((676 lines not shown))
+ Zong, Weiwei, Guang-Bin Huang, and Yiqiang Chen.
+ "Weighted extreme learning machine for imbalance learning."
+ Neurocomputing 101 (2013): 229-242.
+
+ Liang, Nan-Ying, et al.
+ "A fast and accurate online sequential learning algorithm for
+ feedforward networks." Neural Networks, IEEE Transactions on
+ 17.6 (2006): 1411-1423.
+ http://www.ntu.edu.sg/home/egbhuang/pdf/OS-ELM-TNN.pdf
+ """
+ def __init__(self, n_hidden=100, activation='tanh', algorithm='standard',
+ weight_scale='auto', kernel='random', batch_size=200, C=10e5,
+ degree=3, gamma=0.0, coef0=0.0, verbose=False,
+ random_state=None):
+
+ class_weight = None
@arjoly scikit-learn member
arjoly added a note

This could be handled in the base class.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((683 lines not shown))
+ 17.6 (2006): 1411-1423.
+ http://www.ntu.edu.sg/home/egbhuang/pdf/OS-ELM-TNN.pdf
+ """
+ def __init__(self, n_hidden=100, activation='tanh', algorithm='standard',
+ weight_scale='auto', kernel='random', batch_size=200, C=10e5,
+ degree=3, gamma=0.0, coef0=0.0, verbose=False,
+ random_state=None):
+
+ class_weight = None
+
+ super(ELMRegressor, self).__init__(n_hidden, activation, algorithm,
+ kernel, C, degree, gamma, coef0,
+ class_weight, weight_scale,
+ batch_size, verbose, random_state)
+
+ self.classes_ = None
@arjoly scikit-learn member
arjoly added a note

Same here.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((602 lines not shown))
+
+ Parameters
+ ----------
+ C: float, optional, default 10e5
+ Regularization term.
+
+ weight_scale : float or 'auto', default 'auto'
+ Scales the weights that initialize the outgoing weights of the first
+ hidden layer. The weight values will range between plus and minus an
+ interval based on the uniform distribution. That interval
+ is 1 / (n_features + n_hidden) if weight_scale='auto'; otherwise,
+ the interval is the value given to weight_scale.
+
+ n_hidden: int, default 100
+ The number of neurons in the hidden layer, it only applies to
+ kernel='random'.
@arjoly scikit-learn member
arjoly added a note

What happens for the other kernels?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((598 lines not shown))
+ values.
+
+ This implementation works with data represented as dense and sparse numpy
+ arrays of floating point values for the features.
+
+ Parameters
+ ----------
+ C: float, optional, default 10e5
+ Regularization term.
+
+ weight_scale : float or 'auto', default 'auto'
+ Scales the weights that initialize the outgoing weights of the first
+ hidden layer. The weight values will range between plus and minus an
+ interval based on the uniform distribution. That interval
+ is 1 / (n_features + n_hidden) if weight_scale='auto'; otherwise,
+ the interval is the value given to weight_scale.
@arjoly scikit-learn member
arjoly added a note

That interval is 1 / (n_features + n_hidden) if weight_scale='auto'; otherwise, the interval is the value given to weight_scale.
=>
If weight_scale='auto', then weight_scale is set to 1 / (n_features + n_hidden)?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((591 lines not shown))
+class ELMRegressor(BaseELM, RegressorMixin):
+ """Extreme learning machines regressor.
+
+ The algorithm trains a single-hidden layer feedforward network by computing
+ the hidden layer values using randomized parameters, then solving
+ for the output weights using least-square solutions. For prediction,
+ ELMRegressor computes the forward pass resulting in continuous output
+ values.
+
+ This implementation works with data represented as dense and sparse numpy
+ arrays of floating point values for the features.
+
+ Parameters
+ ----------
+ C: float, optional, default 10e5
+ Regularization term.
@arjoly scikit-learn member
arjoly added a note

Which regularisation term?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((518 lines not shown))
+ return np.hstack([1 - scores, scores])
+ else:
+ return _softmax(scores)
+
+ def predict_log_proba(self, X):
+ """Return the log of probability estimates.
+
+ Parameters
+ ----------
+ X : array-like, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ T : array-like, shape (n_samples, n_outputs)
@arjoly scikit-learn member
arjoly added a note

T => y_proba?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((523 lines not shown))
+ """Return the log of probability estimates.
+
+ Parameters
+ ----------
+ X : array-like, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ T : array-like, shape (n_samples, n_outputs)
+ Returns the log-probability of the sample for each class in the
+ model, where classes are ordered as they are in
+ `self.classes_`. Equivalent to log(predict_proba(X))
+ """
+ tmp = self.predict_proba(X)
@arjoly scikit-learn member
arjoly added a note

tmp => y_proba

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((524 lines not shown))
+
+ Parameters
+ ----------
+ X : array-like, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ T : array-like, shape (n_samples, n_outputs)
+ Returns the log-probability of the sample for each class in the
+ model, where classes are ordered as they are in
+ `self.classes_`. Equivalent to log(predict_proba(X))
+ """
+ tmp = self.predict_proba(X)
+ return np.log(tmp, tmp)
@arjoly scikit-learn member
arjoly added a note

return np.log(y_proba, out=y_proba)?

@arjoly scikit-learn member
arjoly added a note

I got a bit surprised by the second argument.
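The second positional argument is numpy's out array, so the log is written into y_proba in place; a tiny sketch:

import numpy as np

y_proba = np.array([[0.2, 0.8], [0.5, 0.5]])
np.log(y_proba, y_proba)      # same as np.log(y_proba, out=y_proba), no new array allocated
print(y_proba)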

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((495 lines not shown))
+ scores = self.decision_function(X)
+
+ return self._lbin.inverse_transform(scores)
+
+ def predict_proba(self, X):
+ """Probability estimates.
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ array, shape (n_samples, n_outputs)
@arjoly scikit-learn member
arjoly added a note

I assume you mean (n_samples, n_classes)

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((459 lines not shown))
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Training data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ y : array-like, shape (n_samples,)
+ Target values.
+
+ Returns
+ -------
+ self : returns an instance of self.
+ """
+ y = column_or_1d(y, warn=True)
+ self.classes_ = np.unique(y)
+ y = self._lbin.fit_transform(y)
@arjoly scikit-learn member
arjoly added a note

Could be factored into a self._validate_y function

@arjoly scikit-learn member
arjoly added a note

It might not be necessary to have a _validate_y since it's possible to distinguish classification from regression using the ClassifierMixin.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((312 lines not shown))
+ Subset of target values.
+
+ Returns
+ -------
+ self : returns an instance of self.
+ """
+ X = atleast2d_or_csr(X)
+
+ self.n_outputs_ = y.shape[1]
+
+ n_samples, self._n_features = X.shape
+ self._validate_params()
+ self._init_param(X)
+
+ if self.coef_output_ is None:
+ self._K = None
@arjoly scikit-learn member
arjoly added a note

What is _K?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((30 lines not shown))
+ # print X.shape
+ tmp = 1 + np.exp(X, X)
+ return np.log(tmp, tmp)
+
+
+def _softmax(Z):
+ """Compute the K-way softmax function. """
+ Z = np.exp(Z - Z.max(axis=1)[:, np.newaxis])
+ Z /= Z.sum(axis=1)[:, np.newaxis]
+
+ return Z
+
+
+def _square_error(y, y_pred, n_samples):
+ """Compute the square error."""
+ return (np.sum(y - y_pred) ** 2) / (2 * n_samples)
@arjoly scikit-learn member
arjoly added a note

Why not inferring n_samples from the data?

Why not re-using mean_squared_error from the metrics module?

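A rough sketch of what reusing the metrics module could look like for single-output targets (illustrative only; the 0.5 factor mirrors the sum-of-squares / (2 * n_samples) convention in the diff):

from sklearn.metrics import mean_squared_error

def _square_error(y, y_pred):
    # n_samples is inferred from y by mean_squared_error, so it no longer
    # needs to be passed in; 0.5 * MSE matches the convention above for
    # single-output y.
    return 0.5 * mean_squared_error(y, y_pred)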
@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((43 lines not shown))
+def _square_error(y, y_pred, n_samples):
+ """Compute the square error."""
+ return (np.sum(y - y_pred) ** 2) / (2 * n_samples)
+
+
+class BaseELM(six.with_metaclass(ABCMeta, BaseEstimator)):
+ """Base class for ELM classification and regression.
+
+ Warning: This class should not be used directly.
+ Use derived classes instead.
+ """
+ _activation_functions = {
+ 'tanh': _tanh,
+ 'logistic': logistic_sigmoid,
+ 'relu': _relu
+ }
@arjoly scikit-learn member
arjoly added a note

Why this need to be an attribute instead of global constant?

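One way to follow this suggestion, assuming the activation helpers stay defined at module level (the constant name ACTIVATIONS is illustrative, not part of the PR):

# Hypothetical module-level registry; the functions carry no instance
# state, so a plain constant next to their definitions is enough.
ACTIVATIONS = {
    'tanh': _tanh,
    'logistic': logistic_sigmoid,
    'relu': _relu,
}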
@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((79 lines not shown))
+ self.coef_output_ = None
+
+ def _validate_params(self):
+ """Validate input params."""
+ if self.n_hidden <= 0:
+ raise ValueError("n_hidden must be greater or equal zero")
+ if self.C <= 0.0:
+ raise ValueError("C must be > 0")
+
+ if self.activation not in self._activation_functions:
+ raise ValueError("The activation %s"
+ " is not supported. " % self.activation)
+
+ if self.algorithm not in ['standard', 'sequential']:
+ raise ValueError("The algorithm %s"
+ " is not supported. " % self.algorithm)
@arjoly scikit-learn member
arjoly added a note

It's nice to remind the user of the possible choices in that sort of exception message.

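A sketch of a check that lists the supported choices (wording is illustrative, not part of the PR):

if self.activation not in self._activation_functions:
    raise ValueError("The activation '%s' is not supported. Supported "
                     "activations are %s."
                     % (self.activation,
                        sorted(self._activation_functions)))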
@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((85 lines not shown))
+ if self.C <= 0.0:
+ raise ValueError("C must be > 0")
+
+ if self.activation not in self._activation_functions:
+ raise ValueError("The activation %s"
+ " is not supported. " % self.activation)
+
+ if self.algorithm not in ['standard', 'sequential']:
+ raise ValueError("The algorithm %s"
+ " is not supported. " % self.algorithm)
+
+ if self.kernel not in ['random', 'linear', 'poly', 'rbf', 'sigmoid']:
+ raise ValueError("The kernel %s"
+ " is not supported. " % self.kernel)
+
+ if self.algorithm is not 'standard' and self.class_weight is not None:
@arjoly scikit-learn member
arjoly added a note

self.algorithm != 'standard'

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((278 lines not shown))
+ print("Training square error for the dataset = %f" % cost)
+
+ return self
+
+ def decision_function(self, X):
+ """Predict using the trained model
+
+ Parameters
+ ----------
+ X : {array-like, sparse matrix}, shape (n_samples, n_features)
+ Data, where n_samples is the number of samples
+ and n_features is the number of features.
+
+ Returns
+ -------
+ array, shape (n_samples,)
@arjoly scikit-learn member
arjoly added a note

There is no variable name for the return value. In classification, this looks like it should be (n_samples, n_classes).

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((181 lines not shown))
+
+ elif self.kernel == 'poly':
+ H = polynomial_kernel(X, self._X_train, self.degree, self.gamma,
+ self.coef0)
+
+ elif self.kernel == 'rbf':
+ H = rbf_kernel(X, self._X_train, self.gamma)
+
+ elif self.kernel == 'sigmoid':
+ H = sigmoid_kernel(X, self._X_train, self.gamma, self.coef0)
+
+ return H
+
+ def _solve_lsqr(self, X, y):
+ """Compute the least-square solutions for the whole dataset."""
+ H = self._get_hidden_activations(X)
@arjoly scikit-learn member
arjoly added a note

I don't find H to be a particularly descriptive variable name.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((127 lines not shown))
+ return interval
+
+ def _init_hidden_weights(self):
+ """Initialize coef and intercept parameters for the hidden layer."""
+ rng = check_random_state(self.random_state)
+ fan_in, fan_out = self._n_features, self.n_hidden
+
+ interval = self._scaled_weight_init(fan_in, fan_out)
+
+ coef = rng.uniform(-interval, interval, (fan_in, fan_out))
+ intercept = rng.uniform(-interval, interval, (fan_out))
+
+ self.coef_hidden_ = coef
+ self.intercept_hidden_ = intercept
+
+ def _assign_weights(self, y):
@arjoly scikit-learn member
arjoly added a note

This is called only once. Maybe this function should be inlined.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((267 lines not shown))
+ y_pred = self.decision_function(X[batch_slice])
+ cost = _square_error(y[batch_slice], y_pred, batch_size)
+
+ print("Training square error for batch %d = %f" % (batch,
+ cost))
+
+ if self.verbose:
+ # compute training square error
+ y_pred = self.decision_function(X)
+ cost = _square_error(y, y_pred, n_samples)
+
+ print("Training square error for the dataset = %f" % cost)
+
+ return self
+
+ def decision_function(self, X):
@arjoly scikit-learn member
arjoly added a note

Should we also have this in regression?

@ogrisel scikit-learn member
ogrisel added a note

+1 for renaming this to _predict and defining:

class ELMClassifier(...):
    ...
    def decision_function(self, X):
        return self._predict(X)

and:

class ELMRegressor(...):
    ...
    def predict(self, X):
        return self._predict(X)
@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((1 lines not shown))
+"""Extreme Learning Machines
+"""
+
+# Author: Issam H. Laradji <issam.laradji@gmail.com>
+# Licence: BSD 3 clause
+
+from abc import ABCMeta, abstractmethod
+
+import numpy as np
+
+from ..base import BaseEstimator, ClassifierMixin, RegressorMixin
+from ..externals import six
+from ..preprocessing import LabelBinarizer
+from ..metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
+from ..metrics.pairwise import sigmoid_kernel
+from ..linear_model import ridge
@arjoly scikit-learn member
arjoly added a note

maybe importing ridge_regression directly would be equivalent.

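A sketch of the direct import, assuming the ELM's C maps to alpha = 1 / C (the mapping and the illustrative data are assumptions, not part of the PR):

import numpy as np
from sklearn.linear_model import ridge_regression

H = np.random.RandomState(0).rand(20, 5)   # hidden activations (illustrative)
y = np.random.RandomState(1).rand(20, 2)   # binarized targets (illustrative)
C = 1.0

# ridge_regression solves min ||H w - y||^2 + alpha ||w||^2 and returns
# coefficients of shape (n_targets, n_features), so transpose to get the
# (n_hidden, n_outputs) layout used for coef_output_ in the diff.
coef_output = ridge_regression(H, y, alpha=1.0 / C).T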
@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((16 lines not shown))
+from ..linear_model import ridge
+from ..utils import gen_even_slices
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function."""
+ return np.tanh(X, X)
+
+
+def _relu(X):
+ """Compute the rectified linear unit function."""
+ # print X.shape
+ tmp = 1 + np.exp(X, X)
@arjoly scikit-learn member
arjoly added a note

np.exp(X, out=X)

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((17 lines not shown))
+from ..utils import gen_even_slices
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function."""
+ return np.tanh(X, X)
+
+
+def _relu(X):
+ """Compute the rectified linear unit function."""
+ # print X.shape
+ tmp = 1 + np.exp(X, X)
+ return np.log(tmp, tmp)
@arjoly scikit-learn member
arjoly added a note

np.log(tmp, out=tmp)?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((10 lines not shown))
+
+from ..base import BaseEstimator, ClassifierMixin, RegressorMixin
+from ..externals import six
+from ..preprocessing import LabelBinarizer
+from ..metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
+from ..metrics.pairwise import sigmoid_kernel
+from ..linear_model import ridge
+from ..utils import gen_even_slices
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function."""
+ return np.tanh(X, X)
@arjoly scikit-learn member
arjoly added a note

np.tanh(X, out=X)?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((8 lines not shown))
+
+import numpy as np
+
+from ..base import BaseEstimator, ClassifierMixin, RegressorMixin
+from ..externals import six
+from ..preprocessing import LabelBinarizer
+from ..metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel
+from ..metrics.pairwise import sigmoid_kernel
+from ..linear_model import ridge
+from ..utils import gen_even_slices
+from ..utils import atleast2d_or_csr, check_random_state, column_or_1d
+from ..utils.extmath import safe_sparse_dot
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
@arjoly scikit-learn member
arjoly added a note

The docstrings of _tanh, _relu, _softmax should probably say that the computation is done in place.

@arjoly scikit-learn member
arjoly added a note

_inplace_tanh, _inplace_...?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((20 lines not shown))
+from ..utils.fixes import expit as logistic_sigmoid
+
+
+def _tanh(X):
+ """Compute the hyperbolic tan function."""
+ return np.tanh(X, X)
+
+
+def _relu(X):
+ """Compute the rectified linear unit function."""
+ # print X.shape
+ tmp = 1 + np.exp(X, X)
+ return np.log(tmp, tmp)
+
+
+def _softmax(Z):
@arjoly scikit-learn member
arjoly added a note

maybe rename Z to X to be consistent with _tanh and _relu?

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((93 lines not shown))
+ raise ValueError("The algorithm %s"
+ " is not supported. " % self.algorithm)
+
+ if self.kernel not in ['random', 'linear', 'poly', 'rbf', 'sigmoid']:
+ raise ValueError("The kernel %s"
+ " is not supported. " % self.kernel)
+
+ if self.algorithm is not 'standard' and self.class_weight is not None:
+ raise NotImplementedError("class_weight is only supported "
+ "when algorithm='standard'.")
+
+ def _init_param(self, X):
+ """Set initial parameters."""
+ self._activation_func = self._activation_functions[self.activation]
+
+ if (self.kernel in ['poly', 'rbf', 'sigmoid']) and (self.gamma == 0):
@arjoly scikit-learn member
arjoly added a note

Parentheses are not needed.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((108 lines not shown))
+ if (self.kernel in ['poly', 'rbf', 'sigmoid']) and (self.gamma == 0):
+ # if custom gamma is not provided ...
+ self.gamma = 1.0 / self._n_features
+
+ if self.kernel != 'random':
+ self._X_train = X
+
+ def _scaled_weight_init(self, fan_in, fan_out):
+ """Scale the initial, random parameters for a specific layer."""
+ if self.weight_scale == 'auto':
+ if self.activation == 'tanh':
+ interval = np.sqrt(6. / (fan_in + fan_out))
+ elif self.activation == 'logistic':
+ interval = 4. * np.sqrt(6. / (fan_in + fan_out))
+ else:
+ interval = np.sqrt(1. / (fan_in))
@arjoly scikit-learn member
arjoly added a note

The auto mode is not in line with the documentation.

@arjoly arjoly commented on an outdated diff
sklearn/neural_network/extreme_learning_machines.py
((125 lines not shown))
+ interval = self.weight_scale
+
+ return interval
+
+ def _init_hidden_weights(self):
+ """Initialize coef and intercept parameters for the hidden layer."""
+ rng = check_random_state(self.random_state)
+ fan_in, fan_out = self._n_features, self.n_hidden
+
+ interval = self._scaled_weight_init(fan_in, fan_out)
+
+ coef = rng.uniform(-interval, interval, (fan_in, fan_out))
+ intercept = rng.uniform(-interval, interval, (fan_out))
+
+ self.coef_hidden_ = coef
+ self.intercept_hidden_ = intercept
@arjoly scikit-learn member
arjoly added a note

I have the feeling there are lines here that could easily be merged without impeding readability.

@amueller amueller commented on the diff
examples/neural_networks/plot_weighted_elm.py
@@ -0,0 +1,71 @@
+"""
+==================================
+Weighted Extreme Learning Machines
+==================================
+
+Plot decision functions of extreme learning machines with different class
+weights. Assigning larger weight to a class will push the decision function
+away from that class to have more of its samples correctly classified.
+Such scheme is useful for imbalanced data so that underrepresented classes
@amueller scikit-learn member

"Such a scheme"

@amueller amueller commented on the diff
examples/neural_networks/plot_weighted_elm.py
((37 lines not shown))
+ axis.scatter(X[:, 0], X[:, 1], s=20, c=y, cmap=plt.cm.Paired)
+ axis.axis('off')
+ axis.set_title(title)
+
+
+rng = np.random.RandomState(0)
+n_samples_1 = 1000
+n_samples_2 = 100