
Random hidden layer and Extreme Learning Machine implementation #1714

Closed
wants to merge 6 commits

5 participants

@dclambert

still missing some docs/doctests

demo IPython notebook in https://gist.github.com/dclambert/b1081db34ddaa282f147

dclambert added some commits Feb 25, 2013
@dclambert dclambert ENH: initial ELM/random_hidden_layer commit, passes tests, pep8, pyflakes

added examples plot_elm_comparison and simple_elm_example
added elm and random_hidden_layer to __all__ in __init__.py
added ELMClassifier and fixed docstring in bench_covertype.py
172fad3
@dclambert dclambert ENH: refactor, add BaseRandomHiddenLayer and derive other transformers from it

change xfer_func to user_func and modify examples accordingly
e721474
@dclambert dclambert added ELM explanation to plot_elm_comparison example, removed
example that should be in doctests
ce3f26c
@larsmans larsmans and 2 others commented on an outdated diff Feb 26, 2013
sklearn/elm.py
+
+ @_take_docstring_from(BaseELM)
+ def predict(self, X):
+ raw_predictions = self.decision_function(X)
+ class_predictions = self.binarizer_.inverse_transform(raw_predictions)
+
+ return class_predictions
+
+
+#########################################
+# SimpleELMRegressor doc string assembly
+#########################################
+_SimpleELMRegressor_doc = """
+SimpleELMRegressor is a regressor based on the Extreme Learning Machine
+
+{}
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

What does this {} do?

@dclambert
dclambert added a note Feb 26, 2013

{} is a format field for str.format(), instead of old-style string formatting using "%s" % some_string, which is on its way out according to the docs.
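
For reference, a minimal sketch of the assembly being discussed, using str.format rather than %-formatting (blurb text abbreviated; not the PR's exact code):

_ELM_blurb = "An Extreme Learning Machine (ELM) is a single layer feedforward network."

_SimpleELMRegressor_doc = """
SimpleELMRegressor is a regressor based on the Extreme Learning Machine

{}
""".format(_ELM_blurb)  # the old style would be: "...%s..." % _ELM_blurb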

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

Ah, right. I think we've never assembled docstrings this way, except in deprecated...

(%s and friends will be around for some time, removing them would break massive amounts of code in non-trivial ways.)

@dclambert
dclambert added a note Feb 26, 2013

Does "never assembled" mean "must not be assembled", or can I leave it? I was just trying to avoid a bunch of cut-and-pasted text that had to be kept synchronized.

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

I'm not sure. I'm not very opposed to this, but the other developers will have to agree.

@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

I think I prefer duplicated yet flat code. The docstring assembly logic looks complicated to me and makes the source code less readable. +0 for flattening it.

@larsmans larsmans and 2 others commented on an outdated diff Feb 26, 2013
sklearn/random_hidden_layer.py
+ Creates a layer of units as a specified functions of an activation
+ value determined by the dot product of the input and a random vector
+ plus a random bias term:
+
+ f(a), s.t. a = dot(x, hidden_weights) + bias
+
+ and xfer function f() which defaults to numpy.tanh if not supplied,
+ but can be any callable that returns an array of the same shape as
+ its argument (input activation array, shape [n_samples, n_hidden])
+
+ Parameters
+ __________
+ `n_hidden` : int
+ number of units to generate (default=20)
+
+ `user_func` : callable
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

The type should be "callable, optional". I'd prefer "tanh" (as a string) to None for the default value, in which case the type would be "{callable, string}, optional".

@dclambert
dclambert added a note Feb 26, 2013

Just to make sure I understand you - if some additional transfer functions are included in the public API, you would have the internally implemented functions selected by the string arg, and if the string doesn't correspond to one of the internal ones, I would look for the user function in locals()?

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

No no -- if the user passes a string, it denotes one of the functions defined internally, and if it's a callable, you call it. No need to go snooping in locals().
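
For reference, a minimal sketch of the string-or-callable dispatch described here (names are illustrative, not the PR's):

import numpy as np

# activation functions provided internally, selected by name
_internal_activation_funcs = {'tanh': np.tanh, 'sine': np.sin}

def _resolve_activation(func):
    # strings name an internal function; callables are used directly
    if callable(func):
        return func
    if func in _internal_activation_funcs:
        return _internal_activation_funcs[func]
    raise ValueError("unknown activation function %r" % func)

f = _resolve_activation('tanh')                    # internal function, by name
g = _resolve_activation(lambda a: np.sin(a) ** 2)  # user-supplied callable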

@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

+1 for larsmans' comment. Also, I would rather name this parameter something more explicit, such as activation_function or link_function. WDYT?

@larsmans larsmans and 1 other commented on an outdated diff Feb 26, 2013
sklearn/random_hidden_layer.py
+
+ # take n_hidden and random_state, init components_ and
+ # input_activations_
+ def __init__(self, n_hidden=20, random_state=0, user_func=None,
+ user_args={}):
+
+ self.n_hidden = n_hidden
+ self.random_state = random_state
+ self.user_func = user_func
+ self.user_args = user_args
+
+ self.components_ = dict()
+ self.input_activations_ = None
+
+ @abstractmethod
+ def generate_components(self, X):
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

Unless you have a good reason to expose these methods, I'd say they should be private. (They can be made public at a later stage when needed.)

@dclambert
dclambert added a note Feb 26, 2013

Do you mean just the generate_component() method, or the compute_*_activation methods too?

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

All of them. I'm not even sure if you need all these methods.

@dclambert
dclambert added a note Feb 26, 2013

They are from a more general implementation that didn't make it into the pull request. They were originally there to support user provided input activation and component generation methods without subclassing.

I was looking at implementing the generalized RBF from MELM-GRBF (Gutierrez), which could use the generate_component() method, and generalized input activations of the form alpha*f(dot_product()) + beta*g(distance()) or f(alpha*dot_product() + beta*distance()) (see Conic Section Function Networks, Dorffner), which could make use of the compute_input_activation() method. The goal is to add more ELM candidates to the Caruana ensemble selection technique that led me to the ELM in the first place.

(TL;DR) I personally have use for them, so it would be nice if I could keep them present but private.
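
As a rough illustration of the generalized input activation mentioned above, a numpy sketch of the alpha*dot_product + beta*distance form (hypothetical helper, not part of the PR):

import numpy as np
from scipy.spatial.distance import cdist

def mixed_input_activation(X, weights, biases, centers, alpha=1.0, beta=1.0):
    # weights: (n_features, n_hidden), centers: (n_hidden, n_features)
    dot_part = np.dot(X, weights) + biases   # dot-product activations
    dist_part = cdist(X, centers)            # distance activations
    return alpha * dot_part + beta * dist_part

# the hidden activations would then be f(mixed_input_activation(...)), e.g. np.tanh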

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

I'm sorry, but we can't guarantee that this won't get refactored/optimized away because one of the developers relies on implementation details in their own code (we're too big and loose a team for that, and we have too many users). Either a method is documented and supported, or it's up for grabs.

@dclambert
dclambert added a note Feb 26, 2013

Understood. I'll just cross my fingers...

@larsmans larsmans and 1 other commented on an outdated diff Feb 26, 2013
sklearn/random_hidden_layer.py
+
+ # perform transformation by calling compute_hidden_activations
+ # (which will normally call compute_input_activations first)
+ def transform(self, X, y=None):
+ """Generate the random hidden layer's activations given X as input.
+
+ Parameters
+ ----------
+ X : numpy array or scipy.sparse of shape [n_samples, n_features]
+ The input data to project into a random hidden layer's activations
+
+ y : is not used: placeholder to allow for usage in a Pipeline.
+
+ Returns
+ -------
+ X_new : numpy array or scipy sparse of shape [n_samples, n_components]
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

Isn't this always going to be dense?

@dclambert
dclambert added a note Feb 26, 2013

Sorry, yes, I assume so. Pasted text.

@larsmans larsmans and 2 others commented on an outdated diff Feb 26, 2013
examples/plot_elm_comparison.py
+ ax.set_xticks(())
+ ax.set_yticks(())
+
+ ax.set_title(name)
+ ax.text(xx.max() - 0.3, yy.min() + 0.3, ('%.2f' % score).lstrip('0'),
+ size=13, horizontalalignment='right')
+
+
+def make_datasets():
+ return [make_moons(n_samples=200, noise=0.3, random_state=0),
+ make_circles(n_samples=200, noise=0.2, factor=0.5, random_state=1),
+ make_linearly_separable()]
+
+
+def make_classifiers():
+ sinsq = (lambda x: np.power(np.sin(x), 2.0))
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

Are any of these useful enough to include them in the public API? (As string arguments, like with tanh?)

@dclambert
dclambert added a note Feb 26, 2013

The ones that are found in some MATLAB implementations of the extreme learning machine are the standard logistic function, sine, and the MATLAB functions hardlim(), radbas(), and tribas(). They're easy enough to include.

@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

They might be easy to include but is there any evidence they are useful in practice? Is there any dataset in a published empirical study (peer reviewed paper or just a gist somewhere) where they actually beat tanh?

@larsmans
scikit-learn member

I'd say all of this should go into sklearn.neural_networks. That module doesn't exist ATM, but several other PRs are planned for it.

@larsmans larsmans and 1 other commented on an outdated diff Feb 26, 2013
sklearn/random_hidden_layer.py
+ n_samples, n_features = X.shape
+ rs = check_random_state(self.random_state)
+
+ # use examples from the data as centers
+ if (self.use_exemplars):
+ if (n_samples < self.n_hidden):
+ msg = "n_hidden must be <= n_samples when using exemplars"
+ raise ValueError(msg)
+ max_index = X.shape[0] - 1
+ indices = rs.permutation(max_index)[:self.n_hidden]
+ centers = X[indices, :]
+
+ # use uniformly distributed points from the input space as centers
+ else:
+ if (sparse):
+ min_X, max_X = min(X.data), max(X.data)
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

This doesn't work when all elements of X are either >0 or <0. The proper, type-preserving way of computing the minimum of a sparse matrix is np.minimum(0, np.min(X.data)).

@dclambert
dclambert added a note Feb 26, 2013

thanks, my sparse skills are weak.

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

I have to correct myself: that doesn't necessarily preserve types (it may upcast from np.float32 to np.float64). The correct way is

np.minimum(X.dtype.type(0), np.min(X.data))

(See PR over at SciPy.)

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

Got it wrong again. When the matrix is completely dense ("full"), this will fail.

@dclambert
dclambert added a note Feb 26, 2013

Can't I just leave it in the

if (sparse):

clause?

@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

Yes, that's fine, I just meant that when the "sparse" matrix actually contains no zeroes, then my trick with np.minimum fails.
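
Putting the corrections in this thread together, a possible type-preserving helper for the sparse branch (a sketch, assuming a CSR/CSC input; not code from the PR):

import numpy as np

def _sparse_min_max(X):
    zero = X.dtype.type(0)
    if X.nnz == 0:
        # no stored values: the matrix is all zeros
        return zero, zero
    data_min, data_max = X.data.min(), X.data.max()
    if X.nnz < np.prod(X.shape):
        # implicit zeros are present, so they count towards the min/max
        data_min = min(data_min, zero)
        data_max = max(data_max, zero)
    return data_min, data_max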

@larsmans larsmans and 1 other commented on an outdated diff Feb 26, 2013
sklearn/elm.py
+ y : array-like of shape [n_samples, n_outputs]
+ Training target.
+
+ Returns
+ -------
+ self : object
+ Returns an instance of self.
+ """
+
+ @abstractmethod
+ def predict(self, X):
+ """Predict values using the model
+
+ Parameters
+ ----------
+ X : numpy array of shape [n_samples, n_features]
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

This should read "array-like" if it does proper input validation; we follow Postel's law. Also, sparse matrices?

@dclambert
dclambert added a note Feb 26, 2013

all right

@larsmans larsmans commented on an outdated diff Feb 26, 2013
sklearn/random_hidden_layer.py
+
+class SimpleRandomHiddenLayer(BaseRandomHiddenLayer):
+ """Simple Random Hidden Layer transformer
+
+ Creates a layer of units as a specified functions of an activation
+ value determined by the dot product of the input and a random vector
+ plus a random bias term:
+
+ f(a), s.t. a = dot(x, hidden_weights) + bias
+
+ and xfer function f() which defaults to numpy.tanh if not supplied,
+ but can be any callable that returns an array of the same shape as
+ its argument (input activation array, shape [n_samples, n_hidden])
+
+ Parameters
+ __________
@larsmans
scikit-learn member
larsmans added a note Feb 26, 2013

This should be ===== -- it's a reST formatting code.

@dclambert dclambert moved elm and random_hidden_layer to neural_networks;
adjusted __init__.py files to match;
changed user_func to xfer_func and made it a string or callable;
adjusted examples and benchmark to compensate for changes;
added 'tanh','sine','hardlim','sigmoid','tribas' transfer funcs to SimpleRHL;
added 'gaussian','poly_spline','multiquadric' transfer funcs to RBFRHL;
change multiquadric to lambda;
moved internal functions out of if/else block into dictionary;
refactored _compute_hidden_activations into BaseRHL;
smoothed the docstrings
ba4b370
@dclambert

Made suggested changes, refactored some stuff, improved doc strings. Still no doctests...

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 3, 2013
benchmarks/bench_covertype.py
@@ -227,6 +231,10 @@ def benchmark(clf):
n_jobs=opts.n_jobs,
random_state=opts.random_seed)
+######################################################################
+## Train ELMClassifier model
+rhl = RBFRandomHiddenLayer(n_hidden=35, gamma=0.02, random_state=opts.random_seed)
+classifiers['ELMClassifier'] = ELMClassifier(hidden_layer=rhl, regressor=LinearRegression())
@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

Please run pep8 on this code (and the rest of the python files touched by this PR) and fix the reported violations.

@dclambert
dclambert added a note Mar 3, 2013

Note to self: no more late night commits

@ogrisel ogrisel and 3 others commented on an outdated diff Mar 3, 2013
benchmarks/bench_covertype.py
@@ -227,6 +231,10 @@ def benchmark(clf):
n_jobs=opts.n_jobs,
random_state=opts.random_seed)
+######################################################################
+## Train ELMClassifier model
+rhl = RBFRandomHiddenLayer(n_hidden=35, gamma=0.02, random_state=opts.random_seed)
@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

Why such a small number of hidden units? Has this been grid searched? Isn't it possible to gain much better than this by increasing the number of hidden units, e.g. to 1000?

Right now ELM is slower than linear SGD without improving on the test score.

@dclambert
dclambert added a note Mar 3, 2013

Wasn't sure what to do here, so I just poked around until I got results that were comparable to the others. 1000 hidden units runs out of memory. I can get better results with more hidden units, or different basis functions.

@amueller
scikit-learn member
amueller added a note Mar 3, 2013

Can you see if there are better results with more units that still fit into memory? 1000 doesn't seem like much to me, but the dataset is quite big. Maybe there should be a batch mechanism then?

@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

+1

@larsmans
scikit-learn member
larsmans added a note Mar 4, 2013

An SGD-based implementation should have no problem; just initialize, then transform and partial_fit minibatches in a loop.
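
A hypothetical sketch of that minibatch scheme (hidden_layer stands in for an already fitted random hidden layer transformer; function name and batch size are illustrative):

from sklearn.linear_model import SGDClassifier

def fit_elm_sgd(hidden_layer, X, y, classes, batch_size=10000, n_epochs=5):
    sgd = SGDClassifier()
    n_samples = X.shape[0]
    for epoch in range(n_epochs):
        for start in range(0, n_samples, batch_size):
            stop = start + batch_size
            H = hidden_layer.transform(X[start:stop])  # dense hidden activations
            sgd.partial_fit(H, y[start:stop], classes=classes)
    return sgd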

@dclambert
dclambert added a note Mar 4, 2013

I'll try a grid search later today. There is a batch variant and an incremental variant of the algorithm but I haven't tried either of them.

@dclambert
dclambert added a note Mar 7, 2013

Had many memory issues trying to do a grid search. The best error rate I was able to get was 0.2002 with 240 gaussian units. I was not able to beat SGD with simple transfer functions before running out of memory. The benchmark runs out of memory if I try to use the 240-unit classifier alongside all of the other ones.

The results with 230 units (the most I could use with every other classifier) are:

Classifier     train-time  test-time  error-rate
------------------------------------------------
ExtraTrees      259.5521s    0.6472s      0.0202
RandomForest    307.8546s    0.5662s      0.0233
CART             25.6408s    0.0613s      0.0426
ELMClassifier    24.3607s    1.3345s      0.2013
SGD               0.4685s    0.0506s      0.2300
liblinear         5.3599s    0.0569s      0.2305
GaussianNB        2.3428s    0.1721s      0.4841
@amueller
scikit-learn member
amueller added a note Mar 7, 2013
@larsmans
scikit-learn member
larsmans added a note Mar 7, 2013

Part of the problem might be that we load covertype as an ndarray, while it's quite sparse. But we can't load it as CSR because that breaks the random forests.

@dclambert
dclambert added a note Mar 13, 2013

I cobbled up an SGD version. These are the best results:

 Classifier     #units  train-time test-time error-rate
----------------------------------------------------------
sine             10000    192.732    20.986     0.1188
sine              7500    146.777    16.006     0.1252
tribas           10000    171.252    18.498     0.1408
sine              5000     98.330    10.657     0.1412
gaussian(0.05)   10000    514.265    56.337     0.1473
tribas            7500    130.425    14.143     0.1492
gaussian(0.10)   10000    524.255    56.543     0.1537
gaussian(0.05)    7500    381.234    41.704     0.1607
tribas            5000     87.895     9.432     0.1646
gaussian(0.10)    7500    384.946    41.812     0.1659

(The value in parens after gaussian is the width parameter)

@ogrisel ogrisel commented on an outdated diff Mar 3, 2013
examples/plot_elm_comparison.py
+ names = ["ELM(10,tanh)", "ELM(10,tanh,LR)", "ELM(10,sinsq)",
+ "ELM(10,tribas)", "ELM(hardlim)", "ELM(20,rbf(0.1))"]
+
+ nh = 10
+
+ # pass user defined transfer func
+ sinsq = (lambda x: np.power(np.sin(x), 2.0))
+ srhl_sinsq = SimpleRandomHiddenLayer(n_hidden=nh, xfer_func=sinsq, random_state=0)
+
+ # use internal transfer funcs
+ srhl_tanh = SimpleRandomHiddenLayer(n_hidden=nh, xfer_func='tanh', random_state=0)
+ srhl_tribas = SimpleRandomHiddenLayer(n_hidden=nh, xfer_func='tribas', random_state=0)
+ srhl_hardlim = SimpleRandomHiddenLayer(n_hidden=nh, xfer_func='hardlim', random_state=0)
+
+ # use gaussian RBF
+ srhl_rbf = RBFRandomHiddenLayer(n_hidden=nh*2, gamma=0.1, random_state=0)
@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

pep8 violation here too.

@ogrisel ogrisel commented on an outdated diff Mar 3, 2013
sklearn/neural_networks/elm.py
+_ELM_blurb = """An Extreme Learning Machine (ELM) is a single layer feedforward
+network with a random hidden layer components and least-squares fitting
+of the hidden->output weights by default. [1][2]
+"""
+
+_ELM_refs = """References
+----------
+.. [1] http://www.extreme-learning-machines.org
+.. [2] G.-B. Huang, Q.-Y. Zhu and C.-K. Siew, "Extreme Learning Machine:
+ Theory and Applications", Neurocomputing, vol. 70, pp. 489-501,
+ 2006.
+"""
+
+############################
+# module docstring assembly
+############################
@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

Please don't put such heavily formatted comments in the main implementation modules. Furthermore, docstring parts do not really need commenting anyway. Just remove those comments.

@ogrisel ogrisel and 1 other commented on an outdated diff Mar 3, 2013
examples/plot_elm_comparison.py
+def make_datasets():
+ return [make_moons(n_samples=200, noise=0.3, random_state=0),
+ make_circles(n_samples=200, noise=0.2, factor=0.5, random_state=1),
+ make_linearly_separable()]
+
+
+def make_classifiers():
+
+ names = ["ELM(10,tanh)", "ELM(10,tanh,LR)", "ELM(10,sinsq)",
+ "ELM(10,tribas)", "ELM(hardlim)", "ELM(20,rbf(0.1))"]
+
+ nh = 10
+
+ # pass user defined transfer func
+ sinsq = (lambda x: np.power(np.sin(x), 2.0))
+ srhl_sinsq = SimpleRandomHiddenLayer(n_hidden=nh, xfer_func=sinsq, random_state=0)
@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

xfer_func is not an explicit name. I would rename it to transfer_function, activation_function, or link_function.

@ogrisel
scikit-learn member
ogrisel added a note Mar 3, 2013

Or basis_function. Whichever is most common in the neural network / ELM literature.

@larsmans
scikit-learn member
larsmans added a note Mar 4, 2013

The thing with the NN literature is that everyone uses different terminology (based on where they're from and who they're addressing, or so it seems). I prefer activation_function.

@ogrisel
scikit-learn member
ogrisel commented Mar 3, 2013

I find this code much too nested & frameworkish. It makes it complex to read and to use (the user should not have to build the layer instance and then pass it as a constructor param to the estimator class). If there are only 2 default implementations of the hidden layer (simple and RBF), we should find a way to collapse the layer classes into the ELM (base?) class, maybe using mixin classes.

I don't have much time this week to further review this PR or propose a refactoring, but I think we should start from simple code (as done in @larsmans' gist: https://gist.github.com/larsmans/2493300) and then think about how to make it more extensible. Starting from an overly generic implementation makes the code too complex for users to understand and also too complex to maintain and refactor in the long run.
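
For comparison, a minimal "flat" ELM regressor in the spirit of that gist, i.e. one random tanh hidden layer plus a least-squares readout (a sketch of the idea under discussion, not the PR's implementation):

import numpy as np
from sklearn.utils import check_random_state

class TinyELMRegressor(object):
    def __init__(self, n_hidden=50, random_state=0):
        self.n_hidden = n_hidden
        self.random_state = random_state

    def _hidden(self, X):
        # random projection followed by a fixed tanh nonlinearity
        return np.tanh(np.dot(X, self.weights_) + self.biases_)

    def fit(self, X, y):
        rng = check_random_state(self.random_state)
        self.weights_ = rng.randn(X.shape[1], self.n_hidden)
        self.biases_ = rng.randn(self.n_hidden)
        # least-squares fit of the hidden->output weights
        self.coef_ = np.linalg.lstsq(self._hidden(X), y, rcond=None)[0]
        return self

    def predict(self, X):
        return np.dot(self._hidden(X), self.coef_)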

@dclambert

The SimpleELMRegressor and SimpleELMClassifier classes provide an interface that does not require the user to build the layer:

SimpleELMRegressor(n_hidden, xfer_func, xfer_args, random_state)

If this interface is more suitable, I could ditch the RBF hidden layer entirely, leaving just the dot-product-based hidden layer class (SimpleRandomHiddenLayer), and make the SimpleELM... interface the only one.

Alternatively...

The random hidden layer (RHL) classes differ in the components they generate and in the input activation type (the randomly generated vectors are used either for dot products or for distance calculations), plus the addition of the gamma parameter (used to scale the distances in the radial basis functions).

If both distance and dot product activation types were to be kept, the RHL classes could be combined, leaving an interface like:

RandomHiddenLayer(n_hidden, activation_func, activation_args, input_activation_type,
                  input_activation_scale, random_state)

where input_activation_type would be one of dotproduct or distance (for now), and input_activation_scale would just be a float parameter that scales the input activation (of whatever type) before the activation function is applied, and would take the place of the RBF gamma parameter.

Then the main classes could extend the interface from SimpleELMRegressor, leaving a constructor something like this:

ELMRegressor(n_hidden, activation_func, activation_args, input_activation_type,
             input_activation_scale, regressor, random_state)

where input_activation_type and input_activation_scale are as above.

It still seems appropriate to keep a RandomHiddenLayer class as a separate transformer, both for simplicity's sake and since it might be useful on its own. There are also other activation types and basis functions that could be provided.

@dclambert

and would take the place of the RBF gamma parameter

Sorry, this is wrong: it wouldn't work for the radial basis functions in general; they would still need an extra parameter.
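
To make the proposal above concrete, a rough sketch of a combined hidden layer with dot-product or distance input activations behind one class; per the correction just above, a separate width parameter (gamma) is kept for the RBF case. Names are illustrative only:

import numpy as np
from scipy.spatial.distance import cdist
from sklearn.utils import check_random_state

class CombinedRandomHiddenLayer(object):
    def __init__(self, n_hidden=20, activation_func=np.tanh,
                 input_activation_type='dotproduct', gamma=1.0, random_state=0):
        self.n_hidden = n_hidden
        self.activation_func = activation_func
        self.input_activation_type = input_activation_type
        self.gamma = gamma
        self.random_state = random_state

    def fit(self, X, y=None):
        rng = check_random_state(self.random_state)
        if self.input_activation_type == 'dotproduct':
            # random weights and biases for the dot-product activation
            self.weights_ = rng.randn(X.shape[1], self.n_hidden)
            self.biases_ = rng.randn(self.n_hidden)
        else:  # 'distance': draw random centers from the training data
            idx = rng.permutation(X.shape[0])[:self.n_hidden]
            self.centers_ = X[idx]
        return self

    def transform(self, X, y=None):
        if self.input_activation_type == 'dotproduct':
            acts = np.dot(X, self.weights_) + self.biases_
        else:
            acts = self.gamma * cdist(X, self.centers_)
        return self.activation_func(acts)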

dclambert added some commits Mar 3, 2013
@dclambert dclambert fix pep8 problems, flatten docstrings;
changed internal fit to use LinearRegression object;
change ELMClassifier to use 70 gaussian units in
bench_covertype to boost accuracy;
changed xfer_* to activation_*, changed docs to match
70e9e23
@dclambert dclambert remove unused parameters;
removed _coefs from docstring
fd5ba2b
@arjoly
scikit-learn member
arjoly commented Jul 15, 2014

Closing this PR in favor of #3306.

@arjoly arjoly closed this Jul 15, 2014