
Kernel Extreme Learning Machine #10602

Closed
wants to merge 11 commits

Conversation


cperales commented Feb 7, 2018

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This pull request adds Kernel Extreme Learning Machine (KELM) to the neural network classifiers, in order to make this useful multiclass classifier available in scikit-learn, the main machine learning library in Python. The implementation follows the mathematical development in this paper.

KELM is added as the object sklearn.neural_network.kelm.KernelELM. Tests for KernelELM are in examples/neural_networks/plot_kelm_custom_kernel.py and examples/neural_networks/plot_kelm_rbf_parameters.py. With both tests, coverage of kelm.py is 96%.
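A minimal usage sketch (hypothetical; the constructor parameters shown here are illustrative guesses and may not match the final API in the PR diff):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network.kelm import KernelELM  # module path proposed in this PR

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Constructor arguments are illustrative, not taken from the PR diff.
clf = KernelELM(kernel="rbf", C=1.0, gamma=0.1)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```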

Any other comments?

I am a PhD student in Mathematics, and I have been working with Extreme Learning Machine in MATLAB. I really miss a good standard implementation of Kernel Extreme Learning Machine in scikit-learn, so this is my proposal. I have seen other pull requests about ELM following a single-hidden-layer feed-forward network (SLFN), but not one with the kernel trick.

Although the first concept of ELM was a neural network with random weights, since 2012 it has been reformulated with a notation similar to Support Vector Machines. This new design allows ELM to be used with the kernel trick. In fact, the tests are practically the same as for SVM. However, the mathematical construction of KELM handles multiclass problems directly, without "one-vs-one" or "one-vs-rest" strategies.

cperales closed this Feb 7, 2018

cperales commented Feb 7, 2018

Minor change in a test

cperales reopened this Feb 7, 2018

cperales commented Feb 7, 2018

@glemaitre (Member)

> Tests for KernelELM are in examples/neural_networks/plot_kelm_custom_kernel.py and examples/neural_networks/plot_kelm_rbf_parameters.py. With both tests, coverage of kelm.py is 96%.

We do not test with examples. You need to create a proper test file with unit tests.


amueller commented Feb 7, 2018

ELM is a scam. See my comments here: https://en.wikipedia.org/wiki/Talk:Extreme_learning_machine#Bogus_math


amueller commented Feb 7, 2018


amueller commented Feb 7, 2018

Ugh, maybe I should file a formal complaint with the journal...

@doctorcorral

@amueller Please do. Most of the comments I've seen on this topic turn out to be very inappropriate from an ethical perspective. For example, Yann LeCun's comments referring to these ideas as the stupidest thing one could do are, I feel, far from responsible.
Another big criticism I've seen of this approach is that it does not properly cite previous work or even equivalent earlier formalisms, but I'm amazed how in this paper: https://www.nature.com/articles/nature14539 the authors (Yann LeCun, Yoshua Bengio, and Geoffrey Hinton) heavily cite each other without crediting the pioneers of the field.
@cperales has practical experience on this topic; let's take what is useful from it.


amueller commented Feb 7, 2018

Did you see my Wikipedia comment? Have you read the original paper? It looks like the math came out of a paper generator. It makes no sense at all. I made my arguments in the wiki comment; feel free to comment on those.


amueller commented Feb 7, 2018

Can you say which section of the paper actually introduces the algorithm you're implementing here?


amueller commented Feb 7, 2018

Is this kernel ridge regression?


cperales commented Feb 8, 2018

Due to the controversy about this pull request, I must give a proper explanation of my code and the maths. But first, @glemaitre, I'm sorry for not creating a proper test file. I will commit the coverage tests with unittest as soon as I can.

Also, some checks were not successful because I wrote the code with only Python 3 in mind, but I have since noticed that new scikit-learn implementations must also work with Python 2, so I'll fix my code in the next few days.

Regarding what @amueller claims, I'm afraid ELM is not a scam, as long as it works. I have read your links and Wikipedia comments, but even the same paper compares different ELM versions (mainly Kernel ELM and Neural ELM) against Kernel Support Vector Machines (SVM) and Kernel Least Squares Support Vector Machines (LS-SVM) on several datasets in section V. If the scikit-learn community thinks more accuracy tests should be done, specifically in this pull request, I will gladly add the necessary tests with this or other metrics. Everyone can also clone this pull request and run the tests locally.

I'm not here to discuss Huang's ethics or his first paper; I can even accept that his first proposal could be tricky and/or mistaken, but it doesn't really matter here. What's more, the idea of a hidden layer with the neurons' weights chosen randomly sounds really suspicious even to me. That's why I don't present that version, but the Kernel ELM.

Both the mathematical notation and the results presented in the 2012 kernel reformulation of ELM leave little room for discussion. It is, in fact, very similar to SVM, with the addition that it solves multiclass problems directly.

This is the Quadratic Programming (QP) minimization problem for Kernel SVM (see section II):

$$\min_{w,\,b,\,\epsilon}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \epsilon_i \quad \text{subject to} \quad t_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \epsilon_i,\quad \epsilon_i \ge 0,\quad i = 1, \dots, N$$

where w is the weight vector, C the regularization parameter, \epsilon_i the misclassification error (slack) for sample i, \phi(x) the mapping function (used for the kernel trick), and b the bias associated with w.

And this is the QP minimization problem for ELM (it works both for the kernel implementation and the neural one):

$$\min_{\beta,\,\xi}\ \frac{1}{2}\lVert \beta \rVert^2 + \frac{C}{2} \sum_{i=1}^{N} \lVert \xi_i \rVert^2 \quad \text{subject to} \quad h(x_i)\,\beta = t_i^{T} - \xi_i^{T},\quad i = 1, \dots, N$$

where \beta is the output weight matrix used for classification, \xi_i the training error vector for sample i, h(x) the mapping function, which in Kernel ELM plays the role of \phi(x) in SVM, and t_i the target in 1-of-J (one-hot) coding (a vector of zeros except for a 1 in the position associated with the target class).

The loss function is basically the same, but the difference in the constraints allows:

  • solving multiclass problems directly, without applying techniques such as one-vs-one;
  • solving the minimization problem analytically, by applying the KKT conditions (a sketch of the resulting closed-form solution is given at the end of this comment).

Due to these advantages, I believe Kernel ELM could make a significant difference in this important machine learning library. Because I am more of a mathematician than a programmer, complaints about the code or suggested improvements are welcome.
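For reference, here is a minimal NumPy/scikit-learn sketch (not the code in this PR) of the closed-form solution that the KKT conditions yield: solve (I/C + K) α = T and predict with K(x, X) α. The function name, the RBF kernel, and the one-hot encoding via LabelBinarizer are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import LabelBinarizer


def kelm_fit_predict(X_train, y_train, X_test, C=1.0, gamma=1.0):
    # Targets in 1-of-J (one-hot) coding; this sketch assumes a multiclass problem.
    T = LabelBinarizer().fit_transform(y_train)
    # Kernel matrix Omega = H H^T, here with an RBF kernel as an example choice.
    K = rbf_kernel(X_train, X_train, gamma=gamma)
    # The KKT conditions give the linear system (I / C + K) alpha = T.
    alpha = np.linalg.solve(np.eye(len(X_train)) / C + K, T)
    # Decision function f(x) = K(x, X) alpha; pick the class with the largest score.
    scores = rbf_kernel(X_test, X_train, gamma=gamma) @ alpha
    return scores.argmax(axis=1)
```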


amueller commented Feb 8, 2018

> Regarding what @amueller claims, I'm afraid ELM is not a scam, as long as it works. [...] Because I am more of a mathematician than a programmer, complaints about the code or suggested improvements are welcome.

They provide math that is just wrong, clearly to deceive the reader. That's a scam, unrelated to whether the method works. I am surprised that someone who identifies themselves as a mathematician is not upset by someone providing faulty arguments in an attempt to deceive.
The question is not "is this method working?" but "what is the standard name of this method, and what's the publication that introduced it?"

Sorry, I don't think the formulation of the problem you present is very clear. The code seems to just solve a single linear problem, not a QP.

My question as before: is this Kernel Ridge Regression, and if not, what is the difference?

cperales added 3 commits February 9, 2018 12:16
…compatibility with Python 2

test_precomputed fixed

test_tweak_params fixed

test_bad_input fixed

Unicode test fixed

test_linear_elm fixed

test_linear_elm_iris fixed

Documentation fixed

unused imports removed

Unused test was removed

Problem with __init__ solved
Float correction

Unicode correction

PEP corrections with flake8

Unicode treatment for Python 2 and Python 3
@sklearn-lgtm

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Non-callable called

Comment posted by lgtm.com

Compatibility for Python 2 and 3 in unicode, while passing flake8 test
@sklearn-lgtm

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Non-callable called

Comment posted by lgtm.com


cperales commented Feb 12, 2018

The mathematical formulation in Huang's paper is not tricky or a scam. Firstly, because it's similar to SVM and KRR, and there is no mathematical doubt about these methods. Secondly, it's a supervised classification algorithm, and once the classifier is fitted, it classifies new instances (it works). So, essentially, it expresses a quadratic minimization problem with linear constraints, and this mathematical expression works when it is implemented in code.

Also, I'm not a mathematician, just a physicist who studies mathematics; I only said

> I am more of a mathematician than a programmer

because I'm not as good at programming as I would like. I even learned a lot about testing while doing this pull request (that's why it took me so many days to answer).

Besides that, @amueller, I see you are the release manager of this big repository, so you understand scikit-learn better and know where this algorithm would fit (maybe neural_network is not the folder where I should put ELM?). I read about KRR as you suggested, and it is also a QP problem that is solved as a single linear problem when it calls sklearn.linear_model.ridge._solve_cholesky_kernel, just as you said about ELM. Both algorithms have an analytical solution (a small check of this is sketched after the list below). There are two main differences between KRR and KELM:

  • KRR is used for regression and ELM for both classification and regression (the same paper shows performance comparisons on binary-class, multiclass and regression data sets). What's more, Huang cites the ridge regression paper in that same paper while solving the ELM minimization.
  • The ELM formulation works both for the neural version (random weights) and with the kernel trick, because the 2012 ELM is a generalization of the original ELM. I implemented only the kernel version because I thought it was less obscure to implement.
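As mentioned above, both methods reduce to a single linear solve. A small check (illustrative, not part of this PR) that KernelRidge's fitted dual coefficients are exactly the solution of (K + αI) a = y, the same system the Kernel ELM derivation arrives at with α = 1/C:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = rng.randn(50)
alpha, gamma = 0.5, 0.1

# Fit scikit-learn's Kernel Ridge Regression with an RBF kernel.
krr = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma).fit(X, y)

# Solve the same linear system (K + alpha * I) a = y by hand.
K = rbf_kernel(X, X, gamma=gamma)
manual = np.linalg.solve(K + alpha * np.eye(len(X)), y)

print(np.allclose(krr.dual_coef_, manual))  # True
```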

Also, because of your answers and the references you cited, like that Reddit post full of hatred towards ELM, it seems to me you are ideologically biased against this proposal and ELM in general. I hope I am wrong in this assumption, but if not, I'm afraid I can do nothing to convince you or to improve this pull request, so you should close it and save us both time.

@cperales (Author)

Another proposal: if my Kernel Extreme Learning Machine pull request seems convincing in terms of tests, but you think it is too close to Kernel Ridge Regression, we can avoid the controversy.

I would modify my pull request to provide a Kernel Ridge Classification, adapting the current Kernel Ridge Regression to do the classification, just as KELM in this pull request does. Would you be interested? ELM would not appear as an algorithm, although I believe it should be referenced in the documentation.


jnothman commented Feb 12, 2018 via email

@cperales (Author)

@jnothman As far as I have reviewed these days, Kernel Ridge Regression and Extreme Learning Machine have the same minimization problem (explained in this comment). ELM is formulated for both classification and regression, while KRR is formulated only for regression.

Although I think not much time would be required to change the code from this pull request into a Kernel Ridge Classification, I would prefer @amueller to give his approval first, before writing code needlessly.

@cperales (Author)

@jnothman The pull request is done! It was surprisingly easy to see how sklearn.kernel_ridge.KernelRidge could be modified to allow classification. It can be seen here (#10633).

@amueller I think you were goddamn right. I have reviewed more about ELM and Kernel Ridge, and there is a paper from 2007 which explains how to go from Kernel Ridge Regression to Kernel Ridge Classification, and it is virtually the same as what Huang does in 2012. Like Kernel ELM, it works surprisingly well, so I moved the tests from this pull request to the other one, and now I truly believe it could be a good contribution to scikit-learn.

I am closing this pull request in favour of the other one about Kernel Ridge Classification (#10633).
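For context, the idea behind the new pull request can be sketched like this (illustrative only, not the actual code of #10633): fit KernelRidge on one-hot targets and take the argmax of the predicted scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Encode the class labels as one-hot targets, shape (n_samples, n_classes).
lb = LabelBinarizer()
T = lb.fit_transform(y_train)

# Fit plain Kernel Ridge Regression on the one-hot targets...
reg = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_train, T)
# ...and classify by taking the column with the highest predicted score.
y_pred = lb.classes_[reg.predict(X_test).argmax(axis=1)]

print("accuracy:", np.mean(y_pred == y_test))
```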

cperales closed this Feb 14, 2018
@amueller (Member)

I am indeed biased against ELM, because every time I look into something branded as ELM, I see issues. I'm glad we could come to an agreement. Thank you for your contribution and for reviewing the objections in detail.

Going from KRR to using it for classification is a pretty obvious and trivial step; that's not the main contribution of the 2007 paper, I think. The half page of math that they do is actually not even necessary, as everything decomposes over the independent outputs, from what I can see, but I guess it helps to clarify / repeat the derivation of KRR.
