
Kernel Extreme Learning Machine #10602

Closed
wants to merge 11 commits

Conversation


cperales commented Feb 7, 2018

Reference Issues/PRs

What does this implement/fix? Explain your changes.

This pull request adds Kernel Extreme Learning Machine (KELM) to the neural network classifiers, in order to make this useful multiclass classifier available in scikit-learn, the main machine learning library in Python. The implementation follows the mathematical development in this paper.

KELM is added as the object sklearn.neural_network.kelm.KernelELM. Tests for KernelELM are in examples/neural_networks/plot_kelm_custom_kernel.py and examples/neural_networks/plot_kelm_rbf_parameters.py. With both tests, coverage of kelm.py is 96%.
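A minimal usage sketch (hypothetical; the constructor parameters shown here are illustrative guesses and may not match the final API in the PR diff):

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network.kelm import KernelELM  # module path proposed in this PR

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Constructor arguments are illustrative, not taken from the PR diff.
clf = KernelELM(kernel="rbf", C=1.0, gamma=0.1)
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))
```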

Any other comments?

I am a PhD student in Mathematics, and I have been working with Extreme Learning Machine in MATLAB. I really miss a good standard implementation of Kernel Extreme Learning Machine in scikit-learn, so this is my proposal. I have seen other pull requests about ELM following a single-hidden-layer feed-forward network (SLFN), but not one with the kernel trick.

Although the first concept of ELM was a neural network with random weights, since 2012 it has been reformulated with a notation similar to Support Vector Machines. This new design allows ELM to be used with the kernel trick. In fact, the tests are practically the same as for SVM. However, the mathematical construction of KELM handles multiclass problems directly, without "one-vs-one" or "one-vs-rest" strategies.

cperales closed this Feb 7, 2018

cperales commented Feb 7, 2018

Minor change in a test

cperales reopened this Feb 7, 2018

cperales commented Feb 7, 2018

@glemaitre (Member)

> Tests for KernelELM are in examples/neural_networks/plot_kelm_custom_kernel.py and examples/neural_networks/plot_kelm_rbf_parameters.py. With both tests, coverage of kelm.py is 96%.

We do not test with examples. You need to create a proper test file with unit tests.


amueller commented Feb 7, 2018

ELM is a scam. See my comments here: https://en.wikipedia.org/wiki/Talk:Extreme_learning_machine#Bogus_math


amueller commented Feb 7, 2018


amueller commented Feb 7, 2018

Ugh, maybe I should file a formal complaint with the journal...

@doctorcorral

@amueller Please do. Most of the comments I've seen on this topic turn out to be very inappropriate from an ethical perspective. For example, Yann LeCun's comments referring to these ideas as the stupidest thing one could do are, I feel, far from responsible.
Another big criticism I've seen of this approach is that it does not properly cite previous work or even equivalent earlier formalisms, but I'm amazed how in this paper: https://www.nature.com/articles/nature14539 the authors (Yann LeCun, Yoshua Bengio, and Geoffrey Hinton) heavily cite each other without crediting the pioneers of the field.
@cperales has practical experience on this topic; let's take what is useful from it.


amueller commented Feb 7, 2018

Did you see my Wikipedia comment? Have you read the original paper? It looks like the math came out of a paper generator. It makes no sense at all. I made my arguments in the wiki comment; feel free to comment on those.


amueller commented Feb 7, 2018

Can you say which section of the paper actually introduces the algorithm you're implementing here?


amueller commented Feb 7, 2018

Is this kernel ridge regression?


cperales commented Feb 8, 2018

Due to the controversy about this pull request, I must give a proper explanation of my code and the maths. But first, @glemaitre, I'm sorry for not creating a proper test file. I will commit the coverage tests with unittest as soon as I can.

Also, some checks were not successful because I wrote the code with only Python 3 in mind, but I have since noticed that new scikit-learn implementations must also work with Python 2, so I'll fix my code in the next few days.

Regarding what @amueller claims, I'm afraid ELM is not a scam, as long as it works. I have read your links and Wikipedia comments, but even the same paper compares different ELM versions (mainly Kernel ELM and Neural ELM) against Kernel Support Vector Machines (SVM) and Kernel Least Squares Support Vector Machines (LS-SVM) on several datasets in section V. If the scikit-learn community thinks more accuracy tests should be done, specifically in this pull request, I will gladly add the necessary tests with this or other metrics. Everyone can also clone this pull request and run the tests locally.

I'm not here to discuss Huang's ethics or his first paper; I can even accept that his first proposal could be tricky and/or mistaken, but it doesn't really matter here. What's more, the idea of a hidden layer with the neurons' weights chosen randomly sounds really suspicious even to me. That's why I don't present that version, but the Kernel ELM.

Both the mathematical notation and the results presented in the 2012 kernel reformulation of ELM leave little room for discussion. It is, in fact, very similar to SVM, with the addition that it solves multiclass problems directly.

This is the Quadratic Programming (QP) minimization problem for Kernel SVM (see section II):

$$\min_{w,\,b,\,\epsilon}\ \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \epsilon_i \quad \text{subject to} \quad t_i \left( w \cdot \phi(x_i) + b \right) \ge 1 - \epsilon_i,\quad \epsilon_i \ge 0,\quad i = 1, \dots, N$$

where w is the weight vector, C the regularization parameter, \epsilon_i the misclassification error (slack) for sample i, \phi(x) the mapping function (used for the kernel trick), and b the bias associated with w.

And this is the QP minimization problem for ELM (it works both for the kernel implementation and the neural one):

$$\min_{\beta,\,\xi}\ \frac{1}{2}\lVert \beta \rVert^2 + \frac{C}{2} \sum_{i=1}^{N} \lVert \xi_i \rVert^2 \quad \text{subject to} \quad h(x_i)\,\beta = t_i^{T} - \xi_i^{T},\quad i = 1, \dots, N$$

where \beta is the output weight matrix used for classification, \xi_i the training error vector for sample i, h(x) the mapping function, which in Kernel ELM plays the role of \phi(x) in SVM, and t_i the target in 1-of-J (one-hot) coding (a vector of zeros except for a 1 in the position associated with the target class).

The loss function is basically the same, but the difference in the constraints allows:

  • solving multiclass problems directly, without applying techniques such as one-vs-one;
  • solving the minimization problem analytically, by applying the KKT conditions (a sketch of the resulting closed-form solution is given at the end of this comment).

Due to these advantages, I believe Kernel ELM could make a significant difference in this important machine learning library. Because I am more of a mathematician than a programmer, complaints about the code or suggested improvements are welcome.
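For reference, here is a minimal NumPy/scikit-learn sketch (not the code in this PR) of the closed-form solution that the KKT conditions yield: solve (I/C + K) α = T and predict with K(x, X) α. The function name, the RBF kernel, and the one-hot encoding via LabelBinarizer are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import LabelBinarizer


def kelm_fit_predict(X_train, y_train, X_test, C=1.0, gamma=1.0):
    # Targets in 1-of-J (one-hot) coding; this sketch assumes a multiclass problem.
    T = LabelBinarizer().fit_transform(y_train)
    # Kernel matrix Omega = H H^T, here with an RBF kernel as an example choice.
    K = rbf_kernel(X_train, X_train, gamma=gamma)
    # The KKT conditions give the linear system (I / C + K) alpha = T.
    alpha = np.linalg.solve(np.eye(len(X_train)) / C + K, T)
    # Decision function f(x) = K(x, X) alpha; pick the class with the largest score.
    scores = rbf_kernel(X_test, X_train, gamma=gamma) @ alpha
    return scores.argmax(axis=1)
```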


amueller commented Feb 8, 2018

> Regarding what @amueller claims, I'm afraid ELM is not a scam, as long as it works. [...] Because I am more of a mathematician than a programmer, complaints about the code or suggested improvements are welcome.

They provide math that is just wrong, clearly to deceive the reader. That's a scam, unrelated to whether the method works. I am surprised that someone who identifies themselves as a mathematician is not upset by someone providing faulty arguments in an attempt to deceive.
The question is not "is this method working?" but "what is the standard name of this method, and what's the publication that introduced it?"

Sorry, I don't think the formulation of the problem you present is very clear. The code seems to just solve a single linear problem, not a QP.

My question as before: is this Kernel Ridge Regression, and if not, what is the difference?

cperales added 3 commits February 9, 2018 12:16
…compatibility with Python 2

test_precomputed fixed

test_tweak_params fixed

test_bad_input fixed

Unicode test fixed

test_linear_elm fixed

test_linear_elm_iris fixed

Documentation fixed

unused imports removed

Unused test was removed

Problem with __init__ solved
Float correction

Unicode correction

PEP corrections with flake8

Unicode treatment for Python 2 and Python 3
@sklearn-lgtm

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Non-callable called

Comment posted by lgtm.com

Compatibility for Python 2 and 3 in unicode, while passing flake8 test
@sklearn-lgtm

This pull request introduces 1 alert - view on lgtm.com

new alerts:

  • 1 for Non-callable called

Comment posted by lgtm.com


cperales commented Feb 12, 2018

The mathematical formulation in Huang's paper is not tricky or a scam. Firstly, because it's similar to SVM and KRR, and there is no mathematical doubt about these methods. Secondly, it's a supervised classification algorithm, and once the classifier is fitted, it classifies new instances (it works). So, essentially, it expresses a quadratic minimization problem with linear constraints, and this mathematical expression works when it is implemented in code.

Also, I'm not a mathematician, just a physicist who studies mathematics; I only said

> I am more of a mathematician than a programmer

because I'm not as good at programming as I would like. I even learned a lot about testing while doing this pull request (that's why it took me so many days to answer).

Besides that, @amueller, I see you are the release manager of this big repository, so you understand scikit-learn better and know where this algorithm would fit (maybe neural_network is not the folder where I should put ELM?). I read about KRR as you suggested, and it is also a QP problem that is solved as a single linear problem when it calls sklearn.linear_model.ridge._solve_cholesky_kernel, just as you said about ELM. Both algorithms have an analytical solution (a small check of this is sketched after the list below). There are two main differences between KRR and KELM:

  • KRR is used for regression and ELM for both classification and regression (the same paper shows performance comparisons on binary-class, multiclass and regression data sets). What's more, Huang cites the ridge regression paper in that same paper while solving the ELM minimization.
  • The ELM formulation works both for the neural version (random weights) and with the kernel trick, because the 2012 ELM is a generalization of the original ELM. I implemented only the kernel version because I thought it was less obscure to implement.
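As mentioned above, both methods reduce to a single linear solve. A small check (illustrative, not part of this PR) that KernelRidge's fitted dual coefficients are exactly the solution of (K + αI) a = y, the same system the Kernel ELM derivation arrives at with α = 1/C:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X = rng.randn(50, 3)
y = rng.randn(50)
alpha, gamma = 0.5, 0.1

# Fit scikit-learn's Kernel Ridge Regression with an RBF kernel.
krr = KernelRidge(kernel="rbf", alpha=alpha, gamma=gamma).fit(X, y)

# Solve the same linear system (K + alpha * I) a = y by hand.
K = rbf_kernel(X, X, gamma=gamma)
manual = np.linalg.solve(K + alpha * np.eye(len(X)), y)

print(np.allclose(krr.dual_coef_, manual))  # True
```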

Also, because of your answers and the references you cited, like that Reddit post full of hatred towards ELM, it seems to me you are ideologically biased against this proposal and ELM in general. I hope I am wrong in this assumption, but if not, I'm afraid I can do nothing to convince you or to improve this pull request, so you should close it and save us both time.

@cperales (Author)

Another proposal: if my Kernel Extreme Learning Machine pull request seems convincing in terms of tests, but you think it is too close to Kernel Ridge Regression, we can avoid the controversy.

I would modify my pull request to provide a Kernel Ridge Classification, adapting the current Kernel Ridge Regression to do the classification, just as KELM in this pull request does. Would you be interested? ELM would not appear as an algorithm, although I believe it should be referenced in the documentation.


jnothman commented Feb 12, 2018 via email

@cperales (Author)

@jnothman As far as I have reviewed these days, Kernel Ridge Regression and Extreme Learning Machine have the same minimization problem (explained in this comment). ELM is formulated for both classification and regression, while KRR is formulated only for regression.

Although I think not much time would be required to change the code from this pull request into a Kernel Ridge Classification, I would prefer @amueller to give his approval first, before writing code needlessly.

@cperales (Author)

@jnothman The pull request is done! It was surprisingly easy to see how sklearn.kernel_ridge.KernelRidge could be modified to allow classification. It can be seen here (#10633).

@amueller I think you were goddamn right. I have reviewed more about ELM and Kernel Ridge, and there is a paper from 2007 which explains how to go from Kernel Ridge Regression to Kernel Ridge Classification, and it is virtually the same as what Huang does in 2012. Like Kernel ELM, it works surprisingly well, so I moved the tests from this pull request to the other one, and now I truly believe it could be a good contribution to scikit-learn.

I am closing this pull request in favour of the other one about Kernel Ridge Classification (#10633).
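For context, the idea behind the new pull request can be sketched like this (illustrative only, not the actual code of #10633): fit KernelRidge on one-hot targets and take the argmax of the predicted scores.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Encode the class labels as one-hot targets, shape (n_samples, n_classes).
lb = LabelBinarizer()
T = lb.fit_transform(y_train)

# Fit plain Kernel Ridge Regression on the one-hot targets...
reg = KernelRidge(kernel="rbf", alpha=1.0, gamma=0.1).fit(X_train, T)
# ...and classify by taking the column with the highest predicted score.
y_pred = lb.classes_[reg.predict(X_test).argmax(axis=1)]

print("accuracy:", np.mean(y_pred == y_test))
```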

cperales closed this Feb 14, 2018
@amueller (Member)

I am indeed biased against ELM, because every time I look into something branded as ELM, I see issues. I'm glad we could come to an agreement. Thank you for your contribution and for reviewing the objections in detail.

Going from KRR to using it for classification is a pretty obvious and trivial step; that's not the main contribution of the 2007 paper, I think. The half page of math that they do is actually not even necessary, as everything decomposes over the independent outputs, from what I can see, but I guess it helps to clarify / repeat the derivation of KRR.
