
[MRG] Added kernel weighting functions for neighbors classes #3117

Closed
wants to merge 27 commits

Conversation

nmayorov (Contributor)

This patch enables the use of kernel functions for neighbors weighting.

It adds the following keywords for the weights argument: 'tophat', 'gaussian', 'epanechnikov', 'exponential', 'linear' and 'cosine', i.e. all kernels available in the KernelDensity class.

For KNeighborsClassifier and KNeighborsRegressor the kernel bandwidth is equal to the distance to the (k+1)-th nearest neighbor (i.e. it depends on the query point).

For RadiusNeighborsClassifier and RadiusNeighborsRegressor the kernel bandwidth is equal to the radius parameter of the classifier (i.e. it is constant).
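A rough sketch of what such kernel weighting functions might look like (the formulas below are assumptions modeled on the kernel names in KernelDensity; the PR's actual implementation may differ):

```python
import numpy as np

def kernel_weights(dist, bandwidth, kernel="linear"):
    """Turn neighbor distances into weights via a truncated kernel.

    All kernels are cut off at ``bandwidth``: neighbors farther away
    get zero (or near-zero) weight.
    """
    u = np.clip(np.asarray(dist, dtype=float) / bandwidth, 0.0, 1.0)
    if kernel == "tophat":
        return np.ones_like(u)
    if kernel == "linear":
        return 1.0 - u
    if kernel == "epanechnikov":
        return 1.0 - u ** 2
    if kernel == "cosine":
        return np.cos(0.5 * np.pi * u)
    if kernel == "exponential":
        return np.exp(-3.0 * u)               # truncated exponential
    if kernel == "gaussian":
        return np.exp(-0.5 * (3.0 * u) ** 2)  # truncated at ~3 sigma
    raise ValueError("unknown kernel: %r" % kernel)

# Closer neighbors get larger weights; at the bandwidth the weight
# drops to (near) zero.
print(kernel_weights([0.0, 0.5, 1.0], bandwidth=1.0, kernel="linear"))
```

For KNeighbors* the bandwidth would be the per-query distance to the (k+1)-th neighbor; for RadiusNeighbors* it would simply be radius.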

Please take a look.

@coveralls

Coverage Status

Coverage remained the same when pulling 9d3f813 on nmayorov:neighbors_kernels into 6945d5b on scikit-learn:master.

@nmayorov nmayorov changed the title Added kernel weighting functions for neighbors classes [WIP] Added kernel weighting functions for neighbors classes Oct 7, 2014
@nmayorov (Contributor Author)

nmayorov commented Oct 7, 2014

Hello! This has been here for months with no attention, unfortunately.

Let me try to explain the intention of this PR in more detail.

Currently there are two options for weighting neighbor predictions: 'uniform' (majority vote) and 'distance', which uses 1/dist weights. The first is classic; the second is quite controversial (the infinities that can occur are not fun to deal with; I'm not sure it's a good option, to be honest).

There is also a probabilistic interpretation of neighbor methods, which manifests itself in sklearn.neighbors.KernelDensity. We can use it for prediction in kNN as well: estimate the PDF of each class at a query point and then pick the one with the highest probability (a Bayesian approach). This can be done very easily by using kernels (as in kernel density estimation) as weighting functions.

One subtle point is that some kernel functions (like the gaussian) are non-zero on an infinite interval, so in kNN prediction we have to use their "truncated" versions. But I don't think it matters much in practice. As far as the selected kernel bandwidth is concerned, please refer to my opening message.
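This kernel-weighted vote can already be emulated with the existing estimator API, since weights accepts a callable operating on the array of neighbor distances. A minimal sketch (the epanechnikov helper and the per-query bandwidth choice are our assumptions, approximating the PR's k+1 rule):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def epanechnikov(dist):
    # dist has shape (n_queries, n_neighbors); use the farthest returned
    # neighbor as a per-query bandwidth, a stand-in for the distance to
    # the k+1 nearest neighbor described above.
    h = dist.max(axis=1, keepdims=True) + 1e-12
    u = np.clip(dist / h, 0.0, 1.0)
    return 1.0 - u ** 2

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = KNeighborsClassifier(n_neighbors=6, weights=epanechnikov)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```

The tiny epsilon keeps the farthest neighbor's weight strictly positive, so no query row ends up with all-zero weights.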

Other neighbor weighting strategies exist which aren't directly associated with kernel density estimation. Potentially we could incorporate them into sklearn.neighbors as well. Overall I think there should be more options besides 'uniform' and 'distance'.


Please tell me whether you think this is useful. I'm willing to finish this PR properly (add narrative docs and so on).

Ping @agramfort @jakevdp @larsman. Anyone?

@agramfort (Member)

Can you provide some benchmark results that demonstrate the usefulness of this on a public dataset, in terms of accuracy and computation time?

Thanks

@nmayorov (Contributor Author)

Hi, Alexandre.

I created an IPython notebook where I test different weights on the famous data set. Take a look: http://nbviewer.ipython.org/gist/nmayorov/9b11161f9b66df12d2b9.

@agramfort (Member)

OK, good. Can you comment on the extra computation time, if any is significant?
How long is test time?
You'll need to add a paragraph to the narrative docs explaining the kernels
and why one might want to use them.

@nmayorov (Contributor Author)

It does not require any significant extra time; it's simply a matter of evaluating a different weighting function (just like 1/dist). I added benchmarks to the IPython notebook.

Also, I remembered one thing: this technique for regression is known as the Nadaraya-Watson estimator. In fact, there is a whole chapter on similar methods in "The Elements of Statistical Learning" (for example, check out Figure 6.1 there; it's quite illustrative).

With a proper kernel (non-zero only within the bandwidth) we can do this regression locally, using only a small number of neighbors. Perhaps we should keep only kernels which are non-zero locally within the bandwidth range, for theoretical integrity. What do you think?
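The Nadaraya-Watson estimate is the kernel-weighted average ŷ(x) = Σᵢ K(|x − xᵢ| / h) yᵢ / Σᵢ K(|x − xᵢ| / h). A minimal 1-D sketch on toy data (not the PR's code):

```python
import numpy as np

def nadaraya_watson(x_query, X, y, h):
    # Epanechnikov kernel: non-zero only within the bandwidth h,
    # so only nearby points contribute to the estimate.
    u = np.abs(x_query - X) / h
    w = np.maximum(1.0 - u ** 2, 0.0)
    return np.sum(w * y) / np.sum(w)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(X) + rng.normal(scale=0.1, size=X.size)
print(nadaraya_watson(5.0, X, y, h=0.5))  # close to sin(5) ≈ -0.96
```

Because the kernel vanishes outside the bandwidth, only points within h of the query enter the weighted mean, which is exactly the local behavior discussed above.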

About the narrative docs: I think I'll just mention that this can be interpreted as KDE for classification and Nadaraya-Watson estimation for regression, but won't go deep into that (I don't think I can, either). After all, these are just a few new reasonable weighting functions which give more credit to closer neighbors.

Please give me some feedback.

@arjoly (Member)

arjoly commented Oct 14, 2014

Ping @jakevdp — you might want to have a look at this PR.

```diff
 """Get the weights from an array of distances and a parameter ``weights``

 Parameters
 ===========
 dist: ndarray
     The input distances
-weights: {'uniform', 'distance' or a callable}
+weights: None, string from VALID_WEIGHTS or callable
```
Member (inline review comment):

I would write

```python
weights : None, str or callable
    The kind of weighting used. The valid string parameters are
    'uniform', 'distance', 'tophat', etc.
```

The mathematical formulas of the different kernels should ideally be in the narrative doc, with a plot of the kernel shapes.

Contributor Author (inline review comment):

It is a private function of the module, so I thought I could be more technically explicit and mention VALID_WEIGHTS. (Makes sense?)

Member (inline review comment):

It was explicit and clear before; please keep it clear and explicit.

Contributor Author (inline review comment):

Got you.

@agramfort (Member)

You need to add a paragraph to the narrative doc and update an example to showcase this feature.

```python
        return dist
    elif callable(weights):
        return weights(dist)
    else:
        raise ValueError("weights not recognized: should be 'uniform', "
                         "'distance', or a callable function")
```
Member (inline review comment):

There was a nice error message and now it's gone... please step into the shoes of a user who messes up the name of the kernel.

Contributor Author (inline review comment):

It is checked in _check_weights; previously there was duplication. The error message will appear all right.

Member (inline review comment):

Good job redesigning this!

@arjoly arjoly mentioned this pull request Oct 17, 2014
@nmayorov (Contributor Author)

I've done some work; please review.

```diff
@@ -34,16 +34,16 @@
 # Fit regression model
 n_neighbors = 5

-for i, weights in enumerate(['uniform', 'distance']):
+plt.figure(figsize=(8, 9))
+for i, weights in enumerate(['uniform', 'distance', 'epanechnikov']):
```
Member (inline review comment):

Running this example, it seems that epanechnikov is the one kernel in this list that does not force the line to go through the training points.

In terms of user understanding, this point should be explained in the doc.

[figure_1: regression example plot comparing the three weighting schemes]

Contributor Author (inline review comment):

I wouldn't say that this is a characteristic property of smoothing kernels. The estimate with smoothing kernels is simply less bumpy and smoother than with uniform weights, and that's all. (Why did you imply that 'uniform' forces the line to go through training points?)

Only 'distance' shows this weird property that the line has to pass through every training point. (Which, again, is an argument not to use it at all.)

@nmayorov (Contributor Author)

About examples comparing the performance of weighting schemes: I decided not to add them for the following reasons:

  1. On toy / synthetic data sets it is misleading and deceptive. Results depend mostly on the train / test split and not on the actual weighting scheme.
  2. Doing it on a real data set doesn't seem to fit into the docs. (There are no such comparisons for other methods.) Also, I'm struggling to find suitable data sets among those included in scikit-learn.

That's also the reason I removed the MSE estimates (previously added by me) from plot_regression.py. (They are rather meaningless.)


OK, could you guys do the final review, @agramfort @GaelVaroquaux, please?

@agramfort (Member)

I am a bit lost. At some point you posted results demonstrating some benefit of these new kernels. Are you saying that none of the datasets we commonly use back up this claim?

There are some scripts which go beyond simple examples in examples/applications/.

@nmayorov (Contributor Author)

You are right, it sounds confusing. I'm a bit lost myself.

I demonstrated a 2% accuracy increase when using kernel weights on a data set containing 4435 train samples and 2000 test samples. This result is statistically significant, I believe (and it is a real-life example). But when I experimented with the iris and digits data sets I found no clear benefit from using weights other than uniform. Iris is too small, and the accuracy mostly depends on the train / test split. On digits the best results are obtained with 1 nearest neighbor, so the weights are irrelevant.

Experiments with small synthetic data sets also show that the accuracy changes significantly with different train / test splits, and I don't want to delude users by choosing a "proper" random seed. I may keep looking in this direction, though.

We need three properties from a data set: it's a classification problem, it's big enough, and it's from real life. I don't think synthetic data sets are that interesting.

Maybe I could add a fetch function for https://archive.ics.uci.edu/ml/datasets/Statlog+(Landsat+Satellite)?

@agramfort (Member)

+1 for fetching a more convincing dataset.

@nmayorov (Contributor Author)

nmayorov commented Nov 2, 2014

Hi!

I experimented more with the different weights. I have to admit that I overstated their influence. The accuracy boost was only 1% (not 2%), and again it depends on the train / test split. In general it gives an improvement within a 1% range. So it is somewhat useful, but not very.

I don't think it's worth adding 5 new weights, because they are rather similar and give only marginal improvements.

But I think a scheme called 'distance+' might be added. It could be a linear kernel, or the scheme described in http://www.joics.com/publishedpapers/2012_9_6_1429_1436.pdf. I suspect their train / test splits weren't completely random, but no doubt the proposed scheme gives some improvement.

Please give me your opinion: should I continue this PR, create a new one with a single 'distance+' scheme, or neither?

@agramfort (Member)

> I experimented more with different weights. I have to admit that I overstated their influence. The accuracy boost was only 1% (not 2), and again it depends on the train / test split. In general it gives an improvement within a 1% range (on landsat). So it is somewhat useful, but not very.

ok.

> I think it's not worth it to add 5 new weights, because they are rather similar and give only marginal improvements.

indeed.

> But I think some scheme called 'distance+' might be added. It can be a linear kernel, or a scheme described in http://www.joics.com/publishedpapers/2012_9_6_1429_1436.pdf I suspect that their train / test splits weren't completely random. But no doubt the proposed scheme gives some improvement.

you'll need to quantify the improvement. Also remember that what you
add should be textbook material or from a highly cited paper.

> Please give me your opinion: should I continue this PR or create a new one with a single 'distance+' scheme? Or maybe neither of that.

same PR is good.

@nmayorov (Contributor Author)

nmayorov commented Nov 3, 2014

So, what I've done:

  1. Removed all kernels but 'linear'. It's the simplest and a theoretically sound option for weighting.
  2. Added fetch_landsat.
  3. Added an example comparing accuracy on landsat for different n_neighbors.
  4. Shortened the additions in the rst doc.

@agramfort (Member)

I have no time to look. Somebody please review.

@nmayorov (Contributor Author)

nmayorov commented Nov 4, 2014

@agramfort, maybe you could do it later? @GaelVaroquaux, it would be great if you joined.

@nmayorov (Contributor Author)

Hi!

If you think this PR is not worth including in the project, you can close it. Otherwise, I'm ready to continue working on it. Either way is fine with me.

@amueller (Member)

Sorry for the lack of feedback. This could be interesting. Do you have any relevant paper references?

@nmayorov (Contributor Author)

The main reference would be "The Elements of Statistical Learning", chapter 6.

@amueller (Member)

The problem with using this as a reference is that it is hard to tell if people find it valuable in practice ;)

@nmayorov (Contributor Author)

The situation here is the same as with kernels for KDE. Look at the different kernel shapes: they all work very similarly, but it's impossible to choose one "best" kernel, so let's have some variety.

Initially I wanted to add all the kernels present in KDE for consistency, because NN classification is kernel density estimation. But I noticed that they give only a marginal improvement over standard NN, so I decided to keep only the triangular kernel as the most "straightforward" one. It can still give about +1% accuracy on some datasets; I added an example on the landsat dataset.
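For reference, the surviving triangular ('linear') weighting can be sketched as follows (taking the farthest returned neighbor as the per-query bandwidth is our assumption, mirroring the earlier discussion):

```python
import numpy as np

def linear_kernel_weights(dist):
    # Weight falls linearly from 1 at distance 0 to 0 at the bandwidth,
    # here taken per query row as the largest neighbor distance returned.
    h = dist.max(axis=1, keepdims=True)
    return 1.0 - dist / np.where(h > 0.0, h, 1.0)

dist = np.array([[0.0, 1.0, 2.0, 4.0]])
print(linear_kernel_weights(dist))  # weights 1.0, 0.75, 0.5, 0.0
```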

@amueller (Member)

Thanks for your comments.
Sorry, we are a bit overwhelmed with PRs at the moment, and will focus on bugfixes for the upcoming release.
I'll try to look at this in more detail soon.

@nmayorov (Contributor Author)

Thanks for taking interest!

@cmarmo cmarmo added Needs Decision Requires decision and removed Waiting for Reviewer labels Sep 29, 2020
Base automatically changed from master to main January 22, 2021 10:48
@haiatn (Contributor)

haiatn commented Jul 29, 2023

Is this PR just waiting for review, or are we still deciding whether it is needed?

@adrinjalali (Member)

Given the lack of requests for this in recent years, I think we can close this. Happy to include it if it somehow comes up fresh. Thanks for the work you put into this, @nmayorov.

10 participants