
MRG Fit pairwise #803

Merged
merged 32 commits into scikit-learn:master on Aug 27, 2012

Conversation

amueller
Member

This is supposed to be (a draft of?) the fit_pairwise proposal.
It creates fit_pairwise and predict_pairwise functions for SVM, KernelPCA, SpectralClustering and AffinityPropagation.
It also makes all of these usable with GridSearchCV via a new fit_pairwise for the grid search.

Now uses a property to check whether the estimator was fit using a "pairwise" X.

@mblondel
Member

In retrospect, I am a bit worried by the fact that this will add 4 new methods (fit_pairwise, predict_pairwise, transform_pairwise and fit_transform_pairwise). Another way to solve the grid search problem would be to agree upon a flag that must be set by the estimator and can be checked by the grid search. So, for example, if kernel="precomputed", we internally set self.pairwise = True, a flag that would be checked by grid search.

In algorithms that use a similarity / affinity matrix in fit, we should add a new metric option. The input X in fit and other methods would then be considered a similarity / distance matrix if and only if metric is set to "precomputed". Otherwise, X would be conveniently transformed for the user by calling pairwise_distances with the given metric.
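A minimal sketch of the grid-search side of that idea, assuming the estimator exposes a boolean pairwise attribute as described above (cv_slices is a made-up helper name, not code from this PR):

import numpy as np


def cv_slices(estimator, X, train_idx, test_idx):
    # Illustrative helper: how a grid search could use the pairwise flag.
    if getattr(estimator, "pairwise", False):
        # X is a kernel / affinity matrix: slice rows *and* columns.
        return X[np.ix_(train_idx, train_idx)], X[np.ix_(test_idx, train_idx)]
    # X is a plain feature matrix: slice rows only.
    return X[train_idx], X[test_idx]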

@amueller
Member Author

Yeah, four new methods are a lot. I didn't really see that until I tried to implement everything and adjust the tests.
Just setting a self.pairwise variable might be a good thing.

I would not call the option you suggest "metric", as I think it does not need to be one. This can probably be decided on a per-algorithm basis and made as consistent as possible ;)
As long as internally self.pairwise is set, this should be fine.

@mblondel
Member

I would not call the option you suggest "metric", as I think it does not need to be one. This can probably be decided on a per-algorithm basis and made as consistent as possible ;)
As long as internally self.pairwise is set, this should be fine.

Agreed. But if internally pairwise_distances is used, we can call it metric (for pairwise_kernels, I would have preferred kernel...)

@agramfort
Member

self.pairwise is a simple and tempting idea. I would however use a function like is_pairwise() as you probably don't want to have svc.kernel and svc.pairwise defined in the init (required by clone). A property would work too but a function is more explicit.
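For illustration, the two shapes that could take (a sketch with a made-up estimator, not code from this PR):

from sklearn.base import BaseEstimator


class MyKernelEstimator(BaseEstimator):

    def __init__(self, kernel="rbf"):
        self.kernel = kernel

    # Option 1: an explicit zero-argument method.
    def is_pairwise(self):
        return self.kernel == "precomputed"

    # Option 2: a read-only property; nothing extra is set in __init__,
    # so clone, which only cares about constructor parameters, is unaffected.
    @property
    def pairwise(self):
        return self.kernel == "precomputed"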

@mblondel
Member

If I'm not mistaken, this requirement only holds if the attribute is a constructor parameter, which wouldn't be the case here.

@agramfort
Member

I certainly agree, but to rephrase: for svc.pairwise to remain properly defined after a clone, it should either be an init param or a property. Otherwise we could have svc.is_pairwise().

@mblondel
Member

Ah ok, I got your point now. A 3rd way is to modify clone so as to check for the presence of pairwise and reset it in the cloned object.

@agramfort
Member

Ah ok, I got your point now. A 3rd way is to modify clone so as to check for the presence of pairwise and reset it in the cloned object.

Why not, if it's an exception for pairwise. I am sure @GaelVaroquaux will have a strong opinion on this :)
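A rough illustration of the clone tweak suggested above, written here as a wrapper around clone rather than an actual change to sklearn.base.clone (the helper name is made up, and "reset" is read as re-establishing the flag on the new object):

from sklearn.base import clone


def clone_keeping_pairwise(estimator):
    # clone only propagates constructor parameters, so carry the
    # pairwise flag over to the new object explicitly if it is present.
    new_est = clone(estimator)
    if hasattr(estimator, "pairwise"):
        new_est.pairwise = estimator.pairwise
    return new_est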

@mblondel
Member

I wouldn't be shocked to have it in clone, since pairwise could be thought of as a reserved attribute that is part of the API specification. Let's wait for @GaelVaroquaux 's verdict :)

@amueller
Member Author

amueller commented Jun 3, 2012

ping @mblondel @agramfort @GaelVaroquaux
I would love some opinions on this PR ;)

@amueller
Member Author

amueller commented Jun 3, 2012

Maybe the affinity propagation still needs some love... though I'm not very familiar with that.

@amueller
Member Author

amueller commented Aug 7, 2012

One more comment on property vs function:
If _pairwise is a function instead of a property, the test becomes

getattr(est, '_pairwise', lambda: False)()

which looks a bit ugly to me. Therefore I'm +0 on property vs. function.
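For comparison, the two checks side by side, wrapped in throwaway helper functions for illustration:

def is_pairwise(est):
    # If _pairwise is an attribute or property, the check is simply:
    return getattr(est, "_pairwise", False)


def is_pairwise_method_style(est):
    # If _pairwise were a zero-argument method instead, as discussed above:
    return getattr(est, "_pairwise", lambda: False)()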

@amueller
Member Author

Any opinions on this at all?

@ogrisel
Member

ogrisel commented Aug 21, 2012

I would +1 reusing pairwise_kernels whenever it makes sense, possibly extending it with new affinities (not necessarily PSD, since the sigmoid kernel is not PSD either) such as "nearest_neighbors" as implemented in the current SpectralClustering fit method.

I don't see why you say that spectral clustering and affinity propagation use different styles of affinities: you can select the affinity according to what makes sense given the nature of the data and the assumptions you make about the cluster structure.
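A sketch of what reusing pairwise_kernels could look like on the clustering side, assuming the affinity="precomputed" option discussed in this PR (the data and gamma value are arbitrary):

import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics.pairwise import pairwise_kernels

rng = np.random.RandomState(0)
X = rng.rand(100, 5)

# Build the affinity matrix explicitly with pairwise_kernels ...
K = pairwise_kernels(X, metric="rbf", gamma=1.0)

# ... and hand it to the clustering estimator as a precomputed affinity.
labels = SpectralClustering(n_clusters=3, affinity="precomputed").fit(K).labels_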

@ogrisel
Member

ogrisel commented Aug 21, 2012

As for the _pairwise attribute / property vs. method debate, I would be +0 on the attribute / property option, as it seems more natural than a method without arguments.

@amueller
Member Author

@ogrisel thanks for your comment. Maybe I'm just not familiar enough with affinity propagation. Do you have an example where it is used with something other than the Euclidean distance?
I think I'll add string options to affinity propagation and spectral clustering, extend pairwise_kernels, rename the attribute in affinity propagation, and then hopefully we are good to go.

@ogrisel
Member

ogrisel commented Aug 21, 2012

@ogrisel thanks for your comment. Maybe I'm just not familiar enough with affinity propagation. Do you have an example where it is used with something other than the Euclidean distance?

I think that cosine similarity (e.g. the dot product of normalized TF-IDF vectors) would make sense for text data, for instance.
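A hedged sketch of that idea (assuming the affinity="precomputed" option for AffinityPropagation discussed in this PR; the toy documents are made up):

from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "kernel methods for text",
    "text classification with kernels",
    "clustering of documents",
    "document clustering by similarity",
]

# Cosine similarity of TF-IDF vectors as the affinity matrix.
S = cosine_similarity(TfidfVectorizer().fit_transform(docs))

labels = AffinityPropagation(affinity="precomputed").fit(S).labels_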

…ightly change the meaning of the scalar parameter. Scaling the median seems more intuitive than giving absolute values.

Parameters
----------

S: array [n_points, n_points]
X: array [n_points, n_points]
Member

The shape should be [n_samples, n_samples] if affinity == "precomputed" and [n_samples, n_features] otherwise.

…f a ``precomputed`` parameter, to support other affinities in the future.

np.exp(-gamma * d(X,X) ** 2)

or a k-nearest neighbors matrix.
Member

connectivity matrix
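Illustrative versions of those two affinity constructions (the data, gamma, and n_neighbors are arbitrary):

import numpy as np
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.neighbors import kneighbors_graph

rng = np.random.RandomState(0)
X = rng.rand(30, 4)
gamma = 1.0

# RBF affinity: np.exp(-gamma * d(X, X) ** 2)
rbf_affinity = np.exp(-gamma * euclidean_distances(X) ** 2)

# k-nearest-neighbors connectivity matrix (sparse, 0/1 entries).
knn_connectivity = kneighbors_graph(X, n_neighbors=5, mode="connectivity")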

@amueller
Member Author

@mblondel Thanks for your comments. You gave a -1 on something somewhere but I can't see it anymore. Was that the thing I just fixed?

@mblondel
Member

I thought you wanted to rename affinity="precomputed" to affinity="affinity"... Your change is fine :)

@mblondel
Member

It's nice that SpectralClustering and AffinityPropagation will follow the scikit-learn API, at last :)

As I said on the ML, we could postpone the decision about the code factorization of affinity propagation until later. This way we can address the more cosmetic changes (like using a property or not) and merge soon. This PR has been pending for too long :)

@amueller
Member Author

I think this looks ok now. It does not do a lot of code sharing, because of the inclusion loop problem (see ML).
But it does change the API in a way that should make the refactoring very easy.

What this PR accomplishes now is:

  • A consistent API for clustering, the same as in the rest of scikit-learn
  • Grid search for all estimators with precomputed similarities / kernels (see the sketch at the end of this comment)

This is important because it makes these next steps possible:

  • Work on the precomputed kernel interface for SVMs (sparse case)
  • Unify testing for clustering
  • Enable grid search for clustering (needs scoring function)
  • Refactor affinity computations

@mblondel I fully agree :)
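A minimal sketch of the precomputed-kernel grid search this enables (imports assume a current scikit-learn layout; at the time of this PR, GridSearchCV lived in sklearn.grid_search):

from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import pairwise_kernels
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Precompute the kernel once; cross-validation then has to slice this
# square matrix on both axes, which is exactly what the pairwise flag signals.
K = pairwise_kernels(X, metric="rbf", gamma=0.1)

grid = GridSearchCV(SVC(kernel="precomputed"), {"C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(K, y)
print(grid.best_params_)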

@ogrisel
Member

ogrisel commented Aug 27, 2012

+1 for postponing the affinity == neighbors refactoring and getting this merged first.

@ogrisel
Member

ogrisel commented Aug 27, 2012

I think I am 👍 for merging if all the tests pass. Can you quickly check the test coverage report to make sure the new code does not introduce new untested code blocks?

Also, can you check that the affinity propagation / spectral clustering examples still run and that their plots still look good with your refactoring?

BTW thanks for working on this.

@amueller
Member Author

Always a pleasure ;)
I'll add some more tests and then merge.

@amueller merged commit dcbe01f into scikit-learn:master on Aug 27, 2012
@amueller
Member Author

hurray :)

@GaelVaroquaux
Member

Good job!!!
