Addition of kmeanscrossvalidator #61

kayhoogland · 2019-03-20T15:41:20Z

First version of the KMeans cross-validator discussed in #5.

sklego/model_selection.py

MBrouns · 2019-03-28T09:36:07Z

Maybe it's a bit nitpicky, but this implementation is currently built entirely on KMeans, whereas any clustering method should work, right? Maybe something for the future: make the CV take any clusterer as a parameter.

koaning · 2019-03-28T17:25:31Z

@MBrouns I agree. Related: I think KlusterFoldValidation has a nice ring to it.

koaning · 2019-03-28T17:26:51Z

@kayhoogland what's your opinion on this? It feels weird to merge this knowing that we will add a more custom thing as well. (Also, clustering algorithms are typically very prone to overfitting on a single column if there's no standardisation, so I think we don't just want to accept multiple clustering algorithms; we want to allow for pipelines.)
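To make the point about standardisation and pipelines concrete, here is a minimal sketch. It assumes the clusterer is a scikit-learn Pipeline whose last step supports `fit_predict`. The data, the variable name `cluster_pipeline`, and the choice of `KMeans` with three clusters are illustrative assumptions, not code from this PR.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Two features on very different scales: without scaling, the second
# column would dominate the Euclidean distances KMeans relies on.
X = np.column_stack([
    rng.normal(size=100),
    rng.normal(scale=1000.0, size=100),
])

# Standardise first, then cluster. A Pipeline exposes fit_predict when
# its final step does, so it can stand in for a bare clusterer.
cluster_pipeline = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=3, n_init=10, random_state=0),
)
labels = cluster_pipeline.fit_predict(X)
```

A cross-validator that only calls `fit_predict` on whatever it is given would accept this pipeline in the same way as a plain `KMeans` instance.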

kayhoogland · 2019-03-28T20:04:30Z

I agree, it seems the logical thing to do. Also, the name is indeed catchy ;)

kayhoogland · 2019-04-05T15:40:16Z

I made some changes to allow more clustering methods. One difficult thing to take into account is the n_splits from _BaseKFold: some (most) clustering methods (DBSCAN, for example) determine the number of clusters, and therefore the number of splits, from the data itself.
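The n_splits issue can be illustrated with a small sketch in plain Python. It assumes each distinct cluster label becomes one test fold, so the number of folds is only known once the labels exist. The helper name `folds_from_labels` and the example labels are hypothetical and not taken from the PR.

```python
from typing import Iterator, List, Sequence, Tuple


def folds_from_labels(
    labels: Sequence[int],
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices), one pair per distinct label."""
    # The number of folds equals the number of distinct labels the
    # clusterer produced; it is not a preset n_splits.
    for label in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == label]
        train = [i for i, lab in enumerate(labels) if lab != label]
        yield train, test


# Hypothetical labels; DBSCAN, for instance, marks noise points with -1.
# This sketch simply treats -1 as one more fold.
labels = [0, 0, 1, -1, 1, 0]
splits = list(folds_from_labels(labels))
n_splits = len(splits)  # only known after clustering
```

Whether noise points should form their own fold or be handled differently is a design choice this sketch does not settle.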

sklego/model_selection.py

koaning · 2019-06-18T20:35:19Z

Closed it because I hadn't heard anything about it in a while. I'll gladly re-open if there's more attention for it.

MBrouns · 2019-06-19T12:24:38Z

I just discussed this with @kayhoogland and we see a good way forward. I'm putting my feedback in a review.

MBrouns · 2019-06-19T12:26:43Z

sklego/model_selection.py

+        if isinstance(X, pd.DataFrame):
+            X = X.values
+
+        clusters = self.cluster_method.fit_predict(X)


To avoid refitting when self.cluster_method is already fitted:

    from sklearn.exceptions import NotFittedError

    try:
        clusters = self.cluster_method.predict(X)
    except NotFittedError:
        clusters = self.cluster_method.fit_predict(X)
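A standalone sketch of the fallback suggested above, wrapped in a hypothetical helper `cluster_labels` so it can be run outside the class. The toy data and the `KMeans` settings are assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.exceptions import NotFittedError


def cluster_labels(cluster_method, X):
    """Return labels for X, fitting cluster_method only if it is unfitted."""
    try:
        # Reuse an already fitted clusterer.
        return cluster_method.predict(X)
    except NotFittedError:
        # Not fitted yet: fit on X and return the resulting labels.
        return cluster_method.fit_predict(X)


# Two well-separated groups on a single feature.
X = np.array([[0.0], [0.1], [10.0], [10.1]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
first = cluster_labels(kmeans, X)  # falls back to fit_predict

# The estimator is fitted now, so a second call must not refit it.
centers_before = kmeans.cluster_centers_.copy()
second = cluster_labels(kmeans, X)
```

Note that this pattern relies on the clusterer having a `predict` method. Estimators such as DBSCAN do not provide one, so they would raise an `AttributeError` rather than `NotFittedError` and would need separate handling.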

MBrouns · 2019-06-19T12:27:08Z

sklego/model_selection.py

+        super(KlusterFoldValidation, self).__init__(n_splits=3,
+                                                    shuffle=False,
+                                                    random_state=random_state)
+


It's probably nice to set self.n_splits = None here

sklego/model_selection.py

Requested changes are made

koaning · 2019-07-03T11:28:30Z

w00t.

Addition of kmeanscrossvalidator

36e88d2

kayhoogland mentioned this pull request Mar 20, 2019

feature request: kmeans crossvalidator #5

Closed

Merge branch 'master' into feature_kmeanscrossvalidator

e5d6e2b

MBrouns requested changes Mar 21, 2019

Hoogland, Kay (ITCDDV) - KLM and others added 3 commits March 26, 2019 10:54

Changes based on review

47e2cd8

Issue template from merge

cd8d4d5

Merge branch 'master' into feature_kmeanscrossvalidator

c6a0a65

Merge branch 'master' into feature_kmeanscrossvalidator

a0807dc

koaning and others added 3 commits March 29, 2019 12:39

Merge branch 'master' into feature_kmeanscrossvalidator

08a0819

Changed kmeans crossvalidator in KlusterFold to enable more clustering methods

6486586

Merge branch 'feature_kmeanscrossvalidator' of https://github.com/kayhoogland/scikit-lego into feature_kmeanscrossvalidator

f781591

koaning previously requested changes Apr 5, 2019

Merge remote-tracking branch 'upstream/master' into feature_kmeanscrossvalidator

cbcb144

koaning closed this Jun 18, 2019

MBrouns reopened this Jun 19, 2019

Merge with master

efaef5e

kayhoogland force-pushed the feature_kmeanscrossvalidator branch from 2541602 to efaef5e Compare July 3, 2019 07:46

Hoogland, Kay (ITCDDV) - KLM added 3 commits July 3, 2019 11:51

Fix for not fitted, removed inherit from _BaseKfold

025ebf5

Reverted format of TimeGapSplit

c3fd13c

More elegant way to test for fit_predict

9326b65

MBrouns approved these changes Jul 3, 2019

MBrouns merged commit c41cacd into koaning:master Jul 3, 2019
