
[WIP] TST Add test to check if estimators reset model when fit is called #4162

Closed

Conversation

@raghavrv (Member)

Partially fixes #406

Merge after #3907 #4841

Also see #3907 for partial_fit tests...

@amueller Ping!

@amueller (Member)

I would vary as much of the data as possible: the number of data points, the number of features, the number of classes, and the scale of the features. Maybe fix the random state but scale the whole dataset or separate the features differently.
If no tests fail, maybe you are not testing enough ;)

@raghavrv (Member Author)

How do these look? (A sketch of the resulting check is below.)

  • I've set center_box to scale the data (kind of)
  • Varied the number of centers to change the effective number of classes
  • Varied n_features
  • Varied n_samples
  • One estimator undergoes two fits while the other undergoes only one.
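
A minimal sketch of that check, assuming make_blobs-style data and a hypothetical assert_same_model helper of the sort proposed in #3907 (this is not the PR's actual code):

```python
# Illustrative sketch only. `assert_same_model` is a hypothetical helper
# (the real one is proposed in #3907) that checks two fitted estimators
# produce the same outputs on the same data.
from sklearn.base import clone
from sklearn.datasets import make_blobs


def check_fit_resets_model(estimator, assert_same_model):
    # Two datasets that differ in size, dimensionality, number of classes
    # and scale, so a stale model is very likely to disagree.
    X1, y1 = make_blobs(n_samples=50, n_features=4, centers=3,
                        center_box=(-20.0, 20.0), random_state=0)
    X2, y2 = make_blobs(n_samples=120, n_features=8, centers=5,
                        center_box=(-200.0, 200.0), random_state=1)

    est_refit = clone(estimator)
    est_fresh = clone(estimator)

    # est_refit sees X1 first; a second call to fit on X2 must discard
    # everything learnt from X1.
    est_refit.fit(X1, y1)
    est_refit.fit(X2, y2)

    # est_fresh is fit only once, on X2.
    est_fresh.fit(X2, y2)

    # If fit() fully resets the model, the two estimators are equivalent.
    assert_same_model(X2, est_refit, est_fresh)
```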

@amueller (Member)

Can you fix the import failures on Travis so we can see if anything actually fails?

@amueller (Member)

BTW, I would rather add this to test_non_meta_estimators to make sure all estimators are tested.

@amueller (Member)

BTW, how does that work with current master? You are testing clustering algorithms, but the fix for clustering algorithms to work with an optional y is in #4064.

@raghavrv (Member Author)

> Can you fix the import failures on Travis so we can see if anything actually fails?

This needs ignore_warnings to be imported and, more importantly, the assert_same_model and assert_not_same_model helpers implemented in #3907... I thought we could come back to this after that one gets merged. What do you say?

> BTW, I would rather add this to test_non_meta_estimators to make sure all estimators are tested.

Sure! Thanks, will add it...

> ... but the fix for clustering algorithms to work with an optional y is in #4064

The _fit() helper in #3907 should take care of that, I think...

Actually, this was supposed to be a part of that PR itself... but I thought it would be less relevant to the partial_fit tests and would add some more noise to the diff...
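
For context, a rough guess at what those helpers could look like; the names assert_same_model / assert_not_same_model come from #3907, but the comparison logic below is only an assumption (compare whichever of predict / predict_proba / decision_function / transform both estimators expose):

```python
# Illustrative only -- the real helpers live in #3907.
from numpy.testing import assert_array_almost_equal


def assert_same_model(X, estimator_1, estimator_2):
    for method in ("predict", "predict_proba", "decision_function",
                   "transform"):
        if hasattr(estimator_1, method) and hasattr(estimator_2, method):
            output_1 = getattr(estimator_1, method)(X)
            output_2 = getattr(estimator_2, method)(X)
            assert_array_almost_equal(output_1, output_2)


def assert_not_same_model(X, estimator_1, estimator_2):
    try:
        assert_same_model(X, estimator_1, estimator_2)
    except AssertionError:
        return
    raise AssertionError("The two estimators produced identical outputs.")
```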

@amueller (Member)

ok, got it.
The _fit helper will not be needed any more once #4064 is merged.
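
For reference, a sketch of the kind of workaround such a _fit helper provides; this is an assumption about its purpose based on this thread, not the actual #3907 code:

```python
import inspect


def _fit(estimator, X, y=None):
    # Hypothetical stand-in for the #3907 `_fit` helper: pass y only when
    # the estimator's fit signature accepts it, so clusterers whose fit is
    # `fit(self, X)` on current master can still go through the common test.
    if "y" in inspect.signature(estimator.fit).parameters:
        return estimator.fit(X, y)
    return estimator.fit(X)
```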

@amueller (Member)

You can also make this one "on top" of the other one, i.e. continue the commits from there, if this one relies on it. That might make the changes here harder to review, but at least you would have working code.

@raghavrv (Member Author)

Okay, I will remove those in both these PRs... and wait for #4064 to be merged...

@amueller (Member)

No, don't. I'm not sure which will be merged first, yours or #4064.

@raghavrv (Member Author)

> You can also make this one "on top" of the other one

Sure! I'll do that... ;)

@raghavrv (Member Author)

> No, don't. I'm not sure which will be merged first, yours or #4064.

Oh, okay... I'll cross-reference it so that I remember to clean it all up after all three get merged...

@raghavrv (Member Author)

@amueller A few clusterers, like AgglomerativeClustering, do not have predict; you access labels_ directly for the result...

We could:

  • Special-case such clusterers and test them separately
  • Get the result from clusterer.labels_ while checking equality of models (inside assert_same_model / assert_not_same_model)

Which would be preferable?

(Also, why are they designed that way, without predict? Is there any discussion on that?)

@jnothman (Member)

They're transductive, rather than inductive, algorithms. They're not designed to create a general model that can then be applied to other data, but a model of the specific data they're given. It's a bit annoying that they can't be tested in this manner. One option is to test fit_predict and fit_transform where available (which are intended for transduction).
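
A sketch of that option for the reset test (illustrative names only): for transductive estimators, the "second fit" of the re-fit clone and the single fit of the fresh clone can both go through fit_predict, and the resulting labelings are compared.

```python
# Illustrative sketch: use fit_predict for estimators without predict.
from numpy.testing import assert_array_equal
from sklearn.base import clone


def check_transductive_fit_resets(estimator, X_old, X):
    est_refit = clone(estimator)
    est_fresh = clone(estimator)

    est_refit.fit(X_old)                      # first fit, must be discarded
    labels_refit = est_refit.fit_predict(X)   # second fit via fit_predict
    labels_fresh = est_fresh.fit_predict(X)   # single fit via fit_predict

    assert_array_equal(labels_refit, labels_fresh)
```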

@raghavrv (Member Author)

Thanks for the response :)

I feel we could special-case them inside the assert_*same_model functions by checking the specific results for the specific estimators...

These are the transductive algorithms and the results to be checked (please feel free to edit this comment); a possible encoding is sketched after the list:

  • AgglomerativeClustering - labels_
  • DBSCAN - labels_
  • EmpiricalCovariance - covariance_ and precision_
  • GraphLasso - covariance_ and precision_
  • GraphLassoCV - covariance_, precision_ and grid_scores
  • KernelDensity - ?
  • LSHForest - (do I have to evaluate the individual hash function objects?)
  • LedoitWolf - covariance_, precision_ and shrinkage_
  • MDS - embedding_ and stress_
  • MinCovDet - covariance_, precision_, support_ and dist_
  • NearestNeighbors - skip, maybe?
  • OAS - covariance_, precision_ and shrinkage_
  • ShrunkCovariance - covariance_, precision_ and shrinkage_
  • SpectralBiclustering - rows_, columns_, row_labels_ and column_labels_
  • SpectralClustering - labels_ and affinity_matrix_
  • SpectralCoclustering - rows_, columns_, row_labels_ and column_labels_
  • SpectralEmbedding - embedding_ and affinity_matrix_
  • TSNE - embedding_ and training_data
  • Ward - labels_
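
One way this could be encoded (purely illustrative; the attribute lists come from the table above, but the dict and the helper are made-up names, not the #3907 / #4841 implementation):

```python
import numpy as np

# Hypothetical map from transductive estimator name to the fitted
# attributes whose equality would define "same model" for it.
TRANSDUCTIVE_FIT_ATTRIBUTES = {
    "AgglomerativeClustering": ["labels_"],
    "DBSCAN": ["labels_"],
    "EmpiricalCovariance": ["covariance_", "precision_"],
    "LedoitWolf": ["covariance_", "precision_", "shrinkage_"],
    "MDS": ["embedding_", "stress_"],
    "SpectralClustering": ["labels_", "affinity_matrix_"],
    # ... and so on for the other estimators listed above.
}


def assert_listed_attributes_equal(est_1, est_2):
    # Compare the listed fitted attributes of two estimators of one class.
    attributes = TRANSDUCTIVE_FIT_ATTRIBUTES[type(est_1).__name__]
    for attr in attributes:
        np.testing.assert_array_almost_equal(
            np.asarray(getattr(est_1, attr)),
            np.asarray(getattr(est_2, attr)))
```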

@raghavrv (Member Author)

And yeah, this will look ugly :/ Please feel free to suggest a better way...

BTW, I cannot use fit_predict and fit_transform without also checking all the other estimators' results directly via fit_predict and fit_transform... (in which case assert*_same_model would not be used in this test alone...)

@amueller (Member)

Can you elaborate on the last comment? Why not try fit(X_train).predict(X_test) when possible and otherwise use fit_predict(X_train)?

@jnothman (Member)

Because we're trying to evaluate whether two models are the same after they've been fit in different ways. So the fit step isn't within the assert_same_model procedure.

Yes, we could consider falling back to directly looking for and comparing attributes, where they are of appropriate type and dtype.
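
A sketch of that fallback, assuming "appropriate type and dtype" means numeric ndarray attributes whose names end with an underscore (the helper name is made up):

```python
import numpy as np


def assert_fitted_array_attrs_equal(est_1, est_2):
    # Illustrative fallback: compare every public, underscore-suffixed
    # fitted attribute that is a numeric ndarray on both estimators.
    for attr in vars(est_1):
        if not attr.endswith("_") or attr.startswith("_"):
            continue
        val_1 = getattr(est_1, attr)
        val_2 = getattr(est_2, attr, None)
        if (isinstance(val_1, np.ndarray) and isinstance(val_2, np.ndarray)
                and val_1.dtype.kind in "bifc"):
            np.testing.assert_array_almost_equal(val_1, val_2)
```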

@raghavrv (Member Author)

NOTE: assert_fitted_attributes_equal, as defined in #3907 / #4841, takes care of transductive algorithms too...

raghavrv changed the title from "[MRG after #3907] TST Add test to check if estimators reset model when fit is called" to "[WIP] TST Add test to check if estimators reset model when fit is called" on May 17, 2015
@raghavrv (Member Author)

Closing in favour of #4841.

raghavrv closed this on Jun 16, 2015
raghavrv deleted the fit_reset_test branch on June 16, 2015 at 16:49