
[WIP] TST Add test to check if estimators reset model when fit is called #4162

Closed

Conversation

@raghavrv (Member)

Partially fixes #406

Merge after #3907 #4841

Also see #3907 for partial_fit tests...

@amueller Ping!

@amueller (Member)

I would vary as much of the data as possible: the number of data points, the number of features, the number of classes, and the scale of the features. Maybe fix the random state but scale the whole dataset or separate the features differently.
If no tests fail, maybe you are not testing enough ;)

@raghavrv (Member Author)

How do these look? (A sketch of the resulting check is below.)

  • I've set center_box to scale the data (kind of)
  • Varied the number of centers to change the effective number of classes
  • Varied n_features
  • Varied n_samples
  • One estimator undergoes two fits while the other undergoes only one.
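
A minimal sketch of that check, assuming make_blobs-style data and a hypothetical assert_same_model helper of the sort proposed in #3907 (this is not the PR's actual code):

```python
# Illustrative sketch only. `assert_same_model` is a hypothetical helper
# (the real one is proposed in #3907) that checks two fitted estimators
# produce the same outputs on the same data.
from sklearn.base import clone
from sklearn.datasets import make_blobs


def check_fit_resets_model(estimator, assert_same_model):
    # Two datasets that differ in size, dimensionality, number of classes
    # and scale, so a stale model is very likely to disagree.
    X1, y1 = make_blobs(n_samples=50, n_features=4, centers=3,
                        center_box=(-20.0, 20.0), random_state=0)
    X2, y2 = make_blobs(n_samples=120, n_features=8, centers=5,
                        center_box=(-200.0, 200.0), random_state=1)

    est_refit = clone(estimator)
    est_fresh = clone(estimator)

    # est_refit sees X1 first; a second call to fit on X2 must discard
    # everything learnt from X1.
    est_refit.fit(X1, y1)
    est_refit.fit(X2, y2)

    # est_fresh is fit only once, on X2.
    est_fresh.fit(X2, y2)

    # If fit() fully resets the model, the two estimators are equivalent.
    assert_same_model(X2, est_refit, est_fresh)
```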

@amueller (Member)

Can you fix the import failures on Travis so we can see if anything actually fails?

@amueller (Member)

BTW, I would rather add this to test_non_meta_estimators to make sure all estimators are tested.

@amueller (Member)

BTW, how does that work with current master? You are testing clustering algorithms, but the fix for clustering algorithms to work with an optional y is in #4064.

@raghavrv (Member Author)

> Can you fix the import failures on Travis so we can see if anything actually fails?

This needs ignore_warnings to be imported and, more importantly, the assert_same_model and assert_not_same_model helpers implemented in #3907... I thought we could come back to this after that one gets merged. What do you say?

> BTW, I would rather add this to test_non_meta_estimators to make sure all estimators are tested.

Sure! Thanks, will add it...

> ... but the fix for clustering algorithms to work with an optional y is in #4064

The _fit() helper in #3907 should take care of that, I think...

Actually, this was supposed to be a part of that PR itself... but I thought it would be less relevant to the partial_fit tests and would add some more noise to the diff...
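
For context, a rough guess at what those helpers could look like; the names assert_same_model / assert_not_same_model come from #3907, but the comparison logic below is only an assumption (compare whichever of predict / predict_proba / decision_function / transform both estimators expose):

```python
# Illustrative only -- the real helpers live in #3907.
from numpy.testing import assert_array_almost_equal


def assert_same_model(X, estimator_1, estimator_2):
    for method in ("predict", "predict_proba", "decision_function",
                   "transform"):
        if hasattr(estimator_1, method) and hasattr(estimator_2, method):
            output_1 = getattr(estimator_1, method)(X)
            output_2 = getattr(estimator_2, method)(X)
            assert_array_almost_equal(output_1, output_2)


def assert_not_same_model(X, estimator_1, estimator_2):
    try:
        assert_same_model(X, estimator_1, estimator_2)
    except AssertionError:
        return
    raise AssertionError("The two estimators produced identical outputs.")
```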

@amueller (Member)

ok, got it.
The _fit helper will not be needed any more once #4064 is merged.
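
For reference, a sketch of the kind of workaround such a _fit helper provides; this is an assumption about its purpose based on this thread, not the actual #3907 code:

```python
import inspect


def _fit(estimator, X, y=None):
    # Hypothetical stand-in for the #3907 `_fit` helper: pass y only when
    # the estimator's fit signature accepts it, so clusterers whose fit is
    # `fit(self, X)` on current master can still go through the common test.
    if "y" in inspect.signature(estimator.fit).parameters:
        return estimator.fit(X, y)
    return estimator.fit(X)
```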

@amueller (Member)

You can also make this one "on top" of the other one, i.e. continue the commits from there, if this one relies on it. That might make the changes here harder to review, but at least you would have working code.

@raghavrv (Member Author)

Okay, I will remove those in both these PRs... and wait for #4064 to be merged...

@amueller (Member)

No, don't. I'm not sure which will be merged first, yours or #4064.

@raghavrv (Member Author)

> You can also make this one "on top" of the other one

Sure! I'll do that... ;)

@raghavrv (Member Author)

> No, don't. I'm not sure which will be merged first, yours or #4064.

Oh, okay... I'll cross-reference it so that I remember to clean it all up after all three get merged...

@raghavrv (Member Author)

@amueller A few clusterers, like AgglomerativeClustering, do not have predict; you access labels_ directly for the result...

We could:

  • Special-case such clusterers and test them separately
  • Get the result from clusterer.labels_ while checking equality of models (inside assert_same_model / assert_not_same_model)

Which would be preferable?

(Also, why are they designed that way, without predict? Is there any discussion on that?)

@jnothman (Member)

They're transductive, rather than inductive, algorithms. They're not designed to create a general model that can then be applied to other data, but a model of the specific data they're given. It's a bit annoying that they can't be tested in this manner. One option is to test fit_predict and fit_transform where available (which are intended for transduction).
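
A sketch of that option for the reset test (illustrative names only): for transductive estimators, the "second fit" of the re-fit clone and the single fit of the fresh clone can both go through fit_predict, and the resulting labelings are compared.

```python
# Illustrative sketch: use fit_predict for estimators without predict.
from numpy.testing import assert_array_equal
from sklearn.base import clone


def check_transductive_fit_resets(estimator, X_old, X):
    est_refit = clone(estimator)
    est_fresh = clone(estimator)

    est_refit.fit(X_old)                      # first fit, must be discarded
    labels_refit = est_refit.fit_predict(X)   # second fit via fit_predict
    labels_fresh = est_fresh.fit_predict(X)   # single fit via fit_predict

    assert_array_equal(labels_refit, labels_fresh)
```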

@raghavrv (Member Author)

Thanks for the response :)

I feel we could special-case them inside the assert_*same_model functions by checking the specific results for the specific estimators...

These are the transductive algorithms and the results to be checked (please feel free to edit this comment); a possible encoding is sketched after the list:

  • AgglomerativeClustering - labels_
  • DBSCAN - labels_
  • EmpiricalCovariance - covariance_ and precision_
  • GraphLasso - covariance_ and precision_
  • GraphLassoCV - covariance_, precision_ and grid_scores
  • KernelDensity - ?
  • LSHForest - (do I have to evaluate the individual hash function objects?)
  • LedoitWolf - covariance_, precision_ and shrinkage_
  • MDS - embedding_ and stress_
  • MinCovDet - covariance_, precision_, support_ and dist_
  • NearestNeighbors - skip, maybe?
  • OAS - covariance_, precision_ and shrinkage_
  • ShrunkCovariance - covariance_, precision_ and shrinkage_
  • SpectralBiclustering - rows_, columns_, row_labels_ and column_labels_
  • SpectralClustering - labels_ and affinity_matrix_
  • SpectralCoclustering - rows_, columns_, row_labels_ and column_labels_
  • SpectralEmbedding - embedding_ and affinity_matrix_
  • TSNE - embedding_ and training_data
  • Ward - labels_
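
One way this could be encoded (purely illustrative; the attribute lists come from the table above, but the dict and the helper are made-up names, not the #3907 / #4841 implementation):

```python
import numpy as np

# Hypothetical map from transductive estimator name to the fitted
# attributes whose equality would define "same model" for it.
TRANSDUCTIVE_FIT_ATTRIBUTES = {
    "AgglomerativeClustering": ["labels_"],
    "DBSCAN": ["labels_"],
    "EmpiricalCovariance": ["covariance_", "precision_"],
    "LedoitWolf": ["covariance_", "precision_", "shrinkage_"],
    "MDS": ["embedding_", "stress_"],
    "SpectralClustering": ["labels_", "affinity_matrix_"],
    # ... and so on for the other estimators listed above.
}


def assert_listed_attributes_equal(est_1, est_2):
    # Compare the listed fitted attributes of two estimators of one class.
    attributes = TRANSDUCTIVE_FIT_ATTRIBUTES[type(est_1).__name__]
    for attr in attributes:
        np.testing.assert_array_almost_equal(
            np.asarray(getattr(est_1, attr)),
            np.asarray(getattr(est_2, attr)))
```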

@raghavrv (Member Author)

And yeah, this will look ugly :/ Please feel free to suggest a better way...

BTW, I cannot use fit_predict and fit_transform without also checking all the other estimators' results directly via fit_predict and fit_transform... (in which case assert*_same_model would not be used in this test alone...)

@amueller (Member)

Can you elaborate on the last comment? Why not try fit(X_train).predict(X_test) when possible and otherwise use fit_predict(X_train)?

@jnothman (Member)

Because we're trying to evaluate whether two models are the same after they've been fit in different ways. So the fit step isn't within the assert_same_model procedure.

Yes, we could consider falling back to directly looking for and comparing attributes, where they are of appropriate type and dtype.
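
A sketch of that fallback, assuming "appropriate type and dtype" means numeric ndarray attributes whose names end with an underscore (the helper name is made up):

```python
import numpy as np


def assert_fitted_array_attrs_equal(est_1, est_2):
    # Illustrative fallback: compare every public, underscore-suffixed
    # fitted attribute that is a numeric ndarray on both estimators.
    for attr in vars(est_1):
        if not attr.endswith("_") or attr.startswith("_"):
            continue
        val_1 = getattr(est_1, attr)
        val_2 = getattr(est_2, attr, None)
        if (isinstance(val_1, np.ndarray) and isinstance(val_2, np.ndarray)
                and val_1.dtype.kind in "bifc"):
            np.testing.assert_array_almost_equal(val_1, val_2)
```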

@raghavrv (Member Author)

NOTE: assert_fitted_attributes_equal, as defined in #3907 / #4841, takes care of transductive algorithms too...

raghavrv changed the title from "[MRG after #3907] TST Add test to check if estimators reset model when fit is called" to "[WIP] TST Add test to check if estimators reset model when fit is called" on May 17, 2015
@raghavrv (Member Author)

Closing in favour of #4841.

raghavrv closed this on Jun 16, 2015
raghavrv deleted the fit_reset_test branch on June 16, 2015 at 16:49