
factorize common tests #406

Closed
mblondel opened this issue Oct 20, 2011 · 19 comments · Fixed by #893
Labels: Easy (Well-defined and straightforward way to resolve), Enhancement

Comments

@mblondel
Member

Easy:

  • in test_common, check that the ValueError raised has a useful error message (see the sparse tests for an example)
  • move as many of the "specific" tests from test_clustering, test_transformers, ... into test_non_meta_estimators as possible.

Not so easy:

  • calling fit forgets the previous model if any
  • check how classifiers handle only one class being present
  • test how models handle non-float input (does uint8 cause overflows?)

Things done:

We should factorize common tests in a new file test_common.py (or maybe test_input.py?). Things to check:

  • can pickle the object
  • raise an exception when data contains NaNs (see the sketch after this list)
  • raise an exception for invalid input (e.g., np.matrix or sp.csr_matrix if dense only implementation)
  • raise an exception if n_features is not the same in fit and predict or transform
  • __repr__ and clone work
  • check that we can pickle and unpickle estimators.
  • check that all classifiers have a classes_ attribute (needs some fixes)
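
For the pickle and NaN items in this list, a rough sketch of such a common check could look like the following (hypothetical helper name, assumes an estimator with fit/predict; this is not the actual test_common code):

import pickle
import numpy as np


def check_pickle_and_nan(estimator):
    # Hypothetical common check: the estimator should survive a pickle
    # round-trip and should refuse data containing NaNs.
    rng = np.random.RandomState(0)
    X = rng.rand(20, 3)
    y = rng.randint(0, 2, size=20)

    estimator.fit(X, y)
    restored = pickle.loads(pickle.dumps(estimator))
    np.testing.assert_array_almost_equal(
        estimator.predict(X), restored.predict(X))

    X_nan = X.copy()
    X_nan[0, 0] = np.nan
    np.testing.assert_raises(ValueError, estimator.fit, X_nan, y)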

Edit by @amueller!

Edit by @GaelVaroquaux on Aug 13th 2014 to reflect progress in the codebase.

@GaelVaroquaux
Member

Things to check:

That the repr actually works, and that we can 'clone' the object. In other words, that the init does set the parameters it pretends it does.
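
A minimal sketch of that check (hypothetical helper name, assuming the estimator follows the get_params/set_params convention):

from sklearn.base import clone


def check_repr_and_clone(estimator):
    # repr() should not blow up, and clone() should produce an unfitted
    # estimator of the same type whose parameters round-trip through
    # get_params, i.e. __init__ really stores what its signature advertises.
    repr(estimator)
    cloned = clone(estimator)
    assert type(cloned) is type(estimator)
    assert cloned.get_params() == estimator.get_params()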

@amueller
Member

Ok, so #893 addressed some of these. Still to do:

  • testing clustering algorithms
  • testing transformers (of which there are quite a lot, e.g. decomposition, manifold, preprocessing)
  • using training / test set split
  • fit forgets model
  • input validation testing
  • sparse matrix support testing

...

@amueller
Member

From #943 (which is a duplicate of this one, I guess):

  • test how classifiers handle only one class being present
  • check clustering and transformer objects
  • check how non-numeric or non-contiguous labels are handled in classifiers
  • test for consistent output shapes in regressors / classifiers / transformers
  • test how methods handle non-float input (does uint8 give overflow errors?)
  • test that all classifiers have classes_ (see the sketch below)
  • test that the ValueErrors that are raised are actually informative - @GaelVaroquaux did a good job on the sparse check.

Sparse matrix support testing is done thanks to @GaelVaroquaux. Input validation testing should be good now, too.
For handling non-numeric labels and handling only one class in classification, I didn't have the impression that there is a consensus on what to do, so we have to decide that first.
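
Regarding the classes_ item above, a check along these lines could work (a sketch with hypothetical naming, not the actual common-test code):

import numpy as np


def check_classifiers_classes(classifier):
    # After fitting on string labels, the classifier should expose the
    # sorted unique labels in a classes_ attribute and should only ever
    # predict labels from that set.
    rng = np.random.RandomState(0)
    X = rng.rand(30, 4)
    y = np.array(["spam", "ham"] * 15)

    classifier.fit(X, y)
    assert hasattr(classifier, "classes_")
    np.testing.assert_array_equal(classifier.classes_, np.unique(y))
    assert set(classifier.predict(X)) <= set(classifier.classes_)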

@amueller
Member

Oh and I don't really know how to see that fit forgets the model. There is no way to see which attributes are set by fit, right?
We could test the __dict__ of the model, if we wanted...

@raghavrv
Member

@amueller

We could test the __dict__ of the model, if we wanted...

Could we test that by calling fit (with X1) --> fit (with X2) --> fit (with X1) and making sure models 1 and 3 are equal, while 1 and 2 are not?

@amueller
Member

Yeah, that would be a possible test, though not a super strong one.
The other option would be .fit(X1, y1).fit(X2, y2) == .fit(X3, y3).fit(X2, y2)
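
That comparison could be sketched roughly like this (LogisticRegression and its coef_ are used only as an illustrative stand-in; a real common test would need a more generic notion of model equality, e.g. comparing predictions):

import numpy as np
from sklearn.linear_model import LogisticRegression


def check_fit_forgets_previous_model(make_estimator=LogisticRegression):
    # Fitting on (X2, y2) should give the same model regardless of whether
    # the estimator was previously fit on (X1, y1) or on (X3, y3).
    rng = np.random.RandomState(0)
    X1, y1 = rng.rand(30, 3), rng.randint(0, 2, 30)
    X2, y2 = rng.rand(30, 3), rng.randint(0, 2, 30)
    X3, y3 = rng.rand(30, 3), rng.randint(0, 2, 30)

    est_a = make_estimator().fit(X1, y1).fit(X2, y2)
    est_b = make_estimator().fit(X3, y3).fit(X2, y2)
    np.testing.assert_array_almost_equal(est_a.coef_, est_b.coef_)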

@raghavrv
Member

thanks for the response!

yeah that seems better... will open a PR...

@amueller
Member

amueller commented Apr 1, 2015

Just to bump this up again: Replacing "assert_raises" by "assert_raise_message" or "assert_raises_regexp" in the common tests should be pretty simple if any of the GSoC people are bored ;)
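
For reference, the difference looks roughly like this (using the sklearn.utils.testing helpers of that era; the "n_init" message fragment is only illustrative, not the exact KMeans error text):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.utils.testing import assert_raises, assert_raise_message

X = np.random.random_sample((10, 2))

# Only checks the exception type:
assert_raises(ValueError, KMeans(n_init=0).fit, X)

# Also checks that the message mentions n_init, so we know the error was
# raised for the intended reason:
assert_raise_message(ValueError, "n_init", KMeans(n_init=0).fit, X)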

@vinayak-mehta
Contributor

@amueller, you mean replacing assert_raises throughout the sklearn tests?

@raghavrv
Member

raghavrv commented Apr 4, 2015

Yes, he means replacing assert_raises with assert_raises_regexp, like you did here.

@raghavrv
Member

raghavrv commented Apr 4, 2015

Also, sometimes it would be simpler not to use the full error message and to match only parts of it, as done here.

@vinayak-mehta
Contributor

Ok, but what I meant was, are we in a way deprecating assert_raises from sklearn by replacing all of its instances?

@raghavrv
Member

raghavrv commented Apr 4, 2015

Nope... We just want to assert that the code not only raises a given exception, but also raises it with the intended error message... This helps make sure that the test is correct.

Say for example

from sklearn.cluster import KMeans
import numpy as np

KMeans(n_init=0).fit(np.random.random_sample((1, 2)))
KMeans(n_init=5).fit(np.random.random_sample((1, 2)))
KMeans(n_init=0).fit(np.random.random_sample((10, 2)))

All three raise a ValueError,

but only the third, together with assert_raises_regex, correctly checks that n_init=0 is what triggers the error (with a single sample, the first two raise regardless of n_init), as tested here.
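
Spelled out, that third check could look like this (the "n_init" message fragment is illustrative, not the exact KMeans error text):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.utils.testing import assert_raises_regex

# Matching part of the message ties the ValueError to n_init specifically,
# rather than to the data simply being too small:
assert_raises_regex(ValueError, "n_init",
                    KMeans(n_init=0).fit, np.random.random_sample((10, 2)))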

@raghavrv
Member

raghavrv commented Apr 4, 2015

It's also probably a good idea to use assert_raises_regex instead of assert_raises_regexp. Refer to this.

@amueller
Member

amueller commented Apr 5, 2015

I was mostly talking about this file:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/tests/test_common.py

But in general it is a good idea to be explicit about the error messages everywhere.

@vinayak-mehta
Contributor

I couldn't find any instance of assert_raises in that file, so I was a bit confused. I'll try to replace as many of the other instances as I can.

@amueller
Member

amueller commented Apr 5, 2015

Because all the checks are in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py. Sorry, I should have made that clearer.

@vinayak-mehta
Contributor

Oops, I should've looked into that long import on Line 31. Replacing.

@amueller
Member

Fixed in #4550
