new tests for mean_shift algo #13179

rajdeepd · 2019-02-17T11:31:04Z

Reference Issues/PRs

none

What does this implement/fix? Explain your changes.

Add test cases to cover untested portions of mean_shift.py.

Any other comments?

no other comments

rajdeepd · 2019-02-22T14:02:37Z

@ogrisel can help review this

jnothman · 2019-02-23T21:21:02Z

sklearn/cluster/tests/test_mean_shift.py

+def test_mean_shift_negative_bandwidth():
+    bandwidth = -1
+    ms = MeanShift(bandwidth=bandwidth)
+    msg = \


Use parentheses to enclose expressions and split them over multiple lines, rather than using \ for line continuation.

@jnothman this comment is not clear. Will the following statement work?

msg = "bandwidth needs to be greater than zero or None,"
" got -1.000000"

This will:

msg = ("bandwidth needs to be greater than zero or None," " got -1.000000")
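The difference between the two spellings can be checked directly. This is a minimal sketch; the names `msg_unparenthesized` and `msg_parenthesized` are illustrative and do not appear in the PR. Without parentheses, the second string literal is a separate expression statement whose value is discarded, so only the parenthesized form yields the full message.

```python
# Without parentheses, the assignment ends at the first line; the second
# literal is a standalone expression statement whose value is discarded.
msg_unparenthesized = "bandwidth needs to be greater than zero or None,"
" got -1.000000"

# Inside parentheses, adjacent string literals are concatenated at compile
# time, and the expression may span several lines without a backslash.
msg_parenthesized = ("bandwidth needs to be greater than zero or None,"
                     " got -1.000000")

print(repr(msg_unparenthesized))
print(repr(msg_parenthesized))
```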

jnothman · 2019-02-23T21:24:47Z

sklearn/cluster/tests/test_mean_shift.py

+
+def test_seeds():
+    ms = MeanShift(seeds=None)
+    _ = ms.fit(X).labels_


Why do you get labels_?

jnothman · 2019-02-23T21:25:44Z

sklearn/cluster/tests/test_mean_shift.py

+    assert_raise_message(ValueError, msg, ms.fit, X)
+
+
+def test_seeds():


I don't get what this is testing. Checking that parameters are maintained should usually be covered by the common tests, not by tests for each specific estimator.

jnothman · 2019-02-23T21:26:17Z

sklearn/cluster/tests/test_mean_shift.py

+    labels = ms.fit(X).labels_
+    labels_unique = np.unique(labels)
+    n_clusters_ = len(labels_unique)
+    assert_equal(n_clusters_ > n_clusters, True)


Use bare assert as with seeds above
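For reference, a small sketch of the suggested change. The label values and the `n_clusters` bound below are made up for illustration; only the shape of the assertion mirrors the diff. Under pytest, a bare `assert` is rewritten so that a failure reports the operand values, which `assert_equal(expr, True)` cannot do.

```python
# Hypothetical cluster labels and lower bound, for illustration only.
labels = [0, 0, 1, 2, 2, 3]
n_clusters = 3

# Number of distinct labels found.
n_clusters_ = len(set(labels))

# Instead of: assert_equal(n_clusters_ > n_clusters, True)
assert n_clusters_ > n_clusters
```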

jnothman · 2019-02-23T21:27:39Z

sklearn/cluster/tests/test_mean_shift.py

+    n_clusters_ = len(labels_unique)
+    assert_equal(n_clusters_ > n_clusters, True)
+
+    cluster_centers, labels = mean_shift(X, bandwidth=bandwidth,


Rather than repeat the code, please use pytest.mark.parametrize to test multiple settings of bandwidth.

Changed to use pytest.mark.parametrize.
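A minimal sketch of the parametrization pattern, assuming pytest is installed. The helper `count_labels` and its data are hypothetical stand-ins, not code from the PR; the point is that one test body runs once per tuple in the list, so the settings no longer need to be copied by hand.

```python
import pytest


def count_labels(labels, include_orphans):
    """Hypothetical helper: count distinct labels, optionally counting -1."""
    distinct = set(labels)
    if not include_orphans:
        distinct.discard(-1)
    return len(distinct)


# Each tuple supplies one (include_orphans, expected) pair to the test.
@pytest.mark.parametrize("include_orphans, expected",
                         [(True, 4), (False, 3)])
def test_count_labels(include_orphans, expected):
    labels = [-1, 0, 0, 1, 2, 2]
    assert count_labels(labels, include_orphans) == expected
```

Note the spelling: the decorator is `parametrize`, without the extra "e".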

@jnothman please review

@jnothman @ogrisel please review

jnothman

I confirm this covers untested lines.

jnothman · 2019-03-12T10:00:57Z

sklearn/cluster/tests/test_mean_shift.py

+    bandwidth = -1
+    ms = MeanShift(bandwidth=bandwidth)
+    msg = ("bandwidth needs to be greater than zero or None,"
+           "            got -1.000000")


This whitespace looks like an error in the code raising the message. Please change the code to have a single space between the comma and "got"

This is unresolved. Please fix the error message in mean_shift_.py
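One common way such a run of spaces ends up in a message is a backslash continuation inside the string literal itself. The sketch below illustrates that mechanism in general; it is an assumption that this is what happened in mean_shift_.py, and the variable names are illustrative.

```python
bandwidth = -1

# Backslash continuation inside the literal: the indentation of the next
# source line becomes part of the string.
msg_backslash = "bandwidth needs to be greater than zero or None,\
            got %f" % bandwidth

# Adjacent literals inside parentheses: the whitespace in the message is
# exactly what is written between the quotes.
msg_adjacent = ("bandwidth needs to be greater than zero or None,"
                " got %f" % bandwidth)

print(repr(msg_backslash))
print(repr(msg_adjacent))
```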

jnothman · 2019-03-12T10:13:38Z

sklearn/cluster/tests/test_mean_shift.py

+    (1.2, True, 3),
+    (1.2, False, 4)
+])
+def test_eval(bandwidth, cluster_all, expected):


What do you mean by calling this "eval"? Can't we just parametrize test_mean_shift above, rather than adding a new test?

But ideally we should also test that cluster_all=False is actually effective at allowing some points to be left unclustered. Create a dataset where a point will be left with label -1 to test this properly.

@jnothman fixed as suggested
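A sketch of the kind of dataset the review asks for, assuming NumPy and scikit-learn are installed. The data, the explicit `seeds` array, and the variable names are illustrative and are not the test that was merged. Seeding only at the origin keeps a kernel from forming around the isolated point, so with cluster_all=False it stays unclustered.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Illustrative data: four points packed near the origin and one isolated
# point far away from them.
X_demo = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                   [10.0, 10.0]])

# A single seed at the origin, so the only cluster center ends up near
# (0.05, 0.05) and the isolated point is farther than the bandwidth from it.
seeds = np.array([[0.0, 0.0]])

labels_all = MeanShift(bandwidth=1.0, seeds=seeds,
                       cluster_all=True).fit(X_demo).labels_
labels_orphans = MeanShift(bandwidth=1.0, seeds=seeds,
                           cluster_all=False).fit(X_demo).labels_

# cluster_all=True assigns every point to its nearest center.
assert list(labels_all) == [0, 0, 0, 0, 0]
# cluster_all=False labels points outside every kernel as -1.
assert list(labels_orphans) == [0, 0, 0, 0, -1]
```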

jnothman · 2019-03-31T22:58:48Z

Please merge the current master

jnothman · 2019-03-31T23:00:12Z

sklearn/cluster/tests/test_mean_shift.py

-def test_mean_shift():
+@pytest.mark.parametrize("bandwidth, cluster_all, expected, "
+                         "first_cluster_label",
+                         [(1.2, True, 3, 0), (1.2, False, 4, -1)])


Much clearer, thanks!

jnothman · 2019-03-31T23:00:57Z

sklearn/cluster/tests/test_mean_shift.py

+    bandwidth = -1
+    ms = MeanShift(bandwidth=bandwidth)
+    msg = ("bandwidth needs to be greater than zero or None,"
+           "            got -1.000000")


This is unresolved. Please fix the error message in mean_shift_.py

rajdeepd · 2019-04-02T00:36:12Z

@jnothman fixed the comments

jnothman · 2019-04-02T08:45:25Z

Thanks @rajdeepd

rajdeepd · 2019-04-05T15:34:32Z

@jnothman how do we get this pull request merged into master?

jnothman · 2019-04-06T11:38:22Z

4 days is not long to wait for a second review, @rajdeepd... hopefully one will come soon.

thomasjpfan · 2019-04-06T17:53:06Z

sklearn/cluster/tests/test_mean_shift.py


-    cluster_centers, labels = mean_shift(X, bandwidth=bandwidth)


Removing this means we are not testing the mean_shift function directly anymore.

We are testing using:
ms = MeanShift(bandwidth=bandwidth, cluster_all=cluster_all)
labels = ms.fit(X).labels_

The testing of mean_shift should be independent of ms.fit. At the moment, ms.fit calls mean_shift, but we do not know how the code base will change.

@thomasjpfan do we need another test for mean_shift?

Leaving the original test here will sufficiently test mean_shift.

@thomasjpfan added test for mean_shift as well
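A sketch of exercising both entry points separately, assuming NumPy and scikit-learn are installed. The data and names are illustrative, not the merged test. The function `mean_shift` returns a `(cluster_centers, labels)` pair, while the estimator exposes the labels as `labels_` after `fit`.

```python
import numpy as np
from sklearn.cluster import MeanShift, mean_shift

# Illustrative data: two tight, well-separated groups of four points.
X_demo = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],
                   [5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2]])

# Test the function directly.
cluster_centers, labels = mean_shift(X_demo, bandwidth=1.0)
assert cluster_centers.shape == (2, 2)
assert len(np.unique(labels)) == 2

# Test the estimator on its own, without relying on the call above.
ms = MeanShift(bandwidth=1.0).fit(X_demo)
assert len(np.unique(ms.labels_)) == 2
```

Keeping the two checks separate means the function stays covered even if the estimator stops delegating to it.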

thomasjpfan · 2019-04-06T17:55:38Z

sklearn/cluster/tests/test_mean_shift.py

+    ms = MeanShift(bandwidth=bandwidth)
+    msg = ("bandwidth needs to be greater than zero or None,"
+           " got -1.000000")
+    assert_raise_message(ValueError, msg, ms.fit, X)


We are moving to using pytest.raises:

msg = (r"bandwidth needs to be greater than zero or None,"
       r" got -1\.000000")
with pytest.raises(ValueError, match=msg):
    ms.fit(X)

@thomasjpfan fixed
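A self-contained sketch of the pytest.raises pattern, assuming pytest is installed. The function `check_bandwidth` is a hypothetical stand-in for the validation inside mean_shift, used here so the example does not depend on the message a particular scikit-learn version raises. Because `match` is applied as a regular expression search, the literal dot in the number is escaped.

```python
import pytest


def check_bandwidth(bandwidth):
    """Hypothetical validator standing in for the check in mean_shift."""
    if bandwidth is not None and bandwidth <= 0:
        raise ValueError("bandwidth needs to be greater than zero or None,"
                         " got %f" % bandwidth)
    return bandwidth


def test_negative_bandwidth():
    # Raw strings keep the backslash; "\." matches a literal dot only.
    msg = (r"bandwidth needs to be greater than zero or None,"
           r" got -1\.000000")
    with pytest.raises(ValueError, match=msg):
        check_bandwidth(-1)
```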

NicolasHug

LGTM otherwise

NicolasHug · 2019-04-21T14:59:15Z

sklearn/cluster/tests/test_mean_shift.py

-    n_clusters_ = len(labels_unique)
-    assert_equal(n_clusters_, n_clusters)
+    cluster_centers, labels_mean_shift = mean_shift(X, cluster_all=cluster_all)
+    print(cluster_centers)


please remove

NicolasHug · 2019-04-21T15:01:07Z

sklearn/cluster/tests/test_mean_shift.py

@@ -36,23 +37,36 @@ def test_estimate_bandwidth_1sample():
    # Test estimate_bandwidth when n_samples=1 and quantile<1, so that
    # n_neighbors is set to 1.
    bandwidth = estimate_bandwidth(X, n_samples=1, quantile=0.3)
-    assert_array_almost_equal(bandwidth, 0., decimal=5)
+    assert_equal(bandwidth, 0.)


could just be assert a == b then

updated @NicolasHug
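Whether exact equality is safe here depends on whether the computation can ever return a tiny nonzero value. The sketch below uses a made-up value to show the difference between an exact comparison and one with an absolute tolerance; it makes no claim about what estimate_bandwidth actually returns.

```python
import math

# Hypothetical result that is zero up to floating-point noise.
value = 1e-9

# An exact comparison rejects it...
assert not (value == 0.0)

# ...while a comparison with an absolute tolerance accepts it. The
# tolerance 1e-5 mirrors decimal=5 in the assertion that was replaced.
assert math.isclose(value, 0.0, abs_tol=1e-5)
```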

NicolasHug · 2019-04-25T11:04:21Z

Thanks @rajdeepd

jnothman reviewed Feb 23, 2019

rajdeepd force-pushed the test_mean_shift branch 2 times, most recently from b018e99 to 4cf6413, on March 1, 2019 14:36

jnothman reviewed Mar 12, 2019

new tests for mean_shift algo

dea8840

rajdeepd force-pushed the test_mean_shift branch from 4cf6413 to dea8840 on March 31, 2019 16:00

jnothman reviewed Mar 31, 2019

rajdeepd force-pushed the test_mean_shift branch from bb1dd95 to f40648d on April 1, 2019 15:52

jnothman approved these changes Apr 2, 2019

jnothman added the Waiting for Reviewer label Apr 6, 2019

thomasjpfan reviewed Apr 6, 2019

rajdeepd force-pushed the test_mean_shift branch 2 times, most recently from 71df239 to 1b9f928, on April 21, 2019 08:56

NicolasHug approved these changes Apr 21, 2019

Merge remote-tracking branch 'upstream/master' into test_mean_shift

aa17ea1

rajdeepd force-pushed the test_mean_shift branch from 1b9f928 to aa17ea1 on April 24, 2019 18:06

NicolasHug merged commit 690464b into scikit-learn:master Apr 25, 2019

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019

additional tests for mean_shift algo (scikit-learn#13179)

77ac3df

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

additional tests for mean_shift algo (scikit-learn#13179)

67f53dc

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "additional tests for mean_shift algo (scikit-learn#13179)"

32c640f

This reverts commit 67f53dc.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "additional tests for mean_shift algo (scikit-learn#13179)"

8273af6

This reverts commit 67f53dc.

rth mentioned this pull request Jun 25, 2019

TST Fix atol in test_estimate_bandwidth_1sample #14187

Merged

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

additional tests for mean_shift algo (scikit-learn#13179)

c50a029
