Mixture modeling reports wrong number of `n_iter_` #10740

Closed
wants to merge 2 commits into base: master
@kno10
Contributor

kno10 commented Mar 1, 2018

If you run mixture modeling with max_iter set to 5, it will report having finished 4 iterations.

This is because it reports the last 0-based iteration number (0...max_iter-1) in mixture/base.py, not the number of completed iterations.

This patch adds + 1 in the appropriate place.
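The off-by-one can be illustrated with a minimal sketch (not the actual sklearn code): the 0-based loop index leaks out of the `for` loop and is stored as the iteration count.

```python
def n_iter_buggy(max_iter):
    # Mimics the EM loop: the 0-based loop index leaks out of the loop
    # and is reported as the iteration count.
    for n_iter in range(max_iter):
        pass  # one EM step per iteration
    return n_iter  # reports max_iter - 1

def n_iter_fixed(max_iter):
    for n_iter in range(max_iter):
        pass  # one EM step per iteration
    return n_iter + 1  # number of *completed* iterations

print(n_iter_buggy(5), n_iter_fixed(5))  # 4 5
```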


@agramfort


Member

agramfort commented Mar 2, 2018

You need to fix Travis.

@glemaitre

You need to add a regression test with the patch.

@@ -228,7 +228,8 @@ def fit(self, X, y=None):
                if self.lower_bound_ > max_lower_bound:
                    max_lower_bound = self.lower_bound_
                    best_params = self._get_parameters()
                    best_n_iter = n_iter
                    # Report the number of completed iterations, not 0-indexed


@glemaitre

glemaitre Mar 2, 2018

Contributor

I would remove the comment


@kno10

kno10 Mar 2, 2018

Contributor

Then people will just think this is a mistake.
sklearn code could do with much more inline comments, actually. I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style, but I tried to keep the patch small. And the variable is badly named. The name appears to be the "number of iterations", but it is a 0-based index.
Feel free to do any such changes, add a regression test, etc.
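The "leaking" kno10 refers to is standard Python behaviour: a `for`-loop variable survives after the loop ends, but is never bound at all if the loop body never ran. A quick sketch of the hazard:

```python
def last_index(n):
    for i in range(n):
        pass
    return i  # relies on the leaked loop variable

print(last_index(3))  # 2
try:
    last_index(0)  # loop body never runs, so `i` was never bound
except NameError:  # UnboundLocalError is a subclass of NameError
    print("loop variable was never defined")
```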


@glemaitre

glemaitre Mar 2, 2018

Contributor

On second thought, the comment is OK there. It will remove some ambiguity.

> Then people will just think this is a mistake.

I hope that people try to understand the code before claiming it is a mistake :).

> sklearn code could do with much more inline comments, actually

I would argue that code should be self-explanatory. Comments should be there to remove ambiguity.

> And the variable is badly named

True, and it brings ambiguity ;)

> I also do not like how the current code relies on "leaking" the n_iter variable out of the for loop. That is pretty bad code style

I don't really see the issue with it.

> Feel free to do any such changes, add a regression test, etc.

We review contributors' code, but we do not modify their PRs directly (apart from minor changes before merging).

@jnothman

The reporting in _print_verbose_msg_iter_end is still zero-based. Can't we just change the range?
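jnothman's suggestion, numbering the iterations from 1 so that no post-loop "+ 1" is needed and the verbose output is right as well, would look roughly like this (a sketch, not the actual patch):

```python
max_iter = 5
for n_iter in range(1, max_iter + 1):  # iterations numbered 1..max_iter
    pass  # one EM step; verbose messages can print n_iter directly
print(n_iter)  # 5: already the number of completed iterations
```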

@glemaitre


Contributor

glemaitre commented Mar 15, 2018

@jnothman do we have the same issue as in #10723, in which a user could check n_iter_ and would not only be interested in the printing?

@kno10

renumeration

@kno10


Contributor

kno10 commented Mar 22, 2018

@glemaitre Please re-review, as I did the requested change.

@glemaitre


Contributor

glemaitre commented Apr 21, 2018

Can you add a what's new entry and a regression test?

kno10 added some commits Mar 1, 2018

Mixture modeling reports wrong number of `n_iter_`
If you run mixture modeling with max_iter set to 5, it will report having finished 4 iterations.

Because it is reporting the last iteration number 0...max_iter-1 in mixture/base.py, not the number of *completed* iterations.

This patch changes the iterations to be numbered 1...max_iter (inclusive).
@kno10


Contributor

kno10 commented Apr 23, 2018

Added test that fails on master with AssertionError: 0 != 1

(On master, n_iter_=0 when running with max_iter=1).
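A sketch of what such a regression test could look like, assuming a current scikit-learn in which this fix has landed; the dataset and parameters here are arbitrary, only the reported iteration count matters:

```python
import warnings

import numpy as np
from sklearn.mixture import GaussianMixture

# Arbitrary toy data; only the reported iteration count matters.
rng = np.random.RandomState(0)
X = rng.rand(30, 2)

gmm = GaussianMixture(n_components=2, max_iter=1, random_state=0)
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # max_iter=1 triggers a ConvergenceWarning
    gmm.fit(X)

print(gmm.n_iter_)  # 1 with the fix; 0 on pre-fix master
```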

Contributions to sklearn would be easier if you would demand less boilerplate from occasional contributors. Even a trivial fix like this requires me to do many iterations. I probably won't go through this hassle again, the effort/benefit is too low.

@jnothman


Member

jnothman commented Apr 23, 2018

> Contributions to sklearn would be easier if you would demand less boilerplate from occasional contributors. Even a trivial fix like this requires me to do many iterations. I probably won't go through this hassle again, the effort/benefit is too low.

I'm sorry you feel this way, but requiring tests is pretty standard in software development. Changelog entries are necessary to help users understand changed behaviour when they upgrade from version to version. The project is almost entirely run on volunteer time, so it's not like we're asking for these things just because we feel like it.

@kno10


Contributor

kno10 commented Apr 23, 2018

Do not apply the standards of commercial development to volunteers.
I agree that changelogs etc. should be maintained, but I don't think it is appropriate to completely offload this onto casual contributors.
See: if I had just opened an issue, I would have had much less work. Next time, I am probably not going to make a pull request.

@glemaitre

> Do not apply the standards of commercial development to volunteers.

This is not commercial development but rather good practice. A lot of things are sometimes considered useless (documentation, PEP8, etc.).

Could you make the CI happy?

@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
    assert_equal(gmm.n_init, n_init)
    assert_equal(gmm.init_params, init_params)


def test_regression_10740():


@glemaitre

glemaitre Apr 23, 2018

Contributor

rename test_regression_10740 to test_gmm_n_iter

@@ -172,6 +172,18 @@ def test_gaussian_mixture_attributes():
    assert_equal(gmm.n_init, n_init)
    assert_equal(gmm.init_params, init_params)


def test_regression_10740():
    # Regression test for #10740. Ensure that n_iter_ is the number of iterations done


@glemaitre

glemaitre Apr 23, 2018

Contributor

Too long ...

@jnothman


Member

jnothman commented Apr 23, 2018

Please, if you feel more comfortable making an issue than a PR, that's fine. Do what you are comfortable with. Clearly different volunteers have different feelings about what is appropriate and what is burdensome. Those of us who spend many hours a week on this are hardly fazed by a small non-regression test.

Do not apply the standards of commercial development to volunteers

This project would have nowhere near its level of success without some rigour (albeit fallible) in testing and documentation. And yet our testing does not even come close to a commercial software house with dedicated QA personnel. I could have, for instance, asked you to write a generic test for all iterative estimators. Or to make sure that the result holds under certain kinds of randomised and edge case inputs. That's what you might expect from enterprise development.

@kno10


Contributor

kno10 commented Apr 23, 2018

Ok. Forget about it. I do not care anymore. Goodbye.

This is how you lose potential contributors.

@kno10 kno10 closed this Apr 23, 2018

@kno10 kno10 deleted the kno10:patch-2 branch Apr 23, 2018

@kno10 kno10 restored the kno10:patch-2 branch Apr 23, 2018


@jnothman


Member

jnothman commented Apr 23, 2018
