New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] FIX report n_iter_ as at most max_iter, without always reducing by 1 #10723

Merged
merged 9 commits into from Mar 5, 2018

Conversation

Projects
None yet
4 participants
@jnothman
Member

jnothman commented Feb 28, 2018

Fix #10619

@jnothman jnothman changed the title from [MRG] FIX report n_iter_ faithfully to scipy.optimize to [MRG] FIX report n_iter_ in accordance with scipy.optimize Feb 28, 2018

@jnothman jnothman added the Bug label Feb 28, 2018

@jnothman jnothman requested a review from lesteve Feb 28, 2018

@lesteve

This comment has been minimized.

Member

lesteve commented Feb 28, 2018

I pushed a minor change in the test that checks that the solver is lbfgs and that the scipy version is affected by this bug. Other than this, LGTM.

@jnothman

This comment has been minimized.

Member

jnothman commented Feb 28, 2018

@lesteve

This comment has been minimized.

Member

lesteve commented Mar 1, 2018

It would be nice to have a second opinion on this one. Maybe @rth, @glemaitre or @qinhanmin2014?

@qinhanmin2014

LGTM, +1 for faithfully report the result from scipy. I don't think users should blame scikit-learn if they get different results here.

@@ -382,6 +382,11 @@ Linear, kernelized and related models
underlying implementation is broken. Use :class:`linear_model.Lasso` instead.
:issue:`9837` by `Alexandre Gramfort`_.
- :class:`linear_model.LogisticRegression` with ``solver='lbfgs'`` formerly

This comment has been minimized.

@qinhanmin2014

qinhanmin2014 Mar 1, 2018

Member

I doubt whether it belongs to API changes, either Enhancements or Bug fixes seems fine from my side.

This comment has been minimized.

@lesteve

lesteve Mar 1, 2018

Member

Going from scikit-learn 0.19 to 0.20, logistic_regression.n_iter_ will change (unless you use scipy > 1.0.0). In this respect "API changes" feels like the right place.

@qinhanmin2014 qinhanmin2014 changed the title from [MRG] FIX report n_iter_ in accordance with scipy.optimize to [MRG+1] FIX report n_iter_ in accordance with scipy.optimize Mar 1, 2018

@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented Mar 1, 2018

Going from scikit-learn 0.19 to 0.20, logistic_regression.n_iter_ will change (unless you use scipy > 1.0.0). In this respect "API changes" feels like the right place.

Fair enough. @lesteve @jnothman merge?

@lesteve

This comment has been minimized.

Member

lesteve commented Mar 1, 2018

It'd be good to have a third opinion, it could potentially break some user code, e.g. if someone was checking logistic_regression.n_iter_ == logistic_regression.max_iter to figure out whether max_iter was reached before converging.

@jnothman

This comment has been minimized.

Member

jnothman commented Mar 1, 2018

@lesteve

This comment has been minimized.

Member

lesteve commented Mar 2, 2018

IMO it's fine to leave n_iter_ as reported by scipy. Because this can break user code (a bit of a edge case though maybe), I think it'd be good to have another opinion.

@glemaitre

This comment has been minimized.

Contributor

glemaitre commented Mar 2, 2018

logistic_regression.n_iter_ == logistic_regression.max_iter

Tricky one. If we don't want to have any issue we should cover this case.

However, we could also expect that user set verbose=1 and catch the ConvergenceWarning from there. This should be one of the reason to raise those warnings, isn't it :) Not sure this a good excuse to go around the issue.

@lesteve

This comment has been minimized.

Member

lesteve commented Mar 2, 2018

Tricky one.

Yep ...

Thinking a bit more about it, I think doing n_iter_ = min(n_iter_, max_iter) is the least worse option. At least we maintain a bit of internal consistency within scikit-learn (i.e. we keep n_iter_ <= max_iter). Obviously n_iter_ is still going to be wrong (in scipy <= 1.0.0).

@jnothman jnothman changed the title from [MRG+1] FIX report n_iter_ in accordance with scipy.optimize to [MRG] FIX report n_iter_ as at most max_iter, without always reducing by 1 Mar 3, 2018

@qinhanmin2014

LGTM. I think it's a better solution.

@qinhanmin2014

This comment has been minimized.

Member

qinhanmin2014 commented Mar 4, 2018

ping @jnothman @lesteve @glemaitre I quickly searched the codebase with git grep "'nit'". It seems that HuberRegressor has the same problem. Should we fix that here?

from sklearn.linear_model import HuberRegressor
from sklearn.datasets import load_boston
X, y = load_boston(return_X_y=True)
reg = HuberRegressor(max_iter=1)
reg.fit(X, y)
print(reg.n_iter_)
# 2
@jnothman

This comment has been minimized.

Member

jnothman commented Mar 4, 2018

@qinhanmin2014

LGTM. I'll give my +1 again.

.. versionchanged:: 0.20
In SciPy <= 1.0.0 the number of lbfgs iterations may exceed
``max_iter``. ``n_iter_`` will nowreport at most ``max_iter``.

This comment has been minimized.

@qinhanmin2014

qinhanmin2014 Mar 4, 2018

Member

nowreport -> now report

@qinhanmin2014 qinhanmin2014 changed the title from [MRG] FIX report n_iter_ as at most max_iter, without always reducing by 1 to [MRG+1] FIX report n_iter_ as at most max_iter, without always reducing by 1 Mar 4, 2018

jnothman and others added some commits Mar 4, 2018

@jnothman

This comment has been minimized.

Member

jnothman commented on b125afe Mar 5, 2018

Thanks

@lesteve

This comment has been minimized.

Member

lesteve commented Mar 5, 2018

I pushed some minor tweaks. I'll merge this one when Travis is green. Thanks everyone, I feel like we reached a reasonable solution!

@lesteve lesteve merged commit 2aba6e2 into scikit-learn:master Mar 5, 2018

8 checks passed

ci/circleci: deploy Your tests passed on CircleCI!
Details
ci/circleci: python2 Your tests passed on CircleCI!
Details
ci/circleci: python3 Your tests passed on CircleCI!
Details
codecov/patch 100% of diff hit (target 95%)
Details
codecov/project Absolute coverage decreased by -<.01% but relative coverage increased by +4.99% compared to f96ee8d
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
lgtm analysis: Python No alert changes
Details
@jnothman

This comment has been minimized.

Member

jnothman commented Mar 7, 2018

Yay! Travis Cron passed! No more daily "fail" emails!

@lesteve

This comment has been minimized.

Member

lesteve commented Mar 7, 2018

Great stuff, thanks for tackling this one @jnothman!

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Jul 4, 2018

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Jul 13, 2018

pythonPackages.scikitlearn: apply `max_iter` patch from scikitlearn m…
…aster

See scikit-learn/scikit-learn#10723

This fixes the build of `scikitlearn` on master and nixos-unstable.

The issue is originally an upstream issue
(see scikit-learn/scikit-learn#10619) which
was fixed on master and was mainly caused by changes to the environment.

Closes NixOS#43466

Ma27 added a commit to Ma27/nixpkgs that referenced this pull request Jul 14, 2018

pythonPackages.scikitlearn: apply `max_iter` patch from scikitlearn m…
…aster

See scikit-learn/scikit-learn#10723

This fixes the build of `scikitlearn` on master and nixos-unstable.

The issue is originally an upstream issue
(see scikit-learn/scikit-learn#10619) which
was fixed on master and was mainly caused by changes to the environment.

Closes NixOS#43466

dotlambda added a commit to NixOS/nixpkgs that referenced this pull request Jul 14, 2018

pythonPackages.scikitlearn: apply `max_iter` patch from scikitlearn m…
…aster (#43483)

See scikit-learn/scikit-learn#10723

This fixes the build of `scikitlearn` on master and nixos-unstable.

The issue is originally an upstream issue
(see scikit-learn/scikit-learn#10619) which
was fixed on master and was mainly caused by changes to the environment.

Closes #43466
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment