[MRG+1] Fix estimators to work if sample_weight parameter is pandas Series type #7825

Merged
merged 18 commits into from Dec 3, 2016

Conversation

Projects
None yet
3 participants
@kathyxchen
Contributor

kathyxchen commented Nov 4, 2016

Reference Issue

Finishes work in #5642

What does this implement/fix? Explain your changes.

This addresses the comments made on @pradyu1993's commit

Any other comments?

May still warrant a PEP8 linting. (Having trouble with my usual development machine so once I get back on that I'll run the linter & all). Let me know if there are any suggestions on tests or improvements to how the check is made. Thanks!

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 5, 2016

Member

Travis is indeed reporting linter failures: https://travis-ci.org/scikit-learn/scikit-learn/jobs/173378047

Member

jnothman commented Nov 5, 2016

Travis is indeed reporting linter failures: https://travis-ci.org/scikit-learn/scikit-learn/jobs/173378047

@kathyxchen

This comment has been minimized.

Show comment
Hide comment
@kathyxchen

kathyxchen Nov 6, 2016

Contributor

@jnothman done!
Also, the sample_weight parameter is used in many places throughout the linear_model/ridge.py file. Should a modification to accept pandas.Series be applied to all of these, or is that outside the scope of this issue? @amueller

Contributor

kathyxchen commented Nov 6, 2016

@jnothman done!
Also, the sample_weight parameter is used in many places throughout the linear_model/ridge.py file. Should a modification to accept pandas.Series be applied to all of these, or is that outside the scope of this issue? @amueller

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 7, 2016

Member

Also, the sample_weight parameter is used in many places throughout the linear_model/ridge.py file.

Do you mean in functions like sklearn.linear_model.ridge_regression? Or do you mean checking that it works for all solvers? At the moment we only have common tests for estimator classes. I think that checking support in functions is a good idea, but perhaps in a separate PR.

Member

jnothman commented Nov 7, 2016

Also, the sample_weight parameter is used in many places throughout the linear_model/ridge.py file.

Do you mean in functions like sklearn.linear_model.ridge_regression? Or do you mean checking that it works for all solvers? At the moment we only have common tests for estimator classes. I think that checking support in functions is a good idea, but perhaps in a separate PR.

+ estimator = Estimator()
+ if has_fit_parameter(estimator, "sample_weight"):
+ try:
+ # default solver liblinear doesn't support this parameter

This comment has been minimized.

@jnothman

jnothman Nov 7, 2016

Member

solver='liblinear' does support sample_weight . Where did you get this idea from?

@jnothman

jnothman Nov 7, 2016

Member

solver='liblinear' does support sample_weight . Where did you get this idea from?

This comment has been minimized.

@kathyxchen

kathyxchen Nov 7, 2016

Contributor

This was part of the code taken from the previous PR. I should have double checked it first!

@kathyxchen

kathyxchen Nov 7, 2016

Contributor

This was part of the code taken from the previous PR. I should have double checked it first!

This comment has been minimized.

@jnothman

jnothman Nov 7, 2016

Member

Ah and it may have been true... over a year ago. :)

@jnothman

jnothman Nov 7, 2016

Member

Ah and it may have been true... over a year ago. :)

This comment has been minimized.

@amueller

amueller Nov 9, 2016

Member

It was, I think ;)

@amueller

amueller Nov 9, 2016

Member

It was, I think ;)

+ X = pd.DataFrame([[1, 1], [1, 2], [1, 3], [2, 1], [2, 2], [2, 3]])
+ y = pd.Series([1, 1, 1, 2, 2, 2])
+ weights = pd.Series([1] * 6)
+ for estimator_name, Estimator in all_estimators():

This comment has been minimized.

@jnothman

jnothman Nov 7, 2016

Member

This should be implemented like the other checks in sklearn.utils.estimator_checks, not as a separate test.

@jnothman

jnothman Nov 7, 2016

Member

This should be implemented like the other checks in sklearn.utils.estimator_checks, not as a separate test.

@kathyxchen

This comment has been minimized.

Show comment
Hide comment
@kathyxchen

kathyxchen Nov 14, 2016

Contributor

@jnothman @amueller: I keep running into failed test cases that I don't encounter when I run nosetests utils/tests/test_estimator_checks.py on my local machine. Do you have an idea as to why this may be the case, or any tips for how I ought to set up my environment to avoid this issue?

Contributor

kathyxchen commented Nov 14, 2016

@jnothman @amueller: I keep running into failed test cases that I don't encounter when I run nosetests utils/tests/test_estimator_checks.py on my local machine. Do you have an idea as to why this may be the case, or any tips for how I ought to set up my environment to avoid this issue?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 14, 2016

Member

I generally find Travis logs much easier to check than Appveyor. At a glance, the problems on Travis may be related to Python 2, though I see the same failure in Python 3 AppVeyor.

Member

jnothman commented Nov 14, 2016

I generally find Travis logs much easier to check than Appveyor. At a glance, the problems on Travis may be related to Python 2, though I see the same failure in Python 3 AppVeyor.

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 16, 2016

Member

Please fix the PEP8 issues Travis is complaining about

Member

jnothman commented Nov 16, 2016

Please fix the PEP8 issues Travis is complaining about

@jnothman

Could you please also rename this to MRG if you think the work is complete enough for a full review.

sklearn/utils/estimator_checks.py
+ weights = pd.Series([1] * 6)
+ try:
+ estimator.fit(X, y, sample_weight=weights)
+ except:

This comment has been minimized.

@jnothman

jnothman Nov 16, 2016

Member

You should (almost) never use except:. Use except Exception:

@jnothman

jnothman Nov 16, 2016

Member

You should (almost) never use except:. Use except Exception:

sklearn/utils/estimator_checks.py
@@ -380,6 +381,27 @@ def check_estimator_sparse_data(name, Estimator):
+@ignore_warnings(category=(DeprecationWarning, UserWarning))
+def check_pandas_series(name, Estimator):

This comment has been minimized.

@jnothman

jnothman Nov 16, 2016

Member

please mention sample_weight in the name

@jnothman

jnothman Nov 16, 2016

Member

please mention sample_weight in the name

+ accept_sparse=("csr", "csc"),
+ multi_output=True,
+ y_numeric=True)
+ # Loosely based on _solve_cholesky_kernel (called in KernelRidge.fit)

This comment has been minimized.

@jnothman

jnothman Nov 16, 2016

Member

This is far too confusing. Just raise an exception directly if sample_weight is a pandas Series.

@jnothman

jnothman Nov 16, 2016

Member

This is far too confusing. Just raise an exception directly if sample_weight is a pandas Series.

@@ -72,6 +91,15 @@ def test_check_estimator():
# check that fit does input validation
msg = "TypeError not raised by fit"
assert_raises_regex(AssertionError, msg, check_estimator, BaseBadClassifier)
+ # check that sample_weights in fit accepts pandas.Series type
+ try:
+ from pandas import Series

This comment has been minimized.

@kathyxchen

kathyxchen Nov 16, 2016

Contributor

Here, I check if I am able to import pandas before the new estimator check (skip if not). Is there a better way to handle this?
pep8 complained about an unused import in my previous commit because I would just do a try-catch for an ImportError but would do nothing with the import itself. Now, I am using it in msg to avoid that lint error. Should I be handling this in a different way?

@kathyxchen

kathyxchen Nov 16, 2016

Contributor

Here, I check if I am able to import pandas before the new estimator check (skip if not). Is there a better way to handle this?
pep8 complained about an unused import in my previous commit because I would just do a try-catch for an ImportError but would do nothing with the import itself. Now, I am using it in msg to avoid that lint error. Should I be handling this in a different way?

This comment has been minimized.

@amueller

amueller Nov 16, 2016

Member

This is fine. Maybe again raise a skip if there is no pandas. I feel we should do that whenever there is a pandas import in the tests but that goes beyond this issue.

@amueller

amueller Nov 16, 2016

Member

This is fine. Maybe again raise a skip if there is no pandas. I feel we should do that whenever there is a pandas import in the tests but that goes beyond this issue.

This comment has been minimized.

@jnothman

jnothman Nov 16, 2016

Member

Raising a skip if there is no pandas requires splitting test_check_estimator into more functions. Which may be a good idea, but isn't really pertinent to this issue.

There should be a way to disable the linter catching a line... Look it up?

@jnothman

jnothman Nov 16, 2016

Member

Raising a skip if there is no pandas requires splitting test_check_estimator into more functions. Which may be a good idea, but isn't really pertinent to this issue.

There should be a way to disable the linter catching a line... Look it up?

This comment has been minimized.

@amueller

amueller Nov 16, 2016

Member

right...

@amueller

amueller Nov 16, 2016

Member

right...

@kathyxchen kathyxchen changed the title from [WIP] Fix estimators to work if sample_weight parameter is pandas Series type to [MRG] Fix estimators to work if sample_weight parameter is pandas Series type Nov 16, 2016

@amueller

Looks good apart from minor comments.

@@ -957,6 +957,8 @@ def fit(self, X, y, sample_weight=None):
"""
X, y = check_X_y(X, y, ['csr', 'csc', 'coo'], dtype=np.float64,
multi_output=True, y_numeric=True)
+ if sample_weight is not None and not isinstance(sample_weight, float):

This comment has been minimized.

@amueller

amueller Nov 16, 2016

Member

Why could sample_weight be a float?

@amueller

amueller Nov 16, 2016

Member

Why could sample_weight be a float?

This comment has been minimized.

@kathyxchen

kathyxchen Nov 22, 2016

Contributor

This was based on the docstring (line 951) which said that sample_weight could accept a float

@kathyxchen

kathyxchen Nov 22, 2016

Contributor

This was based on the docstring (line 951) which said that sample_weight could accept a float

This comment has been minimized.

@amueller

amueller Dec 1, 2016

Member

I'm not sure why we support this but fair enough.

@amueller

amueller Dec 1, 2016

Member

I'm not sure why we support this but fair enough.

sklearn/utils/estimator_checks.py
@@ -380,6 +381,27 @@ def check_estimator_sparse_data(name, Estimator):
+@ignore_warnings(category=(DeprecationWarning, UserWarning))
+def check_sample_weights_pandas_series(name, Estimator):

This comment has been minimized.

@amueller

amueller Nov 16, 2016

Member

Why UserWarning?

@amueller

amueller Nov 16, 2016

Member

Why UserWarning?

sklearn/utils/estimator_checks.py
+ "'sample_weight' parameter is type "
+ "{1}".format(name, pd.Series))
+ except ImportError:
+ pass

This comment has been minimized.

@amueller

amueller Nov 16, 2016

Member

Maybe we should raise a SkipTest("Pandas not installed, not testing pandas series as class weight")? That would be more explicit.

@amueller

amueller Nov 16, 2016

Member

Maybe we should raise a SkipTest("Pandas not installed, not testing pandas series as class weight")? That would be more explicit.

@@ -72,6 +91,15 @@ def test_check_estimator():
# check that fit does input validation
msg = "TypeError not raised by fit"
assert_raises_regex(AssertionError, msg, check_estimator, BaseBadClassifier)
+ # check that sample_weights in fit accepts pandas.Series type
+ try:
+ from pandas import Series

This comment has been minimized.

@amueller

amueller Nov 16, 2016

Member

This is fine. Maybe again raise a skip if there is no pandas. I feel we should do that whenever there is a pandas import in the tests but that goes beyond this issue.

@amueller

amueller Nov 16, 2016

Member

This is fine. Maybe again raise a skip if there is no pandas. I feel we should do that whenever there is a pandas import in the tests but that goes beyond this issue.

@kathyxchen

This comment has been minimized.

Show comment
Hide comment
@kathyxchen

kathyxchen Nov 22, 2016

Contributor

@jnothman @amueller: Adding the SkipTest causes a failure in the doctest run for contributing.rst (here). Should I remove the SkipTest again (in favor of a 'pass') or change the contributing.rst example?

Contributor

kathyxchen commented Nov 22, 2016

@jnothman @amueller: Adding the SkipTest causes a failure in the doctest run for contributing.rst (here). Should I remove the SkipTest again (in favor of a 'pass') or change the contributing.rst example?

@jnothman

This comment has been minimized.

Show comment
Hide comment
@jnothman

jnothman Nov 22, 2016

Member

Right, so this is due to a discrepancy between how we suggest others run our common tests (with check_estimator) and how we do internally. It would be good to modify check_estimator to catch and warn upon SkipTest, whether in this PR or separately.

Member

jnothman commented Nov 22, 2016

Right, so this is due to a discrepancy between how we suggest others run our common tests (with check_estimator) and how we do internally. It would be good to modify check_estimator to catch and warn upon SkipTest, whether in this PR or separately.

@jnothman

LGTM

sklearn/utils/estimator_checks.py
+ except SkipTest as message:
+ # the only SkipTest thrown currently results from not
+ # being able to import pandas.
+ warnings.warn(message, ImportWarning)

This comment has been minimized.

@jnothman

jnothman Nov 23, 2016

Member

I'm not certain whether ImportWarning is correct. Should we have a SkipTestWarning or some such?

@jnothman

jnothman Nov 23, 2016

Member

I'm not certain whether ImportWarning is correct. Should we have a SkipTestWarning or some such?

This comment has been minimized.

@kathyxchen

kathyxchen Nov 23, 2016

Contributor

Should I add a SkipTestWarning to the exceptions.py file that inherits from the base Warning class? (Would it be in the scope of this PR?)

@kathyxchen

kathyxchen Nov 23, 2016

Contributor

Should I add a SkipTestWarning to the exceptions.py file that inherits from the base Warning class? (Would it be in the scope of this PR?)

This comment has been minimized.

@amueller

amueller Dec 1, 2016

Member

ImportWarning is ignored by default. I'd either use UserWarning or add a SkipTestWarning. Feel free to add this to the PR, or change to UserWarning. otherwise LGTM!

@amueller

amueller Dec 1, 2016

Member

ImportWarning is ignored by default. I'd either use UserWarning or add a SkipTestWarning. Feel free to add this to the PR, or change to UserWarning. otherwise LGTM!

@jnothman jnothman changed the title from [MRG] Fix estimators to work if sample_weight parameter is pandas Series type to [MRG+1] Fix estimators to work if sample_weight parameter is pandas Series type Nov 23, 2016

@amueller

I don't think ImportWarning is good as it's ignored by default.

@@ -957,6 +957,8 @@ def fit(self, X, y, sample_weight=None):
"""
X, y = check_X_y(X, y, ['csr', 'csc', 'coo'], dtype=np.float64,
multi_output=True, y_numeric=True)
+ if sample_weight is not None and not isinstance(sample_weight, float):

This comment has been minimized.

@amueller

amueller Dec 1, 2016

Member

I'm not sure why we support this but fair enough.

@amueller

amueller Dec 1, 2016

Member

I'm not sure why we support this but fair enough.

sklearn/utils/estimator_checks.py
@@ -379,6 +385,28 @@ def check_estimator_sparse_data(name, Estimator):
raise
+@ignore_warnings(category=(DeprecationWarning))

This comment has been minimized.

@amueller

amueller Dec 1, 2016

Member

You don't need the parentheses around DeprecationWarning, they don't do anything.

@amueller

amueller Dec 1, 2016

Member

You don't need the parentheses around DeprecationWarning, they don't do anything.

sklearn/utils/estimator_checks.py
+ except SkipTest as message:
+ # the only SkipTest thrown currently results from not
+ # being able to import pandas.
+ warnings.warn(message, ImportWarning)

This comment has been minimized.

@amueller

amueller Dec 1, 2016

Member

ImportWarning is ignored by default. I'd either use UserWarning or add a SkipTestWarning. Feel free to add this to the PR, or change to UserWarning. otherwise LGTM!

@amueller

amueller Dec 1, 2016

Member

ImportWarning is ignored by default. I'd either use UserWarning or add a SkipTestWarning. Feel free to add this to the PR, or change to UserWarning. otherwise LGTM!

@jnothman

LGTM. Please add a changelog entry to doc/whats_new.rst, under your choice of bug fix or enhancement (I can't decide!).

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Dec 3, 2016

Member

thanks :)

Member

amueller commented Dec 3, 2016

thanks :)

@amueller amueller merged commit 04b67e2 into scikit-learn:master Dec 3, 2016

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

adis300 added a commit to adis300/scikit-learn that referenced this pull request Dec 13, 2016

[MRG+1] Fix estimators to work if sample_weight parameter is pandas S…
…eries type (#7825)

* addressed comments in the PR about parameters in check_array

* update the test case for the evaluation of estimators with pandas series

* bug fix, need to check for *not* None explicitly

* updated with isinstance check if the documentation says there is acceptance of floats

* ran pep8 linter on modified files

* moving the test case to estimators_check

* add a predict function into the testing pandas.Series class

* avoid running anything beyond the newly added meta checks

* check if pandas is installed before running the specific test

* changed the order of the try-catch to check for sample_weight param beforehand

* pass on import error rather than printing something to std out

* improve test case naming and pd.Series check in the bad estimator class

* address a pep8 linter error with unused import

* pep8 warning disabled for potential unused import

* throw a warning when SkipTest is raised

* add a SkipTestWarning

* updated the whats_new.rst with this issue

* rebase and fix a spacing issue

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG+1] Fix estimators to work if sample_weight parameter is pandas S…
…eries type (#7825)

* addressed comments in the PR about parameters in check_array

* update the test case for the evaluation of estimators with pandas series

* bug fix, need to check for *not* None explicitly

* updated with isinstance check if the documentation says there is acceptance of floats

* ran pep8 linter on modified files

* moving the test case to estimators_check

* add a predict function into the testing pandas.Series class

* avoid running anything beyond the newly added meta checks

* check if pandas is installed before running the specific test

* changed the order of the try-catch to check for sample_weight param beforehand

* pass on import error rather than printing something to std out

* improve test case naming and pd.Series check in the bad estimator class

* address a pep8 linter error with unused import

* pep8 warning disabled for potential unused import

* throw a warning when SkipTest is raised

* add a SkipTestWarning

* updated the whats_new.rst with this issue

* rebase and fix a spacing issue

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG+1] Fix estimators to work if sample_weight parameter is pandas S…
…eries type (#7825)

* addressed comments in the PR about parameters in check_array

* update the test case for the evaluation of estimators with pandas series

* bug fix, need to check for *not* None explicitly

* updated with isinstance check if the documentation says there is acceptance of floats

* ran pep8 linter on modified files

* moving the test case to estimators_check

* add a predict function into the testing pandas.Series class

* avoid running anything beyond the newly added meta checks

* check if pandas is installed before running the specific test

* changed the order of the try-catch to check for sample_weight param beforehand

* pass on import error rather than printing something to std out

* improve test case naming and pd.Series check in the bad estimator class

* address a pep8 linter error with unused import

* pep8 warning disabled for potential unused import

* throw a warning when SkipTest is raised

* add a SkipTestWarning

* updated the whats_new.rst with this issue

* rebase and fix a spacing issue

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG+1] Fix estimators to work if sample_weight parameter is pandas S…
…eries type (#7825)

* addressed comments in the PR about parameters in check_array

* update the test case for the evaluation of estimators with pandas series

* bug fix, need to check for *not* None explicitly

* updated with isinstance check if the documentation says there is acceptance of floats

* ran pep8 linter on modified files

* moving the test case to estimators_check

* add a predict function into the testing pandas.Series class

* avoid running anything beyond the newly added meta checks

* check if pandas is installed before running the specific test

* changed the order of the try-catch to check for sample_weight param beforehand

* pass on import error rather than printing something to std out

* improve test case naming and pd.Series check in the bad estimator class

* address a pep8 linter error with unused import

* pep8 warning disabled for potential unused import

* throw a warning when SkipTest is raised

* add a SkipTestWarning

* updated the whats_new.rst with this issue

* rebase and fix a spacing issue

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG+1] Fix estimators to work if sample_weight parameter is pandas S…
…eries type (#7825)

* addressed comments in the PR about parameters in check_array

* update the test case for the evaluation of estimators with pandas series

* bug fix, need to check for *not* None explicitly

* updated with isinstance check if the documentation says there is acceptance of floats

* ran pep8 linter on modified files

* moving the test case to estimators_check

* add a predict function into the testing pandas.Series class

* avoid running anything beyond the newly added meta checks

* check if pandas is installed before running the specific test

* changed the order of the try-catch to check for sample_weight param beforehand

* pass on import error rather than printing something to std out

* improve test case naming and pd.Series check in the bad estimator class

* address a pep8 linter error with unused import

* pep8 warning disabled for potential unused import

* throw a warning when SkipTest is raised

* add a SkipTestWarning

* updated the whats_new.rst with this issue

* rebase and fix a spacing issue

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] Fix estimators to work if sample_weight parameter is pandas S…
…eries type (#7825)

* addressed comments in the PR about parameters in check_array

* update the test case for the evaluation of estimators with pandas series

* bug fix, need to check for *not* None explicitly

* updated with isinstance check if the documentation says there is acceptance of floats

* ran pep8 linter on modified files

* moving the test case to estimators_check

* add a predict function into the testing pandas.Series class

* avoid running anything beyond the newly added meta checks

* check if pandas is installed before running the specific test

* changed the order of the try-catch to check for sample_weight param beforehand

* pass on import error rather than printing something to std out

* improve test case naming and pd.Series check in the bad estimator class

* address a pep8 linter error with unused import

* pep8 warning disabled for potential unused import

* throw a warning when SkipTest is raised

* add a SkipTestWarning

* updated the whats_new.rst with this issue

* rebase and fix a spacing issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment