New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params={'sample_… #8068

Merged
merged 4 commits into from Dec 19, 2016

Conversation

Projects
None yet
5 participants
@ghost

ghost commented Dec 17, 2016

…weights': weights}" to RandomizedSearchCV with RandomForestClassifier

Reference Issue

Fixes #8062

What does this implement/fix? Explain your changes.

When the sample_weight parameter of the fit() method of a RandomForestClassifier is a python 2.x list, it raises an exception, because on a later call to _parallel_build_trees(), the copy() method is used, which it exists in python 3.x lists, panda series and nparrays, it doesn't exist however on python 2.x lists. To be clear, a python 3.x list here would also be inappropriate because even though operations like multiplication on it would succeed, the results would be different from the ones obtained from an nparray. It does explain however how the exception only happened with python 2.x lists. Another detail from the bug report was that the exception didn't happen when using a ExtraTreesClassifier. This was so because the bootstrap parameter on the ExtraTreeClassifier is False by default, which later in the _parallel_build_trees() call will divert the logic flow away from the copy() method call. If we set the bootstrap parameter on the ExtraTreesClassigier constructor, the behavior becomes the same, and the same exception is raised.

As suggested by @jnothman, I used the check_array utility to transform the original sample_weight array into a nparray, on the BaseForest class. With a little pre-check to see if the sample_weight parameter is None, since check_array raises an exception otherwise. And an additional parameter to make sure the array created isn't 2d.

I added no tests to check for this issue.
A test was added.

I tried the full make rebuild with no changes to the source, and it failed several tests. So I ran the full make again with my changes and just corrected what was different. So I'm not sure if all relevant tests were done.

pyflakes and pep8 showed no irregularities.
coverage for the module changed was at 65%, I'm not sure how relevant this is for such a small change, but there you have it.

Issue#8062: JoblibException thrown when passing "fit_params={'sample_…
…weights': weights}" to RandomizedSearchCV with RandomForestClassifier
@agramfort

This comment has been minimized.

Show comment
Hide comment
@agramfort

agramfort Dec 17, 2016

Member

please add a test

Member

agramfort commented Dec 17, 2016

please add a test

@ghost

This comment has been minimized.

Show comment
Hide comment
@ghost

ghost Dec 18, 2016

I added a test to test_forest.py:check_class_weights(), I considered creating a separate function for it, but seemed excessive, at the same time that small test seemed out of place in there, so I placed it at the end, with a longer comment.

The test itself just recreates the conditions mentioned in the bug report, and tries to trigger the exception raised when Python 2.x lists are used that have no copy() method. Doing a type check on sample_weights seemed unidiomatic. And doing deeper testing to make sure even Python 3.x lists weren't used seemed pointless, since when those are used the feature_importances_ satisfied the almost_equal assertion.

The issue I mentioned earlier with the full make rebuild just happens on Python 2.x with no code changes, the rebuild on Python 3.x works fine.

ghost commented Dec 18, 2016

I added a test to test_forest.py:check_class_weights(), I considered creating a separate function for it, but seemed excessive, at the same time that small test seemed out of place in there, so I placed it at the end, with a longer comment.

The test itself just recreates the conditions mentioned in the bug report, and tries to trigger the exception raised when Python 2.x lists are used that have no copy() method. Doing a type check on sample_weights seemed unidiomatic. And doing deeper testing to make sure even Python 3.x lists weren't used seemed pointless, since when those are used the feature_importances_ satisfied the almost_equal assertion.

The issue I mentioned earlier with the full make rebuild just happens on Python 2.x with no code changes, the rebuild on Python 3.x works fine.

@ghost ghost changed the title from [WIP] Issue#8062: JoblibException thrown when passing "fit_params={'sample_… to [MRG] Issue#8062: JoblibException thrown when passing "fit_params={'sample_… Dec 18, 2016

@lesteve

This comment has been minimized.

Show comment
Hide comment
@lesteve

lesteve Dec 19, 2016

Member

The AppVeyor failure is unrelated (we have an issue for it already IIRC), so LGTM.

Member

lesteve commented Dec 19, 2016

The AppVeyor failure is unrelated (we have an issue for it already IIRC), so LGTM.

@lesteve lesteve changed the title from [MRG] Issue#8062: JoblibException thrown when passing "fit_params={'sample_… to [MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params={'sample_… Dec 19, 2016

@amueller

This comment has been minimized.

Show comment
Hide comment
@amueller

amueller Dec 19, 2016

Member

apart from somewhat cryptic comment LGTM

Member

amueller commented Dec 19, 2016

apart from somewhat cryptic comment LGTM

@GaelVaroquaux

This comment has been minimized.

Show comment
Hide comment
@GaelVaroquaux

GaelVaroquaux Dec 19, 2016

Member

LTGM. Thx. Merging.

Member

GaelVaroquaux commented Dec 19, 2016

LTGM. Thx. Merging.

@GaelVaroquaux GaelVaroquaux merged commit dbf64e9 into scikit-learn:master Dec 19, 2016

3 checks passed

ci/circleci Your tests passed on CircleCI!
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@ghost ghost deleted the issue#8062 branch Dec 20, 2016

sergeyf added a commit to sergeyf/scikit-learn that referenced this pull request Feb 28, 2017

[MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params…
…={'sample_… (#8068)

* Issue#8062: JoblibException thrown when passing "fit_params={'sample_weights': weights}" to RandomizedSearchCV with RandomForestClassifier

* Added test for issues #8068 and #8064.

* Clean up with pyflakes.

* Changed cryptic comment.

@Przemo10 Przemo10 referenced this pull request Mar 17, 2017

Closed

update fork (#1) #8606

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

[MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params…
…={'sample_… (#8068)

* Issue#8062: JoblibException thrown when passing "fit_params={'sample_weights': weights}" to RandomizedSearchCV with RandomForestClassifier

* Added test for issues #8068 and #8064.

* Clean up with pyflakes.

* Changed cryptic comment.

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

[MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params…
…={'sample_… (#8068)

* Issue#8062: JoblibException thrown when passing "fit_params={'sample_weights': weights}" to RandomizedSearchCV with RandomForestClassifier

* Added test for issues #8068 and #8064.

* Clean up with pyflakes.

* Changed cryptic comment.

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

[MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params…
…={'sample_… (#8068)

* Issue#8062: JoblibException thrown when passing "fit_params={'sample_weights': weights}" to RandomizedSearchCV with RandomForestClassifier

* Added test for issues #8068 and #8064.

* Clean up with pyflakes.

* Changed cryptic comment.

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG + 1] Issue#8062: JoblibException thrown when passing "fit_params…
…={'sample_… (#8068)

* Issue#8062: JoblibException thrown when passing "fit_params={'sample_weights': weights}" to RandomizedSearchCV with RandomForestClassifier

* Added test for issues #8068 and #8064.

* Clean up with pyflakes.

* Changed cryptic comment.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment