-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
[MRG] Determine IsotonicRegression increasing
by Pearson or Spearman corr rho
#3157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Determine IsotonicRegression increasing
by Pearson or Spearman corr rho
#3157
Conversation
…otonicRegression.
…increasing argument options
increasing
by Pearson or Spearman corr rho
|
||
if self.increasing == 'pearson': | ||
# Calculate Pearson rho estimate and set accordingly | ||
rho, p_val = pearsonr(X, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace p_val by _ as you don't need it
@agramfort, refactored as requested and PEP8ed. |
I relaunched the travis test to check that the failure is unrelated. |
+1 for merge if tests pass |
@@ -113,6 +114,15 @@ class IsotonicRegression(BaseEstimator, TransformerMixin, RegressorMixin): | |||
y_max : optional, default: None | |||
If not None, set the highest value of the fit to y_max. | |||
|
|||
increasing : optional, boolean or string, default : True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick, but the numpydoc convention places the type of the parameter, before the fact it is optional, not after.
What about setting |
This patch looks good to me, apart from my two remarks. @ogrisel it would make more sense to use spearman correlation here. In fact, I don't there is any cases where you'd want to use pearson correlation instead of spearman for such a task. |
@NelleV, fixed both docstring issues. @ogrisel, what if I suggested as next steps that we implement a Fisher transformation to determine confidence intervals and raise an exception if the confidence interval spans zero? Would you be OK leaving the _check_increasing method factored this way so as to make this an easy next step? Fisher transformation CI is valid for both Pearson and Spearman. If we do change the default behavior to one of these approaches, I agree with @NelleV that Spearman should be the default choice. |
If Pearson does not make sense here I would vote for:
|
+1 |
…auto'; implement Fisher transform and warning on 0 \in CI
OK, @ogrisel and @GaelVaroquaux, done as requested. Also added a check on the CI warning getting raised. |
If boolean, whether or not to fit the isotonic regression with y | ||
increasing or decreasing. | ||
|
||
If string and set to "auto," determine whether y should |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the comma shouldn't be inside the quotes. You could just say: "auto" determines whether ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
x = np.arange(len(y)) | ||
|
||
y_ = IsotonicRegression(increasing='auto').fit_transform( | ||
x, y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please add a check that no warning is raised in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, added the context handler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am somewhat confused: I don't see it on the diff on github.
is_increasing = check_increasing(X, y) | ||
assert_equal(is_increasing, False) | ||
assert_equal(len(w), 1) | ||
assert_equal(True, "interval" in str(w[-1].message)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should be using assert_in (nose.tools.assert_in) here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, assert_warns_message supports checking substrings in the warning message, e.g.,:
# Check that we got increasing=False and CI warning
is_increasing = assert_warns_message(UserWarning, "interval",
check_increasing,
x, y)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed.
Hey! Thanks for all your efforts. We are almost there. All these little details make the code of scikit-learn better, and that why we all love it! |
No worries. Just trying to scratch a personal itch as quickly as possible, so appreciate the counter force for quality. Mind chatting on #scikit-learn/DM sometime to iterate a bit more quickly? Handle is mjbommar |
I'd rather not use IM. I do a lot of things in parallel and it is very hard for me to keep track of everything. Github's interface is great for that. -------- Original message -------- From: Michael Bommarito notifications@github.com Date:22/05/2014 19:31 (GMT+01:00) To: scikit-learn/scikit-learn scikit-learn@noreply.github.com Cc: Gael Varoquaux gael.varoquaux@normalesup.org Subject: Re: [scikit-learn] [MRG] Determine IsotonicRegression `increasing`
by Pearson or Spearman corr rho (#3157) Mind chatting on #scikit-learn/DM sometime to iterate a bit more quickly? Handle is mjbommar — |
OK, @GaelVaroquaux, I think I've addressed the outstanding issues. Would you like to include a rework to the isotonic regression example in this PR or consider it separately? |
:toctree: generated | ||
:template: function.rst | ||
|
||
isotonic.check_increasing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't create a new section just add isotonic.check_increasing below isotonic.isotonic_regression
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Are there notes on doc practices or is it mostly manual + make doc
to test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Are there notes on doc practices or is it mostly manual + make doc to
test?
manual ie copy is the way it's done with similar code AFAIK
cd sklearn
make test-doc
cd doc
make html
Looks like a spurious failure in sklearn.tests.test_common.test_regressor_pickle.
|
yes it's unrelated |
# Run Fisher transform to get the rho CI, but handle rho=+/-1 | ||
if rho not in [-1.0, 1.0]: | ||
F = 0.5 * np.log((1 + rho) / (1 - rho)) | ||
F_se = 1 / np.sqrt(len(x) - 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick for floats use math.sqrt and math.log not numpy.
besides +1 for merge |
@agramfort gave his 👍 I am merging. Thanks a lot @mjbommar . Excellent work. |
[MRG] Determine IsotonicRegression ``increasing`` by Pearson or Spearman corr rho
hm what was the reason to catch the warnings? #6332 errors because numpy became more strict, and now |
if rho >= 0: | ||
increasing_bool = True | ||
else: | ||
increasing_bool = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should have been written:
increasing_bool = rho >= 0
Also the variable should have been named just increasing
. There is no need to put the expected type of the variable in the variable name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ogrisel, if you check below, you can see that the user provides an input increasing
which may be either string or True/False. If a string is provided, the proper method is applied to determine the direction, thereby setting increasing_bool
. Agreed on other comments but just want to make sure we see the reason re: increasing
vs. increasing_bool
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, I missed that.
Sorry to mix a bit of docstring fix with a real PR, but:
increasing
docstring to IsotonicRegression. Missing here: http://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html#sklearn.isotonic.IsotonicRegressionscipy.stats.pearsonr
orscipy.stats.spearmanr
to estimate whether increasing should be True or False. Tests included and passing for isotonic, though there appears to be an issue withsklearn.utils.tests.test_sparsefuncs.test_mean_variance_axis0
for me at the moment when I merged to upstream earlier this morning.