
mjbommar
Contributor

Sorry to mix a bit of docstring fix with a real PR, but:

  1. Adding the missing increasing docstring to IsotonicRegression. Missing here: http://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html#sklearn.isotonic.IsotonicRegression
  2. Adding the option to use either scipy.stats.pearsonr or scipy.stats.spearmanr to estimate whether increasing should be True or False. Tests are included and pass for isotonic, though sklearn.utils.tests.test_sparsefuncs.test_mean_variance_axis0 appears to fail for me after I merged upstream earlier this morning.

@mjbommar changed the title from "Isotonic increasing auto" to "Determine IsotonicRegression increasing by Pearson or Spearman corr rho" on May 18, 2014
@coveralls

Coverage Status

Coverage decreased (-0.01%) when pulling 031e5e7 on mjbommar:isotonic-increasing-auto into 974fb95 on scikit-learn:master.


if self.increasing == 'pearson':
    # Calculate Pearson rho estimate and set accordingly
    rho, p_val = pearsonr(X, y)
Member

replace p_val by _ as you don't need it

@mjbommar
Contributor Author

@agramfort, refactored as requested and PEP8ed.

@ogrisel
Member

ogrisel commented May 20, 2014

I relaunched the travis test to check that the failure is unrelated.

@agramfort
Member

+1 for merge if tests pass

@@ -113,6 +114,15 @@ class IsotonicRegression(BaseEstimator, TransformerMixin, RegressorMixin):
y_max : optional, default: None
If not None, set the highest value of the fit to y_max.

increasing : optional, boolean or string, default : True
Member

Nitpick, but the numpydoc convention places the type of the parameter before the mention that it is optional, not after.
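
For illustration, a parameter entry following that ordering might look like the sketch below. The function `fit_direction` is hypothetical and exists only to carry the docstring; it is not part of scikit-learn.

```python
def fit_direction(increasing=True):
    """Return the requested fit direction (hypothetical helper).

    Parameters
    ----------
    increasing : boolean or string, optional, default: True
        If boolean, whether to fit with y increasing or decreasing.
        The string "auto" asks the caller to infer the direction.
    """
    # The body is a placeholder; only the docstring layout matters here.
    return increasing
```

The type ("boolean or string") comes first, then "optional", then the default.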

@ogrisel
Member

ogrisel commented May 20, 2014

What about setting increasing='pearson' by default to be more user-friendly?

@NelleV
Member

NelleV commented May 20, 2014

This patch looks good to me, apart from my two remarks.

@ogrisel it would make more sense to use Spearman correlation here. In fact, I don't think there are any cases where you'd want to use Pearson correlation instead of Spearman for such a task.

@mjbommar
Contributor Author

@NelleV, fixed both docstring issues.

@ogrisel, what if I suggested as next steps that we implement a Fisher transformation to determine confidence intervals and raise an exception if the confidence interval spans zero? Would you be OK leaving the _check_increasing method factored this way so as to make this an easy next step? Fisher transformation CI is valid for both Pearson and Spearman.

If we do change the default behavior to one of these approaches, I agree with @NelleV that Spearman should be the default choice.

@ogrisel
Member

ogrisel commented May 20, 2014

If Pearson does not make sense here I would vote for:

  • increasing in {True, False, "auto" (default)} where "auto" means using Spearman. If 0 is in the 99% CI I would just issue a warning rather than raise an exception.

@GaelVaroquaux
Member

  • increasing in {True, False, "auto" (default)} where "auto" means using Spearman. If 0 is in the 99% CI I would just issue a warning rather than raise an exception.

+1

@mjbommar
Contributor Author

OK, @ogrisel and @GaelVaroquaux, done as requested. Also added a check on the CI warning getting raised.
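
For readers following along, the idea discussed above (Fisher-transform the correlation, build a confidence interval, warn if it spans zero) can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the names `fisher_interval` and `direction_from_rho` are hypothetical, the critical value `z` is left as a parameter, and `rho` is assumed to come from something like `scipy.stats.spearmanr`.

```python
import math
import warnings


def fisher_interval(rho, n, z=1.96):
    """Approximate confidence interval for a correlation coefficient.

    z=1.96 gives roughly a 95% interval; 2.576 gives roughly 99%.
    """
    if n <= 3:
        raise ValueError("need more than 3 samples")
    if abs(rho) >= 1.0:
        # The transform diverges at +/-1; the direction is unambiguous there.
        return rho, rho
    f = math.atanh(rho)          # Fisher transform: 0.5 * log((1 + rho) / (1 - rho))
    se = 1.0 / math.sqrt(n - 3)  # standard error on the transformed scale
    return math.tanh(f - z * se), math.tanh(f + z * se)


def direction_from_rho(rho, n, z=1.96):
    """Return True for increasing; warn when the interval spans zero."""
    low, high = fisher_interval(rho, n, z)
    if low < 0.0 < high:
        warnings.warn("Confidence interval of the correlation spans zero; "
                      "the inferred direction may be unreliable.")
    return rho >= 0
```

A call such as `direction_from_rho(rho, len(x))` would then return the direction and emit a warning, rather than raise, when the sign of `rho` is uncertain.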

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling e730a6d on mjbommar:isotonic-increasing-auto into 974fb95 on scikit-learn:master.

If boolean, whether or not to fit the isotonic regression with y
increasing or decreasing.

If string and set to "auto," determine whether y should
Member

the comma shouldn't be inside the quotes. You could just say: "auto" determines whether ...

Contributor Author

Done

x = np.arange(len(y))

y_ = IsotonicRegression(increasing='auto').fit_transform(
    x, y)
Member

Could you please add a check that no warning is raised in that case?

Contributor Author

OK, added the context handler
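
One way to write such a check with only the standard library is sketched below. The function `noisy_if_negative` is a hypothetical stand-in for the call under test, not code from this PR.

```python
import warnings


def noisy_if_negative(value):
    """Hypothetical stand-in for a call that may emit a warning."""
    if value < 0:
        warnings.warn("negative input", UserWarning)
    return abs(value)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is filtered out
    result = noisy_if_negative(3)

assert result == 3
assert len(caught) == 0  # no warning was recorded for a non-negative input
```

Setting the filter to "always" inside the context matters: without it, a warning already shown once could be silently suppressed and the check would pass for the wrong reason.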

Member

I am somewhat confused: I don't see it on the diff on github.

is_increasing = check_increasing(X, y)
assert_equal(is_increasing, False)
assert_equal(len(w), 1)
assert_equal(True, "interval" in str(w[-1].message))
Member

You should be using assert_in (nose.tools.assert_in) here.

Contributor Author

Actually, assert_warns_message supports checking substrings in the warning message, e.g.:

# Check that we got increasing=False and CI warning
is_increasing = assert_warns_message(UserWarning, "interval",
                                     check_increasing,
                                     x, y)

Member

Indeed.

@GaelVaroquaux
Member

Hey!

Thanks for all your efforts. We are almost there.

All these little details make the code of scikit-learn better, and that's why we all love it!

@mjbommar
Contributor Author

No worries. Just trying to scratch a personal itch as quickly as possible, so appreciate the counter force for quality.

Mind chatting on #scikit-learn/DM sometime to iterate a bit more quickly? Handle is mjbommar

@GaelVaroquaux
Member

I'd rather not use IM. I do a lot of things in parallel and it is very hard for me to keep track of everything. Github's interface is great for that. 

-------- Original message --------
From: Michael Bommarito notifications@github.com
Date:22/05/2014 19:31 (GMT+01:00)
To: scikit-learn/scikit-learn scikit-learn@noreply.github.com
Cc: Gael Varoquaux gael.varoquaux@normalesup.org
Subject: Re: [scikit-learn] [MRG] Determine IsotonicRegression `increasing` by Pearson or Spearman corr rho (#3157)

@mjbommar
Contributor Author

OK, @GaelVaroquaux, I think I've addressed the outstanding issues.

Would you like to include a rework to the isotonic regression example in this PR or consider it separately?
http://scikit-learn.org/dev/auto_examples/plot_isotonic_regression.html

:toctree: generated
:template: function.rst

isotonic.check_increasing
Member

don't create a new section just add isotonic.check_increasing below isotonic.isotonic_regression

Contributor Author

OK. Are there notes on doc practices or is it mostly manual + make doc to test?

Member

> OK. Are there notes on doc practices or is it mostly manual + make doc to test?

Manual, i.e. copying the way it's done for similar code, AFAIK.

cd sklearn
make test-doc
cd doc
make html

@coveralls

Coverage Status

Coverage increased (+0.01%) when pulling 3fe9641 on mjbommar:isotonic-increasing-auto into 974fb95 on scikit-learn:master.

@mjbommar
Contributor Author

Looks like a spurious failure in sklearn.tests.test_common.test_regressor_pickle.

======================================================================
ERROR: sklearn.tests.test_common.test_regressor_pickle('OrthogonalMatchingPursuitCV', <class 'sklearn.linear_model.omp.OrthogonalMatchingPursuitCV'>, array([[-0.44836249, -0.47282444, -1.20608008, ..., -0.75500806,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/tests/test_common.py", line 890, in check_regressors_pickle
    regressor.fit(X, y_)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 878, in fit
    omp.fit(X, y)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 695, in fit
    copy_Gram, True).T
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 483, in orthogonal_mp_gram
    return_path=return_path)
  File "/home/travis/build/scikit-learn/scikit-learn/sklearn/linear_model/omp.py", line 221, in _gram_omp
    **solve_triangular_args)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/basic.py", line 115, in solve_triangular
    a1, b1 = map(asarray_chkfinite,(a,b))
  File "/home/travis/virtualenv/python2.7_with_system_site_packages/local/lib/python2.7/site-packages/numpy/lib/function_base.py", line 595, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

@agramfort
Member

yes it's unrelated

# Run Fisher transform to get the rho CI, but handle rho=+/-1
if rho not in [-1.0, 1.0]:
    F = 0.5 * np.log((1 + rho) / (1 - rho))
    F_se = 1 / np.sqrt(len(x) - 3)
Member

Nitpick: for floats, use math.sqrt and math.log, not numpy.

@agramfort
Member

besides +1 for merge

@GaelVaroquaux
Member

@agramfort gave his 👍, so I am merging. Thanks a lot @mjbommar. Excellent work.

GaelVaroquaux added a commit that referenced this pull request May 25, 2014
[MRG] Determine IsotonicRegression ``increasing`` by Pearson or Spearman corr rho
@GaelVaroquaux merged commit 5cef947 into scikit-learn:master on May 25, 2014
@amueller
Member

hm what was the reason to catch the warnings? #6332 errors because numpy became more strict, and now spearmanr errors. Scipy master has a fix but we need to work around that probably?

if rho >= 0:
    increasing_bool = True
else:
    increasing_bool = False
Member

This should have been written:

    increasing_bool = rho >= 0

Also the variable should have been named just increasing. There is no need to put the expected type of the variable in the variable name.

Contributor Author

@ogrisel, if you check below, you can see that the user provides an input increasing which may be either string or True/False. If a string is provided, the proper method is applied to determine the direction, thereby setting increasing_bool. Agreed on other comments but just want to make sure we see the reason re: increasing vs. increasing_bool

Member

Alright, I missed that.
