Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

[WIP] Quantile dummy #3426

Closed
wants to merge 1 commit into from
+17,030 −17,424

6 participants

@RolT

Added quantile strategy in dummy classifier.
Added quantile strategy tests in test_dummy.

@arjoly
Owner

Thanks for the pr :-)

sklearn/dummy.py
@@ -317,9 +320,10 @@ class DummyRegressor(BaseEstimator, RegressorMixin):
True if the output at fit is 2d, else false.
"""
- def __init__(self, strategy="mean", constant=None):
+ def __init__(self, strategy="mean", constant=None, alpha=None):
@arjoly Owner
arjoly added a note

Could you document the new parameter and say whenever it's used?

@arjoly Owner
arjoly added a note

I would also highlight the similarity with the median strategy.

@RolT
RolT added a note

ok I'll do that

@arjoly Owner
arjoly added a note

What should be the default? alpha=0.5?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arjoly
Owner

for reference #3421

sklearn/dummy.py
@@ -374,6 +378,18 @@ def fit(self, X, y):
self.constant_ = np.reshape(self.constant, (1, -1))
+ elif self.strategy == "quantile":
+ if not self.alpha:
+ raise TypeError("`alpha` value has to be specified "
+ "when the quantile strategy is used.")
@arjoly Owner
arjoly added a note

It would be nice to have default that doesn't raise error by default. Why do you think of 0.5 or None would be equivalent to 0.5.

Maybe the path for median quantile could be merged. What do you think?

@RolT
RolT added a note

One reason not to do it is that median uses numpy while percentile uses scipy. The numpy median is probably faster.
The other reason would be that it's easier to do the weighted median in numpy than to implement a weighted percentile.

@arjoly Owner
arjoly added a note

For the weighted percentile, that's not that hard. See #3420.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arjoly
Owner

Tests looks good :-)

sklearn/dummy.py
@@ -283,6 +285,7 @@ def predict_log_proba(self, X):
class DummyRegressor(BaseEstimator, RegressorMixin):
+
@arjoly Owner
arjoly added a note

Can you remove this lines?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
sklearn/dummy.py
((5 lines not shown))
class DummyClassifier(BaseEstimator, ClassifierMixin):
+
@arjoly Owner
arjoly added a note

Can you remove this lines?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arjoly
Owner

Given Travis, the axis keyword is not present in scipy==0.9. Could you backport it? There are examples of backports in sklearn.utils.fixes?

@GaelVaroquaux GaelVaroquaux changed the title from [MRG] Quantile dummy to [WIP] Quantile dummy
@GaelVaroquaux

I switched the title back to WIP until the travis failure is not fixed.

@coveralls

Coverage Status

Changes Unknown when pulling 8aff3fa on RolT:quantile_dummy into ** on scikit-learn:master**.

@coveralls

Coverage Status

Changes Unknown when pulling acf4495 on RolT:quantile_dummy into ** on scikit-learn:master**.

@coveralls

Coverage Status

Changes Unknown when pulling bd92588 on RolT:quantile_dummy into ** on scikit-learn:master**.

sklearn/utils/fixes.py
((63 lines not shown))
+ weights = np.array(1)
+ sumval = 1.0
+ else:
+ indexer[axis] = slice(i, i + 2)
+ j = i + 1
+ weights = np.array([(j - idx), (idx - i)], float)
+ wshape = [1] * sorted.ndim
+ wshape[axis] = 2
+ weights.shape = wshape
+ sumval = weights.sum()
+
+ # Use np.add.reduce (== np.sum but a little faster) to coerce data type
+ return np.add.reduce(sorted[indexer] * weights, axis=axis) / sumval
+
+ def scoreatpercentile_axis(a, per, axis):
+ return _scoreatpercentile(a, per, axis=axis)
@arjoly Owner
arjoly added a note

Would it possible to have the same name as the scipy function?
This would help whenever we support the axis argument.

The whole backport could be made outside the try ... except ...

@arjoly Owner
arjoly added a note

This would help whenever we support the axis argument.

More precisely, this would help whenever we will want to remove the backport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arjoly arjoly commented on the diff
doc/sphinxext/gen_rst.py
@@ -601,12 +601,8 @@ def generate_dir_rst(dir, fhindex, example_dir, root_dir, plot_gallery, seen_bac
target_dir = root_dir
src_dir = example_dir
if not os.path.exists(os.path.join(src_dir, 'README.txt')):
- print(80 * '_')
- print('Example directory %s does not have a README.txt file' %
- src_dir)
- print('Skipping this directory')
- print(80 * '_')
- return
+ raise ValueError('Example directory %s does not have a README.txt' %
+ src_dir)
@arjoly Owner
arjoly added a note

Hm strange. I wouldn't expect this was modified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arjoly arjoly commented on the diff
examples/neural_networks/README.txt
@@ -0,0 +1,6 @@
+.. _neural_network_examples:
+
+Neural Networks
+-----------------------
+
+Examples concerning the :mod:`sklearn.neural_network` module.
@arjoly Owner
arjoly added a note

Hm strange. I wouldn't expect this was modified.

@amueller Owner

Rebase error with #3460. Can you try to rebase again @RolT ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
sklearn/dummy.py
((6 lines not shown))
constant : int or float or array of shape = [n_outputs]
The explicit constant as predicted by the "constant" strategy. This
parameter is useful only for the "constant" strategy.
+ alpha : float, optional.
@arjoly Owner
arjoly added a note

What do you think of naming this parameter quantile?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@arjoly
Owner

Looks good with an update in doc / modules / model_evaluation.rst about DummyRegressor.

@arjoly
Owner

@RolT Do you plan to finish this pr? I am checking that this pr hasn't stalled.

@RolT

I completely forgot about that PR, I'm a bit busy right now but I'll look into that soon. Sorry about the delay

@amueller amueller split test_common.py into checks and test file.
move dataset generation into estimator_checks

Closes #2360. Fix tiebreaking.

some cleanups in common_test, speedup.

Make fit_transform and fit().transform() equivalent in nmf

slight speedup
some cleanup

Make everything accept lists as input.

Remove 'DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future' warnings

Remove a couple more 'using a non-integer number instead of an integer DeprecationWarning'

Added a parameter check on n_estimators in BaseEnsemble to raise error if it is not strictly positive.

ENH Make all decision functions have the same shape. Fixes SVC and GradientBoosting. Closes #1490.

ENH better default for test for SelectKBest and random projection

MAINT tree compute feature_importance by default

pep8

ENH add label ranking average precision

DOC write narrative doc for label ranking average precision

DOC FIX error + wording

TST invariance testing + handle degenerate case

flake8

FIX use np.bincount

DOC friendlier narrative documentation

ENH simplify label ranking average precision (thanks @jnothman)

ENH be backward compatible for old version of scipy

Typo

pep8

DOC remove confusing mention to mean average precision

DOC improve documentatino thanks to @vene and remove mention of relevant labels

DOC more intuition about corner case

DOC add documentation to backported function

ENH less nested code

FIX encoding issue

DOC update what's new

MAINT deprecate fit_ovr, fit_ovo, fit_ecoc, predict_ovr, predict_ovo, predict_ecoc and predict_proba_ovr

ENH + DOC set a default scorer in the multiclass module

DOC typo + not forgetting single output case

scorer: add sample_weight support

COSMIT Use explicit if/else in scorer

TST default scorers with sample_weight

DOC Update What's New

MAINT split sklearn/metrics/metrics.py

DOC improve documentation and distinguish each module

ENH add a friendly warnings before deleting the file

TST: fix tests on numpy 1.9.b2

Refactor input validation.

remove check_arrays stuff and old input validation

ENH add allowed_sparse named argument for @ogrisel

FIX classes name in OvR

DOC mention doc-building dependency on Pillow

DOC link example gallery scripts rather than inline

COSMIT

DOC show referring examples on API reference pages

DOC ensure longer underline

DOC Fix example path

DOC fix 'Return' -> 'Returns'

FIX Py3k support for out-of-core example

DOC make neural networks example appear

Also make the absence of a README fail doc compilation

DOC add links to github sourcecode in API reference

DOC fix opaque background glitch when hovering example icons

FIX better RandomizedPCA sparse deprecation

added quantile strategy to dummy classifier

replaced quantile by percentile, added docstrings

switched back to quantile

final commit

forgot to save the file

removed blank lines, added default alpha param

added backport for axis keyword in scipy.scoreatpercentile

added quantile strategy to dummy classifier

replaced quantile by percentile, added docstrings

switched back to quantile

final commit

forgot to save the file

added backport for axis keyword in scipy.scoreatpercentile

removed scoreatpercentile axis dependency in the dummy tests

typo

fixed import

fixed axis keyword mistake

changed fix function name, fixed rebase bug

removed blank lines, added default alpha param

removed blank lines, added default alpha param

removed unnecessary import

docstring merge typo

updated doc (quantile dummy regressor)
1a376af
@RolT

This PR is a mess, I'll do a new clean one in 1 step when I get some time... I cant rebase anymore

@RolT RolT closed this
@RolT RolT deleted the RolT:quantile_dummy branch
@staple

@RoIT Just wanted to check in to see if you are still planning to work on this. In case not, I'd like to give it a try. But if you're still planning to work on it, then no worries.

@arjoly
Owner

@staple I think that you can give it a try.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Sep 7, 2014
  1. @amueller @RolT

    split test_common.py into checks and test file.

    amueller authored RolT committed
    move dataset generation into estimator_checks
    
    Closes #2360. Fix tiebreaking.
    
    some cleanups in common_test, speedup.
    
    Make fit_transform and fit().transform() equivalent in nmf
    
    slight speedup
    some cleanup
    
    Make everything accept lists as input.
    
    Remove 'DeprecationWarning: using a non-integer number instead of an integer will result in an error in the future' warnings
    
    Remove a couple more 'using a non-integer number instead of an integer DeprecationWarning'
    
    Added a parameter check on n_estimators in BaseEnsemble to raise error if it is not strictly positive.
    
    ENH Make all decision functions have the same shape. Fixes SVC and GradientBoosting. Closes #1490.
    
    ENH better default for test for SelectKBest and random projection
    
    MAINT tree compute feature_importance by default
    
    pep8
    
    ENH add label ranking average precision
    
    DOC write narrative doc for label ranking average precision
    
    DOC FIX error + wording
    
    TST invariance testing + handle degenerate case
    
    flake8
    
    FIX use np.bincount
    
    DOC friendlier narrative documentation
    
    ENH simplify label ranking average precision (thanks @jnothman)
    
    ENH be backward compatible for old version of scipy
    
    Typo
    
    pep8
    
    DOC remove confusing mention to mean average precision
    
    DOC improve documentatino thanks to @vene and remove mention of relevant labels
    
    DOC more intuition about corner case
    
    DOC add documentation to backported function
    
    ENH less nested code
    
    FIX encoding issue
    
    DOC update what's new
    
    MAINT deprecate fit_ovr, fit_ovo, fit_ecoc, predict_ovr, predict_ovo, predict_ecoc and predict_proba_ovr
    
    ENH + DOC set a default scorer in the multiclass module
    
    DOC typo + not forgetting single output case
    
    scorer: add sample_weight support
    
    COSMIT Use explicit if/else in scorer
    
    TST default scorers with sample_weight
    
    DOC Update What's New
    
    MAINT split sklearn/metrics/metrics.py
    
    DOC improve documentation and distinguish each module
    
    ENH add a friendly warnings before deleting the file
    
    TST: fix tests on numpy 1.9.b2
    
    Refactor input validation.
    
    remove check_arrays stuff and old input validation
    
    ENH add allowed_sparse named argument for @ogrisel
    
    FIX classes name in OvR
    
    DOC mention doc-building dependency on Pillow
    
    DOC link example gallery scripts rather than inline
    
    COSMIT
    
    DOC show referring examples on API reference pages
    
    DOC ensure longer underline
    
    DOC Fix example path
    
    DOC fix 'Return' -> 'Returns'
    
    FIX Py3k support for out-of-core example
    
    DOC make neural networks example appear
    
    Also make the absence of a README fail doc compilation
    
    DOC add links to github sourcecode in API reference
    
    DOC fix opaque background glitch when hovering example icons
    
    FIX better RandomizedPCA sparse deprecation
    
    added quantile strategy to dummy classifier
    
    replaced quantile by percentile, added docstrings
    
    switched back to quantile
    
    final commit
    
    forgot to save the file
    
    removed blank lines, added default alpha param
    
    added backport for axis keyword in scipy.scoreatpercentile
    
    added quantile strategy to dummy classifier
    
    replaced quantile by percentile, added docstrings
    
    switched back to quantile
    
    final commit
    
    forgot to save the file
    
    added backport for axis keyword in scipy.scoreatpercentile
    
    removed scoreatpercentile axis dependency in the dummy tests
    
    typo
    
    fixed import
    
    fixed axis keyword mistake
    
    changed fix function name, fixed rebase bug
    
    removed blank lines, added default alpha param
    
    removed blank lines, added default alpha param
    
    removed unnecessary import
    
    docstring merge typo
    
    updated doc (quantile dummy regressor)
Something went wrong with that request. Please try again.