MRG: GBRT loss-functions #924

Merged
merged 38 commits into from

4 participants

Peter Prettenhofer, Olivier Grisel, Gilles Louppe, Alexandre Gramfort
Peter Prettenhofer
Owner

Two new regression loss functions for gradient boosting:

  • Huber loss (aka M-loss) - a robust loss function that combines least squares and least absolute deviation
  • Quantile loss - a loss function for quantile regression; can be used to create prediction intervals (see example).

Some other gimmicks:

  • GBRT supports max_features for variance reduction (similar to RandomForest).
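
For reference, minimal numpy sketches of the two losses (textbook definitions; the `gamma` threshold for Huber, which the PR derives from the alpha-percentile of the absolute residuals, is passed in directly here):

```python
import numpy as np

def huber_loss(y, pred, gamma):
    """Huber loss: quadratic for |residual| <= gamma, linear beyond."""
    diff = y - pred
    mask = np.abs(diff) <= gamma
    sq_loss = np.sum(0.5 * diff[mask] ** 2.0)
    lin_loss = np.sum(gamma * (np.abs(diff[~mask]) - gamma / 2.0))
    return (sq_loss + lin_loss) / y.shape[0]

def quantile_loss(y, pred, alpha):
    """Pinball loss for the alpha-quantile (textbook sign convention)."""
    diff = y - pred
    return np.mean(np.where(diff > 0, alpha * diff, (alpha - 1.0) * diff))
```

With `alpha=0.5` the pinball loss reduces to half the least absolute deviation, which is why the PR's tests can check that quantile regression with `alpha=0.5` matches `loss='lad'`.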
pprett added some commits
Peter Prettenhofer pprett started work on Huber loss function for robust regression bc9f799
Peter Prettenhofer pprett Merge branch 'master' into gbrt-huber f20296a
Peter Prettenhofer pprett add huber loss to test 104bb82
Peter Prettenhofer pprett implemented huber loss for robust regression a170d84
Peter Prettenhofer pprett fix errors in huber loss 583ce70
Peter Prettenhofer pprett add alpha parameter for huber robust regression loss 7f436ab
Peter Prettenhofer pprett fix: ensure X is c-continuous 038a489
Peter Prettenhofer pprett Merge branch 'master' into gbrt-huber
Conflicts:
	sklearn/ensemble/gradient_boosting.py
196fd69
Peter Prettenhofer pprett added feature subsampling to GBRT (via max_features) fab9857
Peter Prettenhofer pprett fix: forgot comma 48a08d7
Peter Prettenhofer pprett added test for max_features c696ce9
Peter Prettenhofer pprett fix: alpha needs to be scaled by 100 b6b564d
Peter Prettenhofer pprett wip: added quantile regression loss; this allows for prediction intervals; adopted the GP regression example to show-case prediction intervals aa8354a
Peter Prettenhofer pprett added title to example 14b5c05
Peter Prettenhofer pprett performance improvement for random split (ctyped two variables). a087e29
Peter Prettenhofer pprett import random split 505abb4
Peter Prettenhofer pprett test for quantile loss function 878ddaf
Peter Prettenhofer pprett Use BaseEstimator for constant predictors 688b612
Peter Prettenhofer pprett cosmit 5b67d11
Peter Prettenhofer pprett huber and quantile loss for gbrt 7e2ffbf
Peter Prettenhofer pprett better docs for quantile reg 1832641
Peter Prettenhofer pprett Merge branch 'master' into gbrt-huber 802ed82
Peter Prettenhofer pprett Merge remote branch 'upstream/master' into gbrt-huber 8ea9262
Peter Prettenhofer pprett Merge branch 'master' into gbrt-huber
Conflicts:
	sklearn/tree/_tree.c
	sklearn/tree/_tree.pyx
3ca2c71
Peter Prettenhofer pprett Merge branch 'master' into gbrt-huber
Conflicts:
	sklearn/tree/_tree.c
773ea6c
Peter Prettenhofer pprett Merge remote branch 'upstream/master' into gbrt-huber 8792cbd
Olivier Grisel
Owner

I tried running the example and it seems to never complete: I launched the process more than 1 min ago and it should have converged on such a small dataset (infinite loop?). I cannot stop it with a keyboard interrupt to see the Python traceback as it's in the compiled Cython code.

Peter Prettenhofer
Owner

very strange - did you run make? could you send me your configuration (numpy, scipy)? thx

Olivier Grisel
Owner

Yes, I ran make on pprett/gbrt-loss. I am running Python 2.6 on an old 2009 Mac OS X laptop with numpy 1.6.1 and scipy 0.10.1.

I tried to run it under gdb and the script ran fine in less than 10s ...

Peter Prettenhofer
Owner

hmm... the example fits 3 gbrt models (250 iterations and 100 samples) and predicts the mean, upper and lower quantile on 1000 samples - so it should run in a couple of seconds.

maybe there is another dtype issue in the cython code that I haven't covered yet - I'll add some more tests to nail it down.

could you do me a favor and re-run the example using the latest commit (cf09064) and check if the error occurs?

thanks, p
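
(As an aside, the interval construction in that example boils down to one model per quantile; a minimal numpy-only sketch, with constant quantile predictors standing in for the two quantile-loss GBRT fits:)

```python
import numpy as np

rng = np.random.RandomState(0)
y_train = rng.normal(loc=5.0, scale=2.0, size=1000)

alpha = 0.95
# stand-ins for the two quantile-loss fits: constant predictors
# returning the empirical alpha- and (1 - alpha)-quantiles
upper = np.percentile(y_train, alpha * 100.0)
lower = np.percentile(y_train, (1.0 - alpha) * 100.0)

# fresh draws from the same distribution should fall inside
# [lower, upper] roughly 2*alpha - 1 = 90% of the time
y_new = rng.normal(loc=5.0, scale=2.0, size=10000)
coverage = np.mean((y_new >= lower) & (y_new <= upper))
```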

Olivier Grisel
Owner

Alright, false alarm... it was just that the matplotlib window was opening in the background (behind my other open windows)...

Sorry for the noise...

Olivier Grisel
Owner

Alright, I had a quick look at the code and it seems fine, although I don't know much about GBRT. +1 for merging from my side.

Peter Prettenhofer
Owner

np - thanks for checking!

examples/ensemble/plot_gradient_boosting_quantile.py
@@ -0,0 +1,78 @@
+"""
+=====================================================
+Prediction Intervals for Gradient Boosting Regression
+=====================================================
+
+This example shows how quantile regression can be used
+to create prediction intervals.
+"""
+
+import numpy as np
+from sklearn.ensemble import GradientBoostingRegressor
+from matplotlib import pyplot as pl
Alexandre Gramfort Owner

got converted to pyplot? :)

Peter Prettenhofer Owner
pprett added a note

changed that - thx

examples/ensemble/plot_gradient_boosting_quantile.py
@@ -0,0 +1,78 @@
+"""
+=====================================================
+Prediction Intervals for Gradient Boosting Regression
+=====================================================
+
+This example shows how quantile regression can be used
+to create prediction intervals.
+"""
+
+import numpy as np
+from sklearn.ensemble import GradientBoostingRegressor
+from matplotlib import pyplot as pl
+
+np.random.seed(1)
Alexandre Gramfort Owner

do we want global seeding in the examples?

Peter Prettenhofer Owner
pprett added a note

@agramfort I've copied the data generation from plot_gp_regression.py. There are currently more than 30 examples that use np.random.seed; I think we should address this in a dedicated PR.
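
(For context, the usual alternative to global seeding is a local generator; the sketch below is illustrative and not what the example or this PR implements:)

```python
import numpy as np

# a local RandomState keeps an example reproducible without
# mutating numpy's global random state
rng = np.random.RandomState(1)
X = np.atleast_2d(rng.uniform(0, 10.0, size=100)).T
noise = rng.normal(0, 1.5, size=100)
```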

Gilles Louppe
Owner

Code looks good. All tests pass on my machine. +1 for merge.

The example is nice! However, I was wondering how it compares with Gaussian Processes. We actually have the same example for GP; maybe merging them would be better?

Olivier Grisel
Owner

+1 for combining GP and quantile GBRT examples.

Peter Prettenhofer
Owner

regarding the example: do you think it's better to merge them into one example, or should the GP example remain unchanged and I include GP in the quantile example?

Olivier Grisel
Owner

+1 for merging them and having one less example to maintain.

Peter Prettenhofer
Owner

after looking at the GP example in more detail I don't think it's a good idea to merge it with the quantile example - Vincent and Jake have done a great job in explaining the effect of some of the hyper-parameters on the GP posterior - IMHO adding the quantile regression plot would not contribute to the purpose of the example.

I agree that too many examples are a maintenance burden, so I decided to remove the quantile regression example. I don't think it is of interest to a broader audience anyway - I'd rather create a gist from the code and write a short blog post.

Olivier Grisel
Owner

Please keep the quantile regression as a separate example then, and maybe cross-link the two examples as they are completely different solutions to the same kind of problem.

Olivier Grisel
Owner

BTW: looking forward to reading your blog post too :)

Gilles Louppe
Owner

Good. I am +1 for merge.

Alexandre Gramfort
Owner

+1 to keep the quantile regression example.

you should also update the what's new

Peter Prettenhofer pprett merged commit e9590b6 into from
Peter Prettenhofer pprett closed this
Peter Prettenhofer
Owner

@ogrisel @glouppe @agramfort thanks for the reviews!

Commits on May 23, 2012
  1. pprett: started work on Huber loss function for robust regression (bc9f799)
Commits on May 25, 2012
  1. pprett: Merge branch 'master' into gbrt-huber (f20296a)
  2. pprett: add huber loss to test (104bb82)
  3. pprett: implemented huber loss for robust regression (a170d84)
Commits on May 28, 2012
  1. pprett: fix errors in huber loss (583ce70)
Commits on May 29, 2012
  1. pprett: add alpha parameter for huber robust regression loss (7f436ab)
Commits on May 30, 2012
  1. pprett: fix: ensure X is c-continuous (038a489)
  2. pprett: Merge branch 'master' into gbrt-huber (196fd69)
     Conflicts: sklearn/ensemble/gradient_boosting.py
Commits on Jun 2, 2012
  1. pprett: added feature subsampling to GBRT (via max_features) (fab9857)
  2. pprett: fix: forgot comma (48a08d7)
  3. pprett: added test for max_features (c696ce9)
Commits on Jun 3, 2012
  1. pprett: fix: alpha needs to be scaled by 100 (b6b564d)
Commits on Jun 5, 2012
  1. pprett: wip: added quantile regression loss; this allows for prediction intervals; adopted the GP regression example to show-case prediction intervals (aa8354a)
  2. pprett: added title to example (14b5c05)
Commits on Jun 6, 2012
  1. pprett: performance improvement for random split (ctyped two variables) (a087e29)
  2. pprett: import random split (505abb4)
Commits on Jun 14, 2012
  1. pprett: test for quantile loss function (878ddaf)
  2. pprett: Use BaseEstimator for constant predictors (688b612)
  3. pprett: cosmit (5b67d11)
  4. pprett: huber and quantile loss for gbrt (7e2ffbf)
Commits on Jun 15, 2012
  1. pprett: better docs for quantile reg (1832641)
Commits on Jun 18, 2012
  1. pprett: Merge branch 'master' into gbrt-huber (802ed82)
Commits on Jun 20, 2012
  1. pprett: Merge remote branch 'upstream/master' into gbrt-huber (8ea9262)
Commits on Jun 25, 2012
  1. pprett: Merge branch 'master' into gbrt-huber (3ca2c71)
     Conflicts: sklearn/tree/_tree.c, sklearn/tree/_tree.pyx
Commits on Jun 26, 2012
  1. pprett: Merge branch 'master' into gbrt-huber (773ea6c)
     Conflicts: sklearn/tree/_tree.c
Commits on Jul 2, 2012
  1. pprett: Merge remote branch 'upstream/master' into gbrt-huber (8792cbd)
  2. pprett: use dtype float32
  3. pprett: proper pylab import
Commits on Jul 3, 2012
  1. pprett: (commit message not shown)
Commits on Jul 5, 2012
  1. pprett: (commit message not shown)
  2. pprett: (commit message not shown)
  3. pprett: y must be one dimensional
  4. pprett: more tests
Commits on Jul 7, 2012
  1. pprett: (commit message not shown)
  2. pprett: (commit message not shown)
  3. pprett: (commit message not shown)
Commits on Jul 9, 2012
  1. pprett: add quantile example again
  2. pprett: added new features to whatsnew
32 doc/modules/ensemble.rst
@@ -172,9 +172,6 @@ amount of time (e.g., on large datasets).
trees", Machine Learning, 63(1), 3-42, 2006.
-.. _gradient_boosting:
-
-
Feature importance evaluation
-----------------------------
@@ -213,6 +210,7 @@ the matching feature to the prediction function.
* :ref:`example_ensemble_plot_forest_importances_faces.py`
* :ref:`example_ensemble_plot_forest_importances.py`
+.. _gradient_boosting:
Gradient Tree Boosting
======================
@@ -279,11 +277,10 @@ that controls overfitting via :ref:`shrinkage <gradient_boosting_shrinkage>`.
Regression
----------
-:class:`GradientBoostingRegressor` supports a number of different loss
-functions for regression which can be specified via the argument
-``loss``. Currently, supported are least squares (``loss='ls'``) and
-least absolute deviation (``loss='lad'``), which is more robust w.r.t.
-outliers. See [F2001]_ for detailed information.
+:class:`GradientBoostingRegressor` supports a number of
+:ref:`different loss functions <gradient_boosting_loss>`
+for regression which can be specified via the argument
+``loss`` which defaults to least squares (``'ls'``).
::
@@ -373,6 +370,7 @@ Where the step length :math:`\gamma_m` is choosen using line search:
The algorithms for regression and classification
only differ in the concrete loss function used.
+.. _gradient_boosting_loss:
Loss Functions
...............
@@ -388,6 +386,13 @@ the parameter ``loss``:
* Least absolute deviation (``'lad'``): A robust loss function for
regression. The initial model is given by the median of the
target values.
+ * Huber (``'huber'``): Another robust loss function that combines
+ least squares and least absolute deviation; use ``alpha`` to
+ control the sensitivity w.r.t. outliers (see [F2001]_ for more
+ details).
+ * Quantile (``'quantile'``): A loss function for quantile regression.
+ Use ``0 < alpha < 1`` to specify the quantile. This loss function
+ can be used to create prediction intervals.
* Classification
@@ -438,8 +443,7 @@ Subsampling
[F1999]_ proposed stochastic gradient boosting, which combines gradient
boosting with bootstrap averaging (bagging). At each iteration
the base classifier is trained on a fraction ``subsample`` of
-the available training data.
-The subsample is drawn without replacement.
+the available training data. The subsample is drawn without replacement.
A typical value of ``subsample`` is 0.5.
The figure below illustrates the effect of shrinkage and subsampling
@@ -453,6 +457,14 @@ does poorly.
:align: center
:scale: 75
+For ``subsample < 1``, the deviance on the out-of-bag samples in the i-th iteration
+is stored in the attribute ``oob_score_[i]``. Out-of-bag estimates can be
+used for model selection (e.g. to determine the optimal number of iterations).
+
+Another strategy to reduce the variance is by subsampling the features
+analogous to the random splits in Random Forests. The size of the subsample
+can be controlled via the ``max_features`` parameter.
+
.. topic:: Examples:
13 doc/whats_new.rst
@@ -9,6 +9,13 @@
Changelog
---------
+ - :class:`ensemble.GradientBoostingRegressor` and
+ :class:`ensemble.GradientBoostingClassifier` now support feature subsampling
+ via the ``max_features`` argument.
+
+ - Added Huber and Quantile loss functions to
+ :class:`ensemble.GradientBoostingRegressor`.
+
- Added :class:`preprocessing.LabelBinarizer`, a simple utility class to
normalize labels or transform non-numerical labels, by `Mathieu Blondel`_.
@@ -26,9 +33,9 @@ Changelog
API changes summary
-------------------
- - In :class:`hmm` objects, like :class:`hmm.GaussianHMM`,
- :class:`hmm.MultinomialHMM`, etc., all parameters must be passed to the
- object when initialising it and not through ``fit``. Now ``fit`` will
+ - In :class:`hmm` objects, like :class:`hmm.GaussianHMM`,
+ :class:`hmm.MultinomialHMM`, etc., all parameters must be passed to the
+ object when initialising it and not through ``fit``. Now ``fit`` will
only accept the data as an input parameter.
- For all SVM classes, a faulty behavior of ``gamma`` was fixed. Previously,
79 examples/ensemble/plot_gradient_boosting_quantile.py
@@ -0,0 +1,79 @@
+"""
+=====================================================
+Prediction Intervals for Gradient Boosting Regression
+=====================================================
+
+This example shows how quantile regression can be used
+to create prediction intervals.
+"""
+
+import numpy as np
+import pylab as pl
+from sklearn.ensemble import GradientBoostingRegressor
+
+
+np.random.seed(1)
+
+
+def f(x):
+ """The function to predict."""
+ return x * np.sin(x)
+
+#----------------------------------------------------------------------
+# First the noiseless case
+X = np.atleast_2d(np.random.uniform(0, 10.0, size=100)).T
+X = X.astype(np.float32)
+
+# Observations
+y = f(X).ravel()
+
+dy = 1.5 + 1.0 * np.random.random(y.shape)
+noise = np.random.normal(0, dy)
+y += noise
+y = y.astype(np.float32)
+
+# Mesh the input space for evaluations of the real function, the prediction and
+# its MSE
+xx = np.atleast_2d(np.linspace(0, 10, 1000)).T
+xx = xx.astype(np.float32)
+
+alpha = 0.95
+
+clf = GradientBoostingRegressor(loss='quantile', alpha=alpha,
+ n_estimators=250, max_depth=3,
+ learn_rate=.1, min_samples_leaf=9,
+ min_samples_split=9)
+
+clf.fit(X, y)
+
+# Make the prediction on the meshed x-axis
+y_upper = clf.predict(xx)
+
+clf.set_params(alpha=1.0 - alpha)
+clf.fit(X, y)
+
+# Make the prediction on the meshed x-axis
+y_lower = clf.predict(xx)
+
+clf.set_params(loss='ls')
+clf.fit(X, y)
+
+# Make the prediction on the meshed x-axis
+y_pred = clf.predict(xx)
+
+# Plot the function, the prediction and the 95% confidence interval based on
+# the MSE
+fig = pl.figure()
+pl.plot(xx, f(xx), 'g:', label=u'$f(x) = x\,\sin(x)$')
+pl.plot(X, y, 'b.', markersize=10, label=u'Observations')
+pl.plot(xx, y_pred, 'r-', label=u'Prediction')
+pl.plot(xx, y_upper, 'k-')
+pl.plot(xx, y_lower, 'k-')
+pl.fill(np.concatenate([xx, xx[::-1]]),
+ np.concatenate([y_upper, y_lower[::-1]]),
+ alpha=.5, fc='b', ec='None', label='95% prediction interval')
+pl.xlabel('$x$')
+pl.ylabel('$f(x)$')
+pl.ylim(-10, 20)
+pl.legend(loc='upper left')
+pl.show()
24 examples/ensemble/plot_gradient_boosting_regularization.py
@@ -6,10 +6,15 @@
Illustration of the effect of different regularization strategies
for Gradient Boosting. The example is taken from Hastie et al 2009.
-The loss function used is binomial deviance. In combination with
-shrinkage, stochastic gradient boosting (Sample 0.5) can produce
-more accurate models.
+The loss function used is binomial deviance. Regularization via
+shrinkage (``learn_rate < 1.0``) improves performance considerably.
+In combination with shrinkage, stochastic gradient boosting
+(``subsample < 1.0``) can produce more accurate models by reducing the
+variance via bagging.
Subsampling without shrinkage usually does poorly.
+Another strategy to reduce the variance is by subsampling the features
+analogous to the random splits in Random Forests
+(via the ``max_features`` parameter).
.. [1] T. Hastie, R. Tibshirani and J. Friedman, "Elements of Statistical
Learning Ed. 2", Springer, 2009.
@@ -39,12 +44,14 @@
for label, color, setting in [('No shrinkage', 'orange',
{'learn_rate': 1.0, 'subsample': 1.0}),
- ('Shrink=0.1', 'turquoise',
+ ('learn_rate=0.1', 'turquoise',
{'learn_rate': 0.1, 'subsample': 1.0}),
- ('Sample=0.5', 'blue',
+ ('subsample=0.5', 'blue',
{'learn_rate': 1.0, 'subsample': 0.5}),
- ('Shrink=0.1, Sample=0.5', 'gray',
- {'learn_rate': 0.1, 'subsample': 0.5})]:
+ ('learn_rate=0.1, subsample=0.5', 'gray',
+ {'learn_rate': 0.1, 'subsample': 0.5}),
+ ('learn_rate=0.1, max_features=2', 'magenta',
+ {'learn_rate': 0.1, 'max_features': 2})]:
params = dict(original_params)
params.update(setting)
@@ -57,10 +64,9 @@
for i, y_pred in enumerate(clf.staged_decision_function(X_test)):
test_deviance[i] = clf.loss_(y_test, y_pred)
- pl.plot(np.arange(test_deviance.shape[0]) + 1, test_deviance, '-',
+ pl.plot((np.arange(test_deviance.shape[0]) + 1)[::5], test_deviance[::5], '-',
color=color, label=label)
-pl.title('Deviance')
pl.legend(loc='upper left')
pl.xlabel('Boosting Iterations')
pl.ylabel('Test Set Deviance')
1  sklearn/ensemble/__init__.py
@@ -8,6 +8,5 @@
from .forest import RandomForestRegressor
from .forest import ExtraTreesClassifier
from .forest import ExtraTreesRegressor
-
from .gradient_boosting import GradientBoostingClassifier
from .gradient_boosting import GradientBoostingRegressor
183 sklearn/ensemble/gradient_boosting.py
@@ -24,7 +24,10 @@
import numpy as np
+from scipy import stats
+
from .base import BaseEnsemble
+from ..base import BaseEstimator
from ..base import ClassifierMixin
from ..base import RegressorMixin
from ..utils import check_random_state, array2d
@@ -39,11 +42,23 @@
from ._gradient_boosting import predict_stages
from ._gradient_boosting import predict_stage
-__all__ = ["GradientBoostingClassifier",
- "GradientBoostingRegressor"]
+
+class QuantileEstimator(BaseEstimator):
+ """An estimator predicting the alpha-quantile of the training targets."""
+ def __init__(self, alpha=0.9):
+ assert 0 < alpha < 1.0
+ self.alpha = alpha
+
+ def fit(self, X, y):
+ self.quantile = stats.scoreatpercentile(y, self.alpha * 100.0)
+
+ def predict(self, X):
+ y = np.empty((X.shape[0], 1), dtype=np.float64)
+ y.fill(self.quantile)
+ return y
-class MedianEstimator(object):
+class MedianEstimator(BaseEstimator):
"""An estimator predicting the median of the training targets."""
def fit(self, X, y):
self.median = np.median(y)
@@ -54,7 +69,7 @@ def predict(self, X):
return y
-class MeanEstimator(object):
+class MeanEstimator(BaseEstimator):
"""An estimator predicting the mean of the training targets."""
def fit(self, X, y):
self.mean = np.mean(y)
@@ -65,7 +80,7 @@ def predict(self, X):
return y
-class LogOddsEstimator(object):
+class LogOddsEstimator(BaseEstimator):
"""An estimator predicting the log odds ratio."""
def fit(self, X, y):
n_pos = np.sum(y)
@@ -77,7 +92,7 @@ def predict(self, X):
return y
-class PriorProbabilityEstimator(object):
+class PriorProbabilityEstimator(BaseEstimator):
"""An estimator predicting the probability of each
class in the training data.
"""
@@ -216,7 +231,9 @@ def __call__(self, y, pred):
return np.abs(y - pred.ravel()).mean()
def negative_gradient(self, y, pred, **kargs):
- return np.sign(y - pred.ravel())
+ """1.0 if y - pred > 0.0 else -1.0"""
+ pred = pred.ravel()
+ return 2.0 * (y - pred > 0.0) - 1.0
def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
residual, pred):
@@ -226,6 +243,91 @@ def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
pred.take(terminal_region, axis=0))
+class HuberLossFunction(RegressionLossFunction):
+ """Loss function for least absolute deviation (LAD) regression. """
+
+ def __init__(self, n_classes, alpha=0.9):
+ super(HuberLossFunction, self).__init__(n_classes)
+ self.alpha = alpha
+
+ def init_estimator(self):
+ return MedianEstimator()
+
+ def __call__(self, y, pred):
+ pred = pred.ravel()
+ diff = y - pred
+ gamma = self.gamma
+ gamma_mask = np.abs(diff) <= gamma
+ sq_loss = np.sum(0.5 * diff[gamma_mask] ** 2.0)
+ lin_loss = np.sum(gamma * (np.abs(diff[~gamma_mask]) - gamma / 2.0))
+ return (sq_loss + lin_loss) / y.shape[0]
+
+ def negative_gradient(self, y, pred, **kargs):
+ pred = pred.ravel()
+ diff = y - pred
+ gamma = stats.scoreatpercentile(np.abs(diff), self.alpha * 100)
+ gamma_mask = np.abs(diff) <= gamma
+ residual = np.zeros((y.shape[0],), dtype=np.float64)
+ residual[gamma_mask] = diff[gamma_mask]
+ residual[~gamma_mask] = gamma * np.sign(diff[~gamma_mask])
+ self.gamma = gamma
+ return residual
+
+ def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
+ residual, pred):
+ """LAD updates terminal regions to median estimates. """
+ terminal_region = np.where(terminal_regions == leaf)[0]
+ gamma = self.gamma
+ diff = y.take(terminal_region, axis=0) - \
+ pred.take(terminal_region, axis=0)
+ median = np.median(diff)
+ diff_minus_median = diff - median
+ tree.value[leaf, 0] = median + np.mean(
+ np.sign(diff_minus_median) *
+ np.minimum(np.abs(diff_minus_median), gamma))
+
+
+class QuantileLossFunction(RegressionLossFunction):
+ """Loss function for quantile regression.
+
+ Quantile regression allows estimation of the percentiles
+ of the conditional distribution of the target.
+ """
+
+ def __init__(self, n_classes, alpha=0.9):
+ super(QuantileLossFunction, self).__init__(n_classes)
+ assert 0 < alpha < 1.0
+ self.alpha = alpha
+ self.percentile = alpha * 100.0
+
+ def init_estimator(self):
+ return QuantileEstimator(self.alpha)
+
+ def __call__(self, y, pred):
+ pred = pred.ravel()
+ diff = y - pred
+ alpha = self.alpha
+
+ mask = y > pred
+ return (alpha * diff[mask].sum() +
+ (1.0 - alpha) * diff[~mask].sum()) / y.shape[0]
+
+ def negative_gradient(self, y, pred, **kargs):
+ alpha = self.alpha
+ pred = pred.ravel()
+ mask = y > pred
+ return (alpha * mask) - ((1.0 - alpha) * ~mask)
+
+ def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
+ residual, pred):
+ """LAD updates terminal regions to median estimates. """
+ terminal_region = np.where(terminal_regions == leaf)[0]
+ diff = y.take(terminal_region, axis=0) - \
+ pred.take(terminal_region, axis=0)
+ val = stats.scoreatpercentile(diff, self.percentile)
+ tree.value[leaf, 0] = val
+
+
class BinomialDeviance(LossFunction):
"""Binomial deviance loss function for binary classification.
@@ -319,6 +421,8 @@ def _update_terminal_region(self, tree, terminal_regions, leaf, X, y,
LOSS_FUNCTIONS = {'ls': LeastSquaresError,
'lad': LeastAbsoluteError,
+ 'huber': HuberLossFunction,
+ 'quantile': QuantileLossFunction,
'bdeviance': BinomialDeviance,
'mdeviance': MultinomialDeviance,
'deviance': None} # for both, multinomial and binomial
@@ -330,7 +434,8 @@ class BaseGradientBoosting(BaseEnsemble):
@abstractmethod
def __init__(self, loss, learn_rate, n_estimators, min_samples_split,
- min_samples_leaf, max_depth, init, subsample, random_state):
+ min_samples_leaf, max_depth, init, subsample,
+ max_features, random_state, alpha=0.9):
if n_estimators <= 0:
raise ValueError("n_estimators must be greater than 0")
self.n_estimators = n_estimators
@@ -344,19 +449,21 @@ def __init__(self, loss, learn_rate, n_estimators, min_samples_split,
self.loss = loss
if min_samples_split <= 0:
- raise ValueError("min_samples_split must be larger than 0.")
+ raise ValueError("min_samples_split must be larger than 0")
self.min_samples_split = min_samples_split
if min_samples_leaf <= 0:
- raise ValueError("min_samples_leaf must be larger than 0.")
+ raise ValueError("min_samples_leaf must be larger than 0")
self.min_samples_leaf = min_samples_leaf
if subsample <= 0.0 or subsample > 1:
raise ValueError("subsample must be in (0,1]")
self.subsample = subsample
+ self.max_features = max_features
+
if max_depth <= 0:
- raise ValueError("max_depth must be larger than 0.")
+ raise ValueError("max_depth must be larger than 0")
self.max_depth = max_depth
if init is not None:
@@ -366,6 +473,10 @@ def __init__(self, loss, learn_rate, n_estimators, min_samples_split,
self.random_state = check_random_state(random_state)
+ if not (0.0 < alpha < 1.0):
+ raise ValueError("alpha must be in (0.0, 1.0)")
+ self.alpha = alpha
+
self.estimators_ = None
def fit_stage(self, i, X, X_argsorted, y, y_pred, sample_mask):
@@ -383,7 +494,7 @@ def fit_stage(self, i, X, X_argsorted, y, y_pred, sample_mask):
tree = Tree(1, self.n_features)
tree.build(X, residual, MSE(), self.max_depth,
self.min_samples_split, self.min_samples_leaf, 0.0,
- self.n_features, self.random_state, _find_best_split,
+ self.max_features, self.random_state, _find_best_split,
sample_mask, X_argsorted)
# update tree leaves
@@ -418,7 +529,7 @@ def fit(self, X, y):
Returns self.
"""
X = np.asfortranarray(X, dtype=DTYPE)
- y = np.ascontiguousarray(y)
+ y = np.ravel(y, order='C')
n_samples, n_features = X.shape
if y.shape[0] != n_samples:
@@ -426,7 +537,17 @@ def fit(self, X, y):
"number of samples.")
self.n_features = n_features
- loss = LOSS_FUNCTIONS[self.loss](self.n_classes_)
+ if self.max_features == None:
+ self.max_features = n_features
+
+ if not (0 < self.max_features <= n_features):
+ raise ValueError("max_features must be in (0, n_features]")
+
+ loss_class = LOSS_FUNCTIONS[self.loss]
+ if self.loss in ('huber', 'quantile'):
+ loss = loss_class(self.n_classes_, self.alpha)
+ else:
+ loss = loss_class(self.n_classes_)
# store loss object for future use
self.loss_ = loss
@@ -572,6 +693,15 @@ class GradientBoostingClassifier(BaseGradientBoosting, ClassifierMixin):
The fraction of samples to be used for fitting the individual base
learners. If smaller than 1.0 this results in Stochastic Gradient
Boosting. `subsample` interacts with the parameter `n_estimators`.
+ Choosing `subsample < 1.0` leads to a reduction of variance
+ and an increase in bias.
+
+ max_features : int, None, optional (default=None)
+ The number of features to consider when looking for the best split.
+ Features are chosen randomly at each split point.
+ If None, then `max_features=n_features`. Choosing
+ `max_features < n_features` leads to a reduction of variance
+ and an increase in bias.
Examples
--------
@@ -599,11 +729,13 @@ class GradientBoostingClassifier(BaseGradientBoosting, ClassifierMixin):
def __init__(self, loss='deviance', learn_rate=0.1, n_estimators=100,
subsample=1.0, min_samples_split=1, min_samples_leaf=1,
- max_depth=3, init=None, random_state=None):
+ max_depth=3, init=None, random_state=None,
+ max_features=None):
super(GradientBoostingClassifier, self).__init__(
loss, learn_rate, n_estimators, min_samples_split,
- min_samples_leaf, max_depth, init, subsample, random_state)
+ min_samples_leaf, max_depth, init, subsample, max_features,
+ random_state)
def fit(self, X, y):
"""Fit the gradient boosting model.
@@ -727,6 +859,19 @@ class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
The fraction of samples to be used for fitting the individual base
learners. If smaller than 1.0 this results in Stochastic Gradient
Boosting. `subsample` interacts with the parameter `n_estimators`.
+ Choosing `subsample < 1.0` leads to a reduction of variance
+ and an increase in bias.
+
+ max_features : int, None, optional (default=None)
+ The number of features to consider when looking for the best split.
+ Features are chosen randomly at each split point.
+ If None, then `max_features=n_features`. Choosing
+ `max_features < n_features` leads to a reduction of variance
+ and an increase in bias.
+
+ alpha : float (default=0.9)
+ The alpha-quantile of the huber loss function and the quantile
+ loss function. Only if ``loss='huber'`` or ``loss='quantile'``.
Attributes
----------
@@ -784,11 +929,13 @@ class GradientBoostingRegressor(BaseGradientBoosting, RegressorMixin):
def __init__(self, loss='ls', learn_rate=0.1, n_estimators=100,
subsample=1.0, min_samples_split=1, min_samples_leaf=1,
- max_depth=3, init=None, random_state=None):
+ max_depth=3, init=None, random_state=None,
+ max_features=None, alpha=0.9):
super(GradientBoostingRegressor, self).__init__(
loss, learn_rate, n_estimators, min_samples_split,
- min_samples_leaf, max_depth, init, subsample, random_state)
+ min_samples_leaf, max_depth, init, subsample, max_features,
+ random_state, alpha)
def fit(self, X, y):
"""Fit the gradient boosting model.
98 sklearn/ensemble/tests/test_gradient_boosting.py
@@ -4,14 +4,17 @@
import numpy as np
from numpy.testing import assert_array_equal
+from numpy.testing import assert_array_almost_equal
from numpy.testing import assert_equal
from nose.tools import assert_raises
from sklearn.metrics import mean_squared_error
from sklearn.utils import check_random_state
+
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingRegressor
+
from sklearn import datasets
# toy sample
@@ -130,7 +133,7 @@ def test_classification_synthetic():
def test_boston():
"""Check consistency on dataset boston house prices with least squares
and least absolute deviation. """
- for loss in ("ls", "lad"):
+ for loss in ("ls", "lad", "huber"):
clf = GradientBoostingRegressor(n_estimators=100, loss=loss,
max_depth=4,
min_samples_split=1, random_state=1)
@@ -261,6 +264,17 @@ def test_check_inputs_predict():
assert_raises(ValueError, clf.predict, x)
+def test_check_max_features():
+ """test if max_features is valid. """
+ clf = GradientBoostingRegressor(n_estimators=100, random_state=1,
+ max_features=0)
+ assert_raises(ValueError, clf.fit, X, y)
+
+ clf = GradientBoostingRegressor(n_estimators=100, random_state=1,
+ max_features=(len(X[0]) + 1))
+ assert_raises(ValueError, clf.fit, X, y)
+
+
def test_staged_predict():
"""Test whether staged decision function eventually gives
the same prediction.
@@ -316,3 +330,85 @@ def test_degenerate_targets():
clf.predict(rng.rand(2))
assert_array_equal(np.ones((1,), dtype=np.float64),
clf.predict(rng.rand(2)))
+
+
+def test_quantile_loss():
+ """Check if quantile loss with alpha=0.5 equals lad. """
+ clf_quantile = GradientBoostingRegressor(n_estimators=100, loss='quantile',
+ max_depth=4, alpha=0.5,
+ random_state=7)
+
+ clf_quantile.fit(boston.data, boston.target)
+ y_quantile = clf_quantile.predict(boston.data)
+
+ clf_lad = GradientBoostingRegressor(n_estimators=100, loss='lad',
+ max_depth=4, random_state=7)
+
+ clf_lad.fit(boston.data, boston.target)
+ y_lad = clf_lad.predict(boston.data)
+ assert_array_almost_equal(y_quantile, y_lad, decimal=4)
+
+
+def test_symbol_labels():
+ """Test with non-integer class labels. """
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+
+ symbol_y = map(str, y)
+
+ clf.fit(X, symbol_y)
+ assert_array_equal(clf.predict(T), map(str, true_result))
+ assert_equal(100, len(clf.estimators_))
+
+
+def test_float_class_labels():
+ """Test with float class labels. """
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+
+ float_y = np.asarray(y, dtype=np.float32)
+
+ clf.fit(X, float_y)
+ assert_array_equal(clf.predict(T), np.asarray(true_result, dtype=np.float32))
+ assert_equal(100, len(clf.estimators_))
+
+
+def test_shape_y():
+ """Test with float class labels. """
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+
+ y_ = np.asarray(y, dtype=np.int32)
+ y_ = y_[:, np.newaxis]
+
+ clf.fit(X, y_)
+ assert_array_equal(clf.predict(T), true_result)
+ assert_equal(100, len(clf.estimators_))
+
+
+def test_mem_layout():
+ """Test with different memory layouts of X and y"""
+ X_ = np.asfortranarray(X)
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+ clf.fit(X_, y)
+ assert_array_equal(clf.predict(T), true_result)
+ assert_equal(100, len(clf.estimators_))
+
+ X_ = np.ascontiguousarray(X)
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+ clf.fit(X_, y)
+ assert_array_equal(clf.predict(T), true_result)
+ assert_equal(100, len(clf.estimators_))
+
+ y_ = np.asarray(y, dtype=np.int32)
+ y_ = y_[:, np.newaxis]
+ y_ = np.ascontiguousarray(y_)
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+ clf.fit(X, y_)
+ assert_array_equal(clf.predict(T), true_result)
+ assert_equal(100, len(clf.estimators_))
+
+ y_ = np.asarray(y, dtype=np.int32)
+ y_ = y_[:, np.newaxis]
+ y_ = np.asfortranarray(y_)
+ clf = GradientBoostingClassifier(n_estimators=100, random_state=1)
+ clf.fit(X, y_)
+ assert_array_equal(clf.predict(T), true_result)
+ assert_equal(100, len(clf.estimators_))
2  sklearn/tree/_tree.c
@@ -1,4 +1,4 @@
-/* Generated by Cython 0.16 on Tue Jun 26 14:09:58 2012 */
+/* Generated by Cython 0.16 on Tue Jun 26 14:21:54 2012 */
#define PY_SSIZE_T_CLEAN
#include "Python.h"