
[MRG] Bagging meta-estimator #2375

Closed · wants to merge 86 commits

Conversation

@glouppe (Contributor) commented Aug 20, 2013

Git history in #2198 was messed up, so I made a new pull request. Sorry for the noise...

TODO:

  • narrative documentation
  • add an example

@ogrisel (Member) commented Aug 20, 2013

For the example it would be very interesting to try BaggingClassifier with a kernel SVC as the base estimator, for instance on the digits dataset. As SVC has more than quadratic time complexity w.r.t. n_samples, we can expect bagging to actually improve speed at the same level of accuracy as a regular SVC model trained on the full dataset.
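A rough sketch of that experiment (not part of this PR; the split, n_estimators=10 and max_samples=0.1 are illustrative choices, assuming the final BaggingClassifier API):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each SVC is fit on a 10% subsample, so the super-quadratic cost per
# estimator drops sharply; predictions are then aggregated over the ensemble.
clf = BaggingClassifier(SVC(), n_estimators=10, max_samples=0.1,
                        random_state=0)
clf.fit(X_train, y_train)
print("test accuracy: %.3f" % clf.score(X_test, y_test))
```

Timing this against a single `SVC().fit(X_train, y_train)` on the full training set would show whether the speed-up materializes at a comparable accuracy.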

@glouppe (Contributor, Author) commented Aug 20, 2013

That might be a good idea, but the digits dataset is actually quite small: training an SVC on it takes no more than a second. At that scale, I'd rather not draw any conclusions if one approach appears faster than the other.

@ogrisel (Member) commented Aug 20, 2013

You can nudge the dataset, as done in the RBM example, to make it both larger and harder.

@glouppe (Contributor, Author) commented Aug 20, 2013

In another direction, I was thinking about a figure like the ones I had done in my paper (see http://orbi.ulg.ac.be/bitstream/2268/130099/1/glouppe12.pdf page 11): it can be used to show the effect of max_samples and max_features (either alone or combined).

Another great example would be a bias-variance decomposition of the error, illustrating what happens when base estimators are averaged together. (No matter what we choose here, such an example should anyway be in our documentation in my opinion...)
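A minimal way to produce the data behind such a max_samples figure, sketched with an illustrative synthetic dataset and a plain cross-validation loop (the dataset and constants are arbitrary, not the ones used in the paper):

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

X, y = make_friedman1(n_samples=300, noise=1.0, random_state=0)

# One point per curve position: mean cross-validated R^2 of the bagged
# trees as the fraction of samples drawn for each estimator grows.
scores = {}
for max_samples in (0.2, 0.5, 1.0):
    reg = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                           max_samples=max_samples, random_state=0)
    scores[max_samples] = cross_val_score(reg, X, y, cv=3).mean()
print(scores)
```

The same loop over `max_features` (or a grid over both) gives the combined-effect plot.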

@ogrisel (Member) commented Aug 20, 2013

In another direction, I was thinking about a figure like the ones I had done in my paper (see http://orbi.ulg.ac.be/bitstream/2268/130099/1/glouppe12.pdf page 11): it can be used to show the effect of max_samples and max_features (either alone or combined).

+1. It would be interesting to do this kind of plot with a bagged GBRT regressor on a non-trivial regression dataset.

@ogrisel (Member) commented Aug 20, 2013

Another great example would be a bias-variance decomposition of the error, illustrating what happens when base estimators are averaged together. (No matter what we choose here, such an example should anyway be in our documentation in my opinion...)

+1 as well.

@glouppe (Contributor, Author) commented Aug 21, 2013

I have got a working example of the bias-variance decomposition of the mean squared error of a single estimator versus bagging. It still needs some work and documentation, but here is how it renders on a toy 1d regression problem. The first plot displays the function to predict, the predictions of single estimators over several instances of the problem, and the mean prediction. The second plot is a decomposition of the mean squared error at point x.

[figure: single-estimator predictions vs. bagging, and error decomposition]

Script output:
Single estimator: 0.003498 (mse) = 0.000079 (bias^2) + 0.003420 (var)
Bagging: 0.001900 (mse) = 0.000074 (bias^2) + 0.001825 (var)

In particular, one can see from the lower plot (compare the solid green line with the dashed green line), or from the script output, that bagging mainly affects - and reduces - the variance part of the mean squared error.
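The estimation behind such a figure can be sketched as follows; the toy target function and constants here are illustrative assumptions, not the actual example script:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
f = lambda x: np.exp(-x ** 2) + 1.5 * np.exp(-(x - 2) ** 2)  # toy target

X_test = np.linspace(-5, 5, 200).reshape(-1, 1)
n_repeat, n_train, noise = 50, 80, 0.1

# Fit one tree per independently drawn noisy training set, then decompose
# the error of the collected predictions at each test point.
y_predict = np.empty((len(X_test), n_repeat))
for i in range(n_repeat):
    X = rng.uniform(-5, 5, size=(n_train, 1))
    y = f(X).ravel() + rng.normal(0, noise, size=n_train)
    y_predict[:, i] = DecisionTreeRegressor().fit(X, y).predict(X_test)

bias2 = (f(X_test).ravel() - y_predict.mean(axis=1)) ** 2
var = y_predict.var(axis=1)
print("bias^2 = %.4f, var = %.4f" % (bias2.mean(), var.mean()))
```

Repeating the loop with a bagged tree in place of the single tree yields the second row of numbers; for fully grown trees, variance dominates the bias term.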

@arjoly (Member) commented Aug 22, 2013

Nice plot, but I can't discern the different curves. What do you think of breaking it into three plots (mse, bias^2, variance) using the same scale? Would it be interesting to add some noise?

@glouppe (Contributor, Author) commented Aug 22, 2013

Nice plot, but I can't discern the different curves. What do you think of breaking it into three plots (mse, bias^2, variance) using the same scale? Would it be interesting to add some noise?

The plot is not up to date, see the next commits :) I'll refresh it when I'm done.

@glouppe (Contributor, Author) commented Aug 22, 2013

Here is an updated version of the example. See the explanations in the docstring for details.

[figure: updated bias-variance decomposition example]

Output:
Tree: 0.025531 (error) = 0.000308 (bias^2) + 0.015245 (var) + 0.009761 (noise)
Bagging(Tree): 0.019596 (error) = 0.000437 (bias^2) + 0.009164 (var) + 0.009761 (noise)

I think this makes quite a nice example overall, illustrating both the bias-variance decomposition and the benefits of bagging. What do you think? @ogrisel @arjoly

@ogrisel (Member) commented Aug 22, 2013

Very nice!

@glouppe (Contributor, Author) commented Aug 22, 2013

It is also quite interesting to explore other base estimators (KNN, SVR, etc) :)

estimators. The larger the variance, the more sensitive are the predictions for
`x` to small changes in the training set. The bias term corresponds to the
difference between the average prediction of the estimator (in cyan) and the
best possible model (in dark blue). On this problem, we can thus observe than

Review comment (Member): than => that?

@arjoly (Member) commented Aug 22, 2013

The plot is a lot nicer !!!

@ogrisel (Member) commented Aug 22, 2013

It is also quite interesting to explore other base estimators (KNN, SVR, etc) :)

Yes, and the GBRT model as well, although this problem might be too easy to show the benefit of bagging GBRT models.

@glouppe (Contributor, Author) commented Aug 22, 2013

I have completed the example and added a new section in the narrative documentation.

This looks ready to me, going from [WIP] to [MRG].

A first round of reviews is more than welcome! Random pings: @pprett @ogrisel @arjoly

y_bias = (f(X_test) - np.mean(y_predict, axis=1)) ** 2
y_var = np.var(y_predict, axis=1)

print("{0}: {1} (error) = {2} (bias^2) + {3} (var) + {4} (noise)".format(

Review comment (Member): You can use {1:.4f} to limit the precision to 4 decimal places and make the output easier to read.

Reply (Contributor, Author): Thanks, I was looking for that. Still not used to this Python 3 way of formatting :-)

Reply (Member): It's been there for quite some time, at least since Python 2.6.

@ogrisel (Member) commented Aug 22, 2013

Before merging I would really like to have support for sparse X. This can be a bit tricky: if sample_weight is not supported by the base_estimator, it means converting the data to CSC first for feature-wise sampling and then converting the subsample to CSR for sample-wise sampling.

I think it's worth doing it though (with tests).
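The conversion dance can be sketched directly with scipy (shapes and sampling sizes here are arbitrary):

```python
import numpy as np
from scipy import sparse

rng = np.random.RandomState(0)
X = sparse.random(100, 20, density=0.1, format="csr", random_state=rng)

features = rng.choice(20, size=8, replace=False)   # feature-wise sampling
samples = rng.choice(100, size=100, replace=True)  # bootstrap of samples

# Column slicing is cheap on CSC and row slicing on CSR, hence the two
# conversions when both kinds of sampling are requested.
X_sub = X.tocsc()[:, features].tocsr()[samples]
print(X_sub.shape)
```

When the base estimator does accept sample_weight, the row indexing step can be replaced by passing bootstrap counts as weights, avoiding one of the conversions.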

@glouppe (Contributor, Author) commented Aug 22, 2013

Before merging I would really like to have support for sparse X. This can be a bit tricky because if sample_weight is not supported that means converting the data first to CSC for feature-wise sampling and then the subsample to CSR for sample-wise sampling.

me don't like sparse formats

I agree though, I'll look at this later.

- If float, then draw `max_features * X.shape[1]` features.

bootstrap : boolean, optional (default=False)
Whether instances are drawn with replacement.

Review comment (Member): instances => samples

@arjoly (Member) commented Aug 23, 2013

Before merging I would really like to have support for sparse X. This can be a bit tricky because if sample_weight is not supported that means converting the data first to CSC for feature-wise sampling and then the subsample to CSR for sample-wise sampling.

me don't like sparse formats

I agree though, I'll look at this later.

This PR is already pretty large (around 1,300 added lines). I would prefer to keep this feature for a separate PR.


In regression, the expected mean squared error of an estimator can be
decomposed in terms of bias, variance and noise. On average over dataset
instances LS of the regression problem, the bias term measures the average

Review comment (Member): LS?

@arjoly (Member) commented Sep 9, 2013

let's merge this beast !!! +1

@glouppe (Contributor, Author) commented Sep 9, 2013

Thanks for your review Arnaud!

@ogrisel Shall we merge this or wait for someone else review?

@ogrisel (Member) commented Sep 9, 2013

I would wait for the opinion of past reviewers such as @larsmans and @mblondel.

@larsmans (Member) commented Sep 9, 2013

I'll try to review tonight. @glouppe can you post the generated figures here?

@glouppe (Contributor, Author) commented Sep 9, 2013

Sure, here it is for the bias-variance decomposition example:

[figure: bias-variance decomposition example]

construction procedure and then making an ensemble out of it. In many cases,
bagging methods constitute a very simple way to improve with respect to a
single model, without making it necessary to adapt the underlying base
algorithm.

Review comment (Member): For the noobs, it might be useful to state explicitly that bagging should be used with strong learners and that it reduces overfitting (and maybe to contrast it with boosting in this sense).

estimators_features))


def _partition_estimators(ensemble):

Review comment (Member): Should this go in ensemble/base.py?

Reply (Contributor, Author): Done

@arjoly (Member) commented Sep 11, 2013

LGTM!

@ogrisel (Member) commented Sep 11, 2013

Any final words @larsmans? LGTM too.

@larsmans larsmans closed this in 524daee Sep 11, 2013
@larsmans (Member)

All tests pass on my box. Merged by hand after extensive rebase.

@ogrisel (Member) commented Sep 11, 2013

Great! Thanks all!

@arjoly (Member) commented Sep 11, 2013

Great :-) !!! 🍻

@glouppe (Contributor, Author) commented Sep 11, 2013

Great! Thank you all for the reviews :)

@glouppe (Contributor, Author) commented Sep 11, 2013

@larsmans By the way, did you have to squash everything into a single commit? :s

@larsmans (Member)

It's one feature, so you get one commit for it ;)

Seriously: this was the easiest way to get rid of the duplicate and typo commits.

@jakevdp (Member) commented Sep 11, 2013

Nice work!

@glouppe (Contributor, Author) commented Sep 11, 2013

It's one feature, so you get one commit for it ;)

Meh, why is life so hard? ;;

(joking)

@mblondel (Member)

@glouppe Could you have a look at PR #2420?
