Update overview.rst
shaharazulay committed Aug 30, 2018
1 parent e52c6f9 commit d3697ab
Showing 1 changed file with 30 additions and 50 deletions: docs/overview.rst

The Stacking Ensemble
=====================

Background
----------

Stacking (sometimes called stacked generalization or blending) is an ensemble meta-algorithm that attempts to improve a model's
predictive power by harnessing multiple models (preferably different in nature) into a unified pipeline.

Models that are less efficient in predicting the data are given lower weight in the final prediction.

*[1] high-level description of the stacking ensemble*
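
To make the architecture concrete, here is a minimal sketch of the prediction-time flow of an
already-fitted stacking ensemble (the names ``first_level_models`` and ``stacker`` are
illustrative, and scikit-learn-style estimators are assumed); how the two levels should be
*trained* is the subject of the rest of this document::

    import numpy as np

    def stacking_predict(first_level_models, stacker, X):
        # Each fitted first-level model produces its own prediction...
        first_level_predictions = np.column_stack(
            [model.predict(X) for model in first_level_models]
        )
        # ...and the fitted meta-level model (the stacker) combines
        # these predictions into the ensemble's final output.
        return stacker.predict(first_level_predictions)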

Getting it Wrong
----------------

The major problem in creating a proper stacking ensemble is avoiding target leakage between its two levels.
The wrong way to perform stacking would be to:

1. **Train** the first-level models over the target.

2. Get the first-level models' predictions over the very same training inputs.

3. **Train** the meta-level stacker over the predictions of the first-level models (a sketch of this naive procedure follows the list).
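
A minimal sketch of this naive procedure, assuming scikit-learn estimators and an illustrative
toy dataset::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression
    from sklearn.tree import DecisionTreeRegressor

    X_train, y_train = make_regression(n_samples=200, n_features=10,
                                       noise=10.0, random_state=0)

    # 1. Train the first-level models over the target.
    first_level = [DecisionTreeRegressor(random_state=0), LinearRegression()]
    for model in first_level:
        model.fit(X_train, y_train)

    # 2. Get the first-level predictions over the very same inputs
    #    the models were trained on.
    train_predictions = np.column_stack(
        [model.predict(X_train) for model in first_level]
    )

    # 3. Train the meta-level stacker over those predictions.
    stacker = LinearRegression().fit(train_predictions, y_train)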

Why would that be the wrong way to go?

**Overfitting**

Our meta-level regressor would be exposed to severe overfitting coming from the first-level models.
For example, suppose one of five first-level models is highly overfitted to the target, practically "storing"
the y target it was shown at train time and replaying it at test time.
The meta-level model, trained over that same target, would see this model as excellent, predicting the target y
with impressive accuracy almost every time.

This will result in a high weight assigned to this model, making the entire pipeline useless at test time.
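
Continuing the naive sketch above (printed values are illustrative): the unrestricted decision
tree reproduces ``y_train`` almost exactly, so the stacker's weights collapse onto it::

    # The tree scores ~1.0 on its own training data: pure memorization.
    print(first_level[0].score(X_train, y_train))

    # The stacker's coefficients put essentially all of the weight on
    # the memorizing model, e.g. roughly [1.0, 0.0].
    print(stacker.coef_)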

The Solution
------------

The solution is to never use the first-level models' train-time outputs, only their test-time abilities.
What does that mean? It means the meta-level model is never exposed to a y_hat generated by a first-level
model over a sample that this model saw during training.

Each model will deliver its predictions in a ``cross_val_predict`` manner (in sklearn terms). If it is a great model,
it will demonstrate great generalization skills, making its test-time predictions valuable to the meta-level regressor.
If it is a highly overfitted model, the test-time predictions it hands down the line will expose its true
abilities, causing it to receive a low weight.

How do we achieve that? Internal cross-validation.
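
A minimal sketch of this leakage-free scheme, using scikit-learn's ``cross_val_predict``
(the toy dataset and model choices are illustrative)::

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.linear_model import LinearRegression, Ridge
    from sklearn.model_selection import cross_val_predict
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=200, n_features=10,
                           noise=10.0, random_state=0)

    first_level = [DecisionTreeRegressor(random_state=0), Ridge()]

    # Out-of-fold predictions: every y_hat comes from a model that
    # never saw the corresponding sample during training.
    meta_features = np.column_stack(
        [cross_val_predict(model, X, y, cv=5) for model in first_level]
    )

    # The meta-level regressor is trained on leakage-free predictions.
    stacker = LinearRegression().fit(meta_features, y)

    # For test time, the first-level models are refit on the full
    # training data; their test-time predictions are then fed to the
    # stacker.
    for model in first_level:
        model.fit(X, y)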

.. image:: _static/figure_002.jpg

*[2] achieving a stacking ensemble using internal cross-validation*
