Merge pull request #297 from rasbt/stackingcvregressordocs
fix some typos in the stackingcvregressor docs
rasbt committed Dec 1, 2017
2 parents 3cca453 + d179fff commit fa5d1f7
Showing 1 changed file with 50 additions and 14 deletions.
docs/sources/user_guide/regressor/StackingCVRegressor.ipynb (50 additions, 14 deletions)
@@ -32,7 +32,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Stacking is an ensemble learning technique to combine multiple regression models via a meta-regressor. The `StackingCVRegressor` extends the standard stacking algorithm (implemented as [`StackingRegressor`](StackingRegressor.md)) using out-of-fold predictions to prepare the input data for the level-2 classifier.\n",
"Stacking is an ensemble learning technique to combine multiple regression models via a meta-regressor. The `StackingCVRegressor` extends the standard stacking algorithm (implemented as [`StackingRegressor`](StackingRegressor.md)) using out-of-fold predictions to prepare the input data for the level-2 regressor.\n",
"\n",
"In the standard stacking procedure, the first-level regressors are fit to the same training set that is used prepare the inputs for the second-level regressor, which may lead to overfitting. The `StackingCVRegressor`, however, uses the concept of out-of-fold predictions: the dataset is split into k folds, and in k successive rounds, k-1 folds are used to fit the first level regressor. In each round, the first-level regressors are then applied to the remaining 1 subset that was not used for model fitting in each iteration. The resulting predictions are then stacked and provided -- as input data -- to the second-level regressor. After the training of the `StackingCVRegressor`, the first-level regressors are fit to the entire dataset for optimal predicitons."
]
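To make the out-of-fold procedure concrete, the following minimal sketch fits a `StackingCVRegressor` on synthetic data. The base regressors, the `make_regression` dataset, and all hyperparameter values are illustrative assumptions, not part of this diff:

    # Minimal sketch of out-of-fold stacking (illustrative, not from the diff).
    from mlxtend.regressor import StackingCVRegressor
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Lasso
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=200, n_features=10, random_state=42)

    # Level-1 regressors produce out-of-fold predictions, which become the
    # input features for the level-2 (meta) regressor.
    stack = StackingCVRegressor(regressors=(SVR(), Lasso(),
                                            RandomForestRegressor(random_state=42)),
                                meta_regressor=Lasso(),
                                cv=5)
    stack.fit(X, y)
    print(stack.predict(X[:5]))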
@@ -87,7 +87,7 @@
"R^2 Score: 0.45 (+/- 0.29) [SVM]\n",
"R^2 Score: 0.43 (+/- 0.14) [Lasso]\n",
"R^2 Score: 0.52 (+/- 0.28) [Random Forest]\n",
"R^2 Score: 0.58 (+/- 0.24) [StackingClassifier]\n"
"R^2 Score: 0.58 (+/- 0.24) [StackingCVRegressor]\n"
]
}
],
@@ -121,7 +121,7 @@
"\n",
"for clf, label in zip([svr, lasso, rf, stack], ['SVM', 'Lasso', \n",
" 'Random Forest', \n",
" 'StackingClassifier']):\n",
" 'StackingCVRegressor']):\n",
" scores = cross_val_score(clf, X, y, cv=5)\n",
" print(\"R^2 Score: %0.2f (+/- %0.2f) [%s]\" % (\n",
" scores.mean(), scores.std(), label))"
@@ -141,7 +141,7 @@
"Neg. MSE Score: -33.69 (+/- 22.36) [SVM]\n",
"Neg. MSE Score: -35.53 (+/- 16.99) [Lasso]\n",
"Neg. MSE Score: -27.32 (+/- 16.62) [Random Forest]\n",
"Neg. MSE Score: -25.64 (+/- 18.11) [StackingClassifier]\n"
"Neg. MSE Score: -25.64 (+/- 18.11) [StackingCVRegressor]\n"
]
}
],
@@ -158,7 +158,7 @@
"\n",
"for clf, label in zip([svr, lasso, rf, stack], ['SVM', 'Lasso', \n",
" 'Random Forest', \n",
" 'StackingClassifier']):\n",
" 'StackingCVRegressor']):\n",
" scores = cross_val_score(clf, X, y, cv=5, scoring='neg_mean_squared_error')\n",
" print(\"Neg. MSE Score: %0.2f (+/- %0.2f) [%s]\" % (\n",
" scores.mean(), scores.std(), label))"
@@ -284,7 +284,7 @@
"source": [
"**Note**\n",
"\n",
"The `StackingCVRegressor` also enables grid search over the `regressors` argument. However, due to the current implementation of `GridSearchCV` in scikit-learn, it is not possible to search over both, differenct classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works\n",
"The `StackingCVRegressor` also enables grid search over the `regressors` argument. However, due to the current implementation of `GridSearchCV` in scikit-learn, it is not possible to search over both, different regressors and regressor parameters at the same time. For instance, while the following parameter dictionary works\n",
"\n",
" params = {'randomforestregressor__n_estimators': [1, 100],\n",
" 'regressors': [(regr1, regr1, regr1), (regr2, regr3)]}\n",
@@ -310,7 +310,7 @@
"text": [
"## StackingCVRegressor\n",
"\n",
"*StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False)*\n",
"*StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False, store_train_meta_features=False)*\n",
"\n",
"A 'Stacking Cross-Validation' regressor for scikit-learn estimators.\n",
"\n",
@@ -329,15 +329,15 @@
"\n",
"- `regressors` : array-like, shape = [n_regressors]\n",
"\n",
" A list of classifiers.\n",
" A list of regressors.\n",
" Invoking the `fit` method on the `StackingCVRegressor` will fit clones\n",
" of these original regressors that will\n",
" be stored in the class attribute `self.regr_`.\n",
"\n",
"- `meta_regressor` : object\n",
"\n",
" The meta-classifier to be fitted on the ensemble of\n",
" classifiers\n",
" The meta-regressor to be fitted on the ensemble of\n",
" regressors\n",
"\n",
"- `cv` : int, cross-validation generator or iterable, optional (default: 5)\n",
"\n",
@@ -351,7 +351,7 @@
"\n",
"- `use_features_in_secondary` : bool (default: False)\n",
"\n",
" If True, the meta-classifier will be trained both on\n",
" If True, the meta-regressor will be trained both on\n",
" the predictions of the original regressors and the\n",
" original dataset.\n",
" If False, the meta-regressor will be trained only on\n",
@@ -364,6 +364,21 @@
" argument is a specific cross validation technique, this argument is\n",
" omitted.\n",
"\n",
"- `store_train_meta_features` : bool (default: False)\n",
"\n",
" If True, the meta-features computed from the training data used\n",
" for fitting the meta-regressor stored in the\n",
" `self.train_meta_features_` array, which can be\n",
" accessed after calling `fit`.\n",
"\n",
"**Attributes**\n",
"\n",
"- `train_meta_features` : numpy array, shape=[n_samples, len(self.regressors)]\n",
"\n",
" meta-features for training data, where n_samples is the number of\n",
" samples in training data and len(self.regressors) is\n",
" the number of regressors.\n",
"\n",
"### Methods\n",
"\n",
"<hr>\n",
@@ -448,15 +463,36 @@
"\n",
"<hr>\n",
"\n",
"*predict_meta_features(X)*\n",
"\n",
"Get meta-features of test-data.\n",
"\n",
"**Parameters**\n",
"\n",
"- `X` : numpy array, shape = [n_samples, n_features]\n",
"\n",
" Test vectors, where n_samples is the number of samples and\n",
" n_features is the number of features.\n",
"\n",
"**Returns**\n",
"\n",
"- `meta-features` : numpy array, shape = [n_samples, len(self.regressors)]\n",
"\n",
" meta-features for test data, where n_samples is the number of\n",
" samples in test data and len(self.regressors) is the number of\n",
" regressors.\n",
"\n",
"<hr>\n",
"\n",
"*score(X, y, sample_weight=None)*\n",
"\n",
"Returns the coefficient of determination R^2 of the prediction.\n",
"\n",
"The coefficient R^2 is defined as (1 - u/v), where u is the regression\n",
"sum of squares ((y_true - y_pred) ** 2).sum() and v is the residual\n",
"The coefficient R^2 is defined as (1 - u/v), where u is the residual\n",
"sum of squares ((y_true - y_pred) ** 2).sum() and v is the total\n",
"sum of squares ((y_true - y_true.mean()) ** 2).sum().\n",
"\n",
"Best possible score is 1.0 and it can be negative (because the\n",
"The best possible score is 1.0 and it can be negative (because the\n",
"\n",
"model can be arbitrarily worse). A constant model that always\n",
"predicts the expected value of y, disregarding the input features,\n",
