Merge pull request #297 from rasbt/stackingcvregressordocs
fix some typos in the stackingcvregressor docs
rasbt committed Dec 1, 2017
2 parents 3cca453 + d179fff commit fa5d1f7
Showing 1 changed file with 50 additions and 14 deletions.
docs/sources/user_guide/regressor/StackingCVRegressor.ipynb (50 additions, 14 deletions)
@@ -32,7 +32,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Stacking is an ensemble learning technique to combine multiple regression models via a meta-regressor. The `StackingCVRegressor` extends the standard stacking algorithm (implemented as [`StackingRegressor`](StackingRegressor.md)) using out-of-fold predictions to prepare the input data for the level-2 classifier.\n",
"Stacking is an ensemble learning technique to combine multiple regression models via a meta-regressor. The `StackingCVRegressor` extends the standard stacking algorithm (implemented as [`StackingRegressor`](StackingRegressor.md)) using out-of-fold predictions to prepare the input data for the level-2 regressor.\n",
"\n",
"In the standard stacking procedure, the first-level regressors are fit to the same training set that is used prepare the inputs for the second-level regressor, which may lead to overfitting. The `StackingCVRegressor`, however, uses the concept of out-of-fold predictions: the dataset is split into k folds, and in k successive rounds, k-1 folds are used to fit the first level regressor. In each round, the first-level regressors are then applied to the remaining 1 subset that was not used for model fitting in each iteration. The resulting predictions are then stacked and provided -- as input data -- to the second-level regressor. After the training of the `StackingCVRegressor`, the first-level regressors are fit to the entire dataset for optimal predicitons."
]
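To make the out-of-fold procedure concrete, the following minimal sketch fits a `StackingCVRegressor` on synthetic data. The base regressors, the `make_regression` dataset, and all hyperparameter values are illustrative assumptions, not part of this diff:

    # Minimal sketch of out-of-fold stacking (illustrative, not from the diff).
    from mlxtend.regressor import StackingCVRegressor
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import Lasso
    from sklearn.svm import SVR

    X, y = make_regression(n_samples=200, n_features=10, random_state=42)

    # Level-1 regressors produce out-of-fold predictions, which become the
    # input features for the level-2 (meta) regressor.
    stack = StackingCVRegressor(regressors=(SVR(), Lasso(),
                                            RandomForestRegressor(random_state=42)),
                                meta_regressor=Lasso(),
                                cv=5)
    stack.fit(X, y)
    print(stack.predict(X[:5]))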
@@ -87,7 +87,7 @@
"R^2 Score: 0.45 (+/- 0.29) [SVM]\n",
"R^2 Score: 0.43 (+/- 0.14) [Lasso]\n",
"R^2 Score: 0.52 (+/- 0.28) [Random Forest]\n",
"R^2 Score: 0.58 (+/- 0.24) [StackingClassifier]\n"
"R^2 Score: 0.58 (+/- 0.24) [StackingCVRegressor]\n"
]
}
],
@@ -121,7 +121,7 @@
"\n",
"for clf, label in zip([svr, lasso, rf, stack], ['SVM', 'Lasso', \n",
" 'Random Forest', \n",
" 'StackingClassifier']):\n",
" 'StackingCVRegressor']):\n",
" scores = cross_val_score(clf, X, y, cv=5)\n",
" print(\"R^2 Score: %0.2f (+/- %0.2f) [%s]\" % (\n",
" scores.mean(), scores.std(), label))"
@@ -141,7 +141,7 @@
"Neg. MSE Score: -33.69 (+/- 22.36) [SVM]\n",
"Neg. MSE Score: -35.53 (+/- 16.99) [Lasso]\n",
"Neg. MSE Score: -27.32 (+/- 16.62) [Random Forest]\n",
"Neg. MSE Score: -25.64 (+/- 18.11) [StackingClassifier]\n"
"Neg. MSE Score: -25.64 (+/- 18.11) [StackingCVRegressor]\n"
]
}
],
@@ -158,7 +158,7 @@
"\n",
"for clf, label in zip([svr, lasso, rf, stack], ['SVM', 'Lasso', \n",
" 'Random Forest', \n",
" 'StackingClassifier']):\n",
" 'StackingCVRegressor']):\n",
" scores = cross_val_score(clf, X, y, cv=5, scoring='neg_mean_squared_error')\n",
" print(\"Neg. MSE Score: %0.2f (+/- %0.2f) [%s]\" % (\n",
" scores.mean(), scores.std(), label))"
@@ -284,7 +284,7 @@
"source": [
"**Note**\n",
"\n",
"The `StackingCVRegressor` also enables grid search over the `regressors` argument. However, due to the current implementation of `GridSearchCV` in scikit-learn, it is not possible to search over both, differenct classifiers and classifier parameters at the same time. For instance, while the following parameter dictionary works\n",
"The `StackingCVRegressor` also enables grid search over the `regressors` argument. However, due to the current implementation of `GridSearchCV` in scikit-learn, it is not possible to search over both, different regressors and regressor parameters at the same time. For instance, while the following parameter dictionary works\n",
"\n",
" params = {'randomforestregressor__n_estimators': [1, 100],\n",
" 'regressors': [(regr1, regr1, regr1), (regr2, regr3)]}\n",
@@ -310,7 +310,7 @@
"text": [
"## StackingCVRegressor\n",
"\n",
"*StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False)*\n",
"*StackingCVRegressor(regressors, meta_regressor, cv=5, shuffle=True, use_features_in_secondary=False, store_train_meta_features=False)*\n",
"\n",
"A 'Stacking Cross-Validation' regressor for scikit-learn estimators.\n",
"\n",
@@ -329,15 +329,15 @@
"\n",
"- `regressors` : array-like, shape = [n_regressors]\n",
"\n",
" A list of classifiers.\n",
" A list of regressors.\n",
" Invoking the `fit` method on the `StackingCVRegressor` will fit clones\n",
" of these original regressors that will\n",
" be stored in the class attribute `self.regr_`.\n",
"\n",
"- `meta_regressor` : object\n",
"\n",
" The meta-classifier to be fitted on the ensemble of\n",
" classifiers\n",
" The meta-regressor to be fitted on the ensemble of\n",
" regressors\n",
"\n",
"- `cv` : int, cross-validation generator or iterable, optional (default: 5)\n",
"\n",
@@ -351,7 +351,7 @@
"\n",
"- `use_features_in_secondary` : bool (default: False)\n",
"\n",
" If True, the meta-classifier will be trained both on\n",
" If True, the meta-regressor will be trained both on\n",
" the predictions of the original regressors and the\n",
" original dataset.\n",
" If False, the meta-regressor will be trained only on\n",
@@ -364,6 +364,21 @@
" argument is a specific cross validation technique, this argument is\n",
" omitted.\n",
"\n",
"- `store_train_meta_features` : bool (default: False)\n",
"\n",
" If True, the meta-features computed from the training data used\n",
" for fitting the meta-regressor stored in the\n",
" `self.train_meta_features_` array, which can be\n",
" accessed after calling `fit`.\n",
"\n",
"**Attributes**\n",
"\n",
"- `train_meta_features` : numpy array, shape=[n_samples, len(self.regressors)]\n",
"\n",
" meta-features for training data, where n_samples is the number of\n",
" samples in training data and len(self.regressors) is\n",
" the number of regressors.\n",
"\n",
"### Methods\n",
"\n",
"<hr>\n",
@@ -448,15 +463,36 @@
"\n",
"<hr>\n",
"\n",
"*predict_meta_features(X)*\n",
"\n",
"Get meta-features of test-data.\n",
"\n",
"**Parameters**\n",
"\n",
"- `X` : numpy array, shape = [n_samples, n_features]\n",
"\n",
" Test vectors, where n_samples is the number of samples and\n",
" n_features is the number of features.\n",
"\n",
"**Returns**\n",
"\n",
"- `meta-features` : numpy array, shape = [n_samples, len(self.regressors)]\n",
"\n",
" meta-features for test data, where n_samples is the number of\n",
" samples in test data and len(self.regressors) is the number of\n",
" regressors.\n",
"\n",
"<hr>\n",
"\n",
"*score(X, y, sample_weight=None)*\n",
"\n",
"Returns the coefficient of determination R^2 of the prediction.\n",
"\n",
"The coefficient R^2 is defined as (1 - u/v), where u is the regression\n",
"sum of squares ((y_true - y_pred) ** 2).sum() and v is the residual\n",
"The coefficient R^2 is defined as (1 - u/v), where u is the residual\n",
"sum of squares ((y_true - y_pred) ** 2).sum() and v is the total\n",
"sum of squares ((y_true - y_true.mean()) ** 2).sum().\n",
"\n",
"Best possible score is 1.0 and it can be negative (because the\n",
"The best possible score is 1.0 and it can be negative (because the\n",
"\n",
"model can be arbitrarily worse). A constant model that always\n",
"predicts the expected value of y, disregarding the input features,\n",
